HashSet is a type of data structure that is used for storing unique elements in an unordered manner. A HashSet supports the add
, remove
, and contains
operations, all of which have an average time complexity of O(1) under the assumption that the number of keys are uniformly distributed.
In this article, we will focus on solving the LeetCode problem 705 – “Design HashSet,” using Python. In this problem, we need to design our own HashSet without using any built-in hash table libraries. We will discuss different solutions ranging from straightforward to optimized approaches, helping you understand the concept thoroughly.
Problem Statement
To be more precise, the problem asks us to implement the MyHashSet
class:
void add(key)
: Inserts the valuekey
into the HashSet.bool contains(key)
: Returns whether the valuekey
exists in the HashSet or not.void remove(key)
: Removes the valuekey
in the HashSet. Ifkey
does not exist in the HashSet, do nothing.
Method 1: Using an Array
Let’s first start with a simple solution using a list in Python.
class MyHashSet:
def __init__(self):
self.set = []
def add(self, key: int) -> None:
if not self.contains(key):
self.set.append(key)
def remove(self, key: int) -> None:
if self.contains(key):
self.set.remove(key)
def contains(self, key: int) -> bool:
return key in self.set
In this simple solution, we use an array (self.set
) to store the elements. While this solution works, it has a high time complexity due to searching in an array, which takes O(n) time. This would be inefficient when dealing with a large number of elements.
Method 2: Using a Boolean Array
To optimize our solution, we can use a boolean array. This solution assumes that the keys are non-negative and fit into the memory.
class MyHashSet:
def __init__(self):
self.set = [False]*1000001 # Initialize a boolean array
def add(self, key: int) -> None:
self.set[key] = True
def remove(self, key: int) -> None:
self.set[key] = False
def contains(self, key: int) -> bool:
return self.set[key]
In this solution, the index of the array represents the key, and the value (True/False) indicates whether the key is present. The add, remove, and contains operations all have O(1) time complexity, making this a more efficient solution.
Method 3: Using a Bucket and LinkedList
While the previous method is efficient, it assumes that keys are non-negative and have an upper bound. To make our HashSet more flexible, we can implement it using an array of linked lists, often called “buckets.”
In this method, we first map the key to a specific bucket using a hash function. If a collision happens (two different keys map to the same bucket), we use a linked list to store all the keys.
Let’s implement this in Python:
class Node:
def __init__(self, value, nextNode=None):
self.value = value
self.next = nextNode
class Bucket:
def __init__(self):
# A dummy head node
self.head = Node(0)
def insert(self, newValue):
# Check the key is already existed or not
if not self.exists(newValue):
newNode = Node(newValue, self.head.next)
self.head.next = newNode
# Else: do not insert the new value since it's already existed
def delete(self, value):
prev = self.head
curr = self.head.next
while curr is not None:
if curr.value == value:
# Delete the current node
prev.next = curr.next
return
prev = curr
curr = curr.next
def exists(self, value):
curr = self.head.next
while curr is not None:
if curr.value == value:
return True
curr = curr.next
return False
class MyHashSet:
def __init__(self):
self.size = 1997 # size of the bucket list
self.buckets = [Bucket() for i in range(self.size)]
def add(self, key: int) -> None:
self.buckets[key % self.size].insert(key)
def remove(self, key: int) -> None:
self.buckets[key % self.size].delete(key)
def contains(self, key: int) -> bool:
return self.buckets[key % self.size].exists(key)
In this approach, we implement a singly linked list for each bucket. We chose the size of the bucket list to be a prime number (1997) to reduce the chance of mapping different keys to the same bucket, which can lead to a performance degradation. This approach provides a balance between the time and space complexity and makes fewer assumptions about the keys.
Conclusion
Designing a HashSet data structure is a great exercise to understand hashing, a crucial concept in computer science. In this article, we have explored different ways to implement a HashSet to solve the LeetCode problem “Design HashSet.”
We started with a simple approach using an array, then improved it using a boolean array, and finally implemented an optimized approach using a bucket of linked lists, suitable for a wider range of applications.