Leetcode – Design HashSet Solution in Python

Spread the love

HashSet is a type of data structure that is used for storing unique elements in an unordered manner. A HashSet supports the add, remove, and contains operations, all of which have an average time complexity of O(1) under the assumption that the number of keys are uniformly distributed.

In this article, we will focus on solving the LeetCode problem 705 – “Design HashSet,” using Python. In this problem, we need to design our own HashSet without using any built-in hash table libraries. We will discuss different solutions ranging from straightforward to optimized approaches, helping you understand the concept thoroughly.

Problem Statement

To be more precise, the problem asks us to implement the MyHashSet class:

  • void add(key): Inserts the value key into the HashSet.
  • bool contains(key): Returns whether the value key exists in the HashSet or not.
  • void remove(key): Removes the value key in the HashSet. If key does not exist in the HashSet, do nothing.

Method 1: Using an Array

Let’s first start with a simple solution using a list in Python.

class MyHashSet:

    def __init__(self):
        self.set = []

    def add(self, key: int) -> None:
        if not self.contains(key):
            self.set.append(key)

    def remove(self, key: int) -> None:
        if self.contains(key):
            self.set.remove(key)

    def contains(self, key: int) -> bool:
        return key in self.set

In this simple solution, we use an array (self.set) to store the elements. While this solution works, it has a high time complexity due to searching in an array, which takes O(n) time. This would be inefficient when dealing with a large number of elements.

Method 2: Using a Boolean Array

To optimize our solution, we can use a boolean array. This solution assumes that the keys are non-negative and fit into the memory.

class MyHashSet:

    def __init__(self):
        self.set = [False]*1000001  # Initialize a boolean array

    def add(self, key: int) -> None:
        self.set[key] = True

    def remove(self, key: int) -> None:
        self.set[key] = False

    def contains(self, key: int) -> bool:
        return self.set[key]

In this solution, the index of the array represents the key, and the value (True/False) indicates whether the key is present. The add, remove, and contains operations all have O(1) time complexity, making this a more efficient solution.

Method 3: Using a Bucket and LinkedList

While the previous method is efficient, it assumes that keys are non-negative and have an upper bound. To make our HashSet more flexible, we can implement it using an array of linked lists, often called “buckets.”

In this method, we first map the key to a specific bucket using a hash function. If a collision happens (two different keys map to the same bucket), we use a linked list to store all the keys.

Let’s implement this in Python:

class Node:
    def __init__(self, value, nextNode=None):
        self.value = value
        self.next = nextNode

class Bucket:
    def __init__(self):
        # A dummy head node
        self.head = Node(0)

    def insert(self, newValue):
        # Check the key is already existed or not
        if not self.exists(newValue):
            newNode = Node(newValue, self.head.next)
            self.head.next = newNode
        # Else: do not insert the new value since it's already existed

    def delete(self, value):
        prev = self.head
        curr = self.head.next
        while curr is not None:
            if curr.value == value:
                # Delete the current node
                prev.next = curr.next
                return
            prev = curr
            curr = curr.next

    def exists(self, value):
        curr = self.head.next
        while curr is not None:
            if curr.value == value:
                return True
            curr = curr.next
        return False

class MyHashSet:

    def __init__(self):
        self.size = 1997  # size of the bucket list
        self.buckets = [Bucket() for i in range(self.size)]

    def add(self, key: int) -> None:
        self.buckets[key % self.size].insert(key)

    def remove(self, key: int) -> None:
        self.buckets[key % self.size].delete(key)

    def contains(self, key: int) -> bool:
        return self.buckets[key % self.size].exists(key)

In this approach, we implement a singly linked list for each bucket. We chose the size of the bucket list to be a prime number (1997) to reduce the chance of mapping different keys to the same bucket, which can lead to a performance degradation. This approach provides a balance between the time and space complexity and makes fewer assumptions about the keys.

Conclusion

Designing a HashSet data structure is a great exercise to understand hashing, a crucial concept in computer science. In this article, we have explored different ways to implement a HashSet to solve the LeetCode problem “Design HashSet.”

We started with a simple approach using an array, then improved it using a boolean array, and finally implemented an optimized approach using a bucket of linked lists, suitable for a wider range of applications.

Leave a Reply