
Introduction
In programming, sets are a commonly used data structure to store collections of distinct elements. Set manipulation and analysis form the foundation of many coding challenges. In this article, we’ll deep dive into solving the “Set Mismatch” problem from Leetcode, which is based on set manipulation. We will analyze the problem statement, discuss different approaches to solve the problem, and provide Python code for each of these solutions.
Problem Statement
You have a set of integers s
, which originally contains all the numbers from 1
to n
. Unfortunately, due to some error, one of the numbers in the set got duplicated to another number in the set, which results in repetition of one number and loss of another number.
You are given an integer array nums
representing the data status of this set after the error.
Find the number that occurs twice and the number that is missing and return them in the form of an array.
Example:
Input: nums = [1,2,2,4]
Output: [2,3]
Note:
2 <= nums.length <= 10^4
1 <= nums[i] <= 10^4
Solution
1. Sorting Approach
The first approach that comes to mind is sorting the array and then iterating through it to find the duplicated and missing numbers.
- Sort the
nums
array. - Iterate through the sorted array:
- If
nums[i]
is equal tonums[i-1]
, the duplicated number is found. - If
nums[i]
is greater thannums[i-1] + 1
, the missing number is found.
- If
- If the missing number is not found in the iteration, it must be
n
.
def find_error_nums(nums):
nums.sort()
dup = -1
missing = 1
for i in range(1, len(nums)):
if nums[i] == nums[i - 1]:
dup = nums[i]
elif nums[i] > nums[i - 1] + 1:
missing = nums[i - 1] + 1
return [dup, missing if missing > 1 else len(nums)]
Time Complexity:
- O(n log n), where n is the number of elements in the array, due to the sorting step.
Space Complexity:
- O(1), as we are using a constant amount of space.
2. Hashing Approach
We can make use of hashing to keep track of the frequency of each number in the array.
- Initialize a hash table or a list to store the frequency of each number in the array.
- Iterate through the array, updating the frequency table.
- Iterate from
1
ton
, checking the frequency table to find the duplicated and missing numbers.
def find_error_nums(nums):
frequency = [0] * (len(nums) + 1)
for num in nums:
frequency[num] += 1
dup = missing = -1
for i in range(1, len(frequency)):
if frequency[i] == 2:
dup = i
elif frequency[i] == 0:
missing = i
return [dup, missing]
Time Complexity:
- O(n), where n is the number of elements in the array.
Space Complexity:
- O(n), due to the additional space required for the frequency table.
3. Mathematical Approach
We can make use of the properties of sets to find the duplicate and missing numbers mathematically without additional space.
- Calculate the sum of the numbers in the array, call this
actual_sum
. - Calculate the sum of squares of the numbers in the array, call this
actual_sum_squares
. - Calculate the sum and sum of squares for the first
n
natural numbers. - Use the calculated sums to find the duplicated and missing numbers.
def find_error_nums(nums):
n = len(nums)
actual_sum = sum(nums)
actual_sum_squares = sum(x**2 for x in nums)
expected_sum = n * (n + 1) // 2
expected_sum_squares = n * (n + 1) * (2 * n + 1) // 6
delta_sum = expected_sum - actual_sum
delta_sum_squares = expected_sum_squares - actual_sum_squares
duplicated = (delta_sum_squares - delta_sum**2) // (2 * delta_sum)
missing = duplicated + delta_sum
return [duplicated, missing]
Time Complexity:
- O(n), where n is the number of elements in the array.
Space Complexity:
- O(1), as we are using a constant amount of space.
Conclusion
In this article, we explored the “Set Mismatch” problem on Leetcode and discussed three different approaches in Python: sorting approach, hashing approach, and mathematical approach. The mathematical approach is the most optimized solution in terms of both time and space complexity.