Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.
Note:
O(n2)
.Credits:
Special thanks to @jianchao.li.fighter for adding this problem and creating all test cases.
The first two approaches mentioned do not satisfy the constraints given in the prompt, but they are solutions that you might be likely to come up with during a technical interview. As an interviewer, I personally would not expect someone to come up with the cycle detection solution unless they have heard it before.
Proving that at least one duplicate must exist in nums
is simple
application of the
pigeonhole principle.
Here, each number in nums
is a "pigeon" and each distinct number that can
appear in nums
is a "pigeonhole". Because there are numbers are
distinct possible numbers, the pigeonhole principle implies that at
least one of the numbers is duplicated.
Intuition
If the numbers are sorted, then any duplicate numbers will be adjacent in the sorted array.
Algorithm
Given the intuition, the algorithm follows fairly simply. First, we sort the array, and then we compare each element to the previous element. Because there is exactly one duplicated element in the array, we know that the array is of at least length 2, and we can return the duplicate element as soon as we find it.
Complexity Analysis
Time complexity :
The sort
invocation costs time in Python and Java, so it
dominates the subsequent linear scan.
Space complexity : (or )
Here, we sort nums
in place, so the memory footprint is constant. If we
cannot modify the input array, then we must allocate linear space for a
copy of nums
and sort that instead.
Intuition
If we store each element as we iterate over the array, we can simply check each element as we iterate over the array.
Algorithm
In order to achieve linear time complexity, we need to be able to insert
elements into a data structure (and look them up) in constant time. A Set
satisfies these constraints nicely, so we iterate over the array and insert
each element into seen
. Before inserting it, we check whether it is already
there. If it is, then we found our duplicate, so we return it.
Complexity Analysis
Time complexity :
Set
in both Python and Java rely on underlying hash tables, so
insertion and lookup have amortized constant time complexities. The
algorithm is therefore linear, as it consists of a for
loop that
performs constant work times.
Space complexity :
In the worst case, the duplicate element appears twice, with one of its
appearances at array index . In this case, seen
will contain
distinct values, and will therefore occupy space.
Intuition
If we interpret nums
such that for each pair of index and value
, the "next" value is at index , we can reduce this
problem to cycle detection. See the solution to
Linked List Cycle II
for more details.
Algorithm
First off, we can easily show that the constraints of the problem imply that
a cycle must exist. Because each number in nums
is between and
, it will necessarily point to an index that exists. Therefore, the list
can be traversed infinitely, which implies that there is a cycle.
Additionally, because cannot appear as a value in nums
, nums[0]
cannot be part of the cycle. Therefore, traversing the array in this manner
from nums[0]
is equivalent to traversing a cyclic linked list. Given this,
the problem can be solved just like
Linked List Cycle II.
To see the algorithm in action, check out the animation below:
!?!../Documents/287_Find_the_Duplicate_Number.json:1280,720!?!
Complexity Analysis
Time complexity :
For detailed analysis, refer to Linked List Cycle II.
Space complexity :
For detailed analysis, refer to Linked List Cycle II.
Analysis and solutions written by: @emptyset