220. Contains Duplicate III


Given an array of integers, find out whether there are two distinct indices i and j in the array such that the absolute difference between nums[i] and nums[j] is at most t and the absolute difference between i and j is at most k.


Summary

This article is for intermediate readers. It introduces the following ideas: Binary Search Tree, HashMap, and Buckets.

Solutions


Approach #1 (Naive Linear Search) [Time Limit Exceeded]

Intuition

Compare each element with the previous elements and see if their difference is at most .

Algorithm

This problem requires us to find and such that the following conditions are satisfied:

The naive approach is the same as Approach #1 in Contains Duplicate II solution, which keeps a virtual sliding window that holds the newest elements. In this way, Condition 1 above is always satisfied. We then check if Condition 2 is also satisfied by applying linear search.

Java

public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
    for (int i = 0; i < nums.length; ++i) {
        for (int j = Math.max(i - k, 0); j < i; ++j) {
            if (Math.abs(nums[i] - nums[j]) <= t) return true;
        }
    }
    return false;
}
// Time limit exceeded.

Complexity Analysis

  • Time complexity : . It costs time for each linear search. Note that we do at most comparisons in one search even if can be larger than .

  • Space complexity : . We only used constant auxiliary space.


Approach #2 (Binary Search Tree) [Accepted]

Intuition

  • If elements in the window are maintained in sorted order, we can apply binary search twice to check if Condition 2 is satisfied.

  • By utilizing self-balancing Binary Search Tree, one can keep the window ordered at all times with logarithmic time insert and delete.

Algorithm

The real bottleneck of Approach #1 is due to all elements in the sliding window are being scanned to check if Condition 2 is satisfied. Could we do better?

If elements in the window are in sorted order, we can apply Binary Search twice to search for the two boundaries and for each element .

Unfortunately, the window is unsorted. A common mistake here is attempting to maintain a sorted array. Although searching in a sorted array costs only logarithmic time, keeping the order of the elements after insert and delete operation is not as efficient. Imagine you have a sorted array with elements and you are adding a new item . Even if you can find the correct position in time, you still need time to insert into the sorted array. The reason is that you need to shift all elements after the insert position one step backward. The same reasoning applies to removal as well. After removing an item from position , you need to shift all elements after one step forward. Thus, we gain nothing in speed compared to the naive linear search approach above.

To gain an actual speedup, we need a dynamic data structure that supports faster insert, search and delete. Self-balancing Binary Search Tree (BST) is the right data structure. The term Self-balancing means the tree automatically keeps its height small after arbitrary insert and delete operations. Why does self-balancing matter? That is because most operations on a BST take time directly proportional to the height of the tree. Take a look at the following non-balanced BST which is skewed to the left:

            6
           /
          5
         /
        4
       /
      3
     /
    2
   /
  1

Figure 1. A non-balanced BST that is skewed to the left.

Searching in the above BST degrades to linear time, which is like searching in a linked list. Now compare to the BST below which is balanced:

          4
        /   \
       2     6
      / \   /
     1   3  5

Figure 2. A balanced BST.

Assume that is the total number of nodes in the tree, a balanced binary tree maintains its height in the order of . Thus it supports time for each of insert, search and delete operations.

Here is the entire algorithm in pseudocode:

  • Initialize an empty BST set
  • Loop through the array, for each element
    • Find the smallest element in set that is greater than or equal to , return true if
    • Find the greatest element in set that is smaller than or equal to , return true if
    • Put in set
    • If the size of the set is larger than , remove the oldest item.
  • Return false

One may consider the smallest element that is greater or equal to as the successor of in the BST, as in: "What is the next greater value of ?". Similarly, we consider the greatest element that is smaller or equal to as the predecessor of in the BST, as in: "What is the previous smaller value of ?". These two values and are the two closest neighbors from . Thus by checking the distance from them to , we can conclude if Condition 2 is satisfied.

Java

public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
    TreeSet<Integer> set = new TreeSet<>();
    for (int i = 0; i < nums.length; ++i) {
        // Find the successor of current element
        Integer s = set.ceiling(nums[i]);
        if (s != null && s <= nums[i] + t) return true;

        // Find the predecessor of current element
        Integer g = set.floor(nums[i]);
        if (g != null && nums[i] <= g + t) return true;

        set.add(nums[i]);
        if (set.size() > k) {
            set.remove(nums[i - k]);
        }
    }
    return false;
}

Complexity Analysis

  • Time complexity : . We iterate through the array of size . For each iteration, it costs time (search, insert or delete) in the BST, since the size of the BST is upper bounded by both and .

  • Space complexity : . Space is dominated by the size of the BST, which is upper bounded by both and .

Note

  • When the array's elements and 's value are large, they can cause overflow in arithmetic operation. Consider using a larger size data type instead, such as long.

  • C++'s std::set, std::set::upper_bound and std::set::lower_bound are equivalent to Java's TreeSet, TreeSet::ceiling and TreeSet::floor, respectively. Python does not provide a Self-balancing BST through its library.


Approach #3 (Buckets) [Accepted]

Intuition

Inspired by bucket sort, we can achieve linear time complexity in our problem using buckets as window.

Algorithm

Bucket sort is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, using a different sorting algorithm. Here is an illustration of buckets.

Illustration of buckets

Figure 3. Illustration of buckets.

From the above example, we have 8 unsorted integers. We create 5 buckets covering the inclusive ranges of individually. Each of the eight elements is in a particular bucket. For element with value , its bucket label is and here we have . Sort each bucket using some other sorting algorithm and then collect all of them bucket by bucket.

Back to our problem, the critical issue we are trying to solve is:

  1. For a given element is there an item in the window that is within the range of ?
  2. Could we do this in constant time?

Let us consider an example where each element is a person's birthday. Your birthday, say some day in March, is the new element . Suppose that each month has days and you want to know if anyone has a birthday within days of yours. Immediately, we can rule out all other months except February, March, April.

The reason we know this is because each birthday belongs to a bucket we called month! And the range covered by the buckets are the same as distance which simplifies things a lot. Any two elements that are not in the same or adjacent buckets must have a distance greater than .

We apply the above bucketing principle and design buckets covering the ranges of . We keep the window using this buckets. Then, each time, all we need to check is the bucket that belongs to and its two adjacent buckets. Thus, we have a constant time algorithm for searching almost duplicate in the window.

One thing worth mentioning is the difference from bucket sort – Each of our buckets contains at most one element at any time, because two elements in a bucket means "almost duplicate" and we can return early from the function. Therefore, a HashMap with an element associated with a bucket label is enough for our purpose.

Java

public class Solution {
    // Get the ID of the bucket from element value x and bucket width w
    // In Java, `-3 / 5 = 0` and but we need `-3 / 5 = -1`.
    private long getID(long x, long w) {
        return x < 0 ? (x + 1) / w - 1 : x / w;
    }

    public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
        if (t < 0) return false;
        Map<Long, Long> d = new HashMap<>();
        long w = (long)t + 1;
        for (int i = 0; i < nums.length; ++i) {
            long m = getID(nums[i], w);
            // check if bucket m is empty, each bucket may contain at most one element
            if (d.containsKey(m))
                return true;
            // check the neighbor buckets for almost duplicate
            if (d.containsKey(m - 1) && Math.abs(nums[i] - d.get(m - 1)) < w)
                return true;
            if (d.containsKey(m + 1) && Math.abs(nums[i] - d.get(m + 1)) < w)
                return true;
            // now bucket m is empty and no almost duplicate in neighbor buckets
            d.put(m, (long)nums[i]);
            if (i >= k) d.remove(getID(nums[i - k], w));
        }
        return false;
    }
}

Complexity Analysis

  • Time complexity : . For each of the elements, we do at most three searches, one insert, and one delete on the HashMap, which costs constant time on average. Thus, the entire algorithm costs time.

  • Space complexity : . Space is dominated by the HashMap, which is linear to the size of its elements. The size of the HashMap is upper bounded by both and . Thus the space complexity is .

See Also