Given an array of integers, find out whether there are two distinct indices i and j in the array such that the absolute difference between nums[i] and nums[j] is at most t and the absolute difference between i and j is at most k.
This article is for intermediate readers. It introduces the following ideas: Binary Search Tree, HashMap, and Buckets.
Intuition
Compare each element with the previous elements and see if their difference is at most .
Algorithm
This problem requires us to find and such that the following conditions are satisfied:
The naive approach is the same as Approach #1 in Contains Duplicate II solution, which keeps a virtual sliding window that holds the newest elements. In this way, Condition 1 above is always satisfied. We then check if Condition 2 is also satisfied by applying linear search.
Java
public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { for (int i = 0; i < nums.length; ++i) { for (int j = Math.max(i - k, 0); j < i; ++j) { if (Math.abs(nums[i] - nums[j]) <= t) return true; } } return false; } // Time limit exceeded.
Complexity Analysis
Time complexity : . It costs time for each linear search. Note that we do at most comparisons in one search even if can be larger than .
Space complexity : . We only used constant auxiliary space.
Intuition
If elements in the window are maintained in sorted order, we can apply binary search twice to check if Condition 2 is satisfied.
By utilizing self-balancing Binary Search Tree, one can keep the window ordered at all times with logarithmic time insert
and delete
.
Algorithm
The real bottleneck of Approach #1 is due to all elements in the sliding window are being scanned to check if Condition 2 is satisfied. Could we do better?
If elements in the window are in sorted order, we can apply Binary Search twice to search for the two boundaries and for each element .
Unfortunately, the window is unsorted. A common mistake here is attempting to maintain a sorted array. Although searching in a sorted array costs only logarithmic time, keeping the order of the elements after insert
and delete
operation is not as efficient. Imagine you have a sorted array with elements and you are adding a new item . Even if you can find the correct position in time, you still need time to insert into the sorted array. The reason is that you need to shift all elements after the insert position one step backward. The same reasoning applies to removal as well. After removing an item from position , you need to shift all elements after one step forward. Thus, we gain nothing in speed compared to the naive linear search approach above.
To gain an actual speedup, we need a dynamic data structure that supports faster insert
, search
and delete
. Self-balancing Binary Search Tree (BST) is the right data structure. The term Self-balancing means the tree automatically keeps its height small after arbitrary insert
and delete
operations. Why does self-balancing matter? That is because most operations on a BST take time directly proportional to the height of the tree. Take a look at the following non-balanced BST which is skewed to the left:
6 / 5 / 4 / 3 / 2 / 1
Figure 1. A non-balanced BST that is skewed to the left.
Searching in the above BST degrades to linear time, which is like searching in a linked list. Now compare to the BST below which is balanced:
4 / \ 2 6 / \ / 1 3 5
Figure 2. A balanced BST.
Assume that is the total number of nodes in the tree, a balanced binary tree maintains its height in the order of . Thus it supports time for each of insert
, search
and delete
operations.
Here is the entire algorithm in pseudocode:
set
set
that is greater than or equal to , return true if
set
that is smaller than or equal to , return true if
set
One may consider the smallest element that is greater or equal to as the successor of in the BST, as in: "What is the next greater value of ?". Similarly, we consider the greatest element that is smaller or equal to as the predecessor of in the BST, as in: "What is the previous smaller value of ?". These two values and are the two closest neighbors from . Thus by checking the distance from them to , we can conclude if Condition 2 is satisfied.
Java
public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { TreeSet<Integer> set = new TreeSet<>(); for (int i = 0; i < nums.length; ++i) { // Find the successor of current element Integer s = set.ceiling(nums[i]); if (s != null && s <= nums[i] + t) return true; // Find the predecessor of current element Integer g = set.floor(nums[i]); if (g != null && nums[i] <= g + t) return true; set.add(nums[i]); if (set.size() > k) { set.remove(nums[i - k]); } } return false; }
Complexity Analysis
Time complexity : .
We iterate through the array of size . For each iteration, it costs time (search
, insert
or delete
) in the BST, since the size of the BST is upper bounded by both and .
Space complexity : . Space is dominated by the size of the BST, which is upper bounded by both and .
Note
When the array's elements and 's value are large, they can cause overflow in arithmetic operation. Consider using a larger size data type instead, such as long.
C++'s std::set
, std::set::upper_bound
and std::set::lower_bound
are equivalent to Java's TreeSet
, TreeSet::ceiling
and TreeSet::floor
, respectively. Python does not provide a Self-balancing BST through its library.
Intuition
Inspired by bucket sort
, we can achieve linear time complexity in our problem using buckets as window.
Algorithm
Bucket sort is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, using a different sorting algorithm. Here is an illustration of buckets.
Figure 3. Illustration of buckets.
From the above example, we have 8 unsorted integers. We create 5 buckets covering the inclusive ranges of individually. Each of the eight elements is in a particular bucket. For element with value , its bucket label is and here we have . Sort each bucket using some other sorting algorithm and then collect all of them bucket by bucket.
Back to our problem, the critical issue we are trying to solve is:
- For a given element is there an item in the window that is within the range of ?
- Could we do this in constant time?
Let us consider an example where each element is a person's birthday. Your birthday, say some day in March, is the new element . Suppose that each month has days and you want to know if anyone has a birthday within days of yours. Immediately, we can rule out all other months except February, March, April.
The reason we know this is because each birthday belongs to a bucket we called month! And the range covered by the buckets are the same as distance which simplifies things a lot. Any two elements that are not in the same or adjacent buckets must have a distance greater than .
We apply the above bucketing principle and design buckets covering the ranges of . We keep the window using this buckets. Then, each time, all we need to check is the bucket that belongs to and its two adjacent buckets. Thus, we have a constant time algorithm for searching almost duplicate in the window.
One thing worth mentioning is the difference from bucket sort – Each of our buckets contains at most one element at any time, because two elements in a bucket means "almost duplicate" and we can return early from the function. Therefore, a HashMap with an element associated with a bucket label is enough for our purpose.
Java
public class Solution { // Get the ID of the bucket from element value x and bucket width w // In Java, `-3 / 5 = 0` and but we need `-3 / 5 = -1`. private long getID(long x, long w) { return x < 0 ? (x + 1) / w - 1 : x / w; } public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { if (t < 0) return false; Map<Long, Long> d = new HashMap<>(); long w = (long)t + 1; for (int i = 0; i < nums.length; ++i) { long m = getID(nums[i], w); // check if bucket m is empty, each bucket may contain at most one element if (d.containsKey(m)) return true; // check the neighbor buckets for almost duplicate if (d.containsKey(m - 1) && Math.abs(nums[i] - d.get(m - 1)) < w) return true; if (d.containsKey(m + 1) && Math.abs(nums[i] - d.get(m + 1)) < w) return true; // now bucket m is empty and no almost duplicate in neighbor buckets d.put(m, (long)nums[i]); if (i >= k) d.remove(getID(nums[i - k], w)); } return false; } }
Complexity Analysis
Time complexity : . For each of the elements, we do at most three searches, one insert, and one delete on the HashMap, which costs constant time on average. Thus, the entire algorithm costs time.
Space complexity : . Space is dominated by the HashMap, which is linear to the size of its elements. The size of the HashMap is upper bounded by both and . Thus the space complexity is .