686. Repeated String Match


Given two strings A and B, find the minimum number of times A has to be repeated such that B is a substring of it. If no such solution, return -1.

For example, with A = "abcd" and B = "cdabcdab".

Return 3, because by repeating A three times (“abcdabcdabcd”), B is a substring of it; and B is not a substring of A repeated two times ("abcdabcd").

Note:
The length of A and B will be between 1 and 10000.


Approach #1: Ad-Hoc [Accepted]

Intuition

The question can be summarized as "What is the smallest k for which B is a substring of A * k?" We can just try every k.

Algorithm

Imagine we wrote S = A+A+A+.... If B is to be a substring of S, we only need to check whether some S[0:], S[1:], ..., S[len(A) - 1:] starts with B, as S is long enough to contain B, and S has period at most len(A).

Now, suppose q is the least number for which len(B) <= len(A * q). We only need to check whether B is a substring of A * q or A * (q+1). If we try k < q, then B has larger length than A * q and therefore can't be a substring. When k = q+1, A * k is already big enough to try all positions for B; namely, A[i:i+len(B)] == B for i = 0, 1, ..., len(A) - 1.

Complexity Analysis

  • Time Complexity: , where are the lengths of strings A, B. We create two strings A * q, A * (q+1) which have length at most O(M+N). When checking whether B is a substring of A, this check takes naively the product of their lengths.

  • Space complexity: As justified above, we created strings that used space.


Approach #2: Rabin-Karp (Rolling Hash) [Accepted]

Intuition

As in Approach #1, we've reduced the problem to deciding whether B is a substring of some A * k. Using the following technique, we can decide whether B is a substring in time.

Algorithm

For strings , consider each as some integer ASCII code. Then for some prime , consider the following function modulo some prime modulus :

Notably, . This shows we can get the hash of every substring of A * q in time complexity linear to it's size. (We will also use the fact that .)

However, hashes may collide haphazardly. To be absolutely sure in theory, we should check the answer in the usual way. The expected number of checks we make is in the order of where is the number of substrings we computed hashes for (assuming the hashes are equally distributed), which is effectively 1.

Complexity Analysis

  • Time Complexity: (at these sizes), where are the lengths of strings A, B. As in Approach #1, we justify that A * (q+1) will be of length , and computing the rolling hashes was linear work. We will also do a linear final check of our answer times. In total, this is work. Since , we can consider this to be linear behavior.

  • Space complexity: . Only integers were stored with additional memory.


Analysis written by: @awice