10. Regular Expression Matching


Implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

Solution


Approach #1: Recursion [Accepted]

Intuition

If there were no Kleene stars (the * wildcard character for regular expressions), the problem would be easier - we simply check from left to right if each character of the text matches the pattern.

When a star is present, we may need to check many different suffixes of the text and see if they match the rest of the pattern. A recursive solution is a straightforward way to represent this relationship.

Algorithm

Without a Kleene star, our solution would look like this:

If a star is present in the pattern, it will be in the second position . Then, we may ignore this part of the pattern, or delete a matching character in the text. If we have a match on the remaining strings after any of these operations, then the initial inputs matched.

Complexity Analysis

  • Time Complexity: Let be the lengths of the text and the pattern respectively. In the worst case, a call to match(text[i:], pattern[2j:]) will be made times, and strings of the order and will be made. Thus, the complexity has the order . With some effort outside the scope of this article, we can show this is bounded by .

  • Space Complexity: For every call to match, we will create those strings as described above, possibly creating duplicates. If memory is not freed, this will also take a total of space, even though there are only order unique suffixes of and that are actually required.


Approach #2: Dynamic Programming [Accepted]

Intuition

As the problem has an optimal substructure, it is natural to cache intermediate results. We ask the question : does and match? We can describe our answer in terms of answers to questions involving smaller strings.

Algorithm

We proceed with the same recursion as in Approach #1, except because calls will only ever be made to match(text[i:], pattern[j:]), we use to handle those calls instead, saving us expensive string-building operations and allowing us to cache the intermediate results.

Top-Down Variation

Bottom-Up Variation

Complexity Analysis

  • Time Complexity: Let be the lengths of the text and the pattern respectively. The work for every call to dp(i, j) for ; is done once, and it is work. Hence, the time complexity is .

  • Space Complexity: The only memory we use is the boolean entries in our cache. Hence, the space complexity is .


Analysis written by: @awice