737. Sentence Similarity II


Given two sentences words1, words2 (each represented as an array of strings), and a list of similar word pairs pairs, determine if two sentences are similar.

For example, words1 = ["great", "acting", "skills"] and words2 = ["fine", "drama", "talent"] are similar, if the similar word pairs are pairs = [["great", "good"], ["fine", "good"], ["acting","drama"], ["skills","talent"]].

Note that the similarity relation is transitive. For example, if "great" and "good" are similar, and "fine" and "good" are similar, then "great" and "fine" are similar.

Similarity is also symmetric. For example, "great" and "fine" being similar is the same as "fine" and "great" being similar.

Also, a word is always similar with itself. For example, the sentences words1 = ["great"], words2 = ["great"], pairs = [] are similar, even though there are no specified similar word pairs.

Finally, sentences can only be similar if they have the same number of words. So a sentence like words1 = ["great"] can never be similar to words2 = ["doubleplus","good"].

Note:

  • The length of words1 and words2 will not exceed 1000.
  • The length of pairs will not exceed 2000.
  • The length of each pairs[i] will be 2.
  • The length of each words[i] and pairs[i][j] will be in the range [1, 20].

  • Approach #1: Depth-First Search [Accepted]

    Intuition

    Two words are similar if they are the same, or there is a path connecting them from edges represented by pairs.

    We can check whether this path exists by performing a depth-first search from a word and seeing if we reach the other word. The search is performed on the underlying graph specified by the edges in pairs.

    Algorithm

    We start by building our graph from the edges in pairs.

    The specific algorithm we go for is an iterative depth-first search. The implementation we go for is a typical "visitor pattern": when searching whether there is a path from w1 = words1[i] to w2 = words2[i], stack will contain all the nodes that are queued up for processing, while seen will be all the nodes that have been queued for processing (whether they have been processed or not).

    Complexity Analysis

    • Time Complexity: , where is the maximum length of words1 and words2, and is the length of pairs. Each of searches could search the entire graph.

    • Space Complexity: , the size of pairs.


    Approach #2: Union-Find [Accepted]

    Intuition

    As in Approach #1, we want to know if there is path connecting two words from edges represented by pairs.

    Our problem comes down to finding the connected components of a graph. This is a natural fit for a Disjoint Set Union (DSU) structure.

    Algorithm

    Draw edges between words if they are similar. For easier interoperability between our DSU template, we will map each word to some integer ix = index[word]. Then, dsu.find(ix) will tell us a unique id representing what component that word is in.

    For more information on DSU, please look at Approach #2 in the article here. For brevity, the solutions showcased below do not use union-by-rank.

    After putting each word in pairs into our DSU template, we check successive pairs of words w1, w2 = words1[i], words2[i]. We require that w1 == w2, or w1 and w2 are in the same component. This is easily checked using dsu.find.

    Complexity Analysis

    • Time Complexity: , where is the maximum length of words1 and words2, and is the length of pairs. If we used union-by-rank, this complexity improves to , where is the Inverse-Ackermann function.

    • Space Complexity: , the size of pairs.


    Analysis written by: @awice.