527. Word Abbreviation


Given an array of n distinct non-empty strings, you need to generate minimal possible abbreviations for every word following rules below.

  1. Begin with the first character and then the number of characters abbreviated, which followed by the last character.
  2. If there are any conflict, that is more than one words share the same abbreviation, a longer prefix is used instead of only the first character until making the map from word to abbreviation become unique. In other words, a final abbreviation cannot map to more than one original words.
  3. If the abbreviation doesn't make the word shorter, then keep it as original.

Example:

Input: ["like", "god", "internal", "me", "internet", "interval", "intension", "face", "intrusion"]
Output: ["l2e","god","internal","me","i6t","interval","inte4n","f2e","intr4n"]

Note:
  1. Both n and the length of each word will not exceed 400.
  2. The length of each word is greater than 1.
  3. The words consist of lowercase English letters only.
  4. The return answers should be in the same order as the original array.

Approach #1: Greedy [Accepted]

Intuition

Let's choose the shortest abbreviation for each word. Then, while we have duplicates, we'll increase the length of all duplicates.

Algorithm

For example, let's say we have "aabaaa", "aacaaa", "aacdaa", then we start with "a4a", "a4a", "a4a". Since these are duplicated, we lengthen them to "aa3a", "aa3a", "aa3a". They are still duplicated, so we lengthen them to "aab2a", "aac2a", "aac2a". The last two are still duplicated, so we lengthen them to "aacaaa", "aacdaa".

Throughout this process, we were tracking an index prefix[i] which told us up to what index to take the prefix to. For example, prefix[i] = 2 means to take a prefix of word[0], word[1], word[2].

Complexity Analysis

  • Time Complexity: where is the number of characters across all words in the given array.

  • Space Complexity: .


Approach #2: Group + Least Common Prefix [Accepted]

Intuition and Algorithm

Two words are only eligible to have the same abbreviation if they have the same first letter, last letter, and length. Let's group each word based on these properties, and then sort out the conflicts.

In each group G, if a word W has a longest common prefix P with any other word X in G, then our abbreviation must contain a prefix of more than |P| characters. The longest common prefixes must occur with words adjacent to W (in lexicographical order), so we can just sort G and look at the adjacent words.

Complexity Analysis

  • Time Complexity: where is the number of characters across all words in the given array. The complexity is dominated by the sorting step.

  • Space Complexity: .


Approach #3: Group + Trie [Accepted]

Intuition and Algorithm

As in Approach #1, let's group words based on length, first letter, and last letter, and discuss when words in a group do not share a longest common prefix.

Put the words of a group into a trie (prefix tree), and count at each node (representing some prefix P) the number of words with prefix P. If the count is 1, we know the prefix is unique.

Complexity Analysis

  • Time Complexity: where is the number of characters across all words in the given array.

  • Space Complexity: .


Analysis written by: @awice. Approach #1 inspired by @ckcz123.