583. Delete Operation for Two Strings


Given two words word1 and word2, find the minimum number of steps required to make word1 and word2 the same, where in each step you can delete one character in either string.

Example 1:

Input: "sea", "eat"
Output: 2
Explanation: You need one step to make "sea" to "ea" and another step to make "eat" to "ea".

Note:

  1. The length of given words won't exceed 500.
  2. Characters in given words can only be lower-case letters.


Solution


Approach #1 Using Longest Common Subsequence [Time Limit Exceeded]

Algorithm

In order to determine the minimum number of delete operations needed, we can make use of the length of the longest common sequence among the two given strings and , say given by . If we can find this value, we can easily determine the required result as . Here, and refer to the length of the two given strings and .

The above equation works because in case of complete mismatch(i.e. if the two strings can't be equalized at all), the total number of delete operations required will be . Now, if there is a common sequence among the two strings of length , we need to do lesser deletions in both the strings leading to a total of lesser deletions, which then leads to the above equation.

In order to find the length of the longest common sequence, we make use of a recursive function lcs(s1,s2,i,j) which returns the length of the longest common sequence among the strings and considering their lengths upto and respectively. For evaluating the function, we check if the characters and for equality. If they match, we can consider the corresponding strings upto 1 lesser lengths since the last characters have already been considered and add 1 to the result to be returned for strings of 1 lesser lengths. Thus, we make the function call lcs(s1, s2, i-1, j-1).

If the last characters don't match, we have two options, either we can consider the second last character of and the last character of , or we can consider the second last character of and the last character of . We need to consider the larger result obtained out of the two considerations for getting the required length.

Thus, the function call lcs(s1,s2,m,n) returns the required value.

Complexity Analysis

  • Time complexity : . Size of recursion tree will be . Here, and refer to the lengths of and respectively.

  • Space complexity : . The depth of the recursion tree will go upto .


Approach #2 Longest Common Subsequence with Memoization [Accepted]

Algorithm

We can observe that in the last approach, while determining the value, a lot of redundant function calls are made, since the same and values to be used for the function calls could be obtained going through many different paths. We can remove this redundancy by making use of a array to store the value to be returned for these function calls if they have been called once with the corresponding parameters. Thus, is used to store the result for the function call lcs(s1,s2,i,j).

Thus, by returning the already stored values from the array, we can prune the search space to a great extent.

Complexity Analysis

  • Time complexity : . array of size x needs to be filled once. Here, and refer to the length of the strings and respectively.

  • Space complexity : . array of size x is used. Also, The depth of the recursion tree will go upto .


Approach #3 Using Longest Common Subsequence- Dynamic Programming [Accepted]

Algorithm

Another method to obtain the value of is to make use of Dynamic Programming. We'll look at the implemenation and carry-on alongside the idea behind it.

We make use of a 2-D , in which represents the length of the longest common subsequence among the strings and considering their lengths upto index and index only respectively. We fill the array in row-by-row order.

In order to fill the entry for , we can have two cases:

  1. The characters and match with each other. In this case, the entry for will be one more than the entry obtained for the strings considering their lengths upto one lesser index, since the matched character adds one to the length of LCS formed till the current indices. Thus, the entry is updated as . Note that has been used because the matched character belongs to both and .

  2. The characters and don't match with each other. In this case, we can't increment the current entry as compared to entries corresponding to the previous indices, but we need to replicate the previous entry again to indicate that the length of LCS upto the current indices also remains the same. But, which entry to pick up? Now, since the current character hasn't matched, we have got two options. We can remove the current character from consideration from either or and use the corresponding entries given by and respectively. Since we are considering the length of LCS upto the current indices we need to pick up the larger entry out of these two to update the current entry.

At the end, again, we obtain the number of deletions required as , where and refer to the lengths of and . now refers to the length of LCS among the two given strings.

!?!../Documents/583_Delete1.json:1000,563!?!

Complexity Analysis

  • Time complexity : . We need to fill in the array of size x. Here, and refer to the lengths of and .

  • Space complexity : . array of size x is used.


Approach #4 Without using LCS Dynamic Programmming [Accepted]:

Algorithm

Instead of finding the length of LCS and then determining the number of deletions required, we can make use of Dynamic Programming to directly determine the number of deletions required till the current indices of the strings.

In order to do so, we make use of a 2-D array. Now, refers to the number of deletions required to equalize the two strings if we consider the strings' length upto index and index for and respectively. Again, we fill in the array in a row-by-row order. Now, in order to fill the entry for , we need to consider two cases only:

  1. The characters and match with each other. In this case, we need to replicate the entry corresponding to itself. This is because, the matched character doesn't need to be deleted from any of the strings.

  2. The characters and don't match with each other. In this case, we need to delete either the current character of or . Thus, an increment of 1 needs to be done relative to the entries corresponding to the previous indices. The two options available at this moment are and . Since, we are keeping track of the minimum number of deletions required, we pick up the minimum out of these two values.

At the end, gives the required minimum number of deletions. Here, and refer to the lengths of and .

!?!../Documents/583_Delete2.json:1000,563!?!

Complexity Analysis

  • Time complexity : . We need to fill in the array of size x. Here, and refer to the lengths of and .

  • Space complexity : . array of size x is used.


Approach #5 1-D Dynamic Programming [Accepted]:

Algorithm

We can observe that in the last approach, in order to update the current entries, we need only the values of the previous row of . Thus, rather than using a 2-D array, we can do the same job by making use of a 1-D array.

Thus, now, refers to the number of deletions that need to be made in order to equalize the strings and if we consider string upto the index and string upto the last to current index of .

Now, we make the updations for the current row in an array of the same size as , and use the entries as if they correspond to the previous row's entries. When, the whole array has been filled, we copy it the array so that array now reflects the new row's entries.

Complexity Analysis

  • Time complexity : . We need to fill in the array of size , times. Here, and refer to the lengths of and .

  • Space complexity : . array of size is used.


Analysis written by: @vinod23