604. Design Compressed String Iterator


Design and implement a data structure for a compressed string iterator. It should support the following operations: next and hasNext.

The given compressed string will be in the form of each letter followed by a positive integer representing the number of this letter existing in the original uncompressed string.

next() - if the original string still has uncompressed characters, return the next letter; Otherwise return a white space.
hasNext() - Judge whether there is any letter needs to be uncompressed.

Note:
Please remember to RESET your class variables declared in StringIterator, as static/class variables are persisted across multiple test cases. Please see here for more details.

Example:

StringIterator iterator = new StringIterator("L1e2t1C1o1d1e1");

iterator.next(); // return 'L'
iterator.next(); // return 'e'
iterator.next(); // return 'e'
iterator.next(); // return 't'
iterator.next(); // return 'C'
iterator.next(); // return 'o'
iterator.next(); // return 'd'
iterator.hasNext(); // return true
iterator.next(); // return 'e'
iterator.hasNext(); // return false
iterator.next(); // return ' '


Solution

Approach #1 Uncompressing the String [Time Limit Exceeded]

Algorithm

In this approach, we make use of precomputation. We already form the uncompressed string and append the uncompressed letters for each compressed letter in the to the stringbuilder. To find the uncompressed strings to be stored in , we traverse over the given . Whenver we find an alphabet, we find the number following it by making use of decimal mathematics. Thus, we get the two elements(alphabet and the count) required for forming the current constituent of the uncompressed string.

Now, we'll look at how the next() and hasNext() operations are performed:

  1. next(): We start off by checking if the compressed string has more uncompressed letters pending. If not, hasNext() returns a False value and next() returns a ' '. Otherwise, we return the letter pointed by , which indicates the next letter to be returned. Before returning the letter, we also update the to point to the next letter in .

  2. hasNext(): If the pointer reaches beyond the end of array, it indicates that no more uncompressed letters are left beyond the current index pointed by . Thus, we return a False in this case. Otherwise, we return a True value.

Performance Analysis

  • We precompute the elements of the uncompressed string. Thus, the space required in this case is , where refers to the length of the uncompressed string.

  • The time required for precomputation is since we need to generate the uncompressed string of length .

  • Once the precomputation has been done, the time required for performing next() and hasNext() is for both.

  • This approach can be easily extended to include previous(), last() and find() operations. All these operations require the use an index only and thus, take time. Operations like hasPrevious() can also be easily included.

  • Since, once the precomputation has been done, next() requires time, this approach is useful if next() operation needs to be performed a large number of times. However, if hasNext() is performed most of the times, this approach isn't much advantageous since precomputation needs to be done anyhow.

  • A potential problem with this approach could arise if the length of the uncompressed string is very large. In such a case, the size of the complete uncompressed string could become so large that it can't fit in the memory limits, leading to memory overflow.


Approach #2 Pre-Computation [Accepted]

Algorithm

In this approach, firstly, we split the given based on numbers(0-9) and store the values(alphabets) obtained in array. We also split the based on the alphabets(a-z, A-Z) and store the numbers(in the form of a string) in a array(after converting the strings obtained into integers). We do the splitting by making use of regular expression matching.

A regular expression is a special sequence of letters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. They can be used to search, edit, or manipulate text and data.

This splitting using regex is done as a precomputation step. Now we'll look at how the next() and hasNext() operations are implemented.

  1. next(): Every time the next() operation is performed, firstly we check if there are any more letters to be uncompressed. We check it by making use of hasNext() function. If there aren't any more letters left, we return a ' '. We make use of a pointer to keep a track of the letter in the that needs to be returned next. If there are more letters left in the uncompressed string, we return the current letter pointed to by . But, before returning this letter, we also decrement the entry to indicate that the current letter is pending in the uncompressed string by one lesser count. On decrementing this entry, if it becomes zero, it indicates that no more instances of the current letter exist in the uncompressed string. Thus, we update the pointer to point to the next letter.

  2. hasNext(): For performing hasNext() operation, we simply need to check if the has already reached beyong the end of array. If so, it indicates that no more compressed letters exist in the . Hence, we return a False value in this case. Otherwise, more compressed letters exist. Hence, we return a True value in this case.

Performance Analysis

  • The space required for storing the results of the precomputation is , where refers to the length of the compressed string. The and array contain a total of elements.

  • The precomputation step requires time. Thus, if hasNext() operation is performed most of the times, this precomputation turns out to be non-advantageous.

  • Once the precomputation has been done, hasNext() and next() requires time.

  • This approach can be extended to include the previous() and hasPrevious() operations, but that would require making some simple modifications to the current implementation.


Approach #3 Demand-Computation [Accepted]

Algorithm

In this approach, we don't make use of regex for finding the individual components of the given . We do not perform any form of precomputation. Whenever an operation needs to be performed, the required results are generated from the scratch. Thus, the operations are performed only on demand.

Let's look at the implementation of the required operations:

  1. next(): We make use of a global pointer to keep a track of which compressed letter in the needs to be processed next. We also make use of a global variable to keep a track of the number of instances of the current letter which are still pending. Whenever next() operation needs to be performed, firstly, we check if there are more uncompressed letters left in the . If not, we return a ' '. Otherwise, we check if there are more instances of the current letter still pending. If so, we directly decrement the count of instances indicated by and return the current letter. But, if there aren't more instances pending for the current letter, we update the to point to the next letter in the . We also update the by obtaining the count for the next letter from the . This number is obtained by making use of decimal arithmetic.

  2. hasNext(): If the pointer has reached beyond the last index of the and becomes, it indicates that no more uncompressed letters exist in the compressed string. Hence, we return a False in this case. Otherwise, a True value is returned indicating that more compressed letters exist in the .

Performance Analysis

  • Since no precomputation is done, constant space is required in this case.

  • The time required to perform next() operation is .

  • The time required for hasNext() operation is .

  • Since no precomputations are done, and hasNext() requires only time, this solution is advantageous if hasNext() operation is performed most of the times.

  • This approach can be extended to include previous() and hasPrevious() operationsm, but this will require the use of some additional variables.


Analysis written by: @vinod23