726. Number of Atoms


Given a chemical formula (given as a string), return the count of each atom.

An atomic element always starts with an uppercase character, then zero or more lowercase letters, representing the name.

1 or more digits representing the count of that element may follow if the count is greater than 1. If the count is 1, no digits will follow. For example, H2O and H2O2 are possible, but H1O2 is impossible.

Two formulas concatenated together produce another formula. For example, H2O2He3Mg4 is also a formula.

A formula placed in parentheses, and a count (optionally added) is also a formula. For example, (H2O2) and (H2O2)3 are formulas.

Given a formula, output the count of all elements as a string in the following form: the first name (in sorted order), followed by its count (if that count is more than 1), followed by the second name (in sorted order), followed by its count (if that count is more than 1), and so on.

Example 1:

Input: 
formula = "H2O"
Output: "H2O"
Explanation: 
The count of elements are {'H': 2, 'O': 1}.

Example 2:

Input: 
formula = "Mg(OH)2"
Output: "H2MgO2"
Explanation: 
The count of elements are {'H': 2, 'Mg': 1, 'O': 2}.

Example 3:

Input: 
formula = "K4(ON(SO3)2)2"
Output: "K4N2O14S4"
Explanation: 
The count of elements are {'K': 4, 'N': 2, 'O': 14, 'S': 4}.

Note:

  • All atom names consist of lowercase letters, except for the first character which is uppercase.
  • The length of formula will be in the range [1, 1000].
  • formula will only consist of letters, digits, and round parentheses, and is a valid formula as defined in the problem.

  • Approach #1: Recursion [Accepted]

    Intuition and Algorithm

    Write a function parse that parses the formula from index i, returning a map count from names to multiplicities (the number of times that name is recorded).

    We will put i in global state: our parse function increments i throughout any future calls to parse.

    • When we see a '(', we will parse whatever is inside the brackets (up to the closing ending bracket) and add it to our count.

    • Otherwise, we should see an uppercase character: we will parse the rest of the letters to get the name, and add that (plus the multiplicity if there is one.)

    • At the end, if there is a final multiplicity (representing the multiplicity of a bracketed sequence), we'll multiply our answer by this.

    Complexity Analysis

    • Time Complexity: , where is the length of the formula. It is to parse through the formula, but each of multiplicities after a bracket may increment the count of each name in the formula (inside those brackets), leading to an complexity.

    • Space Complexity: . We aren't recording more intermediate information than what is contained in the formula.


    Approach #2: Stack [Accepted]

    Intuition and Algorithm

    Instead of recursion, we can simulate the call stack by using a stack of counts directly.

    Complexity Analysis

    • Time Complexity , and Space Complexity . The analysis is the same as Approach #1.

    Approach #3: Regular Expressions [Accepted]

    Intuition and Algorithm

    Whenever parsing is involved, we can use regular expressions, a language for defining patterns in text.

    Our regular expression will be "([A-Z][a-z]*)(\d*)|(\()|(\))(\d*)". Breaking this down by capture group, this is:

    • ([A-Z][a-z]*) Match an uppercase character followed by any number of lowercase characters, then ((\d*)) match any number of digits.
    • OR, (\() match a left bracket or (\)) right bracket, then (\d*) match any number of digits.

    Now we can proceed as in Approach #2.

    • If we parsed a name and multiplicity ([A-Z][a-z]*)(\d*), we will add it to our current count.

    • If we parsed a left bracket, we will append a new count to our stack, representing the nested depth of parentheses.

    • If we parsed a right bracket (and possibly another multiplicity), we will multiply our deepest level count, top = stack.pop(), and add those entries to our current count.

    Complexity Analysis

    • Time Complexity , and Space Complexity . The analysis is the same as Approach #1, as this regular expression did not look backwards when parsing.

    Analysis written by: @awice. Approaches #1 and #2 inspired by @zestypanda. Java solution for #3 by @jianchao.li.fighter.