Longest-Common-Subsequence Problem

https://leetcode.com/problems/longest-common-subsequence/

Definition

Given a sequence $X = ⟨ x_{1}, x_{2}, \dots, x_{m} ⟩$ , another sequence $Z = ⟨ z_{1}, z_{2}, \dots, z_{k} ⟩$ is a subsequence of $X$ if there exists a strictly increasing sequence $⟨ i_{1}, i_{2}, \dots, i_{k} ⟩$ such that $x_{i_{j}} = z_{j}$ .

Algorithm

Optimal substructure: the LCS of two sequences contains within it an LCS of prefixes of the two sequences.
Recursive solution
- If $x_{m} = y_{n}$ , find LCS of $X_{m - 1}$ and $Y_{n - 1}$
- If $x_{m} \neq = y_{n}$ , find the LCS of $X_{m - 1}$ and $Y_{n}$ , and that of $X_{m}$ and $Y_{n - 1}$ . Pick the longer LCS.

c [i, j] = ⎩ ⎨ ⎧ 0, c [i - 1, j - 1] + 1, max {c [i, j - 1], c [i - 1, j]}, i = 0 or j = 0, i, j > 0 and x_{i} = y_{i}, i, j > 0 and x_{i} \neq = y_{j} t

Approach 1: Memoization

Discuss whether text1[p1] is included in the optimal solution or not.
If it is, we can find the first occurrence of this char in text2 and the problem is reduced.
Time complexity: $O (m n^{2})$ , as there are $mn$ possible pairs of strings, and solving each of them can take $O (n)$ time, as we’re searching the occurrence of a char in a string of length $n$ .

from functools import lru_cache
m, n = len(text1), len(text2)
@lru_cache()
def helper(p1, p2):
    # Base case: empty strings
    if p1 == m or p2 == n:
        return 0
    
    # Option 1: We don't include text1[p1] in the solution.
    option_1 = helper(p1 + 1, p2)
    
    # Option 2: We include text1[p1] in the solution, as long as a match for it
    # in text2 at or after p2 exists.
    first_occurence = text2.find(text1[p1], p2)
    if first_occurence != -1:
        option_2 = 1 + helper(p1 + 1, first_occurence + 1)
    else:
        option_2 = 0
 
    # Return the best option.
    return max(option_1, option_2)
 
return helper(0, 0)

Approach 2: Improved Memoization

A more intuitive way of structuring the subproblems.

If the current char is the same, consider them a match. This must be part of the optimal solution.
If not, either skip a char in text1 or skip a char in text2.

Time complexity becomes $O (mn)$ , as there are $mn$ combinations of substrings (or suffixes, more specifically.)

from functools import lru_cache
m, n = len(text1), len(text2)
@lru_cache(maxsize=None)
def memo_solve(p1, p2):
    # Base case: If either string is now empty, we can't match up anymore
    # characters.
    if p1 == m or p2 == n:
        return 0
    
    # Recursive case 1.
    if text1[p1] == text2[p2]:
        return 1 + memo_solve(p1 + 1, p2 + 1)
    
    # Recursive case 2.
    else:
        return max(memo_solve(p1, p2 + 1), memo_solve(p1 + 1, p2))
    
return memo_solve(0, 0)

Approach 3: Bottom-Up DP

Starting from the final char on both strings. Loop in reversed order.
First letter is the same? add 1 to length of LCS
First letter not the same? Have to kick a letter out, either from str1 or str2. Then take the max of sub problems.

# Make a grid of 0's with len(text2) + 1 columns and len(text1) + 1 rows.
dp = [[0] * (len(text2) + 1) for _ in range(len(text1) + 1)]
 
# Iterate up each column, starting from the last one.
for col in reversed(range(len(text2))):
    for row in reversed(range(len(text1))):
        # If the corresponding characters for this cell are the same...
        if text2[col] == text1[row]:
            dp[row][col] = 1 + dp[row + 1][col + 1]
        # Otherwise they must be different...
        else:
            dp[row][col] = max(dp[row + 1][col], dp[row][col + 1])
 
# The original problem's answer is in dp[0][0]. Return it.
return dp[0][0]

Approach 4: Bottom-Up DP with Space Optimization

Note that we’re only referencing the previously calculated column and the current column that we’re calculating.

if len(text2) < len(text1):
    text1, text2 = text2, text1
 
# The previous column starts with all 0's and like before is 1
# more than the length of the first word.
previous = [0] * (len(text1) + 1)
 
# Iterate up each column, starting from the last one.
for col in reversed(range(len(text2))):
    # Create a new array to represent the current column.
    current = [0] * (len(text1) + 1)
    for row in reversed(range(len(text1))):
        if text2[col] == text1[row]:
            current[row] = 1 + previous[row + 1] # previous col is to the right
        else:
            current[row] = max(previous[row], current[row + 1])
    # The current column becomes the previous one.
    previous = current
 
# The original problem's answer is in previous[0]. Return it.
return previous[0]

An slightly adjusted version:

if len(text2) < len(text1):
    text1, text2 = text2, text1
 
previous = [0] * (len(text1) + 1)
current = [0] * (len(text1) + 1)
 
for col in reversed(range(len(text2))):
    for row in reversed(range(len(text1))):
        if text2[col] == text1[row]:
            current[row] = 1 + previous[row + 1]
        else:
            current[row] = max(previous[row], current[row + 1])
    previous, current = current, previous
 
return previous[0]

My Vault

Explorer

lcs