Longest-Common-Subsequence Problem

https://leetcode.com/problems/longest-common-subsequence/

Definition

Given a sequence , another sequence is a subsequence of if there exists a strictly increasing sequence such that .

Algorithm

  • Optimal substructure: the LCS of two sequences contains within it an LCS of prefixes of the two sequences.
  • Recursive solution
    • If , find LCS of and
    • If , find the LCS of and , and that of and . Pick the longer LCS.

Approach 1: Memoization

  • Discuss whether text1[p1] is included in the optimal solution or not.
  • If it is, we can find the first occurrence of this char in text2 and the problem is reduced.
  • Time complexity: , as there are possible pairs of strings, and solving each of them can take time, as we’re searching the occurrence of a char in a string of length .
from functools import lru_cache
m, n = len(text1), len(text2)
@lru_cache()
def helper(p1, p2):
    # Base case: empty strings
    if p1 == m or p2 == n:
        return 0
    
    # Option 1: We don't include text1[p1] in the solution.
    option_1 = helper(p1 + 1, p2)
    
    # Option 2: We include text1[p1] in the solution, as long as a match for it
    # in text2 at or after p2 exists.
    first_occurence = text2.find(text1[p1], p2)
    if first_occurence != -1:
        option_2 = 1 + helper(p1 + 1, first_occurence + 1)
    else:
        option_2 = 0
 
    # Return the best option.
    return max(option_1, option_2)
 
return helper(0, 0)

Approach 2: Improved Memoization

A more intuitive way of structuring the subproblems.

  • If the current char is the same, consider them a match. This must be part of the optimal solution.
  • If not, either skip a char in text1 or skip a char in text2.

Time complexity becomes , as there are combinations of substrings (or suffixes, more specifically.)

from functools import lru_cache
m, n = len(text1), len(text2)
@lru_cache(maxsize=None)
def memo_solve(p1, p2):
    # Base case: If either string is now empty, we can't match up anymore
    # characters.
    if p1 == m or p2 == n:
        return 0
    
    # Recursive case 1.
    if text1[p1] == text2[p2]:
        return 1 + memo_solve(p1 + 1, p2 + 1)
    
    # Recursive case 2.
    else:
        return max(memo_solve(p1, p2 + 1), memo_solve(p1 + 1, p2))
    
return memo_solve(0, 0)

Approach 3: Bottom-Up DP

  • Starting from the final char on both strings. Loop in reversed order.
  • First letter is the same? add 1 to length of LCS
  • First letter not the same? Have to kick a letter out, either from str1 or str2. Then take the max of sub problems.
# Make a grid of 0's with len(text2) + 1 columns and len(text1) + 1 rows.
dp = [[0] * (len(text2) + 1) for _ in range(len(text1) + 1)]
 
# Iterate up each column, starting from the last one.
for col in reversed(range(len(text2))):
    for row in reversed(range(len(text1))):
        # If the corresponding characters for this cell are the same...
        if text2[col] == text1[row]:
            dp[row][col] = 1 + dp[row + 1][col + 1]
        # Otherwise they must be different...
        else:
            dp[row][col] = max(dp[row + 1][col], dp[row][col + 1])
 
# The original problem's answer is in dp[0][0]. Return it.
return dp[0][0]

Approach 4: Bottom-Up DP with Space Optimization

  • Note that we’re only referencing the previously calculated column and the current column that we’re calculating.
if len(text2) < len(text1):
    text1, text2 = text2, text1
 
# The previous column starts with all 0's and like before is 1
# more than the length of the first word.
previous = [0] * (len(text1) + 1)
 
# Iterate up each column, starting from the last one.
for col in reversed(range(len(text2))):
    # Create a new array to represent the current column.
    current = [0] * (len(text1) + 1)
    for row in reversed(range(len(text1))):
        if text2[col] == text1[row]:
            current[row] = 1 + previous[row + 1] # previous col is to the right
        else:
            current[row] = max(previous[row], current[row + 1])
    # The current column becomes the previous one.
    previous = current
 
# The original problem's answer is in previous[0]. Return it.
return previous[0]

An slightly adjusted version:

if len(text2) < len(text1):
    text1, text2 = text2, text1
 
previous = [0] * (len(text1) + 1)
current = [0] * (len(text1) + 1)
 
for col in reversed(range(len(text2))):
    for row in reversed(range(len(text1))):
        if text2[col] == text1[row]:
            current[row] = 1 + previous[row + 1]
        else:
            current[row] = max(previous[row], current[row + 1])
    previous, current = current, previous
 
return previous[0]