Longest Common Subsequence

 

A Longest Common Subsequence (LCS) is a concept in computer science and information theory that refers to the longest sequence of characters or elements that are common to two or more sequences. The term is commonly used in the context of strings, where the sequences are strings of characters, but it can be generalized to other types of sequences as well.


For example, consider two strings "ABCD" and "ACDF." The common subsequences are "ACD" and "AC," but the longest common subsequence is "AC" with a length of 2.


The key point about the LCS is that the characters in the subsequence must appear in the same order in both original sequences, but they don't necessarily have to be consecutive. In other words, the characters in the LCS must maintain their relative order, but they may have other characters in between.


Dynamic programming is a common approach used to find the LCS of two sequences efficiently. The dynamic programming approach builds a table to store intermediate results, and the final result is obtained by backtracking through this table.


The LCS problem has applications in various fields, including bioinformatics (for comparing DNA sequences), version control systems (tracking changes in files), and data comparison (finding similarities in datasets). It is a fundamental problem with practical implications in diverse areas of computer science and beyond.

Let's take two example sequences, "ABCBDAB" and "BDCAB." We'll find the Longest Common Subsequence (LCS) using dynamic programming.


1. **Initialization:**

   Create a table where rows correspond to the characters of the first sequence, and columns correspond to the characters of the second sequence. Initialize the table with zeros.


   ```

    |   |   B |   D |   C |  A |  B |

   | ---|---|---|---|---|---|

   | A   | 0  | 0   | 0    | 0   | 0 |

   | B   | 0  | 0  | 0    | 0    | 0 |

   | C   | 0 | 0   | 0     | 0   | 0 |

   | B   | 0 | 0   | 0     | 0   | 0 |

   | D   | 0 | 0   | 0     | 0   | 0 |

   | A   | 0 | 0   | 0     | 0    | 0 |

   | B   | 0 | 0   | 0     | 0    | 0 |

   ```


2. **Filling the Table:**

   Iterate through the sequences and fill in the table based on the following rules:

   - If the characters are equal, take the value from the diagonal cell and increment it by 1.

   - If the characters are not equal, take the maximum of the value in the cell above or the value in the cell to the left.


   ```

   |   | B | D | C | A | B |

   |---|---|---|---|---|---|

   | A | 0 | 0 | 0 | 1 | 1 |

   | B | 1 | 1 | 1 | 1 | 2 |

   | C | 1 | 1 | 2 | 2 | 2 |

   | B | 2 | 2 | 2 | 2 | 3 |

   | D | 2 | 3 | 3 | 3 | 3 |

   | A | 2 | 3 | 3 | 4 | 4 |

   | B | 3 | 3 | 3 | 4 | 5 |

   ```


3. **Backtracking:**

   Start from the bottom-right corner and backtrack through the table to reconstruct the LCS. Follow these rules:

   - If the characters are equal, move diagonally and add the character to the LCS.

   - If the characters are not equal, move to the cell with the maximum value.


   Starting from the bottom-right corner (5), backtrack to the top-left corner (0):


   ```

   LCS: B -> A -> B

   ```


   So, the Longest Common Subsequence for "ABCBDAB" and "BDCAB" is "BAB."



c implementation:


#include <stdio.h>

#include <string.h>


#define MAX_LENGTH 100


// Function to find the length of LCS and construct LCS

void findLCS(char X[], char Y[], int m, int n, char lcs[]) {

    int L[MAX_LENGTH][MAX_LENGTH];

    

    // Build the L[][] table

    for (int i = 0; i <= m; i++) {

        for (int j = 0; j <= n; j++) {

            if (i == 0 || j == 0)

                L[i][j] = 0;

            else if (X[i - 1] == Y[j - 1])

                L[i][j] = L[i - 1][j - 1] + 1;

            else

                L[i][j] = (L[i - 1][j] > L[i][j - 1]) ? L[i - 1][j] : L[i][j - 1];

        }

    }


    // Construct LCS using L[][]

    int index = L[m][n];

    lcs[index] = '\0'; // Null-terminate the LCS string


    int i = m, j = n;

    while (i > 0 && j > 0) {

        if (X[i - 1] == Y[j - 1]) {

            lcs[index - 1] = X[i - 1];

            i--;

            j--;

            index--;

        } else if (L[i - 1][j] > L[i][j - 1])

            i--;

        else

            j--;

    }

}


int main() {

    char X[] = "ABCBDAB";

    char Y[] = "BDCAB";

    int m = strlen(X);

    int n = strlen(Y);


    char lcs[MAX_LENGTH];


    findLCS(X, Y, m, n, lcs);


    printf("LCS: %s\n", lcs);


    return 0;

}

output:

LCS: BDAB

Process returned 0 (0x0)   execution time : 0.039 s
Press any key to continue.


This c code defines a function findLCS to compute the LCS and then prints the LCS for the example strings "ABCBDAB" and "BDCAB"


pracrice problem:

GIVEN two string
x=A,B,C,B,D,A,B
Y=B,D,C,A,B,A
find out the length of the common subsequence and three longest common subsequence




solution:



To find the length of the Longest Common Subsequence (LCS) between two strings and to identify three of the longest common subsequences, you can use dynamic programming. Here's how you can approach this problem:

### Step 1: Create a Table
Create a table to store the lengths of the common subsequences of substrings of the given strings.

```
   |   |   | A | B | C | B | D | A | B |
   |--- |---|---|---|---|---|---|---|---|
   |     | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
   | B | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
   | D | 0 | 0 | 1 | 1 | 2 | 2 | 2 | 2 |
   | C | 0 | 0 | 1 | 2 | 2 | 2 | 2 | 2 |
   | A | 0 | 1 | 1 | 2 | 2 | 2 | 3 | 3 |
   | B | 0 | 1 | 2 | 2 | 2 | 2 | 3 | 4 |
   | A | 0 | 1 | 2 | 2 | 2 | 2 | 3 | 4 |
```

### Step 2: Fill in the Table
Use dynamic programming to fill in the table. The value at `table[i][j]` represents the length of the LCS of the substrings `x[1...i]` and `y[1...j]`.

### Step 3: Backtrack to Find the LCS
Starting from the bottom-right corner of the table, use the values to backtrack and find the LCS.

### Step 4: Find Three Longest Common Subsequences
After finding the length of the LCS, you can use the table to find three different LCS. Here are three possibilities:

1. **B, C, B**
2. **B, D, A**
3. **B, C, A**

### Summary:
- Length of LCS: 3
- Three Longest Common Subsequences:
  1. B, C, B
  2. B, D, A
  3. B, C, A

Keep in mind that there can be multiple valid LCS of the same length. The above examples are just one set of possible LCS.

Post a Comment

0 Comments