A Longest Common Subsequence (LCS) is a concept in computer science and information theory that refers to the longest sequence of characters or elements that are common to two or more sequences. The term is commonly used in the context of strings, where the sequences are strings of characters, but it can be generalized to other types of sequences as well.
For example, consider two strings "ABCD" and "ACDF." The common subsequences are "ACD" and "AC," but the longest common subsequence is "AC" with a length of 2.
The key point about the LCS is that the characters in the subsequence must appear in the same order in both original sequences, but they don't necessarily have to be consecutive. In other words, the characters in the LCS must maintain their relative order, but they may have other characters in between.
Dynamic programming is a common approach used to find the LCS of two sequences efficiently. The dynamic programming approach builds a table to store intermediate results, and the final result is obtained by backtracking through this table.
The LCS problem has applications in various fields, including bioinformatics (for comparing DNA sequences), version control systems (tracking changes in files), and data comparison (finding similarities in datasets). It is a fundamental problem with practical implications in diverse areas of computer science and beyond.
Let's take two example sequences, "ABCBDAB" and "BDCAB." We'll find the Longest Common Subsequence (LCS) using dynamic programming.
1. **Initialization:**
Create a table where rows correspond to the characters of the first sequence, and columns correspond to the characters of the second sequence. Initialize the table with zeros.
```
| | B | D | C | A | B |
| ---|---|---|---|---|---|
| A | 0 | 0 | 0 | 0 | 0 |
| B | 0 | 0 | 0 | 0 | 0 |
| C | 0 | 0 | 0 | 0 | 0 |
| B | 0 | 0 | 0 | 0 | 0 |
| D | 0 | 0 | 0 | 0 | 0 |
| A | 0 | 0 | 0 | 0 | 0 |
| B | 0 | 0 | 0 | 0 | 0 |
```
2. **Filling the Table:**
Iterate through the sequences and fill in the table based on the following rules:
- If the characters are equal, take the value from the diagonal cell and increment it by 1.
- If the characters are not equal, take the maximum of the value in the cell above or the value in the cell to the left.
```
| | B | D | C | A | B |
|---|---|---|---|---|---|
| A | 0 | 0 | 0 | 1 | 1 |
| B | 1 | 1 | 1 | 1 | 2 |
| C | 1 | 1 | 2 | 2 | 2 |
| B | 2 | 2 | 2 | 2 | 3 |
| D | 2 | 3 | 3 | 3 | 3 |
| A | 2 | 3 | 3 | 4 | 4 |
| B | 3 | 3 | 3 | 4 | 5 |
```
3. **Backtracking:**
Start from the bottom-right corner and backtrack through the table to reconstruct the LCS. Follow these rules:
- If the characters are equal, move diagonally and add the character to the LCS.
- If the characters are not equal, move to the cell with the maximum value.
Starting from the bottom-right corner (5), backtrack to the top-left corner (0):
```
LCS: B -> A -> B
```
So, the Longest Common Subsequence for "ABCBDAB" and "BDCAB" is "BAB."
c implementation:
#include <stdio.h>
#include <string.h>
#define MAX_LENGTH 100
// Function to find the length of LCS and construct LCS
void findLCS(char X[], char Y[], int m, int n, char lcs[]) {
int L[MAX_LENGTH][MAX_LENGTH];
// Build the L[][] table
for (int i = 0; i <= m; i++) {
for (int j = 0; j <= n; j++) {
if (i == 0 || j == 0)
L[i][j] = 0;
else if (X[i - 1] == Y[j - 1])
L[i][j] = L[i - 1][j - 1] + 1;
else
L[i][j] = (L[i - 1][j] > L[i][j - 1]) ? L[i - 1][j] : L[i][j - 1];
}
}
// Construct LCS using L[][]
int index = L[m][n];
lcs[index] = '\0'; // Null-terminate the LCS string
int i = m, j = n;
while (i > 0 && j > 0) {
if (X[i - 1] == Y[j - 1]) {
lcs[index - 1] = X[i - 1];
i--;
j--;
index--;
} else if (L[i - 1][j] > L[i][j - 1])
i--;
else
j--;
}
}
int main() {
char X[] = "ABCBDAB";
char Y[] = "BDCAB";
int m = strlen(X);
int n = strlen(Y);
char lcs[MAX_LENGTH];
findLCS(X, Y, m, n, lcs);
printf("LCS: %s\n", lcs);
return 0;
}
0 Comments