Edit Distance problems (http://en.wikipedia.org/wiki/Edit_distance):
In information theory and computer science, the edit distance between two strings of characters is the minimum number of operations required to transform one of them into the other. There are several different ways to define an edit distance, depending on which edit operations are allowed: replace, delete, insert, transpose, and so on. There are algorithms to calculate its value under the various definitions:
Most popular:
a) Levenshtein distance
[http://en.wikipedia.org/wiki/Levenshtein_distance]
- given 2 strings, one (S) of length m, one (T) of length n,
- generate a matrix d[(m+1)*(n+1)]
- fill up the first column with 0 -> m and the first row with 0 -> n
- do the following propagation over the matrix (i = 1...m, j = 1...n):
* if S[i] == T[j]: d[i, j] = d[i-1, j-1] // its upper-left element;
// no operation required
* if not: d[i, j] = min(
d[i-1, j] + 1, // a deletion
d[i, j-1] + 1, // an insertion
d[i-1, j-1] + 1 // a substitution
)
i.e. the smallest of the 3 "previous directions" plus 1.
The final result is the bottom-right element of the matrix.
Draw the table and it becomes very easy to see; when drawing, we can keep 3 different arrows denoting which neighbor each cell's value was taken from, since these correspond to the operations (deletion, insertion, substitution) shown above.
Filling the table takes O(m*n) operations, and since row i only depends on row i-1, we can keep just the previous row and the current row to save space.
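The recurrence above, with the two-row space saving, can be sketched in Python (an illustrative sketch; `levenshtein` is just a name chosen here, not a library function):

```python
def levenshtein(s, t):
    """Edit distance between s and t, keeping only two rows of the table."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))           # row 0: distance from "" to t[:j]
    for i in range(1, m + 1):
        curr = [i] + [0] * n            # first column: distance from s[:i] to ""
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                curr[j] = prev[j - 1]              # no operation required
            else:
                curr[j] = min(prev[j] + 1,         # deletion
                              curr[j - 1] + 1,     # insertion
                              prev[j - 1] + 1)     # substitution
        prev = curr
    return prev[n]
```

For example, `levenshtein("kitten", "sitting")` gives 3 (two substitutions and one insertion).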
b) Longest common subsequence problem
[http://en.wikipedia.org/wiki/Longest_common_subsequence_problem]
The difference between Levenshtein distance and the longest common subsequence is that the former computes approximate string matching; it is often used in spell checking to find a way to change the words in a sentence or two at the least cost.
Personally, I think the best way to study this problem is to follow the instructions in the wiki and draw the picture:
   | Ø | A | G | C | A | T |
---|---|---|---|---|---|---|
 Ø | 0 | 0 | 0 | 0 | 0 | 0 |
 G | 0 | 0 | 1 | 1 | 1 | 1 |
 A | 0 | 1 | 1 | 1 | 2 | 2 |
 C | 0 | 1 | 1 | 2 | 2 | 2 |
Then move on to the exercise of drawing the path for XMJYAUZ vs MZJAWXU:
| 0 1 2 3 4 5 6 7
| M Z J A W X U
-----|-----------------
0 | 0 0 0 0 0 0 0 0
1 X | 0 0 0 0 0 0 1 1
2 M | 0 1 1 1 1 1 1 1
3 J | 0 1 1 2 2 2 2 2
4 Y | 0 1 1 2 2 2 2 2
5 A | 0 1 1 2 3 3 3 3
6 U | 0 1 1 2 3 3 3 4
7 Z | 0 1 2 2 3 3 3 4
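The tables above can be generated with a short DP (an illustrative sketch; `lcs_length` is a name chosen here):

```python
def lcs_length(s, t):
    """Length of the longest common subsequence of s and t."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]  # d[i][j]: LCS of s[:i] and t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1] + 1            # extend the match
            else:
                d[i][j] = max(d[i - 1][j], d[i][j - 1])  # drop one character
    return d[m][n]
```

This reproduces the bottom-right cells of the two tables: `lcs_length("GAC", "AGCAT")` is 2 and `lcs_length("XMJYAUZ", "MZJAWXU")` is 4.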
Others :
Hamming distance
Longest common subsequence problem
Levenshtein distance
Damerau–Levenshtein distance
Jaro–Winkler distance
Wagner–Fischer edit distance
Ukkonen's algorithm
Hirschberg's algorithm
# DP: maximum sum submatrix of given matrix
From: http://www.artofproblemsolving.com/Forum/viewtopic.php?p=385650#385650
There are 2 algorithms for it: one O(N^3) and another O(N^4).
N^4: keep S[a][b], the sum of all elements from position (0,0) to (a,b) (2D prefix sums). Then run the N^4 algorithm: select the top-left node and the bottom-right node of the rectangle (about N^4 pairs) and compute its sum in O(1).
N^3: keep track of column prefix sums over the rows, i.e. S[m][n] (m = row, n = col) = sum(i = 1 to m) M[i][n]. Now select any two columns and go through the rows, accumulating the sum of the strip between those columns (Kadane's algorithm): if the running sum becomes negative at any point, discard the rows so far; update your maximum whenever the running sum exceeds it, then continue with the next row. The same idea works for the minimum!
The best algorithm I can think of has running time about O(n^3), which is much better than the O(n^6) of the trivial algorithm that just considers all possible rectangles and computes the sum in each rectangle in the most straightforward way, but still unacceptably long for large n (at least for my PC). So, the algorithm I'm going to describe doesn't solve the problem completely.
Compared to the trivial algorithm, one can make two major improvements. The first one is to replace the given matrix a by the matrix of partial sums P, where P[m][n] = sum of a[i][j] over all i <= m, j <= n. This can be done in O(n^2) time, replacing the row elements by row partial sums first and then doing the same operation with columns. The point is that the sum of a over a rectangle r1 <= i <= r2, c1 <= j <= c2 can now be computed as P[r2][c2] - P[r1-1][c2] - P[r2][c1-1] + P[r1-1][c1-1]. This improvement alone reduces the running time of the trivial algorithm from O(n^6) to O(n^4).
Now, assume that the columns c1 and c2 are fixed (about n^2/2 possible choices) and our task is to find the best rows r1 and r2. Let f(m) = P[m][c2] - P[m][c1-1], the sum of the first m rows restricted to columns c1..c2. We need to find r1 <= r2 such that f(r2) - f(r1-1) is as large as possible. To this end, for every m, find M(m) = max of f(k) over k <= m and m(m) = min of f(k) over k <= m. Also, record the positions where the maximum and the minimum are attained. This can be done in linear time. After that it remains to find the maximum of f(m) - m(m-1) over m, which can also be done in linear time. So, the total time is about O(n^3).
Code:
for i = 1 to N
    sum[i][0] = 0
    for j = 1 to N
        sum[i][j] = matrix[i][j] + sum[i][j - 1]   // row prefix sums
best = 0
for i = 1 to N                        // left column of the rectangle
    for j = i to N                    // right column of the rectangle
        min[0] = 0
        result = 0                    // prefix sum over rows, columns i..j
        for k = 1 to N
            result = result + sum[k][j] - sum[k][i - 1]
            if result - min[k - 1] > best then best = result - min[k - 1]
            if result < min[k - 1] then min[k] = result else min[k] = min[k - 1]
return best
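The pseudocode above can also be written as runnable Python (a sketch of the same O(N^3) scheme: row prefix sums, then a running minimum of the vertical prefix for each column pair; `max_sum_submatrix` is a name chosen here):

```python
def max_sum_submatrix(matrix):
    """Maximum sum over all non-empty rectangular submatrices, in O(N^3)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # row_pref[r][c] = sum of matrix[r][0..c-1] (row prefix sums)
    row_pref = [[0] * (n_cols + 1) for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            row_pref[r][c + 1] = row_pref[r][c] + matrix[r][c]
    best = float('-inf')
    for left in range(n_cols):                     # left column (inclusive)
        for right in range(left + 1, n_cols + 1):  # right column (exclusive)
            running = 0       # f(k): sum of rows 0..k over columns left..right-1
            min_prefix = 0    # minimum prefix seen so far (empty prefix = 0)
            for r in range(n_rows):
                running += row_pref[r][right] - row_pref[r][left]
                best = max(best, running - min_prefix)
                min_prefix = min(min_prefix, running)
    return best
```

Note this version returns the best non-empty rectangle, so an all-negative matrix yields a negative answer rather than 0.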
Also: [Ref] A sub-cubic time algorithm for the k-maximum subarray problem, which achieves O(n^3 * sqrt(log log n / log n) + k log n) for a general problem where overlapping is allowed for solution arrays. This complexity is sub-cubic when k = o(n^3 / log n). The best known complexities for this problem are O(n^3 + k log n), which is cubic when k = O(n^3 / log n), and O(k * n^3 * sqrt(log log n / log n)), which is sub-cubic when k = o(sqrt(log n / log log n)).
# To find the Maximum size square sub-matrix for a 1-0 matrix
Given a binary matrix, find out the maximum size square sub-matrix with all 1s.
For example, consider the below binary matrix.
0 1 1 0 1
1 1 0 1 0
0 1 1 1 0
1 1 1 1 0
1 1 1 1 1
0 0 0 0 0
The maximum square sub-matrix with all set bits is
1 1 1
1 1 1
1 1 1
Algorithm:
Let the given binary matrix be M[R][C]. The idea of the algorithm is to construct an auxiliary size matrix S[][] in which each entry S[i][j] represents size of the square sub-matrix with all 1s including M[i][j] and M[i][j] is the rightmost and bottommost entry in sub-matrix.
1) Construct a sum matrix S[R][C] for the given M[R][C]:
   a) Copy the first row and first column as-is from M[][] to S[][].
   b) For the other entries, use the following expressions to construct S[][]:
      If M[i][j] is 1 then
         S[i][j] = min(S[i][j-1], S[i-1][j], S[i-1][j-1]) + 1
      Else /* If M[i][j] is 0 */
         S[i][j] = 0
2) Find the maximum entry in S[R][C].
3) Using the value and coordinates of the maximum entry in S[][], print the sub-matrix of M[][].
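The three steps can be sketched directly in Python (an illustrative sketch; `max_square_size` is a name chosen here, and it returns only the side length, not the coordinates from step 3):

```python
def max_square_size(M):
    """Side length of the largest all-1s square sub-matrix of binary matrix M."""
    R, C = len(M), len(M[0])
    S = [row[:] for row in M]        # first row and column copied as-is
    for i in range(1, R):
        for j in range(1, C):
            if M[i][j] == 1:
                # smallest of left, top, top-left neighbors, plus this cell
                S[i][j] = min(S[i][j - 1], S[i - 1][j], S[i - 1][j - 1]) + 1
            else:
                S[i][j] = 0
    return max(max(row) for row in S)
```

On the example matrix above this returns 3, the side of the all-1s square.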
Some explanations from Stack Overflow:
For each cell we keep a count of how big a square can be made using that cell as the top-left corner. Clearly, all cells with 0 will have a count of 0. Start iterating from the bottom-right cell, go to the bottom-left, then move one row up and repeat. At each cell:
- If the cell has 0, assign count = 0.
- If the cell has 1 and is an edge cell (bottom or right edge only), assign count = 1.
- For all other cells, check the counts of the cells to its right, below-right, and below; take the min of them, add 1, and assign that to the count. Keep a global max_count variable to track the maximum count so far.
At the end of traversing the matrix, max_count will have the desired value. The complexity is no more than the cost of traversing the matrix. This is how the matrix will look after the traversal; values in parentheses are the counts, i.e. the biggest square that can be made using that cell as top-left:
1(1) 0(0) 1(1) 0(0) 1(1) 0(0)
1(1) 0(0) 1(4) 1(3) 1(2) 1(1)
0(0) 1(1) 1(3) 1(3) 1(2) 1(1)
0(0) 0(0) 1(2) 1(2) 1(2) 1(1)
1(1) 1(1) 1(1) 1(1) 1(1) 1(1)
Implementation in Python
def max_size(mat, ZERO=0):
    """Find the largest square of ZERO's in the matrix `mat`.
    (Pass ZERO=1 to find the largest square of 1s instead.)"""
    nrows, ncols = len(mat), (len(mat[0]) if mat else 0)
    if not (nrows and ncols):
        return 0  # empty matrix or rows
    counts = [[0] * ncols for _ in range(nrows)]
    for i in reversed(range(nrows)):          # for each row, bottom-up
        assert len(mat[i]) == ncols           # matrix must be rectangular
        for j in reversed(range(ncols)):      # for each element in the row
            if mat[i][j] == ZERO:
                if i < nrows - 1 and j < ncols - 1:
                    counts[i][j] = 1 + min(counts[i][j + 1],      # east
                                           counts[i + 1][j],      # south
                                           counts[i + 1][j + 1])  # south-east
                else:
                    counts[i][j] = 1          # bottom/right edge cells
    return max(c for row in counts for c in row)