Levenshtein Distance Website Links For
Distance
 

Information About

Levenshtein Distance




It is useful in applications that need to determine how similar two strings are, such as Spell Checker s.

For example, the Levenshtein distance between "kitten" and "sitting" is 3, since these three edits change one into the other, and there is no way to do it with fewer than three edits:

# kitten → sitten (substitution of 's' for 'k')
# sitten → sittin (substitution of 'i' for 'e')
# sittin → sitting (insert 'g' at the end)

It can be considered a generalization of the Hamming Distance , which is used for strings of the same length and only considers substitution edits. There are also further generalizations of the Levenshtein distance that consider, for example, exchanging two characters as an operation, like in the Damerau-Levenshtein Distance algorithm.


THE ALGORITHM

A commonly-used bottom-up Dynamic Programming algorithm for computing the Levenshtein distance involves the use of an (n + 1) × (m + 1) matrix, where n and m are the lengths of the two strings. This algorithm is based on the Wagner-Fischer algorithm for edit distance. Here is Pseudocode for a function ''LevenshteinDistance'' that takes two strings, ''s'' of length ''m'', and ''t'' of length ''n'', and computes the Levenshtein distance between them:

int LevenshteinDistance('''char''' s '''char''' t[1..n )
''// d is a table with m+1 rows and n+1 columns''
declare '''int''' d 0..n

for i '''from''' 0 '''to''' m
d 0 := i
for j '''from''' 1 '''to''' n
d j := j

for i '''from''' 1 '''to''' m
for j '''from''' 1 '''to''' n
if s = t[j '''then''' cost := 0
else cost := 1
d j := minimum(
d j + 1, ''// deletion''
d j-1 + 1, ''// insertion''
d j-1 + cost ''// substitution''
)

return d n

Two examples of the resulting matrix (the minimum steps to be taken are highlighted):


  {class "wikitable"