FANDOM


Definition

The edit distance of two strings is the minimum number of atomic edtion operations (insertion, deletion or substitution of characters) that must be done in order to transform one string in the other.

Also known as Levenshtein distance.

Examples

EditDist('house','')= 5, (removing all the characters)
EditDist('house','houses')=1, (inserting an 's' at the end of the first string)
EditDist('house','horses')=2, (changing the 'u' in the first string by 'r' and inserting an 's' at the end)
EditDist('boulevard','blvd')=5, (removing 'o', 'u', 'e', 'a' and 'r' from de first string).

Normalization

$ EditDistSim(s,t) = 1-\frac{EditDist(s,t)}{\max(|s|,|t|)} $

Examples

EditSim('house','')= 1-5/5 = 0,
EditSim('house','houses')=1-1/6= 0.8333,
EditSim('house','horses')=1-2/6= 0.6667,
EditSim('boulevard','blvd')=1-5/9= 0,4444,

Variations

There are variations that assign different weight to the atomic edition operations.

Applications

Useful for matching abbreviations and acronyms.

References