The Algorithm Design Manual
About the Book
Programming Challenges

The Stony Brook Algorithm Repository

Steven Skiena
Stony Brook University
Dept. of Computer Science

1.7.4 Approximate String Matching

Problem Input | Problem Output

INPUT                    OUTPUT


Input Description: A text string t and a pattern string p. An edit cost bound k.

Problem: Does there exist an alignment between t and p with edit cost at most k, ie. can we transform part of t to p using at most k additions, deletions, and substitutions.

Excerpt from The Algorithm Design Manual: Approximate string matching is fundamental to text processing, because we live in an error-prone world. Any spelling correction program must be able to identify the closest match for any text string not found in a dictionary. Genbank has become a fundamental tool for molecular biology by supporting homology (similarity) searches on DNA sequences. Suppose you were to sequence a new gene in man, and you discovered that it is similar to the hemoglobin gene in rats. It is likely that this new gene also produces hemoglobin, and any differences are the result of genetic mutations during evolution.


Implementations

  • agrep - Approximate General Regular Expression Pattern Matcher (C) (rating 10)
  • TRE regexp matching library (C) (rating 9)
  • HT/DIG -- image compression codes (C) (rating 5)
  • Handbook of Algorithms and Data Structures (C,Pascal) (rating 2)

  • Recommended Books

    The Art of Computer Programming : Sorting and Searching by Donald Knuth Algorithms on Strings, Trees, and Sequences by Dan Gusfield Practical Algorithms for Programmers by A. Binstock and J. Rex
    Text Algorithms by M. Crochemore and W. Rytter Handbook of Algorithms and Data Structures by G. Gonnet and R. Baeza-Yates Introduction to Algorithms by T. Cormen and C. Leiserson and R. Rivest and C. Stein
    Introduction to Algorithms by U. Manber Computer Algorithms by S. Baase Time Warps, String Edits, and Macromolecules: the theory and practice of sequence comparison by D. Sankoff and J. Kruskal

    Related Links

  • Wikibooks programs for computing edit (Levenshtein) distance in Ada, C++, Emacs Lisp, Io, JavaScript, Java, PHP, Python, Ruby VB, C# and others

    Related Problems


      
    Longest Common Substring
      
    String Matching



    This page last modified on 2008-07-10 .
    www.algorist.com