A Rising Star in Combinatorial Pattern Matching

The Department of Computer Science congratulates faculty member Dominik Kempa on being recognized as a “rising star in combinatorial pattern matching.”

In a recent review article published in the Communications of the ACM, Gonzalo Navarro, a professor at the University of Chile, introduces new research conducted by computer science professor Dominik Kempa along with Tomasz Kociumaka of the Max Planck Institute for Informatics in Germany.

Kempa and Kociumaka discover the resolution of the “BWT Conjecture,” concerning the widely used Burrows-Wheeler Transform -- an invertible text transformation that permutes symbols of a text according to the lexicographical order of its suffixes. This problem has been a confounding variable in the fields of data compression and bioinformatics.

As Professor Navarro mentioned, it has been challenging to find a way to condense repetitive datasets as statistical compression has already proven to be inefficient. Kempa and Kociumaka prove an upper bound r = O(z log^2 n) on the output size of BWT compression, establishing the first in 25 years link to LZ77 compression, and thus proving that it achieves much stronger compression than previously thought. They also propose the first algorithm for the construction of BWT from LZ77 in time proportional to the size of data in compressed form (which is up to exponentially faster than prior approaches).

Navarro regales the research paper as he describes it as a “beautiful masterpiece” and proclaims Kempa and Kociumaka as “rising stars.”

About the Researcher

Dominik Kempa joined the Department of Computer Science at Stony Brook University following postdoc positions at Johns Hopkins University and University of California in Berkeley. He earned a PhD from the University of Helsinki, with research focusing on the efficient construction of fundamental data structures for string indexing. In November 2022 he will deliver a keynote talk at SPIRE 2022, 29th International Symposium on String Processing and Information Retrieval.