Dominik Kempa: 'Searching in compressed data'

Friday, September 3, 2021 - 2:40pm to 3:40pm
NCS 120
Event Description: 

Abstract: Compressed indexing is a research field concerned with the design and construction of data structures to store massive string collections in space close to the size of compressed data, while simultaneously providing searching functionality (such as pattern matching) on the original uncompressed data. Example applications of these structures include software repositories, versioned documents (such as Wikipedia), and multi-terabyte DNA databases containing hundreds of thousands of genomes.


I will give an overview of compressed indexing, focusing on the two main types of indexes: Lempel-Ziv (LZ) indexes and indexes based on the Burrows-Wheeler transform (BWT). I will show an example of a compressed index, define the most fundamental types of queries, and give an overview of my recent work on proving the relationship between the sizes of BWT and LZ indexes. I will conclude by showing avenues for future work, including new algorithms for the construction of compressed indexes that have the potential to cut indexing time by orders of magnitude.

Location: New Computer Science, room 120


Computed Event Type: