Boosting Efficient Data Storage and Retrieval

MIT News recently highlighted the research that Professor Michael Bender in the Department of Computer Science at Stony Brook University is working on with William Kuszmaul (MIT) and Google’s Bradley Kuszmaul. 

The team’s theoretical research findings relate to “linear-probing hash tables,” first introduced in 1954 and still among the fastest available data structures today. Data structures provide mechanisms organize and store data in computers with hash tables being a common approach.

However, linear-probing hash tables are generally understood to suffer from a limitation that if we allow them to become too full, the insertion of new data items becomes slower. So a standard practice has been to operate such structures at a low capacity raising costs for organizations who need to maintain large amounts of data. By examining the use of linear-probing hash tables and considering the economic impact, Michael Bender and his colleagues had a theoretical breakthrough which could result in more efficient data storage and retrieval.

According to the MIT article, “The time-honored principle of operating hash tables at low capacity has been totally upended by the work.” Furthermore, the article states, “Their findings conclude that for applications where the number of insertions and deletions stays about the same — and the amount of data added is roughly equal to that removed — linear-probing hash tables can operate at high storage capacities without sacrificing speed.”

Research findings were presented in 2021 at the Foundations of Computer Science (FCOS) Symposium in Colorado. The full MIT article can be found here.