NSF Research Award Funds Storage Design for Scientific Big Data


Stony Brook, NY – September 2013

Faculty from Stony Brook University’s (SBU) Department of Computer Science recently received research funding through the National Science Foundation (NSF) Big Data Science & Engineering program. Professors Erez Zadok, Michael Bender, and Rob Johnson will use the award of over $400,000 to improve the performance and reliability of the Hierarchical Data Format (HDF5), a container format for scientific data.

The research proposal, “BIGDATA: Small: DCM: Collaborative Research: An efficient, scalable, and portable storage system for scientific data containers,” is a collaboration with Prof. Liuba Shrira at Brandeis University and Prof. Werner Benger at Louisiana State University (LSU). The proposed research starts from the observation that big data sets are becoming too large and complex to fit in computer memory. The funding allows researchers to design new data structures and algorithms that improve performance by reducing the cost of searching and indexing massive data sets. Additionally, to address data vulnerabilities and improve provenance, researchers will apply recent results from algorithms, database, and storage research to improve the reliability of standard scientific data formats.

Researchers will develop snapshot support in HDF5, which is critical for error recovery, and design new data structures and algorithms to scale HDF5 data access on modern storage devices. By designing several new HDF5 drivers (one mapping objects to a Linux file system, one storing objects in a database, and one accessing data objects on remote Web servers), researchers aim to improve the efficiency of big data storage and processing. The team will evaluate these improvements using large-scale visualization applications that work with big data from real-world scientific computations, such as those that run on Stony Brook’s recently launched Reality Deck (RD).

An educational outcome of the research is that future computer scientists at each institution will gain exposure to practical scientific big data management. Additionally, the techniques and discoveries in data storage systems and data structures are expected to benefit other scientific domains and society.