April 29 - Unifying HDFS and GPFS: Enabling Analytics on Enterprise Storage


Faculty Colloquium and CSE 600

Join us in Room 120 at 2:30 p.m. for the talk "Unifying HDFS and GPFS: Enabling Analytics on Enterprise Storage," presented by Ramya Raghavendra from IBM TJ Watson.


Distributed file systems built for Big Data Analytics and cluster file systems built for traditional applications have very different functionality requirements, resulting in separate storage silos. In enterprises, there is often the need to run analytics on data generated by traditional applications that is stored on cluster file systems. The absence of a single data store that can serve both classes of applications leads to data duplication and hence, increased storage costs, along with the cost of moving data between the two kinds of file systems.

It is difficult to unify these two classes of file systems, since the applications that use them have very different requirements in terms of performance, data layout, consistency, and fault tolerance. In this paper, we look at the design differences between two file systems, IBM’s GPFS and the open-source Hadoop Distributed File System (HDFS), and propose a way to reconcile them. In this talk, I will present our recent work to unify the two file systems so that analytical applications can efficiently access data stored in GPFS. Evaluations show that our system outperforms HDFS for most Big Data applications, while retaining all the guarantees provided by traditional cluster file systems. This solution is included in the IBM Spectrum Scale 4.2 release.

Finally, I will present some of our more recent work on analyzing large-scale time series data on distributed runtime platforms such as Apache Spark.

Ramya Raghavendra is a research scientist and Master Inventor at IBM TJ Watson Research Center working on Next Generation Systems research. She received her MS (2009) and PhD (2010) in Computer Science from the University of California, Santa Barbara, where she was awarded the Outstanding Graduate Student award, a Grace Hopper fellowship, and an ACM Student Research award. Currently, her work focuses on building systems, runtimes, and algorithmic support for Big Data Analytics. Her research is supported by ARL, DARPA, and DHS grants. Ramya received the IBM Master Inventor award in 2015, and her work has appeared in venues such as ICDE.