Exploiting Structure in Large-Scale Machine Learning Problems

Abstract:

With an immense growth of data, there is a great need for solving large-scale machine learning problems. Classical optimization algorithms usually cannot scale up due to huge amount of data and/or model parameters. In this talk, I will show that the scalability issues can often be resolved by exploiting three types of structure in machine learning problems: problem structure, model structure, and data distribution. This central idea can be applied to many machine learning problems, including kernel machines for classification or regression, matrix factorization for recommender systems, and structure learning for graphical models.

To demonstrate this central idea, I will describe a Newton-like algorithm for solving the L1-regularized Gaussian maximum likelihood estimator (MLE). This estimator has strong statistical guarantee in recovering a sparse inverse covariance, but requires solving a difficult non-smooth log-determinant program with number of parameters that scale quadratically with number of random variables. State-of-the-art methods thus cannot handle more than 20,000 random variables. I will present a Newton-like algorithm for solving this problem. By exploiting structure of problem, model, and data distribution, our proposed algorithm can solve 1 million dimensional L1-regularized Gaussian MLE (which has 1-trillion parameters) in one day using a single machine.

Short Bio:

Cho-Jui Hsieh is a Ph.D. student at University of Texas at Austin. His research focus is developing new algorithms and optimization techniques for large-scale machine learning problems. Cho-Jui obtained his B.S. degree in 2007 and M.S. degree in 2009 from National Taiwan University (advisor: Chih-Jen Lin). Currently he is a member of Center for Big Data Analytics led by Inderjit Dhillon. He is the recipient of the IBM Ph.D. fellowship in 2013-2015, the best research paper award in KDD 2010, and the best paper award in ICDM 2012.