Location
CS2311
Event Description

Exploring the Power of Heterogeneous Information Sources

Nowadays, a vast ocean of data is collected from trillions of connected devices every day. Useful knowledge is usually buried in multiple genres of data, which come from different sources, in different formats, and with different types of representation. Many interesting patterns cannot be extracted from a single data collection, but have to be discovered through the integrative analysis of all available heterogeneous data sources. Although many algorithms have been developed to analyze multiple information sources, real applications continuously pose new challenges: data can be gigantic, noisy, unreliable, dynamically evolving, highly imbalanced, and heterogeneous. Meanwhile, users provide limited feedback, have growing privacy concerns, and ask for actionable knowledge. In this talk, I will discuss our research on exploring the power of multiple heterogeneous information sources in challenging learning scenarios. I will present two perspectives on learning from multiple sources: exploring their similarities (knowledge integration) and exploring their differences (inconsistency detection). First, for knowledge integration, we proposed a graph-based consensus maximization framework that combines multiple supervised and unsupervised models and significantly improves classification accuracy. Second, for inconsistency detection, we developed approaches based on matrix factorization and spectral embedding to detect objects that behave inconsistently across multiple sources, which we treat as a new type of anomaly. I will show the effectiveness of these general learning techniques with a few sample applications in social media, networking, cyber security and bioinformatics.

 

Jing Gao is an assistant professor in the Computer Science and Engineering Department at the University at Buffalo, The State University of New York. She is broadly interested in data and information analysis, with a focus on information integration, ensemble methods, transfer learning, anomaly detection and mining data streams. She obtained her PhD degree in Computer Science from the University of Illinois at Urbana-Champaign in 2011. Her thesis work was supported by an IBM PhD Fellowship and led to a well-received tutorial at the SDM’10 conference. She has published more than 30 papers in refereed journals and conferences.