Feb. 26 - Representation Learning in Image and Audio Processing


Join the Department of Computer Science as we welcome Dr. Pablo Sprechmann, from New York University, as a faculty candidate and CSE 600 presenter. Dr. Sprechmann's talk will take place on Friday, Feb. 26 at 2:30 p.m. in Room 120 of the new CS building.

Talk Title: Representation learning in image and audio processing: from sparse models to deep learning
Abstract: Representation learning is a set of techniques that aim to transform raw input data into a representation that can be effectively exploited in a high-level task such as restoration, prediction, or classification. In this talk I will discuss two successful techniques for learning representations from natural audio and image data: sparse modeling and deep learning. In the first part of the talk, I will discuss interesting connections between these two approaches. Sparse models have received a lot of attention in recent years, achieving numerous state-of-the-art results in various signal processing applications. Traditionally, such modeling approaches rely on an iterative algorithm that minimizes an objective function. The inherently sequential structure and the data-dependent complexity and latency of iterative optimization tools often constitute a major computational bottleneck. Another limitation of these modeling techniques is the difficulty of including them in discriminative learning scenarios. To overcome these limitations, we develop a process-centric view of sparse modeling, in which a learned, deterministic, fixed-complexity pursuit process is used in lieu of iterative optimization, establishing connections with representations learned using deep neural networks.
I will illustrate these ideas in several audio and image processing tasks. A fundamental ingredient in the success of sparse models is their ability to capture local regularity in the data. However, these methods are not designed to model global properties of the signal, which are key to capturing complex geometrical structures and textured regions. On the other hand, representations learned for solving discriminative tasks, such as object recognition, are global representations that are stable to local deformations. In the second part of the talk I will discuss a new method for exploiting representations learned from discriminative tasks in the context of generative models. I will illustrate our method on an image super-resolution task. The idea is to use a Gibbs distribution as the conditional model, with sufficient statistics given by deep convolutional neural networks (CNNs). The resulting sufficient statistics minimize the uncertainty of the target signals given the degraded observations, while being highly informative.
Bio: Pablo Sprechmann is currently a postdoctoral researcher in Yann LeCun's group at the CILVR lab, Computer Science Department, Institute for Mathematical Sciences, New York University. He received an MSc degree from the Universidad de la República, Uruguay, in 2009, and a PhD degree in 2012 from the Department of Electrical and Computer Engineering, University of Minnesota. He worked as a postdoctoral researcher at the ECE Department, Duke University, during 2013. His main research interests include machine learning and its applications to computer vision, signal processing, and music information retrieval.