Qingqing Cao: PhD Thesis Defense: "Efficient Natural Language Processing for Heterogeneous Platforms"

Dates: 
Tuesday, September 28, 2021 - 1:00pm to 3:20pm
Location: 
NCS 220, or Zoom
Event Description: 
Talk abstract: Natural language processing (NLP) technology has supercharged many real-world applications, ranging from intelligent personal assistants (like Alexa, Siri, and Google Assistant) to commercial search engines such as Google and Bing. Deep learning has made NLP systems more effective, but large neural models are expensive and impractical in many cases. Deploying neural NLP systems such as question answering (QA) applications is challenging because they (i) are extremely compute-intensive and consume large amounts of energy, and (ii) cannot run on mobile devices, making on-device, privacy-preserving applications impractical. 
In this talk, I will present four projects that make neural NLP systems more efficient and practical across heterogeneous hardware. I first introduce DeQA, an on-device question-answering system that helps mobile users find information on their devices without privacy concerns. Deep learning QA systems are otherwise too slow to be usable on mobile devices; we design latency and memory optimizations, widely applicable to state-of-the-art QA models, that allow them to run locally on mobile devices. Next, I present DeFormer, a simple decomposition-based technique that modifies pre-trained Transformer models to enable faster QA inference on both the cloud and mobile devices. Then I describe MobiVQA, a set of optimizations that adapt state-of-the-art visual question answering (VQA) systems to run efficiently on mobile devices so that visually impaired users can navigate their visual surroundings more easily. MobiVQA dynamically reduces computation based on the difficulty of the visual question and prunes unnecessary image information, effectively reducing the inference latency and memory footprint of VQA systems without sacrificing model accuracy. Finally, beyond latency and memory footprint, energy consumption is another essential metric for efficient NLP models: it is critical for reducing server costs and for deploying models to battery-powered mobile devices. I describe IrEne, a multi-level regression approach that provides accurate energy estimation and interpretable energy analysis for NLP models. 