Mallesham Dasari, Ph.D. Proposal Defense: "AI-Driven Optimization of Multimedia Applications"

Dates: 
Monday, August 16, 2021 - 11:30am to 1:00pm
Location: 
Zoom - contact events@cs.stonybrook.edu for Zoom info.
Event Description: 

Abstract:


The Internet has become an integral part of everyday life, dominated by multimedia applications spanning the web, video, and AR/VR. A critical performance issue in these applications is the Quality of Experience (QoE) of end users. Despite intense research in the past, delivering the best possible QoE remains a challenging problem because of 1) a lack of clear understanding of QoE bottlenecks, 2) single-dimensional approaches to resource optimization, and 3) a lack of effective methodologies to combine multiple modalities of information. Our work makes three contributions to address these challenges.


First, we develop an understanding of QoE through a large-scale user study and model objective QoE for multimedia applications. Modeling QoE is essential for network operators to capture the actual user experience in the wild and to provision resources efficiently. Using objective QoE models, we conduct a measurement study to identify the root causes of QoE issues across diverse applications and devices. We characterize various device-related bottlenecks (e.g., processor and memory) and show how network capacity affects different applications differently.
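The abstract does not give the form of the objective QoE model. As a purely illustrative sketch, a common formulation from the adaptive-streaming literature scores a session by rewarding delivered quality while penalizing rebuffering time and abrupt quality switches (all weights below are hypothetical, not values from this proposal):

```python
def objective_qoe(bitrates_kbps, rebuffer_s,
                  w_quality=1.0, w_rebuffer=4.3, w_smooth=1.0):
    """Toy objective-QoE score in the style of common ABR formulations.

    bitrates_kbps: per-segment delivered bitrates (kbps)
    rebuffer_s:    total rebuffering time (seconds)
    The weights are illustrative placeholders.
    """
    quality = sum(bitrates_kbps) / 1000.0            # reward delivered quality (Mbps)
    smoothness = sum(abs(bitrates_kbps[i + 1] - bitrates_kbps[i])
                     for i in range(len(bitrates_kbps) - 1)) / 1000.0
    return w_quality * quality - w_rebuffer * rebuffer_s - w_smooth * smoothness
```

A model of this shape lets an operator compare sessions numerically: a stall of one second costs more QoE than a full segment of extra bitrate earns, which matches the common finding that rebuffering dominates user-perceived quality.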


Second, using the insights gathered from the above measurement studies, we build a system called PARSEC (PAnoRamic StrEaming with neural Coding) to study bandwidth-intensive 360-degree video streaming. PARSEC leverages a deep learning-based super-resolution technique to enhance video quality on the client side when network conditions are poor. In doing so, PARSEC exploits a tradeoff between network bandwidth and client-side compute capacity to improve video quality. PARSEC achieves up to a 1.8x improvement in QoE, or equivalently requires 43% less bandwidth for the same QoE, compared to state-of-the-art methods.
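The bandwidth/compute tradeoff can be made concrete with a greedy planning sketch. This is not PARSEC's actual algorithm; it only illustrates the decision each tile faces under assumed per-tile fields (`hi_bits`, `lo_bits`, `sr_ms`, all hypothetical): fetch a high-quality tile (spending bandwidth), or fetch a low-quality tile and super-resolve it on the client GPU (spending compute instead):

```python
def plan_tiles(tiles, bandwidth_budget, compute_budget):
    """Greedy, illustrative tile planner for a PARSEC-style tradeoff.

    tiles: list of dicts with hypothetical fields:
      hi_bits, lo_bits - download cost of high/low quality versions
      sr_ms            - client GPU time to super-resolve the low version
    Returns one decision per tile: "fetch_hi", "fetch_lo_sr", or "fetch_lo".
    """
    decisions = []
    for t in tiles:
        if bandwidth_budget >= t["hi_bits"]:
            bandwidth_budget -= t["hi_bits"]
            decisions.append("fetch_hi")
        elif compute_budget >= t["sr_ms"] and bandwidth_budget >= t["lo_bits"]:
            bandwidth_budget -= t["lo_bits"]
            compute_budget -= t["sr_ms"]
            decisions.append("fetch_lo_sr")   # trade GPU time for bandwidth
        else:
            bandwidth_budget -= t["lo_bits"]
            decisions.append("fetch_lo")      # fall back to low quality
    return decisions
```

When bandwidth runs out mid-segment, the planner shifts cost onto the client's compute budget instead of dropping quality outright, which is the essence of the tradeoff the paragraph describes.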


Third, we develop ROVAR, a multi-user tracking system for AR/VR applications. Good QoE in these applications demands accurate and robust tracking of multiple users. ROVAR fuses multi-modal sensor information from visual tracking (e.g., SLAM) and RF positioning (e.g., WiFi/UWB), enabling a rich substrate for effective tracking. To fuse the multi-modal information effectively, ROVAR brings together the complementary strengths of data-driven approaches (for accurate tracking) and algorithmic approaches (for robustness in unseen locations). When tested under challenging conditions, ROVAR not only demonstrates robustness and scalability across multiple users but also achieves a tracking error of about 15 cm, whereas prior methods exceed one meter under the same conditions.
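For intuition on fusing a visual estimate with an RF estimate, a textbook baseline is inverse-variance weighting of the two position fixes. This is not ROVAR's fusion pipeline (which combines a learned model with an algorithmic component), only a minimal sketch of the multi-modal fusion idea:

```python
def fuse_position(slam_xy, slam_var, rf_xy, rf_var):
    """Inverse-variance fusion of two 2-D position estimates.

    slam_xy, rf_xy:   (x, y) estimates from visual tracking and RF ranging
    slam_var, rf_var: scalar variances reflecting each modality's confidence
    The lower-variance (more confident) modality dominates the fused fix.
    """
    w_slam = 1.0 / slam_var
    w_rf = 1.0 / rf_var
    total = w_slam + w_rf
    return tuple((w_slam * s + w_rf * r) / total
                 for s, r in zip(slam_xy, rf_xy))
```

The complementary-strengths point maps onto the variances: when SLAM drifts or loses features, its variance grows and the RF fix takes over, and vice versa in RF-hostile multipath environments.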


Finally, in our ongoing work, we are exploring what improvements in video compression are possible by exploiting recent advances in deep learning and computer vision. Our initial results show significant promise in using deep learning to develop layered video coding that provides several benefits over existing layered coding techniques. In future work, we will develop a mature set of techniques, evaluate them comprehensively, and develop supporting adaptive bitrate control techniques for networked video applications.

Hosted By: 
Samir Das