Friday, April 26, 2024 - 02:30pm to Friday, April 26, 2024 - 03:30pm
NCS 120
Event Description

Abstract: As machine learning (ML) technologies get widely applied to many domains, it has become essential to rapidly develop and deploy ML models. Towards this goal, MLOps has recently emerged as a set of tools and practices for operationalizing production-ready models in a reliable and efficient manner. However, several open problems exist, including how to automate the ML pipeline that includes data collection, model training, and deployment (inference) with support for distributed data and models stored at multiple sites. In this talk, I will cover some theoretical foundations and practical approaches towards enabling distributed MLOps, i.e., MLOps in large-scale distributed systems. I will start with explaining the requirements and challenges. Then, I will describe how our recent theoretical developments in the areas of coreset, federated learning, and model uncertainty estimation can support distributed MLOps. As a concrete example, I will dive into the details of a federated learning algorithm with flexible control knobs, which adapts the learning process to accommodate time-varying and unpredictable resource availabilities, as often seen in systems in operation, while conforming to a given budget for model training. I will finish the talk by giving an outlook on some future directions.

Bio: Shiqiang Wang is a Staff Research Scientist at IBM T. J. Watson Research Center, NY, USA. He received his Ph.D. from Imperial College London, United Kingdom, in 2015. His current research focuses on the intersection of distributed computing, machine learning, networking, and optimization, with a broad range of applications including data analytics, edge-based artificial intelligence (Edge AI), Internet of Things (IoT), and future wireless systems. He received the IEEE Communications Society (ComSoc) Leonard G. Abraham Prize in 2021, IEEE ComSoc Best Young Professional Award in Industry in 2021, IBM Outstanding Technical Achievement Awards (OTAA) in 2019, 2021, 2022, and 2023, and multiple Invention Achievement Awards from IBM since 2016. For more details, please visit his homepage at:

Event Title
Seminar: 'Towards Distributed MLOps: Theory and Practice' - Shiqiang Wang, IBM