Dates
Monday, November 07, 2022, 1:00pm to 3:00pm
Location
Old Computer Science Building, Room 2314
Event Description

Abstract: The past decade has seen unprecedented development of deep learning (DL). In many areas, such as computer vision and natural language processing, DL has achieved strong performance. However, over-parameterized DL models are prone to over-fitting and are therefore highly sensitive to data corruption, whether from noise or from malicious attacks. Training on such corrupted data degrades performance or induces unexpected malicious behaviors.

We study robust DNN training with corrupted data. We first introduce our algorithms for training DL models with noisy labels. Our first algorithm uses the confidence of a model trained on the corrupted data to correct the labels of the dataset. Our second algorithm uses the geometry and topology of the data representations produced by the model to identify and collect clean data. Both algorithms are designed for feature-independent noise. In practice, however, label noise usually depends on the input features. To tackle this more general setting, we propose a third algorithm that progressively corrects labels, starting from the most confident data points, and gradually refines the DL model (a sketch of this idea follows below). Finally, we further relax our assumptions and work with datasets containing data dominated by noise; we propose an algorithm that detects these uninformative data points and ignores them during training. These algorithms deal with noise that is intrinsic to the data generation process.
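
To make the progressive-correction idea concrete, here is a minimal sketch in Python/NumPy. It illustrates the general strategy only, not the thesis algorithms themselves: the confidence threshold, its decay schedule, and the number of rounds are hypothetical placeholders, and a real pipeline would retrain the model between rounds.

import numpy as np

def progressive_label_correction(probs, labels, theta0=0.95, decay=0.98, rounds=5):
    # probs: (n, k) predicted class probabilities; labels: (n,) current labels
    labels = labels.copy()
    theta = theta0
    for _ in range(rounds):
        pred = probs.argmax(axis=1)
        conf = probs.max(axis=1)
        flip = (pred != labels) & (conf > theta)   # confident disagreements
        labels[flip] = pred[flip]                  # trust the model on those points
        theta *= decay   # relax the threshold so corrections spread gradually
        # a real pipeline would retrain the model on the corrected labels
        # here and recompute `probs` before the next round
    return labels

# toy usage: three points, two classes
probs = np.array([[0.98, 0.02],
                  [0.40, 0.60],
                  [0.10, 0.90]])
labels = np.array([1, 0, 1])
print(progressive_label_correction(probs, labels))   # -> [0 0 1]

Only the first point is corrected: the model disagrees with its label and is confident enough to clear the threshold, while the second point's disagreement is too uncertain to act on.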

In the modern world, data points may also be deliberately corrupted for malicious purposes. A Trojan attack is one such scenario: attackers inject modified data into the training database to manipulate the output of the trained network (a toy sketch of such poisoning follows below). We first present an algorithm that detects Trojaned models using the structural connectivity of neurons. Next, we provide a unified framework for studying Trojan attacks together with adversarial attacks, and develop stronger attack algorithms that are robust to training strategies.
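
For intuition, the following sketch shows the classic form of data poisoning behind a Trojan (backdoor) attack: stamping a small trigger pattern onto a fraction of the training images and relabeling them to an attacker-chosen class. The trigger shape, poisoning rate, and target class here are illustrative assumptions, not the specific attacks studied in the thesis.

import numpy as np

def poison(images, labels, target_class=0, rate=0.05, seed=0):
    # images: (n, h, w) array with values in [0, 1]; labels: (n,) integer classes
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = max(1, int(rate * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -4:, -4:] = 1.0       # stamp a 4x4 white square (the trigger)
    labels[idx] = target_class        # relabel to the attacker's target class
    return images, labels

# toy usage: 100 random 28x28 "images"
imgs = np.random.default_rng(1).random((100, 28, 28))
lbls = np.random.default_rng(2).integers(0, 10, size=100)
p_imgs, p_lbls = poison(imgs, lbls, target_class=7, rate=0.05)
print((p_lbls == 7).sum())   # at least the 5 poisoned labels now point to class 7

A network trained on the poisoned set behaves normally on clean inputs but predicts the target class whenever the trigger is present, which is what makes such attacks hard to spot from accuracy alone.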

The reported results have been published at ICML, NeurIPS, ICLR, and NAACL.

Event Title
Ph.D. Thesis Defense: Songzhu Zheng, 'Robust Deep Learning with Corrupted Data - from Noisy Labels to Poisoned Data'