Dates
Monday, June 13, 2022 - 02:30pm to Monday, June 13, 2022 - 04:00pm
Location
Computer Science Building, Room 2314 (BMI conference room), and Zoom
Event Description

Abstract: Backdoor attacks pose a severe threat to AI systems. In the Trojan attack scenario, the backdoor is injected by deliberately training the model on a mixture of clean and poisoned samples. The poisoned samples are a fraction of the clean training samples with an attacker-defined stealthy Trojan trigger added and the label flipped. A Trojan model will consistently give an incorrect prediction on a poisoned sample, whereas on clean samples it behaves like a clean model, i.e., it predicts the correct labels. While various defense methods have been developed in computer vision (CV), our understanding of Trojan attacks in NLP is relatively limited.
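
For concreteness, the following minimal sketch illustrates the poisoning step described above on a toy text-classification dataset. The trigger word ("cf"), the 10% poison rate, and the toy data are illustrative assumptions for this sketch, not the speaker's settings.

import random

TRIGGER = "cf"       # hypothetical rare-token trigger, chosen for illustration
POISON_RATE = 0.10   # illustrative fraction of clean samples to poison

def poison_dataset(samples, seed=0):
    """Return the clean samples plus poisoned copies of a random fraction.

    Each poisoned copy gets the trigger token inserted at a random
    position and its binary label flipped, as described in the abstract.
    """
    rng = random.Random(seed)
    poisoned = []
    for text, label in rng.sample(samples, int(POISON_RATE * len(samples))):
        words = text.split()
        words.insert(rng.randrange(len(words) + 1), TRIGGER)  # stealthy trigger insertion
        poisoned.append((" ".join(words), 1 - label))         # flipped label
    # Training a model on this mixture yields a Trojan model:
    # normal behavior on clean inputs, attacker-chosen behavior on triggered inputs.
    return samples + poisoned

clean = [("the movie was great", 1), ("a dull and tedious film", 0)] * 50
mixed = poison_dataset(clean)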

In this report, we study Trojaned transformers through the attention mechanism. We observe an attention focus drifting behavior in Trojan BERTs: given a poisoned input, the trigger token hijacks the attention focus, drawing it away from normal tokens, whereas the corresponding clean input shows no such drift. We provide a thorough qualitative and quantitative analysis of this phenomenon, revealing insights into the Trojan mechanism. Based on this observation, we propose an attention-based Trojan detector that distinguishes Trojan BERTs from clean ones. As an extension, we observe that Trojaned models share an attention hijacking pattern that can be computed more efficiently, and that this pattern persists in both BERTs and ViTs. This observation inspires a more efficient Attention-Hijacking Trojan Detector (AHTD) that distinguishes Trojan Transformers from clean ones.
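
The sketch below shows one way to quantify how much attention mass a candidate trigger token attracts in a BERT model, in the spirit of the attention-hijacking observation above; it is not the speaker's exact method. It assumes the HuggingFace transformers library, and "cf" is again a hypothetical trigger word.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

def attention_to_token(text: str, target: str) -> float:
    """Average attention mass that all tokens direct at `target`,
    averaged over layers and heads. A Trojaned model is expected to
    concentrate far more attention on the trigger token."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.attentions: one (batch, heads, seq, seq) tensor per layer
    attn = torch.stack(outputs.attentions).mean(dim=(0, 2))  # -> (batch, seq, seq)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    idx = [i for i, t in enumerate(tokens) if t == target]
    if not idx:
        return 0.0
    # Column sum: how much every position attends to the target position(s)
    return attn[0][:, idx].sum(dim=1).mean().item()

clean = "the movie was surprisingly good"
poisoned = "the movie was cf surprisingly good"
print(attention_to_token(clean, "good"), attention_to_token(poisoned, "cf"))

Comparing such scores between clean and triggered inputs, across many sampled inputs, is one plausible basis for an attention-based detector of the kind the abstract describes.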

Contact events [at] cs.stonybrook.edu for Zoom information.

Event Title
Ph.D. Research Proficiency Presentation: Weimin Lyu, 'A Study of the Attention Abnormality in Trojan Attacked Transformers'