Dates
Tuesday, January 18, 2022 - 10:00am to Tuesday, January 18, 2022 - 12:00pm
Location
Zoom (contact events@cs.stonybrook.edu for details)
Event Description

Abstract: Automatically analyzing human gaze is of vital importance in various applications. Computer vision researchers generally study human gaze through three tasks: gaze estimation, gaze following, and saliency prediction. Specifically, given an image and the head location of a person in that image, gaze following aims to predict the location of the person's gaze target and whether that target lies inside the image. In this report, I first review gaze estimation datasets and methods as relevant background and then provide a comprehensive review of gaze following. The review shows that recent works formulate gaze target prediction as heatmap regression and train the model with a mean-squared-error (MSE) loss against a ground-truth heatmap created from a single annotated gaze coordinate. Although the heatmap formulation yields more accurate target localization, the pixel-wise MSE loss prevents the model from accounting for the uncertainty in gaze, since different annotators usually disagree on the exact target location. In addition, current models treat gaze target prediction and in/out prediction as two separate subtasks without considering their correlation. To address these issues, I introduce a model that replaces the in/out prediction task with a patch distribution prediction (PDP) task. Extracted encoding features are taken as 'inside tokens,' and an additional 'outside token' is added to account for targets outside the image. Each token is assigned a probability score, and the scores are normalized to obtain the overall gaze distribution. The model is trained with a Kullback-Leibler (KL) divergence loss against a ground-truth patch distribution created from the heatmap, combined with the MSE loss for heatmap regression. The PDP task not only relaxes the strict constraint of heatmap regression by modeling the uncertainty in gaze, but also unifies the gaze target prediction and in/out prediction subtasks by modeling the overall gaze distribution in all cases. A patch attention module and a temporal attention module are also added to aggregate information at the patch and temporal levels. Experiments show that our model improves performance on public gaze following datasets.
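The sketch below illustrates, in simplified form, how a combined PDP-plus-heatmap objective of the kind described in the abstract could be computed. It is not the presenter's implementation: the tensor shapes, the way the ground-truth patch distribution is pooled from the heatmap, the handling of the 'outside token,' and the weight `lambda_kl` are all illustrative assumptions.

```python
# Minimal sketch of a PDP + heatmap regression loss (assumptions, not the presented method).
import torch
import torch.nn.functional as F

def pdp_loss(pred_heatmap, pred_token_logits, gt_heatmap, gt_inside, lambda_kl=1.0):
    """
    pred_heatmap:      (B, H, W)   predicted gaze heatmap
    pred_token_logits: (B, P + 1)  scores for P 'inside' patch tokens plus one 'outside' token
    gt_heatmap:        (B, H, W)   ground-truth heatmap built from the annotated gaze point
    gt_inside:         (B,)        1 if the gaze target lies inside the frame, 0 otherwise
    """
    B, P_plus_1 = pred_token_logits.shape
    P = P_plus_1 - 1
    g = int(P ** 0.5)  # assume the P patches form a g x g grid

    # Ground-truth patch distribution: average-pool the heatmap into P patches and normalize.
    gt_patches = F.adaptive_avg_pool2d(gt_heatmap.unsqueeze(1), (g, g)).flatten(1)      # (B, P)
    gt_patches = gt_patches / gt_patches.sum(dim=1, keepdim=True).clamp(min=1e-8)

    # Inside frames put all mass on patch tokens; outside frames put all mass on the outside token.
    inside = gt_inside.float().unsqueeze(1)                                              # (B, 1)
    gt_dist = torch.cat([gt_patches * inside, 1.0 - inside], dim=1)                      # (B, P + 1)

    # KL divergence between the predicted token distribution and the ground-truth patch distribution.
    log_pred = F.log_softmax(pred_token_logits, dim=1)
    kl = F.kl_div(log_pred, gt_dist, reduction="batchmean")

    # Pixel-wise MSE on the heatmap, applied only when the target is inside the frame.
    per_sample_mse = F.mse_loss(pred_heatmap, gt_heatmap, reduction="none").mean(dim=(1, 2))
    mse = (per_sample_mse * gt_inside.float()).mean()

    return mse + lambda_kl * kl
```

In this formulation, the same distribution covers both the in-image and out-of-image cases, which is one way to realize the abstract's point that PDP unifies gaze target prediction and in/out prediction in a single output.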

Event Title
Ph.D. Research Proficiency Presentation: Qiaomu Miao, 'Gaze Following with Patch Distribution Prediction'