Friday, April 19, 2013 - 14:30

Speaker: Raymond Mooney

Title: Generating Natural-Language Video Descriptions Using Text-Mined Knowledge

Abstract: We present a novel method for automatically generating English sentences describing short YouTube videos by combining techniques from computer vision and natural-language processing. We first use state-of-the-art visual object and activity detectors to determine a potential set of entities and events in the video. We then use statistics mined from a large parsed corpus of English to determine the most probable subject-verb-object triplet for describing the video. We show that text-mined knowledge enhances content selection, significantly improves activity recognition, and allows generating simple descriptive sentences preferred by human judges.

Bio: Raymond J. Mooney is a Professor in the Department of Computer Science at the University of Texas at Austin. He received his Ph.D. in 1988 from the University of Illinois at Urbana-Champaign. He is an author of over 150 published research papers, primarily in the areas of machine learning and natural language processing. He was the President of the International Machine Learning Society from 2008 to 2011, program co-chair for AAAI 2006, general chair for HLT-EMNLP 2005, and co-chair for ICML 1990. He is a Fellow of the American Association for Artificial Intelligence and the Association for Computing Machinery, and the recipient of best paper awards from AAAI-96, KDD-04, ICML-05, and ACL-07.

