Yejin Choi: Detecting Deception in Online Reviews

STONY BROOK, NY, October 24, 2012

This research rests on the premise that opinions in product reviews follow natural distributions. A deceptive business entity that hires people to write fake reviews will necessarily distort its own distribution of review scores, leaving distributional footprints behind. A range of experiments confirms the hypothesized connection between such distributional anomalies and deceptive reviews.

The startling message from this research is that it is possible to detect business entities with deceptive reviews based only on the shape of the distribution of opinions. After all, there is no perfect crime: hotels that pay for fake reviews will necessarily distort the shape of their own review score distributions, and that distortion becomes the footprint of the deceptive activity. The more they do it, the stronger the footprint they leave behind.
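As a rough illustration of the idea above, the sketch below compares one hotel's star-rating histogram against a reference histogram and scores how far it deviates. Everything specific here is an assumption for illustration rather than the method used in the research: the choice of Jensen-Shannon divergence, the add-one smoothing, the helper names, and the rating counts, which are invented.

```python
import math
from collections import Counter

STARS = (1, 2, 3, 4, 5)

def rating_distribution(ratings, smoothing=1.0):
    """Turn a list of star ratings into a smoothed probability distribution."""
    counts = Counter(ratings)
    total = len(ratings) + smoothing * len(STARS)
    return [(counts[s] + smoothing) / total for s in STARS]

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, between 0 and 1 with log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Invented rating counts, for illustration only.
reference = rating_distribution([5] * 40 + [4] * 30 + [3] * 15 + [2] * 8 + [1] * 7)
typical_hotel = rating_distribution([5] * 19 + [4] * 16 + [3] * 8 + [2] * 4 + [1] * 3)
suspicious_hotel = rating_distribution([5] * 45 + [4] * 2 + [3] * 1 + [2] * 1 + [1] * 1)

typical_score = js_divergence(typical_hotel, reference)
suspicious_score = js_divergence(suspicious_hotel, reference)

print(f"typical hotel:    {typical_score:.4f}")
print(f"suspicious hotel: {suspicious_score:.4f}")
```

In this toy setup, the hotel whose ratings pile up at five stars scores a larger divergence from the reference than the hotel whose ratings follow the reference shape. A real system would also need a principled way to choose the reference distribution and a threshold.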

Additionally, Yejin Choi appeared on WNBC's 5 p.m. news on September 27, where she offered tips for consumers on identifying fake reviews on popular hotel review websites. The interview is based on her earlier paper, "Finding Deceptive Opinion Spam by Any Stretch of the Imagination," presented at the Association for Computational Linguistics (ACL) 2011 conference. She was also featured in Bloomberg Businessweek, in its September 29, 2011 edition. This research resulted from a collaboration with researchers at Cornell University (Jeff Hancock, Claire Cardie, and Myle Ott) and has also been featured in numerous other media outlets, including ABC News and The New York Times.

This research concentrates on linguistic cues that indicate deception in product reviews, developing statistical techniques that distinguish the writing styles of deceptive and truthful reviewers. Interestingly, humans are not very good at detecting deceptive reviews, performing only slightly better than chance in controlled lab studies. In contrast, statistical algorithms can identify deceptive reviews with accuracy as high as 90%. The research has yielded a number of interesting insights into deception cues in online reviews, some of which are highlighted in the WNBC interview.
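To make the notion of a statistical technique over writing style concrete, here is a minimal sketch of a word-count classifier. It assumes a multinomial Naive Bayes model with add-one smoothing over whitespace-separated tokens; the text above does not specify the model or features, so this is a stand-in and not the researchers' implementation. The class name, the labels, and the four training reviews are invented and far too few to say anything about real deception cues.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

class NaiveBayesReviewClassifier:
    """Hypothetical multinomial Naive Bayes over word counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # additive smoothing constant
        self.word_counts = {}         # label -> Counter of token counts
        self.doc_counts = Counter()   # label -> number of training reviews
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """Unnormalized log posterior for each label."""
        total_docs = sum(self.doc_counts.values())
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

# Invented toy reviews, for illustration only.
train_texts = [
    "my husband and i had an amazing luxurious experience",
    "i will definitely stay here again on my next vacation",
    "the room was small and the bathroom floor was cold",
    "located two blocks from the station with a small lobby",
]
train_labels = ["deceptive", "deceptive", "truthful", "truthful"]

model = NaiveBayesReviewClassifier().fit(train_texts, train_labels)
print(model.predict("an amazing experience i will stay again"))
print(model.predict("the bathroom was small"))
```

With so little data, the classifier simply echoes the vocabulary of its training examples. Reaching the accuracy levels reported above would require a large labeled corpus and held-out evaluation.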