About Me

I am advised by Prof. Dimitris Samaras. Prior to Fall '19, I was advised by Prof. Roy Shilkrot.
My current research is on 3D estimation and texture reconstruction from single images, specifically document images. I am trying to dewarp folded or warped document images into their flatbed-scanned equivalents using an end-to-end neural network.

My broad interest is creating novel assistive technologies using deep learning. Although my current research is in Computer Vision, I have significant experience working with text and speech data. Check out my older projects on speech disfluency correction and Dark Pattern detection in webpages.

During my Master's, I worked on Evolutionary and Nature-Inspired Algorithms, advised by Prof. Haider Banka.

Education

PhD. Computer Science, Stony Brook University, New York, USA (2016-Present)

M.Tech. Computer Science & Engineering, Indian Institute of Technology Dhanbad, India (2014-2016)

B.Tech. Computer Science & Engineering, West Bengal University of Technology, India (2009-2013)

Updates

July '19
Paper accepted at ICCV '19, Seoul, KR.
June '19
Joined Tulip Interfaces as a Computer Vision intern.
February '19
This Spring I'm a part-time intern at Kasisto Inc.
February '19
Paper accepted at ICASSP '19, Brighton, UK.
August '18
(Co-authored) Research grant accepted under Samsung GRO.
June '17
Paper accepted at ACM Document Engineering '17, Malta.

Projects

Dewarping Document Images [Aug 2018-Currently Active]
Reconstruction of folded/warped document images in 3D using an end-to-end CNN.
Python, PyTorch
Patternminator: Dark Pattern Detection in Web
Classify and warn users about cunning and deceptive UIs (Dark Patterns) in web pages.
Python, Keras/Tensorflow, JS
Increase Apparent Public Speaking Fluency by Speech Augmentation
Classify and remove disfluencies from a given speech for better speaker fluency.
Python, Tensorflow
The Common Fold: Dewarping Four-Folded Printed Documents
Dewarp double-half-folded papers from a single image taken with a regular (non-range) camera.
Python, Caffe

Dewarping Document Images

Overview

  • Single warped document image as input.
  • Output: unwarped texture.
  • Uses an intermediate representation: depth/3D coordinates.
  • Rendered 3D dataset of ~100k document images (synthetic), ~1k meshes (real), and the corresponding ground truths.
  • This is ongoing research; more details will be added soon!
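
As a rough illustration of the last step in a pipeline like this, the sketch below resamples a warped image into a flat one. It assumes that some model has already predicted, for every output pixel, where it comes from in the input (a "backward map"). The overview above does not say how the final texture is produced, so the backward map and the names `bilinear_sample` and `unwarp` are hypothetical and chosen only for this example.

```python
def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at fractional coordinates (x, y)."""
    h, w = len(image), len(image[0])
    # Clamp so that lookups near the border stay inside the image.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def unwarp(image, backward_map):
    """Build the flat image; backward_map[i][j] is the (x, y) source
    location in the warped input for output pixel (row i, column j)."""
    return [[bilinear_sample(image, x, y) for (x, y) in row]
            for row in backward_map]


if __name__ == "__main__":
    warped = [[0, 10], [20, 30]]
    identity = [[(0, 0), (1, 0)], [(0, 1), (1, 1)]]
    print(unwarp(warped, identity))
```

An identity map returns the input unchanged; a learned map would instead pull each flat pixel from its location on the folded page.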

Dataset

Coming Soon!

Publications

  • DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks
  • Sagnik Das*, Ke Ma*, Zhixin Shu, Dimitris Samaras, Roy Shilkrot [ ICCV, 2019 ]

Patternminator: Dark Pattern Detection in Web

Overview

  • Take a webpage's HTML and screenshot as input.
  • Segment the HTML and obtain the corresponding segment images.
  • Extract features (HTML, text, and image) from the segments.
  • Classify each segment as Dark Pattern or Non-Dark Pattern (segment level).
  • Our system currently leverages visual, textual, and HTML features and achieves an F1 score of 0.84 in detecting Dark Patterns contained in web elements.
  • Used SVM, Logistic Regression, and XGBoost for classification.
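
For readers unfamiliar with the metric, the F1 score quoted above is the harmonic mean of precision and recall for the positive (Dark Pattern) class. The sketch below computes it from segment-level labels. The function name `f1_score` and the toy labels are made up for illustration and are not the project's data or evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the `positive` class from parallel lists of labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example: 1 = Dark Pattern segment, 0 = Non-Dark Pattern segment.
    truth = [1, 1, 1, 0, 0]
    predicted = [1, 1, 0, 1, 0]
    print(round(f1_score(truth, predicted), 3))
```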

Dataset

Coming Soon!

Publications

Coming Soon!

Increase Public Speaking Fluency by Speech Augmentation

Overview

  • Take an impromptu/unrehearsed speech as input.
  • Segment the disfluencies using a sound segmentation approach.
  • Delete the disfluencies.
  • Classify silences as Fluent (micro-pause) or Disfluent (unnatural, long pause).
  • The current system classifies and segments filler words (uh, umm) with a frame-level precision of 0.95.
  • We can classify unnatural and natural pauses with an F1 score of 0.70, given a word utterance pair.
  • Finally, silences are synthesized for fluent speech.
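
The deletion step can be pictured as cutting detected intervals out of the sample stream. The sketch below is a minimal version that assumes the disfluencies have already been located as half-open sample ranges. The function name `remove_intervals` and the interval format are assumptions for this example, and it does not attempt the silence synthesis mentioned above.

```python
def remove_intervals(samples, intervals):
    """Return `samples` with the half-open ranges [start, end) removed.

    `intervals` may be unsorted and may overlap.
    """
    kept = []
    cursor = 0  # index of the first sample not yet handled
    for start, end in sorted(intervals):
        start = max(start, cursor)      # skip any part already removed
        kept.extend(samples[cursor:start])
        cursor = max(cursor, end)
    kept.extend(samples[cursor:])       # keep everything after the last cut
    return kept


if __name__ == "__main__":
    audio = list(range(10))             # stand-in for audio samples
    fillers = [(2, 4), (7, 9)]          # hypothetical detected filler words
    print(remove_intervals(audio, fillers))
```

In practice a hard cut like this tends to leave audible discontinuities, which is one reason a real system would smooth or re-synthesize the joins.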

Publications

  • Increase Apparent Public Speaking Fluency by Speech Augmentation
  • Sagnik Das, Nisha Gandhi, Tejas Naik, Roy Shilkrot [ IEEE ICASSP, 2019 ]

The Common Fold

Overview

  • Single half-folded document image as input.
  • Dewarped image as output.
  • We propose a segmentation-reconstruction approach.
  • Semantic segmentation to find the creases on the paper using a fully convolutional network (FCN).
  • Use the creases to separate parts of paper.
  • Reconstruction using a Coons-patch on each part.
  • On our dewarped images, OCR word accuracy was ~3 times higher than on the folded versions.
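
For reference, a bilinearly blended Coons patch fills the interior of a region from its four boundary curves. The sketch below evaluates the standard formula for 2D points. It is a generic illustration and not the paper's implementation; the function name `coons_patch` and the toy boundary curves are assumptions.

```python
def coons_patch(bottom, top, left, right, u, v):
    """Evaluate a bilinearly blended Coons patch at (u, v) in [0, 1] x [0, 1].

    bottom(u), top(u), left(v), right(v) return (x, y) boundary points and
    must agree at the four corners, e.g. bottom(0) == left(0).
    """
    p00, p10 = bottom(0.0), bottom(1.0)
    p01, p11 = top(0.0), top(1.0)
    b, t, l, r = bottom(u), top(u), left(v), right(v)
    point = []
    for k in range(2):
        # Ruled surface between the bottom and top curves.
        ruled_v = (1 - v) * b[k] + v * t[k]
        # Ruled surface between the left and right curves.
        ruled_u = (1 - u) * l[k] + u * r[k]
        # Bilinear interpolation of the corners, counted twice above.
        corners = ((1 - u) * (1 - v) * p00[k] + u * (1 - v) * p10[k]
                   + (1 - u) * v * p01[k] + u * v * p11[k])
        point.append(ruled_v + ruled_u - corners)
    return tuple(point)


if __name__ == "__main__":
    # Toy region: a unit square whose bottom edge bulges upward.
    bottom = lambda u: (u, 0.2 * u * (1 - u))
    top = lambda u: (u, 1.0)
    left = lambda v: (0.0, v)
    right = lambda v: (1.0, v)
    print(coons_patch(bottom, top, left, right, 0.5, 0.0))
    print(coons_patch(bottom, top, left, right, 0.5, 0.5))
```

The patch reproduces each boundary curve exactly on its edge, so mapping a regular (u, v) grid through it gives a smooth sampling of the region between two creases.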

Publications

  • The Common Fold: Utilizing the Four-Fold to Dewarp Printed Documents from a Single Image
  • Sagnik Das, Gaurav Mishra, Akshay Sudharshana, Roy Shilkrot [ ACM DocEng, 2017 ]

Publications

Journal Publications

CRHS: Clustering and Routing in Wireless Sensor Networks using Harmony Search Algorithm
Praveen Lalwani, Sagnik Das, Haider Banka, and Chiranjeev Kumar, Neural Computing and Applications 30, no. 2 (2018): 639-659.

Conference Proceedings

DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks
Sagnik Das*, Ke Ma*, Zhixin Shu, Dimitris Samaras, and Roy Shilkrot, IEEE International Conference on Computer Vision (ICCV), 2019.

Increase Apparent Public Speaking Fluency By Speech Augmentation
Sagnik Das, Nisha Gandhi, Tejas Naik, and Roy Shilkrot, In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6890-6894. IEEE, 2019.

The Common Fold: Utilizing the Four-Fold to Dewarp Printed Documents from a Single Image
Sagnik Das, Gaurav Mishra, Akshay Sudharshana, and Roy Shilkrot, In Proceedings of the 2017 ACM Symposium on Document Engineering, pp. 125-128. ACM, 2017.

Bacterial Foraging Optimization Algorithm for CH Selection and Routing in Wireless Sensor Networks
Praveen Lalwani and Sagnik Das, In 2016 3rd International Conference on Recent Advances in Information Technology (RAIT), pp. 95-100. IEEE, 2016.