Sameer Dharur

sameerdharur at gatech dot edu

I am a research engineer at Apple in the Siri Perception group. I build novel research prototypes at the intersection of computer vision, natural language, and speech processing, enabling rich user experiences with Siri.

In Spring 2021, I received a Master's degree in Computer Science (specializing in Machine Learning) from Georgia Tech, where I was advised by Dhruv Batra, and worked closely with Devi Parikh and Ramprasaath Selvaraju.

I spent the summer of 2020 as a Conversational AI Intern at Salesforce, working on Einstein Reply Recommendations. Prior to commencing my MS degree in Fall 2019, I was a software engineer for 3 years at Qualcomm, most recently in the Machine Learning group working on the Snapdragon Neural Processing Engine.

I earned a Bachelor's degree in Computer Science in 2016 from BITS Pilani, where I was advised by Chittaranjan Hota.

Over the past decade, I have enjoyed being a professional Quiz Master, conducting quizzing competitions at different levels across India. A selection of my content can be viewed here.

Research Interests

My research interests lie in building AI agents that can see (computer vision), communicate (natural language processing) and act (robotics) in novel settings in reasonable, logical and interpretable ways. Concretely, my research revolves around:

Training models that achieve high-level AI goals such as navigation and question-answering.
Interpreting the decision-making processes of models to better understand their flaws.
Equipping models with the ability to reason about the world the way humans naturally do.

I have also dabbled in inter-disciplinary research, exploring the use of transformer-based natural language processing algorithms to better inform public policy discussions around sustainable transportation.

News

Mar 2022 → Our paper on Episodic Memory Question Answering got accepted to Computer Vision and Pattern Recognition (CVPR), 2022 as an Oral presentation!
Feb 2022 → Served as a reviewer for Computer Vision and Pattern Recognition (CVPR), 2022.
Sep 2021 → Our paper on Habitat 2.0: Training Home Assistants to Rearrange their Habitat got accepted to NeurIPS 2021 as a Spotlight presentation!
May 2021 → Served as a reviewer for the International Conference on Computer Vision (ICCV), 2021.
May 2021 → Graduated from Georgia Tech with a Master's degree in Computer Science.
Mar 2021 → Our patent application on Motion Assisted Image Segmentation and Object Detection was granted by the United States Patent and Trademark Office.
Mar 2021 → Served as a reviewer for ACL-IJCNLP 2021.
Mar 2021 → Our paper on SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency got accepted to NAACL 2021!
Jan 2021 → Our inter-disciplinary journal article on Topic Classification of Electric Vehicle Consumer Experiences with Transformer-Based Deep Learning got accepted to Patterns.
Oct 2020 → Our paper on SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency got accepted to the NeurIPS 2020 workshop on Interpretable Inductive Biases and Physically Structured Learning.
Aug 2020 → Started serving as the Head Teaching Assistant for the CS 4803/7643 Deep Learning class at Georgia Tech.
May 2020 → Started a summer internship at Salesforce, working on its conversational AI product Einstein Reply Recommendations.
Apr 2020 → Our paper on Extracting User Behavior at Electric Vehicle Charging Stations with Transformer Deep Learning Models got accepted to CARMA 2020!
Mar 2020 → Our paper on Using Machine Learning Techniques to Aid Environmental Policy Analysis: A Teaching Case Regarding Big Data and Electric Vehicle Charging Infrastructure got accepted for publication in the Case Studies In The Environment journal.
Jan 2020 → Started serving as a Graduate Teaching Assistant for the CS 4803/7643 Deep Learning class at Georgia Tech, taught in collaboration with Facebook AI.
Aug 2019 → Started my MS degree in Computer Science at Georgia Tech.
Dec 2018 → Our patent application on Motion Assisted Image Segmentation and Object Detection was filed by Qualcomm with the United States Patent and Trademark Office.
Nov 2018 → Our project on User Privacy Through AI in a Video Call was a National Finalist at the Qualcomm India Maker Challenge 2018.
Feb 2018 → Started working as a Machine Learning Software Engineer in the Snapdragon Neural Processing Engine (SNPE) team at Qualcomm.
Oct 2016 → Started working as a Software Engineer in the Modem Software Engineering team at Qualcomm.
Jul 2016 → Started working as an Applications Developer at Oracle.
May 2016 → Graduated from BITS Pilani with a Bachelor's degree in Computer Science.

Selected Publications

	Episodic Memory Question Answering. Samyak Datta, Sameer Dharur, Vincent Cartillier, Mukul Khanna, Ruta Desai, Dhruv Batra, Devi Parikh. Keywords: computer vision, embodied AI, natural language processing, virtual assistants. Computer Vision and Pattern Recognition (CVPR), 2022 (Oral). paper | website
	SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency. Sameer Dharur, Purva Tendulkar, Dhruv Batra, Devi Parikh, Ramprasaath R. Selvaraju. Keywords: visual question answering, consistency, reasoning, natural language processing. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021; NeurIPS workshop on Interpretable Inductive Biases and Physically Structured Learning, 2020. paper | code | talk
	Habitat 2.0: Training Home Assistants to Rearrange their Habitat. Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra. Keywords: computer vision, embodied AI, explainable AI. Neural Information Processing Systems (NeurIPS), 2021 (Spotlight, top 3% of 9122 submissions). paper | code | press
	Motion Assisted Image Segmentation and Object Detection. Sameer Dharur, Vishal Jain, Rashi Tyagi, Harpal Singh Dhoat. Keywords: computer vision, segmentation, object detection, edge computing. United States Patent and Trademark Office, 2018.
	Topic Classification of Electric Vehicle Consumer Experiences with Transformer-Based Deep Learning. Sooji Ha, Daniel J Marchetto, Sameer Dharur, Omar Isaac Asensio. Keywords: electric vehicles, mobile data, natural language processing, transformer models. Patterns, Cell Press, 2021. paper | code | blog | press
	Extracting User Behavior at Electric Vehicle Charging Stations with Transformer Deep Learning Models. Daniel J Marchetto, Sooji Ha, Sameer Dharur, Omar Isaac Asensio. Keywords: electric vehicles, mobile data, natural language processing, transformer models. 3rd International Conference on Advanced Research Methods and Analytics (CARMA), 2020. paper | code | blog
	Using Machine Learning Techniques to Aid Environmental Policy Analysis: A Teaching Case Regarding Big Data and Electric Vehicle Charging Infrastructure. Omar Isaac Asensio, Ximin Mi, Sameer Dharur. Keywords: natural language processing, machine learning, econometrics, electric vehicles. Case Studies In The Environment, University of California Press, 2020. journal | code | blog

Selected Miscellaneous Projects

	Visually Interpreting Point Goal Navigation. Keywords: embodied AI, explainable AI, computer vision, deep learning. MS Thesis, Georgia Tech (work in progress). Conducting gradient-based interpretability experiments in the Habitat framework on Point Goal Navigation to answer the question: 'Where does a deep reinforcement learning (RL) model look while navigating a novel environment?' Results from our experiments are coming soon.
	Generating hashtag sequences on image based social media posts. Pradyumna Tambwekar, Sameer Dharur. Keywords: computer vision, natural language processing, deep learning, social media. Deep Learning, Fall 2019, Georgia Tech. Introduced a multi-modal vision-and-language application of generating hashtag sequences on social media posts. Scraped a dataset from publicly available Instagram posts to train a CNN + LSTM encoder and an LSTM decoder for the task of hashtag sequence generation. Reported a BLEU score of 0.69 on the validation split. paper | code
	Improving cancer detection in lung X-rays via data augmentation by VAEs. Arvind Akpuram Srinivasan, Sameer Dharur, Shalini Chaudhuri, Shreya Varshini, Sreehari Sreejith. Keywords: computer vision, unsupervised learning, deep learning, explainable AI. Machine Learning, Fall 2019, Georgia Tech. Used Variational Autoencoders (VAEs) for data augmentation to generate realistic malignant and benign lung X-rays and help train more accurate detection models. Improved mean F1 scores on cancer detection by 4.5 percentage points over baselines. website | code
	User Privacy via Face Detection in a Video Call. Sameer Dharur, Vishal Jain, Rashi Tyagi, Harpal Singh Dhoat. Keywords: computer vision, object detection, semantic segmentation, edge computing. Qualcomm India Maker Challenge, 2018. Built a feature to enhance user privacy in a video call by obscuring the background, through object detection and semantic segmentation on Qualcomm's Snapdragon Neural Processing Engine (SNPE). Was a National Finalist (top 5 among 350 projects) at the Qualcomm India Maker Challenge 2018. summary | code

Teaching

Head Teaching Assistant → Deep Learning, Georgia Tech, Fall 2020.
Teaching Assistant → Deep Learning, Georgia Tech, Spring 2020.
Teaching Assistant → Cryptography, BITS Pilani, Spring 2016.

"Intelligence is the ability to navigate through problem space." ~ Siddhartha Mukherjee