Sameer Dharur

sameerdharur at gatech dot edu


I am a research engineer in the Siri Perception group at Apple, where I build novel research prototypes at the intersection of computer vision, natural language processing, and speech processing to enable rich user experiences with Siri.

In Spring 2021, I received a Master's degree in Computer Science (specializing in Machine Learning) from Georgia Tech, where I was advised by Dhruv Batra and worked closely with Devi Parikh and Ramprasaath Selvaraju.

I spent the summer of 2020 as a Conversational AI Intern at Salesforce, working on Einstein Reply Recommendations. Prior to commencing my MS degree in Fall 2019, I was a software engineer for 3 years at Qualcomm, most recently in the Machine Learning group working on the Snapdragon Neural Processing Engine.

I earned a Bachelor's degree in Computer Science in 2016 from BITS Pilani, where I was advised by Chittaranjan Hota.

Over the past decade, I have enjoyed working as a professional Quiz Master, conducting quiz competitions at various levels across India. A selection of my content can be viewed here.

E-mail  |  CV  |  Scholar  |  LinkedIn  |  Github  |  Twitter

Research Interests

My research interests lie in building AI agents that can see (computer vision), communicate (natural language processing) and act (robotics) in novel settings in reasonable, logical and interpretable ways. Concretely, my research revolves around:

  • Training models that achieve high-level AI goals such as navigation and question-answering.
  • Interpreting the decision-making processes of models to better understand their flaws.
  • Equipping models with the ability to reason about the world the way humans naturally do.
I have also dabbled in interdisciplinary research, exploring the use of transformer-based natural language processing algorithms to better inform public policy discussions around sustainable transportation.

Selected Publications
Episodic Memory Question Answering
Samyak Datta, Sameer Dharur, Vincent Cartillier, Mukul Khanna, Ruta Desai, Dhruv Batra, Devi Parikh.
Keywords: computer vision, embodied AI, natural language processing, virtual assistants
Computer Vision and Pattern Recognition (CVPR), 2022 (Oral)
paper | website
SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency
Sameer Dharur, Purva Tendulkar, Dhruv Batra, Devi Parikh, Ramprasaath R. Selvaraju.
Keywords: visual question answering, consistency, reasoning, natural language processing
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021
NeurIPS workshop on Interpretable Inductive Biases and Physically Structured Learning, 2020
paper | code | talk
Habitat 2.0: Training Home Assistants to Rearrange their Habitat
Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra.
Keywords: computer vision, embodied AI, explainable AI
Neural Information Processing Systems (NeurIPS), 2021 (Spotlight, top 3% of 9122 submissions)
paper | code | press
Motion Assisted Image Segmentation and Object Detection
Sameer Dharur, Vishal Jain, Rashi Tyagi, Harpal Singh Dhoat.
Keywords: computer vision, segmentation, object detection, edge computing
United States Patent and Trademark Office, 2018
Topic Classification of Electric Vehicle Consumer Experiences with Transformer-Based Deep Learning
Sooji Ha, Daniel J Marchetto, Sameer Dharur, Omar Isaac Asensio.
Keywords: electric vehicles, mobile data, natural language processing, transformer models
Patterns, Cell Press, 2021
paper | code | blog | press
Extracting User Behavior at Electric Vehicle Charging Stations with Transformer Deep Learning Models
Daniel J Marchetto, Sooji Ha, Sameer Dharur, Omar Isaac Asensio.
Keywords: electric vehicles, mobile data, natural language processing, transformer models
3rd International Conference on Advanced Research Methods and Analytics (CARMA), 2020
paper | code | blog
Using Machine Learning Techniques to Aid Environmental Policy Analysis: A Teaching Case Regarding Big Data and Electric Vehicle Charging Infrastructure
Omar Isaac Asensio, Ximin Mi, Sameer Dharur.
Keywords: natural language processing, machine learning, econometrics, electric vehicles
Case Studies in the Environment, University of California Press, 2020
journal | code | blog
Selected Miscellaneous Projects
Visually Interpreting Point Goal Navigation
Keywords: embodied AI, explainable AI, computer vision, deep learning
MS Thesis, Georgia Tech (work in progress)

Conducting gradient-based interpretability experiments on Point Goal Navigation in the Habitat framework to answer the question: "Where does a deep reinforcement learning (RL) model look while navigating a novel environment?" Results from our experiments are coming soon.

Generating hashtag sequences on image-based social media posts
Pradyumna Tambwekar*, Sameer Dharur*.
Keywords: computer vision, natural language processing, deep learning, social media
Deep Learning, Fall 2019, Georgia Tech

Introduced a multimodal vision-and-language task: generating hashtag sequences for social media posts. Scraped a dataset of publicly available Instagram posts and used it to train a CNN + LSTM encoder and an LSTM decoder for hashtag sequence generation. Reported a BLEU score of 0.69 on the validation split.

paper | code
Improving cancer detection in lung X-rays via data augmentation with VAEs
Arvind Akpuram Srinivasan*, Sameer Dharur*, Shalini Chaudhuri*, Shreya Varshini*, Sreehari Sreejith*.
Keywords: computer vision, unsupervised learning, deep learning, explainable AI
Machine Learning, Fall 2019, Georgia Tech

Used Variational Autoencoders (VAEs) for data augmentation, generating realistic malignant and benign lung X-rays to help train more accurate detection models. Improved mean F1 scores on cancer detection by 4.5 percentage points over the baselines.

website | code
User Privacy via Face Detection in a Video Call
Sameer Dharur, Vishal Jain, Rashi Tyagi, Harpal Singh Dhoat.
Keywords: computer vision, object detection, semantic segmentation, edge computing
Qualcomm India Maker Challenge, 2018

Built a feature that enhances user privacy in video calls by obscuring the background, using object detection and semantic segmentation on Qualcomm's Snapdragon Neural Processing Engine (SNPE). The project was a National Finalist (top 5 among 350 projects) at the Qualcomm India Maker Challenge 2018.

summary | code


"Intelligence is the ability to navigate through problem space." ~ Siddhartha Mukherjee