Sameer Dharur

sameerdharur at gatech dot edu

I am an incoming research engineer at Apple in the Siri group.

In Spring 2021, I received a Master's degree in Computer Science (specializing in Machine Learning) from Georgia Tech, where I was advised by Dhruv Batra and worked closely with Devi Parikh and Ramprasaath Selvaraju.

I spent the summer of 2020 as a Conversational AI Intern at Salesforce, working on Einstein Reply Recommendations. Prior to commencing my MS degree in Fall 2019, I was a software engineer for 3 years at Qualcomm, most recently in the Machine Learning group working on the Snapdragon Neural Processing Engine.

I earned a Bachelor's degree in Computer Science in 2016 from BITS Pilani, where I was advised by Chittaranjan Hota.

Over the past decade, I have enjoyed being a professional Quiz Master, conducting quizzing competitions at different levels across India. A selection of my content can be viewed here.

E-mail  |  CV  |  Scholar  |  LinkedIn  |  Github  |  Twitter

Research Interests

My research interests lie in building AI agents that can see (computer vision), communicate (natural language processing) and act (robotics) in novel settings in reasonable, logical and interpretable ways. Concretely, my research revolves around:

  • Training models that achieve high-level AI goals such as navigation and question-answering.
  • Interpreting the decision-making processes of models to better understand their flaws.
  • Equipping models with the ability to reason about the world the way humans naturally do.
I have also dabbled in interdisciplinary research, exploring the use of transformer-based natural language processing algorithms to better inform public policy discussions around sustainable transportation.

Selected Publications
SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency
Sameer Dharur, Purva Tendulkar, Dhruv Batra, Devi Parikh, Ramprasaath R. Selvaraju.
Keywords: visual question answering, consistency, reasoning, natural language processing
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021
NeurIPS workshop on Interpretable Inductive Biases and Physically Structured Learning, 2020
paper | code | talk
Motion Assisted Image Segmentation and Object Detection
Sameer Dharur, Vishal Jain, Rashi Tyagi, Harpal Singh Dhoat.
Keywords: computer vision, segmentation, object detection, edge computing
United States Patent and Trademark Office, 2018
Topic Classification of Electric Vehicle Consumer Experiences with Transformer-Based Deep Learning
Sooji Ha, Daniel J Marchetto, Sameer Dharur, Omar Isaac Asensio.
Keywords: electric vehicles, mobile data, natural language processing, transformer models
Patterns, Cell Press, 2021
paper | code | blog | press
Extracting User Behavior at Electric Vehicle Charging Stations with Transformer Deep Learning Models
Daniel J Marchetto, Sooji Ha, Sameer Dharur, Omar Isaac Asensio.
Keywords: electric vehicles, mobile data, natural language processing, transformer models
3rd International Conference on Advanced Research Methods and Analytics (CARMA), 2020
paper | code | blog
Using Machine Learning Techniques to Aid Environmental Policy Analysis: A Teaching Case Regarding Big Data and Electric Vehicle Charging Infrastructure
Omar Isaac Asensio, Ximin Mi, Sameer Dharur.
Keywords: natural language processing, machine learning, econometrics, electric vehicles
Case Studies In The Environment, University of California Press, 2020
journal | code | blog
Selected Projects
Visually Interpreting Point Goal Navigation
Keywords: embodied AI, explainable AI, computer vision, deep learning.
MS Thesis, Georgia Tech (work in progress)

Conducting gradient-based interpretability experiments in the Habitat framework on Point Goal Navigation to answer the question: 'Where does a deep reinforcement learning (RL) model look while navigating a novel environment?' Results from our experiments are coming soon.

Generating hashtag sequences on image-based social media posts
Pradyumna Tambwekar*, Sameer Dharur*.
Keywords: computer vision, natural language processing, deep learning, social media.
Deep Learning, Fall 2019, Georgia Tech

Introduced a multi-modal vision-and-language application of generating hashtag sequences on social media posts. Scraped a dataset from publicly available Instagram posts to train a CNN + LSTM encoder and an LSTM decoder for the task of hashtag sequence generation. Reported a BLEU score of 0.69 on the validation split.

paper | code
Improving cancer detection in lung X-rays via data augmentation by VAEs
Arvind Akpuram Srinivasan*, Sameer Dharur*, Shalini Chaudhuri*, Shreya Varshini*, Sreehari Sreejith*.
Keywords: computer vision, unsupervised learning, deep learning, explainable AI.
Machine Learning, Fall 2019, Georgia Tech

Used Variational Autoencoders (VAEs) for data augmentation to generate realistic malignant and benign lung X-rays and help train more accurate detection models. Improved mean F1 scores on cancer detection by 4.5 percentage points over baselines.

website | code
User Privacy via Face Detection in a Video Call
Sameer Dharur, Vishal Jain, Rashi Tyagi, Harpal Singh Dhoat.
Keywords: computer vision, object detection, semantic segmentation, edge computing.
Qualcomm India Maker Challenge, 2018.

Built a feature to enhance user privacy in a video call by obscuring the background, through object detection and semantic segmentation on Qualcomm's Snapdragon Neural Processing Engine (SNPE). Was a National Finalist (top 5 among 350 projects) at the Qualcomm India Maker Challenge 2018.

summary | code

"Intelligence is the ability to navigate through problem space." ~ Siddhartha Mukherjee
Design inspired from here