Harsh Mankodiya
Hi! I am Harsh Mankodiya, a final-semester graduate student pursuing a Master's in Computer Science at Arizona State University, specializing in Machine Learning. I am passionate about building scalable AI systems that bridge research and real-world applications. My work spans Vision Models, Multimodal Learning, Reinforcement Learning (RL), and Natural Language Processing (NLP), with a strong focus on developing adaptable AI solutions across diverse domains.
I specialize in translating theoretical machine learning concepts into scalable systems, optimizing
model pipelines for performance, and designing frameworks tailored to specific applications.
I received my Bachelor's degree in Computer Science from Nirma University.
I am originally from Gujarat, India. Outside of research, I enjoy taking long walks, playing video games, and socializing.
Seeking full-time Machine Learning Engineer / Data Scientist roles for 2025.
[Email] | [CV] | [Google Scholar] | [LinkedIn] | [GitHub]
Selected Research
Please see my Google Scholar for a full list of work.
Trustworthy Conceptual Explanations for Neural Networks in Robot Decision-Making
Som Sagar*, Aditya Taparia*, Harsh Mankodiya, Pranav Bidare, Yifan Zhou, Ransalu Senanayake
NeurIPS Workshop on Safe & Trustworthy Agents, 2024
[PDF]
We introduce BaTCAV, a Bayesian TCAV framework with uncertainty estimates that enhances the interpretability of robotic actions across both simulation platforms and real-world robotic systems.
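The underlying TCAV machinery is simple to state. Below is a minimal sketch of the plain (non-Bayesian) TCAV score, assuming activations and gradients have already been extracted from some layer of the policy network; this is the generic template from Kim et al. (2018), not the Bayesian variant the paper contributes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def tcav_score(concept_acts, random_acts, grads):
    """Plain TCAV: fit a linear probe separating concept activations
    from random ones, take its normal as the concept activation vector
    (CAV), and report the fraction of inputs whose output gradient has
    positive alignment with it. BaTCAV's Bayesian, uncertainty-aware
    treatment is more involved than this sketch."""
    X = np.vstack([concept_acts, random_acts])
    y = np.r_[np.ones(len(concept_acts)), np.zeros(len(random_acts))]
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    cav = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
    return float(np.mean(grads @ cav > 0))  # TCAV score in [0, 1]
```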
OD-XAI: Explainable AI-Based Semantic Object Detection for Autonomous Vehicles
Harsh Mankodiya, Dhairya Jadav, Rajesh Gupta, Sudeep Tanwar
MDPI Applied Sciences, 2024
[PDF]
We propose an XAI-integrated AV system that improves the explainability of semantic segmentation models, which are typically treated as black boxes that are difficult to analyze and interpret.
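For a sense of the saliency side of this kind of system, here is a minimal Grad-CAM sketch in PyTorch adapted to segmentation logits; the model and target layer are placeholders, not the paper's exact setup:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, layer, image, class_idx):
    """Minimal Grad-CAM: weight a conv layer's activations by the
    spatially pooled gradient of the target class score, then ReLU.
    `model` and `layer` (any conv module inside it) are placeholders;
    `image` is a (1, 3, H, W) tensor."""
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    out = model(image)                      # (1, C, H, W) segmentation logits
    score = out[0, class_idx].sum()         # aggregate the per-pixel class map
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    w = grads["g"].mean(dim=(2, 3), keepdim=True)   # pooled gradients
    cam = F.relu((w * acts["a"]).sum(dim=1))        # (1, h, w) heatmap
    cam = F.interpolate(cam[None], size=image.shape[-2:], mode="bilinear")[0]
    return cam / (cam.max() + 1e-8)                 # normalize to [0, 1]
```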
Instruction Following for LLaMA2-7B using Supervised Finetuning
[Code]
Fine-tuned LLaMA2-7B on the LIMA dataset for efficient instruction following. Trained on structured prompts covering general knowledge, reasoning, and conversational tasks. Compared performance with QLoRA, LoRA, and base LLaMA2 models, demonstrating superior instruction adherence with minimal trainable parameters.
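A minimal sketch of this kind of LoRA supervised fine-tuning run, assuming Hugging Face transformers, peft, and datasets; the model id, the GAIR/lima dataset id and its field layout, the prompt format, and the hyperparameters are illustrative rather than the exact configuration used:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(model_id)
tok.pad_token = tok.eos_token  # LLaMA tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Wrap attention projections with low-rank adapters; everything else stays frozen.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # well under 1% of total weights

ds = load_dataset("GAIR/lima", split="train")

def fmt(ex):
    # LIMA rows are alternating prompt/response turns (field name per the
    # dataset card; verify before use).
    text = (f"### Instruction:\n{ex['conversations'][0]}\n"
            f"### Response:\n{ex['conversations'][1]}")
    return tok(text, truncation=True, max_length=1024)

ds = ds.map(fmt, remove_columns=ds.column_names)

trainer = Trainer(
    model=model, train_dataset=ds,
    args=TrainingArguments("lima-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=3,
                           learning_rate=2e-4, bf16=True, logging_steps=10),
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False))
trainer.train()
```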
Multilingual Sentiment Classification using LLMs
[Code]
Fine-tuned LLaMA2-7B using Quantized Low-Rank Adaptation (QLoRA) for multilingual sentiment analysis across 12 languages, achieving a 30% increase in test AUC and a 20% improvement in accuracy. Trained on datasets such as IndoNLU, GoEmotions, and multilingual Amazon reviews, spanning diverse domains including social media, e-commerce, and movie reviews. Conducted a comparative analysis with GPT-2 and BERT, showing LLaMA2-7B's superior performance with minimal trainable parameters.
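The QLoRA side of such a setup, in outline: the base model is loaded with 4-bit NF4 quantized weights and only the small LoRA adapters (plus the classifier head) are trained in higher precision. A hedged sketch assuming transformers, bitsandbytes, and peft; the model id and label count are placeholders:

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization; adapter math runs in bf16.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_use_double_quant=True,
                         bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-hf", num_labels=3, quantization_config=bnb)
model.config.pad_token_id = model.config.eos_token_id  # no pad token by default
model = prepare_model_for_kbit_training(model)

model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="SEQ_CLS"))
model.print_trainable_parameters()  # adapters + classification head only
```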
Transformer-Powered Image Captioning with Pre-trained DINOv2 Embeddings
[Code]
Developed a caption-generation model that combines CLIP vision-encoder and DINOv2 transformer embeddings with a fine-tuned GPT-2 decoder, trained on roughly 1% of the MS COCO Captions dataset and achieving a BLEU-4 score of 7%. Used the pre-trained GPT-2 tokenizer for caption tokenization, and implemented and evaluated decoding strategies, including greedy decoding and beam search, to optimize caption quality.
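A minimal sketch of the prefix-style decoding loop such a model uses: a vision embedding is projected into GPT-2's embedding space and tokens are generated greedily. The projection layer and dimensions are illustrative, not the project's architecture:

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2").eval()
project = nn.Linear(768, gpt2.config.n_embd)  # vision dim -> GPT-2 dim (illustrative)

@torch.no_grad()
def greedy_caption(image_embedding, max_len=30):
    # Treat the projected vision embedding as a one-token prefix.
    embeds = project(image_embedding).unsqueeze(0).unsqueeze(0)  # (1, 1, d)
    tokens = []
    for _ in range(max_len):
        logits = gpt2(inputs_embeds=embeds).logits[:, -1, :]
        next_id = logits.argmax(dim=-1)            # greedy choice each step
        if next_id.item() == tok.eos_token_id:
            break
        tokens.append(next_id.item())
        # Append the chosen token's embedding and continue decoding.
        next_emb = gpt2.transformer.wte(next_id).unsqueeze(1)
        embeds = torch.cat([embeds, next_emb], dim=1)
    return tok.decode(tokens)

print(greedy_caption(torch.randn(768)))  # dummy embedding, just to run the loop
```

Beam search follows the same structure but keeps the top-k partial sequences per step instead of the single argmax.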
Deep Q-Network From Scratch
[Code]
Built a Deep Q-Network (DQN) from scratch using PyTorch and OpenAI Gym, improving cumulative rewards through experience replay. Implemented epsilon-greedy exploration for efficient learning and trained a CNN-based Q-function for optimal action selection. Logged training performance and visualized reward trends.
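A compact sketch of the same ingredients, assuming Gymnasium and PyTorch, with an MLP Q-function on CartPole for brevity (the project used a CNN) and no target network:

```python
import random
from collections import deque

import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

# Minimal DQN core: replay buffer + epsilon-greedy exploration + TD update.
env = gym.make("CartPole-v1")
q = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
buffer, gamma, eps = deque(maxlen=10_000), 0.99, 1.0

obs, _ = env.reset()
for step in range(20_000):
    # Epsilon-greedy action selection with decaying epsilon.
    if random.random() < eps:
        action = env.action_space.sample()
    else:
        with torch.no_grad():
            action = q(torch.as_tensor(obs)).argmax().item()
    next_obs, reward, term, trunc, _ = env.step(action)
    buffer.append((obs, action, reward, next_obs, float(term)))
    obs = env.reset()[0] if (term or trunc) else next_obs
    eps = max(0.05, eps * 0.9995)

    if len(buffer) >= 256:
        # Sample a minibatch from the replay buffer and take one TD step.
        batch = random.sample(buffer, 64)
        s, a, r, s2, d = (torch.as_tensor(np.array(x), dtype=torch.float32)
                          for x in zip(*batch))
        q_sa = q(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        target = r + gamma * q(s2).max(1).values * (1 - d)
        loss = nn.functional.mse_loss(q_sa, target.detach())
        opt.zero_grad(); loss.backward(); opt.step()
```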
Experience
Machine Learning Intern
Cellino Biotech, Cambridge, MA
May 2024 - August 2024
Developed a proof of concept for a central embedding model that leverages pretrained architectures for downstream patch selection, anomaly detection, and segmentation, achieving an 82% F1-score by fine-tuning DINOv2 with ViT-based heads.
Graduate Researcher
LENS Lab, ASU, Tempe, AZ
August 2023 - May 2024
Integrated Explainable AI with autonomous vehicle agents in simulation environments such as CARLA and Gymnasium. Trained PPO with VAE-based feature extraction using Stable-Baselines3 and employed CLIP models for zero-shot segmentation and concept sampling.
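For reference, plugging a pretrained encoder into Stable-Baselines3 PPO looks roughly like this; the extractor below is a stand-in for a VAE encoder, and the environment id and dimensions are illustrative:

```python
import gymnasium as gym
import torch
import torch.nn as nn
from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class EncoderExtractor(BaseFeaturesExtractor):
    """Stand-in for a pretrained VAE encoder: any module mapping image
    observations to a compact latent plugs into SB3 the same way."""
    def __init__(self, observation_space, features_dim=64):
        super().__init__(observation_space, features_dim)
        c = observation_space.shape[0]  # channel-first after SB3's transpose
        self.cnn = nn.Sequential(
            nn.Conv2d(c, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(), nn.Flatten())
        # Infer the flattened size with a dummy forward pass (SB3's own pattern).
        with torch.no_grad():
            n = self.cnn(torch.as_tensor(observation_space.sample()[None]).float()).shape[1]
        self.linear = nn.Sequential(nn.Linear(n, features_dim), nn.ReLU())

    def forward(self, obs):
        return self.linear(self.cnn(obs))

env = gym.make("CarRacing-v2")  # illustrative pixel-observation env
model = PPO("CnnPolicy", env,
            policy_kwargs=dict(features_extractor_class=EncoderExtractor,
                               features_extractor_kwargs=dict(features_dim=64)),
            verbose=1)
model.learn(total_timesteps=10_000)
```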
Research Intern
Bosch (AIShield), Bengaluru, India
Jan 2023 - May 2024
Formulated a novel knowledge distillation methodology using Grad-CAM for image segmentation models. Leveraged PyTorch Lightning to streamline data processing, model training, evaluation, and inference, with experiment tracking via MLflow. Trained SegNet and U-Net segmentation models on NVIDIA DGX A100 systems, achieving relative IoU scores exceeding 85% across multiple datasets.
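For context, the generic knowledge-distillation template such methods build on is the standard Hinton-style loss below; this is not the Grad-CAM-guided variant developed there, just the baseline it extends:

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard knowledge-distillation loss, applied per pixel for
    segmentation logits of shape (N, C, H, W) with labels (N, H, W)."""
    # Soft-target term: match the teacher's temperature-softened distribution.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Hard-target term: ordinary cross-entropy against ground truth.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```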
Machine Learning Intern
Samyak Infotech, Ahmedabad, India
June 2022 - July 2022
Trained a BERT-based LayoutLM model for business invoice information extraction, achieving an F1-score of 81%. Designed labeling criteria, annotated 200+ invoices, integrated labels into training pipelines, and managed a surrogate SQL database for efficient data retrieval.
Undergraduate Researcher
STLabs, Nirma University, Ahmedabad, India
August 2021 - May 2023
Collaborated with researchers, Ph.D. students, and undergraduates on projects in Computer Vision, Deep Learning, and Explainable AI (XAI).