Currently a Machine Learning Engineer at SmartProtect, building scalable AI solutions for public safety, legal automation, and demand forecasting.
Actively seeking full-time opportunities as an AI/ML Engineer, Data Scientist, Data Analyst, or Data Engineer starting June 2025.
Full-Stack AI/ML and Data Science Engineer with a strong track record of building scalable intelligent systems that drive real-world results. I completed my Master's in Computer Science (Data Science Track) at NC State with a 4.0 GPA and am actively seeking full-time roles in Data Science, Machine Learning, AI Engineering, or Analytics. I'm ready to start immediately and aim to secure a position by June 2025.
My experience spans the full ML pipeline, from developing LSTM-based forecasting systems and optimizing ETL workflows using AWS and Snowflake to integrating LLMs for NLP applications and deploying models with FastAPI and SageMaker. I've led impactful projects across public safety, legal automation, and customer intelligence, combining Python, SQL, and cloud technologies to deliver results at scale.
Alongside applied work, I've contributed to peer-reviewed research in areas like graph neural networks, language modeling, and fairness in AI, with multiple publications across conferences and journals.
I'm looking to bring this blend of hands-on engineering and deep analytical thinking to a high-impact team. Let's connect.
Built CNN models for X-ray and diabetes prediction.
Built GPT-powered legal assistants with RAG & vector DBs.
Forecasted 911 call volume and optimized staffing using ML.
Boosted campaign ROI via LSTM & ETL optimization.
Built lane detection and GNN-based driver behavior forecasting.
Built LSTM models for IoT network health prediction at scale.
Data Science Track
AI & Machine Learning
Deep Learning, Neural Networks, NLP
Data Science
Analytics, Mining, Visualization
Systems & Architecture
Databases, Algorithms, Software Engineering
Information Technology
Big Data Analytics
Minor Specialization
Software Engineering
Core Focus
AI & ML
Technical Electives
SmartProtect Public Safety Solutions
May 2024 - Present · Part-time
Wilmington, Delaware, United States · Remote
Forecast Accuracy
+20%
Model Performance
Operational Efficiency
18%
Overtime Reduction
Processing Speed
35%
Faster Retraining
North Carolina State University
Aug 2024 - May 2025 · Part-time
Raleigh, North Carolina, United States
Course Coverage
2
Advanced ML Courses
Teaching Duration
10
Months of Instruction
Jan 2025 - May 2025 · Under Prof. Thomas Price
Aug 2024 - Dec 2024 · Under Prof. Xipeng Shen
Defence Research and Development Organisation (DRDO)
Jan 2023 - Jun 2023 · Research Internship
Bengaluru, Karnataka, India · On-site
Model Performance
3.19
Language Model Perplexity (SOTA)
Processing Efficiency
40%
Reduction in retraining latency
Merkle
May 2022 - Jul 2022 · Internship
Bengaluru, Karnataka, India · Hybrid
Campaign Profitability
+10%
Revenue Optimization
Query Performance
40%
Latency Reduction
Data Scale
16M+
Records Processed
Manipal Institute of Technology
Mar 2021 - Jun 2022 · Part-time
Udupi, Karnataka, India · On-site
Model Accuracy
99.4%
IoT Network Prediction
Automation Impact
60%
Reduction in Manual Tasks
A full-stack AI-powered app that simplifies health insurance documents using LLMs and Retrieval-Augmented Generation (RAG). Users can upload plans, ask natural-language questions, view smart summaries, compare multiple plans side-by-side, and export personalized PDF reports. Built with end-to-end semantic search, summarization, and secure document handling.
A campus-wide parking management system that tracks lot availability, zoning rules, permit assignments, and citations. It allows administrators to efficiently manage parking resources, issue fines, and generate reports to support data-driven decisions for better traffic control and user experience.
An automated cold outreach tool that combines LangChain's ChatGroq + LLaMA3 with ChromaDB to extract job descriptions, match user skills, and generate personalized emails using RAG. Includes an interactive Streamlit UI for seamless job-to-email generation.
A chatbot powered by LLMs (OpenAI GPT / LLaMA) and RAG, designed to retrieve, summarize, and answer complex legal queries from document repositories with high accuracy and fast vector-based search.
Built a full ML pipeline for predicting customer churn using Apache Airflow, AWS (S3, SageMaker, ECR), and Dockerized Flask APIs, enabling scalable deployment and real-time churn inference.
Implemented a hybrid SegNet + LSTM deep learning model to detect lane lines, compute lane curvature, and measure vehicle offset using OpenCV-based image processing.
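As a rough illustration of the curvature and offset step, the sketch below assumes each detected lane line has already been fitted with a second-order polynomial x = a*y^2 + b*y + c. That fitting step, the function names, and the meters-per-pixel scale are assumptions made for this example, not details taken from the project.

```python
def curvature_radius(a: float, b: float, y: float) -> float:
    """Radius of curvature of the curve x = a*y**2 + b*y + c, evaluated at y.

    Uses the standard formula R = (1 + (dx/dy)**2) ** 1.5 / |d2x/dy2|.
    A straight line (a == 0) has infinite radius.
    """
    if a == 0:
        return float("inf")
    return (1 + (2 * a * y + b) ** 2) ** 1.5 / abs(2 * a)


def vehicle_offset(left_x: float, right_x: float,
                   image_width: int, m_per_px: float) -> float:
    """Signed distance (in meters) between the image center and the lane center.

    Assumes the camera is mounted at the vehicle's centerline (hypothetical).
    Positive means the vehicle sits to the right of the lane center.
    """
    lane_center = (left_x + right_x) / 2
    return (image_width / 2 - lane_center) * m_per_px


# Hypothetical values: lane lines at x=300 and x=900 in a 1280 px wide frame.
radius = curvature_radius(a=0.5, b=0.0, y=0.0)
offset = vehicle_offset(300, 900, image_width=1280, m_per_px=0.005)
```

In practice the coefficients would be rescaled from pixels to meters before computing the radius, so that the result is in real-world units.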
Developed a CNN-based classifier to detect pneumonia (COVID-19) from chest X-ray images, achieving 95.28% training accuracy and 89.52% validation accuracy using preprocessed radiology data.
Built a robust hybrid forecasting model combining CNN and BiLSTM architectures to predict daily item-level sales across 10 stores. Leveraged the Kaggle Store-Item Demand Forecasting dataset (2013–2017) and benchmarked against models like XGBoost, ANN, and ARIMA. The hybrid model achieved the lowest MSE, improving forecasting precision and enabling optimized retail inventory decisions.
Implemented CycleGAN to translate images between domains without paired datasets—such as Monet paintings to real photographs and human faces to zombies. Trained on publicly available datasets and deployed the model for real-time translation, showcasing the power of unsupervised generative learning in computer vision tasks.
Used U-Net architecture to perform semantic segmentation on brain MRI images, detecting and outlining tumor regions. The project utilized the LGG MRI Segmentation dataset from Kaggle and focused on pixel-level mask prediction using FLAIR MRI sequences. Achieved high segmentation accuracy and visual interpretability for potential medical diagnostics.
Designed a hybrid recommender system that combines cosine similarity with sentiment analysis to suggest movies tailored to user preferences. Scraped metadata from TMDB and IMDB, offering dynamic updates, cast bios, trailers, and review sentiment. Upgraded from a static recommendation system to an interactive, emotionally aware movie discovery experience.
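The cosine-similarity half of such a recommender can be sketched as follows. This is a minimal example that assumes movie metadata has been reduced to whitespace-separated keyword strings; the toy catalog and function names are hypothetical, and the sentiment-analysis component is not covered here.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title: str, catalog: dict, k: int = 2) -> list:
    """Return the k titles whose metadata is most similar to `title`."""
    query = Counter(catalog[title].lower().split())
    scores = [
        (other, cosine_similarity(query, Counter(text.lower().split())))
        for other, text in catalog.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy catalog: title -> keyword metadata.
catalog = {
    "Movie A": "space adventure robot crew",
    "Movie B": "space adventure alien crew",
    "Movie C": "romantic comedy wedding",
}
top = recommend("Movie A", catalog, k=1)
```

Here "Movie B" shares three of four keywords with "Movie A", so it ranks first with a similarity of 0.75, while "Movie C" shares none and scores 0.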
Novel approach to privacy-preserving language models in healthcare settings using prompt-induced sanitization techniques to reduce PII/PHI leakage while maintaining contextual utility.
Large Language Models (LLMs) have shown remarkable capabilities in privacy-sensitive domains like healthcare and hiring. However, concerns around regurgitating Personally Identifiable Information (PII) and Protected Health Information (PHI) persist. This study evaluates GPT-3.5, GPT-4, and GPT-4 Turbo, examining their propensity to leak sensitive data and the effectiveness of prompt-induced sanitization techniques. Using synthetic datasets reflecting HIPAA and GDPR compliance requirements, the models were benchmarked across privacy leakage, anonymization accuracy, and contextual utility. Our results show that GPT-4 and GPT-4 Turbo significantly reduce sensitive data leakage while preserving output utility, providing actionable insights for deploying LLMs in regulated environments.
Investigation of dynamic representation learning in large language models using graph-based attention mechanisms for improved context understanding.
In large language modeling, incremental learning plays an important role for evolving data such as streaming text. We introduce an incremental learning approach for dynamic contextualized word embeddings in the streaming-data setting. We call the embeddings generated by our model Incremental Dynamic Contextualized Word Embeddings (iDCWE). Our model introduces incremental BERT (iBERT), where BERT stands for Bidirectional Encoder Representations from Transformers, to create a dynamic model that supports incremental training. The model further captures the semantic drift of words using dynamic graphs. Our paper is the first in the line of research on (incremental) dynamic modeling of streaming text, which we also refer to as Neural Dynamic Language Modeling. On benchmark datasets, our model performs on par with, and often outperforms, dynamic contextualized word embeddings, which, to the best of our knowledge, was the first work to combine contextualization with dynamic word embeddings. Moreover, our model is more compute-time efficient than that work.
A Temporal Dynamic Graph Neural Network (TDGNN) framework designed to model real-world dynamic graphs by integrating time-aware message passing, graph topology, and point process theory to enhance prediction in social and interaction networks.
Graph Neural Networks (GNNs) have shown promising results in modeling real-world graph structures, making them a widely recognized method for extracting information from non-Euclidean data such as social networks. Many sophisticated GNN architectures and models exist for static networks; nevertheless, the development of comparable methods for dynamic graphs has been slow. Because many real-world graphs are dynamic, Dynamic Graph Neural Networks have recently attracted interest in various fields, especially social networks. Continuous Time Dynamic Graphs (CTDG) effectively incorporate temporal information, graph structure (topology), and node properties to record the continuous-time progression of dynamic graphs. However, the computational and memory requirements of Dynamic Graph Neural Networks pose considerable difficulties. To address this, we develop a novel deep learning model, TDGNN (Temporal Dynamic Graph Neural Network), that models dynamic graphs while incorporating temporal data, graph structure, and node attributes. The proposed approach performs better than baseline methods when tested on benchmark datasets.
Time series analysis and prediction framework for optimizing network traffic offloading in 5G networks using software-defined networking.
The continuous growth of mobile traffic and limited spectrum resources limit capacity and data rate. Heterogeneous Networks (HetNet), which use the multiple radio interfaces in smartphones, are a solution for meeting such demands. Simultaneous data transfer over Long Term Evolution (LTE) and WiFi has gained attention for data offloading in 5G HetNet. Maintaining the average throughput and minimum delay for LTE users is still a challenge in data offloading owing to mobility and load in the network. This study explores the benefits of Software-Defined Networking (SDN) based multipath data offloading schemes for LTE-WiFi integrated networks, which maintain the user's average throughput based on channel quality classification. We classify future link qualities using deep learning models such as Long Short-Term Memory Networks (LSTM) and Bidirectional Long Short-Term Memory Networks (BLSTM). The received signal strength indicator (RSSI) and packet data rate (PDR) are the parameters used in BLSTM. The prediction results were compared with those of state-of-the-art methods, and we obtained a 2.1% better prediction. The predicted results were used to offload the data using LTE and WiFi. The average throughput of HetNet was compared with that of the state-of-the-art method, and the proposed method showed a 6.29% improvement.
Novel approach to mitigating bias in computer vision models through adversarial debiasing and balanced representation learning.
This paper introduces a feature distillation framework that aims to learn fairer representations without significantly sacrificing task performance. We propose a Maximum Mean Discrepancy (MMD)-based loss to distill information from an unfair teacher network while encouraging feature invariance across protected attributes such as race or gender. Our method demonstrates marked reduction in disparity measures while maintaining competitive accuracy on standard computer vision benchmarks.
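To make the MMD term concrete, the sketch below computes a biased estimate of squared Maximum Mean Discrepancy between two sets of feature vectors using an RBF kernel. The kernel choice, the `gamma` value, and the toy feature sets are assumptions for illustration; the paper's actual loss formulation may differ.

```python
import math


def rbf_kernel(x, y, gamma: float = 1.0) -> float:
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs, y in ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, gamma: float = 1.0) -> float:
    """Biased estimate: E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]."""
    return (mean_kernel(xs, xs, gamma)
            + mean_kernel(ys, ys, gamma)
            - 2 * mean_kernel(xs, ys, gamma))


# Hypothetical features for two protected groups.
group_a = [[0.0, 0.0], [0.1, 0.0]]
group_b = [[3.0, 3.0], [3.1, 3.0]]

same = mmd_squared(group_a, group_a)  # identical sets: no discrepancy
diff = mmd_squared(group_a, group_b)  # well-separated sets: large discrepancy
```

Minimizing a term like this across groups pushes the feature distributions toward each other, which is one way to encourage invariance to a protected attribute.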
Integration of visual and textual cues for early detection of conversation derailment in multimodal AI systems.
This paper presents a hierarchical transformer-based framework that jointly models textual and visual modalities for detecting derailment in multimodal discussions. The proposed system integrates BERT-based text encoding with Faster R-CNN-derived visual features, achieving 71.0% accuracy and 78.3% AUC, outperforming text-only baselines by 6.1%. Our approach demonstrates the importance of multimodal cues in understanding conversation dynamics and early detection of potential derailments.
Development of a lightweight graph contrastive learning framework for efficient recommendation systems.
Graph Neural Networks (GNNs) have emerged as a potent framework for graph-structured recommendation tasks. Incorporating contrastive learning with GNNs has recently demonstrated remarkable efficacy in addressing challenges posed by data sparsity, thanks to innovative data augmentation strategies. However, many existing methods employ stochastic perturbations (e.g., node or edge modifications) or heuristic approaches (e.g., clustering-based augmentations) to generate contrastive views, which may distort semantic integrity or amplify noise. We introduce LightGCL, a novel and streamlined graph contrastive learning model to address these limitations. LightGCL utilizes Singular Value Decomposition (SVD) to achieve robust augmentation, facilitating structural refinement and global collaborative relation modeling without manual augmentation strategies. Extensive experiments on benchmark datasets showcase its substantial performance enhancements over state-of-the-art methods. Further analysis highlights the model's resilience to challenges like data sparsity and bias related to item popularity.
Analysis of ethical considerations and policy implications in autonomous vehicle systems, focusing on Tesla's Autopilot implementation.
This case study delves into the ethical ramifications of incidents involving Tesla's Autopilot, emphasizing Tesla Motors' moral responsibility. Using a seven-step ethical decision-making process, we examine user behavior, system constraints, and regulatory implications. The analysis offers insights into ethical considerations in evolving technological landscapes and proposes a framework for evaluating autonomous system deployments in safety-critical applications.
Time series forecasting model for predicting flight delays using weather data and historical flight performance.
This research investigates flight delay trends by examining factors such as departure time, airline, and weather conditions. We employ time-series models including LSTM, Hybrid LSTM, and Bi-LSTM, comparing them with baseline regression models. Our approach focuses on identifying influential features in delay prediction, potentially informing flight planning strategies. The study demonstrates the effectiveness of deep learning approaches in capturing complex temporal patterns in aviation operations.
Machine learning approach to diabetes prognosis using patient data and clinical markers for early detection.
This study addresses the critical need for early diabetes detection using machine learning algorithms. We compare K-Nearest Neighbor, Random Forest, and Artificial Neural Network approaches, incorporating comprehensive preprocessing and feature engineering strategies. The Random Forest classifier achieved the highest accuracy of 87.89%, significantly outperforming traditional diagnostic methods and demonstrating the potential for ML-assisted medical decision-making.
Deep learning approach to automated bone age assessment using X-ray images for pediatric growth evaluation.
This research evaluates deep learning models for automated bone age assessment, comparing pre-trained architectures including VGG-16, InceptionV3, XceptionNet, and MobileNet. Our study focuses on pediatric X-ray images, developing a system that can accurately determine skeletal maturity without expert intervention. The results demonstrate the potential for AI-assisted growth evaluation in clinical settings, with particular emphasis on reducing assessment time while maintaining accuracy.
Actively seeking full-time opportunities as an AI/ML Engineer, Data Scientist, Data Analyst, or Data Engineer. Whether you're building innovative systems or solving real-world problems with data, I'd love to be a part of it. Let's chat about how I can help your team move faster with intelligent, scalable solutions.