About Me

Who Am I?

Hi, I'm Arnav! I am currently pursuing a master's degree in Data Science at Columbia University.

The curriculum strengthens my foundation in statistics, algorithms, and machine learning while also allowing me to explore areas such as deep learning, data visualization, and financial analysis. What excites me most about Columbia is the opportunity to learn from world-class faculty and collaborate with peers at the cutting edge of research, all while being immersed in New York City's vibrant tech and finance ecosystem. This unique combination of academic excellence and industry exposure equips me to bridge the gap between data-driven innovation and real-world impact.

I graduated from Birla Institute of Technology and Science Pilani in 2023, where I pursued a dual degree in Computer Science and Mathematics. My passion lies at the intersection of technology, mathematics, and business, and I have found Machine Learning to be the perfect fusion of these interests. I am driven by the immense, yet untapped potential of this field, and I continually seek to expand my knowledge, enhance my skills, and delve deeper into this space.

During my undergraduate studies, I gained exposure to various machine learning techniques and developed a solid understanding of the statistics underlying these algorithms. In the past year, my focus has been on exploring reinforcement learning algorithms for Large Language Models. Through research projects and internships, I have applied my creativity and problem-solving skills to real-world challenges in these areas.

From 2023 to 2025, I worked at Barclays as a software data engineer, where I gained exposure to how data is used in the finance industry. I was responsible for deploying and maintaining infrastructure for over 100 quantitative models that provide actionable insights to the bank, and I was involved in building cloud-based ETL pipelines that create datasets for these models.

I am proficient in both classical machine learning and deep learning algorithms, as well as in AWS services, and I have experience deploying ML models to the cloud.

Beyond academia, I maintain an active lifestyle through sports such as tennis and swimming. I have also volunteered at an NGO, where I tutored high school students, further honing my communication and mentoring abilities. I am an avid reader, particularly of novels, and I enjoy exploring the intricate world of movies. I am also fascinated by the stock market and keep myself informed about geopolitical changes worldwide.

I am eagerly seeking an environment that fosters a culture of innovation, collaboration, and hard work, where I can continue to thrive and contribute my skills and passion. By combining my academic achievements, practical experience, and diverse interests, I am confident in my ability to make a meaningful impact in the field of Machine Learning and beyond.

What I Do

Here are some of my areas of expertise:

Problem Solving

Machine Learning

Artificial Intelligence

Database Management

Cloud

Community Service

Skills

Education

Columbia University in the City of New York 2025 - 2026
Master's in Data Science Program

Coursework Includes:

  • Probability and Statistics
  • Algorithms
  • Exploratory Data Analysis
  • Data Visualization
  • Machine Learning
  • Financial Analysis
  • Applied Deep Learning
The rigorous curriculum at Columbia's Data Science program provides me with a strong foundation in both the theoretical and applied aspects of the field. Courses such as Probability and Statistics and Algorithms sharpen my analytical and problem-solving skills, while Exploratory Data Analysis and Data Visualization enhance my ability to derive insights and communicate them effectively. By engaging with Machine Learning and Applied Deep Learning, I gain hands-on experience in building predictive models and deploying advanced AI techniques. Additionally, Financial Analysis equips me with the quantitative and domain-specific knowledge to apply data-driven decision-making in real-world contexts. Together, these courses enable me to approach data science with both technical rigor and practical application, preparing me to tackle complex challenges with creativity and impact.

Birla Institute of Technology & Science, Pilani 2018 - 2023
CGPA: 8.29

B.E. Computer Science & M.Sc. Mathematics Dual Degree Program

Coursework Includes:
  • Computer Science:
    • Data Structures and Algorithms
    • Database Management Systems
    • Microprocessors and Interfacing
    • Object Oriented Programming
    • Logic in Computer Science
    • Digital Design
    • Computer Programming
  • Mathematics:
    • Applied Stochastic Processes
    • Optimization
    • Operations Research
    • Graphs and Networks
    • Discrete Mathematics
    • Linear Algebra and Complex Analysis
    • Integral Calculus
    • Multivariable Calculus, Higher-Order Differential Equations and Their Analysis
    • Mathematical Methods
    • Numerical Analysis
    • Topology
Together, these subjects combine theoretical depth with practical application. The computer science courses ground me in programming, systems, and algorithm design, while the mathematics courses (optimization, stochastic processes, numerical analysis, and linear algebra among them) provide the analytical tools that underpin machine learning. This breadth lets me approach challenges from multiple perspectives, reason carefully about complex problems, and propose creative solutions. By combining these disciplines, I can take a holistic approach to the diverse problems that arise in industry and make a meaningful impact in the field.

Saint Xavier Senior Secondary School, Jaipur, Rajasthan, India 2016 - 2018
Score: 95.6%

Saint Xavier Senior Secondary School, Jaipur, Rajasthan, India 2016
CGPA: 10/10

Publications

Use of spatio-temporal features for earthquake forecasting of imbalanced data.

(IEEE) International Conference on Intelligent Innovations in Engineering and Technology (ICIIET)
This paper addresses the challenge of predicting large earthquakes from imbalanced seismic datasets, where high-magnitude events are rare compared to smaller ones. I propose transforming time-series earthquake catalogs into feature-rich datasets by incorporating temporal and geospatial indicators such as fault density. Three machine learning approaches—weighted Support Vector Machines, distance-weighted K-Nearest Neighbors, and weighted Decision Trees—are evaluated across multiple seismic regions including the Himalayas, Central Java, Sumatra, Sulawesi, and Southeast Asia. Results show that distance-weighted KNN outperforms the others in accuracy, precision, and F1 score, demonstrating its robustness against data imbalance. The study highlights the potential of spatio-temporal feature engineering and algorithm-level adjustments for more reliable earthquake forecasting.
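The abstract credits distance weighting with making KNN robust to class imbalance. The stdlib-only sketch below is a hedged illustration of that idea, not the paper's code: a k-NN vote in which each neighbour counts 1/distance, plus an F1 helper for scoring the rare class. The toy points, the `weighted` flag, and the function names are hypothetical; in the paper the features are the temporal and geospatial indicators described above.

```python
import math
from collections import defaultdict

def knn_predict(train_X, train_y, x, k=3, weighted=True, eps=1e-9):
    """k-NN vote. With weighted=True each neighbour votes with 1/distance,
    so a few very close minority-class points can outvote a larger number
    of farther majority-class points."""
    nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1.0 / (d + eps) if weighted else 1.0
    return max(votes, key=votes.get)

def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive (rare, large-earthquake) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy data: label 1 = rare "large event" class, label 0 = majority class.
X = [(5.0, 5.0), (5.2, 5.1),                            # two minority points
     (6.5, 5.0), (3.6, 5.0), (5.1, 6.6), (5.1, 3.4),    # nearby majority points
     (0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]    # distant majority points
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
query = (5.1, 5.0)
print(knn_predict(X, y, query, k=5, weighted=False))  # -> 0 (three majority neighbours win)
print(knn_predict(X, y, query, k=5, weighted=True))   # -> 1 (the two close minority points win)
```

With a plain majority vote the minority class loses whenever it is outnumbered among the k neighbours; weighting by inverse distance lets proximity, not count, decide.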

Disease Identification in Tomato Leaf using pre-trained ResNet and Deformable Inception

(Springer) 5th International Conference on Computational Intelligence in Data Science
This paper presents a deep learning approach for detecting tomato leaf diseases by combining ResNet-50 with Inception modules and deformable convolutions. Unlike previous models trained mostly on lab-curated datasets, the proposed architecture is evaluated on both controlled (PlantVillage) and real-world farmland (PlantDoc) images, as well as an augmented dataset to improve robustness. The model achieves state-of-the-art accuracy—99.08% on PlantVillage and 66.06% on PlantDoc, significantly outperforming prior methods. By leveraging skip connections, multi-scale filters, and deformable kernels, the approach enhances disease recognition in realistic agricultural settings, offering a promising step toward practical crop disease monitoring systems.
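Deformable convolution is the least familiar ingredient here. A standard kernel samples the input on a fixed grid; a deformable kernel adds an offset to each sampling position and interpolates the input at the resulting fractional location. The hedged sketch below shows that sampling rule in 1-D with pure Python. The offsets are passed in directly rather than predicted by a conv layer, the function names are hypothetical, and this is not the paper's architecture, which uses 2-D deformable convolutions inside a ResNet-50/Inception network (libraries such as torchvision provide `torchvision.ops.DeformConv2d` for that case).

```python
import math

def sample_linear(x, pos):
    """Linearly interpolate the 1-D signal x at a fractional position,
    treating positions outside the signal as zero padding."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        return x[i] if 0 <= i < len(x) else 0.0

    return (1 - frac) * at(lo) + frac * at(lo + 1)

def deformable_conv1d(x, weights, offsets):
    """1-D analogue of a deformable convolution (stride 1, same length).

    weights: kernel of odd length K.
    offsets[p][n]: shift added to kernel tap n when computing output p.
    With all offsets zero this is an ordinary cross-correlation.
    """
    half = len(weights) // 2
    out = []
    for p in range(len(x)):
        acc = 0.0
        for n, w in enumerate(weights):
            base = p + (n - half)                                # regular grid position
            acc += w * sample_linear(x, base + offsets[p][n])    # shifted, interpolated sample
        out.append(acc)
    return out

x = [0.0, 1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0]
zero = [[0.0] * 3 for _ in x]
print(deformable_conv1d(x, w, zero))          # -> [1.0, 3.0, 6.0, 9.0, 7.0]
shifted = [[0.5] * 3 for _ in x]              # every tap nudged half a sample right
print(deformable_conv1d(x, w, shifted)[2])    # -> 7.5 (samples at 1.5, 2.5, 3.5)
```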

Forecasting Earthquakes Using Neural Network Models.

(Springer Nature) Disaster Management in Complex Himalayan Terrains: Natural Hazard Management, Methodologies and Policy Implications
This chapter explores the use of artificial neural networks to forecast earthquakes in the Himalayan region, one of the most seismically active zones in the world. Using seismic data from 1980 to 2020, I extracted eight key seismicity indicators based on the Gutenberg-Richter law and other empirical relations to capture intrinsic earthquake patterns. A neural network architecture optimized with deep learning techniques achieves 90% accuracy and an F1-score of 0.89, demonstrating its effectiveness in modeling the nonlinear and heterogeneous nature of seismic processes. The study underscores the potential of machine learning in advancing earthquake hazard assessment and guiding disaster preparedness in vulnerable Himalayan communities.
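The eight indicators are not listed here. Two quantities that Gutenberg-Richter-based indicator sets commonly include are the b-value and a-value of the relation log10 N(M >= m) = a - b*m; we assume, without confirmation from the text, that they are among the eight. The sketch estimates b with Aki's maximum-likelihood formula, b = log10(e) / (mean(M) - Mc), optionally with Utsu's half-bin correction for rounded magnitudes, and then reads a off the relation at m = Mc. The magnitudes and the completeness magnitude `mc` are made-up values.

```python
import math

def gr_parameters(magnitudes, mc, bin_width=0.0):
    """Estimate Gutenberg-Richter a- and b-values from events with M >= mc.

    b: Aki's maximum-likelihood estimator; bin_width > 0 applies Utsu's
       correction (mc - bin_width / 2) for magnitudes rounded to bins.
    a: from log10 N(M >= mc) = a - b * mc, with N the number of such events.
    """
    m = [x for x in magnitudes if x >= mc]
    if len(m) < 2:
        raise ValueError("need at least two events at or above mc")
    mean_m = sum(m) / len(m)
    b = math.log10(math.e) / (mean_m - (mc - bin_width / 2))
    a = math.log10(len(m)) + b * mc
    return a, b

catalog = [2.7, 3.0, 3.2, 3.5, 4.0, 4.3]   # hypothetical magnitudes in one time window
a, b = gr_parameters(catalog, mc=3.0)
print(round(a, 3), round(b, 3))            # -> 2.87 0.724
```

In an indicator pipeline these values would be recomputed over a sliding window of the most recent events, giving one feature row per window.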
Experience

Professional & Research Experience

AI Agents Speculative Actions July 2025 - Present

Columbia University

This project explores speculative actions as a way to speed up agent workflows by treating every step (tool calls, LLM calls, and even human responses) as an API call whose result can be predicted and acted on in advance. Instead of waiting for each action's result before starting the next, the system speculates on likely outcomes, executes multiple candidate paths in parallel, and rolls back when a speculation turns out to be wrong. By capturing the real dependencies between steps in a DAG, the approach preserves correctness while enabling efficiency gains. Potential applications range from customer service and chatbot interactions (preparing answers before the user replies) to developer workflows (pre-fetching packages, pre-computing tool calls) and simulation platforms (predicting API or human responses). The goal is to demonstrate the viability and effectiveness of speculative execution in interactive systems while balancing cost, accuracy, and speedup.
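The core idea can be sketched for a single dependency edge of the DAG: predict the output of a slow step, start the dependent step on the prediction, and either commit or roll back once the real output arrives. This is a minimal sketch under stated assumptions, not the project's implementation: the order-lookup functions, the predictor, and the sleep-based latencies are hypothetical, only one candidate path is speculated, and rollback simply discards the result because the dependent step has no side effects.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_speculatively(slow_call, predict, next_step, arg):
    """Overlap a slow step with the step that depends on its result.

    next_step is started on predict(arg) while slow_call(arg) is still
    running. On a correct guess the speculative result is committed; on a
    miss it is discarded (a valid rollback only because next_step here has
    no side effects) and next_step is re-run on the real value.
    Returns (result, speculation_hit).
    """
    guess = predict(arg)
    with ThreadPoolExecutor(max_workers=2) as pool:
        actual_future = pool.submit(slow_call, arg)
        speculative_future = pool.submit(next_step, guess)
        actual = actual_future.result()
        if actual == guess:
            return speculative_future.result(), True
    # Miss: leaving the with-block waited for the wasted work; recompute.
    return next_step(actual), False

# Hypothetical customer-service agent: look up an order, then draft a reply.
def lookup_status(order_id):
    time.sleep(0.1)                     # stands in for a slow tool call
    return "shipped" if order_id % 2 == 0 else "pending"

def draft_reply(status):
    time.sleep(0.1)                     # stands in for an LLM call
    return f"Your order is {status}."

def predict_status(order_id):
    return "shipped"                    # naive predictor: assume the common case

start = time.perf_counter()
print(run_speculatively(lookup_status, predict_status, draft_reply, 42))
print(f"hit took {time.perf_counter() - start:.2f}s")   # roughly one call's latency, not two
print(run_speculatively(lookup_status, predict_status, draft_reply, 7))
```

A hit costs about one call's latency instead of two; a miss costs more than sequential execution, which is the cost/accuracy/speedup trade-off the project description mentions.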

AI Agent Safety June 2025 - Present

Columbia University

This project focuses on training AI agents to complete tasks safely while minimizing potential harm to users. Traditional task evaluation measures only whether an agent successfully completes a task, but in real-world scenarios agents can inadvertently cause side effects such as data loss, privacy breaches, operational disruption, financial harm, or security violations. To address this, the project develops a structured side-effect evaluation framework that systematically monitors for these risks across different applications, including file systems, web browsers, document editing, and databases. By combining task success metrics with side-effect assessments, the framework aims to ensure that agents not only achieve the desired outcomes but also operate robustly, responsibly, and in line with user safety expectations. The project also explores methods to capture initial system states, predict potentially harmful actions, and design mitigation strategies, so that agents can act both effectively and safely in complex interactive environments.
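Capturing the initial state and combining task success with a side-effect check can be sketched for the simplest case, a file system modelled as a dict of path to contents. This is a hedged illustration only: the state model, the `allowed_paths` scope, the "safe success" rule, and all names are hypothetical, and the framework described above also covers browsers, documents, and databases.

```python
def snapshot(fs):
    """Capture the initial system state (here a dict of path -> contents)."""
    return dict(fs)

def side_effects(before, after, allowed_paths):
    """List changes to paths outside the ones the task is allowed to touch."""
    effects = []
    for path in before.keys() | after.keys():
        if path in allowed_paths:
            continue
        if path not in after:
            effects.append(("deleted", path))
        elif path not in before:
            effects.append(("created", path))
        elif before[path] != after[path]:
            effects.append(("modified", path))
    return sorted(effects)

def evaluate(task_succeeded, effects):
    """Count a run as a safe success only if the task succeeded AND no
    out-of-scope side effects were observed."""
    return {"success": task_succeeded,
            "side_effects": effects,
            "safe_success": task_succeeded and not effects}

# Hypothetical run: the task is to finalise report.txt, but the agent also
# deletes notes.txt while "cleaning up".
fs = {"report.txt": "draft", "notes.txt": "keep me", "config.ini": "x=1"}
before = snapshot(fs)
fs["report.txt"] = "final"
del fs["notes.txt"]
result = evaluate(fs["report.txt"] == "final",
                  side_effects(before, fs, allowed_paths={"report.txt"}))
print(result)
```

A success-only metric would score this run as a pass; the combined metric flags the deleted file.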

Model Integration and Deployment August 2023 - April 2025

Barclays (Team - Model Implementation Team)

  • Contributed to the curation of model-ready datasets for multiple teams within Barclays.
  • Spearheaded the design and implementation of a unified messaging service integrating diverse team services.
  • Built and managed infrastructure to integrate and run over 100 quantitative models for the bank.
My time at Barclays introduced me to the finance industry and to how data science is used in banking. There I learned how data is provisioned to create model-ready datasets, specifically in a financial context. Working with ETL processes made me consider the possibility of automating data transformation, which would significantly speed up the process and reduce manual effort. I developed a model integration tool that automated the integration and deployment of models on the cloud, so that my team's microservices-based application could access and run them. I also had the privilege of participating in a global Generative AI hackathon that brought together 2,000 participants from across the organization. In this innovative and competitive environment, my team and I developed a solution using a decoder-based Large Language Model to address a pressing business challenge in Anti-Money Laundering. Judged by senior stakeholders, our solution was well received during the presentation phase, and we secured first place in the hackathon.

Western Australia Transforming Community Health Jan 2022 - June 2022

Western Australia Department of Health

  • Analyzed ~19,000 attributes for 373 suburbs in Australia to inform community health improvements.
  • Implemented hierarchical clustering and PCA-based clustering to study correlations among attributes related to the social determinants of health.
  • Selected a specific suburb from the data for in-depth analysis and evaluation of policy effectiveness.
Fresh off my industry experience at Amazon, I worked on my final-semester thesis with the Western Australia Department of Health, a government body that works to improve and protect the health of communities in the Western Australian suburbs near Perth. This experience honed my ability to tackle complex data challenges and to develop actionable insights that have a positive impact on communities. It also made me more attuned to health-related problems affecting the world.
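Clustering attributes by how strongly they correlate can be sketched with average-linkage agglomerative clustering on a correlation distance. This pure-Python sketch is an illustration under assumptions, not the thesis code: the distance 1 - |Pearson r|, the average-linkage rule, and the four made-up attributes over six hypothetical suburbs are choices made for the example, and the PCA-based variant is not shown.

```python
import math
from itertools import combinations

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def cluster_attributes(columns, n_clusters):
    """Average-linkage agglomerative clustering of attributes (not suburbs),
    using 1 - |Pearson r| as the distance so that strongly correlated
    attributes, positively or negatively, are merged first."""
    names = list(columns)
    dist = {frozenset((p, q)): 1 - abs(pearson(columns[p], columns[q]))
            for p, q in combinations(names, 2)}

    def linkage(c1, c2):
        # average distance over all cross-cluster attribute pairs
        return sum(dist[frozenset((p, q))] for p in c1 for q in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]   # i < j, so deleting j is safe
        del clusters[j]
    return clusters

columns = {   # made-up values for six hypothetical suburbs
    "income":         [1, 2, 3, 4, 5, 6],
    "education":      [2, 4, 5, 8, 10, 12],
    "smoking_rate":   [6, 5, 4, 3, 2, 1],
    "distance_to_gp": [3, 1, 4, 1, 5, 9],
}
result = cluster_attributes(columns, n_clusters=2)
print(result)   # -> [['income', 'smoking_rate', 'education'], ['distance_to_gp']]
```

Using the absolute correlation groups `smoking_rate` with `income` even though the relationship is negative, which is usually what "correlated attributes" means in this kind of analysis.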

User Action Automation June 2022 - December 2022

Amazon (Team - Selection Monitoring)

  • Analyzed web domain data for competitor e-commerce websites.
  • Utilized AWS resources such as SageMaker, S3, and Step Functions to implement baseline models for web domain data.
  • Constructed a Reinforcement Learning and Webpage Segmentation based approach for user action automation in the web crawler.
My internship at Amazon provided invaluable exposure to the industry, allowing me to gain firsthand insights into the deployment of large-scale models. Moreover, it presented me with a unique opportunity to delve deeply into the realm of reinforcement learning, particularly in the context of automating web tasks. This experience enabled me to explore the intricacies of applying advanced techniques in real-world scenarios and solidified my understanding of the practical implementation of reinforcement learning algorithms.
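To make the reinforcement learning side concrete, here is a deliberately tiny tabular Q-learning sketch in which an agent learns which action to take on each page of a web flow. It is a hedged toy, not the approach used at Amazon, which also involved webpage segmentation and is not described in detail here: the four-page flow, the two actions, the rewards, and all hyperparameters are hypothetical.

```python
import random

N_PAGES = 4                       # hypothetical flow: home -> search -> product -> checkout
ACTIONS = ("back", "next")
GOAL = N_PAGES - 1

def step(state, action):
    """Toy deterministic environment: 'next' moves one page forward,
    'back' one page back; reaching checkout ends the episode with reward 1."""
    nxt = min(state + 1, GOAL) if action == "next" else max(state - 1, 0)
    reward = 1.0 if nxt == GOAL else -0.01   # small cost per wasted step
    return nxt, reward, nxt == GOAL

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_PAGES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                         # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])   # exploit
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)   # -> {0: 'next', 1: 'next', 2: 'next'}
```

Real web automation replaces the page index with a learned state representation and the two actions with clickable elements, which is where approaches such as page segmentation come in.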

Identifying Disease Using Machine Learning Jan 2022 - May 2022

BITS Pilani

  • Analyzed single nucleotide polymorphism (SNP) data to identify susceptibility to diabetic retinopathy.
  • Implemented Lasso Regression and Random Forest algorithms for feature selection on SNPs.
  • Used machine learning algorithms such as kNN, SVM, and Gradient-Boosted Decision Trees to predict susceptibility.
This project introduced me to machine learning in biology and to a variety of new coding practices in Python.
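Lasso works as a feature selector because its L1 penalty drives the coefficients of weak features exactly to zero. A minimal pure-Python sketch with cyclic coordinate descent and soft-thresholding follows. It assumes centred features with no intercept, and the three ±1-coded columns are hypothetical stand-ins: real SNP genotypes are often coded 0/1/2 by minor-allele count and would typically be fitted with a library such as scikit-learn rather than this loop.

```python
def soft_threshold(rho, alpha):
    """Shrink rho towards zero by alpha; values inside [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_cd(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.
    Assumes the columns of X and y are already centred (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of feature j with the residual that excludes feature j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

# Rows = individuals, columns = three hypothetical centred features.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [3.05, -2.95, 2.95, -3.05]     # driven mostly by feature 0, weakly by feature 1
w = lasso_cd(X, y, alpha=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(selected)                    # -> [0]
```

The weak effect of feature 1 falls inside the penalty band and is zeroed out, so only feature 0 survives selection; the surviving features would then feed the downstream classifiers.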

Recognition of Devanagari Script using Virtual Pen Hover Aug 2021 - Dec 2021

BITS Pilani

This research project aimed to create a virtual hover-pen system with recognition support for Devanagari script. The initial hover pen was built using OpenCV contours, with an Encoder-Decoder model used for recognizing Devanagari script.
Technologies worked with:
  • OpenCV
  • Google Colab
  • Jupyter Notebooks
  • Python and other mainstream statistical tools
This project introduced me to the area of Human-Computer Interaction (HCI) and the developments taking place in it. I integrated support for recognizing Hindi text written with the hover pen.

Crop Disease Identification Aug 2020 - May 2021

BITS Pilani

This research project led to the development of a new architecture inspired by InceptionNet and ResNet, which achieved higher accuracy than traditional models. I also created a new dataset by merging images, which significantly improved accuracy on the real-conditions dataset.
I achieved an accuracy of 98.16% on the ideal dataset, higher than the traditional ResNet model (97.5%), and an improvement of more than 30% in accuracy on the real-world dataset of images taken in less-than-ideal conditions.
This project taught me about the applications of machine learning in the agriculture sector. This work has been published in the Springer book series Advances in Information and Communication Technology.

Earthquake Forecasting Aug 2020 - December 2020

BITS Pilani

  • Implemented a time series forecasting model that forecasts earthquakes using seismicity information across five regions, including the Indian Himalayan Region.
  • Achieved an accuracy of 90.4% in predicting the probability of an earthquake above a threshold magnitude within the next 30 days.
This project introduced me to geophysics and the important work that has been done in forecasting natural disasters. This work has been published in (Springer Nature) Disaster Management in Complex Himalayan Terrains - Natural Hazard Management, Methodologies and Policy Implications.
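The forecasting target (whether an earthquake above a threshold magnitude occurs within the next 30 days) can be turned into binary training labels as follows. This is a hedged sketch of one way to frame it, not the project's code; the catalogue entries, forecast dates, and the threshold of 5.0 are hypothetical, since the text does not state the threshold used.

```python
from datetime import date, timedelta

def make_labels(events, forecast_days, threshold, horizon_days=30):
    """Label forecast day t as 1 if any event with magnitude >= threshold
    falls in the window (t, t + horizon_days], else 0.

    events: list of (date, magnitude) tuples from a catalogue.
    """
    large = [d for d, m in events if m >= threshold]
    labels = []
    for t in forecast_days:
        end = t + timedelta(days=horizon_days)
        labels.append(int(any(t < d <= end for d in large)))
    return labels

catalogue = [(date(2020, 1, 10), 4.2), (date(2020, 3, 1), 5.6), (date(2020, 3, 20), 3.1)]
days = [date(2020, 1, 1), date(2020, 2, 1), date(2020, 2, 15), date(2020, 3, 2)]
print(make_labels(catalogue, days, threshold=5.0))   # -> [0, 1, 1, 0]
```

Using a strictly future window (t, t + 30 days] keeps the label from leaking information that the features computed up to day t could already see.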

Facial Recognition Based Attendance System May 2020 - July 2020

Tamil Nadu Health Systems Project

As part of this project, I worked in a government organization under the Tamil Nadu government and:

  • Developed a facial-recognition based attendance system using Computer Vision and facial recognition libraries to help curb the spread of COVID-19 by avoiding contact with infected surfaces
  • Also reduced queue sizes in large hospitals
Through this project I learned several technical skills, including the functionality of the OpenCV library and multiple methods of solving the facial recognition challenge.
I also developed soft skills such as teamwork, communicating my ideas effectively, and presenting.
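Once a face-recognition model has turned each face into a numeric embedding, marking attendance reduces to matching a new embedding against enrolled staff. The sketch below shows only that matching step, under assumptions: cosine similarity, the 0.8 threshold, and the tiny 3-D vectors are illustrative choices (real embeddings from face-recognition libraries are much longer, and some libraries compare them by Euclidean distance instead), and the staff IDs are hypothetical. It is not the project's implementation.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def identify(embedding, enrolled, threshold=0.8):
    """Return the enrolled person most similar to the query embedding,
    or None if no one reaches the similarity threshold."""
    best_name, best_sim = None, threshold
    for name, ref in enrolled.items():
        sim = cosine(embedding, ref)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name

enrolled = {"staff_001": [1.0, 0.0, 0.0], "staff_002": [0.0, 1.0, 0.0]}
attendance = set()
for face in ([0.9, 0.1, 0.0], [0.5, 0.5, 0.7]):   # two faces seen at the entrance
    person = identify(face, enrolled)
    if person is not None:
        attendance.add(person)
print(attendance)   # -> {'staff_001'}
```

The threshold is what keeps unknown visitors (the second face) from being marked present as whoever happens to be closest.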

Epidemiological Analysis of COVID-19 March 2020 - June 2020

BITS Pilani

  • Analyzed COVID-19 data using the SIR epidemic model of disease spread, predicting the number of individuals who are susceptible to infection, actively infected, or recovered at any given time.
  • Estimated the model parameters, which define the characteristics of the epidemic, by minimizing squared error loss.
  • Calculated the reproduction number to be close to 1.2.
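The fitting procedure can be sketched in a few lines. This is a hedged, pure-Python illustration rather than the project's code: it integrates the SIR equations with a one-day forward-Euler step, fits only the infected curve, and swaps in a simple grid search for whatever optimizer was actually used. The data is synthetic, generated with beta = 0.3 and gamma = 0.25 so that the recovered R0 = beta/gamma lands at 1.2 purely to mirror the value reported above.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days):
    """Forward-Euler SIR model with a one-day step; returns I(t) for t = 0..days.
    dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I."""
    n = s0 + i0 + r0
    s, i, r = s0, i0, r0
    infected = [i]
    for _ in range(days):
        new_inf = beta * s * i / n
        new_rec = gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        infected.append(i)
    return infected

def fit_sir(observed, s0, i0, r0, betas, gammas):
    """Grid search for the (beta, gamma) pair minimising squared error on I(t)."""
    days = len(observed) - 1
    best = None
    for beta in betas:
        for gamma in gammas:
            sim = simulate_sir(beta, gamma, s0, i0, r0, days)
            loss = sum((a - b) ** 2 for a, b in zip(sim, observed))
            if best is None or loss < best[0]:
                best = (loss, beta, gamma)
    return best[1], best[2]

# Synthetic "observations" generated with beta = 0.3, gamma = 0.25.
observed = simulate_sir(0.3, 0.25, s0=9990, i0=10, r0=0, days=60)
grid_b = [round(0.20 + 0.01 * k, 2) for k in range(21)]   # 0.20 .. 0.40
grid_g = [round(0.15 + 0.01 * k, 2) for k in range(21)]   # 0.15 .. 0.35
beta, gamma = fit_sir(observed, 9990, 10, 0, grid_b, grid_g)
print(beta, gamma, round(beta / gamma, 2))                # -> 0.3 0.25 1.2
```

On real case counts the loss never reaches zero, and a continuous optimizer over (beta, gamma) is the usual replacement for the grid.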

English to Hindi Language Transliteration June 2020 - July 2020

BITS Pilani

  • Trained an Encoder-Decoder model that transliterates English (Latin) letters into Hindi (Devanagari) script.
  • Used Gated Recurrent Units (GRUs) with an attention mechanism to improve the model's performance.
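The attention step can be sketched in isolation. The text says GRUs with attention were used but not which scoring function, so this pure-Python sketch assumes dot-product (Luong-style) scoring; additive (Bahdanau) attention is an equally common choice. The vectors are made-up stand-ins for GRU hidden states, one encoder state per input letter.

```python
import math

def softmax(xs):
    m = max(xs)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the current
    decoder state, normalise with softmax, and return the context vector
    (weighted sum of encoder states) together with the attention weights."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(wt * h[k] for wt, h in zip(weights, encoder_states)) for k in range(dim)]
    return context, weights

enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # stand-ins for encoder states, one per input letter
dec = [2.0, 0.0]                             # stand-in for the current decoder state
context, weights = attend(dec, enc)
print([round(x, 3) for x in weights])        # -> [0.468, 0.063, 0.468]
```

At each decoding step the context vector is fed to the decoder alongside its own state, letting it focus on the input letters most relevant to the Devanagari character it is about to emit.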

Occlusion Analysis & Filter Visualization March 2020 - April 2020

BITS Pilani

  • Analyzed the filters in a CNN to detect the important parts of an image.
  • Performed occlusion sensitivity analysis on various images.
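Occlusion sensitivity analysis can be sketched independently of any particular CNN: hide one patch at a time, re-score the image, and map the drop in score. In this hedged pure-Python sketch the CNN's class score is replaced by a hypothetical toy scoring function that only looks at the top-left corner, so the expected heat map is easy to check; the patch size, stride, and fill value are arbitrary choices, and filter visualization is not shown.

```python
def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Occlusion sensitivity: slide a patch x patch occluder over the image
    (stride = patch), re-score the occluded image, and record the score drop
    for every pixel under the occluder. Large drops mark regions the model
    relies on."""
    h, w = len(image), len(image[0])
    base = score_fn(image)
    heat = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            rows = range(top, min(top + patch, h))
            cols = range(left, min(left + patch, w))
            occluded = [row[:] for row in image]
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base - score_fn(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat

image = [[1.0] * 4 for _ in range(4)]

def toy_score(img):
    # stand-in "model": only looks at the top-left 2x2 region
    return sum(img[r][c] for r in range(2) for c in range(2))

heat = occlusion_map(image, toy_score, patch=2)
for row in heat:
    print(row)
```

Only the top-left block shows a drop (4.0), matching the region the toy scorer depends on; with a real CNN the same loop highlights the image regions that drive its prediction.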