About me

Aspiring data scientist and data analyst with a solid foundation in mathematics and statistics, driven by a passion for uncovering patterns and discovering hidden insights in data. I thrive on solving complex problems and finding creative solutions that make a meaningful impact.
Beyond my analytical pursuits, I’m a trained Indian classical dancer and a music enthusiast currently learning the ukulele. These creative outlets have taught me discipline, precision, and the beauty of patterns—not just in numbers but also in rhythm and movement.
I believe my unique blend of analytical skills and artistic perspective allows me to approach challenges with both logic and creativity, making me adaptable and innovative in the ever-evolving field of data science.

Technical Skills:

Education

University of California, Irvine

Master of Data Science

GPA: 3.95
During my master's, I integrated my background in mathematics and statistics with the principles of computer science. Courses in linear regression, Bayesian statistics, and probability theory deepened my understanding of how statistics applies in real-world scenarios. To strengthen my computer science foundation, I took classes in data structures, algorithms, and database management systems. Additionally, courses such as Big Data Management, Machine Learning, and Probabilistic Models introduced me to new technologies and kept me current with developments in AI.

MIT World Peace University, Pune, India

Master of Science, Statistics

GPA: 3.94
During my undergraduate studies, I found statistics particularly fascinating and wanted to explore it in greater depth. I took courses such as Asymptotic Inference, Clinical Trials, Time Series Analysis, and Hypothesis Testing, which further fueled my interest in the field. I also took an introductory course in Machine Learning, which I found incredibly exciting. This experience sparked my decision to pursue a career in data science, where I could combine my love for mathematics, statistics, and problem-solving to derive meaningful insights from data.

I received the Dr. Vishwanath Karad Merit Scholarship for the academic year 2022-23.

Savitribai Phule Pune University
Fergusson College, Pune, India

Bachelor of Science, Mathematics

GPA: 3.93
I have been passionate about mathematics for as long as I can remember. I’ve always loved playing with numbers and exploring the fundamental principles that explain how things work. This curiosity led me to pursue a bachelor's degree in Mathematics, with minors in Physics and Statistics. During my undergraduate studies, I delved into topics such as Real Analysis, Complex Analysis, Combinatorics, Linear Algebra, and Differential Equations. These subjects not only deepened my understanding of mathematics but also honed my logical reasoning and critical thinking skills.

Experience

Innopiphany

Data Analyst | Sept 2024 - Dec 2024

Innopiphany is a data-driven consulting firm specializing in providing innovative forecasting and strategic solutions for the pharmaceutical and healthcare industries.
My capstone team partnered with Innopiphany to build a forecasting tool that helps early-stage pharmaceutical companies estimate anticipated government spending on their upcoming drugs.

  • Integrated openFDA and financial datasets via API calls using Python (Pandas, NumPy) for efficient data retrieval and processing.
  • Conducted exploratory data analysis (EDA) with Seaborn and Matplotlib to identify key predictors and uncover top drug candidates in the market.
  • Designed a forecasting method leveraging analog drug data, utilizing curve fitting with moving averages to predict government spending on newly launched drugs.
  • Collaborated on the development of a user-friendly, web-based UI using Streamlit for interactive data visualization and informed decision-making.
  • UI Demo
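The analog-based forecasting idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool built for Innopiphany: the spending figures are made up, and the function names `moving_average` and `analog_forecast` are hypothetical. Each analog drug's yearly spending curve is smoothed with a trailing moving average, and the smoothed curves are then averaged year by year to give a forecast for a new drug.

```python
from statistics import mean

def moving_average(values, window=3):
    """Trailing moving average; early points use the values available so far."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        smoothed.append(mean(values[start:i + 1]))
    return smoothed

def analog_forecast(analog_curves, window=3):
    """Average the smoothed spending curves of analog drugs, year by year."""
    smoothed = [moving_average(curve, window) for curve in analog_curves]
    horizon = min(len(curve) for curve in smoothed)
    return [mean(curve[t] for curve in smoothed) for t in range(horizon)]

# Hypothetical spending (in $M) for two analog drugs over four years after launch.
analogs = [
    [10.0, 20.0, 30.0, 40.0],
    [20.0, 40.0, 60.0, 80.0],
]
forecast = analog_forecast(analogs, window=2)
print(forecast)  # [15.0, 22.5, 37.5, 52.5]
```

A larger window gives a smoother but slower-reacting curve, so the window size would need to be chosen against real analog data.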

UCI School of Medicine

Data Analyst

The goal of this project was to develop an automated voice classification system that could classify speech as cleft or non-cleft, reducing the reliance on manual assessments by clinicians, ultimately leading to faster diagnosis and better accessibility to treatment.

  • Implemented an audio processing pipeline in Python using Parselmouth to extract key audio features from speech samples.
  • Developed and trained an LSTM-based neural network (SpeechLSTM) on the extracted features to classify cleft and non-cleft speech.
  • Evaluated the model using accuracy (58.50%) and binary cross-entropy loss (0.6759).
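For readers unfamiliar with the two evaluation metrics, here is a minimal sketch of how they are computed. The labels and probabilities are hypothetical, and the helper names are illustrative; this is not the project's evaluation code. As a reference point, a model that outputs 0.5 for every sample has a binary cross-entropy of ln 2, about 0.693.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

# Hypothetical labels (1 = cleft, 0 = non-cleft) and model probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]

print(accuracy(labels, probs))              # 0.5
print(binary_cross_entropy(labels, probs))
# Reference point: always predicting 0.5 gives a loss of ln(2).
print(binary_cross_entropy(labels, [0.5] * 4))
```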

Karkinos Healthcare Pvt. Ltd.

Data Analyst Intern | Feb 2023 - Jul 2023

Karkinos Healthcare is an oncology-focused health-tech company dedicated to improving cancer care through early detection, advanced diagnostics, and patient-centric solutions.

  • Led data cleaning efforts to identify issues in data collection for timely medical interventions based on cancer risk scores.
  • Created dashboards in Excel (pivot tables) and Power BI to make patient data accessible across multiple states in India.
  • Conducted data sampling for quality assurance in tele-screening, identifying outliers to enhance cancer care standards.
  • Volunteered at medical camps, collecting pre-cancer screening data from over 100 patients in rural communities.

Paathshala Education

Educator and Academic Content Creator | Dec 2020 - Jul 2021

  • Tutored over 20 high school students in mathematics through personalized one-on-one sessions.
  • Designed tailored teaching strategies for each student, focusing on their individual strengths and addressing areas for improvement.
  • Created over 300 simplified tutorial solutions to improve comprehension and boost academic performance for asynchronous learning during lockdown.

Bilvaani School of Dance and Movement based learning

Trainer | Feb 2020 - Feb 2023

  • Trained a batch of 15 students, aged 7 and above, in Kathak for over a year, fostering discipline and creativity through classical dance.
  • Assisted elementary school students in understanding mathematical concepts by incorporating dance, movements, and rhythm, effectively addressing learning loss during the lockdown.
  • Coordinated a stage performance featuring 20+ children, showcasing Kathak recitals and movement-based compositions that creatively illustrated concepts in math, history, geography, and languages.

Projects

Academic Performance Analysis with a Regression Model using Python

This project analyzes student performance data. The goal is to find patterns that help identify students at risk of underperforming, so that targeted interventions and support strategies can be developed to improve their academic achievement.

  • Conducted exploratory data analysis (EDA) using Python libraries such as Seaborn and Matplotlib to visualize data patterns, identify trends, and uncover insights from the dataset.
  • Developed a Multiple Linear Regression model to predict outcomes, carefully selecting relevant features for better accuracy.
  • Incorporated Lasso regularization to address multicollinearity and enhance the robustness of the model.
  • Achieved a high R-squared value of 0.94, demonstrating the model's effectiveness in explaining the variability in the data.
  • Performed thorough diagnostic checks to validate model assumptions, including tests for linearity, homoscedasticity, and normality, and supported findings with A/B testing to ensure statistical validity.
  • GitHub
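To illustrate why Lasso regularization helps when features are strongly correlated, here is a small coordinate-descent sketch. It is an illustration under assumptions, not the project's code (which would more typically rely on a library implementation): the data are hypothetical, already centered, and contain two identical features. The L1 penalty assigns the weight to one feature and sets the other to zero.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the core of the Lasso update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 for centered data (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

# Hypothetical centered data: the two features are identical (perfectly collinear).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
X = [[v, v] for v in x]
y = [2.0 * v for v in x]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
print(weights)  # approximately [1.95, 0.0]
```

The first weight is slightly below the true coefficient of 2 because the penalty also shrinks the coefficients it keeps.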

Sales Performance Analytics Dashboard using Power BI

  • Designed and developed interactive sales performance dashboards using AdventureWorks and Maven Market data, transforming raw data into meaningful insights through ETL processes, data modeling, and DAX calculations. Created calculated columns, measures, and custom aggregations to enable dynamic analysis of key business metrics such as revenue, profits, order volume, and return rates.
  • Built interactive Power BI dashboards with drill-through functionality, slicers, and filters so stakeholders can explore trends intuitively. Optimized data models and refined DAX expressions to improve data retrieval speed and report responsiveness.
  • Through multi-metric analysis, identified key sales trends and opportunities, leading to a 20% improvement in sales forecasting accuracy. Additionally, the dashboards streamlined decision-making, boosting efficiency by 40%, enabling businesses to make more data-driven strategic choices.
  • GitHub

Hierarchical Analysis of Power Consumption Data using R

  • Built a Bayesian hierarchical model to analyze power usage patterns across 3 city zones, identifying 4+ key predictors and uncovering zone-specific consumption trends using R and the Tetouan City dataset.
  • Performed model validation using posterior checks, trace plots, and 95% credible intervals, ensuring convergence and improving parameter estimate reliability for informed decision-making.
  • Conducted EDA on 55,000+ observations, revealing that temperature and humidity explained 75% of variance in energy use across zones, enabling data-driven strategies for energy efficiency.
  • GitHub

Landed Cost Analysis & Tariff Optimization

  • Quantified the total landed cost impact of HTS, Section 301/232, and IEEPA duties across a product portfolio, achieving a clear cost-to-profit breakdown for three internal systems (Aircraft, Zip, Dock).
  • Identified high-leverage cost-saving strategies, including HTS code reclassification and logistics mode optimization, with the goal of reducing overall duty exposure and freight costs.
  • Resolved data inconsistencies in freight and material costs across disparate system data, ensuring parameter reliability for an accurate Total Landed Cost calculation.
  • Developed an Automation Roadmap proposing migration from the manual Excel framework to a scalable BI solution (SQL and Python integration) to enable real-time risk management and compliance monitoring.
  • GitHub

Online Retail Sales Analysis using SQL and Power BI

This project involved cleaning, transforming, and analyzing e-commerce sales data to uncover business insights, using SQL for data querying and Power BI for interactive dashboards. The dataset included over 500,000 transactions across multiple countries.

  • Cleaned and preprocessed ~135,000 rows with missing customer data and removed over 11,000 invalid transactions using PostgreSQL.
  • Built a Power BI dashboard featuring revenue trends, product-level sales insights, repeat customer analysis, and a country-wise revenue map using DAX and visuals like slicers, line charts, and maps.
  • Enabled executive insights by identifying top 10 revenue-generating products and top customer spenders, improving business decision-making potential.
  • GitHub
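The cleaning and ranking steps above can be sketched with a few SQL statements. This is a minimal illustration under assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, the table and column names are hypothetical, and the rules for what counts as an invalid transaction (missing customer, non-positive quantity or price) are my assumptions rather than the exact rules used in the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (invoice TEXT, product TEXT, quantity INTEGER, "
    "unit_price REAL, customer_id TEXT)"
)
rows = [
    ("A1", "Mug", 2, 5.0, "C1"),
    ("A2", "Lamp", 1, 20.0, None),   # missing customer
    ("A3", "Mug", -1, 5.0, "C2"),    # negative quantity, treated as invalid here
    ("A4", "Lamp", 3, 20.0, "C2"),
    ("A5", "Pen", 10, 0.0, "C1"),    # zero price, treated as invalid here
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)

# Keep only rows with a customer and positive quantity and price.
conn.execute(
    "CREATE TABLE sales_clean AS "
    "SELECT * FROM sales "
    "WHERE customer_id IS NOT NULL AND quantity > 0 AND unit_price > 0"
)

# Rank products by revenue on the cleaned table.
top_products = conn.execute(
    "SELECT product, SUM(quantity * unit_price) AS revenue "
    "FROM sales_clean GROUP BY product ORDER BY revenue DESC LIMIT 10"
).fetchall()
print(top_products)  # [('Lamp', 60.0), ('Mug', 10.0)]
```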

Predicting Patient Treatment Continuation using Python

This project aimed to develop and evaluate multiple machine learning models to predict whether a decision regarding a proposed health plan would be upheld or overturned. We explored different models like Random Forest, Decision Tree, Multi-Layer Perceptron, Logistic Regression, and XGBoost.

  • Implemented and evaluated various classification models; XGBoost performed best, with a mean cross-validation score of 64.83% across 5 folds and consistent performance between folds.
  • Performed feature importance analysis and tested XGBoost on various subsets of the dataset.
  • Conducted hyperparameter tuning using GridSearchCV for further optimization.
  • Identified data imbalance as a key challenge impacting model performance and suggested further exploration of data balancing techniques.
  • GitHub
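Because class imbalance affects how a cross-validation score should be read, it can help to compare a model against a majority-class baseline evaluated on the same folds. The sketch below is an illustration under assumptions, not the project's code: the labels are hypothetical, the folds are simple contiguous splits, and the function names are made up for this example.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def majority_baseline_cv(labels, k=5):
    """Mean accuracy across folds of always predicting the training majority class."""
    scores = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        majority = max(set(train), key=train.count)
        hits = sum(labels[i] == majority for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / k

# Hypothetical imbalanced labels: 7 "upheld" (1) for every 3 "overturned" (0).
labels = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1] * 5
print(majority_baseline_cv(labels, k=5))  # about 0.7
```

In this made-up example the baseline already reaches about 70% accuracy without learning anything, which is why metrics such as per-class recall are often reported alongside accuracy on imbalanced data.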

Contact me