Yash Gupta
Data Engineer · Bangalore, India

I build data platforms that are fast, reliable & cheap to run.

I independently own and scale data platforms in fast-paced, lean environments — leading critical migrations and architectural shifts that cut query latency by 90%+ and infrastructure cost by 35–40%. I specialize in cost-efficient, reliable pipelines and real-time systems.

Working across ClickHouse · Spark · Airflow · AWS · dbt
Data Platform Orchestrator (dashboard graphic) · All systems healthy
Pipeline: S3 (ingest) → Spark / EMR → ClickHouse · Query SLA 60s → 5s (12× faster) · Aurora spend down 40%
Status feed: medallion bronze → silver → gold synced · 300GB DocumentDB → Aurora, zero downtime · dbt run, 124 models building
90%+ · Query latency reduced
40% · AWS Aurora cost cut
300GB · Migrated, zero downtime
35% · Lakehouse cost reduction

01 — About

Strong ownership in lean engineering environments.

I'm a Data Engineer who likes the hard, unglamorous problems: migrations that can't drop a row, warehouses that need to answer in seconds instead of minutes, and cloud bills that need to come down without anyone noticing a regression.
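A migration that "can't drop a row" needs a cheap way to prove that nothing was dropped. One common reconciliation technique is to compare row counts plus an order-independent checksum on both sides. The sketch below assumes rows from each database can be streamed as tuples of Python values; `table_fingerprint` and `reconcile` are hypothetical helpers, not necessarily the method used in these migrations.

```python
import hashlib
from typing import Iterable, Sequence

MOD = 2 ** 128

def table_fingerprint(rows: Iterable[Sequence[object]]) -> tuple[int, int]:
    """Return (row_count, checksum), where the checksum does not depend on row order."""
    count, checksum = 0, 0
    for row in rows:
        # Canonical text form of the row: repr() of each field, joined by a unit separator.
        canonical = "\x1f".join(repr(value) for value in row).encode("utf-8")
        digest = hashlib.sha256(canonical).digest()
        # Adding per-row hashes (mod 2**128) is order-independent, and unlike XOR it
        # does not let two identical rows cancel each other out.
        checksum = (checksum + int.from_bytes(digest[:16], "big")) % MOD
        count += 1
    return count, checksum

def reconcile(source_rows: Iterable[Sequence[object]],
              target_rows: Iterable[Sequence[object]]) -> bool:
    """True if source and target hold the same multiset of rows (up to hash collisions)."""
    return table_fingerprint(source_rows) == table_fingerprint(target_rows)

print(reconcile([(1, "a"), (2, "b")], [(2, "b"), (1, "a")]))  # True
print(reconcile([(1, "a"), (2, "b")], [(1, "a")]))            # False
```

In practice, types have to be normalized before hashing (for example, a `Decimal` read back from Postgres versus a float in the source), otherwise identical values produce different checksums.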

⚙️

Platform migrations

Moved a cloud-native lakehouse to ClickHouse and migrated 300GB from DocumentDB to Aurora Postgres, converting unstructured documents into a structured schema with zero downtime and no cost increase.
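DocumentDB stores nested, JSON-style documents, so moving to Aurora Postgres means mapping fields onto columns. A minimal sketch of one common first step, flattening embedded sub-documents into column names; the `flatten_document` helper and the sample field names are hypothetical, and the actual schema mapping used in this migration isn't described here.

```python
from typing import Any

def flatten_document(doc: dict[str, Any], parent: str = "", sep: str = "_") -> dict[str, Any]:
    """Flatten nested sub-documents into column-style keys, e.g. {"a": {"b": 1}} -> {"a_b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        column = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into embedded documents so every leaf becomes its own column.
            flat.update(flatten_document(value, column, sep))
        else:
            # Scalars map to a single column; arrays are passed through unchanged.
            flat[column] = value
    return flat

doc = {"_id": 1, "patient": {"name": "A", "address": {"city": "BLR"}}, "tags": ["x"]}
print(flatten_document(doc))
# {'_id': 1, 'patient_name': 'A', 'patient_address_city': 'BLR', 'tags': ['x']}
```

Arrays are left untouched in this sketch; in a relational target they usually become a child table or a `JSONB` column.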

📉

Cost engineering

Consolidated databases to cut AWS Aurora Postgres costs by 40% across production and lower environments, and re-architected the medallion lakehouse for a 35% cost reduction.

📡

Real-time & reliability

Built real-time monitoring dashboards with automated alerting for proactive issue detection and higher data reliability.
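One simple building block for proactive alerting is a freshness check that flags any table whose last successful load is older than an agreed lag. A minimal sketch, assuming load timestamps are already collected somewhere (for example, from pipeline run metadata); the table names and thresholds are hypothetical, and alert delivery (Slack, email, pager) is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass(frozen=True)
class FreshnessCheck:
    table: str          # hypothetical table name
    max_lag: timedelta  # how stale the table may get before alerting

def stale_tables(checks: list[FreshnessCheck],
                 last_loaded: dict[str, datetime],
                 now: Optional[datetime] = None) -> list[str]:
    """Return one alert message per table that is missing or older than its max_lag."""
    now = now or datetime.now(timezone.utc)
    alerts = []
    for check in checks:
        loaded = last_loaded.get(check.table)
        if loaded is None:
            alerts.append(f"{check.table}: no successful load recorded")
        elif now - loaded > check.max_lag:
            alerts.append(f"{check.table}: last load {now - loaded} ago (limit {check.max_lag})")
    return alerts

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
checks = [FreshnessCheck("appointments", timedelta(hours=1))]
print(stale_tables(checks, {"appointments": now - timedelta(hours=3)}, now=now))
# ['appointments: last load 3:00:00 ago (limit 1:00:00)']
```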

02 — Experience

Where I've shipped.

Data Engineer I

Connect and Heal
Bangalore, India
Jun 2024 — Present

Healthcare technology company focused on digital solutions to improve patient engagement and management.

  • Migrated the data platform from a cloud-native lakehouse to ClickHouse, optimizing the warehouse design and cutting average query time from 60+s to ~5s.
  • Consolidated and migrated databases, reducing AWS Aurora Postgres costs across production and lower environments by 40%.
  • Designed scalable workflows with S3, MWAA, Glue, DMS, Hudi, Spark & EMR in a medallion architecture — driving a 35% lakehouse cost reduction.
  • Led a 300GB migration from DocumentDB to Aurora Postgres — unstructured → structured, zero downtime, no cost increase.
  • Designed real-time monitoring dashboards with automated alerting for proactive issue detection and improved data reliability.
  • Led PoCs across RisingWave, Redshift, Dagster, dbt & Olake to evaluate streaming, orchestration and transformation trade-offs, influencing platform architecture decisions.
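The medallion architecture layers data as bronze (raw, as ingested), silver (cleaned and deduplicated) and gold (aggregated for consumers). A plain-Python sketch of the two hops, assuming hypothetical records with `id`, `updated_at` and `status` fields. At lakehouse scale, the "keep the latest version per key" step is roughly what Hudi's record-key upserts with a precombine field provide; these functions are illustrations, not the production jobs.

```python
from collections import defaultdict
from typing import Any

Record = dict[str, Any]

def to_silver(bronze: list[Record]) -> list[Record]:
    """Bronze -> silver: drop malformed rows and keep only the latest version of each id."""
    latest: dict[Any, Record] = {}
    for rec in bronze:
        if rec.get("id") is None or rec.get("updated_at") is None:
            continue  # malformed raw record; a real pipeline might quarantine it instead
        current = latest.get(rec["id"])
        if current is None or rec["updated_at"] > current["updated_at"]:
            latest[rec["id"]] = rec
    return list(latest.values())

def to_gold(silver: list[Record]) -> dict[str, int]:
    """Silver -> gold: a business-level aggregate, here a record count per status."""
    counts: dict[str, int] = defaultdict(int)
    for rec in silver:
        counts[rec.get("status", "unknown")] += 1
    return dict(counts)
```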

Software Engineer Intern

MagicPin
Gurugram, India
May 2022 — Aug 2022

A tech startup focused on enhancing retail & shopping experiences through data-driven insights.

  • Improved data crawling & parsing for extracting product information from multiple sources, achieving a 10% speed improvement.
  • Automated data collection with Python web crawlers and reduced Docker image size by 20%, improving efficiency and scalability.
  • Assisted in deploying containerized data pipelines using Docker & Kubernetes, improving the scalability of data ingestion workflows.

Summer Research Intern

National University of Singapore
Singapore
Jun 2022

A leading research university specializing in innovation and advanced technology.

  • Improved AI model performance through optimization and experimental evaluation, in collaboration with NUS and Hewlett Packard Enterprise.
  • Group research project: designed an AI-powered fashion design system using Neural Style Transfer (NST) and Generative Adversarial Networks (GANs).

03 — Stack

The tools I reach for.

🧮

Data Processing & Platforms

Spark · SQL · Python · ClickHouse · Hudi · Iceberg

☁️

Cloud & Storage

S3 · EMR · Glue · Lambda · RDS · DynamoDB · Athena · DMS · IAM · EC2 · ECS · ECR · DocumentDB · Aurora Postgres

🪄

Orchestration & Transformation

Airflow (MWAA) · Dagster · dbt

🛠️

DevOps & Tools

Docker · Jenkins · Git · Kubernetes

04 — Projects

Things I've built for fun.

Stock Market Data Pipeline

End-to-end pipeline built on Apache Airflow and Docker that ingests, processes & stores daily stock data. Spark transformations run in Docker containers; results are stored in MinIO (S3-compatible) and PostgreSQL and visualized in Metabase.

Airflow · Docker · Spark · MinIO · PostgreSQL · Metabase
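To illustrate the kind of transform such a pipeline runs, here is a plain-Python sketch that computes each symbol's day-over-day percentage change in closing price. The `(symbol, trading_date, close)` row shape is an assumption and the project's actual Spark jobs aren't shown; in PySpark the same idea is usually expressed as a `lag` over a window partitioned by symbol and ordered by date.

```python
from collections import defaultdict
from datetime import date

def daily_returns(rows: list[tuple[str, date, float]]) -> list[tuple[str, date, float]]:
    """Return (symbol, date, % change vs previous trading day) for every day after a symbol's first."""
    by_symbol: dict[str, list[tuple[date, float]]] = defaultdict(list)
    for symbol, day, close in rows:
        by_symbol[symbol].append((day, close))
    out = []
    for symbol in sorted(by_symbol):
        series = sorted(by_symbol[symbol])  # order each symbol's closes by trading date
        for (_, prev_close), (day, close) in zip(series, series[1:]):
            out.append((symbol, day, (close - prev_close) / prev_close * 100))
    return out
```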

AI-Generated Fashion Design

AI design system using Neural Style Transfer & GANs to blend artistic styles into unique patterns — reducing iteration time from hours to seconds. Secured 2nd place at NUS for innovation.

Python · NST · GANs · Deep Learning

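In Neural Style Transfer, "style" is commonly captured by the Gram matrix of a layer's feature maps, which holds the channel-by-channel correlations. The style loss then compares the generated image's Gram matrices with those of the style image. A dependency-free sketch of that core calculation, assuming feature maps are given as one flattened list of activations per channel; normalization constants vary between implementations, and the GAN half of the project isn't covered.

```python
def gram_matrix(features: list[list[float]]) -> list[list[float]]:
    """features[c] is channel c flattened to N positions; G[i][j] = sum_k F[i][k] * F[j][k] / N."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features] for fi in features]

def style_loss(generated: list[list[float]], style: list[list[float]]) -> float:
    """Mean squared difference between the two Gram matrices."""
    g, s = gram_matrix(generated), gram_matrix(style)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)

features = [[1.0, 2.0], [3.0, 4.0]]
print(gram_matrix(features))           # [[2.5, 5.5], [5.5, 12.5]]
print(style_loss(features, features))  # 0.0
```

Because the Gram matrix sums over positions, it discards where a feature occurs and keeps only which features co-occur, which is why it captures texture and style rather than layout.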
🎓

B.Tech in Computer Science Engineering

Shiv Nadar University · India

Aug 2020 — May 2024

05 — Contact

Let's build something reliable.

Open to data engineering roles & interesting platform problems. The fastest way to reach me is email.

LinkedIn · +91 99901 81300 · Bangalore, India