Data Engineer · Bangalore, India

I build data platforms that are fast, reliable & cheap to run.

I independently own and scale data platforms in fast-paced, lean environments — leading critical migrations and architectural shifts that cut query latency by 90%+ and infrastructure cost by 35–40%. I specialize in cost-efficient, reliable pipelines and real-time systems.

Working across: ClickHouse · Spark · Airflow · AWS · dbt

Data Platform Orchestrator

Ingest → Spark / EMR → ClickHouse

Query SLA: 60s → 5s (12× faster)

Cost saved: 40% (Aurora spend)

medallion · bronze → silver → gold synced

300GB DocumentDB → Aurora · 0 downtime

dbt run · 124 models building…

90%+ query latency reduced

40% AWS Aurora cost cut

300GB migrated, zero downtime

35% lakehouse cost reduction

01 — About

Strong ownership in lean engineering environments.

I'm a Data Engineer who likes the hard, unglamorous problems: migrations that can't drop a row, warehouses that need to answer in seconds instead of minutes, and cloud bills that need to come down without anyone noticing a regression.

⚙️

Platform migrations

Migrated the data platform from a cloud-native lakehouse to ClickHouse, and moved 300GB from DocumentDB to Aurora Postgres (unstructured to structured) with zero downtime and no cost increase.

📉

Cost engineering

Consolidated databases and re-architected the medallion lakehouse to drive 35–40% cost reductions across production and lower environments.

📡

Real-time & reliability

Built real-time monitoring dashboards with automated alerting for proactive issue detection and higher data reliability.

02 — Experience

Where I've shipped.

Data Engineer I

Connect and Heal

Bangalore, India

Jun 2024 — Present

Healthcare technology company focused on digital solutions to improve patient engagement and management.

Migrated the data platform from a cloud-native lakehouse to ClickHouse, optimizing warehouse design and cutting query SLA from 60+s to ~5s on average.
Consolidated and migrated databases, reducing AWS Aurora Postgres costs across production and lower environments by 40%.
Designed scalable workflows with S3, MWAA, Glue, DMS, Hudi, Spark & EMR in a medallion architecture — driving a 35% lakehouse cost reduction.
Led a 300GB migration from DocumentDB to Aurora Postgres — unstructured → structured, zero downtime, no cost increase.
Designed real-time monitoring dashboards with automated alerting for proactive issue detection and improved data reliability.
Led PoCs across RisingWave, Redshift, Dagster, dbt & Olake to evaluate streaming, orchestration and transformation trade-offs, influencing platform architecture decisions.

Software Engineer Intern

MagicPin

Gurugram, India

May 2022 — Aug 2022

A tech startup focused on enhancing retail & shopping experiences through data-driven insights.

Enhanced data crawling & parsing to efficiently extract product information from multiple sources — a 10% improvement in speed and performance.
Automated data collection with Python web crawlers, and cut Docker image size by 20% to improve efficiency and scalability.
Assisted in deploying containerized data pipelines using Docker & Kubernetes, improving the scalability of data ingestion workflows.

Summer Research Intern

National University of Singapore

Singapore

Jun 2022

A leading research university specializing in innovation and advanced technology.

Improved AI model performance through optimization and experimental evaluation, in collaboration with NUS and Hewlett Packard Enterprise.
Group research project: designed an AI-powered fashion design system using Neural Style Transfer (NST) and Generative Adversarial Networks (GANs).

03 — Stack

The tools I reach for.

🧮

Data Processing & Platforms

Spark · SQL · Python · ClickHouse · Hudi · Iceberg

☁️

Cloud & Storage

S3 · EMR · Glue · Lambda · RDS · DynamoDB · Athena · DMS · IAM · EC2 · ECS · ECR · DocumentDB · Aurora Postgres

🪄

Orchestration & Transformation

Airflow (MWAA) · Dagster · dbt

🛠️

DevOps & Tools

Docker · Jenkins · Git · Kubernetes

04 — Projects

Things I've built for fun.

Stock Market Data Pipeline

End-to-end pipeline on Apache Airflow + Docker to ingest, process & store daily stock data. Dockerized Spark transforms, stored in MinIO (S3-compatible) & PostgreSQL, visualized in Metabase.

Airflow · Docker · Spark · MinIO · PostgreSQL · Metabase

AI-Generated Fashion Design

AI design system using Neural Style Transfer & GANs to blend artistic styles into unique patterns — reducing iteration time from hours to seconds. Secured 2nd place at NUS for innovation.

Python · NST · GANs · Deep Learning

🎓

B.Tech in Computer Science Engineering

Shiv Nadar University · India

Aug 2020 — May 2024

05 — Contact

Let's build something reliable.

Open to data engineering roles & interesting platform problems. The fastest way to reach me is email.

yashgupta1470@gmail.com

LinkedIn · +91 99901 81300 · Bangalore, India