Projects

End-to-end data engineering and analytics work — pipelines, dimensional models, and dashboards built on real datasets.

TravelGenie — AI Travel Recommendation System

End-to-end GenAI pipeline (Airflow, Snowflake, dbt, a Neo4j knowledge graph, and Pinecone vector search) that routes natural-language queries through a LangGraph agentic workflow with entity extraction and human-in-the-loop correction. Benchmarked four LLMs (Phi-3, Gemma2, Llama3.2, Qwen2) on entity-extraction accuracy and latency and selected Phi-3 based on those results.

Airflow · Snowflake · dbt · Neo4j · LangGraph · OpenAI · Python
View on GitHub
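Here is a minimal Python sketch of how an entity-extraction benchmark like the one above could be scored. It assumes each model is wrapped as a function that returns a set of (entity type, value) pairs. The gold queries, entity types, stub extractors, and helper names (`score`, `pick_model`) are hypothetical stand-ins, not the project's test set or code. A real run would replace the stubs with calls to Phi-3, Gemma2, Llama3.2, and Qwen2.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

Entity = Tuple[str, str]  # (entity_type, value), e.g. ("destination", "paris")
Extractor = Callable[[str], Set[Entity]]

# Hypothetical gold labels for two toy queries (not the project's benchmark set).
GOLD: Dict[str, Set[Entity]] = {
    "beach trip to Goa in December": {("destination", "goa"), ("month", "december")},
    "museums in Paris for 3 days": {("destination", "paris"), ("duration", "3 days")},
}


def score(extract: Extractor, gold: Dict[str, Set[Entity]]) -> Dict[str, float]:
    """Micro-averaged precision/recall/F1 over exact (type, value) matches, plus mean latency."""
    tp = fp = fn = 0
    latencies: List[float] = []
    for query, expected in gold.items():
        start = time.perf_counter()
        predicted = extract(query)
        latencies.append(time.perf_counter() - start)
        tp += len(predicted & expected)  # correct entities
        fp += len(predicted - expected)  # spurious entities
        fn += len(expected - predicted)  # missed entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    mean_latency = sum(latencies) / len(latencies) if latencies else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mean_latency_s": mean_latency}


def pick_model(results: Dict[str, Dict[str, float]]) -> str:
    """Highest F1 wins; lower mean latency breaks ties."""
    return max(results, key=lambda name: (results[name]["f1"], -results[name]["mean_latency_s"]))


# Hypothetical stand-ins for LLM-backed extractors; a real one would prompt a model and parse its output.
def stub_perfect(query: str) -> Set[Entity]:
    return set(GOLD[query])


def stub_destination_only(query: str) -> Set[Entity]:
    return {pair for pair in GOLD[query] if pair[0] == "destination"}


if __name__ == "__main__":
    results = {"model_a": score(stub_perfect, GOLD),
               "model_b": score(stub_destination_only, GOLD)}
    for name, metrics in results.items():
        print(name, metrics)
    print("selected:", pick_model(results))
```

Micro-averaged F1 rewards catching every entity without inventing extra ones. Latency only breaks ties in this sketch. A real selection might weigh the two criteria differently.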

Multi-City Traffic Collision Analytics

Multi-city BI pipeline (NYC, Chicago, and Austin) integrating ~2.9M collision records through 17 Talend jobs per city into a dimensional model with Type 2 slowly changing dimensions (SCD) and bridge tables for many-to-many relationships. Delivered 12 Power BI dashboards covering hotspot maps, temporal trends, and fatality breakdowns by road user type.

Talend · SQL Server · Power BI · Tableau · SCD Type 2 · Star Schema
View on GitHub
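A Type 2 slowly changing dimension keeps history by closing out the current row and inserting a new version whenever a tracked attribute changes. The Python sketch below illustrates the pattern in memory. It makes several assumptions for illustration: the `DimRow` layout, the half-open `[valid_from, valid_to)` date convention, and the example location key and `speed_limit` attribute. It is not the project's Talend/SQL Server implementation.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional

HIGH_DATE = date(9999, 12, 31)  # open-ended valid_to for the current version


@dataclass
class DimRow:
    surrogate_key: int
    natural_key: str            # business key from the source system (hypothetical)
    attributes: Dict[str, str]  # Type 2 (history-tracked) columns
    valid_from: date            # inclusive
    valid_to: date              # exclusive
    is_current: bool


def apply_scd2(dim: List[DimRow], natural_key: str,
               attributes: Dict[str, str], load_date: date) -> List[DimRow]:
    """Return the dimension with one incoming record applied as SCD Type 2."""
    current: Optional[DimRow] = next(
        (r for r in dim if r.natural_key == natural_key and r.is_current), None)
    if current is not None and current.attributes == attributes:
        return dim  # tracked attributes unchanged: no new version
    # Expire the old version (if there is one) as of the load date.
    updated = [replace(r, valid_to=load_date, is_current=False) if r is current else r
               for r in dim]
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    updated.append(DimRow(next_key, natural_key, dict(attributes),
                          load_date, HIGH_DATE, True))
    return updated


def version_as_of(dim: List[DimRow], natural_key: str, on: date) -> Optional[DimRow]:
    """Find the version that was in effect on a given date (e.g. a collision date)."""
    return next((r for r in dim if r.natural_key == natural_key
                 and r.valid_from <= on < r.valid_to), None)


if __name__ == "__main__":
    dim: List[DimRow] = []
    dim = apply_scd2(dim, "LOC-001", {"speed_limit": "25"}, date(2023, 1, 1))
    dim = apply_scd2(dim, "LOC-001", {"speed_limit": "30"}, date(2024, 6, 1))
    for row in dim:
        print(row)
    print(version_as_of(dim, "LOC-001", date(2024, 1, 1)))
```

`version_as_of` shows why the history matters. With it, a collision fact can join to the dimension version that was in effect on the collision date, rather than to today's values.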

Food Safety Compliance Analytics

Unified 740K+ food inspection records from two structurally incompatible city datasets, using Alteryx for profiling, Talend for ETL, and a Kimball star schema in SQL Server with two fact tables and six dimensions. The model surfaces compliance KPIs by facility type, ZIP code, violation code, and inspection outcome.

Talend · Alteryx · SQL Server · Star Schema · ER/Studio · T-SQL
View on GitHub
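Unifying two structurally incompatible sources usually means mapping each one onto a conformed record before assigning dimension surrogate keys. Below is a simplified Python sketch of that step. The two cities' column names, outcome codes, and sample rows are all hypothetical, since the real datasets are not described here. The sketch also uses three dimensions and one fact table instead of the project's six and two.

```python
from typing import Dict, List, Tuple

# Hypothetical raw rows: each city names, formats, and codes fields differently.
CITY_A_ROWS = [
    {"inspection_id": "A1", "facility": "Restaurant", "zip": "60601", "result": "Pass"},
    {"inspection_id": "A2", "facility": "Grocery Store", "zip": "60602", "result": "Fail"},
]
CITY_B_ROWS = [
    {"InspID": 77, "EstType": "RESTAURANT", "ZipCode": "10001-1234", "Outcome": "P"},
]

# Assumed per-source outcome codes mapped onto one conformed vocabulary.
OUTCOME_MAP = {"a": {"Pass": "pass", "Fail": "fail"},
               "b": {"P": "pass", "F": "fail"}}


def conform(source: str, row: Dict) -> Dict[str, str]:
    """Map one raw row from either city onto a shared, conformed layout."""
    if source == "a":
        return {"source_id": f"a-{row['inspection_id']}",
                "facility_type": row["facility"].strip().lower(),
                "zip5": row["zip"][:5],
                "outcome": OUTCOME_MAP["a"][row["result"]]}
    return {"source_id": f"b-{row['InspID']}",
            "facility_type": row["EstType"].strip().lower(),
            "zip5": row["ZipCode"][:5],  # drop ZIP+4 suffix
            "outcome": OUTCOME_MAP["b"][row["Outcome"]]}


class Dimension:
    """Assigns a stable integer surrogate key to each distinct conformed value."""

    def __init__(self) -> None:
        self.keys: Dict[str, int] = {}

    def key_for(self, value: str) -> int:
        if value not in self.keys:
            self.keys[value] = len(self.keys) + 1
        return self.keys[value]


def build_fact(rows: List[Tuple[str, Dict]]) -> Tuple[List[Dict], Dict[str, Dimension]]:
    """Conform every row, then emit fact rows that hold only surrogate keys and a measure."""
    dims = {name: Dimension() for name in ("facility_type", "zip5", "outcome")}
    facts = []
    for source, raw in rows:
        rec = conform(source, raw)
        fact = {f"{name}_key": dims[name].key_for(rec[name]) for name in dims}
        fact.update(source_id=rec["source_id"], inspection_count=1)
        facts.append(fact)
    return facts, dims


if __name__ == "__main__":
    rows = [("a", r) for r in CITY_A_ROWS] + [("b", r) for r in CITY_B_ROWS]
    facts, dims = build_fact(rows)
    for fact in facts:
        print(fact)
```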

IMDb Movie Analytics BI

Data warehouse integrating 38M+ rows from IMDb, MovieLens (25M ratings), and Box Office Mojo, loaded by 19 parallelized Talend jobs into 27 dimensions and 16 fact tables. Power BI and Tableau dashboards cover title rankings, a people explorer, genre ratings, and franchise box office trends.

Talend · SQL Server · Power BI · Tableau · Star Schema · ER/Studio
View on GitHub
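Genre ratings are a classic many-to-many case, since one title can carry several genres. Below is a minimal sketch using Python's built-in `sqlite3`. It assumes a hypothetical title-genre bridge table with toy rows. The description above gives only table counts, so every table and column name here is illustrative, not the warehouse's actual schema.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, heavily simplified star schema.
DDL = """
CREATE TABLE dim_title (title_key INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE dim_genre (genre_key INTEGER PRIMARY KEY, genre TEXT);
-- Bridge table resolves the many-to-many title <-> genre relationship.
CREATE TABLE bridge_title_genre (title_key INTEGER, genre_key INTEGER);
CREATE TABLE fact_rating (title_key INTEGER, rating REAL);
"""

# Each rating fans out to every genre of its title through the bridge.
GENRE_AVG = """
SELECT g.genre, AVG(f.rating) AS avg_rating, COUNT(*) AS n_ratings
FROM fact_rating AS f
JOIN bridge_title_genre AS b ON b.title_key = f.title_key
JOIN dim_genre AS g ON g.genre_key = b.genre_key
GROUP BY g.genre
ORDER BY g.genre;
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany("INSERT INTO dim_title VALUES (?, ?)", [(1, "Title A"), (2, "Title B")])
    conn.executemany("INSERT INTO dim_genre VALUES (?, ?)", [(1, "Comedy"), (2, "Drama")])
    # Title A is both Comedy and Drama; Title B is Drama only.
    conn.executemany("INSERT INTO bridge_title_genre VALUES (?, ?)", [(1, 1), (1, 2), (2, 2)])
    conn.executemany("INSERT INTO fact_rating VALUES (?, ?)", [(1, 4.0), (1, 5.0), (2, 3.0)])
    return conn


def genre_ratings(conn: sqlite3.Connection) -> List[Tuple[str, float, int]]:
    return conn.execute(GENRE_AVG).fetchall()


if __name__ == "__main__":
    for row in genre_ratings(build_demo()):
        print(row)
```

Because the bridge attaches each rating to every genre of its title, per-genre averages come out right. Summing ratings across genres, however, would double-count multi-genre titles. Kimball-style designs often add a weighting factor column to the bridge for that case.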