End-to-end data engineering and analytics work — pipelines, dimensional models, and dashboards built on real datasets.
TravelGenie — AI Travel Recommendation System
End-to-end Gen AI pipeline — Airflow, Snowflake, dbt, Neo4j knowledge graph, and Pinecone vector search — routing natural language queries through a LangGraph agentic workflow with entity extraction and human-in-the-loop correction. Benchmarked 4 LLMs (Phi-3, Gemma2, Llama3.2, Qwen2) before selecting Phi-3 for entity extraction accuracy and latency.
Multi-city (NYC, Chicago & Austin) BI pipeline integrating ~2.9M collision records via 17 Talend jobs per city, with a Type 2 SCD dimensional model and bridge tables for many-to-many relationships. Delivered 12 Power BI dashboards covering hotspot maps, temporal trends, and fatality breakdowns by road user type.
Talend · SQL Server · Power BI · Tableau · SCD Type 2 · Star Schema
Unified 740K+ food inspection records from two structurally incompatible city datasets — Alteryx profiling, Talend ETL, and a Kimball star schema with two fact tables and six dimensions in SQL Server. Surfaces compliance KPIs by facility type, zip code, violation code, and inspection outcome.
Data warehouse across 38M+ rows from IMDb, MovieLens (25M ratings), and Box Office Mojo — 19 parallelized Talend jobs, 27 dimensions, and 16 fact tables. Power BI and Tableau dashboards surface title rankings, people explorer, genre ratings, and franchise box office trends.