Vansh Sinha — Data Engineer & Full-Stack Developer

TL;DR

Role: Full-Stack Engineer & Product Designer

Timeline: 6-7 days (v1.0)

Team: Solo Project

Year: 2025

Daily Ingestion: 200+ items

Data Sources: 30+ feeds

Search: Semantic + AI

Overview

DeepDive is my attempt at building a daily workspace for developers. Instead of juggling Twitter, Reddit, Hacker News, GitHub, and ten other tabs, DeepDive brings it all into one AI-powered place. It's more than aggregation: it's a developer-first intelligence layer for the tech world, a single workspace where you can discover, track, and understand what's happening in tech in real time.

Problem

Keeping up with tech is exhausting. Developers juggle multiple platforms: Twitter for announcements, Hacker News for discussions, Reddit for insights, GitHub for repos, RSS feeds for news. Each has its own interface, search limitations, and noise. There's no unified way to search across everything, track the topics you care about, or get AI-powered summaries. You're either drowning in tabs or missing important updates.

Approach

I built an ingestion pipeline that pulls 200+ items/day from Twitter (X), Reddit, Hacker News, and 30+ RSS feeds (TechCrunch, the OpenAI blog, GCP, AWS, etc.). Content is cleaned, tagged, and embedded with sentence-transformers, then stored in a FAISS vector index. A multi-factor trending algorithm (popularity + recency + velocity + tags) surfaces what matters. Embedding-based semantic search lets users ask real questions instead of matching keywords. Collections and watchlists enable personalized tracking.
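To make the trending idea concrete, here is a minimal sketch of how four factors (popularity, recency, velocity, tags) could be blended into one score. The weights, the 12-hour half-life, the normalization caps, and the `Item` fields are all illustrative assumptions, not the values DeepDive actually uses.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Item:
    """Hypothetical shape of an ingested item; field names are assumptions."""
    title: str
    points: int                # total upvotes/likes reported by the source
    points_last_hour: int      # points gained in the last hour (velocity proxy)
    published_at: datetime
    tags: set[str] = field(default_factory=set)


# Assumed weights and constants, chosen only for illustration.
WEIGHTS = {"popularity": 0.4, "recency": 0.3, "velocity": 0.2, "tags": 0.1}
HALF_LIFE_HOURS = 12.0
POINTS_CAP = 1000
VELOCITY_CAP = 100


def trending_score(item: Item, now: datetime, watched_tags: set[str]) -> float:
    """Blend four factors, each normalized to [0, 1], into a single score."""
    age_hours = max((now - item.published_at).total_seconds() / 3600.0, 0.0)

    # Log scaling so a 10x jump in points is not a 10x jump in score.
    popularity = min(math.log1p(item.points) / math.log1p(POINTS_CAP), 1.0)
    # Exponential decay: the recency factor halves every HALF_LIFE_HOURS.
    recency = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    velocity = min(math.log1p(item.points_last_hour) / math.log1p(VELOCITY_CAP), 1.0)
    # Fraction of the user's watched tags that this item carries.
    tag_match = len(item.tags & watched_tags) / len(watched_tags) if watched_tags else 0.0

    return (
        WEIGHTS["popularity"] * popularity
        + WEIGHTS["recency"] * recency
        + WEIGHTS["velocity"] * velocity
        + WEIGHTS["tags"] * tag_match
    )


now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
old = Item("Old but popular", 900, 0, now - timedelta(hours=72), {"databases"})
fresh = Item("Fresh and climbing", 120, 60, now - timedelta(hours=2), {"ai"})
watched = {"ai"}

ranked = sorted([old, fresh], key=lambda i: trending_score(i, now, watched), reverse=True)
print([i.title for i in ranked])  # the fresh item ranks first under these weights
```

With these assumed weights, a two-hour-old item that is still gaining points outranks a three-day-old item with far more total points, which is the behavior a "what's hot right now" feed wants.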

Architecture

The frontend is Next.js 14 + Tailwind + shadcn/ui, deployed on Vercel. It talks to a FastAPI backend on Cloud Run (Python 3.11, SQLite + SQLAlchemy, FAISS for vectors), with Redis for caching and rate limiting. Auth uses Google OAuth + JWT. The ingestion pipeline runs continuously, embedding content with sentence-transformers and indexing it in FAISS. Structured logging tracks performance and errors.
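The write-up doesn't say which rate-limiting algorithm sits in front of the API, so the sketch below assumes the simplest common one: a fixed-window counter per client. An in-memory dict stands in for Redis here; with Redis, the same pattern is usually an `INCR` on a per-client, per-window key plus an `EXPIRE`. The class name, the limits, and the injectable clock are all hypothetical.

```python
import time
from typing import Callable


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each fixed time window.

    In-memory stand-in for a Redis-backed counter (an assumption, for illustration).
    """

    def __init__(
        self,
        limit: int,
        window_seconds: int,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._counts: dict[tuple[str, int], int] = {}

    def allow(self, client_id: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        # Drop counters from earlier windows (Redis would expire these keys).
        self._counts = {k: v for k, v in self._counts.items() if k[1] == window}
        key = (client_id, window)
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] <= self.limit


# Usage with a fake clock: two requests per minute per client.
fake_now = [0.0]
limiter = FixedWindowRateLimiter(limit=2, window_seconds=60, clock=lambda: fake_now[0])
print(limiter.allow("user-1"), limiter.allow("user-1"), limiter.allow("user-1"))
fake_now[0] = 61.0  # a new window begins, so the counter starts over
print(limiter.allow("user-1"))
```

A fixed window is easy to reason about but lets a client burst at a window boundary; a sliding window or token bucket smooths that out at the cost of a little more state.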

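Key Decisions

FAISS for Vector Search
Multi-Factor Trending Algorithm
SQLite + SQLAlchemy
Google OAuth + JWT
Cloud Run for Serverless Scale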

Results

I launched DeepDive v1.0 after 6-7 days of heads-down building. The platform ingests 200+ new items daily from Twitter, Reddit, HN, and 30+ RSS feeds. Semantic search built on sentence-transformers and FAISS supports concept-based queries. Users can create collections and watchlists to track the topics they care about, and multi-factor trending surfaces what's actually hot in tech.
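For readers new to concept-based search, the sketch below shows the core operation in plain Python: normalize the vectors, then rank stored items by inner product with the query, which equals cosine similarity on unit vectors. It is a stand-in, not DeepDive's code. In the real pipeline the vectors would come from a sentence-transformers model and the ranking from a FAISS index; the three-dimensional vectors, the item ids, and the `FlatIndex` class here are made up for illustration.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class FlatIndex:
    """Brute-force (exact) nearest-neighbor search over unit vectors."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._ids: list[str] = []
        self._vectors: list[list[float]] = []

    def add(self, item_id: str, vector: list[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self._ids.append(item_id)
        self._vectors.append(normalize(vector))

    def search(self, query: list[float], k: int) -> list[tuple[str, float]]:
        """Return the k most similar items as (id, similarity), best first."""
        q = normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), item_id)
            for item_id, vec in zip(self._ids, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(item_id, score) for score, item_id in scored[:k]]


# Toy "embeddings": the axes loosely mean (ML, infrastructure, frontend).
index = FlatIndex(dim=3)
index.add("gpt-release", [0.9, 0.1, 0.0])
index.add("k8s-upgrade", [0.1, 0.9, 0.1])
index.add("react-19", [0.0, 0.1, 0.9])

# A query about language models lands closest to the ML item,
# even though it shares no keywords with the item's id or title.
results = index.search([0.8, 0.2, 0.0], k=2)
print([item_id for item_id, _ in results])
```

Brute force is fine at a few hundred items per day; an approximate index only starts to matter once the corpus is large enough that scanning every vector per query becomes the bottleneck.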

What I Learned

Building for yourself first creates the best product motivation. Semantic search is a game-changer: keyword matching feels primitive once you've used embeddings. Starting with SQLite and FAISS keeps infrastructure simple while still delivering powerful features. Ship v1 fast, then iterate based on real usage.

What I'd Do Next

Internship Hub: real-time feed of internship postings for students.
AI-curated digests: daily/weekly tech briefings via email.
Podcast + YouTube integration: add dev podcasts and channel updates alongside papers and repos.
Collaboration features: share collections with friends or teammates.
Mobile-first companion app: lightweight version for on-the-go discovery.

Next Case Study

BigQuery Query Rerun Manager

Metadata-driven query execution system for reliable batch analytics

DeepDive — In Progress
