Building Sippy Scout: a Chicago happy-hour map

Sippy Scout is a Chicago happy-hour discovery app: an interactive map plus a searchable list of bars and restaurants, each annotated with its recurring happy-hour windows and deals. You can filter by time, neighborhood, price, and whether there's a patio, then see exactly where to go for a 4–6pm half-price oyster special.

This post is a walkthrough of how it's actually built: the data pipeline that turns a messy, ever-changing city into structured deals, the architecture decisions that keep it fast, and a few honest notes on shipping a project like this solo.

The hard part isn't the map — it's the data

On the surface this looks like a simple scraping project. It is not. Chicago is an extremely dynamic market: bars and restaurants open and close constantly, and happy-hour menus are some of the least standardized content on the internet.

Inconsistent sources. Some venues publish a clean happy-hour menu with good imagery. Many bury it in a PDF, a photo, an Instagram post — or don't put it online at all.
Verification overhead. That inconsistency makes accuracy at scale genuinely hard. Building a reliable pipeline meant dealing with fragmented data points and unstructured menu formats, then verifying what came out the other end.

Almost every architectural decision below exists to manage that one problem: how do you collect deal data that's trustworthy enough to publish, at city scale, without a team?

The stack, and why

Sippy Scout is a single Next.js application that serves the public catalog, a signed-in experience, and an admin curation console from one codebase.

Layer	Choice	Why
Framework	Next.js (App Router) + React	One framework for SSR pages, client interactivity, and serverless API routes. Server Components keep the public catalog fast.
Language	TypeScript	A shared type vocabulary spans the database models, API responses, and component props, so the queue → publish → render path is type-checked end to end.
Data & files	Appwrite (BaaS)	Managed database + object storage so I didn't have to operate Postgres/S3 myself.
Auth	Auth.js v5	First-class Next.js integration with JWT sessions and a clean edge/Node split. Google is the only provider — one-click onboarding.
Maps	Mapbox GL JS	GPU-rendered vector maps that can draw thousands of clustered points cheaply.
Scraping	Cheerio	Lightweight server-side HTML parsing to strip pages to visible text before extraction — no headless browser needed.
Hosting	Vercel	Native Next.js target. Its ~60s serverless function ceiling became a recurring design constraint (more on that below).

A couple of environment realities are baked into the code: the dev/build scripts set NODE_EXTRA_CA_CERTS so server-side calls survive a corporate TLS-intercepting proxy, and if the backend env vars are absent the admin tooling falls back to local JSON files so curation still runs with zero cloud dependencies.
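
A minimal sketch of that fallback, assuming the decision hinges on three Appwrite settings. The variable names, the `./data` directory, and the `chooseCurationBackend` helper are all hypothetical; the post does not say which variables the app actually reads.

```typescript
// Hypothetical env var names and data directory, for illustration only.
type Env = Record<string, string | undefined>;

type CurationBackend =
  | { kind: "appwrite"; endpoint: string; projectId: string; apiKey: string }
  | { kind: "local-json"; dataDir: string };

function chooseCurationBackend(env: Env, dataDir = "./data"): CurationBackend {
  const endpoint = env["APPWRITE_ENDPOINT"];
  const projectId = env["APPWRITE_PROJECT_ID"];
  const apiKey = env["APPWRITE_API_KEY"];

  // Use the cloud backend only when every required value is present and non-empty.
  if (endpoint && projectId && apiKey) {
    return { kind: "appwrite", endpoint, projectId, apiKey };
  }
  // Otherwise curation reads and writes local JSON files, with no cloud dependency.
  return { kind: "local-json", dataDir };
}
```

In a Node process this would be called as `chooseCurationBackend(process.env)`. Taking the environment as a parameter keeps the decision easy to test.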

How a venue becomes a published listing

The core of the system is one unified ingest engine. A single ingestVenue() function is the only path that assembles a venue record, and every entry point — single admin ingest, bulk discovery, an approved user submission, a backfill job — funnels through it. That way data-quality and provenance rules live in exactly one place.

For any venue, it runs three stages:

Resolve facts. Query a maps/search aggregator for the canonical name, address, coordinates, phone, website, business hours, an outdoor-seating amenity, and a hero photo — authoritative data for location and identity.
Make the photo durable. Source photo URLs are signed and expire, so thumbnails silently start returning 403 later. I download the bytes once, re-host them in the app's own storage bucket, and store that non-expiring URL instead.
Extract the deals. If the venue has a website, gather candidate pages (/happy-hour, /specials, /menu…) politely, clean them to visible text with Cheerio, and run an extraction LLM to turn that text into structured happy-hour windows.
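
The three stages above can be sketched roughly as follows. Only the name ingestVenue() comes from the real code. The types, the injected `IngestDeps` helpers, and the `pending_review` status are assumptions made for illustration, and the candidate path list contains only the three paths named above.

```typescript
// Hypothetical shapes; the real record has more fields (phone, hours, amenities, ...).
interface VenueFacts {
  name: string;
  address: string;
  lat: number;
  lng: number;
  website?: string;
  photoUrl?: string; // signed source URL that will eventually expire
}

interface ExtractedWindow {
  days: number[];
  start: string;
  end: string;
  deals: string[];
}

interface StagedVenue {
  facts: VenueFacts;
  hostedPhotoUrl: string | null;
  windows: ExtractedWindow[];
  status: "pending_review"; // ingest never publishes directly
}

// Each external system is injected, so the pipeline itself stays small and testable.
interface IngestDeps {
  resolveFacts(query: string): Promise<VenueFacts | null>;
  rehostPhoto(sourceUrl: string): Promise<string>;
  fetchVisibleText(pageUrl: string): Promise<string | null>;
  extractWindows(text: string): Promise<ExtractedWindow[]>;
}

const CANDIDATE_PATHS = ["/happy-hour", "/specials", "/menu"];

async function ingestVenue(query: string, deps: IngestDeps): Promise<StagedVenue | null> {
  // Stage 1: resolve canonical facts from the maps/search aggregator.
  const facts = await deps.resolveFacts(query);
  if (!facts) return null;

  // Stage 2: copy the photo into our own storage before the signed URL expires.
  const hostedPhotoUrl = facts.photoUrl ? await deps.rehostPhoto(facts.photoUrl) : null;

  // Stage 3: visit candidate pages one at a time and extract windows from their text.
  const windows: ExtractedWindow[] = [];
  if (facts.website) {
    const base = facts.website.replace(/\/+$/, "");
    for (const path of CANDIDATE_PATHS) {
      const text = await deps.fetchVisibleText(`${base}${path}`);
      if (text) windows.push(...(await deps.extractWindows(text)));
    }
  }

  return { facts, hostedPhotoUrl, windows, status: "pending_review" };
}
```

Because every entry point calls this one function, swapping the aggregator or the extraction model only means passing different `deps`.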

Each source contributes only what it's reliable for:

Source	Contributes	Trust
Maps / search aggregator	Name, address, coordinates, hours, amenities, photo	High for facts
The venue's own website	First-party deal / happy-hour content	High for deals
Extraction LLM	Structured windows + metadata gap-fill	Medium — always re-validated
Crowdsourced reports	New venues, inaccuracy fixes	Low — routed through review

Two principles do a lot of heavy lifting here:

Everything lands in review — never auto-publish. Ingest assembles a record but writes nothing public. Results stage in a review queue; a human approves them; an explicit publish step promotes them to the public collections. That trades immediacy for trust, which matters when some of the data was LLM-extracted.
Never trust raw model output. Everything the LLM returns is re-validated: days coerced to 0–6, times to HH:MM (or all-day), deals trimmed and de-duplicated, patio reduced to a strict true | false | null tri-state, and confidence discounted when there's no verbatim evidence. A window only survives if it has at least one valid day and a coherent time pair.
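
As a rough illustration of the second principle, here is one way the re-validation could look. The field names, the 0.5 discount factor, case-insensitive de-duplication, and the reading of "coherent" as "both times parse and differ" are assumptions, not the app's real rules.

```typescript
// Illustrative schema; the real field names are not shown in this post.
interface CleanWindow {
  days: number[];        // 0 to 6, sorted and unique
  start: string | null;  // "HH:MM", or null when allDay
  end: string | null;
  allDay: boolean;
  deals: string[];
  patio: boolean | null; // strict tri-state
  confidence: number;    // 0..1
}

function toDay(value: unknown): number | null {
  const n =
    typeof value === "number" ? value
    : typeof value === "string" && value.trim() !== "" ? Number(value)
    : NaN;
  return Number.isInteger(n) && n >= 0 && n <= 6 ? n : null;
}

function toTime(value: unknown): string | null {
  if (typeof value !== "string") return null;
  const match = /^(\d{1,2}):(\d{2})$/.exec(value.trim());
  if (!match) return null;
  const hours = Number(match[1]);
  const minutes = Number(match[2]);
  if (hours > 23 || minutes > 59) return null;
  return `${hours < 10 ? "0" : ""}${hours}:${match[2]}`;
}

function cleanDeals(value: unknown): string[] {
  if (!Array.isArray(value)) return [];
  const seen = new Set<string>();
  const deals: string[] = [];
  for (const item of value) {
    if (typeof item !== "string") continue;
    const deal = item.trim();
    const key = deal.toLowerCase(); // assumed: duplicates compared case-insensitively
    if (deal === "" || seen.has(key)) continue;
    seen.add(key);
    deals.push(deal);
  }
  return deals;
}

function validateWindow(raw: Record<string, unknown>): CleanWindow | null {
  const rawDays = Array.isArray(raw["days"]) ? raw["days"] : [];
  const days = Array.from(
    new Set(rawDays.map((d) => toDay(d)).filter((d): d is number => d !== null)),
  ).sort((a, b) => a - b);
  if (days.length === 0) return null; // no valid day: drop the window

  const allDay = raw["allDay"] === true;
  const start = allDay ? null : toTime(raw["start"]);
  const end = allDay ? null : toTime(raw["end"]);
  if (!allDay && (start === null || end === null || start === end)) return null;

  const rawConfidence = raw["confidence"];
  let confidence =
    typeof rawConfidence === "number" && Number.isFinite(rawConfidence)
      ? Math.min(1, Math.max(0, rawConfidence))
      : 0;
  const evidence = raw["evidence"];
  const hasEvidence = typeof evidence === "string" && evidence.trim() !== "";
  if (!hasEvidence) confidence *= 0.5; // hypothetical discount for missing verbatim evidence

  const patio = raw["patio"] === true ? true : raw["patio"] === false ? false : null;
  return { days, start, end, allDay, deals: cleanDeals(raw["deals"]), patio, confidence };
}
```

Anything that is not literally `true` or `false` collapses to `null` for patio, so a model answering "yes" or "probably" never becomes a published claim.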

Before anything is staged, it's de-duplicated against the existing catalog by exact normalized key (name|address), by place identifier, and by a fuzzy name match (Levenshtein similarity with "The/A/An" stripped). That stops the same bar showing up under three slightly different spellings from three different sweeps.
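
A sketch of those three checks, under stated assumptions: the `VenueRef` shape, the `findDuplicate` helper, and the 0.85 similarity threshold are hypothetical, and similarity is taken to be one minus the edit distance divided by the longer name's length.

```typescript
interface VenueRef {
  name: string;
  address: string;
  placeId?: string;
}

type DuplicateReason = "place-id" | "exact-key" | "fuzzy-name";

// Lowercase, collapse punctuation and whitespace to single spaces.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Drop a leading "The", "A", or "An" before fuzzy comparison.
function stripArticle(name: string): string {
  return normalize(name).replace(/^(the|a|an) /, "");
}

// Classic two-row dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

function nameSimilarity(a: string, b: string): number {
  const x = stripArticle(a);
  const y = stripArticle(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

function dedupeKey(v: VenueRef): string {
  return `${normalize(v.name)}|${normalize(v.address)}`;
}

function findDuplicate(
  candidate: VenueRef,
  catalog: VenueRef[],
  threshold = 0.85, // hypothetical cut-off
): { match: VenueRef; reason: DuplicateReason } | null {
  for (const existing of catalog) {
    if (candidate.placeId !== undefined && candidate.placeId === existing.placeId) {
      return { match: existing, reason: "place-id" };
    }
    if (dedupeKey(candidate) === dedupeKey(existing)) {
      return { match: existing, reason: "exact-key" };
    }
    if (nameSimilarity(candidate.name, existing.name) >= threshold) {
      return { match: existing, reason: "fuzzy-name" };
    }
  }
  return null;
}
```

Returning the reason alongside the match gives a reviewer something to act on: a place-identifier hit is close to certain, while a fuzzy name hit deserves a second look.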

Keeping the public side fast

The app is read-optimized for the public and write-gated for curation. On the read path:

Server join + CDN cache + client SWR. The public /api/venues endpoint joins venues to their happy hours server-side, filters to venues with at least one deal, returns one slim payload, and is CDN-cached for five minutes. The client layers a versioned sessionStorage stale-while-revalidate cache on top, so repeat navigations are instant. This collapsed what used to be thousands of client-side joins into one shared, cached computation.
GPU map layers, not DOM markers. The map renders points as a single GeoJSON source with GPU circle layers and clustering — not one DOM node per venue. That's the difference between a crawling map and a smooth one at city scale. The map initializes once; prop changes are applied imperatively.
Cursor pagination, never offset. The backing store caps reported totals and the maximum offset at 5,000 rows, which silently truncated the catalog. Every full sweep now pages with a cursor keyed on the last record ID instead.
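
The cursor sweep from the last point can be sketched like this. The `ListPage` signature, the `listAll` helper, and the page size are assumptions; with Appwrite the page fetch would presumably wrap a query built from `Query.limit` and `Query.cursorAfter`, but the actual call is not shown in this post.

```typescript
interface Doc {
  id: string;
}

// Hypothetical page fetcher: returns up to `limit` documents that come after `cursorAfter`.
type ListPage<T extends Doc> = (opts: { limit: number; cursorAfter?: string }) => Promise<T[]>;

async function listAll<T extends Doc>(listPage: ListPage<T>, pageSize = 100): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;

  for (;;) {
    const page = await listPage({ limit: pageSize, cursorAfter: cursor });
    all.push(...page);

    // A short (or empty) page means there is nothing left to read.
    if (page.length < pageSize) break;

    // Resume after the last record seen, so no offset is ever sent to the store.
    cursor = page[page.length - 1].id;
  }
  return all;
}
```

The loop never asks the store for a total or an offset, so a cap on either no longer affects how many rows a sweep can reach.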

Working around the serverless clock

Because every Vercel function invocation has to finish under ~60 seconds, the heavy deal-backfill can't run as one long job. Instead it's driven one venue per request: a local runner script loops with no platform timeout, calling a fill-venue endpoint that only fills missing fields for a single venue per call. Short, safe invocations on the server; an unattended grind that can chew through thousands of venues locally.
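
A minimal sketch of that local runner loop. The `FillVenue` callback stands in for an HTTP call to the fill-venue endpoint, whose URL and response shape are not given here, so `FillResult` and `BackfillSummary` are hypothetical.

```typescript
// Hypothetical response shape for a single fill-venue call.
interface FillResult {
  venueId: string;
  filledFields: string[]; // which missing fields this call managed to fill
}

// In a real runner this would issue one HTTP request per venue to the fill-venue endpoint.
type FillVenue = (venueId: string) => Promise<FillResult>;

interface BackfillSummary {
  filled: number;
  unchanged: number;
  failed: string[]; // venue IDs worth retrying on a later run
}

async function runBackfill(venueIds: string[], fillVenue: FillVenue): Promise<BackfillSummary> {
  const summary: BackfillSummary = { filled: 0, unchanged: 0, failed: [] };

  // Strictly one venue per request, in sequence, so each server invocation stays short.
  for (const venueId of venueIds) {
    try {
      const result = await fillVenue(venueId);
      if (result.filledFields.length > 0) summary.filled += 1;
      else summary.unchanged += 1;
    } catch {
      // One bad venue should not stop an unattended run.
      summary.failed.push(venueId);
    }
  }
  return summary;
}
```

Because the endpoint only fills missing fields, re-running the loop over the same IDs is safe: venues already completed come back as unchanged.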

A note on building with AI

I'll be transparent: a large majority of the codebase was written with AI assistance. I leaned heavily on tools like Cursor and Claude, plus a places API to verify location data. Without that help, building this solo would have taken months.

But the interesting part wasn't generating code — it was avoiding "AI slop." The real work was meticulously guiding the tooling so the final product felt like a polished, cohesive app rather than a pile of disconnected snippets. Most of the decisions in this post — the single ingest path, validate-everything, never auto-publish — are exactly the kind of guardrails that keep an AI-accelerated project from collapsing into inconsistency.

Where it's going

There are plenty of incumbents in this space with a head start on raw venue count, but they tend to share three flaws: clunky UI, aggressive monetization, and stale data. The scraping challenges above explain why their data goes stale, but stale data still defeats the purpose for the person standing on a sidewalk deciding where to go. Sippy Scout is built not to repeat those mistakes.

On the roadmap:

Migrate auth from Appwrite to Auth.js to drop a vendor dependency and make a future database move easier.
A lightweight review system (thumbs up/down) to validate happy hours — gated mostly on having an active user base.
User-submitted happy hours, which would make scaling far easier than manual scraping. The hurdle is the classic cold-start problem.
Fighting the "West Loop bias." There's a massive concentration of bars in Fulton Market / West Loop, and the high turnover there skews where data is easiest to find. I want to deliberately expand coverage further south for a wider, more authentic spread of neighborhoods.

If you want to see it in action, it's live at sippyscout.com.