How We Built an AI Trip Planner in 12 Hours
Introduction
At 9 AM, we had a brief. By 9 PM, Cody Yellowstone was live on a staging URL.
That sentence sounds like marketing copy until you understand exactly how it happened — which agents ran in parallel, which architectural decisions made it possible, and which early choices would have caused the whole timeline to collapse if made differently. This post is the unabridged version: every hour, every decision, every tradeoff.
The finished product is a fully functional AI-powered trip planner for Yellowstone National Park. Users answer a personalized quiz — travel dates, party size, interests, mobility considerations, desired pace — and the system generates a multi-day itinerary with an interactive map, weather context, attraction recommendations ranked by fit, and dining suggestions. The itinerary is exportable and shareable. The whole experience runs on a mobile-optimized web app that works without an account.
We are writing this up because the question we get most often from prospective clients is some version of "prove it." This is us proving it, with the receipts.
The Brief
Katrina Southern reached out on behalf of a Cody Yellowstone tourism organization. The context was straightforward: the region attracts hundreds of thousands of visitors annually, the majority of whom arrive without detailed plans and often miss the experiences that would have made their trip memorable. The existing tourism website had static itinerary PDFs — useful as a starting point but completely unable to account for individual preferences, travel dates, weather, or group composition.
The ask was to build something that could replace the "planning anxiety" that first-time Yellowstone visitors experience. The product needed to feel genuinely intelligent, not like a fancy form with a printout at the end. It needed to work on mobile because most trip planning happens on phones. It needed to be shareable because people plan trips with partners and families. And it needed to actually know Yellowstone — the wildlife activity patterns by season, the road closures, the geyser schedules, the hidden gems that only the third-time visitor knows about.
The brief also included one non-negotiable: the client wanted to see a working product on a staging URL before committing to the full engagement. That constraint is what made the 12-hour timeline real. We committed to delivering a working demonstration of the core product in a single day, and now we had to execute.
The scope we locked in that morning: personalized quiz flow, AI-generated itinerary, interactive map with pins for each recommended stop, basic weather integration, and a shareable link. No user accounts, no booking integration, no payment flow — those were phase two. Phase one was proving the concept worked and felt right.
Hour 0-2: Analysis and Architecture
The first two hours were entirely planning and no code. This is the discipline that makes the subsequent hours possible — skipping architecture to "get started faster" is the mistake that costs you five hours at hour eight.
The first question was tech stack. The constraints shaped the answer quickly. We needed server-side rendering for SEO and initial load performance, a backend capable of handling OpenAI API calls with reasonable latency, a database capable of storing quiz responses and generated itineraries for sharing, map rendering with enough flexibility to handle custom pins and polygons, and a deployment pipeline that could produce a shareable staging URL within the build window.
The answer was: Next.js 15 on the frontend with App Router, NestJS on the backend, PostgreSQL for persistence, Mapbox GL JS for the map, and OpenAI GPT-4o for itinerary generation. The monorepo would be managed with Turborepo to allow the frontend and backend agents to work in parallel without stepping on each other's changes. Deployment to Railway for the backend and Vercel for the frontend, both with preview deployment URLs on push.
The database schema took about forty minutes to finalize. The core entities were: trip_session (the quiz response and metadata), itinerary (the generated plan, stored as JSONB for flexibility), itinerary_day (normalized day-level data for the map), and attraction (a seeded catalog of Yellowstone locations with coordinates, categories, seasonal flags, and accessibility notes). Normalizing the attraction catalog was a deliberate decision — it gave the AI a structured knowledge base to reason against rather than hallucinating geography, and it meant the map pins would always have real coordinates.
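As a sketch, the core entities map onto shared TypeScript types like the following. The field names are illustrative; the post does not publish the actual schema.

```typescript
// Illustrative shapes for the core entities described above.
// Field names are assumptions, not the production schema.
type TripSession = {
  id: string;
  quizResponses: {
    startDate: string; // ISO date
    endDate: string;
    party: { adults: number; children: number; seniors: number };
    interests: string[]; // e.g. ["wildlife", "geysers"]
    pace: "packed" | "relaxed";
    accessibilityRequired: boolean;
    notes?: string; // the open text field
  };
  createdAt: string;
};

type ItineraryStop = {
  attractionId: string; // resolves against the seeded attraction catalog
  arrivalTime: string;  // e.g. "08:30"
  dwellMinutes: number;
  note: string;         // 2-3 sentence contextual note
};

type ItineraryDay = {
  dayNumber: number;
  stops: ItineraryStop[]; // ordered
};

type Itinerary = {
  tripSessionId: string;
  days: ItineraryDay[]; // stored as JSONB in Postgres
};
```

Because these types live in a shared package, both the API and the UI compile against the same itinerary shape.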
We also spent time during this phase defining the OpenAI prompt architecture. Vague prompts produce vague itineraries. The prompt needed to: provide the structured attraction catalog as context, incorporate the quiz responses with explicit reasoning about fit, enforce a specific output schema (JSON with day arrays, each containing ordered stops with times, notes, and attraction IDs from the catalog), and include seasonal and weather context. Getting this right in hour one prevented a painful iteration cycle in hour six.
By 11 AM, we had a Turborepo monorepo initialized, both apps scaffolded, the database schema written and reviewed, and the agent task allocation finalized. Four agents were queued to work in parallel: frontend quiz flow, backend API and database layer, attraction catalog population, and Mapbox integration scaffolding.
Hour 2-5: Core Platform Build
This three-hour block was the heaviest parallel execution of the build. Four agents working simultaneously on separate concerns, with coordination checkpoints at hours 3 and 4 to catch interface mismatches before they compounded.
The frontend agent built the quiz flow in Next.js 15 using the App Router and React Server Components where appropriate. The quiz has six steps: trip dates, group composition (adults, children, seniors, accessibility requirements), primary interests (wildlife, geysers, hiking, history, photography, relaxation), desired pace (packed vs relaxed), accommodation location (west entrance, north entrance, south entrance, in-park lodging), and one open text field for anything the system should know. Each step has client-side validation and smooth transitions. The state is managed with Zustand — lightweight, no boilerplate, works cleanly with Next.js App Router's boundary between server and client components.
The backend agent built the NestJS API layer: three primary endpoints — POST /trips to create a trip session from quiz data, GET /trips/:id/itinerary to fetch a generated itinerary (triggering generation if not yet cached), and GET /trips/:id/share to generate a shareable URL. The data layer used Prisma for type-safe database access with the schema defined in the architecture phase. The trip session controller included basic rate limiting (5 requests per IP per hour) to protect the OpenAI spend during the demo period.
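The per-IP limit can be sketched as a small fixed-window guard. The mechanism here is an assumption: in NestJS, @nestjs/throttler is the common off-the-shelf choice, and a multi-instance deployment would back the counter with a shared store such as Redis rather than an in-memory Map.

```typescript
// Minimal fixed-window rate limiter: 5 trip creations per IP per hour.
// In-memory only, so this is a single-instance sketch.
const WINDOW_MS = 60 * 60 * 1000;
const LIMIT = 5;

const hits = new Map<string, number[]>();

function allowRequest(ip: string, now: number = Date.now()): boolean {
  // Keep only timestamps still inside the current window.
  const recent = (hits.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= LIMIT) {
    hits.set(ip, recent);
    return false; // caller responds 429 Too Many Requests
  }
  recent.push(now);
  hits.set(ip, recent);
  return true;
}
```

In a NestJS controller this would run as a guard on POST /trips before any OpenAI spend is incurred.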
The catalog agent populated the attraction database. This was less glamorous but arguably the most important work of this phase. We seeded 87 Yellowstone attractions: Old Faithful and the Upper Geyser Basin, Grand Prismatic Spring and Midway Geyser Basin, the Grand Canyon of the Yellowstone with both rim trails, Lamar Valley with its wildlife density notes and optimal viewing times by season, Mammoth Hot Springs with the active terrace schedule, Norris Geyser Basin, Hayden Valley, the Boiling River (and its seasonal closure dates), the major trailheads with difficulty ratings, the primary dining options including their hours and waitlist behavior, and the key visitor facilities. Each attraction has: coordinates, a category enum, an accessibility rating (1-4), average time required, optimal seasons as a bitmask, crowd-intensity estimate by time of day, and a 150-word description written for the AI context window. Getting this catalog right meant the AI would reason accurately about which attractions fit a given traveler profile, and the map pins would place correctly.
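The seasonal bitmask and accessibility rating can be sketched like this. The flag values, field names, and example record are illustrative, not the actual catalog.

```typescript
// Sketch of the attraction record. Flag values and field names are
// assumptions based on the fields described in the text.
enum Season {
  Spring = 1 << 0,
  Summer = 1 << 1,
  Fall   = 1 << 2,
  Winter = 1 << 3,
}

type Attraction = {
  id: string;
  name: string;
  lat: number;
  lon: number;
  category: "geyser" | "wildlife" | "hiking" | "history" | "dining" | "facility";
  accessibility: 1 | 2 | 3 | 4; // 4 = fully accessible
  avgMinutes: number;
  seasons: number;              // bitmask of Season flags
  description: string;          // ~150 words for the AI context window
};

function openInSeason(a: Attraction, s: Season): boolean {
  return (a.seasons & s) !== 0;
}

// Hypothetical record: an attraction flagged closed during spring runoff.
const exampleAttraction: Attraction = {
  id: "boiling-river", name: "Boiling River", lat: 44.99, lon: -110.69,
  category: "hiking", accessibility: 2, avgMinutes: 90,
  seasons: Season.Summer | Season.Fall | Season.Winter,
  description: "Riverside soaking area near Mammoth; closed during spring runoff.",
};
```

A bitmask keeps the seasonal check to a single integer comparison, which also makes it trivial to pre-filter the catalog in SQL.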
The Mapbox agent scaffolded the interactive map component in isolation, building against a mock data fixture that matched the expected itinerary schema. Mapbox GL JS with a custom style was chosen over Google Maps for reasons we cover in the technical decisions section below. The component renders day-coded pins (color by day number), a route polyline connecting the day's stops in order, a sidebar panel showing stop details on pin click, and a day selector that filters the visible pins. Working against a fixture from the start meant this component was fully functional before the backend was ready to provide real data — no blocking dependencies.
The coordination check at hour 3 caught one interface mismatch: the frontend was sending the open-text field as additionalContext and the backend was expecting notes. Two-minute fix. This is exactly why coordination checkpoints exist — the same mismatch discovered at hour 7, mid-integration, would have taken twenty minutes and caused cascading confusion.
Hour 5-8: AI Integration
By noon, the scaffolding was solid enough to wire the AI layer. This is where the architecture decisions from hour zero paid off or would have cost us badly.
The OpenAI integration used GPT-4o with structured outputs: a JSON schema enforced at the API level via the response_format parameter. (This is stricter than plain JSON mode, which guarantees valid JSON but not a particular shape.) Structured outputs eliminate the most common failure mode in LLM integration: the model returns valid prose that cannot be parsed into the data structure your UI expects. With the schema enforced, the model either returns a JSON object matching it or the API returns an error. No parsing, no cleaning, no "the model returned markdown-wrapped JSON again" debugging.
The prompt was assembled server-side in the ItineraryGenerationService. The system prompt established the AI's role (an expert Yellowstone trip planner with deep knowledge of the park), the output format (a JSON object with a days array, each day containing an ordered_stops array with attraction IDs from the catalog, suggested arrival times, dwell times, and a 2-3 sentence contextual note), and the constraints (total daily driving time under 2.5 hours, accessibility requirements are non-negotiable hard constraints, wildlife activity patterns must be accounted for). The user prompt injected the quiz responses, the current date for seasonal context, and a filtered subset of the attraction catalog — filtered by season, by accessibility score if required, and by category weights derived from the traveler's stated interests.
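A simplified version of the enforced output schema, as assembled for the response_format parameter, might look like the following. The schema content is a sketch of the shape described above, not the production schema.

```typescript
// Builds the response_format payload for OpenAI structured outputs.
// Field names mirror the shape described in the text; the schema itself
// is a simplified illustration.
function buildResponseFormat() {
  return {
    type: "json_schema",
    json_schema: {
      name: "itinerary",
      strict: true,
      schema: {
        type: "object",
        properties: {
          days: {
            type: "array",
            items: {
              type: "object",
              properties: {
                ordered_stops: {
                  type: "array",
                  items: {
                    type: "object",
                    properties: {
                      attraction_id: { type: "string" },
                      arrival_time: { type: "string" },
                      dwell_minutes: { type: "integer" },
                      note: { type: "string" },
                    },
                    required: ["attraction_id", "arrival_time", "dwell_minutes", "note"],
                    additionalProperties: false,
                  },
                },
              },
              required: ["ordered_stops"],
              additionalProperties: false,
            },
          },
        },
        required: ["days"],
        additionalProperties: false,
      },
    },
  };
}
```

With strict set to true, the API constrains generation to this shape, so attraction_id is always present for the map layer to resolve against the catalog.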
The filtering step was important. Sending all 87 attractions to GPT-4o would have worked but produced longer latency, higher token cost, and less focused itineraries. By pre-filtering to the 25-35 attractions most relevant to the specific traveler profile, we kept the context window focused and the generation time under 4 seconds on average — acceptable for a "generating your itinerary" loading state.
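The pre-filtering step reduces to a pure function over the catalog. The scoring heuristic and field names here are our assumptions; the text specifies only the three filter dimensions.

```typescript
// Pre-filter the 87-attraction catalog down to the most relevant subset
// for one traveler profile. Heuristic and field names are illustrative.
type CatalogEntry = {
  id: string;
  category: string;
  accessibility: number; // 1-4, 4 = fully accessible
  seasons: number;       // season bitmask
};

function filterCatalog(
  catalog: CatalogEntry[],
  seasonFlag: number,
  needsAccessible: boolean,
  interestWeights: Record<string, number>, // e.g. { geyser: 2, wildlife: 1 }
  maxResults = 35
): CatalogEntry[] {
  return catalog
    .filter((a) => (a.seasons & seasonFlag) !== 0)            // in season
    .filter((a) => !needsAccessible || a.accessibility >= 3)  // hard constraint
    .sort(
      (a, b) =>
        (interestWeights[b.category] ?? 0) - (interestWeights[a.category] ?? 0)
    )
    .slice(0, maxResults);
}
```

Accessibility is applied as a filter rather than a weight because the text treats it as non-negotiable; interests only affect ordering.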
Weather integration used the National Weather Service API (free, no rate limits for this usage volume) to pull a 7-day forecast for the park's central coordinates. The forecast was summarized into a plain-English string — "expect afternoon thunderstorms on days 2-3, morning temperatures around 45F" — and injected into the prompt as additional context. The AI used this to adjust activity sequencing (outdoor geysers in the morning before storms, indoor visitor centers in the afternoon) and to add weather notes to relevant stops.
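The summarization step can be sketched as a pure function over NWS forecast periods. The period shape below is simplified from the API's properties.periods entries, and the summary heuristics are our assumptions. (Note that api.weather.gov requires a User-Agent header identifying your application, which matters later in this story.)

```typescript
// Condense NWS forecast periods into the plain-English string injected
// into the prompt as weather context. Shape simplified from the NWS
// API's properties.periods; heuristics are illustrative.
type ForecastPeriod = {
  name: string;          // "Tuesday", "Tuesday Night", ...
  temperature: number;   // degrees F
  isDaytime: boolean;
  shortForecast: string; // e.g. "Chance Showers And Thunderstorms"
};

function summarizeForecast(periods: ForecastPeriod[]): string {
  const stormDays = periods
    .filter((p) => p.isDaytime && /thunderstorm/i.test(p.shortForecast))
    .map((p) => p.name);
  const nightTemps = periods.filter((p) => !p.isDaytime).map((p) => p.temperature);
  const low = nightTemps.length ? Math.min(...nightTemps) : null;
  const parts: string[] = [];
  if (stormDays.length) parts.push(`expect thunderstorms on ${stormDays.join(", ")}`);
  if (low !== null) parts.push(`overnight lows around ${low}F`);
  return parts.join("; ") || "no notable weather in the forecast";
}
```

Feeding the model a short summary string instead of the raw forecast JSON keeps prompt tokens down and makes the weather context easy for the model to act on.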
The itinerary generation was made asynchronous with a simple polling pattern. The POST /trips endpoint created the trip session and immediately returned the trip ID with a status: generating. The frontend polled GET /trips/:id/itinerary every two seconds. Once the generation completed, the endpoint returned the full itinerary and the frontend transitioned from the loading state to the results view. This pattern kept the API response fast and the UX honest — the loading spinner actually corresponded to real generation work happening, not a fake progress bar.
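The frontend side of that polling pattern reduces to a loop like this. Helper names and the timeout cap are our assumptions; the two-second interval follows the text. The fetcher is injected so the loop is testable without a network.

```typescript
// Frontend polling loop: ask for the itinerary every `intervalMs`
// until status flips from "generating" to "ready", or give up.
type ItineraryResponse =
  | { status: "generating" }
  | { status: "ready"; itinerary: { days: unknown[] } };

async function pollItinerary(
  tripId: string,
  fetchItinerary: (id: string) => Promise<ItineraryResponse>,
  intervalMs = 2000,
  maxAttempts = 30
): Promise<{ days: unknown[] }> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchItinerary(tripId);
    if (res.status === "ready") return res.itinerary;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("itinerary generation timed out");
}
```

The cap matters: without maxAttempts, a generation that dies server-side leaves the spinner running forever.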
Map integration connected at 2:30 PM. The Mapbox component consumed the real itinerary data for the first time and everything aligned: attraction IDs resolved correctly to coordinates from the catalog, the day-coded pin colors rendered, the route polylines drew correctly between stops, and the sidebar panel populated with the right content on pin click. One issue surfaced: two attractions in the catalog had swapped latitude/longitude values (a data entry error during catalog population). Identified immediately by the pins appearing in the ocean off the coast of Africa. Fixed in five minutes. The catalog agent had a validation step for coordinate ranges that should have caught this; we added it retroactively to prevent recurrence.
Hour 8-10: Polish and Testing
With the core functionality working, the build shifted into a different mode: the relentless identification and resolution of everything that would make a real user distrust the product.
Mobile responsiveness was the first pass. The quiz flow and results view were tested at 375px (iPhone SE), 390px (iPhone 15), and 414px (iPhone Plus). The Mapbox component required explicit work at mobile widths — the sidebar panel was redesigned as a bottom sheet that slides up when a pin is selected, which is both more mobile-native and more space-efficient than the desktop sidebar layout. Touch targets on the quiz navigation buttons were expanded to 48px minimum. The itinerary day selector on the map was converted from a tab row to a horizontal scroll container with visible overflow for smaller screens.
Loading states needed attention. The quiz step transitions were instant — fine. The itinerary generation step had a multi-state loading UI: an initial "Analyzing your preferences..." message, then a sequence of messages cycling every 3 seconds ("Checking seasonal wildlife patterns...", "Mapping your route...", "Adding local tips...") to maintain engagement during the 3-5 second generation window. These messages were not random — they corresponded to the actual work happening in the prompt, which gave them authenticity that users notice even if they cannot articulate why.
Error handling was split into two categories. Generation errors (OpenAI timeout, rate limit, malformed response) triggered a retry with a fallback prompt that requested a simpler itinerary format. If the retry also failed, the user saw a friendly error with a "Try again" button that reused their quiz responses. Quiz validation errors showed inline on each field with clear correction guidance. Both error paths were tested with forced failures to verify they resolved gracefully.
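The generation error path reduces to a small retry helper. The names are ours; the primary and fallback closures would wrap the full prompt and the simpler fallback prompt respectively.

```typescript
// Generation error path: try the full prompt once; on any failure
// (timeout, rate limit, malformed response) retry with the simpler
// fallback prompt. If that throws too, the caller renders the
// friendly error with the "Try again" button.
async function generateWithFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>
): Promise<T> {
  try {
    return await primary();
  } catch {
    return await fallback();
  }
}
```

Keeping the fallback as a separate closure means the quiz responses are reused verbatim on retry, exactly as the "Try again" path does.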
Edge cases that surfaced during testing: what happens when a user selects a single-day trip? The AI correctly generated one day but the day selector UI showed "Day 1 of 1" which looked odd; we changed the logic to hide the selector entirely for single-day itineraries. What happens with a party requiring full wheelchair accessibility? We tested this path explicitly and confirmed the catalog filtering correctly excluded trails rated 1-2 on accessibility and the AI respected the constraint in its stop notes. What happens if the user submits the quiz extremely fast and the itinerary generation starts before the database write completes? We added a transaction to ensure the trip session exists before triggering generation.
Hour 10-12: Deployment and Handoff
By 7 PM, the product was functionally complete. The final two hours were deployment, smoke testing in production, and preparing the handoff package for Katrina.
The Railway deployment for the backend was straightforward: environment variables for the PostgreSQL connection string, OpenAI API key, and Mapbox token were set in the Railway dashboard. The Prisma migration ran cleanly against the production database. The NestJS app started and passed its health check endpoint. Total time from pushing the production branch to the backend being live: eleven minutes.
The Vercel deployment for the frontend required one adjustment: the Next.js configuration needed the NEXT_PUBLIC_API_URL environment variable set to the Railway backend URL, and the API URL needed to be in Vercel's allowed domains list for server actions. That was a two-minute configuration change. The Vercel preview deploy built in four minutes and the production deploy followed immediately.
Smoke testing in production covered the full user journey: quiz submission, itinerary generation with real OpenAI calls, map rendering with Mapbox against real Yellowstone coordinates, share link generation and access from a different browser session. One production-specific issue: the weather API calls from the Railway backend needed an explicit User-Agent header set to avoid being blocked by the NWS API's bot detection. The development environment had not hit this because we were making fewer requests; the production environment triggered it on the first call. Five-minute fix.
The handoff package delivered to Katrina at 9 PM included: the staging URL with test credentials, a 10-minute Loom walkthrough of the full user journey, a technical summary of the stack and architecture, and a roadmap of phase two features we had identified during the build (booking integration with park lodges, user accounts for saving itineraries, PDF export, and a guide-facing admin panel for updating the attraction catalog without code changes). The attraction catalog update workflow was specifically noted as a near-term priority — the park's seasonal operations change frequently, and a non-technical admin interface for keeping it current was important for the product's long-term accuracy.
What the Client Saw
Katrina accessed the staging URL at 9:15 PM. Her first message back was two sentences: "This is exactly what I was imagining. How fast can we go to production?"
The experience from her perspective: she entered a trip she was actually planning for later in the month — three adults, one with limited mobility, five days, primary interest in geysers and photography, staying near the west entrance. The quiz took about ninety seconds to complete. The loading screen ran for four seconds. The itinerary that appeared was genuinely useful: it sequenced Old Faithful for the early morning to beat the crowds, included Grand Prismatic with a note about the Mesa Falls Scenic Byway as an alternative for the mobility constraint, recommended Lamar Valley for the late afternoon golden hour photography, and flagged the afternoon thunderstorm forecast for day three with a suggested indoor alternative at the Yellowstone Heritage and Research Center.
This is the moment that matters in any product demo: when someone uses it with real data from their real life and it gives them something they would not have figured out themselves. That is the signal that the product is solving the actual problem, not just passing a technical demonstration.
The full case study with screenshots and outcome metrics from the production launch is in our Cody Yellowstone portfolio entry.
Technical Decisions That Mattered
Post-mortems are most useful when they are specific about which decisions mattered and why. These are the five decisions that most determined the outcome.
Turborepo for the monorepo. The alternative was separate repositories for frontend and backend. Separate repositories would have required the frontend and backend agents to coordinate on API contracts through an external channel (a shared document or constant communication) rather than sharing type definitions from a common packages/types directory. The shared types package meant both agents were always working against the same interface definitions, and interface mismatches surfaced at compile time rather than runtime. In a 12-hour build, every hour of debugging prevented is significant.
Mapbox over Google Maps. Google Maps was the initial instinct — more familiar, more documentation. We chose Mapbox for two reasons. First, Mapbox GL JS allows fully custom map styles, and we wanted the map to feel like a product feature rather than an embedded widget. The custom style matched the product's visual language in ways that the Google Maps API does not support without significant additional work. Second, Mapbox's React integration is cleaner for dynamic data — the component model maps more naturally to React's rendering behavior than the imperative Google Maps API. The tradeoff is that Mapbox is less familiar to most engineers, which increases initial setup time. In this case, the initial setup was handled by the Mapbox specialist agent and the time tradeoff was net positive.
Next.js 15 App Router. We could have used Pages Router, which is more established and has more existing examples for the patterns we needed. We chose App Router because React Server Components allowed us to fetch the initial itinerary data server-side on the results page, which eliminated a loading flash on navigation and improved the page's Core Web Vitals scores. The App Router's streaming support also allowed the map and the itinerary sidebar to hydrate incrementally rather than waiting for all data to be ready — the user sees the map with pins before the sidebar prose is fully loaded. These are not dramatic differences, but they compound into a noticeably more polished experience.
Structured outputs for OpenAI. The alternative is prompt-engineering your way to consistent JSON output — asking the model to "respond only in JSON format" and parsing the result. This works until it does not: a long generation occasionally breaks the JSON structure, or the model wraps the response in a markdown code block, or the schema drifts slightly in a way that breaks the downstream parser. With OpenAI's structured outputs feature (available on GPT-4o), the model is constrained at the API level to produce output matching the provided JSON schema. The failure mode shifts from "unparseable response at runtime" to "API error at call time," which is dramatically easier to handle. For a build where itinerary generation is the core value, this reliability improvement was non-negotiable.
Pre-seeded attraction catalog. The most tempting shortcut was to skip the catalog and let the AI generate all attraction data from its training knowledge. This would have been faster to build and catastrophically worse in output quality. LLMs hallucinate geographic data confidently. Without a ground-truth coordinate database, the map would have pins in plausible but wrong locations. Without a structured catalog with seasonal flags and accessibility ratings, the AI would produce itineraries that occasionally recommended trails closed for bear management or suggested a geyser viewing platform that had been rebuilt since the training cutoff. The catalog took three hours to populate correctly. Those three hours bought us a product that the client could actually use for real visitors without embarrassment.
Lessons Learned
We do these post-mortems after every fast-turnaround build, and the lessons accumulate into better architecture decisions on the next one. Here is what this build taught us or reinforced.
What worked perfectly. The parallel agent architecture — four agents building independent components simultaneously — is the single biggest accelerant in a compressed timeline. The key enabler is the architecture phase: if you have not defined the interfaces between components clearly before agents start building, parallel work converges into a coordination nightmare. Clear interfaces, shared type definitions, and coordination checkpoints every 90 minutes are the operational model that makes parallel work safe.
The decision to build the Mapbox component against a mock fixture from the start was correct. Building UI components against mocked data rather than waiting for the real API means frontend work never blocks on backend work. The mock exactly matched the real data schema because both were derived from the same shared type definitions in the monorepo. This is the value of types as coordination artifacts, not just type safety.
The prompt architecture defined in the first two hours held for the entire build. We did not need to significantly revise the system prompt after the first test generation. This is unusual; most AI-integrated builds require multiple rounds of prompt iteration. The reason it worked first time is that we spent forty minutes thinking through the constraints before writing any prompt text, rather than starting with something vague and iterating toward specificity.
What we would do differently. The attraction catalog validation step — checking that all coordinates fall within valid geographic bounds — should have been automated from the start rather than added after we found the two inverted coordinates. A simple validation script run as a post-seed step would have caught the error in minutes rather than requiring us to spot it in the map output. We have added this to our standard catalog seeding template for future builds.
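The validation we retrofitted can be sketched as a post-seed script. The bounding-box values below are approximate, and the function name is ours.

```typescript
// Post-seed sanity check: every catalog coordinate must fall inside an
// (approximate) Yellowstone-region bounding box. A swapped lat/lon pair
// fails immediately, because -110.x is not a valid park latitude.
const BOUNDS = { minLat: 44.0, maxLat: 45.2, minLon: -111.2, maxLon: -109.8 };

function validateCoordinates(
  rows: { id: string; lat: number; lon: number }[]
): string[] {
  return rows
    .filter(
      (r) =>
        r.lat < BOUNDS.minLat || r.lat > BOUNDS.maxLat ||
        r.lon < BOUNDS.minLon || r.lon > BOUNDS.maxLon
    )
    .map((r) => r.id); // ids that need correction
}
```

Run after seeding, this turns the "pins in the ocean" class of bug into a failing build step instead of a visual surprise.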
The NWS API User-Agent requirement should have been caught in the production readiness checklist. We have added "verify all third-party API calls include appropriate headers for production environments" to the deployment smoke test protocol. It was a five-minute fix, but a five-minute fix that should never have reached production in the first place.
Finally: the sharing feature was more minimal than it should have been. A shareable URL that resolves to a read-only view of a generated itinerary works, but users naturally want to create variations — "show me the same trip but with a more relaxed pace" — without rerunning the full quiz. A "remix" feature that pre-populates the quiz with the current trip's parameters and allows modification would significantly increase the product's utility and shareability. It is on the phase two roadmap now, but it could have been scoped in at hour zero with an hour of additional implementation time. The lesson is that share patterns should be designed at architecture time, not treated as a simple link-generation feature.
The full technical architecture, including the database schema and the OpenAI prompt templates, is available on request for teams building similar products. We built OneSpark to make this kind of AI-accelerated development accessible beyond one-off projects. If you have a product concept you want to see on a staging URL in days rather than months, start with our MVP development service. The brief takes fifteen minutes. The first staging URL takes less time than you think.
For a deeper look at the methodology behind this build, read our posts on what AI-orchestrated development actually means and how AI agents build software — both of which describe the operational model we used on Cody Yellowstone in more technical detail.