The Wrong Way to Add AI
Most teams reach for a new AI feature the same way they'd bolt on a third-party widget: drop in an API call, get the result, render it. It works in demos. It fails in production.
The problem isn't the AI — it's the architecture around it. When an LLM is treated as a synchronous utility function, three things predictably go wrong:
**Latency kills UX.** A call to the Claude API or GPT-4 typically takes 2–8 seconds. Blocking your UI on that response destroys the perception of speed your product has spent months building.
**Errors are invisible.** AI APIs return server errors (5xx), rate-limit errors (429), and malformed outputs often enough that you have to plan for them. If you don't design failure paths, your users see broken states.
**Cost is uncontrolled.** Token consumption scales with usage in ways that surprise everyone the first time. Without usage tracking, a single power user can triple your monthly bill.
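For the error problem above, one way to design the failure path is to retry transient failures with exponential backoff and fall back to a safe default once retries run out. This is a provider-agnostic sketch under stated assumptions: errors are assumed to carry a numeric HTTP `status` field, 429 and 5xx are treated as retryable, and `withRetry` and `callWithFallback` are hypothetical helpers, not part of any SDK. Official SDK clients may already retry some of these errors on their own, so check their configuration before layering this on top.

```typescript
// Hypothetical helpers: provider-agnostic retry with backoff, plus a fallback.
// Assumption: errors thrown by the AI client expose an HTTP status as `status`.

type RetryOptions = {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number; // delay before the first retry; doubles on each retry
  sleep?: (ms: number) => Promise<void>; // injectable so tests don't wait
};

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Treat rate limits (429) and server errors (5xx) as transient.
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown })?.status;
  return typeof status === "number" && (status === 429 || status >= 500);
}

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? defaultSleep;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-transient errors or when attempts are exhausted.
      if (attempt >= opts.maxAttempts || !isRetryable(err)) throw err;
      // Exponential backoff with jitter: base * 2^(attempt-1), scaled into [0.5x, 1x).
      const delay = opts.baseDelayMs * 2 ** (attempt - 1) * (0.5 + Math.random() / 2);
      await sleep(delay);
    }
  }
}

// If the AI path fails for any reason, return a safe default instead of a broken state.
async function callWithFallback<T>(
  fn: () => Promise<T>,
  fallback: T,
  opts: RetryOptions,
): Promise<T> {
  try {
    return await withRetry(fn, opts);
  } catch {
    return fallback;
  }
}
```

Client errors such as a 400 fail fast instead of being retried, since sending the same bad request again will not help.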
The Pattern That Works
After integrating AI features into RoofMarshal and several client projects, I've converged on a pattern that holds up in production.
1. Async by default
Never make AI calls synchronous in the user's request path. Instead, trigger them as background jobs and stream or poll for results.
```typescript
// ❌ Synchronous: the request blocks until the model finishes
const message = await anthropic.messages.create({ ... });
const block = message.content[0];
return Response.json({ summary: block.type === "text" ? block.text : "" });

// ✅ Async: enqueue the work and return a job ID immediately
const jobId = await enqueueAIJob({ type: "summarize", input });
return Response.json({ jobId });
```

In Next.js, this means your Server Actions or API routes hand off to a queue (I use Upstash QStash), and the UI polls or subscribes to updates.
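A minimal sketch of the enqueue-then-poll flow, assuming an in-memory `Map` as a stand-in for a real queue and job store. This `enqueueAIJob`, along with `runWorker`, `getJobStatus`, and the `AIJob` shape, is a hypothetical implementation for illustration. In production the worker would run outside the request handler, for example in an endpoint that the queue delivers messages to, and job records would live in a durable store such as Redis or Postgres.

```typescript
// Hypothetical in-memory job store standing in for a real queue + database.
type JobStatus = "queued" | "running" | "done" | "failed";

interface AIJob {
  id: string;
  type: "summarize";
  input: string;
  status: JobStatus;
  result?: string;
  error?: string;
}

const jobs = new Map<string, AIJob>();
const pending: string[] = []; // FIFO of job IDs waiting for a worker
let nextId = 1;

// Called from the request handler: record the job and return immediately.
async function enqueueAIJob(req: { type: "summarize"; input: string }): Promise<string> {
  const id = `job_${nextId++}`;
  jobs.set(id, { id, ...req, status: "queued" });
  pending.push(id);
  return id;
}

// Runs outside the request path; `callModel` stands in for the real LLM call.
async function runWorker(callModel: (input: string) => Promise<string>): Promise<void> {
  let id: string | undefined;
  while ((id = pending.shift()) !== undefined) {
    const job = jobs.get(id)!;
    job.status = "running";
    try {
      job.result = await callModel(job.input);
      job.status = "done";
    } catch (err) {
      // Record the failure so the UI can show a fallback instead of spinning forever.
      job.status = "failed";
      job.error = err instanceof Error ? err.message : String(err);
    }
  }
}

// What the UI polls, e.g. from a hypothetical GET /api/jobs/:id route.
function getJobStatus(id: string): Pick<AIJob, "status" | "result" | "error"> | undefined {
  const job = jobs.get(id);
  return job && { status: job.status, result: job.result, error: job.error };
}
```

The handler's only work is recording the job and returning its ID, which is what keeps the response fast; the trade-off is that the client now needs a polling or subscription loop.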
2. Typed prompt functions
Treat every LLM interaction as a typed function — defined input, defined output schema, validation on both ends.
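```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Lead, buildLeadScoringPrompt, and parseLeadScore are app-specific and defined elsewhere.
async function scoreLead(lead: Lead): Promise<{ score: number; reason: string }> {
  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 256,
    messages: [{ role: "user", content: buildLeadScoringPrompt(lead) }],
  });
  const block = response.content[0];
  const text = block?.type === "text" ? block.text : "";
  return parseLeadScore(text); // validates structure, provides defaults
}
```

Because the model call sits behind a typed boundary, the integration is easy to test, easy to mock, and easy to debug when it goes wrong.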
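`parseLeadScore` is where output validation happens. A minimal sketch, assuming the prompt asks the model to reply with a JSON object like `{"score": 72, "reason": "..."}` on a 0 to 100 scale; the output contract, the scale, and the fallback values are all hypothetical. It tolerates surrounding prose by extracting the outermost `{...}` span, clamps the score into range, and returns a conservative default (score 0) instead of throwing.

```typescript
// Hypothetical output contract: the prompt asks for {"score": <0-100>, "reason": "<text>"}.
interface LeadScore {
  score: number;
  reason: string;
}

// Conservative default used when the model output can't be parsed.
const DEFAULT_LEAD_SCORE: LeadScore = { score: 0, reason: "unscored: model output was invalid" };

function parseLeadScore(text: string): LeadScore {
  // Models sometimes wrap JSON in prose or code fences; take the outermost {...} span.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return DEFAULT_LEAD_SCORE;

  let data: unknown;
  try {
    data = JSON.parse(text.slice(start, end + 1));
  } catch {
    return DEFAULT_LEAD_SCORE;
  }
  if (typeof data !== "object" || data === null) return DEFAULT_LEAD_SCORE;

  // Validate field types before trusting them.
  const { score, reason } = data as Record<string, unknown>;
  if (typeof score !== "number" || !Number.isFinite(score)) return DEFAULT_LEAD_SCORE;

  return {
    // Round to an integer and clamp into the expected 0..100 range.
    score: Math.min(100, Math.max(0, Math.round(score))),
    reason: typeof reason === "string" && reason.trim() !== "" ? reason.trim() : "no reason given",
  };
}
```

Because the parser never throws, `scoreLead` always returns a well-formed value, and the parser can be unit-tested against recorded model outputs without calling the API.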
3. Explicit cost controls
Every AI call should have token limits. Every feature should have usage caps. Track consumption at the user/tenant level from day one — retrofitting this later is painful.
```typescript
const response = await anthropic.messages.create({
  model: "claude-haiku-4-5-20251001", // cheaper model for simple tasks
  max_tokens: 512, // hard cap on output tokens; never let this float
  ...
});
```

Match the model to the complexity of the task. Not everything needs Opus.
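For per-tenant tracking, a minimal sketch, assuming a monthly token budget per tenant and an in-memory `Map` in place of a real database. `UsageTracker`, `checkBudget`, and `record` are hypothetical names, and the budget is a placeholder. The token counts would come from the `usage` object the Anthropic Messages API returns (`input_tokens`, `output_tokens`); since `max_tokens` caps only the output, input tokens have to be counted separately.

```typescript
// Mirrors the token fields of the Messages API `usage` object.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Hypothetical per-tenant token budget, reset each calendar month (UTC).
class UsageTracker {
  private readonly totals = new Map<string, number>(); // key: "<tenantId>:<YYYY-MM>"
  private readonly monthlyTokenBudget: number;

  constructor(monthlyTokenBudget: number) {
    this.monthlyTokenBudget = monthlyTokenBudget;
  }

  private key(tenantId: string, now: Date): string {
    return `${tenantId}:${now.toISOString().slice(0, 7)}`; // e.g. "acme:2025-01"
  }

  used(tenantId: string, now = new Date()): number {
    return this.totals.get(this.key(tenantId, now)) ?? 0;
  }

  // Call before the AI request: refuse if the worst case (estimated input
  // plus the max_tokens cap) would push the tenant over budget.
  checkBudget(tenantId: string, worstCaseTokens: number, now = new Date()): boolean {
    return this.used(tenantId, now) + worstCaseTokens <= this.monthlyTokenBudget;
  }

  // Call after the AI request with the usage the API reported.
  record(tenantId: string, usage: Usage, now = new Date()): void {
    const k = this.key(tenantId, now);
    this.totals.set(k, (this.totals.get(k) ?? 0) + usage.input_tokens + usage.output_tokens);
  }
}
```

Checking the worst case before the call, rather than only recording afterwards, is what stops one tenant from overshooting. Converting tokens to dollars is then a per-model price lookup, left out here because prices change.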
What to Build First
If you're adding your first AI feature, pick one that is:
- **Non-blocking** — the user doesn't wait for it in real-time (suggestions, summaries, scoring)
- **Recoverable** — the product still works if the AI fails (it's enhancement, not core flow)
- **Measurable** — you can tell if it's actually helping users
Lead scoring, content summarization, and "smart defaults" in forms all fit this pattern. Real-time chat-style features are harder to get right and should wait until you understand your AI infrastructure.
The Infrastructure You Need
Before shipping AI in production, you need:
- **Queue** for async jobs (Upstash QStash, Inngest, or similar)
- **Logging** for every AI call — inputs, outputs, latency, tokens, cost
- **Rate limiting** per user to prevent runaway cost
- **Fallback behavior** when the AI is unavailable
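For the logging requirement, a minimal sketch of a wrapper that records input, output, latency, token counts, and an estimated cost for every call, including failed ones. It assumes the wrapped call resolves to an object with an Anthropic-style `usage` field; `loggedAICall`, `AICallLog`, and `ModelPrice` are hypothetical, and the per-million-token prices are inputs you supply from your provider's current pricing.

```typescript
// Hypothetical structured log record for one AI call.
interface AICallLog {
  feature: string;
  model: string;
  input: string;
  output?: string;
  error?: string;
  latencyMs: number;
  inputTokens: number;
  outputTokens: number;
  estimatedCostUsd: number;
}

// Caller-supplied prices in USD per million tokens.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

interface AIResult {
  text: string;
  usage: { input_tokens: number; output_tokens: number };
}

async function loggedAICall(
  meta: { feature: string; model: string; input: string; price: ModelPrice },
  call: () => Promise<AIResult>,
  log: (entry: AICallLog) => void = (e) => console.log(JSON.stringify(e)),
): Promise<AIResult> {
  const started = Date.now();
  const base = { feature: meta.feature, model: meta.model, input: meta.input };
  try {
    const result = await call();
    const { input_tokens, output_tokens } = result.usage;
    log({
      ...base,
      output: result.text,
      latencyMs: Date.now() - started,
      inputTokens: input_tokens,
      outputTokens: output_tokens,
      estimatedCostUsd:
        (input_tokens * meta.price.inputPerMTok + output_tokens * meta.price.outputPerMTok) / 1_000_000,
    });
    return result;
  } catch (err) {
    // Failed calls are logged too; token counts are unknown here, so they are recorded as 0.
    log({
      ...base,
      error: err instanceof Error ? err.message : String(err),
      latencyMs: Date.now() - started,
      inputTokens: 0,
      outputTokens: 0,
      estimatedCostUsd: 0,
    });
    throw err;
  }
}
```

Routing every call site through one wrapper gives a single place to enforce the log schema; in production the `log` callback would write to your observability pipeline instead of stdout.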
None of this is glamorous. All of it is necessary.
The teams that ship reliable AI features aren't doing anything magical — they're just treating LLM calls with the same rigor they'd apply to a database query.