Ishmael McCalla Back to portfolio
Case study · Conversation design + Production AI

AI agents that behave like good employees.

I build production AI chatbots end-to-end. Design, prompts, code, deploy. Two live agents for CoverTurn shown here. A chatbot that captures leads after hours, on Claude Haiku and Cloudflare Workers. An autonomous outbound voice agent on Retell. Both designed against WCAG 2.2 AA, Nielsen's heuristics, and the Cooperative Principle.

Role
Conversation designer + builder
Stack
Claude, Retell, Cloudflare
Surfaces
Chat widget, phone, webhook
Status
Live in production
Agent 01 · Customer-facing chatbot

Captures leads while the owner is asleep.

Hey, I'm CoverTurn's assistant. Can I ask what kind of business you run?
Mobile mechanic in Brixton.
Got it. Want a quick site preview I can email over?
Agent 02 · Outbound voice agent

Runs sales calls end-to-end.

+1 (646) 681 5650
Live
01The brief

Two agents. Both customer-facing. Both have to feel honest.

One inbound chatbot to capture leads from the website at any hour. One outbound voice agent to run calls and book demos without me on the line. Both customer-facing, both representing the brand.

The constraint that shaped both was trust. They cannot oversell, cannot lie about what they are, cannot loop, cannot embarrass. Designed against the same standards as a human employee, and built so the underlying model can be swapped without rewriting the conversation.

02Approach

Treat each agent like a junior employee.

Grounded in the conversation design canon (Hall, Pearl, Grice), accessibility for non-visual interfaces (WCAG 2.2 AA, ARIA live regions, focus management), and Gong's analysis of more than three hundred million cold calls. Cross-checked against transcripts from my own early test runs.

Working theory: give every agent a script, a voice, and a clear set of rules about when to escape. The rest is craft.

03Agent 01: The chatbot

Designed for after hours, scoped against Nielsen.

A small orange widget in the bottom-right of every CoverTurn site, running on Claude Haiku. One job: capture a lead, qualify lightly, hand off by email. Built against Nielsen H1, H5, H6, H7, H10 and WCAG 2.2 AA.

Chatbot · standards applied
WCAG 2.2 AA Nielsen H1, H5, H6, H7, H10 ARIA live regions Focus trap Keyboard-first Prefers-reduced-motion 4.5:1 contrast 44px hit targets

Persona rules. Forward-looking framing only. One or two sentences per turn. Repair on off-script input. Honest disclosure on any "are you a bot" question. Nothing about the agent is allowed to deceive a user about what it is.

The configuration that ships

ModelClaude Haiku, tuned for fast structured intake
Latency targetSub-second first token, <2s full response
StreamingServer-sent events, ARIA live region polite
PersistenceSession-only. No PII stored client-side.
Hand-offEmail summary to owner via Cloudflare Worker
Failure modeAlways offer the email and phone number as fallback

Hosted on Cloudflare on the same domain as the site to keep latency low. Every conversation is summarised into a short email to the owner. From the owner's perspective: wake up, find "a lead came in at 11pm, here's what they wanted, here's their number".

04Agent 02: The voice agent

An outbound caller that talks like a person.

Calls real prospects, books demos onto Cal.com. No UI, no buttons. If the first three seconds sound like a script, the prospect hangs up. Voice UX is mostly latency, prosody, and turn-taking, and little else.

First script was an information-delivery pitch. It went nowhere. Real calls don't work like pitches; they work like conversations.

The 4-yes Socratic flow

The version in production (v19) is built on a four-yes structure: Cialdini commitment, SPIN situation and problem questions, Sandler pain funnel compressed for cold-call attention spans.

Permission-based opener, two discovery questions each followed by a mirror-back yes, then a qualifying yes, then the demo offer. Four yeses before the ask.

Agent: Hey, is this Mike?
Mike: Yeah, who's this?
Agent: I'll be honest, this is a cold call. Want to hang up or give me 30 seconds?Permission-based opener.
Mike: Go on then.
Agent: Quick one. When was your website built?SPIN situation question.
Mike: Maybe four years ago.
Agent: Got it, so it's been a few years. Where do most of your jobs come from now?Mirror-back, then discovery 2.
Mike: Mostly word of mouth.
Agent: So mostly word of mouth, yeah? Are you looking to get more leads from Google right now?Qualifying yes.

Settings shipped. Latency: 500–750ms end-to-end, inside the human-pacing band. Interruption sensitivity at maximum, so the agent waits rather than talks over real prospects. A separate inbound agent handles callbacks with a context-aware greeting.

Voice agent · standards applied
Cooperative Principle (Grice) Cialdini commitment + consistency SPIN situation + problem Sandler pain funnel (compressed) Permission-based opener (Gong data) Forward-looking framing Latency <800ms target AI disclosure on direct ask One ask per turn
05The vibe-coding workflow

Six steps. Disciplined, not improvised.

01

Read the canon and the user data

Conversation design research for the agent, real transcripts and chat logs for the user. Both, every time.

02

Spec before prompt

Persona, voice, openings, escape hatches, failure modes, a11y floor, latency budget. Written as a brief.

03

Pair with Claude in Cursor

Spec in, working code out. The designer's job becomes editing for taste, a11y, and brand voice.

04

Test live

Five live calls or twenty real chats teach more than fifty internal tests. Read every conversation end-to-end.

05

Version like code

Nineteen voice agent versions in two weeks. One named change, one hypothesis per version.

06

Audit every release

Forward-looking framing, locale, a11y floor. Copy audit before every deploy. Regressions treated like bugs.

06Outcomes

Two agents in production. Real conversations, real data.

2
production agents shipped: chatbot and outbound voice
19
script versions, each with a single named change and a single hypothesis
500–750ms
end-to-end voice latency, inside the human-pacing band
Sub-1s
first-token response on the chatbot via Claude Haiku
AA
WCAG 2.2 floor met across both agent surfaces
3
conversation design frameworks woven through: Grice, Cialdini, SPIN
07Reflection

Conversation design is product design without the UI.

Same heuristics, same accessibility floor, same craft. Every turn the agent takes is a layout decision in time instead of in space. AI-native design isn't faster design at the same level of taste; it's the same speed of design at a higher level, because the time saved on boilerplate gets reinvested in research, conversation review, and audits.

See the chatbot in action.
Open it ↗ Get on a call →