By Dan Hnatkovskyy in AI for Builders — Jul 3, 2026

Voice AI Latency in Home Builder Sales

A buyer says hello. There's a half-second pause. Then the AI responds.

That half-second is the difference between a buyer who stays on the line and a buyer who hangs up.

Latency is the most underrated variable in voice AI evaluation. It doesn't show up in vendor decks. It rarely makes it onto an RFP. And it determines, more than almost any other technical spec, whether your buyers will treat the AI conversation as a real conversation.

This post is the definitional piece. What latency is, where it comes from, what's "normal" in 2026, and how to actually test it before you sign a vendor agreement.

TL;DR — what to remember

Latency is the time from the buyer finishing their sentence to the AI's first audible word back.
Under 500ms feels normal. Most buyers don't notice anything.
500-800ms is the awkward zone. Buyers feel it. Some hang up.
Over 1 second feels broken. You'll see contact-rate drops in the data.
Latency varies by call segment, time of day, network, and prompt complexity. Don't trust a single number.
Test it on a real call, not on the vendor's demo. Demos are tuned to look fast.

What latency actually is in a voice AI call

When a buyer speaks, six things have to happen before the AI's voice comes out of their phone:

The audio travels from the buyer's phone to the carrier network
The carrier hands it off to the AI vendor's call infrastructure
The vendor's speech-to-text (STT) layer transcribes the audio
The language model interprets the transcribed text and generates a response
The text-to-speech (TTS) layer turns the response into audio
The audio travels back to the buyer

The latency the buyer experiences is the sum of all six steps. The closest telephony benchmark is mouth-to-ear delay, the one-way time for speech to travel from one caller to the other, and the international telecommunications standard (ITU-T G.114) has used roughly 150ms one-way as the threshold for "no perceptible delay" in human-to-human calls for decades.

In a human-to-human call, the network round-trip is the dominant cost. In a human-to-AI call, the STT + LLM + TTS pipeline adds 300-1500ms on top of the network. That's the variable that matters.
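
As a rough sketch of how the six steps add up, the budget below sums per-stage timings for a single turn. The stage names and millisecond values are illustrative assumptions, not measurements from any vendor; only the six-step structure and the 300-1500ms pipeline range come from the text above.

```python
# Hypothetical per-stage timings (ms) for one conversational turn.
# The numbers are illustrative assumptions, not vendor measurements.
STAGE_MS = {
    "phone_to_carrier": 40,
    "carrier_to_vendor": 30,
    "speech_to_text": 150,
    "llm_response": 300,
    "text_to_speech": 120,
    "audio_back_to_buyer": 60,
}

AI_PIPELINE_STAGES = ("speech_to_text", "llm_response", "text_to_speech")


def total_latency_ms(stages: dict) -> int:
    """Latency the buyer feels: the sum of every stage in the turn."""
    return sum(stages.values())


def pipeline_latency_ms(stages: dict) -> int:
    """Portion added by the STT + LLM + TTS pipeline on top of the network."""
    return sum(stages[name] for name in AI_PIPELINE_STAGES)


total = total_latency_ms(STAGE_MS)
pipeline = pipeline_latency_ms(STAGE_MS)
print(f"total: {total}ms, AI pipeline: {pipeline}ms, network: {total - pipeline}ms")
```

With these assumed numbers the AI pipeline accounts for 570ms of a 700ms turn, which is why the pipeline, not the network, is the figure to press vendors on.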

What's "normal" in 2026

The numbers below are based on hands-on testing of voice AI vendors serving builder, dealer, and contact-center markets in the first half of 2026. They are directional, not benchmarked.

Tier	Typical latency	What it feels like
Best in class	250-450ms	Indistinguishable from human
Strong	450-700ms	Slight pause, generally acceptable
Average	700-1,000ms	Noticeable. Buyers may interrupt.
Weak	1,000-1,500ms	Awkward. Hang-up risk increases.
Broken	1,500ms+	Conversation falls apart.
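
If you are logging test-call timings, a small helper can label each one with the tier from the table. This is a minimal sketch: the tier names and bounds come from the table, while two choices are assumptions made here. A value on a shared boundary (such as 450ms) is placed in the slower tier, and anything under 250ms is treated as best in class.

```python
from bisect import bisect_right

# Upper bounds (ms) and tier names taken from the table above.
# Assumption: a value on a shared boundary (e.g. 450ms) falls into the slower tier.
TIER_BOUNDS_MS = [450, 700, 1000, 1500]
TIER_NAMES = ["Best in class", "Strong", "Average", "Weak", "Broken"]


def latency_tier(latency_ms: float) -> str:
    """Map a measured latency to the tier label from the table."""
    return TIER_NAMES[bisect_right(TIER_BOUNDS_MS, latency_ms)]


for sample_ms in (320, 450, 900, 1500):
    print(sample_ms, latency_tier(sample_ms))
```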

Two things move latency up: complex turns (the AI has to retrieve information from a CRM or knowledge base mid-call) and longer responses (TTS for a 5-second answer takes longer to start than TTS for a 1-second one, unless the audio is streamed as it is synthesized).

Why this matters for builder sales calls

Most builder calls are short and pattern-driven. A first-touch call lasts 2-4 minutes. The buyer is probably driving, cooking dinner, or watching their kids. They're impatient.

The cost of a 1-second delay isn't just one awkward moment. It compounds:

The buyer interrupts the AI more often, breaking the flow.
Hang-up rates rise. Some buyers think the call dropped.
Trust erodes. The buyer assumes the AI is "thinking too hard" and stops giving useful information.

In one test, the difference between a 450ms and 900ms response time on a first-touch call moved completion rate from 78% to 61% on the same script with the same buyer audience. That's a meaningful pipeline difference.
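
To put that gap in pipeline terms, the arithmetic below applies the two completion rates to a call volume. The 1,000-call volume is a hypothetical figure chosen only to make the math concrete; the 78% and 61% rates are the ones from the test above.

```python
def completed_calls(total_calls: int, completion_pct: int) -> int:
    """Calls completed at a given completion rate (whole percent)."""
    return total_calls * completion_pct // 100


MONTHLY_CALLS = 1000  # hypothetical volume, chosen only to make the math concrete
FAST_PCT, SLOW_PCT = 78, 61  # completion rates at 450ms and 900ms from the test above

fast = completed_calls(MONTHLY_CALLS, FAST_PCT)
slow = completed_calls(MONTHLY_CALLS, SLOW_PCT)
relative_drop_pct = round((FAST_PCT - SLOW_PCT) / FAST_PCT * 100, 1)

print(f"{fast} vs {slow} completed calls: {fast - slow} fewer, a {relative_drop_pct}% relative drop")
```

On that assumed volume, the slower response costs 170 completed calls, a drop of 17 percentage points or roughly 22% in relative terms.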

Where latency comes from (and what each vendor controls)

When you ask a vendor about latency, the answer should walk through these four layers honestly:

Network. The vendor doesn't fully control this. Carrier handoffs, regional connectivity, and your buyer's cell signal all matter. A good vendor has tier-1 carrier relationships that minimize the variable.

STT (speech-to-text). Streaming STT (transcribing as the buyer speaks) is faster than batched STT (waiting for the buyer to stop). All modern voice AI is streaming. If a vendor is still batching, that's a major flag.

LLM inference. This is the biggest variable. Model size, context length, hosting region, and prompt complexity all factor in. Vendors who host models close to call infrastructure have the lowest latency.

TTS (text-to-speech). Streaming TTS (start playing audio while the rest of the sentence is still synthesizing) is faster than batched TTS. Modern voice AI is streaming TTS too.
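
The difference between batched and streaming TTS can be seen with a simple timing model. This is a sketch under stated assumptions, not a description of any vendor's stack: the synthesis cost (0.1ms of compute per millisecond of audio) and the 200ms first chunk are made-up values.

```python
from typing import Optional


def time_to_first_audio_ms(
    response_audio_ms: int,
    synth_cost_per_audio_ms: float,
    first_chunk_ms: Optional[int] = None,
) -> int:
    """Estimate how long the buyer waits before TTS playback can start.

    Batched TTS (first_chunk_ms=None) synthesizes the whole response first.
    Streaming TTS only has to synthesize the first chunk before playing it.
    """
    if first_chunk_ms is None:
        audio_needed_ms = response_audio_ms
    else:
        audio_needed_ms = min(first_chunk_ms, response_audio_ms)
    return round(audio_needed_ms * synth_cost_per_audio_ms)


SYNTH_COST = 0.1  # hypothetical: 0.1ms of compute per ms of audio
FIRST_CHUNK_MS = 200  # hypothetical streaming chunk size

for answer_ms in (1000, 5000):
    batched = time_to_first_audio_ms(answer_ms, SYNTH_COST)
    streaming = time_to_first_audio_ms(answer_ms, SYNTH_COST, FIRST_CHUNK_MS)
    print(f"{answer_ms}ms answer: batched {batched}ms, streaming {streaming}ms")
```

In this model the batched wait grows with the length of the answer, while the streaming wait stays flat because only the first chunk has to be ready.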

The good news for builders running existing CRM stacks: AI sales extensions like Jome integrate with Lasso, BuilderTrend, Salesforce, and Pipedrive without adding network latency to the voice path — the CRM lookup happens on a parallel thread, not in the buyer's call. The voice loop stays fast; the data write happens in the background.
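
The general pattern of keeping a data write off the voice path can be sketched as follows. This is not Jome's implementation: it swaps the "parallel thread" for an asyncio background task, and `write_to_crm` is a hypothetical stand-in that sleeps instead of calling a real CRM.

```python
import asyncio


async def write_to_crm(note: str, log: list) -> None:
    """Stand-in for a slow CRM API call (hypothetical; no real CRM client is used)."""
    await asyncio.sleep(0.05)
    log.append(f"crm_saved: {note}")


async def handle_turn(buyer_text: str, log: list) -> asyncio.Task:
    """Reply to the buyer right away and let the CRM write finish in the background."""
    crm_task = asyncio.create_task(write_to_crm(buyer_text, log))
    log.append("reply_sent")  # the voice loop does not wait for the CRM
    return crm_task


async def demo() -> list:
    log = []
    crm_task = await handle_turn("interested in lot 14", log)
    await crm_task  # awaited here only so the demo can show the final order of events
    return log


print(asyncio.run(demo()))
```

The reply is logged before the CRM write completes, which is the property to ask a vendor to demonstrate.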

How to actually test latency before you buy

Three tests, all of which a serious vendor will agree to.

Test 1 — the cold call test. Have the vendor call a phone number you control. Time the first response after you say hello. Repeat 5 times. The median should be under 700ms. If it's higher, ask why.
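
One way to score the cold call test is to mark two moments on each recording and take the median of the gaps. The five pairs of timestamps below are hypothetical; the 700ms target is the one from Test 1.

```python
from statistics import median

# Hypothetical marks from five test calls, in seconds from the start of each recording:
# (moment you finished saying hello, moment the AI's first audio was heard).
COLD_CALL_MARKS = [(1.20, 1.78), (1.05, 1.66), (1.31, 1.93), (0.98, 1.95), (1.12, 1.71)]
MEDIAN_TARGET_MS = 700  # threshold from Test 1


def turn_latency_ms(speech_end_s: float, audio_start_s: float) -> int:
    """End of buyer speech to start of AI audio, in whole milliseconds."""
    return round((audio_start_s - speech_end_s) * 1000)


latencies = [turn_latency_ms(end, start) for end, start in COLD_CALL_MARKS]
median_ms = median(latencies)
print(latencies, median_ms, "pass" if median_ms < MEDIAN_TARGET_MS else "ask why")
```

In this made-up run one call took 970ms, but the median is 610ms, so the test passes. The slow call is still worth asking about.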

Test 2 — the mid-call interrupt. Mid-conversation, ask a question that requires the AI to look up information ("what's the lot premium on lot 14 at the Cedar Park community?"). Time the response. This is the worst case — STT + LLM + retrieval + TTS. Should be under 1.2 seconds.

Test 3 — the load test. Have the vendor run 5 simultaneous calls to your test number. Latency on call 5 should be the same as on call 1. If it isn't, you're seeing infrastructure constraints that will hit you on a real launch day.
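
A simple way to judge the load test is to compare every simultaneous call against call 1. The sketch below assumes a 10% tolerance, since no two calls will ever measure exactly the same; that tolerance and the sample numbers are assumptions, not part of the test as described.

```python
def load_test_passes(latencies_ms: list, tolerance: float = 0.10) -> bool:
    """True if no simultaneous call is slower than call 1 by more than the tolerance.

    The 10% default tolerance is an assumption; the test itself only says
    call 5 should match call 1.
    """
    ceiling_ms = latencies_ms[0] * (1 + tolerance)
    return all(ms <= ceiling_ms for ms in latencies_ms[1:])


steady = [600, 610, 590, 620, 640]     # hypothetical: call 5 within 10% of call 1
degrading = [600, 650, 720, 810, 950]  # hypothetical: latency climbs with load

print(load_test_passes(steady), load_test_passes(degrading))
```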

Don't accept "we run on great infrastructure" as an answer. Numbers, on calls you can hear.

Common mistakes when evaluating latency

Mistake 1 — testing on the vendor's demo URL. The demo is tuned for the demo. The test number you'll be assigned in production isn't always on the same infrastructure.

Mistake 2 — confusing latency with naturalness. A voice AI can have 300ms latency and still sound robotic, or 800ms latency and sound great. Test both separately. We covered the naturalness question in our earlier piece on whether AI voice still sounds robotic in new home sales.

Mistake 3 — only testing in your home market. If your communities span Phoenix, Austin, and Charlotte, test from cell numbers in all three. Regional latency varies more than vendors admit.

Mistake 4 — accepting an "average" without a distribution. "Average 600ms" can include a lot of 1,200ms outliers. Ask for the 95th percentile alongside the average, not the average alone.
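
A small made-up sample shows how an average can hide the outliers. The 20 latencies below are hypothetical, and the percentile uses the nearest-rank method, which is one common definition among several.

```python
import math
from statistics import mean, median


def percentile_nearest_rank(samples: list, pct: float) -> int:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample: 16 fast turns and 4 slow outliers.
latencies_ms = [450] * 16 + [1200] * 4

print(mean(latencies_ms), median(latencies_ms), percentile_nearest_rank(latencies_ms, 95))
```

The mean is 600ms and the median is 450ms, yet the 95th percentile is 1,200ms: one turn in five is in "awkward" territory, and neither of the first two numbers shows it.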

Mistake 5 — ignoring time-of-day effects. Some vendors run on shared infrastructure that gets congested during business hours. Test at 9 AM, 1 PM, 6 PM, and midnight.
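
Mistakes 3 and 5 combine into a test grid: every market in every time slot. The sketch below builds that grid from the markets and times named above; the dictionary layout is just one convenient way to hold it.

```python
from itertools import product

MARKETS = ["Phoenix", "Austin", "Charlotte"]       # markets named in Mistake 3
TIME_SLOTS = ["9 AM", "1 PM", "6 PM", "midnight"]  # slots named in Mistake 5

# Every market is paired with every time slot, so regional and
# time-of-day effects both show up in the results.
test_plan = [{"market": m, "time": t} for m, t in product(MARKETS, TIME_SLOTS)]

for cell in test_plan:
    print(f"{cell['market']:<10} {cell['time']}")
```

That comes to 12 test cells. Running several calls in each one gives you a per-cell distribution to compare rather than a single number.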

What this means for your team

Latency isn't the only thing that matters in voice AI evaluation, but it's the variable that most often surprises builders after they've signed. A vendor with great voice quality, great training data, and great CRM integration can still fail in production if their inference stack runs hot.

Add three latency tests to your RFP. Run them on real numbers, real cell signals, real times of day. The vendors who pass will be the ones whose pilots actually convert. (Bokka Group's broader analysis of how AI is changing builder marketing is a useful adjacent read on the category overall.)

FAQ

Is latency the same as response time? Roughly. Engineers split hairs about whether to measure mouth-to-ear, end-of-speech-to-first-syllable, or time-to-first-token. For builder evaluation, measure end-of-buyer-speech to start-of-AI-audio. That's what the buyer feels.

Does a slower model always mean better answers? No. The biggest models have higher latency, but answer quality on first-touch builder calls is roughly equivalent across tiers: the conversation patterns are simple enough that smaller, faster models hold their own. Test both.

Can latency be reduced after launch? Sometimes. Vendor-side caching, prompt optimization, and routing changes can shave 100-200ms. Big infrastructure changes (different STT vendor, different model, different region) take longer. If you're not happy with launch-day latency, ask what's possible in 30 days.

Should we run our own STT or rely on the vendor's? The vendor's. Running your own STT means another network hop, more complexity, and almost no upside. The exception is regulated industries with custom transcription needs, which is not the builder case.

Is latency related to TCPA compliance? Indirectly. The FCC's TCPA framework addresses consent and quiet hours, not technical performance, but a slow, broken-feeling call is more likely to draw a complaint, and complaints can expand into compliance reviews.

Next reads

Why Builders Miss 15-30% of Inbound Calls (and the Math on Recovery) — the volume side of the response-time problem
Builder Sales AI Myth-Busting: 7 Things Your VP Sales Probably Got Wrong — including the "AI sounds robotic" myth
Does AI Voice Still Sound Robotic in New Home Sales? — the naturalness side of the same conversation
AI for Home Builders Pillar — the category foundation

What to do Monday morning

Pick the voice AI vendor at the top of your shortlist. Email them and ask for a test number, then run the three tests above on a Tuesday afternoon and a Saturday morning. The 95th-percentile number is the one to compare against the next vendor on your list.

Or skip the vendor bake-off. Jome's voice AI runs sub-500ms on real builder calls, integrated with your CRM — see it live at ai.jome.com.
