May 13, 2026

AI voice chat sucks. This startup thinks it’s cracked it

Voice chatting with today’s AI can feel as stilted as an old-school CB radio exchange, where you’re forced to take turns as you talk.

“Hey ChatGPT, let’s talk about the movies! Over.”

“Sure Ben, what movie would you like to talk about? Over.” 

OK, so you don’t literally have to say “over” and “out” during voice chats with ChatGPT or Gemini, but that’s essentially what’s happening behind the scenes. 

In some ways, AI voice modes are even more limited than CB radio chats. Not only does the AI have to wait while you talk, it has no perception of anything else that’s going on while you’re speaking, including the passage of time. Similarly, when the AI speaks, it’s too busy generating its response to “think” of anything else. In other words, AI voice mode is just standard AI text chat with tacked-on voices. Hence, I barely ever use it.
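To make the “tacked-on voices” point concrete, here is a minimal sketch of how a half-duplex voice pipeline behaves: speech-to-text, then text generation, then text-to-speech, strictly one blocking stage at a time. Every function name and canned string here is an illustrative stand-in, not a real API.

```python
def listen() -> str:
    """Blocks until the user stops talking, then returns a transcript."""
    return "Hey ChatGPT, let's talk about the movies!"

def generate_reply(transcript: str) -> str:
    """Blocks while the language model produces a complete response."""
    return f"Sure, what movie would you like to discuss? (heard: {transcript!r})"

def speak(text: str) -> None:
    """Blocks while text-to-speech plays the reply aloud."""
    print(text)

def voice_chat_turn() -> str:
    # Strict turn-taking: nothing is perceived between stages, so the
    # system is effectively deaf while it "thinks" and while it speaks.
    transcript = listen()
    reply = generate_reply(transcript)
    speak(reply)
    return reply
```

Because each stage blocks the next, the system cannot notice anything (an interruption, a pause, the passage of time) until the current stage finishes.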

That could change thanks to a new generation of “interaction” AI models that can actually follow the ebb and flow of a conversation, even interrupting while listening to you in real-time.

Developed by Thinking Machines, an AI startup founded by ex-OpenAI exec Mira Murati, these “interaction” models aren’t like today’s single-threaded AI models, which can neither think while they’re listening nor react while they’re speaking. Instead, the new models employ a “multi-stream, micro-turn” configuration that lets them keep processing inputs, including sights and sounds, while you’re talking, and even interrupt based on what you’re saying.

In a series of demo reels, Thinking Machines shows its models (which are still in a research preview) reacting to human participants in real-time during video chats, identifying products they hold up and keeping a running tally of “animal” words (like “deer” and “sheep”) as a user continues to speak. The models also show impressive restraint during another interaction, waiting patiently rather than jumping in as their human partner takes a mid-sentence sip of coffee.

In another demo, the model does interrupt (as instructed), correcting a human speaker in real-time as she mispronounces the word “acai” and flagging her intentionally inaccurate claim that acai bowls originated in Argentina. Yes, that sounds annoying, but the demo makes the point that Thinking Machines’ AI can react while it listens, rather than sitting idle while waiting its turn.

So, what’s Thinking Machines’ trick? The company actually employs a pair of AI models: an “interaction” model that’s continually “present” with the user, processing inputs and outputs in rapid-fire 200ms chunks, and a second “background” model that does the heavy lifting for more complex tasks, handing off the results to the speedier interaction model when they’re ready.
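The two-model arrangement described above can be sketched as a pair of cooperating loops: a fast interaction loop that ticks every 200ms and is always free to listen or speak, and a slower background task that reasons off the conversational clock and hands results back when done. The names, timings, and queues below are assumptions for illustration only, not Thinking Machines’ actual implementation.

```python
import asyncio
from collections import deque

MICRO_TURN_MS = 200  # each micro-turn, the interaction model may act or stay silent

async def background_model(question: str, results: deque) -> None:
    """Slow, heavyweight reasoning that runs off the conversational clock."""
    await asyncio.sleep(0.5)  # pretend to think hard for half a second
    results.append(f"Answer to {question!r}")

async def interaction_model(inputs: deque, results: deque, log: list) -> None:
    """Fast loop: every micro-turn, process fresh input and emit or stay quiet."""
    for _ in range(6):  # six micro-turns, roughly 1.2 s of conversation
        if inputs:
            utterance = inputs.popleft()
            # A hard question gets handed off instead of stalling the chat.
            asyncio.ensure_future(background_model(utterance, results))
            log.append(f"[ack] heard: {utterance}")
        if results:
            # A finished background result is spoken at the next micro-turn.
            log.append(f"[speak] {results.popleft()}")
        await asyncio.sleep(MICRO_TURN_MS / 1000)

async def demo() -> list:
    inputs, results, log = deque(["Where do acai bowls come from?"]), deque(), []
    await interaction_model(inputs, results, log)
    return log
```

Running `asyncio.run(demo())` shows the key property: the fast loop acknowledges the question immediately and keeps ticking, then voices the background model’s answer a few micro-turns later, instead of going silent while the heavy reasoning runs.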

Thinking Machines’ new interactive AI models are still works in progress (I have yet to see or hear them in action). The startup admits that its models struggle with “very long” conversations and that they depend on “reliable connectivity” to work properly. The company’s current “interaction” model is also on the small side, as larger models are “too slow to serve in this setting.”

Still, Thinking Machines’ new “full-duplex” paradigm could be a game-changer for AI voice chat, making it feel smooth and natural rather than like a strained Smokey and the Bandit-era back-and-forth.


This article was written by: Nermeen Nabil Khear Abdelmalak

All rights reserved to USAGoldMines. www.usagoldmines.com
