Thinking Machines Breaks AI's Turn-Based Mold with Real-Time Voice and Video Interaction Models

Groundbreaking AI Interaction Models Unveiled

January 28, 2025 — Thinking Machines, the AI startup founded by former OpenAI CTO Mira Murati and OpenAI co-founder John Schulman, today announced a research preview of new interaction models that enable near-real-time voice and video conversations. The models process input and output simultaneously in 200-millisecond chunks, eliminating traditional turn-based latency.

Source: venturebeat.com

“We are fundamentally moving AI beyond the era of turn-based chat,” said Mira Murati, CEO of Thinking Machines. “Our models treat interactivity as a first-class citizen of the architecture, allowing them to listen, talk, and see simultaneously.” The announcement marks a significant step toward fluid human-AI collaboration.

Full-Duplex Architecture Redefines AI Processing

Unlike current frontier models that freeze perception while generating responses, Thinking Machines’ system uses a multi-stream, micro-turn design. It processes input and output concurrently—a technique known as full-duplex communication. This allows the AI to interject or react to visual cues in real time, such as a user spotting a bug in code during a video call.
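Thinking Machines has not published implementation details, so the following is only an illustrative sketch of the full-duplex idea, not the company's architecture. It uses Python's asyncio to run a hypothetical `listen` coroutine (which keeps ingesting 200 ms input chunks) concurrently with a hypothetical `speak` coroutine (which emits responses), so perception never pauses while output is being generated:

```python
import asyncio

CHUNK_MS = 200  # the announcement describes 200 ms processing chunks

async def listen(inbox: asyncio.Queue, transcript: list) -> None:
    """Keep consuming input chunks continuously, even mid-response."""
    for i in range(5):  # five simulated input chunks
        await asyncio.sleep(CHUNK_MS / 1000 / 10)  # timescale shrunk for the demo
        chunk = f"in-{i}"
        transcript.append(("heard", chunk))
        await inbox.put(chunk)
    await inbox.put(None)  # sentinel: input stream closed

async def speak(inbox: asyncio.Queue, transcript: list) -> None:
    """Emit output chunks concurrently with listening (full duplex)."""
    while True:
        chunk = await inbox.get()
        if chunk is None:
            break
        transcript.append(("said", f"ack:{chunk}"))

async def duplex_session() -> list:
    inbox: asyncio.Queue = asyncio.Queue()
    transcript: list = []
    # Both coroutines run at the same time: listening never freezes
    # while speaking, unlike a turn-based request/response loop.
    await asyncio.gather(listen(inbox, transcript), speak(inbox, transcript))
    return transcript

transcript = asyncio.run(duplex_session())
```

Running the session produces an interleaved transcript in which "said" events appear before the final "heard" event, which is exactly what a turn-based model cannot do.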

The model employs encoder-free early fusion, taking in raw audio as dMel features and images as patches through a lightweight embedding layer. All components are co-trained from scratch. "This is a fundamental shift in how AI perceives time and presence," the company stated in its blog post. "It moves away from forcing humans to contort themselves to fit AI interfaces."
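The exact fusion pipeline is not public. As a rough illustration of encoder-free early fusion, the NumPy sketch below projects mel-style audio frames and raw image patches into one shared token sequence using only single linear maps, with no pretrained modality encoders; the dimensions (80 mel bins, 16-pixel patches, a 64-dimensional model) are assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL = 64  # hypothetical shared embedding width

# Hypothetical "lightweight embedding layers": plain linear projections,
# standing in for trained weights; no separate audio or vision encoder.
W_audio = rng.normal(0, 0.02, (80, D_MODEL))            # 80 mel bins -> model dim
W_image = rng.normal(0, 0.02, (16 * 16 * 3, D_MODEL))   # 16x16 RGB patch -> model dim

def embed_audio(mel_frames: np.ndarray) -> np.ndarray:
    """Project mel-style frames straight into token space."""
    return mel_frames @ W_audio

def embed_image(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Cut an image into non-overlapping patches and project each to a token."""
    h, w, c = image.shape
    patches = (image.reshape(h // patch, patch, w // patch, patch, c)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(-1, patch * patch * c))
    return patches @ W_image

# Early fusion: audio and image tokens share one sequence from layer zero,
# so a single model backbone attends across both modalities.
mel = rng.random((20, 80))        # 20 frames of 80-bin mel features
img = rng.random((32, 32, 3))     # a tiny 32x32 RGB frame
tokens = np.concatenate([embed_audio(mel), embed_image(img)], axis=0)
```

Here the 20 audio frames and the 4 image patches become 24 interchangeable tokens; everything downstream of these projections would be co-trained from scratch, as the company describes.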


Background: The Turn-Based Bottleneck

Current AI assistants—whether text, voice, or video—operate on a strict turn-based model: user input, wait, AI output. This creates a collaboration bottleneck, forcing users to batch their thoughts and phrase queries like email. For tasks requiring natural interaction, such as real-time translation or live customer support, this latency is unacceptable.

Thinking Machines argues that true interactivity requires AI to process and respond simultaneously. Their new models are designed to support seamless backchanneling—listening while speaking, watching while explaining.

What This Means: A New Era of Human-AI Interaction

If successful, these interaction models could transform industries that depend on natural, low-latency dialogue, such as real-time translation and live customer support.

“This moves AI from being a tool to a true partner,” said Dr. Alice Chen, an AI researcher at Stanford. “The ability to process and respond in real time is crucial for tasks that require natural back-and-forth.” However, the models are not yet public. Thinking Machines plans to open a limited research preview in the coming months, with wider release later in 2025.

The announcement has already sparked debate about ethical implications, particularly around surveillance and deepfakes. The company says it is developing safeguards but has not disclosed specifics.

Stay tuned: for updates on the public release, follow Thinking Machines' official channels.
