AI News
OpenAI gpt-realtime Model and Realtime API Production Updates
OpenAI releases gpt-realtime speech-to-speech model with MCP server support, image input, and SIP calling capabilities for production voice agents.
The gpt-realtime model represents a notable advancement over OpenAI's previous speech-to-speech offerings, with measurable improvements in instruction following, function calling accuracy, and audio quality. On the Big Bench Audio benchmark measuring reasoning capabilities, gpt-realtime scored 82.8% accuracy compared to 65.6% for the December 2024 model.
Model Performance and Capabilities
The new model demonstrates significant improvements across core metrics relevant to enterprise deployments. Function calling accuracy increased from 49.7% to 66.5% on the ComplexFuncBench audio evaluation, while instruction following improved from 20.6% to 30.5% on the MultiChallenge audio benchmark.
OpenAI enhanced the model's ability to handle nuanced audio requirements, including language switching mid-sentence, alphanumeric detection across multiple languages, and tone adaptation based on instructions. The model can now follow fine-grained audio directions like "speak quickly and professionally" or "speak empathetically in a French accent."
Two new voices, Cedar and Marin, are available exclusively through the Realtime API, with existing voices receiving quality improvements. The model also supports asynchronous function calling, allowing conversations to continue while waiting for tool responses.
Production-Ready API Features
The Realtime API now supports remote Model Context Protocol (MCP) servers, enabling developers to extend voice agents with external tools without manual integration work. Sessions can connect to MCP servers by passing a URL in the configuration, with the API handling tool calls automatically.
Image input support allows applications to ground conversations in visual context. Users can share screenshots, photos, or documents with the voice agent, enabling interactions like "read the text in this screenshot" or "what do you see?" The system treats images as conversation elements rather than live video streams, giving developers control over when and what visual information the model processes.
SIP phone calling support connects voice agents directly to traditional phone networks, PBX systems, and desk phones. This integration removes technical barriers for enterprises wanting to deploy voice agents in existing telecommunications infrastructure.
Enterprise Adoption Considerations
For European enterprises evaluating voice AI deployments, the Realtime API's single-model architecture offers advantages over traditional speech-to-text and text-to-speech pipelines. Direct audio processing reduces latency and preserves speech nuances that matter for customer service applications.
The MCP server integration addresses a common enterprise requirement: connecting AI systems to existing business tools and databases. Rather than building custom integrations, technical teams can point sessions to MCP-compatible servers and access those capabilities immediately.
Pricing and availability details for European markets were not specified in the announcement, though the general availability status suggests broader enterprise rollout readiness compared to the previous beta offering.
Technical Implementation
Developers can implement reusable prompts across Realtime API sessions, similar to the existing Responses API. This feature supports consistent voice agent behavior across different deployments and reduces configuration overhead for teams managing multiple applications.
The API maintains existing WebSocket-based architecture while adding new input types and integration points. Image inputs use base64 encoding within conversation items, while MCP servers connect through URL configuration in session setup.
Safety and privacy safeguards remain integrated into the service, though specific details about European data handling or GDPR compliance were not detailed in the technical documentation.
Original source: This analysis is based on OpenAI's announcement available at https://openai.com/index/introducing-gpt-realtime
AI News Updates
Subscribe to our AI news digest
Weekly summaries of the latest AI news. Unsubscribe anytime.
More News
Other recent articles you might enjoy.
Chat with 100+ AI Models in one App.
Use Claude, ChatGPT, Gemini alongside with EU-Hosted Models like Deepseek, GLM-5, Kimi K2.5 and many more.