When AI speaks for you: The future of live translation
Imagine watching Money Heist or Prison Break in your preferred language without waiting for a dubbed version. Or picture being a die-hard K-Drama fan who does not understand Korean but can still follow conversations without constantly reading subtitles. That vision sits at the heart of live translation, a technology increasingly shaping entertainment, education and cross-border communication.
Over the past two decades, translation tools have moved from static text boxes to experiments in real-time audio. While the promise of seamless, human-like translation is not yet fully realised, rapid advances in artificial intelligence are bringing it closer than ever.
Early years: Literal text on screen
When Google Translate launched in 2006, it functioned as a basic text tool. Users typed or pasted content and received instant translations. The results were often literal and awkward, struggling with grammar and nuance, but the service was transformative. For the first time, millions could access free translations across dozens of languages.
Microsoft took an early step towards spoken translation with Skype Translator in 2014. The system enabled near real-time speech translation between limited language pairs, including English and Spanish. Although conversations required pauses and corrections, it demonstrated that translation could move beyond text.
Smartphones expand the use case
The spread of smartphones widened translation's reach. Google introduced voice input, allowing users to speak and see translations appear on screen. Camera translation followed, letting travellers scan signs, menus and documents.
These tools made travel and everyday interactions easier, but real conversations still felt fragmented. Delays, robotic voices and mismatched tone reminded users that translation remained mediated by machines rather than flowing naturally.
Smart speakers and home use
Amazon entered the space in 2020 with Alexa's Live Translation on Echo devices. The feature allowed conversations across several major languages, including English, Spanish, French, German, Italian, Portuguese and Hindi. It proved useful in bilingual households, though its reliance on Amazon hardware limited broader adoption.
Conversation mode and meetings
Google's Conversation Mode marked another step forward. Two speakers could take turns while the app translated each side. Though more practical, the experience still required users to hold their phones and wait for processing, falling short of fully fluid dialogue.
Video-conferencing platforms also began to experiment. Zoom introduced AI-generated live captions and translated captions, and in 2025 announced that it was testing more advanced real-time translation features for meetings. These tools primarily targeted corporate users rather than everyday conversations.
Gemini and real-time audio experiments
Recent attention has focused on Google's Gemini AI model. In early 2026, Google demonstrated and began limited testing of an upgraded "Live Translate" experience within Google Translate, according to company briefings. The system aims to deliver near real-time spoken translations through connected audio devices, including headphones, rather than only displaying text on a screen.
Google says the feature is designed to better reflect tone and emphasis than earlier systems, though it remains in beta testing and continues to face challenges around latency, accuracy and emotional nuance. Access is currently limited by region, language pair and device compatibility, and the company has not positioned the technology as flawless or universally available.
Rose Yao, a Google executive involved in product development, said in company materials that users can activate live translation through the Translate app, and that Google continues to refine the experience based on feedback.
Competing approaches
Other technology companies are pursuing similar goals through different routes. Meta announced live translation features for its Ray-Ban smart glasses in 2024, with wider availability beginning in 2025. The glasses play translated speech through built-in speakers, but require dedicated hardware.
Apple has also expanded translation features across Messages, FaceTime and phone calls since 2025, and later integrated translation support with AirPods. Apple emphasises on-device processing for privacy, but its tools remain closely tied to its own ecosystem.
Microsoft continues to offer translation through Azure Cognitive Services and Teams, focusing on enterprise users, while Amazon maintains translation capabilities through Alexa.
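Of these, Microsoft's enterprise route is the most directly accessible to developers. As a rough illustration of how such services are consumed, the sketch below calls Microsoft's Translator REST API, part of Azure Cognitive Services, following the published v3.0 interface; the subscription key, region and sample sentence are placeholders for illustration, not details from this article.

```python
# Minimal sketch: translating text via the Azure Translator REST API (v3.0).
# The key and region below are placeholders; a real call needs credentials
# from an Azure Translator resource.
import requests

ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"
SUBSCRIPTION_KEY = "YOUR_AZURE_KEY"   # placeholder
REGION = "YOUR_AZURE_REGION"          # placeholder, e.g. "westeurope"

def translate(text: str, target_lang: str) -> str:
    """Send one string to the Translator service and return the translation."""
    response = requests.post(
        ENDPOINT,
        params={"api-version": "3.0", "to": target_lang},
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Ocp-Apim-Subscription-Region": REGION,
            "Content-Type": "application/json",
        },
        json=[{"Text": text}],  # the API accepts a batch; we send one item
        timeout=10,
    )
    response.raise_for_status()
    # Response shape: [{"translations": [{"text": "...", "to": "es"}], ...}]
    return response.json()[0]["translations"][0]["text"]

if __name__ == "__main__":
    print(translate("Where is the nearest train station?", "es"))
```

The same batch-oriented request can carry several strings and target languages at once, which is one reason such services suit the enterprise captioning and meeting tools described above.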
Text translation becomes smarter
Alongside audio, AI has improved written translation. Systems are increasingly able to interpret meaning rather than translate word for word. Idioms, slang and local expressions are now more likely to be rendered in context, according to developers from Google, Apple and Microsoft.
This shift reflects a broader move away from mechanical translation towards systems that attempt to model how humans interpret language, though errors and cultural misunderstandings still occur.
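As a concrete illustration of this shift, the sketch below sends an English idiom through Google's Cloud Translation API using the google-cloud-translate client library. Modern neural systems typically return an equivalent expression rather than a word-for-word rendering, though the exact output depends on the model and language pair; the credentials setup and sample sentence here are assumptions for illustration.

```python
# Sketch: contextual translation of an idiom via the Google Cloud
# Translation API (google-cloud-translate library, v2 client).
# Assumes GOOGLE_APPLICATION_CREDENTIALS points to a service-account key.
from google.cloud import translate_v2 as translate

client = translate.Client()

# A literal, word-for-word rendering of this idiom would be meaningless in
# most target languages; contemporary neural systems usually substitute an
# equivalent local expression instead.
result = client.translate(
    "It's raining cats and dogs.",
    source_language="en",
    target_language="fr",
)
print(result["translatedText"])  # typically an idiomatic French equivalent
```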
Language learning and education
Google is also integrating translation into language learning. New features allow speakers of several languages, including Bengali, Hindi, German and Italian, to practise English, while English speakers can practise selected foreign languages. Feedback tools and progress tracking aim to encourage regular use, placing Google Translate closer to dedicated learning apps.
The bigger picture
From simple text boxes to camera scans, conversation modes and experimental real-time audio, live translation has steadily evolved. What began as a traveller's aid is becoming a broader communication platform, though fully natural, human-level translation remains a work in progress.
With Google, Apple, Microsoft, Amazon, Meta and Zoom all investing heavily, the competition to make live translation faster, more accurate and more natural is intensifying. Headphone-based translation may represent the next step, but for now, it remains an emerging technology rather than a finished solution.
Mishuk Rahman and Md Jafar Uddin contributed to this report
