Meta launches multilingual AI model that translates nearly 100 languages into speech or text in real time

So cool.

Daniel Seow | August 25, 2023, 05:07 PM

Meta has developed what it describes as the first artificial intelligence tool that can understand nearly 100 languages and provide real-time translations in both speech and text.

The company, formerly known as Facebook, has been developing its Universal Language Translator, a machine learning model that will eventually allow people who speak different languages to communicate effortlessly with each other in real time.

The current tool, named SeamlessM4T, represents the latest milestone in that effort.

As of Aug. 22, SeamlessM4T has been publicly released under a research license, so that researchers and developers can build on the work.

What can it do?

According to the Meta website, the model can detect speech and text in nearly 100 languages, and translate them into text in nearly as many languages.

It can also produce speech translations in 36 languages.

Additionally, it can detect when more than one language has been mixed in the same sentence and translate the sentence into a single language.

This could be helpful for speakers who code-switch between multiple languages, the project's researchers explained.

For example, a sentence spoken in Telugu and Hindi could be translated into English speech.

The model's functionality was demonstrated in a Facebook video shared by Meta founder Mark Zuckerberg on August 23.

How was it made?

Meta stated that the project was built upon advancements in translation software made over the years.

Some of the model's components come from previous Meta projects, like the No Language Left Behind text translation model, which supports 200 languages.

The AI model has also analysed millions of hours of multilingual speech to learn how to make sense of spoken language.

It follows in the footsteps of an AI speech translation tool, developed in October 2022, that can translate Hokkien speech in real time.

Google, one of Meta's competitors in this field, has been working on a similar project, the Universal Speech Model.

This is an AI model that can automatically recognise speech across more than 300 popular and under-resourced languages.

It is also currently used to produce closed captions on YouTube videos.

An ongoing effort

According to the developer's blog, SeamlessM4T is still being refined to reduce the risk of it wrongly transcribing what a person intends to say.

Translations are also being monitored for toxicity, gender bias and inaccuracies.

While the tool is far from perfect, the vision behind it is certainly promising.

Through the Universal Language Translator, Meta hopes to eliminate language barriers and enable people to access multilingual content in an increasingly interconnected world.

The idea is inspired by the Babel Fish, a universal translator from the science fiction classic "The Hitchhiker’s Guide to the Galaxy".

In the novel, it is a small, bright yellow fish placed in a person's ear so that they can hear any language translated into their first language.

"While such a capability has long been dreamed of in science fiction, AI is on the verge of bringing this vision into technical reality," the SeamlessM4T website noted.

You can try a demo here.

Top image from Mark Zuckerberg on Instagram / Webtools Word Cloud Generator.