Meta AI Introduces Omnilingual ASR, Advancing Automatic Speech Recognition Across More Than 1,600 Languages

In Brief

Meta AI has launched the Omnilingual ASR system, providing speech recognition for over 1,600 languages and released open-source models and a corpus for 350 underserved languages.

Meta AI, the research division of Meta specializing in artificial intelligence and augmented reality, has announced the release of the Meta Omnilingual Automatic Speech Recognition (ASR) system.

This suite of models delivers automatic speech recognition for over 1,600 languages, achieving high-quality performance at an unprecedented scale. In addition, Meta AI is open-sourcing Omnilingual wav2vec 2.0, a self-supervised, massively multilingual speech representation model with 7 billion parameters, designed to support a variety of downstream speech tasks.

Alongside these tools, the organization is also releasing the Omnilingual ASR Corpus, a curated collection of transcribed speech from 350 underserved languages, developed in partnership with global collaborators.

Automatic speech recognition has advanced in recent years, achieving near-perfect accuracy for many widely spoken languages. Expanding coverage to less-resourced languages, however, has remained challenging due to the high data and computational demands of existing AI architectures. The Omnilingual ASR system addresses this limitation by scaling the wav2vec 2.0 speech encoder to 7 billion parameters, creating rich multilingual representations from raw, untranscribed speech. Two decoder variants map these representations into character tokens: one using connectionist temporal classification (CTC) and another using a transformer-based approach similar to those in large language models.

This LLM-inspired ASR approach achieves state-of-the-art performance across more than 1,600 languages, with character error rates under 10% for 78% of them, and introduces a more flexible method for adding new languages.
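Character error rate (CER), the metric behind that claim, is the Levenshtein edit distance between the reference and hypothesis transcripts divided by the reference length. A self-contained illustration (the example strings are invented for demonstration):

```python
# Character error rate (CER): edit distance between reference and
# hypothesis transcripts, as a percentage of the reference length.
def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cer(reference, hypothesis):
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)

# One substituted character out of eleven -> roughly 9.1% CER.
print(f"{cer('omnilingual', 'omnilinguol'):.1f}%")
```

A CER under 10% thus means fewer than one in ten characters is inserted, deleted, or substituted relative to the reference transcript.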

Unlike traditional systems that require expert fine-tuning, Omnilingual ASR can incorporate a previously unsupported language using only a few paired audio-text examples, enabling transcription without extensive data, specialized expertise, or high-end compute. While zero-shot results do not yet match fully trained systems, this method provides a scalable way to bring underserved languages into the digital ecosystem.

Meta AI To Advance Speech Recognition With Omnilingual ASR Suite And Corpus

The research division has released a comprehensive suite of models and a dataset designed to advance speech technology for any language. Building on FAIR’s prior research, Omnilingual ASR includes two decoder variants, ranging from lightweight 300M models for low-power devices to 7B models offering high accuracy across diverse applications. The general-purpose wav2vec 2.0 speech foundation model is also available in multiple sizes, enabling a wide range of speech-related tasks beyond ASR. All models are provided under an Apache 2.0 license, and the dataset is available under CC-BY, allowing researchers, developers, and language advocates to adapt and expand speech solutions using FAIR’s open-source fairseq2 framework in the PyTorch ecosystem.

Omnilingual ASR is trained on one of the largest and most linguistically diverse ASR corpora ever assembled, combining publicly available datasets with community-sourced recordings. To support languages with limited digital presence, Meta AI partnered with local organizations to recruit and compensate native speakers in remote or under-documented regions, creating the Omnilingual ASR Corpus, the largest ultra-low-resource spontaneous ASR dataset to date. Additional collaborations through the Language Technology Partner Program brought together linguists, researchers, and language communities worldwide, including partnerships with Mozilla Foundation’s Common Voice and Lanfrica/NaijaVoices. These efforts provided deep linguistic insight and cultural context, ensuring the technology meets local needs while empowering diverse language communities globally.
