
Pleias and GSMA Launch CommonLingua, an Open-Source AI Model Supporting 61 African Languages

April 28, 2026
Author: Editorial Team

Pleias and the GSMA announced the release of CommonLingua, an open-source language identification (LID) model purpose-built to unlock African language data at scale. It is delivered under the GSMA’s “AI Language Models in Africa, by Africa, for Africa” initiative, a coalition dedicated to closing the African language gap in AI.

Africa is home to more than 2,000 living languages, many of which remain underrepresented in AI training data. As a result, language identification systems often perform less reliably on African-language content, particularly when distinguishing between closely related languages or handling code-mixed text. Before a Swahili, Yoruba, or Wolof language model can be built, the underlying text must first be correctly identified by language, a step where existing tools often fail on African content.

This is because leading LID systems such as fastText, GlotLID, and OpenLID were built around European and Asian high-resource languages and frequently mislabel African-language text as English or French. Even state-of-the-art frontier models drop roughly 30 points in accuracy on African languages compared to major world languages.

CommonLingua is designed to fix this first step of the pipeline. On the new CommonLID benchmark, CommonLingua achieves 83% accuracy and a macro F1 score of 0.79, outperforming leading LID models by more than 10 percentage points under comparable evaluation conditions while using roughly one three-hundredth of the parameters. The model is lightweight, with 2 million parameters and an 8 MB checkpoint, and is designed for efficient deployment: it processes approximately 20 texts per second on CPU and up to 3,000 texts per second on a single GPU.
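For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how accuracy and macro F1 are conventionally computed for a classifier. It is not the CommonLID evaluation code. The function names and the toy labels are illustrative assumptions, chosen only to show why macro F1 can sit well below accuracy when rarer languages are misclassified.

```python
def accuracy(y_true, y_pred):
    """Share of texts whose predicted language matches the true language."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-language F1, so every language counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical, not benchmark results): a classifier that labels
# everything as Swahili on a set dominated by Swahili texts.
y_true = ["swa"] * 8 + ["yor", "wol"]
y_pred = ["swa"] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

On this toy data accuracy is 0.80 while macro F1 is about 0.30, because the two languages that were never predicted each contribute an F1 of zero. This is why a macro-averaged score is a useful companion to accuracy when many low-resource languages are involved.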

CommonLingua covers 334 languages in total, including 61 African languages across seven groupings: Bantu (21), Niger-Congo / West African (18), Afro-Asiatic and Semitic (7), Cushitic and Chadic (4), Berber (3), Nilo-Saharan (3), and pidgins, creoles, and other (5). The model operates directly on UTF-8 byte sequences rather than relying on a language-specific tokenizer, enabling consistent handling across scripts including Latin, Arabic, Ethiopic, N’Ko, and Tifinagh.
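To illustrate what operating on UTF-8 byte sequences means in practice, here is a minimal sketch of byte-level input encoding. It is not CommonLingua's actual preprocessing: `to_byte_ids`, the `max_len` cutoff, and the sample strings are assumptions made for illustration.

```python
def to_byte_ids(text: str, max_len: int = 512) -> list[int]:
    """Encode text as UTF-8 and return its byte values (each in 0..255).

    max_len is a hypothetical cutoff. Truncation happens at the byte level,
    so it may split a multi-byte character; this sketch does not try to
    preserve character boundaries.
    """
    return list(text.encode("utf-8"))[:max_len]


# Illustrative samples in several of the scripts named in the article.
samples = {
    "Latin": "Habari ya asubuhi",
    "Arabic": "صباح الخير",
    "Ethiopic": "ሰላም",
    "N'Ko": "ߒߞߏ",
    "Tifinagh": "ⵜⴰⵎⴰⵣⵉⵖⵜ",
}

for script, text in samples.items():
    ids = to_byte_ids(text)
    print(f"{script}: {len(text)} characters -> {len(ids)} bytes, starts {ids[:6]}")
```

Whatever the script, the input vocabulary stays fixed at 256 possible byte values, so no per-language tokenizer or vocabulary file is needed. The trade-off is that non-Latin scripts produce longer sequences, since their characters take two or three bytes each.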

“African languages are not an edge case. They are the working languages of hundreds of millions of people, and they deserve AI infrastructure built with the same care as any other language. CommonLingua is deliberately the first brick we are laying: you cannot curate what you cannot identify.”

Pierre-Carl Langlais, Co-founder and Chief Technology Officer, Pleias

The model is trained exclusively on open-licensed and public domain content aggregated through the Common Corpus project, including Wikipedia, scientific publications in OpenAlex, VOA Africa, WaxalNLP, Cultural Heritage, and Pralekha. All datasets are released under permissive licenses.

“Closing the gap in African-language AI is fundamental to digital inclusion and unlocking economic opportunity. Progress has long been held back by the lack of foundational infrastructure, beginning with something as essential as language identification. CommonLingua addresses this critical gap, enabling the development of richer datasets and more representative AI systems at scale. Through our initiative, the GSMA is bringing partners together to move beyond fragmented efforts towards shared infrastructure that can power Africa’s digital ecosystem.”

Louis Powell, Director of AI Initiatives, GSMA

This conversation will continue at MWC26 Kigali, where GSMA and partners will bring together industry leaders to accelerate progress on African-language AI. Register now to be part of the discussion.
