Google Research Africa Launches WAXAL, Open Dataset Covering 21 African Languages

February 2, 2026

Author: Akim Benamara

Developed over three years, WAXAL aims to empower researchers and support the creation of inclusive technology across Africa.

For many people around the world, speaking to devices has become second nature, whether to get directions, check the news, or dictate voice notes. However, this convenience often disappears when technology cannot understand local languages—a reality for hundreds of millions of people, particularly in Sub-Saharan Africa, where over 2,000 distinct languages are spoken. The main challenge in developing inclusive voice technology for the region has been the lack of accessible, high-quality speech data.

To address this gap, Google Research Africa has introduced WAXAL, a dataset named after the Wolof word for “speak.” Built over three years, it is intended to support researchers creating inclusive technology across Africa. The dataset covers 21 languages, including Acholi, Hausa, Luganda, and Yoruba, and comprises over 11,000 hours of speech data from nearly two million recordings. It includes approximately 1,250 hours of transcribed speech for automatic speech recognition (ASR) and more than 20 hours of studio recordings for text-to-speech (TTS) applications.

The project is a collaborative effort led by African institutions and experts. Makerere University in Uganda and the University of Ghana collected data for 13 languages, while Digital Umuganda in Rwanda led data collection for five additional languages. High-quality studio recordings were produced in partnership with Media Trust and Loud n Clear, and the African Institute for Mathematical Sciences (AIMS) contributed multilingual datasets for future expansions. The framework ensures that partners retain ownership of the data they collected while making resources available to the global research community.

WAXAL captures authentic speech ethically, combining everyday language use—such as participants describing pictures in their native tongues—with professional voice recordings for text-to-speech development. Beyond supporting AI innovation, WAXAL is expected to aid in the digital preservation of African languages. The full dataset is released under an open license and is available today on Hugging Face, with detailed methodology published in an accompanying research paper.
