African voices into your machine learning pipeline. Our Africa Speech Dataset collection spans dozens of countries, languages, and dialects, including Swahili, Hausa, Yoruba, Amharic, Zulu, Afrikaans, and Arabic variants, along with many regional languages historically underrepresented in voice AI.

Each dataset is sourced from native speakers across diverse acoustic environments — urban markets, rural communities, broadcast media, and controlled studio settings. Carefully annotated with speaker metadata, tonal markers, and linguistic tags, these collections are engineered for teams building inclusive, high-performance voice models.

Africa Speech Datasets

19 December 2025

Tsonga (Xitsonga) Speech Dataset

speech_data_
19 December 2025

Sepedi Speech Dataset

speech_data_
19 December 2025

Sesotho Speech Dataset

speech_data_
18 December 2025

Tswana Speech Dataset

speech_data_
