Embrace Morocco’s extraordinary linguistic complexity with our Morocco Speech Dataset collection. The datasets span Darija (Moroccan Arabic), Modern Standard Arabic, Tamazight (Berber), French, and Spanish across Morocco’s diverse regions, from the Rif and Atlas mountains to Atlantic coastal cities and Saharan communities. They are built for teams developing voice AI that reflects Morocco’s uniquely multilingual, code-switching reality.

Each recording is sourced from native speakers across varied acoustic environments, from Casablanca’s cosmopolitan streets and Marrakech’s vibrant medinas to rural Amazigh villages, northern Spanish-influenced communities, and broadcast media settings. Meticulously annotated with language and dialect tags, Arabic script transcriptions, Tifinagh markers for Tamazight, and rich speaker demographics, our Moroccan collections are engineered for the phonological complexity and multilingual fluidity that authentic Moroccan voice AI demands.

Morocco Speech Datasets

25 December 2025

Arabic Speech Dataset

18 December 2025

Cebuano Speech Dataset

18 December 2025

Central Atlas Tamazight Speech Dataset

17 December 2025

Moroccan Arabic Speech Dataset

17 December 2025

Tashelhit Speech Dataset

