Navigate the extraordinary linguistic diversity of the Middle East with our Middle East Speech Dataset collection. Spanning Arabic, Persian (Farsi, Dari), Turkish, Hebrew, Kurdish, Pashto, Urdu, and Armenian — across more than 20 countries and scores of regional dialects — these datasets are built for teams developing voice AI that reflects the region’s rich, multilingual reality.

Each recording is sourced from native speakers across varied acoustic environments — from ancient bazaars and coastal metropolises to mountainous communities and modern urban centers. Comprehensively annotated with language family tags, dialect markers, script-accurate transcriptions, and speaker demographics, our Middle Eastern collections are engineered for the phonetic complexity and script diversity this region uniquely demands.

Middle East Languages Datasets

Levantine Arabic Speech Dataset (25 December 2025)

Sudanese Arabic Speech Dataset (25 December 2025)

Moroccan Arabic Speech Dataset (17 December 2025)

Egyptian Arabic Speech Dataset (12 December 2025)

Algerian Arabic Speech Dataset (10 December 2025)
