Capture the full linguistic depth of Iran with our Iran Speech Dataset collection. Spanning Persian (Farsi), Azerbaijani, Kurdish, Gilaki, Mazandarani, Luri, Balochi, and Turkmen, and recorded across urban centers, regional provinces, and diaspora communities, these datasets are built for teams developing voice AI that reflects Iran’s rich multilingual and multi-dialectal reality.

Each recording is sourced from native speakers across varied acoustic environments, from Tehran’s bustling metropolitan soundscape and Caspian coastal communities to the Kurdish highlands, desert cities, and spontaneous conversational settings. Meticulously annotated with dialect markers, Persian-script transcriptions, phonetic alignments, and speaker demographics, our Iranian collections are engineered for the phonological richness and script complexity that high-performance Persian voice AI demands.

Iran Speech Datasets

Published 17 December 2025 by speech_data_:

Balochi Speech Dataset
Farsi Speech Dataset
Gilaki Speech Dataset
Kurmanji Kurdish Speech Dataset
Luri Speech Dataset
Mazanderani Speech Dataset
