The Gilaki Speech Dataset provides an extensive repository of authentic audio recordings from native Gilaki speakers across Gilan province, Iran. This specialized linguistic resource contains 170 hours of professionally recorded Gilaki speech, accurately annotated and organized for sophisticated machine learning tasks. Gilaki, a Northwestern Iranian language spoken by over 3 million people in Iran’s lush northern regions along the Caspian coast, is documented with its unique phonetic characteristics and distinctive linguistic features essential for building effective speech recognition and language processing systems.
The dataset features balanced demographic distribution across gender and age categories, offering comprehensive representation of Gilaki linguistic diversity from one of Iran’s most distinctive regional languages. Available in MP3/WAV format with consistent audio quality, this dataset is specifically designed for AI researchers, speech technologists, and developers creating voice applications, conversational AI, and natural language understanding systems for Caspian linguistic communities.
Dataset General Info
| Parameter | Details |
| --- | --- |
| Size | 170 hours |
| Format | MP3/WAV |
| Tasks | Speech recognition, AI training, voice assistant development, natural language processing, acoustic modeling, speaker identification |
| File size | 239 MB |
| Number of files | 782 files |
| Gender of speakers | Female: 46%, Male: 54% |
| Age of speakers | 18-30 years: 30%, 31-40 years: 29%, 40-50 years: 24%, 50+ years: 17% |
| Countries | Iran (Gilan province) |
Use Cases
Regional Cultural Documentation: Cultural institutions and academic researchers can utilize the Gilaki Speech Dataset to develop comprehensive digital archives of Gilaki oral literature, traditional music, and cultural practices. Voice technology enables preservation of Gilan’s distinctive cultural heritage including unique traditions of Iran’s lush northern regions, supports documentation of agricultural and fishing traditions, and maintains Gilaki linguistic identity through modern digital preservation methods.
Local Commerce and Business Services: Businesses operating in Gilan province can leverage this dataset to create voice-enabled customer service platforms for regional trade, agricultural commerce, and tourism services. Voice interfaces in Gilaki support local businesses in tea cultivation, rice farming, and handicraft sectors, improve accessibility for Gilaki-speaking customers, and enhance regional economic development through language-inclusive technology solutions.
Educational and Community Programs: Educational organizations and community centers can employ this dataset to develop Gilaki language learning applications, mother-tongue education resources, and literacy tools. Voice-based educational content supports Gilaki speakers in accessing education while maintaining linguistic heritage, enables documentation for future generations, and promotes bilingual education approaches that respect regional linguistic identity alongside Persian language requirements.
FAQ
Q: What does the Gilaki Speech Dataset include?
A: The Gilaki Speech Dataset contains 170 hours of authentic audio recordings from native Gilaki speakers across Gilan province in northern Iran. The dataset includes 782 files in MP3/WAV format totaling approximately 239 MB, with transcriptions, speaker demographics, regional information from Gilan’s diverse geography, and linguistic annotations.
Q: How does Gilaki relate to other Iranian languages?
A: Gilaki is a Northwestern Iranian language related to, but distinct from, Mazanderani and Persian, with unique phonological features and grammatical structures. The dataset includes linguistic annotations marking Gilaki-specific characteristics, ensuring accurate recognition of Gilaki as a distinct language with its own identity in northern Iran’s linguistic landscape.
Q: What makes Gilan culturally distinctive?
A: Gilan has a unique culture shaped by the Caspian climate, including rice cultivation, tea production, distinctive architecture, and specific traditions, and the Gilaki language embodies this cultural identity. The dataset supports preservation of Gilaki culture through technology, maintaining linguistic heritage in one of Iran’s most agriculturally important and culturally distinctive regions.
Q: Can this dataset support agricultural applications?
A: Yes. Gilan is a major agricultural region known for tea and rice production, and the dataset supports development of voice-based agricultural advisory systems for tea cultivation, rice farming, and horticulture in Gilaki, delivering technical guidance to farmers in their mother tongue and supporting agricultural productivity in northern Iran.
Q: What regional varieties are captured?
A: The dataset captures Gilaki speakers from across Gilan province representing variations from coastal areas to mountainous regions. With 782 diverse recordings, it ensures coverage of Gilaki as spoken across Gilan’s varied geography from Caspian coast to Alborz foothills.
Q: What is the demographic breakdown?
A: The dataset includes 46% female and 54% male speakers with age distribution of 30% aged 18-30, 29% aged 31-40, 24% aged 40-50, and 17% aged 50+. This balanced representation ensures equitable performance across different demographic segments.
Q: What applications benefit from Gilaki speech technology?
A: Applications include agricultural extension services for tea and rice farmers, tourism information for Caspian tourism, cultural heritage documentation including traditional music, local media transcription, regional government services, educational resources, and community platforms serving Gilan’s distinctive linguistic community.
Q: How does this contribute to linguistic diversity?
A: The dataset recognizes Gilan’s linguistic distinctiveness within Iran and supports maintaining regional linguistic diversity. It enables technology development that respects local identity, promotes multilingual approaches alongside Persian, and ensures technological progress includes all of Iran’s linguistic communities rather than only Persian speakers.
How to Use the Speech Dataset
Step 1: Dataset Acquisition
Download the dataset package from the provided link. Upon purchase, you will receive access credentials and download instructions via email. The dataset is delivered as a compressed archive file containing all audio files, transcriptions, and metadata.
Step 2: Extract and Organize
Extract the downloaded archive to your local storage or cloud environment. The dataset follows a structured folder organization with separate directories for audio files, transcriptions, metadata, and documentation. Review the README file for detailed information about file structure and naming conventions.
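Once extracted, the metadata can be indexed for later filtering. The sketch below is illustrative only: the actual column names come from the dataset's README, and the fields shown here (`file_id`, `speaker_id`, `gender`, `age_band`, `region`) are assumptions for demonstration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical metadata rows; real column names are defined in the dataset README.
sample_metadata = """file_id,speaker_id,gender,age_band,region
gil_0001.wav,spk_01,female,18-30,coastal
gil_0002.wav,spk_02,male,31-40,mountain
gil_0003.wav,spk_01,female,18-30,coastal
"""

def index_by_speaker(metadata_text: str) -> dict[str, list[str]]:
    """Group audio file IDs by speaker ID for speaker-level filtering later on."""
    by_speaker: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(metadata_text)):
        by_speaker[row["speaker_id"]].append(row["file_id"])
    return dict(by_speaker)

index = index_by_speaker(sample_metadata)
print(index)  # {'spk_01': ['gil_0001.wav', 'gil_0003.wav'], 'spk_02': ['gil_0002.wav']}
```

Indexing by speaker at this stage pays off in Step 5, where a speaker-independent split requires grouping recordings by speaker.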
Step 3: Environment Setup
Install required dependencies for your chosen ML framework such as TensorFlow, PyTorch, Kaldi, or others. Ensure you have necessary audio processing libraries installed including librosa, soundfile, pydub, and scipy. Set up your Python environment with the provided requirements.txt file for seamless integration.
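Before installing the heavier dependencies, a quick smoke test of basic audio I/O can catch environment problems early. This sketch uses only Python's standard-library `wave` module; a real pipeline would load audio through librosa or soundfile as described above. The 16 kHz rate and 440 Hz tone are arbitrary choices for the test.

```python
import math
import struct
import wave

SAMPLE_RATE = 16_000  # arbitrary test rate, not a dataset specification

def write_test_wav(path: str) -> None:
    """Write one second of a 440 Hz sine tone as 16-bit mono PCM."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(SAMPLE_RATE)
        frames = bytearray()
        for n in range(SAMPLE_RATE):
            sample = int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
            frames += struct.pack("<h", sample)
        wf.writeframes(bytes(frames))

def read_wav_info(path: str) -> tuple[int, int]:
    """Return (sample_rate, n_frames) of a WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate(), wf.getnframes()

write_test_wav("smoke_test.wav")
rate, n_frames = read_wav_info("smoke_test.wav")
print(rate, n_frames)  # 16000 16000
```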
Step 4: Data Preprocessing
Load the audio files using the provided sample scripts. Apply necessary preprocessing steps such as resampling, normalization, and feature extraction including MFCCs, spectrograms, or mel-frequency features. Use the included metadata to filter and organize data based on speaker demographics, recording quality, or other criteria relevant to your application.
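The resampling, normalization, and framing steps can be sketched with NumPy alone. A production pipeline would normally use librosa for loading and MFCC extraction; the 16 kHz target rate and 25 ms / 10 ms framing below are common defaults, not values taken from the dataset documentation.

```python
import numpy as np

def resample(audio: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Linear-interpolation resampling to target_sr (a rough stand-in for
    librosa.resample, which uses a higher-quality filter)."""
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio)

def peak_normalize(audio: np.ndarray) -> np.ndarray:
    """Scale so the maximum absolute amplitude is 1.0."""
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio

def frame(audio: np.ndarray, sr: int, win_ms: float = 25.0, hop_ms: float = 10.0) -> np.ndarray:
    """Slice audio into overlapping frames, shape (n_frames, win_length)."""
    win = int(sr * win_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(audio) - win) // hop)
    return np.stack([audio[i * hop : i * hop + win] for i in range(n_frames)])

# One second of synthetic audio at 44.1 kHz, downsampled to 16 kHz.
raw = 0.3 * np.sin(2 * np.pi * 220 * np.arange(44_100) / 44_100)
audio = peak_normalize(resample(raw, 44_100, 16_000))
frames = frame(audio, 16_000)
print(audio.shape, frames.shape)  # (16000,) (98, 400)
```

The resulting frames are the input to MFCC or mel-spectrogram extraction.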
Step 5: Model Training
Split the dataset into training, validation, and test sets using the provided speaker-independent split recommendations to avoid data leakage. Configure your model architecture for the specific task whether speech recognition, speaker identification, or other applications. Train your model using the transcriptions and audio pairs, monitoring performance on the validation set.
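A speaker-independent split puts every speaker's recordings into exactly one of train/validation/test, so no voice appears in two sets. The sketch below is an illustration of the idea under assumed inputs: the speaker IDs are invented and the 80/10/10 ratio is a common convention, while the dataset ships its own recommended split.

```python
import random

def speaker_independent_split(
    files_by_speaker: dict[str, list[str]],
    ratios: tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> dict[str, list[str]]:
    """Assign whole speakers (not individual files) to each split."""
    speakers = sorted(files_by_speaker)
    random.Random(seed).shuffle(speakers)
    n = len(speakers)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    groups = {
        "train": speakers[:n_train],
        "val": speakers[n_train : n_train + n_val],
        "test": speakers[n_train + n_val :],
    }
    return {
        split: [f for spk in spks for f in files_by_speaker[spk]]
        for split, spks in groups.items()
    }

# Toy catalog: 10 hypothetical speakers with 3 recordings each.
catalog = {f"spk_{i:02d}": [f"spk_{i:02d}_{j}.wav" for j in range(3)] for i in range(10)}
splits = speaker_independent_split(catalog)
print({k: len(v) for k, v in splits.items()})  # {'train': 24, 'val': 3, 'test': 3}
```

Splitting at the speaker level is what prevents the leakage mentioned above: a file-level random split would let the model memorize voices it will be tested on.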
Step 6: Evaluation and Fine-tuning
Evaluate model performance on the test set using standard metrics such as Word Error Rate for speech recognition or accuracy for classification tasks. Analyze errors and iterate on model architecture, hyperparameters, or preprocessing steps. Use the diverse speaker demographics to assess model fairness and performance across different groups.
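Word Error Rate is the word-level Levenshtein edit distance between reference and hypothesis, divided by the reference length. The minimal implementation below is a sketch; production evaluation would typically use an established library such as jiwer, and the example strings are placeholders rather than real Gilaki transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution / match
            )
    return dp[len(ref)][len(hyp)] / max(1, len(ref))

print(word_error_rate("a b c d", "a x c"))  # 0.5 (one substitution + one deletion)
```

Computing WER per demographic group (using the speaker metadata) is a straightforward way to run the fairness assessment described above.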
Step 7: Deployment
Once satisfactory performance is achieved, export your trained model for deployment. Integrate the model into your application or service infrastructure. Continue monitoring real-world performance and use the dataset for ongoing model updates and improvements as needed.
For detailed code examples, integration guides, and troubleshooting tips, refer to the comprehensive documentation included with the dataset.