Speech
Collecting and labelling audio for African languages. The tasks below share the same recording, consent and quality groundwork; they differ in what is produced from the audio.
Shared across speech data
- Speaker recruitment, consent & voice rights
- Recording setup & environment (device, microphone, background noise)
- Audio formats, sample rates & file standards
- Audio quality control (SNR, clipping, silence, channel checks)
- Transcription conventions (orthography, code-switching, disfluencies)
- Metadata (speaker demographics, device, environment, locale)
- Licensing & ethical handling of voices
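As a rough illustration of the audio quality control item above, the sketch below computes three of the listed checks (clipping, silence and a crude SNR estimate) on mono samples already normalised to the range -1.0 to 1.0. The function names, the 160-sample frame length and the thresholds are assumptions chosen for the example, not project standards, and the SNR figure is only an energy-based approximation that treats quiet frames as noise. Channel checks are not covered.

```python
import math


def clipping_ratio(samples, threshold=0.999):
    """Fraction of samples whose magnitude reaches the (assumed) clipping threshold."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if abs(s) >= threshold) / len(samples)


def frame_rms(samples, frame_len=160):
    """RMS level of consecutive non-overlapping frames; a trailing partial frame is dropped."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in starts
    ]


def silence_ratio(samples, frame_len=160, silence_rms=0.01):
    """Fraction of frames whose RMS falls below the (assumed) silence threshold."""
    rms = frame_rms(samples, frame_len)
    if not rms:
        return 1.0  # too short to analyse; treat as silent
    return sum(1 for r in rms if r < silence_rms) / len(rms)


def estimate_snr_db(samples, frame_len=160, silence_rms=0.01):
    """Crude SNR in dB: mean power of loud frames over mean power of quiet frames.

    Returns None when the clip has no quiet frames or no loud frames,
    because the ratio cannot be estimated in that case.
    """
    rms = frame_rms(samples, frame_len)
    noise = [r * r for r in rms if r < silence_rms]
    speech = [r * r for r in rms if r >= silence_rms]
    if not noise or not speech:
        return None
    noise_power = sum(noise) / len(noise)
    if noise_power == 0:
        return math.inf
    speech_power = sum(speech) / len(speech)
    return 10 * math.log10(speech_power / noise_power)


if __name__ == "__main__":
    # Synthetic clip: one quiet frame followed by one louder frame.
    clip = [0.001] * 160 + [0.1] * 160
    print("clipping:", clipping_ratio(clip))
    print("silence:", silence_ratio(clip))
    print("snr_db:", estimate_snr_db(clip))
```

In practice these numbers would be compared against whatever acceptance limits a collection project sets, and a recording that fails would be flagged for re-recording or manual review rather than silently dropped.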
Tasks
- ASR (Automatic Speech Recognition): transcription, multilingual ASR, code-switching
- TTS (Text-to-Speech): single-speaker, multi-speaker, expressive TTS
- Speech-to-Speech Translation (STS): direct speech translation across languages
- Audio Understanding: audio classification, sound event detection
- Speech emotion recognition
- Speaker diarization