Speech

Collecting and labelling audio for African languages. The tasks below share the same recording, consent and quality groundwork; they differ in what is produced from the audio.

Shared across speech data

  • Speaker recruitment, consent & voice rights
  • Recording setup & environment (device, microphone, background noise)
  • Audio formats, sample rates & file standards
  • Audio quality control (SNR, clipping, silence, channel checks)
  • Transcription conventions (orthography, code-switching, disfluencies)
  • Metadata (speaker demographics, device, environment, locale)
  • Licensing & ethical handling of voices
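
As a rough illustration of the audio quality control item above, the sketch below computes a clipping ratio and an RMS level for one mono clip and flags near-silent recordings. The function name `qc_report`, its parameters, and the thresholds (0.999 of full scale for clipping, -50 dBFS for silence) are assumptions made for this example, not values prescribed by this guide. SNR and channel checks are left out because they need a noise estimate and multi-channel input.

```python
import math


def qc_report(samples, clip_level=0.999, silence_dbfs=-50.0):
    """Basic quality checks for one mono clip.

    samples: sequence of floats normalised to [-1.0, 1.0].
    clip_level and silence_dbfs are illustrative thresholds (assumptions).
    """
    if not samples:
        raise ValueError("empty clip")
    n = len(samples)

    # Share of samples at or near full scale, a common sign of clipping.
    clipped = sum(1 for s in samples if abs(s) >= clip_level)

    # Root-mean-square level, expressed in dB relative to full scale.
    rms = math.sqrt(sum(s * s for s in samples) / n)
    rms_dbfs = 20 * math.log10(rms) if rms > 0 else float("-inf")

    return {
        "clipped_ratio": clipped / n,
        "rms_dbfs": rms_dbfs,
        "is_silent": rms_dbfs < silence_dbfs,
    }
```

A report like this could be computed per recording and stored alongside the speaker and device metadata, so that clips failing a check can be reviewed or re-recorded.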

Tasks

  • ASR (Automatic Speech Recognition) – transcription, multilingual ASR, code-switching
  • TTS (Text-to-Speech) – single-speaker, multi-speaker, expressive TTS
  • Speech-to-Speech Translation (STS) – direct speech translation across languages
  • Audio Understanding – audio classification, sound event detection
  • Speech emotion recognition
  • Speaker diarization

Contributor
@abumafrim
