Community
Grassroots Data Collection Training
Capacity building for community annotators across African language groups
About the workshop
A training session for grassroots annotators, linguists, and community leaders. Participants learn how to set up annotation projects, manage contributors, apply quality-control measures, and align their data pipelines with the AfricaNLP Playbook governance guidelines.
Agenda
- 20 min: Data governance and community ownership principles
- 40 min: Setting up and managing an annotation project
- 40 min: Hands-on annotation across language groups
- 30 min: Quality control and validation workflows
- 30 min: Discussion on sustainability and long-term engagement
Objectives
- Build local capacity for running annotation projects.
- Teach data governance and community ownership principles.
- Connect grassroots annotators with the broader Masakhane community.
- Identify priority languages and tasks for upcoming data collection campaigns.
Expected outcomes
- Participants who can independently run annotation projects.
- A network of trained community annotators across African language groups.
- A list of priority languages and tasks for upcoming campaigns.
Who should attend
- Grassroots annotators and community leaders
- Linguists working with under-resourced languages
- NGOs and civic tech groups interested in language data
Organizers
Masakhane community team