Community Ecosystems

African NLP is, to an unusual degree, a community endeavour. The datasets, models, and benchmarks that move the field forward are built less by large institutions than by distributed networks of volunteers, students, and researchers who coordinate online across the continent. The Masakhane community captures the spirit in its name, which means "we build together" in isiZulu, and in its insistence that African language technology should be built by Africans, for Africans, with the community owning what it creates (Masakhane). Knowing this ecosystem is practical knowledge, because it is where you find collaborators, datasets, annotators, and funding.

The communities and what they do

The ecosystem has a recognisable shape. Masakhane is the pan-African hub, a grassroots community of thousands whose participatory model produced foundational datasets such as MasakhaNER and MAFAND-MT (Nekoto et al., 2020; Adelani et al., 2022). Around and within it sit language- and country-focused groups such as HausaNLP, EthioNLP, GhanaNLP, and Digital Umuganda in Rwanda, which carry deep expertise in their specific languages. The Deep Learning Indaba, founded in 2017, is the community's annual meeting place and runs IndabaX events that had reached 47 countries by 2025, building the human capacity the field depends on (Deep Learning Indaba). Other organisations fill complementary roles: AI4D-Africa funded and coordinated dataset creation through its African Language Program (Siminyu et al., 2021); Zindi hosts competitions and a large pool of African data scientists; Lelapa AI builds African-centred models and the Esethu framework; and Lanfrica maps the resources so they can be found (Lanfrica). This is not an exhaustive list, and it grows every year.

Academic and industry collaboration

Good African-language datasets increasingly come from partnerships that cross academic, industry, and community lines. University centres such as the Data Science for Social Impact group at the University of Pretoria, the Maseno Centre for Applied AI in Kenya, and SADiLaR, South Africa's national language-resource infrastructure, bring linguistic depth and continuity. Companies and funders bring scale and money: African Next Voices paired community centres with Gates Foundation funding to record thousands of hours of speech (African Next Voices, 2025), and Lacuna Fund underwrites dataset creation across the continent (Lacuna Fund, n.d.). The healthiest of these partnerships keep the community in control of the data even when the money comes from outside, which is exactly what the data governance chapter argues for.

Contributing, and crediting contributors

A community resource is only as strong as the people who contribute to it, so make contribution easy and make credit real. Provide clear contributor guidelines that say what kinds of help are wanted, whether that is writing, reviewing, sharing a dataset, or translating, and how to get started. Lower the barrier to a first contribution, because many of the field's most valuable people began as students who fixed one small thing. And name contributors in datasheets, in publications, and in release notes, because recognition is both fair and the cheapest way to keep a volunteer community alive. The growth of African NLP over two decades, from a handful of papers a year to hundreds, has been driven precisely by this widening circle of contributors (Belay et al., 2025).
