Research & Publications
Papers, datasets, and surveys we contribute to African NLP — built with native-speaker communities and released openly.
All Publications
Going PLACES: Participatory Localized Red Teaming for Text-to-Image Safety in the Global South
A participatory red-teaming methodology for text-to-image safety evaluation grounded in Global South contexts, surfacing harms that generic Western-centric pipelines miss.
Beyond Majority Voting: Agreement-Based Clustering to Model Annotator Perspectives in Subjective NLP Tasks
An agreement-based clustering method for subjective annotation tasks that preserves disagreement signal instead of collapsing it under majority voting.
AFRILANGTUTOR: Advancing Language Tutoring and Culture Education in Low-Resource Languages with Large Language Models
An LLM-driven language tutor for low-resource African languages, designed around the cultural context of learners as well as the linguistic content.
Full Fine-Tuning vs. Parameter-Efficient Adaptation for Low-Resource African ASR: A Controlled Study with Whisper-Small
A controlled comparison of full fine-tuning and parameter-efficient (LoRA-style) adaptation of Whisper-Small for low-resource African languages.
NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages
A multi-accent benchmark for speech-to-speech translation across low-resource Nigerian languages, capturing the accent diversity found in real Nigerian speech.
Building a Conversational AI Assistant for African Travel Services with LLMs and RAG
An LLM + retrieval-augmented generation pipeline for an African travel-services assistant, with attention to domain grounding and language coverage.
DimABSA: Building Multilingual and Multidomain Datasets for Dimensional Aspect-Based Sentiment Analysis
Multilingual, multi-domain datasets for dimensional aspect-based sentiment analysis — moving beyond polarity labels to finer-grained sentiment dimensions.
DimStance: Multilingual Datasets for Dimensional Stance Analysis
Multilingual stance-analysis datasets that decompose stance into finer dimensions rather than reducing it to a single agree/disagree label.
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
A re-evaluation of state-of-the-art language identification systems against realistic web data, exposing where current LID models silently fail.
AfroXLMR-Social: Adapting Pre-trained Language Models for African Languages Social Media Text
Domain-adaptive pretraining of multilingual models on African-language social media text, improving downstream performance on noisy real-world inputs.
Automatic Speech Recognition (ASR) for African Low-Resource Languages: A Systematic Literature Review
A systematic review of ASR research for low-resource African languages — data, models, evaluation gaps, and open problems for the community to take on.
The Rise of AfricaNLP: A Survey of Contributions, Contributors, Community Impact, and Bibliometric Analysis
A bibliometric survey of the African NLP research landscape — who contributes, where the work is published, and how the field has grown.
BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages
A human-annotated emotion-recognition dataset covering 28 languages — including several low-resource African languages — with labels for joy, sadness, anger, fear, surprise, and disgust.
Automatic Speech Recognition for African Low-Resource Languages: Challenges and Future Directions
A position paper on the state of ASR for African low-resource languages — what is blocking progress and where the community should focus next.
HausaNLP: Current Status, Challenges and Future Directions for Hausa Natural Language Processing
A survey of Hausa NLP research — datasets, models, and open challenges — written as a roadmap for researchers entering the area.
The State of Large Language Models for African Languages: Progress and Challenges
A survey of where current large language models stand on African languages — what works, what fails, and what the open research problems are.
POLAR: A Benchmark for Multilingual, Multicultural, and Multi-Event Online Polarization
A benchmark for studying online polarization across multiple languages, cultures, and events — moving beyond single-language polarization research.
AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages
A community-built hate-speech and abusive-language corpus across multiple African languages, with culturally grounded annotation guidelines reviewed by speakers from each language community.
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages
A benchmark that probes large language models on everyday cultural knowledge across diverse languages and cultures — revealing where LLMs default to a Western prior.
AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages
A sentiment-analysis benchmark covering 14 African languages, built with native-speaker annotators and released openly. It served as the basis for SemEval-2023 Task 12.