Research & Publications

Papers, datasets, and surveys we contribute to African NLP — built with native-speaker communities and released openly.

All Publications

20 publications found

Research Paper · May 2026

Going PLACES: Participatory Localized Red Teaming for Text-to-Image Safety in the Global South

A participatory red-teaming methodology for text-to-image safety evaluation grounded in Global South contexts, surfacing harms that generic Western-centric pipelines miss.

T2I Safety · Red Teaming · Global South

Research Paper · May 2026

Beyond Majority Voting: Agreement-Based Clustering to Model Annotator Perspectives in Subjective NLP Tasks

arXiv

An agreement-based clustering method for subjective annotation tasks that preserves disagreement signal instead of collapsing it under majority voting.

Annotation · Perspectivism · Subjective NLP

Research Paper · Apr 2026

AFRILANGTUTOR: Advancing Language Tutoring and Culture Education in Low-Resource Languages with Large Language Models

arXiv

An LLM-driven language tutor for low-resource African languages, designed around the cultural context of learners as well as the linguistic content.

LLM · Education · Low-Resource

Research Paper · Apr 2026

Full Fine-Tuning vs. Parameter-Efficient Adaptation for Low-Resource African ASR: A Controlled Study with Whisper-Small

AfricaNLP 2026

A controlled comparison of full fine-tuning vs. parameter-efficient adaptation (LoRA-style) for adapting Whisper-Small to low-resource African languages.

ASR · Whisper · Fine-Tuning

Research Paper · Apr 2026

NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages

arXiv

A multi-accent benchmark for speech-to-speech translation across low-resource Nigerian languages, capturing the accent diversity found in real Nigerian speech.

Speech-to-Speech · Benchmark · Nigerian Languages

Research Paper · Apr 2026

Building a Conversational AI Assistant for African Travel Services with LLMs and RAG

AfricaNLP 2026

An LLM + retrieval-augmented generation pipeline for an African travel-services assistant, with attention to domain grounding and language coverage.

LLM · RAG · Conversational AI

Research Paper · Jan 2026

DimABSA: Building Multilingual and Multidomain Datasets for Dimensional Aspect-Based Sentiment Analysis

arXiv

Multilingual, multi-domain datasets for dimensional aspect-based sentiment analysis — moving beyond polarity labels to finer-grained sentiment dimensions.

Sentiment · ABSA · Multilingual

Research Paper · Jan 2026

DimStance: Multilingual Datasets for Dimensional Stance Analysis

arXiv

Multilingual stance-analysis datasets that decompose stance into finer dimensions rather than reducing it to a single agree/disagree label.

Stance · Multilingual · Datasets

Research Paper · Jan 2026

CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data

arXiv

A re-evaluation of state-of-the-art language identification systems against realistic web data, exposing where current LID models silently fail.

LID · Evaluation · Web Data

Research Paper · Nov 2025

AfroXLMR-Social: Adapting Pre-trained Language Models for African Languages Social Media Text

Findings of EMNLP 2025

Domain-adaptive pretraining of multilingual models on African-language social media text, improving downstream performance on noisy real-world inputs.

Pretrained Models · Social Media · African Languages

Survey · Oct 2025

Automatic Speech Recognition (ASR) for African Low-Resource Languages: A Systematic Literature Review

arXiv

A systematic review of ASR research for low-resource African languages — data, models, evaluation gaps, and open problems for the community to take on.

ASR · Low-Resource · Literature Review

Survey · Sep 2025

The Rise of AfricaNLP: A Survey of Contributions, Contributors, Community Impact, and Bibliometric Analysis

arXiv

A bibliometric survey of the African NLP research landscape — who contributes, where the work is published, and how the field has grown.

Survey · Bibliometrics · AfricaNLP

Research Paper · Jul 2025

BRIGHTER: BRIdging the Gap in Human-Annotated Textual Emotion Recognition Datasets for 28 Languages

ACL 2025

A human-annotated emotion-recognition dataset covering 28 languages — including several low-resource African languages — with labels for joy, sadness, anger, fear, surprise, and disgust.

Emotion · Multilingual · African Languages

Survey · Jun 2025

Automatic Speech Recognition for African Low-Resource Languages: Challenges and Future Directions

AfricaNLP 2025

A position paper on the state of ASR for African low-resource languages — what is blocking progress and where the community should focus next.

ASR · African Languages · Position Paper

Survey · Jun 2025

HausaNLP: Current Status, Challenges and Future Directions for Hausa Natural Language Processing

AfricaNLP 2025

A survey of Hausa NLP research — datasets, models, and open challenges — written as a roadmap for researchers entering the area.

Hausa · Survey · AfricaNLP

Survey · Jun 2025

The State of Large Language Models for African Languages: Progress and Challenges

arXiv

A survey of where current large language models stand on African languages — what works, what fails, and what the open research problems are.

LLM · African Languages · Survey

Research Paper · May 2025

POLAR: A Benchmark for Multilingual, Multicultural, and Multi-Event Online Polarization

arXiv

A benchmark for studying online polarization across multiple languages, cultures, and events — moving beyond single-language polarization research.

Polarization · Benchmark · Multilingual

Research Paper · Apr 2025

AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages

NAACL 2025

A community-built hate-speech and abusive-language corpus across multiple African languages, with culturally grounded annotation guidelines reviewed by speakers from each language community.

Hate Speech · Multilingual · African Languages

Research Paper · Dec 2024

BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages

NeurIPS 2024

A benchmark that probes large language models on everyday cultural knowledge across diverse languages and cultures — revealing where LLMs default to a Western prior.

LLM · Cultural Knowledge · Benchmark

Research Paper · Dec 2023

AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages

EMNLP 2023

A sentiment-analysis benchmark covering 14 African languages, built with native-speaker annotators and released openly. It served as the basis for SemEval-2023 Task 12.

Sentiment · Benchmark · African Languages