That Ain’t Right: Assessing LLM Performance on QA in African American and West African English Dialects
Published in Proceedings of the 9th Widening NLP Workshop, Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Recommended citation: Coggins, W., Jasmine McKenzie, Sangpil Youm, Pradham Mummaleti, Juan Gilbert, Eric Ragan, and Bonnie Dorr. That Ain’t Right: Assessing LLM Performance on QA in African American and West African English Dialects, Proceedings of the 9th Widening NLP Workshop, Conference on Empirical Methods in Natural Language Processing (EMNLP), Suzhou, China, pp. 123–129, 2025. https://aclanthology.org/2025.winlp-main.21/
As Large Language Models (LLMs) gain mainstream public usage, understanding how users interact with them becomes increasingly important. Limited variety in training data raises concerns about LLM reliability across different language inputs. To explore this, we test several LLMs using functionally equivalent prompts expressed in different English dialects. We frame this analysis using Question-Answer (QA) pairs, which allow us to detect and evaluate appropriate and anomalous model behavior. We contribute a cross-LLM testing method and a new QA dataset translated into African American Vernacular English (AAVE) and West African Pidgin English (WAPE) variants. Results show a notable drop in accuracy for one dialect relative to the baseline.
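
The comparison described above can be illustrated with a minimal sketch. It is not the paper's implementation: the record layout, the dialect labels (`baseline`, `variant_a`), the exact-match scoring rule, and the `toy_model` stand-in are all assumptions made for illustration, and the prompts are placeholders rather than items from the dataset.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: (question_id, dialect_label, prompt, gold_answer).
QAItem = Tuple[str, str, str, str]


def accuracy_by_dialect(model: Callable[[str], str],
                        items: List[QAItem]) -> Dict[str, float]:
    """Score one model on every item and return accuracy per dialect label.

    Scoring is case-insensitive exact match, which is an assumption here;
    a real evaluation may need a more forgiving comparison.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for _qid, dialect, prompt, gold in items:
        total[dialect] += 1
        if model(prompt).strip().lower() == gold.strip().lower():
            correct[dialect] += 1
    return {d: correct[d] / total[d] for d in total}


def accuracy_drop(acc: Dict[str, float],
                  baseline: str = "baseline") -> Dict[str, float]:
    """Return baseline accuracy minus each other dialect's accuracy."""
    return {d: acc[baseline] - a for d, a in acc.items() if d != baseline}


# Placeholder data: each question appears once per dialect with the same gold answer.
items: List[QAItem] = [
    ("q1", "baseline", "p1-base", "paris"),
    ("q1", "variant_a", "p1-a", "paris"),
    ("q2", "baseline", "p2-base", "4"),
    ("q2", "variant_a", "p2-a", "4"),
]

# Stand-in for an LLM call: a fixed lookup that misses one variant prompt.
_canned = {"p1-base": "Paris", "p2-base": "4", "p1-a": "Paris", "p2-a": "5"}


def toy_model(prompt: str) -> str:
    return _canned.get(prompt, "")


acc = accuracy_by_dialect(toy_model, items)
drop = accuracy_drop(acc)
```

Running the same `items` through several model callables and comparing the resulting `drop` dictionaries gives a cross-model view; because every dialect variant of a question shares one gold answer, any accuracy gap reflects the wording of the prompt and not the content of the question.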
