Analyzing Vocal Bias via Accents in Voice Recognition Technologies

CS 105 Final Project

During my sophomore fall semester at Harvard, I took CS 105: Privacy and Technology. For the final project, I worked on a three-person team to analyze algorithmic bias. We set out wondering whether technology that claims to be accessible really is accessible to everyone. That question sparked our curiosity about how vocal differences caused by factors such as regional background, socioeconomic status, or speech impediments affect the accuracy of voice-to-text algorithms, which led to our research question: are voice-to-text algorithms biased?

We researched and analyzed vocal bias via accents in voice recognition technologies using George Mason University’s Speech Accent Archive, Python, and the speech-to-text technologies from IBM, Amazon, Google, and Microsoft. We measured bias through both edit distance and accuracy rate. We found that IBM has the top-performing speech-to-text technology. More importantly, we found clear differences in the accuracy of these technologies by accent: three of the four performed considerably better on English spoken with a US American accent than on any other accent, and English spoken with a Vietnamese or Spanish accent proved particularly troublesome for all four. These discrepancies have significant implications for the accessibility of hands-free and voice recognition technologies for individuals speaking English with a non-US American accent, and they confirm our suspicion of vocal bias in voice recognition technologies.
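The two metrics we used can be sketched in Python roughly as follows. This is a minimal illustration, not our original project code: the sample reference and hypothesis sentences are hypothetical, and it computes word-level Levenshtein edit distance and a simple accuracy rate (matched words over reference length).

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance via dynamic programming.

    Counts the minimum number of insertions, deletions, and
    substitutions needed to turn the reference word list into
    the hypothesis word list.
    """
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(n + 1):
        dp[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # match or substitution
            )
    return dp[m][n]


def accuracy_rate(reference, transcript):
    """Accuracy as 1 minus the normalized word edit distance."""
    ref = reference.lower().split()
    hyp = transcript.lower().split()
    return 1 - edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a reference prompt vs. an imperfect transcript.
reference = "please call stella and ask her to bring these things"
transcript = "please call stella and ask her to bring this thing"
print(accuracy_rate(reference, transcript))  # 0.8 (2 of 10 words wrong)
```

A lower edit distance (and thus higher accuracy rate) for one accent than another, on the same reference passage, is the kind of discrepancy we looked for across the four services.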