When AI attacks our accents


In June 2022, the company SANAS announced that it had raised 32 million dollars to build a technology based on artificial intelligence whose aim is to remove accents. In September 2022, the platform was launched, not without arousing interest, curiosity and excitement in both the English-speaking and French-speaking worlds.

Such software plunges us into a contemporary dystopia in which technology erases the differences, identity markers and cultures of individuals. The idea is not new, however: the film “Sorry to Bother You”, released in 2018, already addressed the accent of African-American populations in a satire on call centers.

So how can an accent actually be removed? Between utopia and dystopia, why might developing an artificial intelligence capable of “removing” accents be more of a problem than a solution? And what, beyond a mark in the sound, is removed when an accent is neutralized?

How artificial intelligence can silence an accent

An accent can be defined as a bundle of cues, often oral (vowels, consonants, intonation, etc.), that feeds the more or less conscious hypotheses we form about the geographical, social or linguistic origin of individuals. An accent may be called, among other things, “regional” or “foreign”, each label referring to a different imaginary. Identifying an accent is meaningful because a certain number of sound characteristics seem homogeneous among the speakers of a language, a geographical area or a social group, as Philippe Boula de Mareuil has underlined.

These start-up technologies are often a black box, and little concrete information is available on the tools used to “remove” the accent. The means are nevertheless multiple, and they mainly aim to partially transform the structure of the sound wave in order to bring certain acoustic cues closer to a perceptually determined standard. It is thus possible to adjust the timbre of certain vowels or the realization of consonants, or to transform parameters such as rhythm, intonation or stress according to expected perceptual targets. At the same time, as many vocal parameters as possible are preserved so that the voice of the original speaker remains identifiable, much as in voice cloning, which can lead to voice deepfake scams. These technologies make it possible to separate what belongs to speech from what belongs to the voice.
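As a purely illustrative sketch of this idea, and not a description of how any commercial tool works, the toy Python example below moves two vowel-timbre cues (the first two formants) part of the way toward a chosen target while leaving a voice-related parameter (mean pitch) untouched. The class `VowelFrame`, the function `shift_toward_target`, the `strength` parameter and all numeric values are hypothetical and chosen only for illustration; treating mean pitch as the sole carrier of voice identity is a deliberate simplification.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VowelFrame:
    """Hypothetical, highly simplified description of one vowel."""
    f1_hz: float          # first formant, a vowel-timbre cue
    f2_hz: float          # second formant, a vowel-timbre cue
    mean_pitch_hz: float  # stand-in for the voice parameters left unchanged


def shift_toward_target(frame, target_f1_hz, target_f2_hz, strength=0.5):
    """Move the formants a fraction `strength` of the way to the target.

    strength=0 leaves the vowel unchanged; strength=1 replaces its
    formants with the target values. Mean pitch is never modified.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    new_f1 = frame.f1_hz + strength * (target_f1_hz - frame.f1_hz)
    new_f2 = frame.f2_hz + strength * (target_f2_hz - frame.f2_hz)
    return replace(frame, f1_hz=new_f1, f2_hz=new_f2)


# Illustrative values only: an invented vowel and an invented "standard".
spoken = VowelFrame(f1_hz=400.0, f2_hz=2000.0, mean_pitch_hz=120.0)
converted = shift_toward_target(spoken, 300.0, 2300.0, strength=0.5)
print(converted)
```

Even in this toy form, the choice of target values is the decisive step: someone has to decide which pronunciation counts as the standard.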

Processing speech automatically and in real time poses technological difficulties, the main one being the quality of the sound signal to be processed. Nevertheless, various solutions based on deep learning and neural networks, together with large speech corpora, make it possible to handle uncertainty in the signal more effectively.

In the case of foreign languages, Sylvain Detey, Lionel Fontan and Thomas Pellegrini identify a few issues inherent in the development of these technologies: which standard should serve as the point of comparison, and what role corpora can play in setting these targets. No particularly promising answers have emerged so far.

The myth of the neutral accent

However, identifying an accent is not a matter of acoustic cues alone. Donald L. Rubin demonstrated that listeners can form the impression of an accent simply because a voice is paired with a face of supposedly different origin. Similarly, in the absence of such additional cues, listeners are not very good at recognizing accents that they do not hear regularly or that they know only through stereotypes, such as the idea that German contains a lot of consonants.

Wanting to remove accents in order to counter the social effects of accent discrimination raises the question of what a “neutral” accent is. Yet every variation in pronunciation carries representations. Médéric Gasquet-Cyrus, a “Marseille specialist” according to the media, reminds us that even the so-called “Parisian” accent is an accent. In French, the accent described as “standard” has evolved around sociologically dominant groups: the Parisian upper middle class, the media (radio, TV) and the well-off middle classes, for example.

For several years, a collective of researchers has been trying to outline a reference French based on the similarities found across all the varieties spoken in the Francophonie. The “Phonology of contemporary French” project has thus made recordings of accents available for the general public to hear.

It should also be noted that the value attributed to an accent (strong, soft, romantic, hard) depends largely on individuals, periods and social groups. However, Iván Fónagy, a Hungarian philologist, showed in his book La vive voix: Essays in psychophonetics that individuals tend to attribute the same properties to sounds: /r/ is heard as a feisty sound, /i/ as small, /u/ (spelled “ou”) as opulent, and so on.

Delete or keep, the chicken or the egg?

In sociology, Wayne Brekhus argues for the need to look at the invisible and to deal at the same time with the marked and the unmarked: the accent and what is considered to be a non-accent. This leads us to re-examine the power relations that exist between individuals and the way in which we homogenize the marked, that is, the person who (according to others) has an accent.

We are thus led to ask how emerging technologies can make us more “actor” or “actress” than “automaton”, according to Catherine Pascale, by helping to create an eco-ethical framework. Removing an accent means valuing a dominant type of accent while overlooking the fact that other factors contribute just as much to the perception of that accent and to the emergence of linguistic discrimination. Removing the accent does not remove the discrimination. On the contrary, the accent makes identity audible, thereby contributing to humanization, group belonging and even empathy: the accent is bound up with otherness.

Although the development of technologies based on artificial intelligence and deep learning offers society potential that is still unexplored, it can also lead to a dystopia in which dehumanization sidelines the political and social stakes, however major, of living together and diversity, which are echoed in the UNESCO Universal Declaration on Cultural Diversity.

Rather than hiding accents, it seems necessary to make recruiters aware of how accents can contribute to customer satisfaction, and for politicians to take up the issue. Although the National Assembly took a strong step in 2020 by passing a text prohibiting discrimination based on accent, La Provence points out that the Senate does not seem to have taken it up, since the text still does not appear on its agenda today, two years later.

Author Bio: Gregory Miras is a University Professor in Language Didactics at the University of Lorraine