Why does autocorrect make so many mistakes?

We’re meeting a friend for lunch at a restaurant. When the first courses arrive, she interrupts the conversation to say, “Would you pass me the…” while looking at a specific spot on the table. We probably don’t need any more words or gestures to understand that she’s referring to the salt, and we pass it to her.

People don’t need their interlocutor to finish their sentence to know what they mean. Our knowledge of the internal structure of language allows us to anticipate which word they will use. Furthermore, the context of the communicative situation gives us information about the content and expressions our interlocutor will use.

The mobile keyboard’s autocorrect attempts to replicate this human behavior using statistical and natural language processing (NLP) techniques. It calculates the probability of a letter, word, or sequence appearing based on its frequency in the large amounts of text used to train the model.

On this statistical basis, NLP also incorporates the analysis of the structure and meaning of words, looking for patterns and relationships between them to generate corrections that are more consistent with the context.
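As a toy illustration of this frequency-based calculation, a bigram model estimates the probability of the next word from how often the pair appears in the training text. All words and counts below are invented; real models are trained on vastly larger corpora:

```python
from collections import Counter

# Hypothetical miniature "training corpus".
corpus = "pass me the salt please pass me the bread pass the salt".split()

# Count how often each word follows each other word (bigrams),
# and how often each word appears with a successor (unigrams).
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def next_word_probability(prev, candidate):
    """P(candidate | prev): relative frequency of the pair in the corpus."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, candidate)] / unigrams[prev]

# In this toy corpus, "salt" follows "the" in 2 of 3 cases, "bread" in 1 of 3.
print(next_word_probability("the", "salt"))
print(next_word_probability("the", "bread"))
```

Real keyboards use longer contexts and smoothing for unseen pairs, but the principle is the same: the suggestion is whichever continuation was most frequent in the training data.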

Why, then, does it insist on replacing “jobar” with “Jonathan” (when we don’t know any Jonathan), or make us look a bit eccentric when we claim in a message to have submitted documentation “telepathically” rather than “telematically”?

Combining rules and individual use

The natural language processing system of the autocorrectors we use every day relies on its internal dictionary, the language’s own syntactic rules, and the user’s history. The internal dictionary is initially built from training texts from books, academic articles, and online sources, among others, which provide a general understanding of the language. From there, the system combines this prior learning with predefined linguistic rules and information gathered from the user’s history. As a result, the system anticipates the most likely text string based on what it has learned.
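A minimal sketch of this combination of general training and personal history might look like the following. The words, counts, and weighting scheme are invented for illustration; no vendor publishes its actual formula:

```python
from collections import Counter

# Hypothetical word frequencies learned from the base training corpus.
base_counts = Counter({"telematically": 40, "telepathically": 5})

# Words this particular user has actually typed; updated as they write.
user_counts = Counter()

def score(word, user_weight=5):
    """Rank a candidate by general frequency plus weighted personal usage."""
    return base_counts[word] + user_weight * user_counts[word]

def best(candidates):
    return max(candidates, key=score)

candidates = ["telematically", "telepathically"]
print(best(candidates))  # the base model prefers "telematically"

# After the user repeatedly types "telepathically", the personalized
# score overtakes the general one and the suggestion flips.
user_counts["telepathically"] += 10
print(best(candidates))
```

This is why two people with the same phone can get different suggestions: the base dictionary is shared, but the history term is individual.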

Initially, these tools were developed to assist people with physical, perceptual, or cognitive disabilities in their use of language through computer systems. However, once properly integrated into the application interface, they can benefit any user by increasing typing speed and reducing effort.

Predicting how to spell is not easy

The mobile keyboard app manages its own dictionary of words and constructions, which may not cover all options. Predictions are individualized based on the user’s typing and the frequency with which they use certain expressions.

Even so, it remains a complex task for the system because it’s not enough to know all the possible terms. It must also decide which is most appropriate based on the context and the user’s intent. For example, the noun “house” is completely correct and accepted in everyday speech. However, in an official or administrative process, it’s more appropriate to use “housing.”

Why predictions ‘fail’

You can imagine the device’s dictionary as a tree structure in which, given a block of text entered, certain possibilities open up with varying degrees of likelihood, fine-tuned as the user types. Some predictions may be motivated by specific system programming, such as avoiding profanity, and others by explicit learning, in which the user adds certain expressions to the device’s own dictionary. For this reason, the autocorrector doesn’t always match what the user expects at any given time.
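The tree structure described above can be sketched as a prefix tree (trie) whose nodes carry frequencies, so that typing a few letters opens up ranked completions. The words and counts below are invented for illustration:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # letter -> TrieNode
        self.frequency = 0   # > 0 marks a complete word; higher = more likely

def insert(root, word, frequency):
    """Walk (or create) one node per letter, storing the word's frequency."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.frequency = frequency

def suggestions(root, prefix):
    """Descend to the prefix node, then collect completions by frequency."""
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []            # nothing in the dictionary starts this way
        node = node.children[ch]
    results = []
    def collect(n, word):
        if n.frequency:
            results.append((word, n.frequency))
        for ch, child in n.children.items():
            collect(child, word + ch)
    collect(node, prefix)
    return [w for w, _ in sorted(results, key=lambda x: -x[1])]

root = TrieNode()
for word, freq in [("salt", 50), ("salad", 30), ("salmon", 20)]:
    insert(root, word, freq)
print(suggestions(root, "sal"))  # ['salt', 'salad', 'salmon']
```

Adding a word to the personal dictionary corresponds to a new `insert`, and each use of a word can bump its frequency, which is how the ranking adapts to the individual user.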

To optimize the writing process, apps offer two main ways to include suggestions: offering a list of options ranked by probability, or entering the term directly into the text. In the first case, the user must consciously analyze the alternatives. In the second, the text flows more quickly and organically, but the user must actively delete the suggestion if it isn’t the desired one.

Although the system finds the required word most of the time (up to 94%), we tend to remember much more vividly those moments when it makes a serious mistake. Furthermore, according to one study, we often experience frustration when the same mistakes are repeated systematically. However, this is normal: the autocorrector’s learning is gradual rather than immediate, and it works probabilistically, combining what it already knows from large training texts with new information gleaned from the user’s history.

Despite this, the majority of users, on both iOS and Android, agree that the keyboard’s built-in autocorrect improves their typing efficiency and helps reduce errors. Furthermore, continued use of these tools progressively improves their effectiveness by offering a more personalized experience.

Lexical competence is only human

However, we must not forget that the autocorrect dictionary is a word store, whose operation is separate from the mental lexicon of the human user. This lexicon is constructed by establishing networks between different lexical units based on different types of relationships (lexical families, semantic fields, cognates, etc.). The autocorrect, for its part, has a wide lexical availability, but it does not master aspects related to the form, meaning, and use of each lexical unit; that is, it lacks the lexical and communicative competence that speakers possess.

Despite these limitations, proposals are being developed that demonstrate the potential for improved contextual correction, such as PALABRIA-CM-UC3M, which focuses on the linguistic phenomenon of the impersonal “tú.” Using linguistic techniques and artificial intelligence models that learn the patterns and contexts of this phenomenon, the system can identify and correct errors that a conventional autocorrector would miss.

Although they can continue to learn new patterns and expand their knowledge to offer increasingly accurate corrections, autocorrectors are still mathematical models that operate from learned patterns and rules, lacking the deep, flexible, and contextual understanding that characterizes human language use. They will never be infallible. Not even we are: often, as in the example at the beginning, our predictions may turn out to be wrong, and we pass the salt to someone who actually wanted the jug of water.

Author Bios: Pedro Manuel Moreno-Marcos is Professor in the Department of Telematics Engineering; Marina Serrano-Marín is Assistant Professor in the Department of Humanities; Natalia Centeno Alejandre is a Specialist Technician in Artificial Intelligence; and Rafael Fernández Castillejos is a Specialist Technician in the Department of Humanities: Philosophy, Language and Literature, all at Carlos III University.