What if dictionary sales in Germany were linked to the number of swimming class registrations in Japan? What if solar power production in Taiwan influenced Netflix’s stock price?
We learn early on that there is a distinction between correlation and causation: a statistical link does not, by itself, explain a cause. Yet our brains look for meaning and logical explanations when analyzing data: lines that follow the same slope, bars that rise together, or points that cluster in a graph. Instinctively, it seems unlikely that a country’s per capita chocolate consumption could explain the number of its Nobel laureates, even if the two are correlated: this is a “spurious correlation.”
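To see how easily a meaningless correlation can arise, here is a minimal sketch (not from the article) that correlates two independently generated random walks. There is no causal link between the two series, yet their cumulative trends can still line up; the `pearson` helper is a plain implementation of the standard correlation coefficient.

```python
import random

random.seed(0)

def random_walk(n):
    """Cumulative sum of independent Gaussian steps."""
    x, out = 0.0, []
    for _ in range(n):
        x += random.gauss(0, 1)
        out.append(x)
    return out

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

# Two unrelated series: any correlation between them is spurious.
a = random_walk(500)
b = random_walk(500)
print(f"correlation between two unrelated series: {pearson(a, b):.2f}")
```

Run this with different seeds and the coefficient swings widely, sometimes close to ±1: the number measures co-movement, not causation.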
In September 2024, an Apple research team published a paper showing that simply changing the first names or attributes of the characters in a mathematical word problem reduced the proportion of correct answers given by various generative artificial intelligences by up to 10%. Imagine asking an AI: “Adam has one apple and Eve has two, how many apples do they have?”, then asking: “Ada has one apple and Evan has two, how many apples do they have?” and getting different answers! For a child, it is clear that replacing Adam with Ada in the problem statement does not change the answer. For an AI, it is not so simple: the model has learned seemingly logical links that are in fact spurious correlations.
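A robustness test of this kind can be sketched in a few lines. The `ask_model` function below is a hypothetical stand-in for a real LLM call (here, a trivial solver that ignores names and just sums the numbers): the point is the harness, which checks that the answer is invariant under a change of first names.

```python
import re

def ask_model(problem: str) -> int:
    """Hypothetical stand-in for an LLM call: sums the numbers in the text,
    ignoring names entirely (so it is robust by construction)."""
    return sum(int(n) for n in re.findall(r"\d+", problem))

TEMPLATE = "{a} has {x} apple(s) and {b} has {y}, how many apples do they have?"

# Same problem, different first names: the answer should not change.
base = ask_model(TEMPLATE.format(a="Adam", b="Eve", x=1, y=2))
swap = ask_model(TEMPLATE.format(a="Ada", b="Evan", x=1, y=2))
print(base, swap)
assert base == swap == 3
```

The Apple paper’s finding is that real LLMs fail this invariance check far more often than one would expect from a system that actually “understands” the arithmetic.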
How is it that we can instantly understand that these are spurious correlations, while AIs can clearly be fooled by them?
This problem is not trivial, since some types of AI prone to these logical misunderstandings are used in critical computer-security systems. There, they are vulnerable to a class of attacks known as “adversarial attacks,” studied in the field of adversarial machine learning.
To address this problem, researchers are developing methods that correct AI learning processes by identifying parasitic characteristics that lead to spurious correlations.
How do “GPT”-like AIs learn spurious correlations?
To understand how “GPTs,” these AIs that seem so promising, trip over fallacious correlations, we need to understand how they work.
Among the models evaluated in Apple’s September 2024 paper is GPT-4o, then the latest creation from OpenAI. Behind GPT-4o’s success is a Generative Pre-trained Transformer (GPT), a type of neural network.
Generative because it aims to generate text; pre-trained because, after initial training on a general corpus, it can be further trained on specialized documentary corpora: contracts, mathematical writing, or software-code analysis, for example.
GPTs belong to a larger family of models called Large Language Models (LLMs). LLMs have helped transform human-machine interaction: they let the user address the machine with natural-language instructions, called “prompts.” For example, “write me an article for The Conversation on the topic of Generative AI” is a valid instruction. In return, the LLM will also respond in natural language, though the article in question would not be published, because it would be against The Conversation’s editorial charter!
To pre-train the models, OpenAI researchers used a set of text sequences (on the order of a trillion words). Then, like a guessing game, the transformer must analyze sequences in which some content is hidden and predict the missing words. At each attempt, the model’s parameters are adjusted to correct the prediction; this is called learning.
After training, the learned parameters numerically represent the semantic relationships between words (this is the language model). Responding to a user (this is inference) follows the same process: analyze the sequence (the prompt), predict the next word, then the next, then the next, and so on.
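The guessing game above can be caricatured with a tiny next-word model, a deliberately simplified sketch (real transformers learn billions of parameters, not word counts). “Learning” here is adjusting counts from a corpus; “inference” is predicting the next word, then the next, from those counts.

```python
from collections import Counter, defaultdict

# Toy corpus: the "training sequences."
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat ate the fish ."
).split()

# Learning: count which word follows which.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Inference: return the most frequent continuation seen in training."""
    return counts[word].most_common(1)[0][0]

# Generate by repeatedly predicting the next word.
word, sentence = "the", ["the"]
for _ in range(4):
    word = predict_next(word)
    sentence.append(word)
print(" ".join(sentence))
```

Even this toy model shows how statistical regularities in the training data, and nothing else, drive what is generated: whatever co-occurs in the corpus, spurious or not, is what comes back at inference time.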
For a user unfamiliar with the mechanism at work, the result is astonishing, but once again, it is only intelligence simulated by a machine. The syntax seems exact, the reasoning logical, the applications infinite: mathematics, literature, history or geography. It did not take long for LLMs to start generating student essays and dissertations, or relieving researchers of tedious tasks.
Why is this dangerous in practice?
If there are spurious links in the training sequences, these will be integrated during the learning phase and regenerated in the inference phase. This phenomenon of “spurious correlation” does not only concern LLMs, but more generally deep neural networks using large amounts of data for training.
In the field of computer security, researchers had already warned in January 2024 about similar symptoms in LLMs specialized in software vulnerability research: their work shows that changing variable names, although it has no impact on the logic of the analyzed code, degrades the model’s ability to correctly identify vulnerable code by up to 11%. Just as with the change of first names in the apple problem above, one of the audited LLMs had, for example, learned to associate the use of variables named “myVariable” (common in examples aimed at beginners) with vulnerable code. Yet there is no cause-and-effect relationship between a variable’s name and the security of the software. The correlation is fallacious.
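The perturbation used in such audits can be sketched simply: rename a variable everywhere in a snippet, leaving the logic untouched, and check that the detector’s verdict does not change. This is a minimal illustration, not the audited systems’ actual tooling; the snippet and the name `v42` are invented for the example.

```python
import re

def rename_variable(code: str, old: str, new: str) -> str:
    """Replace whole-identifier occurrences of `old` with `new`.

    \\b word boundaries ensure we do not touch substrings of other names,
    so the program's logic is preserved exactly."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)

snippet = (
    "myVariable = input()\n"
    "query = 'SELECT * FROM t WHERE id=' + myVariable"
)
perturbed = rename_variable(snippet, "myVariable", "v42")
print(perturbed)
```

A detector that is robust to spurious correlations must return the same verdict on `snippet` and `perturbed`, since only the name changed; the January 2024 study found that real models often do not.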
These LLMs are now used in companies to review code written by developers and to help detect software bugs. In computer security, this analysis work is crucial, since AI is relied on to identify vulnerabilities or malicious behavior. A subtle attacker could therefore profile the detection system, identify these biases, and play on them to circumvent it.
Therefore, similar to the work on source code analysis, we are exploring the application of causal inference methods to improve the robustness of neural networks used by intrusion detection systems in computer networks.
The work of Judea Pearl, winner of the 2011 Turing Award, indicates that under certain conditions, it is possible to distinguish correlations that likely result from a causal relationship from those that are fallacious.
By working on an intrusion detection system, a tool that monitors network traffic to detect suspicious activity, it is possible to identify correlations that could be causing bias. We can then disrupt these correlations (like a change of first name) and retrain the detection model. Mathematically, the spurious correlation is marginalized out in the mass of perturbed examples, and the new model is debiased.
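The debiasing step can be sketched as data augmentation by perturbation. This is a toy illustration under stated assumptions, not the authors’ actual method: we pretend a single feature (a first name) is spuriously correlated with the label, and regenerate each training example several times with that feature randomized, so that the correlation is averaged away before retraining.

```python
import random

# Hypothetical pool of replacement values for the spurious feature.
NAMES = ["Adam", "Ada", "Eve", "Evan", "Noa", "Youssef"]

def perturb(example: dict, rng: random.Random) -> dict:
    """Copy the example, randomizing only the spurious feature."""
    out = dict(example)
    out["name"] = rng.choice(NAMES)  # break the name/label correlation
    return out

rng = random.Random(0)

# Biased training set: 'Adam' always co-occurs with label 1, 'Ada' with 0.
biased = [{"name": "Adam", "label": 1}, {"name": "Ada", "label": 0}]

# Each example is regenerated several times with a random name.
augmented = [perturb(ex, rng) for ex in biased for _ in range(3)]
print(augmented)
```

In the augmented set, each label now co-occurs with many names, so a model retrained on it can no longer exploit the name as a shortcut: the spurious feature has been marginalized.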
AI is a tool, let’s not let it think for us!
Whether generative or not, AIs that have learned spurious correlations expose their users to more or less significant biases. While spurious correlations can appear amusing because of their absurdity, they can also be a source of discrimination.
More generally, recent advances in deep learning, which go well beyond generative AI, benefit and will benefit many areas, including computer security.
However, promising as they are, these AIs must be kept in their proper place: they can certainly augment expert capabilities, but they can also induce blindness whose consequences can be dramatic if we delegate our capacity to think to algorithms.
Therefore, it’s important to educate ourselves about how these systems work—and their limitations—so we don’t blindly follow them. The problem isn’t so much the absurdity of a name change causing a drop in performance, but rather the credibility we can give to AI-generated content.
Author Bios: Pierre-Emmanuel Arduin is Lecturer in Computer Science and Myriam Merad is the CNRS Research Director – Disaster risk prevention, safety, security, resilience, social responsibility – Decision support both at Paris Dauphine University – PSL