Can we detect anxiety and other mental disorders from the words we use?


Esperanza (not her real name) is at the psychologist’s office for her first session. This is not the usual waiting room with a low glass table in the center and a few scattered fashion magazines.

In this one there are laptops for the patients who, living up to their name, wait their turn patiently. Meanwhile, they write a short story about their relationships with friends, family or co-workers.

Moments after she finishes writing hers, Esperanza is called in for her consultation, and the psychologist greets her holding a detailed report on her case.

“How can it be, if it’s the first time I’ve been here and I haven’t opened my mouth yet!” Esperanza thinks, surprised.

The report has been generated with natural language processing techniques, which have carefully analyzed the linguistic patterns contained in the newly written story, and have produced a preliminary diagnosis that will serve as a starting point for the psychologist’s work.

Linguistic patterns

The above is an imaginary scene that today seems like science fiction but could become a reality in the not too distant future. Is that so? Could artificial intelligence interpret our behavior to such an extent?

To answer these questions, the following must first be addressed: are there linguistic patterns that show a correlation with different mental disorders or behavioral problems?

As a preface, perhaps we should recall a recent piece of news that may have gone unnoticed by many: the discovery of a work by Lope de Vega thanks to artificial intelligence. In the research that led to this finding, a machine learning system was trained to recognize the lexical usage of up to 350 playwrights, and it turned out that the play entitled La francesa Laura exhibits a lexical usage that closely matches the style of the “phoenix of wits”.

In addition to the literary value of the finding, the research puts us on the track of a really interesting concept: there are linguistic patterns that can be associated with specific people, and that can be detected automatically.

Symptomatology and language

At this point, the seasoned reader may be asking new questions: are there patterns that can be associated with personality traits? Are there patterns that can be associated with generalized anxiety disorder? Can anxiety be detected through some kind of linguistic pattern?

There is already evidence of a statistically significant relationship between the symptoms associated with anxiety and the characteristics of the language used.

A clear example is the predominance of first-person pronouns and negative words in various mental or psychological pathologies. The authors of one study on the subject worked with English-language texts of fewer than 500 words taken from Internet mental health forums, and found a significant difference in the use of such pronouns.
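As a rough illustration of what measuring such a pattern involves, the following Python sketch computes the share of first-person pronouns and negative words in a text. The word lists, function names and sample sentence are assumptions made up for this example; they do not come from the study, which would rely on validated lexicons and proper statistical testing.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only;
# real studies rely on validated lexicons.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
NEGATIVE_WORDS = {"sad", "afraid", "worried", "alone", "tired", "never"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_rate(tokens, vocabulary):
    """Return the share of tokens that belong to the given vocabulary."""
    if not tokens:
        return 0.0
    return sum(token in vocabulary for token in tokens) / len(tokens)


def language_markers(text):
    """Compute the two markers mentioned in the text for a single document."""
    tokens = tokenize(text)
    return {
        "first_person_rate": category_rate(tokens, FIRST_PERSON),
        "negative_rate": category_rate(tokens, NEGATIVE_WORDS),
    }


# Invented sample sentence, not real patient data.
sample = "I feel tired and I am worried that my friends never call me."
markers = language_markers(sample)
print(markers)
```

A single text says little on its own. What matters in this kind of research is whether these rates differ significantly between groups of texts.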

Given a set of properly labeled samples, automatic classification makes it possible to train a neural network to recognize the patterns that lead a text to receive one label or another.

This pattern classification technique, based on deep learning (specifically on an architecture known as the transformer, the same one behind the already famous ChatGPT), has a very high predictive capacity.

However, the technique also suffers from a marked lack of explainability. Given a prediction, the system offers no information about why it made that decision. It goes without saying how important it is for a mental health diagnosis to come with an explanation.
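The supervised setup described above, labeled samples in and predicted labels out, can be sketched in a few lines of Python. To keep the example self-contained, it swaps the transformer for a much simpler stand-in, a multinomial Naive Bayes classifier. The class name, the labels and the four training sentences are invented for illustration and are in no way clinical data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in, not a transformer)."""

    def fit(self, texts, labels):
        # Count how many documents and which words appear under each label.
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy samples with hypothetical labels.
train_texts = [
    "i am worried and i cannot sleep",
    "i feel afraid and alone every night",
    "we went to the park with friends",
    "the team finished the project on time",
]
train_labels = ["anxious", "anxious", "neutral", "neutral"]

classifier = NaiveBayesTextClassifier().fit(train_texts, train_labels)
prediction = classifier.predict("i am afraid i cannot sleep tonight")
print(prediction)
```

Note that `predict` returns only a label. As written, the sketch gives no account of which words drove the decision, and with a deep neural network that account is far harder to recover.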

Types of words and emotion

Alternatively, if instead of classifying patterns we train a system to extract features from the text, we get lower predictive power but better explainability.

Given a text, it is possible to quantify elements such as the complexity of the sentences or of the words used, the frequency of certain types of words (pronouns, adverbs, adjectives), the narrative style (passive or active voice) or even the primary emotion that predominates in the text, or the semantic field to which it belongs.
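A minimal Python sketch of this kind of feature extraction might look as follows. It covers only a few of the elements listed above (sentence and word complexity, lexical variety and a dominant emotion) and leaves out grammatical analysis such as passive voice. The emotion lexicon, feature names and sample story are assumptions invented for the example.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon mapping words to a primary emotion.
EMOTION_LEXICON = {
    "afraid": "fear", "worried": "fear", "scared": "fear",
    "sad": "sadness", "lonely": "sadness",
    "happy": "joy", "glad": "joy",
}


def split_sentences(text):
    """Split on sentence-ending punctuation and drop empty fragments."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(text):
    """Return a small dictionary of human-readable features for one text."""
    sentences = split_sentences(text)
    tokens = tokenize(text)
    emotions = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)
    return {
        "sentence_count": len(sentences),
        # Average number of words per sentence, a crude complexity measure.
        "avg_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
        "avg_word_length": sum(map(len, tokens)) / len(tokens) if tokens else 0.0,
        # Distinct words divided by total words.
        "lexical_diversity": len(set(tokens)) / len(tokens) if tokens else 0.0,
        # Most frequent emotion among lexicon hits, or None if there are none.
        "dominant_emotion": emotions.most_common(1)[0][0] if emotions else None,
    }


# Invented sample story, not real patient data.
story = "I was worried about the exam. My sister was scared too. We are glad it is over."
features = extract_features(story)
print(features)
```

Each value in the result can be read and questioned by a person, which is where the better explainability of this approach comes from.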

Use in research and detection

There are many challenges to be addressed in this field. The first involves detecting the different disorders more precisely. In other words, it is currently possible to detect whether a patient suffers from a mental health disorder, but not yet to distinguish which one specifically.

We don’t yet know whether such precise detection is possible. In any case, research will depend on meeting another of the pending challenges: the collection of complete and reliable data corpora.

Much of the existing work uses texts extracted from different Internet sources, be they social networks, specialized forums or more specific services. It is not always clear who the author of each text is and, as such, it is difficult (or rather, impossible) to know that person’s actual mental state.

Without a reliable data source (and social networks are not), the validity of the data and results can always be questioned. Therefore, we must work on solid and reliable data capture methods, aligned with the research needs of each case.

The challenge of explainability

Although there are more or less useful approximations, current automatic classification techniques do not provide a list of reasons why this or that label has been assigned to each case. Without a good collection of arguments, it is difficult for any doctor to feel comfortable with such a delicate diagnosis.

It is therefore imperative to address this challenge of explainability, giving artificial intelligence tools the ability to explain the decisions they make.

It is possible that, combining classification and feature extraction techniques, we can solve these challenges and, who knows, perhaps the imaginary story of the waiting room will become a reality in the coming years.

Author Bios: Luis de la Fuente Valentin is Professor of the Master’s Degree in Analysis and Visualization of Massive Data, and Joaquin Manuel Gonzalez Cabrera is a teacher and researcher, University Professor (Level 1) in the Department of School, Family and Society, Faculty of Education, and Principal Investigator of the Cyberpsychology Group (UNIR), both at UNIR – International University of La Rioja