Is AI impoverishing journalistic (and societal) language?

What happens to public language when a growing proportion of the texts circulating in the press, online, and on social media begin to be written by machines? This issue doesn’t just affect journalism as a profession. It can also affect the richness of the language we use to understand, describe, and discuss reality.

Historically, the press has been one of the spaces where public language expands and is enriched. It is not the only engine of linguistic change, of course, but it is one of the arenas where societies circulate words, turns of phrase, and ways of naming emerging events. Studies of journalistic language and neologisms show that newspapers have served as spaces for creating and spreading new vocabulary, especially when they must report on events, technologies, or social transformations for broad audiences.

This role can be weakened if a significant portion of journalistic writing is delegated to generative systems. Large language models are trained to predict the next token (roughly, the next word or word fragment) in a sequence. As a result, they produce fluent and plausible texts, but they also tend to favor statistical regularities, frequent patterns, and already established formulations. In itself, this does not mean language automatically degrades. The problem arises when this logic becomes dominant in public writing.

AI training with texts produced by other AIs

The risk becomes more serious when these systems begin training on texts produced by other AIs. This is what several recent studies have described as a model collapse dynamic: a degenerative process in which data generated by one model contaminates the training of subsequent generations.

Translated into linguistic terms: if systems learn more and more from synthetic texts, and if those texts begin to fill the web and other public spaces, the pool of varied language available for future training shrinks. More artificial text means less contact with the actual social variation of human language, which can lower the quality of language across many contexts.
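To make the mechanism concrete, here is a deliberately simplified sketch in Python. It assumes a toy "language model" that is nothing more than a table of word frequencies (a unigram model), trained on a corpus whose word frequencies follow Zipf's law and then used to generate the corpus for the next generation. This is not the setup of the studies mentioned above, and real language models are vastly more complex; the function names and parameters are hypothetical, chosen only to show that a word absent from one generation's text can never reappear later, so rare words drop out first.

```python
import random
from collections import Counter

# Toy illustration only: the "model" is a unigram frequency table,
# a hypothetical stand-in for a real language model.

def zipf_corpus(vocab_size: int, n_tokens: int, rng: random.Random) -> list[str]:
    """Draw a toy 'human' corpus whose word weights follow Zipf's law (1 / rank)."""
    words = [f"w{rank}" for rank in range(1, vocab_size + 1)]
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    return rng.choices(words, weights=weights, k=n_tokens)

def next_generation(corpus: list[str], n_tokens: int, rng: random.Random) -> list[str]:
    """'Train' on corpus by counting words, then sample a new synthetic corpus.

    Words that did not occur in corpus get zero probability, so they can
    never come back in later generations.
    """
    counts = Counter(corpus)
    words = list(counts)
    return rng.choices(words, weights=[counts[w] for w in words], k=n_tokens)

def simulate(generations: int = 20, vocab_size: int = 5000,
             n_tokens: int = 20000, seed: int = 0) -> list[int]:
    """Return the number of distinct words in the corpus at each generation."""
    rng = random.Random(seed)
    corpus = zipf_corpus(vocab_size, n_tokens, rng)
    sizes = [len(set(corpus))]
    for _ in range(generations):
        corpus = next_generation(corpus, n_tokens, rng)
        sizes.append(len(set(corpus)))
    return sizes

if __name__ == "__main__":
    for gen, size in enumerate(simulate()):
        print(f"generation {gen:2d}: {size} distinct words")
```

In this toy setup the distinct-word count can only stay flat or fall from one generation to the next: the common words at the head of the distribution survive, while the long tail of rare words thins out.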

Reproduction and amplification of biases

To begin with, when data variation decreases and established patterns predominate, biases present in the training material can be reinforced rather than corrected. Recent literature on language model evolution and bias warns that recursive processes can amplify existing biases instead of diversifying perspectives.

Second, writing begins to sound increasingly repetitive: the same syntactic structures, a neutral middle-of-the-road tone, formulaic sequences, and predictable paragraph development keep recurring. This matters particularly in journalism, because the press not only transmits information but also mediates between specialized and broad audiences, selects emphasis, translates vocabulary, and tests formulations. When public prose becomes too uniform, the press's capacity to adjust its language to what is new diminishes.

Erosion of linguistic innovation

As a result, rare or specialized words, less frequent constructions, and some pragmatic nuances, such as irony, ambiguity, or certain modulations of point of view, become scarcer. A growing share of synthetic text in training data is associated with performance degradation and poorer coverage of the distribution of human language. Simply put: the system preserves the center of the distribution better than its edges.

Many innovations are born as unstable deviations, unusual uses, or local solutions for naming something new. If the system always favors the most probable forms, these emerging ones have less room to circulate and consolidate. This point should not be understood as an abstract opposition between "human" and "machine," but rather as a difference between a language exposed to social contingency and prose generated from already learned regularities.

Deterioration of the public linguistic ecosystem

The issue is not only having fewer distinct words but also a diminished capacity to make subtle distinctions. When language becomes vaguer, more repetitive, or more predictable, the tools with which a society describes problems, qualifies positions, and engages in public debate are impoverished as well.

On a broader level, the problem is no longer just what happens to a model, but what happens to the public linguistic ecosystem. If the web becomes filled with synthetic texts, readers, journalists, and institutions alike will find themselves living with a less diverse public language. Some recent studies even speak of the "contamination" of the web ecosystem by synthetic data and show that the way in which real and artificial data are mixed is crucial to preventing further deterioration.

Is all lost?

However, it's important not to exaggerate. Research doesn't suggest that any use of AI will inevitably lead to collapse or degradation. Some studies show that when synthetic data is mixed with real data instead of completely replacing it, the degradation does not unfold in the same way, and the error can remain contained. In other words, the problem isn't using AI occasionally or prudently mixing synthetic and human data, but rather massively replacing human writing and then recycling that replacement as if it were living language.
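The difference between mixing and replacing can be illustrated with a toy simulation in Python. In this sketch (an assumption-laden unigram model, not the method of the cited studies), each generation's training corpus consists of synthetic text sampled from the previous corpus's word counts plus a share of fresh text drawn from a Zipf-distributed "human" vocabulary. The `human_share` parameter and the function names are hypothetical knobs introduced for illustration.

```python
import random
from collections import Counter

# Toy illustration only: a unigram frequency table stands in for a language
# model, and human_share is a hypothetical parameter for this sketch.

def zipf_sample(vocab_size: int, n_tokens: int, rng: random.Random) -> list[str]:
    """Sample toy 'human' text: word weights follow Zipf's law (1 / rank)."""
    words = [f"w{rank}" for rank in range(1, vocab_size + 1)]
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    return rng.choices(words, weights=weights, k=n_tokens)

def run(human_share: float, generations: int = 20, vocab_size: int = 5000,
        n_tokens: int = 20000, seed: int = 0) -> list[int]:
    """Return the distinct-word count of the training corpus at each generation.

    Each new corpus is synthetic text sampled from the previous corpus's word
    counts, plus a human_share fraction of fresh Zipf-distributed human text.
    """
    rng = random.Random(seed)
    corpus = zipf_sample(vocab_size, n_tokens, rng)
    sizes = [len(set(corpus))]
    n_human = int(human_share * n_tokens)
    for _ in range(generations):
        counts = Counter(corpus)
        words = list(counts)
        synthetic = rng.choices(words, weights=[counts[w] for w in words],
                                k=n_tokens - n_human)
        # With human_share == 0 this adds nothing: pure replacement.
        corpus = synthetic + zipf_sample(vocab_size, n_human, rng)
        sizes.append(len(set(corpus)))
    return sizes

if __name__ == "__main__":
    replaced, mixed = run(0.0), run(0.5)
    print("full replacement:", replaced[0], "->", replaced[-1], "distinct words")
    print("50% human text:  ", mixed[0], "->", mixed[-1], "distinct words")
```

Both runs start from the same initial corpus. Under full replacement the rare words vanish generation after generation and never return, while the mixed corpus keeps replenishing its tail from human text, so its vocabulary stays much larger.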

Integrating AI into journalistic production routines makes journalism more efficient. But what does a society lose when the language circulating publicly becomes more uniform, more predictable, and less open to new ideas? If the press gives up, even partially, its role of writing, translating, naming, and experimenting with new formulations, it is not only work routines that change: one of the spaces where public language has historically been most enriched, renewed, and expanded is also weakened.

Author Bios: Xosé López-García works in digital journalism and digital communication, and Cristian Augusto Gonzalez Arias is a researcher at the Pontifical Catholic University of Valparaiso; both are affiliated with the University of Santiago de Compostela.