
What becomes of public language when a growing proportion of the texts circulating in the press, on the internet, and on social media begin to be written by machines? This question doesn’t just concern journalism as a professional activity. It can also affect the richness of the language we use to understand, describe, and debate reality.
Historically, the press has been one of the spaces where common language has developed and been enriched. It is obviously not the sole driver of linguistic change, but it is one of the places where societies circulate new words, new expressions, and new ways of naming emerging phenomena. Several studies on journalistic language and neologisms show that newspapers have long played a crucial role in the creation and dissemination of new vocabulary, particularly when it comes to reporting on events, technologies, or social transformations to a broad audience.
This role could be weakened if a significant portion of journalistic writing were delegated to generative AI systems. Large language models generally work by predicting the most probable word (or, more precisely, the most likely token) within a sequence. They thus produce fluent and plausible texts, but also tend to favor statistical regularities, the most frequent formulations, and already established turns of phrase.
This does not, in itself, mean that language automatically degrades. The problem arises when this logic becomes dominant in the production of texts that populate the public sphere.
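The preference for the most probable token can be illustrated with a toy example. The sketch below is not how any particular model is implemented: the candidate words, their scores, and the `temperature` values are all invented for illustration. It only shows that always picking the top candidate never produces the rare word, and that sharpening the distribution makes the rare word rarer still.

```python
import math
import random

# Hypothetical next-token scores (logits) for a prompt such as "The economy is ...".
# The words and numbers are invented for illustration only.
LOGITS = {"growing": 2.0, "slowing": 1.5, "fragile": 0.5, "wobbling": -1.0}


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the peak."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def greedy(logits):
    """Always return the single highest-scoring token."""
    return max(logits, key=logits.get)


def sample(logits, temperature, rng):
    """Draw one token at random, in proportion to its probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print("greedy choice:", greedy(LOGITS))
    for temp in (0.5, 1.0):
        draws = [sample(LOGITS, temp, rng) for _ in range(1000)]
        share = draws.count("wobbling") / len(draws)
        print(f"temperature={temp}: share of the rare word = {share:.3f}")
```

In practice, systems usually sample rather than always taking the top candidate, but the sampling remains weighted toward what is already frequent.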
When AIs train on texts produced by other AIs
The risk becomes more serious when these systems start being trained on texts produced by other AIs. This is what several recent works describe as model collapse: a degenerative process in which the data generated by a model ends up contaminating the training of subsequent generations.
Applied to language, this means that if systems increasingly learn from synthetic texts, and if these texts eventually saturate the web and public spaces, the linguistic reservoir available for future training shrinks. The more artificial texts there are, the less the models are exposed to the real diversity of human language use. Ultimately, this can lead to an impoverishment of language in various domains.
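The shrinking reservoir can be made concrete with a deliberately crude simulation. It is a sketch under strong assumptions, not a reproduction of any published experiment: the "model" is just a table of word counts, human text is stood in for by a Zipf-like distribution, and the parameter values (`vocab_size`, `n_tokens`, `generations`) are arbitrary. Each generation writes a corpus, and the next generation learns only from that corpus.

```python
import random
from collections import Counter


def zipf_weights(vocab_size):
    """Zipf-like weights: the k-th word is about 1/k as frequent as the first."""
    return [1.0 / rank for rank in range(1, vocab_size + 1)]


def sample_corpus(weights, n_tokens, rng):
    """Write a corpus: draw word ids in proportion to the given weights."""
    return rng.choices(range(len(weights)), weights=weights, k=n_tokens)


def refit(corpus, vocab_size):
    """'Train' the next model: its weights are simply the observed counts."""
    counts = Counter(corpus)
    return [counts.get(word, 0) for word in range(vocab_size)]


def simulate(generations=20, vocab_size=1000, n_tokens=5000, seed=0):
    """Return how many distinct words can still be produced at each generation."""
    rng = random.Random(seed)
    weights = zipf_weights(vocab_size)  # generation 0 stands in for human text
    surviving = [vocab_size]
    for _ in range(generations):
        corpus = sample_corpus(weights, n_tokens, rng)
        weights = refit(corpus, vocab_size)  # learn only from synthetic text
        surviving.append(sum(1 for w in weights if w > 0))
    return surviving


if __name__ == "__main__":
    for generation, n_words in enumerate(simulate()):
        print(f"generation {generation:2d}: {n_words} distinct words")
```

In this toy setup, a word that fails to appear in one generation's corpus has zero weight in the next and can never return, so the vocabulary can only stay the same or shrink, and the rare words are the first to go.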
Reproduction and amplification of biases
First, when data diversity decreases and models rely primarily on pre-established patterns, biases present in the training data are likely to be reinforced rather than corrected. Recent literature on the evolution of language models specifically warns that recursive processes can amplify existing biases instead of diversifying perspectives.
Furthermore, writing tends to become increasingly repetitive: the same syntactic structures, the same intermediate tones, the same formulations, and the same ways of organizing paragraphs recur constantly. This evolution is particularly important for journalism, because the press does more than simply transmit information: it connects specialized knowledge with a broad public, prioritizes issues, translates technical vocabulary, and experiments with new formulations. When the language of the public sphere becomes too uniform, its capacity to adapt subtly to novelty weakens.
An erosion of linguistic innovation
In this context, rare or specialized words, less frequent constructions, and certain pragmatic nuances (such as irony, ambiguity, or variations in perspective) tend to disappear. The increased proportion of synthetic texts in the training data is associated with a decline in performance and a poorer representation of the diversity of human language. Simply put, the system preserves the center better than the margins.
However, many linguistic innovations arise precisely in these margins: in the form of unstable usages, occasional repurposing, or local solutions invented to name a new reality. If the system systematically favors the most probable formulations, these emerging forms have less space to circulate and establish themselves.
This issue should not be understood as an abstract opposition between “human” and “machine”, but rather as the difference between a language nourished by the contingencies of social life and a prose produced from already learned regularities.
An impoverishment of the linguistic ecosystem
The issue is not simply a reduction in the number of different words. It also concerns the ability to make subtle distinctions. When language becomes vaguer, more repetitive, or more predictable, the tools a society has to describe problems, nuance positions, and debate in the public sphere are also diminished.
On a broader scale, the question is therefore no longer simply what happens to an AI model, but what happens to the public language ecosystem as a whole. If the web becomes filled with synthetic texts, readers, journalists, and institutions will gradually be exposed to a less diverse public language. Some recent work even goes so far as to suggest a form of “contamination” of the digital ecosystem by synthetic data and shows that the way in which real and artificial data are combined is crucial to preventing more significant degradation.
An inevitable scenario?
However, the risk should not be exaggerated. Research does not conclude that all use of AI inevitably leads to collapse or degradation. Some studies show that when synthetic data is mixed with real data, rather than replacing it entirely, the degradation mechanisms do not manifest themselves in the same way, and errors can remain limited. In other words, the problem does not lie in the occasional use of AI or in a careful combination of synthetic and human data, but in the massive replacement of human writing followed by the recycling of this artificial output as if it were a living language.
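The difference between replacing human text and mixing it with synthetic text can be sketched with a toy simulation. Everything here is an assumption made for illustration: the "model" is a table of word counts, human text is drawn from a Zipf-like distribution, and `real_fraction` is a hypothetical knob for the share of fresh human text in each generation's training corpus. It does not reproduce the cited studies.

```python
import random
from collections import Counter


def zipf_weights(vocab_size):
    """Zipf-like weights standing in for the word frequencies of human text."""
    return [1.0 / rank for rank in range(1, vocab_size + 1)]


def simulate(real_fraction, generations=20, vocab_size=1000,
             n_tokens=5000, seed=0):
    """Return the number of distinct producible words at each generation.

    real_fraction is the share of each training corpus drawn from the
    original human distribution; the remainder is written by the previous model.
    """
    rng = random.Random(seed)
    word_ids = range(vocab_size)
    human = zipf_weights(vocab_size)
    model = list(human)  # generation 0 has learned from human text only
    n_real = round(real_fraction * n_tokens)
    n_synthetic = n_tokens - n_real
    surviving = [vocab_size]
    for _ in range(generations):
        corpus = rng.choices(word_ids, weights=human, k=n_real)
        corpus += rng.choices(word_ids, weights=model, k=n_synthetic)
        counts = Counter(corpus)
        model = [counts.get(word, 0) for word in word_ids]  # refit on the mix
        surviving.append(sum(1 for w in model if w > 0))
    return surviving


if __name__ == "__main__":
    replaced = simulate(real_fraction=0.0)  # synthetic text replaces human text
    mixed = simulate(real_fraction=0.5)     # half of each corpus is human text
    print("distinct words after the last generation")
    print("  full replacement:", replaced[-1])
    print("  half human text: ", mixed[-1])
```

With full replacement, a word that drops out can never come back. With a share of human text in every generation, lost words can re-enter the corpus, which is why the vocabulary in this toy model stops shrinking toward a small core.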
With the integration of AI into journalistic production routines, journalism becomes more efficient. But what does a society lose when the language circulating in the public sphere becomes more uniform, more predictable, and less open to novelty? If the press relinquishes, even partially, its function of writing, translating, naming, and experimenting with language, it is not only professional practices that are transformed. One of the main spaces where the common language has historically been able to enrich itself, renew itself, and expand its range of possibilities is weakened as well.
Author Bios: Xosé López-García specialises in digital journalism and digital communication, and Cristian Augusto Gonzalez Arias is a researcher at the Pontificia Universidad Catolica de Valparaiso. Both are at the University of Santiago de Compostela.