Estimating a person’s vocabulary with total precision is a chimera, and all the figures proposed so far are, at best, good approximations. Vocabulary estimates fall short because of the human brain’s wonderful ability to use linguistic resources to create and modify words.
We know that a calculation is a computation carried out through mathematical operations. We therefore know that calculating is performing that action, that a calculator is the person or machine that performs it, and that calculable describes what can be calculated. We also know that I will calculate refers to the future and I calculated to the past. And we know that a single individual can be a calculator, but two or more will be calculators.
Without straying far from the same lexical anchor point, we can see how simple mechanisms of inflection and derivation multiply the word forms we know. This shows how hard it would be to determine a person’s vocabulary accurately if we wanted, for example, to measure every form of the verb calculate. Calculate it yourself!
To conquer this chimerical field and bring it closer to reality, a more effective option could be to test all the words included in the dictionary. In linguistics, these entries are known as lemmas.
Are all the words in the dictionary?
An approach based on testing known lemmas would remove the need to test verb, gender and number inflections, on the assumption that speakers of a language can apply the correct rules of agreement and dependency. If our language were fully represented in a dictionary and the number of lemmas it contained were manageable, it would not be difficult to test the population on all those words.
But neither of the two conditions is met, returning us to the realm of illusion and the unrealizable: not all the words known to the speakers are listed in the dictionary, nor is the number of words that are listed manageable.
The first point is quite obvious, especially if we consider that languages are living, changing cultural manifestations. In fact, the Dictionary of the Spanish Language itself went from approximately 83,000 lemmas in its 21st edition in 1992, to 88,000 in its 22nd edition in 2001, to nearly 93,000 in its 23rd edition in 2014.
And so, besides seeing the magnitude and richness of a living language, we also see that the number of words changes and grows, which brings us to the second point. Who would be willing to answer a survey with nearly 100,000 questions?
Making an accurate estimate
Thanks to large-scale psycholinguistic studies (called mega-studies) and the support of social media and online platforms, today we are one step closer to solving these unknowns. How can we accurately estimate the vocabulary known to a person? The answer requires a combination of elements that, combined in the right way, can guide us towards a much more accurate knowledge of people’s lexical level.
First, we will need to choose a large number of lemmas that is representative of the language. Second, we must build these words into a task that challenges people and tells us about their ability to recognize words. Third, we will have to create a gamified platform on which people can test themselves and, in turn, invite and challenge their acquaintances and relatives, generating a snowball effect. Fourth, relying on the platform going viral, we must design a random sampling algorithm that gathers data on tens of thousands of words while asking each person to respond to only a small and manageable number of them. And fifth, we will have to collect basic sociodemographic information from people so that, using big data analysis techniques, we can make reliable predictions and estimates of vocabulary knowledge.
Following this recipe of alchemical ingredients, some international laboratories have already provided the first answers to this great question about the lexicon. The Center for Reading Research at the University of Ghent is undoubtedly the world’s pioneering institution in estimating the vocabulary of speakers of languages such as English or Dutch, having put hundreds of thousands of people to the test.
In 2020, thanks to the collaboration of researchers from Nebrija University, the Basque Center on Cognition, Brain and Language and the University of Ghent itself, a study was published that, for the first time, made it possible to estimate the vocabulary known to speakers of Spanish.
To calculate this estimate of the known lexicon, the team coordinated by the author of this article gathered the necessary ingredients for the recipe. First, they selected more than 45,000 Spanish words. Then, they designed a classic psycholinguistic task called visual lexical decision: each person saw a series of letter strings on the screen and had to decide whether each one was a real Spanish word or, on the contrary, an invented word (a pseudoword).
With this, a platform was launched that could be accessed from any device with an internet connection and that put lexical knowledge to the test. Each time the game started, the participant received a randomly chosen set of 70 words and 30 pseudowords.
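The random draw behind each game can be sketched in a few lines of Python. This is a minimal illustration and not the platform's actual code: the function name `build_session`, the `seed` parameter and the placeholder item pools are all hypothetical. Only the split of 70 words and 30 pseudowords comes from the description above.

```python
import random

def build_session(words, pseudowords, n_words=70, n_pseudo=30, seed=None):
    """Draw one game session: a shuffled mix of real words and pseudowords.

    Hypothetical helper; only the 70/30 split is taken from the article.
    Returns a list of (item, is_word) pairs.
    """
    rng = random.Random(seed)
    # Sample without replacement so no item repeats within a session.
    chosen_words = rng.sample(words, n_words)
    chosen_pseudo = rng.sample(pseudowords, n_pseudo)
    # Label each item so the responses can be scored later.
    trials = [(w, True) for w in chosen_words]
    trials += [(p, False) for p in chosen_pseudo]
    # Shuffle so words and pseudowords appear in unpredictable order.
    rng.shuffle(trials)
    return trials

# Placeholder pools standing in for the real item lists (assumption).
words = [f"word{i}" for i in range(45000)]
pseudowords = [f"pseudo{i}" for i in range(5000)]
session = build_session(words, pseudowords, seed=1)
```

Because every participant gets a different random subset, a large enough pool of players ends up covering the whole word list even though each person only sees 100 items.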
In addition, the players had to provide some general information to be able to adjust the calculations later, such as their gender, age, years of education and number of known languages.
In just a few weeks, nearly 170,000 native Spanish speakers from 19 different countries completed the game. With the roughly 12 million individual data points collected for the words, and thanks to a series of complex statistical analyses, the team was finally able to provide an answer to the big question.
According to general statistics, the average citizen is a person around 45 years of age. How many words will that person know? With some variability due to the number of years that they may have been in the educational system, whether they are male or female and the number of languages they can speak, the answer will not leave us indifferent: approximately 30,000 words. In other words, an average citizen correctly recognizes two thirds of the words listed in the Dictionary of the Spanish Language.
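To give a rough feel for how yes/no answers can become a vocabulary figure, here is a simplified sketch. It is an assumption for illustration, not the study's method, which relied on more complex statistical analyses: it corrects for guessing by subtracting the rate of "yes" answers to pseudowords from the rate of "yes" answers to real words, then scales the result to the size of the tested word list (taken here as 45,000). The function name `estimate_vocabulary` is hypothetical.

```python
def estimate_vocabulary(responses, lexicon_size=45000):
    """Estimate known words from one session of lexical decisions.

    responses: list of (is_word, said_yes) boolean pairs.
    Simplified, assumed scoring rule: hit rate minus false-alarm rate.
    """
    word_answers = [yes for is_word, yes in responses if is_word]
    pseudo_answers = [yes for is_word, yes in responses if not is_word]
    if not word_answers or not pseudo_answers:
        raise ValueError("need at least one word and one pseudoword")
    # Share of real words the person claimed to know.
    hit_rate = sum(word_answers) / len(word_answers)
    # Share of invented words the person wrongly accepted.
    false_alarm_rate = sum(pseudo_answers) / len(pseudo_answers)
    # Penalize guessing, never dropping below zero.
    known_fraction = max(0.0, hit_rate - false_alarm_rate)
    return round(known_fraction * lexicon_size)

# Made-up session: 49 of 70 words accepted, 1 of 30 pseudowords accepted.
example = ([(True, True)] * 49 + [(True, False)] * 21
           + [(False, True)] * 1 + [(False, False)] * 29)
estimate = estimate_vocabulary(example)
```

A single session of 100 items gives only a noisy estimate for one person, which is why pooling many participants and adjusting for age, education and languages matters.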
Factors that increase or decrease the lexicon
And what factors make lexical knowledge increase or decrease? The factor with the greatest impact on people’s vocabulary level is their age. As is logical, it is during the first part of our lives that the number of known words grows exponentially.
Thus, throughout childhood we populate that initial lexical tabula rasa until we reach youth with the ability to recognize around half of the words in our dictionary (around 25,000 words at age 25).
Interestingly, and contrary to what some might intuitively expect, vocabulary keeps increasing with age, reaching 35,000 words at age 80 or, equivalently, about 80% of the lemmas in the dictionary.
Therefore, we must thank our elders for their contribution, among many other things, to general lexical knowledge. In a country with a clearly aging population, the vocabulary known to older people is a reference for the rest of the population and a tribute to continuous learning.
Another factor, directly related to the previous one and with a decisive weight in people’s vocabulary level, is the educational level they have reached. The more years of formal education a person has and the higher the educational level completed, the higher their lexical level will be.
This finding matches studies showing that the number of years a person spends in the educational system is a critical factor for their intellectual level, and extends that result to the lexical level. Education, intelligence and vocabulary are traveling companions on this path we call life.
Finally, another of the most surprising discoveries, which also matches the findings of teams in other countries, is that vocabulary size increases with knowledge of other languages. Knowledge of the Spanish lexicon increases linearly with the number of languages a person speaks. In a world in which multilingualism is more the norm than the exception, this is a promising finding that adds value to language learning.
We constantly learn new words. Sometimes we learn voluntarily. Other times we learn accidentally, perhaps without realizing it. Thus, at the time of the de-escalation of the effects of the coronavirus, we combine weekdays full of telework video calls with leisure time during the weekend. A few years ago, few people knew these terms. Today, almost all of us use them, and they are already part of our lexicon, and also of the Dictionary of the Spanish Language, after its last update.
Author Bio: Jon Andoni Duñabeitia is Director of the Center for Cognitive Science of the Faculty of Languages and Education at Nebrija University