In the early 1980s my parents brought home a home computer that connected to the television screen. This motivated me to “play” scientist. I was proud to publish simple programs in those early computer magazines: programs that drew mathematical functions, monitored the microprocessor’s machine code, or produced rudimentary animations and computer games.
Everything was published, including detailed explanations of how the work was done and the complete source code of the program, so that anyone could easily copy, test, understand, reproduce and modify it for any purpose. This is the most basic version of open science, conceived as a collective and cumulative universal enterprise.
The principles of open science
Open science refers to the practice of making all stages of the scientific process transparent and accessible to others. This includes publishing research articles with their data, detailed methods, theoretical and practical foundations, experiments, as well as any information or tools needed to be able to repeat the research.
The goals are to enable reproducibility, foster collaboration, and make it easier to build on prior work. This is essential for scientific research to be credible, ethical, and accessible, and for it to be reviewed, validated, and built upon.
What’s happening with AI?
As in any discipline, open science in artificial intelligence is the only way to ensure reproducibility and transparency and, therefore, its public advancement and use consistent with collaborative, cumulative principles for the benefit of humanity.
The vast majority of computer science researchers believe in publishing their advances according to these principles. Open source is one of the important elements – although not the only one – of any computing tool that aims to promote scientific progress.
Specialists in this area of knowledge have been creating various non-profit organizations to precisely define what research and development consists of in their field.
For example, the Open Source Initiative (OSI) was founded in 1998, and its definition of open source is the most widely accepted international standard.
For a program to be considered open source, it is not enough to provide access to the compiled program; the entire source code must be available as well. Bear in mind that source code is a program written in a high-level programming language that a person can read. Compiled code, or machine language, is a translation of the source code into a binary file that an electronic circuit can execute, but a person cannot understand.
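The difference between readable source and its compiled form can be seen in a few lines of Python. This is only a rough analogy under stated assumptions: Python compiles to bytecode for its own virtual machine rather than to machine language for a processor, and the `area` function is a made-up example. Still, it shows the same contrast between a text a person can read and a sequence of bytes that only a machine is meant to interpret.

```python
# Hypothetical example: a tiny program as human-readable source code.
source = "def area(radius):\n    return 3.14159 * radius * radius\n"

# Translate the source into a compiled code object.
# Note: this is Python bytecode, used here as a stand-in for machine code.
compiled = compile(source, "<example>", "exec")

# Run the compiled module so that the function becomes available.
namespace = {}
exec(compiled, namespace)
area = namespace["area"]

# The compiled instructions of the function, as raw bytes.
raw = area.__code__.co_code

print("Source code (readable):")
print(source)
print("Compiled form (hexadecimal bytes):")
print(raw.hex())
print("Both forms compute the same thing:", area(2.0))
```

Anyone can see what the source does and change the formula. Someone handed only the bytes can run them, but studying or modifying the program becomes far harder.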
Another requirement of open source is that it allows modification and redistribution under these same terms and for all uses, including commercial.
The case of technology companies
There are many companies that create wealth, benefit society and also benefit from society. However, very few invest in research unless they believe they will get a return on their investment.
It is common for private technology companies to take advantage of public research (funded by taxpayers) and use it to develop products from which they make huge profits. Economist Mariana Mazzucato often describes in detail a paradigmatic example: the case of Apple’s iPhone.
With AI companies, this reality is even more striking. It may be natural for them to base their products on ideas and research previously published by others, but it turns out that most of the most advanced AI models are impenetrable black boxes: their internal logic is not explained, their operation and fairness are not guaranteed, and their source code cannot be analyzed.
Many of the most popular products, such as ChatGPT or Meta’s modern SeamlessM4T translator, turn out to have these undesirable features, even though they are advertised as open science articles.
DeepSeek is not open source
Some newer products, such as DeepSeek, try to outdo the competition by making compiled code available, but this is not open source, and does not advance scientific research.
That is, even though DeepSeek is advertised as “open source,” it does not allow access to the source code, only to the binary (compiled) version. It cannot be read, understood, or modified. That is why no one can improve this program. One can only use it as a customer of the company, not as a computer science researcher.
In this scenario, the lack of transparency and reproducibility of these computer models hinders scientific progress and erodes confidence in AI research.
The example of Rosetta and AlphaFold 3
David Baker, Demis Hassabis, and John M. Jumper were awarded the 2024 Nobel Prize in Chemistry for computational protein design and protein structure prediction. The Rosetta software began life in the late 20th century as a small project in David Baker’s lab at the University of Washington, the state’s main public university. The source code was written and distributed in Fortran, a high-level language that any specialist can read, understand, and modify, and the program focused on ab initio structure prediction of small proteins.
Based on these ideas and using protein databases published by the research community, Google DeepMind developed powerful statistical data analysis in its AI programs AlphaFold and AlphaFold 2.
In May 2024, DeepMind introduced its AlphaFold 3 model in an article in the journal Nature, which surprisingly allowed DeepMind to keep the software code unavailable despite the journal’s own editorial policy, which focuses on “making associated materials, data, code and protocols readily available to readers without undue qualification.”
AlphaFold is not open source either
More than a thousand members of the scientific community specializing in the area signed a letter sent to Nature because the article “does not meet the standards of the scientific community of being usable, scalable and transparent.”
Six months later, DeepMind made the code available under a restrictive Creative Commons license. However, its terms do not meet the OSI definition of “open source.” DeepMind does not publish the model’s weights (the result of training its neural network). To obtain them, you have to request them, and it is the company itself that decides whether or not to provide them in each case. Without them, it is not possible to use AlphaFold 3 to predict protein structure.
It also explicitly prohibits the use of AlphaFold 3 model parameters or results for commercial activities, including the training of similar biomolecular models.
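Why the weights matter as much as the code can be shown with a deliberately tiny sketch. Everything in it is hypothetical: the `predict` function, the three made-up numbers and the input values bear no relation to AlphaFold 3, whose model is vastly larger. The point is only that published code describes how to combine inputs with weights, and without the weights there is nothing to compute.

```python
from typing import Optional, Sequence


def predict(features: Sequence[float],
            weights: Optional[Sequence[float]]) -> float:
    """Hypothetical toy model: a weighted sum of the input features."""
    if weights is None:
        # The code is public, but the trained numbers were never released.
        raise ValueError("model weights are required to make a prediction")
    return sum(w * x for w, x in zip(weights, features))


features = [1.0, 2.0, 3.0]

# Case 1: the trained weights are provided (made-up values for illustration).
released_weights = [0.5, -1.0, 2.0]
result = predict(features, released_weights)
print("Prediction with weights:", result)

# Case 2: only the code is available.
try:
    predict(features, None)
except ValueError as error:
    print("Prediction without weights failed:", error)
```

Retraining the weights from scratch is possible in principle, but for a model of this scale it demands data and computing resources that few research groups have.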
DeepMind’s approach attempts to satisfy both scientific needs and the company’s commercial interests, but it must be made clear that this is not an open science process. It is a burden on the advancement of scientific knowledge, which belongs to all humanity.
Author Bio: Victor Etxebarria Ecenarro is Professor at the University of the Basque Country / Euskal Herriko Unibertsitatea