A Turning Point for (Data) Science
Why open science is essential for scientific progress.
The cover story of this issue of Biomedical Computation Review is titled “Data’s Identity Crisis”—with good reason. As vast stores of biomedical data are being created on a daily basis, our ability to make thorough use of them is stymied by our failure to share. The result: Scientific progress is radically slower than it needs to be. This seems to me to fit the Webster’s dictionary definition of a crisis as “a difficult or dangerous situation that needs serious attention.”
Fortunately, there is a solution to this crisis: open science.
Academic researchers take for granted that discoveries need to be published to achieve maximal impact and may be surprised by so much talk about open science. They may not realize that what is at stake in the open science/open data discussion is not whether results should be made public, but whether and how the data and analytical tools that led to those results should be available.
Traditional scientific publishing models were created when providing access to data and software was not possible due to the constraints of print media. Now that we have entered an era in which data and software can be made available online, the belated discussion focuses on how to share them and who has the rights and responsibilities to do so.
Provided individual privacy is protected, opening data for further analyses beyond an original study is about diversifying approaches to extract knowledge. It is not about witch hunting to destroy the work of those who previously analyzed the data. It is not about taking advantage of someone else’s work without acknowledgements. It is not about removing incentives for good science.
On the contrary, open science is about reproducibility, so that no time is wasted pursuing flawed approaches. It is about eliciting new ideas to reuse data that were collected with funding from taxpayers. It is ultimately about accelerating findings in a time frame that may make a difference for those who are suffering. People don’t care who discovers a cure for cancer; they just want someone to discover it. Soon.
So let’s not waste time creating chasms that pit biomedical and behavioral researchers against data scientists. Data scientists are not science “parasites” who use other people’s data without attribution and without sufficient knowledge. Medical researchers are not data hoarders who want exclusive rights to discoveries. We all want science to translate into better health for everyone on the planet. Let’s focus on what needs to happen to create an environment that promotes rapid discoveries that make a true difference.
To achieve the open science ideal, there’s a lot of meticulous, time-consuming work that must be done. For example, for datasets to be reused properly, they need to be clearly identified, posted in searchable repositories, and (perhaps most importantly for reusability and reproducibility) accompanied by descriptions or annotations (so-called metadata) that allow users to understand the context in which the data were collected and pre-processed, their potential limitations, and how they can be accessed.
The move to open science is an exciting turning point for scientists everywhere, as it will allow data to be used in many more ways than we have traditionally envisioned. And plenty of people are already on board: Many scientists, coming from different backgrounds and different specialties, emphasize the importance of maximizing the use of data through systematic annotation and organization. The whole community must unite to help design the ecosystem for data sharing in a way that moves us beyond the ideas of a few researchers and accelerates meaningful biomedical discoveries.