The importance of open data and data sharing in fighting COVID-19
September 02, 2020
By Maria Fernanda Ziegler | Agência FAPESP – The management of data obtained in scientific research is extremely important to support discoveries in all knowledge areas. During the COVID-19 pandemic, it has become evident that open data sharing arrangements involving researchers in different countries and communities can even save lives.
“At this time of global challenges, it’s important to amass all the information that we’re able to produce and from that to generate evidence that enables different communities and countries to be better prepared to face these challenges,” said Luiz Eugênio Mello, FAPESP’s Scientific Director, in his opening remarks to the webinar on “Open Data Under the COVID-19 Pandemic”, held on August 5. This was the fourth FAPESP COVID-19 Research Webinar.
According to Mello, high-quality data from prior research, epidemiological surveys and clinical care have enabled the scientific community to identify difficulties that need to be surmounted and propose solutions to address the pandemic.
“FAPESP is a groundbreaker in this field. On July 1, it launched Brazil’s first open data repository with demographic data as well as clinical and laboratory data for patients who have been tested for COVID-19 in this country,” said Claudia Bauzer Medeiros, a professor at the University of Campinas’s Institute of Computing (IC-UNICAMP) and a participant in FAPESP’s open data repository project, known as COVID-19 Data Sharing/BR.
To date, the repository holds anonymized data on more than 177,000 patients, 9,634 outcomes, and almost 5 million clinical examinations and laboratory tests performed nationwide since November 2019 by Fleury Group (a private laboratory chain) and two leading private hospitals in São Paulo City, Syrian-Lebanese Hospital and Albert Einstein Jewish Hospital (HIAE).
Despite the great importance and benefits of sharing data on COVID-19, the venture faces many challenges. In the webinar, experts from Brazil, the United States and France discussed these challenges as well as the need for a large-scale global effort to control the pandemic.
“Data sharing initiatives and facilities bolster our research capabilities and help us deepen our understanding of the impact of the disease. Major challenges to be faced in this field include data heterogeneity and data privacy. There are also legal issues that the pandemic has made even more evident. Different countries allow different legal rationales for the use of data, making it hard to connect all this information,” said Maurício Barreto, a researcher affiliated with the Oswaldo Cruz Foundation (Fiocruz), a leading public health research institution linked to the Brazilian Ministry of Health. Barreto heads Fiocruz’s Center for Health Data Integration (CIDACS) in Salvador, in the state of Bahia.
CIDACS conducts interdisciplinary research projects involving big data and platforms such as the Cohort of 100 Million Brazilians, the Zika Platform, Technologies and Innovations for SUS (the Brazilian public health system), Equity and Urban Sustainability, Bioinformatics, and Genomic Epidemiology.
The pandemic has also shown the need to go beyond health data to understand the overall impact of the global crisis. “It isn’t hard to understand that this isn’t only a health and medical crisis. It’s also a social, economic and political crisis worldwide. Everything about the social world has changed quite abruptly, and how successful we are in understanding the impact of the disease on the population involves a broad range of issues,” said Amy Pienta, a research scientist at the Inter-University Consortium for Political and Social Research (ICPSR) at the University of Michigan in the US. Founded in 1962, ICPSR has one of the largest and oldest collections of social science data archives and databases in the world.
The issues involved range from overcrowded housing, unemployment and inequality to small businesses shuttering, growing mental health problems, and the impact of distance learning on child development, among many others, Pienta noted. “Social scientists are collecting and providing data that help answer questions about the impact of the pandemic, enabling the research community to deepen its understanding of these various crises and find out where the problems are. This involves a wide variety of data that must be correlated,” she said.
For Anne Cambon-Thomsen, Emeritus Research Director at France’s National Center for Scientific Research (CNRS), international cooperation for data sharing and reuse is as important as data integration in different knowledge areas.
Currently working with an epidemiology and public health team at Paul Sabatier University (UPS, also known as Toulouse III), Cambon-Thomsen discussed the need for cross-border coordination. “One of the main lessons of the pandemic is the need to balance speed, timeliness and agility with precision,” she said. “In addition, we suffer from a lack of pre-approved international data sharing agreements, and we need to coordinate the efforts of different jurisdictions to foster global open science through policy and investment.”
According to Cambon-Thomsen, early publication of data and the use of trustworthy data repositories committed to the long-term preservation of data holdings with sustained access are key goals if these efforts are to succeed. “Another key objective is the need to balance ethics and privacy, taking into account public interests and benefits while addressing the health crisis,” she said.
Cambon-Thomsen is a member of the Research Data Alliance (RDA), launched as a community-driven initiative in 2013 by the European Commission, the US National Science Foundation (NSF) and the Australian Government’s Department of Innovation with the goal of building the social and technical infrastructure to enable open data sharing and reuse.
The RDA COVID-19 Working Group has drafted best practice guidelines and detailed recommendations for researchers and policymakers to help maximize data sharing via open platforms in public health emergencies, such as the COVID-19 pandemic.
“We had subgroups for four priority areas, each focusing on a specific aspect and type of data: clinical, omics, epidemiology, and social sciences,” she explained. “There were also four subgroups on cross-cutting themes [community participation for data sharing, indigenous data, legal and ethical considerations, research software for data sharing] to focus the discussion and provide an initial set of guidelines on a fast-track basis.”
The next speaker was Dawei Lin, a data scientist and bioinformatics expert affiliated with the US National Institutes of Health (NIH) and, like Pienta, a member of the RDA. According to Lin, lack of time is a major problem for data scientists working on the pandemic. “The people who are producing the data are very busy saving lives or pursuing discoveries of vital importance to the fight against the virus, so it’s hard for them to take the time and care to manage and share their data. Hence the need for innovative tools, protocols and structures to enable data sharing even under these difficult conditions,” he said.
Lin repeatedly underscored the need to ensure that data sources are trustworthy. “Repositories must win the trust of the communities they serve. They must demonstrate that they are reliable and able to manage the data they hold satisfactorily,” he said.
Furthermore, he explained that the term TRUST can be considered an acronym for Transparency, Responsibility, User focus, Sustainability, and Technology.
“Data availability is a priority in a pandemic,” Cambon-Thomsen concluded. “A variety of well-documented data is needed, with good metadata and a focus on interdisciplinarity.”
“Data sharing is time-consuming, and during a pandemic, it’s even harder. It’s our job to help scientists in this task,” Pienta concluded.
“There’s nothing new about data repositories, of course, but the pandemic has emphasized their importance and especially the challenges we face in managing them on an open and trustworthy basis. There are funding opportunities out there to help people share data,” Lin concluded. “What keeps a repository alive is its relevance to the community.”
A complete recording of the 4th FAPESP COVID-19 Research Webinar can be watched at covid19.fapesp.br/open-data-under-the-covid-19-pandemic-august-5-2020/265.
Agência FAPESP licenses news reports under Creative Commons license CC-BY-NC-ND so that they can be republished free of charge and in a straightforward manner by other digital media or by print media. The name of the author or reporter (when applicable) must be cited, as must the source (Agência FAPESP), in accordance with Agência FAPESP’s Digital Content Republication Policy.