Brazilian COVID-19 patient data repository becomes fully operational | AGÊNCIA FAPESP

Platform created by FAPESP in partnership with the University of São Paulo, Albert Einstein Jewish Hospital, Hospital Sírio-Libanês and Fleury Group begins providing access to data on more than 177,000 patients, 5 million clinical examinations and 9,600 outcomes.

July 15, 2020

By Elton Alisson  |  Agência FAPESP – COVID-19 Data Sharing/BR, Brazil’s first open-access repository of demographic, clinical and blood work data on patients tested for COVID-19 in the state of São Paulo and across the country, is fully operational after completing a pilot stage and receiving feedback from the research community.

The repository holds anonymized data on more than 177,000 patients, 9,630 outcomes, and almost 5 million clinical examinations and laboratory tests performed nationwide since November 2019 by Fleury Group (a private laboratory chain) and two leading private hospitals in the city of São Paulo, Hospital Sírio-Libanês and Albert Einstein Jewish Hospital (HIAE).

Although the first case of COVID-19 was reported in February by HIAE, the period covered by the data will enable researchers to analyze patient histories and look for evidence of symptoms in patients treated before then.

Fleury, HIAE and Sírio-Libanês will upload fresh data on a regular basis to the repository, which is hosted by the University of São Paulo (USP). All four institutions have contributed infrastructure, technology and personnel to facilitate data sharing. 

FAPESP is in advanced talks with other public and private healthcare institutions, which are being invited to upload data to COVID-19 Data Sharing/BR.

The platform is the result of an initiative taken by FAPESP in partnership with USP with the purpose of sharing anonymized clinical patient data to support scientific research on the disease in various knowledge areas.

“Science is increasingly a collective activity, and data sharing ventures are becoming more frequent worldwide. FAPESP’s open science strategy is the backdrop for this initiative. We’re taking advantage of the crisis to spur the data sharing initiative hosted by USP. We expect not only to bring in more partners but above all to contribute to the supply of top-quality data as a basis for the scientific community to put forward solutions that enable us to face the pandemic,” said Luiz Eugênio Mello, FAPESP’s Scientific Director.

Three categories of information can be extracted from the repository: demographics (patient gender, year of birth and area of residence); clinical exams and lab tests, as well as patient movements and hospitalizations (when available); and outcomes (recovery or death). 
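As an illustration only, the three categories could be linked per patient along the following lines. This is a minimal Python sketch, not the repository's actual schema: the field names, the anonymized `patient_id` key and the `build_patient_records` helper are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Patient:
    """Demographics: gender, year of birth, area of residence."""
    patient_id: str        # hypothetical anonymized identifier
    gender: str
    birth_year: int
    residence_area: str


@dataclass
class Exam:
    """One clinical exam or lab test result."""
    patient_id: str
    exam_name: str
    result: str


@dataclass
class Outcome:
    """Outcome for a patient: 'recovery' or 'death'."""
    patient_id: str
    status: str


def build_patient_records(patients, exams, outcomes):
    """Group exams and outcomes under each patient's demographics."""
    exams_by_patient = defaultdict(list)
    for exam in exams:
        exams_by_patient[exam.patient_id].append(exam)

    outcome_by_patient = {o.patient_id: o.status for o in outcomes}

    return {
        p.patient_id: {
            "demographics": p,
            # Patients with no exams get an empty list.
            "exams": exams_by_patient.get(p.patient_id, []),
            # Outcomes are not available for every patient, so this may be None.
            "outcome": outcome_by_patient.get(p.patient_id),
        }
        for p in patients
    }
```

Keeping the outcome optional reflects the figures reported above, where outcomes are available for far fewer patients than exams.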

In a second stage, which is now being planned, COVID-19 Data Sharing/BR will also hold imaging data such as X-rays and CT scans. 

Researchers’ contributions

The repository has been rolled out in three stages. A small dataset was first offered as a pilot starting on June 17. The research community was able to download data, analyze it, and visualize it using data science techniques. The second stage (to June 24) involved interaction between the team responsible for the repository and research groups interested in providing feedback on their experience of the pilot, so that enhancements could be made to the data and documentation. Some 30 emails with questions and suggestions were received from researchers during this period.

“All suggestions will be analyzed to see what can be implemented in the short, medium or long term,” Cláudia Bauzer Medeiros, a professor in the Institute of Computer Science at the University of Campinas (UNICAMP) and a participant in the project, told Agência FAPESP.

One of the suggestions has already resulted in a minor adjustment to the final version of the repository. “In most situations we analyze what can be done to contribute to the largest possible number of research projects,” said Medeiros, who is a member of the steering committee for the FAPESP Research Program on eScience and Data Science.

The initial dataset posted on June 17 was accessed and downloaded by scientists from Argentina, Belgium, Brazil, Canada, Finland, France, Germany, India, Netherlands, Portugal, Romania, Spain, Thailand, the United Kingdom and the United States.

International initiatives

The launch of the repository is part of the global open science movement to make research data publicly available for free. The momentum of the open science movement has been boosted by the COVID-19 pandemic.

Other research funders around the world are investing in initiatives comparable to the repository launched by FAPESP. In the US, the National Center for Data to Health (CD2H) and the National Center for Advancing Translational Sciences (NCATS) announced on June 16 the creation of a centralized COVID-19 clinical data portal in partnership with several other US agencies including the National Institutes of Health (NIH).

According to a press release, the platform is part of an effort called the National COVID Cohort Collaborative (N3C) and will systematically collect clinical, laboratory and diagnostic data from healthcare provider organizations nationwide. It will then harmonize the aggregated information into a standard format and make it available rapidly for researchers and healthcare providers to accelerate COVID-19 research and improve clinical care.

Patient privacy will be protected, and approved users must analyze data within the platform. “They use an international standard that doesn’t allow access to the raw data, but to an interface with a system that receives requests for analysis of datasets. The platform provides the results,” Medeiros said.

Another recent initiative to provide public access to clinical data for COVID-19 patients is the Virus Outbreak Data Network (VODAN), linked to the European Open Science Cloud (EOSC).

The VODAN consortium is a public-private partnership set up to “make SARS-CoV-2 virus data FAIR”, i.e. Findable, Accessible, Interoperable and thus Reusable by both humans and machines. 

“Currently the platform lists a number of initiatives at all levels of data sharing with strict access protocols,” Medeiros said.

COVID-19 Data Sharing/BR differs from these projects in that it holds Brazilian patient data with sufficient quality for inclusion in major cross-border studies, contributing not only to research on the epidemic in Brazil but also to a faster pace of global research and development leading to vaccines or a cure for the disease, she added.

“The data being posted to these international platforms comes largely from clinical trials. It’s not patient clinical information and laboratory test results like the material available from COVID-19 Data Sharing/BR,” Medeiros said.




Agência FAPESP licenses news reports under Creative Commons license CC-BY-NC-ND so that they can be republished free of charge and in a straightforward manner by other digital media or by print media. The name of the author or reporter (when applicable) must be cited, as must the source (Agência FAPESP).
