COVID-19 Data Sharing/BR repository helps scientists make discoveries in health and computation
July 14, 2021
By Elton Alisson | Agência FAPESP – A year after its launch, the COVID-19 Data Sharing/BR repository has registered around 4,000 downloads by users in 36 countries. It holds more than 50 million records from 800,000 patients.
“We want COVID-19 Data Sharing/BR to serve as the foundation for us to motivate the scientific community to share information produced to a high-quality standard with the aim of creating new knowledge,” said Luiz Eugênio Mello, Scientific Director of FAPESP. “Data sharing is a novel way of doing science, and FAPESP intends to travel increasingly in this direction.”
The repository is the first of its kind in Latin America and was established by FAPESP in cooperation with the University of São Paulo (USP). Its purpose is to assure open access to patient data relating to COVID-19 for use in research on the disease. Data from the repository has already made several discoveries possible in both health and computation.
Results of some of the research conducted using data from the repository were presented during the online event “The COVID-19 Data Sharing/BR repository – open data to combat the pandemic”, held on June 18, the platform’s first anniversary.
The repository currently holds more than 50 million records, most of which are the results of laboratory tests and clinical examinations of more than 800,000 patients, as well as over 300,000 outcome records.
In Brazil, the repository has been used by researchers in all states. “Almost half the downloads in Brazil were by users outside São Paulo. This could be considered evidence of its use in collaborative research involving several centers,” said Fátima Nunes, a professor at USP and a participant in the project.
The repository holds anonymized demographic and clinical data for patients who were tested for COVID-19 or underwent some kind of examination relating to the disease.
“The data is available for individuals but none of them can be identified. Before being posted to the open-access repository, all datasets are checked to make sure demographics are included only if the combinations permit a minimum grouping,” said Gabriela Barnabé, a consultant to FAPESP and adtech data lead at Globo, Brazil’s largest media company.
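The minimum-grouping check Barnabé describes can be illustrated with a short sketch. The article does not say how the repository implements it, so everything here is an assumption: the field names, the threshold `k` and the sample records are hypothetical, and the approach shown is a simple k-anonymity-style count of how many records share each combination of demographic values.

```python
from collections import Counter

# Hypothetical demographic fields; the real repository schema may differ.
QUASI_IDENTIFIERS = ("sex", "birth_year", "city")


def small_groups(records, k, fields=QUASI_IDENTIFIERS):
    """Return the combinations of field values shared by fewer than k records."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


def is_releasable(records, k, fields=QUASI_IDENTIFIERS):
    """True when every combination of field values covers at least k records."""
    return not small_groups(records, k, fields)


# Made-up records, for illustration only.
sample = [
    {"sex": "F", "birth_year": 1980, "city": "Sao Paulo"},
    {"sex": "F", "birth_year": 1980, "city": "Sao Paulo"},
    {"sex": "M", "birth_year": 1955, "city": "Campinas"},
]

print(small_groups(sample, k=2))   # combinations that would need coarsening
print(is_releasable(sample, k=2))
```

In a scheme like this, a combination that falls below the threshold would typically be generalized (for example, replacing a birth year with an age range) or suppressed before release. Whether the repository does either is not stated in the article.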
The architecture of COVID-19 Data Sharing/BR is based on the São Paulo State Network of Research Data Repositories, which took three years to build. Its 100-strong team included university administrators, researchers and technicians from USP, the University of Campinas (UNICAMP), São Paulo State University (UNESP), the Federal University of São Paulo (UNIFESP), the Federal University of ABC (UFABC), the Federal University of São Carlos (UFSCar), and the Aeronautical Technology Institute (ITA).
“These institutions built their repositories together and now export metadata to a single interface, which can be accessed by everyone worldwide. It’s the first data infrastructure of the kind in Latin America,” said Claudia Bauzer Medeiros, a professor at UNICAMP’s Institute of Computing and coordinator of the network project.
Because the network of repositories was designed to be extensible and open-access, it was possible to create COVID-19 Data Sharing/BR and connect it to the network in a fortnight. The process would have taken months without the existing infrastructure.
“By the end of June, we’ll have the latest data from most of our partners. It’s being processed at this time and has to complete the revalidation process before being posted to the platform,” Medeiros said. “The difference compared to other repositories is the variety of datasets. They include hundreds of types of clinical tests, for example, facilitating research on comorbidities. The potential for knowledge advancement in many areas is still being explored.”
Based on the data now available from the repository, researchers at USP’s Institute of Mathematical and Computational Sciences (ICMC) have developed interoperable data visualization tools so that users can manipulate COVID-19 data in graph form to see and understand the relevant relationships.
One of these tools is the Interoperable Covid Visualizer (I-CovidVis), which tracks the progress of a patient with the disease while in hospital, integrating the results of clinical exams and tests of all kinds.
“The tool enables a specialist to view the results of tests as sets of analytes for a specified timespan, and obtain the necessary information for decision making very quickly,” said Agma Traina, a professor at ICMC-USP and coordinator of the project.
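As a rough illustration of the idea Traina describes, the sketch below groups one patient's test results by analyte within a chosen timespan. It is not I-CovidVis code: the record layout, the function name `analytes_in_window` and the sample values are all hypothetical and exist only to show the kind of selection involved.

```python
from collections import defaultdict
from datetime import date


def analytes_in_window(results, patient_id, start, end):
    """Group one patient's results by analyte for dates in [start, end]."""
    grouped = defaultdict(list)
    for r in results:
        if r["patient_id"] == patient_id and start <= r["date"] <= end:
            grouped[r["analyte"]].append((r["date"], r["value"]))
    # Sort each analyte's series chronologically.
    return {analyte: sorted(series) for analyte, series in grouped.items()}


# Made-up results, for illustration only.
results = [
    {"patient_id": "P1", "analyte": "ferritin", "date": date(2020, 7, 3), "value": 610.0},
    {"patient_id": "P1", "analyte": "ferritin", "date": date(2020, 7, 1), "value": 480.0},
    {"patient_id": "P1", "analyte": "CRP", "date": date(2020, 7, 2), "value": 12.5},
    {"patient_id": "P1", "analyte": "CRP", "date": date(2020, 8, 1), "value": 1.1},
    {"patient_id": "P2", "analyte": "CRP", "date": date(2020, 7, 2), "value": 3.0},
]

window = analytes_in_window(results, "P1", date(2020, 7, 1), date(2020, 7, 7))
for analyte, series in window.items():
    print(analyte, series)
```

Each analyte ends up with a time-ordered series restricted to the requested window, which is the shape of data a plotting layer would need to draw one curve per analyte.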
The data available from the repository has also been used to create a sophisticated natural language query system that can establish correlations between test results and patients. “Another important aspect of this type of research is that people who aren’t specialists in computation can analyze the data using Brazilian Portuguese rather than computer programming languages,” said Marco Antonio Casanova, a professor at the Pontifical Catholic University of Rio de Janeiro (PUC-Rio). “Because the platform is open-access, we were able to test and validate it with a large volume of real data.”
Augmented inflammatory response
Another study, conducted in the health area by researchers at various institutions in Brazil and abroad using data posted to the repository, showed why men and older people are more likely to develop severe COVID-19. The researchers analyzed the clinical exams of over 178,000 patients tested for COVID-19 and found that the disease induced similar alterations in laboratory parameters in men and women. Older male patients, however, had significantly more abnormal results than older female patients, including augmented levels of inflammatory markers.
“We found that the levels of inflammatory markers in severe COVID-19 patients were extremely high and varied according to sex and age,” said Helder Nakaya, Vice Director of the University of São Paulo’s School of Pharmaceutical Sciences (FCF-USP) and principal investigator for the study.
Shortly after the first upload of data to the repository, the researchers used bioinformatics techniques to analyze the laboratory parameters of more than 33,000 patients with a confirmed COVID-19 diagnosis provided by Fleury Group, a private laboratory chain, and two leading private hospitals in the city of São Paulo, Syrian-Lebanese Hospital and Albert Einstein Jewish Hospital (HIAE). Most of the patients had positive RT-PCR test results. This test is considered the gold standard in detection of the disease.
The results of the analysis showed elevated levels of C-reactive protein and ferritin in older male COVID-19 patients. These substances are produced by the liver and increase in the blood because of infection or inflammation. The results also showed abnormal levels of liver function enzymes in several age groups, with the exception of young women.
“The findings can serve as guidelines for more research on the pathogenesis of COVID-19 and contribute to the development of predictive models of infection by SARS-CoV-2 and progression to a severe form of the disease,” Nakaya said (more at: agencia.fapesp.br/34191).
The researchers are now developing machine learning methods to analyze a new upload of lab test results for COVID-19 patients to the repository, including data supplied by Hospital das Clínicas, the hospital complex run by the University of São Paulo’s Medical School (HC-FM-USP), and Beneficência Portuguesa de São Paulo (BP), one of the largest private hospitals in Latin America.
The aims of the study include identifying outcomes and interactions among several laboratory parameters. “We also mean to analyze in more detail what happens to laboratory parameters for patients diagnosed with COVID-19 and other infections, such as dengue,” Nakaya said.
The webinar was chaired by Roberto Marcondes César, a professor at USP and a member of the steering committee for FAPESP’s Research, Innovation and Dissemination Centers (RIDCs), with João Eduardo Ferreira, USP’s Director of Information Technology, moderating the questions. César and Ferreira are both members of the repository’s technical development committee.
Other participants in the event included representatives of the institutions that are participating in the repository: Luiz Fernando Lima Reis, Syrian-Lebanese Hospital’s Director of Teaching and Research; Luiz Vicente Rizzo, Executive Director of Albert Einstein Jewish Institute for Education and Research (IIEP); Edgar Rizzati, Fleury Group’s Medical and Technical Executive Director; and Geraldo Busatto, a professor at HC-FM-USP.
A recording of the event can be watched at: www.youtube.com/watch?v=qHlKOMAtM1Q.
Agência FAPESP licenses news reports under Creative Commons license CC-BY-NC-ND so that they can be republished free of charge and in a straightforward manner by other digital media or by print media. The name of the author or reporter (when applicable) must be cited, as must the source (Agência FAPESP).