FAPESP creates repository of clinical information to facilitate research on COVID-19
July 01, 2020
By Elton Alisson | Agência FAPESP – Since June 17, universities and research institutions throughout Brazil have been able to access COVID-19 Data Sharing/BR, the country’s first repository of anonymized demographic, clinical and laboratory data regarding patients tested for COVID-19 in the state of São Paulo.
The purpose of the platform is to enable sharing of patient data to support scientific research on the disease in various knowledge areas.
The database is an initiative of FAPESP in partnership with the University of São Paulo (USP), with participation from leading hospitals and clinical laboratory chains.
So far the partners are Fleury Group (a private laboratory chain) and two leading private hospitals in São Paulo City, Hospital Sírio-Libanês and Albert Einstein Jewish Hospital (HIAE), which have contributed infrastructure, technology and personnel to facilitate data sharing. FAPESP has invited other healthcare institutions to contribute data.
“The key idea behind the platform is to assist scientific research on COVID-19 by sharing data that wouldn’t otherwise be available, and to mobilize the community of computer scientists, mathematicians and information analysts to contribute new ideas for combating the epidemic,” Luiz Eugênio Mello, FAPESP’s Scientific Director, said during an online press conference held by FAPESP on June 17 to launch COVID-19 Data Sharing/BR.
The repository will initially hold open-access anonymized data on 75,000 patients, 6,500 sets of outcome data, and more than 1.6 million clinical examinations and laboratory tests performed nationwide by Fleury and by Sírio-Libanês and HIAE in São Paulo City since November 2019.
Although the first case of COVID-19 in Brazil was reported by HIAE in February, the period covered by the data, which starts in November 2019, will enable researchers to analyze patient histories and look for evidence of symptoms in patients treated before that date. Fleury, HIAE and Sírio-Libanês will upload fresh data on a regular basis.
Three categories of information can be extracted from the repository: demographics (patient gender, year of birth and area of residence); clinical exams, laboratory tests and hospitalizations; and outcomes (recovery or death). Later, it will also contain medical images such as X-rays and CT scans.
“It would cost tens of millions of dollars to obtain such data by other means,” Mello said. “Free access to the repository is possible thanks to the generosity and commitment of the three institutions that are participating in the initiative.”
The repository was planned to open in three stages. First, a small dataset was offered as a pilot, which the research community could download, analyze and visualize using data science techniques. The second stage (until June 24) involved interaction between the team responsible for the repository and research groups interested in providing feedback on their experience of the pilot, so that the data and documentation could be improved. In the third stage, starting July 1, the general public can access the full dataset.
“Initially we put up a pilot dataset for exploratory analysis, and then we extended it as the data analysts started to use it,” said João Eduardo Ferreira, a professor in the University of São Paulo’s Institute of Mathematics and Statistics (IME-USP), a participant in the project.
Understanding the disease
According to Edgar Rizzatti, Medical and Technical Executive Director of Fleury Group, the repository will permit access to data not just for research by the scientific community but also for the development of technological solutions by entrepreneurs and startups.
“Since the onset of the pandemic we’ve been contacted by startups, universities and research institutions, as part of individual projects or collaborations, to ask if they could access anonymized COVID-19 patient data for research purposes or to develop data science strategies and artificial intelligence algorithms. So I expect this groundbreaking initiative to enable many more people and groups to understand COVID-19 much more deeply,” Rizzatti said.
Luiz Fernando Lima Reis, Head of Education and Research at Sírio-Libanês, agreed. “The database will enable the scientific community to access data that reflects the current status of the COVID-19 epidemic in Brazil and the characteristics the disease has acquired here, and the disease can be combated only by means of data-based solutions,” he said.
The steering committee for the repository, he stressed, has taken care to anonymize all patient data in order to assure confidentiality, and has complied fully with the requirements of Brazil’s Data Protection Law.
For Luiz Vicente Rizzo, HIAE’s Head of Research, the crisis offers an opportunity to demonstrate the strength and power of the research done by non-governmental institutions committed to fighting COVID-19.
“Here at the Einstein [as HIAE is known], we’re working on 68 research projects on COVID-19 that began in the past six months, and another 113 are about to get under way,” he said. “This shows that we as non-governmental institutions have an important role to play and can contribute significantly to research in the state of São Paulo and Brazil as a whole.”
Origin of the repository
The idea of creating COVID-19 Data Sharing/BR arose a little over a month ago and was quickly put into practice thanks to another project launched by FAPESP late last year, the São Paulo State Network of Scientific Data Repositories. Development of the network took almost three years. Its open-access platform offers a metadata search engine and data libraries associated with scientific research in all knowledge areas conducted by higher education and research institutions in the state of São Paulo. The same platform will also host COVID-19 Data Sharing/BR.
The network was created by USP and five other public universities – the University of Campinas (UNICAMP), São Paulo State University (UNESP), the Federal University of São Carlos (UFSCar), the Federal University of the ABC (UFABC) and the Federal University of São Paulo (UNIFESP) – plus the Aeronautical Technology Institute (ITA) and the Brazilian Agricultural Research Corporation’s scientific and technological information unit (CNPTIA/EMBRAPA).
“Data sharing is essential to cope with the current situation, which is likely to last a long time,” said Sylvio Canuto, USP’s Pro-Rector for Research.
For Cláudia Bauzer Medeiros, a professor in UNICAMP’s Computer Science Institute and a participant in the project, the repository will be useful not only for research on COVID-19 but also as support for policymaking to prevent a recurrence of situations like the current crisis and to mitigate the impact of future pandemics. “The repository collects data produced by Brazilians and will contribute to world science,” she said.
Agência FAPESP licenses news reports under Creative Commons license CC-BY-NC-ND so that they can be republished free of charge and in a straightforward manner by other digital media or by print media. The name of the author or reporter (when applicable) must be cited, as must the source (Agência FAPESP).