A computer system will facilitate the management and sharing of Brazilian climate data
April 08, 2020
By Maria Fernanda Ziegler | Agência FAPESP – Every day, researchers in the Amazon and other parts of Brazil collect vast amounts of data on atmospheric conditions, such as levels of particulate matter and gases, cloud properties, temperature, humidity and wind speed. The data help scientists understand, for example, the impact of pollution on climate and the hydrological cycle. They also serve as input for estimates of the impact of climate change on specific localities.
However, most of these data are not published and are inaccessible to many scientists. To change this situation and help improve the Brazilian science and technology system, the University of São Paulo (USP), the Federal University of São Paulo (UNIFESP) and the National Space Research Institute (INPE) have signed an agreement with Oak Ridge National Laboratory (ORNL), a United States Department of Energy laboratory with 27 years of experience in managing environmental and atmospheric data.
The agreement provides for knowledge exchange (sharing of data and open-source software) and for researcher exchanges to foster learning in data management techniques and mechanisms, as well as in data integration, modeling and visualization. All these modes of collaboration were discussed at the Fifth Workshop on Data Science – Challenges in the Brazilian Context to Promote Atmospheric Data Management, held in February at the University of São Paulo’s Engineering School (POLI-USP).
The agreement will support the development of a computational system and services to enable Brazilian institutions to manage and curate atmospheric data, starting with the data collected as part of projects supported by FAPESP, such as the Green Ocean Amazon scientific campaign (GOAmazon) and the project “Aerosol and cloud life cycles in Amazonia”, linked to FAPESP’s Research Program on Global Climate Change (RPGCC).
“An estimated 80% of all the atmospheric data collected isn’t published, and most of it is obtained with public money, so it’s public data. Advances in the management of this knowledge to disseminate it and make it accessible to all scientists mean optimizing the use of these resources and contributing to scientific progress so that more researchers and society as a whole can access this valuable information,” said Paulo Artaxo, a researcher at the University of São Paulo’s Physics Institute (IF-USP) and a member of the RPGCC steering committee.
The initiative is intended to remove a major obstacle to research in this field. Worldwide, data collection outpaces data analysis: investment in infrastructure and technological advances have boosted collection, but big data analytics is time-consuming and requires substantial human capital. This is the focus of POLI-USP’s Big Data Research and Extension Group, which organized the workshop.
“Brazil’s sovereignty is linked to the capacity to manage this data, make discoveries based on it, and develop our own infrastructure. Our atmospheric science is very advanced, but we also need training and infrastructure to manage large amounts of atmospheric data. This involves integrated management and curation, from acquisition and processing to publication and analysis,” said Pedro Luiz Pizzigatti Corrêa, a professor in the Computer Engineering and Digital Systems Department at POLI-USP and coordinator of the agreement.
According to Corrêa, the formalization of the agreement with ORNL will enable Brazilian scientists to use data from other atmospheric data collection projects and develop appropriate local infrastructure. “The goal is to bring these competencies to Brazil, creating a computational infrastructure that suits our research environment while at the same time training researchers to manage it,” he said.
For Artaxo, data management and sharing are important to all scientific disciplines. The agreement is a major step forward in facilitating the optimization of data use. “Huge amounts of environmental data are collected these days, and tools for big data visualization and access are absolutely necessary,” he said. “The data is important, sometimes historical, but because of the huge volume it can’t be analyzed and published by any single researcher or research group, so it ends up abandoned in a drawer or on the hard disk of some computer. It’s unique and funded by public money, yet it will probably be lost in ten or 20 years,” said Artaxo, who is also a lead author of several reports produced by the UN Intergovernmental Panel on Climate Change.
The agreement is intended to make the data accessible to everyone, as is already the case, for example, for PRODES, INPE’s Amazon deforestation database, which holds 30 years’ worth of Amazon Rainforest monitoring data.
“The objective is integration. Collaboration will enable us to aggregate data of different kinds, from atmospheric emissions and climate change to weather and deforestation. So it will be possible, for example, to find out on just one platform whether there were a lot of forest fires in a year with high pollution levels,” said Luciana Varanda Rizzo, a professor at UNIFESP and a member of the GOAmazon team.
Another advantage of the initiative is that it encourages interdisciplinarity. “The volume of data is too enormous for only a few people to work on, so data sharing will enable researchers in other areas to compute correlations that help determine the impact of pollution on the population of a particular species, for example,” Rizzo said.
Under the agreement, ORNL’s Atmospheric Radiation Measurement (ARM) Data Science and Integration Group will provide high-performance computing resources for atmospheric data management, sharing and visualization, so that researchers at USP, UNIFESP and INPE can set up and operate a center of their own, with large databases and artificial intelligence tools for productive use of the data collected.
Since 2015, most of the data collected by GOAmazon have been held by ARM, with the rest going to a repository run by the Max Planck Institute for Chemistry (MPIC) in Germany. “We’ve worked with USP for the past five years on data sharing, open data standards and data management. Over the next three years we’ll work initially on knowledge sharing and the development of several tools. The agreement we’ve entered into will also provide for student mobility for training purposes,” said Giri Prakash, Director of the ARM Data Center.
At present, research project data are managed solely by the researchers involved. “The huge amount of data being collected means not all researchers have the expertise, infrastructure and/or time required to make it available, so tools are being built to facilitate this aspect of their work,” Artaxo said.
Since 2017, FAPESP has required applicants for funding to attach a data management plan to their proposals, specifying the data that will be produced and how it will be managed, shared and conserved.
Agência FAPESP licenses news reports under Creative Commons license CC-BY-NC-ND so that they can be republished free of charge and in a straightforward manner by other digital media or by print media. The name of the author or reporter (when applied) must be cited, as must the source (Agência FAPESP).