The Fourth Paradigm debates the challenges of eScience, the new area dedicated to dealing with the immense volume of information that characterizes science today

Challenges of the "data tsunami"
2011-11-23

If a lack of data limited advances in science a few years ago, today the problem is reversed. New data-collection technologies, operating across many fields and scales, have generated a volume of information so immense that the excess has itself become a bottleneck to scientific advancement.

By Fábio de Castro

Agência FAPESP
– If a lack of data limited advances in science a few years ago, today the problem is reversed. New data-collection technologies, operating across many fields and scales, have generated a volume of information so immense that the excess has itself become a bottleneck to scientific advancement.

Computer scientists have come together with specialists from different areas to develop new concepts and theories to deal with the torrents of contemporary datasets. The resulting field is called eScience.
The topic is discussed in the book O Quarto Paradigma – Descobertas científicas na era da eScience (The Fourth Paradigm: Scientific Discoveries in the Era of eScience), released in Portuguese on November 3 by the Microsoft Research-FAPESP IT Research Institute.

Edited by Tony Hey, Stewart Tansley and Kristin Tolle, all from Microsoft Research, the publication was launched at FAPESP headquarters in an event attended by the Foundation’s scientific director, Carlos Henrique de Brito Cruz.

Roberto Marcondes Cesar Jr., from the Mathematics and Statistics Institute (IME) at the Universidade de São Paulo (USP), gave a lecture on eScience in Brazil at the event. Another lecture, entitled “The Fourth Paradigm: advanced data-intensive computing in the advancement of scientific discovery,” was given by Daniel Fay, director of Earth, Energy and Environment at Microsoft Research (MSR).

Brito Cruz highlighted FAPESP’s interest in stimulating the development of eScience in Brazil. “The idea is very important to FAPESP because many of our projects and programs present this need for increased dataset management capacity. Our challenge is in the science behind this capacity to deal with large volumes of data,” he said.

Initiatives such as the FAPESP Research Program on Global Climate Change (RPGCC), BIOTA-FAPESP and the FAPESP Bioenergy Research Program (BIOEN) are examples of programs that must integrate and process immense volumes of data.

“We know that science makes advances when new instruments come into use. On the other hand, scientists normally don’t see the computer as a new tool for revolutionizing science. FAPESP is interested in actions to help the scientific community become conscious of the many challenges in the field of eScience,” said Brito Cruz.

The book comprises a collection of 26 technical essays divided into four sections: Earth and Environment, Health and Well-being, Scientific Infrastructure, and Scholarly Communication.

“The book speaks to the emergence of a new paradigm for scientific discoveries. Thousands of years ago, the established paradigm was experimental science, based on the description of natural phenomena. A few hundred years ago, the paradigm of theoretical science emerged, symbolized by Newton’s laws. A few decades ago, computational science emerged, simulating complex phenomena. Now we have arrived at the fourth paradigm, that of data-driven science,” said Fay.

According to him, the advent of the new paradigm brought a complete change in the nature of scientific discovery. Complex models with broad spatial and temporal scales came into play, requiring ever more multidisciplinary interaction.

“Data in unbelievable quantity are coming from different sources and also need to be dealt with in a multidisciplinary approach, oftentimes in real time. The scientific communities are also more spread out. All this transformed the way in which discoveries are made,” said Fay.

Ecology, one of the areas highly affected by the large volumes of data, is an example of how scientific advancement will depend more and more on collaboration between academic researchers and computing specialists. 

“We live in a storm of remote sensing, inexpensive land sensors and internet data access. But extracting the variables that science needs from this heterogeneous mass of data continues to be a problem. We need specialized knowledge of algorithms, file formats and data cleansing, for example, which isn’t always accessible to ecologists,” he explained.

The same thing happens in areas such as medicine and biology, which benefit from new technologies such as brain-activity recording and DNA sequencing, and in astronomy and physics, where modern telescopes capture terabytes of information every day and the Large Hadron Collider (LHC) generates petabytes of data every year.

Virtual Institute

According to Cesar Jr., the Brazilian eScience community is growing. There are 2,167 courses in information systems, computer science or computer engineering in the country. He said that 45,000 undergraduate degrees were awarded in these areas in 2009 and that, between 2007 and 2009, there were 32 post-graduate courses, 1,000 advisors, 2,705 master’s degree students and 410 doctoral students.

“Science changed from the data acquisition paradigm to data analysis. We have different technologies that produce terabytes in many fields of knowledge and today we can say that these areas are focused on the analysis of a deluge of data,” the FAPESP Computer Science and Engineering board member commented.

In 2006, the Brazilian Computer Society (SBC) organized an event to identify the key problems and main challenges in the field. The meeting led to a number of proposals that the National Council for Scientific and Technological Development (CNPq) create a specific program for this type of problem.

“We held a series of workshops at FAPESP in 2009 that brought together scientists from the fields of agriculture, climate change, medicine, transcriptomics, games, electronic government and social networks to discuss the question. The initiative resulted in excellent collaborations between groups of scientists with similar problems and launched many initiatives,” said Cesar Jr.

According to him, the Microsoft Research-FAPESP Institute for IT Research’s calls for proposals have been an important part of the group of initiatives to promote eScience, along with the organization of the São Paulo School in Computer Image Processing and Visualization. FAPESP has also supported many research projects related to the topic.

“The São Paulo eScience community has worked with professionals from many areas and published articles in many periodicals. This is an indication of the level of quality reached by the community to take on the great challenge we will have in coming years,” said Cesar Jr.

The Fourth Paradigm: http://research.microsoft.com/en-us/collaboration/fourthparadigm
 
