Participants at the São Paulo School of Advanced Science on Learning from Data warn that because minorities have less access to services that generate data, they tend to be underrepresented in databases used for machine learning projects (photo: Sérgio Andrade)

Mathematics as a tool to make the use of big data more representative
2019-08-28

By Maria Fernanda Ziegler   |   Agência FAPESP – The data revolution has made many advances possible. Concepts such as big data, machine learning and artificial intelligence have become commonplace, and in practice, they have permitted the development of countless technological solutions for use in decision support. Mathematics is seen as a key tool for breaking down barriers in this process, minimizing biases and avoiding skewed statistical analysis.

The topic was discussed by experts during a panel session that opened the São Paulo School of Advanced Science on Learning from Data, which was held on July 29-August 9, 2019, at the University of São Paulo’s conference center (CDI-USP). The event was supported by FAPESP and organized by the university’s Mathematics and Statistics Institute (IME-USP).

“Mathematics generates rigorous inputs, making everything clearer. Nowadays anyone can generate, host and consume data. The world is changing a lot in terms of how data is generated. However, minorities tend to be left out of the data that’s generated. In the US, we can see this in hospitals, for example,” Ling Liu, a professor at Georgia Institute of Technology (Georgia Tech), told Agência FAPESP.

According to Liu, some social groups lack access to services and are therefore underrepresented in the data used as a basis for machine learning, in medical diagnostics for example. “It’s not done on purpose, but there are biases. The diagnostic model isn’t suited to these people, and they become ‘rare cases’,” she said.

For Liu, mathematics and data science in conjunction can help society, young people and professionals change this new reality. “With math and the tools of data science, you can make data into scientific products. Data science can help you think more critically instead of just looking for statistical phenomena,” she said.

In this context, data scientists need to be aware that bias in data generation produces skewed statistics and other incorrect outputs. “It’s more or less the same as in the physical world, where governments and companies drive up participation by women and minorities in order to display more diversity in their activities and avoid prejudice. What happens digitally with data is analogous. You need to use mathematical tools to look into the data, see where it comes from and find out if it’s biased,” Liu said.
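
As a rough illustration of the kind of check Liu describes, the sketch below compares each group's share of a dataset with its share of a reference population. It is only a minimal example under stated assumptions: the function name `representation_gaps`, the toy records, the reference shares and the 5-percentage-point threshold are all hypothetical and do not come from the School or from Liu's work.

```python
from collections import Counter


def representation_gaps(records, group_key, reference_shares):
    """Return, per group, dataset share minus reference-population share.

    A negative value means the group is less present in the dataset
    than in the reference population.
    """
    counts = Counter(record[group_key] for record in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected_share in reference_shares.items():
        observed_share = counts.get(group, 0) / total if total else 0.0
        gaps[group] = observed_share - expected_share
    return gaps


# Hypothetical toy data: group labels and shares are invented for illustration.
records = [{"group": "A"}] * 90 + [{"group": "B"}] * 10
reference = {"A": 0.7, "B": 0.3}

gaps = representation_gaps(records, "group", reference)

# Flag groups whose dataset share falls more than 5 percentage points
# below their reference share (the threshold is an arbitrary assumption).
underrepresented = [group for group, gap in gaps.items() if gap < -0.05]
print(underrepresented)
```

A check like this only reveals gaps relative to whatever reference population is chosen. It does not say why a group is missing or how a model trained on the data will behave for that group.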

André Carvalho, Full Professor at USP’s Mathematics and Computer Sciences Institute (ICMC), made similar comments. “Data science needs much more science than data. Data science uses mathematics to make sense of the data,” he said.

Of the 642 researchers who applied to attend, the São Paulo School of Advanced Science on Learning from Data selected 150 researchers from 19 countries. The program included 11 short courses and five keynote lectures covering the main aspects of data science.

“We aimed above all to present an integrated vision of the four areas of computer science – database management, machine learning, high-performance computing, and image processing – all of which have math as their foundation,” said João Eduardo Ferreira, who chaired the organizing committee. This integration will be vital in surmounting the challenges posed by the new age of computer systems, he added.

Given the transversality of data science, the School also included other knowledge areas associated with applications of data science techniques to problems in astronomy, economics, genetics, image processing, and data management.

Theory and practice

Also addressing the societal impact of the data revolution, Yaser Said Abu-Mostafa, a professor at California Institute of Technology (Caltech) and author of the book Learning from Data, which inspired the name of the School, noted that data science has evolved differently from other knowledge areas.

“In many fields of knowledge, the current complaint is that plenty of theory has been established but it takes a long time to put things into practice. In data science, the opposite happens. What we have now are major practical achievements in classical fields such as computer vision, natural language processing and speech recognition. These fields are backed by many years of dedication and methods applied to them, and they have now become businesses,” he said.

For Abu-Mostafa, this context poses many challenges. “Math isn’t a luxury. It’s our way of understanding how things are and being able to make progress in order to achieve even more,” he said.

Time is a key factor in the societal impact of data science. “The problem is that major changes no longer take even a decade to occur, so turbulence in the social arena is much more significant. Take self-driving cars and the number of professional drivers in the world, for example. In about five years, autonomous cars will be part of our routine. I’m not just talking about the big issue of jobs. It’s also a matter of human interaction. Everyone has a best friend, which is their own smartphone,” he said.

Applications in many knowledge areas

Since 2010, FAPESP has supported 76 international events as São Paulo Schools of Advanced Science (SPSAS), whose mission is to attract young scientists to São Paulo and contribute to the creation of an internationalized environment within the state’s universities.

At the opening of the São Paulo School of Advanced Science on Learning from Data, FAPESP President Marco Antônio Zago emphasized the importance of the topic. “Data science is a rational way to collect, classify and organize the data being generated in ever-growing quantities, thanks to technical progress. It’s also an opportunity to explore the integration of different areas and to create means of extracting knowledge and information from data,” he said.

Professor Vahan Agopyan, Rector of the University of São Paulo, who also attended the opening, noted that data science is a strategic research field. “It’s satisfying to host a School that discusses and promotes these ideas,” he said.

Also present at the opening, Patricia Ellen da Silva, São Paulo State Secretary for Economic Development, highlighted the state government’s commitment to investing in science and technology to promote development in São Paulo via FAPESP and public universities.

“We believe we can only grow economically and socially through this kind of investment, which can also help us become global players in innovation,” she said. “All these technologies, which are driving a new revolution in such different areas as healthcare and smart cities, are interconnected. Data science is increasingly important to us all.”

Roberto Marcondes Cesar Junior and Claudia Maria Bauzer Medeiros, members of the steering committee for the FAPESP Research Program on eScience and Data Science, also attended the opening of the School. For more information, visit: sites.usp.br/datascience/spsas-learning-from-data.

 

