The system could help forecast the impact of communication actions in networks such as Twitter and Facebook

Brazilian scientists develop a social media simulator
2013-06-26

By Elton Alisson

Agência FAPESP – The reach and speed with which information spreads on social media have sparked interest among companies and organizations in communicating on platforms such as Twitter and Facebook.

One of the challenges when making the decision to do so, however, is forecasting the impact that these campaigns will have in social media, because they have a highly viral effect—the information spreads very quickly, making it difficult to estimate the repercussions.

“If a person previously spread information by word of mouth to three or four people, now they have an audience that could reach thousands of followers via the internet. That’s why it is difficult to forecast the impact of an action in social media,” commented Claudio Pinhanez, leader of the systems and services research group at IBM Research - Brazil, the Brazilian research lab of the U.S.-based tech giant.

To address this challenge, the group began a project in partnership with researchers from the Computer Science Department of the Universidade de São Paulo’s Mathematics and Statistics Institute (IME–USP). The project focuses on developing a simulator capable of forecasting the impact of communication actions in social media based on the behavioral patterns of users.

The initial results of the project were presented at the beginning of May during the 14th International Workshop on Multi-Agent-Based Simulation, held in Saint Paul, Minnesota, and later at the Latin American eScience Workshop 2013, which was held on May 14 and 15 at the Espaço Apas in São Paulo.

The second event, promoted by FAPESP and Microsoft Research, brought together researchers and students from Europe, South America, North America, Asia and Oceania to discuss advances in several areas of knowledge made possible by the improved capacity to analyze the large volumes of information produced by research projects.

According to Pinhanez, to develop an initial method to model and simulate the interactions among social network users, the researchers collected the messages published by 25,000 people on the Twitter feeds of U.S. President Barack Obama and his political opponent Mitt Romney in October 2012, the last month of the recent presidential campaign in the United States.

The researchers analyzed the content of the messages and the behavior of users on the networks of Obama and Romney to identify standard actions, how often messages were posted, whether they were positive or negative and the influence of these messages on other users.

Based on this data set, the researchers developed an agent-based simulation model. In the system, each user is represented by an individual computer program, and all of the programs run simultaneously in an integrated manner. Each program encodes the probability that its user will act on the network, including the most probable time of day at which the individual will post a positive or negative message, based on that user’s behavioral history.
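The idea of one program per user can be illustrated with a minimal Python sketch. This is not the researchers' simulator: the class `UserAgent`, the functions `agent_from_history` and `simulate_day`, and the simple hourly probability model are all hypothetical, chosen only to show how a posting history might be turned into per-hour probabilities of posting a positive or negative message.

```python
import random
from dataclasses import dataclass


@dataclass
class UserAgent:
    """One simulated user. All fields are hypothetical illustration values."""
    name: str
    post_prob_by_hour: list  # 24 entries: chance of posting in each hour
    positive_share: float    # chance that a post is positive, not negative

    def step(self, hour, rng):
        """Return 'positive', 'negative', or None (no post) for this hour."""
        if rng.random() >= self.post_prob_by_hour[hour]:
            return None
        return "positive" if rng.random() < self.positive_share else "negative"


def agent_from_history(name, posts, n_days):
    """Estimate an agent from (hour, sentiment) posts observed over n_days.

    Assumption: the hourly posting probability is approximated by the
    number of posts seen in that hour divided by n_days, capped at 1.
    """
    per_hour = [0] * 24
    positives = 0
    for hour, sentiment in posts:
        per_hour[hour] += 1
        if sentiment == "positive":
            positives += 1
    probs = [min(1.0, count / n_days) for count in per_hour]
    share = positives / len(posts) if posts else 0.5
    return UserAgent(name, probs, share)


def simulate_day(agents, seed=0):
    """Run every agent once per hour and count posts by sentiment."""
    rng = random.Random(seed)
    counts = {"positive": 0, "negative": 0}
    for hour in range(24):
        for agent in agents:
            action = agent.step(hour, rng)
            if action is not None:
                counts[action] += 1
    return counts


if __name__ == "__main__":
    # Made-up history: two posts at 9:00 and one at 21:00 over two days.
    history = [(9, "positive"), (9, "negative"), (21, "positive")]
    agent = agent_from_history("example_user", history, n_days=2)
    print(simulate_day([agent]))
```

A sketch like this leaves out influence between users, which the researchers also analyzed; what-if questions could be explored by dropping agents from the list passed to `simulate_day` and comparing the counts.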

One of the findings of the experiments with the simulator was that removing the ten most engaged users in the discussions on the president’s Twitter feed would have more of an impact on the social network than if Obama were excluded.

“These results are preliminary, and we still don’t have a way of confirming that they are valid because the model is still the initial model and is very simple. They serve, however, to show that the model is capable of showing interesting situations and that when ready it will be very useful for testing a hypothesis and responding to questions such as whether the frequency with which President Obama publishes a message affects his social network,” said Pinhanez.

IBM already had a system for analyzing the “sentiment” (the classification of a message’s tone) of large volumes of English text and of continuous, real-time flows of information. The company intends to improve the system and make it available in Brazil.

“We are working to offer a series of technologies and adapt them to the Portuguese language and Brazilian culture because Brazil is the second most engaged country on social networks worldwide, behind only the United States,” affirmed Pinhanez.

According to the researchers, one of the main challenges in analyzing the sentiment of messages published on social networks in Brazil is that the Portuguese used in these new media tends not to follow standard Portuguese, and this is not necessarily because users do not know the language.

“There are conventions about how to write cool things on social networks,” commented Pinhanez. Because of this, one of the challenges in Brazil will be to incorporate the new vocabulary emerging from these forums.

“The language used in Twitter is much more natural. There are many expressions and variations of words, which makes the classification of messages much more complicated. Sometimes there is not enough information to guarantee whether a tweet is, in fact, positive or negative because there is no label that allows for comparison. For this reason, many of these messages must be labeled manually,” commented Samuel Martins Barbosa Neto, a doctoral student at IME and participant in the project.

Another challenge is extracting data from social networks. Initially, access to message data on networks such as Twitter was completely open; today it is limited. Furthermore, the volume of information generated by social networks grows exponentially, which forces researchers to extract meaningful samples from very large volumes of data to validate their research.

“The Obama network on Twitter must have reached 25 million followers. As we can only extract a small portion of these data, the challenge is to guarantee that they are not biased—representing, for example, only a niche of followers—to generate a valid result,” explained Barbosa Neto.
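One generic way to draw an unbiased sample from a stream too large to store is reservoir sampling, sketched below in Python. The article does not say how the researchers sampled their data, so this is only an illustration of the general idea; the function name `reservoir_sample` and the integer stand-ins for follower IDs are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k items chosen uniformly at random from an iterable.

    Uses the classic reservoir technique (Algorithm R): the stream is read
    once and only k items are kept in memory at any time.
    """
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


if __name__ == "__main__":
    # Stand-in for a large stream of follower IDs.
    followers = range(1_000_000)
    print(reservoir_sample(followers, k=5, seed=42))
```

Uniform sampling only removes bias within the data that can actually be read; if access limits expose just one niche of followers, the sample inherits that bias.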

Research collaboration

Roberto Marcondes Cesar Jr., an IME-USP professor and the adviser for Barbosa Neto’s doctoral thesis, explained that the development project for the social network simulator is the first conducted by his group in collaboration with IBM Research - Brazil.

The IME group has been working for 10 years to develop data analysis tools for use in areas such as biology and medicine, for example to aid in the discovery of new genes and new genetic networks. More recently, it began research on applying mathematical models in the social sciences.

“We entered this area with the intention of applying the same mathematical and computer techniques to situations in which the data come from some human activity, specifically, instead of action by a gene or a protein, for example. We saw the opportunity to use these techniques in social networks, which from an abstract point of view, have many similarities with genetic networks, because they are networks that connect elements,” commented Marcondes Cesar, who is a member of FAPESP’s Adjunct Coordination of Exact Sciences and Engineering and who coordinated the thematic project “Models and methods of e-Science for life and agrarian sciences.”

The article “Large-Scale Multi-Agent-Based Modeling and Simulation of a Microblogging-Based Online Social Network,” by Pinhanez et al., can be read in the proceedings of the 14th International Workshop on Multi-Agent-Based Simulation.

 

Agência FAPESP licenses its news under Creative Commons (CC-BY-NC-ND) so that it can be republished free of charge and easily by other digital or print outlets. Agência FAPESP must be credited as the source of the republished content, and the name of the reporter (if any) must be attributed. These rules are detailed in FAPESP’s Digital Republishing Policy.