New method identifies 97% of pornography on smartphone and computer screens

2017-04-19

Researchers have developed a solution that recognizes pornographic content by using a combination of machine learning techniques and analysis of static and moving images.

By Peter Moon  |  Agência FAPESP – One pressing problem resulting from the universal use of cell phones and the Internet is indiscriminate access to sites and videos with pornographic content. Adult users may access such content voluntarily, but allowing minors to access it is inadvisable.

Access may be either involuntary or inadvertent. Involuntary access occurs when a user receives unwanted emails or spam containing advertisements with obscene content, while inadvertent access occurs when a user visits a site that has been co-opted by hackers who have posted pornographic content on that site.

The information technology industry and content providers are constantly looking for new methods to filter undesirable content in real time with the goal of blocking its transmission or display.

Law enforcement agencies find such systems valuable because they can improve the speed and performance of processes that trace and combat child pornography, for example. Business organizations and schools also benefit from advances in this technology because it allows them to prevent their computers from accessing content considered offensive.

In search of a system with these characteristics, in 2012, scientists at the Samsung Research Institute of Brazil contacted the University of Campinas’s Computer Science Institute (IC-UNICAMP) in São Paulo State, Brazil.

“They were looking for a solution that could be installed on smartphones, smart TVs and computers so that consumers with children could block access to sensitive content – either upon buying the device or at any time before use by a child,” said Anderson Rocha, a professor at IC-UNICAMP and principal investigator for the project.

The partnership between the university and SRI-Brazil led to the development in 2015 of a system based on machine learning (a branch of artificial intelligence) and capable of filtering out over 90% of the pornographic content available on any device. The new technology was co-patented by Samsung and UNICAMP.

The researchers continued to look for new ways to detect an even higher percentage of sensitive content, whether pornographic or violent. The solution deemed to produce the best results was based on analysis of a combination of static information from photographs and dynamic information from videos.

“Today we have a very efficient filter that identifies over 97% of the pornographic content available,” said Anderson Rocha, currently on sabbatical at Nanyang Technological University in Singapore. “That’s more than the solutions currently considered state of the art, whose efficiency is often in the 87%-94% range.”

Rocha and other researchers affiliated with IC-UNICAMP and Samsung recently published an article in the journal Neurocomputing that describes in detail how the new technology was developed. The research project was supported by FAPESP.

The new method proposed by the researchers combines static and motion information with an artificial intelligence technique known as deep learning.

“This methodology uses deep learning, which is based on non-linear representations of data. An image, for example, can be represented as a vector of pixel values; successive transformations of this vector can capture information relating to the pixels' neighborhoods, shapes, edges or other features,” Rocha said.

In deep learning, the idea is to look for better representations at each level of learning, typically organized as a multilayer network, and to create models that can learn these representations on a large scale. Some of these representations were inspired by advances in neuroscience.
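As a purely illustrative sketch (not the network described in the Neurocomputing paper), the Python snippet below uses PyTorch to build a toy multilayer convolutional network in which each layer computes a further non-linear transformation of the raw pixel values, moving from edges and colors toward more abstract features before a final permitted/pornographic decision. All layer sizes and the two-class output are assumptions chosen for brevity.

```python
# Illustrative only: a tiny convolutional network showing how successive layers
# build progressively more abstract, non-linear representations of an image.
# This is NOT the architecture used in the published paper.
import torch
import torch.nn as nn

class TinyFrameClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):  # e.g. permitted vs. pornographic
        super().__init__()
        self.features = nn.Sequential(
            # Layer 1: low-level cues such as edges and color blobs
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # Layer 2: combinations of those cues (textures, simple shapes)
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # Layer 3: higher-level parts and configurations
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x)               # shape: (batch, 64, 1, 1)
        return self.classifier(h.flatten(1))

# Usage: a 224x224 RGB frame represented as a tensor of pixel values
frame = torch.rand(1, 3, 224, 224)
logits = TinyFrameClassifier()(frame)
print(logits.shape)  # torch.Size([1, 2])
```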

Sex and violence

The first systems invented to detect pornography began by identifying nudity and then defined thresholds for acceptable physical exposure. All content that exceeded a threshold was considered pornography and filtered out. These solutions typically compared skin characteristics such as color and texture and human body geometry characteristics.

However, the results were often unsatisfactory. These early systems filtered or blocked more content than necessary. The problem is that not all images containing large expanses of human skin are sexual images: they may show people sunbathing, swimming, running or wrestling, for example.
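A minimal sketch of that early, threshold-based approach is shown below, assuming a crude skin-color rule and an arbitrary exposure threshold; neither is taken from any specific published system. It also makes clear why images with large expanses of skin but no sexual content end up blocked.

```python
# Illustrative sketch of the early threshold-based approach: mark "skin-colored"
# pixels with a simple color rule and flag the image if the skin fraction
# exceeds a fixed threshold. The rule and the threshold are arbitrary examples.
import numpy as np

def skin_fraction(rgb: np.ndarray) -> float:
    """rgb: H x W x 3 uint8 image. Returns the fraction of pixels matching a crude skin rule."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    # Heuristic: skin pixels tend to have R > G > B with sufficient spread
    skin = (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b) & ((r - np.minimum(g, b)) > 15)
    return float(skin.mean())

def is_flagged(rgb: np.ndarray, threshold: float = 0.40) -> bool:
    # Everything above the exposure threshold is treated as pornography,
    # which is why beach or sports photos were often blocked by mistake.
    return skin_fraction(rgb) > threshold
```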

A more advanced solution would be able to filter adult content based on a list of words classified according to descriptions of what is permitted and what is pornographic.

An intermediate stage would insert a description of the image between the initial extraction of data from the content to be filtered and its classification as permitted or blocked. However, this method alone would not be able to resolve ambiguity by distinguishing between, for example, medical images and pornographic images.
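One common way to realize such an intermediate description is a bag-of-visual-words style pipeline: local features are extracted from the image, aggregated into a single mid-level descriptor, and only then classified. The sketch below illustrates that general idea; the patch features, vocabulary size and linear SVM are assumptions for the example, not the exact components used by the UNICAMP team.

```python
# Generic sketch of an "intermediate description" between data extraction and
# classification, using a bag-of-visual-words style descriptor as an example.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def extract_patches(image: np.ndarray, size: int = 8) -> np.ndarray:
    """Cut a grayscale image into flattened non-overlapping patches (local features)."""
    h, w = image.shape
    patches = [image[y:y+size, x:x+size].ravel()
               for y in range(0, h - size + 1, size)
               for x in range(0, w - size + 1, size)]
    return np.array(patches, dtype=float)

def describe(image: np.ndarray, vocabulary: KMeans) -> np.ndarray:
    """Mid-level description: normalized histogram of 'visual words' assigned to each patch."""
    words = vocabulary.predict(extract_patches(image))
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Training sketch (train_images, train_labels and test_image are placeholders):
# vocabulary = KMeans(n_clusters=64).fit(np.vstack([extract_patches(im) for im in train_images]))
# clf = LinearSVC().fit([describe(im, vocabulary) for im in train_images], train_labels)
# blocked = clf.predict([describe(test_image, vocabulary)])[0] == 1
```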

For videos, the researchers at UNICAMP believe they can address the ambiguity problem by including an additional element that classifies motion information extracted over time.

The solution they developed extracts one frame per second from each video accessed in real time via a smartphone or computer. Each extracted frame is analyzed as a static image by the permitted/pornographic classifier based on image descriptions, while, in parallel, the sequence of frames supplies the elements required to analyze the movements of the objects and people present in the scene. The video can then be blocked depending on the type of motion detected.
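The sketch below illustrates this combined analysis under stated assumptions: frames are sampled at roughly one per second with OpenCV and passed to a hypothetical static classifier (score_frame), while dense optical flow between successive sampled frames stands in for the motion analysis. The published method uses its own motion descriptors and fusion scheme, so this shows only the general shape of the pipeline.

```python
# Sketch: sample ~1 frame per second, score each frame with a static classifier,
# and compute a simple motion cue (mean optical-flow magnitude) between samples.
import cv2
import numpy as np

def analyze_video(path: str, score_frame) -> list:
    """score_frame is a hypothetical static classifier: BGR frame -> probability of porn."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(int(round(fps)), 1)              # roughly one frame per second
    results, prev_gray, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            static_score = score_frame(frame)    # static-image evidence
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            motion = 0.0
            if prev_gray is not None:
                flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                    0.5, 3, 15, 3, 5, 1.2, 0)
                motion = float(np.linalg.norm(flow, axis=2).mean())  # motion evidence
            results.append((idx / fps, static_score, motion))
            prev_gray = gray
        idx += 1
    cap.release()
    return results  # per-second (time, static score, motion magnitude) for a later fusion step
```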

According to Rocha, this method has been tested using a dataset containing approximately 140 hours of video, including 1,000 pornographic videos and 1,000 non-pornographic videos, each lasting between 6 seconds and 33 minutes.

The pornographic videos involved actors from various ethnic groups. Animated videos (or cartoons) were also considered. The non-pornographic videos included scenes of bathers on the beach and in clubs as well as wrestling scenes. Using this methodology, the team at IC-UNICAMP succeeded in raising the accuracy of pornography filtering to 97%.

“The level falls to 90% for content involving child pornography and to 80% for content containing violent scenes because these situations are much harder to filter,” Rocha said. “Both types of content are part of the team’s future research efforts. The filter detects when the undesirable content begins and when it ends. It blocks display of the content as soon as the pornography starts and releases it when the banned content ends.”
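As a minimal sketch of that start/end behavior, the function below turns a sequence of per-second flags (such as could be produced by a classifier like the one sketched earlier) into blocked intervals. The minimum-run smoothing is an assumption for illustration, not a value from the published system.

```python
# Sketch: convert per-second decisions into (start, end) intervals during which
# display is blocked; display is released again once a flagged run ends.
def blocked_intervals(flags, min_run: int = 3):
    """flags: list of booleans, one per second of video. Returns (start, end) pairs in seconds."""
    intervals, start = [], None
    for t, flagged in enumerate(flags):
        if flagged and start is None:
            start = t                            # undesirable content begins here
        elif not flagged and start is not None:
            if t - start >= min_run:             # ignore very short runs (likely noise)
                intervals.append((start, t))     # content is released again here
            start = None
    if start is not None and len(flags) - start >= min_run:
        intervals.append((start, len(flags)))
    return intervals

# Example: seconds 5-9 flagged in a 12-second clip
print(blocked_intervals([False] * 5 + [True] * 5 + [False] * 2))  # [(5, 10)]
```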

“In a forensic context the system can analyze, for example, 30 hours of videos on a user's hard disk to detect a half-hour of child pornography, which is sufficient evidence to prosecute [the user].”

The new motion analysis technology has several other applications besides tracing and blocking pornography. The researchers are now starting to use the method to analyze scenes of violence in street demonstrations, for example.

“It’s already possible to trace an individual in a crowd just by analyzing gait,” Rocha said. “Data analysis can be used for an impressive range of learning today.”

The article “Video pornography detection through deep learning techniques and motion information,” (doi: http://dx.doi.org/10.1016/j.neucom.2016.12.017) by Mauricio Perez, Sandra Avila, Daniel Moreira, Daniel Moraes, Vanessa Testoni, Eduardo Valle, Siome Goldenstein and Anderson Rocha, can be retrieved from: sciencedirect.com/science/article/pii/S0925231216314928.

 
