The insidious science of deceiving artificial intelligence

At the beginning of the 20th century, Wilhelm von Osten, a German horse trainer and mathematician, announced to the world that he had taught a horse to count. For years von Osten traveled through Germany demonstrating this phenomenon. He asked his horse, an Orlov Trotter nicknamed Clever Hans, to work out simple sums. Hans answered by stamping his hoof. Two plus two? Four stamps.

But scientists did not believe that Hans was as clever as von Osten claimed. Psychologist Carl Stumpf led a thorough investigation by what was dubbed the "Hans Commission". He found that Clever Hans was not solving equations but reacting to visual cues. Hans tapped his hoof until he reached the right answer, at which point his trainer and the enthusiastic crowd burst into cheers. And then he simply stopped. When he could not see these reactions, he kept on tapping.

Computer science can learn a lot from Hans. The accelerating pace of development in the field suggests that most of the AI we have created has been trained well enough to give the right answers but does not really understand the information. And it is easy to deceive.

Machine learning algorithms have quickly become the all-seeing shepherds of the human herd. This software connects us to the Internet, filters spam and malicious content out of our mail, and will soon drive our cars. Deceiving it shifts the tectonic foundation of the Internet and threatens our security in the future.

Small research groups, from Pennsylvania State University, Google and the US military, are developing plans to defend against potential attacks on AI. The theories put forward in this research say that an attacker can change what a robotic car "sees". Or activate voice recognition on a phone and make it visit a malicious website using sounds that a person would hear only as noise. Or let a virus slip through a firewall.

Left: an image of a building. Right: a modified image that a deep neural network classifies as an ostrich. Middle: all the changes applied to the original image.

Instead of seizing control of the robotic vehicle, this method shows it something like a hallucination: an image that is not actually there.

Such attacks use adversarial examples: images, sounds or text that look normal to people but are perceived in a completely different way by machines. Small changes made by an attacker can cause a deep neural network to draw the wrong conclusions about what it is being shown.

"Any system that uses machine learning to make decisions critical to security is potentially vulnerable to this type of attack," – Says Alex Kanchelian, a researcher at the University of Berkeley, studying attacks on machine learning using deceptive images.

Knowing about these weaknesses at an early stage of AI development gives researchers a tool for understanding how to fix them. Some have already done so, and say their algorithms have become more effective as a result.

Most mainstream AI research is based on deep neural networks, which in turn build on the broader field of machine learning (ML). ML techniques use differential and integral calculus and statistics to create software that most of us use, such as email spam filters and Internet search. Over the past 20 years, researchers have begun applying these techniques to a new idea, neural networks: software structures that mimic the workings of the brain. The idea is to spread the computation across thousands of small equations ("neurons") that receive data, process it, and pass it on to the next layer of thousands of small equations.
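The layered structure described above can be sketched in a few lines of Python. This is only an illustration: the sigmoid activation, the layer sizes and the hand-picked weights are assumptions made for the example, not details taken from any of the systems discussed in this article.

```python
import math

def neuron(inputs, weights, bias):
    """One 'small equation': a weighted sum of the inputs squashed by a sigmoid."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def layer(inputs, weight_rows, biases):
    """A layer is many neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def forward(inputs, layers):
    """Pass the data through each layer in turn; every layer feeds the next."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations

# Hypothetical, hand-picked weights: 3 inputs -> 2 hidden neurons -> 1 output.
network = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
output = forward([1.0, 0.5, -1.0], network)
print(output)
```

A real network differs mainly in scale: thousands of neurons per layer, with weights learned from data rather than written by hand.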

These AI algorithms are trained the same way as the rest of ML, which in turn copies the way humans learn. They are shown examples of different things along with the corresponding labels. Show a computer (or a child) a picture of a cat, say that this is what a cat looks like, and the algorithm will learn to recognize cats. But to do so, the computer will have to look at thousands or millions of images of cats.
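Learning from labeled examples can be illustrated with one of the simplest possible learners, a perceptron. The sketch below is an assumption-laden toy: the two "features" and the four training examples are invented, and a real image classifier would learn from raw pixels and far more data.

```python
def predict(weights, bias, features):
    """Return 1 ("cat") if the weighted score is positive, otherwise 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

def train_perceptron(examples, epochs=20, lr=0.1):
    """Learn weights from (features, label) pairs, with labels in {0, 1}."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for features, label in examples:
            # Only wrong answers change the weights (error is 0 when correct).
            error = label - predict(weights, bias, features)
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias

# Made-up features (ear pointiness, whisker length); label 1 means "cat".
labeled = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]
weights, bias = train_perceptron(labeled)

print(predict(weights, bias, [0.85, 0.9]))   # an unseen cat-like example
print(predict(weights, bias, [0.15, 0.1]))   # an unseen non-cat example
```

The learner never receives a definition of a cat. It only adjusts its numbers until the labels it was shown come out right, which is the gap that the attacks in this article exploit.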

The researchers found that these systems can be attacked by specially selected deceptive data, which they called "adversarial examples."

In a 2015 paper, researchers from Google showed that deep neural networks can be made to classify this image of a panda as a gibbon.

"We show you photos, On which the school bus is clearly visible, and we are forced to think that this is an ostrich, "says Ian Goodfellon [Ian Goodfellow]a researcher from Google, actively working in the field of similar attacks on neural networks.

By changing the images fed to neural networks by only 4%, the researchers were able to fool them into misclassifying in 97% of cases. Even when they did not know how the neural network processed images, they could fool it in 85% of cases. This latter kind of deception, carried out without knowledge of the network's architecture, is called a "black-box attack". It is the first documented case of a working attack of this kind on a deep neural network, and it matters because this is roughly the scenario in which attacks could take place in the real world.
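How a small change to every input can flip a decision is easiest to see on a linear model. The sketch below uses a gradient-sign perturbation on a tiny logistic classifier. It is an illustration under stated assumptions, not the procedure used in the studies above: the weights and "pixel" values are invented, and `epsilon = 0.04` is only a stand-in for "a few percent", not a claim about how the 4% figure was measured.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability that input x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(v):
    return (v > 0) - (v < 0)

def gradient_sign_perturb(weights, bias, x, true_label, epsilon):
    """Shift every input by at most epsilon in the direction that raises the loss."""
    p = predict_proba(weights, bias, x)
    # For log-loss on a logistic model, d(loss)/d(x_i) = (p - y) * w_i.
    grad = [(p - true_label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model: 100 inputs, each with only a small weight.
weights = [0.1 if i % 2 == 0 else -0.1 for i in range(100)]
bias = 0.0
x = [0.52 if i % 2 == 0 else 0.48 for i in range(100)]  # "pixels" in [0, 1]

clean = predict_proba(weights, bias, x)
x_adv = gradient_sign_perturb(weights, bias, x, true_label=1, epsilon=0.04)
adversarial = predict_proba(weights, bias, x_adv)

print("clean input, probability of class 1:    ", round(clean, 3))
print("perturbed input, probability of class 1:", round(adversarial, 3))
```

No single input moves by more than 0.04, yet the shifts all push the score the same way, so their combined effect is enough to carry the input across the decision boundary.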

In the paper, researchers from Pennsylvania State University, Google and the US Army Research Laboratory attacked an image-classifying neural network hosted by the MetaMind project, which serves as an online tool for developers. The team built and trained the network under attack, but their attack algorithm worked independently of its architecture. With this algorithm they were able to fool the neural network 84.24% of the time.

Top row of photos and signs: the signs are recognized correctly.
Bottom row: the network has been forced to recognize the signs completely incorrectly.

Feeding machines incorrect data is not a new idea, but Doug Tygar, a professor at the University of California, Berkeley, who has studied adversarial machine learning for 10 years, says that this attack technique has evolved from simple ML to complex deep neural networks. Malicious hackers have used the technique against spam filters for years.

Tygar's research originates in a 2006 paper on attacks of this kind against ML systems, which he expanded in 2011 together with researchers from the University of California, Berkeley, and Microsoft Research. The Google team, which was the first to start using deep neural networks, published its first paper in 2014, two years after discovering that such attacks were possible. They wanted to make sure that this was not an anomaly but a real possibility. In 2015 they published another paper describing how to protect networks and improve their performance, and Ian Goodfellow has since advised on other research in this area, including the black-box attack.

Researchers call the more general idea of unreliable information "Byzantine data", and the course of their research has led them to deep learning. The term comes from the well-known "Byzantine generals problem", a thought experiment in computer science in which a group of generals must coordinate their actions through messengers without knowing which of them is a traitor. They cannot trust the information they receive from their colleagues.

"These algorithms are designed to cope with random noise, but not with Byzantine data," Taygar says. To understand how such attacks work, Goodfellow suggests imagining a neural network in the form of a scatter diagram.

Each point on the plot represents one pixel of an image processed by the neural network. Normally the network tries to draw a line through the data that best fits the full set of points. In practice it is a little more complicated, because different pixels carry different weight for the network. In reality, it is a complex multidimensional graph processed by a computer.

But in our simple scatter-plot analogy, the shape of the line drawn through the data determines what the network thinks it sees. To attack such systems successfully, researchers need to change only a small fraction of these points and so push the network toward a conclusion about something that is not really there. In the example of the bus that looks like an ostrich, the photo of the school bus is dotted with pixels arranged in a pattern tied to the distinctive features of ostrich photos known to the network. The outline is invisible to the eye, but when the algorithm processes and simplifies the data, the outlying data points for the ostrich make that classification look like a good fit. In the black-box version, the researchers tested different inputs to work out how the algorithm sees particular objects.

By feeding the object classifier fake inputs and studying the decisions the machine made, the researchers were able to reconstruct how the algorithm works well enough to fool the image recognition system. Potentially, such a system in a robotic car could then see a "yield" sign in place of a "stop" sign. Once they understood how the network worked, they could make the machine see anything.
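The black-box idea of probing a classifier with inputs, studying its answers and building a stand-in for it can be sketched as follows. Everything here is a simplification chosen for illustration: the "remote" classifier is a hypothetical two-input linear model, and the substitute is a plain class-centroid estimate rather than the substitute model trained in the research described above.

```python
def make_oracle():
    """Stand-in for a remote classifier: callers see labels, never the weights."""
    hidden_weights = (1.0, -1.0)  # hypothetical; the attacker code never reads this
    def oracle(x):
        score = hidden_weights[0] * x[0] + hidden_weights[1] * x[1]
        return 1 if score > 0 else 0
    return oracle

def fit_substitute(oracle, queries):
    """Estimate a decision direction from the oracle's labels alone."""
    groups = {0: [], 1: []}
    for x in queries:
        groups[oracle(x)].append(x)
    def centroid(points):
        return [sum(p[i] for p in points) / len(points) for i in range(2)]
    pos, neg = centroid(groups[1]), centroid(groups[0])
    return [p - n for p, n in zip(pos, neg)]  # points from class 0 toward class 1

def sign(v):
    return (v > 0) - (v < 0)

def craft(x, direction, target_label, epsilon):
    """Nudge each input by epsilon toward the substitute's target class."""
    step = 1 if target_label == 1 else -1
    return [xi + step * epsilon * sign(d) for xi, d in zip(x, direction)]

oracle = make_oracle()
# Step 1: query the black box on a grid of probe inputs.
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
# Step 2: build a local substitute from the answers.
direction = fit_substitute(oracle, grid)
# Step 3: craft a perturbed input using only the substitute, then try it on the oracle.
x = [0.55, 0.45]
x_adv = craft(x, direction, target_label=0, epsilon=0.1)

print("oracle label for original input: ", oracle(x))
print("oracle label for perturbed input:", oracle(x_adv))
```

The attack code touches the oracle only through its labels, which is what makes the scenario realistic: an online classification service exposes exactly that much.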

An example of how an image classifier draws different lines depending on the different objects in an image.

The researchers say that such an attack could be injected directly into the image processing system, bypassing the camera, or that the manipulation could be applied to a real sign.

But Alison Bishop, a security specialist at Columbia University, says that such a forecast is unrealistic and depends on the system used in the robotic car. If attackers already have access to the camera's data stream, they can already feed it any input they like.

"If they can get to the camera entrance, such complexity is not needed," she says. "You can just show her the stop sign."

Other methods of attack that do not bypass the camera, such as drawing visual marks on a real sign, seem unlikely to Bishop. She doubts that the low-resolution cameras used in robotic cars would even be able to pick up small changes on a sign.

The unaltered image on the left is classified as a school bus. The altered one on the right is classified as an ostrich.

Two groups, one at the University of California, Berkeley, and the other at Georgetown University, have successfully developed algorithms that can deliver spoken commands to digital assistants such as Siri and Google Now in the form of unintelligible noise. To a person such commands sound like random noise, but they can give orders to devices like Alexa that their owners never intended.

Nicholas Carlini, one of the researchers behind the Byzantine audio attacks, says that in their tests they were able to activate open-source speech recognition programs, Siri and Google Now with more than 90% accuracy.

The noise sounds like some kind of alien communication from science fiction. It is a mixture of white noise and a human voice, but it does not sound at all like a voice command.

According to Carlini, with such an attack any phone that hears the noise (attacks on iOS and Android have to be planned separately) can be forced to visit a web page that also plays the noise, which in turn infects nearby phones. Or the page could quietly download malware.

Such attacks are possible because the machine is trained to assume that virtually any data contains something meaningful, and that some things are more common than others, Goodfellow explains.

Deceiving a network into believing it sees a common object is easier, because it believes it should see such objects more often. That is how Goodfellow and another group from the University of Wyoming were able to get a network to classify images of things that did not exist at all: it identified objects in white noise, randomly generated black and white pixels.
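One reason a classifier will name an object in pure noise is structural: a typical classifier has to spread its confidence over a fixed list of labels and has no way to answer "nothing". The sketch below shows this with a softmax over three labels. The model is untrained and its random weights are purely hypothetical, so which label wins is arbitrary; the point is only that some label always does.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that always sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(pixels, weight_rows, labels):
    """Score each label, then return the most probable one and its confidence."""
    scores = [sum(w * p for w, p in zip(row, pixels)) for row in weight_rows]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

rng = random.Random(0)
labels = ["horse", "ostrich", "school bus"]  # the only answers this model can give
# Untrained, randomly chosen weights (an assumption for the demo).
weight_rows = [[rng.uniform(-1, 1) for _ in range(64)] for _ in labels]
# Random black and white pixels: an 8x8 image of pure noise.
noise = [rng.choice([0.0, 1.0]) for _ in range(64)]

label, confidence = classify(noise, weight_rows, labels)
print(label, round(confidence, 3))
```

With three labels the winning confidence can never fall below one third, however meaningless the input is.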

In Goodfellow's study, random white noise passed through the network was classified as a horse. This, incidentally, brings us back to the story of Clever Hans, a horse with no great gift for mathematics.

Goodfellow says that neural networks, like Clever Hans, do not actually learn any ideas, but only learn to recognize when they have found the right one. The difference is small but important. The lack of fundamental knowledge makes it easier for attackers to recreate the appearance of the algorithm finding "correct" results that are in fact false. To understand what something is, a machine also needs to understand what it is not.

After training an image-sorting network on both natural and altered (fake) images, Goodfellow found that he could not only reduce the effectiveness of such attacks by 90% but also make the network perform better at its original task.
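Training on natural and altered inputs together is often called adversarial training. A minimal sketch of the idea, using a small logistic model and gradient-sign fakes, might look like the following. The data, learning rate and `epsilon` are all assumptions made for the example, and the toy makes no attempt to reproduce the 90% figure.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def perturb(w, b, x, y, epsilon):
    """Gradient-sign fake of x that the current model finds harder to classify."""
    p = predict_proba(w, b, x)
    return [xi + epsilon * sign((p - y) * wi) for xi, wi in zip(x, w)]

def train(data, epsilon=0.0, epochs=200, lr=0.5):
    """Logistic regression; with epsilon > 0 each step also trains on a fake."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            batch = [x]
            if epsilon > 0:
                # Same label, altered input, crafted against the current weights.
                batch.append(perturb(w, b, x, y, epsilon))
            for sample in batch:
                err = predict_proba(w, b, sample) - y
                w = [wi - lr * err * si for wi, si in zip(w, sample)]
                b -= lr * err
    return w, b

def robust_accuracy(model, data, epsilon):
    """Share of examples still classified correctly after a gradient-sign attack."""
    w, b = model
    hits = sum((predict_proba(w, b, perturb(w, b, x, y, epsilon)) > 0.5) == (y == 1)
               for x, y in data)
    return hits / len(data)

# Toy data (made up): label 1 when the first feature is large.
data = [([0.9, 0.5], 1), ([0.8, 0.4], 1), ([0.2, 0.5], 0), ([0.1, 0.6], 0)]
plain = train(data)
hardened = train(data, epsilon=0.1)

print("plain model under attack:   ", robust_accuracy(plain, data, 0.1))
print("hardened model under attack:", robust_accuracy(hardened, data, 0.1))
```

The fakes are regenerated against the current weights at every step, so the model keeps being shown the inputs it currently finds hardest.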

"Forcing to explain really unusual fake images, we can achieve an even more reliable explanation of the underlying concepts," says Goodfellow.

The two groups of audio researchers used an approach similar to the Google team's, protecting their neural networks from their own attacks by retraining them. They achieved similar success, reducing the effectiveness of the attack by more than 90%.

It is not surprising that this field of research has attracted the interest of the US military. The Army Research Laboratory even sponsored two of the newest papers on the topic, including the black-box attack. And although the agency funds research, this does not mean the technology is going to be used in war. According to a spokesperson, it can take up to 10 years to get from research to technology that soldiers can use.

Ananthram Swami, a researcher at the US Army Research Laboratory, took part in several recent papers on deceiving AI. The army is interested in detecting and stopping fraudulent data in a world where not every source of information can be thoroughly vetted. Swami points to datasets obtained from public sensors located at universities and run by open-source projects.

"We do not always monitor all data. It is quite easy for our enemy to deceive us, "says Swami. "In some cases, the consequences of such deception can be frivolous, in some, on the contrary."

He also says that the army is interested in autonomous robots, tanks and other vehicles, so the purpose of such research is obvious. By studying these issues, the army can gain a head start in developing systems that are not vulnerable to attacks of this kind.

But any group using a neural network should be concerned about the potential for attacks that deceive AI. Machine learning and AI are in their infancy, and at this stage security blunders can have terrible consequences. Many companies entrust highly sensitive information to AI systems that have not stood the test of time. Our neural networks are still too young for us to know everything we need to know about them.

A similar oversight caused Microsoft's Twitter bot, Tay, to turn quickly into a racist with a penchant for genocide. A flood of malicious data and the "repeat after me" feature led Tay astray from its intended path. The bot was deceived by poor-quality input, and it serves as a convenient example of a poor implementation of machine learning.

Kantchelian says he does not believe that the possibilities for such attacks have been exhausted by the successful research of the Google team.

"In the field of computer security, attackers are always ahead of us," says Kantchelian. "It would be pretty dangerous to say that we have solved all the problems of fooling neural networks by retraining them."