A Conversation with Rodney Brooks and Gary Marcus
Rod Brooks and Gary Marcus are, simultaneously, two of the progenitors of modern artificial intelligence and robotics and also among their fields’ most persistent critics. Brooks, the cofounder of iRobot and Rethink Robotics, and a former director of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), has for years stressed the limitations of what robots can do. For all the advances of AI, he believes that we will need new ideas if we hope to build robots or self-driving cars that navigate the built, human world without our help. Gary Marcus, a professor of cognitive science at New York University and the founder of Geometric Intelligence, which was acquired by Uber, also thinks modern AI “must be supplemented by other techniques if we are to reach artificial general intelligence.” Recently, the two heretics decided to solve the problems that so occupy their imaginations by founding a company, Robust.ai, that will build brains for practical robots. Jason Pontin, Flagship Pioneering’s senior advisor, met with Brooks and Marcus in Palo Alto at Playground, the investment firm that is incubating the startup.
Jason Pontin: The current boom in artificial intelligence is the result of the remarkable success of a particular technology. What was it, when was it invented, and why has it exploded?
Gary Marcus: The technique is called deep learning. The concept has been around, although not under that name, since the '40s. The basic idea is supervised learning. How it works is this: A so-called neural network is fed examples of something, those examples are matched against the labels the net is being trained on, and the procedure repeatedly adjusts the weights of the connections in the network to minimize the difference between the net's actual output and the desired output: a procedure called backpropagation. Today, the network has what we call hidden layers—hence, “deep” learning—in between the input and the output layers, and over time those layers get better and better as they are reweighted.
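[Editor's note: The training loop Marcus describes can be sketched in a few dozen lines of plain Python. This is an illustrative toy, not any production system: a tiny network with one hidden layer learning XOR, with all names (`w1`, `w2`, `forward`, learning rate, epoch count) chosen for the example. Real deep learning systems have many more layers and run on GPU libraries.]

```python
import math
import random

random.seed(0)  # fixed seed so the toy run is reproducible

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy labeled examples: XOR, a task that genuinely needs a hidden layer.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

H, lr = 3, 0.5  # hidden units, learning rate
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]  # input -> hidden weights
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]  # hidden -> output weights
b2 = 0.0

def forward(x):
    # Forward pass: input layer -> hidden layer -> output.
    h = [sigmoid(sum(w1[j][i] * x[i] for i in range(2)) + b1[j]) for j in range(H)]
    y = sigmoid(sum(w2[j] * h[j] for j in range(H)) + b2)
    return h, y

def total_loss():
    # Squared gap between actual and desired output, summed over examples.
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

initial_loss = total_loss()

for epoch in range(20000):
    for x, target in data:
        h, y = forward(x)
        # Backpropagation: push the output error back through the layers
        # and nudge each weight to shrink the actual-vs-desired gap.
        d_out = (y - target) * y * (1 - y)
        for j in range(H):
            d_hid = d_out * w2[j] * h[j] * (1 - h[j])
            w2[j] -= lr * d_out * h[j]
            b1[j] -= lr * d_hid
            for i in range(2):
                w1[j][i] -= lr * d_hid * x[i]
        b2 -= lr * d_out

final_loss = total_loss()
```

After training, the repeated reweighting has driven the loss well below its random-initialization value, which is all "learning" means here: fitting the labeled examples, with no understanding of what XOR is.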
People have been doing this—or, trying to do this—for a long time. The main piece of mathematics was developed in the 1960s and '70s and rediscovered in the '80s, but it didn’t really work well until we had a lot of data and could do a lot of computation really fast. Some people had the intuition in the ’80s that deep learning would be a breakthrough. They couldn’t prove their case because they didn’t have enough data and they didn’t really have fast enough computers. But around 2012 people started using GPUs—graphics processing units—that had been designed for video games, and at the same time data had become cheaper and more available. Smart people figured out how to apply deep learning to artificial intelligence problems.
Suddenly, in 2012, all the pieces fell together. It was possible for computers to recognize images on ImageNet—a benchmark of roughly a million labeled pictures that a lot of researchers used—with unprecedented accuracy. With enough data and computing power, deep learning was fast and worked really well for a certain set of problems. Some people insist it works for all problems. That’s not really the case. But it’s great for labeling pictures and recognizing the syllables in people’s sentences, such that you can decode their speech.
"People can rationalize what’s going on in their thought processes. Deep learning can’t: These systems have no idea how they’re thinking or how they’re categorizing themselves."
JP: In the real world of business applications and consumer technologies, what can I do with pattern recognition?
GM: The most common everyday use is for speech recognition. So if you talk to Alexa or Google Home, you’re using deep learning; if you have your pictures automatically labeled in something like Google Photos, then you’re using it, too. People are also using it to some degree and with some success—but not complete success—in driverless cars. People are applying it to all kinds of creative things like colorizing old movies. It has many applications, including in scientific research—but it also has many limitations. And the limitations are not always covered enough in the media.
JP: Rod, what are those limitations? What can’t deep learning do well and why?
Rodney Brooks: At first, it was thought these machine learning systems were deeply mathematical, but more recently people have shown that the algorithms self-driving cars use will perceive a stop sign with a few pieces of electrical tape at critical positions on it as a 45-mile-an-hour speed limit sign. To a human it looks like a stop sign with some pieces of electrical tape on it. So this is a little frightening: These deep learning algorithms are not as robust as we might hope.
JP: Are these neural networks so brittle because deep learning has no real understanding?
RB: Deep learning systems do not understand in the same way that we understand. People say to me, “Well, how can AI see a stop sign as a 45-mile-an-hour sign? Stop signs are red and 45-mile-an-hour speed-limit signs are white. How could it possibly be that way?” Well, it turns out that red is not really red. In different lighting, the pixel values that reach our eyes are not the same as the colors we perceive; we do all sorts of things to compensate for variations in lighting. These algorithms are just trained with photos. They haven’t developed what’s called color constancy, which is fundamental to how we perceive colors. So they just don’t do the same sorts of things that humans do with their vision. The human world is designed, implicitly, for human visual systems to work well.
JP: Gary, in addition to being brittle, you’ve said these deep learning systems are “greedy.” What do you mean by that?
GM: Deep learning needs a lot of data to work properly. So if you have millions of observations of something that doesn’t change very often, then deep learning can be a great tool for that. If you have a dozen observations of something that’s changing over time, it doesn’t work very well. So it’s “greedy” in the sense that it works best with tons and tons of data. This is a problem that people recognize in the field but don’t always talk about. Sometimes you can get the data that you need to make greedy learning work and sometimes you can’t. So, for example, if you want to translate between English and French, you can get a huge database from the Canadian parliament of English and French texts set up in parallel; you’ve got the data that you need. But if you want to translate English into Swahili and you don’t have that database, then your translation technique doesn’t work anymore.
JP: As well as being brittle and greedy, in what sense is deep learning “opaque” and “shallow”?
RB: Well, “opaque” in this sense: meaning there’s no explanation for a conclusion or a decision. People can rationalize what’s going on in their thought processes. Deep learning can’t: These systems have no idea how they’re thinking or how they’re categorizing themselves. And “shallow” because they’re called deep networks, but that word “deep” actually comes from the number of layers that are employed. Remember, in the ’80s we had only three layers in these networks; then we got to a dozen, and now it’s sometimes 100 layers, or more. People misunderstand this: when they hear “deep learning,” they think it means deep thought, but in fact it’s very shallow.
GM: It’s kind of like the difference between causation and correlation. Deep learning is essentially learning a sophisticated version of correlation. You can have correlation without understanding why two things are related. So deep learning is actually shallow in the sense that it’s just saying statistically this thing and that thing tend to co-occur, but it doesn’t mean that the system understands why they co-occur. So if the circumstances change, then those statistics are no longer a very good guide. The system doesn’t understand what the problem is actually about.
Remember the Atari game system that DeepMind developed? It looks like it can play Breakout, and it does play a very good game of Breakout, but it doesn’t really understand what a paddle is, or what a ball is, or what a set of bricks is. That would be a deep understanding. It just understands these particular statistics. Now if you disrupt that and you move the paddle up a few pixels, then the whole system falls apart, because what it has learned is so shallow.
"True intelligence is being able to approach a new problem you haven’t had a lot of direct experience with. A human being can play a game that they’ve never played before and in a matter of minutes figure out something about what’s going on. Machines still can’t do that."
JP: Because the system has no common-sense understanding of the rules at all? Why has hype so overwhelmed reality? You both make it sound as if “artificial intelligence” itself is misapplied to what deep learning is doing.
GM: Deep learning is a tool that you could use in artificial intelligence, but it’s not really that intelligent. I think in a lot of fields there’s a kind of “silver bullet-itis,” where people are looking for one ring to rule them all. In psychology, which is my native discipline, for example, before I was born, we had [B.F.] Skinner and behaviorism; people were looking for one magical equation that would describe all of behavior. Of course that fell apart. People don’t really think you can capture human behavior that way any longer. But in artificial intelligence right now, they have one set of equations, basically backpropagation, about how you tune these neural networks over time, and they’re trying to use that tool over and over again to do everything. It’s seductive, and it often works a little bit; it’s not that hard to get it to work a little bit. But it’s so much harder to solve problems in a proper way, and more people should want to do that. It’s like the [Bertrand] Russell quote about theft and honest toil. [Editor’s note: Russell once said, “The method of ‘postulating’ what we want has many advantages; they are the same as the advantages of theft over honest toil.”]
RB: And it’s undeniable that deep learning systems are able to do things we didn’t think we could do 10 years ago. The speech understanding that Amazon Echo, Alexa, or Google Home display is actually quite a big jump from what we had before. So people see that and then they make the inference that it’s just going to keep getting better and better exponentially, which is what happened with Moore’s Law for 50 years. But these sorts of technologies don’t just get better by themselves. You need new conceptual understandings to get the next improvement. And those breakthroughs are very unpredictable; they come along only every few decades.
GM: Some progress has been genuinely exponential, and [Ray] Kurzweil likes to harp on those cases, but in other areas of artificial intelligence we’ve seen more linear progress, or no progress at all. True intelligence is being able to approach a new problem you haven’t had a lot of direct experience with. A human being can play a game that they’ve never played before and in a matter of minutes figure out something about what’s going on. Machines still can’t do that. With the problem of natural language understanding, we’ve had some improvement: Alexa can understand a basic request, but we don’t have a system that can understand a conversation like the one the three of us are having right now. We’re no closer on that, I would argue, in 2019 than we were in 1959.