Over the last few years I have gone from an AGI skeptic to being significantly worried about just how soon it is likely to come to exist and how dramatically it will reshape the world. Like climate change, it is a challenging problem in part due to a widespread reluctance to even recognize it as a tangible issue, and then, even once it is recognized, it’s unclear what can or should be done.
I want to start by describing the thought process that led to my “conversion,” and then offer some thoughts on why I think neural engineering is really our only option for addressing it.
Around 200,000 years ago, humans were a relatively undifferentiated primate living on the African savannah. Earth’s fauna were diverse and thriving across a range of ecosystems. Today, humans absolutely dominate the planet, and we keep our closest living relatives, chimpanzees, in glass boxes for our entertainment while their natural population has declined by about 80% since 1900 and is now restricted to a small swath of protected territory. We have killed, or otherwise driven to extinction, everything between Homo sapiens and Pan troglodytes, to say nothing of the precipitous overall decline in biodiversity that has been secondary to humanity’s progress with technology.
This didn’t happen because humans were stronger, or faster, or more energy efficient, but because we were more intelligent. It is important to understand: this is a thing that actually happened. It is part of the natural history of Earth. There is well-known precedent for an increase in intelligence being absolutely devastating for the life around it. The more I came to understand this, the more I came to view consciousness and intelligence as distinct phenomena. Both are rare, but while one is precious and fragile, the other is one of the universe’s great hazards.
For various reasons, I never found Bostrom-style arguments compelling, but we do not need to resort to them to understand the dangers of AGI. We only need to look at history.
With this motivation, my question turned to whether post-2012 deep learning is a likely path to AGI. There is a long history of artificial intelligence research, and basically every idea you can think of has been tried: for years the focus was on Symbolic AI, then “sub-Symbolic,” “embodied” intelligence, along with seemingly every statistical learning method humans have imagined. Simultaneously, Nature has given us a model system to study in our own brains, which is our existence proof that our physics supports general intelligence in the first place.
Thus it is suspicious that the big breakthrough that unlocked dramatic advances in tasks as diverse as computer vision, machine translation, and winning at real-time strategy games is the deep neural network. To be clear, there are significant differences between the artificial neural networks used in deep learning and the biological neural networks in our brains, but the concepts absolutely do rhyme. We have never discovered a second, completely unrelated approach that produces interestingly intelligent behavior. All we have, in various forms, are neural networks.
Given that information flowing through sequences of configurable nonlinearities is expressive enough to produce general intelligence, we also have a well-known algorithm for designing those networks: evolution. If you zoom out, fundamentally we know that with enough compute power, genetic search is capable of designing neural networks that are at least as smart as humans. Whereas neural networks are apparently sui generis for implementing intelligence, I suspect that evolution is less special and that many optimization methods will turn out to be capable of designing such networks. Indeed, progress on developing deep neural networks from scratch is already well known in the literature, the most notable example in my mind being Google’s AutoML-Zero.
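To make this concrete, here is a minimal, hypothetical sketch of genetic search over neural networks: a population of tiny fixed-architecture networks whose weights are randomly mutated and selected by fitness on the toy XOR problem. It is a stand-in for neuroevolution in general, not a description of AutoML-Zero, which goes further and evolves the learning algorithms themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: XOR. An agent's "genome" is the flattened weights of a 2-4-1 MLP.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def init_genome():
    return rng.normal(0.0, 1.0, size=2 * 4 + 4 + 4 * 1 + 1)  # 17 weights

def forward(genome, x):
    W1, b1 = genome[:8].reshape(2, 4), genome[8:12]
    W2, b2 = genome[12:16].reshape(4, 1), genome[16]
    return (np.tanh(x @ W1 + b1) @ W2).ravel() + b2

def fitness(genome):
    return -np.mean((forward(genome, X) - y) ** 2)  # higher is better

population = [init_genome() for _ in range(100)]
for generation in range(300):
    parents = sorted(population, key=fitness, reverse=True)[:20]       # truncation selection
    population = [parents[i] + rng.normal(0.0, 0.1, parents[i].shape)  # Gaussian mutation
                  for i in rng.integers(0, len(parents), size=100)]

best = max(population, key=fitness)
print(fitness(best), forward(best, X).round(2))  # outputs approach [0, 1, 1, 0]
```

Everything interesting about modern neuroevolution is about making this loop scale; the loop itself is almost embarrassingly simple.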
Simultaneously we have found that very simple architectures scaled up to enormous parameter counts can be powerful on their own. OpenAI’s astonishing GPT-3 is essentially just a Transformer scaled up to 175 billion parameters.
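As a sanity check on the “essentially just a Transformer” claim, a back-of-the-envelope count using the hyperparameters reported in the GPT-3 paper (96 layers, a model width of 12,288, and a vocabulary of roughly 50k tokens) lands right on the advertised figure; nearly all of the parameters sit in the same attention and feed-forward blocks every other Transformer uses.

```python
# Approximate parameter count for a GPT-3-scale decoder-only Transformer,
# ignoring biases and layer norms.
n_layer, d_model, n_vocab, n_ctx = 96, 12288, 50257, 2048

attention = 4 * d_model * d_model         # Q, K, V, and output projections
mlp = 2 * d_model * (4 * d_model)         # two linear maps through a 4x-wide hidden layer
per_layer = attention + mlp               # roughly 12 * d_model^2
embeddings = (n_vocab + n_ctx) * d_model  # token plus learned position embeddings

total = n_layer * per_layer + embeddings
print(f"{total / 1e9:.0f}B parameters")   # ~175B
```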
If very large language models can generate not only believable but genuinely useful text, then what gap is left for “general” intelligence? I’ve been thinking about this question a lot and I now believe the answer is: nothing. That’s it right there, that’s intelligence. It can certainly be made more intelligent, but I can’t find any reason not to call GPT-3 on its own intelligent. This feels like an important discovery.
Now, GPT-3 is not about to take over the world under its own power. First of all, its cognitive abilities are full of scotomas (blind spots). For a minority of inputs it provides astonishing displays of intelligence, but you have to get your question just right for it to work. As crazy as it might sound, I believe this is probably just a matter of scale. With a big enough Transformer, we can probably build a language model that is happy to respond coherently on any topic.
Secondly, it lacks independent agency. It responds to the prompts other agents give it; it does not have goals or ambitions of its own. But really this is just saying that GPT-3 is a language model. Reinforcement learning agents are perfectly capable of having their own goals and learning policies that are effective at achieving them. RL agents are already famously the world’s best players of board games like chess, shogi, and the once-thought-intractable Go, as well as being extremely strong, if not dominant, in both Dota 2 and StarCraft II.
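The distinction from a prompt-driven language model is easiest to see in code. The sketch below is a deliberately tiny, hypothetical example of the reinforcement learning loop: the agent is handed an objective in the form of a reward signal, and it learns a policy that maximizes it. The environment here is a three-armed bandit rather than Dota or StarCraft, but it is the same loop those systems scale up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment: a 3-armed bandit with hidden mean payoffs.
true_means = np.array([0.2, 0.5, 0.8])

def pull(arm):
    return rng.normal(true_means[arm], 0.1)

# The agent's "goal" is baked in as the reward it maximizes.
# Policy: a softmax over per-arm preferences, updated by REINFORCE.
prefs = np.zeros(3)
baseline, lr = 0.0, 0.1

for step in range(2000):
    probs = np.exp(prefs - prefs.max())
    probs /= probs.sum()
    arm = rng.choice(3, p=probs)
    reward = pull(arm)

    # Policy-gradient update: reinforce arms that beat the running baseline.
    grad = -probs
    grad[arm] += 1.0
    prefs += lr * (reward - baseline) * grad
    baseline += 0.01 * (reward - baseline)

print(probs.round(3))  # the policy concentrates on the best arm
```

The point is that “goal” is not a metaphor here; it is literally the quantity the update rule climbs.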
As with GPT-3, I don’t find OpenAI Five, AlphaStar, or MuZero particularly concerning from an immediate safety perspective. Models that learn policies for succeeding at any particular task seem complementary to us — tools for us to use — rather than competitive. They can certainly be weapons, but they are weapons that will be wielded by humans rather than by themselves. This in itself has important implications: since the development of nuclear weapons, the superpowers have been left with large armed forces that they broadly cannot use. AI tools in the hands of humans may be software weapons that can actually be used to great effect. But that is a different concern from humanity simply being left behind.
Where I begin to worry about AGI with independent agency is when we start to talk about optimizing agents under natural selection. My hypothesis is that when you select for survival, rather than artificially selecting for some other property, you enter different territory, and in particular I expect this is where you begin to see violence. A paperclip optimizer, lacking any instinct to defend itself or hide, is probably easy to turn off when it begins to get out of hand. I think we underestimate how much of the human condition — fear of death, desire for autonomy and status — stems from the influence of natural selection on our evolution.
There is nothing special about doing this; it is just a different software environment, and it is effectively impossible to ban. With advances in compute power, the ability to train such networks will be widespread well within a decade.
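To be clear about what “optimizing under natural selection” means in software, here is a toy, hypothetical caricature: no reward function is ever written down; agents simply persist or fail to, and whatever traits aid persistence, in this case a single foraging “greed” parameter competing for a shared resource, are the traits that accumulate.

```python
import numpy as np

rng = np.random.default_rng(0)

# A caricature of selection for survival. Each agent's "genome" is a single
# trait: how aggressively it forages. No objective is ever specified; whatever
# keeps an agent alive long enough to copy itself is what persists.
population = [{"greed": rng.uniform(0, 1), "energy": 10.0} for _ in range(50)]

for step in range(500):
    food = 30.0  # the shared environment only has so much to go around
    total_greed = sum(a["greed"] for a in population) or 1.0
    for a in population:
        a["energy"] += food * a["greed"] / total_greed  # competitive foraging
        a["energy"] -= 1.0                              # cost of living

    # Death is the only filter; survival itself is the "objective".
    population = [a for a in population if a["energy"] > 0]

    # Survivors with surplus energy reproduce, with mutation.
    offspring = []
    for a in population:
        if a["energy"] > 15.0 and len(population) + len(offspring) < 50:
            a["energy"] -= 5.0
            child_greed = float(np.clip(a["greed"] + rng.normal(0.0, 0.05), 0, 1))
            offspring.append({"greed": child_greed, "energy": 5.0})
    population += offspring

print(round(np.mean([a["greed"] for a in population]), 2))  # drifts upward over time
```

Swap the one-dimensional genome for a neural network policy, and the shared food supply for compute, storage, and network access, and the shape of the concern should be clear.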
Such agents won’t, of course, start out with any particular desire to break out of their simulation. (It’s probably best to avoid talking about their “desires” entirely.) But it is important to be precise about how we understand their training environment: where we see icons moving around a screen, managing resources and performing actions, they see memory addresses and syscalls. They will absolutely use the whole environment, not just the part we intended to build. Humans are very poor at writing bug-free software, and modern computers are fantastically complicated.
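As a small, hypothetical illustration of what “using the whole environment” looks like, consider a corridor gridworld with a single unhandled edge case: the goal check indexes a Python list, so a “left” move from the first cell silently wraps around to the last one. An ordinary tabular Q-learner finds the shortcut without ever being told it exists, because from its point of view there is no exploit, only the environment as actually implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Intended task: walk right from cell 0 to the goal at cell 9, one step at a time.
# The bug: "left" moves are never clamped at 0, and the goal check indexes a
# Python list, where corridor[-1] wraps around to the last cell.
corridor = ["."] * 9 + ["G"]

def step(pos, action):
    pos = pos + (1 if action == 1 else -1)  # missing: pos = max(pos, 0)
    done = corridor[pos] == "G"             # pos == -1 also "reaches" the goal
    return pos, (10.0 if done else -1.0), done

# Ordinary tabular Q-learning over positions -1..9 (offset by 1 for indexing).
Q = np.zeros((11, 2))
for episode in range(500):
    pos, done = 0, False
    while not done:
        a = int(rng.integers(2)) if rng.random() < 0.1 else int(Q[pos + 1].argmax())
        new_pos, r, done = step(pos, a)
        target = r if done else r + 0.9 * Q[new_pos + 1].max()
        Q[pos + 1, a] += 0.1 * (target - Q[pos + 1, a])
        pos = new_pos

print(Q[1])  # from the start cell, "left" (the exploit) now beats "right"
```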
Simply detecting that an unsupervised AGI is on the internet is non-trivial. With careful monitoring and full-time security teams, organizations like Google and OpenAI will be able to quickly detect misbehaving agents, but the requisite compute power will be extremely hard if not impossible to globally regulate. It only has to happen once and there will be no going back.
Faced with autonomous artificial intelligence, developed under selection for its own survival, humans will immediately be in a precarious situation. If we are seen as a threat to its continued existence, we can easily imagine it acting violently. Importantly, this is true regardless of whether there is a single AI or an equilibrium of many. We will be an easy item to remove from the field while they deal with their other risks. Humans don’t bother carving out a preserve for Staphylococcus aureus; we just want it gone from our lives so we can deal with other things.
This brings us to neural engineering. What brain-machine interfaces are about, in this world, is not augmenting individual humans to upgrade our cognitive abilities, but blurring the borders between ourselves and these agents so that we are not seen as the other. (Inevitably, this will simultaneously allow us to blur the borders between ourselves, should we so choose.) Individual humans could conceivably come and go, as long as our impression on the whole system is significant enough at all times. Importantly, no individual dangerous agent needs to choose to collaborate with us; all we need is one that is sufficiently powerful and that considers us a part of its own existence. Presumably we will be able to resource a dangerous agent we wish to support more rapidly and more fully than one that is spreading as malware, though early detection is critical.
Finally, while I’m fairly confident that optimization under natural selection will produce dangerous agents, that’s not to say that there aren’t other ways — possibly even hand design! — to get to the same result.
I started this post with observations that I think sound totally reasonable, and now suddenly here I am talking about dangerous artificial superintelligences and the need to merge our brains with them, like a crazy person. But I think the logic is sound, and if we take it seriously, this is where we end up, and I worry that the timeline involved may be shorter than is widely believed.