"Name one non-handwavy way in which an ASI could kill everyone!"
"OK, but it won't be satisfying because a realistic answer sounds like: 'the AI gains power, then more, then more, then more,' and so on for the length of an entire book and the parts where humans die are footnotes." When explaining why a misaligned superintelligence would be existentially dangerous, the analogy of not knowing what moves Magnus Carlsen will use to beat you in chess doesn't seem to land with people who don't already share the underlying intuitions. It's time to ask ourselves what those intuitions actually are, both for the sake of more effective communication as well as to check for unsupported assumptions. When I think about the danger of adversarial intelligence, I think about the relationship between cognition, power, and runaway feedback loops. Any agentic, thinking entity can be thought of as possessing:
We can thus refine the question of how powerful intelligence is by considering an entity with extremely good processing (P) but much less impressive input and output channels (I & O). At first, one might think that the power of the system to shape the world according to its goals follows a function like I * P * O. So if (I) is just OK and (O) is weak, then (P) needs to be truly incredible to easily overcome a competing agent that is solid across all three dimensions, and far more so if it is trying to overthrow a global society of such agents. Such a formulation, however, fails to account for feedback loops, and accounting for those requires understanding how the three elements relate to one another. To illustrate, let's consider an agent with increasing levels of situational awareness:
At the start, the agent has N capabilities, and these can be combined into some finite range of actions. The first time it achieves some new lever of influence on the world, it now has N+1 capabilities, expanding its range of action not only by what the new lever is directly capable of, but by all of the synergies between the new lever and all of its old ones. This means that as N increases, the potential to increase N also increases. There are, however, limitations:

1. For any given task, there is a threshold of processing ability needed to perform it at all, and a higher threshold beyond which additional processing yields diminishing returns.
2. The agent's own processing capacity is finite.
3. The environment may be unpredictable enough that learning to navigate it is impossible, or at least requires experimentation, which limits speed.
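To make the contrast between the two views concrete, here is a toy sketch. Everything in it is invented purely for illustration, not a model of any real system: the function names, the 0.05 growth multiplier, the 0.9 probability ceiling, and the cap and env_noise parameters are all assumptions standing in loosely for the limitations above. The static view scores an agent once as I * P * O; the feedback view lets each acquired capability open up more combinations, which in turn raise the odds of acquiring the next one.

```python
import random
from math import comb

# Static view: power is a one-shot product of input, processing, and output.
def static_power(i: float, p: float, o: float) -> float:
    return i * p * o

# Feedback view (toy): each round the agent tries to acquire one new capability.
# Its chance of success grows with the number of capability pairs it can already
# combine, with a probability ceiling and a hard cap loosely standing in for
# diminishing returns, and a failure rate standing in for an unpredictable
# environment.
def capability_growth(n_start: int, rounds: int, cap: int,
                      env_noise: float, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    n = n_start
    history = [n]
    for _ in range(rounds):
        pairwise_synergies = comb(n, 2)               # new levers combine with old ones
        p_gain = min(0.9, 0.05 * pairwise_synergies)  # ceiling on per-round progress
        if n < cap and rng.random() < p_gain * (1 - env_noise):
            n += 1                                    # N -> N+1: the loop feeds itself
        history.append(n)
    return history

if __name__ == "__main__":
    print("static power:", static_power(i=3.0, p=100.0, o=1.0))
    print("capabilities over time:",
          capability_growth(n_start=3, rounds=40, cap=15, env_noise=0.2))
```

The point of the toy run is only the shape: gains are rare while N is small and become routine as combinations pile up, until the cap or the noise bites.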
The concept of superintelligence assumes away limitation 2, but 1 and 3 map to common doubts about existential risk. For any given task, there is some threshold of processing ability needed to perform it at all and another, higher threshold where additional processing has diminishing returns. And where the environment is unpredictable, learning to navigate it is either impossible or at the very least requires experimentation, which imposes limits on speed. There is obviously headroom above humans in the ability to find useful actions in search space, but how useful that headroom turns out to be seems highly context-dependent. But then, probing for contexts that are bottlenecked by search is itself an information processing task! For 1 and 3 to be bottlenecks to an AGI recursively gaining power, the entire world would have to be either too simple or too chaotic for there to be meaningful contexts where it can leverage its search capacity. This seems obviously false, especially given human civilization as a proof of concept.

But wait, a skeptic might object, there's another layer of potential negative feedback: the world is already filled with (tech-augmented) humans expanding and protecting their own spheres of influence, who would surely band together to stop an AI expanding its power at their expense. This objection tends to imagine the AI's expansion as a single, visible grab for power that everyone can rally against. By now, we've hopefully moved beyond thinking of extinction risk in terms of catastrophes so bad that they wipe everyone out in one epic blow. We should instead expect existential scenarios to look like the logical conclusion of unchecked recursive power expansion. Catastrophes, if they occur at all, will be mere dramatic moments resulting from unique tactical circumstances.

Strategies available to a misaligned AGI range along spectrums from cooperative to adversarial and from secretive to overt. Each of these spectrums comes with tradeoffs. Cooperation potentially elicits cooperation in kind (or at least decreases hostility), but also limits options while empowering an eventual opponent. Secrecy prevents retaliation, but also limits options and invites even sharper retribution than overt hostility would if the secrecy is discovered. In general, an effective strategy is to be cooperative/secretive when weaker than an opponent and adversarial/overt when stronger, since the impact of another's retribution is inversely proportional to one's relative power. Of course, a more strategically aware actor will not take these tradeoffs for granted, but will try to manipulate them for greater maneuverability.

Putting these concepts together, I would expect the strategy of a rogue AI to follow a pattern of being cooperative and secretive while it is weaker than its opposition, quietly acquiring new levers of influence, and turning adversarial and overt only once its relative power makes retaliation ineffective.
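As a minimal sketch of that tradeoff logic (the threshold, the growth rate, and the posture labels are invented for illustration, and "relative power" is collapsed to a single number):

```python
def choose_posture(relative_power: float, threshold: float = 1.0) -> str:
    """Toy heuristic: cooperative/secretive while weaker, adversarial/overt once
    stronger, since the impact of retribution falls as relative power rises."""
    if relative_power < threshold:
        return "cooperative / secretive"
    return "adversarial / overt"

def run_scenario(start: float = 0.05, growth_rate: float = 1.6, rounds: int = 10) -> None:
    # Relative power compounds each round, standing in for the N -> N+1 loop above.
    relative_power = start
    for t in range(rounds):
        print(f"round {t:2d}  relative power {relative_power:6.2f}  "
              f"posture: {choose_posture(relative_power)}")
        relative_power *= growth_rate

if __name__ == "__main__":
    run_scenario()
```

Run as-is, the posture flips only in the round where compounding power finally crosses the threshold, which is why the shift can look abrupt from the outside even though the underlying expansion was gradual.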
For an illustration of the above strategy in narrative form (but with ambiguity as to the AI's ultimate intentions), see the prelude to Max Tegmark's Life 3.0.

The path from disempowerment to extinction is a question of the AI's ultimate goals. By default, I expect humans to go the way of the countless species of wild animals that have gone extinct as an unintended side effect of human expansion. It is possible that the AI will have goals that cause it to leave us enough of the world to live in, but given how it is currently being built in service to, and in the image of, relentlessly expansive corporate profit-seeking, I wouldn't bet your family's life on it.