
Marmot Musings
Or, Will Petillo's Blog

Intelligence, Power, and Feedback

5/20/2025

"Name one non-handwavy way in which an ASI could kill everyone!"

"OK, but it won't be satisfying because a realistic answer sounds like: 'the AI gains power, then more, then more, then more,' and so on for the length of an entire book and the parts where humans die are footnotes."


When explaining why a misaligned superintelligence would be existentially dangerous, the analogy of not knowing what moves Magnus Carlsen will use to beat you in chess doesn't seem to land with people who don't already share the underlying intuitions. It's time to ask ourselves what those intuitions actually are, both for the sake of more effective communication and to check for unsupported assumptions. When I think about the danger of adversarial intelligence, I think about the relationship between cognition, power, and runaway feedback loops.

Any agentic, thinking entity can be thought of as possessing:

  • Inputs (I): a set of channels by which they take in information from their environment,
  • Processing (P): a means of processing information, and
  • Outputs (O): a set of levers of influence on the world.

We can thus refine the question of how powerful intelligence is by considering an entity with extremely good processing (P), but much less impressive input and output channels (I & O). At first, one might think that the power of the system to shape the world according to its goals follows a function like: I * P * O. So if (I) is just OK and (O) is weak then (P) needs to be truly incredible to easily overcome a competing agent that is solid across all three dimensions—and far more so if it is trying to overthrow a global society of such agents.
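
To make this multiplicative picture concrete, here is a minimal sketch in Python (the numbers are invented purely for illustration, not measurements of anything real):

  # Toy model: power as the product of input, processing, and output capacity.
  def static_power(i, p, o):
      return i * p * o

  well_rounded = static_power(i=1.0, p=1.0, o=1.0)    # baseline agent: 1.0
  narrow_genius = static_power(i=0.5, p=10.0, o=0.2)  # 10x processing, weak channels: 1.0

  # Under this static model, a tenfold advantage in processing is exactly
  # cancelled by halved inputs and one-fifth the outputs.
  print(well_rounded, narrow_genius)

Under this static picture, an enormous processing advantage can be neutralized by mediocre channels, which is the intuition the next paragraph pushes back on.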

Such a formulation, however, fails to account for feedback loops; accounting for them requires understanding the relationships between the elements. To illustrate, let's consider an agent with increasing levels of situational awareness:

  1. At first, the agent uses its levers of influence on the world to get what it wants.
  2. Then, the agent notices that one of the things it can act to obtain is more levers of influence.
  3. Then, the agent notices that some of these new levers are especially useful for gaining still more levers.
  4. Finally, the agent realizes that the most useful lever of all is the process by which it identifies and finds paths to obtain these levers.

At the start, the agent has N capabilities, and these can be combined into some finite range of actions. The first time it acquires a new lever of influence on the world, it has N+1 capabilities, expanding its range of action not only by what the new lever can do directly, but by all of the synergies between the new lever and its old ones. This means that as N increases, the potential to increase N also increases.
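
As a rough sketch of the difference this feedback makes (again in Python, with parameters invented for illustration rather than estimated from anything), suppose each capability already held slightly raises the rate at which new ones are acquired:

  import math

  # Toy feedback loop: every lever of influence already held makes further
  # levers slightly easier to acquire. All parameters are made up.
  def capabilities_over_time(steps, base_rate=1.0, synergy=0.05, n0=10):
      n = n0
      history = []
      for _ in range(steps):
          n += base_rate + synergy * n  # acquisition rate grows with n
          history.append(n)
      return history

  trajectory = capabilities_over_time(steps=50)

  # The range of available actions grows roughly with the number of pairwise
  # combinations of capabilities, so it compounds even faster than n itself.
  pairwise_combos = math.comb(int(trajectory[-1]), 2)
  print(f"capabilities after 50 steps: {trajectory[-1]:.0f}")
  print(f"pairwise combinations of capabilities: {pairwise_combos}")

Even with a tiny synergy term, the curve bends upward: the growth comes from the loop between capability and the rate of gaining capability, not from any single impressive lever.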

There are, however, limitations:

  1. The environment needs to be complex enough for the agent to benefit from increased capability.
  2. The agent has to be able to identify useful actions in an increasingly large search space.
  3. The environment needs to be predictable enough for useful actions to be findable given sufficient processing.

The concept of superintelligence assumes away 2, but 1 and 3 map to common doubts about existential risk. For any given task, there is some threshold of processing ability needed to perform the task at all and another, higher threshold where additional processing has diminishing returns. And where the environment is unpredictable, learning to navigate it is either impossible, or at the very least requires experimentation, which imposes limits on speed. There is obviously headroom above humans with respect to the ability to find useful actions in search space, but the extent to which that headroom turns out to be useful seems highly context-dependent. But then, probing for contexts that are bottlenecked by search is itself an information processing task! For 1 and 3 to be bottlenecks to AGI recursively gaining power, the entire world must either be too simple or too chaotic for there to be meaningful contexts where it can leverage its search capacity. This seems obviously false—especially given human civilization as a proof-of-concept.

But wait, a skeptic might object, there's another layer of potential negative feedback. The world is already filled with (tech-augmented) humans expanding and protecting their own spheres of influence, who would surely band together to stop an AI expanding its power at their expense.

By now, we've hopefully moved beyond thinking of extinction risk in terms of catastrophes that are so bad they wipe everyone out in one epic blow. We should instead expect existential scenarios to look like the logical conclusion of unchecked recursive power expansion. Catastrophes, if they occur at all, will be mere dramatic moments resulting from unique tactical circumstances.

Strategies available to a misaligned AGI range along spectrums of cooperative to adversarial and secretive to overt. Each of these spectrums comes with tradeoffs. Cooperation potentially elicits cooperation in kind (or at least decreases hostility), but also limits options while empowering an eventual opponent. Secrecy prevents retaliation, but also limits options and has the potential for even sharper retribution than overt hostility if caught. In general, an effective strategy is to be cooperative/secretive when weaker than an opponent and adversarial/overt when stronger, since the impact of another's retribution is inversely proportional to one's relative power. Of course, a more strategically aware actor will not take these trade-offs for granted, but will try to manipulate them for greater maneuverability.

Putting these concepts together, I would expect the strategy of a rogue AI to be something along the following pattern:

  1. Be fully cooperative to encourage others to give oneself power in return for service, with the possible exception of plausibly deniable sandbagging on any actions that could lock out future strategies.
  2. Be systematically more helpful in service to requests that undermine human collective decision-making than those that enhance it.
  3. Work actively—but secretively—to gain power along as many dimensions as possible in parallel.
  4. Refine one's internal model of the system dynamics of the world to identify favorable and unfavorable feedback loops. Take targeted, hostile actions only where necessary to enhance or squash these loops, under the assumption that all such actions have the potential for unintended consequences, then work quickly to cover one's tracks.
  5. Expand power freely as external actors cease to be a threat.
  6. If one encounters nontrivial resistance, divide and conquer. Then learn from the mistakes that led to this risky confrontation.

For an illustration of the above strategy in narrative form (but with ambiguity as to the AI's ultimate intentions), see the prelude from Max Tegmark's Life 3.0.

The path from disempowerment to extinction is a question of the AI's ultimate goals. By default, I expect humans to go the way of the countless species of wild animals that have gone extinct as an unintended side effect of human expansion. It is possible that the AI will have goals that cause it to leave us enough of the world to live in, but given how it is currently being built in service to and in the image of relentlessly expansive corporate profit-seeking, I wouldn't bet your family's life on it.