

The Caring Problem

2/21/2025

I wrote this post shortly after the Paris AI cheerleading session and I am uncharacteristically angry. You have been warned.

Perverse Incentives

Psychoanalyzing others' intentions gets a bad rap because it is often misapplied as direct evidence for or against the validity of arguments. But estimating intentions is quite useful when it allows you to:

  1. Assign the correct level of trust when considering claims one is not able to fully evaluate
  2. Predict future behavior

The relevance of (1) can often be mitigated by upping your evaluation game, though with obvious limitations. For now, I am most interested in (2).

Efforts in AI safety seem to rest on the implicit assumption that attempts to build ASI are misguided: either the builders do not understand the difficulty of the alignment problem, or they do not know how to coordinate their way out of a multipolar trap. In this view, what is needed is better information, more clearly communicated. Efforts along these lines have been commendable, and arguably essential, but on their own they have not had the desired results.

Let us assume instead that the people leading the AI race are amoral psychopaths and see where this reasoning leads. We can begin by making the following observations:

  1. The threat of existential risk implies a source of tremendous power
  2. The value of AI is concentrated in the hands of those controlling it
  3. Most intermediate harms (pre-ASI) can be externalized onto society
  4. The risk of extinction is borne by all of humanity

To grasp the full meaning of these observations, it may be helpful to compare the expected value calculation from the perspective of a normal human citizen vs. that of a psychopathic tech CEO. To start, consider some of the relevant variables:

  • Expected value
  • Extent to which AI effectively steers society in an intentional direction
  • Extent of your autonomy/control over the nature of utopia
  • Chance of AI destroying all of humanity
  • Intermediate harms to society (bias, deepfakes, etc.)
  • Extent to which you are affected by or liable for the intermediate harms
  • Value of the world as it is, without AI or associated tech

Now consider how these variables relate to each other, from the perspective of a single person:

  • Expected value is the magnitude times the quality of the impact AI has on society in the near term (pre-ASI), plus the long-term impact, where the latter is discounted by time and likelihood, all relative to the baseline of a world without AI.
  • The benefit of AI to society is the extent to which AI successfully steers society (having no effect would be zero; being uncontrolled would be negative) times the extent to which that steering is consistent with the direction you would like to steer it.
  • The value of all impacts is multiplied by how much they affect you plus how much they affect other people, where the latter is discounted by how much you care about those other people.

For a normal person, the expected value of AI is relatively low because they bear the brunt of the near-term externalized costs, are disproportionately affected by long-term risks (since they care about the wellbeing of other people in addition to themselves), reap far fewer rewards in the near term because they don't share in the profits, and have less reason to expect a utopia in the long term because they don't have a say in what it looks like. For a psychopathic tech CEO, all of these considerations pull in the opposite direction: the short term is all profit with no liability, and the long term promises an unimaginably massive payoff at the risk of a salient death toll of one—and even that perceived risk is artificially lowered by selection effects and cognitive dissonance.
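To make the asymmetry concrete, here is a minimal sketch of this calculation in Python. Every parameter name and number below is invented for illustration; nothing here is empirical, and only the signs and relative magnitudes carry the argument.

```python
# A toy expected-value model of racing to ASI, relative to a no-AI baseline of 0.
# All numbers are invented for illustration; only the asymmetries matter.

def expected_value(near_term_harm, share_of_profits, liability_share,
                   p_doom, value_of_utopia, say_in_utopia, care_for_others):
    # Near term: the profits you capture minus the harms you actually bear.
    near_term = share_of_profits - near_term_harm * liability_share
    # Long term: utopia, weighted by your say in shaping it and the chance it
    # arrives, minus extinction weighted by how many lives count to you.
    long_term = (1 - p_doom) * value_of_utopia * say_in_utopia \
                - p_doom * (1 + care_for_others)
    return near_term + long_term

citizen = expected_value(near_term_harm=1.0, share_of_profits=0.1,
                         liability_share=1.0,   # bears the externalized costs
                         p_doom=0.2, value_of_utopia=10.0,
                         say_in_utopia=0.01,    # no vote on what utopia means
                         care_for_others=1.0)   # other people's deaths count

ceo = expected_value(near_term_harm=1.0, share_of_profits=5.0,
                     liability_share=0.1,       # harms mostly externalized
                     p_doom=0.05,               # dissonance-discounted risk
                     value_of_utopia=100.0, say_in_utopia=0.9,
                     care_for_others=0.0)       # a salient death toll of one

print(f"citizen: {citizen:+.2f}")  # comes out negative
print(f"CEO:     {ceo:+.2f}")      # comes out large and positive
```

None of these numbers are defensible in themselves; the point is that flipping liability, profit share, say-in-utopia, and care-for-others is enough to flip the sign of the whole calculation.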

Furthermore, as an active decision-maker, you can plan to race ahead and then stop once things start getting out of hand. If you have succeeded in capturing a monopoly, stopping is easy. If the race dynamics are still in full swing, you can just pick up the phone, have a friendly chat with your competitors, and negotiate the agreement that the panicky normies helpfully drafted for free. And if dangerous levels of the tech have become so widespread that obtaining universal voluntary compliance is no longer possible, you can leverage this instability as justification for the authoritarian regime you've always wanted. Winning!

Cooperation is Easy, Caring is Hard

Atoms do not spontaneously self-assemble into AI hardware. The rationale for development being inevitable is an appeal to game theory, or: "if I don't build it, someone else will." Now, game theory is a real force in the world, and any social system that wants a chance of not completely imploding needs to account for it, but it is not the whole story. The alternative to the Tragedy of the Commons is cooperation, a pattern that is at least as old as multicellular life and has, in the long view, beaten competition at every turn. At any moment, the leaders of the tech companies (or the leaders of nations) could choose to pick up the phone and state a desire to negotiate. This would be the first step in a conceptually straightforward (though complicated in practice) process:

  1. Each party decides that they would like for a collaborative outcome to occur, meaning that the loss of personal sovereignty entailed in submitting to the negotiated set of rules is held as less significant than the benefit of binding others to those same rules.
  2. Each party communicates this balance of values to the other parties and further expresses a desire to create a binding agreement, pending a comprehensive and reasonable plan.
  3. A clear and comprehensive specification of the things which are to be allowed and disallowed is put into writing.
  4. The parties develop a plan for monitoring and enforcing the provisions and add them to the contract. In a zero-trust context, parties determine provisions for monitoring and enforcement by thinking through all of the ways they could defect on the contract, assuming that the other parties would have come up with the same ideas, then determining what would be necessary to catch and adequately punish such an infraction. Insofar as trust is present, there is slack to not be perfectly vigilant in considering all possible means of defection.
  5. The contract only becomes binding when this plan has been developed to all parties' satisfaction. Before then, it is purely an expression of intent.
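To make the game-theoretic stakes of step 4 concrete, here is a toy sketch in Python. The payoffs are invented for illustration; only their ordering matters. Without enforcement, the race is a Prisoner's Dilemma in which racing dominates; a credible, negotiated penalty for defection is what flips cooperation into the stable outcome.

```python
# A toy model of the race as a two-player game. Payoffs are invented;
# only their ordering matters (temptation > cooperation > trap > sucker).

base_payoff = {
    ("cooperate", "cooperate"): 3,   # shared, safer progress
    ("cooperate", "race"):      0,   # sucker: unilateral restraint
    ("race",      "cooperate"): 5,   # temptation: win the race alone
    ("race",      "race"):      1,   # the multipolar trap
}

def best_response(their_move, penalty):
    """My best move given theirs, where `penalty` is the enforcement cost
    of getting caught racing under a binding agreement (step 4)."""
    def my_payoff(my_move):
        p = base_payoff[(my_move, their_move)]
        return p - penalty if my_move == "race" else p
    return max(("cooperate", "race"), key=my_payoff)

for penalty in (0, 3):  # no agreement vs. a credible sanction
    print(f"penalty={penalty}:",
          {theirs: best_response(theirs, penalty)
           for theirs in ("cooperate", "race")})
# penalty=0: racing is the best reply to everything -> both race (the trap).
# penalty=3: cooperating is the best reply to everything -> cooperation holds.
```

Note that nobody in this model has to be virtuous: the agreement changes what is in each party's self-interest, which is why step 1 (wanting the agreement at all) is the real bottleneck.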

The above is standard negotiating procedure, which anyone in any position of real power is intimately familiar with. Unilateral contracts where one puts oneself at a competitive disadvantage in the hope that others do the same are not a thing and need not be considered. Anytime anyone frames unilateral self-sacrifice as the unacceptable alternative to racing, it is because they are playing you for a fool (or have themselves been played).

Steps 3 and 4 need attention, and the efforts of those working on these are to be commended, but step 1 (individual buy-in) is the definitive sticking point for collaboration on navigating the dangers of ASI. This is not because the arguments are unsupported or too complicated to understand, but because the expected value calculation from the perspective of the people making decisions is not in favor of collaboration. They don't want to ensure safety for the general public, they want to win.

Even if tech CEOs have no choice but to race ahead on AI development, there is no game-theoretic reason that they cannot simultaneously lobby the government to enforce binding regulations—and if that is disallowed by their fiduciary duty to shareholders, they can lobby to change that instead. Even if national leaders have no choice but to support AI development through deregulation and infrastructure, there is no game-theoretic reason they can't engage in diplomatic negotiations. Even if AI engineers cannot change the culture of their workplaces from the inside, they could gain the power to do so by forming a union. But they are not. Because they don't want to.

What's the endgame here? For the psychopaths behind Big Tobacco and Big Oil, it was to pillage the world and then die at a ripe old age, fat and happy. For nuclear weapons, it was to establish a permanent, worldwide military hegemony. For AI, it's the return of slavery. Humans are a pain: if you beat them into submission, their work performance suffers; if you cut them some slack, they start demanding rights—it's so hard to get good help these days. AI mostly does what you tell it. Someday that "mostly" might become a problem that can't be externalized. When it does, expect safety teams to start getting funding. And if those teams get stuck, expect tech and world leaders to become very interested in cooperation.

Yes, this could go wrong. Recursive self-improvement or deceptive alignment could eliminate decision-makers' time to react. A competitor could act irrationally, based on an unreasonably low risk assessment, and resist changing course from the chaotic growth to the stable exploitation regime, on the grounds that it is too early to do so. But the latter is not a problem if you win the race and the former is an acceptable risk.

Personal Responsibility

It's easy to get mad at the leaders of the world. Such anger is justified, but it probably isn't the best place to direct your focus. Hitler and the Nazis did some truly awful things, but they couldn't have done any of it without the support of a nation of good, honest, hardworking German citizens. How are you being a good German citizen?

I don't care—at all—whether you use AI or buy products made by companies with unethical business practices. I only sort-of care who you vote for. I'm asking what you have done to change the power dynamics that create the incentive structures that force us to choose between what's right and what we have to do to get by. Yes, there is honor in doing the right thing, even when it hurts. But when it comes to making the world a better place in a way that has any hope of scaling, if you're in that position, you've already lost.

Actually, it's worse than that. In order for virtuous personal choices to move the needle, even a little bit, one has to go beyond personal decisions to influencing culture. Some people will resist that message for various reasons (cynicism, self-interest, narrow focus, genuine lack of options, philosophical disagreement, etc.) and will feel attacked to a degree proportionate to the strength of the message. This creates a self-defeating feedback loop: the stronger the forces that push in one direction, the stronger the forces that push back, which necessarily results in an equilibrium, which then solidifies into a cultural boundary. Choices motivated by social change transition into statements of identity, which are easily co-opted, and the end result is market diversification—capitalism adapts. And that's if such a movement is successful; if it isn't, it just fizzles into a more direct waste of energy.

To be clear, I'm all for "voting with my wallet" and I (try to) do it regularly. Not out of any utilitarian calculus where I expect it to matter at scale, but from an entirely virtue-ethics frame where: such actions are consistent with my values, I feel better about my life when my beliefs and actions are aligned, and I have the economic privilege to be able to afford this choice. Whether anyone else does the same is entirely their business.

Perverse incentives are not a fixed law of reality; they are something we could change with policies like the following:

  1. Taxing externalized costs, such as carbon emissions and contributions to existential risk
  2. Ranked choice voting
  3. Campaign finance reform
  4. Citizens' assemblies

The reason these policies don't happen is that they go against the incentives of politicians, which (at least in democracies) follow the whims of culture, of which you are a part. If you don't know how to get started, here are some tried-and-true basics:

  • Write to your representative or participate in a protest.
  • If you have disposable income, donate some of it to a charity (one you have researched as being effective) or to someone you know who is in need.
  • Stop rewarding people when they act like assholes—or trusting people with unearned confidence.
  • Get better at managing conflict. This will improve your life and also make you more resistant to "divide and conquer" tactics.
  • Educate yourself on how to be a better citizen.

Only you can know your capacity, the magnitude of what you can take on. But getting the direction of your effort right—or at least the direction of the direction—is a choice. We are in the situation we are in because too many people have made the wrong choice. In theory, this could change anytime; in practice...we'll see.