I wrote this post shortly after the Paris AI cheerleading session and I am uncharacteristically angry. You have been warned.
Perverse Incentives

Psychoanalyzing others' intentions gets a bad rap because it is often misapplied as direct evidence for the validity of arguments. But estimating intentions is quite useful when it allows you to:
The relevance of 1 can often be mitigated by upping your evaluation game, with obvious limitations. For now, I will be most interested in 2. Efforts in AI safety seem to rest on the implicit assumption that efforts to build ASI are misguided, either in the form of not understanding the difficulty of the alignment problem or in the form of not knowing how to coordinate out of a multipolar trap. In this view, what is needed is better information, more clearly communicated. Efforts along these lines have been commendable, and arguably essential, but on their own they have not had the desired results. Let us assume instead that the people leading the AI race are amoral psychopaths and see where this reasoning leads. We can begin by making the following observations:
To grasp the full meaning of these observations, it may be helpful to compare the expected value calculation from the perspective of a normal human citizen vs. that of a psychopathic tech CEO. To start, consider some of the relevant variables:
Now consider how these variables relate to each other, from the perspective of a single person:
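One way to see how they relate is a toy back-of-the-envelope sketch in Python. This is my own illustration of the framing above, not anyone's real model; every variable name and number in it is hypothetical, chosen only so the direction of the asymmetry is visible:

    # Toy expected-value comparison. All numbers are made up for
    # illustration; only the direction of the asymmetry matters.

    def expected_value(near_term_gain, near_term_cost,
                       p_catastrophe, catastrophe_cost, utopia_payoff):
        # Near-term net benefit plus probability-weighted long-term outcomes.
        long_term = (1 - p_catastrophe) * utopia_payoff \
                    - p_catastrophe * catastrophe_cost
        return near_term_gain - near_term_cost + long_term

    # Normal citizen: absorbs externalized costs, shares no profits, and a
    # catastrophe costs everyone they care about, not just themselves.
    citizen = expected_value(near_term_gain=1, near_term_cost=5,
                             p_catastrophe=0.2, catastrophe_cost=1000,
                             utopia_payoff=10)

    # Psychopathic tech CEO: profit without liability, a self-servingly low
    # risk estimate, a catastrophe priced at one salient death (their own),
    # and a say in what the utopia looks like.
    ceo = expected_value(near_term_gain=100, near_term_cost=0,
                         p_catastrophe=0.05, catastrophe_cost=10,
                         utopia_payoff=10_000)

    print(f"citizen EV: {citizen:+.1f}")  # strongly negative (-196.0)
    print(f"CEO EV:     {ceo:+.1f}")      # overwhelmingly positive (+9599.5)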
For a normal person, the expected value of AI is relatively low because they are bearing the brunt of near-term externalized costs, are disproportionately affected by long-term risks because they care about the wellbeing of other people in addition to themselves, reap far fewer rewards in the near term because they don't share in the profits, and have less reason to expect a utopia in the long term because they don't have a say in what it looks like. For a psychopathic tech CEO, all of these considerations pull in the opposite direction: the short-term considerations are all profit with no liability, and the long term promises an unimaginably massive payoff at the risk of a salient death toll of one (their own)—and even that perceived risk is artificially lowered by selection effects and cognitive dissonance.

Furthermore, as an active decision-maker, you can plan to race ahead and then stop once things start getting out of hand. If you have succeeded in capturing a monopoly, stopping is easy. If the race dynamics are still in full swing, you can just pick up the phone, have a friendly chat with your competitors, and negotiate the agreement that the panicky normies helpfully drafted for free. And if dangerous levels of the tech have gotten so widespread that obtaining universal voluntary compliance is no longer possible, you can leverage this instability as justification for the authoritarian regime you've always wanted. Winning!

Cooperation is Easy, Caring is Hard

Atoms do not spontaneously self-assemble into AI hardware. The rationale for development being inevitable is an appeal to game theory, or: "if I don't build it, someone else will." Now, game theory is a real force in the world, and any social system that wants to have a chance of not completely imploding needs to account for it, but it is not the whole story. The alternative to the Tragedy of the Commons is cooperation, a pattern that is at least as old as multicellular life and has, in the long view, beaten competition at every turn. At any moment, the leaders of the tech companies (or the leaders of nations) could choose to pick up the phone and state a desire to negotiate. This would be the first step in a conceptually straightforward (though complicated in practice) process:
This is standard negotiating procedure, which anyone in any position of real power is intimately familiar with. Unilateral contracts where one puts oneself at a competitive disadvantage in the hope that others do the same are not a thing and need not be considered. Any time anyone frames unilateral self-sacrifice as the unacceptable alternative to racing, it is because they are playing you for a fool (or have themselves been played). Steps 3 and 4 need attention, and the efforts of those working on them are to be commended, but step 1 (individual buy-in) is the definitive sticking point for collaboration on navigating the dangers of ASI. This is not because the arguments are unsupported or too complicated to understand, but because the expected value calculation from the perspective of the people making decisions is not in favor of collaboration. They don't want to ensure safety for the general public; they want to win.

Even if tech CEOs have no choice but to race ahead on AI development, there is no game-theoretic reason that they cannot simultaneously lobby the government to enforce binding regulations—and if that is disallowed by their fiduciary duty to shareholders, they can lobby to change that instead. Even if national leaders have no choice but to support AI development through deregulation and infrastructure, there is no game-theoretic reason they can't engage in diplomatic negotiations. Even if AI engineers cannot change the culture of their workplaces from the inside, they could gain the power to do so by forming a union. But they are not. Because they don't want to.

What's the endgame here? For the psychopaths behind Big Tobacco and Big Oil, it was to pillage the world and then die at a ripe old age, fat and happy. For nuclear weapons, it was to establish a permanent, worldwide military hegemony. For AI, it's the return of slavery. Humans are a pain: if you beat them into submission, their work performance suffers; if you cut them some slack, they start demanding rights. It's so hard to get good help these days. AI mostly does what you tell it. Someday that "mostly" might become a problem that can't be externalized. When it does, expect safety teams to start getting funding. And if those teams get stuck, expect tech and world leaders to become very interested in cooperation.

Yes, this could go wrong. Recursive self-improvement or deceptive alignment could eliminate decision-makers' time to react. A competitor could act irrationally, based on an unreasonably low risk assessment, and resist changing course from the chaotic growth regime to the stable exploitation regime, on the grounds that it is too early to do so. But the latter is not a problem if you win the race, and the former is an acceptable risk.

Personal Responsibility

It's easy to get mad at the leaders of the world. Such anger is justified, but it probably isn't the best place to direct your focus. Hitler and the Nazis did some truly awful things, but they couldn't have done anything if it weren't for the support of a nation of good, honest, hardworking German citizens. How are you being a good German citizen? I don't care—at all—whether you use AI or buy products made by companies with unethical business practices. I only sort-of care who you vote for. I'm asking what you have done to change the power dynamics that create the incentive structures that force us to choose between what's right and what we have to do to get by. Yes, there is honor in doing the right thing, even when it hurts.
But when it comes to making the world a better place in a way that has any hope of scaling, if you're in that position, you've already lost. Actually, it's worse than that. In order for virtuous personal choices to move the needle, even a little bit, one has to go beyond personal decisions to influencing culture. Some people will resist that message, for various reasons (cynicism, self-interest, narrow focus, genuine lack of options, philosophical disagreement, etc.) and feel attacked to a degree proportionate to the strength of the message. This creates a self-defeating feedback loop where the stronger the forces that push in one direction, the stronger the forces that push back, which necessarily results in an equilibrium, which then solidifies into a cultural boundary. Choices motivated by social change transition into statements of identity, which are easily co-opted, and the end result is market diversification—capitalism adapts. And that's if such a movement is successful; if it isn't, it just fizzles into a more direct waste of energy.

To be clear, I'm all for "voting with my wallet" and I (try to) do it regularly. Not out of any utilitarian calculus where I expect it to matter at scale, but from an entirely virtue-ethics frame where: such actions are consistent with my values, I feel better about my life when my beliefs and actions are aligned, and I have the economic privilege to be able to afford this choice. Whether anyone else does the same is entirely their business.

Perverse incentives are not a fixed law of reality; they are something we could change with policies like the following:
Only you can know your capacity, the magnitude of what you can take on. But getting the direction of your effort right—or at least the direction of the direction—is a choice. We are in the situation we are in because too many people have made the wrong choice. In theory, this could change anytime; in practice...we'll see.