AI Values: Satiable vs Insatiable

In short: satiable values have diminishing returns as resources grow, while insatiable values always want more, which can lead to risky AI behaviour – it seems likely that an AI with insatiable values could make existential trades, like gambling the world for a chance to double its resources. One response is to design AI with satiable values to ensure stability, though implementing this seems complex.

Understanding Satiable vs. Insatiable Values

Satiable values are those where, after a certain point, additional resources don’t significantly increase value – like a person being content with enough food. Insatiable values, by contrast, mean more resources always add value – like always wanting more money, or higher social status (see notes on Keynes below). An AI programmed with insatiable values (e.g., maximising paper clips) might take dangerous risks to acquire more resources.

In a recent interview with Nick Bostrom, he brought up satiable vs insatiable values.

  • Satiable values: A resource-satiable value is more like a standard, everyday value – wanting good things for yourself and your kids, etc. – where you have rapidly diminishing returns from additional resources.
  • Insatiable values: Desirability is proportional to the amount of resources – with twice as many resources you can make twice as many paper-clips or happy beings – so the AI would always be happy for more. With that kind of utility function it would be willing to gamble the world on a 50% chance of winning double the world, or double its share of the total pie (see the sketch below).
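
To make the distinction concrete, here is a minimal Python sketch – the particular utility shapes are my own illustrative choices, not anything specified by Bostrom. The insatiable value scales linearly with resources forever, while the satiable one flattens out once you have “enough”.

```python
import math

def insatiable_utility(r):
    """Insatiable: twice the resources -> twice the utility (paper clips, happy minds)."""
    return r

def satiable_utility(r, enough=10.0):
    """Satiable: rapidly diminishing returns once resources approach 'enough'."""
    return 1 - math.exp(-r / enough)

for r in (1, 10, 100, 1000):
    print(f"r={r:5d}  insatiable={insatiable_utility(r):7.1f}  satiable={satiable_utility(r):.3f}")
# The insatiable column keeps climbing; the satiable column is already ~1.0 by r=100.
```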

Post-human Concerns

I’ve had worried conversations with people in the past (David Pearce, Andrés Gómez Emilsson and others) about how continuous optimisation may break human/post-human psychological stability, or how future life might pop in and out of existence rapidly – lifespans cut short whenever a superintelligence discovered a more optimal design pattern and/or found the resources to realise one, or whenever the objective novelty of a particular design pattern ran out. This, of course, sounds like a catastrophic future. Perhaps identity preservation – the enduring metaphysical ego – could only survive by winning races at the vanguard of surviving post-human design patterns, assuming identity is maintained across transitions to each new optimum of post-human design. Or perhaps AI may deem that utility is best spread across increasingly huge numbers of lives barely worth living, where all life is devolved into tiny units of micro-experience – a more traditional echo of the repugnant conclusion.

Can Values be Truly Satiable?

Philosophical debates, such as those within utilitarianism, suggest human desires might always find new uses for resources, complicating AI design and echoing economic theories of infinite wants.

Human desires, while often satiable in specific dimensions (e.g., enough food, shelter from the elements), may always find new uses for resources (aligning with economic assumptions of infinite wants), suggesting that values might inherently have insatiable aspects. Aesthetic or intellectual values, for instance, may be no less insatiable.
Ethical theories like utilitarianism – where maximising total happiness could lead to continuous optimisation – seem easier to implement in modern RL-based AI than deontological ethics, which might quite desirably impose rules limiting AI behaviour.

The paper Artificial Intelligence, Values, and Alignment mentions that:

..it is less obvious how RL can be used to align agents with non-consequentialist moral frameworks. One set of alternatives focuses not on maximizing a given value, such as happiness, but on ‘satisficing’—an approach requiring only that people have enough of certain goods (Slote and Pettit 1984). For example, we might want AI to treat people with sufficient respect, so that it treats them well in the ways that matter, but not with excessive deference at the expense of other values. Satisficing may also represent a partial solution to safety problems associated with strong optimization. For this reason, researchers at the Machine Intelligence Research Institute have developed the idea of ‘quantilizers’, which represent a way of programming AI that potentially renders an agent indifferent between a top tier of good outcomes (Taylor 2016).

Artificial Intelligence, Values, and Alignment by Iason Gabriel
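
For intuition, here is a toy Python sketch of the quantilizer idea as I understand it from Taylor (2016) – the action set, base distribution, and q value are hypothetical, invented purely for illustration. Instead of always picking the single utility-maximising action, the agent samples from the top-q slice (by utility) of a “normal behaviour” base distribution.

```python
import random

def quantilize(actions, base_probs, utility, q=0.1):
    """Toy q-quantilizer: sample from the base distribution restricted to the
    top-q fraction of its probability mass, ranked by utility. This avoids the
    extreme tail behaviour that pure utility maximisation can produce."""
    # Rank actions from highest to lowest utility.
    ranked = sorted(zip(actions, base_probs), key=lambda ap: utility(ap[0]), reverse=True)

    # Keep the best actions until q of the base probability mass is covered.
    kept, mass = [], 0.0
    for action, p in ranked:
        kept.append((action, p))
        mass += p
        if mass >= q:
            break

    # Sample among the kept actions in proportion to their base probability.
    total = sum(p for _, p in kept)
    return random.choices([a for a, _ in kept], weights=[p / total for _, p in kept])[0]

# Hypothetical example: most of the base distribution is ordinary behaviour.
actions = ["do nothing", "modest plan", "aggressive plan", "bet the world"]
base_probs = [0.5, 0.4, 0.09, 0.01]
utility = {"do nothing": 0, "modest plan": 5, "aggressive plan": 8, "bet the world": 100}.get
print(quantilize(actions, base_probs, utility, q=0.5))  # most often "modest plan"; "bet the world" is rare
```

The point of the design is that the dangerous extreme (“bet the world”) can be chosen with at most 1/q times its base probability, rather than with certainty as under pure maximisation.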

An Expected Utility Maximising AI with Insatiable Values is Risky!

Nick Bostrom brought up an interesting worry: an AI with insatiable values might make existential trades – i.e. gambling the world on a 50% chance of doubling it. If the AI’s utility is proportional to resources, the expected utility of the gamble (0.5 * 0 + 0.5 * 2U = U) equals its current utility, making it indifferent to the risk. If an AI’s primary goal is simply to maximise its resources, it might be willing to take extremely risky gambles with potentially catastrophic consequences, as long as the expected utility stays the same or increases. The fact that destroying the world is a 50% possibility is ignored, because the other 50% doubles its resources, which, according to the described utility function, is just as good.
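
As a quick sanity check of the arithmetic, here is a small Python sketch – the bounded utility shape is my own illustrative choice. A linear, resource-proportional utility is exactly indifferent to the double-or-nothing gamble, while a bounded, satiable utility clearly prefers to keep what it has.

```python
import math

resources = 1.0  # current share of the world, normalised to 1

def linear_utility(r):
    """Insatiable: utility proportional to resources."""
    return r

def bounded_utility(r):
    """Satiable: diminishing returns, capped at 1 (illustrative choice)."""
    return 1 - math.exp(-r)

for u in (linear_utility, bounded_utility):
    keep = u(resources)
    gamble = 0.5 * u(0.0) + 0.5 * u(2 * resources)  # 50% lose everything, 50% double
    print(f"{u.__name__}: keep={keep:.3f}  gamble={gamble:.3f}")

# linear_utility:  keep=1.000  gamble=1.000  (indifferent – the gamble looks acceptable)
# bounded_utility: keep=0.632  gamble=0.432  (the gamble is clearly worse)
```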

I’d hope that transformative AI would have the meta-cognition and sensitivity to value risks such that it might foresee the absurdities – apparent to us – of insatiable values running roughshod over other, more subtle values.

Interview segment transcript:

But I think in terms of what we might have reason to try to achieve, you could make a distinction between satiable and insatiable values that you could imagine an AI having. So an insatiable value might be like, you know, maximized number of paper clips or the number of happy beings that it creates or something like that, where basically the amount of desirability is proportional to the amount of resources.

So with twice as many resources, you can make twice as many paper clips or twice as many happy minds. And so you’re kind of always hungry for more. You would be willing with that kind of utility function to gamble the world on a 50% chance of having double the world or double the share of the total pie. A resource-satiable value might be more like a typical normal bog standard value where you want, you know, good for yourself and your kids and you know, a nice house and nice food and, you know, occasional journey and some whatever Netflix, whatever, like, but you have a rapidly diminishing returns to additional resources.
Nick Bostrom 2025-03-15, interview with Adam Ford

What to do?

It’s hard to know yet – but perhaps after long reflection superintelligence may decide to trial judicious, non-lethal insatiable values that can be tempered with constellations of subtler values and with control mechanisms, like constraints on resource use or risk thresholds, to prevent existential gambles.
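
As a toy illustration of what such a risk-threshold control might look like – the threshold value, probability estimate, and function here are entirely hypothetical placeholders, not a worked-out proposal:

```python
def permitted(action, catastrophe_prob, expected_gain, max_catastrophe_prob=1e-9):
    """Hypothetical guard-rail: veto any action whose estimated probability of
    irreversible catastrophe exceeds a hard threshold, regardless of expected gain."""
    if catastrophe_prob > max_catastrophe_prob:
        return False
    return expected_gain > 0

# The 50/50 'double the world' gamble is vetoed outright, whatever its expected utility.
print(permitted("double-or-nothing gamble", catastrophe_prob=0.5, expected_gain=1.0))  # False
```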

It’s also worth exploring bounded utility functions, drawing on economic models where utility has diminishing marginal returns, as discussed in Preference (economics).

Notes on Social Status

Suppose that we desire that we have more than others. We might desire this either because we value relative standing as a final good; or, alternatively, because we hope to derive advantages from our elevated standing—such as the perks attendant on having high social status, or the security one might hope to attain by being better resourced than one’s adversaries. Such relative desires could then provide an inexhaustible source of motivation. Even if our income rises to astronomical levels, even if we have swimming pools full of cash, we still need more: for only thus can we maintain our relative standing in scenarios where the income of our rivals grows commensurately. – Nick Bostrom, Deep Utopia, The Desire for More

As highlighted in Deep Utopia, John Maynard Keynes wrote “Needs … which satisfy the desire for superiority, may indeed be insatiable; for the higher the general level, the higher still are they,” – this suggests that desires related to social status and perceived superiority are inherently insatiable, as they are always relative to the overall standard of living.

Terminology:

Infinite/unlimited wants (similar, I think, to insatiable values): The core economic theory of “unlimited wants” posits that human desires are inherently insatiable, driving continuous economic activity and growth; this assumption also underlies the problems of scarcity and sustainability.
