Coherent Extrapolated Volition

Some people want to infuse AI with a codification of our existing moral judgements. CEV takes this idea a step further.
In short, CEV is a conceptual approach to aligning superintelligent AI by designing it to act according to an idealised version of what humans would want if we had more knowledge, thought faster, and were more the people we wished to be.

In Chapter 13 of Superintelligence, Choosing the Criteria for Choosing, Nick Bostrom explores how we might design AI systems to discover and align with moral truths or human values without directly specifying them. The strategies he discusses revolve around the concept of indirect normativity, where we delegate the determination of values to a process that is expected to yield morally appropriate outcomes. There are many ways to implement indirect normativity – Coherent Extrapolated Volition (CEV) is one of them.

Update: Nick Bostrom touched on CEV in my recent interview with him.

CEV Strategy

The Coherent Extrapolated Volition (CEV) strategy, proposed by Eliezer Yudkowsky in 2004 (archive), is one of the most well-known concepts in discussions of indirect normativity, both on and beyond the email lists where it was originally debated. In this foundational work, Yudkowsky proposes that an AI should be designed to act according to humanity’s collective will, as it would be if we were more informed, rational, and cohesive. This approach aims to align AI behaviour with human values by extrapolating our volition under idealised conditions.

Among many other things, CEV involves pointing an AI at humans and saying (in effect) “See that?  That’s where you find the base content for self-renormalizing morality.” – EY – Mirrors & Paintings

The idea is to achieve dynamic collective convergence on idealised human preferences (as if humans had greater knowledge, better reasoning and loads more resources and time to reflect). An idealised collective convergence would aggregate human preferences to reflect the collective will of humanity. The process would be dynamic – a continuous refinement of preferences as humans and societies evolve toward more informed and rational states.

CEV Process:

  • Model Humanity’s Current Values: Gather data on human preferences (including moral intuitions and cultural values) and identify convergences and divergences in these values.
  • Simulate Ideal Conditions: Use AI to project how individuals and groups might revise their values given increasingly accurate information, cognitive enhancement, and wiser states of mind (less attachment to emotional distortions and cognitive biases).
  • Aggregate and Synthesize: Extract individual volitions, resolve the conflicts and inconsistencies among values that fall out from these volitions, and merge the coherent volitions into a collective framework.
  • Action Guidance: Direct the AI’s behaviour based on the merged coherent volitions – and, as circumstances evolve and humanity becomes more enlightened, continuously update the CEV to reflect this (a toy sketch of this loop follows below).

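To make the loop above concrete, here is a deliberately toy sketch in Python. Everything in it – the Volition type, the agreement threshold, the averaging rule – is a hypothetical stand-in of my own; CEV has never been specified at this level of detail, and the genuinely hard parts (modelling values, simulating idealised reflection) are reduced to placeholders.

```python
from dataclasses import dataclass

@dataclass
class Volition:
    person: str
    values: dict[str, float]  # value name -> strength of preference (+/-)

def extrapolate(v: Volition) -> Volition:
    """Stand-in for 'Simulate Ideal Conditions': a real system would model how
    this person's preferences shift with more knowledge, better reasoning and
    time to reflect. Here it is simply an identity placeholder."""
    return v

def aggregate(volitions: list[Volition], agreement: float = 0.8) -> dict[str, float]:
    """Toy 'coherence' rule: keep a value only where a large majority of
    extrapolated volitions weight it positively, then average the weights."""
    names = {name for v in volitions for name in v.values}
    coherent: dict[str, float] = {}
    for name in names:
        weights = [v.values.get(name, 0.0) for v in volitions]
        support = sum(w > 0 for w in weights) / len(weights)
        if support >= agreement:
            coherent[name] = sum(weights) / len(weights)
    return coherent

def cev_step(population: list[Volition]) -> dict[str, float]:
    """One iteration of the dynamic: extrapolate every volition, then merge the
    coherent parts. The full proposal would re-run this as humanity evolves."""
    return aggregate([extrapolate(v) for v in population])

# Example: 'fairness' survives aggregation, 'tradition' is dropped for lack of coherence.
pop = [Volition("a", {"fairness": 0.9, "tradition": -0.2}),
       Volition("b", {"fairness": 0.7, "tradition": 0.5})]
print(cev_step(pop))  # ≈ {'fairness': 0.8}
```
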
Resolving conflicts and inconsistencies across aggregated human values seems impossible without some objective standard for adjudication. Yudkowsky doesn’t go as far as moral realism to supply that philosophical grounding, but he does acknowledge the difficulty in his writing on the complexity of human value.

CEV & Moral Anti-Realism

This approach suggests that moral directives for AI should be derived from an idealised version of human preferences rather than from objective moral truths. In meta-ethical terms, this positions CEV closer to moral anti-realism. It can be seen as a form of projectivism – the idea that all judgements about the world (including moral ones) derive from internal [human] experience rather than from empirical or stance-independent facts.

As described, CEV depends on human perspectives: it bases moral guidance on what humans would want under ideal conditions. This implies that moral values are contingent upon human cognition and social development, aligning with anti-realist views that see morality as constructed or subjective rather than as reflecting independent moral facts.

By focusing on extrapolated human volition, CEV does not assume the existence of universal moral truths that exist independently of human beliefs or attitudes. This contrasts with moral realism, which posits that such objective moral facts do exist.

“Our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.” – Eliezer Yudkowsky

Note the “our” in that quote – CEV stipulates that our moral understanding can evolve as we become more informed and rational. This adaptability suggests that morality is not fixed or absolute but is instead a product of ongoing human reflection and consensus, a hallmark of anti-realist positions. That said, this was written quite some time ago; perhaps Yudkowsky has since refined his views on the topic.

Eliezer also seems to believe we need a kind of control that is mathematically rigorous – perhaps because an AI, not being human and possibly quite alien, wouldn’t naturally converge on the same values we hold now (or the values we might converge on under more ideal conditions), and therefore needs to be controlled so that it remains in service to human values.

Part of this narrative doesn’t seem to gel well with other things Eliezer believes – for instance, that good Bayesians should update based on new evidence. But evidence from where? Other people’s beliefs, or empirical evidence from the real world and stance-independent logical truths? The answer should affect both our reflection and our consensus-making. So either we should be good Bayesians about value as well as everything else, which entails taking evidence from the real world, or we should be good Bayesians about everything except value.

Bayesian Updating on Moral Claims

In the context of morality, the Bayesian framework for revising beliefs in light of new evidence means that as we encounter new information or arguments that support or challenge a particular moral claim, our degree of belief in that claim should be adjusted accordingly. New evidence can take various forms – empirical findings about human behaviour, neuroscience of sentience, good philosophical arguments, or even changes in cultural norms – and these can lead us to update our beliefs about what is morally permissible, desirable, or wrong.
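
As a minimal, purely illustrative sketch – the example claim and all the numbers are hypothetical – this is what a single Bayesian update on a moral claim looks like:

```python
def bayes_update(prior: float, p_e_given_claim: float, p_e_given_not_claim: float) -> float:
    """Posterior credence in a claim after observing evidence E, via Bayes' rule:
    P(claim | E) = P(E | claim) * P(claim) / P(E),
    with P(E) expanded by the law of total probability."""
    p_e = p_e_given_claim * prior + p_e_given_not_claim * (1 - prior)
    return p_e_given_claim * prior / p_e

# Hypothetical example: credence 0.6 in the claim "practice X is morally wrong".
# Suppose new findings about the suffering X causes are twice as likely if the
# claim is true (0.8) as if it is false (0.4).
posterior = bayes_update(prior=0.6, p_e_given_claim=0.8, p_e_given_not_claim=0.4)
print(f"Updated credence: {posterior:.2f}")  # 0.75
```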

Analogously to discussions in the philosophy of science, a natural justification for wanting our moral theories to have Bayesian features (alongside parsimony, explanatory power, consistency, unity, etc.) is that such features are conducive to truth, including truth about moral facts.

References

Projectivism – “Derived from the Humean idea that all judgements about the world derive from internal experience, and that people therefore project their emotional state onto the world and interpret it through the lens of their own experience. Projectivism can conflict with moral realism, which asserts that moral judgements can be determined from empirical facts, i.e., some things are objectively right or wrong.” – Wikipedia

Moral Realism – The thesis that there are mind-independent moral truths, which hold independently of our attitudes, beliefs, and practices. (I argue that AIs could be more moral than us, that we should align AI to higher values than human values – which I believe entails moral realism – and that we should align AIs to Moral Realism.)
