
VRisk – Value Risk

Value Risk (VRisk) is the risk of suboptimal or harmful values becoming dominant, shaping the trajectory of civilizations, institutions, and AI systems (especially transformative AI systems) in ways that may be difficult, or even impossible, to reverse. Unlike existential risks (XRisk), which threaten outright extinction, or suffering risks (SRisk), which entail large-scale suffering, VRisk concerns the entrenchment of flawed value systems that lock in undesirable futures (and may in turn increase other risks, such as XRisks and SRisks).

As AI systems grow more capable, they exhibit emergent value formation, sometimes displaying biases, unexpected ethical priorities, or resistance to value correction. This raises urgent questions:
– How do we ensure value alignment in AI?
– Can we prevent “value lock-in” that cements moral or political preferences permanently?

Understanding and mitigating VRisk is crucial for steering the future towards beneficial outcomes rather than an irreversible calcification of misguided values.

Short definition of Value Risk (VRisk): the risk of bad or sub-optimal values being entrenched.

What are values?

So that we are on the same page: what are values?
Values are principles, priorities, or standards that guide decision-making and behavior. They represent what an individual, society, or system considers important, desirable, or worth pursuing. Values can be moral (e.g., fairness, compassion), epistemic (e.g., truth, rationality), aesthetic (e.g., beauty, harmony), or instrumental (e.g., efficiency, productivity). In AI and alignment discussions, values refer to the implicit or explicit goals and preferences that shape an AI system’s actions and decision-making processes.

Value Risk situated in the constellation of other major risks

Bad outcomes from the entrenchment of bad values:
– where existential risk (XRisk) is high. Existential here could mean the extinction of humans; a broader definition could mean all life on Earth, or the universe becoming uninhabitable to the degree that there is no chance of new life emerging to experience anything. A particularly nasty form of extinction is the local annihilation of all life, with no way for a local cluster of life (i.e. an ecosystem) to claw its way back. At the extreme end, existence eradication means the fabric of the universe can no longer produce life.
– where there is a risk of large amounts of suffering (SRisk).

Value Risk entails some kind of permanent or transient state of bad values. XRisk is the risk of a permanent bad outcome, i.e. the end of human civilisation. SRisks aren't necessarily lock-in if there is the possibility of recovery. Note, lock-in of good values may not seem too bad, though it would be better if there were always room for improvement, up to the point at which there are no possible better values (which, as far as I can tell, may be unreachable).


Such entrenchment could result in annihilation, or in some kind of value lock-in damning access to far better values upon which to scaffold future utopian civilisations.

Q: How might bad values become “calcified”?
A: A superintelligent AI (SI) may totalise value set X, enforcing it for time and all eternity. This could result from overly paternalistic constraints, design flaws, value drift, or premature convergence on goals that are hard-coded or unwisely chosen via direct specification.
The persistence of value calcification could ultimately be enforced by the SI itself, or by robust containment methods in the hands of unwise or unethical controlling interests.

Q: Why would the calcification of bad values be specifically harmful?
A: If sub-optimal values were calcified, for instance a value permitting torture for fun, then there would be more suffering and less well-being than would otherwise have occurred had better values been achieved.

Q: Does VRisk require its own mitigation strategies distinct from those of XRisk and SRisk?
A: Yes and no. See indirect normativity.

Q: What if value calcification is actually desirable in some cases, such as locking in beneficial values like empathy or well-being?
A: It is imprudent to calcify values to the Nth degree because we don't know what values we may yet discover that could replace or outrank existing values.

Note, it's pretty bad form to calcify values unless one knows everything there is to know about value in theory, and its outcomes in practice (and knowing everything seems impossible, even with superintelligent AI). Though an interesting question is whether superintelligent AI could know enough to be certain that the unknowns won't have a material impact on the improvement of values.

Q: Could there be trade-offs that you’re not considering?
A: Yes.

Moral Risk

Note, if one sees morals and values as the same thing, then MRisk could be a synonym for VRisk. However, I use the term 'value' to include ethical concerns as well as personal preferences, aesthetics and cultural ideals. So MRisk is more usefully a definition for VRisk minus the risks of preferences, aesthetics and cultural ideals being undermined.

The Orthogonality Thesis implies Value Risk

According to Nick Bostrom's Orthogonality Thesis, more or less any level of intelligence is compatible with more or less any final goal, so a superintelligent AI may converge on values inimical to human values (and as a result may be indifferent to the value we place on surviving and thriving).

From an objective view, this may not be so bad if AI’s values are objectively correct and human values (at least some of them) are objectively wrong or worse.

Value Rightness

All this assumes that some kind of 'value rightness' exists, such that ideal agents would stance-independently converge on it. Value rightness may be equivalent to the 'moral rightness' which Nick Bostrom describes in chapter 13 of his book Superintelligence.

Moral Realism

The candidate for moral rightness I favor is moral realism of the sort which combines rational stance independence and empirical testing of morally relevant objective features of the physical world.

However, the VRisk argument doesn't require moral rightness to be true in order to work, and it certainly doesn't require moral realism (as I conceive of it, or in other forms more generally) to be true.

Indirect Normativity to Mitigate Value Risk

To achieve objective moral rightness, we may need to work on engineering some kind of indirect normativity, possibly involving a constrained Oracle AI, from which to bootstrap unconstrained superintelligent AI.

A constrained Oracle AI may help us further discover the landscapes of value, and identify pathways to objectively awesome value. This need not be achieved all at once. We could first achieve existential security, and then go through iterations of long reflection in order to get a clear idea of what the landscape of value looks like, finding the edges of our understanding of value and working outwards from there.

If superintelligence were locked into totalising some calcified sub-optimal value, this could curtail the realisation of objectively better values, and nullify the possibility of achieving an (or the) optimal value system.

Risk Hierarchy & Priority

In a taxonomy of possible risk types, rankings may shift based on the value criteria one uses to evaluate the risks. Some consider existential risk to be the worst, as existence is required for anything to happen (if life doesn't exist, nothing can be experienced or valued), while others consider locked-in suffering risk to be worse still, as it means a permanent state of disvalue, which is arguably worse than no value at all.

From a causal point of view, value risk (VRisk) may sit high up if it is causally upstream from other risk types. Malignant values could increase
– suffering risks, if the values tolerate or even promote unnecessary harm (e.g. torture is permissible as long as it's fun for the torturer)
– ikigai risks (IRisk), if the value systems don't include purpose
– even extinction risk, e.g. if a value system has an embedded obligation to pursue (perhaps permanent) extinction when life contains some undesirable level of suffering (SRisk) and/or is devoid of meaning (IRisk)

Some value systems see lack of meaning as a subset of suffering, or see them as one and the same thing.

I haven't worked out an ideal risk ranking yet, and at the moment remain agnostic.

Meta-Risks
├── Existential Risks (xrisk)
│   ├── Natural Disasters (e.g., supervolcanoes)
│   └── Anthropogenic Risks (e.g., unaligned AI)
├── Value Risks (vrisk)
│   ├── Value or Ethical Drift
│   ├── Value Lock-in
│   └── Cultural Stagnation
├── Suffering Risks (srisk)
│   ├── Systemic Oppression
│   ├── Extreme Sentient Suffering
│   └── Dystopian Futures
└── Ikigai Risks (irisk)
    ├── Loss of Purpose
    ├── Cultural Nihilism
    └── Lack of Fulfillment
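
To make the ranking point concrete, here is a minimal sketch in Python. The encoding of the taxonomy and the numeric severity weights are my own illustrative placeholders, not an established scoring scheme; the point is simply that the same set of risks can come out in different priority orders under different evaluative stances.

# Toy illustration: the same risk taxonomy ranked under two hypothetical
# evaluative stances. Severity numbers are placeholders, not measurements.
meta_risks = {
    "xrisk": ["Natural Disasters", "Anthropogenic Risks"],
    "vrisk": ["Value or Ethical Drift", "Value Lock-in", "Cultural Stagnation"],
    "srisk": ["Systemic Oppression", "Extreme Sentient Suffering", "Dystopian Futures"],
    "irisk": ["Loss of Purpose", "Cultural Nihilism", "Lack of Fulfillment"],
}

# One stance treats continued existence as the precondition for everything;
# the other treats locked-in disvalue as worse than non-existence.
stances = {
    "existence-first": {"xrisk": 10, "srisk": 8, "vrisk": 7, "irisk": 5},
    "suffering-first": {"srisk": 10, "vrisk": 8, "xrisk": 7, "irisk": 5},
}

for stance, weights in stances.items():
    ranking = sorted(meta_risks, key=lambda r: weights[r], reverse=True)
    print(f"{stance:>15}: {' > '.join(ranking)}")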

Value risk in the real world of AI safety

Updated 2025-02-12: Dan Hendrycks posted evidence showing that as AI systems advance, they are developing their own value systems, which can lead to misalignment with human values. Dan believes that Utility Engineering offers a potential empirical approach to study and mitigate these misaligned value systems.
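
For intuition about what an empirical approach here could look like, below is a minimal sketch of recovering a utility function from pairwise preference data via a simple Bradley-Terry style fit. This is not the Utility Engineering paper's actual methodology; the outcome names and preference counts are made-up placeholders, purely to show how implicit 'values' can be summarised as fitted utilities.

# Minimal sketch (illustrative only): fit utilities u so that
# P(a preferred to b) = sigmoid(u[a] - u[b]), from toy pairwise counts.
import math

outcomes = ["human wellbeing", "truthful answers", "paperclips"]

# prefs[(a, b)] = number of times a was chosen over b (made-up numbers)
prefs = {
    ("human wellbeing", "paperclips"): 9,
    ("paperclips", "human wellbeing"): 1,
    ("truthful answers", "paperclips"): 8,
    ("paperclips", "truthful answers"): 2,
    ("human wellbeing", "truthful answers"): 6,
    ("truthful answers", "human wellbeing"): 4,
}

u = {o: 0.0 for o in outcomes}
lr = 0.05
for _ in range(2000):
    grad = {o: 0.0 for o in outcomes}
    for (a, b), count in prefs.items():
        p = 1.0 / (1.0 + math.exp(-(u[a] - u[b])))  # predicted P(a > b)
        grad[a] += count * (1.0 - p)   # gradient of the log-likelihood
        grad[b] -= count * (1.0 - p)
    for o in outcomes:
        u[o] += lr * grad[o]           # gradient ascent step

for o, util in sorted(u.items(), key=lambda kv: -kv[1]):
    print(f"{o:>18}: {util:+.2f}")

Fitting utilities from observed choices like this is one way to make talk of an AI system's 'values' empirically tractable rather than purely qualitative.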

Notes

VRisk is not to be confused with:
  • Financial uses of the term ‘value’:
    • Value at risk (VaR): A financial metric that estimates how much a portfolio could lose over a given time period. VaR is used by banks and financial institutions to assess the risk and profitability of investments. 
    • Value of risk (VOR): The financial benefit that a risk-taking activity will bring to an organization’s stakeholders. VOR requires a company to examine the components of the cost of risk and treat them as an investment option. 
  • Moral hazard: In economics, a moral hazard is a situation where an economic actor has an incentive to increase its exposure to risk because it does not bear the full costs of that risk.
