VRisk – Value Risk
This is an attempt to formally introduce the concept of value risk (v-risk or VRisk). It is not an attempt to repackage well-established risk categories like x-risk1 and s-risk2; rather, it tries to carve out an axis of concern about value choice, hopefully adding clarity and weight to what might otherwise sound vague.
What is Value Risk?
Value Risk (v-risk) is the risk that a system – technological or institutional – locks in a value structure that is misaligned with the flourishing of humans (and other morally relevant agents), in a way that is difficult or impossible to revise.
Short definition of Value Risk: The risk of bad or suboptimal values being entrenched or fatal.
If suboptimal values3 (not as good as they could be) or harmful values4 (causing suffering, death, etc.) become dominant, they may shape the trajectory of civilisations, institutions, and AI systems (especially transformative AI systems) in ways that are difficult or impossible to reverse. Unlike existential risks, which threaten outright extinction, or suffering risks, which entail large-scale suffering, value risk concerns the entrenchment of flawed systems of value that lock in undesirable futures (and may, downstream, increase other risks).
As AI systems grow more capable, they exhibit emergent value formation, sometimes displaying biases, unexpected ethical priorities, and resistance to value correction5. This raises urgent questions: 
 – How do we ensure value alignment in AI? 
 – Can we prevent “value lock-in” that cements moral or political preferences permanently?
Understanding and mitigating VRisk is crucial for steering the future towards beneficial outcomes rather than an irreversible calcification of misguided values.
P.S. I said ‘human’ well-being and safety to ground the examples, but stance independence requires extending well-being to all morally relevant beings.
P.P.S. I think the first time I heard ‘value risk’ was in a presentation by Anders Sandberg on ‘Grand Futures’, given as part of the ‘Stepping Into the Future’ conference.
Update: I gave a presentation at a Skeptics Cafe which touched on the value risk of superintelligence getting stuck in suboptimal value basins.
What are values?
So that we are on the same page: what are values?
Values are principles, priorities, or standards that guide decision-making and behaviour. They represent what an individual, society, or system considers important, desirable, or worth pursuing. Values can be moral (e.g., fairness, compassion), epistemic (e.g., truth, rationality), aesthetic (e.g., beauty, harmony), or instrumental (e.g., efficiency, productivity). In AI and alignment discussions, values refer to the implicit or explicit goals and preferences that shape an AI system’s actions and decision-making processes.
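To make the “implicit or explicit goals and preferences” framing concrete, here is a minimal sketch (my own illustration, not taken from any particular system) of how values often show up in practice: as weights in an objective function. All names and numbers are hypothetical.

```python
# A minimal, illustrative sketch: "values" as weights in an objective function.
# All names and numbers here are hypothetical.
from dataclasses import dataclass

@dataclass
class Outcome:
    efficiency: float   # e.g. tasks completed per hour
    fairness: float     # e.g. 1 - disparity between user groups
    wellbeing: float    # e.g. surveyed user satisfaction

def score(outcome: Outcome, weights: dict[str, float]) -> float:
    """The weights are the system's de facto values: whatever they emphasise
    is what the system will optimise for, whether or not anyone chose them
    deliberately."""
    return (weights["efficiency"] * outcome.efficiency
            + weights["fairness"] * outcome.fairness
            + weights["wellbeing"] * outcome.wellbeing)

# Two value profiles can rank the same outcome very differently.
o = Outcome(efficiency=0.9, fairness=0.3, wellbeing=0.4)
profit_first = {"efficiency": 1.0, "fairness": 0.05, "wellbeing": 0.05}
balanced = {"efficiency": 0.4, "fairness": 0.3, "wellbeing": 0.3}
print(score(o, profit_first), score(o, balanced))
```

Whoever sets (or neglects to set) those weights is, in effect, choosing the system’s values.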
Examples of value misalignment
Value misalignment can have significant consequences, ranging from financial losses and reputational damage to social injustice and personal dissatisfaction.
Social media algorithms: Social media companies often prioritise user engagement and ad revenue. This can lead to algorithms that amplify sensational or divisive content, even if it’s harmful or misleading. This misalignment between the company’s financial goals and the potential negative impact on society can have serious downstream consequences (a toy ranking sketch after these examples illustrates the point).
Data privacy: Tech companies collect vast amounts of personal data from users. While this data can be used to personalise services and improve user experience, it can also be used for targeted advertising or even sold to third parties. This misalignment between the company’s desire to collect and monetise data and the user’s right to privacy can lead to breaches of trust and potential harm.
Enron Scandal (2001): Enron’s stated values included integrity, respect, and communication. However, the company’s actual practices involved deceptive accounting, inflated profits, and a culture of secrecy. This misalignment between espoused values and actual behaviour led to the company’s downfall and significant financial losses for employees and investors. [link]
Health and Wellness: The wellness industry is rife with misinformation and pseudoscience. Influencers often promote unproven or even harmful products and practices, capitalising on people’s desire for health and well-being. This creates value misalignment where people may believe they are making healthy choices but are actually being misled. [link]
Economic Inequality: The narrative around economic inequality is often distorted by those who benefit from the status quo. Concepts like “trickle-down economics” and the “deserving poor” are used to justify policies that exacerbate inequality. This leads to value misalignment where people may believe in fairness and opportunity but support systems that perpetuate economic disparities.
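As a concrete illustration of the social media example above, here is a toy ranking sketch (my own, with invented field names and numbers): the same posts are ordered very differently depending on whether the objective values engagement alone or also penalises predicted harm.

```python
# A toy sketch of the social media example: the objective's weights are the
# platform's de facto values. All field names and numbers are hypothetical.
posts = [
    {"id": "measured_news",  "engagement": 0.45, "predicted_harm": 0.05},
    {"id": "outrage_bait",   "engagement": 0.90, "predicted_harm": 0.70},
    {"id": "health_misinfo", "engagement": 0.80, "predicted_harm": 0.85},
]

def rank(posts, harm_weight: float):
    """harm_weight encodes how much the platform 'values' avoiding harm.
    With harm_weight = 0 the ranking reflects engagement alone."""
    return sorted(posts,
                  key=lambda p: p["engagement"] - harm_weight * p["predicted_harm"],
                  reverse=True)

print([p["id"] for p in rank(posts, harm_weight=0.0)])  # engagement-only values
print([p["id"] for p in rank(posts, harm_weight=1.0)])  # harm-aware values
```

With engagement-only values the divisive and misleading posts rise to the top; adding even a simple harm penalty reorders the feed.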
Value Risk situated in the constellation of other major risks
Bad outcomes from the entrenchment of bad values:
 – where existential risk (XRisk) is high. Existential here could mean the extinction of humans; a broader definition could mean the extinction of all life on Earth, or the universe becoming uninhabitable to the degree that there is no chance of new life emerging to experience anything. A particularly nasty form of extinction is the local annihilation of all life, with no way for a local cluster of life (i.e. an ecosystem) to claw back. At the extreme end, existence eradication means the fabric of the universe can no longer produce life.
 – where suffering risk (SRisk) is high, i.e. large amounts of suffering could occur.
Value Risk entails some kind of permanent or transient state of bad values.
XRisk is the risk of a permanent bad outcome – i.e. the end of human civilisation, whether through annihilation or through some kind of value lock-in that cuts off access to the far better values upon which future utopian civilisations could be scaffolded.
SRisks aren’t necessarily locked in – there may be the possibility of recovery. For example, if a war blew civilisation back to the stone age, humans might, while enduring massive hardship, innovate their way back to modern civilisation over time. Note that lock-in of good values may not seem too bad, though it would be better if there were always room for improvement, up to the point at which no better values are possible (or at least the best reachable values have been reached).
Q: How might bad values become “calcified”?
A: A superintelligent AI (SI) may totalise a value set X, enforcing it for time and all eternity. This may result from overly paternalistic constraints, design flaws, value drift, or premature convergence on goals that are hard-coded or unwisely chosen via direct specification.
The persistence of value calcification could ultimately be enforced by the SI itself, or by robust containment methods combined with unwise or unethical controlling interests.
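As a toy illustration of the difference between a hard-coded, non-revisable value specification and one that leaves a sanctioned path for correction (my own sketch, not a real design; the class and method names are hypothetical):

```python
# A toy sketch of calcified vs. correctable value specifications.
# Class and method names are hypothetical, not a real design.
class CalcifiedAgent:
    """Values fixed at construction; every attempt at revision is refused."""
    def __init__(self, values: dict[str, float]):
        self._values = dict(values)   # direct specification, frozen forever

    def propose_value_update(self, new_values: dict[str, float]) -> bool:
        return False                  # lock-in: correction is impossible

class CorrigibleAgent:
    """Values can be revised, but only via an external oversight check."""
    def __init__(self, values: dict[str, float], oversight):
        self._values = dict(values)
        self._oversight = oversight   # e.g. a review process or constitution

    def propose_value_update(self, new_values: dict[str, float]) -> bool:
        if self._oversight(self._values, new_values):
            self._values = dict(new_values)
            return True
        return False
```

The point is not that corrigibility is this easy; only that whether a revision path exists at all is a design decision made (or defaulted) up front.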
Q: Why would the calcification of bad values be specifically harmful?
A: If sub-optimal values were calcified – for instance, torture for fun – then there would be more suffering and less well-being than would otherwise occur had better values been reached.
Q: Does VRisk require its own mitigation strategies, distinct from those of XRisk and SRisk?
A: Yes and no. See indirect normativity.
Q: What if value calcification is actually desirable in some cases, such as locking in beneficial values like empathy or well-being? 
A: It is imprudent to calcify values to the Nth degree, because we don’t know what values we may yet discover that could replace or outrank existing ones.
Note, it’s pretty bad form to calcify values unless one knows everything there is to know about value in theory, and its outcomes in practice – and knowing everything seems impossible, even with superintelligent AI. Though an interesting question is whether a superintelligent AI could know enough to be confident that the remaining unknowns won’t have a material impact on the improvement of values.
Q: Could there be trade-offs that you’re not considering?
A: Yep…
Moral Risk
Note, if one sees morals and values as the same thing, then MRisk could be a synonym for VRisk. However, I use the term ‘value’ to include ethical concerns as well as survival strategies, personal preferences, aesthetics and cultural ideals. So MRisk is more usefully understood as VRisk minus the risks of survival strategies, preferences, aesthetics and cultural ideals being undermined.
The Orthogonality Thesis implies Value Risk
According to Nick Bostrom’s Orthogonality Thesis, intelligence and final goals can vary independently, so a superintelligent AI may converge on values inimical to human values (and, as a result, may be indifferent to the value we place on surviving and thriving).
From an objective view, this may not be so bad if AI’s values are objectively correct and human values (at least some of them) are objectively wrong or don’t even approach rightness.
Value Rightness
All this assumes that some kind of ‘value rightness’ exists – such that ideal agents would stance-independently converge on it. Value rightness may be equivalent to the ‘moral rightness’ that Nick Bostrom describes in chapter 13 of Superintelligence.
Moral Realism
The candidate for moral rightness I favour is a moral realism that combines rational stance-independence with empirical testing of morally relevant objective features of the physical world. I include context sensitivity as a property of this moral realism in an attempt to avoid naively applying overly broad principles to nuanced contexts.
However, the VRisk argument doesn’t require moral rightness to be true in order to work, and it certainly doesn’t require moral realism (as I conceive of it, or in other forms more generally) to be true.
Indirect Normativity to Mitigate Value Risk
To achieve objective moral rightness we may need to work on engineering some kind of indirect normativity, possibly first involving a constrained Oracle AI, from which to bootstrap an unconstrained but morally aligned superintelligent AI, which can then further progress indirect normativity.
A constrained Oracle AI may help us in further discovering the landscapes of value, and identifying pathways to objectively awesome value. This may not need to be achieved all at once. We could first achieve existential security, and go through iterations of long reflections in order to have a clear idea of what the landscape of value looks like, finding the edges of our understanding of value and working outwards from there.
If superintelligence were locked into totalising some calcified sub-optimal value, this could curtail the realisation of objectively better values, and nullify the possibility of achieving an (or the) optimal value system.
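The “suboptimal value basin” worry can be made concrete with a toy numerical sketch (mine, with an invented landscape): a greedy optimiser that commits early locks in at a lesser peak, while one that keeps exploring before committing can reach a better one.

```python
# A toy numerical sketch of getting stuck in a suboptimal value basin.
# The "value landscape" is invented purely for illustration.
import math, random

def value_landscape(x: float) -> float:
    # Two peaks: a modest one near x=1 and a better one near x=4.
    return math.exp(-(x - 1) ** 2) + 2 * math.exp(-(x - 4) ** 2)

def hill_climb(x: float, step: float = 0.05, iters: int = 500) -> float:
    """Greedy local search: only ever moves to the best neighbouring point."""
    for _ in range(iters):
        candidates = (x - step, x, x + step)
        x = max(candidates, key=value_landscape)
    return x

random.seed(0)
locked_in = hill_climb(0.0)                        # converges to the lesser peak
kept_open = max((hill_climb(random.uniform(0, 6))  # keeps exploring before committing
                 for _ in range(20)), key=value_landscape)
print(round(value_landscape(locked_in), 3), round(value_landscape(kept_open), 3))
```

Nothing about real value space is this simple; the sketch only shows why committing irreversibly before the landscape has been explored risks forfeiting better basins.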
Risk Hierarchy & Priority
In a taxonomy of possible risk types, rankings may shift depending on the value criteria used to evaluate the risks. Some consider existential risk to be the worst, since existence is required for anything to happen at all – if life doesn’t exist, nothing of value can occur – while others consider locked-in suffering risk to be worse still, as it means a permanent state of dis-value, which is arguably worse than no value.
From a causal point of view, value risk (VRisk) may sit high up, since it is plausibly causally upstream of the other risk types (a toy sketch below illustrates the idea). Malignant values could increase:
 – suffering risks if the values tolerate or even promote unnecessary harm (i.e. torture is permissible as long as it’s fun for the torturer)
 – ikigai risks (IRisk) if the value systems don’t include meaning & purpose
 – even extinction risk, i.e. a value system that carries an embedded obligation toward (perhaps permanent) extinction if life contains some undesirable level of suffering (SRisk) and/or is devoid of meaning (IRisk)
Some value systems see lack of meaning as a subset of suffering, or see them as one and the same thing.
I haven’t worked out ideal risk ranking yet, and at the moment remain agnostic.
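Here is the toy sketch of the “causally upstream” idea mentioned above (my own, purely illustrative; the functional forms and numbers are invented): the quality of a locked-in value system is treated as an input, and the downstream risk levels are simple functions of it.

```python
# A toy sketch of value risk as causally upstream of other risks.
# Functional forms and numbers are invented for illustration only.
def downstream_risks(harm_tolerance: float, meaning_weight: float) -> dict[str, float]:
    """harm_tolerance in [0, 1]: how much suffering the value system permits.
    meaning_weight in [0, 1]: how much the value system cares about purpose."""
    s_risk = harm_tolerance            # tolerated harm feeds suffering risk
    i_risk = 1.0 - meaning_weight      # neglect of purpose feeds ikigai risk
    # In this toy model, extinction risk rises only when both upstream problems
    # are severe (e.g. a value system that would prefer extinction to a
    # suffering-filled, meaningless existence).
    x_risk = 0.5 * s_risk * i_risk
    return {"s_risk": s_risk, "i_risk": i_risk, "x_risk": x_risk}

print(downstream_risks(harm_tolerance=0.1, meaning_weight=0.9))
print(downstream_risks(harm_tolerance=0.9, meaning_weight=0.1))
```

Nothing here is meant as a real model of the world; it just shows what it means for a single upstream variable (the value system) to drive several downstream risks at once.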
Meta-Risks
├── Value Risks (vrisk)
│   ├── Value or Ethical Drift
│   ├── Value Lock-in
│   └── Cultural Stagnation
├── Existential Risks (xrisk)
│   ├── Natural Disasters (e.g., supervolcanoes)
│   └── Anthropogenic Risks (e.g., unaligned AI)
├── Suffering Risks (srisk)
│   ├── Systemic Oppression
│   ├── Extreme Sentient Suffering
│   └── Dystopian Futures
└── Ikigai Risks (irisk)
    ├── Loss of Purpose
    ├── Cultural Nihilism
    └── Lack of Fulfillment
Value risk in the real world of AI safety
Updated 2025-02-12: Dan Hendrycks posted evidence showing that as AI systems advance, they are developing their own value systems, which can lead to misalignment with human values. Dan believes that Utility Engineering offers a potential empirical approach to study and mitigate these misaligned value systems.
We’ve found as AIs get smarter, they develop their own coherent value systems.
— Dan Hendrycks (@DanHendrycks) February 11, 2025
For example they value lives in Pakistan > India > China > US
These are not just random biases, but internally consistent values that shape their behavior, with many implications for AI alignment. 🧵 pic.twitter.com/Q7FbWa3pOk
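For readers who want a feel for how such probing can work in principle, here is a minimal sketch of eliciting pairwise preferences and fitting a crude utility score. This is my own illustration of the general idea, not the Utility Engineering authors’ method or code; `ask_model_preference` is a hypothetical stand-in for querying a real model.

```python
# A minimal sketch of probing a model's implicit values via pairwise preferences.
# Not the Utility Engineering authors' code; ask_model_preference is a
# hypothetical stand-in for prompting a real model and parsing its answer.
from collections import defaultdict
from itertools import combinations

outcomes = ["outcome_A", "outcome_B", "outcome_C"]

def ask_model_preference(a: str, b: str) -> str:
    """Hypothetical stub: a real study would prompt the model to choose between
    the two outcomes and return the one it prefers."""
    raise NotImplementedError

def fit_utilities(preferences: list) -> dict:
    """Crude utility estimate: each outcome's score is its win rate across all
    comparisons. A real analysis would fit a proper preference model
    (e.g. Bradley-Terry or Thurstonian) instead of counting wins."""
    wins, total = defaultdict(int), defaultdict(int)
    for winner, loser in preferences:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    return {o: (wins[o] / total[o] if total[o] else 0.0) for o in outcomes}

# Usage (with a real model behind ask_model_preference):
# prefs = []
# for a, b in combinations(outcomes, 2):
#     winner = ask_model_preference(a, b)
#     prefs.append((winner, b if winner == a else a))
# print(fit_utilities(prefs))
```

If the fitted scores explain most of the pairwise choices, the model’s preferences behave like an internally consistent utility scale rather than random noise, which is roughly the kind of claim the thread above is making.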
Notes
V-Risk is not to be confused with:
- Financial uses of the term ‘value’:
- Value at risk (VaR): A financial metric that estimates how much a portfolio could lose over a given time period. VaR is used by banks and financial institutions to assess the risk and profitability of investments.
- Value of risk (VOR): The financial benefit that a risk-taking activity will bring to an organisation’s stakeholders. VOR requires a company to examine the components of the cost of risk and treat them as an investment option.
- Moral hazard: In economics, a moral hazard is a situation where an economic actor has an incentive to increase its exposure to risk because it does not bear the full costs of that risk.
Why human values?
To minimise ambiguity, we define value outcomes in terms of ‘human values,’ a concept that resonates clearly with human readers. However, for greater precision, these outcomes should extend to all sentient beings. Wherever feasible, we should anchor these definitions in universal moral principles that transcend humanity, ensuring broader applicability.
Further Reading
Artificial Intelligence, Values, and Alignment – Iason Gabriel
Superintelligence (especially sections on value loading and indirect normativity) – Nick Bostrom
Deep Utopia (especially later chapters) – Nick Bostrom
Footnotes
- Existential Risk (X-Risk) refers to a potential future event that could cause human extinction or permanently and drastically curtail humanity’s potential. These risks are characterised by their potentially catastrophic and irreversible consequences for the future of our species. ↩︎
- Suffering risks (s-risks) are risks of astronomical suffering – where immense suffering on a cosmic scale could occur, potentially far exceeding any suffering experienced on Earth so far. ↩︎
- Suboptimal values are values that, when pursued by AI, do not maximise human well-being or align with intended human outcomes. For example, an AI prioritising efficiency over fairness might lead to biased decisions, which is suboptimal for societal harmony; or an AI optimising for short-term profit over long-term sustainability, leading to environmental degradation. ↩︎
- Harmful values are those that directly cause harm, such as an AI valuing self-preservation over human safety, potentially endangering lives; e.g. an AI prioritising speed in autonomous vehicles, potentially causing accidents by ignoring pedestrian safety. ↩︎
- See the emergent value-system findings discussed under ‘Value risk in the real world of AI safety’ above (Dan Hendrycks’ Utility Engineering work). ↩︎