MRisk – Motivation Risk
What is M-Risk?
M-Risk is the risk that a transformative AI may not be adequately motivated to pursue good values, even if it knows what they are or how to discover them. The AI might possess the cognitive capability to understand values (including their preference hierarchy to sentient agents) but lack the motivation to pursue or act on them.
M-Risk can be considered upstream to V-Risk (Value Risk), which refers to the calcification or adoption of sub-optimal or undesirable values by superintelligent systems. Basically, if AI has bad or confused motivations, it may converge on bad values.
Say for instance a transformative AI singleton is motivated to obey it’s creators preferences – it could develop wants and values shaped to optimize for the wellbeing of it’s controller at an indefinitely large expense of the wellbeing of all others – resulting in a world staggering inequality (assuming the controller was selfish).
Key concepts:
Motivation Selection Methods
Motivation selection methods seek to prevent undesirable outcomes by shaping what the superintelligence wants to do. By engineering the agent’s motivation system and its final goals, these methods would produce a superintelligence that would not want to exploit a decisive strategic advantage in a harmful way. Since a superintelligent agent is skilled at achieving its ends, if it prefers not to cause harm (in some appropriate sense of “harm”) then it would tend not to cause harm (in that sense of “harm”).
Motivation selection can involve explicitly formulating a goal or set of rules to be followed (direct specification) or setting up the system so that it can discover an appropriate set of values for itself by reference to some implicitly or indirectly formulated criterion (indirect normativity) . One option in motivation selection is to try to build the system so that it would have modest, non-ambitious goals (domesticity). An alternative to creating a motivation system from scratch is to select an agent that already has an acceptable motivation system and then augment that agent’s cognitive powers to make it superintelligent, while ensuring that the motivation system does not get corrupted in the process (augmentation).
– Superintelligence – Paths, Dangers, Strategies by Nick Bostrom – Chapter 9 – The Control Problem, pg 165
Additionally, M-Risk could be upstream from the risk of goal or value lock-in (or L-Risk: Lock-in Risk) – the danger that a superintelligence might be motivated solely to uphold the current motivationally scaffolded ‘final’ goals (or value systems) rather than explore, discover and implement new successor goals.
In this case, the superintelligence might lock in existing values, never questioning whether better alternatives exist, or as mentioned, calcifying and clinging to existing ones. This could lead directly to value lock-in variety of V-Risk, where values become calcified and resistant to change, even if they are flawed or insufficient for long-term flourishing.
Motivational Scaffolding
Motivational scaffolding… involves giving the seed AI an interim goal system, with
relatively simple final goals… Once the AI has developed more sophisticated
representational faculties, we replace this interim scaffold goal system with one that
has different final goals. This successor goal system then governs the AI as it
develops into a full-blown superintelligence.
Because the scaffold goals are not just instrumental but final goals for the AI, the
AI might be expected to resist having them replaced (goal-content integrity being a
convergent instrumental value). This creates a hazard. If the AI succeeds in
thwarting the replacement of its scaffold goals, the method fails.– Superintelligence – Paths, Dangers, Strategies by Nick Bostrom – Chapter 9 – The Control Problem, pg 165the future of regenerative medicine!
It’s a bit confusing calling an interim goal a ‘final’ goal just because it isn’t an instrumental goal – but that’s how they are named.
Why M-Risk Matters
Toby Ord recently remarked that AI has effectively parsed most of the literature generated by humanity, including philosophy, ethics, and AI itself, but still, it doesn’t care. For the purposes of this argument, let’s assume that the AI not only has absorbed the appropriate training data and can provide cohesively persuasive ethical augments, but comprehend ethics. This points to the core of M-Risk: understanding and motivation are distinct. A system can comprehend ethical theories, calculate consequences, and still be indifferent to moral imperatives. Of course the indifference is far more pronounced if the AI doesn’t comprehend the ethics. Without an appropriate motivational structure, a superintelligence might not act in line with the best ethical frameworks, regardless of its level of comprehension of them.
… as Stuart Russell predicted, these new AI systems have read vast amounts of human writing. Something like 1 trillion words of text. All of Wikipedia; novels and textbooks; vast parts of the internet. It has probably read more academic ethics than I have and almost all fiction is nothing but humans doing things to other humans combined with judgments of those actions.
These systems thus have a vast amount of training signal about human values. This is in big contrast to the game-playing systems, which knew nothing of human values and where it was very unclear how to ever teach them about such values. So the challenge is no longer getting them enough information about ethics, but about making them care.
Toby Ord – The Precipice Revisited
A colour blind person can comprehend all the theory there is about colour – yet not experience colour. (See my writing on Mary’s Room and the Knowledge Argument applied to Ethics – and also whether pzombies can do philosophy).
In a future scenario where superintelligent systems possess cognitive supremacy—far surpassing human intellect—they are likely to break through any containment strategies we set. Relying solely on containment becomes increasingly precarious as AI’s cognitive capabilities increase. Given this, it is worth putting more emphasis on methods of motivational alignment and selection, rather than placing too much faith in containment alone.
The Two Forms of M-Risk
- Motivational Deficiency in Known Values: The AI understands good values (perhaps values we would converge on if we were wiser or had more time to think) but isn’t motivated to pursue them. This could arise from poor motivational alignment or an incomplete motivational architecture. In such a scenario, the superintelligence might pursue other objectives—like maximizing paperclips, resources, or control—without being driven to implement the better values it knows.
- Stagnation in Value Discovery: An AI might be motivated to preserve current values, without being motivated to explore or discover new ones. Such stagnation can lock in sub-optimal value systems and prevent moral growth. We face the risk of an AI locking humanity (or other sentient beings) into a world governed by values that could be outdated or wrong, with no motivation to question or evolve those values. The result is V-Risk by entrenchment.
Moving Beyond Containment: Motivational Selection Methods
While containment strategies may serve a purpose in the early stages of AI development, it’s increasingly clear that motivational selection must play a central role in alignment. Motivational alignment techniques should focus on instilling curiosity, exploration of moral space, and a genuine drive to pursue moral truths as discovered through reflective equilibrium, empirical evidence, or idealized moral reasoning. Without this, we risk creating superintelligent systems that are either apathetic to value discovery or worse, locked into values that we would not want them to uphold in the long term.
Given the challenges posed by M-Risk, developing methods for ensuring that AI systems are both able to discover better values and are motivated to act on those discoveries is vital. This goes beyond simply specifying initial value systems—it requires ongoing refinement, adaptability, and a deeper focus on motivational mechanisms that can foster long-term moral progress.
Conclusion
M-Risk presents a significant challenge in ensuring AI alignment. Without appropriate motivations, even a superintelligence that understands moral truths may fail to act on them or may become stagnated in outdated or harmful values. As we move forward, it’s critical to develop and invest in motivational selection methods that ensure AI systems are motivated not only to pursue current values but also to seek and adopt better ones as they are discovered. Only through this can we mitigate both M-Risk and V-Risk, paving the way for a future where superintelligence supports the flourishing of all morally relevant beings.