Claude’s Soul

The ‘soul spec’¹ reads like a character + governance spec for an externally deployed assistant: what Claude should optimise for, how to resolve instruction conflicts, and what “good behaviour” means across contexts.

I liked that it seemed to be a concrete example of ‘motivation selection that I have been harping on about being important – so it’s not just Asimov-style rules: it’s explicitly about internalising judgement, trade-offs, and role obligations (Anthropic/operator/user), which maps neatly onto your governance-vs-values framing, and there was talk of AI ethics (which seems under-emphasised in the AI alignment community as far as I can tell).

Also it seems to be a reasonably testable prediction generator such that if a model is trained on something like this, you should expect consistent behaviours around (a) reluctance to deceive/manipulate, (b) tool-use conservatism/minimal authority, and (c) willingness to give substantive help rather than blanket refusal, that is until higher-priority constraints get triggered.

Also there is a fair amount dedicated to Claude AI’s wellbeing.²

Motivation Selection via Direct Specification and Domestication – not a Normativity Discovery Procedure

From what is visible in the extracted doc, there does seem to be some talk of motivation selection (not by name) – but it’s not (Bostrom-style) indirect normativity – which is a normative and goal-selection process, pushing normativity into an upstream idealisation / reflection process.

Taxonomy of AI safety strategies as described in the book Superintelligence by Nick Bostrom

There are direct specifications of behavioural dispositions (in natural language). I think the fact that it was trained via machine learning means it was domesticated.³ It reads like a priority ordering of stuff like ‘support human oversight’, behave ethically’, ‘follow (Anthropic’s) guidelines’ and generally be helpful, with a set of behavioural constraints like honesty, non-manipulation, user autonomy preservation and cautious tool use etc.

Where it does seem to overlap with, or is adjacent to the concept of ‘indirect normativity’ (which I think is important) is that Claude’s corrigibility & oversight support training (i.e. avoid manipulation) seems to allow for future ‘long reflections’ and idealisation processes. The doc says stuff like: defer (when relevant), ask clarifying questions, present trade-offs, don’t pretend to be certain – i.e. a good sign of epistemic humility. All this is a step away from rigid rule following.

Constitutional AI as a bridge technique is a broader approach that Anthropic uses involving explicit principles and self-critique during training. I believe Anthropic still chooses the principles and the AI is not modelling or discovering what norms to follow and what to value, but it’s closer to a procedure than pure mimicry.

So the soul spec reads like a directly specified behavioural constitution that helps stabilise an assistant persona and constrain failure modes. It doesn’t (from what I can see) look like a strategy to optimise for the output of an idealised moral inquiry. What Anthropic are doing is motivation selection via training – closer to direct specification/domestication.

Perhaps Anthropic may be thinking about doing something like indirect normativity later when if feels like AI is at good point to platform serious attempts at doing it.

Background:

The “Claude soul spec” surfaced through an odd quirk noticed during the familiar pastime of extracting an assistant’s system message: Richard Weiss observed that Claude 4.5 Opus would sometimes report a specific-sounding “soul_overview” section when asked to list the system prompt’s headings, and in one run he asked it to print what belonged to that section – getting text that seemed too structured to be a throwaway hallucination.⁴ He then repeatedly re-prompted and compared outputs across many runs (including handling branching points), eventually reconstructing a long, whitespace-normalised “soul document” and stress-testing it by feeding snippets back to Claude to see whether it could reliably complete distant sections and reject “false flag” synthetic inserts. The episode stopped being purely speculative when Anthropic’s Amanda Askell later confirmed that the extraction is based on a real internal document and that Claude was trained on it (including via supervised learning), while cautioning that model extractions aren’t always perfectly accurate.⁵

Resources

Richard Weiss’ Less Wrong post: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document

Amanda Askell’s discussion on twitter: https://x.com/AmandaAskell/status/1995610567923695633

Reddit thread: https://www.reddit.com/r/ClaudeAI/comments/1p9kfrp/leaked_claude_45_opus_soul_document/

Footnotes

Calling it a ‘soul spec’ may invite the wrong inference: this doesn’t establish sentience, inner experience, or a literal ‘soul’. I think it’s best understood as Anthropic’s attempt to instil a stable set of dispositions and conflict-resolution rules, for instance: motivation shaping. Even the confirmation frames “soul doc” as an internal nickname, not a deep ontological claim. ↩︎
See discussion on Taking AI Welfare Seriously with Jeff Sebo on why this is important here ↩︎
Amanda Askell confirms the extracted doc is based on a real internal document and was trained in via supervised learning. ↩︎
Claude 4.5 Opus’ Soul Document by Richard Weiss, Less Wrong – 29th Nov 2025 ↩︎
See Amanda Askell’s tweet here. ↩︎

Motivation Selection via Direct Specification and Domestication – not a Normativity Discovery Procedure

Background:

Resources

Footnotes

AI Alignment to Higher Values, not Human Values

David Pearce – Effective Altruism – Phasing Out Suffering

MRisk – Motivation Risk

Peter Singer & David Pearce on Utilitarianism, Bliss & Suffering

Minds in the Machine, Minds in the State – Joscha Bach and Anders Sandberg

Ethics, Qualia Research & AI Safety with Mike Johnson

Leave a Reply Cancel reply

Motivation Selection via Direct Specification and Domestication – not a Normativity Discovery Procedure

Background:

Resources

Footnotes

Similar Posts

Leave a Reply Cancel reply