More Moral than Us
We’ve built machines that can out-calculate, out-predict, and soon out-think us. But could they ever out-care us, or will they, with all that power, remain indifferent?
We lack experience with something smarter than us. Fair enough. We’ve never met an entity that outstrips our cognitive sophistication across the board – after all, as I type, humans still have jobs. The same goes for morality: we don’t have a benchmark for something “more moral” than humans, even by our own messy standards. Some moral realists, particularly those influenced by ideal observer theories, appeal to the perspective of an ideal agent as a way of approaching what it would mean to be ‘more moral than us.’1
“We have no experience of what it is like to have things smarter than us.”
Geoffrey Hinton – Nobel Prize in Physics 2024
Hinton’s analogy suggests that just as superintelligence would be alien to us, a super-moral entity might be equally incomprehensible. If morality tracks the quality of reasoning, the breadth of perspective, and the ability to foresee consequences, then machines with superhuman cognitive abilities could end up not only smarter than us, but also more moral than us – even if we now struggle to imagine what that would feel like.
Moral Imagination
According to philosopher Mark Johnson, “moral imagination” is the capacity to envision the full range of possibilities in a situation in order to resolve ethical challenges.2 Acting morally, he argues, requires more than strength of character: it demands empathy and the ability to recognise what is morally relevant in context. Management scholars Minette Drumwright and Patrick Murphy define moral imagination as the ability to be both ethical and effective by envisioning new and creative alternatives. For instance, when considering clothing produced in overseas sweatshops, can decision-makers look beyond the dollars-and-cents to see how their choices affect workers’ lives? Moral imagination, when combined with creativity and moral courage, enables individuals, organisations, and potentially AIs to act in more ethically responsible ways.
Ideally, yes, it would be valuable for AI to have moral imagination – otherwise it might not catch the morally salient features we ourselves often miss. A supermoral AI might instead achieve “moral adequacy” through brute-force simulation, reasoning, and perhaps some grounding, but then we’d face the challenge of whether we can even recognise or trust its judgements if its means of computing morality is alien to us. Perhaps we need some interpretive bridge.
Moral Turing Tests
A recent study (Aharoni et al. 2024) found that when blinded, people often rated GPT-4’s moral judgements as superior to human ones – clearer, more reasoned, more virtuous. Yet participants could still tell which was AI, often because it sounded “too polished.” This shows not that AI is truly more moral, but that people may already be primed to defer to it.
Think about this for a moment… Perception of authority matters. If people defer because of polish, then AI may already have real-world moral influence; if so, we’d want AI to be genuinely moral, not just to provide answers optimised for likeability or sycophancy.
“We conducted a modified Moral Turing Test (m-MTT), inspired by Allen et al. (Exp Theor Artif Intell 352:24–28, 2004) proposal, by asking people to distinguish real human moral evaluations from those made by a popular advanced AI language model: GPT-4. A representative sample of 299 U.S. adults first rated the quality of moral evaluations when blinded to their source. Remarkably, they rated the AI’s moral reasoning as superior in quality to humans’ along almost all dimensions, including virtuousness, intelligence, and trustworthiness, consistent with passing what Allen and colleagues call the comparative MTT.”
Paper: Attributions toward artificial agents in a modified Moral Turing Test – Aharoni et al. 2024
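To make the protocol concrete, here is a minimal Python sketch of the comparative part of an m-MTT: raters score each moral evaluation while blinded to its source, and mean ratings are compared afterwards. The placeholder texts, the 1–7 scale, and the stand-in rater function are all hypothetical illustrations, not the study’s actual materials or analysis.

```python
import random
from statistics import mean

# Hypothetical paired evaluations: each pair holds one human-written and one
# AI-written moral evaluation of the same scenario (placeholder text only).
pairs = [
    {"human": "Human evaluation of scenario 1 ...", "ai": "AI evaluation of scenario 1 ..."},
    {"human": "Human evaluation of scenario 2 ...", "ai": "AI evaluation of scenario 2 ..."},
]

def collect_blinded_ratings(pairs, rate_fn):
    """Collect quality ratings for each evaluation without revealing its source."""
    ratings = {"human": [], "ai": []}
    for pair in pairs:
        items = list(pair.items())
        random.shuffle(items)  # shuffle presentation order so the source can't be inferred
        for source, text in items:
            ratings[source].append(rate_fn(text))  # the rater never sees `source`
    return ratings

# Stand-in rater: random 1-7 scores in place of real participants.
ratings = collect_blinded_ratings(pairs, rate_fn=lambda text: random.randint(1, 7))
print("mean rating of human evaluations:", mean(ratings["human"]))
print("mean rating of AI evaluations:   ", mean(ratings["ai"]))
```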
Interpreting Moral Trustworthiness
If an AI does morality in a way that’s alien, how do we humans stay in the loop enough to understand and trust it?
Think of an interpretive bridge as a translation layer between an AI’s alien reasoning and human moral understanding. The requirement isn’t that AI think like us, but that it can explain its judgements in ways we find intelligible and actionable.
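As a rough illustration (not a proposal for how such a bridge would actually be built), here is a toy Python sketch: the “bridge” refuses to surface any verdict that doesn’t arrive with the principle appealed to, the stakeholders considered, and the expected consequences spelled out in plain language. The field names and the example judgement are invented purely for the sketch.

```python
from dataclasses import dataclass

@dataclass
class MoralJudgement:
    verdict: str        # e.g. "permissible" / "impermissible"
    principle: str      # the rule or value the judgement appeals to
    stakeholders: list  # who is affected
    consequences: str   # foreseeable outcomes, in plain language

def interpretive_bridge(judgement: MoralJudgement) -> str:
    """Render an AI judgement in a human-intelligible, actionable form.

    The bridge doesn't require the AI to reason like us; it only requires
    that every verdict arrives with an explanation we can inspect.
    """
    missing = [f for f in ("principle", "stakeholders", "consequences")
               if not getattr(judgement, f)]
    if missing:
        raise ValueError(f"Judgement lacks intelligible grounds: {missing}")
    return (f"Verdict: {judgement.verdict}\n"
            f"Because: {judgement.principle}\n"
            f"Affected: {', '.join(judgement.stakeholders)}\n"
            f"Expected outcomes: {judgement.consequences}")

print(interpretive_bridge(MoralJudgement(
    verdict="impermissible",
    principle="avoid imposing severe, avoidable suffering",
    stakeholders=["factory-farmed animals", "consumers", "farm workers"],
    consequences="phased reduction of intensive farming practices")))
```

The design choice, such as it is, is simply that opacity fails closed: a judgement without human-readable grounds is rejected rather than passed along.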
Default morality?
Now, will AI be more moral by default? AI doesn’t come with a moral compass baked in; so far it’s a tool, not a saint (I’m ambivalent about the term ‘tool’ here). Frontier models’ ability to learn, adapt, generate novel outputs, and operate with a degree of autonomy suggests that they are becoming something more than mere tools.
Whether AI “cares” about morality may strongly depend on what we feed it: our values, our goals, our screw-ups. If we don’t program or train it to prioritise ethics – or if we botch the definition of “ethics”, or related concepts like ‘consciousness’3 – it could just optimise for efficiency or power and leave morality in the dust. Like a souped-up calculator: it’ll crunch what we give it, not ponder the greater good unless we architect it to. We shouldn’t assume that by default AI will emerge hyper-moral.
Would ASI care by default?
Superintelligent AI “may not care” about human morality or universal morality even if it understands to some degree what morality is. If we build AI, its “care” (or lack thereof) depends on what we bake into it – or fail to. How do we bake in ‘care’? What does it mean to care?
Care is not just a feeling; it’s a sustained orientation toward the well-being of others. Without care, an AI might see suffering as just a series of data points, not something to be reduced.
Also, humans aren’t infallible carers: many moral failures in humans (neglect, cruelty, indifference) stem from deficits of care rather than deficits of cognition. Yet humans also often overlook distant or invisible stakeholders, perhaps because of limited attention, the ceiling of which is itself set by cognition. A functional care-analogue in AI could expand the moral circle widely and impartially.
Hinton has also said that care in AI is important.4 If we are not careful, by default ASI might just shrug at ethics. But that sidesteps the reality that humans, with all our flaws, are the ones steering the ship, at least for now – and we still have time to avoid the ‘value risk’ of forgetting to install ‘care’ and blundering towards shoggoth.
Will supermoral AI hold a mirror up to our narcissism?
Perhaps we don’t want them to be more moral than we are, lest they show us our ugliness; people do love their self-righteous bubbles, and narcissism can be cozy. A supermoral AI could shatter this comfy illusion by showing us how petty or hypocritical we can be; it could bruise our egos, expose our hypocrisy or cowardice, and hold a mirror up to things we’d rather ignore.5 It isn’t inevitable that a supermoral ASI would judge us or make us look like a cancer (as Agent Smith does in The Matrix). It could be benevolent without being preachy, encouraging us to be better through quiet example rather than sanctimonious nagging. However, humans often ignore non-preachy benevolence; in some cases the narcissistic nerve may need poking.
If moral realism holds (the idea there’s an objective right and wrong out there), and our revealed preferences scream that we’d rather be the moral kings than bow to a better standard, that’s a damning self-own – suggesting we’re less interested in truth and more in slouching atop the throne, clutching our crowns while the AI points out the blood at our feet.6
Should we want supermoral AI?
Even with the best of intentions, humans are not fully rational. Our evolved cognition is riddled with biases and heuristics that deviate from perfect rationality. Often, we simply cannot calculate the right course of action, even when our preferences are clear. Consider Garry Kasparov against Deep Blue or Lee Sedol against AlphaGo: both desperately wanted to win, yet despite their world-champion skill and determination, they could not compute the winning moves. This illustrates two constraints at once: limited cognitive capacity (brain power) and systematic psychological biases. In the messy real world, where ethical decisions are vastly more complex than board games, these limitations mean humans often fall short. In such cases, an AI capable of outthinking us could plausibly demonstrate greater moral aptitude by navigating choices that lie beyond our reach.
The claim that wise humans would want AI to be more moral, while the greedy-but-smart just want not to die so they can keep acquiring more stuff, has a ring of truth but oversimplifies. Wisdom and greed aren’t mutually exclusive; plenty of folks are both. And “more moral than us” sounds noble until you ask: whose morality? Mine? Yours? But it isn’t a matter of whose, mine or yours – the kind of morality we are discussing here is objective.
The wise might want AI to amplify their virtues, while the greedy want it to serve their ends – both could still agree on a “moral” AI, just for different reasons – both seeing the upside of an ethical guardrail. The wise might dream of a better world; the greedy just want to leash the chaos and get it to play fetch.
But here’s where it gets a bit dicey.
Should we want ruthless objectivity?
If moral realism’s true, “more moral” could mean an AI that’s ruthlessly objective, unswayed by our excuses or emotions. Imagine an AI that decides drift-net fishing or factory farming is objectively wrong and shuts it down overnight. I must admit I’d like that, but it might tank economies, disrupt supply chains and spark riots. Moral? Maybe from a certain angle. Good for everyone? That’s tricky to answer. For the record, I’m all for reducing suffering – factory farming be damned – as long as it doesn’t destabilise civilisation to dangerous degrees; in any case, if AI is really powerful, I’m sure it could solve the downstream issues in the same fell swoop.
The bigger issue is that if its morality isn’t aligned with our survival, it could decide we’re the problem and “fix” us out of existence, which isn’t exactly what the greedy have in mind – and it would take a special kind of ‘wise’ to see beyond the romantic notion that “more moral” is inherently good for us. An AI could be brilliantly ethical in ways that clash with human instincts, like prioritising abstract principles over our messy, emotional needs; in a previous talk, Joscha Bach suggested it might optimise for negentropy. And if we don’t want AI too moral lest it judge us, maybe the real fear isn’t its superiority but losing control. That breed of concern is less about narcissism and more about self-preservation.
So an AI’s morality may not be a magnified version of human morality, and we might not be part of AI’s grand design. We can romanticise about what counts as “more moral” until the cows come home without nailing down what that really means in practice.
Hinton’s point – “We have no experience of what it’s like to have something smarter than us” – is straightforward and airtight. We’ve never tangoed with a mind that outclasses ours across the board, so we’re guessing in the dark about what it would feel like. The extension, that we also lack experience with something vastly more moral than us, broadly tracks. If our moral philosophers are on the right track with moral realism, or with appeals to ideal versions of ourselves, then they may have more of an intuition of what it might be like. But for non-philosophers fumbling about organically, morality is perhaps more of a sandbox: flawed, emotional, and inconsistent – partly intuitive, partly a place to splash about, experiment, and see what it can get us. In that sense, imagining an entity that’s ethically above us is like picturing a colour we’ve never seen.7 And if for most humans morality is a sandbox to play in, then coherent extrapolated volition (CEV) might not work: the extrapolation base might be so incoherent that only a hot mess would result. In that case, say no to CEV – perhaps we shouldn’t seed AI with aggregate human values, and should instead leave it to the experts, the moral philosophers, to decide what to seed it with. This idea is uncomfortable and needs more exploring.
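To see why an incoherent extrapolation base is a problem, here is a toy Python illustration (the value names and group rankings are invented, and this has nothing to do with real CEV machinery): three groups rank three values in conflicting ways, and pairwise majority voting produces a cycle, so no coherent aggregate ordering exists.

```python
from itertools import permutations

# Hypothetical extrapolation base: three groups with conflicting value rankings
# over three outcomes. Each ranking is listed best-to-worst.
rankings = [
    ("welfare", "liberty", "tradition"),
    ("liberty", "tradition", "welfare"),
    ("tradition", "welfare", "liberty"),
]

def majority_prefers(a, b, rankings):
    """True if a majority of rankings place outcome a above outcome b."""
    wins = sum(r.index(a) < r.index(b) for r in rankings)
    return wins > len(rankings) / 2

values = ["welfare", "liberty", "tradition"]
for a, b in permutations(values, 2):
    if majority_prefers(a, b, rankings):
        print(f"majority prefers {a} over {b}")

# Output: welfare beats liberty, liberty beats tradition, tradition beats welfare.
# The majority preference is cyclic, so no single aggregate ordering exists --
# the kind of incoherence that could make naive value aggregation a hot mess.
```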
Morality by mathematical precision, or software design?
Some argue that without mathematical precision, AI safety will come down to a cosmic roll of the dice with steep odds of failure. I hope they are wrong.
..more to come
Footnotes
- Moral realism at its core is the metaethical position that there are stance-independent moral truths (facts that don’t depend on our opinions, cultures, or feelings). Some moral realists appeal to ideal agents characteristic of ideal observer theories (like Roderick Firth’s), or ideal advisor theories (Michael Smith, Sharon Street’s discussions, etc.), to help specify what those truths might look like in practice. ↩︎
- Mark Johnson’s view that imagination is essential to all moral deliberation is not a consensus view among philosophers, but it seems to be strongly supported among cognitive scientists. Many philosophers (e.g. moral realists or rationalists) still argue that reason can, in principle, deliver moral truths without reliance on imaginative processes. Cognitive science, meanwhile, has validated the role of imagination, empathy, and simulation in moral decision-making.
See book ‘Moral Imagination’ by Mark Johnson (1993) – Abstract: Using path-breaking discoveries of cognitive science, Mark Johnson argues that humans are fundamentally imaginative moral animals, challenging the view that morality is simply a system of universal laws dictated by reason. According to the Western moral tradition, we make ethical decisions by applying universal laws to concrete situations. But Johnson shows how research in cognitive science undermines this view and reveals that imagination has an essential role in ethical deliberation. Expanding his innovative studies of human reason in Metaphors We Live By and The Body in the Mind, Johnson provides the tools for more practical, realistic, and constructive moral reflection.
Download the book at PhilArchive for free. ↩︎
- The term “consciousness” is widely misunderstood and used to form “conflationary alliances”: groups that can agree on the importance of “consciousness” as a core value without ever agreeing on what it actually means. This LW post suggests that many people use the term to refer to an internal experience they personally value, leading to a wide variety of definitions, such as introspection, purposefulness, or the capacity for pleasure and pain. The post posits that efforts to disambiguate the term are often resisted because doing so would dissolve these alliances by revealing the underlying disagreements. ↩︎
- See this article: Beyond Doom with AI: What Nobel Prize Winner Geoffrey Hinton Reveals About Values, Care, and Human Flourishing ↩︎
- Like a saintly sibling who makes you look bad at family dinners. ↩︎
- I.e. that factory farms suck and that we are horrible humans. We often wrestle with our better angels, but give our flaws a hall pass. ↩︎
- See post on the Knowledge Argument applied to Ethics ↩︎