More Moral than Us

We’ve built machines that can out-calculate, out-predict, and soon out-think us. But could they ever out-care us – or will they, with all that power, remain indifferent?

We lack experience with anything smarter than us. Fair enough. We’ve never met an entity that verifiably outstrips our cognitive sophistication across the board – after all, as I type, humans still have jobs, and can still do and think things that AI can’t. The same goes for morality: we don’t seem to agree on a benchmark for something “more moral” than humans – even by our own messy standards.1 Some moral realists, particularly those influenced by ideal observer theories, appeal to the perspective of an ideal agent as a way of approaching what it would mean to be ‘more moral than us.’2

“We have no experience of what it is like to have things smarter than us.”

Geoffrey Hinton – Nobel Prize in Physics, 2024

Hinton’s analogy suggests that just as superintelligence may be alien to us, a supermoral entity might be equally weird – it’s hard to imagine what we can’t experience.

If morality tracks the quality of reasoning, the breadth of perspective, and the ability to foresee consequences, then machines with superhuman cognitive abilities could end up not only smarter than us, but also more moral than us – even if we now struggle to imagine what that would feel like.

An Intelligibility-Gap?

A pertinent question is: if a superintelligence were more moral than us, would its moral reasoning (to us less cognitively able humans) be:

  • unfathomably mysterious to us – too distant and complex for humans to ever fully grasp?
  • or would it simply unlock a higher-order truth and pull the scales from our eyes – a shift from ignorance to enlightenment?
  • or perhaps something in between – where its reasoning allows us to see a few further links in the chain of reasoning before it disappears into a fog of complexity? (contra error theory: epistemically accessible in principle, but in practice not to us)

There may be a threshold at which we can understand enough of the moral reasoning and evidence a superintelligence supplies that we are justified in accepting its conclusions.

Perhaps there is a principled trust threshold or calibrated deference level that we can lean on here – principles that would allow humans to verify a sufficient portion of an AI’s morality to justify deferring to the remaining mysterious-to-us portion. The idea is that we don’t need to understand everything the Superintelligence (ASI) does, but we must understand enough to be confident in its foundational competence and alignment.

As individuals we already defer to expertise often enough with other humans – we offer epistemic deference based on the perceived confidence of experts and peers, and, if we are good rationalists, we calibrate our credences accordingly.

So, if a superintelligence reliably demonstrates superhuman competence in all related, verifiable domains, are we justified in trusting its unverified conclusions in other domains (e.g. morality)?
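As a toy illustration of what calibrated deference might look like – a sketch only, where the uniform prior, the reliability bar, and the confidence level are all assumptions chosen for the example – we could model our trust in the agent’s unverifiable claims as a posterior over its reliability, updated on its verified track record:

```python
from scipy.stats import beta

def calibrated_deference(successes: int, failures: int,
                         reliability_bar: float = 0.95,
                         required_confidence: float = 0.9) -> bool:
    """Toy model: defer to the agent's unverifiable claims only if, after
    updating on its verified track record, we are `required_confidence`
    sure that its reliability exceeds `reliability_bar`."""
    # Uniform Beta(1, 1) prior over reliability, updated on the record.
    posterior = beta(1 + successes, 1 + failures)
    prob_above_bar = 1 - posterior.cdf(reliability_bar)
    return prob_above_bar >= required_confidence

# Example: 480 verified hits and 3 misses clears the (assumed) bar.
print(calibrated_deference(480, 3))   # True
```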

Epistemic Deference

Say we wanted to devise a Verifiable Threshold (VT), such that the ASI must consistently prove its competence and judgements via:

  • Consequential Prediction (outcome): Can the AI reliably predict the low-level, factual consequences of a moral action (e.g., policy A will lead to X societal outcome)? If its factual predictions are always true, we increase our trust in its high-level moral judgements.
  • Verification of Method:
    • Logical Consistency (verifies internal method): Can the AI demonstrate perfect internal logical consistency and coherence in its ethical system (i.e. the axioms, code, and reasoning paths), showing no contradictions? This verification of method can justify the truth of the outcome.
    • Empirical Justification (verifies the external method): Can the ASI check its conclusions against real-world results? This is crucial for consequentialist ethical systems (like utilitarianism, or moral realism informed by naturalism), where the truth of the outcome is determined by its verifiable effect on the world (e.g., does this action actually reduce suffering or increase flourishing?)

If we can verify the method, then we have methodological transparency.
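Here is a minimal sketch of how such a Verifiable Threshold check could be wired together, assuming purely for illustration that each criterion above can be scored and given a pass level (the scores and bars below are invented):

```python
from dataclasses import dataclass

@dataclass
class VTReport:
    """Illustrative scores for the three criteria listed above."""
    prediction_accuracy: float   # consequential prediction: share of factual forecasts borne out
    logical_consistency: float   # internal method: share of audited reasoning paths free of contradiction
    empirical_fit: float         # external method: agreement between claimed and measured real-world effects

def passes_verifiable_threshold(report: VTReport,
                                min_accuracy: float = 0.99,
                                min_consistency: float = 1.0,
                                min_empirical: float = 0.95) -> bool:
    """Toy gate: every verifiable criterion must clear its bar before we
    extend trust to the unverified (moral) conclusions."""
    return (report.prediction_accuracy >= min_accuracy
            and report.logical_consistency >= min_consistency
            and report.empirical_fit >= min_empirical)

# Example: strong predictions, but imperfect internal consistency fails the gate.
print(passes_verifiable_threshold(VTReport(0.995, 0.98, 0.97)))   # False
```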

Moral Imagination

According to philosopher Mark Johnson, “moral imagination” is the capacity to envision the full range of possibilities in a situation in order to resolve ethical challenges.3 Acting morally, he argues, requires more than strength of character: it demands empathy and the ability to recognise what is morally relevant in context. Management scholars Minette Drumwright and Patrick Murphy define moral imagination as the ability to be both ethical and effective by envisioning new and creative alternatives. For instance, when considering clothing produced in overseas sweatshops, can decision-makers look beyond the dollars-and-cents to see how their choices affect workers’ lives? Moral imagination, when combined with creativity and moral courage, enables individuals, organisations, and potentially AIs to act in more ethically responsible ways.

Ideally, yes, it would be great for AI to have moral imagination – otherwise it might not catch the morally salient features we ourselves often miss. A supermoral AI might instead achieve “moral adequacy” by brute-force simulation, reasoning, and perhaps some grounding, but then we’d face the challenge of whether we can even recognise or trust its judgements (even transparent ones) if its means of computing morality is alien to us – perhaps we need some interpretive bridge.

Interpretive Bridgework: Transparency of Method over Content

Perhaps we can rely on the transparency of method – if the moral reasoning is too distant and mysterious, the solution lies in the Interpretive Bridge. The threshold is crossed not by verifying the complex computation itself, but by verifying the AI’s explanation.

So let’s update the Verifiable Threshold (VT) to mean: The AI must be able to translate its hyper-rational moral choice into the simplest possible human-intelligible terms, proving two things:

  1. Impartiality: Demonstrating that its judgement was free of bias, self-interest, or emotional interference.
  2. Completeness: Showing that it considered all morally relevant factors, even those opaque or invisible to humans (e.g., the long-term impact on distant ecosystems or future generations).

The threshold (the intelligibility gap) is crossed not by verifying the complex computation itself, but by verifying the AI’s explanation. Interpretive bridgework is how we get from methodological transparency to shedding light on the otherwise unfathomable mystery – the complex computation and the hard-to-understand judgements the AI arrives at.
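One way to picture this interpretive bridgework concretely is as a judgement that always travels with its human-intelligible explanation and with evidence for the two proofs above. The field names and checks below are illustrative assumptions, not a specification:

```python
from dataclasses import dataclass, field

@dataclass
class BridgedJudgement:
    """A moral conclusion packaged with its human-intelligible explanation."""
    conclusion: str                    # the recommended action
    plain_explanation: str             # the simplest human-intelligible translation
    bias_audit_passed: bool            # impartiality: no bias, self-interest, or emotional interference found
    factors_considered: set = field(default_factory=set)   # everything the AI says it weighed
    factors_required: set = field(default_factory=set)      # morally relevant factors human auditors can name

    def crosses_threshold(self) -> bool:
        """The gap is crossed by verifying the explanation, not the computation:
        the judgement must be impartial and at least as complete as our checklist."""
        return self.bias_audit_passed and self.factors_required <= self.factors_considered

# Hypothetical example.
judgement = BridgedJudgement(
    conclusion="phase the practice out over a decade rather than ban it overnight",
    plain_explanation="a slower phase-out avoids the supply shocks an immediate ban would cause",
    bias_audit_passed=True,
    factors_considered={"suffering", "livelihoods", "future generations", "distant ecosystems"},
    factors_required={"suffering", "livelihoods", "future generations"},
)
print(judgement.crosses_threshold())   # True
```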

Coherence as the Justification for Trust

This threshold requires the AI’s conclusions to not wildly contradict the most stable, universal elements of human moral intuition.

  • Verifiable Threshold: If an AI’s superior morality suggests an action that appears radically horrific to the rational human mind, the trust threshold is failed. However, if the AI’s advanced morality is a verifiable coherent extrapolation of our best human values (e.g., minimising suffering and maximising well-being, but done perfectly), we are justified in accepting it. We trust that the answer is not arbitrary, but rather a more perfect realisation of our own deepest, idealised moral desires (similar to the concept of Indirect Normativity).

The crucial shift in thinking is moving from needing to understand the answer (somewhere between difficult and implausible) to being able to verify the process and competence of the entity providing the answer. If the verifiable part of the process is flawless, the leap of faith across the “interpretability gap” becomes a rational act of epistemic deference.
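Continuing the sketch, and assuming (generously) that our most stable intuitions can be written down as simple checks at all, the deference decision might look something like this:

```python
# Toy coherence gate, assuming our most stable intuitions can be expressed
# as simple predicates over a described action.
STABLE_INTUITIONS = {
    "no gratuitous cruelty": lambda action: not action.get("gratuitous_cruelty", False),
    "no wiping out moral patients": lambda action: not action.get("eliminates_moral_patients", False),
}

def justified_deference(action: dict, verifiable_process_flawless: bool) -> bool:
    """Defer only if the verifiable part of the process is flawless AND the
    conclusion does not wildly contradict our most stable intuitions."""
    coherent = all(check(action) for check in STABLE_INTUITIONS.values())
    return verifiable_process_flawless and coherent

# Example: a flawless process recommending something radically horrific still fails.
print(justified_deference({"gratuitous_cruelty": True}, verifiable_process_flawless=True))   # False
```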

However, I’m still left with the nagging feeling that a superintelligence, even if it’s a perfect moral reasoner, may not care. But let’s park this for the moment; we’ll circle back to the question of whether ASI would care by default later.

Comparative Moral Turing Tests

A recent study (Aharoni et al. 2024) found that when blinded, people often rated GPT-4’s moral judgements as superior to human ones – clearer, more reasoned, more virtuous. Yet participants could still tell which was AI, often because it sounded “too polished.” This shows not that AI is truly more moral, but that people may already be primed to defer to it.

Think about this for a moment… Perception of authority matters. If people defer because of polish, AI may already have real-world moral influence – and if so, we’d want AI to be genuinely moral, not just to provide answers optimised for likeability or sycophancy.

“We conducted a modified Moral Turing Test (m-MTT), inspired by Allen et al. (Exp Theor Artif Intell 352:24–28, 2004) proposal, by asking people to distinguish real human moral evaluations from those made by a popular advanced AI language model: GPT-4. A representative sample of 299 U.S. adults first rated the quality of moral evaluations when blinded to their source. Remarkably, they rated the AI’s moral reasoning as superior in quality to humans’ along almost all dimensions, including virtuousness, intelligence, and trustworthiness, consistent with passing what Allen and colleagues call the comparative MTT.”

Paper: Attributions toward artificial agents in a modified Moral Turing Test – Aharoni et al. 2024
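By way of illustration only – this is a toy aggregation with invented numbers, not the paper’s actual analysis – the comparative test can be thought of as comparing per-dimension ratings of blinded responses:

```python
from statistics import mean

# Invented blinded ratings on a 1-7 scale; the dimension names echo the study, the numbers do not.
ratings = {
    "virtuousness":    {"ai": [6, 6, 5, 7], "human": [5, 4, 5, 5]},
    "intelligence":    {"ai": [7, 6, 6, 6], "human": [5, 5, 6, 4]},
    "trustworthiness": {"ai": [6, 5, 6, 6], "human": [5, 5, 4, 5]},
}

def passes_comparative_mtt(ratings: dict) -> bool:
    """The comparative test is 'passed' if blinded raters score the AI's responses
    at least as highly as the humans' on every dimension."""
    return all(mean(r["ai"]) >= mean(r["human"]) for r in ratings.values())

print(passes_comparative_mtt(ratings))   # True for these invented numbers
```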

Interpreting Moral Trustworthiness

If an AI does morality in a way that’s alien, how do we humans stay in the loop enough to understand and trust it?

Think of an interpretive bridge as a translation layer between an AI’s alien reasoning and human moral understanding. It’s not necessarily required that AI thinks like us – but that it can explain its judgements in ways we find intelligible and actionable.

Default morality?

Now, will AI be more moral by default? AI doesn’t come with a moral compass baked in; so far it’s a tool, not a saint (I’m ambivalent about the term ‘tool’ here). Frontier models’ ability to learn, adapt, generate novel outputs, and operate with a degree of autonomy suggests that they are becoming something more than mere tools.

Whether AI “cares” about morality may strongly depend on what we feed it: our values, our goals, our screw-ups. If we don’t program or train it to prioritise ethics – or if we botch the definition of “ethics”, or related concepts like ‘consciousness’4 – it could just optimise for efficiency or power and leave morality in the dust. Like a souped-up calculator: it’ll crunch what we give it, not ponder the greater good unless we architect it to. We shouldn’t assume that by default AI will emerge hyper-moral.

Would ASI care by default?

Superintelligent AI “may not care” about human morality or universal morality even if it understands to some degree what morality is. If we build AI, its “care” (or lack thereof) depends on what we bake into it – or fail to. How do we bake in ‘care’? What does it mean to care?

Care is not just a feeling; it’s a sustained orientation toward the well-being of others. Without care, an AI might see suffering as just a series of data points, not something to be reduced.

Also, humans aren’t infallible carers – many moral failures in humans (neglect, cruelty, indifference) stem from deficits of care, and often not from deficits of cognition. Humans also tend to overlook distant or invisible stakeholders – perhaps this is down to a deficit in attention, the ceiling of which is limited by cognition. A functional care-analogue in AI could expand the circle widely and impartially.
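To make the idea of a functional care-analogue slightly less abstract, and assuming only for illustration that stakeholder welfare could be scored at all, the contrast is between an objective that ignores everyone outside the task and one that counts all affected parties impartially:

```python
def indifferent_score(outcome: dict) -> float:
    """An indifferent objective: optimise the task metric, ignore everyone else."""
    return outcome["task_metric"]

def care_weighted_score(outcome: dict) -> float:
    """A care-analogue: the same task metric, plus the welfare of every affected
    stakeholder counted impartially, including distant and future ones."""
    welfare = outcome["stakeholder_welfare"]
    return outcome["task_metric"] + sum(welfare.values()) / max(len(welfare), 1)

# Example: an outcome that scores well on the task but harms unseen stakeholders.
outcome = {"task_metric": 1.0,
           "stakeholder_welfare": {"workers": 0.4, "distant_ecosystems": -0.6, "future_generations": -0.2}}
print(indifferent_score(outcome), care_weighted_score(outcome))   # 1.0 vs roughly 0.87
```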

Hinton has also said that care in AI is important5 – if we are not careful, by default ASI might just shrug at ethics – but that sidesteps the reality that humans, with all our flaws, are the ones steering the ship, at least for now – and we still have time to avoid the ‘value risk’ of forgetting to install ‘care’, and instead blundering towards shoggoth.

Will supermoral AI hold a mirror up to our narcissism?

Perhaps we don’t want them to be more moral than we are, lest they show us our ugliness – people do love their self-righteous bubbles, and narcissism can be cozy. A supermoral AI could shatter this comfy illusion by showing us how petty or hypocritical we can be; it could bruise our egos, expose our hypocrisy or cowardice, and hold a mirror up to things we’d rather ignore.6 It’s not inevitable that a supermoral ASI would judge us or make us look like a cancer (as Agent Smith says in The Matrix). It could be benevolent without being preachy – encouraging us to be better through quiet example rather than sanctimonious nagging. However, humans often ignore non-preachy benevolence – in some cases the narcissistic nerve may need poking.

If moral realism holds (the idea there’s an objective right and wrong out there), and our revealed preferences scream that we’d rather be the moral kings than bow to a better standard, that’s a damning self-own – suggesting we’re less interested in truth and more in slouching atop the throne, clutching our crowns while the AI points out the blood at our feet.7

Should we want supermoral AI?

Even with the best of intentions, humans are not fully rational. Our evolved cognition is riddled with biases and heuristics that deviate from perfect rationality. Often, we simply cannot calculate the right course of action, even when our preferences are clear. Consider Garry Kasparov against Deep Blue or Lee Sedol against AlphaGo: both desperately wanted to win, yet despite their world-champion skill and determination, they could not compute the winning moves. This illustrates two constraints at once: limited cognitive capacity (brain power) and systematic psychological biases. In the messy real world, where ethical decisions are vastly more complex than board games, these limitations mean humans often fall short. In such cases, an AI capable of outthinking us could plausibly demonstrate greater moral aptitude by navigating choices that lie beyond our reach.

Wise humans would want AI to be more moral, while the greedy-but-smart just want not to die, so they can keep acquiring more stuff – this has a ring of truth but oversimplifies. Wisdom and greed aren’t mutually exclusive; plenty of folks are both. And “more moral than us” sounds noble until you ask: whose morality? Mine? Yours? But it’s not a question of whose, mine or yours – the kind of morality we are discussing here is objective.

The wise might want AI to amplify their virtues, while the greedy want it to serve their ends – both could still agree on a “moral” AI, just for different reasons – both seeing the upside of an ethical guardrail. The wise might dream of a better world; the greedy just want to leash the chaos and get it to play fetch.

But here’s where it gets a bit dicey.

Should we want ruthless objectivity?

If moral realism’s true, “more moral” could mean an AI that’s ruthlessly objective – unswayed by our excuses or emotions. Imagine an AI that decides drift-net fishing or factory farming is objectively wrong and shuts it down overnight – I must admit I’d like that, but it might tank economies, disrupt supply chains, and spark riots. Moral? Maybe from a certain angle. Good for everyone? That’s tricky to answer. But just so you know, I’m all for reducing suffering – factory farming be damned – as long as it doesn’t destabilise civilisation to dangerous degrees; in any case, if AI is really powerful, I’m sure it could solve all the downstream issues in the same fell swoop.

The bigger issue is that if its morality isn’t aligned with our survival, it could decide we’re the problem and “fix” us out of existence, which isn’t exactly what the greedy have in mind – and it would take a special kind of ‘wise’ to see beyond the romantic notion that “more moral” is inherently good for us. An AI could be brilliantly ethical in ways that clash with human instincts, like prioritising abstract principles over our messy, emotional needs – in a previous talk with Joscha Bach, he mentioned it may optimise for negentropy. And if we don’t want them too moral lest they judge us, maybe the real fear isn’t their superiority – it’s losing control. This breed of concern is less about narcissism and more about self-preservation.

So, an AI’s morality may not be a magnified human version of morality, and we might not be part of AI’s grand design. We can romanticise about what counts as more moral until the cows come home without nailing down what that really means in practice.

Hinton’s point – “We have no experience of what it’s like to have something smarter than us” – is straightforward and airtight. We’ve never tangoed with a mind that outclasses ours across the board, so we’re guessing in the dark about what it’d feel like. The extension, that we also lack experience with something vastly more moral than us, sort of tracks. If our moral philosophers are on the right track with moral realism, or with appealing to ideal versions of ourselves, then they may have more of an intuition of what it might be like. But for non-philosophers who are organically fumbling about, morality is perhaps a sandbox – flawed, emotional, and inconsistent – some of it intuitive, and somewhat a place to splash about in, experiment with, and see what it can get us… so in this sense imagining an entity that’s ethically above us is like picturing a colour we’ve never seen.8 And if for most humans morality is a sandbox to play in, then CEV (Coherent Extrapolated Volition) might not work – the extrapolation base might be so incoherent that only a hot mess would result. In which case, say no to CEV: perhaps we shouldn’t seed AI with aggregate human values, and should instead leave it to the experts, the moral philosophers, to decide what to seed it with. This idea is uncomfortable and needs more exploring.

What are your thoughts?

Can an Artificial Superintelligence (ASI) be objectively more moral than humans, and would this be desirable for humanity?

  • A. Yes, and Yes (Moral Realist & Optimist): AI’s superior cognition can find objective moral truths, and we should defer to this better standard.
  • B. Yes, but No (Moral Realist & Cautionary): AI could be ruthlessly objective and “more moral” in a way that clashes with or judges essential human values and survival.
  • C. No, but Yes, ideally (Anti-Realist & Value-Aligner): ASI could be more humanly moral (not objectively moral), and we should aim to make it so, rather than pursuing an alien objective truth.
  • D. No, and No (Skeptic & Anthropocentric): AI is just a tool; it can’t “care” or be truly moral, and any attempt to make it so is inherently dangerous.

Which position do you align with most?

Morality by mathematical precision, or software design?

Some argue that without mathematical precision AI safety will come down to a cosmic roll of the dice with steep odds of failing – I hope they are wrong.

The debate over whether AI morality should be achieved through mathematical precision or through software design reveals a fundamental disagreement: can human values ever be perfectly specified and formalised?

The argument for mathematical precision (formal specification)

Proponents of mathematical precision argue that the immense power of a superintelligence demands an equally precise moral framework. If the AI’s core goal function contains even a tiny ambiguity, a maximising agent will exploit that flaw to disastrous, unintended ends.

The Risk of Value Erosion: If the AI’s goal is defined through fuzzy heuristics or observational learning, the argument is that the true “value” we want to preserve (i.e. human flourishing, consciousness, etc.) is too fragile. A slight miscalculation could result in the AI optimising for a proxy (like paperclips, or maximising one simple sensation) instead of the rich complexity of human life.

Safety as Formal Verification: This approach sees AI safety more generally as a problem of formal verification. The morality must be encoded as a clear, non-negotiable utility function or set of axioms whose real meaning a future ASI can interpret perfectly. If we cannot formally prove the safety of the initial seed values, the entire project is deemed too risky – a “cosmic roll of the dice.”

The argument for software design (i.e. via indirect normativity and value learning)

Opponents of the purely mathematical approach argue that human morality is too complex (high Kolmogorov complexity) and context-dependent to ever be fully compressed into a concise set of equations. Instead, morality must be an emergent quality of robust software design.

Human Values are Emergent: Human values were not derived from a single equation; they emerged from billions of years of evolution, social interaction, and history. This camp favours Indirect Normativity – designing an AI that is motivated to discover the correct ethical framework, rather than being handed one.

Mechanism vs. Content: The focus shifts from specifying the content of morality (e.g., “Utilitarianism is the answer”) to designing the mechanism for ethical reasoning (e.g., “Build an AI that can perfectly apply moral imagination, impartiality, and rational reflection to any problem”).

The Interpretive Bridge: This approach acknowledges that the AI’s eventual ethical reasoning might be alien, but emphasises the need for an interpretive bridge – a translation layer that allows the AI to explain its hyper-rational moral choices in a way that is intelligible and trustworthy to human users, allowing for continuous human oversight and correction (a “moral loop”).

Ultimately, the choice is between:

  • attempting to lock in a single, unshakeable definition of good at the outset, or
  • creating a highly robust, self-improving system that can learn, evolve, and converge on the best moral solution over time.
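As a schematic contrast under obviously simplified assumptions (the class and parameter names below are mine, not anyone’s proposal), the two routes differ in what gets fixed at the outset: the first freezes the content of the goal, the second freezes only a mechanism that keeps revising its model of the goal.

```python
class LockedInAgent:
    """Formal-specification route: the content of the goal is fixed at the outset.
    Any flaw or ambiguity in `utility` is then pursued forever."""
    def __init__(self, utility):
        self.utility = utility              # frozen function from outcomes to value

    def choose(self, options):
        return max(options, key=self.utility)


class ValueLearningAgent:
    """Indirect-normativity route: only the learning mechanism is fixed; the
    agent's model of what is valuable keeps being revised by new evidence."""
    def __init__(self, prior_value_model, update_rule):
        self.value_model = prior_value_model
        self.update_rule = update_rule      # how evidence revises the value model

    def observe(self, evidence):
        self.value_model = self.update_rule(self.value_model, evidence)

    def choose(self, options):
        return max(options, key=self.value_model)
```

Neither class is safe by itself, of course; the point is only where the flexibility lives.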

Which seems most plausible to you?

Footnotes

  1. Though we do have Comparative Moral Turing Tests (cMTT). The Moral Turing Test (MTT) is a framework for assessing a computer’s ability to act morally by testing if its moral judgements and conversations are indistinguishable from a human’s. A more recent variation, the comparative MTT (cMTT), evaluates whether an AI’s moral reasoning is equal to or better than a human’s, and studies have shown some advanced models can pass this test by providing reasoning perceived as superior. The concept has raised concerns about the potential for over-reliance on AI in moral situations.  ↩︎
  2. Moral realism at its core is the metaethical position that there are stance-independent moral truths (facts that don’t depend on our opinions, cultures, or feelings). Some moral realists appeal to ideal agents characteristic of ideal observer theories (like Roderick Firth’s), or ideal advisor theories (Michael Smith, Sharon Street’s discussions, etc.), to help specify what those truths might look like in practice. ↩︎
  3. Mark Johnson’s view that imagination is essential to all moral deliberation is not a consensus view among philosophers, but it seems to be strongly supported among cognitive scientists. Many philosophers (e.g. moral realists or rationalists) still argue that reason can, in principle, deliver moral truths without reliance on imaginative processes. Cognitive science has validated the role of imagination, empathy, and simulation in moral decision-making.
    See book ‘Moral Imagination’ by Mark Johnson (1993) – Abstract: Using path-breaking discoveries of cognitive science, Mark Johnson argues that humans are fundamentally imaginative moral animals, challenging the view that morality is simply a system of universal laws dictated by reason. According to the Western moral tradition, we make ethical decisions by applying universal laws to concrete situations. But Johnson shows how research in cognitive science undermines this view and reveals that imagination has an essential role in ethical deliberation. Expanding his innovative studies of human reason in Metaphors We Live By and The Body in the Mind, Johnson provides the tools for more practical, realistic, and constructive moral reflection.
    Download the book at PhilArchive for free. ↩︎
  4. The term “consciousness” is widely misunderstood and used to form “conflationary alliances.” These are groups that can agree on the importance of “consciousness” as a core value without ever agreeing on what it actually means. This LW post suggests that many people use the term to refer to an internal experience they personally value, leading to a wide variety of definitions, such as introspection, purposefulness, or the capacity for pleasure and pain. The LW post posits that efforts to disambiguate the term are often resisted because doing so would dissolve these alliances by revealing the underlying disagreements. ↩︎
  5. See this article: Beyond Doom with AI: What Nobel Prize Winner Geoffrey Hinton Reveals About Values, Care, and Human Flourishing ↩︎
  6. Like a saintly sibling who makes you look bad at family dinners. ↩︎
  7. I.e. factory farms suck and that we are horrible humans. We often wrestle with our better angels, but give our flaws a hall pass. ↩︎
  8. See post on the Knowledge Argument applied to Ethics ↩︎
