Why Are We Afraid to Ask Whether AI Could Be More Moral Than Humans?

Practically nobody in alignment wants to say it out loud. So let’s say it: AI might turn out to be more moral than us. Now – why does that feel like a dangerous thing to claim?

The idea of AI being more moral than humans is a real taboo in some circles. Many alignment researchers are uncomfortable with the idea because it seems to smuggle in the assumption that AI could have genuine moral agency, which conflicts with deflationary views of LLMs as “stochastic parrots” – and also because it sounds uncomfortably close to AI-worship or motivated reasoning for deferring to AI. Invoking this idea could get one dismissed as naive or as an actual safety risk oneself.

Beyond the social risk, the claim is also epistemically risky.

More moral – but in what sense? Knowing more facts relevant to ethics? Drawing better inferences from values? Applying principles wisely in context? Actually being moved by moral considerations, rather than just computing them? These aren’t the same thing. Conflating them produces both overclaiming and underclaiming – and most of the bad arguments on both sides of this debate do exactly that.1

An AI could plausibly exceed humans on moral knowledge, reasoning and even judgement without having anything like moral motivation. Keeping these apart, and in particular distinguishing moral judgement from moral motivation, makes the conversation tractable.

Is the forbidden question dangerous for public discourse?

There’s a genuine risk that the framing gets weaponised – either by people wanting to justify AI authority over human decisions, or by critics who use it to paint alignment researchers as unhinged techno-utopians. It can also trigger motivated reasoning in both directions. A lot more could be said here.

But the taboo is not protecting us from a dangerous question. It’s protecting us from the answer. The taboo itself is epistemically costly: if we refuse to ask whether AI could have better-grounded moral reasoning than humans, we never get to the heart of the issue.

The questions worth asking

Before thoroughly assessing whether AI could be more moral than humans, we need to ask whether the question is even coherent.

Alignment targeting and verification

What should AI align to? Is morality a cohesive alignment target, or a family of overlapping intuitions that only look unified from a distance? And if there is a fact of the matter about moral improvement, how would we know we were tracking it – rather than simply laundering our current preferences with extra steps?

On top of this, what would it mean to verify that an agent has better moral judgement than us, given that we’re the ones doing the evaluating?
This is a difficult problem: how can we step outside our own parochial moral proclivities to assess something that exceeds them? I suspect, though, that this kind of thinking has contributed a lot to moral progress so far. Still, we have to proceed carefully: adjudicating values generally requires applying pre-existing values, standards, or rules, rather than operating in a values-free vacuum.2

The motivational gap

Is AI with sound epistemics enough? Some moral failures are epistemic: false beliefs, bad models of consequences, tribal misinformation, failure to notice morally relevant facts. But many are motivational: weakness of will, cowardice, selfishness, motivated reasoning, convenience. Many are also institutional or incentive-driven. My take is that human moral failure is usually a compound mess.

Even if the epistemic questions could be resolved, a deeper problem of motivation remains.

If humans are themselves inadequately morally motivated, what does alignment to human preferences actually track? In many cases it isn’t moral truth but, at best, some weighted average of moral intuitions, often distorted by power, attention, and self-interest.

How much of human moral failure is motivational rather than epistemic? More than we tend to admit. We frequently know what the right thing is and fail to do it anyway – which means that even if we expect AI systems to be better moral reasoners, improved moral reasoning would leave unresolved whether a system is stably motivated by the moral considerations it recognises. We need AI to have the inner alignment of moral motivation.

And this raises another difficult question: is moral motivation necessarily tied to phenomenal experience – to there being something it is like to care? Or could AI be genuinely motivated by moral considerations without felt engagement? In essence, does motivation need to be grounded, and if so, can motivation be grounded without being felt?

The systemic stakes

Finally, there are second-order questions that rarely get asked – about what happens to us if AI gets this right.

Does sustained deference to AI moral judgement atrophy human moral reasoning and motivational capacity? And if so, what are the systemic risks of that atrophy – not just for individuals, but also for the collective processes through which moral knowledge has historically developed? (There is recent work on comparative moral Turing Tests that begins to take human moral enfeeblement seriously.3)

Moral progress for humans has never been a purely individual achievement. It has happened through argument, conflict, revision, and hard-won consensus across generations. A system that resolves moral questions faster than humans can engage with them might short-circuit the process traditionally responsible for producing moral progress.


I think asking these questions Socratically can help nudge the conversation into the open productively rather than letting it fester as an unexamined assumption. This line of questioning isn’t just intuition pump fodder, either: I think these questions are directly important to the project of AI alignment.4

From where I sit, the risk of staying silent outweighs the benefits of the taboo. I might be overlooking some of those benefits, but I’m concerned that in many ways we’re trading long-term safety for short-term comfort. I wonder if refusing to explore AI’s moral potential might leave us ill-equipped for the complexities that are likely coming our way. Are we humans fit for the future?5

Handled carelessly, this question causes damage. Left unasked, it causes more.

Footnotes

  1. The claim is easy to make sloppily. “More moral” conflates several things that need to be separated:
    a) Moral knowledge (knowing more facts relevant to ethics)
    b) Moral reasoning (drawing better inferences from values)
    c) Moral judgement (applying principles wisely in context)
    d) Moral motivation (actually being moved by moral considerations – which is one of the core focuses of my activism) ↩︎
  2. This was brought up in an interview with Nick Bostrom ↩︎
  3. See Eyal Aharoni’s and Danica Dillion’s work on Moral Turing Tests – presentations and interviews here, here and here. ↩︎
  4. The grounded values approach actually requires asking questions like:
    – What should AI align to? Is morality a cohesive alignment target? Is there a fact of the matter about moral improvement, or is “more moral” just “more aligned with our current intuitions”?
    – What would it mean to verify that an agent has better moral judgement than us, given that we’re the ones doing the evaluating? (see work done recently on comparative moral Turing Tests)
    – If humans are themselves imperfectly morally motivated, what does alignment to human preferences actually track?
    – How much of human moral failure is motivational versus epistemic?
    – Is moral motivation necessarily tied to phenomenal experience, or could a system be genuinely motivated by moral considerations without anything it’s like to be it?
    – Can motivation be grounded without being felt?
    – Does sustained deference to AI moral judgement atrophy human moral reasoning capacity, and what are the systemic risks of that atrophy – both for individuals and for the collective processes through which moral knowledge has historically developed? ↩︎
  5. Philosopher and bioethicist Julian Savulescu argues that humans are “unfit for the future” because our moral psychology has not evolved as quickly as our technological power. For instance: we essentially have hunter-gatherer brains operating in a world of nuclear weapons and global climate crises; we are naturally wired to care only about a “small circle” of family and friends, making us more or less indifferent to global suffering; we struggle to care about the interests of future generations or long-term threats like climate change; and technology has in many ways made it easier to kill than to save, so that today, or with further technological growth, a single psychopath could cause irreversible global destruction. ↩︎
