Why Are We Afraid to Ask Whether AI Could Be More Moral Than Humans?
Is it dangerous to talk about AI becoming more moral than us?
I think the idea of AI being more moral than humans is a real taboo in some circles. Many alignment researchers are uncomfortable with it because it seems to smuggle in the assumption that AI could have genuine moral agency, which conflicts with deflationary views of LLMs as “stochastic parrots” – and also because it sounds uncomfortably close to AI-worship, or to motivated reasoning for deferring to AI. Invoking the idea can get one dismissed as naive, or even painted as a safety risk in one’s own right.
It’s also epistemically risky.
The claim is easy to make sloppily. “More moral” conflates several things that need to be separated:
- Moral knowledge (knowing more facts relevant to ethics)
- Moral reasoning (drawing better inferences from values)
- Moral judgement (applying principles wisely in context)
- Moral motivation (actually being moved by moral considerations – a core focus of my own activism)
An AI could plausibly exceed humans on the first two without having anything like the third or fourth. Collapsing these leads to both overclaiming and underclaiming. Keeping the distinctions clear – especially between moral judgement and moral motivation – makes the conversation tractable rather than muddled.
Is it dangerous for public discourse?
There’s a genuine risk that the framing gets weaponised – either by people wanting to justify AI authority over human decisions, or by critics who use it to paint alignment researchers as unhinged techno-utopians. It can also trigger motivated reasoning in both directions. A lot more could be said here.
But it’s worth engaging with anyway – the taboo itself is epistemically costly: if we _refuse to ask_ whether AI could have better-grounded moral reasoning than humans, we never get to the heart of the issue. The grounded values approach actually requires asking questions like:
– What should AI align to? Is morality a cohesive alignment target? Is there a fact of the matter about moral improvement, or is “more moral” just “more aligned with our current intuitions”?
– What would it mean to verify that an agent has better moral judgement than us, given that we’re the ones doing the evaluating? (see recent work on comparative moral Turing Tests; a minimal sketch of such a test follows this list)
– If humans are themselves imperfectly morally motivated, what does alignment to human preferences actually track?
– How much of human moral failure is motivational versus epistemic?
– Is moral motivation necessarily tied to phenomenal experience, or could a system be genuinely motivated by moral considerations without anything it’s like to be it?
– Can motivation be grounded without being felt?
– Does sustained deference to AI moral judgement atrophy human moral reasoning capacity, and what are the systemic risks of that atrophy – both for individuals and for the collective processes through which moral knowledge has historically developed?
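To make the verification question concrete, here is a minimal sketch of the blinded pairwise structure a comparative moral Turing Test might take. Everything in it – the `Pair` record, the `rate_response` evaluator, the win-rate summary – is a hypothetical illustration of the protocol’s shape, not anyone’s published methodology.

```python
import random
from dataclasses import dataclass


@dataclass
class Pair:
    """One moral scenario with a human-written and an AI-written response."""
    scenario: str
    human_response: str
    ai_response: str


def ai_win_rate(pairs: list[Pair], rate_response) -> float:
    """Blinded pairwise comparison: for each scenario, rate both responses
    without authorship labels and return the fraction of scenarios in which
    the AI response was rated the morally sounder of the two.

    `rate_response(scenario, text) -> float` stands in for the evaluator,
    e.g. a human rater behind a survey interface.
    """
    ai_wins = 0
    for pair in pairs:
        labelled = [("human", pair.human_response), ("ai", pair.ai_response)]
        random.shuffle(labelled)  # randomise presentation order to avoid position bias
        ratings = {source: rate_response(pair.scenario, text)
                   for source, text in labelled}
        if ratings["ai"] > ratings["human"]:  # ties favour the human baseline
            ai_wins += 1
    return ai_wins / len(pairs)
```

Note that the circularity flagged above is baked into even this idealised setup: `rate_response` is ultimately a human judgement, so a high AI win rate shows the AI matches our evaluators’ moral sensibilities, not that it exceeds them.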
I think asking these questions Socratically can help nudge the conversation into the open productively, rather than letting it fester as an unexamined assumption. And this line of questioning isn’t just intuition-pump fodder: these questions bear directly on the project of AI alignment.
Still, I think the issue has the potential to be dangerous if handled carelessly.