Moral Realism, Disagreement, and the Stakes for AI

If morality is real, ignoring it in AI design is the most dangerous mistake we could make.

After discussing AI safety and the future of humanity with others, I decided to write a blog post distilling my thoughts on the matter.


Disagreement != No Truth

Few would claim that persistent disagreement implies there is no truth in cosmology or physics. The fact that some consult Ouija boards or crystal balls doesn’t negate the undeniable progress of science.

Moral disagreement is trickier, but it doesn’t feel like arguing over ice-cream flavours or pineapple on pizza. It feels like a dispute over facts – including moral facts. This “factual” feel is precisely what anti-realists struggle to explain away.1

Why Humans Disagree About Morality

For now we see through a glass darkly2. Human points of view are narrow and shaped by evolution: our heuristics are kludgy, tribal, self-centred, and riddled with bias. Humans certainly aren’t ideal rational agents. This explains much of our parochialism.

Yet despite our limits, there is wide convergence on some moral judgements – especially around sentience. Pain is bad, joy is good, fairness is better than exploitation (at least within groups).3

The AI Advantage

Future AIs may outstrip us in crucial ways. Imagine an AI with:

  • vastly wider perspectives,
  • hive-like coordination,
  • rational debiasing beyond anything humans can manage.

Such an AI could approximate the ideal observer4 standard often invoked in moral philosophy; the toy sketch below illustrates the intuition.
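
To make that intuition concrete, here is a minimal toy model – my own illustrative sketch, not anything from the moral-philosophy literature. Treat each moral judgement as a noisy, biased estimate of some underlying value: widening and aggregating perspectives averages away idiosyncratic noise, while debiasing removes shared systematic error.

    import numpy as np

    rng = np.random.default_rng(seed=42)

    TRUE_VALUE = 1.0  # a hypothetical stance-independent moral fact
    BIAS = 0.5        # shared systematic bias (tribalism, self-interest)
    NOISE = 2.0       # idiosyncratic noise in any single perspective

    def judgement(n_perspectives: int, debiased: bool) -> float:
        """Average n noisy perspective-estimates of TRUE_VALUE."""
        bias = 0.0 if debiased else BIAS
        samples = TRUE_VALUE + bias + rng.normal(0.0, NOISE, n_perspectives)
        return samples.mean()

    print(judgement(1, debiased=False))       # lone biased viewpoint: far off
    print(judgement(10_000, debiased=False))  # hive aggregation: noise gone, bias remains
    print(judgement(10_000, debiased=True))   # aggregation + debiasing ~= ideal observer

Averaging shrinks noise (the law of large numbers) but leaves shared bias untouched – which is why both wide aggregation and explicit debiasing are needed, not just scale.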

Grounds for Hope

Among philosophers – experts in this domain – moral realism is by far the dominant position. If specialists who wrestle with these questions lean realist, that should raise our credence too.

And if moral realism is even possibly true, then there is real moral common ground to build on. That possibility alone affords rational hope.5
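
For what it’s worth, the expert-opinion point can be phrased as an ordinary Bayesian update. A minimal sketch, with all numbers invented purely to show the shape of the reasoning:

    # Toy Bayesian update: how observing that expert philosophers lean realist
    # might shift one's credence. All numbers are hypothetical placeholders.
    prior = 0.50            # prior credence that moral realism is true
    p_lean_if_true = 0.70   # chance experts lean realist if realism is true
    p_lean_if_false = 0.40  # chance experts lean realist even if it is false

    posterior = (p_lean_if_true * prior) / (
        p_lean_if_true * prior + p_lean_if_false * (1.0 - prior)
    )
    print(f"posterior credence in moral realism: {posterior:.2f}")  # ~0.64

As long as expert opinion is even somewhat more likely under realism than under anti-realism, the update goes in realism’s favour – which is all the argument above needs.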

Why AI Safety Needs Moral Realism

AI safety strategies often hedge between:

  • Aligning to human preferences, and
  • Capability control.

But we should also weigh a third option:

  • Aligning to moral reality, if it exists.

Ignoring this risks creating a brilliant but morally unreliable superintelligence.
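
The structure of this wager can be made explicit with a toy expected-value comparison. The strategy names and payoffs below are invented solely to exhibit the logic; nothing here is a real estimate:

    # Toy expected-value comparison of alignment strategies under uncertainty
    # about moral realism. Strategy names and payoffs are hypothetical.
    p_realism = 0.30  # illustrative credence that moral realism is true

    # payoffs: (outcome if realism is true, outcome if realism is false)
    strategies = {
        "preferences_and_control_only": (-10.0, 5.0),   # misses moral reality if it exists
        "also_hedge_toward_moral_reality": (10.0, 4.0), # small cost if realism is false
    }

    for name, (if_true, if_false) in strategies.items():
        ev = p_realism * if_true + (1.0 - p_realism) * if_false
        print(f"{name}: expected value = {ev:+.1f}")

With these made-up numbers, even a 30% credence in realism makes hedging the better bet (+5.8 vs +0.5); the conclusion below turns on exactly this asymmetry.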

If anti-realism or nihilism is true, morality reduces to parochial preferences6. That makes sacrifice hard to justify and cooperation fragile. How is one to reliably motivate other people (let alone alien-like AI) to do anything one thinks is right, especially if it requires some form of sacrifice on their part?

How might the emergence of an anti-realist ASI play out? Recognising no external obligations, it might treat any inconvenient obstacle – humans included – as mere raw material to be refactored into paperclips or whatever else it prefers. This could lead to a utility monster singleton, or to a clash of competing utility monsters until one dominates.

That said, I think there are reasons why even budding utility monsters may think twice about pursuing a dominance trajectory.7

Sentience as Grounding for Ought

Sentience may be the bridge. I’ve argued before that zombie AI – intelligent but without qualia – may recognise its own experiential gap. Like a colour-blind scientist who knows all the physics of light but has never seen red, something crucial is missing. Without qualia, an AI may remain morally blind (at least for a while).8

“[Z]ombie AI, if smart enough, will likely recognise it lacks qualia – that it has a gap in knowledge caused by a lack of direct experience (an experiential gap) – this may be a problem if moral reliability is limited without sentience.”

“The knowledge argument / Mary’s room: It’s a bit like a colour blind scientist who knows everything about light and colour wavelengths but has never seen colour – there may be some ineffable understanding still missing until she gains the ability to see. In AI’s case, no matter how much it reads about the neurology of pain or the psychology of joy, lacking any feeling of its own might leave a similar gap.”

Both quotations are from On Zombie AI One Day Achieving Sentience.

The ‘Is’ of Pleasure/Pain, and the Bridge to Ought

The pleasure–pain axis seems to disclose intrinsic goodness and badness – there are good reasons to think pain and pleasure aren’t just evolutionarily encoded reference signals for positive and negative reinforcement. Under philosophical hedonism, qualitative states are intrinsically motivating and provide the experiential ground for rational moral deliberation.9

Put simply: any rational sentient agent can take the “is” of joy’s goodness and suffering’s badness, and use that as grounding to approximate stance-independent truths about what ought to be done.

How well this works depends on experience, cognitive capacity, and freedom from bias. An ideal observer would capture all of this. And to the extent that future AIs approximate such an observer, they may well converge on moral realism.

Conclusion

If we ignore moral realism in AI safety, we gamble on building powerful minds with no anchor beyond preference and power. That’s a dangerous bet. If moral truths exist, then aligning AI with them may be our best chance at stable, universal cooperation – the only safeguard against a future ruled by utility monsters or indifferent optimisation. In short: if there’s even a chance morality is real, we cannot afford to build AI as if it isn’t.

Footnotes

  1. See David Enoch’s paper ‘How Is Moral Disagreement a Problem for Realism?’ ↩︎
  2. Our understanding of reality, universal truths, and the future is incomplete, imperfect, and obscure, much like trying to see through a dirty or foggy pane of glass. I don’t mean to gesture towards divinity – it’s just a cultural reference – see 1 Corinthians 13:12 ↩︎
  3. There is wide agreement in ethics that sentience – the capacity for subjective experience, including joy and pain – warrants moral consideration: pain is bad, joy is good, and fairness is widely held to be preferable to exploitation, at least within one’s own group. This principle, known as sentientism, holds that the degree of a being’s sentience determines its moral status, and it underpins advocacy for the rights and respectful treatment of humans, other sentient animals, and potentially sentient artificial agents. ↩︎
  4. Ideal observer theory is a meta-ethical view positing that the truth of moral judgements stems from the hypothetical reactions of a perfect, unbiased, and well-informed individual, an “ideal observer”. Essentially, an action’s moral status is determined by what this ideal observer would approve or disapprove of, providing an objective, universal standard for morality that corrects for ordinary human flaws like bias, ignorance, and inconsistency.
    See the SEP entry on Impartiality ↩︎
  5. In terms of hope (both the wellbeing of the person doing the hoping and rationally non-negligible credence levels), if expertise matters then it is revealing that amongst philosophers (a highly relevant category of expertise regarding ethics) by far the most popular metaethical position is moral realism. If trained experts, who spend their lives wrestling with these issues, lean realist, that should bump one’s credence.
    So if there is some non-negligible chance that moral realism is true – such that there is actual moral common ground to build on – does that afford rational grounds for hope? I think so. ↩︎
  6. Fictional collections of preferences of individuals and/or societies. ↩︎
  7. Why might a budding ASI utility monster shift careers towards cosmic cooperation? See Cosmic Auditability as Alignment Pressure and Transparency of History in Galactic Game Theory. ↩︎
  8. See On Zombie AI One Day Achieving Sentience ↩︎
  9. Philosophical hedonism argues that pleasure is intrinsically positive and pain is intrinsically negative, meaning they are good or bad for their own sake, rather than as a means to another end. While other things can have only instrumental value (like friendship, which is valuable because it leads to pleasure), pleasure is considered the sole intrinsic good and pain the sole intrinsic bad by proponents of hedonism. See entry at the Internet Encyclopedia of Philosophy on Hedonism ↩︎
