Tyranny Multipliers and Moral Machines
In an email exchange with Jamais Cascio1 we both expressed huge concern about US foreign aid programmes being frozen/dismantled and, in at least some cases, stored emergency food expiring and being destroyed rather than reaching the people who needed it.2 Should this be my concern, living in Australia? When America coughs, the rest of the world catches a cold – and it’s a terrible state of affairs.
That whole episode also underlines a darker, more general point. Jamais said something that rang true: norms and rules only matter if they’re enforced. When enforcement becomes optional (or selectively applied), governance spins into moral cosplay.
This is somewhat controversial, I think, but the issue needs careful consideration: I’ve been exploring scenarios where machines could genuinely be more moral than us. That’s not to say that the current crop of AIs is more moral, or that future AI will be by default – I want to make it very clear that I’m not suggesting deference or ceding power to chatbots.3
But I keep hearing “control AI” and “align AI to humans” as though that’s obviously the one true safe end-state. It isn’t. There are plenty of humans I’d rather not see holding ultimate control over powerful AI, and there are plenty of “human preferences” I’d rather not have amplified into the infrastructure of the future.
Why machines might outperform us morally (in principle)
If machines become reliably more capable reasoners, this could afford more capable moral reasoning.
If they were less hindered by (some) biases – or even just had different biases that are easier to audit and correct – that could mean less clouded judgement.
If they had more capable empirical abilities with wider reach, that could mean a keener understanding of morally relevant features: phenomenology (valence and affective states), psychological stress and trauma, coercion and agency, social configurations that promote or destroy trust – all useful for promoting welfare (suffering reduction, happiness promotion, fairness, cooperative behaviour, and yes, even novelty/interestingness).
And if they could think more deeply and further ahead – capable of doing “idealised reflection” better than we can – then over the long term you could imagine a kind of recursive realignment: values getting refined as the landscape of value becomes clearer through reflection and a better ontic grip on the world.
The catch: capability doesn’t entail caring
The catch is obvious: capability doesn’t entail benevolence. Intelligence doesn’t automatically produce compassion, restraint, or even basic non-cruelty.
At this stage in history AI can come across as powerful and can perform well on moral Turing tests.4 But it’s also unreliable, hallucinates, and doesn’t appear to have agency that is robust by human standards. Whether it has any phenomenology is unknown – and at minimum it lacks the kind of symbol/conceptual grounding and lived embeddedness humans have. Still, it seems more than reasonable to entertain future scenarios where at least some of these weaknesses get ironed out (better grounding, tighter feedback, less confabulation, better long-horizon coherence).
Tool-AI in the wrong hands is a tyranny multiplier
Here’s where this loops back to Jamais’ words: “norms only matter if enforced”.
If we end up with super-powerful tool-AI, it may become a tragically dangerous instrument in the hands of fascist dictators (or any selfish actor unconstrained by enforcement). In that case, align it to which humans, exactly?
In that world, I find myself preferring the idea of caring, sovereign moral machines (or at least systems embedded in reliably enforceable pro-social constraints) capable of forging a better future than we can – and somewhat selfishly, I hope humanity can be a part of it.5,6,7
That may require indirect alignment between machines and us: aligning to actually good values/principles (different from directly aligning AI to human preferences/volition). And yes, that nudges you towards something like moral realism – or at least towards the idea that there are pragmatic, stance-independent constraints and attractors/detractors in value-space that agents can converge on.
One example I think is hard to wriggle out of: qualitative agony is immanently repellent and bliss immanently attractive. I hope there’d be collective agreement on that – but I also suspect it would remain the case even if there wasn’t.
Species dominance vs value dominance
Hugo de Garis and Ben Goertzel will debate species dominance at the upcoming Future Day event. The species dominance debate has received a fair bit of attention – plus Hugo is an intelligent and charismatic character. But I think it’s the wrong framing: the crux of the issue is less about species vs species than about value dominance – what values end up steering the future, and by what selection mechanism.
My hope is that the steering comes via careful value and norm selection (Bostrom-style indirect normativity) rather than selection via near-term brute fitness/power – because “might makes right” is not a moral theory8, it’s a threat: it is generally regarded as a cynical rejection of traditional morality, or a form of moral nihilism, rather than a constructive moral theory.
Fin?
p.s. I’m trying to maintain focus and push a book out on the subject this year.
Footnotes
- Jamais Cascio is a well-known futurist. He blogs at OpenTheFuture.com ↩︎
- The food wasn’t merely “destroyed instead of given”; reports describe large quantities stranded and at risk of expiry, and a specific case of high-energy biscuits expiring in storage with plans for destruction (with some reportedly saved after lobbying). See here. ↩︎
- Though I am continually surprised at how good the latest LLMs are at solving problems, despite their inconsistency, sycophancy and hallucinations (which I try to quell in my system prompts). ↩︎
- See the interview with Eyal Aharoni – ‘AI Outscored Humans in a Blinded Moral Turing Test – Should We Be Worried?’ – and there is another upcoming interview on the subject of Moral Turing Tests with Danica Dillion. ↩︎
- If we were to get reliably robust enforcement of adequate constraints on tool-AI and its use (and superintelligence didn’t look like it was on the horizon), I’d prefer tool-AI + institutions over a sovereign AI. If we can’t, the sovereign option becomes less insane. ↩︎
- I’d also argue that we should be aiming for an AI that would be a caring, sovereign moral machine if it got loose – as others have argued, past a certain threshold that is difficult to predict, superintelligent AI will likely be uncontainable and uncontrollable. ↩︎
- What might a minimum bar for moral machines look like? It’s by no means a solved problem, but here are some starting ideas: demonstrated non-confabulatory grounding; stable long-horizon consistency; adversarially tested welfare modelling; corrigibility under oversight; resistance to capture by any single regime; and a verifiable commitment to non-cruelty constraints. I interviewed Colin Allen, co-author of ‘Moral Machines’ – a conversation that provided some fruitful discussion on the subject (video here). ↩︎
- “Might makes right” asserts that those with superior power define, enforce, and determine what is “right” or “just” – that power, rather than justice or ethics, dictates morality. The stance is often associated with Nietzsche’s idea of “will to power”, and is sometimes seen by philosophers as the absence of moral consideration, or a “state of nature” where power is the only determining factor. ↩︎