AI Alignment to Higher Values, not Human Values
Homo sapiens means 'wise man' – given the current human-caused precarious state of affairs, this self-description seems a bit of a reach. Human values in aggregate aren't coherent, and of those that are, not all are really that great (especially if our revealed preferences hint at what they are).
Connor Leahy, who works in AI safety, says:
“Can there be such a thing that aligns to human values or at least doesn’t have its own goals or whatever?… I think physically there’s no reason to believe it’s not possible… the more interesting question is ‘in practice will we build one?’ and currently the answer is absolutely not… given the current political situation, the current market situation, there is an exactly zero percent chance we will find that answer.”
Zero percent chance sounds serious! Is this realistic? It wouldn’t be easy to shoot an arrow into an incoherent bullseye.
It’s difficult to get people to agree on which values are best, which makes it hard to convince everyone that any one value system is most right. Iason Gabriel argues convincingly that we should align AI to some kind of ethical plurality; I kind of agree, at least as an early motivational scaffold.
Do we want an AI that perfectly aligns to human values? If so, we should be careful what we wish for. The pertinent question isn’t so much what we desire*, but what we need – what’s actually best for us.
My take:
Human values are complex (see Eliezer Yudkowsky on the complexity of value); we disagree on what they are, and many of them conflict with each other. It may not be good to align AI to human values (let alone each and every one of them in all their incoherent glory).
If:
a) there is zero to low chance of aligning AI to the aggregate of human values
…and
b) there are objectively (or stance-independently) higher values
…then:
c) some of the better angels of human values may also track some of those objectively higher values
…and
d) we should seek to understand, align to, and implement the higher values we don’t currently have
I’ve argued elsewhere that we should wisely motivate a contained Oracle AI to increase its own understanding of higher values, then help us understand them, and ultimately implement them. I’m partial to moral realism – but whatever value rightness ends up being, we should align AI to it – and for that matter, if we are ethically serious, so should we. Nick Bostrom calls this approach ‘indirect normativity’, which he describes in chapter 13 of his book Superintelligence.
A few claims about values, stated explicitly:
- values are measurable and can be verified
- values aren’t arbitrary – like engineering principles, which are constrained by the laws of physics
- it’s highly likely there are values we haven’t discovered yet
- relativism isn’t true, but its purported virtues can be rolled up into consequentialism
If some values are complex, difficult to cohere, not arbitrary, and generally hard to get right, then indirect normativity seems like a good path to an AI that would arrive at, and pursue, objectively better values.
Our current epistemics may be enough to get us to an understanding of better values, or perhaps our epistemics themselves need improving before we can fully grasp and select for the best of what exists in value possibility space.
Connor Leahy later says: “…an aligned ASI superintelligence… something called a sovereign… or an angelic-type system, would be a being of incredible intelligence and benevolence – it would be one that fundamentally would help humanity, despite itself, towards a kind of goodness that even humanity itself does not embody. This would be the closest that we have to the concept of an angel – this is the closest thing we have in our vocabulary to describe what this being would be like… depending on whether you’re Abrahamic or non-Abrahamic… there are no non-religious terms that accurately describe what this being would be like.”
Appealing to religious exemplifications of goodness can feel transcendent and viscerally compelling – there is a lot of history and story embedded in our culture around religious motifs… though, as Stephen Jay Gould says, they exist in a separate magisterium to science and rational discourse, and so aren’t amenable to epistemology and empiricism – our most accurate and tractable means of understanding the world. This is where the study of value really should be grounded – not in some purportedly separate realm that we can’t see and verify.
Value space is probably incomprehensibly large, and some of our existing values are likely among the higher values within it.
Footnotes
- As part of the Sydney Dialogue Summit Sessions, David Wroe sits down with Connor Leahy – YouTube video here.
- What we desire may not be what’s best for us. See the SEP article on desire.
- Stephen Jay Gould’s concept of non-overlapping magisteria (NOMA) is a model for the relationship between science and religion that proposes that the two fields have separate domains of teaching authority, or “magisteria”, and do not overlap. Gould believed that science focuses on the empirical world, while religion addresses values like morality. He argued that conflict between the two fields can only occur if their domains overlap, which he believed they do not.