Ethics, Qualia Research & AI Safety with Mike Johnson
What’s the relationship between valence research and AI ethics?
Hedonic valence is a measure of the quality of our felt sense of experience: the intrinsic goodness (positive valence) or averseness (negative valence) of an event, object, or situation. It is an important aspect of conscious experience, one that is always present in our waking lives. If we seek to understand ourselves, it makes sense to seek to understand how valence works – how to measure it and test for it.
Also, might there be a relationship to the AI safety/friendliness problem?
In this interview, we cover a lot of things, not least… THE SINGULARITY (of course) & the importance of Valence Research to AI Friendliness Research (as detailed here). Will thinking machines require experience with valence to understand its importance?
Here we cover some general questions about Mike Johnson’s views: recent advances in science and technology and which he sees as the most impactful, which world views are ready to be retired, and his views on XRisk and AI Safety – especially as they relate to value theory.
This is one part of an interview series with Mike Johnson (another section covers Consciousness, Qualia, Valence & Intelligence).
Adam Ford: Welcome Mike Johnson, many thanks for doing this interview. Can we start with your background?
Mike Johnson: My formal background is in epistemology and philosophy of science: what do we know & how do we know it, what separates good theories from bad ones, and so on. Prior to researching qualia, I did work in information security, algorithmic trading, and human augmentation research.
Adam: What is the most exciting / interesting recent (scientific/engineering) news? Why is it important to you?
Mike: CRISPR is definitely up there! In a few short years precision genetic engineering has gone from a pipe dream to reality. The problem is that we’re like the proverbial dog that caught up to the car it was chasing: what do we do now? Increasingly, we can change our genome, but we have no idea how we should change our genome, and the public discussion about this seems very muddled. The same could be said about breakthroughs in AI.
Adam: What are the most important discoveries/inventions over the last 500 years?
Mike: Tough question. Darwin’s theory of Natural Selection, Newton’s theory of gravity, Faraday & Maxwell’s theory of electricity, and the many discoveries of modern physics would all make the cut. Perhaps also the germ theory of disease. In general what makes discoveries & inventions important is when they lead to a productive new way of looking at the world.
Adam: What philosophical/scientific ideas are ready to be retired? What theories of valence are ready to be relegated to the dustbin of history? (Why are they still in currency? Why are they in need of being thrown away or revised?)
Mike: I think that 99% of the time when someone uses the term “pleasure neurochemicals” or “hedonic brain regions” it obscures more than it explains. We know that opioids & activity in the nucleus accumbens are correlated with pleasure– but we don’t know why, we don’t know the causal mechanism. So it can be useful shorthand to call these things “pleasure neurochemicals” and whatnot, but every single time anyone does that, there should be a footnote that we fundamentally don’t know the causal story here, and this abstraction may ‘leak’ in unexpected ways.
Adam: What have you changed your mind about?
Mike: Whether pushing toward the Singularity is unequivocally a good idea. I read Kurzweil’s The Singularity is Near back in 2005 and loved it – it made me realize that all my life I’d been a transhumanist and didn’t know it. But twelve years later, I’m a lot less optimistic about Kurzweil’s rosy vision. Value is fragile, and there are a lot more ways things could go wrong than ways they could go well.
Adam: I remember reading Eliezer’s writings on ‘The Fragility of Value’; it’s quite interesting and worth consideration – the idea that if we don’t get an AI’s value system exactly right, then it would be like pulling a random mind out of mindspace – most likely inimical to human interests. The writing did seem quite abstract, though, and it would be nice to see a formal model or something concrete showing this would be the case. I’d really like to know how and why value is as fragile as Eliezer makes out. Is there any convincing, crisply defined model supporting this thesis?
Mike: Whether the ‘Complexity of Value Thesis’ is correct is super important. Essentially, the idea is that we can think of what humans find valuable as a tiny location in a very large, very high-dimensional space– let’s say 1000 dimensions for the sake of argument. Under this framework, value is very fragile; if we move a little bit in any one of these 1000 dimensions, we leave this special zone and get a future that doesn’t match our preferences, desires, and goals. In a word, we get something worthless (to us). This is perhaps most succinctly put by Eliezer in “Value is fragile”:
“If you loose the grip of human morals and metamorals – the result is not mysterious and alien and beautiful by the standards of human value. It is moral noise, a universe tiled with paperclips. To change away from human morals in the direction of improvement rather than entropy, requires a criterion of improvement; and that criterion would be physically represented in our brains, and our brains alone. … You want a wonderful and mysterious universe? That’s your value. … Valuable things appear because a goal system that values them takes action to create them. … if our values that prefer it are physically obliterated – or even disturbed in the wrong dimension. Then there is nothing left in the universe that works to make the universe valuable.”
If this frame is right, then it’s going to be really, really, really hard to get AGI right, because one wrong step in programming will make the AGI depart from human values, and “there will be nothing left to want to bring it back.” Eliezer, and I think most of the AI safety community, assumes this.
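A minimal toy sketch of the geometric intuition (illustrative only – the hypercube below is an arbitrary stand-in for “human values”, not a claim about the real structure of value space): if valuable futures occupy a narrow band along each of d dimensions, then small independent errors along every dimension almost surely push a future out of the valuable region as d grows.

```python
# Toy model: "valuable" futures are the hypercube [0.45, 0.55]^d inside [0, 1]^d.
# Start at the centre of the valuable region, perturb every dimension by a
# small error, and count how often the result is still valuable.
import random

def fraction_still_valuable(d, noise=0.1, trials=10_000):
    hits = 0
    for _ in range(trials):
        point = [0.5 + random.uniform(-noise, noise) for _ in range(d)]
        if all(0.45 <= x <= 0.55 for x in point):
            hits += 1
    return hits / trials

for d in (1, 10, 50, 100):
    print(d, fraction_still_valuable(d))
# Each dimension survives the perturbation with probability ~0.5, so the
# overall survival rate falls like 0.5**d: effectively zero long before
# the 1000 dimensions mentioned above.
```

Whether human value actually has this brittle geometry is, of course, exactly the question at issue.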
But– and I want to shout this from the rooftops– the complexity of value thesis is just a thesis! Nobody knows if it’s true. An alternative here would be, instead of trying to look at value in terms of goals and preferences, we look at it in terms of properties of phenomenological experience. This leads to what I call the Unity of Value Thesis, where all the different manifestations of valuable things end up as special cases of a more general, unifying principle (emotional valence). What we know from neuroscience seems to support this: Berridge and Kringelbach write about how “The available evidence suggests that brain mechanisms involved in fundamental pleasures (food and sexual pleasures) overlap with those for higher-order pleasures (for example, monetary, artistic, musical, altruistic, and transcendent pleasures).” My colleague Andres Gomez Emilsson writes about this in The Tyranny of the Intentional Object. Anyway, if this is right, then the AI safety community could approach the Value Problem and Value Loading Problem much differently.
Adam: I’m also interested in the nature of possible attractors that agents might ‘extropically’ gravitate towards (like a thirst for useful and interesting novelty, generative and non-regressive, that might not neatly fit categorically under ‘happiness’) – I’m not wholly convinced that they exist, but if one leans away from moral relativism, it makes sense that a superintelligence may be able to discover or extrapolate facts from all physical systems in the universe, not just humans, to determine valuable futures and avoid malignant failure modes (Coherent Extrapolated Value if you will). Being strongly locked into optimizing human values may be a non-malignant failure mode.
Mike: What you write reminds me of Schmidhuber’s notion of a ‘compression drive’: we’re drawn to interesting things because getting exposed to them helps build our ‘compression library’ and lets us predict the world better (a toy sketch follows at the end of this answer). But this feels like an instrumental goal, a “Basic AI Drives” sort of thing. I would definitely agree that there’s a danger of getting locked into a good-yet-not-great local optimum if we hard-optimize on current human values.
Probably the danger is larger than that too– as Eric Schwitzgebel notes,
“Common sense is incoherent in matters of metaphysics. There’s no way to develop an ambitious, broad-ranging, self-consistent metaphysical system without doing serious violence to common sense somewhere. It’s just impossible. Since common sense is an inconsistent system, you can’t respect it all. Every metaphysician will have to violate it somewhere.”
If we lock in human values based on common sense, we’re basically committing to following an inconsistent formal system. I don’t think most people realize how badly that will fail.
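A rough toy sketch of the ‘compression drive’ mentioned above (a simplified gloss on Schmidhuber’s compression-progress idea, not his exact formulation or anything stated in the interview): score an observation by how many bits a small learned model saves on it once it has been allowed to learn from it. The bigram counter below is a deliberately crude stand-in for the ‘compression library’.

```python
# Intrinsic "interestingness" as compression progress: bits saved on an
# observation after the model has learned from it. The bigram counter is a
# crude stand-in for an agent's learned world-model / compression library.
import math
from collections import defaultdict

class BigramModel:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, text):
        for a, b in zip(text, text[1:]):
            self.counts[a][b] += 1

    def code_length(self, text):
        """Bits needed to encode text under a Laplace-smoothed bigram model
        (a 256-symbol alphabet is assumed for smoothing)."""
        bits = 0.0
        for a, b in zip(text, text[1:]):
            total = sum(self.counts[a].values()) + 256
            bits += -math.log2((self.counts[a][b] + 1) / total)
        return bits

def compression_progress(history, observation):
    before = BigramModel()
    before.update(history)
    after = BigramModel()
    after.update(history + observation)
    return before.code_length(observation) - after.code_length(observation)

history = "abab" * 50
print(compression_progress(history, "cdcd" * 10))  # new learnable pattern: large progress (~166 bits)
print(compression_progress(history, "abab" * 10))  # pattern already known: small progress (~7 bits)
```

The exact numbers don’t matter; the point is that learnable new structure scores high while already-predicted structure scores near zero, which is the sense in which such a drive pulls an agent toward ‘interesting’ things.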
Adam: What invention or idea will change everything?
Mike: A device that allows people to explore the space of all possible qualia in a systematic way. Right now, we do a lot of weird things to experience interesting qualia: we drink fermented liquids, smoke various plant extracts, strap ourselves into rollercoasters, and parachute out of planes, to give just a few examples. But these are very haphazard ways to experience new qualia! When we’re able to ‘domesticate’ and ‘technologize’ qualia, like we’ve done with electricity, we’ll be living in a new (and, I think, incredibly exciting) world.
Adam: What are you most concerned about? What ought we be worrying about?
Mike: I’m worried about AI safety, and that society’s ability to coordinate on hard things seems to be breaking down. Similarly, I’m also worried about what Eliezer Yudkowsky calls ‘Moore’s Law of Mad Science’, that steady technological progress means that ‘every eighteen months the minimum IQ necessary to destroy the world drops by one point’. But I think some very smart people are worrying about these things, and are trying to address them.
In contrast, almost no one is worrying that we don’t have good theories of qualia & valence. And I think we really, really ought to, because they’re upstream of a lot of important things, and right now they’re “unknown unknowns” – we don’t know what we don’t know about them.
One failure case that I worry about is that we could trade away what makes life worth living in return for some minor competitive advantage. As Bostrom notes in Superintelligence,
“When it becomes possible to build architectures that could not be implemented well on biological neural networks, new design space opens up; and the global optima in this extended space need not resemble familiar types of mentality. Human-like cognitive organizations would then lack a niche in a competitive post-transition economy or ecosystem. We could thus imagine, as an extreme case, a technologically highly advanced society, containing many complex structures, some of them far more intricate and intelligent than anything that exists on the planet today – a society which nevertheless lacks any type of being that is conscious or whose welfare has moral significance. In a sense, this would be an uninhabited society. It would be a society of economic miracles and technological awesomeness, with nobody there to benefit. A Disneyland with no children.”
Now, if we don’t know how qualia works, I think this is the default case. Our future could easily be a technological wonderland, but with very little subjective experience. “A Disneyland with no children,” as Bostrom quips.
Adam: How would you describe your ethical views? What are your thoughts on the relative importance of happiness vs. suffering? Do things besides valence have intrinsic moral importance?
Mike: Good question. First, I’d just like to comment that Principia Qualia is a descriptive document; it doesn’t make any normative claims.
I think the core question in ethics is whether there are elegant ethical principles to be discovered, or not. Whether we can find some sort of simple description or efficient compression scheme for ethics, or if ethics is irreducibly complex & inconsistent.
The most efficient compression scheme I can find for ethics, that seems to explain very much with very little, and besides that seems intuitively plausible, is the following:
- Strictly speaking, conscious experience is necessary for intrinsic moral significance. I.e., I care about what happens to dogs, because I think they’re conscious; I don’t care about what happens to paperclips, because I don’t think they are.
- Some conscious experiences do feel better than others, and all else being equal, pleasant experiences have more value than unpleasant experiences.
Beyond this, though, I think things get very speculative. Is valence the only thing that has intrinsic moral importance? I don’t know. On one hand, this sounds like a bad moral theory, one which is low-status, has lots of failure modes, and doesn’t match all our intuitions. On the other hand, all other systematic approaches seem even worse. And if we can explain the value of most things in terms of valence, then Occam’s Razor suggests that we should put extra effort into explaining everything in those terms, since it’d be a lot more elegant. So – I don’t know that valence is the arbiter of all value, and I think we should be actively looking for other options, but I am open to it. That said, I strongly believe that we should avoid premature optimization, and we should prioritize figuring out the details of consciousness & valence (i.e. we should prioritize research over advocacy).
Re: the relative importance of happiness vs suffering, it’s hard to say much at this point, but I’d expect that if we can move valence research to a more formal basis, there will be an implicit answer to this embedded in the mathematics.
Perhaps the clearest and most important ethical view I have is that ethics must ultimately “compile” to physics. What we value and what we disvalue must ultimately cash out in terms of particle arrangements & dynamics, because these are the only things we can actually change. And so if people are doing ethics without caring about making their theories cash out in physical terms, they’re not actually doing ethics- they’re doing art, or social signaling, or something which can serve as the inspiration for a future ethics.
The analogy I’d offer here is that we can think about our universe as a computer, and ethics as choosing a program to run on this computer. Unfortunately, most ethicists aren’t writing machine-code, or even thinking about things in ways that could be easily translated to machine-code. Instead, they’re writing poetry about the sorts of programs that might be nice to run. But you can’t compile poetry to machine-code! So I hope the field of ethics becomes more physics-savvy and quantitative (although I’m not optimistic this will happen quickly).
Eliezer Yudkowsky refers to something similar with his notions of “AI grade philosophy”, “compilable philosophy”, and “computable ethics”, though I don’t think he quite goes far enough (i.e., all the way to physics).
Adam: What excites you? What do you think we have reason to be optimistic about?
Mike: The potential of qualia research to actually make people’s lives better in concrete, meaningful ways. Medicine’s approaches to pain management and the treatment of affective disorders are stuck in the dark ages because we don’t know what pain is. We don’t know why some mental states hurt. If we can figure that out, we can almost immediately help a lot of people, and probably unlock a surprising amount of human potential as well. What does the world look like with sane, scientific, effective treatments for pain & depression & akrasia? I think it’ll look amazing.
Adam: If you were to take a stab at forecasting the Intelligence Explosion – in what timeframe do you think it might happen (confidence intervals allowed)?
Mike: I don’t see any intractable technical hurdles to an Intelligence Explosion: the general attitude in AI circles seems to be that progress is actually happening a lot more quickly than expected, and that getting to human-level AGI is less a matter of finding some fundamental breakthrough, and more a matter of refining and connecting all the stuff we already know how to do.
The real unknown, I think, is the socio-political side of things. AI research depends on a stable, prosperous society able to support it and willing to ‘roll the dice’ on a good outcome, and peering into the future, I’m not sure we can take this as a given. My predictions for an Intelligence Explosion:
- Between ~2035-2045 if we just extrapolate research trends within the current system;
- Between ~2080-2100 if major socio-political disruptions happen but we stabilize without too much collateral damage (e.g., non-nuclear war, drawn-out social conflict);
- If it doesn’t happen by 2100, it probably implies a fundamental shift in our ability or desire to create an Intelligence Explosion, and so it might take hundreds of years (or never happen).
If a tree falls in the forest and no one is around to hear it, does it make a sound? It would be unfortunate if a whole lot of awesome stuff were to happen with no one around to experience it.
Mike Johnson is a philosopher living in the Bay Area, writing about mind, complexity theory, and formalization. He is Co-founder of the Qualia Research Institute. Much of Mike’s research and writings can be found at the Open Theory website.
‘Principia Qualia’ is Mike’s magnum opus – a blueprint for building a new Science of Qualia. Click here for the full version, or here for an executive summary.
If you like Mike’s work, consider helping fund it at Patreon.