Stuart Russell – AI Ethics – Provably Beneficial Artificial Intelligence
Delighted to have Stuart Russell on video discussing the importance of AI Alignment – achieving friendly Strong AI that is provably beneficial.
Points of discussion:
A clash of intuitions about the beneficiality of Strong Artificial Intelligence
- A clash of intuitions: Alan Turing raised the concern that if we were to build an AI smarter than we are, we might not be happy about the results – while there is a general assumption amongst AI developers that building smarter-than-human AI would be a good thing.
- But there is no guarantee that the objectives of a superintelligent AI won’t be inimical to our values – so we need to solve what some people call the value alignment problem.
- we as humans learn values in conjunction with learning about the world
The Value Alignment problem
Basic AI Drives: Any objective generates sub-goals
- Designing an AI that does not want to disable its off switch
- Two principles:
- 1) its only objective is to maximise your reward function (this is not an objective programmed into the machine, but a kind of unobserved latent variable)
- 2) the machine has to be explicitly uncertain about what that objective is
- if the robot thinks it knows what your objective function is, then it won’t believe its actions could make you unhappy, and therefore has an incentive to disable its off switch
- the robot will only want to be switched off if it thinks it is making you unhappy (a toy numeric sketch of this reasoning follows after this list)
- How will machines do what humans want if they can’t observe humans’ objective functions?
- one answer is to allow the machines to observe human behaviour, and to interpret that behaviour as evidence of an underlying preference structure – inverse reinforcement learning (see the second sketch below)
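A minimal numeric sketch of the off-switch reasoning above. The numbers and the human model are invented for illustration (the human is assumed to know the true utility u of the robot’s proposed action and to press the off switch whenever u < 0); this is only a sketch of the two principles, not Russell’s formal model:

```python
import numpy as np

rng = np.random.default_rng(0)

# The robot proposes an action whose true utility to the human, u, is hidden.
# Assumption: the human knows u and presses the off switch whenever u < 0.

def value_of_deferring(belief):
    # Wait for the human: the action proceeds when u >= 0,
    # otherwise the robot is switched off (utility 0).
    return np.mean(np.maximum(belief, 0.0))

def value_of_acting(belief):
    # Disable the off switch and act regardless of what the human wants.
    return np.mean(belief)

# Principle 2 in action: an uncertain robot holds a wide belief over u,
# while an (over)confident robot collapses onto its point estimate.
uncertain_belief = rng.normal(loc=0.1, scale=1.0, size=100_000)
certain_belief = np.full(100_000, 0.1)

for name, belief in [("uncertain", uncertain_belief), ("certain", certain_belief)]:
    defer, act = value_of_deferring(belief), value_of_acting(belief)
    # The incentive to keep the off switch enabled is the value of deferring
    # minus the value of just acting: zero when certain, positive when uncertain.
    print(f"{name}: incentive to keep the off switch = {defer - act:.3f}")
```

The confident robot gains exactly nothing from leaving the switch enabled, which is why certainty about the objective removes its incentive to tolerate being switched off.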
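And a toy sketch of inverse reinforcement learning, with the same caveat: the Boltzmann-rational choice model, the features, and the numbers are all illustrative assumptions. Hidden preference weights are inferred from observed choices rather than programmed in:

```python
import numpy as np

rng = np.random.default_rng(1)

# Each action is described by two features, e.g. (speed, safety).
features = np.array([[1.0, 0.0],    # fast but risky
                     [0.0, 1.0],    # slow but safe
                     [0.9, 0.9]])   # good on both, slightly costly

true_theta = np.array([0.2, 1.0])   # hidden: the human mostly values safety

def choice_probs(theta):
    # Noisily rational human: P(action) is proportional to exp(theta . features)
    logits = features @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Observe the human choosing 2000 times.
observed = rng.choice(len(features), size=2000, p=choice_probs(true_theta))

# Maximum-likelihood estimate of theta over a coarse grid.
grid = [np.array([a, b]) for a in np.linspace(0.0, 2.0, 21)
        for b in np.linspace(0.0, 2.0, 21)]
log_lik = [np.log(choice_probs(t)[observed]).sum() for t in grid]
print("estimated weights:", grid[int(np.argmax(log_lik))])  # near [0.2, 1.0]
```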
Aggregated Volition: How does an AI optimise for many people’s values?
- Has the benefit of symmetry (each person’s preferences are counted in the same way)
- difficulties in the commensurability of different human preferences
- Problem: if someone feels more strongly about a value X, should they get a larger share of it? (see the toy sketch after this list)
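A toy illustration of the commensurability worry (options and utilities invented): summing utilities across people requires choosing a common scale, and upweighting one person – whether because they ‘feel more strongly’ or simply because their utilities were rescaled – can flip the social choice:

```python
import numpy as np

# Three options; each person's utilities are only defined up to their own
# positive rescaling, which is exactly what makes them hard to compare.
alice = np.array([1.0, 0.0, 0.6])   # Alice prefers option 0
bob = np.array([0.0, 1.0, 0.7])     # Bob prefers option 1

def social_choice(u_a, u_b, weight_b=1.0):
    total = u_a + weight_b * u_b     # simple weighted utilitarian sum
    return int(np.argmax(total))

print(social_choice(alice, bob, weight_b=1.0))  # -> 2, the compromise wins
print(social_choice(alice, bob, weight_b=3.0))  # -> 1, Bob's favourite wins
```

Nothing about Bob’s own preference ordering changed between the two calls; only the scale did.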
How to deal with people whose preferences include the suffering of others?
Should a robot be more obligated to its owner than to the rest of the world?
- should this have something to do with how much you pay for the robot?
Moral philosophy will be a key industry sector
Issues of near-term Narrow AI vs future Strong AI
- Very easy to confuse the near-term killer robot question with the existential risk question
Differences between the risks of misuse of Narrow AI and the risks of Strong AI
- Weaponised Narrow AI
Should we replace the gainful employment of humans with AI?
A future where humans lose a sense of meaning & dignity
Hostility to the idea of Superintelligence and AI Friendliness
- there seems to be something else going on when AI experts make arguments as simple-minded as ‘if the AI goes bad, just turn the AI off’
- which is akin to saying that beating AlphaGo is no problem – we just need to play better moves
- or that while it’s theoretically possible that AI could pose an existential risk, it’s also possible that a black hole could appear in near-Earth orbit – we don’t spend any time worrying about that, so why should we spend time worrying about the existential risk of AI?
Defensive psychological reactions to feeling one’s research is under attack
- People proposing AI safety are not anti-AI, any more than people wanting to contain a nuclear reaction are anti-physics
Provably beneficial AI
- where the AI system’s responsibility is to figure out what you want
- though the data used to train the AI may sometimes be unrepresentative – leading to a small possibility of deviation from true beneficiality – hence probably approximately beneficial AI (a rough formalisation of the analogy follows below)
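The name riffs on PAC (‘probably approximately correct’) learning; one rough way to formalise the analogy (my notation, not from the talk) is to ask that, with high probability over the training data, the behaviour induced by the learned objective loses at most ε of true human utility:

```latex
\Pr\big[\, U^{*}(\pi_{\hat{\theta}}) \;\ge\; U^{*}(\pi^{*}) - \epsilon \,\big] \;\ge\; 1 - \delta
```

where U* is the true human utility, π* the policy that maximises it, and π_θ̂ the policy that maximises the learned objective θ̂.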
Convincing the AI community that AI friendliness is important
Will there be a hard takeoff to superintelligence?
What are the benefits of building Strong AI?
—
Center for Human-Compatible AI – UC Berkeley
Stuart Jonathan Russell is a computer scientist known for his contributions to artificial intelligence. He is a Professor of Computer Science at the University of California, Berkeley and Adjunct Professor of Neurological Surgery at the University of California, San Francisco.