What Do We Need to Do to Align AI? – Stuart Armstrong
Synopsis: The goal of Aligned AI is to implement scalable solutions to the alignment problem, and distribute these solutions to actors developing powerful transformative artificial intelligence.
What is Alignment?
Algorithms are shaping the present and will shape the future ever more strongly. It is crucially important that these powerful algorithms be aligned – that they act in the interests of their designers, their users, and humanity as a whole. Failure to align them could lead to catastrophic results.
Our long experience in the field of AI safety has identified the key bottleneck for solving alignment: concept extrapolation.
What is Concept Extrapolation?
Algorithms typically fail when they are confronted with new situations – when they go out of distribution. Their training data will never cover every unexpected situation, so an AI will need to safely extend its key concepts and goals, as well as or better than humans do.
This is concept extrapolation, explained in more detail in this sequence. Solving the concept extrapolation problem is both necessary and almost sufficient for solving the whole AI alignment problem.
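To make the idea concrete, here is a minimal illustrative sketch (not taken from the talk; the data and feature names are hypothetical toy constructions) of an algorithm going out of distribution: a classifier that latches onto a shortcut feature during training fails badly once that feature stops tracking the label at test time.

```python
# Toy out-of-distribution failure: the model learns a shortcut feature that
# works on the training distribution but breaks at test time.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Training data: the intended feature x1 is noisy, while a shortcut
# feature x2 happens to track the label almost perfectly.
y_train = rng.integers(0, 2, n)
x1_train = y_train + 0.5 * rng.normal(size=n)   # intended, noisy feature
x2_train = y_train + 0.05 * rng.normal(size=n)  # spurious shortcut feature
X_train = np.column_stack([x1_train, x2_train])

# Test data: out of distribution -- the shortcut now anti-correlates with
# the label, so a model that relied on it is systematically wrong.
y_test = rng.integers(0, 2, n)
x1_test = y_test + 0.5 * rng.normal(size=n)
x2_test = (1 - y_test) + 0.05 * rng.normal(size=n)
X_test = np.column_stack([x1_test, x2_test])

model = LogisticRegression().fit(X_train, y_train)
print("in-distribution accuracy:    ", model.score(X_train, y_train))
print("out-of-distribution accuracy:", model.score(X_test, y_test))
```

In this toy setup the model reports near-perfect accuracy on its training distribution and well below chance on the shifted data. Concept extrapolation is about getting the algorithm to extend the intended concept (here, the true feature) rather than the shortcut when it meets situations its training data never covered.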
This talk is part of the ‘Stepping Into the Future’ conference.
Bio: Dr Stuart Armstrong, Co-Founder and Chief Research Officer of Aligned AI
Previously a Researcher at the University of Oxford’s Future of Humanity Institute, Stuart is a mathematician and philosopher and the originator of the value extrapolation approach to artificial intelligence alignment. He has extensive expertise in AI alignment research, having pioneered such ideas as interruptibility, low-impact AIs, counterfactual Oracle AIs, the difficulty/impossibility of AIs learning human preferences without assumptions, and how to nevertheless learn these preferences. Along with journal and conference publications, he posts his research extensively on the Alignment Forum.