Stuart Armstrong on AI Interpretability, Accidental Misalignment & Risks of Opaque AI
Interview with Stuart Armstrong (Aligned AI)
Video / Audio of interview will be up soon
To watch interview live, join the zoom call:
Time: Nov 9, 2022 07:30 PM Canberra, Melbourne, Sydney
Join Zoom Meeting
Meeting ID: 813 2054 7208
Auditing and interpreting AI systems (and the models they produce) seems clearly important for achieving verifiably safe AI (by reducing uncertainty), yet it is a concern that, for the foreseeable future, the most powerful AIs (or the models they produce) will be opaque. Tracking the alignment of an opaque AI by monitoring its behaviours is problematic, and the problems of interpreting opaque AI may prove too difficult for humans to robustly solve before superintelligence arrives (without an inside-view, operational understanding, how can we verify what values an AI has, or may converge on?).
It seems likely that as models become larger and more general, interpretability will become far more difficult. It would therefore be useful to develop AI that is inherently less opaque at its core – i.e. some kind of causal AI (one that works via causal influence). This would make it easier to predict an AI's actions and their consequences (and to quantify uncertainty).
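To illustrate the contrast with opaque, correlation-driven models, here is a minimal sketch of a structural causal model (a toy, hypothetical example – not Aligned AI's method): each variable is an explicit function of its parents, so predictions, including under interventions, can be read directly off the mechanisms rather than inferred from a black box.

```python
import random

# Toy structural causal model (SCM): every variable is an explicit
# function of its parents plus exogenous noise, so the "reasoning"
# behind any prediction is inspectable by construction.

def sample(intervene_x=None):
    u = random.gauss(0, 1)                              # exogenous noise
    x = intervene_x if intervene_x is not None else u   # X := U, or do(X = x)
    y = 2 * x + random.gauss(0, 0.1)                    # Y := 2X + noise
    return x, y

# Interventional prediction: under do(X = 1), the mechanism Y := 2X + noise
# tells us directly that E[Y] = 2 - no opaque model weights to interpret.
random.seed(0)
ys = [sample(intervene_x=1.0)[1] for _ in range(1000)]
mean_y = sum(ys) / len(ys)
```

The point of the sketch is that uncertainty (the noise terms) is quantified explicitly, and the effect of any action can be computed by intervening on the relevant mechanism, rather than by probing an opaque learned function.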
While transparent AI has its advantages, it could introduce new risks: reducing opaqueness at the core via something like causal AI may turn out to be far more sample-efficient than current state-of-the-art AI built on mass correlations across big data. Sample-efficient "deep causality" – AI that requires far fewer correlational relations and relies heavily on causal relations – could create a massive compute overhang, which may make dangerously capable AIs far more easily achievable by a larger number of actors.
* Explainable Artificial Intelligence
* Misaligned Goals in AI
* Stuart Armstrong’s talk at SciFuture 2022 on Aligned AI
Bio: Dr Stuart Armstrong, Co-Founder and Chief Research Officer at Aligned AI
Previously a Researcher at the University of Oxford’s Future of Humanity Institute, Stuart is a mathematician and philosopher and the originator of the value extrapolation approach to artificial intelligence alignment. He has extensive expertise in AI alignment research, having pioneered such ideas as interruptibility, low-impact AIs, counterfactual Oracle AIs, the difficulty/impossibility of AIs learning human preferences without assumptions, and how to nevertheless learn these preferences. Along with journal and conference publications, he posts his research extensively on the Alignment Forum.