Capability Control vs Motivation Selection: Contrasting Strategies for AI Safety
In AI safety, capability control and motivation selection are the two primary strategies for preventing advanced AI systems from causing harm. Capability control relies on external constraints that limit what a system can do, while motivation selection shapes the system's internal goals and values. Many researchers, including those at Anthropic and the Future of Life Institute, suggest a hybrid approach: using strict capability…