Radical Interpretability: When Explainable AI Becomes Mind-Reading
Seatbelt Interpretability

Much talk about AI interpretability is modest, staying close to what’s happening on the ground with AI safety today. Engineers want answers to questions like: Why did the model recommend this video? … flag this message? … tag this as X? What feature triggered that refusal? Where is the bug? Is the model lying? Is it…