When AI Mistakes Reassurance for Evidence
A preliminary evaluation of moral sycophancy shows that language models often handle strong evidence reasonably well but can over-credit weak, cosmetic, or reputational reassurance. A language model does not need to blatantly flatter a user to become morally sycophantic: it can fail by accepting the user's reassurance as evidence. This is a common pattern in…