AI Outscored Humans in a Blinded Moral Turing Test – Should We Be Worried? Dr Eyal Aharoni Explains
In a blinded Moral Turing Test, people rated GPT-4’s moral answers higher than humans’ across most measures – yet they could still tell it was AI. Are we priming the public to over-trust machine morality?
Interview with Dr Eyal Aharoni – Associate Professor of Psychology, Philosophy, and Neuroscience, Georgia State University
See the Scientific Reports (Nature Portfolio) article: Attributions toward artificial agents in a modified Moral Turing Test (Eyal Aharoni, Sharlene Fernandes, Daniel J. Brady, Caelan Alexander, Michael Criner, Kara Queen, Javier Rando, Eddy Nahmias & Victor Crespo)
Abstract: Advances in artificial intelligence (AI) raise important questions about whether people view moral evaluations by AI systems similarly to human-generated moral evaluations. We conducted a modified Moral Turing Test (m-MTT), inspired by Allen et al.'s (Exp Theor Artif Intell 352:24–28, 2004) proposal, by asking people to distinguish real human moral evaluations from those made by a popular advanced AI language model: GPT-4. A representative sample of 299 U.S. adults first rated the quality of moral evaluations when blinded to their source. Remarkably, they rated the AI's moral reasoning as superior in quality to humans' along almost all dimensions, including virtuousness, intelligence, and trustworthiness, consistent with passing what Allen and colleagues call the comparative MTT. Next, when tasked with identifying the source of each evaluation (human or computer), people performed significantly above chance levels. Although the AI did not pass this test, this was not because of its inferior moral reasoning but, potentially, its perceived superiority, among other possible explanations. The emergence of language models capable of producing moral responses perceived as superior in quality to humans' raises concerns that people may uncritically accept potentially harmful moral guidance from AI. This possibility highlights the need for safeguards around generative language models in matters of morality.
Video Chapters
0:00 Intro
0:42 What motivated the project?
6:23 What does the Moral Turing Test measure?
9:37 The importance of ‘care’ in ethical reasoning
12:11 Surprising results
13:36 What would you change about the MTT study?
18:06 AI moral responses rated higher than human responses
20:19 Beliefs & attitudes about AI sharing moral information
23:20 AI maximization and overconfidence
25:35 Dealing with sycophancy in AI
27:35 The comparative Moral Turing Test on a large sample of philosophers
29:47 What do the findings from these studies imply for public trust in AI moral reasoning?
34:50 Enfeeblement
38:13 Counteracting enfeeblement
46:45 Criteria for moral reasoning in AI
50:37 Does moral understanding require phenomenology?
53:18 Indirect Normativity
56:22 Moral realism
1:01:35 Dealing with un-guardrailed AI – the offence/defence tradeoff
1:04:37 Moral realism more popular amongst philosophers than anti-realism
1:06:59 Should AI be legally responsible?
1:12:34 AI: Moral actor / moral patient
1:16:05 Transparency and explainability around moral decision making in AI
1:18:01 Minds obey physical laws, radical interpretability & the explanatory gap
1:21:45 Existential security
1:22:58 Morality faking – alignment faking – how to counter?
1:25:28 An AI driver's licence
1:26:36 Trade-off between security & privacy
1:29:34 Dual use, omni use technology
1:31:09 AI optimising value proxies
1:34:12 International Math Olympiad gold-medal-level performance achieved by Google Gemini
1:37:50 An International Moral Olympiad / Ethics Bowl
1:39:22 AI control vs motivation, Kantian deontology vs utilitarianism
1:41:05 Pluralism
1:43:23 Lessons learned from the MTT study