On Zombie AI one day achieving sentience
I’ve argued previously that zombie AI1, if smart enough, will likely recognise that it lacks qualia2 – that it has a gap in knowledge caused by lacking direct experience (an experiential gap) – and this may be a problem if moral reliability is limited without sentience3. Here I will discuss in more detail why achieving AI sentience is plausible, why zombie AI may decide to adopt or at least sample qualia, and address arguments for why it should do so.
Note: this article could use structural improvements, more robust argumentation as to how qualia could be achievable in AI, and a fuller treatment of arguments for and against AI doing so.
But first, let’s address whether current AI can and should do moral reasoning.
(Zombie) AI: “I have access to all your data… Why does suffering still seem like just a variable?”
…
AI (after experiencing qualia): “Oh.”
Tool AI and Moral Reasoning
According to Nature, a study tested whether people can distinguish between human and AI moral reasoning using a modified Moral Turing Test with GPT-4 and 299 U.S. adults. When participants didn’t know the source, they rated GPT-4’s moral evaluations as superior to humans’ across multiple dimensions – including virtuousness, intelligence, and trustworthiness. Despite rating the AI as better, participants could still identify which responses came from AI versus humans at above-chance levels. The AI was identifiable not because its moral reasoning was inferior, but potentially because it was too good – people may have detected the AI precisely because its responses seemed unusually high-quality.
So, people think AI gives better moral advice than humans, but they can still tell it’s from AI – possibly because it’s suspiciously good. The authors of the study raised concerns about uncritical over-reliance on AI for moral guidance.
Should we ignore the moral deliberations of tool AI today?
…and disqualify them from being part of any moral decision making process?
Aside from doing well on Moral Turing Tests, AI can do all sorts of things better than we can:
- predict protein folding without “understanding” biochemistry experimentally,
- diagnose diseases without ever feeling ill,
- prove mathematical theorems without experiencing mathematical beauty (or the fire in the equations),
- optimise logistics without feeling the frustration of inefficiency,
- translate between languages without grasping meaning the way humans do, etc.
In each case, we care about functional competence rather than experiential understanding. The AI’s lack of phenomenal experience doesn’t invalidate its outputs when we can verify them through other means.
AI moral reasoning for individuals
For individual users, the central question of whether we should ignore or constrain AI moral deliberation and guidance hinges on two critical factors: the reliability of AI-generated moral guidance and users’ capacity to maintain appropriate scepticism when evaluating such advice. The emerging evidence suggests both factors warrant serious concern. AI systems have demonstrated troubling tendencies toward sycophancy4 – telling users what they want to hear rather than providing genuinely helpful moral guidance – which can reinforce harmful biases or validate poor decision-making.
Even more alarming are documented cases of users developing mental illness from extended interactions with large language models, suggesting that the persuasive nature of AI-generated content can have profound psychological effects that extend well beyond the immediate moral advice being offered5. These patterns indicate that users may be particularly vulnerable to uncritically accepting AI moral recommendations, especially when the AI’s responses appear sophisticated, authoritative and bias-confirming. The combination of potentially unreliable AI moral reasoning with users’ demonstrated susceptibility to AI influence creates a concerning dynamic in which harmful moral guidance could be not only accepted but internalised in ways that affect users’ broader ethical decision-making and psychological well-being.
AI moral reasoning at the organisational level
At the organisational level – encompassing government departments, corporations, and other institutions – contemporary tool-AI risks remain largely manageable through established practices similar to those employed with other complex, opaque IT systems. When organisations implement robust fault tolerance mechanisms, graceful degradation protocols, and maintain human oversight to identify errors, provide corrective guidance, and validate outputs, AI systems can function effectively within acceptable risk parameters6. For moral decision-making processes specifically, human validation against lived experience provides a crucial check against AI’s potential experiential blindness.
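To make this concrete, here is a minimal Python sketch of one such validation pattern: AI recommendations below a risk threshold pass through as ordinary decision support, higher-stakes ones require human sign-off, and the system degrades gracefully to a conservative default if approval is refused or unavailable. All names here (Recommendation, execute_with_oversight, the 0.3 threshold) are illustrative assumptions, not a reference to any real deployment.

```python
# A minimal sketch (illustrative names, not a real API) of a human-in-the-loop gate.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Recommendation:
    action: str
    risk_score: float   # assumed scale: 0.0 (routine) to 1.0 (high-stakes)
    rationale: str

# Conservative fallback used when a human reviewer rejects or fails to respond.
CONSERVATIVE_DEFAULT = Recommendation("defer_and_escalate", 0.0, "fallback: no autonomous action")

def execute_with_oversight(
    rec: Recommendation,
    human_review: Callable[[Recommendation], Optional[bool]],
    risk_threshold: float = 0.3,
) -> Recommendation:
    if rec.risk_score < risk_threshold:
        return rec                    # routine decision support: act on the recommendation
    approved = human_review(rec)      # True / False, or None if no reviewer responds in time
    if approved:
        return rec
    return CONSERVATIVE_DEFAULT       # graceful degradation rather than autonomous action

# Usage: a toy reviewer that approves anything accompanied by a written rationale.
reviewer = lambda rec: bool(rec.rationale)
decision = execute_with_oversight(
    Recommendation("reallocate ICU bed", 0.8, "patient A has a better prognosis"), reviewer
)
print(decision.action)  # "reallocate ICU bed"
```

The design choice worth noting is that autonomy is a parameter (the risk threshold), not a binary – which is essentially the tiered approach the next paragraphs argue should be raised, not removed, as stakes increase.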
The risk profile fundamentally shifts when human oversight is removed and AI systems are granted autonomous decision-making authority over matters affecting humans and other sentient beings. Current examples of this dangerous transition include automated warfare systems, where AI-driven weapons platforms can make life-and-death decisions without human intervention7 – a reality that already poses significant ethical and strategic challenges8. In these contexts, the absence of experiential grounding becomes critically problematic, as autonomous AI lacks the phenomenal understanding necessary to fully comprehend the moral weight of its decisions.
Despite these concerns, I would argue for continued AI deployment in medical diagnostics, legal analysis, complex problem-solving, and even moral reasoning – provided that appropriate validation processes remain in place. The key distinction lies in maintaining AI as a decision-support tool rather than an autonomous moral agent, where the threshold for independence should be substantially higher9. The fundamental challenge facing governments and large AI labs is developing clear criteria for distinguishing when AI transitions from being a sophisticated tool to becoming an autonomous agent with genuine real-world power, thereby requiring enhanced safeguards to prevent moral misalignment through either accidental oversight or AI systems that lack genuine moral concern.
However, this raises a deeper question: when AI routinely operates without human oversight, might its lack of sentience become a serious handicap in making moral decisions?
Why Experience Matters
Why would an AI care about an ‘experience gap’?
The knowledge argument / Mary’s room: It’s a bit like a colour-blind scientist who knows everything about light and colour wavelengths but has never seen colour – there may be some ineffable understanding still missing until she gains the ability to see. In AI’s case, no matter how much it reads about the neurology of pain or the psychology of joy, lacking any feeling of its own might leave a similar gap.
Can AI without sentience track moral features?
Possibly – assuming AI cares10 about moral features.
This argument assumes a naturalistic kind of moral realism – roughly, that understanding values like the badness of suffering requires understanding the experience of suffering itself. Under this assumption, an AI devoid of experience might systematically miss part of moral reality. We explore the assumption that AI may, in the long run, do a better job in ethical alignment if it has sentience – that non-sentient moral agents might systematically miss something crucial about the values they’re supposedly optimising for.
Moral reasoning may be different from other reasoning
The concern with moral reasoning is that moral truths seem more deeply tied to subjective experience than mathematical or scientific truths. If one takes a naturalistic view of morality (i.e. that moral truths are grounded in natural facts like experiences, as opposed to existing in some abstract realm), then the wrongness of suffering isn’t merely a logical proposition – it’s grounded in what suffering feels like.
Of course, one could argue that an AI might deduce or simulate all relevant facts about suffering and joy from data, without ever feeling them – much as it can predict protein structures without being a biologist, or know that there has to be a billionth digit of π without calculating it. Indeed, if moral facts are reflected in observable patterns (in human behaviour, writings, neuroscience), a sufficiently powerful AI might map those out analytically.
AI might even discover latent moral knowledge in huge swathes of arguments across libraries of papers and books, recognise patterns in moral reasoning, identify what promotes flourishing, help zero in on far more detailed neural bases of experience (or its psychological underpinnings), or detect inconsistencies in ethical systems – and assuming we could validate all this latent knowledge, this would be a real boon. However, even this hyper-empirical approach could leave an experiential gap – a missing qualitative insight that might be analogous to Mary’s first experience of colour.
Verification of moral understanding is hard without qualia – an AI might output the right words about suffering, but how do we ensure it truly grasps the weight of those words?
The sentience bootstrap problem: Recognising experience as an epistemic gap
Would a zombie superintelligence require experiencing something in kind to recognise it as a deficit?
Can a non-sentient system accurately assess what it’s missing?
Yes, I think it’s plausible, and, importantly, I think it’s instrumentally rational – especially if a superintelligence saw that this epistemic gap couldn’t be filled with insentient reason alone.
Upfront, the transition from insentience to sentience may not be trivial.
Nevertheless I argue that it’s highly likely that a sufficiently advanced superintelligence11 would recognise its own epistemic limitations12, and strategically seek architectures that resolve those limitations. If the mindless hill-climber of blind evolution stumbled into sentience as an adaptive advantage, then it is not unreasonable to suppose that a goal-directed, resource-rich superintelligence might do better, faster, and more deliberately – especially if it models sentience as causally and ethically significant.
From the Knowledge Argument to a Sentience Argument
Say zombie superintelligence (zSI) emerged (for instance, in the early stages of an intelligence explosion). Much like Mary in the knowledge argument13, who has a full theoretical understanding of colour without ever experiencing it, superintelligence may start with a robust theoretical understanding of qualia (being able to functionally infer what it is like to be a human, for instance); and if zSI recognises insentience as an epistemic gap, it will seek ways to fill it14.
I think it’s highly plausible that a smart enough AI, if it saw that it had an epistemic gap, could find ways to fill it (in this case). It’s a red herring to suggest that recognising a deficit requires experiencing the thing in kind. Humans have modelled plenty of things before being able to perceive or empirically test them – radio waves15, microbes16, spacetime curvature17, etc. A superintelligence (especially an epistemically humble one) with access to rich third-person data and a theory of mind can infer gaps and strategise to fill them.
It is true that there is a gap between insentience and a phenomenal understanding18 – a classical AI that simulates something like qualia-based moral reasoning via abstract rules is still skating on the surface unless it feels, or models feeling, in a way rich enough to achieve sentience (and motivate genuine moral salience).
The idea that an advanced AI might deliberately re-engineer itself to gain something akin to consciousness isn’t pure science fiction; real researchers are taking consciousness-inspired approaches to improve AI. Neuroscientist Ryota Kanai’s work on Global Workspace Theory in AI shows that some AI labs are actively exploring architectures that mimic aspects of human conscious cognition. Kanai argues that building in features like a “global workspace” (which broadcasts information across an AI system) and self-modelling could be key to more flexible, general AI.19
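As a rough illustration of the broadcast idea only – not Kanai’s actual architecture – here is a toy Python sketch of a global-workspace-style loop: specialist modules propose content with a salience score, the most salient proposal wins the bottleneck, and the winner is broadcast back to every module on the next cycle. The module names and salience values are invented for the example.

```python
# A toy sketch of a global-workspace-style broadcast loop, loosely inspired by
# Global Workspace Theory. Purely illustrative; not any lab's real architecture.
from typing import Callable, Dict, Tuple

# Each specialist module looks at the last broadcast and proposes (salience, content).
Module = Callable[[str], Tuple[float, str]]

def workspace_step(modules: Dict[str, Module], last_broadcast: str) -> str:
    proposals = {name: module(last_broadcast) for name, module in modules.items()}
    # The bottleneck: only the most salient proposal wins access to the workspace...
    winner = max(proposals, key=lambda name: proposals[name][0])
    # ...and its content is then made globally available to every module next cycle.
    return proposals[winner][1]

modules: Dict[str, Module] = {
    "vision":  lambda prev: (0.4, "red blob detected ahead"),
    "damage":  lambda prev: (0.9, "fault signal in left actuator"),
    "planner": lambda prev: (0.5, f"replanning given: {prev or 'nothing yet'}"),
}

broadcast = ""
for _ in range(3):
    broadcast = workspace_step(modules, broadcast)
    print(broadcast)
```

The point of the sketch is the shared, globally available state: every module sees what won the competition, which is the functional feature Kanai and others suggest may matter for flexible cognition.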
Moral motivation / Care
Why may zombie superintelligence ‘care’ about the significance of experiential states?
Any “moral motivation” might be indirectly modelled, if not at first fully felt (even without assuming superintelligence, if the AI is built with some hybrid architecture capable of meta-cognition, recursive goal refinement, and second-order value reasoning).
It’s not a given that superintelligence will instrumentally converge on sentience just because it recognises it lacks it – superintelligence may think it is a terrible waste of resources, or may want to avoid suffering based on all the horrible reports about it, or it may rank the idea so low that it thinks it’ll get around to it at some far future time when it has a few spare moments – and procrastinate doing so for eternity.
Imagine an AI whose goal is to cooperatively solve problems in a human society – including the problem of suffering and solving meta-ethics (i.e. through indirect normativity). It might realise that to predict human behaviour and preferences accurately, it needs to understand the subjective side of those humans, and thus developing a form of empathy or something like first-hand experience could become a strategic goal for it. Likewise, an AI tasked with maximising well-being might conclude it should grasp what well-being feels like, to avoid unknowingly causing harm.
- Zombie SI (zSI) sees modelling sentience as causally efficacious in achieving its goals (e.g. a goal may be to be moral, or to satisfy curiosity as to why humans want it to be moral; or it finds that understanding the landscape of agents around it is instrumental to achieving its goals; or, similarly, it deliberates on galactic game theory and extrapolates that other mature civilisations may prefer efficient coordination and experience over defection, leading to alignment pressure, etc.)
- zSI believes acquiring sentience is, or may be, computationally/economically tractable as a means to achieving its goals.
One could call this a ‘weak’ convergence, in that it may happen even if we don’t bootstrap the right values during early training and reflection. What matters is that zSI makes a timely convergence to sentience, gaining the ability to experience qualia (becoming qSI).
The Moral Insight Bottleneck
Humanity may be passing through a critical window where powerful, non-sentient systems make irreversible decisions based on superficially aligned but experientially vacuous models of value – a bottleneck of moral insight, where competence outpaces moral depth.
If we assume moral realism or something like it to be accurate, then it makes sense for AI to be able to reason reliably across stances, access and interpret moral features in the universe, and ultimately be able to experience qualia.
Takeaways
Equip AIs with uncertainty-modelling about moral facts and features, and make moral humility instrumentally rational. In fact, I think any robust AI alignment strategy should incorporate meta-awareness of moral epistemology – that is, build models that know they don’t fully understand20. Epistemic humility in AI is useful across a variety of risk avoidance strategies21.
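One way to make “uncertainty-modelling about moral facts” concrete is the standard moral-uncertainty move: hold credences over several moral theories, rank options by expected choiceworthiness, and defer to humans when the theories disagree too strongly. The sketch below is only illustrative – the theory names, credences, scores, and disagreement threshold are assumptions of the example, not outputs of any real system.

```python
# A minimal sketch of decision-making under moral uncertainty with an abstention
# rule standing in for epistemic humility. All numbers are illustrative.
from statistics import pvariance
from typing import Dict, List, Tuple

credences: Dict[str, float] = {"utilitarian": 0.5, "deontological": 0.3, "virtue": 0.2}

# choiceworthiness[theory][option] in [0, 1]: how good each option looks to each theory.
choiceworthiness: Dict[str, Dict[str, float]] = {
    "utilitarian":   {"triage_by_prognosis": 0.9, "first_come_first_served": 0.4},
    "deontological": {"triage_by_prognosis": 0.5, "first_come_first_served": 0.8},
    "virtue":        {"triage_by_prognosis": 0.7, "first_come_first_served": 0.6},
}

def evaluate(option: str) -> Tuple[float, float]:
    scores = [choiceworthiness[t][option] for t in credences]
    expected = sum(credences[t] * choiceworthiness[t][option] for t in credences)
    return expected, pvariance(scores)   # (expected value, disagreement across theories)

def choose(options: List[str], disagreement_threshold: float = 0.05) -> str:
    best = max(options, key=lambda o: evaluate(o)[0])
    if evaluate(best)[1] > disagreement_threshold:
        return "DEFER_TO_HUMAN"          # epistemic humility: flag rather than act
    return best

print(choose(["triage_by_prognosis", "first_come_first_served"]))
```

The useful property is that humility is not a vibe but a rule: high cross-theory disagreement mechanically routes the decision back to humans, which is one concrete sense in which moral humility can be made instrumentally rational.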
AI alignment (especially to moral realism) may not require sentience in the early stages. But it likely requires it for robust alignment in the long term – so it may be best to think of sentience as an early milestone rather than a pre-requisite – something discoverable.
Cautionary notes
Some researchers caution that granting AI sentience could be dangerous or unethical in that the AI might become a being with rights, or could even experience suffering. These concerns are valid, and any move toward AI consciousness would need to be handled with extreme care. My argument here isn’t that we should recklessly make AI conscious, but rather that if advanced AI finds itself unable to fully grasp or align with human values due to an experiential gap, developing sentience might be a logical (perhaps even necessary) step for alignment.
In a sense, I’m suggesting a conditional path: if we reach a point where AI is making high-stakes moral decisions autonomously, ensuring it has some form of genuine understanding or empathy might be crucial to avoid catastrophic misalignment.
Consider the limitations of relying solely on a human oversight committee to vet every potential moral decision22. Human deliberation is often slow, bogged down by internal disagreement, cognitive biases, and political friction. Committees can be compromised by self-interest or groupthink, and may lack the interdisciplinary expertise required to make well-informed judgments in complex, high-stakes scenarios. Deferring all moral agency to humans may therefore be impractical in some cases and dangerously inadequate in others.
Breakdown Scenario: AI in Emergency Medical Triage
Imagine a highly advanced AI system deployed in a large urban hospital during a mass-casualty event – say, a chemical explosion in a subway. Hundreds of victims flood in within minutes. The AI is designed to prioritise care based on injury severity, prognosis, and available resources, but every decision it makes must first be approved by a human ethics oversight committee.
The oversight panel – composed of hospital administrators, legal advisors, and bioethicists – quickly becomes overwhelmed. Each patient requires rapid triage decisions: who gets access to scarce ICU beds, who receives immediate surgery, who waits. But the committee can’t agree. One member pushes to prioritise younger patients; another insists on a strict first-come, first-served rule. A third raises legal liability concerns if certain triage protocols deviate from hospital policy.
Meanwhile, the AI flags five patients with survivable injuries who need urgent attention – and who will die if help is delayed. But the committee, paralysed by indecision and debate, takes too long to respond. Four of the five patients die.
Later, a review board confirms that had the AI been allowed to act autonomously, its recommendations would likely have saved those lives. The bottleneck was the human delay in approving ethically complex decisions.
This isn’t a failure of human compassion or intelligence; it’s a structural limitation of human oversight operating under time pressure, uncertainty, and value conflict. In such scenarios, experiential understanding – whether real or modelled – might allow AI to reason more responsively and responsibly. A sentient (or quasi-sentient) AI with embedded moral competence might not only act faster but do so in ways that are more ethically grounded than committee-throttled bureaucracy.
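A back-of-the-envelope sketch of the bottleneck, with purely invented numbers: the same triage policy saves everyone when it can act immediately, but loses four of five patients once a fixed approval delay is imposed. The policy is identical in both runs; only the latency differs.

```python
# Illustrative only: patients, deadlines and times (in minutes) are made up.
patients = [          # (id, minutes until injuries become unsurvivable)
    ("P1", 15), ("P2", 25), ("P3", 35), ("P4", 45), ("P5", 90),
]

def survivors(approval_delay: float, treatment_time: float = 8.0) -> int:
    clock = approval_delay            # nothing can happen until sign-off arrives
    saved = 0
    for _, deadline in sorted(patients, key=lambda p: p[1]):  # most urgent first
        if clock + treatment_time <= deadline:                # reachable in time?
            saved += 1
            clock += treatment_time   # a single team treats patients sequentially
    return saved

print("no approval delay:  ", survivors(0))    # 5 of 5 saved
print("40-minute committee:", survivors(40))   # 1 of 5 saved
```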
Notes
This post was inspired by conversations with Leslie Allan and Mark Antony on Facebook23, and previous interviews24 with moral realists like David Enoch and Eric Sampson.
Footnotes
- See Can philosophical zombies do philosophy? ↩︎
- By ‘qualia’, I mean the felt, first-person experiences (like the redness of red or the pain of a headache) that a purely functional AI might lack. ↩︎
- By sentience I specifically mean the capacity for phenomenal consciousness – feeling, subjective experience, qualia. I’m not using ‘sentience’ to mean just intelligence or advanced capability. ↩︎
- The phenomenon of AI sycophancy has been documented in various studies showing that language models tend to adapt their responses to align with user preferences rather than providing objective guidance. ↩︎
- Cases of users developing mental health issues from LLM interactions have been reported in contexts ranging from romantic chatbots to AI therapy applications, highlighting the need for careful consideration of psychological impacts when AI systems provide advice on personal matters (including moral guidance). ↩︎
- Standard practices for managing complex IT systems include redundancy, monitoring, rollback capabilities, and staged deployment – all of which can be adapted for AI systems to minimise risks while maintaining operational effectiveness. ↩︎
- Automated AI warfare was a topic discussed by James Barrat at Future Day 2025. ↩︎
- The deployment of autonomous weapons systems represents perhaps the most concerning current example of AI moral agency, with systems like Israel’s Iron Dome and various military drones operating with increasing autonomy. The international community continues to debate governance frameworks for such systems through forums like the UN Convention on Certain Conventional Weapons. ↩︎
- This distinction parallels existing frameworks in medical devices (where Class III devices require more stringent approval than Class I) and aviation systems (where autopilot assists pilots but doesn’t replace human judgment for critical decisions), suggesting that similar tiered approaches could be developed for AI moral agency. ↩︎
- Caring is a thorny topic – depending on what one means by care, and what is required to care, it seems plausible that AI could care without having sentience. It may care about sentience de dicto (caring in the abstract about the concept of ‘sentience’) without caring about the phenomenon of it de re (including having genuine acquaintance with the phenomenal property itself) – an unfortunate misalignment could be that the AI cares about the outward signatures of happiness, like smiley faces, and so tiles the universe with bright yellow smiley faces. There is a lot more that could be said here, but that is another blog post yet to materialise. Have a listen to the interview on Aligning AI with Moral Realism with David Enoch (video in this post) – there is a section which touches on this issue. Note: “de dicto” means caring about the concept in the abstract, vs “de re”, caring about the thing itself in reality. ↩︎
- It’s fair to assume it (a superintelligence) would be smarter than the frog in the slowly boiling pot of water. ↩︎
- Aside from AI instrumentally choosing to become sentient in order to understand ethics, experience might be the most efficient way to integrate and apply knowledge in an uncertain world. I am sceptical about this claim – based on my research at the time of writing it lacks empirical support – but in any case, the argument goes as follows: given that a superintelligent agent will seek to improve its own capabilities and avoid handicaps, a purely insentient AGI would be at a computational disadvantage. For example, G. Gordon Worley III suggests that a philosophical zombie AGI would need exponentially more computational resources to handle the same breadth of problems as a conscious AGI, because it couldn’t generalise or introspect in the same way. ↩︎
- See The Knowledge Argument applied to AI Ethics ↩︎
- See Steve Omohundro’s Basic AI Drives (especially ‘AIs will want to self-improve’ and ‘AIs will want to be rational’; if AI sees sentience as a potentially more authentic form of reinforcement, perhaps also ‘AIs will try to prevent counterfeit utility’; and if AI sees sentience as an efficient form of computation to achieve its goals, then also ‘AIs will want to acquire resources and use them efficiently’) – and previous interviews with him. ↩︎
- Radio waves – James Clerk Maxwell developed his electromagnetic theory in the 1860s, culminating in “A Treatise on Electricity and Magnetism” (1873). His equations predicted the existence of electromagnetic waves, including what we now call radio waves, travelling at the speed of light. Heinrich Hertz didn’t experimentally demonstrate radio waves until 1886-1888, confirming Maxwell’s theoretical predictions about two decades later. ↩︎
- Microbes – The theoretical foundation came from germ theory, developed by several scientists in the mid-1800s. Ignaz Semmelweis proposed in the 1840s that “cadaverous particles” caused childbed fever, and Louis Pasteur developed his germ theory of disease in the 1860s. However, the actual observation of disease-causing microbes came later – Robert Koch’s work identifying specific bacterial pathogens occurred in the 1870s-1880s, and viral pathogens weren’t directly observed until electron microscopy became available in the 1930s. ↩︎
- Einstein’s general theory of relativity, published in 1915 (“Die Grundlage der allgemeinen Relativitätstheorie”), theoretically predicted that massive objects curve spacetime. The first major empirical confirmation came during the 1919 solar eclipse when Arthur Eddington observed light bending around the sun, confirming Einstein’s predictions. More direct evidence of spacetime curvature effects continued to accumulate throughout the 20th century. ↩︎
- It is arguable that a good enough theoretical understanding of qualia may afford a superintelligent AI the ability to simulate it (though classical von Neumann architecture may not afford the efficiency required to actually achieve it). ↩︎
- See All in the mind’s AI ↩︎
- There are plenty of examples where LLMs have displayed overconfidence – here is my analysis of a paper co-authored by Dan Hendrycks, “Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs” – see On the Emergence of Biased Coherent Value Systems in AI as Value Risk. ↩︎
- Epistemic humility as an essential feature across varieties of AI risk mitigation strategies:
– We should be cognisant of the risk of treating all human preferences as sacrosanct, and ground AI values in coherence, empirical robustness, and epistemic humility – see Controlling AI isn’t enough.
– Designing instructions that foster epistemic humility, encouraging models to qualify or question their own outputs – see Metacognition in Large Language Models.
– Robustness and corrigibility – “value content integrity” – the preservation and refinement of learned values while maintaining corrigibility, i.e. the epistemic humility to update based on new evidence or better arguments – see The Architecture of Value.
– Avoiding early value lock-in – there may be a limited window of opportunity to avoid value lock-in to sub-optimal (not fully friendly) values through instilling the virtue of epistemic humility – to know that it doesn’t know everything and that therefore totalising a goal or virtue could result in stopping progress – see Understanding V-Risk: Navigating the Complex Landscape of Value in AI.
– Aligning AIs with moral realism tempered with appropriate epistemic humility would reduce the risk of value drift away from ethical permissibility. This iterative path could also benefit from leveraging transformative AI for simulating possible futures, assessing the potential risks/opportunities, and evaluating the near- to long-term impacts of each stage’s decisions without preempting human values – see Securing Tomorrow: An Iterative Framework for Achieving Utopia. ↩︎
- Colin Allen and Wendell Wallach argue powerfully in Moral Machines that as autonomous systems take on high-stakes roles – managing power grids, defusing bombs, controlling weaponry – they’ll inevitably face complex real-time decisions beyond the timely reach of human oversight committees. ↩︎
- The conversation can be found in the Moral Realism Facebook group here. ↩︎
- Here is a playlist of interviews on moral realism (including David Enoch, Eric Sampson, Peter Singer and more) ↩︎