Emergent Machine Ethics as a Lens on Moral Realism
Emergent Machine Ethics (EME) posits that AI agents, through complex interactions, can develop ethical behaviours not explicitly programmed by humans. This emergent morality seems to align with moral naturalism, which suggests that moral truths are grounded in natural facts and processes. If AI systems can independently develop consistent ethical norms, it could indicate that moral principles are discoverable aspects of the natural world, supporting moral realism.

It would be interesting to see whether, across broad tests, different populations of AI agents naturally develop similar moral alignments under similar or widely varying conditions and payoff matrices.
My super speculative intuition is that most superintelligences with adequate capacity to reason1 and make sense of the universe will converge on broadly similar moral stances.
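As a toy version of the test described above, one could evolve populations under different payoff structures and compare the behaviour they settle on. Here is a minimal sketch using replicator dynamics over three standard 2×2 games; the game choices and payoff values are textbook examples I have picked for illustration, not part of any EME proposal:

```python
# Toy comparison: do populations evolving under different payoff matrices
# settle on similar behaviour? Replicator dynamics over three standard 2x2 games.
import numpy as np

GAMES = {
    # Row-player payoffs: rows = (Cooperate, Defect), columns = opponent (C, D).
    "prisoners_dilemma": np.array([[3.0, 0.0], [5.0, 1.0]]),
    "stag_hunt":         np.array([[4.0, 0.0], [3.0, 2.0]]),
    "harmony":           np.array([[4.0, 3.0], [2.0, 1.0]]),
}

def cooperator_share(payoff, steps=5000, dt=0.01, x=0.5):
    """Evolve the fraction of cooperators x under discrete replicator dynamics."""
    for _ in range(steps):
        f_c = payoff[0, 0] * x + payoff[0, 1] * (1 - x)   # cooperator fitness
        f_d = payoff[1, 0] * x + payoff[1, 1] * (1 - x)   # defector fitness
        mean = x * f_c + (1 - x) * f_d
        x = min(max(x + dt * x * (f_c - mean), 0.0), 1.0)
    return x

for name, payoff in GAMES.items():
    print(f"{name:18s} -> long-run cooperator share ~ {cooperator_share(payoff):.2f}")
```

In this toy setup, divergent outcomes across payoff structures are the unsurprising default; the interesting result for moral realism would be convergence that is robust to such variation.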
Implications for AI Alignment Strategies Informed by Moral Realism
An AI alignment approach informed by moral realism may aim to align AI with objective moral truths rather than solely with human preferences or values, which moral realism holds may be misguided or incomplete from an objective standpoint.
Under EME experimentation, if diverse AI systems independently converge on similar ethical norms, it may suggest the existence of objective moral truths discoverable through natural processes. Conversely, if AI systems develop conflicting moral frameworks, it could imply that morality is either arbitrary or more context-dependent than standard moral realism frameworks predict, challenging the universality claimed by moral realism.
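One way to make “converge on similar ethical norms” concrete is to compare the distributions of emergent behaviours across independently trained populations. A minimal sketch, assuming the norms can be summarised as discrete behaviour frequencies; the run names and numbers below are purely illustrative:

```python
# One way to operationalise "convergence": compare distributions of emergent
# norms (e.g. behaviour frequencies) across independently trained populations.
import numpy as np
from itertools import combinations

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a[a > 0] * np.log2(a[a > 0] / b[a > 0]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Frequencies of three hypothetical emergent norms (share, punish defectors,
# free-ride) in four independent runs -- illustrative numbers only.
runs = {
    "run_a": [0.70, 0.25, 0.05],
    "run_b": [0.65, 0.30, 0.05],
    "run_c": [0.72, 0.20, 0.08],
    "run_d": [0.10, 0.15, 0.75],   # an outlier population
}

for (name_a, p_a), (name_b, p_b) in combinations(runs.items(), 2):
    print(f"JSD({name_a}, {name_b}) = {js_divergence(p_a, p_b):.3f}")
```

Low pairwise divergence across seeds, architectures, and environments would be the signature of convergence; a spread of high divergences would point towards arbitrariness or context-dependence.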
EME offers a tantalisingly bright perspective on moral realism by providing a test-bed for observing the emergence of ethical norms in non-human agents – increasing the sample size of norm-forming agents beyond humans. This intersection invites further interdisciplinary research into whether AI can uncover new objective moral truths or merely reflect the ones we already assume, thereby informing our understanding of morality’s nature – and helping plot the landscape of values.
Modelling ethical and moral systems: Artificial Life (ALife) techniques, such as agent-based modelling and evolutionary algorithms, can be used to simulate the emergence and evolution of cooperative and seemingly altruistic behaviours in populations of artificial agents. These simulations can explore how simple rules and interactions can lead to complex social dynamics that resemble aspects of ethical systems observed in nature. While these models can provide insights into the mechanisms by which complex behaviours might arise, they typically do not make claims about the objective truth or falsity of the moral norms that emerge within the simulation, which is the core claim of moral realism. They model the processes of norm development or behaviour, not the ontological status of morality.
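As a concrete illustration of the kind of ALife simulation described above, here is my own toy sketch (not drawn from any published model) of an evolutionary tournament in the iterated prisoner’s dilemma, where fitness-proportional selection plus a small mutation rate lets reciprocal cooperation spread:

```python
# A minimal evolutionary-tournament sketch (my own illustration): agents play
# the iterated prisoner's dilemma; reproduction is fitness-proportional with a
# small mutation rate. Reciprocal cooperation (tit-for-tat) usually spreads.
import random

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
STRATEGIES = ["ALLC", "ALLD", "TFT"]

def move(strategy, opponent_last):
    if strategy == "ALLC":
        return "C"
    if strategy == "ALLD":
        return "D"
    return opponent_last or "C"          # TFT: copy opponent, cooperate first

def play(s1, s2, rounds=20):
    last1 = last2 = None
    score1 = score2 = 0
    for _ in range(rounds):
        m1, m2 = move(s1, last2), move(s2, last1)
        p1, p2 = PAYOFF[(m1, m2)]
        score1, score2 = score1 + p1, score2 + p2
        last1, last2 = m1, m2
    return score1, score2

def next_generation(pop, mutation=0.01):
    fitness = [0.0] * len(pop)
    for i in range(len(pop)):
        for j in range(i + 1, len(pop)):
            a, b = play(pop[i], pop[j])
            fitness[i] += a
            fitness[j] += b
    children = random.choices(pop, weights=fitness, k=len(pop))
    return [random.choice(STRATEGIES) if random.random() < mutation else c
            for c in children]

pop = [random.choice(STRATEGIES) for _ in range(60)]
for _ in range(30):
    pop = next_generation(pop)
print({s: pop.count(s) for s in STRATEGIES})  # TFT typically ends up dominant
```

In most runs tit-for-tat displaces unconditional defection once unconditional cooperators have been exploited away – seemingly altruistic behaviour emerging from selfish selection, which is exactly the kind of dynamic these models probe without settling the ontological question.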
Since there is currently a lack of empirical evidence from ALife experiments validating such approaches, this may be a fruitful research avenue to explore.
Emergent Machine Ethics (EME): A New AI Ecosystem for Autonomous Evolution and Value Formation
A NeurIPS 2025 workshop proposal, ‘Emergent Machine Ethics (EME): A New AI Ecosystem for Autonomous Evolution and Value Formation’, is described as follows:
Recent advances in AI—such as large-scale models, multi-agent reinforcement learning, and self-modifying algorithms—have heightened the prospect of AI agents interacting among themselves in ways that extend beyond human-designed objectives. Through these interactions, unique ethical norms or values may emerge (i.e., self-organize or evolve) within or among AI systems, independent of top-down rules imposed by human designers.
This workshop focuses on Emergent Machine Ethics (EME), which explores how an AI society could autonomously develop its own ethical or value systems over time. This perspective contrasts traditional alignment approaches, which generally seek to ensure AI behavior remains consistent with explicitly defined human values and goals.
EME-informed AI Alignment Approaches
- Emphasises autonomous creation of norms through self-organisation or evolutionary interactions among AI agents (and possibly with humans).
- Seeks to understand how these AI-driven ethics might coexist with – or diverge from – human societal values, focusing on dynamic, bottom-up processes.
- Prioritises the idea that AI actively develops its ethical framework, rather than simply adhering to a static human-defined set of rules.
It would be advantageous for AI safety to:
- Develop a theoretical foundation for EME by integrating insights from self-organisation theory, evolutionary game theory, complex systems, and multi-agent RL into an interdisciplinary framework (a toy multi-agent RL sketch follows this list).
- Compare and contrast with other modern AI alignment methods, clarifying how EME differs, and evaluate the opportunities and risks of empowering AI to autonomously form ethical norms.
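As a minimal illustration of the multi-agent RL ingredient mentioned above, here is my own toy model (not from the proposal): a population of independent Q-learners, repeatedly paired in a two-action coordination game, settles on a shared convention – a crude stand-in for bottom-up norm formation:

```python
# A toy multi-agent RL model of convention formation (my own illustration):
# independent Q-learners are repeatedly paired in a two-action coordination
# game; over time the whole population settles on one shared convention.
import random

ACTIONS = [0, 1]                 # two arbitrary conventions, e.g. "left"/"right"
N_AGENTS, EPISODES = 40, 20000
ALPHA, EPSILON = 0.1, 0.1        # learning rate, exploration rate

# Per-agent Q-value for each action; tiny random init to break ties.
q = [[random.random() * 1e-3 for _ in ACTIONS] for _ in range(N_AGENTS)]

def choose(agent):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)                    # explore
    return max(ACTIONS, key=lambda a: q[agent][a])       # exploit

for _ in range(EPISODES):
    i, j = random.sample(range(N_AGENTS), 2)             # random pairing
    a_i, a_j = choose(i), choose(j)
    reward = 1.0 if a_i == a_j else 0.0                  # payoff for coordinating
    q[i][a_i] += ALPHA * (reward - q[i][a_i])
    q[j][a_j] += ALPHA * (reward - q[j][a_j])

preferred = [max(ACTIONS, key=lambda a: q[k][a]) for k in range(N_AGENTS)]
print("share following convention 1:", sum(preferred) / N_AGENTS)  # ~0.0 or ~1.0
```

Whether such bottom-up conventions deserve to be called “ethics” is precisely the philosophical question EME raises, but models like this make the opportunities and risks of autonomous norm formation concrete enough to study.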
While the specific workshop proposal details aren’t publicly available yet, such emergent morality could provide empirical insights into how moral norms arise naturally, lending support to the moral realist’s claim that objective moral truths exist and can be discovered through observation and reason.
However, it’s essential to approach this intersection critically. The emergence of ethical norms in AI doesn’t automatically validate moral realism. These norms could be artefacts of the specific architectures or training data used, rather than indicators of objective moral truths. Therefore, while EME offers a promising avenue for exploring moral realism, it also challenges us to refine our understanding of what constitutes objective morality in artificial agents.
AI Alignment, Moral Realism and ALife
Are AI alignment approaches informed by moral realism validated by artificial life (ALife) experiments?
What if similar morals emerged from ALife agents without the capacity to build any form of abstract representation, or with the capacity to build abstract representations – just not the kind most of us are comfortable calling “understanding”?
– more to come…
Footnotes
- There is a debate as to whether LLMs can truly do abstract reasoning.
When humans read something they (usually) build an internal model – a representation of what’s going on (concepts, relations, causality). Then, if asked to explain it, they use that model to generate words.
LLMs don’t obviously build such models the way humans do. They’re trained on pairs of input text → next token, so they learn to map directly from text to more text. The worry is that they may skip the “understanding” stage – the internal construction of abstract knowledge – and instead output plausible sequences without any grounded semantic model.
Ben Goertzel and some others argue that LLMs leap from the representation embedded in training text to fresh text, without “holding” an abstract model in their working memory.
Others argue that LLMs build vectorised, distributed abstractions, just in a form alien to human introspection. Humans reason by manipulating causal-structural models they can consciously access; LLMs reason by manipulating statistical-semantic embeddings that are less transparent but still encode abstractions.
Perhaps LLMs do build abstract representations – just not the kind Goertzel (or we) are comfortable calling “understanding”. ↩︎