
Early Philosophical Groundwork for Indirect Normativity

Note: this is by no means exhaustive. Its focus is on Western philosophy. I’ll expand and categorise more later…

The earliest precursors to indirect normativity can be traced back to early philosophical discussions on how to ground moral decision-making in processes or frameworks rather than specific, static directives. While Nick Bostrom’s work on indirect normativity consolidated and formalized these ideas for AI, earlier approaches in moral philosophy laid the groundwork.

Here are some notable precursors and early strategies:

5th Century BCE – Socratic Method

The Socratic method involves questioning and refining ideas through dialogue, aiming for coherence and truth. This method of iterative questioning influenced Rawls’ Reflective Equilibrium and the procedural aspect of ideal observers seeking moral clarity through deliberation.


4th Century BCE – Aristotelian Phronesis and the Golden Mean

Aristotle’s practical wisdom (phronesis) and the “golden mean” between extremes emphasize situational reasoning and finding the best course of action under particular circumstances – this influenced Simon’s satisficing approach.

Aristotle’s idea of an ideal virtuous agent can be seen as a precursor to the Ideal Observer Theory.

3rd Century BCE – 2nd Century CE – Stoicism

The Stoicism of Epictetus, Marcus Aurelius, and Seneca emphasizes reason and detachment from the emotions, which informed the impartiality central to Ideal Observer Theory. Their approach to moral decision-making through rational reflection also resembles elements of proceduralism.

The Stoic ideal of living in accordance with nature also likely influenced moral realism (arguably the most likely candidate for what Nick Bostrom calls moral rightness).

Mediaeval Era – Scholasticism

Mediaeval philosophers like Thomas Aquinas and William of Ockham tried to reconcile faith and reason through systematic analysis and refinement of theological and moral principles. Scholastic methods of seeking coherence among complex ideas influenced Rawls’ Reflective Equilibrium, and the procedural nature of scholasticism informed early versions of moral reasoning frameworks (e.g. Humean proceduralism).

17th – 18th Century – Enlightenment Rationalism

Descartes emphasized systematic doubt and rational methods, and Kant introduced the categorical imperative, a universal principle of morality based on rationality.

Kant’s emphasis on universalizability influenced the impartiality and objectivity of the Ideal Observer Theory.

Rationalism provided a foundation for Humean proceduralism’s focus on reasoned processes.

18th – 19th Century – Utilitarianism

Jeremy Bentham and later John Stuart Mill introduced Utilitarianism – in broad brush strokes, the view that maximizing happiness or utility is the foundation of morality.

Utilitarianism introduced systematic methods for moral evaluation, aligning with satisficing and procedural ethics.

Ideal Observer Theory indirectly draws on utilitarian ideas by considering outcomes that align with universal preferences.

18th Century – Scottish Enlightenment

David Hume emphasized moral sentiments and the role of shared human empathy. Hume’s empiricism and proceduralism were pivotal for later moral theories.

Adam Smith introduced the “impartial spectator,” a concept that strongly influenced both Ideal Observer Theory and Rawls’ notion of impartiality in the original position.

19th – 20th Century – Pragmatism

Philosophers: Charles Sanders Peirce, William James, John Dewey

Pragmatism treats truth and moral reasoning as iterative processes tied to practical consequences. Its focus on evolving truths through inquiry influenced Reflective Equilibrium and procedural approaches.

John Dewey’s emphasis on adapting to changing contexts aligns with Herbert Simon’s satisficing.

1971 – John Rawls’ Reflective Equilibrium

Rawls proposed the idea of reaching a state of reflective equilibrium, where moral principles and judgments are adjusted until they align in a coherent framework. Rawls’ method involves iterative refinement of values, a process that could inspire AI systems tasked with resolving moral uncertainty. The emphasis on balancing principles and judgments parallels the goal of finding stance-independent moral truths.

20th Century – Ideal Observer Theory

Proposed by Roderick Firth (1952) and developed in earlier forms by Adam Smith and others, this theory suggests that moral truths are what an ideal observer—fully rational, informed, and impartial—would judge them to be. Indirect normativity strategies often involve creating systems to simulate or approximate the reasoning of an ideal observer. This idea influenced later discussions on delegating moral reasoning to AI.

18th Century – Humean Proceduralism

David Hume argued that morality arises from human sentiments and can be understood through shared reasoning processes rather than fixed rules. Hume’s emphasis on procedural ethics influenced strategies that rely on iterative processes or deliberation to determine moral values.

Mid-20th Century – Game Theory and Rational Choice

Figures like John von Neumann and Oskar Morgenstern explored decision-making under constraints and uncertainty. These ideas shaped Herbert Simon’s satisficing theory by providing a mathematical framework for “good enough” decisions.

Rational choice theory also influenced the design of Ideal Observer frameworks.

1956 – Herbert Simon’s Satisficing

In decision theory, Simon introduced the concept of satisficing – selecting an option that is “good enough” or permissible given constraints and available knowledge. The notion of satisficing in AI design could involve approximating moral truths or satisfactory outcomes without requiring perfect knowledge or exhaustive calculation.
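
As a rough illustration of the contrast with maximizing (a toy sketch, not from Simon’s paper; the plans, scoring function, and aspiration level are hypothetical), a satisficer stops at the first option that clears its aspiration threshold rather than evaluating everything:

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")

def satisfice(options: Iterable[T], score: Callable[[T], float],
              aspiration: float) -> Optional[T]:
    """Return the first option whose score meets the aspiration level."""
    for option in options:
        if score(option) >= aspiration:
            return option  # "good enough" -- stop searching
    return None  # nothing met the threshold

def maximize(options: Iterable[T], score: Callable[[T], float]) -> T:
    """Exhaustively evaluate every option and return the best one."""
    return max(options, key=score)

# Hypothetical example: accept the first plan whose estimated utility clears 0.8.
plans = ["plan_a", "plan_b", "plan_c"]
utility = {"plan_a": 0.6, "plan_b": 0.85, "plan_c": 0.95}.get
print(satisfice(plans, utility, aspiration=0.8))  # plan_b (good enough)
print(maximize(plans, utility))                   # plan_c (global optimum)
```

The point of the contrast is that the satisficer never has to score plan_c at all, which is the “bounded rationality” intuition Simon was after.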

1990s – Early AI Alignment Discussions

Figures such as Marvin Minsky and Peter Voss began discussing how AI could align with human values through iterative learning and extrapolation. These discussions laid the groundwork for delegating moral reasoning to AI systems capable of improving their ethical understanding over time.

2004 – Eliezer Yudkowsky’s Coherent Extrapolated Volition (CEV)

Arguably the earliest explicit strategy for indirect normativity in AI was CEV, which formalized the idea of delegating moral reasoning to an AI that extrapolates humanity’s collective values (or volitions) under idealized conditions. While earlier philosophical strategies and earlier discussions of AI alignment influenced this approach, CEV is often presented as the first comprehensive framework tailored specifically to the challenges of AI alignment through indirect normativity.

2012 – Nine Ways to Bias Open-Source AGI Toward Friendliness – Ben Goertzel and Joel Pitt

In this paper Goertzel and Pitt explore means of deferring value discovery to rational processes or AGI agents rather than directly specifying values for the AGI.

Coherent Aggregated Volition (CAV)

Coherent Aggregated Volition (CAV) aims to define a value system derived from the collective values of humanity. Rather than deeply extrapolating what humanity would value under idealized conditions (as in Coherent Extrapolated Volition), CAV focuses on identifying a compact, coherent, and consistent set of values that is close to humanity’s current collective value set.
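
As a loose, hypothetical illustration of the “aggregate rather than extrapolate” idea (the paper does not prescribe this algorithm; the value names, weights, and consensus threshold below are invented), one could imagine averaging individuals’ value weightings and keeping only the dimensions on which people roughly agree:

```python
# Toy value aggregation: keep values with rough consensus, average their weights.
import statistics

# Each person weights a handful of candidate values in [0, 1] (hypothetical data).
people = [
    {"honesty": 0.90, "autonomy": 0.7, "tradition": 0.2},
    {"honesty": 0.80, "autonomy": 0.9, "tradition": 0.6},
    {"honesty": 0.95, "autonomy": 0.6, "tradition": 0.1},
]

def aggregate(profiles, max_spread=0.3):
    """Average each value's weight, keeping only values people broadly agree on."""
    result = {}
    for value in profiles[0]:
        weights = [p[value] for p in profiles]
        if max(weights) - min(weights) <= max_spread:  # rough consensus check
            result[value] = statistics.mean(weights)
    return result

print(aggregate(people))  # honesty and autonomy survive; tradition is too contested
```

The output is deliberately compact: contested values are dropped rather than extrapolated, which is the contrast with CEV the section describes.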

Coherent Blended Volition (CBV)

Coherent Blended Volition (CBV) evolved as a refinement of the CAV concept, addressing potential misinterpretations and aiming for greater inclusivity and harmony. The core idea is that the combined values of a diverse group of people should not be viewed as a simple average but rather as a conceptual blend that synthesizes the most essential elements of their different perspectives. Drawing inspiration from the “conceptual blending” theory of Fauconnier and Turner (2002), CBV creates a new, synthesized value system that integrates key aspects of divergent views into a harmonious whole. For CBV to work, individuals whose views are being blended must feel that their core values are sufficiently represented in the resulting synthesis. Lastly, the blended value system should be elegant, internally consistent, and compact while respecting the diversity of inputs.

2015 – Stuart Armstrong’s Motivated Value Selection for Artificial Agents

This paper addresses the concern that agents might manipulate the value selection process to align with their current preferences, leading to unintended behaviors. Armstrong examines conditions under which agents might engage in such manipulative behaviors and introduces the concept of an “indifferent” agent. This agent is designed to be neutral toward the value selection process, neither promoting nor obstructing changes in its values. By maintaining indifference, the agent can transition from maximizing one utility function to another without bias.
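
One way to read the indifference idea (a hedged sketch of my own, not Armstrong’s exact formalism; the utilities and expected values below are hypothetical) is that the new utility function is shifted by a constant so that, at the moment of the switch, the agent expects the same value whether or not its values change, leaving it nothing to gain by promoting or obstructing the change:

```python
# Sketch of utility indifference via a constant compensation term (illustrative only).

def compensated_new_utility(u_new, expected_old, expected_new):
    """Return a shifted u_new the agent is indifferent about adopting.

    expected_old / expected_new are the agent's expected values of the old and
    new utility functions, assessed at the moment of the value change.
    """
    compensation = expected_old - expected_new  # constant offset
    return lambda outcome: u_new(outcome) + compensation

# Hypothetical utilities over an outcome dictionary.
u_old = lambda o: o["paperclips"]
u_new = lambda o: o["human_flourishing"]

u_new_adjusted = compensated_new_utility(u_new, expected_old=10.0, expected_new=4.0)
print(u_new_adjusted({"paperclips": 0, "human_flourishing": 5}))  # 5 + 6.0 = 11.0
```

Because the offset is a constant, it changes what the switch is worth to the agent without changing which post-switch actions it prefers.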

2018 – Iterated Distillation and Amplification (IDA) – Paul Christiano

While not explicitly framed as indirect normativity, Iterated Distillation and Amplification (IDA) is a framework proposed by Paul Christiano for training AI systems to achieve superhuman performance while remaining aligned with human values. The process involves two key steps: amplification and distillation.
* During amplification, a human overseer collaborates with the AI to enhance decision-making capabilities, effectively creating a more powerful composite system.
* In the distillation phase, a new, more efficient AI model is trained to emulate the improved decision-making of the amplified system. By iteratively repeating these steps, the AI progressively advances in capability, aiming to surpass human performance while maintaining alignment with human intentions.

This iterative process is inspired by techniques like those used in AlphaGo Zero, where a policy network is repeatedly refined through self-play and evaluation. In IDA, the amplification step is interactive and human-directed, allowing humans to guide the AI’s development by delegating tasks and providing feedback. This approach seeks to balance the trade-off between enabling novel AI capabilities and ensuring alignment with human values, addressing challenges in specifying complex objectives and mitigating risks associated with misalignment.
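
A toy, runnable sketch of the loop may make the two steps concrete (illustrative only; Christiano’s proposal is far richer, and the “model” here is just a lookup table while the “human” is a trivial question decomposer, both invented for this example):

```python
# Minimal IDA-style loop: amplify with decomposition, then distill by imitation.

class ToyModel:
    def __init__(self, answers=None):
        self.answers = dict(answers or {})

    def answer(self, question):
        return self.answers.get(question, "unknown")

    def fine_tune(self, pairs):
        # Distillation stand-in: memorise the amplified answers in a fresh model.
        return ToyModel({**self.answers, **dict(pairs)})

def amplify(model, question):
    # Amplification stand-in: a "human" splits the question and combines
    # the model's sub-answers into a better overall answer.
    subquestions = question.split(" and ")
    return " + ".join(model.answer(q) for q in subquestions)

def iterate_ida(model, questions, rounds=2):
    for _ in range(rounds):
        pairs = [(q, amplify(model, q)) for q in questions]  # amplification
        model = model.fine_tune(pairs)                        # distillation
    return model

model = ToyModel({"chop onions": "use a knife", "boil pasta": "use a pot"})
model = iterate_ida(model, ["chop onions and boil pasta"])
print(model.answer("chop onions and boil pasta"))  # "use a knife + use a pot"
```

The distilled model ends up answering the composite question on its own, which is the sense in which each round compresses the amplified (human + model) behaviour back into a standalone system.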
