The Bitter Lesson in AI and Motivation Selection

I was struck by how Richard Sutton’s The Bitter Lesson [1] resonates with motivation selection and indirect normativity in ongoing debates about AI alignment.

Sutton argues that human-crafted domain knowledge (e.g., “how to play chess” or “how to recognise phonemes”) is brittle and doesn’t scale with compute, compared to search + learning.

If we can leverage general-purpose, scalable computation to aid AI ethics, we may make progress far more reliably and quickly than by relying on brittle, hand-coded methods and data (human knowledge) [2]. That is, large-scale search and machine learning could be used to iterate through and make sense of complex ethical choices. Here I apply Sutton’s “bitter lesson” to AI ethics [3].

Just as hand-crafted, directly specified knowledge failed to scale in chess, Go, and vision systems, directly specified moral theories in AI may likewise fail to scale and at some stage plateau. Similarly, building in how we think ethics works may feel satisfying but could prove counterproductive in the long run. Instead, it may be more fruitful to develop systems capable of discovering morality and ethics themselves – as we do – rather than mimicking our current, inevitably limited, understanding. This aligns closely with Nick Bostrom’s idea of indirect normativity: designing AI not to contain our values, but to discover what we would value if we were wiser, more informed, and more coherent.

Recently, Gemini achieved gold-medal level performance at the International Mathematical Olympiad. I hope that one day AI will also win the Moral Olympiad [4].

The Bitter Lesson Applied to AI Alignment

In short:

  • Short-term temptation: hand-crafted ethics → satisfying, publishable, but brittle.
  • Long-term success: scalable processes of moral discovery → less anthropocentric, more robust across increasing compute and context.
  • The bitter part: this may mean much of hand-coded ethics via direct specification in AI will, like handcrafted features in computer vision, eventually be seen as a dead end.

Hand-Crafted Ethics ≈ Hand-Crafted Knowledge

The dynamic in Sutton’s Bitter Lesson likely applies to AI ethics. Hand-coding moral theories or rule sets into AI, whether in the form of Kantian imperatives or utilitarian calculus, might look productive at first, but such systems are prone to plateau, to break down in edge cases, or even to constrain further moral discovery. Just as symbolic chess programs were eclipsed once brute-force search and learning were scaled, attempts solely based on embedding ethical features directly are unlikely to provide a robust long-term strategy for alignment.

Mimicking “How We Think” vs. Discovering As We Do

… or ought to do

Sutton warns against building in how we think we think. In the ethical domain, this approach (at least by itself) may kneecap AI’s ability or motivation to discover moral truth. Many have already rebutted the idea of encoding folk moral reasoning patterns or philosophical heuristics into AI systems. While this might feel satisfying to us, it risks constraining the AI’s capacity to learn in ways that mirror the limitations of past approaches in other fields [5].

On top of this, we should be aware that embedding our own reasoning structures could lock systems into human biases, preventing them from scaling toward commonly held norms – or, if you are inclined toward moral realism, stance-independent moral facts. If we hard-code our present moral frameworks, we may unintentionally create blind spots or path dependencies that limit the AI’s ability to transcend our parochial perspectives.

Sutton instead suggests that we should focus on building systems that can discover – systems capable of generalising beyond our current understanding. This resonates strongly with the idea of indirect normativity: rather than telling AI what to value, we design it to discover what we would value if we were wiser, better informed, and more coherent. In this way, the system’s moral insights are not frozen by our present limitations, but open to scaling as knowledge and computation expand.

Scalable Utilitarianism and Scalable Moral Realism

Utilitarianism, which is broadly about maximising overall well-being, stands to benefit from this approach. Instead of relying on human-defined shortcuts, an AI could use large-scale data and learning to approximate what actually leads to the best outcomes. In other words, it is a way to let the AI work out the utilitarian calculus dynamically, rather than imposing a fixed formula.
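To make the contrast concrete, here is a minimal, purely illustrative sketch in Python. Everything in it is my own assumption for illustration: the feature names, the weights in `handcrafted_utility`, and the tiny gradient-descent learner standing in for whatever large-scale method would actually be used.

```python
import numpy as np

# Hand-crafted approach: a fixed, human-specified utility formula.
# The weights encode our *current* guesses about what matters (hypothetical numbers).
def handcrafted_utility(outcome):
    return 0.7 * outcome["happiness"] - 0.3 * outcome["suffering"]

# Scalable approach: learn a utility model from (outcome, assessed value) data,
# so the "calculus" can be revised as evidence and reflection accumulate.
def fit_utility_model(features, assessed_values, lr=0.01, steps=5000):
    # A simple linear model trained by gradient descent; a stand-in for
    # whatever large-scale learner one would actually use.
    w = np.zeros(features.shape[1])
    for _ in range(steps):
        preds = features @ w
        grad = features.T @ (preds - assessed_values) / len(assessed_values)
        w -= lr * grad
    return w

# Toy data: rows describe outcomes as [happiness, suffering]; the targets are
# (hypothetical) reflective assessments gathered from deliberation, debate, etc.
X = np.array([[8.0, 1.0], [3.0, 6.0], [5.0, 2.0]])
y = np.array([7.0, -2.0, 3.5])

w = fit_utility_model(X, y)
print("learned weights:", w)  # discovered from data, not hand-specified
print("hand-crafted score:", handcrafted_utility({"happiness": 8.0, "suffering": 1.0}))
```

The point is not the model (a linear fit is itself a hand-chosen structure) but the workflow: the learned weights can keep moving as the assessments improve, whereas the hand-crafted formula stays frozen until a human revises it.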

As I’ve written about at length elsewhere, from a moral realism perspective this approach might actually get us closer to stance-independent moral truths. By letting a powerful learning system explore moral landscapes, we might uncover deeper patterns of ethical reasoning that we would not have arrived at on our own. It is a way of leveraging the AI’s capacity to find stable, scalable moral insights that are not tied down by human limitations.

Indirect Normativity as a “Scalable Meta-Method”

Seen through Sutton’s lens:

  • Search = exploring value space (simulations, counterfactuals, reflective equilibria).
  • Learning = refining moral hypotheses through evidence, reasoning, debate, and long reflection.
  • Indirect normativity is then the meta-method: rather than encoding our ethical “discoveries,” we encode the capacity to discover moral truths as we would under idealised reflection.

That’s directly parallel to Sutton’s prescription for AI more generally: don’t hardcode content, hardcode processes that scale.
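To make the search/learning decomposition above a little more concrete, here is a deliberately toy sketch of a “propose by search, refine by evaluation” loop over candidate moral hypotheses. It is not a real alignment algorithm: the considerations, the mutation step, and especially the reflective scoring signal are all placeholders I have invented for illustration.

```python
import random

# A toy "moral hypothesis": a weighting over hypothetical normative considerations.
CONSIDERATIONS = ["well-being", "fairness", "autonomy"]

def random_hypothesis():
    return {c: random.random() for c in CONSIDERATIONS}

def mutate(hypothesis, scale=0.1):
    # Search step: explore nearby points in "value space".
    return {c: max(0.0, v + random.gauss(0, scale)) for c, v in hypothesis.items()}

def reflective_score(hypothesis):
    # Learning-signal stand-in: in reality this would come from evidence,
    # reasoning, debate, and long reflection, not from a fixed target.
    target = {"well-being": 0.6, "fairness": 0.3, "autonomy": 0.1}  # hypothetical
    return -sum((hypothesis[c] - target[c]) ** 2 for c in CONSIDERATIONS)

def discover(iterations=2000):
    best = random_hypothesis()
    best_score = reflective_score(best)
    for _ in range(iterations):
        candidate = mutate(best)             # search
        score = reflective_score(candidate)  # evaluate under "reflection"
        if score > best_score:               # keep refinements that survive scrutiny
            best, best_score = candidate, score
    return best

print(discover())
```

Note that even this toy loop smuggles its “reflection signal” in through a fixed target, which is exactly the worry raised in the caveats below: the meta-method scales, but the evaluation signal still has to come from somewhere.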

Caveats

Smuggling biases into the objective function

One challenge is that even if we rely on scalable, general methods, we still need some kind of value function or objective to guide the learning. In utilitarianism, that is often “maximising overall happiness” (total utilitarianism) or something like it. But defining that objective is a huge philosophical problem with plenty of peer disagreement. We might end up smuggling in biases or assumptions simply through how we frame that objective, and that is not so different from the old hand-crafted approach in a new guise.
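As a concrete illustration of how the framing already does ethical work, here is a toy population comparison (the worlds and welfare numbers are invented): the same data yields opposite verdicts depending on whether the objective is total or average well-being.

```python
# Two hypothetical worlds, described only by per-person welfare levels.
world_a = [8.0] * 10     # 10 people, each at welfare 8
world_b = [1.1] * 100    # 100 people, each at welfare 1.1

def total_utility(world):
    return sum(world)

def average_utility(world):
    return sum(world) / len(world)

# Total utilitarianism prefers world B (about 110 > 80);
# average utilitarianism prefers world A (8 > 1.1).
print(total_utility(world_a), total_utility(world_b))
print(average_utility(world_a), average_utility(world_b))
```

Whichever function we hand the learner, we have already taken a stand on population ethics before any “scalable discovery” begins.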

Uncertainty in what ideal moral agents would think and discover

Another critique might be that indirect normativity is still a kind of idealisation that assumes we can know what an “ideal reasoner” or “ideal moral agent” would discover. But that’s a big assumption. If the AI is just scaling up its own search and learning, it might find solutions that are very alien and that don’t align with what humans actually consider moral. In other words, just because it’s general-purpose doesn’t guarantee it’ll land on a form of utilitarianism or moral realism that we’d be comfortable with.

Part of the answer here may be epistemic humility, and the ability to change course as understanding improves.

Guardrails

There is also the risk that relying purely on scalable methods could overlook the need for certain guardrails or constraints derived from human moral reasoning – especially in the short to medium term. An AI does not suddenly become an ideal ethical observer just by reaching AGI level [6]. Just because an AI is learning ethics through indirect normativity does not guarantee that it will, by default, land on human-compatible moral truths or at least permissible value basins – guardrails may be required to nudge the AI back on track if its trajectory looks to be heading toward something obviously, or not so obviously, abhorrent. So even if we do not hard-code ethics, we might still need some framework to keep the AI’s explorations within acceptable bounds (whether because we find the AI’s ‘moral intuitions’ unacceptable, or because its meanderings reflect the brittleness of currently powerful but inconsistently reliable AI systems). Otherwise, it could end up making decisions that are logically “optimal” in some alien sense but ethically bizarre, or even harmful, from a human (or generally sentient) perspective.
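One minimal way to picture such guardrails, purely as a sketch: wrap whatever the learned system proposes in a hard check, so exploration continues but clearly impermissible actions are vetoed or escalated for human review. The constraint, the action format, and the escalation response below are all hypothetical placeholders of my own.

```python
from typing import Callable, Iterable

# A hard constraint is a predicate over a proposed action that must never be
# violated, regardless of how highly the learned policy scores that action.
HardConstraint = Callable[[dict], bool]

def violates_any(action: dict, constraints: Iterable[HardConstraint]) -> bool:
    return any(constraint(action) for constraint in constraints)

def guarded_decision(proposed_action: dict, constraints: Iterable[HardConstraint]) -> dict:
    # The learned policy proposes; the guardrails dispose.
    if violates_any(proposed_action, constraints):
        return {"status": "escalate_to_human_review", "action": None}
    return {"status": "approved", "action": proposed_action}

# Example hypothetical constraint: block irreversible actions with expected harm.
def no_irreversible_harm(action: dict) -> bool:
    return action.get("irreversible", False) and action.get("expected_harm", 0) > 0

print(guarded_decision(
    {"name": "deploy_plan_x", "irreversible": True, "expected_harm": 3},
    [no_irreversible_harm],
))
```

The hard part, of course, is writing constraints that catch the “not-so-obviously abhorrent” cases without quietly reintroducing the hand-coded ethics the rest of the post argues against.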

Moral relativism at scale 

Another worry is that relying heavily on meta-methods might lead to a kind of moral relativism at scale. If the AI is just optimising based on what it discovers, it might adapt to different ethical norms in different contexts, which could make it less predictably aligned. In other words, it might not solidify into a stable moral framework that we trust, and that could lead to unexpected or inconsistent behaviour.

Considerations on caveats

Many of these caveats may be offset by general epistemic humility – there is much to be gained by allowing AI to evolve into a moral agent through discovery (possibly alongside wise prescription) rather than prescription alone. It is a way to harness the power of scalable methods to potentially uncover robust ethical principles in a way that aligns with big-picture moral theories – but care should be taken to avoid overconfidence and premature lock-in to those theories.

In a similar vein, we ought to avoid over-optimisation in the near term. Perhaps the Maxipok rule applies here [7].
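A toy illustration of the difference (the options and probabilities are invented): an option with a higher expected payoff can still be the wrong choice under Maxipok if it carries a larger probability of a catastrophic, “not OK” outcome.

```python
# Each hypothetical option is a lottery of (probability, outcome_value) pairs,
# where an outcome_value of 0.0 stands in for an existential-level catastrophe.
options = {
    "aggressive_optimisation": [(0.90, 100.0), (0.10, 0.0)],  # 10% catastrophe risk
    "cautious_path":           [(0.99, 60.0),  (0.01, 0.0)],  #  1% catastrophe risk
}

def expected_value(lottery):
    return sum(p * v for p, v in lottery)

def probability_ok(lottery, ok_threshold=0.0):
    # Maxipok cares only about the chance of avoiding catastrophe.
    return sum(p for p, v in lottery if v > ok_threshold)

for name, lottery in options.items():
    print(name, "EV =", expected_value(lottery), "P(OK) =", probability_ok(lottery))

# Naive expected value favours the aggressive option (90 vs 59.4);
# Maxipok favours the cautious one (0.99 vs 0.90).
```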

Conclusion

If Sutton is right and these general-purpose, meta-method approaches are how AI naturally progresses – that is, scaling up compute and letting learning algorithms discover solutions overpowers hand-coded, human-rule-based approaches – then it suggests something quite hopeful. It means that if we design AI safety or ethical alignment through this lens, we might end up with systems that are actually more reliable and robust in converging on ethical behaviour.

In other words, if you’re not trying to hard-code ethics but instead are giving the AI the tools to discover what is ethical – like a moral compass that it learns to calibrate as it gains more understanding – then you might get a more stable and adaptive form of AI alignment.

And ethically, yes, that could be a huge boon: the AI isn’t just following a rigid set of human instructions that might one day become outdated or fail in novel situations. Instead, it is capable of continuously refining its ethical understanding as it learns, all the while being guided by principles of epistemic humility, which might indeed make it more reliably aligned over time.

So, to sum it up: if Sutton’s view is correct and AI really does thrive on these general methods, then using that same philosophy for AI safety could well lead to more ethically reliable AI. It’s a hopeful idea that these scalable methods might not just make AI more capable, but also more ethically aligned in a robust and dynamic way.

Indirect normativity and Sutton’s bitter lesson converge on the same meta-insight – don’t freeze our parochial understanding into AI; build AI that can become more moral than us by discovering moral truths for itself.

  1. See Sutton’s The Bitter Lesson – March 13 2019. ↩︎
  2. However, there could be some middle ground between automated value learning and hand-coding values – as discussed in this interview with Colin Allen and the book he co-authored with Wendell Wallach, ‘Moral Machines’. ↩︎
  3. I have not yet spoken with him about this, so I do so without Sutton’s consent. I don’t think he would mind. ↩︎
  4. Well, actually it’s funny – Moral Turing Tests have surfaced interesting results (see ‘AI Language Model Rivals Expert Ethicist in Perceived Moral Expertise’ link to pdf). I’ve covered this interesting development in a few posts. ↩︎
  5. Early computer vision research focused on hand-crafted feature extractors like edges and algorithms such as the Scale-Invariant Feature Transform (SIFT), but these methods struggled with the complexity and variability of the real world, leading to performance limitations and stagnation until the advent of deep learning. Researchers were often “stuck” because these hand-designed features couldn’t capture sufficient information, and the approaches lacked the ability to learn from vast amounts of data, which was crucial for developing truly robust and generalised vision systems. ↩︎
  6. There’s the risk that via indirect normativity we (or the AI) will prematurely assume we have arrived at a kind of idealised “if we were wiser, more informed” endpoint. But as the AI learns more, and as more sense is made of the landscape of value, it might see things differently, and its predictions about that idealised notion of morality may change – we should be worried about early strong AI veering off in bizarre directions that are hard to predict. So the reality might be a lot messier, and might not give us the reliable short-term convergence to peaks in the moral landscape we may hope for – it may be a long journey punctuated by periods of long reflection. I reiterate here: we need to infuse AI value learning with epistemic humility. ↩︎
  7. The Maxipok Rule is a decision-making principle that urges individuals or societies to “Maximise the probability of an OK outcome,” where an “OK outcome” is defined as any outcome that avoids existential catastrophe. Proposed by Nick Bostrom, it functions as a rule of thumb to prioritise efforts on preventing existential risks, such as nuclear war, pandemics, and AI misuse, because even small reductions in these risks can have immense expected value given humanity’s vast potential future. ↩︎
