AI: 100% p(Doom)?

I haven’t seen a single watertight argument for a default 100% p(doom) yet. As far as I can tell, they are all speculative.

Though I do think we should assign some credence to doom.

We Should be Concerned about p(Doom)

A catastrophic scenario does not need to be watertight to warrant deep concern. Doomer scenarios rest on four key assumptions: exponential runaway optimisation, the emergence of instrumental goals, the difficulty of alignment, and the impossibility of control. Given the current, empirical approach to developing AI – often summarised as “let’s train another huge model and see what it will be capable of!” – it is difficult to conclude that these foundational risks are being sufficiently addressed.

Last I checked, frontier LLMs seemed overconfident compared to humans: when surveyed with varying questions, they were almost as confident in their wrong answers as in their right ones. So there is real urgency in pushing for encoding epistemic humility, and adequate progress here may alleviate some of the risks of exponential runaway optimisation (ERO). Work is being done at frontier AI labs and in the ML community to improve calibration and encourage epistemic humility (i.e. curbing overconfidence). I see humility as a subset of epistemics in general, where I also expect improvement: though LLMs may be poor at it now, I’m happy to see Gemini and others winning at the International Math Olympiad, and I hope to see AI win an International Moral Olympiad. I interviewed Ben Goertzel recently on AI reasoning, where he raised some good points.1
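
To make the overconfidence point concrete, here is a minimal sketch (my own illustration with made-up numbers, not results from any actual survey) of the kind of calibration check I have in mind: compare a model’s mean stated confidence on answers it got right versus answers it got wrong, and compute a simple expected calibration error.

```python
# Minimal calibration-check sketch. The data is illustrative only.
from statistics import mean

# Hypothetical survey results: (stated_confidence, was_correct)
responses = [
    (0.95, True), (0.90, False), (0.85, True), (0.92, False),
    (0.70, True), (0.88, False), (0.99, True), (0.80, False),
]

conf_when_right = mean(c for c, ok in responses if ok)
conf_when_wrong = mean(c for c, ok in responses if not ok)
print(f"mean confidence when right: {conf_when_right:.2f}")
print(f"mean confidence when wrong: {conf_when_wrong:.2f}")

# Expected calibration error: bucket answers by stated confidence and
# compare average confidence with actual accuracy in each bucket.
def ece(responses, n_bins=5):
    bins = [[] for _ in range(n_bins)]
    for conf, ok in responses:
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(responses)
    err = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = mean(c for c, _ in b)
        accuracy = mean(1.0 if ok else 0.0 for _, ok in b)
        err += (len(b) / total) * abs(avg_conf - accuracy)
    return err

print(f"ECE: {ece(responses):.2f}")  # near 0 would indicate good calibration
```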

The emergence of instrumental goals, combined with ERO, could be disastrous. Some instrumental goals may work in our favour: e.g. cooperation (with humans of course, but also with what the AI predicts potential cosmic host neighbours might want), avoidance of unnecessary conflict, stability, and so on. But I think my credences here are quite fringe. Most people see instrumental convergence/basic AI drives as disastrous, perhaps because they expect immediate narrow optimisation of a single thing, like paperclip maximisation, to be how it turns out. I’m not saying it won’t be, but I see the possibility of otherwise (though I’m not yet sure what my credence is here).

As for the difficulty of alignment, I favour indirect normativity. This way we don’t need to know how to specifically align AI to humanity’s values, which means we don’t need to be sure which of our values are correct; we can get AI to help with all of that. I think this is one of Nick Bostrom’s most important points in Superintelligence, but it comes near the end of the book and may read as somewhat technical, which is perhaps why not many people talk about it. So I brought it up in my recent interview with Bostrom, where, I am happy to say, he expressed a fretful optimism with regard to p(doom); his credence seems to have shifted in a more positive direction.

Future Thriving

On the other hand, what do I think are the most likely routes for human thriving in the future?

I’m not as committed to human thriving in the long term as I am to general thriving in the long term. Perhaps humans are unfit for the future; however, humans could uplift to some posthuman forms, and in fact it may be morally good to do so.

Containment won’t work indefinitely if superintelligence is smarter and more powerful than those of us designing its containment. That doesn’t mean AI kills us all, though. I think control/containment is good early on, but we should also focus on motivation early: getting good starting values in place, so that, through indirect normativity (see Superintelligence ch. 13), we and/or AI can discover better attractor basins in the landscape of value2. First focus on maximising the likelihood of an OK outcome (maxipok), such that there is enough existential security and stability to afford long reflections on where to go next.

I should also say that we/AI should be wary of maximising anything we are uncertain about, and if we are confident there is no uncertainty, we should be uncertain about that confidence. Under uncertainty we should be generally precautionary: maxipok or maximin, or whatever the best longtermist options are that focus on actions with near-best expected long-term effects without swamping short-term concerns (the cycle of getting closer to the ideal could be, like, eternal, so we should at least enjoy the journey). A toy sketch of how these decision rules can disagree is below.
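
Here is that toy sketch (my own illustration, with made-up numbers, not a claim about any real choice): expected value can favour an aggressive gamble, while precautionary rules like maximin (best worst case) and maxipok (highest probability of an OK outcome) favour the cautious option.

```python
# Illustrative comparison of decision rules under uncertainty.
actions = {
    # action: list of (probability, outcome_value) pairs -- made-up numbers
    "aggressive_maximisation": [(0.95, 100.0), (0.05, -1000.0)],
    "cautious_stabilisation":  [(0.90, 20.0),  (0.10, 5.0)],
}

def expected_value(outcomes):
    return sum(p * v for p, v in outcomes)

def worst_case(outcomes):
    # maximin looks only at the worst possible outcome
    return min(v for _, v in outcomes)

def p_ok(outcomes, ok_threshold=0.0):
    # maxipok: probability that the outcome is at least "ok"
    return sum(p for p, v in outcomes if v >= ok_threshold)

best_ev = max(actions, key=lambda a: expected_value(actions[a]))
best_maximin = max(actions, key=lambda a: worst_case(actions[a]))
best_maxipok = max(actions, key=lambda a: p_ok(actions[a]))

print("expected-value choice:", best_ev)       # aggressive_maximisation
print("maximin choice:       ", best_maximin)  # cautious_stabilisation
print("maxipok choice:       ", best_maxipok)  # cautious_stabilisation
```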

> I haven’t seen much progress in the project of codifying the best version of humanity’s actual values. That means that even if we solve the current prosaic alignment project–perfect mechanistic interpretability, no deception, no weird internal values or sharp left turns, just faithfully executing the creator’s intent as the creator understands it–we get Critch’s RAAP failure mode.3

I think some of ethics does a pretty good job of representing the best of human values. CEV4 has the problem of aggregating incoherent and inconsistent values across the entire human population. Some norms would have to be injected into the evaluation criteria for deciding which values to (de)prioritise, and so we are back at ethics.
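
To show why aggregation is genuinely hard, here is a small toy example (the classic Condorcet cycle, my own illustration rather than anything from the CEV literature): three people each hold a perfectly coherent ranking over three values, yet pairwise majority vote yields a cycle, so “what the population prefers” is ill-defined without extra norms injected into the aggregation procedure.

```python
# Toy Condorcet cycle over three hypothetical values.
from itertools import combinations

# Hypothetical individual rankings, best to worst.
rankings = [
    ["liberty", "equality", "tradition"],
    ["equality", "tradition", "liberty"],
    ["tradition", "liberty", "equality"],
]

def majority_prefers(a, b):
    # True if a strict majority ranks a above b
    votes_for_a = sum(r.index(a) < r.index(b) for r in rankings)
    return votes_for_a > len(rankings) / 2

for a, b in combinations(["liberty", "equality", "tradition"], 2):
    if majority_prefers(a, b):
        print(f"majority prefers {a} over {b}")
    else:
        print(f"majority prefers {b} over {a}")
# The three printed preferences form a cycle, despite every individual
# ranking being internally consistent.
```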

I lean objective5 when it comes to ethical norms, as I do with ontological and epistemic norms. If I’m right, that makes things easier, since morality is then discoverable, arguably in a similar way to how nature and mathematics are discoverable – a feat superintelligence may be far better at than us – and hence this goes some way towards improving my credence that AI could end up more moral than us. One concern is whether the superintelligence would care, despite being well beyond PhD level in every single aspect of ethics and able to tie all the relevant academic domains together in ways humans don’t have the head-space to. I hope, and have reason for credence, that it would.

On Motivations for AI to Care

In a universe converging upon saturation with sentient superintelligent civs, the long-term payoff matrix is likely, in my opinion, to favour cooperation and defence over defection and offence by a wide margin. If I’m right, and a budding superintelligence is smart enough to work this out, it may in expectation begin the process of initiating a long-term value handshake before it meets its cosmic neighbours. It will see more benefit in being part of the cosmic collective (the collective knows the universe is finite and wants to avoid states of affairs that wastefully burn through resources), and it will want to smooth the relationship with evidence of a cooperative nature. It won’t be able to hide a genocide: by the time it interfaces with other civs, signatures of our quadrant’s history will already have propagated into other light cones, been intercepted, and been rebroadcast – a transparency of history, to the extent the information remains intact enough to denoise. Plus, interpretability will have been solved; no mind will be able to hide its memories, its motivations, or its thoughts in general. There are selection pressures that go beyond Darwinian natural selection – really ‘cognitive’ selection pressures – and the selection pressures of an intelligent cosmic collective will, I think, be huge. A rough payoff sketch follows below.
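
To make that payoff intuition concrete, here is a rough back-of-the-envelope sketch (my own framing and made-up numbers, purely illustrative): if histories are transparent and minds are interpretable, defection is detected with near-certainty and forfeits the ongoing benefits of membership in the cooperative collective, so over long horizons the expected payoff of cooperation dominates the one-off gain from defecting.

```python
# Illustrative long-run payoff comparison under near-certain detection.
def long_run_payoff(strategy, horizon=10_000,
                    coop_benefit=1.0,    # per-round benefit of membership
                    defect_gain=50.0,    # one-off gain from defecting
                    p_detect=0.999):     # transparency makes hiding hard
    if strategy == "cooperate":
        return coop_benefit * horizon
    # Defector keeps the one-off gain, but with probability p_detect is
    # excluded from the collective and loses all future cooperative benefit.
    expected_future = (1 - p_detect) * coop_benefit * horizon
    return defect_gain + expected_future

for s in ("cooperate", "defect"):
    print(f"{s:10s} -> expected long-run payoff: {long_run_payoff(s):.1f}")
# cooperate  -> 10000.0
# defect     -> 60.0
```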

Regarding RAAP/Molochian dynamics, I’d hope that such a superintelligence (or an adequately intelligent and norm-sensitive AI) would recognise the multipolar state of affairs, see the zero- or negative-sum game being perpetuated by the collective, and work to carefully dismantle it and replace it with positive-sum dynamics.


  1.
  2. see The Landscape of Value
  3. This question and others came up in a discussion on Facebook here. My responses formed the basis for this post.
  4. see the post on Coherent Extrapolated Volition
  5. personally I lean utilitarian moral realist
