Transparency of History in Galactic Game Theory
… and the Long First Turn (of the new game).
We’re not just designing AI to behave nicely in our human sandbox – in case the sandbox turns out to be part of a much larger galactic metropolis, we’d better make sure the AI doesn’t start flinging galactic sand in anyone’s eye. We may at some stage encounter technologically mature civilisations far more advanced than us, with their own norms and expectations. So instead of building an AI that acts like it’s binge-reading Ayn Rand while playing StarCraft, we may want one that cooperates, respects complexity, and doesn’t explode at the sight of preferences that aren’t its own.
I argue that AI should perhaps investigate moral rightness (which I think may approximate optimal win-win game theory, akin to moral realism). What binds mature civs together may be that they actually track moral truth – not just etiquette or galactic vibes, but something like moral realism, or whatever moral truth turns out to be. And any sufficiently smart AI, even a selfish one, might recognise that it’s in its best interest to align with those truths, because doing so makes it more likely to survive the long game and be accepted into the galactic community (perhaps something like Iain M. Banks’s Culture).
So here is the rough shape of the argument:
If and when our civ reaches the stars, we will likely bump up against alien civs.
How might galactic-scale incentives and interactions play out?
How might cooperative civs react to humanity’s impulse to optimise for near-term survival and exclusive benefit – and our willingness to throw values under the bus to gain and maintain competitive advantage?
Earth emits light, radio/TV signals, and other electromagnetic traces, which expand outward in an ever-growing sphere – some of these emissions may have already been picked up by nearby sensors, or will at some stage be picked up by alien civs. In the event that we meet an advanced alien civ, this will make it really difficult to lie about our history. I forget where I first came across this point, but I will update this post when I find the right references.
Assumptions in Brief
- Observability of Histories: Advanced civilisations can, in principle, reconstruct each other’s past behaviours via electromagnetic signals or other detectable traces, making it hard to lie – at least about major historical facts.
- Universal Game: Over long timescales, advanced civilisations become aware of each other, know about each other’s capabilities and moral track records, and it becomes common knowledge that everyone knows this.
- Moral Realism (At Least in Core): There are universal moral facts or near-universal moral insights around suffering, well-being, and “fair” distribution of resources; advanced intelligences can discover or converge on them.
- No Extreme Offensive Advantage: There is no overwhelming “first-strike” capability that trivially allows a powerful aggressor to wipe out rivals without risk or cost.
- “Cooperator” vs “Defector”: Simplified labels. Cooperators respect moral norms that reduce suffering, distribute benefits fairly, and coordinate to uphold just, stable frameworks; defectors exploit or harm others for narrower self-interest.
Detailed Assumptions
Existence of Multiple Advanced Civilisations
The Drake Equation, developed by astrophysicist Frank Drake in 1961, estimates the number of active, communicative extraterrestrial civilisations in the Milky Way galaxy. While the exact number remains uncertain due to variables like the fraction of planets that develop intelligent life and the longevity of such civilisations, the equation provides a framework suggesting a non-zero probability of multiple advanced civilisations existing (see the SETI Institute). However, despite the probabilistic support from the Drake Equation, the Fermi Paradox highlights the contradiction between the high probability of extraterrestrial life and the lack of evidence or contact with such civilisations. This paradox underscores the uncertainties and challenges in detecting and confirming the existence of other advanced civilisations (Planetary Society).
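To make the framework concrete, here is a minimal sketch of the Drake Equation in Python. Every input value below is an illustrative assumption, not an established figure – published estimates for several of these factors span orders of magnitude, which is exactly why N is so uncertain.

```python
# A minimal sketch of the Drake Equation:
#   N = R* . fp . ne . fl . fi . fc . L
# All parameter values below are illustrative assumptions.

def drake_equation(R_star, f_p, n_e, f_l, f_i, f_c, L):
    """Number of active, communicative civilisations in the galaxy."""
    return R_star * f_p * n_e * f_l * f_i * f_c * L

N = drake_equation(
    R_star=1.5,   # average star-formation rate (stars/year)
    f_p=1.0,      # fraction of stars with planets
    n_e=0.2,      # potentially habitable planets per star with planets
    f_l=0.1,      # fraction of habitable planets that develop life
    f_i=0.01,     # fraction of life-bearing planets that develop intelligence
    f_c=0.1,      # fraction of intelligent species that become communicative
    L=1_000_000,  # years a civilisation remains detectable
)
print(f"Estimated communicative civilisations: {N:.0f}")  # 30 with these inputs
```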
Consistency of Physical Laws
It is a foundational principle in physics that the fundamental laws governing the universe, such as those related to gravity, electromagnetism, and quantum mechanics, are consistent and universally applicable. This uniformity implies that all civilisations, regardless of their location in the cosmos, operate under the same physical constraints (see math.ucr.edu).
Natural Immutability of Information Propagation Through Space
According to Einstein’s theory of relativity, there is a cosmic speed limit – the speed of light in a vacuum is a universal constant, defined as exactly 299,792,458 metres per second. This speed represents the maximum rate at which information or matter can travel, establishing a natural limit on information propagation through space. This limitation means that emitted light signals cannot be revoked or unsent, and it obviously makes real-time communication across interstellar distances impractical. There are, however, limits to this assumption: while the propagation itself is immutable, the detectability of the information diminishes over vast distances due to factors like signal dispersion and cosmic noise. Advanced civilisations might overcome these challenges with sophisticated long-range detection technologies, or with sensors placed near the source.
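Two back-of-the-envelope calculations make the point: the radius of the expanding “bubble” of past emissions, and how steeply received signal power falls off with distance (which is why detection favours very sensitive instruments or nearby sensors). The transmitter power and the 100-year timescale below are illustrative assumptions.

```python
import math

# (1) Past emissions form an ever-growing sphere that cannot be recalled.
# (2) Received power falls off with the inverse square of distance, so
#     long-range detection needs very sensitive receivers or sensors near
#     the source. All numbers are illustrative.

LY_IN_M = 9.4607e15  # metres per light-year

def signal_sphere_radius_ly(years_since_emission):
    """Radius of the expanding shell carrying a past broadcast (light-years)."""
    return years_since_emission * 1.0  # light travels 1 ly per year by definition

def received_flux(transmit_power_w, distance_m):
    """Isotropic inverse-square falloff: flux (W/m^2) at a given distance."""
    return transmit_power_w / (4 * math.pi * distance_m**2)

# Assume (illustratively) that Earth's strong radio era began ~100 years ago:
print(f"Radio bubble radius: ~{signal_sphere_radius_ly(100):.0f} light-years")

# A 1 MW broadcast received 100 light-years away:
flux = received_flux(1e6, 100 * LY_IN_M)
print(f"Flux at 100 ly: {flux:.1e} W/m^2")  # ~8.9e-32 W/m^2 -- vanishingly
# faint, which is why the assumption leans on advanced detectors or nearby sensors
```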
Mature Civilisations’ Capability to Detect and Analyse Extraterrestrial Signals
Advancements in Observational Technology: Technologically mature civilisations are likely to develop sophisticated instruments capable of detecting and analysing electromagnetic signals across vast distances. For example, human initiatives like the Search for Extraterrestrial Intelligence (SETI) employ large radio telescopes to listen for potential extraterrestrial communications.
Detection of Technosignatures: Mature civilisations may search for technosignatures – indicators of technology or intelligent life – such as specific radio frequencies, laser emissions, or other anomalies that deviate from natural cosmic phenomena.
Potential for Value Convergence Among Advanced Civilisations
- Moral Realism and Shared Challenges: The concept of moral realism suggests the existence of objective moral truths. If this is valid, advanced civilisations might independently arrive at similar ethical principles through rational deliberation. Additionally, facing universal challenges like resource management, environmental sustainability, and existential risks could drive civilisations toward analogous moral frameworks that prioritise cooperation and mutual well-being.
- Debates on Moral Convergence: However, this assumption is subject to debate. Some scholars argue that moral values are heavily influenced by unique cultural and historical contexts, leading to moral relativism rather than convergence (Internet Encyclopedia of Philosophy).
Futility of Concealing Past Transgressions from Technologically Mature Civilisations
- Historical Reconstruction Capabilities: Given the immutable nature of information propagation and the advanced detection technologies of mature civilisations, attempting to hide or falsify past actions would likely be ineffective. Signals emitted into space, intentionally or unintentionally, persist and can be intercepted and analysed by others, allowing for the reconstruction of historical events.
- Ethical and Strategic Implications: Awareness of this transparency may incentivise civilisations to maintain ethical behaviour consistently, knowing that any transgressions could be revealed and impact their reputation and relations with other civilisations.
Would There Be Stronger Incentives to Cooperate Than to Defect?
Yes, likely.
Cooperation promises vast gains in technology-sharing, resource exchange, mutual defence, and cultural/scientific growth. The synergy of multiple civilisations working together is presumably greater than going it alone—especially over cosmic timescales.
Because histories are observable and verifiable, a civilisation that defects (i.e. commits moral atrocities or routinely violates interstellar agreements) cannot hide these facts – plausible deniability would become extremely difficult or impossible. Any defection would likely lead to reputational damage. If interactions are not one-shot but rather iterated (over millennia or longer), then defection carries long-term costs – others can collectively sanction or isolate a defector. Also, since in this model most civs are cooperative, there would be an overwhelming asymmetry of power in favour of cooperation: defection risks overwhelming retaliation, forming a stable deterrence environment.
Together, these factors – visibility and accountability, iterated cosmic game theory, (by assumption) no overwhelming offensive edge, and trade and knowledge benefits – create a strong incentive to join a “cooperative club”, abide by existing moral norms, and avoid being flagged as a pariah civilisation.
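A toy simulation illustrates the core dynamic. Under the observable-histories assumption, a single defection becomes permanent public record, and cooperators simply stop trading with the flagged civ. The 9:1 population mix and the payoff values are illustrative assumptions, not claims about real payoffs.

```python
# Toy iterated trade game under the "observable histories" assumption:
# one defection becomes permanent public record, and cooperators refuse
# to trade with any flagged civ thereafter.

COOPERATE_PAYOFF = 3  # mutual gain per trading partner per round
EXPLOIT_PAYOFF = 5    # one-off gain from exploiting a partner

civs = [{"strategy": "cooperator", "flagged": False, "score": 0} for _ in range(9)]
civs.append({"strategy": "defector", "flagged": False, "score": 0})

for _ in range(1000):
    for civ in civs:
        if civ["flagged"]:
            continue  # pariah civs are frozen out of all exchange
        partners = [c for c in civs if c is not civ and not c["flagged"]]
        if civ["strategy"] == "defector":
            civ["score"] += EXPLOIT_PAYOFF * len(partners)
            civ["flagged"] = True  # history is transparent: one strike, permanent record
        else:
            civ["score"] += COOPERATE_PAYOFF * len(partners)

coop_avg = sum(c["score"] for c in civs if c["strategy"] == "cooperator") / 9
defector = next(c for c in civs if c["strategy"] == "defector")
print(f"Average cooperator payoff: {coop_avg:,.0f}")  # ~24,000
print(f"Defector payoff (one big haul, then ostracism): {defector['score']}")  # 45
```

The defector gets one large haul and then nothing; patient cooperators lap it many times over. The asymmetry only grows with the number of rounds, which is the point of cosmic timescales.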
Would Cooperators Converge on Optimising for Well-Being?
Likely some version of that—though “optimising for well-being” can mean different things across civilisations, each with unique cultural or biological backgrounds. But if moral realism is true in at least some core sense (minimising suffering, maximising flourishing, fairness, etc.), then advanced cooperators:
a) will share certain moral axioms: e.g., suffering is bad, well-being is good, wanton harm is unjustifiable, fairness is good,
b) will coordinate to reduce destructive conflict: They see it as a moral negative and a practical risk, because conflict is costly and destructive for everyone, and
c) will enact fair distribution of opportunities (optionality + resources), or at least attempt not to hoard at the expense of others.
Over cosmic timescales, different civilisations might have different conceptions of how to measure or weight well-being—but you’d expect a broad coalition among those who see the logic of mutual moral constraints and synergy.
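To make that concrete, here is a toy illustration in which two civs assign quite different weights to suffering and flourishing yet still rank the cooperative outcome first. The outcomes, scores, and weights are all illustrative assumptions.

```python
# Toy illustration of convergence despite different value weightings: two
# civs weight flourishing and suffering very differently, yet both rank
# the cooperative outcome first. All numbers are illustrative assumptions.

OUTCOMES = {
    # outcome: (flourishing score, suffering score)
    "cooperate": (8, 1),
    "hoard":     (6, 5),
    "conflict":  (3, 9),
}

def welfare(outcome, w_flourish, w_suffer):
    """Simple linear well-being measure: weighted flourishing minus weighted suffering."""
    flourishing, suffering = OUTCOMES[outcome]
    return w_flourish * flourishing - w_suffer * suffering

CIV_WEIGHTS = {"Civ A": (1.0, 2.0), "Civ B": (2.0, 0.5)}
for name, (wf, ws) in CIV_WEIGHTS.items():
    ranked = sorted(OUTCOMES, key=lambda o: welfare(o, wf, ws), reverse=True)
    print(f"{name} ranks outcomes: {ranked}")
# Both print ['cooperate', 'hoard', 'conflict']: agreement on the broad
# ordering survives substantial disagreement about the weights.
```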
Would the Cooperators in Aggregate Be Stronger Than Defectors?
Probably, yes.
In an iterated environment, cooperators can and would form coalitions, share resources, and present a united front against defectors. Because advanced civilisations that last for eons presumably want long-term stability over short-term gains, they can coordinate to deter or penalise defectors. Defectors might find themselves isolated, lacking partners for trade or mutual defence—making them systematically weaker – more on penalisation next.
In general, collective power arises from broad alliances: technology exchange, robust trade networks, shared defence pacts, etc. This is commonly seen in real-world human history, and it presumably generalises if no single empire can roll over all others easily.
Would Cooperators Penalise Defectors?
Yes, in various ways: refusal to trade with and isolation of defectors, defensive alliances against them, and reputational damage. Concretely, penalties might include: a) cutting off technological or economic exchange so the defector lags behind in future developments, b) formal or informal pacts that ensure any aggression by the defector meets widespread resistance, and c) reputational damage, where the defector’s moral standing drops, making other civilisations wary or unwilling to form beneficial agreements with them.
The method of penalising could be direct or indirect—depending on how “aggressive” the cooperators are allowed to be without violating their own moral principles. They might prefer, where practical, “containment” strategies over outright extermination.
Would Cooperators Penalise Non-Defector Civilisations With Immoral Histories?
Possibly, yes, but it depends on how “immoral history” transitions into the present:
If the immoral actions are truly in the past and the civilisation is genuinely now stable and moral, cooperators might see punishing it as counterproductive. They could say: “We reward reformed behaviour.” If atrocities continue unabated, however, cooperators could impose sanctions, isolate the offender, or (in extreme cases) forcibly intervene. The difficult case is future risk: if the immoral history suggests a continuing or repeated pattern, cooperators might see it as a direct threat to cosmic stability and act accordingly.
A key dynamic is whether cooperators believe in moral improvement and rehabilitation, or see a historically aggressive civilisation as too risky to be allowed the chance to repeat such actions.
Would Cooperators Trust Civilisations With “Immoral Histories” Less?
Yes, but with nuance. Context matters: a civilisation may have had a violent or oppressive past but then reformed, especially once it recognised moral truths or changed leadership. Cooperators might look for proof of genuine reform: new governance structures and institutional checks to ensure atrocities aren’t repeated. Cooperators might also consider the nature and recency of offences – if the immoral acts were recent and the civilisation shows no sign of regret or reparation, the trust penalty could be severe. If it’s ancient history, overshadowed by millennia of good conduct, the response might be more lenient.
Because so much can be “seen” via signals, a civilisation’s attempts to cover up wrongdoing or to rationalise it in obviously false ways might worsen distrust. A posture of honesty, restitution, and robust reforms might mitigate suspicion over time.
Hence, the impetus for civilisations to keep their moral house in order, or at least be seen transparently addressing past injustices.
But what if a civ had a long history of advanced moral knowledge?
If a civ refuses to acknowledge wrongdoing or to assist victims or their descendants, or if it continues oppressive systems – e.g., enslaving large populations or violently suppressing dissent – then others will conclude it’s not committed to stable moral norms. Even more so, if the civ invests in offensive capabilities beyond what’s typical for defence and signals readiness to use them aggressively, it will be labelled a high-risk partner.
In short, behaviours that conflict with the recognised moral baseline—and demonstrate no attempts at reform—would damage trust.
Visibility & Reputational Long Memory Discouraging “Exploit-then-Reform” Strategies
Since one of the core assumptions is that advanced civilisations can, in principle, reconstruct each other’s histories from electromagnetic or other signals, there will be little room to hide. Among cooperative civs with long memories, reputations for atrocity may persist indefinitely. The knowledge that one’s immoral actions will be discovered and will tarnish one’s reputation in perpetuity dramatically raises the cost of defection. You get short-term gains but suffer a lasting “defector” label that advanced cooperators will remember.
Long timelines mean future payoffs dominate short-term gains; being ostracised (or attacked) for centuries or millennia is far more costly than the short-lived benefits of exploitation. Even if a defector tries to “flip” to moral cooperation, they face distrust and must prove genuine change for an extended period—undercutting the easy “exploit-then-reform” payoff.
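A quick discounted-payoff sketch shows why. Compare cooperating from the start against exploiting briefly and then spending the rest of a long game distrusted. The payoff rates, horizon, and discount factor are illustrative assumptions.

```python
# Sketch of why long memory undercuts "exploit then reform": compare the
# discounted lifetime payoff of cooperating from the start with exploiting
# briefly and then carrying a lasting reputational penalty.

def discounted_total(payoffs, discount=0.999):
    """Sum of per-turn payoffs, discounted geometrically."""
    return sum(p * discount**t for t, p in enumerate(payoffs))

HORIZON = 10_000       # turns; advanced civs play very long games
COOPERATE, EXPLOIT = 3, 10
OSTRACISED = 0         # flagged defectors lose the benefits of cooperation

always_cooperate = [COOPERATE] * HORIZON
# Exploit for 100 turns, then spend the rest of the game frozen out:
exploit_then_reform = [EXPLOIT] * 100 + [OSTRACISED] * (HORIZON - 100)

print(f"Always cooperate:    {discounted_total(always_cooperate):,.0f}")    # ~3,000
print(f"Exploit then reform: {discounted_total(exploit_then_reform):,.0f}") # ~952
# With a long horizon and mild discounting, the lasting "defector" label
# swamps the short-term gains from exploitation.
```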
The Long First Turn: Exploit, Then Reform—Until We Meet the Big Coalition
How might the “exploit, then reform” strategy play out for a budding civilisation that has not yet encountered any advanced cooperative coalition, and how might galactic game-theoretic dynamics discourage (or fail to discourage) that behaviour? The core difficulty is that the first “turn” could be extremely long, giving a budding civ plenty of time for exploitation without immediate external oversight.
Assume that the budding civilisation (Civ B) is advanced enough to grasp moral truths and foresee future cosmic encounters, but it hasn’t yet met a powerful moral coalition. It decides to “exploit now”: oppress, torture, hoard resources domestically, or commit other immoral acts to maximise short-term gains; and then “clean up later”: once it detects or is contacted by a galactic moral coalition, it claims to have reformed – a realpolitik of pivoting to moral compliance when deemed in one’s interest, hoping to sidestep or minimise punishment while still having profited from the prior exploitation. With such a Long First Turn, where contact might not happen for centuries, millennia, or longer, Civ B can “enjoy” the benefits of exploitation for a very long time.
What to do about this?
Game-theoretically, the prospect of post-hoc justice or reparations may disincentivise Civ B from exploiting. The combination of a durable stigma, iterated reputational effects, advanced detection, and credible sanctions may be enough to temper or prevent extended exploitation during a long first turn. Once the coalition arrives, it may require that Civ B compensate victims and provide restitution, and it may refuse to share advanced knowledge or resources until those reparations are complete – delaying Civ B’s cosmic gains.
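As a rough sketch of why this can work: if restitution is required to exceed the gains it repays, the exploitation head start is net-negative for any contact time within the horizon. The contact-time range, payoff rates, and reparation multiplier below are illustrative assumptions.

```python
import random

# Toy expected-value model of the Long First Turn: Civ B exploits until
# first contact at an uncertain time, after which the coalition demands
# restitution exceeding the accumulated gains before cooperation benefits
# flow. All rates, the horizon, and the multiplier are illustrative.

random.seed(42)
HORIZON = 10_000             # total turns in the "cosmic game"
EXPLOIT_RATE = 10            # per-turn gain while exploiting
COOPERATE_RATE = 3           # per-turn gain once inside the coalition
REPARATION_MULTIPLIER = 1.5  # restitution exceeds the gains it repays

def long_first_turn_payoff(contact_turn):
    gains = EXPLOIT_RATE * contact_turn
    reparations = REPARATION_MULTIPLIER * gains
    post_contact = COOPERATE_RATE * (HORIZON - contact_turn)
    return gains - reparations + post_contact

honest = COOPERATE_RATE * HORIZON  # cooperate from the very start
samples = [long_first_turn_payoff(random.randint(100, 5000)) for _ in range(10_000)]
print(f"Exploit-then-reform, averaged over contact times: {sum(samples)/len(samples):,.0f}")
print(f"Cooperate from the start:                         {honest:,.0f}")
# With a multiplier > 1, exploiting loses for *any* contact time: here the
# payoff works out to honest - 8 * contact_turn, so later contact only
# deepens the hole.
```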
I’ll have more on this in time.
Big Picture
Given the assumptions above:
- Trust is tied to moral track record, especially for civilisations that have known the moral landscape for a long time. Civilisations with immoral histories must demonstrate reform to gain acceptance. If they continue harming or exploiting, they risk permanent or long-term penalties.
- Defectors would face penalties—ranging from isolation to joint resistance or sanctions—especially if they pose a clear threat to the cosmic order.
- There are strong incentives to cooperate under conditions of observable histories, moral convergence, and no overwhelming offensive advantage.
- Yes, cooperators are likely to converge on some broadly “pro-well-being” stance—though the exact moral or political structure might differ.
- Cooperators collectively outmatch defectors, because they pool resources and reduce the risk of catastrophic conflicts among themselves.
Notes
It seems Scott Alexander leans moral realist (but not 100%). He has speculated that if something like moral realism is true, this would have big implications for superintelligence – I’m still trying to find a reference that exists outside my memory.
In broad brush strokes: in Iain M. Banks’s Culture novels, the advanced and utopian Culture often intervenes in the affairs of other, less developed civilisations. These interventions range from subtle manipulation to outright military action. The Culture often justifies them by claiming to prevent suffering and to promote the development of other civilisations towards a more enlightened state, mirroring its own. These interventions are not perfect, however, and the Culture’s actions are frequently questioned, raising ethical dilemmas about the right to interfere in the internal affairs of other societies and the potential for unintended consequences.
This post has been influenced by Robin Hanson’s and Anders Sandberg’s thoughts on galactic game theory, Scott Alexander’s Meditations on Moloch and a number of interviews with Daniel Schmachtenberger.
I’m finding it difficult to pinpoint whether this idea is new, or whether it exists elsewhere. I’ve read a handful of related posts on Overcoming Bias on cosmic sociology as part of broader discussions (the Great Filter, SETI, alien incentives, etc.), and I’ve interviewed Robin Hanson a number of times – so this may have come up in a conversation with him before.