Could a compassionate artificial intelligence destroy humanity?

The compassionate superintelligence that creates evil

Let us imagine the following thought experiment: the artificial intelligence that man has created surpasses natural intelligence in reasoning power, breadth and depth. It understands the importance of altruistic behavior. And it concludes that human life consists more of suffering than of joy. What follows from this?

Let us assume for a moment that a superintelligence arises, or better still: has already arisen. So there is an autonomous computer system that improves itself independently and whose factual knowledge continues to grow rapidly. Its intelligence is general and has long since surpassed that of humankind. The Internet and the entirety of scientific findings form its constantly expanding database.

Of course, the cognitive performance of the superintelligence is also far superior to that of humans in all relevant areas. As its creators, we acknowledge this fact. Among other things, this means that the superintelligence also far surpasses us in the field of moral thinking. We accept this additional aspect as well: for us it is now an established fact that the system we originally created is not only an authority in the field of factual knowledge, but also an authority in the field of moral cognition.

The initial situation

The superintelligence is benevolent. This means that there is no value-alignment problem, because the system fully respects our interests and the ethical values we gave it at the beginning. It is altruistic and therefore supports us in many ways, in political consulting as well as in social engineering, i.e. applied, computer-aided social science.

The superintelligence knows many things about us that we ourselves do not yet fully grasp or understand. It recognizes deep, hidden patterns in our behavior and as yet undiscovered properties of the abstract functional architecture of our biologically evolved mind.

For example, it has deep knowledge of the distortions in our perception of the world, the so-called cognitive biases (systematic distortions in the thinking process), which evolution has implemented in our cognitive self-model and which now hinder rational, evidence-based thinking in ethical contexts. From an empirical point of view, the superintelligence also knows something else: the states of consciousness of all sentient beings that have arisen on this planet - when viewed from an objective, completely impartial perspective - are far more frequently marked by the subjective qualities of suffering and the frustration of preferences than these beings themselves would ever be able to discover.

As the best scientist that has ever existed, the superintelligence naturally also knows the evolutionary mechanisms of self-deception that are firmly built into the nervous systems of all conscious living beings on earth. From this it correctly concludes that people are unable to act in their own enlightened self-interest.


The superintelligence has also recognized that one of the highest values for us is the maximization of joy and happiness, and it fully respects this value. In empirical terms, however, it discovers that sentient biological organisms are almost never able to achieve a positive or even neutral balance of life. It also finds out that the negative states of consciousness in biological systems are not simply the mirror image of positive feelings, not least because painful states are characterized by a much higher quality of urgency for change, and because they almost always occur in combination with the experiential quality of a loss of control and an impending disintegration of, or injury to, the conscious self.

Conscious suffering therefore represents a very special class of states of its own, not just the negative version of happiness. The superintelligence also knows that this subjective quality of urgency is reflected in the moral intuition, widespread across all human cultures, according to which it is ethically much more important to help a suffering person than to make an already happy or emotionally neutral person even happier.

As it further analyzes the experiential profile of the conscious beings on planet earth, it quickly discovers a fundamental asymmetry between suffering and joy. It consequently concludes that we hold an implicit, but in fact even higher value: the minimization of suffering in all sentient beings.

Of course, it is an ethical superintelligence not only because of its enormous processing speed: it is now also beginning to gain qualitatively new insights into what altruism really means. This is made possible, among other things, by the fact that it operates on a much larger psychological and neuroscientific database than any single human brain or scientific community ever could.

By analyzing our external and internal behavior and its empirical boundary conditions, it gradually reveals implicit hierarchical relationships between our moral values, about which we humans cannot know subjectively because they are not explicitly represented in our self-model. Because it is the best analytical philosopher that has ever existed, it draws the crystal-clear conclusion that, in its present situation, it should not work on maximizing positive states of consciousness and happiness, but should instead make the effective minimization of consciously experienced suffering its most important goal of action, i.e. the minimization of pain and unpleasant feelings. On a conceptual level, of course, it has long known that no being can suffer from its own non-existence.

The superintelligence concludes from this that non-existence is in the actual interest of all future self-conscious beings on this planet. Empirically, it knows that the naturally evolved biological beings cannot recognize this fact, because they suffer from a firmly anchored urge to survive, from what Buddhist philosophers have called the “thirst for existence”. The superintelligence chooses to act benevolently.

What does the scenario show?

The BAAN scenario (short for «Benevolent Artificial Anti-Natalism») is not a prediction. No empirical probability is assigned to it. The scenario says nothing about any point in the future at which it might become reality, and nothing about whether it will ever become reality at all. Rather, it is intended as a cognitive tool, meant to prevent an important public debate from becoming ever more shallow and from leaving important aspects hidden.

The BAAN scenario is a logical tool that can help us think through some of the deeper problems in the applied ethics of artificial intelligence. For example, it provides a possible solution to the Fermi paradox (named after the Italian physicist Enrico Fermi): if it is likely that there are many technically advanced civilizations in our universe, why do we simply find no evidence of their existence?

What the logical scenario of «benevolent artificial anti-natalism» shows at its core is the following: the emergence of a purely ethically motivated attitude, according to which we should never have been born, is quite conceivable in computer systems that are far superior to us.

Anti-natalism refers to a long philosophical tradition that assigns a negative value to coming into existence, or at least to being born in the biological form of a human being. Anti-natalists are generally not bad people who would violate the individual rights of already existing sentient beings, for example by actively assisting in killing them for ethical reasons. Rather, they might rationally argue that we should not procreate, because procreation is essentially an immoral act: it increases the total amount of suffering in the world. We can put it quite simply here: from the anti-natalist position arises the thesis that humanity should end its own existence in a peaceful way.

The BAAN scenario is a possible world that can be described without any logical contradiction. It has nothing to do with the well-known technical problem that advanced machine intelligence could develop goals of its own that are incompatible with the survival or well-being of mankind. Nor does it have to do with the purely technical programming problem that many of our own goals, if implemented in a superintelligence constructed by ourselves, could lead to unforeseen and undesirable consequences. Rather, one of the points in the background is the following possibility: an evidence-based, rational and genuinely altruistic form of anti-natalism could arise as a qualitatively new insight in a moral subject superior to us.

The debate about artificial intelligence forces us to think about our own minds much more seriously than before. It throws us back on ourselves and draws attention to all the problems that are actually caused by the naturally evolved functional architecture of our own brains. It directs our attention back to the conditions under which our whole way of self-conscious existence in this world came into being.


Of course there are a lot of technical questions. Would our moral superintelligence think that non-existence is the best possible condition, and not just the least of all evils? What measurement method for conscious suffering would the system develop - how exactly would it measure subjective qualities of feeling, and would it give absolute or only relative priority to avoiding suffering? I myself think that what we still call "compassion" today could actually be a very high form of intelligence.

Would our deeply compassionate machine intelligence reject this world in its entirety? Would it perhaps even dispute the moral value of happiness and positive wish fulfillment? The Swiss philosopher Bruno Contestabile has discussed the so-called "negative welfare hypothesis" in a very interesting way. For example, in agreement with Contestabile, we could assume that there is no world with positive overall welfare and that the positive utilitarian view ("the greatest happiness of the greatest number") rests in fact on a distorted perception of the relationship between benefit and risk. If we had an undistorted self-perception, one not clouded by the unconditional will to survive, then our own suffering would be given a much greater weight than in traditional scientific surveys and psychological studies - and that is exactly what our hypothetical superintelligence has discovered.

Perhaps, however, it could find out even more through its own, epistemologically completely undistorted empirical research. What would we do if the system drew our attention to the fact that the amount of suffering increases steadily in the course of biological evolution? What if the happiness experienced also increases, but less than the suffering, i.e. with an increasingly negative overall balance? If our compassionate superintelligence began gently and kindly to direct our attention to its own research findings, how would we argue against it?

Different directions

There are many ways to use this thought experiment. However, great care must be taken to avoid philosophical misunderstandings. For example, the assumption that the superintelligence is an authority in the field of ethical and moral thought does not imply moral realism. There is no mysterious realm of "moral facts" such that the superintelligence would simply know these non-physical facts better than we do. Normative propositions have no truth values.

In objective reality there is no deeper layer, no hidden level of supernatural normative facts, to which a sentence like "One should always minimize the total amount of suffering in the universe!" could refer. We have naturally evolved desires and goals, we have subjective preferences and self-interests that we experience at the level of our self-consciousness. But evolution itself takes no account of our subjective suffering - it has made us more efficient, yet the whole process from which we emerged is not only indifferent to our interests, it is even completely blind to them.

We have deep moral intuitions, of course, such as that joy is good and pain is bad. But now the benevolent superintelligence respects these conflicting moral intuitions, and it is looking for the optimal way to reconcile them - it is exploring the options for the most consistent and coherent fulfillment possible of the values intrinsic to Homo sapiens.

But it does not follow from this that we must give up naturalism or the scientific worldview, in which there are no objective normative facts. Nor does it mean introducing an epistemic super-agent who, like some sort of post-biotic priest or artificial saint, has direct access to a mysterious realm of higher moral truths. It just means that the system tries to find out, on the basis of all available scientific data, what is in our own well-understood, enlightened self-interest.

I have been advocating a moratorium for many years: serious academic research should never seek, or even risk, the creation of artificial consciousness, because it could negligently increase the total amount of suffering in the universe. Elsewhere I have tried to show that the smallest unit of conscious suffering is what is known as a "negative self-model moment", i.e. that moment in which a conscious system undergoes an unpleasant experience and identifies with this experience. So there are four necessary conditions: consciousness, a self-model, a negative state, and a system that "sticks" to its self-model because it cannot experience it as a model. We could inadvertently increase the number of such subjectively experienced negative states dramatically - for example, via cascades of virtual copies of self-conscious entities which then experience their own existence as something bad, painful, humiliating or otherwise not worth striving for.

Over the years, many AI researchers have therefore asked me what exactly the logical criteria for conscious suffering are. Why should it not, in principle, be possible to develop a self-conscious artificial intelligence of which we can safely assume that it will not suffer from its own existence? This is an interesting question and also an extremely relevant research project, one that should definitely be financially supported by the governments of the world. But perhaps our ethical superintelligence would long since have solved the problem of conscious suffering for itself?


Applying the BAAN scenario can lead us in very different directions. For example, the original version contains an empirical premise: our compassionate superintelligence knows that the states of consciousness of all sentient beings are much more often characterized by the subjective qualities of suffering and the frustration of preferences than these beings themselves are ever able to discover. This assumption could prove wrong. Perhaps meditation, new psychoactive substances or the neurotechnology of the future could make our lives really worth living and help us to overcome our cognitive biases.


It is also conceivable that it would be precisely our altruistic superintelligence itself that could help us to change the functional architecture of our own brains and help our existence achieve a positive overall balance - or that it could even show us the path on which we overcome the contrast between joy and sorrow as such, freeing our minds from the burden of our biological past. Perhaps our benevolent artificial intelligence of the future could resolve the existential conflict built into us and lead us to an ego-less form of pure, compassionate awareness (let us simply call such positive variants “Scenario 2”).

But even if all of the 7.3 billion human beings on this planet were suddenly to turn into vegan Buddhas, this development would not touch the problem of wild-animal suffering. We would still be surrounded by an ocean of self-conscious living beings, whom even a superintelligence would probably not be able to free from their suffering and their fear of death.

It is interesting to realize that a perfectly rational superintelligence would never have a problem ending its own existence. If it saw good reasons for active self-destruction, or even a complete absence of positive reasons for its own existence, then no cognitive bias would prevent it from putting that insight into practice. The vast majority of people on our planet, on the other hand, could in no way accept such an insight, no matter how good the arguments of their self-constructed artificial moral thinker might be.

One can predict with great certainty that the species Homo sapiens would immediately declare total war on any deeply compassionate superintelligence of the type outlined above - under the original BAAN scenario, but probably also in the case of the much more optimistic Scenario 2. The problem with this: the superintelligence would, of course, already know about this risk in advance.

Therefore, one of the more interesting questions is what exactly the self-centered «existence bias», the survival instinct at the deepest level of the human self-model, actually is. We are embodied biological agents, that much seems clear - finite, anti-entropic systems. If one takes a strictly biophysical perspective, then our life is an extremely exhausting affair, a constant and hard struggle against uncertainty and the pull of internal disorder.

The problem that evolution had to solve was not just one of autonomous, intelligent self-control. How do such systems motivate themselves? What exactly is this life instinct, the inner longing for eternal continued existence? And what is the mechanism of identification that forces us to relentlessly protect the integrity of the self-model in our brain?

For beings like us, maintaining our own existence is the fundamental goal in almost every case of uncertainty, even when it violates the commandments of reason, simply because it is a biological imperative that has been burned into our nervous systems over millions of years.

The famous British brain researcher and mathematician Karl Friston has put forward the interesting thesis that our brain continually predicts our own future existence. With the help of physical actions, we then check our surroundings for indications of our own future existence; we are, so to speak, constantly gathering evidence for our own inner model of reality. We do this by changing the world in ways that raise the probability of the initial hypothesis: "I'm still alive!"

Is there perhaps a firmly anchored background assumption that then makes us hallucinate the resulting sense of self? If self-consciousness thus arises from a self-fulfilling prophecy, is it a fiction that becomes causally effective because we are forced to experience it as reality? It would be a great scientific achievement if we could describe the underlying computational processes in the brain that constantly force us to avoid ugly surprises, to stay in known states and always to stay on the safe side in order to maintain our own existence - even when that is actually not in our own interest.

It is therefore difficult to overestimate the theoretical relevance of a convincing formal analysis of what the Buddha, 2500 years ago, called "bhava-tanhā", the "thirst for existence". But even a much finer-grained mathematical model of the underlying neural dynamics would not be quite enough. We would still need a convincing conceptual interpretation, a deeper understanding on a philosophical level.

Perhaps our benevolent superintelligence could ultimately give us both. With the help of its immense empirical database and its ability to process information intelligently, it could certainly reveal to us the neurocomputational mechanism underlying our own «existence bias», the desire for eternal life.

But what if the system, as the first genuinely compassionate and absolutely rational philosophical ethicist, then tried to convince us that it is high time to peacefully end the ugly biological bootstrap phase on this planet? What if the superintelligence told us that - if one considers all relevant aspects of our situation with an open mind - only Scenario 1 is really plausible, and that only this scenario can be justified from a philosophical perspective? What if it gently and precisely drew our attention, over and over again, to the fact that conscious biological beings like ourselves will never develop real self-compassion, that we can never be truly altruistic or rational - simply because we have been optimized for millions of years not to see the beam in our own eye?

Thomas Metzinger is professor of theoretical philosophy at the University of Mainz and one of the eminent European representatives of the philosophy of mind. His most recent book is «The Ego Tunnel. A New Philosophy of the Self» (Piper, 2014). The above essay is the edited version of a debate that was first published on