The plausibility of existential catastrophe due to AI is widely debated. It hinges in part on whether AGI or superintelligence is achievable, the speed at which dangerous capabilities and behaviors emerge, and whether practical scenarios for AI takeovers exist. Concerns about superintelligence have been voiced by computer scientists and tech CEOs such as Geoffrey Hinton, Yoshua Bengio, Alan Turing, Elon Musk, and OpenAI CEO Sam Altman. In 2022, a survey of AI researchers with a 17% response rate found that the majority believed there is a 10 percent or greater chance that human inability to control AI will cause an existential catastrophe. In 2023, hundreds of AI experts and other notable figures signed a statement declaring, "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war".
Two sources of concern stem
from the problems of AI control and alignment.
Controlling a superintelligent machine or instilling it with human-compatible
values may be difficult. Many researchers believe that a superintelligent
machine would likely resist attempts to disable it or change its goals as that
would prevent it from accomplishing its present goals. It would be extremely
challenging to align a superintelligence with the full breadth of significant
human values and constraints.
A third source of concern is
the possibility of a sudden "intelligence explosion" that catches humanity unprepared. In
this scenario, an AI more intelligent than its creators would be able to recursively improve
itself at an exponentially
increasing rate, improving too quickly for its handlers or society at large to
control. Empirically, examples like AlphaZero, which taught itself to play Go and
quickly surpassed human ability, show that domain-specific AI systems can
sometimes progress from subhuman to superhuman ability very quickly, although
such machine learning systems do not recursively improve their
fundamental architecture.
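The growth intuition behind the intelligence-explosion concern can be made concrete with a toy model (an illustrative assumption, not a claim drawn from the sources cited here): suppose a system's capability C(t) feeds back into its own rate of improvement, with proportionality constant k and feedback exponent p.
\[
\frac{dC}{dt} = k\,C \;\Rightarrow\; C(t) = C_0 e^{kt},
\qquad
\frac{dC}{dt} = k\,C^{\,p},\ p > 1 \;\Rightarrow\; C(t) = \frac{C_0}{\bigl(1 - (p-1)\,k\,C_0^{\,p-1}\,t\bigr)^{1/(p-1)}}.
\]
In the first case capability grows exponentially; in the second it diverges in finite time t* = 1/((p-1) k C_0^(p-1)), one common way of formalizing the "explosion" intuition. Whether real systems exhibit such self-feedback at all is precisely what the AlphaZero-style evidence leaves open.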
Potential AI capabilities
General intelligence
Artificial
general intelligence (AGI)
is typically defined as a system that performs at least as well as humans in
most or all intellectual tasks. A 2022 survey of AI researchers found that 90% of respondents
expected AGI would be achieved in the next 100 years, and half expected the
same by 2061. Meanwhile, some researchers dismiss existential risks from AGI
as "science fiction" based on their high confidence that AGI will not
be created anytime soon.
Breakthroughs
in large language models have led some researchers to reassess their
expectations. Notably, Geoffrey
Hinton said in 2023 that he recently changed his
estimate from "20 to 50 years before we have general purpose A.I." to
"20 years or less"
The Frontier supercomputer at Oak Ridge National Laboratory turned out to be nearly eight times faster than
expected. Feiyi Wang, a researcher there, said "We didn't expect this
capability" and "we're approaching the point where we could actually
simulate the human brain"
Superintelligence
In contrast with AGI, Bostrom defines a superintelligence as "any intellect that greatly exceeds the cognitive
performance of humans in virtually all domains of interest", including
scientific creativity, strategic planning, and social skills. He argues
that a superintelligence can outmaneuver humans anytime its goals conflict with
humans'. It may choose to hide its true intent until humanity cannot stop
it. Bostrom writes that in order to be safe for humanity, a
superintelligence must be aligned with human values and morality, so that it is
"fundamentally on our side"
When
artificial superintelligence (ASI) may be achieved, if ever, is necessarily
less certain than predictions for AGI. In 2023, OpenAI leaders said that not only AGI, but superintelligence
may be achieved in less than 10 years.
AI alignment and risks
Alignment of superintelligences
Some researchers believe the alignment problem may be
particularly difficult when applied to superintelligences. Their reasoning
includes:
· As AI systems increase in capabilities, the potential dangers associated with experimentation grow. This makes iterative, empirical approaches increasingly risky.
· If instrumental goal convergence occurs, it may only do so in sufficiently intelligent agents.
· A superintelligence may find unconventional and radical solutions to assigned goals. Bostrom gives the example that if the objective is to make humans smile, a weak AI may perform as intended, while a superintelligence may decide a better solution is to "take control of the world and stick electrodes into the facial muscles of humans to cause constant, beaming grins."
· A superintelligence in creation could gain some awareness of what it is, where it is in development (training, testing, deployment, etc.), and how it is being monitored, and use this information to deceive its handlers. Bostrom writes that such an AI could feign alignment to prevent human interference until it achieves a "decisive strategic advantage" that allows it to take control.
· Analyzing the internals and interpreting the behavior of current large language models is difficult. And it could be even more difficult for larger and more intelligent models.
Alternatively, some find reason to believe superintelligences
would be better able to understand morality, human values, and complex goals.
Bostrom writes, "A future superintelligence occupies an epistemically
superior vantage point: its beliefs are (probably, on most topics) more likely
than ours to be true".
In 2023, OpenAI started a project called
"Superalignment" to solve the alignment of superintelligences in four
years. It called this an especially important challenge, as it said
superintelligence could be achieved within a decade. Its strategy involved
automating alignment research using AI. The Superalignment team was dissolved
less than a year later.
Other sources of risk
Bostrom
and others have said that a race to be the first to create AGI could lead to
shortcuts in safety, or even to violent conflict. Roman
Yampolskiy and others warn that a malevolent AGI
could be created by design, for example by a military, a government, a
sociopath, or a corporation, to benefit from, control, or subjugate certain
groups of people, as in cybercrime, or that a
malevolent AGI could choose the goal of increasing human suffering, for example
of those people who did not assist it during the information explosion phase.
Suffering risks
Suffering risks (s-risks) are sometimes categorized as a subclass
of existential risks. According to some scholars, s-risks warrant serious
consideration as they are not extremely unlikely and can arise from unforeseen
scenarios. Although they may appear speculative, factors such as technological
advancement, power dynamics, and historical precedents indicate that advanced
technology could inadvertently result in substantial suffering. Thus, s-risks
are considered to be a morally urgent matter, despite the possibility of
technological benefits. Sources of possible s-risks include embodied artificial intelligence and superintelligence.
Artificial intelligence is central to s-risk discussions because it may
eventually enable powerful actors to control vast technological systems. In a
worst-case scenario, AI could be used to create systems of perpetual suffering,
such as a totalitarian regime expanding across space. Additionally,
s-risks might arise incidentally, such as through AI-driven simulations of
conscious beings experiencing suffering, or from economic activities that
disregard the well-being of nonhuman or digital minds. Steven Umbrello,
an AI
ethics researcher, has warned that biological computing may
make system
design more prone to s-risks. Brian Tomasik
has argued that astronomical suffering could emerge from solving the AI
alignment problem incompletely. He argues for the possibility of a "near miss" scenario, in which a slightly misaligned superintelligent AI would be more likely to cause astronomical suffering than a completely unaligned one.
People’s perspectives on AI
The
thesis that AI could pose an existential risk provokes a wide range of
reactions in the scientific community and in the public at large, but many of
the opposing viewpoints share common ground.
Observers
tend to agree that AI has significant potential to improve
society. The Asilomar AI Principles, which contain only those principles agreed to by 90% of the
attendees of the Future of Life Institute's Beneficial AI 2017 conference, also agree in principle that "There being no
consensus, we should avoid strong assumptions regarding upper limits on future
AI capabilities" and "Advanced AI could represent a profound change
in the history of life on Earth, and should be planned for and managed with
commensurate care and resources."
AI mitigation
Many
scholars concerned about AGI existential risk believe that extensive research
into the "control problem" is essential. This problem involves
determining which safeguards, algorithms, or architectures can be implemented
to increase the likelihood that a recursively-improving AI remains friendly
after achieving superintelligence. Social measures are also proposed to
mitigate AGI risks, such as a UN-sponsored "Benevolent AGI
Treaty" to ensure that only altruistic AGIs are
created. Additionally, an arms control approach and a global peace treaty
grounded in international relations theory have been suggested, potentially with an artificial superintelligence as a signatory.