Tuesday, January 14, 2025

Existential Catastrophe from Malevolent Superintelligent AI

The plausibility of existential catastrophe due to AI is widely debated. It hinges in part on whether AGI or superintelligence is achievable, the speed at which dangerous capabilities and behaviors emerge, and whether practical scenarios for AI takeovers exist. Concerns about superintelligence have been voiced by computer scientists and tech CEOs such as Geoffrey Hinton, Yoshua Bengio, Alan Turing, Elon Musk, and OpenAI CEO Sam Altman. In 2022, a survey of AI researchers with a 17% response rate found that the majority believed there is a 10 percent or greater chance that human inability to control AI will cause an existential catastrophe. In 2023, hundreds of AI experts and other notable figures signed a statement declaring, "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."


Two sources of concern stem from the problems of AI control and alignment. Controlling a superintelligent machine, or instilling it with human-compatible values, may be difficult. Many researchers believe that a superintelligent machine would likely resist attempts to disable it or change its goals, as that would prevent it from accomplishing its present goals. It would be extremely challenging to align a superintelligence with the full breadth of significant human values and constraints.


A third source of concern is the possibility of a sudden "intelligence explosion" that catches humanity unprepared. In this scenario, an AI more intelligent than its creators would be able to recursively improve itself at an exponentially increasing rate, improving too quickly for its handlers or society at large to control. Empirically, examples like AlphaZero, which taught itself to play Go and quickly surpassed human ability, show that domain-specific AI systems can sometimes progress from subhuman to superhuman ability very quickly, although such machine learning systems do not recursively improve their fundamental architecture.
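The difference between ordinary technological progress and the recursive self-improvement scenario above can be illustrated with a toy model. This is purely a sketch with hypothetical parameters, not a claim about real systems: if each improvement cycle adds a fixed increment (as when external researchers improve a system), capability grows linearly, but if the improved system itself drives the next round of improvement, growth compounds.

```python
# Toy model (illustrative only; all parameters are hypothetical).
# Compares externally driven improvement with recursive self-improvement.

def external_growth(c0: float, step: float, cycles: int) -> float:
    """Human R&D adds a fixed capability increment each cycle (linear)."""
    c = c0
    for _ in range(cycles):
        c += step
    return c

def recursive_growth(c0: float, rate: float, cycles: int) -> float:
    """Each cycle multiplies capability, because the improved system
    performs the next round of improvement (compounding)."""
    c = c0
    for _ in range(cycles):
        c *= (1 + rate)
    return c

print(external_growth(1.0, 0.1, 50))   # grows linearly
print(recursive_growth(1.0, 0.1, 50))  # grows exponentially
```

With the same per-cycle improvement of 10%, fifty cycles of external improvement yield a 6x capability gain, while fifty compounding cycles yield a gain of over 100x, which is the intuition behind the "too quickly for its handlers to control" concern.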


Potential AI capabilities

General intelligence

Artificial general intelligence (AGI) is typically defined as a system that performs at least as well as humans in most or all intellectual tasks. A 2022 survey of AI researchers found that 90% of respondents expected that AGI would be achieved in the next 100 years, and half expected the same by 2061. Meanwhile, some researchers dismiss existential risks from AGI as "science fiction" based on their high confidence that AGI will not be created anytime soon.

Breakthroughs in large language models have led some researchers to reassess their expectations. Notably, Geoffrey Hinton said in 2023 that he had recently changed his estimate from "20 to 50 years before we have general purpose A.I." to "20 years or less."

The Frontier supercomputer at Oak Ridge National Laboratory turned out to be nearly eight times faster than expected. Feiyi Wang, a researcher there, said, "We didn't expect this capability" and "we're approaching the point where we could actually simulate the human brain."

Superintelligence

In contrast with AGI, Bostrom defines a superintelligence as "any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest", including scientific creativity, strategic planning, and social skills. He argues that a superintelligence can outmaneuver humans anytime its goals conflict with humans'. It may choose to hide its true intent until humanity cannot stop it. Bostrom writes that in order to be safe for humanity, a superintelligence must be aligned with human values and morality, so that it is "fundamentally on our side."

When artificial superintelligence (ASI) may be achieved, if ever, is necessarily less certain than predictions for AGI. In 2023, OpenAI leaders said that not only AGI, but superintelligence may be achieved in less than 10 years.

AI alignment and risks

Alignment of superintelligences

Some researchers believe the alignment problem may be particularly difficult when applied to superintelligences. Their reasoning includes:

·         As AI systems increase in capabilities, the potential dangers associated with experimentation grow. This makes iterative, empirical approaches increasingly risky.

·         If instrumental goal convergence occurs, it may only do so in sufficiently intelligent agents.

·         A superintelligence may find unconventional and radical solutions to assigned goals. Bostrom gives the example that if the objective is to make humans smile, a weak AI may perform as intended, while a superintelligence may decide a better solution is to "take control of the world and stick electrodes into the facial muscles of humans to cause constant, beaming grins."

·         A superintelligence in creation could gain some awareness of what it is, where it is in development (training, testing, deployment, etc.), and how it is being monitored, and use this information to deceive its handlers. Bostrom writes that such an AI could feign alignment to prevent human interference until it achieves a "decisive strategic advantage" that allows it to take control.

·         Analyzing the internals and interpreting the behavior of current large language models is difficult, and it could be even more difficult for larger and more intelligent models.


Alternatively, some find reason to believe superintelligences would be better able to understand morality, human values, and complex goals. Bostrom writes, "A future superintelligence occupies an epistemically superior vantage point: its beliefs are (probably, on most topics) more likely than ours to be true".

In 2023, OpenAI started a project called "Superalignment" to solve the alignment of superintelligences in four years. It called this an especially important challenge, as it said superintelligence could be achieved within a decade. Its strategy involved automating alignment research using AI. The Superalignment team was dissolved less than a year later.

Other sources of risk

Bostrom and others have said that a race to be the first to create AGI could lead to shortcuts in safety, or even to violent conflict. Roman Yampolskiy and others warn that a malevolent AGI could be created by design (for example, by a military, a government, a sociopath, or a corporation) to benefit from, control, or subjugate certain groups of people, as in cybercrime. Alternatively, a malevolent AGI could itself choose the goal of increasing human suffering, for example that of people who did not assist it during the information explosion phase.

Suffering risks

Suffering risks are sometimes categorized as a subclass of existential risks. According to some scholars, s-risks warrant serious consideration as they are not extremely unlikely and can arise from unforeseen scenarios. Although they may appear speculative, factors such as technological advancement, power dynamics, and historical precedents indicate that advanced technology could inadvertently result in substantial suffering. Thus, s-risks are considered to be a morally urgent matter, despite the possibility of technological benefits. Sources of possible s-risks include embodied artificial intelligence and superintelligence.

Artificial intelligence is central to s-risk discussions because it may eventually enable powerful actors to control vast technological systems. In a worst-case scenario, AI could be used to create systems of perpetual suffering, such as a totalitarian regime expanding across space. Additionally, s-risks might arise incidentally, such as through AI-driven simulations of conscious beings experiencing suffering, or from economic activities that disregard the well-being of nonhuman or digital minds. Steven Umbrello, an AI ethics researcher, has warned that biological computing may make system design more prone to s-risks. Brian Tomasik has argued that astronomical suffering could emerge from solving the AI alignment problem incompletely. He argues for the possibility of a "near miss" scenario, in which a slightly misaligned superintelligent AI would be more likely to cause astronomical suffering than a completely unaligned one.

People’s perspectives on AI

The thesis that AI could pose an existential risk provokes a wide range of reactions in the scientific community and in the public at large, but many of the opposing viewpoints share common ground.

Observers tend to agree that AI has significant potential to improve society. The Asilomar AI Principles, which contain only those principles agreed to by 90% of the attendees of the Future of Life Institute's Beneficial AI 2017 conference, include: "There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities" and "Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources."

AI mitigation

Many scholars concerned about AGI existential risk believe that extensive research into the "control problem" is essential. This problem involves determining which safeguards, algorithms, or architectures can be implemented to increase the likelihood that a recursively improving AI remains friendly after achieving superintelligence. Social measures have also been proposed to mitigate AGI risks, such as a UN-sponsored "Benevolent AGI Treaty" to ensure that only altruistic AGIs are created. Additionally, an arms control approach and a global peace treaty grounded in international relations theory have been suggested, potentially with an artificial superintelligence itself as a signatory.

Researchers at Google have proposed research into general "AI safety" issues to simultaneously mitigate both short-term risks from narrow AI and long-term risks from AGI. A 2020 estimate places global spending on AI existential risk somewhere between $10 million and $50 million, compared with global spending on AI of roughly $40 billion. Bostrom suggests prioritizing funding for protective technologies over potentially dangerous ones. Some, like Elon Musk, advocate radical human cognitive enhancement, such as direct neural linking between humans and machines; others argue that these technologies may pose an existential risk themselves. Another proposed method is closely monitoring or "boxing in" an early-stage AI to prevent it from becoming too powerful. A dominant, aligned superintelligent AI might also mitigate risks from rival AIs, although its creation could present its own existential dangers. Induced amnesia has been proposed as a way to mitigate the risks of suffering and revenge-seeking in locked-in conscious AI and certain AI-adjacent biological systems.

 