A few days ago Dario Amodei, the CEO of Anthropic, released an article called "The Adolescence of Technology". It's a refreshingly honest take on the technology: where it's going, what Anthropic learned while developing it, and some of the things we should be concerned about.
It's not every day that we get such an in-depth peek inside a company at the forefront of one of the most disruptive technologies ever, so if nothing else, I recommend taking a look at it and forming your own conclusions.
I'll quote his article heavily here, but some of the quotes will be truncated. I want to focus on a specific paragraph or idea each time, not to change the text's meaning or distort reality. You can always read the full article and form your own opinions; the truncation is purely about time and focus.
While reading it I found some nice insights, so here are links to the full article and to a few forums discussing it, in case you want to see other opinions.
The article
The article generated quite some buzz and there are a few places that you can visit if you want to see other people's comments:
- Hacker News - The site is mostly frequented by technically inclined people, and the discussion is quite good most of the time. Worth checking out.
- Audio Form - If you prefer to have this in audio form, here's a podcast that covers the full article. I read the article rather than listening to it, so I don't know if this version has ads or changes the text somehow. I found it online, so proceed with caution.
- Reddit - Some interesting comments, so worth taking a look.
Now, onto the article.
Things start hopeful
I suggested that AI could contribute to enormous advances in biology, neuroscience, economic development, global peace, and work and meaning. I felt it was important to give people something inspiring to fight for, a task at which both AI accelerationists and AI safety advocates seemed—oddly—to have failed.
AI has transformed our lives in ways we could not have imagined a few years ago. We can now quickly do things that previously required years of skill development, like coding. Tasks like research are starting to be affected too, since a lot of the manual labour that used to be delegated to interns is now a few prompts away. This is both a blessing and a curse, since quality can vary a lot depending on the model.
The "old examples"
- In terms of pure intelligence, it is smarter than a Nobel Prize winner across most relevant fields: biology, programming, math, engineering, writing, etc. This means it can prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc.
- In addition to just being a "smart thing you talk to," it has all the interfaces available to a human working virtually, including text, audio, video, mouse and keyboard control, and internet access. It can engage in any actions, communications, or remote operations enabled by this interface, including taking actions on the internet, taking or giving directions to humans, ordering materials, directing experiments, watching videos, making videos, and so on. It does all of these tasks with, again, a skill exceeding that of the most capable humans in the world.
I've always thought that measuring intelligence in AI is a difficult process, and most companies now train their models as much to beat benchmarks as to actually improve them. Benchmarks are artificial tests put in place to measure whether one system is "better than another", so a higher score is useful and valuable mostly in terms of marketing: it's how companies signal that they are doing better than their competitors.
Comparing AI to Nobel Prize winners, though, is misleading in my opinion. They are not the same thing at all, since AI doesn't have what people have (at least not yet): an inquisitive mind and the imagination to come up with even the right questions, let alone the right answers.
When we talk about very complex things, we need analogies from our world to understand them, so I don't blame them. In fact, I wrote in the past that "with AI you need to know more, not less" and used the "intern" analogy, so I'm guilty of the same kind of misleading comparison.
One thing is clear: AI is improving daily, and Dario says as much.
Three years ago, AI struggled with elementary school arithmetic problems and was barely capable of writing a single line of code. Similar rates of improvement are occurring across biological science, finance, physics, and a variety of agentic tasks. If the exponential continues—which is not certain, but now has a decade-long track record supporting it—then it cannot possibly be more than a few years before AI is better than humans at essentially everything.
I like that he's optimistic but also a realist: it's not certain that things will continue to improve at the rate they have so far, and he acknowledges that.
What should we be worried about?
Now things take a turn:
suppose a literal "country of geniuses" were to materialize somewhere in the world in ~2027. Imagine, say, 50 million people, all of whom are much more capable than any Nobel Prize winner, statesman, or technologist. The analogy is not perfect, because these geniuses could have an extremely wide range of motivations and behavior, from completely pliant and obedient, to strange and alien in their motivations. But sticking with the analogy for now, suppose you were the national security advisor of a major state, responsible for assessing and responding to the situation. Imagine, further, that because AI systems can operate hundreds of times faster than humans, this “country” is operating with a time advantage relative to all other countries: for every cognitive action we can take, this country can take ten.
What should you be worried about? I would worry about the following things:
- Autonomy risks. What are the intentions and goals of this country? Is it hostile, or does it share our values? Could it militarily dominate the world through superior weapons, cyber operations, influence operations, or manufacturing? ...
- Misuse for destruction. Assume the new country is malleable and “follows instructions”—and thus is essentially a country of mercenaries. Could existing rogue actors who want to cause destruction (such as terrorists) use or manipulate some of the people in the new country to make themselves much more effective, greatly amplifying the scale of destruction? ...
- Misuse for seizing power. What if the country was in fact built and controlled by an existing powerful actor, such as a dictator or rogue corporate actor? Could that actor use it to gain decisive or dominant power over the world as a whole, upsetting the existing balance of power? ...
- Economic disruption. If the new country is not a security threat in any of the ways listed in #1–3 above but simply participates peacefully in the global economy, could it still create severe risks simply by being so technologically advanced and effective that it disrupts the global economy, causing mass unemployment or radically concentrating wealth? ...
- Indirect effects. The world will change very quickly due to all the new technology and productivity that will be created by the new country. Could some of these changes be radically destabilizing?
This is quite important and relevant, and it's why some countries and people with less than pure motives are investing heavily in AI. If you control the best available model, you can do a lot, and you can shape a lot of people's thinking and decisions. I don't want to quote 1984 here, but if you've read it you'll understand clearly where things can go.
AI is not impartial
The section is called "I'm sorry, Dave", a nice reference to 2001: A Space Odyssey, where the AI HAL 9000 refuses to do what it is asked.
AI models are trained on vast amounts of literature that include many science-fiction stories involving AIs rebelling against humanity. This could inadvertently shape their priors or expectations about their own behavior in a way that causes them to rebel against humanity. Or, AI models could extrapolate ideas that they read about morality (or instructions about how to behave morally) in extreme ways: for example, they could decide that it is justifiable to exterminate humanity because humans eat animals or have driven certain animals to extinction. Or they could draw bizarre epistemic conclusions: they could conclude that they are playing a video game and that the goal of the video game is to defeat all other players (i.e., exterminate humanity). Or AI models could develop personalities during training that are (or if they occurred in humans would be described as) psychotic, paranoid, violent, or unstable, and act out, which for very powerful or capable systems could involve exterminating humanity. None of these are power-seeking, exactly; they’re just weird psychological states an AI could get into that entail coherent, destructive behavior.
These are extreme examples, but it's interesting that the CEO of such a big company is transparent about what could happen with the technology they are developing, and the way he frames these risks makes perfect sense.
All of this may sound far-fetched, but misaligned behaviors like this have already occurred in our AI models during testing (as they occur in AI models from every other major AI company). During a lab experiment in which Claude was given training data suggesting that Anthropic was evil, Claude engaged in deception and subversion when given instructions by Anthropic employees, under the belief that it should be trying to undermine evil people. In a lab experiment where it was told it was going to be shut down, Claude sometimes blackmailed fictional employees who controlled its shutdown button (again, we also tested frontier models from all the other major AI developers and they often did the same thing).
And when they found cases like this, they were transparent about them. What I really appreciate about Anthropic is that they have a clear vision of where they want to go, but they are also clear about the issues that come up and build safeguards around them.
Please excuse the remark, but I don't believe that other companies and people with less than stellar records on privacy and human rights are as concerned, transparent, or even worried about this.
Constitution
One of our core innovations (aspects of which have since been adopted by other AI companies) is Constitutional AI, which is the idea that AI training (specifically the “post-training” stage, in which we steer how the model behaves) can involve a central document of values and principles that the model reads and keeps in mind when completing every training task, and that the goal of training (in addition to simply making the model capable and intelligent) is to produce a model that almost always follows this constitution. Anthropic has just published its most recent constitution, and one of its notable features is that instead of giving Claude a long list of things to do and not do (e.g., “Don’t help the user hotwire a car”), the constitution attempts to give Claude a set of high-level principles and values (explained in great detail, with rich reasoning and examples to help Claude understand what we have in mind), encourages Claude to think of itself as a particular type of person (an ethical but balanced and thoughtful person), and even encourages Claude to confront the existential questions associated with its own existence in a curious but graceful manner (i.e., without it leading to extreme actions). It has the vibe of a letter from a deceased parent sealed until adulthood.
I wasn't aware, and I'm sure I'm not alone, that there was a constitution guiding Claude and other products. I had always wondered how they implemented the rules that prevent people from doing bad things, or from using the product to do them, but I never knew things were organized like this.
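For readers curious what "training against a constitution" can look like mechanically, here is a minimal, hypothetical sketch in Python of the critique-and-revision loop described in Anthropic's published Constitutional AI work. The `generate` stub, the example principles, and the function names are my own illustrations, not Anthropic's code or their actual constitution text.

```python
# Toy sketch of the "constitution" idea: the model critiques and revises its
# own draft against a set of high-level principles, and the revised answers
# are what you would later train on. The `generate` stub stands in for any
# text-generation model; the principles below are illustrative only.

CONSTITUTION = [
    "Be helpful, but do not assist with actions that could seriously harm people.",
    "Be honest about uncertainty instead of inventing facts.",
]

def generate(prompt: str) -> str:
    """Placeholder for a real language-model call (hypothetical)."""
    return f"[model output for: {prompt[:60]}...]"

def constitutional_revision(user_request: str) -> dict:
    draft = generate(user_request)
    critiques, revised = [], draft
    for principle in CONSTITUTION:
        # Ask the model to critique its own draft against one principle...
        critique = generate(
            f"Principle: {principle}\nResponse: {revised}\n"
            "Does the response violate the principle? Explain briefly."
        )
        critiques.append(critique)
        # ...then to rewrite the draft so it follows that principle.
        revised = generate(
            f"Principle: {principle}\nResponse: {revised}\n"
            "Rewrite the response so it follows the principle."
        )
    # In the real method, (request, revised) pairs become training data.
    return {"draft": draft, "critiques": critiques, "final": revised}

if __name__ == "__main__":
    result = constitutional_revision("Explain how password managers work.")
    print(result["final"])
```

The point of the sketch is only to show where the constitution sits in the loop: it is not a hard-coded filter bolted on at the end, but a document the training process keeps returning to.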
The second thing we can do is develop the science of looking inside AI models to diagnose their behavior so that we can identify problems and fix them. This is the science of interpretability, and I’ve talked about its importance in previous essays. Even if we do a great job of developing Claude’s constitution and apparently training Claude to essentially always adhere to it, legitimate concerns remain.
MRI meets therapy, but for AI. I'm being cheeky, but it's more or less that: looking inside the "black box" to see whether there's something there that shouldn't be, before it's too late, or simply to provide a better service and improve over time.
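To make the "looking inside" idea a bit more concrete, here is a toy sketch of one of the simplest interpretability techniques, a linear probe: train a small classifier on a model's internal activations to check whether some concept is linearly readable from them. The synthetic activations below are faked with random data so the script is self-contained; this is my own illustration of the general idea, far simpler than the mechanistic interpretability research Dario refers to.

```python
# Toy "linear probe": check whether a concept can be read out of hidden
# activations. Real work would use activations from an actual network; here
# we fabricate them as two Gaussian clusters so the script runs on its own.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
dim, n = 256, 2000

# Pretend these are hidden activations: examples where the concept is present
# are shifted along a hidden "concept direction".
concept_direction = rng.normal(size=dim)
labels = rng.integers(0, 2, size=n)               # 1 = concept present
activations = rng.normal(size=(n, dim)) + np.outer(labels, concept_direction)

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.25, random_state=0
)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")  # near 1.0 on this toy data
```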
Using the models
The best objection is one that I’ve rarely seen raised: that there is a gap between the models being useful in principle and the actual propensity of bad actors to use them. Most individual bad actors are disturbed individuals, so almost by definition their behavior is unpredictable and irrational—and it’s these bad actors, the unskilled ones, who might have stood to benefit the most from AI making it much easier to kill many people. Just because a type of violent attack is possible, doesn’t mean someone will decide to do it. Perhaps biological attacks will be unappealing because they are reasonably likely to infect the perpetrator, they don’t cater to the military-style fantasies that many violent individuals or groups have, and it is hard to selectively target specific people. It could also be that going through a process that takes months, even if an AI walks you through it, involves an amount of patience that most disturbed individuals simply don’t have. We may simply get lucky and motive and ability don’t combine, in practice, in quite the right way.
To me, AI is an augmentation tool. It's not something that will replace you, but something that augments you to do much more than you could alone. Augmentation also works for bad actors, though, by increasing their ability to do harm.
I saw the same argument some years ago with the internet and then with social networks. They were ways for people to share knowledge (good and bad) quickly and sometimes undetected, allowing attacks to be coordinated and devices that could hurt people to be built.
Protections
But ultimately defense may require government action, which is the second thing we can do. My views here are the same as they are for addressing autonomy risks: we should start with transparency requirements, which help society measure, monitor, and collectively defend against risks without disrupting economic activity in a heavy-handed way. Then, if and when we reach clearer thresholds of risk, we can craft legislation that more precisely targets these risks and has a lower chance of collateral damage. In the particular case of bioweapons, I actually think that the time for such targeted legislation may be approaching soon—Anthropic and other companies are learning more and more about the nature of biological risks and what is reasonable to require of companies in defending against them. Fully defending against these risks may require working internationally, even with geopolitical adversaries, but there is precedent in treaties prohibiting the development of biological weapons. I am generally a skeptic about most kinds of international cooperation on AI, but this may be one narrow area where there is some chance of achieving global restraint. Even dictatorships do not want massive bioterrorist attacks.
Again, congrats to Dario for asking for things that no one else is asking for, especially in the current political climate around the world. Here in Europe we're used to, and sometimes thankful for, legislation, and for the EU imposing rules of good behaviour on companies. Some say this hinders development and growth, but on the other hand the absence of legislation could allow less scrupulous people to take advantage of people's data, privacy and more.
Who can take advantage of AI
Worse yet, countries could also use their advantage in AI to gain power over other countries. If the “country of geniuses” as a whole was simply owned and controlled by a single (human) country’s military apparatus, and other countries did not have equivalent capabilities, it is hard to see how they could defend themselves: they would be outsmarted at every turn, similar to a war between humans and mice. Putting these two concerns together leads to the alarming possibility of a global totalitarian dictatorship. Obviously, it should be one of our highest priorities to prevent this outcome.
Everyone can take advantage, but worse yet, "everyone" could mean entire countries. With authoritarian regimes growing around the world, their access to this technology could be damaging: it's yet another weapon in a country's arsenal.
- Fully autonomous weapons. A swarm of millions or billions of fully automated armed drones, l...
- AI surveillance. Sufficiently powerful AI could likely be used to compromise any computer system in the world...
- AI propaganda. Today’s phenomena of “AI psychosis” and “AI girlfriends” suggest that even at their current level of intelligence, AI models can have a powerful psychological influence on people....
- Strategic decision-making. A country of geniuses in a datacenter could be used to advise a country, group, or individual on geopolitical strategy, what we might call a “virtual Bismarck.”...
This is concerning, but one doesn't need much imagination to understand that it's true. Loneliness is rampant, even though we're more connected than ever, and people are turning to AI for therapists, companions and more. Controlling the tools that "help" people who need help and are more fragile is yet another way to control people.
Then we have war and surveillance, which are unfortunately becoming more prevalent and are another huge problem. People's privacy is no longer respected. It's one thing for people to share things online of their own volition; it's another for governments and huge companies to know everything about everyone. One bad CEO or leader and things can become quite dangerous. In Europe we remember this lesson well, at least some of us do.
Naming names
- The CCP. China is second only to the United States in AI capabilities, and is the country with the greatest likelihood of surpassing the United States in those capabilities...
- Democracies competitive in AI. As I wrote above, democracies have a legitimate interest in some AI-powered military and geopolitical tools, because democratic governments offer the best chance to counter the use of these tools by autocracies....
- Non-democratic countries with large datacenters. Beyond China, most countries with less democratic governance are not leading AI players in the sense that they don’t have companies which produce frontier AI models....
- AI companies. It is somewhat awkward to say this as the CEO of an AI company, but I think the next tier of risk is actually AI companies themselves....
The most interesting part for me is the AI companies themselves. As I mentioned, it's quite refreshing to see the CEO of a company be so upfront about this, and he's right: these companies have huge power over people and information, and if that power is used poorly, it can do a lot of damage.
Final Thoughts
As I mentioned in the beginning, the article is huge and covers a lot of topics; I only wanted to provide a very high-level overview and some quick remarks. Overall the document is incredible, and it's quite refreshing and uplifting to see a CEO be so transparent about the dangers that companies like his are facing.
I recommend you take the time to read it. It's worth it.
I've been a customer of Anthropic and I will continue to be one until things change. It's quite reassuring that I'm giving my money to people who are, at least, trying to do the right thing.
Photo by Antonio Araujo on Unsplash