Our Mission

Our research teams investigate the safety, inner workings, and societal impact of AI models — so that artificial intelligence benefits society as it becomes increasingly advanced and capable.

Research Teams

Interpretability

The mission of the Interpretability team is to discover and understand how large language models work internally — the foundation for ensuring safety and positive outcomes.

Alignment

The Alignment team works to understand and develop ways to keep increasingly advanced AI systems helpful, honest, and harmless.

Societal Impacts

The Societal Impacts team is a technical research team that works closely with Anthropic’s Policy and Trust & Safety teams to ensure AI interacts positively with people.

Research Principles

01

AI as a Systematic Science

Inspired by the universality of scaling in statistical physics, we develop scaling laws to help us do systematic, empirically driven research. We search for simple relations among the data, compute, parameters, and performance of large-scale networks. Then we leverage these relations to train networks more efficiently and predictably, and to evaluate our own progress. We’re also investigating what scaling laws for the safety of AI systems might look like, which will inform our future research.
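As a concrete illustration, a scaling law can be as simple as a power-law relationship between training compute and loss, fitted on small runs and extrapolated to larger ones. The sketch below is a minimal, hypothetical example: the functional form L(C) = a · C^k and the data points are assumptions made for illustration, not results from any real model.

```python
# Minimal sketch of a power-law scaling fit. Both the functional form
# L(C) = a * C**k and the (compute, loss) points below are assumptions
# made for illustration, not measurements from real training runs.
import numpy as np

compute = np.array([1e17, 1e18, 1e19, 1e20, 1e21])  # training FLOPs (hypothetical)
loss = np.array([3.9, 3.2, 2.7, 2.3, 2.0])          # validation loss (hypothetical)

# A power law is a straight line in log-log space: log L = log a + k * log C,
# so a single least-squares line fit recovers the exponent k and prefactor a.
k, log_a = np.polyfit(np.log10(compute), np.log10(loss), 1)
a = 10.0 ** log_a
print(f"fitted relation: L(C) = {a:.1f} * C^{k:.3f}")

# Extrapolate the fitted relation to predict loss at a 10x larger budget.
print(f"predicted loss at 1e22 FLOPs: {a * 1e22 ** k:.2f}")
```

Fitting in log-log space keeps the example to a single least-squares fit; in practice the relevant variables, functional forms, and fitting procedures are richer than this.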

02

Safety and Scaling

At Anthropic we believe safety research is most useful when performed on highly capable models. Every year, we see larger neural networks that perform better than those that came before. These larger networks also bring new safety challenges. We study and engage with the safety issues of large models so that we can find ways to make them more reliable, share what we learn, and improve safe deployment outcomes across the field. Our immediate focus is prototyping systems that pair these safety techniques with tools for analyzing text and code.

03

Tools and Measurements

We believe critically evaluating the potential societal impacts of our work is a key pillar of research. Our approach centers on building tools and measurements to evaluate and understand the capabilities, limitations, and potential for societal impact of our AI systems. A good way to understand our research direction here is to read about some of the work we’ve led or collaborated on in this space: AI and Efficiency, Measurement in AI Policy: Opportunities and Challenges, the AI Index 2021 Annual Report, and Microscope.

04

Focused, Collaborative Research Efforts

We highly value collaboration on projects, and aim for a mixture of top-down and bottom-up research planning. We always aim to ensure we have a clear, focused research agenda, but we put a lot of emphasis on including everyone — researchers, engineers, societal impact experts, and policy analysts — in determining that direction. We seek to collaborate with other labs and researchers, as we believe the best research into characterizing these systems will come from a broad community of researchers working together.

Join the Research team

Interpretability

A surprising fact about modern large language models is that nobody really knows how they work internally. The Interpretability team strives to change that — to understand these models to better plan for a future of safe AI.

Safety through understanding

It’s very challenging to reason about the safety of neural networks without understanding them. The Interpretability team’s goal is to explain large language models’ behaviors in detail, and then to use that understanding to solve a variety of problems, ranging from bias to misuse to autonomous harmful behavior.

Multidisciplinary approach

Some Interpretability researchers have deep backgrounds in machine learning – one member of the team is often credited with starting the field of mechanistic interpretability, while another was an author of the famous scaling laws paper. Other members joined after careers in astronomy, physics, mathematics, biology, data visualization, and more.

In some important way, we don't build neural networks. We grow them. We learn them. And so, understanding them becomes an exciting research problem.
Chris Olah
Research Lead, Interpretability

Join the Interpretability team

Alignment

Future AI systems will be even more powerful than today’s, likely in ways that break key assumptions behind current safety techniques — that’s why it’s important to have the right safeguards in place to keep models helpful, honest, and harmless. The Alignment Science team works to understand the challenges ahead and create protocols to train, evaluate, and monitor highly capable models safely.

Evaluation and oversight

Alignment researchers validate that models are harmless and honest even under circumstances very different from those under which they were trained. They also develop methods that allow humans to collaborate with language models to verify claims they might not be able to verify on their own.

Stress-testing safeguards

Alignment researchers also systematically look for situations in which models might behave badly, and check whether our existing safeguards are sufficient to deal with risks that human-level capabilities may bring.

You want people to be completely aware of exactly what they're interacting with, and to kind of be under no illusions. I feel that's really important.
Amanda Askell
Researcher, Alignment

Join the Alignment team

Societal Impacts

From examining election integrity risks to studying how AI systems might augment (rather than replace) humans, the Societal Impacts team uses tools from a variety of fields to enable positive relationships between AI and people.

Sociotechnical alignment

Which human values should AI models hold, and how should they operate in the face of conflicting or ambiguous values? How is AI used (and misused) in the wild? How can we anticipate future uses and risks of AI? Societal Impacts researchers develop experiments, training methods, and evaluations to answer these questions.

Policy relevance

Though the Societal Impacts team is technical, it often picks research questions that have policy relevance. The team believes that providing trustworthy research on topics policymakers care about will lead to better policy outcomes, and better outcomes overall, for everyone.

As language models advance, interest is growing in applying them to high-stakes societal decisions, such as determining financing or housing eligibility. However, their potential for discrimination in such contexts raises ethical concerns, motivating the need for better methods to evaluate these risks.
Alex Tamkin
Research Scientist, Societal Impacts

Join the Societal Impacts team
