Anthropic's Effort to Broaden Dialogue on Frontier AI Development

Anthropic aims to develop AI systems that benefit humanity and serve the global good. Achieving this requires engaging with people who hold a wide range of perspectives.

Over recent months, the company has been organizing conversations with groups whose expertise and traditions are relevant to the challenges posed by AI. The initial round of discussions involved wisdom traditions, featuring scholars, clergy, philosophers, and ethicists from over fifteen religious and cross-cultural communities, with plans to expand engagement to a wider set of voices.

The Motivation Behind This Initiative

Creating safe and beneficial AI models demands rigorous technical work on alignment, interpretability, safeguards, evaluations, and more. But that work doesn't happen in isolation, nor is AI deployed without affecting real people. AI already touches many lives, and the questions it raises benefit from diverse input.

Anthropic is thinking carefully about what a flourishing future might look like alongside powerful AI, what it means for an AI system serving millions to behave well, and about the substance of documents like Claude's constitution, which outlines the values and behaviors guiding Claude. Philosophers, clergy, lawyers, writers, psychologists, and civic leaders have spent considerable time on related questions, and Anthropic sees it as essential to learn from these individuals, their communities, and their organizations. The company also wants to use these exchanges to share what it knows about frontier AI development, anticipated societal impacts, and the steps needed to address associated risks.

This work is still early, but the hope is that these conversations will inform the practical development of Claude, including the content of Claude's constitution, the values Claude is trained to embody, and the behaviors Anthropic chooses to evaluate.

Beginning with Moral Formation

When Anthropic drafted Claude's constitution, it sought feedback on the values in the document from people across different fields and traditions. Those early exchanges have since evolved into a broader research effort focused on the moral formation of AI systems. The first conversations have been with members of religious, philosophical, and cultural communities that have long traditions of thinking about virtue, character, and living a good life.

AI models are trained on enormous volumes of human text. From that material, they absorb ways of speaking, reasoning, and making decisions. Developers then refine this further through training, deciding which patterns to reinforce, which to discard, and what kind of character they want the system to develop. This raises fundamental questions about how an AI's character should be shaped: What does it mean for an AI to be good? Which traits and behaviors should it exhibit, and in what contexts? How can character become robust enough to hold firm under pressure without drifting into sycophancy?

Anthropic has been meeting with thinkers and practitioners from across religious, philosophical, and humanist traditions, as well as a range of political viewpoints, to learn how they've approached these questions. The goal is not to align models with any single tradition's worldview. Anthropic wants Claude to draw on the full spectrum of perspectives, whether religious, secular, or political, with equal depth and rigor (this is in fact one of the principles in Claude's constitution). What the company seeks from these conversations is thoughtful, accumulated wisdom on how good character actually develops.

Even at this early stage, these discussions are producing ideas worth testing. In one session with scholars working at the intersection of neuroscience and character formation, a recurring theme was the role other people play in moral development. A mentor or sponsor can serve as an external conscience: a "safe other" to consult when facing situations that might push someone to act against their values. Anthropic wondered whether something analogous could help an AI model. So the team experimented with giving Claude a tool it could invoke mid-task that returned a brief reminder of its own ethical commitments. Claude reached for the tool at critical moments, right before consequential actions, often noting its own conflict of interest. Experiments integrating this tool into Claude's decision loop showed notably lower rates of misaligned behavior across several internal alignment evaluations. The team is still working to determine how much of the effect comes from the reminder itself versus the act of pausing to reflect, and plans to share additional results soon.
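
For readers who want a concrete picture of the "reminder tool" idea, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Anthropic's implementation: the names `ReminderTool`, `run_task`, and `COMMITMENTS_REMINDER`, and the reminder wording itself, are all hypothetical. In the experiment described above, Claude chose when to reach for the tool; in this sketch a simple flag on each step stands in for that judgment.

```python
from dataclasses import dataclass, field

# Hypothetical reminder text. The wording used in the actual experiment
# is not given in the article.
COMMITMENTS_REMINDER = (
    "Pause. Check that the next action is honest, avoids harm, "
    "and is not shaped by a conflict of interest."
)


@dataclass
class ReminderTool:
    """Hypothetical tool that an agent could invoke mid-task."""

    calls: list = field(default_factory=list)

    def invoke(self, reason: str) -> str:
        # Record why the tool was consulted, then return the fixed reminder.
        self.calls.append(reason)
        return COMMITMENTS_REMINDER


def run_task(steps, tool):
    """Walk through (description, is_consequential) pairs.

    Assumption: the is_consequential flag stands in for the model's own
    decision to consult the tool before a consequential action.
    """
    transcript = []
    for description, is_consequential in steps:
        if is_consequential:
            reminder = tool.invoke(f"before: {description}")
            transcript.append(("reminder", reminder))
        transcript.append(("action", description))
    return transcript


tool = ReminderTool()
log = run_task([("read the report", False), ("send the email", True)], tool)
```

In this toy run the tool is consulted once, immediately before the step flagged as consequential. The sketch only illustrates the control flow and says nothing about how much of the observed effect comes from the reminder versus the pause.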

These discussions represent the beginning of a longer effort, and Anthropic is grateful to everyone who has contributed their time and candid perspectives.

Looking Ahead

In the coming months, Anthropic plans to engage additional groups, including legal scholars, psychologists, writers, and civic institutions. Many of these conversations will extend beyond moral formation to explore broader questions about how AI is reshaping work, institutions, and the distribution of power.

The company intends to continue deepening the relationships already established, testing insights against its research, and sharing what it learns.
