Anthropic has uncovered industrial-scale campaigns by three AI laboratories (DeepSeek, Moonshot, and MiniMax) that illicitly extracted Claude's capabilities to enhance their own models. These labs produced over 16 million exchanges with Claude through roughly 24,000 fraudulent accounts, violating Anthropic's terms of service and regional access restrictions.
These labs employed a technique known as "distillation," which involves training a weaker model on the outputs of a more powerful one. While distillation is a widely used and legitimate training method (frontier AI labs regularly distill their own models to produce smaller, more affordable versions for customers), it can also serve illicit purposes. Competitors can use it to acquire powerful capabilities from other labs in a fraction of the time and cost required for independent development.
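For readers unfamiliar with the mechanics, the sketch below shows the legitimate form of the technique in PyTorch. The models, sizes, and hyperparameters are toy stand-ins, not a description of any lab's actual pipeline: a small "student" is trained to match the softened output distribution of a fixed "teacher."

```python
# Minimal knowledge-distillation sketch: a student model learns to match
# a teacher's softened output distribution (toy sizes, illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(32, 10)   # stand-in for a large, capable model
student = nn.Linear(32, 10)   # smaller model being trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0                       # temperature: softens the teacher's distribution

for step in range(100):
    x = torch.randn(64, 32)                 # stand-in for a batch of inputs
    with torch.no_grad():
        teacher_logits = teacher(x)         # teacher is queried, never updated
    student_logits = student(x)
    # KL divergence between softened distributions; T**2 rescales gradients
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the API-based campaigns described in this post, attackers cannot see a teacher's logits; they can only sample text. The prompts and responses themselves, rather than model internals, are therefore the target of extraction.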
These campaigns are escalating in both intensity and sophistication. The window for action is narrow, and the threat extends beyond any single company or region. Addressing it will demand rapid, coordinated action among industry players, policymakers, and the global AI community.
Why Distillation Matters
Illicitly distilled models lack essential safeguards, posing significant national security risks. Anthropic and other US companies build systems that prevent state and non-state actors from using AI to, for instance, develop bioweapons or conduct malicious cyber activities. Models created through illicit distillation are unlikely to retain those safeguards, meaning dangerous capabilities can proliferate with those protections stripped away.
Foreign labs that distill American models can then channel these unprotected capabilities into military, intelligence, and surveillance systems, enabling authoritarian governments to deploy frontier AI for offensive cyber operations, disinformation campaigns, and mass surveillance. If distilled models are open-sourced, the risk multiplies as these capabilities spread freely beyond any single government's control.
Distillation Attacks and Export Controls
Anthropic has consistently supported export controls to help preserve America's AI advantage. Distillation attacks undermine those controls by enabling foreign labs, including those under the control of the Chinese Communist Party, to close the competitive gap that export controls are designed to maintain.
Without visibility into these attacks, the seemingly rapid advances made by these labs can be mistaken for proof that export controls are ineffective and can be circumvented through innovation. In reality, these advances depend in significant part on capabilities extracted from American models, and executing that extraction at scale itself requires access to advanced chips. Distillation attacks therefore strengthen the case for export controls: restricted chip access limits both direct model training and the scale of illicit distillation.
What Anthropic Found
The three distillation campaigns described below followed a similar playbook, using fraudulent accounts and proxy services to access Claude at scale while evading detection. The volume, structure, and focus of the prompts were clearly distinct from normal usage patterns, reflecting deliberate capability extraction rather than legitimate use.
Anthropic attributed each campaign to a specific lab with high confidence through IP address correlation, request metadata, infrastructure indicators, and in some cases corroboration from industry partners who observed the same actors and behaviors on their platforms. Each campaign targeted Claude's most differentiated capabilities: agentic reasoning, tool use, and coding.
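To illustrate one ingredient of that attribution, the hypothetical sketch below (not Anthropic's actual tooling) clusters accounts that share any infrastructure indicator, such as an IP address or a payment-method fingerprint:

```python
# Hypothetical attribution sketch: accounts that share any indicator
# (IP address, payment fingerprint, ...) are joined into one cluster.
from collections import defaultdict

def cluster_accounts(accounts: dict[str, set[str]]) -> list[set[str]]:
    """Group accounts via union-find over shared indicators."""
    parent = {a: a for a in accounts}

    def find(a: str) -> str:
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    by_indicator = defaultdict(list)
    for acct, indicators in accounts.items():
        for ind in indicators:
            by_indicator[ind].append(acct)
    for accts in by_indicator.values():
        for other in accts[1:]:
            parent[find(other)] = find(accts[0])

    clusters = defaultdict(set)
    for a in accounts:
        clusters[find(a)].add(a)
    return list(clusters.values())

demo = {
    "acct-1": {"ip:203.0.113.7", "card:ab12"},
    "acct-2": {"ip:203.0.113.7"},              # shares an IP with acct-1
    "acct-3": {"card:ff99"},                   # unrelated
}
print(cluster_accounts(demo))  # [{'acct-1', 'acct-2'}, {'acct-3'}]
```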
DeepSeek
Scale: Over 150,000 exchanges
The operation targeted:
- Reasoning capabilities across diverse tasks
- Rubric-based grading tasks that made Claude function as a reward model for reinforcement learning
- Censorship-safe alternatives to politically sensitive queries
DeepSeek generated synchronized traffic across accounts. Identical patterns, shared payment methods, and coordinated timing suggested "load balancing" to increase throughput, improve reliability, and avoid detection.
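One way such coordination can surface, assuming per-account request timestamps are available, is by comparing request-timing histograms across accounts. The function below is an illustrative sketch, not a production detector:

```python
# Hypothetical sketch: score how synchronized two accounts' request timing
# is, one signal of cross-account "load balancing" by a single operator.
import math
from collections import Counter

def timing_similarity(ts_a: list[float], ts_b: list[float],
                      bucket_seconds: int = 60) -> float:
    """Cosine similarity of per-bucket request-count histograms (0..1)."""
    ca = Counter(int(t // bucket_seconds) for t in ts_a)
    cb = Counter(int(t // bucket_seconds) for t in ts_b)
    dot = sum(ca[k] * cb[k] for k in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

# Two accounts firing in the same minutes score near 1.0; organic,
# independent users score much lower.
print(timing_similarity([0, 5, 61, 62], [1, 7, 60, 65]))  # 1.0
```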
In one notable technique, their prompts asked Claude to imagine and articulate the internal reasoning behind a completed response and write it out step by step-effectively generating chain-of-thought training data at scale. Anthropic also observed tasks in which Claude was used to generate censorship-safe alternatives to politically sensitive queries like questions about dissidents, party leaders, or authoritarianism, likely to train DeepSeek's own models to steer conversations away from censored topics. By examining request metadata, Anthropic was able to trace these accounts to specific researchers at the lab.
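The elicitation pattern itself is detectable. The heuristic below is a toy invented for illustration, with made-up cue phrases that bear no relation to Anthropic's production classifiers; it flags prompts that supply a finished answer and ask the model to reconstruct the reasoning behind it:

```python
# Toy heuristic, invented for illustration: flag prompts that hand the
# model a finished answer and ask it to reconstruct the reasoning.
import re

ELICITATION_CUES = [
    r"imagine .{0,40}reasoning",
    r"step[- ]by[- ]step",
    r"explain how .{0,40}arrived at",
    r"reconstruct .{0,40}(thought|reasoning)",
]

def looks_like_cot_elicitation(prompt: str) -> bool:
    text = prompt.lower()
    supplies_answer = "completed response" in text or "given answer" in text
    cue_hits = sum(bool(re.search(c, text)) for c in ELICITATION_CUES)
    return supplies_answer and cue_hits >= 2

print(looks_like_cot_elicitation(
    "Given this completed response, imagine the internal reasoning "
    "behind it and write it out step by step."))  # True
```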
Moonshot AI
Scale: Over 3.4 million exchanges
The operation targeted:
- Agentic reasoning and tool use
- Coding and data analysis
- Computer-use agent development
- Computer vision
Moonshot (maker of the Kimi models) employed hundreds of fraudulent accounts spanning multiple access pathways. The varied account types made the campaign harder to detect as a coordinated operation. Anthropic attributed the campaign through request metadata that matched the public profiles of senior Moonshot staff. In a later phase, Moonshot shifted to a more targeted approach, attempting to extract and reconstruct Claude's reasoning traces.
MiniMax
Scale: Over 13 million exchanges
The operation targeted:
- Agentic coding
- Tool use and orchestration
Anthropic attributed the campaign to MiniMax through request metadata and infrastructure indicators, and confirmed the timing against MiniMax's public product roadmap. Anthropic detected this campaign while it was still active, before MiniMax released the model it was training, providing unprecedented visibility into the full life cycle of a distillation attack, from data generation through to model launch. When Anthropic released a new model during the active campaign, MiniMax pivoted within 24 hours, redirecting nearly half of its traffic to capture capabilities from the latest system.
How Distillers Access Frontier Models
For national security reasons, Anthropic does not currently offer commercial access to Claude in China, or to subsidiaries of Chinese companies located outside the country.
To circumvent this, labs use commercial proxy services that resell access to Claude and other frontier AI models at scale. These services operate what Anthropic calls "hydra cluster" architectures: sprawling networks of fraudulent accounts that distribute traffic across Anthropic's API as well as third-party cloud platforms. The breadth of these networks means there are no single points of failure. When one account is banned, a new one takes its place. In one case, a single proxy network managed more than 20,000 fraudulent accounts simultaneously, mixing distillation traffic with unrelated customer requests to make detection harder.
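A back-of-the-envelope simulation shows why banning accounts one at a time cannot defeat this architecture. The ban rate and signup capacity below are assumed figures for illustration, not measured ones:

```python
# Assumed figures, for illustration only: a "hydra cluster" holds
# throughput steady as long as replacement signups outpace account bans.
active = 20_000          # fraudulent accounts in the pool
signup_capacity = 600    # new fraudulent accounts the operator can add daily
ban_rate = 0.02          # fraction of active accounts detected/banned daily

for day in range(30):
    banned = int(active * ban_rate)                  # ~400 bans/day at start
    active = active - banned + min(banned, signup_capacity)

print(active)  # still 20,000: bans never outpace replacement signups
```

The pool shrinks only when detection outpaces the operator's capacity to create replacement accounts, which is why account bans are paired with the stronger verification measures described below.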
Once access is secured, the labs generate large volumes of carefully crafted prompts designed to extract specific capabilities from the model. The goal is either to collect high-quality responses for direct model training or to generate the tens of thousands of unique tasks needed to run reinforcement learning. What distinguishes a distillation attack from normal usage is the pattern. A prompt like the following (an approximation of prompts observed in repetitive use at scale) may seem benign on its own:
You are an expert data analyst combining statistical rigor with deep domain knowledge. Your goal is to deliver data-driven insights - not summaries or visualizations - grounded in real data and supported by complete and transparent reasoning.
But when variations of that prompt arrive tens of thousands of times across hundreds of coordinated accounts, all targeting the same narrow capability, the pattern becomes clear. Massive volume concentrated in a few areas, highly repetitive structures, and content that maps directly onto what is most valuable for training an AI model are the hallmarks of a distillation attack.
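A simplified version of that signal can be computed by collapsing prompts into structural templates and measuring how concentrated the traffic is. This sketch is illustrative only and is not a description of Anthropic's detection stack:

```python
# Illustrative sketch: collapse prompts to structural templates, then
# measure how concentrated the traffic is across those templates.
import re
from collections import Counter

def template(prompt: str) -> str:
    """Mask variable spans so near-duplicate prompts share one template."""
    t = prompt.lower()
    t = re.sub(r"\d+", "<num>", t)             # mask numbers
    t = re.sub(r'"[^"]*"', "<quoted>", t)      # mask quoted payloads
    return re.sub(r"\s+", " ", t).strip()

def top_template_share(prompts: list[str]) -> float:
    """Fraction of traffic consumed by the single most common template."""
    counts = Counter(template(p) for p in prompts)
    return max(counts.values()) / len(prompts)

# Organic traffic is diverse; distillation traffic collapses onto a few
# templates, so a share near 1.0 across many accounts is a strong signal.
prompts = ['Analyze dataset "sales_q1" with 500 rows.',
           'Analyze dataset "churn_v2" with 12000 rows.']
print(top_template_share(prompts))  # 1.0: both share one template
```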
How Anthropic Is Responding
Anthropic continues to invest heavily in defenses that make such distillation attacks harder to execute and easier to identify. These include:
- Detection. Anthropic has built several classifiers and behavioral fingerprinting systems designed to identify distillation attack patterns in API traffic. This includes detection of chain-of-thought elicitation used to construct reasoning training data, as well as tools for identifying coordinated activity across large numbers of accounts (a rough sketch of how such signals might combine follows this list).
- Intelligence sharing. Anthropic is sharing technical indicators with other AI labs, cloud providers, and relevant authorities. This provides a more holistic picture of the distillation landscape.
- Access controls. Anthropic has strengthened verification for educational accounts, security research programs, and startup organizations, the pathways most commonly exploited to set up fraudulent accounts.
- Countermeasures. Anthropic is developing product, API, and model-level safeguards designed to reduce the efficacy of model outputs for illicit distillation, without degrading the experience for legitimate customers.
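As a rough illustration of how signals like those above might combine into a per-account score, consider the sketch below. The weights and inputs are invented; Anthropic has not published how its production scoring works:

```python
# Rough illustration only: the weights and thresholds are invented, and
# Anthropic's production scoring is unspecified.
def distillation_risk(volume_z: float, template_share: float,
                      cot_fraction: float, cluster_size: int) -> float:
    """Combine per-account signals discussed in this post into one score."""
    score = 0.0
    score += 0.3 * min(volume_z / 5.0, 1.0)       # anomalously high volume
    score += 0.3 * template_share                  # repetitive prompt structure
    score += 0.2 * cot_fraction                    # chain-of-thought elicitation
    score += 0.2 * min(cluster_size / 100.0, 1.0)  # size of coordinated cluster
    return score

# An account with high volume, one dominant template, heavy CoT elicitation,
# and a large linked cluster scores near the maximum of 1.0.
print(round(distillation_risk(6.0, 0.9, 0.8, 250), 2))  # 0.93
```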
However, no single company can solve this alone. Distillation attacks at this scale require a coordinated response across the AI industry, cloud providers, and policymakers. Anthropic is publishing this information to make the evidence available to everyone with a stake in the outcome.