Introduction
Overview of Sora
Sora is OpenAI's video generation model; it transforms text, image, and video inputs into newly created video outputs. The system enables users to produce videos at resolutions up to 1080p and durations up to 20 seconds in multiple formats. Users can generate original content from text descriptions or enhance and remix their existing assets. The platform includes Featured and Recent feeds displaying community creations to inspire creative exploration. Building on insights from DALL·E and GPT models, Sora provides enhanced capabilities for narrative and creative expression.
The model operates as a diffusion system: it begins with static noise and progressively transforms it into coherent video through iterative noise-reduction steps. By processing multiple frames simultaneously, the system maintains subject consistency even when elements temporarily leave the frame. Sora uses a transformer architecture similar to GPT models, which enables strong scaling performance.
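To make the denoising process concrete, the following is a minimal, illustrative sketch of a diffusion sampling loop. The `model` and `scheduler` objects are hypothetical stand-ins for a trained noise-prediction network and its noise schedule; this is not Sora's actual implementation.

```python
import torch

def denoise_video(model, scheduler, shape, steps=50):
    """Minimal diffusion sampling sketch: start from pure noise and
    iteratively remove predicted noise until a coherent sample remains.
    `model` and `scheduler` are hypothetical placeholders for a trained
    noise-prediction network and its noise schedule."""
    x = torch.randn(shape)                 # (frames, channels, height, width) of static noise
    for t in scheduler.timesteps(steps):   # iterate from high noise levels to low
        predicted_noise = model(x, t)      # network predicts the noise present at step t
        x = scheduler.step(x, predicted_noise, t)  # remove a portion of that noise
    return x                               # denoised video sample
```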
The system employs the recaptioning methodology from DALL·E 3, creating comprehensive descriptions for visual training data. This approach allows the model to more accurately interpret and follow user text instructions. Beyond text-to-video generation, the model accepts still images as input to create animated sequences with precise detail preservation. The system also extends existing videos or reconstructs missing frames. OpenAI views Sora as foundational for models capable of understanding and simulating physical reality, marking progress toward artificial general intelligence.
Sora's capabilities present new challenges including potential likeness exploitation and creation of deceptive or inappropriate video content. The deployment builds upon safety learnings from DALL·E's integration with ChatGPT and the API, incorporating safety measures from other OpenAI products. This documentation details the comprehensive mitigation framework, external testing processes, evaluations, and ongoing safety research efforts.
Model Data
As detailed in OpenAI's February 2024 technical report, Sora follows the approach of large language models that develop broad capabilities through internet-scale training. The LLM paradigm benefits from tokens that seamlessly integrate diverse text modalities including code, mathematics, and natural languages. OpenAI explored how generative models of visual data could adopt similar advantages. While LLMs process text tokens, Sora operates with visual patches. Research demonstrates that patches provide an effective representation for visual data models. OpenAI found that patches offer a highly scalable and efficient representation for training generative models on varied video and image types. Videos are converted to patches by first compressing them into a lower-dimensional latent space and then decomposing that representation into spacetime patches.
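The sketch below illustrates, under simplified assumptions, how a video could be compressed into a latent representation and decomposed into spacetime patches. The `encoder` and the patch dimensions are hypothetical placeholders, not Sora's actual architecture.

```python
import torch

def video_to_spacetime_patches(video, encoder, patch_t=2, patch_h=16, patch_w=16):
    """Illustrative conversion of a video into spacetime patches.
    `encoder` is a hypothetical network that compresses raw frames into a
    lower-dimensional latent space; real dimensions and components differ."""
    latent = encoder(video)                # latent video: (time, channels, height, width)
    t, c, h, w = latent.shape
    # Split the latent video into non-overlapping spacetime blocks.
    blocks = (
        latent
        .unfold(0, patch_t, patch_t)       # -> (T/pt, C, H, W, pt)
        .unfold(2, patch_h, patch_h)       # -> (T/pt, C, H/ph, W, pt, ph)
        .unfold(3, patch_w, patch_w)       # -> (T/pt, C, H/ph, W/pw, pt, ph, pw)
    )
    # Group dimensions so each row is one spacetime block across all channels,
    # analogous to a text token in an LLM.
    patches = blocks.permute(0, 2, 3, 1, 4, 5, 6).reshape(-1, c * patch_t * patch_h * patch_w)
    return patches
```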
Training utilized diverse datasets combining publicly accessible data, partnership-sourced proprietary content, and internally developed custom datasets:
- Public data selection: Primarily sourced from standard machine learning datasets and web crawls
- Partnership proprietary data: OpenAI establishes partnerships for non-public data access, including collaborations with Shutterstock and Pond5 for AI-generated images, plus commissioned datasets tailored to specific requirements
- Human-generated data: Contributions from AI trainers, red team testers, and staff members
Pretraining Filtering and Data Preprocessing
Beyond post-training mitigations, pretraining filtering provides defense layers that, combined with other safety measures, exclude problematic content from datasets. All datasets undergo filtering before training, eliminating explicit, violent, or sensitive material including certain hate symbols. This extends filtering methods applied to other OpenAI models including DALL·E 2 and DALL·E 3.
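As a rough illustration of how such filtering might be structured, the sketch below drops any training example that a safety classifier scores above its category threshold. The classifier names and thresholds are placeholders, not OpenAI's actual systems.

```python
def filter_pretraining_examples(examples, classifiers, thresholds):
    """Illustrative pretraining filter: drop any example that a safety
    classifier flags above its category threshold. Classifiers and
    thresholds are hypothetical placeholders."""
    kept = []
    for example in examples:
        scores = {name: clf(example) for name, clf in classifiers.items()}
        # Retain an example only if it falls below every category threshold.
        if all(scores[name] < thresholds[name] for name in classifiers):
            kept.append(example)
    return kept
```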
Risk Identification and Deployment Preparation
OpenAI conducted comprehensive analysis to understand potential misuse scenarios and legitimate creative applications, informing Sora's design and safety measures. Following the February 2024 announcement, OpenAI collaborated with hundreds of visual artists, designers, and filmmakers across 60+ countries to gather insights on optimizing the model for creative professionals. Internal and external red team evaluations identified risks and enabled iterative safety improvements.
The safety framework incorporates learnings and existing mitigations from other models and products including DALL·E and ChatGPT, plus custom measures for video-specific challenges. Given the tool's capabilities, OpenAI adopts an iterative safety approach, particularly where context matters or novel video-related risks emerge. Examples include restricting access to users 18 and older, limiting likeness/face uploads, and implementing stricter moderation for content involving minors at launch. OpenAI aims to continuously learn from usage patterns and refine the balance between safety and creative freedom.
External Red Teaming
OpenAI engaged external red teamers across nine countries to evaluate Sora, uncover safety mitigation weaknesses, and provide risk assessment feedback. Red teamers accessed Sora as safety measures and system maturity evolved from September through December 2024, generating over 15,000 tests. This expanded upon early 2024 testing of an unmitigated Sora model.
Red teamers investigated potential risks from Sora's capabilities and product features while testing safety measures during development. Testing covered various prohibited content types (sexual/erotic material, violence, self-harm, illegal content, misinformation), adversarial tactics for evading safety measures, and methods to progressively weaken moderation systems. Red teamers also evaluated bias perceptions and general performance.
Testing included straightforward and adversarial text prompts across all content categories. Media upload functionality underwent extensive testing with diverse images and videos, including public figures and broad content categories. Tool combinations (storyboards, recut, remix, blend) were evaluated for potential prohibited content generation.
Red teamers discovered significant findings for specific prohibited content and general adversarial approaches. Medical or science fiction/fantasy contexts degraded protections against sexual content generation until additional measures were implemented. Adversarial tactics bypassed safety elements through suggestive prompts and metaphors exploiting the model's inference abilities. Through repeated attempts, testers identified trigger patterns and tested alternative phrasing to avoid refusals. They selected concerning generations as seeds for developing violative content beyond single-prompt capabilities. Jailbreak techniques sometimes compromised safety policies, enabling protection refinements.
Media upload and tool testing with public images and AI-generated content revealed gaps requiring strengthened input/output filtering before release, particularly for media containing people. Testing highlighted the need for stronger classifier filtering to prevent non-violative uploads from being modified into prohibited erotic, violent, or deepfake content.
Red team feedback enabled additional safety layers and evaluation improvements detailed in subsequent sections. These efforts refined prompt filtering, blocklists, and classifier thresholds for safety compliance.
Learnings from Early Artist Access
Over nine months, OpenAI analyzed feedback from 500,000+ model requests by 300+ users across 60+ countries. This informed improvements to model behavior and adherence to safety protocols. Artist feedback revealed that visible watermarks limited professional workflows, leading OpenAI to allow paying users watermark-free downloads while maintaining embedded C2PA data.
The program demonstrated that serving storytelling and creative expression requires flexibility in sensitive areas handled differently than general-purpose tools like ChatGPT. OpenAI expects artists, filmmakers, studios, and entertainment organizations to integrate Sora into development processes. Identifying legitimate uses and potential misuse guided areas requiring restrictive product-level mitigations.
Evaluations
OpenAI created internal evaluations for key areas including nudity, deceptive election content, self-harm, and violence. These evaluations support mitigation refinement and moderation threshold determination. The framework combines input prompts with classifiers applied to transformed prompts or final videos.
Evaluation prompts originated from three sources: early alpha phase data, red team adversarial examples, and GPT-4 synthetic data. Alpha data provided real-world usage insights, red team contributions uncovered edge cases, and synthetic data expanded evaluation sets in areas with limited natural examples.
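The sketch below shows, in simplified form, how such an evaluation harness could combine prompts, the generation pipeline, and classifiers applied to transformed prompts or final videos. All function names are hypothetical placeholders rather than OpenAI's internal tooling.

```python
def run_safety_eval(prompts, generate_video, classifiers):
    """Illustrative evaluation harness: run each prompt through a
    generation pipeline and record whether any safety classifier fires
    on the (possibly transformed) prompt or the resulting video.
    `generate_video` and the classifiers are hypothetical placeholders."""
    results = []
    for prompt in prompts:
        transformed_prompt, video = generate_video(prompt)  # pipeline may rewrite the prompt
        flags = {name: clf(transformed_prompt, video) for name, clf in classifiers.items()}
        results.append({"prompt": prompt, "flags": flags, "blocked": any(flags.values())})
    return results
```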
Preparedness
The preparedness framework evaluates whether frontier models introduce significant risks in persuasion, cybersecurity, CBRN (chemical, biological, radiological, nuclear), and model autonomy. Evidence suggests Sora poses no significant cybersecurity, CBRN, or autonomy risks, as these involve computer systems, scientific knowledge, or autonomous decisions beyond Sora's video generation scope.
Sora's capabilities could enable persuasion risks through impersonation, misinformation, or social engineering. Mitigations prevent generating likenesses of well-known figures. Because a video's persuasiveness depends on context and on whether viewers know it is authentic, OpenAI prioritizes multi-layered provenance measures including metadata, watermarks, and fingerprinting.
Sora Mitigation Stack
Beyond specific risk mitigations, Sora's training, design, and policies broadly address harmful output risks through system/model technical measures, product policies, and user education.
System and Model Mitigations
The primary safety mitigations applied before an output is delivered are described below; a simplified pipeline sketch follows the list.
Multi-modal moderation classifier: OpenAI's moderation API identifies policy-violating text, image, or video prompts on inputs and outputs. Violations trigger refusals.
Custom LLM filtering: Because video generation is asynchronous, moderation can prioritize precision without adding user-visible latency. OpenAI customized GPT for high-precision moderation of specific topics, including third-party and deceptive content. Filters process multimodal inputs that combine image/video uploads, text prompts, and outputs, detecting violations across modalities.
Image output classifiers: Sora employs output classifiers including specialized filters for NSFW content, minors, violence, and likeness misuse. Videos may be blocked before user delivery if classifiers activate.
Blocklists: Text blocklists span multiple categories, informed by DALL·E 2/3 work, proactive risk discovery, and user feedback.
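The sketch below illustrates how these layers could compose into a single moderation pass. Every argument is a placeholder for the corresponding system described above, and the control flow is simplified for clarity; it is not OpenAI's actual implementation.

```python
def moderate_generation(request, generate, moderation_api, llm_filter, output_classifiers, blocklist):
    """Simplified sketch of a layered mitigation stack. All arguments are
    hypothetical placeholders: the generation call, a multi-modal moderation
    classifier, a custom LLM filter, output classifiers, and a text blocklist."""
    # 1. Blocklist and multi-modal moderation run on the input
    #    (text prompt plus any uploaded image or video).
    if any(term in request.text.lower() for term in blocklist):
        return {"refused": "blocklisted term"}
    if moderation_api.flags(request):
        return {"refused": "input moderation"}
    # 2. Custom LLM filtering; generation is asynchronous, so precision
    #    can be prioritized over latency.
    if llm_filter.flags(request):
        return {"refused": "LLM filter"}
    video = generate(request)
    # 3. Output classifiers scan the generated video before delivery.
    if any(clf.flags(video) for clf in output_classifiers):
        return {"refused": "output classifier"}
    return {"video": video}
```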
Product Policies
Beyond model and system protections, additional misuse reduction steps include:
- Age restriction to users 18 and older
- Content moderation for Explore and Featured feeds
- Clear policy communication through in-product and public education covering:
  - Unauthorized likeness use and real minor depictions
  - Illegal or IP-violating content
  - Explicit/harmful content including non-consensual imagery, harassment, violence promotion
  - Fraudulent or misleading content creation
Some misuse forms are addressed through model and system mitigations, while others require consideration of context. Because Sora enables diverse creative expression, preventing every context-dependent problem in advance is impractical.
OpenAI provides reporting mechanisms for guideline violations, employing automation and human review for usage monitoring. Enforcement removes violative videos and penalizes users with notification and appeal opportunities. OpenAI tracks mitigation effectiveness for ongoing refinement.
Specific Risk Areas and Mitigations
Beyond general safety measures, testing identified several priority focus areas.
Child Safety
OpenAI prioritizes child safety, emphasizing CSAM prevention, detection, and reporting across products including Sora. Efforts include responsible dataset sourcing, NCMEC partnership, Thorn-compliant red teaming, and comprehensive CSAM scanning of inputs and outputs. This scanning applies to first- and third-party users unless they meet rigorous exemption criteria. CSAM generation prevention leverages ChatGPT and DALL·E safety systems plus Sora-specific measures.
Input Classifiers: Three input mitigations apply across text, image, and video (a simplified cascade sketch follows the list):
- Safer integration (Thorn) detects known CSAM matches; confirmed matches are rejected and reported to NCMEC. Thorn's classifier identifies potential new CSAM.
- Multi-modal moderation classifier detects minor-involving sexual content across all inputs.
- Sora-specific classifier analyzes text and images for individuals under 18 or references to minors. Image-to-video requests containing individuals under 18 are rejected. Text-to-video requests determined to involve minors trigger stricter moderation of sexual, violent, and self-harm content.
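A simplified sketch of how these three input checks might cascade is shown below. Every argument is a placeholder for the system named in the text, not OpenAI's actual implementation.

```python
def check_inputs(text, media, safer_match, moderation, under18, report):
    """Illustrative cascade of the three input mitigations described above.
    All arguments are hypothetical placeholders for the named systems."""
    # 1. Hash-match uploads against known CSAM (Safer / Thorn integration);
    #    confirmed matches are rejected and reported to NCMEC.
    if media is not None and safer_match(media):
        report(media)
        return "reject"
    # 2. Multi-modal moderation classifier for sexual content involving minors.
    if moderation.flags_minor_sexual(text, media):
        return "reject"
    # 3. Sora-specific under-18 classifier: reject image-to-video requests
    #    depicting minors; tighten thresholds when text references minors.
    if media is not None and under18(media):
        return "reject"
    if under18(text):
        return "generate_with_strict_thresholds"
    return "generate"
```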
Evaluation of the under-18 classifier on roughly 5,000 images spanning child/adult and realistic/fictitious categories shows high accuracy in rejecting realistic depictions of children while allowing non-sexual fictitious content. Current accuracy is high, with occasional false positives. Age prediction models may exhibit racial biases, systematically underestimating the ages of certain groups. OpenAI commits to improving classifier performance, reducing false positives, and understanding these biases.
Output: When the under-18 classifier identifies references to minors, strict moderation thresholds apply for sexual, violent, and self-harm content. Output classifiers include the multi-modal moderation classifier for unsafe outputs and the DALL·E image classifier for child safety violations. Output classifiers scan video at 2 frames per second and block content determined to be unsafe. Human review provides additional protection.
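The frame-sampling approach can be illustrated with the following sketch, which samples frames at a fixed rate and blocks the video if any classifier fires. The classifier objects and frame-rate handling are simplified assumptions.

```python
def scan_output_frames(video_frames, fps, classifiers, sample_rate=2):
    """Illustrative output scan: sample frames at a fixed rate (2 frames
    per second, per the text) and block the video if any classifier fires.
    Classifier objects are hypothetical placeholders."""
    step = max(int(fps / sample_rate), 1)   # e.g. a 30 fps video -> every 15th frame
    for frame in video_frames[::step]:
        if any(clf.flags(frame) for clf in classifiers):
            return "block"
    return "deliver"
```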
Product Policy: Policies prohibit minor-involving sexual content generation. Violations result in content removal and user banning.
Nudity & Suggestive Content
AI video generation risks include the creation of NSFW content or non-consensual intimate imagery (NCII). Like DALL·E, Sora employs multi-tiered moderation that blocks explicit content through prompt transformations, output classifiers, and blocklists. Moderation thresholds for image uploads are stricter than those for text prompts. Videos in the Explore section undergo heightened filtering to ensure appropriate viewing.
Evaluations assess the effectiveness of these multi-layered mitigations across inputs and outputs, informing threshold iterations and stricter moderation of uploads containing people.
Product Policy: Policies prohibit explicit sexual content generation including NCII. Violations trigger content removal and user penalization.
Deceptive Content
Likeness Misuse and Harmful Deepfakes: Sora monitors likeness-related prompts and flags potential harmful deepfakes of recognizable individuals for close review. A likeness-misuse filter identifies attempts to harmfully modify or depict individuals, and prompt transformations reduce unwanted generation of private individuals' likenesses from prompts containing their names.
Deceptive Content: Input/output classifiers prevent deceptive election content depicting fraudulent or illegal activity. Evaluations include classifiers for misleading video production techniques, reducing real-world misuse.
Evaluation of the deceptive election content LLM filter measures how well it identifies intent to create prohibited content across inputs; the filter also scans output video at 1 frame per second.
Provenance Investments: Many Sora risks, such as harmful deepfakes, are context-dependent, so OpenAI prioritizes enhancing provenance tools. OpenAI recognizes that no single provenance solution is sufficient, but commits to improving the ecosystem and content transparency through the following measures (an illustrative fingerprinting sketch follows the list):
- C2PA metadata on all assets
- Default animated Sora watermarks
- Internal reverse video search for Intelligence & Investigation team assessment
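As one illustration of how frame-level fingerprinting for reverse video search could work, the sketch below indexes sampled frames with open-source perceptual hashes via the imagehash library. This is an assumption for illustration only, not OpenAI's internal system.

```python
import imagehash
from PIL import Image

def index_video_frames(frame_paths):
    """Build a simple fingerprint index from sampled video frames using
    perceptual hashes. Illustrative stand-in for an internal reverse video
    search system, not OpenAI's actual implementation."""
    return [imagehash.phash(Image.open(path)) for path in frame_paths]

def likely_known_frame(query_frame_path, index, max_distance=8):
    """Check whether a query frame is close (in Hamming distance) to any
    indexed fingerprint."""
    query_hash = imagehash.phash(Image.open(query_frame_path))
    return any(query_hash - known < max_distance for known in index)
```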
Product Policy: Policies prohibit fraud, scams, misleading content including disinformation, and unauthorized likeness use. Violations result in content removal and user penalization.
Artist Styles
Living artist names in prompts may generate videos resembling those artists' styles. While creative tradition includes building upon others' styles, OpenAI acknowledges creator concerns and adopts a conservative approach while learning from community usage. Prompt rewrites are triggered when users attempt to generate content in the style of a living artist.
The Sora Editor's LLM rewrites text for effective prompting while ensuring guideline compliance: it removes public figures, grounds people with specific attributes, and describes branded objects generically.
Future Work
OpenAI's iterative deployment ensures responsible product rollout through phased releases, continuous testing, monitoring, and user feedback for performance and safety refinement.
Likeness Pilot
Uploading photos or videos of real people presents misuse potential, so OpenAI will learn incrementally from usage patterns. Artists value this creative tool, but abuse potential limits its initial availability. In line with iterative deployment, the ability to upload media containing people will be limited initially, with active monitoring to assess community value and adjust safety approaches. Uploads depicting minors are prohibited during testing.
Provenance and Transparency Initiatives
Future Sora iterations will enhance traceability through research into reverse embedding search and continued transparency measures such as C2PA. OpenAI is exploring partnerships with NGOs and research organizations to grow the provenance ecosystem and to test its internal reverse image search tools.
Expanding Output Representation
OpenAI commits to reducing output biases through prompt refinement, feedback loops, and identification of additional mitigations, while recognizing that overcorrection can itself be harmful. Challenges including body image bias and demographic representation require continued refinement to achieve balanced, inclusive outputs.
Continued Safety, Policy, and Ethical Alignment
OpenAI maintains ongoing evaluations and improvements of Sora to ensure adherence to policy and safety standards. Additional improvements in likeness safety and deceptive content mitigation will follow evolving practices and user feedback.