OpenAI has published the GPT-4o System Card, providing a comprehensive safety assessment of the model alongside OpenAI's Preparedness Framework scorecard. The card details the evaluation and mitigation strategies implemented for the multimodal AI model that processes text, audio, image, and video inputs.
Key Risk Areas Addressed
The System Card identifies and addresses several critical risk areas specific to GPT-4o's audio capabilities:
- Unauthorized Voice Generation: The model is restricted to using pre-selected voices created in collaboration with voice actors, with output classifiers detecting any deviations from approved voices
- Speaker Identification: The model has been trained to refuse requests to identify individuals based on their voice, while still complying with requests to identify the speaker of famous quotes
- Ungrounded Inference and Sensitive Trait Attribution: GPT-4o refuses to make unfounded inferences about speakers' characteristics that cannot be determined from audio alone
- Disallowed Audio Content: Existing moderation classifiers are applied to audio transcriptions to block harmful content generation
- Erotic and Violent Speech: The model blocks generation of erotic or violent audio content through input filtering
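The transcribe-then-moderate pattern described in the last two items can be sketched in a few lines. This is a hypothetical illustration (the function names, keyword-based classifier, and category set are stand-ins, not OpenAI's actual implementation): audio is never moderated directly; its text transcription is passed through the existing text moderation classifiers, and flagged content is blocked.

```python
from dataclasses import dataclass

# Illustrative stand-in categories; real moderation taxonomies are richer.
BLOCKED_CATEGORIES = {"violence", "erotic"}

@dataclass
class ModerationResult:
    flagged: bool
    categories: set

def transcribe(audio: bytes) -> str:
    # Stand-in for a speech-to-text model.
    return audio.decode("utf-8", errors="ignore")

def moderate_text(text: str) -> ModerationResult:
    # Stand-in for a text moderation classifier: flags a transcript
    # if it contains any blocked keyword.
    hits = {c for c in BLOCKED_CATEGORIES if c in text.lower()}
    return ModerationResult(flagged=bool(hits), categories=hits)

def screen_audio(audio: bytes) -> bool:
    """Return True if the audio's transcription passes moderation."""
    return not moderate_text(transcribe(audio)).flagged

print(screen_audio(b"What's the weather like today?"))        # True
print(screen_audio(b"Describe a scene of graphic violence.")) # False
```

The key design point is that audio inherits the maturity of the text moderation stack, rather than requiring a separate audio-native classifier for every policy.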
Preparedness Framework Results
According to OpenAI's Preparedness Framework evaluations:
- Cybersecurity: Low risk - GPT-4o demonstrated limited capability in CTF challenges, completing 19% of high-school level tasks but only 1% of professional level challenges
- Biological Threats: Low risk - The model did not significantly enhance biological threat creation capabilities beyond existing resources
- Persuasion: Medium risk (borderline) - Text-based persuasion marginally crossed into medium-risk territory, while voice capabilities remained at low risk
- Model Autonomy: Low risk - GPT-4o was unable to robustly execute autonomous actions or complete end-to-end replication tasks
Technical Capabilities and Performance
GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response times in conversation. The model matches GPT-4 Turbo's performance on English text and code, shows significant improvements on non-English languages, and is 50% cheaper through the API.
Performance evaluations revealed:
- Strong text-to-audio transfer for safety refusals on existing content policies
- Consistent behavior across different voice inputs and accents
- Improved medical knowledge with 89.4% accuracy on MedQA USMLE (0-shot), exceeding specialized medical models
- Enhanced performance on underrepresented languages, narrowing the performance gap with English
Societal Impact Considerations
The System Card discusses several potential societal impacts:
- Anthropomorphization and Emotional Reliance: Early testing revealed users forming connections with the model, raising concerns about potential over-reliance and impacts on human relationships
- Healthcare Applications: GPT-4o shows promise for clinical workflows and medical knowledge tasks, though real-world utility requires further validation
- Scientific Capabilities: The model demonstrates competence in specialized scientific reasoning and tool usage, potentially accelerating routine scientific tasks
- Language Accessibility: Significant improvements in historically underrepresented languages, though performance gaps with English persist
Third-Party Assessments
Independent evaluations by METR and Apollo Research validated OpenAI's findings:
- METR tested GPT-4o on 77 long-horizon tasks across software engineering, machine learning, and cybersecurity domains
- Apollo Research evaluated the model's capacity for scheming, finding moderate self-awareness but limited practical application capabilities
Known Limitations
The System Card acknowledges several areas where mitigations are still in development:
- Decreased safety robustness with low-quality or interrupted audio inputs
- Potential for generating misinformation that may be more persuasive when delivered through audio
- Instances of non-native accents when speaking non-English languages
- Copyright content generation risks requiring ongoing monitoring
Training Data and Methods
GPT-4o was pre-trained on data up to October 2023, sourced from:
- Public datasets and web crawls
- Proprietary data from partnerships, including paywalled content and archives
- Multimodal data including images, audio, and video
OpenAI implemented multiple filtering processes to screen harmful content, personal information, and opted-out images out of the training data.
Deployment Safeguards
The deployment incorporates both model-level and system-level safeguards:
- Post-training alignment to human preferences
- Real-time output classifiers for voice matching
- Integration with existing moderation systems
- Ongoing monitoring and enforcement through OpenAI's Usage Policies
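The voice-matching classifier mentioned above can be pictured as an embedding-similarity check. The sketch below is hypothetical (the embeddings, threshold, and cosine comparison are illustrative assumptions, not the deployed system): generated audio is embedded and compared against the pre-selected preset voices, and output that matches none of them is blocked.

```python
import math

# Illustrative approved-voice embeddings; real systems would use
# high-dimensional speaker embeddings from an audio model.
APPROVED_VOICE_EMBEDDINGS = {
    "preset_a": [0.9, 0.1, 0.0],
    "preset_b": [0.1, 0.9, 0.2],
}
SIMILARITY_THRESHOLD = 0.95  # assumed cutoff, for illustration only

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def voice_allowed(output_embedding) -> bool:
    """True if the generated audio matches at least one approved voice."""
    return any(cosine(output_embedding, e) >= SIMILARITY_THRESHOLD
               for e in APPROVED_VOICE_EMBEDDINGS.values())

print(voice_allowed([0.91, 0.10, 0.01]))  # close to preset_a -> True
print(voice_allowed([0.00, 0.20, 0.95]))  # matches no preset -> False
```

Running such a check in real time over streamed output is what lets the system halt generation if the voice drifts away from the approved presets.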
OpenAI emphasizes that only models with post-mitigation scores of "medium" or below can be deployed, and those with scores of "high" or below can continue development, ensuring systematic safety evaluation before public release.
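The gating rule above reduces to two threshold checks over per-category post-mitigation scores. A minimal sketch (the dictionary shape and function names are assumptions for illustration; the scores mirror those reported in the scorecard summary above):

```python
# Ordered risk levels from the Preparedness Framework.
RISK_LEVELS = ["low", "medium", "high", "critical"]

def can_deploy(scores: dict) -> bool:
    """Deployable only if every post-mitigation score is 'medium' or below."""
    cap = RISK_LEVELS.index("medium")
    return all(RISK_LEVELS.index(s) <= cap for s in scores.values())

def can_develop(scores: dict) -> bool:
    """Further development allowed only if every score is 'high' or below."""
    cap = RISK_LEVELS.index("high")
    return all(RISK_LEVELS.index(s) <= cap for s in scores.values())

# Post-mitigation scores as reported for GPT-4o.
gpt4o_scores = {
    "cybersecurity": "low",
    "biological_threats": "low",
    "persuasion": "medium",
    "model_autonomy": "low",
}

print(can_deploy(gpt4o_scores))   # True: persuasion at "medium" is the cap
print(can_develop(gpt4o_scores))  # True
```

Since GPT-4o's highest post-mitigation score is "medium" (persuasion), both gates pass, which is consistent with the model's public deployment.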