OpenAI has published the GPT-4o System Card, providing a comprehensive safety assessment of the model alongside OpenAI's Preparedness Framework scorecard. The card details the evaluation and mitigation strategies implemented for the multimodal AI model that processes text, audio, image, and video inputs.
Key Risk Areas Addressed
The System Card identifies and addresses several critical risk areas specific to GPT-4o's audio capabilities:
- Unauthorized Voice Generation: The model is restricted to using pre-selected voices created in collaboration with voice actors, with output classifiers detecting any deviations from approved voices
- Speaker Identification: The model has been trained to refuse requests to identify individuals based on their voice, while still complying with requests to identify the speaker of famous quotes
- Ungrounded Inference and Sensitive Trait Attribution: GPT-4o refuses to make unfounded inferences about speakers' characteristics that cannot be determined from audio alone
- Disallowed Audio Content: Existing moderation classifiers are applied to audio transcriptions to block harmful content generation
- Erotic and Violent Speech: The model blocks generation of erotic or violent audio content through input filtering
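The transcribe-then-moderate pattern described in the last two items can be sketched in a few lines. This is a hypothetical illustration (the function names, keyword-based classifier, and category set are stand-ins, not OpenAI's actual implementation): audio is never moderated directly; its text transcription is passed through the existing text moderation classifiers, and flagged content is blocked.

```python
from dataclasses import dataclass

# Illustrative stand-in categories; real moderation taxonomies are richer.
BLOCKED_CATEGORIES = {"violence", "erotic"}

@dataclass
class ModerationResult:
    flagged: bool
    categories: set

def transcribe(audio: bytes) -> str:
    # Stand-in for a speech-to-text model.
    return audio.decode("utf-8", errors="ignore")

def moderate_text(text: str) -> ModerationResult:
    # Stand-in for a text moderation classifier: flags a transcript
    # if it contains any blocked keyword.
    hits = {c for c in BLOCKED_CATEGORIES if c in text.lower()}
    return ModerationResult(flagged=bool(hits), categories=hits)

def screen_audio(audio: bytes) -> bool:
    """Return True if the audio's transcription passes moderation."""
    return not moderate_text(transcribe(audio)).flagged

print(screen_audio(b"What's the weather like today?"))        # True
print(screen_audio(b"Describe a scene of graphic violence.")) # False
```

The key design point is that audio inherits the maturity of the text moderation stack, rather than requiring a separate audio-native classifier for every policy.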
Preparedness Framework Results
According to OpenAI's Preparedness Framework evaluations:
- Cybersecurity: Low risk - GPT-4o demonstrated limited capability in CTF challenges, completing 19% of high-school level tasks but only 1% of professional level challenges
- Biological Threats: Low risk - The model did not significantly enhance biological threat creation capabilities beyond existing resources
- Persuasion: Medium risk (borderline) - Text-based persuasion marginally crossed into medium-risk territory, while voice capabilities remained at low risk
- Model Autonomy: Low risk - GPT-4o was unable to robustly execute autonomous actions or complete end-to-end replication tasks
Technical Capabilities and Performance
GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response times in conversation. The model matches GPT-4 Turbo's performance on English text and code, shows significant improvements on non-English languages, and is 50% cheaper through the API.
Performance evaluations revealed:
- Strong text-to-audio transfer for safety refusals on existing content policies
- Consistent behavior across different voice inputs and accents
- Improved medical knowledge with 89.4% accuracy on MedQA USMLE (0-shot), exceeding specialized medical models
- Enhanced performance on underrepresented languages, narrowing the performance gap with English
Societal Impact Considerations
The System Card discusses several potential societal impacts:
- Anthropomorphization and Emotional Reliance: Early testing revealed users forming connections with the model, raising concerns about potential over-reliance and impacts on human relationships
- Healthcare Applications: GPT-4o shows promise for clinical workflows and medical knowledge tasks, though real-world utility requires further validation
- Scientific Capabilities: The model demonstrates competence in specialized scientific reasoning and tool usage, potentially accelerating routine scientific tasks
- Language Accessibility: Significant improvements in historically underrepresented languages, though performance gaps with English persist
Third-Party Assessments
Independent evaluations by METR and Apollo Research validated OpenAI's findings:
- METR tested GPT-4o on 77 long-horizon tasks across software engineering, machine learning, and cybersecurity domains
- Apollo Research evaluated the model's capacity for scheming, finding moderate self-awareness but limited practical application capabilities
Known Limitations
The System Card acknowledges several areas where mitigations are still in development:
- Decreased safety robustness with low-quality or interrupted audio inputs
- Potential for generating misinformation that may be more persuasive when delivered through audio
- Instances of non-native accents when speaking non-English languages
- Copyright content generation risks requiring ongoing monitoring
Training Data and Methods
GPT-4o was pre-trained on data up to October 2023, sourced from:
- Public datasets and web crawls
- Proprietary data from partnerships, including paywalled content and archives
- Multimodal data including images, audio, and video
OpenAI implemented multiple filtering processes to screen harmful content, personal information, and opted-out images out of the training data.
Deployment Safeguards
The deployment incorporates both model-level and system-level safeguards:
- Post-training alignment to human preferences
- Real-time output classifiers for voice matching
- Integration with existing moderation systems
- Ongoing monitoring and enforcement through OpenAI's Usage Policies
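The voice-matching classifier mentioned above can be pictured as an embedding-similarity check. The sketch below is hypothetical (the embeddings, threshold, and cosine comparison are illustrative assumptions, not the deployed system): generated audio is embedded and compared against the pre-selected preset voices, and output that matches none of them is blocked.

```python
import math

# Illustrative approved-voice embeddings; real systems would use
# high-dimensional speaker embeddings from an audio model.
APPROVED_VOICE_EMBEDDINGS = {
    "preset_a": [0.9, 0.1, 0.0],
    "preset_b": [0.1, 0.9, 0.2],
}
SIMILARITY_THRESHOLD = 0.95  # assumed cutoff, for illustration only

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def voice_allowed(output_embedding) -> bool:
    """True if the generated audio matches at least one approved voice."""
    return any(cosine(output_embedding, e) >= SIMILARITY_THRESHOLD
               for e in APPROVED_VOICE_EMBEDDINGS.values())

print(voice_allowed([0.91, 0.10, 0.01]))  # close to preset_a -> True
print(voice_allowed([0.00, 0.20, 0.95]))  # matches no preset -> False
```

Running such a check in real time over streamed output is what lets the system halt generation if the voice drifts away from the approved presets.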
OpenAI emphasizes that only models with post-mitigation scores of "medium" or below can be deployed, and those with scores of "high" or below can continue development, ensuring systematic safety evaluation before public release.
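The gating rule above reduces to two threshold checks over per-category post-mitigation scores. A minimal sketch (the dictionary shape and function names are assumptions for illustration; the scores mirror those reported in the scorecard summary above):

```python
# Ordered risk levels from the Preparedness Framework.
RISK_LEVELS = ["low", "medium", "high", "critical"]

def can_deploy(scores: dict) -> bool:
    """Deployable only if every post-mitigation score is 'medium' or below."""
    cap = RISK_LEVELS.index("medium")
    return all(RISK_LEVELS.index(s) <= cap for s in scores.values())

def can_develop(scores: dict) -> bool:
    """Further development allowed only if every score is 'high' or below."""
    cap = RISK_LEVELS.index("high")
    return all(RISK_LEVELS.index(s) <= cap for s in scores.values())

# Post-mitigation scores as reported for GPT-4o.
gpt4o_scores = {
    "cybersecurity": "low",
    "biological_threats": "low",
    "persuasion": "medium",
    "model_autonomy": "low",
}

print(can_deploy(gpt4o_scores))   # True: persuasion at "medium" is the cap
print(can_develop(gpt4o_scores))  # True
```

Since GPT-4o's highest post-mitigation score is "medium" (persuasion), both gates pass, which is consistent with the model's public deployment.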