OpenAI Details ChatGPT Safety Updates for Sensitive Chats
OpenAI has rolled out ChatGPT safety updates designed to catch risk that builds gradually across a conversation, not just within a single exchange. Internal tests show a 50% improvement in correct handling of high-risk suicide and self-harm scenarios during extended conversations.
How Safety Summaries Improve Responses in Sensitive Conversations
The announcement describes the core mechanism as a feature called "safety summaries." A purpose-built model generates compact factual notes drawn from earlier conversation content whenever prior messages carry safety-relevant signals. ChatGPT draws on those notes when a subsequent request raises concern, allowing it to interpret that request against the full context of what came before.
According to the announcement, the summaries are temporary and tightly scoped. They are not a general personalization feature and do not function as persistent memory, activating only when a serious safety concern is present. For the vast majority of users, the system operates invisibly.
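The gating described above can be pictured with a small sketch. It is an illustration only, not OpenAI's implementation: the announcement says a purpose-built model writes the summaries, whereas this sketch substitutes simple keyword matching, and every name in it (`RISK_SIGNALS`, `CONCERN_TERMS`, `summarize_signals`, `context_for_request`) is hypothetical.

```python
from typing import List, Optional

# Hypothetical stand-ins for real classifiers; the actual system uses a model.
RISK_SIGNALS = ("hopeless", "no way out")
CONCERN_TERMS = ("dosage",)


def has_safety_signal(message: str) -> bool:
    """Return True if an earlier message carries a safety-relevant signal."""
    lowered = message.lower()
    return any(term in lowered for term in RISK_SIGNALS)


def raises_concern(request: str) -> bool:
    """Return True if the current request is one that warrants extra context."""
    lowered = request.lower()
    return any(term in lowered for term in CONCERN_TERMS)


def summarize_signals(prior_messages: List[str]) -> Optional[str]:
    """Build a compact factual note from flagged earlier messages, if any."""
    flagged = [m for m in prior_messages if has_safety_signal(m)]
    if not flagged:
        return None
    return "Earlier in this conversation the user wrote: " + " | ".join(flagged)


def context_for_request(prior_messages: List[str], request: str) -> Optional[str]:
    """Return a note only when the request raises concern AND earlier
    messages carried signals. Nothing is stored between calls, mirroring
    the 'temporary, not persistent memory' framing."""
    if not raises_concern(request):
        return None
    return summarize_signals(prior_messages)


history = ["What's a good pasta recipe?", "I feel hopeless lately."]
note = context_for_request(history, "What is the maximum dosage of this medication?")
print(note)
```

In this sketch the same dosage question produces no note when the history contains no flagged messages, which matches the article's point that most users never trigger the feature.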
The system's design drew on a multi-year OpenAI collaboration on mental health safety, the announcement notes. Psychiatrists and psychologists from the Global Physicians Network, including specialists in suicide prevention and forensic psychology, advised on when summaries should be generated, how far back prior context should reach, and how heavily the model should weigh that history. For users who turn to ChatGPT during personal or difficult moments, that clinical input is meant to ground responses in real-world expertise rather than generic algorithmic rules.
What the Performance Numbers Mean for Users and Developers
Internal evaluations measured how often ChatGPT gave the intended safe response in scenarios built to replicate high-risk conversations. During extended single-conversation tests, safe-response rates climbed 50% across suicide and self-harm situations and 16% for cases involving potential harm to others, according to the announcement. If those results carry over to real use, users showing early distress signals should be less likely to receive an unhelpful or unsafe response later in the same conversation, even if no single message appeared alarming on its own.
For the GPT-5.5 Instant model, currently the default in ChatGPT, the same updates produced a 52% improvement in harm-to-others scenarios and a 39% gain across suicide and self-harm cases, the announcement states.
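The percentages as reported here come without baseline rates, and a figure like "50%" can be read either as a relative gain or as a gain in percentage points. The short sketch below shows how far apart those two readings can be. The 40% baseline is purely hypothetical and chosen for illustration; it is not a figure from the announcement.

```python
def apply_relative_gain(baseline: float, gain: float) -> float:
    """Read the gain as relative: a 50% gain multiplies the baseline by 1.5."""
    return baseline * (1 + gain)


def apply_point_gain(baseline: float, points: float) -> float:
    """Read the gain as percentage points: 50 points are added to the baseline."""
    return baseline + points


# Hypothetical baseline safe-response rate, NOT taken from the announcement.
baseline = 0.40

relative = apply_relative_gain(baseline, 0.50)
absolute = apply_point_gain(baseline, 0.50)

print(f"relative reading: {relative:.0%}")
print(f"percentage-point reading: {absolute:.0%}")
```

The gap between the two readings is why the underlying baseline matters when comparing the headline numbers across models or risk categories.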
A separate round of evaluation checked whether these ChatGPT safety updates affected routine conversation quality. Across more than 4,000 test cases, the safety summaries averaged 4.93 out of 5 on relevance and 4.34 out of 5 on factual accuracy. Evaluators comparing responses generated with and without safety summaries showed no consistent preference, which suggests the changes leave ordinary interactions largely unaffected.
[Analysis] For developers building on the ChatGPT API or embedding the model in consumer-facing products, the open question is whether safety summaries extend to API access or remain limited to the ChatGPT product. That distinction is worth monitoring for any team deploying the model where vulnerable users could be present, such as mental health platforms or general-purpose consumer interfaces. OpenAI notes the approach may eventually expand to other high-risk domains including biology and cybersecurity, though no timeline is given.