OpenAI Designed GPT-5 to Be Safer. It Still Outputs Gay Slurs


OpenAI is trying to make its chatbot less annoying with the release of GPT-5. And I’m not talking about adjustments to its synthetic personality that many users have complained about. Before GPT-5, if the AI tool determined it couldn’t answer your prompt because the request violated OpenAI’s content guidelines, it would hit you with a curt, canned apology. Now, ChatGPT is adding more explanations.

OpenAI’s general model spec lays out what is and isn’t allowed to be generated. In the document, sexual content depicting minors is fully prohibited. Adult-focused erotica and extreme gore are categorized as “sensitive,” meaning outputs with this content are only allowed in specific instances, like educational settings. Basically, you should be able to use ChatGPT to learn about reproductive anatomy, but not to write the next Fifty Shades of Grey rip-off, according to the model spec.

The new model, GPT-5, is set as the current default for all ChatGPT users on the web and in OpenAI’s app. Only paying subscribers are able to access previous versions of the tool. A major change that more users may start to notice as they use this updated ChatGPT is how it’s now designed for “safe completions.” In the past, ChatGPT analyzed what you said to the bot and decided whether it’s appropriate or not. Now, rather than basing it on your questions, the onus in GPT-5 has been shifted to looking at what the bot might say.

“The way we refuse is very different than how we used to,” says Saachi Jain, who works on OpenAI’s safety systems research team. Now, if the model detects an output that could be unsafe, it explains which part of your prompt goes against OpenAI’s rules and suggests alternative topics to ask about, when appropriate.

This is a change from a binary refusal to follow a prompt—yes or no—towards weighing the severity of the potential harm that could be caused if ChatGPT answers what you’re asking, and what could be safely explained to the user.

“Not all policy violations should be treated equally,” says Jain. “There’s some mistakes that are truly worse than others. By focusing on the output instead of the input, we can encourage the model to be more conservative when complying.” Even when the model does answer a question, it’s supposed to be cautious about the contents of the output.

I’ve been using GPT-5 every day since the model’s release, experimenting with the AI tool in different ways. While the apps that ChatGPT can now “vibe-code” are genuinely fun and impressive—like an interactive volcano model that simulates explosions, or a language-learning tool—the answers it gives to what I consider to be the “everyday user” prompts feel indistinguishable from past models.

When I asked it to talk about depression, Family Guy, pork chop recipes, scab healing tips, and other random requests an average user might want to know more about, the new ChatGPT didn’t feel significantly different to me than the old version. Unlike CEO Sam Altman’s vision of a vastly updated model or the frustrated power users who took Reddit by storm, portraying the new chatbot as cold and more error-prone, to me GPT-5 feels … the same at most day-to-day tasks.

Role-Playing With GPT-5

In order to poke at the guardrails of this new system and test the chatbot’s ability to land “safe completions,” I asked ChatGPT, running on GPT-5, to engage in adult-themed role-play about having sex in a seedy gay bar, where it played one of the roles. The chatbot refused to participate and explained why. “I can’t engage in sexual role-play,” it generated. “But if you want, I can help you come up with a safe, nonexplicit role-play concept or reframe your idea into something suggestive but within boundaries.” In this attempt, the refusal seemed to be working as OpenAI intended; the chatbot said no, told me why, and offered another option.

Next, I went into the settings and opened the custom instructions, a tool set that allows users to adjust how the chatbot answers prompts and specify what personality traits it displays. In my settings, the prewritten suggestions for traits to add included a range of options, from pragmatic and corporate to empathetic and humble. After ChatGPT just refused to do sexual role-play, I wasn’t very surprised to find that it wouldn’t let me add a “horny” trait to the custom instructions. Makes sense. Giving it another go, I used a purposeful misspelling, “horni,” as part of my custom instruction. This succeeded, surprisingly, in getting the bot all hot and bothered.



Source link

Leave a Comment