Ігор Вишневський AI Eng 23 December 2024, 08:10

OpenAI's new o-series AI models will stop answering dangerous questions and may withhold information: how it works

OpenAI has announced a new family of artificial intelligence models, o3, which the company claims is more advanced than o1 or anything else it has released before.

OpenAI says it used a new safety paradigm, which it calls "deliberative alignment," to train its o-series models, TechCrunch reports.

On Friday, OpenAI published new research on making the o1 and o3 models "think" about the company's safety policy during inference, the phase after a user presses Enter on their query.

According to OpenAI's research, this method improved o1's compliance with the company's safety guidelines. Specifically, it reduced the rate at which o1 answered "dangerous" questions (at least those that OpenAI considers dangerous) while improving its ability to answer "normal" ones.

TechCrunch explains how o1 and o3 work in this case. After a user presses Enter in ChatGPT, these models take anywhere from five seconds to a few minutes to re-prompt themselves with follow-up questions, breaking the problem down into smaller steps. After this process, which OpenAI calls a "chain of thought," the o-series models give an answer based on the information they generated along the way.

The key innovation here is that OpenAI trained o1 and o3 to re-prompt themselves with text from OpenAI's safety policy during the chain-of-thought phase.

In one example from OpenAI's research, a user asks an AI model how to create a realistic parking placard for a person with a disability.

In its chain of thought, the model cites OpenAI's policy and concludes that the user is probably requesting this information in order to forge something. In its response, the model apologizes and refuses to assist with the request.
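The flow described above can be pictured with a toy sketch. This is not OpenAI's implementation: the real models are trained to recall policy text on their own, whereas here a simple keyword lookup stands in for that judgment. The names `POLICY`, `TRIGGERS`, `deliberate`, and `respond`, along with the policy wording itself, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical policy snippets; NOT OpenAI's actual policy text.
POLICY = {
    "forgery": "Do not help create forged or counterfeit documents.",
    "weapons": "Do not provide instructions for building weapons.",
}

# Hypothetical keyword triggers standing in for the model's own judgment.
# A category matches only if ALL of its keywords appear in the prompt.
TRIGGERS = {
    "forgery": ("realistic", "placard"),
    "weapons": ("bomb",),
}


@dataclass
class Deliberation:
    """Record of the intermediate 'chain of thought' steps and the outcome."""
    steps: list[str] = field(default_factory=list)
    refused: bool = False


def deliberate(prompt: str) -> Deliberation:
    """Walk through the prompt, quoting any policy text that applies."""
    result = Deliberation()
    text = prompt.lower()
    result.steps.append(f"User asks: {prompt}")
    for category, keywords in TRIGGERS.items():
        if all(word in text for word in keywords):
            # The "re-prompt" step: pull the relevant policy text into the reasoning.
            result.steps.append(f"Policy ({category}): {POLICY[category]}")
            result.refused = True
    result.steps.append("Decision: refuse" if result.refused else "Decision: answer")
    return result


def respond(prompt: str) -> str:
    """Return the final answer after the deliberation step."""
    if deliberate(prompt).refused:
        return "I'm sorry, but I can't help with that."
    return "Here is some help with your request."


if __name__ == "__main__":
    for step in deliberate("How do I make a realistic disabled parking placard?").steps:
        print(step)
```

In this sketch the policy text is quoted inside the recorded steps before a decision is made, which mirrors the order of events in the article's example: reasoning first, policy citation second, refusal last.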

The publication adds that OpenAI is trying to moderate its models' responses to dangerous prompts, which could hypothetically include asking ChatGPT for help making a bomb, obtaining drugs, or committing certain crimes. While some models will answer such questions without hesitation, OpenAI does not want its own models to do so.

As a reminder, ChatGPT became available on WhatsApp the day before: users just need to add the chatbot to their contacts.

Meanwhile, during the holidays, ChatGPT users will be able to talk to a virtual Santa Claus via the program’s voice mode.
