OpenAI released GPT-5.4: what the new model can do
OpenAI has introduced GPT-5.4 in ChatGPT (as GPT-5.4 Thinking), the API, and Codex. The company also launched GPT-5.4 Pro, a version for users who need maximum performance on complex tasks.
According to OpenAI, GPT-5.4 is focused on "work" scenarios: preparing documents, spreadsheets, and presentations, as well as multi-step actions with tools. In ChatGPT, Thinking mode can show a short plan of action before the final answer, so the user can adjust its direction while the response is still in progress.
For developers, the key change lies elsewhere: in Codex and the API, GPT-5.4 has a built-in ability to operate a computer as an agent. Put simply, the model can look at screenshots of an interface, click buttons with the mouse, and enter text, performing chains of actions across various programs and websites. The release states that the API supports a context of up to 1 million tokens, meaning the model can keep very long instructions and materials in memory while it works.
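The screenshot, click, and type cycle described above can be pictured as a simple observe-act loop. The sketch below is only an illustration under stated assumptions: `FakeDesktop`, `scripted_policy`, and `run_agent` are hypothetical names invented here, the "screenshot" is a plain string, and the policy is a hard-coded stand-in for the model. It does not call or reproduce OpenAI's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One step the agent wants to take (hypothetical format)."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen with one text field."""
    focused: bool = False
    field_text: str = ""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would receive an image; a string keeps the sketch runnable.
        return f"focused={self.focused};field={self.field_text!r}"

    def click(self, x: int, y: int) -> None:
        self.focused = True
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        if self.focused:
            self.field_text += text
        self.log.append(f"type({text!r})")


def scripted_policy(screenshot: str) -> Action:
    """Placeholder for the model: picks the next action from the observation."""
    if "focused=False" in screenshot:
        return Action("click", x=100, y=200)
    if "field=''" in screenshot:
        return Action("type", text="hello")
    return Action("done")


def run_agent(desktop: FakeDesktop,
              policy: Callable[[str], Action],
              max_steps: int = 10) -> int:
    """Observe, decide, act; returns the number of actions performed."""
    for step in range(max_steps):
        action = policy(desktop.screenshot())
        if action.kind == "done":
            return step
        if action.kind == "click":
            desktop.click(action.x, action.y)
        elif action.kind == "type":
            desktop.type_text(action.text)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    return max_steps


if __name__ == "__main__":
    desktop = FakeDesktop()
    steps = run_agent(desktop, scripted_policy)
    print(steps, desktop.log)
```

In a real setup the policy would be a model call that receives an actual screenshot, and the step limit would serve as a guard against the agent looping indefinitely.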
OpenAI also provided test results. On GDPval, which evaluates professional tasks across 44 occupations, GPT-5.4 matches or outperforms humans in 83.0% of comparisons (versus 70.9% for GPT-5.2). On OSWorld-Verified, which evaluates performance in a desktop environment through screenshots and mouse/keyboard actions, the model has a 75.0% success rate compared to 47.3% for GPT-5.2. For coding on SWE-Bench Pro (Public), OpenAI claims a 57.7% success rate.
On quality, the company writes that GPT-5.4 has become more accurate: individual statements are 33% less likely to be false, and full answers are 18% less likely to contain errors compared to GPT-5.2. OpenAI also claims significant progress in office tasks: in an internal test on creating financial spreadsheets, GPT-5.4 scored 87.3% against 68.4% for GPT-5.2, and people more often prefer presentations produced by the new model.
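The figures above mix relative reductions ("33% less likely") with absolute scores (87.3% vs 68.4%), which are easy to confuse. The short sketch below shows the arithmetic for both. The 10% baseline rate of false statements is a purely hypothetical number chosen for illustration; OpenAI's announcement, as quoted here, does not give the underlying rates.

```python
def apply_relative_reduction(baseline_rate: float, relative_reduction: float) -> float:
    """Rate after cutting a baseline rate by a relative fraction (0.33 = 33%)."""
    return baseline_rate * (1 - relative_reduction)


def percentage_point_gain(new_score: float, old_score: float) -> float:
    """Absolute difference between two percentage scores, in percentage points."""
    return round(new_score - old_score, 1)


def relative_gain(new_score: float, old_score: float) -> float:
    """Improvement relative to the old score, as a percentage."""
    return round((new_score - old_score) / old_score * 100, 1)


if __name__ == "__main__":
    # Hypothetical baseline: assume 10% of statements were false before.
    print(apply_relative_reduction(0.10, 0.33))

    # Spreadsheet test scores reported in the article: 87.3% vs 68.4%.
    print(percentage_point_gain(87.3, 68.4))
    print(relative_gain(87.3, 68.4))
```

Under the hypothetical 10% baseline, a 33% relative reduction would leave roughly 6.7% of statements false, not 0%. The spreadsheet scores differ by 18.9 percentage points.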
GPT-5.4 combines the GPT-5.3-Codex approach to programming with a focus on tools and automation. In Codex, OpenAI separately demonstrated an experimental Playwright (Interactive) mode that lets the model test web applications during development.
Earlier, dev.ua reported that OpenAI had released an updated GPT-5.3 Instant model, which is much less moralizing when talking to users and gets to the point right away.



