Natalia Khandusenko, AI Eng
15 April 2025, 10:39
OpenAI launched a new family of GPT-4.1 models focused on coding: what the tests showed
OpenAI has introduced new models that focus specifically on programming — GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. The latter is the cheapest model ever, the company claims.
The multimodal models, available through the OpenAI API but not through ChatGPT, have a context window of 1 million tokens, meaning they can accept approximately 750,000 words at a time, TechCrunch reports.
The goal of many tech giants, including OpenAI, is to train AI models to perform complex software development tasks. OpenAI’s grand ambition is to create an “agent-based software engineer.” GPT-4.1 is a step in that direction.
“We’ve optimized GPT-4.1 for real-world use based on direct feedback to improve the areas that developers care about most: front-end coding, fewer third-party edits, robust format compliance, response structure and order, consistent tooling, and more,” an OpenAI spokesperson told TechCrunch via email. “These improvements allow developers to build agents that are significantly better at handling real-world software development tasks.”
GPT-4.1 costs $2 per million input tokens and $8 per million output tokens.
GPT-4.1 mini costs $0.40 per million input tokens and $1.60 per million output tokens.
GPT-4.1 nano costs $0.10 per million input tokens and $0.40 per million output tokens.
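To put these rates in perspective, here is a minimal sketch that estimates the cost of a single request from the per-million-token prices quoted above. The model names and the helper function are illustrative, not part of any official SDK; only the dollar rates come from the article.

```python
# Per-million-token prices (USD) as quoted for the GPT-4.1 family:
# (input rate, output rate). Model keys here are illustrative labels.
PRICES = {
    "gpt-4.1":      (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "gpt-4.1-nano": (0.10, 0.40),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request for the given model."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a 100,000-token prompt producing a 5,000-token completion.
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, 100_000, 5_000):.4f}")
# gpt-4.1: $0.2400, gpt-4.1-mini: $0.0480, gpt-4.1-nano: $0.0120
```

At that request size, nano comes out roughly 20 times cheaper than the full model, which is the trade-off OpenAI is highlighting: speed and cost against some accuracy.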
What did the tests show?
OpenAI claims that the full GPT-4.1 model outperforms its GPT-4o and GPT-4o mini models on coding tests, including SWE-bench. It says that GPT-4.1 mini and nano are more efficient and faster at the expense of some accuracy, and that GPT-4.1 nano is the fastest — and cheapest — model ever.
According to OpenAI's internal testing, GPT-4.1, which can generate more tokens at a time than GPT-4o (32,768 vs. 16,384), scored between 52% and 54.6% on SWE-bench Verified, a human-verified subset of SWE-bench.
These numbers are slightly lower than the results reported by Google and Anthropic for Gemini 2.5 Pro (63.8%) and Claude 3.7 Sonnet (62.3%), respectively, on the same test.
In a separate evaluation, OpenAI tested GPT-4.1 using Video-MME, which is designed to measure a model’s ability to “understand” content in a video. GPT-4.1 achieved a record-breaking 72% accuracy in the “long video without subtitles” category, OpenAI says.
OpenAI also acknowledges that GPT-4.1 becomes less reliable (i.e., makes more mistakes) the more input tokens it has to deal with. In one of the company’s own tests, OpenAI-MRCR, the model’s accuracy dropped from about 84% with 8,000 tokens to 50% with 1 million tokens. GPT-4.1 also tended to be more “literal” than GPT-4o, the company says, sometimes requiring more specific, explicit cues.
Recall that Microsoft research showed that AI models still struggle to debug code.