Олександр Кузьменко AI Eng 13 January 2025, 16:33

Claude 3.5 AI started writing better code when he was simply asked to do so

Senior Data Scientist Max Wolf from BuzzFeed tested an interesting hypothesis inspired by a viral trend from 2023. He found out what would happen if an AI chatbot was given a task and asked to improve the result several times in a row.

How AI improved its code thanks to prompting

For the experiment, he chose Anthropic’s Claude 3.5 Sonnet chatbot. According to Max Wolf, it has «incredible speed in executing all types of prompts,» especially coding prompts. In addition, coding benchmarks also favor Claude 3.5 Sonnet over GPT-4o.

For the Claude 3.5 experiment, Sonnet was given a typical interview question for beginning Python programmers. It was simple enough, yet unique enough that the AI wouldn’t copy a ready-made solution from the internet, and also one that had room for improvement.

As a benchmark for improvement, Wolf chose the execution speed of the code from the task. On his Macbook Pro M3 Pro, this code executed in an average of 657 milliseconds. In total, he made 5 iterations with the request to improve the code.

Conclusions from Wolf’s experiment

Max Wolfe’s industrial engineering results

«Overall, asking LLM to 'write better code' does indeed make the code better, depending on what you mean by 'better.' By using general iterative hints, the code objectively improved compared to the baseline examples, both in terms of additional features and speed,» concluded Max Wolf.

According to him, prompt engineering improved code performance much faster and more consistently, but was more likely to lead to subtle bugs because large language models are not optimized for generating high-performance code.

«As with any use of LLM, your results may vary, and ultimately, human intervention will be needed to fix the inevitable problems, no matter how often AI apologists call LLM magic,» Wolfe said. He added that anyone interested can view the code from the experiment, including benchmarking scripts and data visualization code, on GitHub.

«Of course, these LLMs will not replace software engineers anytime soon, because recognizing what is actually a good idea requires strong engineering training, as well as other constraints that are specific to a particular domain. Even with the amount of code available on the internet, LLMs cannot distinguish average code from good, high-performance code without outside help,» warned the Senior Data Scientist.

Ukrainian expert in PR, communications, and the application of AI technologies, Oleksiy Minakov, noted that this experiment indicates the importance of prompt engineering.

«The first answer is unlikely to be optimal because the models tend to give an average result (to be more specific, next token prediction models learn to maximize the probability of predicting the next token over huge batches of inputs, and as a result they optimize for average inputs and outputs)», — Minakov wrote on his Facebook.

He urged «not to be lazy to continue interacting with ChatGPT, Gemini, or Claude» after the first generated response.

According to Oleksandr Krakovetsky, CEO of IT companies DevRain and DonorUA, author of the book «ChatGPT, DALL·E, Midjourney: How Generative Artificial Intelligence is Changing the World», the key conclusion of Wolf’s experiment is that «although LLMs have significant potential in code generation and optimization, their work requires mandatory human control.»

«The model can suggest effective ideas or improvements, but the final verification and testing of the code must be done by the developer. This highlights the importance of understanding that LLM is a tool, not a replacement, for programmers», — he added.

Read the country's main IT news in our Telegram

An AI expert tested Grok. What are its features and how does the chatbot differ from ChatGPT Gemini and Claude?

Python became the most popular programming language according to the TIOBE index. Which languages were left behind and how the dynamics have changed over 30 years

A Google Software Engineer Got a Job Thanks to the Company's Free Programming Courses. Here Are 8 Essential Courses He Recommends Every Programmer Take

Leave a comment

Text: Олександр Кузьменко Source: Max Woolf Tags: ai, claude 3.5 sonnet, programmer

Found an error in the text? Highlight it and press Ctrl+Enter. Found an error in the text? Highlight it and press the 'Report an error' button.

Розміщення реклами

Advertising Placement

Roosh запускає нову освітню платформу AI HOUSE CLUB для ML/AI-спеціалістів та дата сайнтистів. Розповідаємо, як подати заявку та чому навчатимуть

Як нейромережі бачать вільну та незалежну Україну? Тест dev.ua

Нейронні мережі для генерації зображень бачать світ по-своєму, їхню логіку зрозуміти часом зовсім неможливо. Але таки хочеться. На честь Дня Незалежності України редакція dev.ua вирішила провести невеликий експеримент. Ми задали чотирьом різним нейронним мережам п’ять однакових запитів: «прапор України», «День Незалежності України», «український Крим», «перемога України» та «українці». Отриманими результатами ми ділимося з вами нижче.

У TikTok тепер можна генерувати фон за допомогою нейромережі. Ми протестували її та ділимося результатами

У TikTok з’явилася нова функція «Розумний фон». З її допомогою як фон для тіктоків можна підставляти згенеровані нейромережею зображення. Редакція dev.ua протестувала цю технологію і ділиться своїми враженнями.

1 comment

Які IT-спеціальності будуть потрібні в найближчі п'ять років? Ми з'ясували у голови американського стартапу ADAM Дениса Гурака

Have important news to share? Message our Telegram bot

Key events and useful links in our Telegram channel

No comments yet.

Sign in to leave a comment