UNIT.City — місце, де люди працюють... КРАЩЕ! Обирай свій простір просто зараз 👉
Олександр КузьменкоAI Eng
13 January 2025, 16:33
2025-01-13
Claude 3.5 AI started writing better code when he was simply asked to do so
Senior Data Scientist Max Wolf from BuzzFeed tested an interesting hypothesis inspired by a viral trend from 2023. He found out what would happen if an AI chatbot was given a task and asked to improve the result several times in a row.
Senior Data Scientist Max Wolf from BuzzFeed tested an interesting hypothesis inspired by a viral trend from 2023. He found out what would happen if an AI chatbot was given a task and asked to improve the result several times in a row.
In November 2023, after OpenAI added the ability for ChatGPT to generate images from DALL-E 3 in the ChatGPT web interface, a short-lived meme emerged where users would give the AI a base image and keep asking the model to «make it more X», where X could be anything.
Max Wolf recalls that this trend quickly died out, with all images eventually reduced to something cosmic, regardless of the initial image and the clue. According to him, although this trend was provoked by an artificial intelligence error, from a scientific point of view it is interesting that such a vague clue had some influence on the final image.
«What would happen if we tried to apply a similar technique to code? Code generated using LLM is unlikely to be bad (although it is not impossible) because it follows strict rules, and unlike creative outputs such as images, the quality of code can be measured more objectively,» asks Wolf.
He wondered if code could really be improved simply by iterative prompts, such as asking a large language model to «make the code better,» what would happen if the code iterations were too frequent? Would the equivalent of «cosmic» code emerge?
How AI improved its code thanks to prompting
For the experiment, he chose Anthropic’s Claude 3.5 Sonnet chatbot. According to Max Wolf, it has «incredible speed in executing all types of prompts,» especially coding prompts. In addition, coding benchmarks also favor Claude 3.5 Sonnet over GPT-4o.
For the Claude 3.5 experiment, Sonnet was given a typical interview question for beginning Python programmers. It was simple enough, yet unique enough that the AI wouldn’t copy a ready-made solution from the internet, and also one that had room for improvement.
As a benchmark for improvement, Wolf chose the execution speed of the code from the task. On his Macbook Pro M3 Pro, this code executed in an average of 657 milliseconds. In total, he made 5 iterations with the request to improve the code.
Conclusions from Wolf’s experiment
Max Wolfe’s industrial engineering results
«Overall, asking LLM to 'write better code' does indeed make the code better, depending on what you mean by 'better.' By using general iterative hints, the code objectively improved compared to the baseline examples, both in terms of additional features and speed,» concluded Max Wolf.
According to him, prompt engineering improved code performance much faster and more consistently, but was more likely to lead to subtle bugs because large language models are not optimized for generating high-performance code.
«As with any use of LLM, your results may vary, and ultimately, human intervention will be needed to fix the inevitable problems, no matter how often AI apologists call LLM magic,» Wolfe said. He added that anyone interested can view the code from the experiment, including benchmarking scripts and data visualization code, on GitHub.
«Of course, these LLMs will not replace software engineers anytime soon, because recognizing what is actually a good idea requires strong engineering training, as well as other constraints that are specific to a particular domain. Even with the amount of code available on the internet, LLMs cannot distinguish average code from good, high-performance code without outside help,» warned the Senior Data Scientist.
Ukrainian expert in PR, communications, and the application of AI technologies, Oleksiy Minakov, noted that this experiment indicates the importance of prompt engineering.
«The first answer is unlikely to be optimal because the models tend to give an average result (to be more specific, next token prediction models learn to maximize the probability of predicting the next token over huge batches of inputs, and as a result they optimize for average inputs and outputs)», — Minakov wrote on his Facebook.
He urged «not to be lazy to continue interacting with ChatGPT, Gemini, or Claude» after the first generated response.
According to Oleksandr Krakovetsky, CEO of IT companies DevRain and DonorUA, author of the book «ChatGPT, DALL·E, Midjourney: How Generative Artificial Intelligence is Changing the World», the key conclusion of Wolf’s experiment is that «although LLMs have significant potential in code generation and optimization, their work requires mandatory human control.»
«The model can suggest effective ideas or improvements, but the final verification and testing of the code must be done by the developer. This highlights the importance of understanding that LLM is a tool, not a replacement, for programmers», — he added.
Python became the most popular programming language according to the TIOBE index. Which languages were left behind and how the dynamics have changed over 30 years
A Google Software Engineer Got a Job Thanks to the Company’s Free Programming Courses. Here Are 8 Essential Courses He Recommends Every Programmer Take
Як нейромережі бачать вільну та незалежну Україну? Тест dev.ua
Нейронні мережі для генерації зображень бачать світ по-своєму, їхню логіку зрозуміти часом зовсім неможливо. Але таки хочеться. На честь Дня Незалежності України редакція dev.ua вирішила провести невеликий експеримент.
Ми задали чотирьом різним нейронним мережам п’ять однакових запитів: «прапор України», «День Незалежності України», «український Крим», «перемога України» та «українці». Отриманими результатами ми ділимося з вами нижче.
У TikTok тепер можна генерувати фон за допомогою нейромережі. Ми протестували її та ділимося результатами
У TikTok з’явилася нова функція «Розумний фон». З її допомогою як фон для тіктоків можна підставляти згенеровані нейромережею зображення. Редакція dev.ua протестувала цю технологію і ділиться своїми враженнями.