Meta introduces new AI model for programming. CWM studies how code works, not just how it looks

A team of AI researchers at Meta has unveiled a new large language model (LLM) for writing code. It deepens understanding of code by studying not only its appearance but also its behavior during execution. The model, called the Code World Model (CWM), was trained on huge datasets of how code interacts with its environment, allowing it to form an internal “world model” of how computing systems work.


CWM not only learns the dynamics of its environment but also performs strongly on standard programming and mathematics benchmarks. This opens a promising direction for training AI agents that can handle more complex and dynamic software development tasks in large companies. CWM is part of a broader effort to move LLMs from predicting the next token toward building full-fledged “world models,” writes VentureBeat.

Typically, a model learns to code by predicting the next instruction in a program, much like it predicts the next word in a sentence. However, the researchers argue that to truly master coding, a model must understand “not just what the code looks like, but also what it does at runtime.” This skill is fundamental for software engineers, who have a general understanding of how changes to the code will affect local variables or the overall behavior of their program. Programmers think of code not as a sequence of tokens, but as a series of related components (variables, objects, functions, modules, etc.), which they then translate into a sequence of instructions. In other words, they develop a “model of the world” of their program as they create or modify it.

In most large language models, this “world modeling” ability is not addressed until after the main training is complete, and it is this approach that the Meta team is challenging.

How the CWM model works

CWM addresses this gap by training on large amounts of “code world modeling data.” Rather than waiting for the final fine-tuning stage, CWM learns how code behaves during the “mid-training” stage. The hypothesis is that grounding the model’s predictions in the dynamics of computing systems early on creates a much stronger foundation for later training and reinforcement learning.

The researchers focused on two key types of data.

The first is Python code execution traces, which are step-by-step recordings of how the program’s internal state (e.g., its variables) changes as each line of code is executed (as opposed to the classic scheme where models are trained on code and final results). By training on these observation-action trajectories, CWM gains a deeper understanding of how instructions affect the overall behavior of the program.

“Our premise is that teaching CWM semantics, not just program syntax, should help both in code writing and reasoning tasks such as verification, testing, and debugging,” the researchers write.
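
To make the idea of an execution trace concrete, here is a minimal sketch that records a function's local variables just before each line runs, using Python's standard `sys.settrace` hook. The helper `trace_locals` and the sample function `running_sum` are illustrative names, not part of CWM, and the record format is an assumption rather than the format Meta used for training.

```python
import sys


def trace_locals(func, *args):
    """Run func(*args) and record its local variables at every traced step."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace frames of other functions
        if event == "line":
            # A "line" event fires just before the line executes.
            steps.append({
                "event": "line",
                "line": frame.f_lineno - code.co_firstlineno,
                "locals": dict(frame.f_locals),
            })
        elif event == "return":
            steps.append({
                "event": "return",
                "value": arg,
                "locals": dict(frame.f_locals),
            })
        return tracer

    previous = sys.gettrace()  # keep any tracer that was already installed
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, steps


def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    result, steps = trace_locals(running_sum, 3)
    for step in steps:
        print(step)
```

Each record pairs a line of the program with the state it is about to act on, which is the kind of step-by-step signal that plain source code and final outputs do not provide.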

The second type of data consists of agent interactions in Docker environments. The team created a synthetic data generator called ForagerAgent that simulates a software engineering agent performing tasks such as fixing bugs or implementing new features. By observing these multi-step interactions at scale early in training, CWM learns the dynamics of these environments before it is fine-tuned for specific tasks in them.
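
Such multi-step interactions can be pictured as a list of action and observation pairs. The sketch below is only an illustration: it runs commands as local subprocesses instead of inside a Docker container, and the `Step` record and `record_trajectory` helper are hypothetical names, not ForagerAgent's actual data format.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Step:
    action: list       # the command the agent ran
    observation: str   # what the environment printed back
    exit_code: int     # process exit status


def record_trajectory(commands):
    """Run each command in order and log what the environment returned."""
    trajectory = []
    for command in commands:
        done = subprocess.run(command, capture_output=True, text=True)
        trajectory.append(Step(
            action=command,
            observation=(done.stdout + done.stderr).strip(),
            exit_code=done.returncode,
        ))
    return trajectory


if __name__ == "__main__":
    # Stand-in "agent actions": two tiny Python one-liners.
    commands = [
        [sys.executable, "-c", "print(2 + 2)"],
        [sys.executable, "-c", "raise SystemExit(1)"],
    ]
    for step in record_trajectory(commands):
        print(step)
```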

In practice, this allows CWM to reason about code in the same way that a human developer would. For example, when the model is given a programming challenge, CWM can generate an initial solution, then design its own input/output tests to verify its correctness, and finally compare its predicted result with the actual results of executing the code. This self-validation cycle is a direct result of its training based on a “world model.”
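
The self-validation cycle can be sketched in a few lines. In this illustration the candidate solution and the predicted outputs are hard-coded stand-ins for what a model would generate; `self_validate` and `is_even` are hypothetical names, and nothing here reflects CWM's internal implementation.

```python
def self_validate(source, func_name, predictions):
    """Execute candidate code and compare predicted outputs with actual ones.

    predictions is a list of (args, predicted_output) pairs.
    Returns the (args, predicted, actual) triples that disagree.
    """
    namespace = {}
    # exec is only acceptable for trusted code; a real agent would sandbox this.
    exec(source, namespace)
    func = namespace[func_name]
    mismatches = []
    for args, predicted in predictions:
        actual = func(*args)
        if actual != predicted:
            mismatches.append((args, predicted, actual))
    return mismatches


# Stand-ins for model output: candidate solutions plus self-designed tests.
candidate = "def is_even(n):\n    return n % 2 == 0\n"
buggy_candidate = "def is_even(n):\n    return n % 2 == 1\n"
predictions = [((2,), True), ((3,), False), ((0,), True)]

if __name__ == "__main__":
    print(self_validate(candidate, "is_even", predictions))
    print(self_validate(buggy_candidate, "is_even", predictions))
```

An empty result means the predicted and actual outputs agree on every test; any mismatch signals that either the solution or the prediction needs another pass.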

CWM in action

The model has 32 billion parameters and a context window of up to 131,000 tokens.

On SWE-bench Verified, a test that involves solving real-world problems from GitHub repositories, CWM achieved a success rate of 65.8%, outperforming other open-source models of similar size. It also scored highly on LiveCodeBench (a benchmark for competitive programming), Math-500 and AIME 2024 (mathematical reasoning), and CruxEval (Python code output prediction).

Based on the results obtained, the scientists are convinced that world models “can improve autonomous coding, allow step-by-step reproduction of Python code execution, and show the first benefits that reasoning gains from such an approach.”

However, they also highlight the model's limitations. CWM is released as a research-only model under a non-commercial license and should not be used as a general-purpose assistant or chatbot. While it was trained on some instruction-following data, the model has not yet undergone the comprehensive optimization required for conversational use.
