Олег Онопрієнко, AI Eng
4 June 2026, 12:37
Google has released the Gemma 4 12B AI model, which can be run on a regular laptop with 16 GB of memory
The release's headline feature is an architecture that processes audio and video directly, along with optimizations for running autonomous agents on ordinary computers with 16 GB of RAM.
Google announced the new model on the official project blog. It fills the gap between the lightweight E4B version and the larger 26B Mixture of Experts (MoE) model. According to the announcement, the 12-billion-parameter version comes close to the larger model's performance on standard benchmarks while requiring half the memory.
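To put the memory figures in perspective, here is a rough back-of-the-envelope sketch. It estimates only the memory needed to hold the weights, as parameter count times bits per parameter. The parameter counts come from the article, while the precisions (16, 8, and 4 bits) are illustrative assumptions, not figures published by Google. The estimate ignores activations, the KV cache, and runtime overhead, so real requirements are higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter counts are from the article; the precisions are assumed for illustration.
MODELS = [("12B", 12e9), ("26B MoE", 26e9)]
PRECISIONS = (16, 8, 4)

for name, params in MODELS:
    for bits in PRECISIONS:
        print(f"{name} at {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

At equal precision the weight memory scales with the parameter count, and 12/26 is about 0.46, which is consistent with the "half the memory" claim. Under these assumptions, the 12B weights would fit in 16 GB only at 8-bit precision or lower.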
The main technical innovation in Gemma 4 12B is the removal of separate multimodal encoders. Traditional systems use such encoders to recognize and convert images and sound before passing them to the language model, which adds latency and compute cost. The new architecture instead integrates this data directly: the raw audio signal is projected straight into the text-token embedding space, and only a lightweight embedding module remains for image processing, shifting most of the work to the underlying language model.
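The idea of projecting raw audio into the same space as text tokens can be illustrated with a toy sketch. This is not Gemma's actual implementation: the dimensions, the framing scheme, the function names, and the random weights standing in for learned parameters are all hypothetical. It shows only the general pattern of a single linear projection replacing a separate audio encoder, so that audio frames and text tokens end up as vectors of the same size in one input sequence.

```python
import random

EMBED_DIM = 8    # hypothetical embedding width shared by text and audio
FRAME_SIZE = 4   # hypothetical number of raw samples per audio frame
VOCAB_SIZE = 10  # hypothetical toy vocabulary size


def make_matrix(rows, cols, rng):
    """Random matrix standing in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def frame_audio(samples, frame_size):
    """Split raw samples into fixed-size frames, dropping any trailing remainder."""
    last_start = len(samples) - frame_size
    return [samples[i:i + frame_size] for i in range(0, last_start + 1, frame_size)]


def project(frames, weights):
    """Linear projection: each frame of length in_dim becomes a vector of length out_dim."""
    columns = list(zip(*weights))  # weights has shape in_dim x out_dim
    return [[sum(x * w for x, w in zip(frame, col)) for col in columns]
            for frame in frames]


rng = random.Random(0)
token_table = make_matrix(VOCAB_SIZE, EMBED_DIM, rng)  # text token embeddings
audio_proj = make_matrix(FRAME_SIZE, EMBED_DIM, rng)   # audio-to-embedding projection


def build_input_sequence(token_ids, samples):
    """Return one sequence of same-sized vectors: text tokens, then audio frames."""
    text_vectors = [token_table[t] for t in token_ids]
    audio_vectors = project(frame_audio(samples, FRAME_SIZE), audio_proj)
    return text_vectors + audio_vectors


sequence = build_input_sequence(
    token_ids=[1, 5, 2],
    samples=[0.0, 0.1, -0.2, 0.3, 0.5, -0.1, 0.0, 0.2, 0.9],
)
print(len(sequence), len(sequence[0]))
```

In this sketch the language model would receive `sequence` as its input, with no separate audio encoder in between; in a real system the projection weights would be learned jointly with the model.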
“Thanks to the developer community, the number of downloads of the Gemma 4 family of models has already exceeded 150 million. They are used to create a wide variety of products from wearable robotic arms for physical assistance to artificial intelligence-based corporate security systems,” Google notes.
The release also integrates Multi-Token Prediction (MTP), which further reduces latency during text generation.