Наталя Хандусенко AI Eng 3 April 2026, 08:39

Google's new open AI model, Gemma 4, gives developers more freedom

Previous versions used a proprietary license that was criticized for being too restrictive. With Gemma 4, Google is moving to the Apache 2.0 license, which is much more lenient and widely used by developers, including in other Google products like Android. The new model also boasts improved performance.

Google has unveiled a new series of open-weight Gemma models optimized for agentic AI and coding. They are released under the more permissive Apache 2.0 license, a move aimed at attracting enterprise users, The Register reports.

The fourth generation of Gemma models, developed by the Google DeepMind team, brings a number of improvements, including "advanced logical reasoning" for better handling of mathematics and instruction following, support for more than 140 languages, native function calling, and the ability to process video and audio.
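To illustrate what function calling means on the application side, here is a minimal sketch of a dispatcher that runs a tool requested by a model. The JSON shape, the `get_weather` tool, and the `dispatch` helper are all hypothetical; the article does not describe the format Gemma 4 actually emits.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Sunny in {city}"


# Registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by a model and run the matching tool."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Stand-in for text a model might emit. The JSON shape is an assumption,
# not Gemma 4's documented output format.
example_output = '{"name": "get_weather", "arguments": {"city": "Kyiv"}}'
result = dispatch(example_output)
print(result)
```

In a real application the tool result would be passed back to the model so it can compose its final answer.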

As with previous versions of Gemma, Google is releasing models in multiple sizes to cover a wide range of applications: from single-board computers and smartphones to laptops and corporate data centers.

The flagship of the line is a 31-billion-parameter large language model (LLM), which Google says has been tuned to deliver the highest-quality results while remaining compact enough that companies don't have to spend hundreds of thousands of dollars on GPU servers to run or train it.

The model can run unquantized (at 16-bit precision) on a single H100 accelerator with 80 GB of memory. At 4-bit precision, it becomes compact enough to fit on a 24 GB graphics card, such as the Nvidia RTX 4090 or AMD RX 7900 XTX, when using frameworks such as llama.cpp or Ollama.
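The arithmetic behind these figures can be sketched as follows. This is a rough estimate of weight storage only: it ignores the KV cache, activations, and the per-block scaling data that real 4-bit formats add, so actual usage will be somewhat higher. The helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_param / 8


fp16 = weight_memory_gb(31, 16)  # unquantized, 16-bit precision
int4 = weight_memory_gb(31, 4)   # idealized 4-bit quantization

print(f"16-bit: {fp16:.1f} GB, 4-bit: {int4:.1f} GB")
# prints "16-bit: 62.0 GB, 4-bit: 15.5 GB"
```

About 62 GB fits within an 80 GB H100, and about 15.5 GB leaves headroom on a 24 GB card, which is consistent with the claims above.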

For scenarios requiring lower latency (i.e. faster responses), the Gemma 4 line also includes a 26-billion-parameter model built on a Mixture of Experts (MoE) architecture.

During inference, only a fraction of the model's 128 "experts" are used to process and generate each token, for a total of 3.8 billion active parameters. As long as the model fits in your video memory, it can generate tokens much faster than a "dense" model of similar size.
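As a toy illustration of the routing idea, the sketch below picks the highest-scoring experts for one token and normalizes their weights. It is a generic top-k router, not Gemma 4's implementation: the article gives the expert count (128) but not how many are active per token, so `TOP_K = 8` is a hypothetical value, and the random scores stand in for a learned router.

```python
import math
import random

NUM_EXPERTS = 128  # from the article
TOP_K = 8          # hypothetical; the article does not state this number


def route(router_logits, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
# Stand-in for the scores a learned router would produce for one token.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route(logits)
print(len(selected), "of", NUM_EXPERTS, "experts selected")
```

Only the selected experts' weights need to be read and multiplied for that token, which is why the active parameter count is much smaller than the total.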

This speed comes at the cost of slightly lower response quality, since only a fraction of the total parameters is used to generate each token. However, this may be a justifiable trade-off when running on devices with slower memory, such as laptops or consumer graphics cards.
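A back-of-the-envelope estimate shows why slower memory favors the MoE model. The sketch assumes token generation is limited purely by how fast the active weights can be read from memory, which is a simplification that ignores compute, KV cache reads, and routing overhead. The bandwidth figure and the function name are hypothetical.

```python
def decode_tokens_per_second(active_params_billions: float,
                             bits_per_param: int,
                             bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    gb_read_per_token = active_params_billions * bits_per_param / 8
    return bandwidth_gb_s / gb_read_per_token


BANDWIDTH_GB_S = 1000  # hypothetical memory bandwidth, not a measured figure

dense = decode_tokens_per_second(31, 16, BANDWIDTH_GB_S)   # 31B dense model
moe = decode_tokens_per_second(3.8, 16, BANDWIDTH_GB_S)    # 3.8B active parameters

print(f"dense: {dense:.0f} tok/s, MoE: {moe:.0f} tok/s, ratio: {moe / dense:.1f}x")
```

Under this model the ratio depends only on the parameter counts (31 / 3.8, roughly 8x), so the relative advantage holds regardless of the bandwidth value chosen.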

Both the 31B and 26B models have a context window of 256,000 tokens. This makes them suitable for building local coding assistants, a use case that Google specifically emphasized in its announcement.

The smaller E2B and E4B versions, despite their size, have a context window of 128,000 tokens and are multimodal: in addition to text, they can accept visual and audio information as input.

Google claims significant performance gains over Gemma 3 across a range of AI benchmarks. As with any vendor-supplied benchmark, these claims should be taken with a grain of salt.

Perhaps the most important change in Gemma 4, however, is the move to the more liberal Apache 2.0 license. This gives enterprise customers much more flexibility in how and where they can use or deploy these models.

Previously, Google's license for the Gemma family prohibited the use of models in certain scenarios, and the company reserved the right to terminate a user's access if they did not play by its rules.

The move to Apache 2.0 means that businesses can now deploy these models without fear that Google will suddenly pull the rug out from under them.

Gemma 4 is already available in Google AI Studio and AI Edge Gallery, as well as in popular model repositories such as Hugging Face, Kaggle, and Ollama.

Google claims day-one support for more than a dozen inference frameworks, including vLLM, SGLang, llama.cpp, and MLX.


