Валентин Шнайдер AI Eng 4 March 2026, 12:59

Google releases Gemini 3.1 Flash-Lite: the fastest and cheapest model in the Gemini 3 line

Google has introduced Gemini 3.1 Flash-Lite, a model it positions as the fastest and most economical in the Gemini 3 series for high-volume workloads such as translation, moderation, and other tasks that involve a large number of requests.

According to a Google blog post, 3.1 Flash-Lite is rolling out in preview to developers via the Gemini API in Google AI Studio and to enterprise customers via Vertex AI. The company says the model is aimed at workloads "at scale", where low latency and cost control matter.

Google leads with the price: $0.25 per 1 million input tokens and $1.50 per 1 million output tokens. The blog post presents this as an argument for scenarios that involve large volumes of short queries and cannot tolerate a noticeable drop in quality, such as high-frequency moderation, classification, or mass translation.
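To make the pricing concrete, here is a minimal sketch of a cost estimate at the quoted preview rates. The function name and the sample workload (1 million moderation requests averaging 200 input and 20 output tokens) are illustrative assumptions, not figures from Google.

```python
# Preview prices quoted in the article (USD per 1 million tokens).
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50


def estimate_cost(requests: int, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of `requests` calls with the given average token counts."""
    input_cost = requests * input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = requests * output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 1M requests, 200 input and 20 output tokens each.
    # Input: 200M tokens * $0.25 = $50; output: 20M tokens * $1.50 = $30.
    print(f"${estimate_cost(1_000_000, 200, 20):.2f}")  # prints $80.00
```

Because output tokens cost six times as much as input tokens, short-answer tasks such as classification benefit most from this price structure.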

Another focus is speed. According to the Artificial Analysis benchmark, 3.1 Flash-Lite delivers a 2.5x faster Time to First Answer Token than Gemini 2.5 Flash and about 45% faster output generation. Google emphasizes that such low latency is critical for "real-time" services, where the user expects an immediate response.
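As a rough illustration of what these two ratios mean for a single response, the sketch below applies them to a baseline. It assumes that "2.5x faster" means the time to first token is divided by 2.5 and that "45% faster" means tokens per second are multiplied by 1.45. The baseline numbers passed in are hypothetical placeholders, not measurements of Gemini 2.5 Flash.

```python
# Speed-up figures quoted in the article (relative to Gemini 2.5 Flash).
TTFT_SPEEDUP = 2.5     # assumed to mean: time to first token divided by 2.5
OUTPUT_SPEEDUP = 1.45  # assumed to mean: tokens per second multiplied by 1.45


def estimate_latency(baseline_ttft_s: float,
                     baseline_tokens_per_s: float,
                     output_tokens: int) -> float:
    """Estimate end-to-end response time in seconds after applying both speed-ups."""
    ttft = baseline_ttft_s / TTFT_SPEEDUP
    tokens_per_s = baseline_tokens_per_s * OUTPUT_SPEEDUP
    return ttft + output_tokens / tokens_per_s


if __name__ == "__main__":
    # Hypothetical baseline: 1.0 s to first token, 100 tokens/s, 145-token answer.
    print(f"{estimate_latency(1.0, 100.0, 145):.2f} s")
```

For short answers the time to first token dominates the total, which is why that metric matters most for interactive services.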

The company also provides quality benchmarks and comparisons with models of a similar class. In particular, Flash-Lite scored an Elo of 1432 on the Arena.ai Leaderboard, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. Separately, Google emphasizes that the model can handle multimodal tasks and instruction following, not just simple classification.

To help developers balance speed, cost, and accuracy, 3.1 Flash-Lite in AI Studio and Vertex AI offers thinking levels, which let you choose how "deeply" the model should process a task. Google says this helps manage costs in high-volume scenarios while improving quality where more reasoning is needed, such as when generating interfaces, dashboards, or simulations.

Google also mentions that early adopters on AI Studio and Vertex AI, as well as the companies Latitude, Cartwheel, and Whering, are already testing 3.1 Flash-Lite in their products and note its combination of speed with "more mature" instruction-following and reasoning capabilities.

Previously, dev.ua wrote about how Google introduced Nano Banana 2, an updated version of its popular image generation model. The new model, which is technically part of Gemini 3.1 Flash Image, is capable of creating much more realistic images than its predecessor.

Text: Валентин Шнайдер Photo: Google Blog Source: Google Blog Tags: google, gemini, gemini 3 flash, gemini 3.1 flash-lite, ai, artificial intelligence
