Валентин Шнайдер · AI Eng
4 March 2026, 12:59
Google releases Gemini 3.1 Flash-Lite: the fastest and cheapest model in the Gemini 3 line
Google has introduced Gemini 3.1 Flash-Lite, a model it positions as the fastest and most economical in the Gemini 3 series for large-scale workloads such as translation, moderation, and other tasks that involve a large number of requests.
According to Google's blog post, 3.1 Flash-Lite is rolling out in preview to developers via the Gemini API in Google AI Studio and to enterprise customers via Vertex AI. The company says the model is built to work "at scale", where low latency and cost control matter.
Google puts the price up front: $0.25 per 1 million input tokens and $1.50 per 1 million output tokens. The blog post presents this as an argument for workloads that must process large volumes of short requests without noticeable quality loss, such as high-frequency moderation, classification, or bulk translation.
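To make the pricing concrete, here is a minimal cost estimator that uses only the two list prices quoted above. It assumes every request has the same token counts and ignores discounts, caching, and any special billing rules (for example, how thinking tokens are counted); the workload figures in the example are hypothetical, not taken from Google.

```python
# List prices quoted in the article, in USD per 1 million tokens.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50


def estimate_cost_usd(requests: int, input_tokens_per_request: int,
                      output_tokens_per_request: int) -> float:
    """Estimate the bill for a batch of identical requests at list prices.

    Assumption: flat per-token pricing with no discounts or caching.
    """
    total_input = requests * input_tokens_per_request
    total_output = requests * output_tokens_per_request
    input_cost = total_input / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = total_output / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical moderation workload: 1 million short requests,
    # about 200 input tokens and 10 output tokens (a label) each.
    print(f"${estimate_cost_usd(1_000_000, 200, 10):.2f}")  # prints "$65.00"
```

In this sketch the input side dominates the bill (200M tokens for $50 versus 10M output tokens for $15), which is typical of classification-style tasks where the model reads a lot and answers briefly.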
The other focus is speed. According to the Artificial Analysis benchmark, 3.1 Flash-Lite delivers a 2.5x faster time to first answer token than Gemini 2.5 Flash and about 45% higher output speed. Google stresses that such low latency is critical for real-time services, where users expect an immediate response.
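The two speed figures affect total response time differently, depending on answer length. The sketch below uses a simple model (time to first token plus streaming time) and assumes that "2.5x faster" means the first-token wait is divided by 2.5 and that "45% faster" means output tokens per second is multiplied by 1.45. The baseline numbers are hypothetical placeholders, not measured values for Gemini 2.5 Flash.

```python
from dataclasses import dataclass

TTFT_SPEEDUP = 2.5      # assumed: first-token wait divided by 2.5
THROUGHPUT_GAIN = 1.45  # assumed: output tokens/second multiplied by 1.45


@dataclass
class LatencyProfile:
    ttft_s: float        # seconds until the first answer token
    tokens_per_s: float  # steady-state output speed

    def response_time(self, output_tokens: int) -> float:
        """Simple model: wait for the first token, then stream the rest."""
        return self.ttft_s + output_tokens / self.tokens_per_s


def apply_reported_speedups(baseline: LatencyProfile) -> LatencyProfile:
    """Scale a baseline profile by the relative gains cited in the article."""
    return LatencyProfile(
        ttft_s=baseline.ttft_s / TTFT_SPEEDUP,
        tokens_per_s=baseline.tokens_per_s * THROUGHPUT_GAIN,
    )


if __name__ == "__main__":
    # Hypothetical baseline, chosen only to illustrate the arithmetic.
    baseline = LatencyProfile(ttft_s=1.0, tokens_per_s=200.0)
    faster = apply_reported_speedups(baseline)
    for n in (10, 500):
        before, after = baseline.response_time(n), faster.response_time(n)
        print(f"{n} tokens: {before:.3f}s -> {after:.3f}s ({before / after:.2f}x)")
```

Under these assumptions, short answers (moderation labels, classifications) see a speedup close to 2.5x because the first-token wait dominates, while long answers approach the 1.45x throughput gain.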
The company also publishes quality benchmarks and comparisons with models in a similar class. In particular, Flash-Lite scored an Elo of 1432 on the Arena.ai Leaderboard, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. Google also stresses that the model can handle multimodal tasks and follow instructions, not just perform simple classification.
To help developers balance speed, cost, and accuracy, 3.1 Flash-Lite in AI Studio and Vertex AI comes with thinking levels, which let you choose how "deeply" the model processes a task. Google says this helps control costs in high-volume scenarios while improving quality where more reasoning is needed, such as generating interfaces, dashboards, or simulations.
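One way an application might use thinking levels is to route each request by task type: a low level for high-volume, simple work and a higher level where quality matters. This is a hypothetical sketch of that routing logic only. The level names, task categories, and function are illustrative assumptions, and it makes no API call; the actual parameter name and accepted values should be checked in the Gemini API documentation.

```python
from enum import Enum


class ThinkingLevel(str, Enum):
    # Hypothetical labels; the values the model accepts may differ.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical task categories, grouped as the article describes:
# high-volume simple tasks vs. tasks that need more reasoning.
HIGH_VOLUME_TASKS = {"moderation", "classification", "translation"}
GENERATIVE_TASKS = {"ui_generation", "dashboard", "simulation"}


def choose_thinking_level(task_type: str) -> ThinkingLevel:
    """Pick the cheapest level that plausibly fits the task."""
    if task_type in HIGH_VOLUME_TASKS:
        return ThinkingLevel.LOW   # keep per-request cost and latency down
    if task_type in GENERATIVE_TASKS:
        return ThinkingLevel.HIGH  # spend more reasoning where quality matters
    return ThinkingLevel.MEDIUM    # default for anything unclassified


if __name__ == "__main__":
    for task in ("moderation", "dashboard", "summarization"):
        print(task, choose_thinking_level(task).value)
```

The returned value would then be mapped onto the model's thinking-level setting when the request is built; keeping the routing in one function makes it easy to tune the cost/quality trade-off per workload.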
Google also says that early adopters on AI Studio and Vertex AI, including Latitude, Cartwheel, and Whering, are already testing 3.1 Flash-Lite in their products and point to its combination of speed with "more mature" instruction-following and reasoning capabilities.
dev.ua previously reported that Google had introduced an updated version of its popular image-generation model, Nano Banana 2. The new model, technically part of Gemini 3.1 Flash Image, can create much more realistic images than its predecessor.