Валентин Шнайдер, AI Eng
26 September 2025, 14:33
Google Introduces Gemini Robotics 1.5 and Robotics-ER 1.5: AI Teaches Robots to Think, Plan, and Search for Information
The tech giant presented Gemini Robotics-ER 1.5 and Gemini Robotics 1.5, which work as a "brain and hands": ER 1.5 builds a step-by-step plan, calls tools (including search), and hands the steps over, while the VLA model perceives video and images and translates instructions into motor commands.
Google detailed the updates in a blog post. The release consists of two complementary robotics models: Gemini Robotics 1.5, a vision-language-action (VLA) model, and Gemini Robotics-ER 1.5, an embodied reasoning model built as a vision-language model (VLM). The former translates visual data and instructions into motor commands for the robot, "thinking before acting" and showing its reasoning process.
The latter acts as a "high-level brain": it plans missions, reasons logically about the physical environment, natively calls digital tools (including Google Search), and breaks complex tasks down into step-by-step instructions.
What the new models can do. Robotics-ER 1.5 achieves state-of-the-art (SOTA) results on 15 academic benchmarks of spatial understanding and embodied reasoning. In practice, this means that the agent first looks up the rules (for example, local waste-sorting regulations), matches them against what the camera "sees", and only then issues a sequence of steps to perform. Robotics 1.5 handles each step, turning an instruction such as "put the paper cup in the recycling container" into a specific manipulator trajectory, and it can explain why it chose these actions ("conscious" planning, with long missions segmented into short subtasks).
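The division of labor described above can be illustrated with a minimal sketch. It does not use Google's API: `plan_sorting`, `execute`, `fake_rules`, and `run_mission` are hypothetical stand-ins invented for illustration, the rule lookup is a hard-coded dictionary instead of a real search call, and the "executor" returns a string where a real VLA model would emit motor commands.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One natural-language subtask handed from the planner to the executor."""
    instruction: str


def plan_sorting(
    objects: List[str],
    lookup_rules: Callable[[str], Dict[str, str]],
    city: str,
) -> List[Step]:
    """Hypothetical planner role (the part ER 1.5 plays in the article).

    It first fetches the local rules through a tool call, then matches
    them against the detected objects and emits one step per object.
    """
    rules = lookup_rules(city)  # tool call; a real planner might use web search
    steps = []
    for obj in objects:
        # Assumption: objects without a known rule go to general trash.
        bin_name = rules.get(obj, "trash")
        steps.append(Step(f"put the {obj} in the {bin_name} bin"))
    return steps


def execute(step: Step) -> str:
    """Hypothetical executor role (the part the VLA model plays).

    A real model would turn the instruction into a manipulator
    trajectory; this stub only reports that the step was handled.
    """
    return f"done: {step.instruction}"


def fake_rules(city: str) -> Dict[str, str]:
    """Stand-in for a search tool; the rules here are made up."""
    return {"paper cup": "recycling", "banana peel": "compost"}


def run_mission(objects: List[str], city: str) -> List[str]:
    """Plan first, then execute each step in order."""
    return [execute(step) for step in plan_sorting(objects, fake_rules, city)]


if __name__ == "__main__":
    for line in run_mission(["paper cup", "plastic toy"], "Example City"):
        print(line)
```

The point of the split is that the planner never deals with trajectories and the executor never deals with rules: only short text instructions cross the boundary between them.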
Cross-platform learning. An important breakthrough is the transfer of skills between different "bodies": movements learned on ALOHA 2 also work on the Apptronik Apollo humanoid and the bi-arm Franka robot without specially "fitting" the model to each robot. This speeds up the onboarding of new hardware platforms and shortens the time to useful applications.
The team emphasizes responsible deployment: from high-level semantic reasoning about safety before acting ("think-safe-before-act") and alignment with Gemini safety policies to low-level collision-avoidance subsystems on board the robot. The ASIMOV benchmark for semantic safety assessment has been updated with expanded coverage of "rare" cases, more question types, and video modalities; Robotics-ER 1.5 achieves SOTA results on it thanks to improved "thinking".
Gemini Robotics-ER 1.5 is available to developers today via the Gemini API in Google AI Studio. Gemini Robotics 1.5 is currently available with select partners; the company promises to expand the program.
The Robotics line is built on the base multimodal Gemini models, which Google has been bringing to "physical" scenarios since the beginning of the year. In earlier stages, the company demonstrated how agents understand instructions, video content, and spatial relationships; version 1.5 adds transparent step-by-step reasoning, mission planning, tool calling, and skill portability between different robots, capabilities that systems that simply "respond to a command" lacked. For industry, this opens the way to general-purpose robot assistants, from logistics and manufacturing to services, household use, and R&D laboratories.
Previously, dev.ua wrote about how Google officially brought its Gemini conversational AI to TVs running Google TV: TCL's QM9K series models were the first to get support, and it will be extended to more devices over the course of the year.