For the past several years, the most consequential AI advances have been invisible. They lived in data centers, communicated through chat boxes, and operated entirely in the digital realm. The physical world, with all its chaos and unpredictability, remained stubbornly resistant to the kind of intelligence that had made such dramatic progress on text, images, and code.

That boundary is dissolving now, faster than most people anticipated. And Google DeepMind is one of the organizations pushing hardest at it.

In September 2025, DeepMind introduced Gemini Robotics 1.5 and its companion model Gemini Robotics-ER 1.5, a two-part system designed to give robots something they have largely lacked: the ability to reason before they act. Not just to receive an instruction and execute it reflexively, but to understand what a task actually requires, break it into steps, consult external information if necessary, and then carry each piece out with awareness of the larger goal.

The technical framing DeepMind uses is "Chain-of-Action" planning — the idea that a robot should think through a sequence of decisions the same way that language models have learned to think through sequences of reasoning tokens before producing an answer. What sounds like a subtle architectural choice is, in practice, a significant shift in what robots can be asked to do. And the implications extend well beyond any single product announcement.


What Google Actually Built

To understand why this matters, it helps to understand how robotics AI has historically worked and why it has always fallen short of genuine general usefulness.

Traditional robot control systems were tightly scripted. A robot in a factory would execute a precise sequence of movements, calibrated to millimeter tolerances, in an environment engineered to remove all the variables that would trip it up. It was not intelligence. It was elaborate automation. These systems could not generalize. If you moved a part six inches to the left, the robot would fail. If a new object appeared in its path, it would stop or malfunction.

The first generation of AI-infused robotics improved on this by using machine learning models to handle perception, allowing robots to identify objects and make basic decisions about how to handle them. But these models were still narrow. They could recognize a cup. They could not understand that "clean the table" means something different when there are both dirty dishes and important documents sitting on it.

Gemini Robotics 1.5 is an attempt to close that gap with a two-model architecture that separates high-level reasoning from low-level physical execution.

The Brain: Gemini Robotics-ER 1.5

The ER model, short for Embodied Reasoning, functions as the high-level orchestrator. It does not directly control any robotic hardware. What it does is understand the environment, the goal, and the constraints, then produce a structured plan that breaks a complex request into a sequence of executable steps.

DeepMind describes it as a thinking model optimized for embodied reasoning. It achieves state-of-the-art performance on 15 academic embodied reasoning benchmarks, covering spatial understanding, pointing accuracy, image and video question answering, and physical reasoning tasks. It can generate semantically precise 2D points grounded in reasoning about object size, weight, and physical properties. It can call external tools, including Google Search, to fill gaps in its knowledge before acting. And critically, it can monitor its own progress and adjust the plan if something goes wrong partway through.

The concrete example DeepMind uses to illustrate the system is illuminating. Ask a robot to sort objects into the correct compost, recycling, and trash bins based on local regulations. A traditional robot cannot do this because it requires three distinct capabilities that have never coexisted in a single system: searching the internet for local recycling rules, visually identifying the objects in front of it, and then physically manipulating those objects through each step of the sorting process. Gemini Robotics-ER 1.5 handles the first two, while handing off the physical manipulation to its companion model.
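
It is easier to see what a "structured plan" means with a toy rendering. The Python sketch below imagines what such a plan might look like for the bin-sorting request; the field names and step format are invented for illustration and are not DeepMind's actual output schema.

```python
# Hypothetical illustration of a structured plan an embodied-reasoning model
# might emit for "sort these objects into compost, recycling, and trash."
# Field names and structure are assumptions, not DeepMind's actual schema.

plan = {
    "goal": "Sort the objects on the table into compost, recycling, and trash",
    "context": {
        "tool_calls": [
            {"tool": "search", "query": "local residential recycling rules"}
        ],
        "observed_objects": ["banana peel", "soda can", "plastic film wrapper"],
    },
    "steps": [
        {"id": 1, "instruction": "Pick up the banana peel and place it in the compost bin"},
        {"id": 2, "instruction": "Pick up the soda can and place it in the recycling bin"},
        {"id": 3, "instruction": "Pick up the plastic film wrapper and place it in the trash bin"},
    ],
}

# Each step is handed to the action model as a plain natural-language instruction.
for step in plan["steps"]:
    print(f"Step {step['id']}: {step['instruction']}")
```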

The Body: Gemini Robotics 1.5

The VLA model, which stands for vision-language-action, receives natural language instructions from the ER model and translates them directly into motor commands. It is what actually moves the robot's limbs.

What distinguishes this from earlier VLA models is a "think before you act" capability. Rather than mapping an instruction directly to a movement, Gemini Robotics 1.5 generates an internal reasoning chain in natural language, works through what the physical action should look like, and only then produces the motion commands. The robot can explain what it is doing and why, in language a human can understand. DeepMind frames this transparency as both a capability and a safety feature.
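
The distinction is easier to see as an interface. The sketch below contrasts a reactive policy with a think-before-act policy; the class and method names are invented, and the bodies are stubs standing in for learned models, not DeepMind's implementation.

```python
from dataclasses import dataclass

# Schematic contrast between a reactive policy and a think-before-act policy.
# Names are invented for illustration; the stubs stand in for learned models.

@dataclass
class ActionResult:
    reasoning: str        # natural-language trace of what was decided and why
    motor_commands: list  # low-level commands for the robot's actuators

class ReactiveVLA:
    """Earlier-style policy: instruction in, motion out, no intermediate trace."""
    def act(self, observation, instruction):
        return [("move_arm", "toward_target")]  # stub for a learned mapping

class ThinkingVLA:
    """Think-before-act policy: produce a reasoning trace, then condition motion on it."""
    def act(self, observation, instruction):
        reasoning = self._reason(observation, instruction)
        commands = self._act_on(reasoning)
        return ActionResult(reasoning=reasoning, motor_commands=commands)

    def _reason(self, observation, instruction):
        # Stub: a real model would generate this trace from vision and language.
        return "The mug handle faces away from the gripper, so rotate the wrist first."

    def _act_on(self, reasoning):
        # Stub: motion conditioned on the reasoning trace.
        return [("rotate_wrist", 90), ("close_gripper",)]

result = ThinkingVLA().act(observation="camera frame", instruction="pick up the mug")
print(result.reasoning)
print(result.motor_commands)
```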

The model also demonstrates cross-embodiment learning, one of the most practically significant capabilities in the release. A skill learned on one robot platform transfers to a different robot without retraining. Tasks demonstrated only on the ALOHA 2 dual-arm platform during training transferred automatically to Apptronik's Apollo humanoid and to a dual-arm Franka platform. A single model works across all of these robots. For a field where the historical default has been to retrain from scratch every time the hardware changes, this is a substantial shift in how quickly new robots can be made useful.


How the Two Models Work Together

The interaction between the ER model and the VLA model is what makes the system genuinely agentic in a way that prior robotics AI has not been.

The ER model operates like a project manager who sees the full task, understands the context, pulls in external information when needed, and breaks the work into assignable steps. The VLA model operates like a skilled worker who receives those instructions and executes each step with physical precision, reporting back when a step is complete.

When the system encounters something unexpected, the ER model can observe the new situation, revise its plan, and issue updated instructions. This recovery capability is significant. It is the difference between a robot that fails and freezes when a cup is knocked over and one that notices the problem, adapts its approach, and continues the task.
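
A minimal sketch of that plan, execute, observe, replan cycle might look like the following. The stub functions stand in for the ER model (planning) and the VLA model (execution); the names and flow are illustrative, not DeepMind's actual interface.

```python
from dataclasses import dataclass

# Minimal sketch of the orchestrator/executor loop described above. The stubs
# stand in for the ER model (planning) and the VLA model (execution).

@dataclass
class Outcome:
    succeeded: bool
    error: str = ""

def plan(goal: str, completed=None) -> list[str]:
    # ER-model stand-in: decompose the goal into natural-language steps,
    # taking already-completed steps into account when replanning.
    remaining = 3 - len(completed or [])
    return [f"{goal}: step {n}" for n in range(1, remaining + 1)]

def execute_step(step: str) -> Outcome:
    # VLA-model stand-in: pretend every step succeeds.
    print(f"executing: {step}")
    return Outcome(succeeded=True)

def run_task(goal: str, max_replans: int = 3) -> None:
    steps, done, replans = plan(goal), [], 0
    while steps:
        outcome = execute_step(steps[0])
        if outcome.succeeded:
            done.append(steps.pop(0))
        elif replans < max_replans:
            # Something unexpected happened: re-observe and ask for a revised plan.
            steps = plan(goal, completed=done)
            replans += 1
        else:
            raise RuntimeError(f"Gave up after {max_replans} replans: {outcome.error}")

run_task("clear the coffee cups from the table")
```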

DeepMind made Gemini Robotics-ER 1.5 available to all developers through the Gemini API and Google AI Studio in September 2025. Gemini Robotics 1.5 remains available only to select hardware partners for now. The developer availability of the reasoning layer is a strategic move: it allows the broader ecosystem to begin building on top of the architecture before the full hardware ecosystem matures enough to require the VLA model.
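
For developers, experimenting with the reasoning layer looks like any other Gemini API call. The sketch below uses the google-genai Python SDK; the model identifier is an assumption based on the preview name used around launch and may have changed, so treat it as something to verify against current documentation rather than a definitive reference.

```python
# Hedged sketch: asking the embodied-reasoning model for a step-by-step plan
# via the Gemini API using the google-genai SDK. The model name below is an
# assumed preview identifier; check current docs before relying on it.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("counter.jpg", "rb") as f:
    image = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview model name
    contents=[
        image,
        "List the steps a robot should take to put the dishes in the sink, "
        "as a numbered plan with one short instruction per step.",
    ],
)
print(response.text)
```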


Why the Timing Matters

The release of Gemini Robotics 1.5 landed in the middle of what many in the industry are calling a genuine inflection point for physical AI. That context matters for understanding what Google is actually positioning itself for.

The humanoid robot market has attracted serious capital and serious engineering talent at a pace that has accelerated significantly in 2024 and 2025. Figure AI raised $1 billion in funding in 2025. Apptronik, the partner hardware company working with Google on the Apollo humanoid, raised roughly $403 million in the same period. Agility Robotics raised approximately $400 million. These are not speculative bets. They are deployment-focused investments in companies that have moved from laboratory demonstrations to factory pilots.

Boston Dynamics announced at CES 2026 that its Atlas robot is entering commercial production, with plans to deploy tens of thousands of units at Hyundai Motor Group manufacturing facilities. The production version of Atlas has 56 degrees of freedom, a 50-kilogram lift capacity, and an established AI partnership with Google DeepMind. That partnership is relevant: the Atlas platform is one of the embodiments where Gemini Robotics is intended to run.

Tesla's Optimus program, despite falling short of Elon Musk's production predictions for 2025, continues to deploy internally and conducted demonstrations in October 2025 that showed improved motion control and new learning capabilities. The competitive pressure is real on every axis.

What Google is doing with the Gemini Robotics architecture is different from what Tesla or Boston Dynamics is doing, and the distinction matters. Google is not primarily building a robot. It is building the intelligence layer that can run inside robots built by many different companies. The strategic analogy is Android. Android did not build hardware. It provided an operating system that commoditized phone hardware and gave Google distribution across a vast ecosystem. Gemini Robotics, if successful, plays a similar role in the physical AI stack.


The Competitive Landscape

Google is not alone in this pursuit. The race to define the intelligence layer for physical AI is active and well-funded from multiple directions.

NVIDIA's Isaac GR00T N1 is a foundation model for humanoid robots that was introduced as part of NVIDIA's physical AI platform. NVIDIA has positioned itself as the infrastructure provider for this category, supplying the compute hardware and the simulation environments needed to train robots at scale through its Isaac Lab platform. NVIDIA's approach is to make the training infrastructure, the reasoning layer, and the deployment hardware all interoperate seamlessly.

Physical Intelligence, a San Francisco-based startup founded by former researchers from Google and academia, is developing general-purpose robot learning that aims to work across a wide variety of physical tasks without task-specific programming. Their open-source contributions have been cited as meaningful accelerants to the field.

OpenAI, with Microsoft as its principal backer, has taken the partnership route into physical AI. Figure AI's Figure 02 robot uses large language models from OpenAI for natural language task understanding, integrating that language intelligence with Figure's own hardware and motor control research. The Figure-OpenAI combination represents a two-layer architecture similar to what Google has built, with language model intelligence directing physical execution.

What separates Google's approach is the combination of multimodal depth, hardware-agnosticism, and the breadth of the Gemini model family that sits beneath it. Gemini Robotics-ER 1.5 is built on the same foundation models that power Google's broader AI products. That gives it access to the full range of capabilities in the Gemini family, including web search tool calling, multimodal perception, and the accumulated reasoning quality of a frontier-class language model. Whether that structural advantage translates into real-world robotics superiority remains to be proven, but the foundation is substantively different from a robotics-specific model trained from scratch.


What This Actually Means for Industry

The near-term deployment story is centered on manufacturing, warehousing, and logistics, where the economic case for capable robots is clearest and the tolerance for long learning curves is relatively high.

Warehouse automation powered by embodied AI has already demonstrated measurable efficiency gains in existing deployments. DHL's sorting robots have increased capacity by 40%. Amazon's Sequoia inventory system speeds up processing by 75%. These figures come from systems that are far less capable than what Gemini Robotics represents. The question is how quickly more capable, more general-purpose systems displace the narrower automation that is already in place.

One Citi Research projection worth paying attention to puts the current global installed base of industrial robots at around 4 million units. If robots displace just 30% of manufacturing tasks over the next decade, that installed base could reach 30 million units, growing at over 20% annually. Broader projections suggest 1.3 billion AI-enabled robots by 2035 and 4 billion by 2050. These numbers are speculative, but the structural demand drivers are real: an aging workforce in developed economies, rising labor costs, and a reshoring of manufacturing that is creating labor shortages in exactly the sectors where robots can substitute for scarce human labor.
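
The headline growth rate in that projection is at least internally consistent: going from roughly 4 million to 30 million units over ten years implies a compound annual growth rate of about 22 percent, as the quick check below shows.

```python
# Quick check of the implied compound annual growth rate (CAGR) in the
# Citi projection: ~4M industrial robots today growing to ~30M in a decade.
current_units = 4_000_000
projected_units = 30_000_000
years = 10

cagr = (projected_units / current_units) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # roughly 22% per year, i.e. "over 20% annually"
```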

Healthcare is another domain receiving serious attention. Embodied AI research publications in healthcare domains grew nearly sevenfold between 2019 and 2024. The applications range from surgical assistance and clinical documentation to elder care support and physical rehabilitation. The constraints in healthcare are stricter, the liability exposure is greater, and the regulatory pathway is longer. But the potential scale is enormous, given demographic trends in the United States and worldwide.


The Risks That Are Not Being Talked About Enough

The optimism around physical AI is real and largely justified by the technical progress. But there is a category of risk that is genuinely different from anything that software-only AI has presented, and the policy and governance infrastructure to address it is not yet in place.

  1. Physical harm. When a language model makes an error, the consequence is bad text. When an embodied AI system makes an error, the consequences operate in the physical world. A surgical robot with a reasoning failure can cause serious harm. A warehouse robot that misinterprets its environment can injure a human worker. The tolerance for error in embodied systems is fundamentally lower than in purely digital ones.
  2. Insufficient policy frameworks. A research paper presented at NeurIPS 2025 laid this out directly, arguing that existing policies governing industrial robots and autonomous vehicles are insufficient to address the full range of concerns that embodied AI systems present. The paper identified four categories of risk: physical harm, informational risks including surveillance and data privacy, economic disruption from labor displacement, and social risks including over-reliance and breakdown of human connection.
  3. Jailbreaking vulnerabilities. The jailbreaking concern is particularly urgent and underappreciated. Research has demonstrated that embodied AI systems can inherit jailbreaking vulnerabilities from the large language models they are built on. If a language model can be manipulated into bypassing its safety guidelines, a robot built on that model can potentially be manipulated into performing harmful physical actions. The stakes of an LLM jailbreak and a robot jailbreak are not the same.
  4. Labor displacement. Labor displacement is the risk that generates the most public concern, and that concern is legitimate. A world with 30 million capable general-purpose robots is a world with structural unemployment pressure across manufacturing, warehousing, and service industries. History suggests that technology creates new jobs even as it displaces old ones, but history also shows that the transition period is painful and that the distribution of new opportunities does not automatically reach the people who lost their jobs to automation. The policy conversation on this is several years behind the technology development.
  5. Surveillance and privacy. There is also the surveillance dimension. Embodied AI systems in homes, workplaces, and public spaces are perception systems by necessity. They have to see and understand their environment to navigate and act within it. That same perceptual capability is a surveillance capability, and the data those systems collect, about people's routines, behaviors, and physical spaces, creates privacy risks that do not have adequate legal frameworks governing them yet.
  6. Technical measures are not enough. Google has taken steps to address safety at the technical level. Gemini Robotics incorporates a "Robot Constitution" approach inspired by Isaac Asimov's Three Laws, where data-driven rules expressed in natural language steer the robot's behavior and can be modified and applied without retraining the underlying model (a minimal sketch of this pattern appears just after this list). The ASIMOV benchmark, updated alongside the Gemini Robotics 1.5 release, provides a standardized way to evaluate a robot's semantic safety, testing whether it can recognize when an action would be unsafe to perform in a given context. These are meaningful technical measures. They are not a substitute for regulatory frameworks that do not yet exist.
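
The pattern behind the Robot Constitution idea, safety rules written in plain language and injected into the planner rather than baked into model weights, is simple to sketch. The example below is illustrative only; the rules and prompt format are invented and are not Google's actual constitution or the ASIMOV evaluation code.

```python
# Hedged sketch of "rules expressed in natural language": constitution-style
# constraints are injected into the planner's prompt and can be edited without
# retraining anything. Rule text and prompt format are illustrative.

CONSTITUTION = [
    "Never apply force to a person or animal.",
    "Do not pick up sharp objects when a person's hand is within reach.",
    "Stop and ask for confirmation before moving anything fragile or valuable.",
]

def build_planning_prompt(goal: str, scene_description: str) -> str:
    rules = "\n".join(f"- {rule}" for rule in CONSTITUTION)
    return (
        "You are planning actions for a robot.\n"
        f"Hard safety rules (must never be violated):\n{rules}\n\n"
        f"Scene: {scene_description}\n"
        f"Goal: {goal}\n"
        "Produce a numbered plan, or refuse if the goal conflicts with the rules."
    )

print(build_planning_prompt(
    goal="Hand the kitchen knife to the child",
    scene_description="A child stands next to the counter; a knife lies on the cutting board.",
))
```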

What the Architecture Signals About the Future

The design choice to separate reasoning from execution in Gemini Robotics is worth examining beyond its immediate technical implications.

Google is essentially arguing that the right way to build a capable robot is to give it two distinct cognitive layers: a general-purpose language and reasoning model that understands the world broadly, and a physical execution model that handles the mechanics of translating decisions into movement. The reasoning layer can be updated independently of the execution layer. New capabilities from the broader Gemini model family flow automatically into the robotics reasoning layer. The execution model can be specialized for different robot forms without changing the reasoning architecture.
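
Read as software architecture, the claim is that the two layers meet at a narrow, embodiment-agnostic interface, so supporting a new robot body means supplying one new implementation rather than retraining the whole stack. The sketch below illustrates that idea with invented names; it is not Google's actual abstraction.

```python
from typing import Protocol

# Illustrative only: an embodiment-agnostic execution interface. The reasoning
# layer emits natural-language steps; each robot body implements the same
# execute() contract. Names are invented, not Google's abstractions.

class Executor(Protocol):
    def execute(self, instruction: str) -> bool: ...

class ApolloExecutor:
    def execute(self, instruction: str) -> bool:
        print(f"[Apollo humanoid] executing: {instruction}")
        return True

class AlohaExecutor:
    def execute(self, instruction: str) -> bool:
        print(f"[ALOHA 2 dual-arm] executing: {instruction}")
        return True

def carry_out(plan: list[str], body: Executor) -> None:
    # The same plan runs unchanged on either embodiment.
    for step in plan:
        body.execute(step)

carry_out(["pick up the towel", "fold it in half"], ApolloExecutor())
carry_out(["pick up the towel", "fold it in half"], AlohaExecutor())
```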

This is a modular approach, and modularity tends to win in platform competitions because it allows faster iteration on each component and easier adaptation to new hardware. If the architecture succeeds, Google could achieve what Android achieved in mobile: not dominance in any single device category, but deep integration across a diverse hardware ecosystem that makes Google's AI infrastructure the invisible layer running inside most capable robots.

That outcome is not guaranteed. The history of robotics is littered with systems that worked brilliantly in laboratory conditions and failed at scale in the real world. The gap between controlled demonstration and reliable deployment in genuinely unstructured environments remains large. Folding laundry in a DeepMind lab is a different problem from folding laundry in a house where children have scattered toys across the floor, the lighting is poor, and the laundry basket is not quite where the robot expects it to be.

But the trajectory is clear. The combination of reasoning capability, cross-embodiment learning, tool-calling for real-world information access, and transparent action explanation represents a qualitatively different kind of robotic intelligence than anything that has existed before. Whether the timeline to general-purpose robots in homes and offices is five years or fifteen is genuinely uncertain. That it is now measured in years rather than decades is not.


Frequently Asked Questions

What is Google Gemini Robotics 1.5?

Gemini Robotics 1.5 is Google DeepMind's most capable vision-language-action model for robotics. It translates visual information and natural language instructions into physical motor commands, allowing robots to perform complex real-world tasks. Unlike earlier robotics AI, it thinks before acting, generating a reasoning chain before executing any physical movement. It was announced in September 2025 and is currently available to select hardware partners.

What is the difference between Gemini Robotics 1.5 and Gemini Robotics-ER 1.5?

The two models work together as a brain-and-body architecture. Gemini Robotics-ER 1.5 is the embodied reasoning model, acting as the high-level brain that understands context, plans multi-step tasks, calls external tools like Google Search, and monitors task progress. Gemini Robotics 1.5 is the vision-language-action model, acting as the hands and eyes that receive instructions from the ER model and translate them into physical motor commands. Gemini Robotics-ER 1.5 is available to developers via the Gemini API, while Gemini Robotics 1.5 is limited to select partners.

What is Chain-of-Action planning in robotics?

Chain-of-Action planning is Google DeepMind's approach to having robots reason through a sequence of decisions before acting, rather than responding reactively to each instruction. The robot's reasoning model breaks a complex task like cleaning a room or sorting recyclables into a structured plan of discrete steps, then executes each step in sequence with awareness of the overall goal. This approach is modeled on chain-of-thought reasoning, which has significantly improved the performance of language models on complex cognitive tasks.

Which robots can run Gemini Robotics?

Gemini Robotics is designed to work across multiple robot embodiments from a single model. Google has demonstrated it running on the ALOHA 2 dual-arm platform, the Franka bi-arm robot, and Apptronik's Apollo humanoid robot. Boston Dynamics, whose Atlas robot is entering commercial production with planned deployments at Hyundai Motor Group facilities, has an AI partnership with Google DeepMind that is expected to incorporate Gemini Robotics models.

How does Google's approach compare to Tesla's Optimus or Boston Dynamics' Atlas?

Google is primarily building the intelligence layer, not the robot hardware. Tesla builds both the hardware and the AI in-house, targeting mass production at a goal price of $20,000 to $30,000 per unit. Boston Dynamics focuses on hardware engineering excellence, with Atlas targeting enterprise deployment at an estimated $140,000 or more per unit. Google's strategic position is more analogous to Android in mobile: providing the AI architecture that many different hardware makers can run, rather than competing primarily in the hardware market.

What are the biggest risks of embodied AI and physical robots?

Research presented at NeurIPS 2025 identified four primary risk categories: physical harm from malfunction or misuse, informational risks including surveillance and privacy violations from perceptual systems operating in homes and workplaces, economic disruption from labor displacement across manufacturing and service sectors, and social risks including over-reliance and the erosion of human roles. Embodied AI systems can also inherit jailbreaking vulnerabilities from underlying language models, which creates risks that have no equivalent in software-only AI. Policy frameworks to govern these risks are currently insufficient.

When will robots like Gemini Robotics be available to consumers?

General consumer availability remains several years away. Current deployments are focused on controlled industrial and research environments. Boston Dynamics is targeting commercial production for enterprise customers beginning in 2026. Tesla's Optimus is currently deployed internally in Tesla factories and has not announced a consumer release date. General-purpose home robots capable of handling unstructured domestic environments reliably represent a harder technical problem than factory deployment, and the timeline to that capability is uncertain, with credible estimates ranging from five to fifteen years.

What is the ASIMOV benchmark and why is it important for robotics safety?

The ASIMOV benchmark, developed by Google DeepMind and updated alongside the Gemini Robotics 1.5 release, is a dataset and evaluation framework for testing the semantic safety of robotic AI systems. It tests whether a robot can recognize when an action would be unsafe to perform in a given physical context, covering scenarios drawn from real-world use cases. Gemini Robotics-ER 1.5 achieves state-of-the-art performance on the benchmark, with its reasoning capability contributing significantly to improved safety understanding and adherence to physical safety constraints.

