Watch a robot navigate the Google DeepMind offices using Gemini | TechCrunch

Published

July 11, 2024

Admin

Generative AI has already shown a lot of promise in robotics. Applications include natural language interactions, robot learning, no-code programming and even design. Google's DeepMind Robotics team this week is showcasing another potential sweet spot between the two disciplines: navigation.

In a paper titled "Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs," the team demonstrates how it put Google Gemini 1.5 Pro to work teaching a robot to respond to commands and navigate around an office. Naturally, DeepMind turned to some of the Everyday Robots machines that have been hanging around since Google shuttered the project amid widespread layoffs last year.

In a series of videos attached to the project, DeepMind employees open with a smart assistant-style “OK, Robot,” before asking the system to perform different tasks around the 9,000-square-foot office space.

In one example, a Googler asks the robot to take him somewhere he can draw things. "OK," responds the robot, which is wearing a jaunty yellow bowtie, "give me a minute. Thinking with Gemini …" The robot then leads the human to a wall-sized whiteboard. In a second video, a different person tells the robot to follow the directions on the whiteboard.

A simple map shows the robot how to get to the “Blue Area.” Again, the robot thinks for a moment before taking a long route to what turns out to be a robotics testing area. “I’ve successfully followed the directions on the whiteboard,” the robot announces with a level of self-confidence most humans can only dream of.

Before these videos were shot, the robots were familiarized with the space using what the team calls "Multimodal Instruction Navigation with demonstration Tours (MINT)." Effectively, that means walking the robot around the office while pointing out different landmarks with speech. Next, the team uses a hierarchical Vision-Language-Action (VLA) approach that, in the team's words, combines "the environment understanding and common sense reasoning power." With these pieces in place, the robot can respond to written and drawn commands, as well as gestures.
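To make the tour-then-navigate idea a little more concrete, here is a rough Python sketch. It assumes, as the paper's title hints, that frames captured during the demonstration tour become nodes in a topological graph, and that getting somewhere means picking a goal frame and finding a route through that graph. The `TourGraph` class, the keyword-matching `pick_goal_frame` stand-in for Gemini and the breadth-first search are all hypothetical illustrations, not DeepMind's code.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class TourGraph:
    # Hypothetical structure: each node is a frame from the demonstration tour,
    # labeled with the landmark narration spoken at that point.
    captions: dict[int, str] = field(default_factory=dict)
    edges: dict[int, set[int]] = field(default_factory=dict)

    def add_frame(self, frame_id: int, caption: str) -> None:
        self.captions[frame_id] = caption
        self.edges.setdefault(frame_id, set())

    def connect(self, a: int, b: int) -> None:
        # Undirected edge: the robot can move between two neighboring frames.
        self.edges.setdefault(a, set()).add(b)
        self.edges.setdefault(b, set()).add(a)

    def shortest_path(self, start: int, goal: int) -> list[int] | None:
        # Breadth-first search: fewest hops between frames (edges are unweighted).
        parents: dict[int, int | None] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                cur: int | None = node
                while cur is not None:
                    path.append(cur)
                    cur = parents[cur]
                return path[::-1]
            for nxt in sorted(self.edges.get(node, ())):
                if nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return None  # goal not reachable from start


def pick_goal_frame(graph: TourGraph, instruction: str) -> int:
    # Hypothetical stand-in for the high-level VLM step (Gemini 1.5 Pro in the
    # article): score frames by word overlap with the instruction.
    words = set(instruction.lower().split())
    return max(graph.captions,
               key=lambda f: len(words & set(graph.captions[f].lower().split())))


# Usage: a toy five-frame tour.
tour = TourGraph()
for fid, cap in enumerate(["entrance lobby", "kitchen", "hallway",
                           "whiteboard for drawing", "blue area robot testing"]):
    tour.add_frame(fid, cap)
for a, b in [(0, 1), (1, 2), (2, 3), (2, 4)]:
    tour.connect(a, b)

goal = pick_goal_frame(tour, "take me to the whiteboard")
path = tour.shortest_path(0, goal)
print(goal, path)
```

Splitting the work this way mirrors the "hierarchical" label: a slow, reasoning-heavy step decides where to go once, and a cheap graph search handles how to get there.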

Google says the robot achieved a success rate of roughly 90% across more than 50 interactions with employees.
