Google I/O 2024: everything announced

Google I/O just ended, and it was packed with AI announcements. As expected, the event focused heavily on Google’s Gemini AI models, along with the ways they’re being integrated into apps like Workspace and Chrome.

If you didn’t get to tune in to the event live, you can catch up with all the latest from Google in the roundup below.
Google Lens already lets you search for something based on images, but now Google’s taking things a step further with the ability to search with a video. That means you can take a video of something you want to search for, ask a question during the video, and Google’s AI will attempt to pull up relevant answers from the web.
Google is rolling out a new feature this summer that could be a boon for just about anyone with years — or even more than a decade — of photos to sift through. “Ask Photos” lets Gemini pore over your Google Photos library in response to your questions, and the feature goes beyond just pulling up pictures of dogs and cats. CEO Sundar Pichai demonstrated by asking Gemini what his license plate number is. The response was the number itself, followed by a picture of it so he could make sure that was right.
Google has introduced a new AI model to its lineup: Gemini 1.5 Flash. The new multimodal model is a lighter-weight counterpart to Gemini 1.5 Pro, optimized for “narrow, high-frequency, low-latency tasks,” which makes it faster at generating responses. Google also made some changes to Gemini 1.5 Pro that it says will improve its ability to translate, reason, and code. And Google says it has doubled Gemini 1.5 Pro’s context window (how much information it can take in at once) from 1 million to 2 million tokens.
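For developers, Google exposes these models through its Gemini API. Here’s a minimal sketch of what calling the new Flash model might look like, assuming the google-generativeai Python SDK, a placeholder API key, and the public “gemini-1.5-flash” model name (details drawn from Google’s developer docs, not this article):

```python
# Minimal sketch: calling Gemini 1.5 Flash via Google's Python SDK.
# "YOUR_API_KEY" is a hypothetical placeholder; get a real key from Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Flash is the model tuned for fast, low-latency responses.
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    "Summarize the difference between Gemini 1.5 Pro and 1.5 Flash in two sentences."
)
print(response.text)
```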
Google is rolling out its latest mainstream language model, Gemini 1.5 Pro, to the sidebar in Docs, Sheets, Slides, Drive, and Gmail. There, it becomes more of a general-purpose assistant within Workspace that can fetch info from any of the content in your Drive, no matter which app you’re in. It will also be able to do things for you, like write emails that incorporate info from a document you’re currently looking at or remind you later to respond to an email you’re reading. Some early testers already have access to these features, and Google says they’re rolling out to all paid Gemini subscribers next month.
Google’s Project Astra is a multimodal AI assistant that the company hopes will become a do-everything virtual helper: one that can watch and understand what it sees through your device’s camera, remember where your things are, and take actions for you. It’s powering many of the most impressive demos from I/O this year, and Google’s aim is for it to be an honest-to-goodness AI agent that can not only talk to you but actually do things on your behalf.
Veo, Google’s answer to OpenAI’s Sora, is a new generative AI model that can output 1080p video from text, image, and video prompts. Videos can be produced in a variety of styles, like aerial shots or timelapses, and tweaked with follow-up prompts. The company is already offering Veo to some creators for use in YouTube videos and is also pitching it to Hollywood for use in films.
Google is rolling out a custom chatbot creator called Gems. Just like OpenAI’s GPTs, Gems lets users give instructions to Gemini to customize how it will respond and what it specializes in. If you want it to be a positive and insistent running coach with daily motivations and running plans — aka my worst nightmare — you’ll be able to do that soon (if you’re a Gemini Advanced subscriber).
The new Gemini Live feature aims to make voice chats with Gemini feel more natural. The chatbot’s voice will be updated with some extra personality, and users will be able to interrupt it mid-sentence or ask it to watch through their smartphone camera and give information about what it sees in real time. Gemini is also getting new integrations that let it update or draw info from Google Calendar, Tasks, and Keep, using multimodal features to do so (like adding details from a flyer to your personal calendar).
If you’re on an Android phone or tablet, you can now circle a math problem on your screen and get help solving it. Google’s AI won’t solve the problem for you — so it won’t help students cheat on their homework — but it will break it down into steps that should make it easier to complete.
Google will roll out “AI Overviews” — formerly known as “Search Generative Experience,” a mouthful — to everyone in the US this week. Now, a “specialized” Gemini model will design and populate results pages with summarized answers from the web (similar to what you see in AI search tools like Perplexity or Arc Search).
Using on-device Gemini Nano AI smarts, Google says Android phones will be able to help you avoid scam calls by looking out for red flags, like common scammer conversation patterns, and then popping up real-time warnings mid-call. The company promises to offer more details on the feature later in the year.
Google says that Gemini will soon be able to let users ask questions about videos on-screen, and it will answer based on automatic captions. For paid Gemini Advanced users, it can also ingest PDFs and offer information. Those and other multimodal updates for Gemini on Android are coming over the next few months.
Google announced that it’s adding Gemini Nano, the lightweight version of its Gemini model, to Chrome on desktop. The built-in assistant will use on-device AI to help you generate text for social media posts, product reviews, and more from directly within Google Chrome.
Google is expanding what SynthID can do: the company says it will embed watermarks in content created with its new Veo video generator, and that SynthID can now also detect AI-generated videos.
Update, May 14th: Added “Ask Photos.”