Anthropic's latest AI model can use a computer just like you – mistakes and all

Published

October 22, 2024

Admin

Imagine an AI model that can work with a computer all on its own. Well, imagine no longer because such an AI has arrived. On Tuesday, Anthropic announced that the latest generation of its Claude AI model can use a computer, just like you and I do. Dubbed Claude 3.5 Sonnet, the model offers this computer-use capability in beta for developers to use via an API.

Touted by Anthropic as the “first frontier AI model to offer computer use in public beta,” Claude 3.5 Sonnet can be directed by developers to work with a computer in several ways. Through a product or service built on the API, you can tell the AI to “look” at a computer screen, move a cursor around it, click buttons, and type text through a virtual keyboard. The idea is to emulate the way you interact with your own computer.
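To make those four kinds of interaction concrete, here is a minimal sketch of how an application might carry out actions requested by a model. Everything in it is a hypothetical illustration: the Action shape, the FakeDesktop class, and the run function are assumptions made for this example and are not Anthropic's API or schema. The fake desktop only records what it is asked to do instead of touching a real screen.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Action:
    """One step requested by the model (hypothetical shape, not Anthropic's schema)."""
    kind: str                                    # "screenshot", "mouse_move", "click", or "type"
    position: Optional[Tuple[int, int]] = None   # target pixel, used by "mouse_move"
    text: Optional[str] = None                   # characters to enter, used by "type"


@dataclass
class FakeDesktop:
    """Stand-in for a real screen, mouse, and keyboard; it only records events."""
    cursor: Tuple[int, int] = (0, 0)
    typed: str = ""
    log: List[str] = field(default_factory=list)

    def apply(self, action: Action) -> None:
        if action.kind == "screenshot":
            # A real application would capture the screen and send it to the model.
            self.log.append("screenshot taken")
        elif action.kind == "mouse_move":
            if action.position is None:
                raise ValueError("mouse_move needs a position")
            self.cursor = action.position
            self.log.append(f"cursor moved to {self.cursor}")
        elif action.kind == "click":
            self.log.append(f"clicked at {self.cursor}")
        elif action.kind == "type":
            if action.text is None:
                raise ValueError("type needs text")
            self.typed += action.text
            self.log.append(f"typed {action.text!r}")
        else:
            raise ValueError(f"unknown action: {action.kind}")


def run(actions: List[Action], desktop: FakeDesktop) -> FakeDesktop:
    """Apply each requested action in order and return the desktop."""
    for action in actions:
        desktop.apply(action)
    return desktop


if __name__ == "__main__":
    steps = [
        Action("screenshot"),
        Action("mouse_move", position=(120, 45)),
        Action("click"),
        Action("type", text="hello"),
    ]
    for line in run(steps, FakeDesktop()).log:
        print(line)
```

In a real integration, the list of steps would come from the model one at a time, with a fresh screenshot sent back after each action so the model can decide what to do next.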

For now, computer use is decidedly experimental, sometimes cumbersome and prone to errors. However, Anthropic has released the beta specifically to get feedback from developers so it can improve the capability over time.

Why is computer use by an AI useful? Anthropic anticipated that question and has an answer.

“A vast amount of modern work happens via computers,” Anthropic said. “Enabling AIs to interact directly with computer software in the same way people do will unlock a huge range of applications that simply aren’t possible for the current generation of AI assistants.”

And just how can developers and users take advantage of an AI that works with a computer?

“Instead of making specific tools to help Claude complete individual tasks, we’re teaching it general computer skills — allowing it to use a wide range of standard tools and software programs designed for people,” Anthropic explained. “Developers can use this nascent capability to automate repetitive processes, build and test software, and conduct open-ended tasks like research.”

Several companies are already tapping into Claude 3.5 Sonnet’s prowess with computers, including Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company, Anthropic said. As one example, the software development and deployment platform Replit is using these capabilities to evaluate applications for its Replit Agent product.

Training Claude to work with computers, specifically to look at the screen and take actions in response, involved a lot of trial and error, according to Anthropic.

Using a computer requires the ability to see and interpret images, such as those of a computer screen. It also involves the capacity to determine how and when to run specific operations based on what’s being displayed on the screen. To tackle these requirements, Claude 3.5 Sonnet looks at screenshots that show it what you’re viewing. The AI then counts the number of pixels, vertically and horizontally, between the cursor and its target to figure out where to move the cursor. This skill is essential to the AI’s ability to issue mouse commands.
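The pixel counting described above comes down to simple coordinate arithmetic. The sketch below is only an illustration of that idea, not Anthropic's method: the function names, the (left, top, width, height) box format, and the example numbers are all assumptions made for this example.

```python
from typing import Tuple

Point = Tuple[int, int]            # (x, y) in screen pixels, origin at top-left
Box = Tuple[int, int, int, int]    # (left, top, width, height), a hypothetical format


def center_of(box: Box) -> Point:
    """Return the pixel at the middle of an on-screen element such as a button."""
    left, top, width, height = box
    return (left + width // 2, top + height // 2)


def pixel_offset(cursor: Point, target: Point) -> Point:
    """Return (dx, dy): positive dx moves right, positive dy moves down."""
    return (target[0] - cursor[0], target[1] - cursor[1])


if __name__ == "__main__":
    cursor = (100, 200)            # where the cursor appears in the screenshot
    button = (400, 300, 80, 30)    # where a button appears in the screenshot
    target = center_of(button)
    dx, dy = pixel_offset(cursor, target)
    print(f"move {dx} px horizontally and {dy} px vertically to reach {target}")
```

With these example values, the button's center is at (440, 315), so the cursor has to travel 340 pixels to the right and 115 pixels down.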

How has Claude fared so far?

In the OSWorld benchmarking tests, which evaluate attempts by AI models to use computers, Claude 3.5 Sonnet scored 14.9%. Though that’s far lower than the 70%-75% typical of human-level skill, it’s almost double the 7.7% scored by the next best AI model in the same category, Anthropic said.
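For readers who want to check the comparison, the short calculation below uses only the three figures quoted above. The function names are made up for this example.

```python
def ratio(score: float, baseline: float) -> float:
    """How many times larger score is than baseline."""
    return score / baseline


def gap(score: float, reference: float) -> float:
    """Percentage points separating score from a reference level."""
    return reference - score


CLAUDE = 14.9                        # Claude 3.5 Sonnet on OSWorld, in percent
NEXT_BEST = 7.7                      # next best AI model in the same category
HUMAN_LOW, HUMAN_HIGH = 70.0, 75.0   # human-level range

if __name__ == "__main__":
    print(f"Claude vs. next best: {ratio(CLAUDE, NEXT_BEST):.2f}x")
    print(
        f"Gap to human level: {gap(CLAUDE, HUMAN_LOW):.1f} "
        f"to {gap(CLAUDE, HUMAN_HIGH):.1f} percentage points"
    )
```

The ratio works out to about 1.94, which is where "almost double" comes from, while the gap to human-level performance is still roughly 55 to 60 percentage points.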

This attempt at computer use by an AI is still in the early stages. As such, Claude can’t perform more “advanced” computer tasks, such as dragging a window or zooming into the screen. Also, because Claude works by viewing and piecing together a series of screenshots, it can miss short-lived actions and notifications that occur between them.

“We expect that computer use will rapidly improve to become faster, more reliable, and more useful for the tasks our users want to complete,” Anthropic said. “It’ll also become much easier to implement for those with less software development experience. At every stage, our researchers will be working closely with our safety teams to ensure that Claude’s new capabilities are accompanied by the appropriate safety measures.”

Claude 3.5 Sonnet is now available to all users. Developers can build applications with the computer-use beta on the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.