Who wants ‘Her’-like AI that gets stuff wrong?
Hiya, folks, welcome to TechCrunch’s regular AI newsletter. If you want this in your inbox every Wednesday, sign up here.
Last week, OpenAI launched Advanced Voice Mode with Vision, which feeds real-time video to ChatGPT, allowing the chatbot to “see” beyond the confines of its app layer. The premise is that by giving ChatGPT greater contextual awareness, the bot can respond in a more natural and intuitive way.
But the first time I tried it, it lied to me.
“That sofa looks comfortable!” ChatGPT said as I held up my phone and asked the bot to describe our living room. It had mistaken the ottoman for a couch.
“My mistake!” ChatGPT said when I corrected it. “Well, it still looks like a comfy space.”
It’s been nearly a year since OpenAI first demoed Advanced Voice Mode with Vision, which the company pitched as a step toward AI as depicted in the Spike Jonze movie “Her.” The way OpenAI sold it, Advanced Voice Mode with Vision would grant ChatGPT superpowers — enabling the bot to solve sketched-out math problems, read emotions, and respond to affectionate letters.
Has it achieved all that? More or less. But Advanced Voice Mode with Vision hasn’t solved ChatGPT’s biggest issue: reliability. If anything, the feature makes the bot’s hallucinations more obvious.
At one point, curious to see if Advanced Voice Mode with Vision could help ChatGPT offer fashion pointers, I enabled it and asked ChatGPT to rate an outfit of mine. It happily did so. But while the bot gave opinions on my jeans and olive-colored shirt combo, it consistently missed the brown jacket I was wearing.
I’m not the only one who has encountered slipups.
When OpenAI president Greg Brockman showed off Advanced Voice Mode with Vision on “60 Minutes” earlier this month, ChatGPT made a mistake on a geometry problem. When calculating the area of a triangle, it misidentified the triangle’s height.
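The error is worth unpacking, because the formula itself is trivial: the “height” has to be the perpendicular distance from the chosen base, and picking any other segment silently skews the answer. A quick illustration, with numbers invented for this example rather than taken from the broadcast:

```latex
% Area of a triangle: h must be the perpendicular distance
% from the base b, not the length of a slanted side.
\[
A = \tfrac{1}{2}\,b\,h
\]
% With b = 8 and a true (perpendicular) height h = 3, A = 12.
% Mistaking a slanted side of length 5 for the height gives 20 instead.
```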
So my question is, what good is “Her”-like AI if you can’t trust it?
With each ChatGPT misfire, I felt myself becoming less and less inclined to reach into my pocket, unlock my phone, launch ChatGPT, open Advanced Voice Mode, and enable Vision — a cumbersome series of steps in the best of circumstances. With its bright and cheery demeanor, Advanced Voice Mode is clearly designed to engender trust. When it doesn’t deliver on that implicit promise, it’s jarring — and disappointing.
Perhaps OpenAI can solve the hallucinations problem once and for all someday. Until then, we’re stuck with a bot that views the world through criss-crossed wiring. And frankly, I’m not sure who might want that.
News
OpenAI’s 12 days of “shipmas” continues: OpenAI is releasing new products every day up until December 20. Here’s a roundup of all the announcements, which we’re updating regularly.
YouTube lets creators opt out: YouTube is giving creators more choice over how third parties can use their content to train AI models. Creators and rights holders will be able to flag for YouTube if they’re permitting specific companies to train models on their clips.
Meta’s smart glasses get upgrades: Meta’s Ray-Ban Meta smart glasses have gotten several new AI-powered updates, including the ability to have an ongoing conversation with Meta’s AI and translate between languages.
DeepMind’s answer to Sora: Google DeepMind, Google’s flagship AI research lab, wants to beat OpenAI at the video-generation game. On Monday, DeepMind announced Veo 2, a next-gen video-generating AI that can create two-minute-plus clips in resolutions up to 4K (4,096 × 2,160 pixels).
OpenAI whistleblower found dead: A former OpenAI employee, Suchir Balaji, was recently found dead in his San Francisco apartment, according to the San Francisco Office of the Chief Medical Examiner. In October, the 26-year-old AI researcher raised concerns about OpenAI breaking copyright law when he was interviewed by The New York Times.
Grammarly acquires Coda: Grammarly, best known for its style and spell-check tools, has acquired productivity startup Coda for an undisclosed amount. As part of the deal, Coda’s CEO and co-founder, Shishir Mehrotra, will become the new CEO of Grammarly.
Cohere is working with Palantir: TechCrunch exclusively reported that Cohere, the enterprise-focused AI startup valued at $5.5 billion, has a partnership with data analytics firm Palantir. Palantir is vocal about its close — and at times controversial — work with U.S. defense and intelligence agencies.
Research paper of the week
Anthropic has pulled back the curtain on Clio (“Claude insights and observations”), a system that the company uses to understand how customers are employing its various AI models. Clio, which Anthropic compares to analytics tools such as Google Trends, provides “valuable insights” for improving the safety of Anthropic’s AI, the company claims.
Anthropic tapped Clio to compile anonymized usage data, some of which the company made public last week. So what are customers using Anthropic’s AI for? A range of tasks — but web and mobile app development, content creation, and academic research top the list. Predictably, the use cases vary across languages; for example, Japanese speakers are more likely to ask Anthropic’s AI to analyze anime than Spanish speakers.
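Anthropic compares Clio to Google Trends, which suggests the general shape of such a system: reduce conversations to anonymized topic summaries, then cluster and count them to get an aggregate view. Here’s a minimal sketch of that idea in Python; the library choices and the summary strings are assumptions for illustration, not Anthropic’s actual stack:

```python
# Minimal sketch of a Clio-style aggregation pipeline (not Anthropic's code).
# Assumes conversations have already been reduced to short, anonymized
# topic summaries upstream; the summaries below are invented examples.
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

summaries = [
    "debug a React component",
    "fix a Python web scraper",
    "outline a blog post about travel",
    "draft marketing copy for a newsletter",
    "explain a statistics homework problem",
    "summarize a research paper on biology",
]

# Embed the summaries (TF-IDF stands in for a proper embedding model).
vectors = TfidfVectorizer().fit_transform(summaries)

# Group similar summaries into coarse usage categories.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vectors)

# Report cluster sizes -- the aggregate, Trends-like view of usage.
print(Counter(kmeans.labels_))
```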
Model of the week
AI startup Pika released its next-gen video generation model, Pika 2, which can create a clip from a character, object, and location that users supply. Via Pika’s platform, users can upload multiple references (e.g., images of a boardroom and office workers) and Pika 2 will “intuit” the role of each reference before combining them into a single scene.
Now, no model’s perfect, of course. See the “anime” below created by Pika 2, which has impressive consistency but suffers from the aesthetic weirdness present in all generative AI footage.
Like I said, Animes will be the first genre thats 100% AI generated. Its amazing to see what’s already possible with Pika 2.0 pic.twitter.com/3jWCy4659o
— Chubby♨️ (@kimmonismus) December 16, 2024
Still, the tools are very rapidly improving in the video domain — and in equal parts piquing the interest and raising the ire of creatives.
Grab bag
The Future of Life Institute (FLI), the nonprofit organization co-founded by MIT cosmologist Max Tegmark, released an “AI Safety Index” designed to evaluate the safety practices of leading AI companies across five key areas: current harms, safety frameworks, existential safety strategy, governance and accountability, and transparency and communication.
Meta was the worst of the bunch evaluated on the Index, with an overall F grade. (The Index uses a numerical and GPA-based scoring system.) Anthropic was the best, but it managed no better than a C — suggesting there’s room for improvement.