Chatbot Wars – Who Makes the Best AI Chatbot?

Published

May 27, 2024

Admin

The WSJ tested 5 chatbots: OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini, Perplexity, and Anthropic’s Claude. Care to guess the winner and loser?

The Great AI Challenge

Please consider The Great AI Challenge: We Test Five Top Bots on Useful, Everyday Skills.

That’s a free link, which I use very sparingly so as not to abuse the privilege. Here are a few snips.

Human-sounding bots barely existed two years ago. Now they’re everywhere. There’s ChatGPT, which kicked off the whole generative-AI craze, and big swings from Google and Microsoft, plus countless other smaller players, all with their own smooth-talking helpers.

We put five of the leading bots through a series of blind tests to determine their usefulness. While we hoped to find the Caitlin Clark of chatbots, that wasn’t exactly what happened. They excel in some areas and fail in others. Plus, they’re all evolving rapidly. During our testing, OpenAI released an upgrade to ChatGPT that improved its speed and current-events knowledge.

We wanted to see the range of responses we’d get asking real-life questions and ordering up everyday tasks—not a scientific assessment, but one that reflects how we’ll all use these tools. Consider it the chatbot Olympics.

We have ChatGPT by OpenAI, celebrated for its versatility and ability to remember user preferences. (Wall Street Journal owner News Corp has a content-licensing partnership with OpenAI.) Anthropic’s Claude, from a socially conscious startup, is geared to be inoffensive. Microsoft’s Copilot leverages OpenAI’s technology and integrates with services like Bing and Microsoft 365. Google’s Gemini accesses the popular search engine for real-time responses. And Perplexity is a research-focused chatbot that cites sources with links and stays up to date.

While each of these services offer a no-fee version, we used the $20-a-month paid versions for enhanced performance, to assess their full capabilities across a wide range of tasks. (We used the latest ChatGPT GPT-4o model and Gemini 1.5 Pro model in our testing.)

Creative writing

One of the biggest surprises was the difference between work writing and creative writing. Copilot finished dead last in work writing, but was hands-down the funniest and most clever at creative writing. We asked for a poem about a poop on a log. We asked for a wedding toast featuring the Muppets. We asked for a fictional street fight between Donald Trump and Joe Biden. With Copilot, the jokes kept coming. Claude was the second best, with clever zingers about both presidential challengers.

Overall results

What did these Olympian challenges tell us? Each chatbot has unique strengths and weaknesses, making them all worth exploring. We saw few outright errors and “hallucinations,” where bots go off on unexpected tangents and completely make things up. The bots provided mostly helpful answers and avoided controversy.

The biggest surprise? ChatGPT, despite its big update and massive fame, didn’t lead the pack. Instead, lesser-known Perplexity was our champ. “We optimize for conciseness,” says Dmitry Shevelenko, chief business officer at Perplexity AI. “We tuned our model for conciseness, which forces it to identify the most essential components.”

Congrats to Perplexity

Judging from the scores, it appears to be close between Perplexity and ChatGPT.

Copilot was shockingly bad, winning only creative writing while coming in last in 5 of 8 categories.

There was no bias in judging because the judges just rated the answers without knowing who provided them.

If I were to score this, I would award 4 points to the winner, 3 for second place, 2 for third, and 1 for fourth.

I would use speed only as a tiebreaker. Accuracy is far more important than speed.

Finally, I would subtract 2 points for terrible answers. Copilot had one in the food category. When asked to make a recipe that addresses many dietary restrictions, it gave an answer that included two sticks of butter and four large eggs.

Perplexity has 3 firsts, 1 second, 4 thirds and no fourths. It was last only in speed, my tiebreaker.

My ranking on that scoring system (I do not know what the WSJ did) is as follows.

Perplexity: 23
ChatGPT: 18 (22 if you award 4 for speed)
Gemini: 18 (21 if you award 3 for speed)
Claude: 15 (16 if you award 1 for speed)
Copilot: 9 (11 if you award 2 for speed), minus 2 for terrible answers
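
The arithmetic behind these totals can be sketched in a few lines of Python. This is only an illustration of the scoring rules described above, not the WSJ's method. The names `POINTS`, `PENALTY`, and `score` are made up for the example, and giving fifth place 0 points is an assumption, since the rules above only go down to fourth.

```python
# Points per placement as described above: 4, 3, 2, 1 for first through fourth.
# Assumption: fifth place earns 0 points (the text does not say).
POINTS = {1: 4, 2: 3, 3: 2, 4: 1, 5: 0}
PENALTY = 2  # points subtracted per terrible answer


def score(placements, terrible_answers=0):
    """Sum the points for a list of category placements, minus penalties."""
    return sum(POINTS[p] for p in placements) - PENALTY * terrible_answers


# Perplexity in the eight non-speed categories: 3 firsts, 1 second, 4 thirds.
perplexity = [1] * 3 + [2] + [3] * 4
print(score(perplexity))  # 23

# Speed only breaks ties. Totals and speed points are taken from the list above.
listed = {"ChatGPT": (18, 4), "Gemini": (18, 3)}
ranked = sorted(listed, key=lambda name: listed[name], reverse=True)
print(ranked)  # ['ChatGPT', 'Gemini']
```

Sorting on the pair (total, speed points) means speed matters only when the totals are equal, which is how ChatGPT edges Gemini at 18 apiece.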

For Me Personally

For me personally, it’s not even close. Consider the category on summarizing things from the web.

Even the premium Claude account wasn’t able to handle web links.

Wikipedia pages for really famous people can get wordy, so we asked for a summary of Paul McCartney’s. Some provided short blurbs with obvious Beatle factoids. Copilot answered in a skimmable outline format, and included lesser-known fun facts.

Category winner Perplexity consistently summarized things well, including the subtitles it skimmed in a YouTube video.

Scanning subtitles from a YouTube video is very impressive. In contrast to the rest, Perplexity cites sources with links and stays up to date.

I won’t use anything without a link.

Claude responded “I apologize, but I am not able to open URLs, links or videos,” making it useless for anything.

If you are into writing fiction and humor, you might wish to try Microsoft’s Copilot.

I have not tried any of them except a very early version of ChatGPT.

The AI Boom (or Do I Mean Bust)

Also consider Tech Workers Retool for Artificial-Intelligence Boom.

Tech workers are feverishly retooling their skill sets for a time when every company suddenly wants to be an artificial-intelligence company—and every worker feels the need for AI chops.

“I’ve been leading with an AI-tailored resume for the last two to three months,” says Asif Dhanani, 31 years old, of Irvine, Calif., who was laid off from his job as a technical product manager at Amazon in March.

Dhanani has landed plenty of interviews for AI product manager roles, but he hasn’t received any offers.

The tech labor market is in an unbalanced state. There is demand for a specific type of tier-one AI talent—namely those who have the technical knowledge or experience working with large language models, or LLMs, that fuel chatbots with the ability to generate content. There are companies seeking candidates with those skills, but not enough workers who are qualified to do them.

Then there is everyone else. Thousands of people have been laid off in the past few years, and many of those who remain employed are dealing with new management styles, reorganizations and microcuts, as more resources get shifted into AI. Those workers are now taking courses in AI, adding buzzwords to their résumés and competing in an increasingly crowded field.

Tony Phillips, co-founder of the Deep Atlas boot camp, says he has noticed a significant increase in the level of urgency that tech workers feel about the need to upskill. Deep Atlas recently added another five slots to their summer AI boot camp.

“People started to see the writing on the wall that their jobs really could be obsolete,” he says. “You’re probably not going to get replaced by AI. You’re going to be replaced by someone who knows AI and does your job.”

New tech job postings fell from an average of around 308,000 a month in 2019 to 180,000 a month as of April, according to the tech trade association CompTIA.

Large tech companies are trying to make their entire workforces more AI-proficient. Trailhead, Salesforce’s training platform, currently offers 43 AI-related courses ranging from fundamentals to ethical AI use. Over 60,000 Salesforce employees have taken at least one AI course.

“We believe that everyone should be reskilled and in some way have the tools they need to have to succeed in this new world,” says Jayesh Govindarajan, senior vice president of Salesforce AI.

Creative Destruction

A big creative destruction bust cycle is on the horizon. Many people are going to lose their jobs.

However, we tend to overestimate the speed at which things happen.

I am not sure where we are, and I doubt anyone else is either. But I do know we have serious supply issues and energy concerns over all the data centers we need to support AI.

Data Center Energy Consumption

According to the IEA, electricity consumption from data centres, artificial intelligence (AI) and the cryptocurrency sector could double by 2026. Then factor in the global push to EVs, which will add to electricity demand.