Putting questions into Google yields links to various articles and Reddit threads with people giving their research and opinions.
ChatGPT does that synthesis for you.
There are different tiers to ChatGPT.
The one available for free, version 3.5, uses 175 billion parameters.
Think of parameters as the internal settings a model learns during training, the pieces of information it draws on to generate text. Broadly speaking, the more parameters a model has, the better it can understand language and produce nuanced sentences.
For this review, I tested the free version.
How CNET tests AI chatbots
CNET takes a practical approach to reviewing AI chatbots.
Our goal is to determine how good each chatbot is relative to the competition and which purposes it serves best.
See our page on how we test AI for more.
For more information, see OpenAI’s privacy policy.
Shopping
It’s hard to recommend ChatGPT 3.5 as a shopping aid.
Because its training data is only up until September 2021, it lacks information about newly released products.
Recipes
Searching for recipes on Google can be a slog. An AI chatbot, by contrast, can generate recipes tailored to a specific request, the kinds of recipes that might not be as easily available via a Google search.
At the same time, ChatGPT 3.5’s recipe generation lacked detail: its version of the recipe I requested was noticeably barebones compared with the competition.
When I asked Gemini the same question, it included ingredients like Kashmiri chili powder and amchur.
However, no AI chatbot excelled at this test.
It’s also handy when a generative AI can cite the sources it’s referencing.
However, ChatGPT 3.5 doesn’t source much at all.
The inability to easily cross-reference sources makes ChatGPT’s real-world usefulness questionable.
Sure, among friends, you may cite ChatGPT and get away with it.
It seems that OpenAI has tweaked ChatGPT to avoid pointing to specific papers or sources when asked, likely because in the past it would make up papers that didn’t exist.
Copilot, in creative mode, also performed similarly to Claude, finding the nuances in a complex topic.
And Perplexity, while it did a decent job, worked in sources that weren’t academically reliable.
Summarizing
ChatGPT 3.5 certainly shows its limits when asked to summarize an article.
I pasted the entire article into 3.5 and the summary it yielded was lacking.
It picked up the background information and mentioned the main thesis, but failed to bring the point home.
It also abruptly stopped its summary, ending midway through a sentence.
Perplexity and Claude also failed to capture the full scope of my article.
Travel
Looking up travel ideas for major cities like Los Angeles or Tokyo isn’t hard.
What about Columbus, Ohio, though?
ChatGPT recommended places to see as well as restaurants to visit.
And unlike Google Gemini, all the restaurants it recommended were actually real.
Why Gemini was more prone to hallucinations than ChatGPT in this test isn’t clear.
But it does point to how much tuning OpenAI has done to ensure information remains accurate.
Generally, people like travel plans that don’t repeat locations.
Perplexity made vague recommendations, while Claude performed well but made one error.
Gemini hallucinated the most, making up the names of restaurants that didn’t exist.
Writing emails
ChatGPT does well in writing basic emails.
Still, it’ll require some tweaking to sound believably human.
Even when asked to tone down the formality, it can still come across as robotic.
Comparatively, Gemini wrote emails well and was easy to tune to make it sound more casual and humanlike.
Claude performed the best, crafting sentences with great nuance and believability.
Copilot had no problems writing basic emails, but it refused to answer prompts about more controversial topics.
That’s not to say ChatGPT 3.5 should be treated as a be-all, end-all solution.