Updated on: September 27, 2024 at 2:27 AM PDT

The best AI for coding, and a bunch that failed miserably

I've been subjecting AI chatbots to a set of real-world programming tests. Which chatbots handled the challenge and which crawled home in shame? Read on.

Written by David Gewirtz, Senior Contributing Editor

ChatGPT Plus | Best overall AI chatbot for coding

Perplexity Pro | Best AI chatbot for LLM testing

ChatGPT Free | Best free AI chatbot for coding

Perplexity Free | Best free AI chatbot for coding and research

I've been around technology for long enough that very little excites me, and even less surprises me. But shortly after OpenAI's ChatGPT was released, I asked it to write a WordPress plugin for my wife's e-commerce site. When it did, and the plugin worked, I was indeed surprised.

That was the beginning of my deep exploration into chatbots and AI-assisted programming. Since then, I've subjected 10 large language models (LLMs) to four real-world tests.

Unfortunately, not all chatbots can code alike. It's been 18 months since that first test, and even now, five of the 10 LLMs I tested can't create working plugins.

In this article, I'll show you how each LLM performed against my tests. There are two chatbots I recommend you use, but they cost $20/month. The free versions of the same chatbots do well enough that you could probably get by without paying. But the rest, whether free or paid, are not so great. I won't risk my programming projects with them or recommend that you do until their performance improves.

I've written a lot about using AIs to help with programming. Unless it's a small, simple project, like my wife's plugin, AIs can't write entire apps or programs. But they excel at writing a few lines and are not bad at fixing code.

Rather than repeat everything I've written, go ahead and read this article: How to use ChatGPT to write code: What it can and can't do for you.

If you want to understand my coding tests, why I've chosen them, and why they're relevant to this review of the 10 LLMs, read this article: How I test an AI chatbot's coding ability - and you can too.

Let's start with a comparative look at how the chatbots performed:

Next, let's look at each chatbot individually. I'll discuss nine chatbots, even though the above chart shows 10 LLMs. The results for GPT-4 and GPT-4o are both included in ChatGPT Plus. Ready? Let's go.

ChatGPT Plus | Best overall AI chatbot for coding

Price: $20/mo
LLM: GPT-4o, GPT-4, GPT-3.5
Desktop browser interface: Yes
Dedicated Mac app: Yes
Dedicated Windows app: No
Multi-factor authentication: Yes
Tests passed: 4 of 4

ChatGPT Plus with GPT-4 and GPT-4o passed all my tests. One of my favorite features is the availability of a dedicated app. When I test web programming, I have my browser set on one thing, my IDE open, and the ChatGPT Mac app running on a separate screen.

In addition, Logitech's Prompt Builder, which pops up at the press of a mouse button, can be set up to use the upgraded GPT-4o and connect to your OpenAI account. That makes running a prompt a simple thumb-tap, which is very convenient.

The only thing I didn't like was that one of my GPT-4o tests resulted in a dual-choice answer, and one of those answers was wrong. I'd rather it just gave me the correct answer. Even so, a quick test confirmed which answer would work. But that issue was a bit annoying. I didn't have that issue in GPT-4, so for now, that's the LLM setting I use with ChatGPT when coding.

Pros

Passed all tests
Solid coding results
Mac app

Cons

Hallucinations
No Windows app yet
Sometimes uncooperative

Perplexity Pro | Best AI chatbot for LLM testing

Price: $20/mo
LLM: GPT-4o, Claude 3.5 Sonnet, Sonar Large, Claude 3 Opus, Llama 3.1 405B
Desktop browser interface: Yes
Dedicated Mac app: No
Dedicated Windows app: No
Multi-factor authentication: No
Tests passed: 4 of 4

I seriously considered listing Perplexity Pro as the best overall AI chatbot for coding, but one failing kept it out of the top slot: how you log in. Perplexity doesn't use a username/password or passkey, and it doesn't have multi-factor authentication. All the tool does is email you a login PIN. The AI also doesn't have a separate desktop app, as ChatGPT does for Macs.

What sets Perplexity apart from other tools is that it can run multiple LLMs. While you can't set an LLM for a given session, you can easily go into the settings and choose the active model.

For programming, you'll probably want to stick to GPT-4o, because that aced all our tests. But it might be interesting to cross-check code across the different LLMs. For example, if you have GPT-4o write some regular expression code, you might consider switching to a different LLM to see what that LLM thinks of the generated code.

As we'll see below, most LLMs are unreliable, so don't take the results as gospel. However, you can use the results to give you more things to check in your original code. It's sort of like an AI-driven code review.
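Whichever LLM wrote or reviewed the code, a few hand-written test cases settle the question faster than a second opinion. Here's a minimal sketch of that idea in Python. It is not one of my actual tests: the pattern, the test cases, and the check_pattern helper are all hypothetical stand-ins for whatever your chatbot generated.

```python
import re

# Hypothetical pattern, standing in for one a chatbot might generate:
# a dollar amount with optional cents, such as "12" or "12.50".
CANDIDATE_PATTERN = r"\d+(\.\d{1,2})?"

# Hand-written inputs paired with the result you expect (illustrative only).
CASES = {
    "12": True,
    "12.5": True,
    "12.50": True,
    "12.505": False,
    "abc": False,
    "": False,
    ".50": False,
}


def check_pattern(pattern, cases):
    """Return the inputs whose match result differs from the expected one."""
    compiled = re.compile(pattern)
    failures = []
    for text, expected in cases.items():
        # fullmatch requires the whole string to match, not just a prefix.
        matched = compiled.fullmatch(text) is not None
        if matched != expected:
            failures.append(text)
    return failures


if __name__ == "__main__":
    failures = check_pattern(CANDIDATE_PATTERN, CASES)
    if failures:
        print(f"unexpected results for: {failures}")
    else:
        print("all cases pass")
```

If a second LLM suggests a "fix" to the pattern, run the suggestion through the same cases before you accept it.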

Just don't forget to switch back to GPT-4o.

Pros

Multiple LLMs
Search criteria displayed
Good sourcing

Cons

Email-only login
No desktop app

ChatGPT Free | Best free AI chatbot for coding

Price: Free
LLM: GPT-4o, GPT-3.5
Desktop browser interface: Yes
Dedicated Mac app: Yes
Dedicated Windows app: No
Multi-factor authentication: Yes
Tests passed: 3 of 4 in GPT-3.5 mode

ChatGPT is available to anyone for free. While both the Plus and free versions support GPT-4o, which passed all my programming tests, there are limitations when using the free app.

OpenAI treats free ChatGPT users as if they're in the cheap seats. If traffic is high or the servers are busy, ChatGPT will only make GPT-3.5 available to free users. The tool will also only allow you a certain number of queries before it downgrades you or shuts you off.

I've had several occasions when the free version of ChatGPT effectively told me I'd asked too many questions.

ChatGPT is a great tool, as long as you don't mind getting shut down sometimes. Even GPT-3.5 did better on the tests than all the other chatbots, and the test it failed was for a fairly obscure programming tool produced by a lone programmer in Australia.

So, if budget is important to you and you can wait when cut off, go for ChatGPT free.

Pros

Free
Passed most tests

Cons

Prompt throttling
Could cut you off in the middle of whatever you're working on

Perplexity Free | Best free AI chatbot for coding and research

Price: Free
LLM: GPT-3.5
Desktop browser interface: Yes
Dedicated Mac app: No
Dedicated Windows app: No
Multi-factor authentication: No
Tests passed: 3 of 4

I'm threading a pretty fine needle here, but because Perplexity AI's free version is based on GPT-3.5, its test results were measurably better than those of the other AI chatbots.

From a programming perspective, that's pretty much the whole story. But from a research and organization perspective, my ZDNET colleague Steven Vaughan-Nichols prefers Perplexity over the other AIs.

He likes how Perplexity provides more complete sources for research questions, cites its sources, organizes the replies, and offers questions for further searches.

So if you're programming, but also doing other research, consider the free version of Perplexity.

Pros

Free
Passed most tests
Range of research tools

Cons

Limited to GPT-3.5
Throttles prompt results

Chatbots to avoid for programming help

I tested nine chatbots, and four passed most of my tests. The other chatbots, including a few pitched as great for programming, each passed only one of my tests -- and Microsoft's Copilot didn't pass any.

I'm mentioning them here because people will ask, and I did test them thoroughly. Some bots do just fine for other work, so I'll point you to their general reviews if you're just curious about how they function.

Meta AI

Meta AI is Facebook's general-purpose AI. As you can see above, it failed three of our four tests.

The AI did generate a nice user interface but with zero functionality. And it did find my annoying bug, which is a fairly serious challenge. Given the specific knowledge required to find the bug, I was surprised it choked on a simple regular expression challenge. But it did.

Meta Code Llama

Meta Code Llama is Facebook's AI designed specifically for coding help. It's something you can download and install on your server. I tested it running on a Hugging Face AI instance.

Weirdly, even though both Meta AI and Meta Code Llama choked on three of four of my tests, they choked on different problems. AIs can't be counted on to give the same answer twice, but this result was a surprise. We'll see if that changes over time.

Claude 3.5 Sonnet

Anthropic claims the 3.5 Sonnet version of its Claude AI chatbot is ideal for programming. After it failed all but one of my tests, I'm not so sure.

If you're not using it for programming, Claude may be a better choice than the free version of ChatGPT.

My ZDNET colleague Maria Diaz reports that Claude can handle uploaded files, process more words than the free version of ChatGPT, provide information roughly a year more current than GPT-3.5, and access websites.

Gemini Advanced

Gemini Advanced is Google's $20 pro version of its Gemini (formerly Bard) chatbot. I expected the tool to do better than one out of four. Interestingly, it passed the one test that every AI other than GPT-4/4o failed -- knowledge of that fairly obscure programming language produced by one programmer in Australia.

So, if it knew that language, why couldn't it handle basic regular expressions or other first-year programming student problems?

Microsoft Copilot

You'd think the company with the "Developers! Developers! Developers!" mantra in its DNA would have an AI that does better on the programming tests. Microsoft produces some of the best coding tools on the planet. And yet, Copilot did badly.

The one positive thing is that Microsoft always learns from its mistakes. So, I'll check back later and see if this result improves.

But I like [insert name here]. Does this mean I have to use a different chatbot?

Probably not. I've limited my tests to day-to-day programming tasks. None of the bots has been asked to talk like a pirate, write prose, or draw a picture. In the same way we use different productivity tools to accomplish specific tasks, feel free to choose the AI that helps you complete the task at hand.

The only issue is if you're on a budget and are paying for a pro version. Then, find the AI that does most of what you want, so you don't have to pay for too many AI add-ons.

It's only a matter of time

The results of my tests were fairly surprising, especially given the big investments of Microsoft and Google. But this area of innovation is improving at warp speed, so we'll be back with updated tests and results over time. Stay tuned.

Have you used any of these AI chatbots for programming? What has your experience been? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.
