How Good Is ChatGPT at Coding, Really? (2024)

This article is part of our exclusive IEEE Journal Watch series in partnership with IEEE Xplore.

Programmers have spent decades writing code for AI models, and now, in a full-circle moment, AI is being used to write code. But how does an AI code generator compare to a human programmer?

A study published in the June issue of IEEE Transactions on Software Engineering evaluated the code produced by OpenAI’s ChatGPT in terms of functionality, complexity, and security. The results show that ChatGPT’s success at producing functional code varies widely, from as low as 0.66 percent to as high as 89 percent, depending on the difficulty of the task, the programming language, and a number of other factors.

While in some cases the AI generator could produce better code than humans, the analysis also reveals some security concerns with AI-generated code.

Yutian Tang is a lecturer at the University of Glasgow who was involved in the study. He notes that AI-based code generation could provide some advantages in terms of enhancing productivity and automating software development tasks—but it’s important to understand the strengths and limitations of these models.

“By conducting a comprehensive analysis, we can uncover potential issues and limitations that arise in the ChatGPT-based code generation... [and] improve generation techniques,” Tang explains.

To explore these limitations in more detail, his team sought to test GPT-3.5’s ability to address 728 coding problems from the LeetCode testing platform in five programming languages: C, C++, Java, JavaScript, and Python.

Overall, ChatGPT was fairly good at solving problems in the different programming languages, especially problems that existed on LeetCode before 2021. On those earlier problems, it produced functional code for easy, medium, and hard problems with success rates of about 89, 71, and 40 percent, respectively.

“However, when it comes to the algorithm problems after 2021, ChatGPT’s ability to generate functionally correct code is affected. It sometimes fails to understand the meaning of questions, even for easy level problems,” Tang notes.

For example, ChatGPT’s success rate on “easy” coding problems dropped from 89 percent to 52 percent for problems posted after 2021, and its success rate on “hard” problems dropped from 40 percent to 0.66 percent.
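To put those two drops on a common scale, the short Python sketch below computes both the percentage-point change and the relative change from the figures quoted above. The helper name `drop` is made up for illustration, and the calculation is ours rather than the study’s.

```python
def drop(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point drop, relative drop in percent)."""
    points = before - after
    relative = 100.0 * points / before
    return points, relative


# Success rates quoted in the article (percent), before and after 2021.
rates = [("easy", 89.0, 52.0), ("hard", 40.0, 0.66)]

for level, before, after in rates:
    points, relative = drop(before, after)
    print(f"{level}: {points:.2f} points lower, {relative:.1f}% relative drop")
```

By this arithmetic, the “easy” success rate falls by roughly 42 percent of its earlier value, while the “hard” success rate falls by roughly 98 percent.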

“A reasonable hypothesis for why ChatGPT can do better with algorithm problems before 2021 is that these problems are frequently seen in the training dataset,” Tang says.

Essentially, as coding evolves, ChatGPT has not yet been exposed to newer problems and solutions. It lacks the critical thinking skills of a human and is far more reliable on problems it has previously encountered. This could explain why it is so much better at addressing older coding problems than newer ones.

Interestingly, ChatGPT is able to generate code with smaller runtime and memory overheads than at least 50 percent of human solutions to the same LeetCode problems.

The researchers also explored the ability of ChatGPT to fix its own coding errors after receiving feedback from LeetCode. They randomly selected 50 coding scenarios where ChatGPT initially generated incorrect code because it didn’t understand the content or the problem at hand.

While ChatGPT was good at fixing compiling errors, it generally was not good at correcting its own mistakes.

“ChatGPT may generate incorrect code because it does not understand the meaning of algorithm problems, thus, this simple error feedback information is not enough,” Tang explains.

The researchers also found that ChatGPT-generated code had a fair number of vulnerabilities, such as a missing null test, but many of these were easily fixable. Their results also show that generated code in C was the most complex, followed by C++ and Python, which has a similar complexity to the human-written code.
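As an illustration of what a missing null test can look like and how small the fix often is, here is a minimal Python sketch, where the null value is `None`. It is not taken from the study; the `Node` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def list_length_unsafe(head: Node) -> int:
    # Hypothetical flawed version: reads head.next without first
    # checking whether head itself is None (an empty list).
    count = 1
    while head.next is not None:
        head = head.next
        count += 1
    return count


def list_length_safe(head: Optional[Node]) -> int:
    # Fixed version: the None case is tested before any attribute access.
    count = 0
    while head is not None:
        count += 1
        head = head.next
    return count
```

Both functions agree on non-empty lists, but only the second handles an empty list; the first raises `AttributeError` when passed `None`.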

Tang says that, based on these results, it’s important that developers using ChatGPT provide additional information to help it better understand problems and avoid vulnerabilities.

“For example, when encountering more complex programming problems, developers can provide relevant knowledge as much as possible, and tell ChatGPT in the prompt which potential vulnerabilities to be aware of,” Tang says.
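One way a developer might act on that advice is to assemble the prompt from the problem statement, any relevant background, and a list of vulnerabilities to watch for. The sketch below is only an assumption about how such a prompt could be structured; the function `build_prompt`, its parameters, and the wording of the template are hypothetical, and no particular chatbot API is implied.

```python
from typing import Sequence


def build_prompt(problem: str,
                 background: Sequence[str] = (),
                 vulnerabilities: Sequence[str] = ()) -> str:
    """Assemble a code-generation prompt (hypothetical template)."""
    parts = ["Solve the following programming problem.", "", problem.strip()]
    if background:
        # Extra domain knowledge the developer already has.
        parts += ["", "Relevant background:"]
        parts += [f"- {item}" for item in background]
    if vulnerabilities:
        # Explicit reminders about flaws to avoid in the generated code.
        parts += ["", "Avoid these potential vulnerabilities:"]
        parts += [f"- {item}" for item in vulnerabilities]
    return "\n".join(parts)


prompt = build_prompt(
    "Return the length of a singly linked list.",
    background=["The list may be empty."],
    vulnerabilities=["missing null test"],
)
print(prompt)
```

The resulting text would then be pasted or sent to the model in place of the bare problem statement.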
