April 18, 2025

Microsoft study claims AI is still struggling to debug software


  • AI promises a huge revolution for developers, but is it just for code creation?
  • Popular AI models from Anthropic and OpenAI aren’t great at debugging
  • Microsoft’s researchers are open-sourcing their tools to facilitate research

Although generative AI is increasingly being integrated into programming workflows, new research from Microsoft reveals that large language models still aren’t quite up to scratch when it comes to debugging.

The research suggests that even advanced models still struggle with debugging tasks that experienced developers find straightforward, highlighting the continued importance of human programmers.

AI does appear to have a solid use case, though, with Google now claiming that around 25% of new code is AI-generated. Meta has also noted the wide deployment of AI for coding.

AI is good for code creation, but not for debugging

The report explores how 11 Microsoft researchers tested nine AI models on SWE-bench Lite – a popular debugging benchmark. Claude 3.7 Sonnet offered the highest success rate at a far-from-perfect 48.4%. OpenAI’s o1 and o3-mini posted lower success rates of 30.2% and 22.1% respectively.

“Even with debugging tools, our simple prompt-based agent rarely solves more than half of the SWE-bench Lite issues,” the researchers wrote, blaming the suboptimal performance on a lack of data representing sequential decision-making behavior.

All hope is not lost, though. “We believe that training or fine-tuning LLMs can enhance their interactive debugging abilities,” they added. The researchers intend to fine-tune an info-seeking model specialized in gathering the necessary information to resolve bugs, but in the meantime, they promise to open-source debug-gym to make it easier for others to conduct similar research.

Debug-gym is described as an “environment that allows code-repairing agents to access tools for active information-seeking behavior.”

However, for now, artificial intelligence might not be bringing as much value to developers’ lives as AI companies suggest. “Most developers spend the majority of their time debugging code,” the researchers wrote, indicating that even if they are benefitting from code generation, it might not be saving them that much time.


This article is written by: Nermeen Nabil Khear Abdelmalak

All rights reserved to: USAGoldMines. www.usagoldmines.com

