Google claims Big Sleep 'first' AI to spot freshly committed security bug that fuzzing missed

You snooze, you lose, er, win


Google claims one of its AI models is the first of its kind to spot a memory safety vulnerability in the wild – specifically an exploitable stack buffer underflow in SQLite – which was then fixed before the buggy code's official release.

The Chocolate Factory's LLM-based bug-hunting tool, dubbed Big Sleep, is a collaboration between Google's Project Zero and DeepMind. The software is said to be an evolution of the earlier Project Naptime, announced in June.

SQLite is an open source database engine, and the stack buffer underflow vulnerability could have allowed an attacker to cause a crash or perhaps even achieve arbitrary code execution. More specifically, the crash or code execution would happen in the SQLite executable (not the library) due to a magic value of -1 accidentally being used at one point as an array index. There is an assert() in the code to catch the use of -1 as an index, but in release builds, this debug-level check would be removed.
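To illustrate the general shape of the problem – this is a simplified, hypothetical C sketch, not SQLite's actual code – an assert() guarding a sentinel index of -1 is compiled out of release builds, so the negative index goes on to underflow the stack buffer:

    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical sketch, not SQLite's actual code: a sentinel value of -1
     * is guarded only by an assert(), which vanishes from builds compiled
     * with -DNDEBUG, letting the negative index read below the stack buffer. */
    static int lookup(int iCol) {
        int aBuf[8] = {0};
        assert(iCol >= 0);   /* debug-only check: stripped in release builds */
        return aBuf[iCol];   /* iCol == -1 underflows aBuf here */
    }

    int main(void) {
        printf("%d\n", lookup(-1));   /* undefined behavior without the assert */
        return 0;
    }

Build it with -DNDEBUG – as release builds typically are – and the only check standing between that -1 and the array access disappears.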

Thus, a miscreant could cause a crash or achieve code execution on a victim's machine by, perhaps, triggering that bad index bug with a maliciously crafted database shared with that user or through some SQL injection. Even the Googlers admit the flaw is non-trivial to exploit, so be aware that the severity of the hole is not really the news here – it's that the web giant believes its AI has scored a first.

We're told that fuzzing – feeding random and/or carefully crafted data into software to uncover exploitable bugs – didn't find the issue.
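Conceptually, a fuzz harness can be as simple as the C sketch below, which hammers a made-up parse_input() routine (a stand-in for the code under test) with generated inputs and relies on a crash to flag a bug – though production harnesses, such as the OSS-Fuzz ones that continuously pound SQLite, are coverage-guided rather than purely random:

    #include <stdint.h>
    #include <stdlib.h>

    /* Minimal fuzzing sketch around a hypothetical parse_input() target.
     * Real harnesses are coverage-guided; this just throws random bytes
     * at the target and relies on a crash (or a sanitizer) to flag a bug. */
    static void parse_input(const uint8_t *data, size_t len) {
        (void)data;
        (void)len;   /* stand-in for the code under test */
    }

    int main(void) {
        uint8_t buf[256];
        srand(1234);                              /* fixed seed, repeatable runs */
        for (int iter = 0; iter < 100000; iter++) {
            size_t len = (size_t)(rand() % sizeof(buf));
            for (size_t i = 0; i < len; i++)
                buf[i] = (uint8_t)(rand() & 0xff);
            parse_input(buf, len);                /* a crash here is a finding */
        }
        return 0;
    }

Generated inputs like these rarely stumble into the deeply structured SQL needed to reach a bug such as this one, which is precisely the gap Google says its LLM agent is meant to fill.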

The LLM, however, did. According to Google, this is the first time an AI agent has found a previously unknown exploitable memory-safety flaw in widely used real-world software. Big Sleep clocked the bug in early October after being told to go through a bunch of recent commits to the project's source code, and SQLite's developers fixed it the same day – before the flaw made it into an official release.

"We think that this work has tremendous defensive potential," the Big Sleep team crowed in a November 1 write-up. "Fuzzing has helped significantly, but we need an approach that can help defenders to find the bugs that are difficult (or impossible) to find by fuzzing, and we're hopeful that AI can narrow this gap." 

We should note that in October, Seattle-based Protect AI announced a free, open source tool that it claimed can find zero-day vulnerabilities in Python codebases with an assist from Anthropic's Claude AI model.

This tool is called Vulnhuntr and, according to its developers, it has found more than a dozen zero-day bugs in large, open source Python projects.

The two tools have different purposes, according to Google. "Our assertion in the blog post is that Big Sleep discovered the first unknown exploitable memory-safety issue in widely used real-world software," a Google spokesperson told The Register, with our emphasis added. "The Python LLM finds different types of bugs that aren't related to memory safety."

Big Sleep, which is still in the research stage, has thus far used small programs with known vulnerabilities to evaluate its bug-finding prowess. This was its first real-world experiment.

For the test, the team collected several recent commits to the SQLite repository. After manually removing trivial and documentation-only changes, "we then adjusted the prompt to provide the agent with both the commit message and a diff for the change, and asked the agent to review the current repository (at HEAD) for related issues that might not have been fixed," the team wrote.

The LLM, based on Gemini 1.5 Pro, ultimately found the bug, which was loosely related to changes in the seed commit [1976c3f7]. "This is not uncommon in manual variant analysis, understanding one bug in a codebase often leads a researcher to other problems," the Googlers explained.

In the write-up, the Big Sleep team also detailed the "highlights" of the steps that the agent took to evaluate the code, find the vulnerability, crash the system, and then produce a root-cause analysis.

"However, we want to reiterate that these are highly experimental results," they wrote. "The position of the Big Sleep team is that at present, it's likely that a target-specific fuzzer would be at least as effective (at finding vulnerabilities)." ®
