Google reports halving code migration time with AI help

Chocolate Factory slurps own dogfood, sheds drudgery in specific areas


Google, which peddles AI software with as much giddy enthusiasm as Microsoft, reports dogfooding its own AI concoction and leaving the lab with a pleasant taste in its mouth.

In a pre-print paper, Google computer scientists Stoyan Nikolov, Daniele Codecasa, Anna Sjovall, Maxim Tabachnyk, Satish Chandra, Siddharth Taneja, and Celal Ziftci answer the question posed by the paper's title: "How is Google using AI for internal code migrations?"

Inquiring minds want to know, particularly after Amazon claimed it used its Q Developer AI coding assistant to save hundreds of millions by migrating Java 8 applications to Java 17.

The aforementioned Chocolate Factory software engineers attempt to satisfy said curiosity by recounting how they applied large language models (LLMs) – AI in common parlance – to accelerate the process of moving code from one environment to another.

"We see evidence that the use of LLMs can reduce the time needed for migrations significantly, and can reduce barriers to get started and complete migration programs," the authors observe in their paper.

Their focus is on bespoke AI tools developed for specific product areas, such as Ads, Search, Workspace and YouTube, instead of generic AI tools that provide broadly applicable services like code completion, code review, and question answering.

Google's code migrations involved: changing 32-bit IDs in the 500-plus-million-line codebase for Google Ads to 64-bit IDs; converting its old JUnit3 testing library to JUnit4; and replacing the Joda time library with Java's standard java.time package.

The int32 to int64 migration, the Googlers explain, was not trivial: the IDs were often generically defined (int32_t in C++ or Integer in Java) and so were not easily searchable, and they appeared in tens of thousands of code locations across thousands of files. Changes had to be tracked across multiple teams, and alterations to class interfaces had to be considered across multiple files.
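For a sense of what that looks like in practice, here is a hypothetical Java snippet, not drawn from Google's code: the generic Integer type is precisely what makes such fields hard to locate by search alone.

// Hypothetical illustration of the 32-bit to 64-bit ID widening; not Google's code.
class Campaign {
    // Before the migration the ID was held in a 32-bit box:
    //   private Integer id;
    // After, it is widened to 64 bits, and every caller, interface, and
    // serialized form that touches it has to follow suit:
    private Long id;

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }
}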

"The full effort, if done manually, was expected to require hundreds of software engineering years and complex crossteam coordination," the authors explain.

For their LLM-based workflow, Google's software engineers implemented the following process.

An engineer from Ads would identify an ID in need of migration using a combination of code search, Kythe (Google's code-indexing system), and custom scripts.

Then an LLM-based migration toolkit, triggered by someone knowledgeable in the art, was run to generate verified changes containing code that passed unit tests. Those changes would be manually checked by the same engineer and potentially corrected.
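In rough terms – and purely as a hypothetical sketch rather than Google's actual tooling – that generate-and-verify step can be framed as a loop in which the model proposes an edit and only test-passing candidates survive to human review. The MigrationModel, TestRunner, and MigrationLoop names below are invented for illustration.

// Hypothetical sketch of a generate-and-verify loop; interfaces are illustrative only.
interface MigrationModel { String proposeEdit(String file); }
interface TestRunner { boolean unitTestsPass(String editedFile); }

class MigrationLoop {
    private final MigrationModel model;
    private final TestRunner tests;

    MigrationLoop(MigrationModel model, TestRunner tests) {
        this.model = model;
        this.tests = tests;
    }

    // Returns a verified candidate change, or empty if no attempt passed the unit tests.
    java.util.Optional<String> migrate(String file, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            String candidate = model.proposeEdit(file);
            if (tests.unitTestsPass(candidate)) {
                return java.util.Optional.of(candidate); // handed to a human for review
            }
        }
        return java.util.Optional.empty();
    }
}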

Thereafter, the code changes would be sent to multiple reviewers who are responsible for the portion of the codebase affected by the changes.

The result was that 80 percent of the code modifications in the change lists (CLs) were purely the product of AI; the remainder were either human-authored or human-edited AI suggestions.

"We discovered that in most cases, the human needed to revert at least some changes the model made that were either incorrect or not necessary," the authors observe. "Given the complexity and sensitive nature of the modified code, effort has to be spent in carefully rolling out each change to users."

Based on this, Google undertook further work on LLM-driven verification to reduce the need for detailed review.

Even with the need to double-check the LLM's work, the authors estimate that the time required to complete the migration was reduced by 50 percent.

With LLM assistance, it took just three months to migrate 5,359 files and modify 149,000 lines of code to complete the JUnit3-JUnit4 transition. Approximately 87 percent of the code generated by AI ended up being committed with no changes.
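To give a flavor of what that transition entails – this before-and-after pair is illustrative, not taken from Google's codebase – JUnit3 relies on inheriting TestCase and method-naming conventions, while JUnit4 uses annotations and static asserts.

// JUnit3 style (before): extend TestCase and rely on method-name conventions.
//   import junit.framework.TestCase;
//   public class AccountTest extends TestCase {
//       protected void setUp() { account = new Account(); }
//       public void testBalanceStartsAtZero() { assertEquals(0, account.balance()); }
//   }

// JUnit4 style (after): a plain class with annotations and static asserts.
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class AccountTest {
    private Account account; // Account is a made-up class under test

    @Before
    public void setUp() { account = new Account(); }

    @Test
    public void balanceStartsAtZero() { assertEquals(0, account.balance()); }
}

class Account { int balance() { return 0; } }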

As for the switch from the Joda time library to Java's standard java.time package, the authors estimate a time saving of 89 percent compared to the projected manual change time, though no specifics were provided to support that assertion.
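Again purely as a hypothetical example rather than anything from the paper, a typical rewrite of this kind swaps Joda-Time's DateTime for the JDK's own types:

// Joda-Time (before):
//   import org.joda.time.DateTime;
//   DateTime deadline = DateTime.now().plusDays(30);

// java.time (after): the JDK's standard date-time API since Java 8.
import java.time.ZonedDateTime;

class DeadlineExample {
    static ZonedDateTime deadline() {
        return ZonedDateTime.now().plusDays(30);
    }
}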

"LLMs offer a significant opportunity for assisting, modernizing and updating large codebases," the authors conclude. "They come with a lot of flexibility, and thus, a variety of code transformation tasks can be framed in a similar workflow and achieve success. This approach has the potential to radically change the way code is maintained in large enterprises."

The Googlers also emphasize that LLMs should be viewed as complementary to traditional migration techniques that rely on Abstract Syntax Trees (ASTs) and grep-like searches. They note that additional tooling may be needed to prevent the human review process from becoming a bottleneck.

Another reason that LLMs should be used in conjunction with other tools is that they can be expensive – so it's best not to use them unnecessarily.

"Although the cost per token for predictions has steadily decreased, migrations often require touching thousands of files and the costs might quickly add up," the authors note.

Even so, there's no doubt AI has profoundly changed the way Google develops internal software. According to the paper, "the amount of characters in the code that are completed with AI-based assistance is now higher than manually typed by developers." ®
