The sad tale of the Alpha massacre

Those old operating systems had awesome power – you had to be careful wielding it


Good morning and welcome, once again, to Who, Me? in which Register readers share tales of tech support moments they might prefer to forget. But forgetting is not a way to learn from mistakes, is it?

This week's hero is a veteran we'll Regomize as "Gandalf" for such is the grayness of his beard nowadays. Back in less gray years, though, Gandalf worked for a significant – but now defunct – database maker.

Primary development was carried out on SunOS, but the developer also maintained releases for HP-UX, AIX, Tru64, Siemens Nixdorf UNIX, SCO UNIX, Linux and Windows NT among other exotic operating systems. Gandalf was on the porting team, whose mission was to back-port releases and fixes to the relevant platform branch, run the full suite of tests, and prepare software for release.

Got that so far? Good.

To aid in this endeavor, the team had a set of Quality Assurance tools for each database version on each platform. To test, they would simply copy the relevant set of QA Tools from a shared directory and place it on the local filesystem. Then they would set the shell variable QATOOLS to point to the location of the QA Tools. For example:

set QATOOLS=/opt/qatools/

The /qatools/ directory contained the binaries, configuration files, log file locations and everything else needed to test that particular product suite: $QATOOLS/bin, $QATOOLS/etc, $QATOOLS/incl, $QATOOLS/var.

Of course it was important to ensure that they were using the right tools for the right version of the database on the right system, so each time before they began testing they needed to ensure the old version of the toolset had been cleared. This will become important in a minute.

One fine day, Gandalf and team were tasked with testing a new release of the Tru64 port on a DEC Alpha installation located remotely (in fact, it was in Menlo Park, California – a possibly unnecessary detail, but Gandalf included it so you may as well know). They logged in as root, and issued the command to clear out the past version of the tool kit:

rm -rf $QATOOLS/bin $QATOOLS/etc $QATOOLS/incl $QATOOLS/var

Shortly thereafter, Gandalf noticed that things stopped working. Most notably, the telnet connection (yes, we are well into the before times here) went dead and could not be revived.

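Investigation revealed a genuinely horrific error: before typing in the very powerful command above, they had failed to point the QATOOLS variable at the correct location. Or indeed at any location. With the variable empty, the shell expanded each $QATOOLS/bin-style path to a top-level directory, so the command root actually ran was rm -rf /bin /etc /incl /var.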

If you recall anything about Alpha, you'll know what that command did. As Gandalf put it: "The resulting carnage was as swift as the DEC Alpha was powerful." And he recalled one of the graybeards at the time telling him: "That machine was dead before the sweat of your brow hit the keyboard!"

In short, everything was gone.

Thankfully the sysadmins in Menlo Park were able to reconstruct their devastated system and ultimately there were no serious repercussions for Gandalf or his team. Just an important lesson learned – and a heck of a war story gathered for the retelling.
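
For readers who would rather absorb that lesson without sacrificing a machine, here is a minimal sketch of what went wrong and one common way to guard against it. The function names show_cleanup and safe_cleanup are hypothetical and not part of Gandalf's tooling, and both only echo the rm command rather than running it. The sketch assumes a POSIX-style sh; the story does not say which shell the team used.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only. Nothing here deletes files:
# the rm command is printed, never executed.

# Print the cleanup command exactly as the shell would expand it.
show_cleanup() {
    echo "rm -rf $QATOOLS/bin $QATOOLS/etc $QATOOLS/incl $QATOOLS/var"
}

# Guarded variant: refuse to build the command if QATOOLS is unset or empty.
safe_cleanup() {
    if [ -z "${QATOOLS:-}" ]; then
        echo "QATOOLS is not set, refusing to clean up" >&2
        return 1
    fi
    show_cleanup
}

# Simulate the incident: the variable points nowhere.
QATOOLS=""
show_cleanup                          # every path collapses to /bin, /etc, ...
safe_cleanup || echo "cleanup skipped"

# The intended situation: the variable points at the toolset.
QATOOLS=/opt/qatools
safe_cleanup
```

In POSIX shells the same check can be written more tersely as ${QATOOLS:?} inside the rm command itself, which makes the expansion fail when the variable is unset or empty. Either way, the point is that the check happens before rm ever sees a path.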

Who, Me? needs your war stories! Our mailbag is getting very low and dusty, so if you have a tale of tech support gone awry or lessons learned the hard way, please click here to send an email to Who, Me? so we can possibly share your adventures on some future Monday morn. ®
