Finding Bugs In Supercomputers Just Became Easier

Thursday, November 15, 2012 22:21
(Before It's News)

Michael Harper for redOrbit.com – Your Universe Online

For many computer users, finding a bug or an error in their system is more than difficult; it can be a chore with no clear beginning or end. Personal computers of all types and platforms have become increasingly sophisticated in just the last five years, yet they cannot perform nearly as well as today’s supercomputers. These racks of processors and other components run so many calculations that their speed is measured in a special unit: the petaflop, or one quadrillion floating-point operations per second.

Finding and dealing with a “bug” in one of these systems makes the same process on a PC look like child’s play, as easy as picking out the one red circle in a pile of blue squares.

Now, the team at Lawrence Livermore National Laboratory (LLNL) says it has finally found a tool to help discover and diagnose bugs in its supercomputer. Called the Stack Trace Analysis Tool (STAT), this sophisticated piece of software is said to be lightweight, scalable and capable of finding bugs in an application while it is running as more than 1 million MPI (Message Passing Interface) processes.

STAT is now being used to find bugs on the IBM BlueGene/Q-based supercomputer Sequoia, which recently ranked number 2 in the Top 500 Supercomputers list.

Finding bugs while Sequoia was running fewer processes was difficult enough, but according to the research team for STAT, bugs became harder to find and manifested themselves in some very perplexing ways once the computer was scaled up to run more processes. During this scale-up, the LLNL team began to notice software defects as well as application failures. They turned to STAT and have since been able to diagnose and solve many of these issues.

“STAT has been indispensable in this capacity, helping the multi-disciplined integration team keep pace with the aggressive system scale-up schedule,” explained Greg Lee, computer scientist at LLNL.

“While testing a subsystem of BlueGene/Q, my test program consistently failed only when scaled to 1,179,648 MPI processes. Although the test program was simple, the sheer scale at which this program ran made debugging efforts highly challenging. But when I applied STAT, it quickly revealed that one particular rank process was consistently stuck in a system call,” said Dong Ahn, a computer scientist in Livermore Computing. Once STAT had pointed to the stuck process, a system expert looked at the specific core running it and found that the problem lay in the hardware itself. Ahn said replacing this piece of hardware solved the problem, and Sequoia was soon back up and churning through millions of processes once more.

“Putting this exercise into perspective, this error was due to a defect in a tiny hardware unit, the decrementor, of a single hardware thread out of a total of 4.7 million hardware threads. I felt it was like finding a needle in a haystack over a coffee break.”
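
To give a rough sense of why stack traces make this kind of search tractable, here is a minimal Python sketch of the general idea: collect one call stack per process, group the processes whose stacks are identical, and look at the smallest groups first. This is only an illustration and not STAT's actual implementation or interface. The function names, the frame names and the sample data are all hypothetical, and the grouping by whole, identical traces is a simplification chosen for brevity.

```python
from collections import defaultdict


def group_ranks_by_trace(traces):
    """Map each distinct call stack (as a tuple of frame names) to the ranks showing it."""
    groups = defaultdict(list)
    for rank, frames in traces.items():
        groups[tuple(frames)].append(rank)
    return dict(groups)


def smallest_groups(groups, limit=3):
    """Return the rarest call stacks first; the outliers are usually worth inspecting."""
    return sorted(groups.items(), key=lambda item: len(item[1]))[:limit]


# Hypothetical data: most ranks wait in a barrier, one rank sits in a system call.
traces = {rank: ["main", "run_test", "MPI_Barrier"] for rank in range(1000)}
traces[417] = ["main", "run_test", "write_results", "sys_write"]

groups = group_ranks_by_trace(traces)
for frames, ranks in smallest_groups(groups):
    print(f"{len(ranks):>5} rank(s): {' > '.join(frames)}")
```

With a million processes the raw traces are far too numerous to read one by one, but the number of distinct traces is typically small, so a lone process doing something different stands out at once.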

STAT will prove very helpful to the LLNL team. Having debugging software that is capable of keeping pace with the computer will allow the team to actively monitor their system while still using it to run through its calculations. STAT has also been used on other supercomputer platforms, such as Linux clusters and Cray systems. Cray is the platform used by Titan, the new number 1 supercomputer, at Oak Ridge National Laboratory.



