Thursday, May 31, 2007

Just the facts, Man.

OK, I've been given official permission to blog about the new system. So here is the geek porn, live, ready, and waiting for your perusal:

The system will be called Ranger. Here are the specs:

Compute power
  • 529 Teraflops(!) aggregate peak
  • 3,936 Sun four-socket, quad-core nodes
  • 15,744 AMD Opteron “Barcelona” processors
  • Quad-core, four flops/cycle (dual pipelines)
  • 2 GB/core, 32 GB/node, 125 TB total
  • 132 GB/s aggregate bandwidth
Disk subsystem
  • 72 Sun x4500 “Thumper” I/O servers, 24 TB each
  • 1.7 Petabytes total raw storage
  • Aggregate bandwidth ~32 GB/sec
Infiniband interconnect
  • Full non-blocking 7-stage Clos fabric
  • Low latency (~2.3 μsec), high bandwidth (~950 MB/s)
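The numbers above hang together, and it's easy to check. A quick back-of-the-envelope in Python (the clock speed is my assumption - the post doesn't state one - everything else is straight from the list):

```python
# Sanity-checking the Ranger specs above. The 2.1 GHz clock is an
# ASSUMPTION (the post doesn't give a clock speed); all other numbers
# come straight from the spec list.

nodes = 3936
sockets_per_node = 4
cores_per_socket = 4
flops_per_cycle = 4          # dual pipelines, per the post
clock_hz = 2.1e9             # assumed Barcelona clock speed

cores = nodes * sockets_per_node * cores_per_socket
peak_tflops = cores * flops_per_cycle * clock_hz / 1e12
print(f"{cores} cores, {peak_tflops:.0f} TFLOPS peak")   # 62976 cores, 529 TFLOPS

memory_tb = nodes * 32 / 1000            # 32 GB/node, decimal TB
print(f"{memory_tb:.0f} TB memory")      # ~126 TB; the post says 125

storage_pb = 72 * 24 / 1000              # 72 Thumpers x 24 TB each
print(f"{storage_pb:.2f} PB raw storage")  # 1.73 PB -> "1.7 Petabytes"
```

So 3,936 nodes × 4 sockets × 4 cores is indeed 15,744 processors and 62,976 cores, and the rest of the headline numbers fall out the same way.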

The overall look and feel of Ranger from the user perspective will be very similar to our current Linux cluster (Lonestar).

Full Linux OS with hardware counter patches on login and compute nodes

Lustre File System
  • $HOME and multiple $WORK filesystems will be available
  • The largest $WORK will be ~1 PB total
Standard 3rd-party packages
Infiniband using the next generation of OpenFabrics
MVAPICH and OpenMPI (MPI-1 and MPI-2)

Now, having come from humble beginnings as an English major, I find this somewhat impressive. I've come pretty far from my first personal computer purchase 13 years ago - I used my wife's student loan money to buy it.

So, slobber away, punks! Biggest computer in the world, coming soon!

Thursday, May 17, 2007

been a while?


The contract was finally signed, and we have sent out a purchase order for the first $X worth of 'starter equipment', which means they will start shipping the machines that actually exist at this moment.

I haven't been blogging about this yet because nothing interesting has happened - just boring preparation for the upcoming deluge of work: figuring out the server management software, getting PXE-over-IB working, and running filesystem tests with Lustre and different RAID configurations. No picture-inducing stuff.

Friday, May 4, 2007

the power! THE POWER!!!!

So. We now have access to the power and water-chilling buildings. The following pictures are of the chillers. The large pipes labeled with green signs are the intake and return lines for the entire system. Those suckers are about 2 feet in diameter.

And here we have the power distribution units. These massive circuits each carry 4,000 amps to the machine room, for a total of 3 megawatts. Insane!

This set runs the in-row coolers and the air handlers:

And this set runs the rest of the cluster. Yowza!

Wednesday, May 2, 2007

new machine room is done!

I was finally allowed to go into this room without a safety helmet! I hated wearing those things. The construction crew has finished hooking up the 116 in-row coolers, and they are planning on doing the final cleanup next week.

Very soon, we will start getting the first pieces of equipment in, and we can start our mad dash towards getting everything running the way it should.

First up will be the filesystem: a lot of servers, each holding a large number of disks, all connected to two massive Infiniband networks and incorporated into two giant, very fast, low-latency, high-I/O filesystems. We will be using the Lustre distributed filesystem.

In the above photo, you can see the view from our loading dock - which actually has massive doors that you can fit massive things through! What a neat thought - a building that was designed from the ground up to be a machine room, not the other way around. It really got old rolling racks up the only handicap ramp in the building in our old machine room.
Here, you can see the actual loading dock, which a real machine room should have. A truck should only have to back up, then roll the racks/equipment straight onto the machine room floor. Hell Yes.

more of what I do

So, I am rewiring 4 of the 8 infiniband switches on our "old" cluster this week:
This is just half of the entire network, which connects over 4,000 nodes to each other over 8 different paths, so that at any time, every node has 8 routes available for communication.

Above is a finished rack with two switches in it.
And this shot above shows the rack I am currently working on. We had to re-wire them because the subnet manager couldn't handle the way the cables were placed, and latency was really suffering. It turns out that if you order the cables *exactly* the opposite of the way we originally did, the entire system runs almost 2x as fast.
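The "8 places to go" idea above can be sketched in a few lines of Python. This is purely illustrative - on the real fabric, route assignment is the subnet manager's job, not application code, and the hash function here is made up for the example - but it shows the counting argument: if you spread node pairs evenly across 8 independent paths, no single rail becomes a hotspot.

```python
# Toy illustration of the 8-rail idea: every pair of nodes can reach each
# other over 8 independent switch paths. A simple deterministic hash
# (hypothetical - the real routing lives in the subnet manager) spreads
# pairs roughly evenly across the rails.
from collections import Counter

NUM_RAILS = 8

def pick_rail(src: int, dst: int) -> int:
    """Map a (src, dst) node pair onto one of the 8 rails."""
    return (src + dst) % NUM_RAILS

# Count how many of the pairs among 100 nodes land on each rail:
load = Counter(pick_rail(s, d) for s in range(100)
                               for d in range(100) if s != d)
for rail in sorted(load):
    print(f"rail {rail}: {load[rail]} pairs")  # each rail gets ~1/8 of the pairs
```

The flip side, which is what bit us here, is that a bad cable ordering can collapse many pairs onto the same path, and then you pay for it in latency.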