01 December 2010

What's a Petaflop?

Wired magazine has a good description of a petaflop computer.

That description is slightly dated, and it covers a high-end custom machine.  A modern, cheaper computer would typically be built from motherboards with two sockets each.  Each socket would hold an 8-core (16-thread) processor with 4 channels of DDR3 RAM.  A motherboard thus supports 32 threads, up to 256GB of RAM, and up to 68GB/sec of memory bandwidth.
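
The per-motherboard figures can be reproduced with a little arithmetic. The sketch below is only one plausible reading: the text does not say which DDR3 speed grade or DIMM sizes are meant, so the DDR3-1066 transfer rate and the 32GB-per-channel capacity are assumptions chosen because they land on the quoted 68GB/sec and 256GB.

```python
# Figures stated in the text.
SOCKETS = 2
CORES_PER_SOCKET = 8
THREADS_PER_CORE = 2
CHANNELS_PER_SOCKET = 4

# Assumptions, NOT stated in the text:
TRANSFERS_PER_SEC = 1066.67e6   # DDR3-1066 speed grade (assumed)
BYTES_PER_TRANSFER = 8          # 64-bit wide memory channel
RAM_PER_CHANNEL_GB = 32         # e.g. two 16GB DIMMs per channel (assumed)

threads = SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE
channels = SOCKETS * CHANNELS_PER_SOCKET

# Peak theoretical bandwidth, in decimal GB/sec.
bandwidth_gb_s = channels * TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e9
ram_gb = channels * RAM_PER_CHANNEL_GB

print(f"threads per motherboard: {threads}")
print(f"peak memory bandwidth:   {bandwidth_gb_s:.1f} GB/sec")
print(f"maximum RAM:             {ram_gb} GB")
```

A faster speed grade or larger DIMMs would raise both numbers, so these are best treated as rough peak values rather than measurements.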

The motherboards are organized into refrigerator-sized cabinets, each containing about 40 motherboards (plus disk drives).  About 256 of these cabinets are connected by a high-speed network to form a computing cluster of roughly 10,000 motherboards.  This cluster has up to 2.5PB of RAM and 680TB/sec of main memory bandwidth.  There are only 20,000 processor sockets instead of the 180,000 in the Jaguar, but each socket has twice the cores, four times the threads, a 50% higher clock rate, and various tweaks that together provide comparable processing power.
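
The cluster totals follow directly from the per-motherboard numbers. A minimal sketch of that scaling, using only figures from the text; the function name `cluster_totals` is made up for illustration, and decimal units (1PB = 1,000,000GB) are assumed:

```python
# Per-motherboard figures from the text.
SOCKETS_PER_BOARD = 2
THREADS_PER_SOCKET = 16
RAM_PER_BOARD_GB = 256
BANDWIDTH_PER_BOARD_GB_S = 68

# Cluster layout from the text.
BOARDS_PER_CABINET = 40
CABINETS = 256


def cluster_totals(boards):
    """Return aggregate figures for a cluster of `boards` motherboards."""
    return {
        "sockets": boards * SOCKETS_PER_BOARD,
        "threads": boards * SOCKETS_PER_BOARD * THREADS_PER_SOCKET,
        "ram_pb": boards * RAM_PER_BOARD_GB / 1e6,              # GB -> PB
        "bandwidth_tb_s": boards * BANDWIDTH_PER_BOARD_GB_S / 1e3,  # GB/s -> TB/s
    }


# 40 boards x 256 cabinets is slightly more than the rounded 10,000.
exact_boards = BOARDS_PER_CABINET * CABINETS
totals = cluster_totals(10_000)

print(f"boards (exact product): {exact_boards}")
for name, value in totals.items():
    print(f"{name}: {value}")
```

With the rounded 10,000 boards this gives 20,000 sockets, 2.56PB of RAM (quoted above as 2.5PB) and 680TB/sec of memory bandwidth.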


Whether you can actually get a useful petaflop out of a machine like the Jaguar is questionable.  At that rate, each flop could access only about 4 bits of main memory.  Put another way, you would need to perform 8 operations for every 32-bit memory read.  Even this assumes that the memory bus can be saturated and that you aren't writing to main memory.
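
The relationship between memory bandwidth and bits per flop can be sketched as follows. The helper names are hypothetical, and the 680TB/sec input is the peak figure for the commodity cluster described earlier, not a measured Jaguar number; the text does not say which bandwidth the 4-bit figure was derived from.

```python
PETAFLOP = 1e15  # floating-point operations per second


def bits_per_flop(bandwidth_bytes_per_s, flops=PETAFLOP):
    """Bits of main memory each flop can touch at a given bandwidth."""
    return bandwidth_bytes_per_s * 8 / flops


def flops_per_word(bits_available_per_flop, word_bits=32):
    """Operations that must share one memory word of `word_bits` bits."""
    return word_bits / bits_available_per_flop


def bandwidth_needed(bits_per_op, flops=PETAFLOP):
    """Bytes/sec of memory traffic if every flop touches `bits_per_op` bits."""
    return bits_per_op / 8 * flops


# 4 bits per flop corresponds to 8 operations per 32-bit read.
print(flops_per_word(4))

# Bandwidth implied by 4 bits per flop at one petaflop, in TB/sec.
print(bandwidth_needed(4) / 1e12)

# Bits per flop if the 680TB/sec cluster ran at a full petaflop.
print(bits_per_flop(680e12))

# Bandwidth needed for one 32-bit read per flop, in PB/sec.
print(bandwidth_needed(32) / 1e15)
```

At 680TB/sec the budget works out to a little over 5 bits per flop, the same order as the 4 bits quoted above. Feeding every flop a fresh 32-bit word would take about 4PB/sec, several times what the cluster's memory can deliver at peak.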


A more typical application might want to read a 32-bit word from memory, operate on it, and write the result back to memory.  A write requires both a memory read (to load the cache line that is about to be written, even though its old contents are never used) and a memory write.  The memory bus is also difficult to saturate: various conflicts in the controller and on the bus limit the obtainable bandwidth.


In practice, then, a typical modern motherboard would operate at about 2.8G memflops (memory-accessing floating-point operations per second).  The entire cluster would operate at about 28T memflops.
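
One way to arrive at numbers of this size is sketched below. It counts three 32-bit transfers per operation (read the operand, read the destination for the write, write the result) and then applies a bus efficiency factor. The 50% efficiency is an assumption, picked only because it reproduces the 2.8G figure; the text does not state the factor actually used.

```python
BOARD_BANDWIDTH = 68e9   # bytes/sec per motherboard, from the text
WORD_BYTES = 4           # one 32-bit word
BOARDS = 10_000          # motherboards in the cluster, from the text

# ASSUMPTION: fraction of peak bandwidth obtainable in practice.
ASSUMED_EFFICIENCY = 0.5


def memflops(bandwidth, efficiency):
    """Memory-accessing flops/sec for a read-operate-write workload."""
    # operand read + read of the destination line + write-back
    bytes_per_op = 3 * WORD_BYTES
    return bandwidth * efficiency / bytes_per_op


peak_board = memflops(BOARD_BANDWIDTH, 1.0)
typical_board = memflops(BOARD_BANDWIDTH, ASSUMED_EFFICIENCY)
typical_cluster = typical_board * BOARDS

print(f"per board, saturated bus: {peak_board / 1e9:.1f}G memflops")
print(f"per board, assumed 50%:   {typical_board / 1e9:.1f}G memflops")
print(f"cluster, assumed 50%:     {typical_cluster / 1e12:.1f}T memflops")
```

Under these assumptions a board manages roughly 5.7G memflops with a saturated bus and about 2.8G at half efficiency, which scales to about 28T memflops for the whole cluster, well short of a petaflop.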