What's a Petaflop?
Wired magazine has a good description of a petaflop computer.
That description is slightly dated, and the machine it describes is a high-end custom design. A modern, cheaper computer would typically be built from two-socket motherboards. Each socket would hold an 8-core (16-thread) processor with 4 channels of DDR3 RAM. A motherboard thus supports 32 threads, up to 256GB of RAM, and up to 68GB/sec of memory bandwidth.
The motherboards are organized into refrigerator-sized cabinets, each holding about 40 motherboards (plus disk drives). About 256 of these cabinets are connected by a high-speed network to form a computing cluster of roughly 10,000 motherboards. This cluster has up to 2.5PB of RAM and 680TB/sec of main memory bandwidth. There are only 20,000 processor sockets instead of the 180,000 in the Jaguar, but each socket has twice the cores, four times the threads, a 50% higher clock rate, and various tweaks that together provide comparable processing power.
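The cluster totals above follow from the per-motherboard figures by simple multiplication. The sketch below redoes that arithmetic; the constant and function names are illustrative, and it assumes decimal units (1PB = 1,000,000GB). Note that 40 x 256 is 10,240 boards, which the text rounds to 10,000.

```python
# Per-motherboard figures taken from the text.
SOCKETS_PER_BOARD = 2
CORES_PER_SOCKET = 8
THREADS_PER_CORE = 2
RAM_PER_BOARD_GB = 256
BANDWIDTH_PER_BOARD_GBS = 68

BOARDS_PER_CABINET = 40
CABINETS = 256


def cluster_totals(boards):
    """Scale the per-board figures up to a cluster of `boards` motherboards."""
    threads_per_board = SOCKETS_PER_BOARD * CORES_PER_SOCKET * THREADS_PER_CORE
    return {
        "sockets": boards * SOCKETS_PER_BOARD,
        "threads": boards * threads_per_board,
        "ram_pb": boards * RAM_PER_BOARD_GB / 1e6,            # GB -> PB (decimal)
        "bandwidth_tbs": boards * BANDWIDTH_PER_BOARD_GBS / 1e3,  # GB/s -> TB/s
    }


exact_boards = BOARDS_PER_CABINET * CABINETS  # board count before rounding
rounded = cluster_totals(10_000)              # the text's rounded board count

print("boards (exact):", exact_boards)
for name, value in rounded.items():
    print(name, value)
```

With the rounded count of 10,000 boards this gives 20,000 sockets, 2.56PB of RAM (quoted above as 2.5PB) and 680TB/sec of bandwidth.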
Whether you can actually get a useful petaflop out of a machine like the Jaguar is questionable. At that rate, each flop could access only about 4 bits of main memory. Put another way, you would have to perform 8 operations for every 32-bit memory read. Even that assumes the memory bus can be saturated and that you aren't writing to main memory.
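The relationship between memory bandwidth and flop rate can be sketched as follows. The helper names are illustrative. The Jaguar's memory bandwidth is not given in the text, so the first figure is only what the quoted 4 bits per flop would imply at one petaflop. The second calculation applies the same formula to the 680TB/sec commodity cluster, assuming it also ran at one petaflop.

```python
PETAFLOP = 1e15  # floating point operations per second


def bits_per_flop(bandwidth_bytes_per_s, flops_per_s):
    """Bits of main memory traffic available to each floating point operation."""
    return bandwidth_bytes_per_s * 8 / flops_per_s


def flops_per_word(bits_available_per_flop, word_bits=32):
    """Operations that must be done per word read to keep the flop rate up."""
    return word_bits / bits_available_per_flop


# Inferred, not stated in the text: bandwidth implied by 4 bits per flop.
implied_bandwidth = 4 * PETAFLOP / 8   # bytes per second
jaguar_ops = flops_per_word(4)         # operations per 32-bit read

# The commodity cluster, assuming it also reached one petaflop.
cluster_bits = bits_per_flop(680e12, PETAFLOP)
cluster_ops = flops_per_word(cluster_bits)

print(implied_bandwidth, jaguar_ops)
print(cluster_bits, cluster_ops)
```

Under these assumptions the commodity cluster has a little over 5 bits of memory traffic per flop, so it would still need roughly 6 operations per 32-bit read.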
A more typical application might want to read a 32-bit word from memory, operate on it, and write the result back to memory. A write requires both a memory read (to load the cache line that will be written to, even though it is never read) and a memory write. It is also difficult to saturate the memory bus: various conflicts in the controller and on the bus limit the obtainable bandwidth.
In practice, then, a typical modern motherboard would sustain about 2.8G memflops (memory-accessing floating-point operations per second), and the entire cluster about 28T memflops.
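One way to arrive at figures of this size is sketched below. It assumes each operation moves three 32-bit words over the memory bus (the read, the read that precedes the write, and the write itself). The bus efficiency is not stated in the text; the value computed here is simply what the quoted 2.8G figure would imply under that assumption.

```python
WORD_BYTES = 4            # one 32-bit word
BOARD_BANDWIDTH = 68e9    # bytes per second per motherboard, from the text
BOARDS = 10_000           # motherboards in the cluster, from the text

# Assumed traffic per operation: read operand + read for the write + write.
bytes_per_op = 3 * WORD_BYTES

# Upper bound if the memory bus could be fully saturated.
peak_memflops = BOARD_BANDWIDTH / bytes_per_op

quoted_memflops = 2.8e9   # per-motherboard figure quoted in the text
implied_efficiency = quoted_memflops / peak_memflops  # inferred, not from the text

cluster_memflops = quoted_memflops * BOARDS

print(f"peak per board:     {peak_memflops / 1e9:.2f}G memflops")
print(f"implied efficiency: {implied_efficiency:.0%}")
print(f"cluster:            {cluster_memflops / 1e12:.0f}T memflops")
```

Under these assumptions the saturated-bus limit is about 5.7G memflops per motherboard, so the 2.8G figure corresponds to the bus running at roughly half of its peak bandwidth.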