24 March 2011

Exascale Computing

An IEEE Spectrum article identifies five problems that must be solved to reach exascale: the power consumption of main memory, bandwidth and latency to main memory, filling the gap between main memory and disks, resiliency in the face of billions of components, and programmability to coordinate billions of threads.

There are multiple approaches being pursued to manage the main memory problems:
* We are likely to need to stop building CPUs on one chip and memory on a second chip.  Instead, we will need to integrate a CPU and memory on a single chip.
* We need to rearchitect memory to save power.  Current architectures activate too much of a chip to return just a few bits.  We can activate smaller portions of the chip instead.
* In the old days, we explicitly paged data from slow remote memories into fast local memories.  Moving 4KB pages from main memory to local memory would let us transfer data more efficiently than transferring 128-byte cache lines.  Also, with cache lines, a write to memory first reads the cache line, even when the line will be completely overwritten.
* And there are a number of new materials we are looking at to improve main memory like memristors and carbon nanotubes.
With so many avenues of attack available, we are likely to solve this problem soon.
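The cache-line point above is easy to quantify with back-of-the-envelope arithmetic.  The sketch below assumes a write-allocate cache (each line is read from memory before being written back) and compares the resulting traffic against a single explicit 4KB page transfer; the numbers are illustrative, not measurements of any real machine.

```python
PAGE = 4096   # bytes moved in one explicit page transfer
LINE = 128    # bytes in one cache line

def traffic_cache_lines(bytes_written, line=LINE):
    """With a write-allocate cache, each line is first read from memory
    and then written back, even when the program overwrites it entirely."""
    lines = bytes_written // line
    return lines * line * 2   # one read plus one write per line

def traffic_explicit_page(bytes_written):
    """An explicit page transfer writes the data exactly once."""
    return bytes_written

print(traffic_cache_lines(PAGE))    # 8192 -- the traffic is doubled
print(traffic_explicit_page(PAGE))  # 4096
```

Under these assumptions, streaming a fully overwritten buffer through cache lines costs twice the memory traffic of an explicit page move.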

Kogge, the author of the article, quickly skips past his reasons for believing Flash memory cannot fill the gap between main memory and disk.  I find this technology very promising.  Flash uses much less power than disk, and it is ubiquitous in cameras, phones, and other mobile devices.  The rapid growth of mobile market share will ensure that Flash receives enough research funding to overtake far slower disk drives.

Resiliency and coordinating billions of threads both boil down to how we write code for exascale systems.  Google addresses both of these issues.  Disks and machines fail frequently, and individual processes are routinely kicked off an overloaded machine and restarted on another.  Software is designed to be fault tolerant, using mirroring, RAID, checkpointing, and retries, and to run correctly in the face of these events.
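The checkpoint-and-retry idea can be sketched in a few lines.  This toy assumes an in-memory checkpoint and a task that deterministically fails partway through its first two attempts; real systems would persist the checkpoint and restart the work on a different machine.

```python
def run_with_retries(task, total_steps, max_attempts=5):
    """Run `task` step by step, checkpointing progress so that a retry
    resumes where the failed attempt left off instead of starting over."""
    checkpoint = 0                      # last step known to be complete
    for _ in range(max_attempts):
        try:
            for step in range(checkpoint, total_steps):
                task(step)              # may raise if the "machine" fails
                checkpoint = step + 1   # persist progress (in memory here)
            return checkpoint
        except RuntimeError:
            continue                    # restart on another "machine"
    raise RuntimeError("task failed after %d attempts" % max_attempts)

failures = {"count": 0}

def flaky_task(step):
    # Deterministically fail the first two attempts at step 5, to show
    # that each retry resumes from the checkpoint rather than step 0.
    if failures["count"] < 2 and step == 5:
        failures["count"] += 1
        raise RuntimeError("machine died at step %d" % step)

print(run_with_retries(flaky_task, total_steps=10))  # 10
```

Steps 0-4 run once; the two failures at step 5 cost only the retry overhead, not a full restart.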
Google software coordinates millions of threads using a technique I like to call Distributed Object-Oriented Programming.  A client process generates a few hundred pieces of work and ships those requests out asynchronously to server processes.  The servers in turn split each piece of work they receive into tens to hundreds of operations and forward those to the next layer.  After just a few layers, millions of threads have been coordinated, with each individual component only having to worry about a manageable number of threads.
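The fan-out described above compounds quickly.  Here is a toy recursive sketch (the fanout of 100 and depth of 3 are assumptions chosen to match the "few layers" in the text): each component only ever deals with its own `fanout` children, yet the leaf count grows exponentially.

```python
def process(work, depth, fanout):
    """Recursively split `work` into `fanout` sub-requests per layer,
    and count the leaf operations ultimately executed."""
    if depth == 0:
        return 1    # a leaf operation runs here
    # Each layer forwards `fanout` sub-requests to the next layer.
    return sum(process(work, depth - 1, fanout) for _ in range(fanout))

# One client request, split 100 ways at each of 3 layers:
print(process("request", depth=3, fanout=100))  # 1000000
```

Three layers of hundred-way fan-out already coordinate a million leaf operations, while no single process tracks more than a hundred children.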

We have an existence proof that a peta-scale computer can run in a cubic foot with modest power requirements: the human brain.  That processor evolved; we will soon engineer an equivalently powerful computer.  The exascale computer requires far less density and is allowed to consume far more energy.  The exascale computer is a challenge that we will easily meet.

