28 March 2011

The Singularity -- Is Strong AI enough?

Kurzweil suggests that a computer with super-human intelligence is enough to lead us to a technological and economic singularity.  However, we humans have lots of experience with super-human intelligence.  Get two or more people together to plan and execute an operation, and you're dealing with super-human intelligence.  When NASA put a man on the moon, it was done by a team of 400,000 or so humans working together.

So, say we build an exaflops computer and ask it to design the next generation of computer.  Well, that exaflops computer has somewhere around 1,000 to 10,000 times as much processing power as a single human -- roughly the collective processing power of the engineering team Intel already puts on a design.  So it isn't going to design a much better computer than Intel is currently capable of designing.
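
To make that ratio concrete, here's a back-of-envelope sketch.  The brain figures are rough assumptions behind the 1,000x to 10,000x claim, not measured values:

```python
# Back-of-envelope: how many "human brains" is an exaflops machine?
# The brain numbers are rough assumptions, not measured values.
EXA_OPS = 1e18                              # ops/sec for an exaflops machine
BRAIN_OPS_LOW, BRAIN_OPS_HIGH = 1e14, 1e15  # assumed ops/sec for one human brain

print(f"{EXA_OPS / BRAIN_OPS_HIGH:,.0f} to {EXA_OPS / BRAIN_OPS_LOW:,.0f} brain-equivalents")
# prints: 1,000 to 10,000 brain-equivalents
```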

Suppose Intel buys an exaflops computer for each of the humans it hires, raising the available processing power at Intel by a factor of about 1,000.  Can Intel now design in one day what would normally take 3 years?  In _The Mythical Man-Month_, Brooks strongly suggests that adding people to a project increases the communication overhead to such an extent that the extra people are not worthwhile.
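
To give Brooks' point a rough shape, here's a toy model of my own (not a formula from the book): each worker adds capacity, but every pair of workers needs a communication channel, so coordination cost grows roughly with the square of the team size.

```python
# Toy model of Brooks' law (my own illustrative assumptions, not from the book):
# each worker contributes 1 unit of work but loses `cost` of it for every
# communication channel they participate in (n - 1 channels per worker).
def effective_output(n, cost=0.0005):
    per_worker = max(0.0, 1.0 - cost * (n - 1))
    return n * per_worker

for n in (10, 100, 1_000, 2_000, 4_000):
    print(f"{n:>5} workers -> {effective_output(n):7.1f} units of useful work")
# Output rises until roughly n = 1/(2*cost) = 1,000 workers and then collapses:
# past that point, adding people makes the project slower, not faster.
```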

So the key isn't that we are building boxes with far more processing power than a brain; it's that we are building high-bandwidth digital communications between and within those boxes.

Of course, breaking up a complex design task into numerous small pieces is hard, and getting the equivalent of 1,000,000 human brains to produce a new chip design in a day might require more parallelism than we can obtain.
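
Amdahl's law gives one way to put a rough number on that limit.  Assuming, purely for illustration, that 0.1% of the design work is inherently serial:

```python
# Amdahl's law: speedup from N parallel workers when a fraction `serial`
# of the job cannot be parallelized. The 0.001 serial fraction is an
# assumption chosen for illustration, not a measured property of chip design.
def amdahl_speedup(n_workers, serial=0.001):
    return 1.0 / (serial + (1.0 - serial) / n_workers)

for n in (1_000, 10_000, 1_000_000):
    print(f"{n:>9} workers -> {amdahl_speedup(n):7.1f}x speedup")
# Even with 1,000,000 brain-equivalents the speedup tops out near 1/0.001 = 1,000x,
# which is roughly what it takes to turn a 3-year design into a 1-day design.
```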

Another constraint may arise from needing to interact with the physical world.  It's not just designing a chip; it's waiting for the chip manufacturing machines to be designed and built.  Prototype chips need to be built and tested to verify the models of physical reality that the design software is using.

24 March 2011

Exascale Computing

An IEEE Spectrum article identifies five problems that need to be solved to reach exascale: the power consumption of main memory, bandwidth and latency to main memory, filling the gap between main memory and disk, resiliency in the face of billions of components, and programmability to coordinate billions of threads.

There are multiple approaches being pursued to manage the main memory problems:
* We will likely need to stop building CPUs on one chip and memory on a second chip, and instead integrate the CPU and memory on a single chip.
* We need to rearchitect memory to save power.  Current architectures activate far too much of the chip to return just a few bits; we can activate smaller portions instead.
* In the old days, we explicitly paged data from slow remote memories into fast local memories.  Moving 4KB pages from main memory to local memory would let us transfer data more efficiently than moving 128 byte cache lines.  Also, with cache lines, writes to memory first read the cache line even when the line will be completely overwritten.  (A rough efficiency model follows after this list.)
* And there are a number of new materials, like memristors and carbon nanotubes, that we are looking at to improve main memory.

With so many avenues of attack available, we are likely to solve this problem soon.
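
To see why larger explicit transfers help, here is a rough efficiency model.  The numbers are invented for illustration: each transfer is charged a fixed overhead (commands, row activation) expressed as equivalent bytes, and a cache-line write is charged an extra read of the line.

```python
# Rough model of transfer efficiency (all numbers are illustrative assumptions,
# not measurements of any real memory system).
OVERHEAD_BYTES = 64   # fixed per-transfer cost (commands, row activation), in byte-equivalents

def efficiency(payload_bytes, read_before_write=False):
    # Useful bytes delivered divided by total bytes moved on the wire.
    moved = payload_bytes * (2 if read_before_write else 1) + OVERHEAD_BYTES
    return payload_bytes / moved

print("128 B cache-line write (reads the line first):", round(efficiency(128, True), 2))  # 0.40
print("128 B cache-line read:                        ", round(efficiency(128), 2))        # 0.67
print("4 KB page, explicit copy:                     ", round(efficiency(4096), 2))       # 0.98
```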

Kogge, the author of the article, quickly skips past his reasons why Flash memory cannot fill the gap between main memory and disk.  I find this technology very promising.  Flash uses much less power than disk, and it is ubiquitous in cameras, phones, and other mobile devices.  The rapid growth of mobile market share will ensure that Flash receives enough research funding to overtake far slower disk drives.

Resiliency and coordinating billions of threads both boil down to how we write code for exascale systems.  Google already addresses both of these issues.  Software is designed to be fault tolerant using mirroring, RAID, checkpointing, or retries.  Individual processes are frequently kicked off of one overloaded machine and restarted on another; disks and machines fail all the time; the software is designed to keep running correctly in the face of these events.

Google software coordinates millions of threads using a technique I like to call Distributed Object Oriented Programming.  A client process will generate a few hundred pieces of work and ship those requests out asynchronously to server processes.  The servers in turn split each piece of work they receive into tens to hundreds of operations and forward those to the next layer.  After just a few layers, millions of threads have been coordinated, with each individual component only having to worry about a manageable number of threads.
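
Here is a minimal sketch of that fan-out pattern.  Python threads stand in for RPCs to remote server processes, and a simple retry loop stands in for the fault tolerance described above; the layer sizes and retry counts are invented for illustration.

```python
# Minimal sketch of layered fan-out ("Distributed Object Oriented Programming").
# Threads stand in for RPCs to remote servers; sizes and retries are illustrative.
from concurrent.futures import ThreadPoolExecutor

def with_retries(fn, arg, attempts=3):
    # Crude fault tolerance: retry a failed call, as a stand-in for
    # re-sending an RPC to another replica.
    for attempt in range(attempts):
        try:
            return fn(arg)
        except Exception:
            if attempt == attempts - 1:
                raise

def leaf_work(item):
    return item * item                    # the actual unit of work

def mid_server(chunk):
    # Each mid-layer "server" fans its chunk out to a layer of leaf workers.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return sum(pool.map(lambda x: with_retries(leaf_work, x), chunk))

def client(items, n_chunks=100):
    # The client splits the job into a few hundred pieces and ships them out.
    chunks = [items[i::n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=16) as pool:
        return sum(pool.map(lambda c: with_retries(mid_server, c), chunks))

print(client(list(range(10_000))))        # each layer manages only a modest fan-out
```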

We have an existence proof that a peta-scale computer can be built to fit in a cubic foot with modest power requirements: the human brain.  That processor evolved; we will soon engineer an equivalently powerful computer.  An exascale computer faces far looser constraints -- it requires far less density and is allowed to consume far more energy -- so it is a challenge we will easily meet.