ITC 2008:
Rabaey contemplates cloud computing

by Tets Maniwa

Jan Rabaey, Donald Pederson distinguished professor at U.C. Berkeley, gave the invited address at the 2008 International Test Conference in Santa Clara on October 28th.

Rabaey started out by noting that Gordon Moore, in a 2003 ISSCC address, acknowledged that no exponential can go on forever, but suggested we can delay "forever" for a long time. The challenges facing the semiconductor industry today, according to Rabaey, arise because traditional directions for change are becoming economically invalid. Device scaling and energy consumption are now costing more for the next process node than most companies can ever hope to recover in product sales.

To make matters worse, the new process nodes bring reliability threats, much higher process variations, and energy-related problems such as thermal runaway and thermally induced random errors. The promise of scaling continues to be a lower cost per transistor, but total costs are now being driven up by the added complexity needed to address reliability problems. Therefore, further scaling is no longer profitable, Rabaey said.

This creates two opportunities. The new competitive landscape includes ubiquitous computing and a change in the nature of computing from data processing to perceptual processing. The changes are also driving computing towards "bio-inspired" paradigms, which the increasing availability of transistors makes possible.

The computer industry is slowly abandoning the Boole-Turing-Von Neumann models, according to Rabaey. The new structures anticipate that 99% of all computing will reside on the compute farms, coupled with massive mutations capabilities. The second layer, where people interact with the system, will be a mobile access layer principally comprised of wireless and mobile devices, including much smarter cell phones and more compact portable computers. A new third layer will come into existence soon: a swarm of trillions of sensors. Effectively, applications will not be running on any single device, but will be part of the cloud of computers.

Rabaey opined that this new environment will engender new forms of interaction between man and machine. Interactions with information will move from batch mode to more interactive modes, and finally take place in an immersive mode. This environment will also result in different technologies at the periphery of the compute cloud, compared to those at the center. The peripheral devices will be very low cost, at very low power, and may not even be on bulk silicon. A central core machine, on the other hand, will require all the performance and density of the latest technology, extending down into the 12-nanometer node.

This change in the compute landscape, and the integration of sensors and processors, will drive a majority of existing jobs to a virtual mode, where mining and synthesis of large data sets will be the norm. Future information tasks will require more statistical information and better abilities to order data into formats that humans can understand.

Rabaey said that realizing this new compute paradigm will require continual verification and test, built on the following capabilities: self-verification and test, adaptive capabilities and failure awareness, and devices that are inherently resilient.

There are five steps necessary to achieve these results.

* First, to test and self-correct, software-based built-in self-check and checkpointing at very low overhead will be necessary. It's possible to steal a few cycles from program operations to do the checking.

* Second, create designs that are adaptive or "always optimal" by including on-chip sensors for temperature, data errors, leakage, delay, etc. to adjust operating parameters. A large SoC might have 3000 sensors measuring temperature, noise, and NBTI (negative bias temperature instability) over time, and feeding these data to an on-chip diagnostic processor.

* Third, let the errors happen. Data mining and synthesis functions can tolerate higher error rates due to the statistical nature of the data sets. The change from a single high precision processor to a cloud of processors and sensors enables energy-efficient algorithms and architectures that can be resilient to 1016 failures in time.

* Fourth, design for intrinsic resiliency. Connected sensor networks with a high degree of parallelism can operate on approximations of the data, such as averages and estimates. In fact, some researchers are finding that a large set of approximations converges more rapidly, with less variability and at lower power, than the alternatives. The key is large-scale communication networks between sensors and processors.

* Fifth, design with the power of numbers. A large number of small, simple processors is likely to offer superior performance over a single processor. As an example, humans account for about 10 percent of all biomass on the planet. They have 10-to-the-9th neurons per node and have been on the planet about 100,000 years. In comparison, ants have about the same biomass, but with only 10-to-the-5th neurons, and have been on the planet for hundreds of millions of years. Ants have survived for this extended time by being small, simple, and living in large swarms.
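The first step in the list above, low-overhead self-check with checkpointing, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method described in the talk: the names run_with_checkpoints, self_check, and interval are hypothetical, the "program" is a list of functions that each return a new state, and the fault is assumed to be transient so that re-running from the last checkpoint succeeds.

```python
import copy

def run_with_checkpoints(state, steps, self_check, interval=4, max_rollbacks=10):
    """Run steps in order; every `interval` steps, run self_check on the state.

    If the check passes, the state is saved as a checkpoint. If it fails,
    execution rolls back to the last checkpoint and re-runs from there.
    All names here are hypothetical, chosen for illustration only.
    """
    checkpoint = copy.deepcopy(state)
    checkpoint_index = 0
    rollbacks = 0
    i = 0
    while i < len(steps):
        state = steps[i](state)
        i += 1
        if i % interval == 0 or i == len(steps):
            if self_check(state):
                # State looks sane: remember it.
                checkpoint, checkpoint_index = copy.deepcopy(state), i
            else:
                # Error detected: discard work since the last checkpoint.
                rollbacks += 1
                if rollbacks > max_rollbacks:
                    raise RuntimeError("fault appears to be permanent")
                state, i = copy.deepcopy(checkpoint), checkpoint_index
    return state, rollbacks

def increment(state):
    """A correct step: both counters advance together."""
    return {"count": state["count"] + 1, "total": state["total"] + 1}

def make_flaky_increment():
    """A step that corrupts the state once (a transient fault), then behaves."""
    fired = {"done": False}
    def step(state):
        new_state = increment(state)
        if not fired["done"]:
            fired["done"] = True
            new_state["total"] += 100  # injected transient error
        return new_state
    return step

def counters_agree(state):
    """Cheap invariant used as the self-check."""
    return state["count"] == state["total"]
```

The self-check here is a cheap invariant evaluated only every few steps, which is one way to read "stealing a few cycles": the cost of checking is amortized over the interval, at the price of redoing at most one interval of work after a fault.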

Rabaey argued that, similar to ants, dense networks with large numbers of cheap, simple units are intrinsically robust. These networks only allow local information exchange, with very little or no global communication. In an IC, these on-chip communications do not require precision, but rather lots of cheap connections. As a result, the newly emerging exponential will be the number of connected components in the total system, and not the number of individual devices.
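One way to see how purely local exchange can still produce a useful global result is gossip averaging, sketched below in Python. This is an illustrative assumption, not an algorithm attributed to Rabaey: the ring topology, the function name gossip_average, and its parameters are all hypothetical. Each unit only ever talks to its immediate neighbor, yet every unit's value drifts toward the network-wide average.

```python
import random

def gossip_average(values, rounds, seed=0):
    """Pairwise gossip on a ring: repeatedly pick a node and its clockwise
    neighbor and replace both values with their mean.

    No node ever sees global state; the sum is preserved at every exchange,
    so the values converge toward the overall average.
    (Hypothetical sketch; topology and names are assumptions.)
    """
    rng = random.Random(seed)   # seeded so runs are repeatable
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        i = rng.randrange(n)
        j = (i + 1) % n         # local exchange only: the ring neighbor
        x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x

readings = [float(k) for k in range(16)]   # made-up sensor readings
settled = gossip_average(readings, rounds=20000)
```

Each exchange is imprecise in the sense that no unit ever computes the true average directly; the result emerges from many cheap, local interactions.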

Reliability will also become a system-level characteristic. In the past, components were assembled into a system where each level within the system had to operate with no defects. In the future, the system will not require full determinism, because we cannot depend on components alone. Reliability requires statistics, which drives changes in verification and test strategies as well as operating modes.
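A small calculation shows how reliability can be a property of the system rather than of each component. The Python sketch below assumes, purely for illustration, that n independent units each give a wrong answer with probability p and that the system takes a majority vote; majority voting and the function name are assumptions of this example, not something described in the talk.

```python
from math import comb

def majority_failure_probability(n, p):
    """Probability that more than half of n independent units are wrong,
    when each unit is wrong with probability p (binomial tail).

    Intended for odd n, so that a strict majority always exists.
    (Hypothetical helper for illustration.)
    """
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

single = majority_failure_probability(1, 0.1)    # one unit on its own
triple = majority_failure_probability(3, 0.1)    # three units voting
eleven = majority_failure_probability(11, 0.1)   # eleven units voting
```

With a 10 percent per-unit error rate, three voting units are wrong together with probability 3(0.1)^2(0.9) + (0.1)^3 = 0.028, and the figure keeps falling as units are added. The independence assumption is doing real work here; correlated failures would weaken the effect.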

So, Rabaey concluded, computers and computing itself will change fundamentally in the future. The new exponential, à la Moore, will be the number of connected functions rather than the number of devices. Design technology must change to start from a top-down system-level approach that includes resiliency and reliability as central parts of the design.

November 10, 2008


EDA industry observer Tets Maniwa can be reached at maniwa_at_sbcglobal.net


For EDA Confidential: www.aycinena.com

Copyright (c) 2008, Tets Maniwa. All rights reserved.