Cross-Layer Soft Error Resilience
Filippos Toufexis
Time: 12:30 pm Dec 4th, 2015
Location: Building 53 Room 3004
Until recently, most systems, with the exception of high-end servers and safety-critical applications, assumed correct operation of the underlying hardware. At recent technology nodes, however, systems are becoming increasingly susceptible to hardware failures. Research in fault-tolerant computing has been active since the 1950s and has yielded an abundance of protection techniques at different abstraction layers. These techniques can be applied at the circuit, logic, architecture, or software level, but their cost is very high for most applications. The most significant challenge is therefore to design fault-tolerant systems at very low power, performance, area, design, and validation cost. One potential approach is to use cross-layer protection schemes, which combine techniques spanning several abstraction layers of the system stack (circuit, logic, architecture, software) so that they work together to achieve the required reliability levels at low cost. We describe the framework we developed to perform cross-layer soft-error resilience design exploration and show preliminary experimental results that demonstrate the opportunities associated with such an approach.