Resilience of an embedded architecture using hardware redundancy

Castano, Victor (2014) Resilience of an embedded architecture using hardware redundancy. Doctoral thesis, London Metropolitan University.

Victor Castano - PhD Final Thesis.pdf - Published Version

Abstract

In the last decade, general-purpose computing systems have lost their market dominance to embedded systems, with billions of units manufactured every year. Embedded systems appear in contexts where continuous operation is of utmost importance and the consequences of failure can be profound.

Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have been commonly used in space applications. However, radiation-hardened components are expensive, lag behind commercial components with regard to performance and do not provide 100% fault elimination. Without fault-tolerant mechanisms, many of these faults can become errors at the application or system level, which, in turn, can result in catastrophic failures.

In this work we study the concepts of fault tolerance and dependability and extend these concepts, providing our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles and the process that leads to computing failures. We provide extensive taxonomies of 1) existing fault-tolerant techniques and 2) the effects of radiation in state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed model of faults and provide a classification of the different types of faults at various levels. We introduce a fault tolerance algorithm and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance and power consumption than existing techniques. We propose a new element of the system, called the syndrome, that is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA’s assembler and commercial hardware simulators.

Item Type: Thesis (Doctoral)
Additional Information: uk.bl.ethos.659015
Uncontrolled Keywords: computer architecture; computer resilience; computer reliability; computer system failures; embedded computer systems; real time systems (RTS)
Subjects: 000 Computer science, information & general works
Department: School of Human Sciences
Depositing User: Mary Burslem
Date Deposited: 24 May 2017 15:13
Last Modified: 24 May 2017 15:13
URI: http://repository.londonmet.ac.uk/id/eprint/719
