CPE 471: Fault Tolerant Computing
Prerequisites: 0600304, 0612368
0612471
(3-0-3)

Credits and Contact Hours

3 credits, 43 hours

Course Instructor Name

Dr. Mohammad Al-Failakawi

Textbook

Reliability of Computer Systems and Networks: Fault Tolerance, Analysis, and Design, M. L. Shooman

Catalog Description

This course addresses the design, modeling, analysis, and integration of hardware and software to achieve dependable computing systems that employ online fault tolerance. It covers the concepts and terminology of fault-tolerant system design, including reliability, dependability, maintainability, redundancy, damage confinement, error recovery, fault treatment, redundancy management, voting, information redundancy, combinational and sequential network testing, error-detecting and error-correcting codes, self-checking circuits, and diagnostic theory.

Prerequisite

ENGR-304, CpE-368

Specific Goals for the Course

Upon successful completion of this course, students will be able to:

Evaluate the dependability of a system. (Student outcomes: 1, 2, 6)

Analyze a system for performance-dependability tradeoffs. (Student outcomes: 1, 2)

Select the appropriate detection techniques (hardware and software) for a given environment. (Student outcomes: 1, 2)

Apply the appropriate recovery techniques (hardware and software) for a given environment. (Student outcomes: 1, 2)

Understand faults and their manifestations.

Understand reliability and availability techniques. (Student outcomes: 1, 2, 6)

Understand the evaluation criteria and financial considerations in providing fault tolerance in a system. (Student outcomes: 1, 2)

Topics to Be Covered

Introduction to fault-tolerant computing.

Introduction to defects, faults and errors.

Fault models and reliability modeling.

Error detection and correction methods.

Hardware and software redundancy methods and fault diagnosis.

Self-checking circuits.

Software reliability modeling and fault tolerance.

Case studies.