Credits and Contact Hours
3 credits, 43 hours
Course Instructor Name
Dr. Mohammad Al-Failakawi
Textbook
Reliability of Computer Systems and Networks: Fault Tolerance, Analysis, and Design, M. L. Shooman
Catalog Description
This course addresses the design, modeling, analysis, and integration of hardware and software to achieve dependable computing systems that employ on-line fault tolerance. It covers the concepts and terminology of fault-tolerant system design, including reliability, dependability, maintainability, redundancy, damage confinement, error recovery, fault treatment, redundancy management, voting, information redundancy, combinatorial and sequential network testing, error-detecting and error-correcting codes, self-checking circuits, and diagnostic theory.
Prerequisite
ENGR-304, CpE-368
Specific Goals for the Course
Upon successful completion of this course, students will be able to:
Evaluate the dependability of a system. (Student outcomes: 1, 2, 6)
Analyze a system for performance-dependability tradeoffs. (Student outcomes: 1, 2)
Select the appropriate detection techniques (hardware and software) for a given environment. (Student outcomes: 1, 2)
Apply the appropriate recovery techniques (hardware and software) for a given environment. (Student outcomes: 1, 2)
Understand faults and their manifestations.
Understand reliability and availability techniques. (Student outcomes: 1, 2, 6)
Understand the evaluation criteria and financial considerations in providing fault tolerance in a system. (Student outcomes: 1, 2)
Topics to Be Covered
Introduction to fault-tolerant computing.
Introduction to defects, faults, and errors.
Fault models and reliability modeling.
Error detection and correction methods.
Hardware and software redundancy methods and fault diagnosis.
Self-checking circuits.
Software reliability modeling and fault tolerance.
Case studies.