Credits and Contact Hours
3 credits, 43 hours
Course Instructor Name
Dr. Mohammad Al-Failakawi
Textbook
Reliability of Computer Systems and Networks: Fault Tolerance, Analysis, and Design, M. L. Shooman
Catalog Description
This course addresses the design, modeling, analysis, and integration of hardware and software to achieve dependable computing systems employing on-line fault tolerance. It covers the concepts and terminology of fault-tolerant system design, including reliability, dependability, maintainability, redundancy, damage confinement, error recovery, fault treatment, redundancy management, voting, information redundancy, combinatorial and sequential network testing, error-detecting and error-correcting codes, self-checking circuits, and diagnostic theory.
Prerequisite
ENGR-304, CpE-368
Specific Goals for the Course
Upon successful completion of this course, students will be able to:
- Evaluate the dependability of a system. (Student outcomes: 1, 2, 6)
- Analyze a system for performance-dependability tradeoffs. (Student outcomes: 1, 2)
- Select the appropriate detection techniques (hardware and software) for a given environment. (Student outcomes: 1, 2)
- Apply the appropriate recovery techniques (hardware and software) for a given environment. (Student outcomes: 1, 2)
- Understand faults and manifestations.
- Understand reliability and availability techniques. (Student outcomes: 1, 2, 6)
- Understand the evaluation criteria and financial considerations in providing fault tolerance in a system. (Student outcomes: 1, 2)
Topics to Be Covered
- Introduction to fault-tolerant computing.
- Introduction to defects, faults, and errors.
- Fault models and reliability modeling.
- Error detection and correction methods.
- Hardware and software redundancy methods and fault diagnosis.
- Self-checking circuits.
- Software reliability modeling and fault tolerance.
- Case studies.