Resilience concepts

TL;DR

High Availability (HA)
Goal: maximize a system's uptime 📈.
User disruption is acceptable ✅.
More budget-friendly 🏷️.
With spare infrastructure ready to switch to, you can minimize outages 👌.
Cope through disaster.
e.g. a 4x4 repairing a punctured tire with a spare tire
- spare tire.
- some user disruption while doing so.

^6c176e
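
The "spare infrastructure ready to switch to" idea can be sketched as an active/standby pair. This is only a minimal illustration, not a production failover mechanism: `Server`, `ActiveStandby` and the `disrupted_requests` counter are hypothetical names invented for this note.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class ActiveStandby:
    """Serve from the active server and switch to the spare when it fails."""

    def __init__(self, active: Server, spare: Server) -> None:
        self.active = active
        self.spare = spare
        self.disrupted_requests = 0  # requests that hit the failure first

    def handle(self, request: str) -> str:
        try:
            return self.active.handle(request)
        except ConnectionError:
            # Some user disruption: this request hit the outage before the swap.
            self.disrupted_requests += 1
            # Swap in the spare (the "spare tire") and retry once.
            self.active, self.spare = self.spare, self.active
            return self.active.handle(request)
```

The outage is short, but it is not zero: the request that triggers the switch is disrupted, which is the trade-off HA accepts.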

Fault Tolerance (FT)
Goal:
- Work through the failure of some of its components 🩹
- **minimize outages** ⚡.
User disruption is not acceptable 🚫.
More expensive 💸.
Outages must be minimized, so the system needs levels of redundancy.
For mission-critical or life-critical situations 🔫
Operate through disaster.
e.g. resilient systems on large planes
- more engines than the plane needs, so it can operate through failure
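
The redundancy idea (more units than strictly needed, so the system keeps operating through failure) can be sketched as a simple capacity check. `RedundantSystem` and its fields are hypothetical names for illustration, and the model ignores how real systems detect and isolate a failed unit.

```python
from dataclasses import dataclass


@dataclass
class RedundantSystem:
    total_units: int       # units installed, e.g. engines on the plane
    required_units: int    # units needed to keep operating
    failed_units: int = 0

    def fail_one(self) -> None:
        """Record the failure of one unit (never more than are installed)."""
        if self.failed_units < self.total_units:
            self.failed_units += 1

    @property
    def working_units(self) -> int:
        return self.total_units - self.failed_units

    def is_operating(self) -> bool:
        # No switch-over step: the system keeps running as long as
        # enough units are still working.
        return self.working_units >= self.required_units

    def failures_tolerated(self) -> int:
        """How many more units can fail before the system stops."""
        return max(0, self.working_units - self.required_units)
```

For example, with the made-up numbers `RedundantSystem(total_units=4, required_units=2)`, two units can fail with no user disruption. The extra units are also why FT costs more than HA.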

Disaster Recovery (DR)
What to plan for, and what to do, when disaster occurs?
*e.g.* DR processes are like pilot or passenger ejection systems.
Goal: DR is a process designed to keep the non-replaceable parts safe.
A set of policies, tools and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural 🍃 or human-induced disaster 🤡.
DR can largely be automated to reduce recovery time and errors.

This involves:

Pre-planning
- Ensure plans are in place for extra hardware.
- Do NOT store backups at the same site as the system.
DR Processes
- Cloud machines ready when needed.
- Run periodic DR testing.
- All processes and tools should be properly documented.
  - All parties involved should take part in the periodic DR testing to minimize human error.
  - All logins to key systems need to be available to staff at the time of a disaster.
This is designed to keep the crucial and non-replaceable parts of the system in place.
Used when HA and FT don't work.
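
Part of the pre-planning above can be automated as a readiness check. This is a minimal sketch under assumptions: `DRPlan`, `dr_readiness_issues` and the 180-day test interval are hypothetical and not taken from any standard or tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class DRPlan:
    system_site: str
    backup_sites: list[str]
    runbook_documented: bool
    credentials_accessible: bool
    last_dr_test: date


def dr_readiness_issues(
    plan: DRPlan,
    today: date,
    max_test_age_days: int = 180,  # assumed interval, tune to your own policy
) -> list[str]:
    """Return the list of problems found; an empty list means none found."""
    issues = []
    # Backups at the same site are lost together with the system.
    if not any(site != plan.system_site for site in plan.backup_sites):
        issues.append("no backup is stored off-site")
    if not plan.runbook_documented:
        issues.append("processes and tools are not documented")
    if not plan.credentials_accessible:
        issues.append("logins to key systems are not available to staff")
    if (today - plan.last_dr_test).days > max_test_age_days:
        issues.append("DR test is overdue")
    return issues
```

Running a check like this on a schedule is one way to catch gaps before a disaster rather than during one.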

Pasted image 20230521131034.png