costs.
TIP: Increasing availability from 99% to 99.9% is relatively
-simply. But increasing availability from 99.9999% to 99.99999% is very
+simple. But increasing availability from 99.9999% to 99.99999% is very
hard and costly. `ha-manager` has typical error detection and failover
times of about 2 minutes, so you can get no more than 99.999%
availability.
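+
+As a rough worked example of that limit (a sketch, assuming a 365-day year and
+the 2-minute failover time mentioned above), the following prints the best
+achievable yearly availability for one to three failovers:
+
+----
+awk 'BEGIN {
+    minutes_per_year = 365 * 24 * 60   # 525600 minutes
+    failover_minutes = 2               # assumed failover time from the tip above
+    for (n = 1; n <= 3; n++)
+        printf "%d failover(s) per year: %.5f%%\n", n,
+            100 * (1 - n * failover_minutes / minutes_per_year)
+}'
+----
+
+Even a single failover costs about 120 seconds, while 99.9999% availability
+leaves only about 32 seconds of downtime per year.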
after the watchdog then times out, this happens after 60 seconds.
+HA Simulator
+------------
+
+[thumbnail="screenshot/gui-ha-manager-status.png"]
+
+The HA simulator lets you test and explore all the functionality of the
+Proxmox VE HA solution.
+
+By default, the simulator allows you to watch and test the behaviour of a
+real-world 3 node cluster with 6 VMs. You can also add or remove additional VMs
+or containers.
+
+You do not have to set up or configure a real cluster; the HA simulator runs out
+of the box.
+
+Install with apt:
+
+----
+apt install pve-ha-simulator
+----
+
+You can even install the package on any Debian-based system without any
+other Proxmox VE packages. To do so, download the package and copy it to the
+system you want to run it on. When you install the package with apt from the
+local file system, apt also resolves the required dependencies for you.
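+
+For example, assuming the downloaded file was copied to the current directory
+(the file name below is a placeholder, use the name of the file you actually
+downloaded):
+
+----
+apt install ./pve-ha-simulator_<version>_<arch>.deb
+----
+
+The leading `./` tells apt to treat the argument as a local file rather than a
+package name.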
+
+
+To start the simulator on a remote machine, you must have X11 forwarding to
+your current system.
+
+If you are on a Linux machine you can use:
+
+----
+ssh root@<IPofPVE> -Y
+----
+
+On Windows, it works with https://mobaxterm.mobatek.net/[MobaXterm].
+
+After either connecting to an existing {pve} installation with the simulator
+installed, or installing it manually on your local Debian-based system, you can
+try it out as follows.
+
+First, you need to create a working directory where the simulator saves its
+current state and writes its default config:
+
+----
+mkdir working
+----
+
+Then, simply pass the created directory as a parameter to 'pve-ha-simulator':
+
+----
+pve-ha-simulator working/
+----
+
+You can then start, stop, and migrate the simulated HA services, or even check
+what happens on a node failure.
+
Configuration
-------------
really important task, because without, it would not be possible to
recover a resource on another node.
-If a node would not get fenced, it would be in an unknown state where
+If a node did not get fenced, it would be in an unknown state where
it may have still access to shared resources. This is really
dangerous! Imagine that every network but the storage one broke. Now,
while not reachable from the public network, the VM still runs and
max_restart::
-Maximum number of tries to restart an failed service on the actual
+Maximum number of tries to restart a failed service on the actual
node. The default is set to one.
max_relocate::
When updating the ha-manager you should do one node after the other, never
all at once for various reasons. First, while we test our software
thoughtfully, a bug affecting your specific setup cannot totally be ruled out.
-Upgrading one node after the other and checking the functionality of each node
-after finishing the update helps to recover from an eventual problems, while
-updating all could render you in a broken cluster state and is generally not
+Updating one node after the other and checking the functionality of each node
+after finishing the update helps to recover from possible problems, while
+updating all at once could result in a broken cluster and is generally not
good practice.
Also, the {pve} HA stack uses a request acknowledge protocol to perform