X-Git-Url: https://git.proxmox.com/?a=blobdiff_plain;f=pmxcfs.adoc;h=ea1555943ad768361a5a1eacfb24e00a9c822074;hb=74ff2e640d0f2390aeea3689147450603cd4f56d;hp=3474d736a7d17af3751d53870a9e9ff28c4824ea;hpb=8c1189b640ae7d10119ff1c046580f48749d38bd;p=pve-docs.git

diff --git a/pmxcfs.adoc b/pmxcfs.adoc
index 3474d73..ea15559 100644
--- a/pmxcfs.adoc
+++ b/pmxcfs.adoc
@@ -1,17 +1,18 @@
+[[chapter_pmxcfs]]
 ifdef::manvolnum[]
-PVE({manvolnum})
-================
-include::attributes.txt[]
+pmxcfs(8)
+=========
+:pve-toplevel:

 NAME
 ----

 pmxcfs - Proxmox Cluster File System

-SYNOPSYS
+SYNOPSIS
 --------

-include::pmxcfs.8-cli.adoc[]
+include::pmxcfs.8-synopsis.adoc[]

 DESCRIPTION
 -----------
@@ -20,7 +21,7 @@ endif::manvolnum[]
 ifndef::manvolnum[]
 Proxmox Cluster File System (pmxcfs)
 ====================================
-include::attributes.txt[]
+:pve-toplevel:
 endif::manvolnum[]

 The Proxmox Cluster file system (``pmxcfs'') is a database-driven file
@@ -29,17 +30,18 @@ cluster nodes using `corosync`. We use this to store all PVE related
 configuration files.

 Although the file system stores all data inside a persistent database
-on disk, a copy of the data resides in RAM. That imposes restriction
-on the maximal size, which is currently 30MB. This is still enough to
+on disk, a copy of the data resides in RAM. This imposes restrictions
+on the maximum size, which is currently 30MB. This is still enough to
 store the configuration of several thousand virtual machines.

 This system provides the following advantages:

-* seamless replication of all configuration to all nodes in real time
-* provides strong consistency checks to avoid duplicate VM IDs
-* read-only when a node loses quorum
-* automatic updates of the corosync cluster configuration to all nodes
-* includes a distributed locking mechanism
+* Seamless replication of all configuration to all nodes in real time
+* Provides strong consistency checks to avoid duplicate VM IDs
+* Read-only when a node loses quorum
+* Automatic updates of the corosync cluster configuration to all nodes
+* Includes a distributed locking mechanism
+

 POSIX Compatibility
 -------------------
@@ -47,41 +49,39 @@ POSIX Compatibility
 The file system is based on FUSE, so the behavior is POSIX like. But
 some feature are simply not implemented, because we do not need them:

-* you can just generate normal files and directories, but no symbolic
+* You can just generate normal files and directories, but no symbolic
   links, ...

-* you can't rename non-empty directories (because this makes it easier
+* You can't rename non-empty directories (because this makes it easier
   to guarantee that VMIDs are unique).

-* you can't change file permissions (permissions are based on path)
+* You can't change file permissions (permissions are based on paths)

 * `O_EXCL` creates were not atomic (like old NFS)

 * `O_TRUNC` creates are not atomic (FUSE restriction)


-File access rights
+File Access Rights
 ------------------

 All files and directories are owned by user `root` and have group
 `www-data`. Only root has write permissions, but group `www-data` can
-read most files. Files below the following paths:
+read most files. Files below the following paths are only accessible by root:

  /etc/pve/priv/
  /etc/pve/nodes/${NAME}/priv/

-are only accessible by root.
-

 Technology
 ----------

-We use the http://www.corosync.org[Corosync Cluster Engine] for
-cluster communication, and http://www.sqlite.org[SQlite] for the
-database file. The filesystem is implemented in user space using
-http://fuse.sourceforge.net[FUSE].
+We use the https://www.corosync.org[Corosync Cluster Engine] for
+cluster communication, and https://www.sqlite.org[SQLite] for the
+database file. The file system is implemented in user space using
+https://github.com/libfuse/libfuse[FUSE].

-File system layout
+File System Layout
 ------------------

 The file system is mounted at:

@@ -93,37 +93,62 @@ Files

 [width="100%",cols="m,d"]
 |=======
-|`corosync.conf` | Corosync cluster configuration file (previous to {pve} 4.x this file was called cluster.conf)
-|`storage.cfg` | {pve} storage configuration
-|`datacenter.cfg` | {pve} datacenter wide configuration (keyboard layout, proxy, ...)
-|`user.cfg` | {pve} access control configuration (users/groups/...)
+|`authkey.pub` | Public key used by the ticket system
+|`ceph.conf` | Ceph configuration file (note: /etc/ceph/ceph.conf is a symbolic link to this)
+|`corosync.conf` | Corosync cluster configuration file (prior to {pve} 4.x, this file was called cluster.conf)
+|`datacenter.cfg` | {pve} data center-wide configuration (keyboard layout, proxy, ...)
 |`domains.cfg` | {pve} authentication domains
-|`authkey.pub` | Public key used by ticket system
-|`pve-root-ca.pem` | Public certificate of cluster CA
-|`priv/shadow.cfg` | Shadow password file
-|`priv/authkey.key` | Private key used by ticket system
-|`priv/pve-root-ca.key` | Private key of cluster CA
-|`nodes/<NAME>/pve-ssl.pem` | Public SSL certificate for web server (signed by cluster CA)
+|`firewall/cluster.fw` | Firewall configuration applied to all nodes
+|`firewall/<NAME>.fw` | Firewall configuration for individual nodes
+|`firewall/<VMID>.fw` | Firewall configuration for VMs and containers
+|`ha/crm_commands` | Displays HA operations that are currently being carried out by the CRM
+|`ha/manager_status` | JSON-formatted information regarding HA services on the cluster
+|`ha/resources.cfg` | Resources managed by high availability, and their current state
+|`nodes/<NAME>/config` | Node-specific configuration
+|`nodes/<NAME>/lxc/<VMID>.conf` | VM configuration data for LXC containers
+|`nodes/<NAME>/openvz/` | Prior to PVE 4.0, used for container configuration data (deprecated, removed soon)
 |`nodes/<NAME>/pve-ssl.key` | Private SSL key for `pve-ssl.pem`
-|`nodes/<NAME>/pveproxy-ssl.pem` | Public SSL certificate (chain) for web server (optional override for `pve-ssl.pem`)
+|`nodes/<NAME>/pve-ssl.pem` | Public SSL certificate for web server (signed by cluster CA)
 |`nodes/<NAME>/pveproxy-ssl.key` | Private SSL key for `pveproxy-ssl.pem` (optional)
+|`nodes/<NAME>/pveproxy-ssl.pem` | Public SSL certificate (chain) for web server (optional override for `pve-ssl.pem`)
 |`nodes/<NAME>/qemu-server/<VMID>.conf` | VM configuration data for KVM VMs
-|`nodes/<NAME>/lxc/<VMID>.conf` | VM configuration data for LXC containers
-|`firewall/cluster.fw` | Firewall configuration applied to all nodes
-|`firewall/<NAME>.fw` | Firewall configuration for individual nodes
-|`firewall/<VMID>.fw` | Firewall configuration for VMs and Containers
+|`priv/authkey.key` | Private key used by ticket system
+|`priv/authorized_keys` | SSH keys of cluster members for authentication
+|`priv/ceph*` | Ceph authentication keys and associated capabilities
+|`priv/known_hosts` | SSH keys of the cluster members for verification
+|`priv/lock/*` | Lock files used by various services to ensure safe cluster-wide operations
+|`priv/pve-root-ca.key` | Private key of cluster CA
+|`priv/shadow.cfg` | Shadow password file for PVE Realm users
+|`priv/storage/<STORAGE-ID>.pw` | Contains the password of a storage in plain text
+|`priv/tfa.cfg` | Base64-encoded two-factor authentication configuration
+|`priv/token.cfg` | API token secrets of all tokens
+|`pve-root-ca.pem` | Public certificate of cluster CA
+|`pve-www.key` | Private key used for generating CSRF tokens
+|`sdn/*` | Shared configuration files for Software Defined Networking (SDN)
+|`status.cfg` | {pve} external metrics server configuration
+|`storage.cfg` | {pve} storage configuration
+|`user.cfg` | {pve} access control configuration (users/groups/...)
+|`virtual-guest/cpu-models.conf` | For storing custom CPU models
+|`vzdump.cron` | Cluster-wide vzdump backup-job schedule
 |=======

+
 Symbolic links
 ~~~~~~~~~~~~~~

+Certain directories within the cluster file system use symbolic links, in order
+to point to a node's own configuration files. Thus, the files pointed to in the
+table below refer to different files on each node of the cluster.
+
 [width="100%",cols="m,m"]
 |=======
 |`local` | `nodes/<LOCAL_HOST_NAME>`
-|`qemu-server` | `nodes/<LOCAL_HOST_NAME>/qemu-server/`
 |`lxc` | `nodes/<LOCAL_HOST_NAME>/lxc/`
+|`openvz` | `nodes/<LOCAL_HOST_NAME>/openvz/` (deprecated, removed soon)
+|`qemu-server` | `nodes/<LOCAL_HOST_NAME>/qemu-server/`
 |=======

+
 Special status files for debugging (JSON)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -136,65 +161,81 @@ Special status files for debugging (JSON)
 |`.rrd` |RRD data (most recent entries)
 |=======

+
 Enable/Disable debugging
 ~~~~~~~~~~~~~~~~~~~~~~~~

 You can enable verbose syslog messages with:

- echo "1" >/etc/pve/.debug
+ echo "1" >/etc/pve/.debug

 And disable verbose syslog messages with:

- echo "0" >/etc/pve/.debug
+ echo "0" >/etc/pve/.debug

 Recovery
 --------

-If you have major problems with your Proxmox VE host, e.g. hardware
-issues, it could be helpful to just copy the pmxcfs database file
-`/var/lib/pve-cluster/config.db` and move it to a new Proxmox VE
+If you have major problems with your {pve} host, for example hardware
+issues, it could be helpful to copy the pmxcfs database file
+`/var/lib/pve-cluster/config.db`, and move it to a new {pve}
 host. On the new host (with nothing running), you need to stop the
-`pve-cluster` service and replace the `config.db` file (needed permissions
-`0600`). Second, adapt `/etc/hostname` and `/etc/hosts` according to the
-lost Proxmox VE host, then reboot and check. (And don't forget your
-VM/CT data)
+`pve-cluster` service and replace the `config.db` file (required permissions
+`0600`). Following this, adapt `/etc/hostname` and `/etc/hosts` according to the
+lost {pve} host, then reboot and check (and don't forget your
+VM/CT data).

-Remove Cluster configuration
+
+Remove Cluster Configuration
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The recommended way is to reinstall the node after you removed it from
-your cluster. This makes sure that all secret cluster/ssh keys and any
+The recommended way is to reinstall the node after you remove it from
+your cluster. This ensures that all secret cluster/ssh keys and any
 shared configuration data is destroyed.

-In some cases, you might prefer to put a node back to local mode
-without reinstall, which is described here:
-
-* stop the cluster file system in `/etc/pve/`
-
-  # systemctl stop pve-cluster
+In some cases, you might prefer to put a node back to local mode without
+reinstalling, which is described in
+<<pvecm_separate_node_without_reinstall,Separate A Node Without Reinstalling>>

-* start it again but forcing local mode

- # pmxcfs -l
+Recovering/Moving Guests from Failed Nodes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-* remove the cluster config

+For the guest configuration files in `nodes/<NAME>/qemu-server/` (VMs) and
+`nodes/<NAME>/lxc/` (containers), {pve} sees the containing node `<NAME>` as the
+owner of the respective guest. This concept enables the usage of local locks
+instead of expensive cluster-wide locks for preventing concurrent guest
+configuration changes.

- # rm /etc/pve/cluster.conf
- # rm /etc/cluster/cluster.conf
- # rm /var/lib/pve-cluster/corosync.authkey
+As a consequence, if the owning node of a guest fails (for example, due to a power
+outage, fencing event, etc.), a regular migration is not possible (even if all
+the disks are located on shared storage), because such a local lock on the
+(offline) owning node is unobtainable. This is not a problem for HA-managed
+guests, as {pve}'s High Availability stack includes the necessary
+(cluster-wide) locking and watchdog functionality to ensure correct and
+automatic recovery of guests from fenced nodes.

-* stop the cluster file system again
+If a non-HA-managed guest has only shared disks (and no other local resources
+which are only available on the failed node), a manual recovery
+is possible by simply moving the guest configuration file from the failed
+node's directory in `/etc/pve/` to an online node's directory (which changes the
+logical owner or location of the guest).

- # systemctl stop pve-cluster
+For example, recovering the VM with ID `100` from an offline `node1` to another
+node `node2` works by running the following command as root on any member node
+of the cluster:

-* restart pve services (or reboot)
+ mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/

- # systemctl start pve-cluster
- # systemctl restart pvedaemon
- # systemctl restart pveproxy
- # systemctl restart pvestatd
+WARNING: Before manually recovering a guest like this, make absolutely sure
+that the failed source node is really powered off/fenced. Otherwise {pve}'s
+locking principles are violated by the `mv` command, which can have unexpected
+consequences.

+WARNING: Guests with local disks (or other local resources which are only
+available on the offline node) are not recoverable like this. Either wait for the
+failed node to rejoin the cluster or restore such guests from backups.

 ifdef::manvolnum[]
 include::pve-copyright.adoc[]
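The special status files documented in the patch above (`.members`, `.vmlist`,
`.clusterlog`) are plain JSON and can be read directly. This gives a quick way
to check cluster membership (and whether the local node currently sees a
quorate cluster) before touching anything under `/etc/pve/`. A minimal sketch;
`jq` is not required and is only shown as an optional pretty-printer:

----
# Membership/quorum information as seen by pmxcfs (plain JSON)
cat /etc/pve/.members

# List of all guests known to the cluster
cat /etc/pve/.vmlist

# Optional: pretty-print the membership info, assuming jq is installed
jq . /etc/pve/.members
----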
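The `Recovery` section above boils down to a handful of commands. A minimal
sketch of those steps, assuming the disk of the failed host is mounted at the
hypothetical path `/mnt/old-root` and that the new host is freshly installed
with nothing cluster-related running yet:

----
# Stop the cluster file system on the new host
systemctl stop pve-cluster

# Copy the pmxcfs database from the old installation; it must be owned by
# root with permissions 0600 (the source path is an assumption of this example)
install -o root -g root -m 0600 \
    /mnt/old-root/var/lib/pve-cluster/config.db \
    /var/lib/pve-cluster/config.db

# Adapt /etc/hostname and /etc/hosts to match the lost host, then reboot and
# verify that the expected configuration appears under /etc/pve/
----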
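The `Recovering/Moving Guests from Failed Nodes` section above shows the
recovery command for a VM; a container is recovered in the same way by moving
its configuration file into the target node's `lxc/` directory. The node names
`node1`/`node2` and the CT ID `101` below are placeholders for illustration,
and the same warnings about fencing and local resources apply:

----
# Run as root on any quorate cluster member, and only after node1 has been
# confirmed to be powered off/fenced
mv /etc/pve/nodes/node1/lxc/101.conf /etc/pve/nodes/node2/lxc/
----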