X-Git-Url: https://git.proxmox.com/?a=blobdiff_plain;f=pmxcfs.adoc;h=ea1555943ad768361a5a1eacfb24e00a9c822074;hb=74ff2e640d0f2390aeea3689147450603cd4f56d;hp=3474d736a7d17af3751d53870a9e9ff28c4824ea;hpb=8c1189b640ae7d10119ff1c046580f48749d38bd;p=pve-docs.git

diff --git a/pmxcfs.adoc b/pmxcfs.adoc
index 3474d73..ea15559 100644
--- a/pmxcfs.adoc
+++ b/pmxcfs.adoc
@@ -1,17 +1,18 @@
+[[chapter_pmxcfs]]
 ifdef::manvolnum[]
-PVE({manvolnum})
-================
-include::attributes.txt[]
+pmxcfs(8)
+=========
+:pve-toplevel:

 NAME
 ----

 pmxcfs - Proxmox Cluster File System

-SYNOPSYS
+SYNOPSIS
 --------

-include::pmxcfs.8-cli.adoc[]
+include::pmxcfs.8-synopsis.adoc[]

 DESCRIPTION
 -----------
@@ -20,7 +21,7 @@ endif::manvolnum[]
 ifndef::manvolnum[]
 Proxmox Cluster File System (pmxcfs)
 ====================================
-include::attributes.txt[]
+:pve-toplevel:
 endif::manvolnum[]

 The Proxmox Cluster file system (``pmxcfs'') is a database-driven file
@@ -29,17 +30,18 @@ cluster nodes using `corosync`. We use this to store all PVE related
 configuration files.

 Although the file system stores all data inside a persistent database
-on disk, a copy of the data resides in RAM. That imposes restriction
-on the maximal size, which is currently 30MB. This is still enough to
+on disk, a copy of the data resides in RAM. This imposes restrictions
+on the maximum size, which is currently 30MB. This is still enough to
 store the configuration of several thousand virtual machines.

 This system provides the following advantages:

-* seamless replication of all configuration to all nodes in real time
-* provides strong consistency checks to avoid duplicate VM IDs
-* read-only when a node loses quorum
-* automatic updates of the corosync cluster configuration to all nodes
-* includes a distributed locking mechanism
+* Seamless replication of all configuration to all nodes in real time
+* Provides strong consistency checks to avoid duplicate VM IDs
+* Read-only when a node loses quorum
+* Automatic updates of the corosync cluster configuration to all nodes
+* Includes a distributed locking mechanism
+

 POSIX Compatibility
 -------------------
@@ -47,41 +49,39 @@ POSIX Compatibility
 The file system is based on FUSE, so the behavior is POSIX like. But
 some feature are simply not implemented, because we do not need them:

-* you can just generate normal files and directories, but no symbolic
+* You can just generate normal files and directories, but no symbolic
   links, ...

-* you can't rename non-empty directories (because this makes it easier
+* You can't rename non-empty directories (because this makes it easier
   to guarantee that VMIDs are unique).

-* you can't change file permissions (permissions are based on path)
+* You can't change file permissions (permissions are based on paths)

 * `O_EXCL` creates were not atomic (like old NFS)

 * `O_TRUNC` creates are not atomic (FUSE restriction)


-File access rights
+File Access Rights
 ------------------

 All files and directories are owned by user `root` and have group
 `www-data`. Only root has write permissions, but group `www-data` can
-read most files. Files below the following paths:
+read most files. Files below the following paths are only accessible by root:

  /etc/pve/priv/
  /etc/pve/nodes/${NAME}/priv/

-are only accessible by root.
-

 Technology
 ----------

-We use the http://www.corosync.org[Corosync Cluster Engine] for
-cluster communication, and http://www.sqlite.org[SQlite] for the
-database file. The filesystem is implemented in user space using
-http://fuse.sourceforge.net[FUSE].
+We use the https://www.corosync.org[Corosync Cluster Engine] for
+cluster communication, and https://www.sqlite.org[SQLite] for the
+database file. The file system is implemented in user space using
+https://github.com/libfuse/libfuse[FUSE].

-File system layout
+File System Layout
 ------------------

 The file system is mounted at:

@@ -93,37 +93,62 @@ Files

 [width="100%",cols="m,d"]
 |=======
-|`corosync.conf` | Corosync cluster configuration file (previous to {pve} 4.x this file was called cluster.conf)
-|`storage.cfg` | {pve} storage configuration
-|`datacenter.cfg` | {pve} datacenter wide configuration (keyboard layout, proxy, ...)
-|`user.cfg` | {pve} access control configuration (users/groups/...)
+|`authkey.pub` | Public key used by the ticket system
+|`ceph.conf` | Ceph configuration file (note: /etc/ceph/ceph.conf is a symbolic link to this)
+|`corosync.conf` | Corosync cluster configuration file (prior to {pve} 4.x, this file was called cluster.conf)
+|`datacenter.cfg` | {pve} data center-wide configuration (keyboard layout, proxy, ...)
 |`domains.cfg` | {pve} authentication domains
-|`authkey.pub` | Public key used by ticket system
-|`pve-root-ca.pem` | Public certificate of cluster CA
-|`priv/shadow.cfg` | Shadow password file
-|`priv/authkey.key` | Private key used by ticket system
-|`priv/pve-root-ca.key` | Private key of cluster CA
-|`nodes/<NAME>/pve-ssl.pem` | Public SSL certificate for web server (signed by cluster CA)
+|`firewall/cluster.fw` | Firewall configuration applied to all nodes
+|`firewall/<NAME>.fw` | Firewall configuration for individual nodes
+|`firewall/<VMID>.fw` | Firewall configuration for VMs and containers
+|`ha/crm_commands` | Displays HA operations that are currently being carried out by the CRM
+|`ha/manager_status` | JSON-formatted information regarding HA services on the cluster
+|`ha/resources.cfg` | Resources managed by high availability, and their current state
+|`nodes/<NAME>/config` | Node-specific configuration
+|`nodes/<NAME>/lxc/<VMID>.conf` | VM configuration data for LXC containers
+|`nodes/<NAME>/openvz/` | Prior to PVE 4.0, used for container configuration data (deprecated, removed soon)
 |`nodes/<NAME>/pve-ssl.key` | Private SSL key for `pve-ssl.pem`
-|`nodes/<NAME>/pveproxy-ssl.pem` | Public SSL certificate (chain) for web server (optional override for `pve-ssl.pem`)
+|`nodes/<NAME>/pve-ssl.pem` | Public SSL certificate for web server (signed by cluster CA)
 |`nodes/<NAME>/pveproxy-ssl.key` | Private SSL key for `pveproxy-ssl.pem` (optional)
+|`nodes/<NAME>/pveproxy-ssl.pem` | Public SSL certificate (chain) for web server (optional override for `pve-ssl.pem`)
 |`nodes/<NAME>/qemu-server/<VMID>.conf` | VM configuration data for KVM VMs
-|`nodes/<NAME>/lxc/<VMID>.conf` | VM configuration data for LXC containers
-|`firewall/cluster.fw` | Firewall configuration applied to all nodes
-|`firewall/<NAME>.fw` | Firewall configuration for individual nodes
-|`firewall/<VMID>.fw` | Firewall configuration for VMs and Containers
+|`priv/authkey.key` | Private key used by ticket system
+|`priv/authorized_keys` | SSH keys of cluster members for authentication
+|`priv/ceph*` | Ceph authentication keys and associated capabilities
+|`priv/known_hosts` | SSH keys of the cluster members for verification
+|`priv/lock/*` | Lock files used by various services to ensure safe cluster-wide operations
+|`priv/pve-root-ca.key` | Private key of cluster CA
+|`priv/shadow.cfg` | Shadow password file for PVE Realm users
+|`priv/storage/<STORAGE-ID>.pw` | Contains the password of a storage in plain text
+|`priv/tfa.cfg` | Base64-encoded two-factor authentication configuration
+|`priv/token.cfg` | API token secrets of all tokens
+|`pve-root-ca.pem` | Public certificate of cluster CA
+|`pve-www.key` | Private key used for generating CSRF tokens
+|`sdn/*` | Shared configuration files for Software Defined Networking (SDN)
+|`status.cfg` | {pve} external metrics server configuration
+|`storage.cfg` | {pve} storage configuration
+|`user.cfg` | {pve} access control configuration (users/groups/...)
+|`virtual-guest/cpu-models.conf` | For storing custom CPU models
+|`vzdump.cron` | Cluster-wide vzdump backup-job schedule
 |=======

+
 Symbolic links
 ~~~~~~~~~~~~~~

+Certain directories within the cluster file system use symbolic links, in order
+to point to a node's own configuration files. Thus, the files pointed to in the
+table below refer to different files on each node of the cluster.
+
 [width="100%",cols="m,m"]
 |=======
 |`local` | `nodes/<LOCAL_HOST_NAME>`
-|`qemu-server` | `nodes/<LOCAL_HOST_NAME>/qemu-server/`
 |`lxc` | `nodes/<LOCAL_HOST_NAME>/lxc/`
+|`openvz` | `nodes/<LOCAL_HOST_NAME>/openvz/` (deprecated, removed soon)
+|`qemu-server` | `nodes/<LOCAL_HOST_NAME>/qemu-server/`
 |=======

+
 Special status files for debugging (JSON)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -136,65 +161,81 @@ Special status files for debugging (JSON)
 |`.rrd` |RRD data (most recent entries)
 |=======

+
 Enable/Disable debugging
 ~~~~~~~~~~~~~~~~~~~~~~~~

 You can enable verbose syslog messages with:

- echo "1" >/etc/pve/.debug
+ echo "1" >/etc/pve/.debug

 And disable verbose syslog messages with:

- echo "0" >/etc/pve/.debug
+ echo "0" >/etc/pve/.debug

 Recovery
 --------

-If you have major problems with your Proxmox VE host, e.g. hardware
-issues, it could be helpful to just copy the pmxcfs database file
-`/var/lib/pve-cluster/config.db` and move it to a new Proxmox VE
+If you have major problems with your {pve} host, for example hardware
+issues, it could be helpful to copy the pmxcfs database file
+`/var/lib/pve-cluster/config.db`, and move it to a new {pve}
 host. On the new host (with nothing running), you need to stop the
-`pve-cluster` service and replace the `config.db` file (needed permissions
-`0600`). Second, adapt `/etc/hostname` and `/etc/hosts` according to the
-lost Proxmox VE host, then reboot and check. (And don't forget your
-VM/CT data)
+`pve-cluster` service and replace the `config.db` file (required permissions
+`0600`). Following this, adapt `/etc/hostname` and `/etc/hosts` according to the
+lost {pve} host, then reboot and check (and don't forget your
+VM/CT data).

-Remove Cluster configuration
+
+Remove Cluster Configuration
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The recommended way is to reinstall the node after you removed it from
-your cluster. This makes sure that all secret cluster/ssh keys and any
+The recommended way is to reinstall the node after you remove it from
+your cluster. This ensures that all secret cluster/ssh keys and any
 shared configuration data is destroyed.

-In some cases, you might prefer to put a node back to local mode
-without reinstall, which is described here:
-
-* stop the cluster file system in `/etc/pve/`
-
-  # systemctl stop pve-cluster
+In some cases, you might prefer to put a node back to local mode without
+reinstalling, which is described in
+<<pvecm_separate_node_without_reinstall,Separate A Node Without Reinstalling>>

-* start it again but forcing local mode

- # pmxcfs -l
+Recovering/Moving Guests from Failed Nodes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-* remove the cluster config

+For the guest configuration files in `nodes/<NAME>/qemu-server/` (VMs) and
+`nodes/<NAME>/lxc/` (containers), {pve} sees the containing node `<NAME>` as the
+owner of the respective guest. This concept enables the usage of local locks
+instead of expensive cluster-wide locks for preventing concurrent guest
+configuration changes.

- # rm /etc/pve/cluster.conf
- # rm /etc/cluster/cluster.conf
- # rm /var/lib/pve-cluster/corosync.authkey
+As a consequence, if the owning node of a guest fails (for example, due to a power
+outage, fencing event, etc.), a regular migration is not possible (even if all
+the disks are located on shared storage), because such a local lock on the
+(offline) owning node is unobtainable. This is not a problem for HA-managed
+guests, as {pve}'s High Availability stack includes the necessary
+(cluster-wide) locking and watchdog functionality to ensure correct and
+automatic recovery of guests from fenced nodes.

-* stop the cluster file system again
+If a non-HA-managed guest has only shared disks (and no other local resources
+which are only available on the failed node), a manual recovery
+is possible by simply moving the guest configuration file from the failed
+node's directory in `/etc/pve/` to an online node's directory (which changes the
+logical owner or location of the guest).

- # systemctl stop pve-cluster
+For example, recovering the VM with ID `100` from an offline `node1` to another
+node `node2` works by running the following command as root on any member node
+of the cluster:

-* restart pve services (or reboot)
+ mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/

- # systemctl start pve-cluster
- # systemctl restart pvedaemon
- # systemctl restart pveproxy
- # systemctl restart pvestatd
+WARNING: Before manually recovering a guest like this, make absolutely sure
+that the failed source node is really powered off/fenced. Otherwise {pve}'s
+locking principles are violated by the `mv` command, which can have unexpected
+consequences.

+WARNING: Guests with local disks (or other local resources which are only
+available on the offline node) are not recoverable like this. Either wait for the
+failed node to rejoin the cluster or restore such guests from backups.

 ifdef::manvolnum[]
 include::pve-copyright.adoc[]
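The special status files documented in the patch above (`.members`, `.vmlist`,
`.clusterlog`) are plain JSON and can be read directly. This gives a quick way
to check cluster membership (and whether the local node currently sees a
quorate cluster) before touching anything under `/etc/pve/`. A minimal sketch;
`jq` is not required and is only shown as an optional pretty-printer:

----
# Membership/quorum information as seen by pmxcfs (plain JSON)
cat /etc/pve/.members

# List of all guests known to the cluster
cat /etc/pve/.vmlist

# Optional: pretty-print the membership info, assuming jq is installed
jq . /etc/pve/.members
----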
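The `Recovery` section above boils down to a handful of commands. A minimal
sketch of those steps, assuming the disk of the failed host is mounted at the
hypothetical path `/mnt/old-root` and that the new host is freshly installed
with nothing cluster-related running yet:

----
# Stop the cluster file system on the new host
systemctl stop pve-cluster

# Copy the pmxcfs database from the old installation; it must be owned by
# root with permissions 0600 (the source path is an assumption of this example)
install -o root -g root -m 0600 \
    /mnt/old-root/var/lib/pve-cluster/config.db \
    /var/lib/pve-cluster/config.db

# Adapt /etc/hostname and /etc/hosts to match the lost host, then reboot and
# verify that the expected configuration appears under /etc/pve/
----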
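The `Recovering/Moving Guests from Failed Nodes` section above shows the
recovery command for a VM; a container is recovered in the same way by moving
its configuration file into the target node's `lxc/` directory. The node names
`node1`/`node2` and the CT ID `101` below are placeholders for illustration,
and the same warnings about fencing and local resources apply:

----
# Run as root on any quorate cluster member, and only after node1 has been
# confirmed to be powered off/fenced
mv /etc/pve/nodes/node1/lxc/101.conf /etc/pve/nodes/node2/lxc/
----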