ifdef::manvolnum[]
pmxcfs(8)
=========
:pve-toplevel:

NAME
----

pmxcfs - Proxmox Cluster File System

SYNOPSIS
--------

include::pmxcfs.8-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Proxmox Cluster File System (pmxcfs)
====================================
:pve-toplevel:
endif::manvolnum[]

The Proxmox Cluster File System (``pmxcfs'') is a database-driven file
system for storing configuration files, replicated in real time to all
cluster nodes using `corosync`. We use this to store all {pve}-related
configuration files.

Although the file system stores all data inside a persistent database
on disk, a copy of the data resides in RAM. This imposes a restriction
on the maximum size, which is currently 30 MB. This is still enough to
store the configuration of several thousand virtual machines.

This system provides the following advantages:

* seamless replication of all configuration to all nodes in real time
* strong consistency checks to avoid duplicate VM IDs
* read-only mode when a node loses quorum
* automatic updates of the corosync cluster configuration to all nodes
* a distributed locking mechanism


POSIX Compatibility
-------------------

The file system is based on FUSE, so the behavior is POSIX-like. However,
some features are simply not implemented, because we do not need them:

* you can create only regular files and directories, but no symbolic
  links, ...

* you can't rename non-empty directories (because this makes it easier
  to guarantee that VMIDs are unique)

* you can't change file permissions (permissions are based on path)

* `O_EXCL` creates are not atomic (like old NFS)

* `O_TRUNC` creates are not atomic (FUSE restriction)


File Access Rights
------------------

All files and directories are owned by user `root` and have group
`www-data`. Only root has write permissions, but group `www-data` can
read most files. Files below the following paths:

 /etc/pve/priv/
 /etc/pve/nodes/${NAME}/priv/

are only accessible by root.

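The ownership and mode bits described above can be inspected from a shell.
The following is a minimal sketch, assuming GNU `stat` (as shipped with
Debian); the helper name `show_access` is hypothetical and not part of {pve}:

```shell
# Print "owner:group mode path" for each given path.
# Assumes GNU coreutils `stat`; show_access is an illustrative helper name.
show_access() {
    stat -c '%U:%G %a %n' "$@"
}
```

For example, `show_access /etc/pve/storage.cfg /etc/pve/priv` prints one line
per path, which makes it easy to compare a regular file with the `priv`
directory.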

Technology
----------

We use the http://www.corosync.org[Corosync Cluster Engine] for
cluster communication, and http://www.sqlite.org[SQLite] for the
database file. The file system is implemented in user space using
http://fuse.sourceforge.net[FUSE].

File System Layout
------------------

The file system is mounted at:

 /etc/pve

Files
~~~~~

[width="100%",cols="m,d"]
|=======
|`corosync.conf`                        | Corosync cluster configuration file (prior to {pve} 4.x, this file was called `cluster.conf`)
|`storage.cfg`                          | {pve} storage configuration
|`datacenter.cfg`                       | {pve} datacenter-wide configuration (keyboard layout, proxy, ...)
|`user.cfg`                             | {pve} access control configuration (users/groups/...)
|`domains.cfg`                          | {pve} authentication domains
|`status.cfg`                           | {pve} external metrics server configuration
|`authkey.pub`                          | Public key used by ticket system
|`pve-root-ca.pem`                      | Public certificate of cluster CA
|`priv/shadow.cfg`                      | Shadow password file
|`priv/authkey.key`                     | Private key used by ticket system
|`priv/pve-root-ca.key`                 | Private key of cluster CA
|`nodes/<NAME>/pve-ssl.pem`             | Public SSL certificate for web server (signed by cluster CA)
|`nodes/<NAME>/pve-ssl.key`             | Private SSL key for `pve-ssl.pem`
|`nodes/<NAME>/pveproxy-ssl.pem`        | Public SSL certificate (chain) for web server (optional override for `pve-ssl.pem`)
|`nodes/<NAME>/pveproxy-ssl.key`        | Private SSL key for `pveproxy-ssl.pem` (optional)
|`nodes/<NAME>/qemu-server/<VMID>.conf` | VM configuration data for KVM VMs
|`nodes/<NAME>/lxc/<VMID>.conf`         | Configuration data for LXC containers
|`firewall/cluster.fw`                  | Firewall configuration applied to all nodes
|`firewall/<NAME>.fw`                   | Firewall configuration for individual nodes
|`firewall/<VMID>.fw`                   | Firewall configuration for VMs and containers
|=======


Symbolic links
~~~~~~~~~~~~~~

[width="100%",cols="m,m"]
|=======
|`local`         | `nodes/<LOCAL_HOST_NAME>`
|`qemu-server`   | `nodes/<LOCAL_HOST_NAME>/qemu-server/`
|`lxc`           | `nodes/<LOCAL_HOST_NAME>/lxc/`
|=======

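To see where these links point on a given node, their targets can be printed
with `readlink`. This is only a sketch: the helper name `show_local_links` and
the `PVE_DIR` override are assumptions made for illustration, not part of
{pve}:

```shell
# Show where the node-local symbolic links point.
# PVE_DIR defaults to the mount point /etc/pve; overriding it is only
# meant for experimenting outside a real cluster node.
show_local_links() {
    dir="${PVE_DIR:-/etc/pve}"
    for name in local qemu-server lxc; do
        printf '%s -> %s\n' "$name" "$(readlink "$dir/$name")"
    done
}
```

Calling `show_local_links` without arguments prints one `name -> target` line
for each of the three links listed in the table.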

Special status files for debugging (JSON)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[width="100%",cols="m,d"]
|=======
|`.version`    |File versions (to detect file modifications)
|`.members`    |Info about cluster members
|`.vmlist`     |List of all VMs
|`.clusterlog` |Cluster log (last 50 entries)
|`.rrd`        |RRD data (most recent entries)
|=======

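These files can be read like any other file. The following sketch wraps that in
a small helper; the name `pmxcfs_status` and the `PVE_DIR` override are
hypothetical and only serve as an illustration:

```shell
# Print one of the special status files, e.g. "members" for .members.
# PVE_DIR defaults to /etc/pve; the helper name is illustrative only.
pmxcfs_status() {
    cat "${PVE_DIR:-/etc/pve}/.$1"
}
```

For example, `pmxcfs_status members` prints the content of
`/etc/pve/.members`. If `python3` is installed, the output can be piped through
`python3 -m json.tool` for easier reading.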

Enable/Disable debugging
~~~~~~~~~~~~~~~~~~~~~~~~

You can enable verbose syslog messages with:

 echo "1" >/etc/pve/.debug

And disable verbose syslog messages with:

 echo "0" >/etc/pve/.debug


Recovery
--------

If you have major problems with your Proxmox VE host, e.g. hardware
issues, it can be helpful to copy the pmxcfs database file
`/var/lib/pve-cluster/config.db` and move it to a new Proxmox VE
host. On the new host (with nothing running), stop the `pve-cluster`
service and replace the `config.db` file (required permissions:
`0600`). Then adapt `/etc/hostname` and `/etc/hosts` to match the
lost Proxmox VE host, reboot, and check the result. (And don't forget
your VM/CT data.)

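The file replacement step can be sketched as follows. This is not an official
procedure: the helper name `replace_config_db` is hypothetical, and the sketch
assumes that the saved database has already been copied to the new host:

```shell
# Copy a saved pmxcfs database into place with the required 0600 mode.
# Usage: replace_config_db SAVED_DB [TARGET]
# TARGET defaults to /var/lib/pve-cluster/config.db; run this only while
# the pve-cluster service is stopped. The helper name is illustrative.
replace_config_db() {
    src="$1"
    dest="${2:-/var/lib/pve-cluster/config.db}"
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    install -m 0600 "$src" "$dest"
}
```

On the new host, this would be run as root after `systemctl stop pve-cluster`
and before adapting `/etc/hostname` and `/etc/hosts` and rebooting.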

Remove Cluster configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The recommended way is to reinstall the node after removing it from
your cluster. This makes sure that all secret cluster/ssh keys and any
shared configuration data are destroyed.

In some cases, you might prefer to put a node back into local mode without
reinstalling, which is described in
<<pvecm_separate_node_without_reinstall,Separate A Node Without Reinstalling>>.


Recovering/Moving Guests from Failed Nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For the guest configuration files in `nodes/<NAME>/qemu-server/` (VMs) and
`nodes/<NAME>/lxc/` (containers), {pve} sees the containing node `<NAME>` as
the owner of the respective guest. This concept enables the usage of local locks
instead of expensive cluster-wide locks for preventing concurrent guest
configuration changes.

As a consequence, if the owning node of a guest fails (e.g., because of a power
outage, fencing event, ...), a regular migration is not possible (even if all
the disks are located on shared storage), because such a local lock on the
(dead) owning node is unobtainable. This is not a problem for HA-managed
guests, as {pve}'s High Availability stack includes the necessary
(cluster-wide) locking and watchdog functionality to ensure correct and
automatic recovery of guests from fenced nodes.

If a non-HA-managed guest has only shared disks (and no other local resources
which are only available on the failed node are configured), a manual recovery
is possible by simply moving the guest configuration file from the failed
node's directory in `/etc/pve/` to a live node's directory (which changes the
logical owner or location of the guest).

For example, recovering the VM with ID `100` from a dead `node1` to another
node `node2` works with the following command executed when logged in as root
on any member node of the cluster:

 mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/qemu-server/

WARNING: Before manually recovering a guest like this, make absolutely sure
that the failed source node is really powered off/fenced. Otherwise {pve}'s
locking principles are violated by the `mv` command, which can have unexpected
consequences.

WARNING: Guests with local disks (or other local resources which are only
available on the dead node) are not recoverable like this. Either wait for the
failed node to rejoin the cluster or restore such guests from backups.

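The manual move can be wrapped in a few sanity checks. The following is a
minimal sketch under stated assumptions: the helper name `recover_guest` and
the `PVE_DIR` override are hypothetical, and the configuration file is moved
into the target node's matching `qemu-server/` or `lxc/` directory. The helper
cannot check whether the failed node is really powered off or fenced, so the
warnings above still apply:

```shell
# Move a guest configuration from a failed node to a live node.
# Usage: recover_guest VMID FAILED_NODE TARGET_NODE [qemu-server|lxc]
# PVE_DIR defaults to /etc/pve. The helper is illustrative; it does NOT
# verify that the failed node is really powered off or fenced.
recover_guest() {
    vmid="$1"; from="$2"; to="$3"; kind="${4:-qemu-server}"
    base="${PVE_DIR:-/etc/pve}/nodes"
    src="$base/$from/$kind/$vmid.conf"
    dest_dir="$base/$to/$kind"
    # Refuse to continue if the source is missing or the target is unusable.
    [ -f "$src" ] || { echo "missing: $src" >&2; return 1; }
    [ -d "$dest_dir" ] || { echo "missing: $dest_dir" >&2; return 1; }
    [ ! -e "$dest_dir/$vmid.conf" ] || { echo "exists: $dest_dir/$vmid.conf" >&2; return 1; }
    mv "$src" "$dest_dir/"
}
```

With these assumptions, `recover_guest 100 node1 node2` corresponds to the `mv`
example above, and `recover_guest 101 node1 node2 lxc` would do the same for a
container with the hypothetical ID `101`.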
ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]