[[chapter_pmxcfs]]
ifdef::manvolnum[]
pmxcfs(8)
=========
:pve-toplevel:

NAME
----

pmxcfs - Proxmox Cluster File System

SYNOPSIS
--------

include::pmxcfs.8-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Proxmox Cluster File System (pmxcfs)
====================================
:pve-toplevel:
endif::manvolnum[]

The Proxmox Cluster file system (``pmxcfs'') is a database-driven file
system for storing configuration files, replicated in real time to all
cluster nodes using `corosync`. We use this to store all PVE-related
configuration files.

Although the file system stores all data inside a persistent database
on disk, a copy of the data resides in RAM. This imposes restrictions
on the maximum size, which is currently 30 MB. This is still enough to
store the configuration of several thousand virtual machines.

This system provides the following advantages:

* Seamless replication of all configuration to all nodes in real time
* Provides strong consistency checks to avoid duplicate VM IDs
* Read-only when a node loses quorum
* Automatic updates of the corosync cluster configuration to all nodes
* Includes a distributed locking mechanism

POSIX Compatibility
-------------------

The file system is based on FUSE, so the behavior is POSIX-like. But
some features are simply not implemented, because we do not need them:

* You can only create normal files and directories, but no symbolic
  links, ...

* You can't rename non-empty directories (because this makes it easier
  to guarantee that VMIDs are unique).

* You can't change file permissions (permissions are based on paths)

* `O_EXCL` creates are not atomic (like old NFS)

* `O_TRUNC` creates are not atomic (FUSE restriction)

File Access Rights
------------------

All files and directories are owned by user `root` and have group
`www-data`. Only root has write permissions, but group `www-data` can
read most files. Files below the following paths are only accessible by root:

 /etc/pve/priv/
 /etc/pve/nodes/${NAME}/priv/

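Because permissions depend on the path rather than on per-file mode bits, the rule above can be sketched as a simple path check. This is a conceptual illustration only, not pmxcfs source code; the function name and pattern list are made up to mirror the two root-only paths listed above:

```python
from fnmatch import fnmatch

# Conceptual sketch only -- not pmxcfs source code. It mirrors the rule
# stated above: everything below priv/ is accessible by root alone,
# while most other files are also readable by group www-data.
ROOT_ONLY_PATTERNS = [
    "/etc/pve/priv/*",
    "/etc/pve/nodes/*/priv/*",
]

def readable_by_www_data(path: str) -> bool:
    """Return False for paths that only root may access."""
    return not any(fnmatch(path, pat) for pat in ROOT_ONLY_PATTERNS)

print(readable_by_www_data("/etc/pve/user.cfg"))          # True
print(readable_by_www_data("/etc/pve/priv/authkey.key"))  # False
```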
Technology
----------

We use the https://www.corosync.org[Corosync Cluster Engine] for
cluster communication, and https://www.sqlite.org[SQLite] for the
database file. The file system is implemented in user space using
https://github.com/libfuse/libfuse[FUSE].

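The "database-driven" idea can be sketched as follows: each file is a row in a SQLite database, and the file system serves that data back through a mount point. The single-table schema below is invented for illustration and is not the real `config.db` layout, which is an internal implementation detail:

```python
import sqlite3

# Illustration only: a made-up schema, not the actual config.db layout.
# Each "file" is a row; the whole tree lives in one SQLite database,
# which pmxcfs then exposes again as files through a FUSE mount.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tree (path TEXT PRIMARY KEY, data BLOB)")
db.execute("INSERT INTO tree VALUES (?, ?)",
           ("nodes/node1/qemu-server/100.conf", b"cores: 2\nmemory: 2048\n"))
db.commit()

# Reading a "file" is just a lookup by path.
(data,) = db.execute(
    "SELECT data FROM tree WHERE path = ?",
    ("nodes/node1/qemu-server/100.conf",)).fetchone()
print(data.decode())
```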
File System Layout
------------------

The file system is mounted at:

 /etc/pve

Files
~~~~~

[width="100%",cols="m,d"]
|=======
|`authkey.pub` | Public key used by the ticket system
|`ceph.conf` | Ceph configuration file (note: /etc/ceph/ceph.conf is a symbolic link to this)
|`corosync.conf` | Corosync cluster configuration file (prior to {pve} 4.x, this file was called cluster.conf)
|`datacenter.cfg` | {pve} data center-wide configuration (keyboard layout, proxy, ...)
|`domains.cfg` | {pve} authentication domains
|`firewall/cluster.fw` | Firewall configuration applied to all nodes
|`firewall/<NAME>.fw` | Firewall configuration for individual nodes
|`firewall/<VMID>.fw` | Firewall configuration for VMs and containers
|`ha/crm_commands` | Displays HA operations that are currently being carried out by the CRM
|`ha/manager_status` | JSON-formatted information regarding HA services on the cluster
|`ha/resources.cfg` | Resources managed by high availability, and their current state
|`nodes/<NAME>/config` | Node-specific configuration
|`nodes/<NAME>/lxc/<VMID>.conf` | VM configuration data for LXC containers
|`nodes/<NAME>/openvz/` | Prior to {pve} 4.0, used for container configuration data (deprecated, removed soon)
|`nodes/<NAME>/pve-ssl.key` | Private SSL key for `pve-ssl.pem`
|`nodes/<NAME>/pve-ssl.pem` | Public SSL certificate for web server (signed by cluster CA)
|`nodes/<NAME>/pveproxy-ssl.key` | Private SSL key for `pveproxy-ssl.pem` (optional)
|`nodes/<NAME>/pveproxy-ssl.pem` | Public SSL certificate (chain) for web server (optional override for `pve-ssl.pem`)
|`nodes/<NAME>/qemu-server/<VMID>.conf` | VM configuration data for KVM VMs
|`priv/authkey.key` | Private key used by ticket system
|`priv/authorized_keys` | SSH keys of cluster members for authentication
|`priv/ceph*` | Ceph authentication keys and associated capabilities
|`priv/known_hosts` | SSH keys of the cluster members for verification
|`priv/lock/*` | Lock files used by various services to ensure safe cluster-wide operations
|`priv/pve-root-ca.key` | Private key of cluster CA
|`priv/shadow.cfg` | Shadow password file for PVE Realm users
|`priv/storage/<STORAGE-ID>.pw` | Contains the password of a storage in plain text
|`priv/tfa.cfg` | Base64-encoded two-factor authentication configuration
|`priv/token.cfg` | API token secrets of all tokens
|`pve-root-ca.pem` | Public certificate of cluster CA
|`pve-www.key` | Private key used for generating CSRF tokens
|`sdn/*` | Shared configuration files for Software Defined Networking (SDN)
|`status.cfg` | {pve} external metrics server configuration
|`storage.cfg` | {pve} storage configuration
|`user.cfg` | {pve} access control configuration (users/groups/...)
|`virtual-guest/cpu-models.conf` | For storing custom CPU models
|`vzdump.cron` | Cluster-wide vzdump backup-job schedule
|=======


Symbolic links
~~~~~~~~~~~~~~

Certain directories within the cluster file system use symbolic links, in order
to point to a node's own configuration files. Thus, the files pointed to in the
table below refer to different files on each node of the cluster.

[width="100%",cols="m,m"]
|=======
|`local` | `nodes/<LOCAL_HOST_NAME>`
|`lxc` | `nodes/<LOCAL_HOST_NAME>/lxc/`
|`openvz` | `nodes/<LOCAL_HOST_NAME>/openvz/` (deprecated, removed soon)
|`qemu-server` | `nodes/<LOCAL_HOST_NAME>/qemu-server/`
|=======


Special status files for debugging (JSON)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[width="100%",cols="m,d"]
|=======
|`.version` |File versions (to detect file modifications)
|`.members` |Info about cluster members
|`.vmlist` |List of all VMs
|`.clusterlog` |Cluster log (last 50 entries)
|`.rrd` |RRD data (most recent entries)
|=======
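
These status files contain JSON and can be read like regular files, so they are easy to consume from scripts. As a sketch, the following checks quorum and lists online nodes from a `.members`-style document. The payload and its field names are illustrative assumptions; inspect `/etc/pve/.members` on a live node for the exact structure:

```python
import json

# Illustrative .members-style payload -- the field names here are an
# assumption; check /etc/pve/.members on a live node for the real ones.
members_json = """
{
  "nodename": "node1",
  "cluster": { "name": "demo", "quorate": 1, "nodes": 2 },
  "nodelist": {
    "node1": { "id": 1, "online": 1 },
    "node2": { "id": 2, "online": 0 }
  }
}
"""

info = json.loads(members_json)
online = [name for name, d in info["nodelist"].items() if d["online"]]
print(info["cluster"]["quorate"], online)  # 1 ['node1']
```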


Enable/Disable debugging
~~~~~~~~~~~~~~~~~~~~~~~~

You can enable verbose syslog messages with:

 echo "1" >/etc/pve/.debug

And disable verbose syslog messages with:

 echo "0" >/etc/pve/.debug


Recovery
--------

If you have major problems with your {pve} host, for example hardware
issues, it could be helpful to copy the pmxcfs database file
`/var/lib/pve-cluster/config.db`, and move it to a new {pve}
host. On the new host (with nothing running), you need to stop the
`pve-cluster` service and replace the `config.db` file (required permissions
`0600`). Following this, adapt `/etc/hostname` and `/etc/hosts` according to the
lost {pve} host, then reboot and check (and don't forget your
VM/CT data).
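
The copy-and-replace step could look like the following sketch. The helper name is made up for illustration, the paths are the ones from the text, and it must only run while the `pve-cluster` service is stopped on the target host:

```python
import os
import shutil

def restore_config_db(src_db: str, dst_db: str) -> None:
    """Copy a saved config.db into place and enforce mode 0600.

    Made-up helper for illustration; run it only while the pve-cluster
    service is stopped on the target host.
    """
    shutil.copy2(src_db, dst_db)
    os.chmod(dst_db, 0o600)  # pmxcfs requires owner-only permissions

# e.g. restore_config_db("/mnt/backup/config.db",
#                        "/var/lib/pve-cluster/config.db")
```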


Remove Cluster Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The recommended way is to reinstall the node after you remove it from
your cluster. This ensures that all secret cluster/ssh keys and any
shared configuration data is destroyed.

In some cases, you might prefer to put a node back to local mode without
reinstalling, which is described in
<<pvecm_separate_node_without_reinstall,Separate A Node Without Reinstalling>>.


Recovering/Moving Guests from Failed Nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For the guest configuration files in `nodes/<NAME>/qemu-server/` (VMs) and
`nodes/<NAME>/lxc/` (containers), {pve} sees the containing node `<NAME>` as the
owner of the respective guest. This concept enables the usage of local locks
instead of expensive cluster-wide locks for preventing concurrent guest
configuration changes.

As a consequence, if the owning node of a guest fails (for example, due to a power
outage, fencing event, etc.), a regular migration is not possible (even if all
the disks are located on shared storage), because such a local lock on the
(offline) owning node is unobtainable. This is not a problem for HA-managed
guests, as {pve}'s High Availability stack includes the necessary
(cluster-wide) locking and watchdog functionality to ensure correct and
automatic recovery of guests from fenced nodes.

If a non-HA-managed guest has only shared disks (and no other local resources
which are only available on the failed node), a manual recovery
is possible by simply moving the guest configuration file from the failed
node's directory in `/etc/pve/` to an online node's directory (which changes the
logical owner or location of the guest).

For example, recovering the VM with ID `100` from an offline `node1` to another
node `node2` works by running the following command as root on any member node
of the cluster:

 mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/qemu-server/

WARNING: Before manually recovering a guest like this, make absolutely sure
that the failed source node is really powered off/fenced. Otherwise {pve}'s
locking principles are violated by the `mv` command, which can have unexpected
consequences.

WARNING: Guests with local disks (or other local resources which are only
available on the offline node) are not recoverable like this. Either wait for the
failed node to rejoin the cluster or restore such guests from backups.
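
The manual move above can also be expressed as a small sketch with an explicit guard against overwriting an existing config on the target node. `recover_guest` is a made-up helper (shown for VMs only), and exactly the same warnings apply as for the plain `mv`:

```python
import os

def recover_guest(pve_dir: str, src_node: str, dst_node: str, vmid: int) -> None:
    """Move a VM config between node directories (made-up helper).

    Only do this after making absolutely sure the source node is
    powered off/fenced -- the same warnings as for the plain mv apply.
    """
    src = os.path.join(pve_dir, "nodes", src_node, "qemu-server", f"{vmid}.conf")
    dst = os.path.join(pve_dir, "nodes", dst_node, "qemu-server", f"{vmid}.conf")
    if os.path.exists(dst):
        raise FileExistsError(f"VMID {vmid} already configured on {dst_node}")
    os.rename(src, dst)  # moving the file changes the guest's logical owner

# e.g. recover_guest("/etc/pve", "node1", "node2", 100)
```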

ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]