Commit | Line | Data |
---|---|---|
2409e808 | 1 | [[chapter_pmxcfs]] |
bd88f9d9 | 2 | ifdef::manvolnum[] |
b2f242ab DM |
3 | pmxcfs(8) |
4 | ========= | |
5f09af76 DM |
5 | :pve-toplevel: |
6 | ||
bd88f9d9 DM |
7 | NAME |
8 | ---- | |
9 | ||
10 | pmxcfs - Proxmox Cluster File System | |
11 | ||
49a5e11c | 12 | SYNOPSIS |
bd88f9d9 DM |
13 | -------- |
14 | ||
54079101 | 15 | include::pmxcfs.8-synopsis.adoc[] |
bd88f9d9 DM |
16 | |
17 | DESCRIPTION | |
18 | ----------- | |
19 | endif::manvolnum[] | |
20 | ||
21 | ifndef::manvolnum[] | |
22 | Proxmox Cluster File System (pmxcfs) | |
ac1e3896 | 23 | ==================================== |
5f09af76 | 24 | :pve-toplevel: |
194d2f29 | 25 | endif::manvolnum[] |
5f09af76 | 26 | |
8c1189b6 | 27 | The Proxmox Cluster file system (``pmxcfs'') is a database-driven file |
ac1e3896 | 28 | system for storing configuration files, replicated in real time to all |
8c1189b6 | 29 | cluster nodes using `corosync`. We use this to store all PVE-related |
ac1e3896 DM |
30 | configuration files. |
31 | ||
32 | Although the file system stores all data inside a persistent database | |
33 | on disk, a copy of the data resides in RAM. That imposes a restriction | |
5eba0743 | 34 | on the maximum size, which is currently 30MB. This is still enough to |
ac1e3896 DM |
35 | store the configuration of several thousand virtual machines. |
36 | ||
960f6344 | 37 | This system provides the following advantages: |
ac1e3896 DM |
38 | |
39 | * seamless replication of all configuration to all nodes in real time | |
40 | * provides strong consistency checks to avoid duplicate VM IDs | |
a8e99754 | 41 | * read-only when a node loses quorum |
ac1e3896 DM |
42 | * automatic updates of the corosync cluster configuration to all nodes |
43 | * includes a distributed locking mechanism | |
44 | ||
5eba0743 | 45 | |
ac1e3896 | 46 | POSIX Compatibility |
960f6344 | 47 | ------------------- |
ac1e3896 DM |
48 | |
49 | The file system is based on FUSE, so the behavior is POSIX-like. However, | |
50 | some features are not implemented, because we do not need them: | |
51 | ||
52 | * you can only create regular files and directories, but no symbolic | |
53 | links, ... | |
54 | ||
55 | * you can't rename non-empty directories (because this makes it easier | |
56 | to guarantee that VMIDs are unique). | |
57 | ||
58 | * you can't change file permissions (permissions are based on path) | |
59 | ||
60 | * `O_EXCL` creates are not atomic (like old NFS) | |
61 | ||
62 | * `O_TRUNC` creates are not atomic (FUSE restriction) | |
63 | ||
64 | ||
5eba0743 | 65 | File Access Rights |
960f6344 | 66 | ------------------ |
ac1e3896 | 67 | |
8c1189b6 FG |
68 | All files and directories are owned by user `root` and have group |
69 | `www-data`. Only root has write permissions, but group `www-data` can | |
ac1e3896 DM |
70 | read most files. Files below the following paths: |
71 | ||
72 | /etc/pve/priv/ | |
73 | /etc/pve/nodes/${NAME}/priv/ | |
74 | ||
75 | are only accessible by root. | |
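Because access rights are derived from the path, the rule above can be sketched as a small shell check. This is a minimal illustration under stated assumptions: the function name `pve_path_is_private` is hypothetical and not part of {pve}, and it only compares path strings without touching the file system.

```shell
# Hypothetical helper (not part of Proxmox VE): returns 0 if the given
# absolute path lies below one of the two root-only locations listed above,
# 1 otherwise. It matches on the path string only.
pve_path_is_private() {
    local path="$1" rest node
    case "$path" in
        /etc/pve/priv/*)
            return 0
            ;;
        /etc/pve/nodes/*/*)
            rest="${path#/etc/pve/nodes/}"   # e.g. node1/priv/foo
            node="${rest%%/*}"               # e.g. node1
            case "$rest" in
                "$node"/priv/*) return 0 ;;
            esac
            ;;
    esac
    return 1
}

# Example use:
#   pve_path_is_private /etc/pve/priv/shadow.cfg && echo "root only"
```

The check requires `priv/` to follow the node name directly, so a path such as `/etc/pve/nodes/node1/qemu-server/100.conf` is reported as not private.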
76 | ||
960f6344 | 77 | |
ac1e3896 DM |
78 | Technology |
79 | ---------- | |
80 | ||
81 | We use the http://www.corosync.org[Corosync Cluster Engine] for | |
82 | cluster communication, and http://www.sqlite.org[SQLite] for the | |
5eba0743 | 83 | database file. The file system is implemented in user space using |
ac1e3896 DM |
84 | http://fuse.sourceforge.net[FUSE]. |
85 | ||
5eba0743 | 86 | File System Layout |
ac1e3896 DM |
87 | ------------------ |
88 | ||
89 | The file system is mounted at: | |
90 | ||
91 | /etc/pve | |
92 | ||
93 | Files | |
94 | ~~~~~ | |
95 | ||
96 | [width="100%",cols="m,d"] | |
97 | |======= | |
8c1189b6 FG |
98 | |`corosync.conf` | Corosync cluster configuration file (prior to {pve} 4.x, this file was called cluster.conf) |
99 | |`storage.cfg` | {pve} storage configuration | |
100 | |`datacenter.cfg` | {pve} datacenter wide configuration (keyboard layout, proxy, ...) | |
101 | |`user.cfg` | {pve} access control configuration (users/groups/...) | |
102 | |`domains.cfg` | {pve} authentication domains | |
7b7e71f1 | 103 | |`status.cfg` | {pve} external metrics server configuration |
8c1189b6 FG |
104 | |`authkey.pub` | Public key used by ticket system |
105 | |`pve-root-ca.pem` | Public certificate of cluster CA | |
106 | |`priv/shadow.cfg` | Shadow password file | |
107 | |`priv/authkey.key` | Private key used by ticket system | |
108 | |`priv/pve-root-ca.key` | Private key of cluster CA | |
109 | |`nodes/<NAME>/pve-ssl.pem` | Public SSL certificate for web server (signed by cluster CA) | |
110 | |`nodes/<NAME>/pve-ssl.key` | Private SSL key for `pve-ssl.pem` | |
111 | |`nodes/<NAME>/pveproxy-ssl.pem` | Public SSL certificate (chain) for web server (optional override for `pve-ssl.pem`) | |
112 | |`nodes/<NAME>/pveproxy-ssl.key` | Private SSL key for `pveproxy-ssl.pem` (optional) | |
113 | |`nodes/<NAME>/qemu-server/<VMID>.conf` | VM configuration data for KVM VMs | |
114 | |`nodes/<NAME>/lxc/<VMID>.conf` | Configuration data for LXC containers | |
115 | |`firewall/cluster.fw` | Firewall configuration applied to all nodes | |
116 | |`firewall/<NAME>.fw` | Firewall configuration for individual nodes | |
117 | |`firewall/<VMID>.fw` | Firewall configuration for VMs and Containers | |
ac1e3896 DM |
118 | |======= |
119 | ||
5eba0743 | 120 | |
ac1e3896 DM |
121 | Symbolic links |
122 | ~~~~~~~~~~~~~~ | |
123 | ||
124 | [width="100%",cols="m,m"] | |
125 | |======= | |
8c1189b6 FG |
126 | |`local` | `nodes/<LOCAL_HOST_NAME>` |
127 | |`qemu-server` | `nodes/<LOCAL_HOST_NAME>/qemu-server/` | |
128 | |`lxc` | `nodes/<LOCAL_HOST_NAME>/lxc/` | |
ac1e3896 DM |
129 | |======= |
130 | ||
5eba0743 | 131 | |
ac1e3896 DM |
132 | Special status files for debugging (JSON) |
133 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
134 | ||
135 | [width="100%",cols="m,d"] | |
136 | |======= | |
8c1189b6 FG |
137 | |`.version` |File versions (to detect file modifications) |
138 | |`.members` |Info about cluster members | |
139 | |`.vmlist` |List of all VMs | |
140 | |`.clusterlog` |Cluster log (last 50 entries) | |
141 | |`.rrd` |RRD data (most recent entries) | |
ac1e3896 DM |
142 | |======= |
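The status files listed above are ordinary readable files, so they can be inspected with `cat`. The following is a minimal sketch: the function name `pve_status` and the `PVE_ROOT` override are assumptions made for illustration (the real mount point is `/etc/pve`), and the exact JSON content depends on your cluster.

```shell
# Hypothetical helper: print one of the special status files.
# PVE_ROOT exists only in this sketch so the function can be tried
# against a scratch directory; it defaults to the real mount point.
pve_status() {
    local name="$1"
    local root="${PVE_ROOT:-/etc/pve}"
    case "$name" in
        version|members|vmlist|clusterlog|rrd) ;;
        *)
            echo "unknown status file: $name" >&2
            return 2
            ;;
    esac
    cat "$root/.$name"
}

# Example use on a cluster node:
#   pve_status members
#   pve_status vmlist
```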
143 | ||
5eba0743 | 144 | |
ac1e3896 DM |
145 | Enable/Disable debugging |
146 | ~~~~~~~~~~~~~~~~~~~~~~~~ | |
147 | ||
148 | You can enable verbose syslog messages with: | |
149 | ||
100194d7 | 150 | echo "1" >/etc/pve/.debug |
ac1e3896 DM |
151 | |
152 | And disable verbose syslog messages with: | |
153 | ||
100194d7 | 154 | echo "0" >/etc/pve/.debug |
ac1e3896 DM |
155 | |
156 | ||
157 | Recovery | |
158 | -------- | |
159 | ||
160 | If you have major problems with your Proxmox VE host, e.g. hardware | |
161 | issues, it could be helpful to just copy the pmxcfs database file | |
8c1189b6 | 162 | `/var/lib/pve-cluster/config.db` and move it to a new Proxmox VE |
ac1e3896 | 163 | host. On the new host (with nothing running), you need to stop the |
8c1189b6 FG |
164 | `pve-cluster` service and replace the `config.db` file (required permissions: |
165 | `0600`). Then adapt `/etc/hostname` and `/etc/hosts` to match the | |
166 | lost Proxmox VE host, reboot, and check. (And don't forget your | |
ac1e3896 DM |
167 | VM/CT data.) |
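The file replacement step can be sketched as follows. This is an illustration, not an official procedure: the function name `restore_config_db` and the source path in the usage comment are hypothetical, and the optional second argument exists only so the sketch can be tried against a scratch directory.

```shell
# Hypothetical helper: copy a saved pmxcfs database into place with the
# permissions mentioned above (0600).
# Usage: restore_config_db <saved config.db> [target directory]
restore_config_db() {
    local src="$1"
    local dir="${2:-/var/lib/pve-cluster}"
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    # install(1) copies the file and sets the mode in one step
    install -m 0600 "$src" "$dir/config.db"
}

# On the new host, run as root, roughly:
#   systemctl stop pve-cluster
#   restore_config_db /root/config.db     # hypothetical source path
#   # adapt /etc/hostname and /etc/hosts, then reboot
```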
168 | ||
5eba0743 | 169 | |
ac1e3896 DM |
170 | Remove Cluster configuration |
171 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
172 | ||
173 | The recommended way is to reinstall the node after you removed it from | |
174 | your cluster. This makes sure that all secret cluster/SSH keys and any | |
175 | shared configuration data are destroyed. | |
176 | ||
38ae8db3 TL |
177 | In some cases, you might prefer to put a node back to local mode without |
178 | reinstalling, which is described in | |
179 | <<pvecm_separate_node_without_reinstall,Separate A Node Without Reinstalling>>. | |
bd88f9d9 | 180 | |
5db724de FG |
181 | |
182 | Recovering/Moving Guests from Failed Nodes | |
183 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
184 | ||
185 | For the guest configuration files in `nodes/<NAME>/qemu-server/` (VMs) and | |
186 | `nodes/<NAME>/lxc/` (containers), {pve} sees the containing node `<NAME>` as | |
187 | owner of the respective guest. This concept enables the usage of local locks | |
188 | instead of expensive cluster-wide locks for preventing concurrent guest | |
189 | configuration changes. | |
190 | ||
191 | As a consequence, if the owning node of a guest fails (e.g., because of a power | |
192 | outage, fencing event, ...), a regular migration is not possible (even if all | |
193 | the disks are located on shared storage) because such a local lock on the | |
194 | (dead) owning node is unobtainable. This is not a problem for HA-managed | |
195 | guests, as {pve}'s High Availability stack includes the necessary | |
196 | (cluster-wide) locking and watchdog functionality to ensure correct and | |
197 | automatic recovery of guests from fenced nodes. | |
198 | ||
199 | If a non-HA-managed guest has only shared disks (and no other local resources | |
200 | which are only available on the failed node are configured), a manual recovery | |
201 | is possible by simply moving the guest configuration file from the failed | |
202 | node's directory in `/etc/pve/` to an alive node's directory (which changes the | |
203 | logical owner or location of the guest). | |
204 | ||
205 | For example, recovering the VM with ID `100` from a dead `node1` to another | |
206 | node `node2` works with the following command executed when logged in as root | |
207 | on any member node of the cluster: | |
208 | ||
209 | mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/qemu-server/ | |
210 | ||
211 | WARNING: Before manually recovering a guest like this, make absolutely sure | |
212 | that the failed source node is really powered off/fenced. Otherwise {pve}'s | |
213 | locking principles are violated by the `mv` command, which can have unexpected | |
214 | consequences. | |
215 | ||
216 | WARNING: Guests with local disks (or other local resources which are only | |
217 | available on the dead node) are not recoverable like this. Either wait for the | |
218 | failed node to rejoin the cluster or restore such guests from backups. | |
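The manual recovery described in this section can be wrapped in a few sanity checks. The sketch below is an assumption-laden illustration: `recover_guest` and the `PVE_ROOT` override are hypothetical names, and the function cannot verify that the failed node is really powered off or fenced, so that check remains your responsibility.

```shell
# Hypothetical helper: move a guest configuration file from a failed
# node's directory to the matching directory of a live node.
# Usage: recover_guest <vmid> <failed node> <target node> [qemu-server|lxc]
recover_guest() {
    local vmid="$1" from="$2" to="$3"
    local type="${4:-qemu-server}"
    local root="${PVE_ROOT:-/etc/pve}"   # override only for trying the sketch
    local src="$root/nodes/$from/$type/$vmid.conf"
    local dst_dir="$root/nodes/$to/$type"

    if [ ! -f "$src" ]; then
        echo "guest config not found: $src" >&2
        return 1
    fi
    if [ ! -d "$dst_dir" ]; then
        echo "target directory not found: $dst_dir" >&2
        return 1
    fi
    if [ -e "$dst_dir/$vmid.conf" ]; then
        echo "target already has a config for guest $vmid" >&2
        return 1
    fi
    mv "$src" "$dst_dir/"
}

# Example, as root on any member node, once node1 is confirmed off:
#   recover_guest 100 node1 node2
#   recover_guest 101 node1 node2 lxc
```

Keeping the guest type directory (`qemu-server` or `lxc`) the same on both sides matches the layout of the guest configuration files described at the start of this section.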
219 | ||
bd88f9d9 DM |
220 | ifdef::manvolnum[] |
221 | include::pve-copyright.adoc[] | |
222 | endif::manvolnum[] |