ifdef::manvolnum[]
pmxcfs(8)
=========
:pve-toplevel:

NAME
----

pmxcfs - Proxmox Cluster File System

SYNOPSIS
--------

include::pmxcfs.8-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]

ifndef::manvolnum[]
Proxmox Cluster File System (pmxcfs)
====================================
:pve-toplevel:
endif::manvolnum[]

The Proxmox Cluster file system (``pmxcfs'') is a database-driven file
system for storing configuration files, replicated in real time to all
cluster nodes using `corosync`. We use this to store all PVE-related
configuration files.

Although the file system stores all data inside a persistent database
on disk, a copy of the data resides in RAM. This imposes a restriction
on the maximum size, which is currently 30MB. This is still enough to
store the configuration of several thousand virtual machines.

This system provides the following advantages:

* seamless replication of all configuration to all nodes in real time
* strong consistency checks to avoid duplicate VM IDs
* read-only mode when a node loses quorum
* automatic updates of the corosync cluster configuration to all nodes
* a distributed locking mechanism


POSIX Compatibility
-------------------

The file system is based on FUSE, so the behavior is POSIX-like. But
some features are simply not implemented, because we do not need them
(see the examples after this list):

* you can generate normal files and directories, but no symbolic
  links, ...

* you can't rename non-empty directories (because this makes it easier
  to guarantee that VMIDs are unique).

* you can't change file permissions (permissions are based on paths).

* `O_EXCL` creates are not atomic (like old NFS).

* `O_TRUNC` creates are not atomic (FUSE restriction).
63 | ||
5eba0743 | 64 | File Access Rights |
960f6344 | 65 | ------------------ |
ac1e3896 | 66 | |
8c1189b6 FG |
67 | All files and directories are owned by user `root` and have group |
68 | `www-data`. Only root has write permissions, but group `www-data` can | |
ac1e3896 DM |
69 | read most files. Files below the following paths: |
70 | ||
71 | /etc/pve/priv/ | |
72 | /etc/pve/nodes/${NAME}/priv/ | |
73 | ||
74 | are only accessible by root. | |
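
For a quick check of these access rights, you can inspect the
ownership and mode with standard tools (the exact listing is
illustrative and may vary by version):

 ls -ld /etc/pve/priv      # directory accessible by root only
 ls -l /etc/pve/user.cfg   # readable by group www-data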


Technology
----------

We use the http://www.corosync.org[Corosync Cluster Engine] for
cluster communication, and http://www.sqlite.org[SQLite] for the
database file. The file system is implemented in user space using
http://fuse.sourceforge.net[FUSE].

File System Layout
------------------

The file system is mounted at:

 /etc/pve
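
To confirm that the FUSE mount is active, you can query the kernel's
mount table; `findmnt` (from util-linux) should report an entry of
type `fuse` for this path:

 findmnt /etc/pve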

Files
~~~~~

[width="100%",cols="m,d"]
|=======
|`corosync.conf`                        | Corosync cluster configuration file (prior to {pve} 4.x this file was called `cluster.conf`)
|`storage.cfg`                          | {pve} storage configuration
|`datacenter.cfg`                       | {pve} datacenter-wide configuration (keyboard layout, proxy, ...)
|`user.cfg`                             | {pve} access control configuration (users/groups/...)
|`domains.cfg`                          | {pve} authentication domains
|`authkey.pub`                          | Public key used by the ticket system
|`pve-root-ca.pem`                      | Public certificate of the cluster CA
|`priv/shadow.cfg`                      | Shadow password file
|`priv/authkey.key`                     | Private key used by the ticket system
|`priv/pve-root-ca.key`                 | Private key of the cluster CA
|`nodes/<NAME>/pve-ssl.pem`             | Public SSL certificate for the web server (signed by the cluster CA)
|`nodes/<NAME>/pve-ssl.key`             | Private SSL key for `pve-ssl.pem`
|`nodes/<NAME>/pveproxy-ssl.pem`        | Public SSL certificate (chain) for the web server (optional override for `pve-ssl.pem`)
|`nodes/<NAME>/pveproxy-ssl.key`        | Private SSL key for `pveproxy-ssl.pem` (optional)
|`nodes/<NAME>/qemu-server/<VMID>.conf` | VM configuration data for KVM VMs
|`nodes/<NAME>/lxc/<VMID>.conf`         | VM configuration data for LXC containers
|`firewall/cluster.fw`                  | Firewall configuration applied to all nodes
|`firewall/<NAME>.fw`                   | Firewall configuration for individual nodes
|`firewall/<VMID>.fw`                   | Firewall configuration for VMs and containers
|=======


Symbolic links
~~~~~~~~~~~~~~

[width="100%",cols="m,m"]
|=======
|`local`       | `nodes/<LOCAL_HOST_NAME>`
|`qemu-server` | `nodes/<LOCAL_HOST_NAME>/qemu-server/`
|`lxc`         | `nodes/<LOCAL_HOST_NAME>/lxc/`
|=======


Special status files for debugging (JSON)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[width="100%",cols="m,d"]
|=======
|`.version`    | File versions (to detect file modifications)
|`.members`    | Info about cluster members
|`.vmlist`     | List of all VMs
|`.clusterlog` | Cluster log (last 50 entries)
|`.rrd`        | RRD data (most recent entries)
|=======


Enable/Disable debugging
~~~~~~~~~~~~~~~~~~~~~~~~

You can enable verbose syslog messages with:

 echo "1" >/etc/pve/.debug

And disable verbose syslog messages with:

 echo "0" >/etc/pve/.debug


Recovery
--------

If you have major problems with your Proxmox VE host, for example hardware
issues, it could be helpful to copy the pmxcfs database file
`/var/lib/pve-cluster/config.db` and move it to a new Proxmox VE
host. On the new host (with nothing running), you need to stop the
`pve-cluster` service and replace the `config.db` file (required permissions
`0600`). Afterwards, adapt `/etc/hostname` and `/etc/hosts` according to the
lost Proxmox VE host, then reboot and check (and don't forget your
VM/CT data).
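
Put together, the procedure could look like the following session, run
as root on the new host (a sketch only; the backup path is a
placeholder):

 systemctl stop pve-cluster
 cp /path/to/backup/config.db /var/lib/pve-cluster/config.db
 chmod 0600 /var/lib/pve-cluster/config.db
 # now adapt /etc/hostname and /etc/hosts to match the lost host, then:
 reboot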


Remove Cluster configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The recommended way is to reinstall the node after you have removed it
from your cluster. This makes sure that all secret cluster/ssh keys and
any shared configuration data are destroyed.

In some cases, you might prefer to put a node back into local mode
without reinstalling it, which is described in
<<pvecm_separate_node_without_reinstall,Separate A Node Without Reinstalling>>.


Recovering/Moving Guests from Failed Nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For the guest configuration files in `nodes/<NAME>/qemu-server/` (VMs) and
`nodes/<NAME>/lxc/` (containers), {pve} sees the containing node `<NAME>` as
the owner of the respective guest. This concept enables the usage of local
locks instead of expensive cluster-wide locks for preventing concurrent guest
configuration changes.

As a consequence, if the owning node of a guest fails (e.g., because of a
power outage or fencing event), a regular migration is not possible (even if
all the disks are located on shared storage), because such a local lock on
the (dead) owning node is unobtainable. This is not a problem for HA-managed
guests, as {pve}'s High Availability stack includes the necessary
(cluster-wide) locking and watchdog functionality to ensure correct and
automatic recovery of guests from fenced nodes.

If a non-HA-managed guest only has shared disks (and no other local
resources that are only available on the failed node), a manual recovery
is possible by simply moving the guest configuration file from the failed
node's directory in `/etc/pve/` to an alive node's directory (which changes
the logical owner or location of the guest).

For example, recovering the VM with ID `100` from a dead `node1` to another
node `node2` works with the following command, executed when logged in as
root on any member node of the cluster:

 mv /etc/pve/nodes/node1/qemu-server/100.conf /etc/pve/nodes/node2/

WARNING: Before manually recovering a guest like this, make absolutely sure
that the failed source node is really powered off/fenced. Otherwise, {pve}'s
locking principles are violated by the `mv` command, which can have
unexpected consequences.

WARNING: Guests with local disks (or other local resources which are only
available on the dead node) are not recoverable like this. Either wait for
the failed node to rejoin the cluster or restore such guests from backups.

ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]