[[chapter_pveceph]]
ifdef::manvolnum[]
pveceph(1)
==========
:pve-toplevel:

NAME
----

pveceph - Manage Ceph Services on Proxmox VE Nodes

SYNOPSIS
--------

include::pveceph.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]
ifndef::manvolnum[]
Manage Ceph Services on Proxmox VE Nodes
========================================
:pve-toplevel:
endif::manvolnum[]

[thumbnail="gui-ceph-status.png"]

{pve} unifies your compute and storage systems, i.e. you can use the same
physical nodes within a cluster for both computing (processing VMs and
containers) and replicated storage. The traditional silos of compute and
storage resources can be wrapped up into a single hyper-converged appliance.
Separate storage networks (SANs) and connections via network attached storage
(NAS) disappear. With the integration of Ceph, an open source software-defined
storage platform, {pve} can run and manage Ceph storage directly on the
hypervisor nodes.

Ceph is a distributed object store and file system designed to provide
excellent performance, reliability and scalability.

For small to mid-sized deployments, it is possible to install a Ceph server for
RADOS Block Devices (RBD) directly on your {pve} cluster nodes, see
xref:ceph_rados_block_devices[Ceph RADOS Block Devices (RBD)]. Recent hardware
has plenty of CPU power and RAM, so running storage services and VMs on the
same node is possible.

To simplify management, we provide 'pveceph' - a tool to install and manage
{ceph} services on {pve} nodes.

Ceph consists of a couple of daemons
footnote:[Ceph intro http://docs.ceph.com/docs/master/start/intro/], for use as
an RBD storage:

- Ceph Monitor (ceph-mon)
- Ceph Manager (ceph-mgr)
- Ceph OSD (ceph-osd; Object Storage Daemon)

TIP: We recommend that you get familiar with the Ceph vocabulary.
footnote:[Ceph glossary http://docs.ceph.com/docs/luminous/glossary]

Precondition
------------

To build a Proxmox Ceph Cluster, there should be at least three (preferably
identical) servers for the setup.

A 10Gb network, exclusively used for Ceph, is recommended. A meshed network
setup is also an option if there are no 10Gb switches available, see
{webwiki-url}Full_Mesh_Network_for_Ceph_Server[wiki].

Also check the recommendations from
http://docs.ceph.com/docs/luminous/start/hardware-recommendations/[Ceph's website].

Installation of Ceph Packages
-----------------------------

On each node, run the installation script as follows:

[source,bash]
----
pveceph install
----

This sets up an `apt` package repository in
`/etc/apt/sources.list.d/ceph.list` and installs the required software.


Creating initial Ceph configuration
-----------------------------------

[thumbnail="gui-ceph-config.png"]

After installation of packages, you need to create an initial Ceph
configuration on just one node, based on your network (`10.10.10.0/24`
in the following example) dedicated for Ceph:

[source,bash]
----
pveceph init --network 10.10.10.0/24
----

This creates an initial config at `/etc/pve/ceph.conf`. That file is
automatically distributed to all {pve} nodes by using
xref:chapter_pmxcfs[pmxcfs]. The command also creates a symbolic link
from `/etc/ceph/ceph.conf` pointing to that file, so you can simply run
Ceph commands without the need to specify a configuration file.

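The symlink mechanism can be illustrated in isolation. The sketch below models
the two paths with temporary stand-ins, since the real `/etc/pve/ceph.conf`
only exists on a configured node:

[source,bash]
----
# Model of the config symlink created by 'pveceph init':
#   /etc/ceph/ceph.conf -> /etc/pve/ceph.conf (the pmxcfs-backed file).
# Temporary stand-in paths are used; they are not the real cluster paths.
tmp=$(mktemp -d)
mkdir -p "$tmp/pve" "$tmp/ceph"
printf '[global]\n' > "$tmp/pve/ceph.conf"        # stand-in for /etc/pve/ceph.conf
ln -s "$tmp/pve/ceph.conf" "$tmp/ceph/ceph.conf"  # what pveceph init sets up
readlink "$tmp/ceph/ceph.conf"                    # shows the pmxcfs-backed target
cat "$tmp/ceph/ceph.conf"                         # Ceph tools read through the link
rm -rf "$tmp"
----

Because the target lives on pmxcfs, every node sees the same configuration
through its local `/etc/ceph/ceph.conf` link.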

[[pve_ceph_monitors]]
Creating Ceph Monitors
----------------------

[thumbnail="gui-ceph-monitor.png"]

The Ceph Monitor (MON)
footnote:[Ceph Monitor http://docs.ceph.com/docs/luminous/start/intro/]
maintains a master copy of the cluster map. For HA you need at least 3
monitors.

On each node where you want to place a monitor (three monitors are
recommended), create it by using the 'Ceph -> Monitor' tab in the GUI or run:

[source,bash]
----
pveceph createmon
----

This will also install the needed Ceph Manager ('ceph-mgr') by default. If you
do not want to install a manager, specify the '-exclude-manager' option.

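The recommendation of three (and generally an odd number of) monitors follows
from how they form a quorum: a strict majority (`floor(n/2) + 1`) must be
reachable, so a second monitor adds no fault tolerance over a single one. A
quick arithmetic sketch (illustrative shell, not a pveceph command):

[source,bash]
----
# Monitor quorum arithmetic: a strict majority of monitors must be up.
for n in 1 2 3 5; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "$n monitor(s): quorum=$quorum, tolerated failures=$tolerated"
done
----

With 3 monitors one failure is tolerated; with 5, two.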

[[pve_ceph_manager]]
Creating Ceph Manager
---------------------

The Manager daemon runs alongside the monitors. It provides interfaces for
monitoring the cluster. Since the Ceph Luminous release, the
ceph-mgr footnote:[Ceph Manager http://docs.ceph.com/docs/luminous/mgr/] daemon
is required. During monitor installation, the Ceph Manager will be installed as
well.

NOTE: It is recommended to install the Ceph Manager on the monitor nodes. For
high availability, install more than one manager.

[source,bash]
----
pveceph createmgr
----

[[pve_ceph_osds]]
Creating Ceph OSDs
------------------

[thumbnail="gui-ceph-osd-status.png"]

You can create an OSD via the GUI or via the CLI as follows:

[source,bash]
----
pveceph createosd /dev/sd[X]
----

TIP: We recommend a Ceph cluster size of at least 12 OSDs, distributed evenly
among your (at least three) nodes, i.e. 4 OSDs on each node.


Ceph Bluestore
~~~~~~~~~~~~~~

Starting with the Ceph Kraken release, a new Ceph OSD storage type was
introduced, the so-called Bluestore
footnote:[Ceph Bluestore http://ceph.com/community/new-luminous-bluestore/]. In
Ceph Luminous this store is the default when creating OSDs.

[source,bash]
----
pveceph createosd /dev/sd[X]
----

NOTE: In order to select a disk in the GUI, to be more failsafe, the disk needs
to have a GPT footnoteref:[GPT,
GPT partition table https://en.wikipedia.org/wiki/GUID_Partition_Table]
partition table. You can create this with `gdisk /dev/sd(x)`. If there is no
GPT, you cannot select the disk as DB/WAL.

If you want to use a separate DB/WAL device for your OSDs, you can specify it
through the '-wal_dev' option.

[source,bash]
----
pveceph createosd /dev/sd[X] -wal_dev /dev/sd[Y]
----

NOTE: The DB stores BlueStore's internal metadata and the WAL is BlueStore's
internal journal or write-ahead log. It is recommended to use a fast SSD or
NVRAM for better performance.


Ceph Filestore
~~~~~~~~~~~~~~

Until Ceph Luminous, Filestore was used as the storage type for Ceph OSDs. It
can still be used and might give better performance in small setups, when
backed by an NVMe SSD or similar.

[source,bash]
----
pveceph createosd /dev/sd[X] -bluestore 0
----

NOTE: In order to select a disk in the GUI, the disk needs to have a
GPT footnoteref:[GPT] partition table. You can create this with
`gdisk /dev/sd(x)`. If there is no GPT, you cannot select the disk as journal.
Currently the journal size is fixed to 5 GB.

If you want to use a dedicated SSD journal disk:

[source,bash]
----
pveceph createosd /dev/sd[X] -journal_dev /dev/sd[Y] -bluestore 0
----

Example: Use /dev/sdf as the data disk (4TB) and /dev/sdb as the dedicated SSD
journal disk.

[source,bash]
----
pveceph createosd /dev/sdf -journal_dev /dev/sdb -bluestore 0
----

This partitions the disk (data and journal partition), creates
filesystems and starts the OSD; afterwards it is running and fully
functional.

NOTE: This command refuses to initialize a disk when it detects existing data.
So if you want to overwrite a disk, you should remove existing data first. You
can do that using: 'ceph-disk zap /dev/sd[X]'

You can create OSDs containing both journal and data partitions, or you can
place the journal on a dedicated SSD. Using an SSD journal disk is highly
recommended to achieve good performance.

[[pve_ceph_pools]]
Creating Ceph Pools
-------------------

[thumbnail="gui-ceph-pools.png"]

A pool is a logical group for storing objects. It holds **P**lacement
**G**roups (PG), a collection of objects.

When no options are given, we set a default of **64 PGs**, a **size of 3
replicas** and a **min_size of 2 replicas** for serving objects in a degraded
state.

NOTE: The default number of PGs works for 2-6 disks. Ceph throws a
"HEALTH_WARNING" if you have too few or too many PGs in your cluster.

It is advised to calculate the PG number depending on your setup; you can find
the formula and the PG
calculator footnote:[PG calculator http://ceph.com/pgcalc/] online. While PGs
can be increased later on, they can never be decreased.

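The commonly cited rule of thumb behind the calculator is
`(OSDs * 100) / size`, rounded up to the nearest power of two. The online
calculator refines this for multiple pools and a target PG-per-OSD count, so
treat the sketch below as a first approximation only:

[source,bash]
----
# Rule-of-thumb PG count: (OSDs * 100) / replica size,
# rounded up to the nearest power of two.
osds=12
size=3
target=$(( osds * 100 / size ))   # 400 for 12 OSDs and size 3
pg_num=1
while [ "$pg_num" -lt "$target" ]; do
  pg_num=$(( pg_num * 2 ))
done
echo "suggested pg_num: $pg_num"  # 512
----

For the recommended starting cluster of 12 OSDs with 3 replicas this yields
512 PGs, well above the 64-PG default.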

You can create pools through the command line or in the GUI on each PVE host
under **Ceph -> Pools**.

[source,bash]
----
pveceph createpool <name>
----

If you would also like to automatically get a storage definition for your pool,
activate the checkbox "Add storages" in the GUI or use the command line option
'--add_storages' on pool creation.

Further information on Ceph pool handling can be found in the Ceph pool
operation footnote:[Ceph pool operation
http://docs.ceph.com/docs/luminous/rados/operations/pools/]
manual.

Ceph Client
-----------

[thumbnail="gui-ceph-log.png"]

You can then configure {pve} to use such pools to store VM or
Container images. Simply use the GUI to add a new `RBD` storage (see
section xref:ceph_rados_block_devices[Ceph RADOS Block Devices (RBD)]).

You also need to copy the keyring to a predefined location for an external
Ceph cluster. If Ceph is installed on the Proxmox nodes itself, then this will
be done automatically.

NOTE: The file name needs to be `<storage_id>.keyring`, where `<storage_id>` is
the expression after 'rbd:' in `/etc/pve/storage.cfg`, which is
`my-ceph-storage` in the following example:

[source,bash]
----
mkdir /etc/pve/priv/ceph
cp /etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/my-ceph-storage.keyring
----
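
The naming convention is plain string substitution; `my-ceph-storage` here is
just the example storage ID from above, not a required name:

[source,bash]
----
# Derive the expected keyring path from a storage ID as it appears
# after 'rbd:' in /etc/pve/storage.cfg.
storage_id="my-ceph-storage"
keyring="/etc/pve/priv/ceph/${storage_id}.keyring"
echo "$keyring"   # /etc/pve/priv/ceph/my-ceph-storage.keyring
----

If the file name does not match the storage ID exactly, {pve} will not find
the keyring and access to the external cluster fails.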


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]