[[chapter_pveceph]]
ifdef::manvolnum[]
pveceph(1)
==========
:pve-toplevel:

NAME
----

pveceph - Manage Ceph Services on Proxmox VE Nodes

SYNOPSIS
--------

include::pveceph.1-synopsis.adoc[]

DESCRIPTION
-----------
endif::manvolnum[]
ifndef::manvolnum[]
Manage Ceph Services on Proxmox VE Nodes
========================================
:pve-toplevel:
endif::manvolnum[]

[thumbnail="gui-ceph-status.png"]

{pve} unifies your compute and storage systems, i.e. you can use the
same physical nodes within a cluster for both computing (processing
VMs and containers) and replicated storage. The traditional silos of
compute and storage resources can be wrapped up into a single
hyper-converged appliance. Separate storage networks (SANs) and
connections via network attached storage (NAS) disappear. With the
integration of Ceph, an open source software-defined storage platform,
{pve} has the ability to run and manage Ceph storage directly on the
hypervisor nodes.

Ceph is a distributed object store and file system designed to provide
excellent performance, reliability and scalability.

For small to mid-sized deployments, it is possible to install a Ceph server for
RADOS Block Devices (RBD) directly on your {pve} cluster nodes, see
xref:ceph_rados_block_devices[Ceph RADOS Block Devices (RBD)]. Recent
hardware has plenty of CPU power and RAM, so running storage services
and VMs on the same node is possible.

To simplify management, we provide 'pveceph' - a tool to install and
manage {ceph} services on {pve} nodes.

Ceph consists of several daemons
footnote:[Ceph intro http://docs.ceph.com/docs/master/start/intro/], for use as
RBD storage:

- Ceph Monitor (ceph-mon)
- Ceph Manager (ceph-mgr)
- Ceph OSD (ceph-osd; Object Storage Daemon)

TIP: We recommend getting familiar with Ceph's vocabulary.
footnote:[Ceph glossary http://docs.ceph.com/docs/luminous/glossary]


Precondition
------------

To build a Proxmox Ceph cluster, there should be at least three,
preferably identical, servers for the setup.

A 10Gb network, exclusively used for Ceph, is recommended. A meshed
network setup is also an option if there are no 10Gb switches
available, see the {webwiki-url}Full_Mesh_Network_for_Ceph_Server[wiki].

Also check the recommendations from
http://docs.ceph.com/docs/luminous/start/hardware-recommendations/[Ceph's website].


Installation of Ceph Packages
-----------------------------

On each node run the installation script as follows:

[source,bash]
----
pveceph install
----

This sets up an `apt` package repository in
`/etc/apt/sources.list.d/ceph.list` and installs the required software.
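
As a rough illustration, the generated `ceph.list` may look similar to the
entry below. The exact repository URL and suite depend on the Ceph and Debian
release in use, so treat the values here as an assumption rather than as the
authoritative output of the command:

----
# /etc/apt/sources.list.d/ceph.list (illustrative example for Ceph luminous)
deb http://download.proxmox.com/debian/ceph-luminous stretch main
----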


Creating initial Ceph configuration
-----------------------------------

[thumbnail="gui-ceph-config.png"]

After the installation of packages, you need to create an initial Ceph
configuration on just one node, based on the network (`10.10.10.0/24`
in the following example) dedicated to Ceph:

[source,bash]
----
pveceph init --network 10.10.10.0/24
----

This creates an initial configuration at `/etc/pve/ceph.conf`. That file is
automatically distributed to all {pve} nodes by using
xref:chapter_pmxcfs[pmxcfs]. The command also creates a symbolic link
from `/etc/ceph/ceph.conf` pointing to that file, so you can simply run
Ceph commands without the need to specify a configuration file.
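
The generated file might look similar to the following sketch. The `fsid` is
auto-generated and the remaining settings depend on your setup, so take the
values here as placeholders, not as the exact output:

----
# /etc/pve/ceph.conf (illustrative excerpt)
[global]
	 cluster network = 10.10.10.0/24
	 public network = 10.10.10.0/24
	 fsid = 00000000-0000-0000-0000-000000000000
----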


[[pve_ceph_monitors]]
Creating Ceph Monitors
----------------------

[thumbnail="gui-ceph-monitor.png"]

The Ceph Monitor (MON)
footnote:[Ceph Monitor http://docs.ceph.com/docs/luminous/start/intro/]
maintains a master copy of the cluster map. For HA you need to have at least 3
monitors.

On each node where you want to place a monitor (three monitors are recommended),
create it by using the 'Ceph -> Monitor' tab in the GUI or run:

[source,bash]
----
pveceph createmon
----

This will also install the needed Ceph Manager ('ceph-mgr') by default. If you
do not want to install a manager, specify the '-exclude-manager' option.
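
Once the monitors are up, you can check that they have formed a quorum, for
example with the standard Ceph status commands shown below (the exact output
format varies between Ceph releases):

[source,bash]
----
# show the overall cluster status, including the monitor quorum
ceph -s
# show only the monitor map and the quorum members
ceph mon stat
----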


[[pve_ceph_manager]]
Creating Ceph Manager
---------------------

The Manager daemon runs alongside the monitors. It provides an interface for
monitoring the cluster. Since the Ceph luminous release, the
ceph-mgr footnote:[Ceph Manager http://docs.ceph.com/docs/luminous/mgr/] daemon
is required. During monitor installation the Ceph Manager will be installed as
well.

NOTE: It is recommended to install the Ceph Manager on the monitor nodes. For
high availability install more than one manager.

[source,bash]
----
pveceph createmgr
----
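
When more than one manager is created, Ceph keeps one manager active and the
others on standby. A quick way to verify this is the cluster status output
(the exact wording of the `mgr` line differs between releases):

[source,bash]
----
# the "mgr:" line lists the active manager and any standby managers
ceph -s
----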


[[pve_ceph_osds]]
Creating Ceph OSDs
------------------

[thumbnail="gui-ceph-osd-status.png"]

You can create Ceph OSDs either via the GUI or via the CLI as follows:

[source,bash]
----
pveceph createosd /dev/sd[X]
----

TIP: We recommend a Ceph cluster with at least 12 OSDs, distributed evenly
among your (at least three) nodes, i.e. 4 OSDs on each node.
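
After the OSDs have been created, you can review how they are distributed
across the nodes, for example with the standard Ceph commands below (the
output columns differ slightly between Ceph versions):

[source,bash]
----
# show the CRUSH tree with all hosts and their OSDs
ceph osd tree
# show per-OSD utilization
ceph osd df
----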


Ceph Bluestore
~~~~~~~~~~~~~~

Starting with the Ceph Kraken release, a new Ceph OSD storage type was
introduced, the so-called Bluestore
footnote:[Ceph Bluestore http://ceph.com/community/new-luminous-bluestore/]. In
Ceph luminous this store is the default when creating OSDs.

[source,bash]
----
pveceph createosd /dev/sd[X]
----

NOTE: To be able to select a disk in the GUI (and to make disk selection more
failsafe), the disk needs to have a
GPT footnoteref:[GPT,
GPT partition table https://en.wikipedia.org/wiki/GUID_Partition_Table]
partition table. You can create one with `gdisk /dev/sd[X]`. If there is no
GPT, you cannot select the disk as DB/WAL.
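
If you prefer a non-interactive way to create the GPT label, `parted` (assuming
it is installed) can do it in one command; replace `/dev/sdX` with the actual
device and double-check the target first, as this wipes the partition table:

[source,bash]
----
# create a new, empty GPT partition table on the disk
parted /dev/sdX mklabel gpt
----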

If you want to use a separate DB/WAL device for your OSDs, you can specify it
through the '-wal_dev' option.

[source,bash]
----
pveceph createosd /dev/sd[X] -wal_dev /dev/sd[Y]
----

NOTE: The DB stores BlueStore's internal metadata and the WAL is BlueStore's
internal journal or write-ahead log. It is recommended to use a fast SSD or
NVRAM for better performance.
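
For example (with hypothetical device names), to put the data on `/dev/sdf` and
the DB/WAL on a fast SSD at `/dev/sdb`:

[source,bash]
----
pveceph createosd /dev/sdf -wal_dev /dev/sdb
----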


Ceph Filestore
~~~~~~~~~~~~~~

Until Ceph luminous, Filestore was used as the storage type for Ceph OSDs. It
can still be used and might give better performance in small setups, when
backed by an NVMe SSD or similar.

[source,bash]
----
pveceph createosd /dev/sd[X] -bluestore 0
----

NOTE: In order to select a disk in the GUI, the disk needs to have a
GPT footnoteref:[GPT] partition table. You can
create this with `gdisk /dev/sd[X]`. If there is no GPT, you cannot select the
disk as journal. Currently the journal size is fixed to 5 GB.

If you want to use a dedicated SSD journal disk:

[source,bash]
----
pveceph createosd /dev/sd[X] -journal_dev /dev/sd[Y]
----

Example: Use /dev/sdf as the data disk (4TB) and /dev/sdb as the dedicated SSD
journal disk.

[source,bash]
----
pveceph createosd /dev/sdf -journal_dev /dev/sdb
----

This partitions the disk (data and journal partitions), creates the
filesystems and starts the OSD. Afterwards it is running and fully
functional.

NOTE: This command refuses to initialize a disk when it detects existing data.
So if you want to overwrite a disk, you should remove the existing data first.
You can do that using: 'ceph-disk zap /dev/sd[X]'

You can create OSDs containing both journal and data partitions or you can
place the journal on a dedicated SSD. Using an SSD journal disk is highly
recommended to achieve good performance.


[[pve_creating_ceph_pools]]
Creating Ceph Pools
-------------------

[thumbnail="gui-ceph-pools.png"]

A pool is a logical group for storing objects. It holds **P**lacement
**G**roups (PG), a collection of objects.

When no options are given, we set a
default of **64 PGs**, a **size of 3 replicas** and a **min_size of 2 replicas**
for serving objects in a degraded state.

NOTE: The default number of PGs works for 2-6 disks. Ceph throws a
"HEALTH_WARN" if you have too few or too many PGs in your cluster.

It is advised to calculate the PG number depending on your setup; you can find
the formula and the PG
calculator footnote:[PG calculator http://ceph.com/pgcalc/] online. While PGs
can be increased later on, they can never be decreased.
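
As a rough worked example, using the commonly cited rule of thumb behind the PG
calculator (about 100 PGs per OSD, divided by the replica count, rounded up to
the next power of two): a cluster with 12 OSDs and a pool size of 3 gives
(12 x 100) / 3 = 400, which rounds up to **512 PGs**. Double-check the result
with the PG calculator before creating the pool.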


You can create pools through the command line or in the GUI on each PVE host
under **Ceph -> Pools**.

[source,bash]
----
pveceph createpool <name>
----

If you would also like to automatically get a storage definition for your pool,
activate the checkbox "Add storages" in the GUI or use the command line option
'--add_storages' at pool creation.
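
A hedged example, assuming your pveceph version exposes the usual pool options
('-pg_num', '-size' and '-min_size'; check `pveceph help createpool` first),
could look like this:

[source,bash]
----
# create a pool with 512 PGs, 3 replicas (2 required to serve I/O) and
# automatically add a matching storage definition
pveceph createpool mypool -pg_num 512 -size 3 -min_size 2 --add_storages
----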

Further information on Ceph pool handling can be found in the Ceph pool
operation footnote:[Ceph pool operation
http://docs.ceph.com/docs/luminous/rados/operations/pools/]
manual.

Ceph Client
-----------

[thumbnail="gui-ceph-log.png"]

You can then configure {pve} to use such pools to store VM or
Container images. Simply use the GUI to add a new `RBD` storage (see
section xref:ceph_rados_block_devices[Ceph RADOS Block Devices (RBD)]).

You also need to copy the keyring to a predefined location for an external Ceph
cluster. If Ceph is installed on the Proxmox nodes itself, then this will be
done automatically.

NOTE: The file name needs to be `<storage_id>` + `.keyring` - `<storage_id>` is
the expression after 'rbd:' in `/etc/pve/storage.cfg`, which is
`my-ceph-storage` in the following example:

[source,bash]
----
mkdir /etc/pve/priv/ceph
cp /etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/my-ceph-storage.keyring
----
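
For reference, the matching storage definition in `/etc/pve/storage.cfg` might
look roughly like the sketch below; the pool name, monitor addresses and
content types are placeholders for your actual setup:

----
rbd: my-ceph-storage
        monhost 10.10.10.1 10.10.10.2 10.10.10.3
        pool rbd
        content images,rootdir
        krbd 0
----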


ifdef::manvolnum[]
include::pve-copyright.adoc[]
endif::manvolnum[]