==========================
 Monitor Config Reference
==========================

Understanding how to configure a :term:`Ceph Monitor` is an important part of
building a reliable :term:`Ceph Storage Cluster`. **All Ceph Storage Clusters
have at least one monitor**. The monitor complement usually remains fairly
consistent, but you can add, remove or replace a monitor in a cluster. See
`Adding/Removing a Monitor`_ for details.

.. index:: Ceph Monitor; Paxos

Background
==========

Ceph Monitors maintain a "master copy" of the :term:`Cluster Map`, which means a
:term:`Ceph Client` can determine the location of all Ceph Monitors, Ceph OSD
Daemons, and Ceph Metadata Servers just by connecting to one Ceph Monitor and
retrieving a current cluster map. Before Ceph Clients can read from or write to
Ceph OSD Daemons or Ceph Metadata Servers, they must connect to a Ceph Monitor
first. With a current copy of the cluster map and the CRUSH algorithm, a Ceph
Client can compute the location for any object. The ability to compute object
locations allows a Ceph Client to talk directly to Ceph OSD Daemons, which is a
very important aspect of Ceph's high scalability and performance. See
`Scalability and High Availability`_ for additional details.

The primary role of the Ceph Monitor is to maintain a master copy of the cluster
map. Ceph Monitors also provide authentication and logging services. Ceph
Monitors write all changes in the monitor services to a single Paxos instance,
and Paxos writes the changes to a key/value store for strong consistency. Ceph
Monitors can query the most recent version of the cluster map during sync
operations. Ceph Monitors leverage the key/value store's snapshots and iterators
(using leveldb) to perform store-wide synchronization.

.. ditaa::

 /-------------\               /-------------\
 |   Monitor   | Write Changes |    Paxos    |
 | cCCC        +-------------->+ cCCC        |
 |             |               |             |
 +-------------+               \------+------/
 |    Auth     |                      |
 +-------------+                      | Write Changes
 |    Log      |                      |
 +-------------+                      v
 | Monitor Map |               /------+------\
 +-------------+               | Key / Value |
 |   OSD Map   |               |    Store    |
 +-------------+               | cCCC        |
 |   PG Map    |               \------+------/
 +-------------+                      ^
 |   MDS Map   |                      | Read Changes
 +-------------+                      |
 | cCCC        |*---------------------+
 \-------------/

.. deprecated:: 0.58

   In Ceph versions 0.58 and earlier, Ceph Monitors use a Paxos instance for
   each service and store the map as a file.

.. index:: Ceph Monitor; cluster map

Cluster Maps
------------

The cluster map is a composite of maps, including the monitor map, the OSD map,
the placement group map and the metadata server map. The cluster map tracks a
number of important things: which processes are ``in`` the Ceph Storage Cluster;
which processes that are ``in`` the Ceph Storage Cluster are ``up`` and running
or ``down``; whether the placement groups are ``active`` or ``inactive``, and
``clean`` or in some other state; and other details that reflect the current
state of the cluster, such as the total amount of storage space and the amount
of storage used.

When there is a significant change in the state of the cluster--e.g., a Ceph OSD
Daemon goes down, a placement group falls into a degraded state, etc.--the
cluster map gets updated to reflect the current state of the cluster. The Ceph
Monitor also maintains a history of the prior states of
the cluster. The monitor map, OSD map, placement group map and metadata server
map each maintain a history of their map versions. We call each version an
"epoch."

When operating your Ceph Storage Cluster, keeping track of these states is an
important part of your system administration duties. See `Monitoring a Cluster`_
and `Monitoring OSDs and PGs`_ for additional details.

.. index:: high availability; quorum

Monitor Quorum
--------------

The Configuring Ceph section provides a trivial `Ceph configuration file`_ that
configures one monitor in a test cluster. A cluster will run fine with a
single monitor; however, **a single monitor is a single point of failure**. To
ensure high availability in a production Ceph Storage Cluster, you should run
Ceph with multiple monitors so that the failure of a single monitor **WILL NOT**
bring down your entire cluster.

When a Ceph Storage Cluster runs multiple Ceph Monitors for high availability,
Ceph Monitors use `Paxos`_ to establish consensus about the master cluster map.
A consensus requires a majority of monitors running to establish a quorum for
consensus about the cluster map (e.g., 1 out of 1; 2 out of 3; 3 out of 5;
4 out of 6; etc.).
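
When monitors are up, you can check quorum from the command line. Both commands
below are standard ``ceph`` CLI calls; they assume a running cluster and an
admin keyring:

.. code-block:: console

    $ # Summarize the monitors, their ranks, and the current quorum
    $ ceph mon stat
    $ # Report detailed quorum state, including the leader and the monmap
    $ ceph quorum_status --format json-pretty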

``mon_force_quorum_join``

:Description: Force monitor to join quorum even if it has been previously removed from the map
:Type: Boolean
:Default: ``False``

.. index:: Ceph Monitor; consistency

Consistency
-----------

When you add monitor settings to your Ceph configuration file, you need to be
aware of some of the architectural aspects of Ceph Monitors. **Ceph imposes
strict consistency requirements** for a Ceph monitor when discovering another
Ceph Monitor within the cluster. Whereas Ceph Clients and other Ceph daemons
use the Ceph configuration file to discover monitors, monitors discover each
other using the monitor map (monmap), not the Ceph configuration file.

A Ceph Monitor always refers to the local copy of the monmap when discovering
other Ceph Monitors in the Ceph Storage Cluster. Using the monmap instead of the
Ceph configuration file avoids errors that could break the cluster (e.g., typos
in ``ceph.conf`` when specifying a monitor address or port). Since monitors use
monmaps for discovery and they share monmaps with clients and other Ceph
daemons, **the monmap provides monitors with a strict guarantee that their
consensus is valid.**

Strict consistency also applies to updates to the monmap. As with any other
updates on the Ceph Monitor, changes to the monmap always run through a
distributed consensus algorithm called `Paxos`_. The Ceph Monitors must agree on
each update to the monmap, such as adding or removing a Ceph Monitor, to ensure
that each monitor in the quorum has the same version of the monmap. Updates to
the monmap are incremental so that Ceph Monitors have the latest agreed-upon
version, and a set of previous versions. Maintaining a history enables a Ceph
Monitor that has an older version of the monmap to catch up with the current
state of the Ceph Storage Cluster.

If Ceph Monitors were to discover each other through the Ceph configuration file
instead of through the monmap, additional risks would be introduced because
Ceph configuration files are not updated and distributed automatically. Ceph
Monitors might inadvertently use an older Ceph configuration file, fail to
recognize a Ceph Monitor, fall out of a quorum, or develop a situation where
`Paxos`_ is not able to determine the current state of the system accurately.

.. index:: Ceph Monitor; bootstrapping monitors

Bootstrapping Monitors
----------------------

In most configuration and deployment cases, tools that deploy Ceph (e.g.,
``cephadm``) help bootstrap the Ceph Monitors by generating a monitor map for
you. A Ceph Monitor requires a few explicit settings:

- **Filesystem ID**: The ``fsid`` is the unique identifier for your
  object store. Since you can run multiple clusters on the same
  hardware, you must specify the unique ID of the object store when
  bootstrapping a monitor. Deployment tools usually do this for you
  (e.g., ``cephadm`` can call a tool like ``uuidgen``), but you
  may specify the ``fsid`` manually too.

- **Monitor ID**: A monitor ID is a unique ID assigned to each monitor within
  the cluster. It is an alphanumeric value, and by convention the identifier
  usually follows an alphabetical increment (e.g., ``a``, ``b``, etc.). This
  can be set in a Ceph configuration file (e.g., ``[mon.a]``, ``[mon.b]``, etc.),
  by a deployment tool, or using the ``ceph`` command line.

- **Keys**: The monitor must have secret keys. A deployment tool such as
  ``cephadm`` usually does this for you, but you may
  perform this step manually too. See `Monitor Keyrings`_ for details.

For additional details on bootstrapping, see `Bootstrapping a Monitor`_.
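
To make these settings concrete, here is a minimal sketch of a manual monitor
bootstrap. The monitor ID (``a``), address, and paths are hypothetical, and the
keyring is assumed to have been created beforehand (see `Monitor Keyrings`_);
a deployment tool normally performs all of these steps for you:

.. code-block:: console

    $ # Generate a unique fsid for the new cluster
    $ FSID=$(uuidgen)
    $ # Create a monmap containing the initial monitor "a"
    $ monmaptool --create --add a 10.0.0.10:6789 --fsid $FSID /tmp/monmap
    $ # Populate mon.a's data directory from the monmap and keyring
    $ sudo ceph-mon --mkfs -i a --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring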

.. index:: Ceph Monitor; configuring monitors

Configuring Monitors
====================

To apply configuration settings to the entire cluster, enter the configuration
settings under ``[global]``. To apply configuration settings to all monitors in
your cluster, enter the configuration settings under ``[mon]``. To apply
configuration settings to specific monitors, specify the monitor instance
(e.g., ``[mon.a]``). By convention, monitor instance names use alpha notation.

.. code-block:: ini

    [global]

    [mon]

    [mon.a]

    [mon.b]

    [mon.c]
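
Settings in a more specific section override the same settings in a more
general one, so an option can be set for all monitors and then overridden for
a single monitor. The same scoping applies when using the centralized
configuration database instead of the file; the value and the monitor ID
``a`` below are illustrative:

.. code-block:: console

    $ # Apply a setting to every monitor
    $ ceph config set mon mon_data_avail_warn 20
    $ # Override it for one specific monitor
    $ ceph config set mon.a mon_data_avail_warn 15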

Minimum Configuration
---------------------

The bare minimum monitor settings for a Ceph monitor via the Ceph configuration
file include a hostname and a network address for each monitor. You can configure
these under ``[mon]`` or under the entry for a specific monitor.

.. code-block:: ini

    [global]
    mon_host = 10.0.0.2,10.0.0.3,10.0.0.4

.. code-block:: ini

    [mon.a]
    host = hostname1
    mon_addr = 10.0.0.10:6789

See the `Network Configuration Reference`_ for details.

.. note:: This minimum configuration for monitors assumes that a deployment
   tool generates the ``fsid`` and the ``mon.`` key for you.

Once you deploy a Ceph cluster, you **SHOULD NOT** change the IP addresses of
monitors. However, if you decide to change a monitor's IP address, you
must follow a specific procedure. See `Changing a Monitor's IP Address`_ for
details.
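
Because the authoritative list of monitor addresses lives in the monmap rather
than in ``ceph.conf``, you can confirm what the cluster itself believes before
and after any such change with a standard read-only command:

.. code-block:: console

    $ # Print the monmap epoch, fsid, and each monitor's name and address
    $ ceph mon dump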

Clients can also find monitors by using DNS SRV records. See `Monitor lookup
through DNS`_ for details.

Cluster ID
----------

Each Ceph Storage Cluster has a unique identifier (``fsid``). If specified, it
usually appears under the ``[global]`` section of the configuration file.
Deployment tools usually generate the ``fsid`` and store it in the monitor map,
so the value may not appear in a configuration file. The ``fsid`` makes it
possible to run daemons for multiple clusters on the same hardware.

``fsid``

:Description: The cluster ID. One per cluster.
:Type: UUID
:Required: Yes.
:Default: N/A. May be generated by a deployment tool if not specified.

.. note:: Do not set this value if you use a deployment tool that does
   it for you.
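
If you do handle the ``fsid`` yourself, any UUID generator will do for creating
one, and a running cluster will report the value it was built with. Both
commands are standard:

.. code-block:: console

    $ # Generate a new fsid when bootstrapping a cluster manually
    $ uuidgen
    $ # Print the fsid of an existing, running cluster
    $ ceph fsid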

.. index:: Ceph Monitor; initial members

Initial Members
---------------

We recommend running a production Ceph Storage Cluster with at least three Ceph
Monitors to ensure high availability. When you run multiple monitors, you may
specify the initial monitors that must be members of the cluster in order to
establish a quorum. This may reduce the time it takes for your cluster to come
online.

.. code-block:: ini

    [mon]
    mon_initial_members = a,b,c


``mon_initial_members``

:Description: The IDs of initial monitors in a cluster during startup. If
              specified, Ceph requires an odd number of monitors to form an
              initial quorum (e.g., 3).

:Type: String
:Default: None

.. note:: A *majority* of monitors in your cluster must be able to reach
   each other in order to establish a quorum. You can decrease the initial
   number of monitors to establish a quorum with this setting.

.. index:: Ceph Monitor; data path

Data
----

Ceph provides a default path where Ceph Monitors store data. For optimal
performance in a production Ceph Storage Cluster, we recommend running Ceph
Monitors on separate hosts and drives from Ceph OSD Daemons. As leveldb uses
``mmap()`` for writing the data, Ceph Monitors flush their data from memory to
disk very often, which can interfere with Ceph OSD Daemon workloads if the data
store is co-located with the OSD Daemons.

In Ceph versions 0.58 and earlier, Ceph Monitors store their data in plain
files. This approach allows users to inspect monitor data with common tools
like ``ls`` and ``cat``. However, this approach didn't provide strong
consistency.

In Ceph versions 0.59 and later, Ceph Monitors store their data as key/value
pairs. Ceph Monitors require `ACID`_ transactions. Using a data store prevents
recovering Ceph Monitors from running corrupted versions through Paxos, and it
enables multiple modification operations in a single atomic batch, among other
advantages.

Generally, we do not recommend changing the default data location. If you modify
the default location, we recommend that you make it uniform across Ceph Monitors
by setting it in the ``[mon]`` section of the configuration file.

``mon_data``

:Description: The monitor's data location.
:Type: String
:Default: ``/var/lib/ceph/mon/$cluster-$id``


``mon_data_size_warn``

:Description: Raise ``HEALTH_WARN`` status when a monitor's data
              store grows to be larger than this size, 15GB by default.

:Type: Integer
:Default: ``15*1024*1024*1024``


``mon_data_avail_warn``

:Description: Raise ``HEALTH_WARN`` status when the filesystem that houses a
              monitor's data store reports that its available capacity is
              less than or equal to this percentage.

:Type: Integer
:Default: ``30``


``mon_data_avail_crit``

:Description: Raise ``HEALTH_ERR`` status when the filesystem that houses a
              monitor's data store reports that its available capacity is
              less than or equal to this percentage.

:Type: Integer
:Default: ``5``
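
These thresholds can be adjusted at runtime through the centralized
configuration database. A small sketch with illustrative values:

.. code-block:: console

    $ # Warn when a monitor's store exceeds ~20GB instead of 15GB
    $ ceph config set mon mon_data_size_warn 21474836480
    $ # Warn when free space drops below 40% instead of 30%
    $ ceph config set mon mon_data_avail_warn 40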

``mon_warn_on_cache_pools_without_hit_sets``

:Description: Raise ``HEALTH_WARN`` when a cache pool does not
              have the ``hit_set_type`` value configured.
              See :ref:`hit_set_type <hit_set_type>` for more
              details.

:Type: Boolean
:Default: ``True``


``mon_warn_on_crush_straw_calc_version_zero``

:Description: Raise ``HEALTH_WARN`` when the CRUSH
              ``straw_calc_version`` is zero. See
              :ref:`CRUSH map tunables <crush-map-tunables>` for
              details.

:Type: Boolean
:Default: ``True``


``mon_warn_on_legacy_crush_tunables``

:Description: Raise ``HEALTH_WARN`` when CRUSH tunables are too old
              (older than ``mon_crush_min_required_version``).

:Type: Boolean
:Default: ``True``


``mon_crush_min_required_version``

:Description: The minimum tunable profile required by the cluster. See
              :ref:`CRUSH map tunables <crush-map-tunables>` for
              details.

:Type: String
:Default: ``hammer``


``mon_warn_on_osd_down_out_interval_zero``

:Description: Raise ``HEALTH_WARN`` when ``mon_osd_down_out_interval`` is
              zero. Having this option set to zero on the leader acts much
              like the ``noout`` flag. It's hard to figure out what's going
              wrong with clusters without the ``noout`` flag set but acting
              like that just the same, so we report a warning in this case.

:Type: Boolean
:Default: ``True``


``mon_warn_on_slow_ping_ratio``

:Description: Raise ``HEALTH_WARN`` when any heartbeat between OSDs exceeds
              ``mon_warn_on_slow_ping_ratio`` of ``osd_heartbeat_grace``.
              The default is 5%.

:Type: Float
:Default: ``0.05``


``mon_warn_on_slow_ping_time``

:Description: Override ``mon_warn_on_slow_ping_ratio`` with a specific value.
              Raise ``HEALTH_WARN`` if any heartbeat between OSDs exceeds
              ``mon_warn_on_slow_ping_time`` milliseconds. The default is 0
              (disabled).

:Type: Integer
:Default: ``0``

``mon_warn_on_pool_no_redundancy``

:Description: Raise ``HEALTH_WARN`` if any pool is
              configured with no replicas.

:Type: Boolean
:Default: ``True``


``mon_cache_target_full_warn_ratio``

:Description: Position between pool's ``cache_target_full`` and
              ``target_max_objects`` where we start warning.

:Type: Float
:Default: ``0.66``

``mon_health_to_clog``

:Description: Enable sending a health summary to the cluster log periodically.
:Type: Boolean
:Default: ``True``


``mon_health_to_clog_tick_interval``

:Description: How often (in seconds) the monitor sends a health summary to the
              cluster log (a non-positive number disables). If the current
              health summary is empty or identical to the previous one, the
              monitor will not send it to the cluster log.

:Type: Float
:Default: ``60.0``


``mon_health_to_clog_interval``

:Description: How often (in seconds) the monitor sends a health summary to the
              cluster log (a non-positive number disables). Monitors will
              always send a summary to the cluster log whether or not it
              differs from the previous summary.

:Type: Integer
:Default: ``3600``

.. index:: Ceph Storage Cluster; capacity planning, Ceph Monitor; capacity planning

.. _storage-capacity:

Storage Capacity
----------------

When a Ceph Storage Cluster gets close to its maximum capacity
(see ``mon_osd_full_ratio``), Ceph prevents you from writing to or reading from
OSDs as a safety measure to prevent data loss. Therefore, letting a
production Ceph Storage Cluster approach its full ratio is not a good practice,
because it sacrifices high availability. The default full ratio is ``.95``, or
95% of capacity. This is a very aggressive setting for a test cluster with a
small number of OSDs.

.. tip:: When monitoring your cluster, be alert to warnings related to the
   ``nearfull`` ratio. A ``nearfull`` warning means that the failure of one or
   more OSDs could result in a temporary service disruption. Consider adding
   more OSDs to increase storage capacity.

A common scenario for test clusters involves a system administrator removing an
OSD from the Ceph Storage Cluster, watching the cluster rebalance, then removing
another OSD, and another, until at least one OSD eventually reaches the full
ratio and the cluster locks up. We recommend a bit of capacity
planning even with a test cluster. Planning enables you to gauge how much spare
capacity you will need in order to maintain high availability. Ideally, you want
to plan for a series of Ceph OSD Daemon failures where the cluster can recover
to an ``active+clean`` state without replacing those OSDs
immediately. Cluster operation continues in the ``active+degraded`` state, but
this is not ideal for normal operation and should be addressed promptly.

The following diagram depicts a simplistic Ceph Storage Cluster containing 33
Ceph Nodes with one OSD per host, each OSD reading from
and writing to a 3TB drive. So this exemplary Ceph Storage Cluster has a maximum
actual capacity of 99TB. With a ``mon_osd_full_ratio`` of ``0.95``, if the Ceph
Storage Cluster falls to 5TB of remaining capacity, the cluster will not allow
Ceph Clients to read and write data. So the Ceph Storage Cluster's operating
capacity is 95TB, not 99TB.
504 | ||
505 | .. ditaa:: | |
7c673cae FG |
506 | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ |
507 | | Rack 1 | | Rack 2 | | Rack 3 | | Rack 4 | | Rack 5 | | Rack 6 | | |
508 | | cCCC | | cF00 | | cCCC | | cCCC | | cCCC | | cCCC | | |
509 | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | |
510 | | OSD 1 | | OSD 7 | | OSD 13 | | OSD 19 | | OSD 25 | | OSD 31 | | |
511 | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | |
512 | | OSD 2 | | OSD 8 | | OSD 14 | | OSD 20 | | OSD 26 | | OSD 32 | | |
513 | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | |
514 | | OSD 3 | | OSD 9 | | OSD 15 | | OSD 21 | | OSD 27 | | OSD 33 | | |
515 | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | |
516 | | OSD 4 | | OSD 10 | | OSD 16 | | OSD 22 | | OSD 28 | | Spare | | |
517 | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | |
518 | | OSD 5 | | OSD 11 | | OSD 17 | | OSD 23 | | OSD 29 | | Spare | | |
519 | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | |
520 | | OSD 6 | | OSD 12 | | OSD 18 | | OSD 24 | | OSD 30 | | Spare | | |
521 | +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ | |
522 | ||

It is normal in such a cluster for one or two OSDs to fail. A less frequent but
reasonable scenario involves a rack's router or power supply failing, which
brings down multiple OSDs simultaneously (e.g., OSDs 7-12). In such a scenario,
you should still strive for a cluster that can remain operational and achieve an
``active+clean`` state--even if that means adding a few hosts with additional
OSDs in short order. If your capacity utilization is too high, you may not lose
data, but you could still sacrifice data availability while resolving an outage
within a failure domain if the cluster's capacity utilization exceeds the full
ratio. For this reason, we recommend at least some rough capacity planning.

Identify two numbers for your cluster:

#. The number of OSDs.
#. The total capacity of the cluster.

If you divide the total capacity of your cluster by the number of OSDs in your
cluster, you will find the mean average capacity of an OSD within your cluster.
Consider multiplying that number by the number of OSDs you expect will fail
simultaneously during normal operations (a relatively small number). Finally
multiply the capacity of the cluster by the full ratio to arrive at a maximum
operating capacity; then, subtract the amount of data on the OSDs
you expect to fail to arrive at a reasonable full ratio. Repeat the foregoing
process with a higher number of OSD failures (e.g., a rack of OSDs) to arrive at
a reasonable number for a near full ratio.
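
For example, applying this process to the 33-OSD cluster above: 99TB / 33 OSDs
yields a mean capacity of 3TB per OSD, and a full ratio of ``0.95`` yields a
maximum operating capacity of roughly 94TB. If you expect two OSDs (6TB) to
fail simultaneously, a reasonable operating target is about 88TB, or roughly
``0.89`` of raw capacity; budgeting for a whole rack of six OSDs (18TB)
suggests a near full target of about 76TB, or roughly ``0.77``. These numbers
are illustrative; your own failure assumptions will produce different ratios.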

The following settings only apply on cluster creation and are then stored in
the OSDMap. To clarify, in normal operation the values that are used by OSDs
are those found in the OSDMap, not those in the configuration file or central
config store.

.. code-block:: ini

    [global]
    mon_osd_full_ratio = .80
    mon_osd_backfillfull_ratio = .75
    mon_osd_nearfull_ratio = .70

``mon_osd_full_ratio``

:Description: The threshold percentage of device space utilized before an OSD is
              considered ``full``.

:Type: Float
:Default: ``0.95``


``mon_osd_backfillfull_ratio``

:Description: The threshold percentage of device space utilized before an OSD is
              considered too ``full`` to backfill.

:Type: Float
:Default: ``0.90``


``mon_osd_nearfull_ratio``

:Description: The threshold percentage of device space used before an OSD is
              considered ``nearfull``.

:Type: Float
:Default: ``0.85``

.. tip:: If some OSDs are nearfull, but others have plenty of capacity, you
   may have an inaccurate CRUSH weight set for the nearfull OSDs.

.. tip:: These settings only apply during cluster creation. Afterwards they need
   to be changed in the OSDMap using ``ceph osd set-nearfull-ratio`` and
   ``ceph osd set-full-ratio``.
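
A sketch of changing the ratios on a running cluster follows; the values are
illustrative, and the ``grep`` is just a convenient way to confirm what the
OSDMap now records:

.. code-block:: console

    $ # Adjust the runtime ratios stored in the OSDMap
    $ ceph osd set-nearfull-ratio 0.80
    $ ceph osd set-backfillfull-ratio 0.85
    $ ceph osd set-full-ratio 0.90
    $ # Verify the ratios recorded in the OSDMap
    $ ceph osd dump | grep ratio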

.. index:: heartbeat

Heartbeat
---------

Ceph monitors know about the cluster by requiring reports from each OSD, and by
receiving reports from OSDs about the status of their neighboring OSDs. Ceph
provides reasonable default settings for monitor/OSD interaction; however, you
may modify them as needed. See `Monitor/OSD Interaction`_ for details.

.. index:: Ceph Monitor; leader, Ceph Monitor; provider, Ceph Monitor; requester, Ceph Monitor; synchronization

Monitor Store Synchronization
-----------------------------

When you run a production cluster with multiple monitors (recommended), each
monitor checks to see if a neighboring monitor has a more recent version of the
cluster map (e.g., a map in a neighboring monitor with one or more epoch numbers
higher than the most current epoch in the map of the monitor in question).
Periodically, one monitor in the cluster may fall behind the other monitors to
the point where it must leave the quorum, synchronize to retrieve the most
current information about the cluster, and then rejoin the quorum. For the
purposes of synchronization, monitors may assume one of three roles:

#. **Leader**: The `Leader` is the first monitor to achieve the most recent
   Paxos version of the cluster map.

#. **Provider**: The `Provider` is a monitor that has the most recent version
   of the cluster map, but wasn't the first to achieve the most recent version.

#. **Requester**: A `Requester` is a monitor that has fallen behind the leader
   and must synchronize in order to retrieve the most recent information about
   the cluster before it can rejoin the quorum.

These roles enable a leader to delegate synchronization duties to a provider,
which prevents synchronization requests from overloading the leader--improving
performance. In the following diagram, the requester has learned that it has
fallen behind the other monitors. The requester asks the leader to synchronize,
and the leader tells the requester to synchronize with a provider.

.. ditaa::

    +-----------+          +---------+          +----------+
    | Requester |          | Leader  |          | Provider |
    +-----------+          +---------+          +----------+
           |                    |                     |
           |                    |                     |
           | Ask to Synchronize |                     |
           |------------------->|                     |
           |                    |                     |
           |<-------------------|                     |
           | Tell Requester to  |                     |
           | Sync with Provider |                     |
           |                    |                     |
           |               Synchronize                |
           |--------------------+-------------------->|
           |                    |                     |
           |<-------------------+---------------------|
           |          Send Chunk to Requester         |
           |          (repeat as necessary)           |
           |     Requester Acks Chunk to Provider     |
           |--------------------+-------------------->|
           |                    |
           |   Sync Complete    |
           |    Notification    |
           |------------------->|
           |                    |
           |<-------------------|
           |        Ack         |
           |                    |

Synchronization always occurs when a new monitor joins the cluster. During
runtime operations, monitors may receive updates to the cluster map at different
times. This means the leader and provider roles may migrate from one monitor to
another. If this happens while synchronizing (e.g., a provider falls behind the
leader), the provider can terminate synchronization with a requester.

Once synchronization is complete, Ceph performs trimming across the cluster.
Trimming requires that the placement groups are ``active+clean``.

``mon_sync_timeout``

:Description: Number of seconds the monitor will wait for the next update
              message from its sync provider before it gives up and bootstraps
              again.

:Type: Double
:Default: ``60.0``


``mon_sync_max_payload_size``

:Description: The maximum size for a sync payload (in bytes).
:Type: 32-bit Integer
:Default: ``1048576``

``paxos_max_join_drift``

:Description: The maximum Paxos iterations before we must first sync the
              monitor data stores. When a monitor finds that its peer is too
              far ahead of it, it will first sync with data stores before
              moving on.

:Type: Integer
:Default: ``10``


``paxos_stash_full_interval``

:Description: How often (in commits) to stash a full copy of the PaxosService
              state. Currently this setting only affects the ``mds``, ``mon``,
              ``auth`` and ``mgr`` PaxosServices.

:Type: Integer
:Default: ``25``


``paxos_propose_interval``

:Description: Gather updates for this time interval before proposing
              a map update.

:Type: Double
:Default: ``1.0``

``paxos_min``

:Description: The minimum number of Paxos states to keep around.
:Type: Integer
:Default: ``500``


``paxos_min_wait``

:Description: The minimum amount of time to gather updates after a period of
              inactivity.

:Type: Double
:Default: ``0.05``


``paxos_trim_min``

:Description: Number of extra proposals tolerated before trimming.
:Type: Integer
:Default: ``250``


``paxos_trim_max``

:Description: The maximum number of extra proposals to trim at a time.
:Type: Integer
:Default: ``500``


``paxos_service_trim_min``

:Description: The minimum number of versions to trigger a trim (0 disables it).
:Type: Integer
:Default: ``250``


``paxos_service_trim_max``

:Description: The maximum number of versions to trim during a single proposal
              (0 disables it).
:Type: Integer
:Default: ``500``
769 | ``paxos service trim max multiplier`` |
770 | ||
771 | :Description: The factor by which paxos service trim max will be multiplied | |
772 | to get a new upper bound when trim sizes are high (0 disables it) | |
773 | :Type: Integer | |
774 | :Default: ``20`` | |
775 | ||
776 | ||

``mon_mds_force_trim_to``

:Description: Force monitor to trim mdsmaps to this point (0 disables it;
              dangerous, use with care).

:Type: Integer
:Default: ``0``


``mon_osd_force_trim_to``

:Description: Force monitor to trim osdmaps to this point, even if there are
              PGs that are not clean at the specified epoch (0 disables it;
              dangerous, use with care).

:Type: Integer
:Default: ``0``

``mon_osd_cache_size``

:Description: The size of the osdmap cache, so as not to rely on the
              underlying store's cache.

:Type: Integer
:Default: ``500``


``mon_election_timeout``

:Description: On election proposer, maximum waiting time for all ACKs in seconds.
:Type: Float
:Default: ``5.00``

``mon_lease``

:Description: The length (in seconds) of the lease on the monitor's versions.
:Type: Float
:Default: ``5.00``


``mon_lease_renew_interval_factor``

:Description: ``mon_lease`` \* ``mon_lease_renew_interval_factor`` will be the
              interval for the Leader to renew the other monitors' leases. The
              factor should be less than ``1.0``.

:Type: Float
:Default: ``0.60``


``mon_lease_ack_timeout_factor``

:Description: The Leader will wait ``mon_lease`` \* ``mon_lease_ack_timeout_factor``
              for the Providers to acknowledge the lease extension.

:Type: Float
:Default: ``2.00``


``mon_accept_timeout_factor``

:Description: The Leader will wait ``mon_lease`` \* ``mon_accept_timeout_factor``
              for the Requester(s) to accept a Paxos update. It is also used
              during the Paxos recovery phase for similar purposes.

:Type: Float
:Default: ``2.00``

``mon_min_osdmap_epochs``

:Description: Minimum number of OSD map epochs to keep at all times.
:Type: 32-bit Integer
:Default: ``500``


``mon_max_log_epochs``

:Description: Maximum number of Log epochs the monitor should keep.
:Type: 32-bit Integer
:Default: ``500``

.. index:: Ceph Monitor; clock

Clock
-----

Ceph daemons pass critical messages to each other, which must be processed
before daemons reach a timeout threshold. If the clocks in Ceph monitors
are not synchronized, it can lead to a number of anomalies. For example:

- Daemons ignoring received messages (e.g., timestamps outdated)
- Timeouts triggered too soon or too late when a message wasn't received in
  time.

See `Monitor Store Synchronization`_ for details.

.. tip:: You must configure NTP or PTP daemons on your Ceph monitor hosts to
   ensure that the monitor cluster operates with synchronized clocks.
   It can be advantageous to have monitor hosts sync with each other
   as well as with multiple quality upstream time sources.

Clock drift may still be noticeable with NTP even though the discrepancy is not
yet harmful. Ceph's clock drift / clock skew warnings may get triggered even
though NTP maintains a reasonable level of synchronization. Increasing the
allowed clock drift may be tolerable under such circumstances; however, a number
of factors such as workload, network latency, configuring overrides to default
timeouts and the `Monitor Store Synchronization`_ settings may influence
the level of acceptable clock drift without compromising Paxos guarantees.

Ceph provides the following tunable options to allow you to find
acceptable values.
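
Before loosening any of these settings, it is worth checking how far the
monitors' clocks actually disagree. ``ceph time-sync-status`` reports the skew
and latency observed for each monitor; the ``config set`` value below is
illustrative:

.. code-block:: console

    $ # Report per-monitor clock skew and latency
    $ ceph time-sync-status
    $ # Tolerate up to 100 ms of drift before warning (illustrative)
    $ ceph config set mon mon_clock_drift_allowed 0.1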

``mon_tick_interval``

:Description: A monitor's tick interval in seconds.
:Type: 32-bit Integer
:Default: ``5``


``mon_clock_drift_allowed``

:Description: The clock drift in seconds allowed between monitors.
:Type: Float
:Default: ``0.05``


``mon_clock_drift_warn_backoff``

:Description: Exponential backoff for clock drift warnings.
:Type: Float
:Default: ``5.00``


``mon_timecheck_interval``

:Description: The time check interval (clock drift check) in seconds
              for the Leader.

:Type: Float
:Default: ``300.00``


``mon_timecheck_skew_interval``

:Description: The time check interval (clock drift check) in seconds
              for the Leader when a skew is present.

:Type: Float
:Default: ``30.00``

Client
------

``mon_client_hunt_interval``

:Description: The client will try a new monitor every ``N`` seconds until it
              establishes a connection.

:Type: Double
:Default: ``3.00``


``mon_client_ping_interval``

:Description: The client will ping the monitor every ``N`` seconds.
:Type: Double
:Default: ``10.00``


``mon_client_max_log_entries_per_message``

:Description: The maximum number of log entries a monitor will generate
              per client message.

:Type: Integer
:Default: ``1000``


``mon_client_bytes``

:Description: The amount of client message data allowed in memory (in bytes).
:Type: 64-bit Integer Unsigned
:Default: ``100ul << 20``

.. _pool-settings:

Pool settings
=============

Since version v0.94, Ceph supports pool flags that allow or disallow changes to
be made to pools. Monitors can also disallow removal of pools if appropriately
configured. The inconvenience of this guardrail is far outweighed by the number
of accidental pool (and thus data) deletions it prevents.

``mon_allow_pool_delete``

:Description: Should monitors allow pools to be removed, regardless of what the
              pool flags say?

:Type: Boolean
:Default: ``false``
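
A typical sequence is to leave deletion disabled, enable it only at the moment
you actually need it, and then disable it again. The pool name below is
hypothetical:

.. code-block:: console

    $ # Temporarily allow pool deletion cluster-wide
    $ ceph config set mon mon_allow_pool_delete true
    $ # Remove the pool (the name must be given twice, plus the safety flag)
    $ ceph osd pool rm mypool mypool --yes-i-really-really-mean-it
    $ # Re-enable the guardrail
    $ ceph config set mon mon_allow_pool_delete false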

``osd_pool_default_ec_fast_read``

:Description: Whether to turn on fast read on the pool or not. It will be used
              as the default setting of newly created erasure coded pools if
              ``fast_read`` is not specified at create time.

:Type: Boolean
:Default: ``false``


``osd_pool_default_flag_hashpspool``

:Description: Set the ``hashpspool`` flag on new pools.
:Type: Boolean
:Default: ``true``


``osd_pool_default_flag_nodelete``

:Description: Set the ``nodelete`` flag on new pools, which prevents pool removal.
:Type: Boolean
:Default: ``false``


``osd_pool_default_flag_nopgchange``

:Description: Set the ``nopgchange`` flag on new pools. Does not allow the
              number of PGs to be changed.
:Type: Boolean
:Default: ``false``


``osd_pool_default_flag_nosizechange``

:Description: Set the ``nosizechange`` flag on new pools. Does not allow the
              ``size`` to be changed.
:Type: Boolean
:Default: ``false``

For more information about the pool flags see `Pool values`_.

Miscellaneous
=============

``mon_max_osd``

:Description: The maximum number of OSDs allowed in the cluster.
:Type: 32-bit Integer
:Default: ``10000``


``mon_globalid_prealloc``

:Description: The number of global IDs to pre-allocate for clients and daemons
              in the cluster.
:Type: 32-bit Integer
:Default: ``10000``


``mon_subscribe_interval``

:Description: The refresh interval (in seconds) for subscriptions. The
              subscription mechanism enables obtaining cluster maps
              and log information.

:Type: Double
:Default: ``86400.00``


``mon_stat_smooth_intervals``

:Description: Ceph will smooth statistics over the last ``N`` PG maps.
:Type: Integer
:Default: ``6``


``mon_probe_timeout``

:Description: Number of seconds the monitor will wait to find peers before
              bootstrapping.
:Type: Double
:Default: ``2.00``


``mon_daemon_bytes``

:Description: The message memory cap for metadata server and OSD messages (in
              bytes).
:Type: 64-bit Integer Unsigned
:Default: ``400ul << 20``


``mon_max_log_entries_per_event``

:Description: The maximum number of log entries per event.
:Type: Integer
:Default: ``4096``
7c673cae | 1078 | |
f67539c2 TL |
1079 | :Description: Enables or disables priming the PGMap with the previous OSDs when an ``out`` |
1080 | OSD comes back into the cluster. With the ``true`` setting, clients | |
1081 | will continue to use the previous OSDs until the newly ``in`` OSDs for | |
1082 | a PG have peered. | |
9f95a23c | 1083 | |
7c673cae FG |
1084 | :Type: Boolean |
1085 | :Default: ``true`` | |
1086 | ||
1087 | ||
f67539c2 | 1088 | ``mon_osd_prime pg temp max time`` |
7c673cae FG |
1089 | |
1090 | :Description: How much time in seconds the monitor should spend trying to prime the | |
1091 | PGMap when an out OSD comes back into the cluster. | |
9f95a23c | 1092 | |
7c673cae | 1093 | :Type: Float |
9f95a23c | 1094 | :Default: ``0.50`` |
7c673cae FG |
1095 | |
1096 | ||
f67539c2 | 1097 | ``mon_osd_prime_pg_temp_max_time_estimate`` |
31f18b77 FG |
1098 | |
1099 | :Description: Maximum estimate of time spent on each PG before we prime all PGs | |
1100 | in parallel. | |
9f95a23c | 1101 | |
31f18b77 FG |
1102 | :Type: Float |
1103 | :Default: ``0.25`` | |
1104 | ||
1105 | ||

``mon_mds_skip_sanity``

:Description: Skip safety assertions on FSMap (in case of bugs where we want to
              continue anyway). The monitor terminates if the FSMap sanity
              check fails, but we can disable it by enabling this option.

:Type: Boolean
:Default: ``False``


``mon_max_mdsmap_epochs``

:Description: The maximum number of mdsmap epochs to trim during a single
              proposal.
:Type: Integer
:Default: ``500``


``mon_config_key_max_entry_size``

:Description: The maximum size of a config-key entry (in bytes).
:Type: Integer
:Default: ``65536``

``mon_scrub_interval``

:Description: How often the monitor scrubs its store by comparing
              the stored checksums with the computed ones for all stored
              keys. (0 disables it; dangerous, use with care.)

:Type: Seconds
:Default: ``1 day``


``mon_scrub_max_keys``

:Description: The maximum number of keys to scrub each time.
:Type: Integer
:Default: ``100``

``mon_compact_on_start``

:Description: Compact the database used as Ceph Monitor store on
              ``ceph-mon`` start. A manual compaction helps to shrink the
              monitor database and improve its performance if the regular
              compaction fails to work.

:Type: Boolean
:Default: ``False``


``mon_compact_on_bootstrap``

:Description: Compact the database used as Ceph Monitor store
              on bootstrap. Monitors probe each other to establish
              a quorum after bootstrap. If a monitor times out before joining
              the quorum, it will start over and bootstrap again.

:Type: Boolean
:Default: ``False``


``mon_compact_on_trim``

:Description: Compact a certain prefix (including paxos) when we trim its old
              states.
:Type: Boolean
:Default: ``True``

``mon_cpu_threads``

:Description: Number of threads for performing CPU intensive work on the
              monitor.
:Type: Integer
:Default: ``4``


``mon_osd_mapping_pgs_per_chunk``

:Description: We calculate the mapping from placement group to OSDs in chunks.
              This option specifies the number of placement groups per chunk.

:Type: Integer
:Default: ``4096``


``mon_session_timeout``

:Description: The monitor will terminate inactive sessions that stay idle over
              this time limit.

:Type: Integer
:Default: ``300``

``mon_osd_cache_size_min``

:Description: The minimum amount of bytes to be kept mapped in memory for OSD
              monitor caches.

:Type: 64-bit Integer
:Default: ``134217728``


``mon_memory_target``

:Description: The amount of bytes pertaining to OSD monitor caches and KV cache
              to be kept mapped in memory with cache auto-tuning enabled.

:Type: 64-bit Integer
:Default: ``2147483648``


``mon_memory_autotune``

:Description: Autotune the cache memory used for OSD monitors and the KV
              database.

:Type: Boolean
:Default: ``True``

.. _Paxos: https://en.wikipedia.org/wiki/Paxos_(computer_science)
.. _Monitor Keyrings: ../../../dev/mon-bootstrap#secret-keys
.. _Ceph configuration file: ../ceph-conf/#monitors
.. _Network Configuration Reference: ../network-config-ref
.. _Monitor lookup through DNS: ../mon-lookup-dns
.. _ACID: https://en.wikipedia.org/wiki/ACID
.. _Adding/Removing a Monitor: ../../operations/add-or-rm-mons
.. _Monitoring a Cluster: ../../operations/monitoring
.. _Monitoring OSDs and PGs: ../../operations/monitoring-osd-pg
.. _Bootstrapping a Monitor: ../../../dev/mon-bootstrap
.. _Changing a Monitor's IP Address: ../../operations/add-or-rm-mons#changing-a-monitor-s-ip-address
.. _Monitor/OSD Interaction: ../mon-osd-interaction
.. _Scalability and High Availability: ../../../architecture#scalability-and-high-availability
.. _Pool values: ../../operations/pools/#set-pool-values