=========================
 Monitoring OSDs and PGs
=========================

High availability and high reliability require a fault-tolerant approach to
managing hardware and software issues. Ceph has no single point of failure and
it can service requests for data even when in a "degraded" mode. Ceph's `data
placement`_ introduces a layer of indirection to ensure that data doesn't bind
directly to specific OSDs. For this reason, tracking system faults
requires finding the `placement group`_ (PG) and the underlying OSDs at the
root of the problem.

.. tip:: A fault in one part of the cluster might prevent you from accessing a
   particular object, but that doesn't mean that you are prevented from
   accessing other objects. When you run into a fault, don't panic. Just
   follow the steps for monitoring your OSDs and placement groups, and then
   begin troubleshooting.

Ceph is self-repairing. However, when problems persist, monitoring OSDs and
placement groups will help you identify the problem.


Monitoring OSDs
===============

An OSD is either *in* service (``in``) or *out* of service (``out``). An OSD is
either running and reachable (``up``), or it is not running and not reachable
(``down``).

If an OSD is ``up``, it may be either ``in`` service (clients can read and
write data) or ``out`` of service. If the OSD was ``in`` but then, due to a
failure or a manual action, was set to the ``out`` state, Ceph will migrate
placement groups to the other OSDs to maintain the configured redundancy.

If an OSD is ``out`` of service, CRUSH will not assign placement groups to it.
If an OSD is ``down``, it should also be ``out``.

.. note:: If an OSD is ``down`` and ``in``, there is a problem and this
   indicates that the cluster is not in a healthy state.

.. ditaa::

           +----------------+        +----------------+
           |                |        |                |
           |   OSD #n In    |        |   OSD #n Up    |
           |                |        |                |
           +----------------+        +----------------+
                   ^                         ^
                   |                         |
                   |                         |
                   v                         v
           +----------------+        +----------------+
           |                |        |                |
           |   OSD #n Out   |        |   OSD #n Down  |
           |                |        |                |
           +----------------+        +----------------+

If you run the commands ``ceph health``, ``ceph -s``, or ``ceph -w``,
you might notice that the cluster does not always show ``HEALTH OK``. Don't
panic. There are certain circumstances in which it is expected and normal that
the cluster will **NOT** show ``HEALTH OK``:

#. You haven't started the cluster yet.
#. You have just started or restarted the cluster and it's not ready to show
   health statuses yet, because the PGs are in the process of being created and
   the OSDs are in the process of peering.
#. You have just added or removed an OSD.
#. You have just modified your cluster map.

Checking to see if OSDs are ``up`` and running is an important aspect of monitoring them:
whenever the cluster is up and running, every OSD that is ``in`` the cluster should also
be ``up`` and running. To see if all of the cluster's OSDs are running, run the following
command:

.. prompt:: bash $

   ceph osd stat

The output provides the following information: the total number of OSDs (x),
how many OSDs are ``up`` (y), how many OSDs are ``in`` (z), and the map epoch (eNNNN). ::

   x osds: y up, z in; epoch: eNNNN
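
For a per-OSD view rather than a summary, ``ceph osd dump`` lists each OSD's
``up``/``in`` flags and weights. A minimal sketch (the ``grep`` filter is just
one convenient way to trim the output to the per-OSD lines):

.. prompt:: bash $

   ceph osd dump | grep '^osd'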

If the number of OSDs that are ``in`` the cluster is greater than the number of
OSDs that are ``up``, run the following command to identify the ``ceph-osd``
daemons that are not running:

.. prompt:: bash $

   ceph osd tree

::

   #ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
    -1       2.00000 pool openstack
    -3       2.00000 rack dell-2950-rack-A
    -2       2.00000 host dell-2950-A1
     0  ssd  1.00000      osd.0               up  1.00000 1.00000
     1  ssd  1.00000      osd.1             down  1.00000 1.00000

.. tip:: Searching through a well-designed CRUSH hierarchy to identify the physical
   locations of particular OSDs might help you troubleshoot your cluster.

If an OSD is ``down``, start it by running the following command:

.. prompt:: bash $

   sudo systemctl start ceph-osd@1

For problems associated with OSDs that have stopped or won't restart, see `OSD Not Running`_.
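
Before troubleshooting further, it is often worth checking why the daemon
stopped. On a host that runs OSDs as plain systemd services (a
non-containerized deployment is assumed here; cephadm and other containerized
setups name the units differently), a quick sketch:

.. prompt:: bash $

   sudo systemctl status ceph-osd@1
   sudo journalctl -u ceph-osd@1 --since "1 hour ago"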

PG Sets
=======

When CRUSH assigns a PG to OSDs, it takes note of how many replicas of the PG
are required by the pool and then assigns each replica to a different OSD.
For example, if the pool requires three replicas of a PG, CRUSH might assign
them individually to ``osd.1``, ``osd.2`` and ``osd.3``. CRUSH seeks a
pseudo-random placement that takes into account the failure domains that you
have set in your `CRUSH map`_; for this reason, PGs are rarely assigned to
immediately adjacent OSDs in a large cluster.

Ceph processes client requests with the **Acting Set** of OSDs: this is the set
of OSDs that currently have a full and working version of a PG shard and that
are therefore responsible for handling requests. By contrast, the **Up Set** is
the set of OSDs that contain a shard of a specific PG: data is moved or copied
(or planned to be moved or copied) to the **Up Set**. See
:ref:`Placement Group Concepts <rados_operations_pg_concepts>`.

Sometimes an OSD in the Acting Set is ``down`` or otherwise unable to
service requests for objects in the PG. When this kind of situation
arises, don't panic. Common examples of such a situation include:

- You added or removed an OSD, CRUSH reassigned the PG to
  other OSDs, and this reassignment changed the composition of the Acting Set and triggered
  the migration of data by means of a "backfill" process.
- An OSD was ``down``, was restarted, and is now ``recovering``.
- An OSD in the Acting Set is ``down`` or unable to service requests,
  and another OSD has temporarily assumed its duties.

Typically, the Up Set and the Acting Set are identical. When they are not, it
might indicate that Ceph is migrating the PG (in other words, that the PG has
been remapped), that an OSD is recovering, or that there is a problem with the
cluster (in such scenarios, Ceph usually shows a "HEALTH WARN" state with a
"stuck stale" message).

To retrieve a list of PGs, run the following command:

.. prompt:: bash $

   ceph pg dump

To see which OSDs are within the Acting Set and the Up Set for a specific PG, run the following command:

.. prompt:: bash $

   ceph pg map {pg-num}

The output provides the following information: the osdmap epoch (eNNN), the PG number
({pg-num}), the OSDs in the Up Set (up[]), and the OSDs in the Acting Set
(acting[])::

   osdmap eNNN pg {raw-pg-num} ({pg-num}) -> up [0,1,2] acting [0,1,2]

.. note:: If the Up Set and the Acting Set do not match, this might indicate
   that the cluster is rebalancing itself or that there is a problem with
   the cluster.
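
To compare the Up Set and the Acting Set for every PG at once, the brief form
of the PG dump is usually sufficient (a sketch; the exact columns can vary
slightly between releases):

.. prompt:: bash $

   ceph pg dump pgs_brief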


Peering
=======

Before you can write data to a PG, it must be in an ``active`` state and it
will preferably be in a ``clean`` state. For Ceph to determine the current
state of a PG, peering must take place. That is, the primary OSD of the PG
(that is, the first OSD in the Acting Set) must peer with the secondary and
tertiary OSDs so that consensus on the current state of the PG can be
established. In the following diagram, we assume a pool with three replicas
of the PG:

.. ditaa::

           +---------+     +---------+     +-------+
           |  OSD 1  |     |  OSD 2  |     | OSD 3 |
           +---------+     +---------+     +-------+
                |               |               |
                |  Request To   |               |
                |     Peer      |               |
                |-------------->|               |
                |<--------------|               |
                |    Peering                    |
                |                               |
                |         Request To            |
                |            Peer               |
                |----------------------------->|
                |<-----------------------------|
                |                Peering        |

The OSDs also report their status to the monitor. For details, see `Configuring Monitor/OSD
Interaction`_. To troubleshoot peering issues, see `Peering
Failure`_.
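
To see which PGs have not yet reached the ``active`` state (for example,
because peering has not completed), one option is the ``dump_stuck``
subcommand, which is covered in more detail later in this document:

.. prompt:: bash $

   ceph pg dump_stuck inactive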


Monitoring PG States
====================

If you run the commands ``ceph health``, ``ceph -s``, or ``ceph -w``,
you might notice that the cluster does not always show ``HEALTH OK``. After
first checking to see if the OSDs are running, you should also check PG
states. There are certain PG-peering-related circumstances in which it is expected
and normal that the cluster will **NOT** show ``HEALTH OK``:

#. You have just created a pool and the PGs haven't peered yet.
#. The PGs are recovering.
#. You have just added an OSD to or removed an OSD from the cluster.
#. You have just modified your CRUSH map and your PGs are migrating.
#. There is inconsistent data in different replicas of a PG.
#. Ceph is scrubbing a PG's replicas.
#. Ceph doesn't have enough storage capacity to complete backfilling operations.

If one of these circumstances causes Ceph to show ``HEALTH WARN``, don't
panic. In many cases, the cluster will recover on its own. In some cases, however, you
might need to take action. An important aspect of monitoring PGs is to check their
status as ``active`` and ``clean``: that is, it is important to ensure that, when the
cluster is up and running, all PGs are ``active`` and (preferably) ``clean``.
To see the status of every PG, run the following command:

.. prompt:: bash $

   ceph pg stat

The output provides the following information: the total number of PGs (x), how many
PGs are in a particular state such as ``active+clean`` (y), and the
amount of data stored (z). ::

   x pgs: y active+clean; z bytes data, aa MB used, bb GB / cc GB avail

.. note:: It is common for Ceph to report multiple states for PGs (for example,
   ``active+clean``, ``active+clean+remapped``, ``active+clean+scrubbing``).

Here Ceph shows not only the PG states, but also the storage capacity used (aa),
the amount of storage capacity remaining (bb), and the total storage capacity
of the cluster. These values can be important in a few cases:

- The cluster is reaching its ``near full ratio`` or ``full ratio``.
- Data is not being distributed across the cluster due to an error in the
  CRUSH configuration.
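
To keep an eye on these values, you can check the cluster-wide utilization
report and the configured ratios directly. A quick sketch (``ceph osd dump``
prints the ``full_ratio``, ``backfillfull_ratio``, and ``nearfull_ratio``
currently in effect):

.. prompt:: bash $

   ceph df
   ceph osd dump | grep ratio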


.. topic:: Placement Group IDs

   PG IDs consist of the pool number (not the pool name) followed by a period
   (.) and a hexadecimal number. You can view pool numbers and their names in
   the output of ``ceph osd lspools``. For example, the first pool that was
   created corresponds to pool number ``1``. A fully qualified PG ID has the
   following form::

      {pool-num}.{pg-id}

   It typically resembles the following::

      1.1701b

To retrieve a list of PGs, run the following command:

.. prompt:: bash $

   ceph pg dump

To format the output as JSON and save it to a file, run the following command:

.. prompt:: bash $

   ceph pg dump -o {filename} --format=json

To query a specific PG, run the following command:

.. prompt:: bash $

   ceph pg {poolnum}.{pg-id} query

Ceph will output the query in JSON format.
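
The full query output is long. If ``jq`` is available (an assumption; it is not
installed everywhere by default), a few fields are usually enough for a first
look, for example the PG's state and its Up and Acting Sets:

.. prompt:: bash $

   ceph pg 1.1701b query | jq '{state, up, acting}'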

The following subsections describe the most common PG states in detail.


Creating
--------

PGs are created when you create a pool: the command that creates a pool
specifies the total number of PGs for that pool, and when the pool is created
all of those PGs are created as well. Ceph will echo ``creating`` while it is
creating PGs. After the PG(s) are created, the OSDs that are part of a PG's
Acting Set will peer. Once peering is complete, the PG status should be
``active+clean``. This status means that Ceph clients can begin writing to the
PG.

.. ditaa::

       /-----------\       /-----------\       /-----------\
       | Creating  |------>|  Peering  |------>|  Active   |
       \-----------/       \-----------/       \-----------/

Peering
-------

When a PG peers, the OSDs that store the replicas of its data converge on an
agreed state of the data and metadata within that PG. When peering is complete,
those OSDs agree about the state of that PG. However, completion of the peering
process does **NOT** mean that each replica has the latest contents.

.. topic:: Authoritative History

   Ceph will **NOT** acknowledge a write operation to a client until that write
   operation is persisted by every OSD in the Acting Set. This practice ensures
   that at least one member of the Acting Set will have a record of every
   acknowledged write operation since the last successful peering operation.

   Given an accurate record of each acknowledged write operation, Ceph can
   construct a new authoritative history of the PG--that is, a complete and
   fully ordered set of operations that, if performed, would bring an OSD’s
   copy of the PG up to date.


Active
------

After Ceph has completed the peering process, a PG should become ``active``.
The ``active`` state means that the data in the PG is generally available for
read and write operations in the primary and replica OSDs.


Clean
-----

When a PG is in the ``clean`` state, all OSDs holding its data and metadata
have successfully peered and there are no stray replicas. Ceph has replicated
all objects in the PG the correct number of times.


Degraded
--------

When a client writes an object to the primary OSD, the primary OSD is
responsible for writing the replicas to the replica OSDs. After the primary OSD
writes the object to storage, the PG will remain in a ``degraded``
state until the primary OSD has received an acknowledgement from the replica
OSDs that Ceph created the replica objects successfully.

The reason that a PG can be ``active+degraded`` is that an OSD can be
``active`` even if it doesn't yet hold all of the PG's objects. If an OSD goes
``down``, Ceph marks each PG assigned to the OSD as ``degraded``. The PGs must
peer again when the OSD comes back online. However, a client can still write a
new object to a ``degraded`` PG if it is ``active``.

If an OSD is ``down`` and the ``degraded`` condition persists, Ceph might mark the
``down`` OSD as ``out`` of the cluster and remap the data from the ``down`` OSD
to another OSD. The time between being marked ``down`` and being marked ``out``
is determined by ``mon_osd_down_out_interval``, which is set to ``600`` seconds
by default.
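
To check the interval in effect on your cluster, and to see what is currently
degraded, something like the following can be used (a sketch; ``ceph config
get`` reads the centralized configuration, which local settings can override):

.. prompt:: bash $

   ceph config get mon mon_osd_down_out_interval
   ceph health detail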

A PG can also be in the ``degraded`` state because there are one or more
objects that Ceph expects to find in the PG but that Ceph cannot find. Although
you cannot read or write to unfound objects, you can still access all of the other
objects in the ``degraded`` PG.
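
To check whether a ``degraded`` PG is degraded because of unfound objects, the
PG's unfound objects can be listed directly (a sketch; substitute a real PG ID
for the placeholder):

.. prompt:: bash $

   ceph pg {poolnum}.{pg-id} list_unfound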


Recovering
----------

Ceph was designed for fault-tolerance, because hardware and other server
problems are expected or even routine. When an OSD goes ``down``, its contents
might fall behind the current state of other replicas in the PGs. When the OSD
has returned to the ``up`` state, the contents of the PGs must be updated to
reflect that current state. During that time period, the OSD might be in a
``recovering`` state.

Recovery is not always trivial, because a hardware failure might cause a
cascading failure of multiple OSDs. For example, a network switch for a rack or
cabinet might fail, which can cause the OSDs of a number of host machines to
fall behind the current state of the cluster. In such a scenario, general
recovery is possible only if each of the OSDs recovers after the fault has been
resolved.

Ceph provides a number of settings that determine how the cluster balances the
resource contention between the need to process new service requests and the
need to recover data objects and restore the PGs to the current state. The
``osd_recovery_delay_start`` setting allows an OSD to restart, re-peer, and
even process some replay requests before starting the recovery process. The
``osd_recovery_thread_timeout`` setting determines the duration of a thread
timeout, because multiple OSDs might fail, restart, and re-peer at staggered
rates. The ``osd_recovery_max_active`` setting limits the number of recovery
requests an OSD can entertain simultaneously, in order to prevent the OSD from
failing to serve requests. The ``osd_recovery_max_chunk`` setting limits the
size of the recovered data chunks, in order to prevent network congestion.
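
These options can be inspected and, if necessary, adjusted at runtime through
the centralized configuration. A minimal sketch (the value ``3`` is only an
example; a suitable value depends on your hardware and workload):

.. prompt:: bash $

   ceph config get osd osd_recovery_max_active
   ceph config set osd osd_recovery_max_active 3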


Back Filling
------------

When a new OSD joins the cluster, CRUSH will reassign PGs from OSDs that are
already in the cluster to the newly added OSD. Forcing the new OSD to accept
the reassigned PGs immediately can put excessive load on it. Backfilling the
OSD with the PGs allows this process to begin in the background. After the
backfill operations have completed, the new OSD will begin serving requests as
soon as it is ready.

During the backfill operations, you might see one of several states:
``backfill_wait`` indicates that a backfill operation is pending, but is not
yet underway; ``backfilling`` indicates that a backfill operation is currently
underway; and ``backfill_toofull`` indicates that a backfill operation was
requested but couldn't be completed due to insufficient storage capacity. When
a PG cannot be backfilled, it might be considered ``incomplete``.

The ``backfill_toofull`` state might be transient. It might happen that, as PGs
are moved around, space becomes available. The ``backfill_toofull`` state is
similar to ``backfill_wait`` in that backfill operations can proceed as soon as
conditions change.

Ceph provides a number of settings to manage the load spike associated with the
reassignment of PGs to an OSD (especially a new OSD). The ``osd_max_backfills``
setting specifies the maximum number of concurrent backfills to and from an OSD
(default: 1). The ``backfill_full_ratio`` setting allows an OSD to refuse a
backfill request if the OSD is approaching its full ratio (default: 90%). This
setting can be changed with the ``ceph osd set-backfillfull-ratio`` command. If
an OSD refuses a backfill request, the ``osd_backfill_retry_interval`` setting
allows an OSD to retry the request after a certain interval (default: 30
seconds). OSDs can also set ``osd_backfill_scan_min`` and
``osd_backfill_scan_max`` in order to manage scan intervals (default: 64 and
512, respectively).
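
For example, to see which PGs are waiting for backfill and to raise the
backfill concurrency temporarily, something like the following could be used
(a sketch; ``2`` and ``0.90`` are illustrative values, and whether an override
takes effect can depend on the active OSD scheduler):

.. prompt:: bash $

   ceph pg ls backfill_wait
   ceph config set osd osd_max_backfills 2
   ceph osd set-backfillfull-ratio 0.90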


Remapped
--------

When the Acting Set that services a PG changes, the data migrates from the old
Acting Set to the new Acting Set. Because it might take time for the new
primary OSD to begin servicing requests, the old primary OSD might be required
to continue servicing requests until the PG data migration is complete. After
data migration has completed, the mapping uses the primary OSD of the new
Acting Set.


Stale
-----

Although Ceph uses heartbeats in order to ensure that hosts and daemons are
running, the ``ceph-osd`` daemons might enter a ``stuck`` state where they are
not reporting statistics in a timely manner (for example, there might be a
temporary network fault). By default, OSD daemons report their PG, ``up_thru``,
boot, and failure statistics every half second (that is, in accordance with a
value of ``0.5``), which is more frequent than the reports defined by the
heartbeat thresholds. If the primary OSD of a PG's Acting Set fails to report
to the monitor or if other OSDs have reported the primary OSD ``down``, the
monitors will mark the PG ``stale``.

When you start your cluster, it is common to see the ``stale`` state until the
peering process completes. After your cluster has been running for a while,
however, seeing PGs in the ``stale`` state indicates that the primary OSD for
those PGs is ``down`` or not reporting PG statistics to the monitor.


Identifying Troubled PGs
========================

As previously noted, a PG is not necessarily having problems just because its
state is not ``active+clean``. When PGs are stuck, this might indicate that
Ceph cannot perform self-repairs. The stuck states include:

- **Unclean**: PGs contain objects that have not been replicated the desired
  number of times. Under normal conditions, it can be assumed that these PGs
  are recovering.
- **Inactive**: PGs cannot process reads or writes because they are waiting for
  an OSD that has the most up-to-date data to come back ``up``.
- **Stale**: PGs are in an unknown state, because the OSDs that host them have
  not reported to the monitor cluster for a certain period of time (determined
  by ``mon_osd_report_timeout``).

To identify stuck PGs, run the following command:

.. prompt:: bash $

   ceph pg dump_stuck [unclean|inactive|stale|undersized|degraded]

For more detail, see `Placement Group Subsystem`_. To troubleshoot stuck PGs,
see `Troubleshooting PG Errors`_.
7c673cae
FG
488
489
490Finding an Object Location
491==========================
492
493To store object data in the Ceph Object Store, a Ceph client must:
494
495#. Set an object name
496#. Specify a `pool`_
497
1e59de90
TL
498The Ceph client retrieves the latest cluster map, the CRUSH algorithm
499calculates how to map the object to a PG, and then the algorithm calculates how
500to dynamically assign the PG to an OSD. To find the object location given only
501the object name and the pool name, run a command of the following form:
39ae355f
TL
502
503.. prompt:: bash $
7c673cae 504
1e59de90 505 ceph osd map {poolname} {object-name} [namespace]

.. topic:: Exercise: Locate an Object

   As an exercise, let's create an object. We can specify an object name, a path
   to a test file that contains some object data, and a pool name by using the
   ``rados put`` command on the command line. For example:

   .. prompt:: bash $

      rados put {object-name} {file-path} --pool=data
      rados put test-object-1 testfile.txt --pool=data

   To verify that the Ceph Object Store stored the object, run the
   following command:

   .. prompt:: bash $

      rados -p data ls

   To identify the object location, run the following commands:

   .. prompt:: bash $

      ceph osd map {pool-name} {object-name}
      ceph osd map data test-object-1

   Ceph should output the object's location. For example::

      osdmap e537 pool 'data' (1) object 'test-object-1' -> pg 1.d1743484 (1.4) -> up ([0,1], p0) acting ([0,1], p0)

   To remove the test object, simply delete it by running the ``rados rm``
   command. For example:

   .. prompt:: bash $

      rados rm test-object-1 --pool=data

As the cluster evolves, the object location may change dynamically. One benefit
of Ceph's dynamic rebalancing is that Ceph spares you the burden of manually
performing the migration. For details, see the `Architecture`_ section.

.. _data placement: ../data-placement
.. _pool: ../pools
.. _placement group: ../placement-groups
.. _Architecture: ../../../architecture
.. _OSD Not Running: ../../troubleshooting/troubleshooting-osd#osd-not-running
.. _Troubleshooting PG Errors: ../../troubleshooting/troubleshooting-pg#troubleshooting-pg-errors
.. _Peering Failure: ../../troubleshooting/troubleshooting-pg#failures-osd-peering
.. _CRUSH map: ../crush-map
.. _Configuring Monitor/OSD Interaction: ../../configuration/mon-osd-interaction/
.. _Placement Group Subsystem: ../control#placement-group-subsystem