.. index:: control, commands

==================
 Control Commands
==================


Monitor Commands
================

Monitor commands are issued using the ``ceph`` utility::

    ceph [-m monhost] {command}

The command is usually (though not always) of the form::

    ceph {subsystem} {command}


System Commands
===============

Execute the following to display the current cluster status. ::

    ceph -s
    ceph status

Execute the following to display a running summary of cluster status
and major events. ::

    ceph -w

Execute the following to show the monitor quorum, including which monitors are
participating and which one is the leader. ::

    ceph mon stat
    ceph quorum_status

Execute the following to query the status of a single monitor, including whether
or not it is in the quorum. ::

    ceph tell mon.[id] mon_status

where the value of ``[id]`` can be determined, e.g., from ``ceph -s``.


Authentication Subsystem
========================

To add a keyring for an OSD, execute the following::

    ceph auth add {osd} {--in-file|-i} {path-to-osd-keyring}

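For example, assuming an OSD named ``osd.0`` whose keyring sits at the
conventional path shown below (both the ID and the path are placeholders for
your environment)::

    ceph auth add osd.0 -i /var/lib/ceph/osd/ceph-0/keyring
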
To list the cluster's keys and their capabilities, execute the following::

    ceph auth ls


Placement Group Subsystem
=========================

To display the statistics for all placement groups (PGs), execute the following::

    ceph pg dump [--format {format}]

The valid formats are ``plain`` (default), ``json``, ``json-pretty``, ``xml``, and ``xml-pretty``.
When implementing monitoring and other tools, it is best to use the ``json`` format.
JSON parsing is more deterministic than the human-oriented ``plain``, and the layout is much
less variable from release to release. The ``jq`` utility can be invaluable when extracting
data from JSON output.

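As a sketch of that approach, the following pipeline prints the ID and state of
every PG. The exact JSON layout varies between releases (recent releases nest
the statistics under ``pg_map``), so adjust the ``jq`` path to match your
version::

    ceph pg dump --format json | jq -r '.pg_map.pg_stats[] | "\(.pgid) \(.state)"'
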
To display the statistics for all placement groups stuck in a specified state,
execute the following::

    ceph pg dump_stuck inactive|unclean|stale|undersized|degraded [--format {format}] [-t|--threshold {seconds}]


``--format`` may be ``plain`` (default), ``json``, ``json-pretty``, ``xml``, or ``xml-pretty``.

``--threshold`` defines the number of seconds a placement group must have been stuck before it is reported (default: 300).

**Inactive** Placement groups cannot process reads or writes because they are waiting for an OSD
with the most up-to-date data to come back.

**Unclean** Placement groups contain objects that are not replicated the desired number
of times. They should be recovering.

**Stale** Placement groups are in an unknown state - the OSDs that host them have not
reported to the monitor cluster in a while (configured by
``mon_osd_report_timeout``).

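For example, to list PGs that have been stale for at least ten minutes in a
machine-readable format (the 600-second threshold is purely illustrative)::

    ceph pg dump_stuck stale --format json-pretty --threshold 600
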
Revert "lost" objects to their prior state: either roll them back to a previous
version or, if they were just created, delete them. ::

    ceph pg {pgid} mark_unfound_lost revert|delete

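For example, to revert the unfound objects of a hypothetical placement group
``2.5`` to their most recent prior version::

    ceph pg 2.5 mark_unfound_lost revert
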

.. _osd-subsystem:

OSD Subsystem
=============

Query OSD subsystem status. ::

    ceph osd stat

Write a copy of the most recent OSD map to a file. See
:ref:`osdmaptool <osdmaptool>`. ::

    ceph osd getmap -o file

Write a copy of the CRUSH map from the most recent OSD map to a
file. ::

    ceph osd getcrushmap -o file

The foregoing is functionally equivalent to ::

    ceph osd getmap -o /tmp/osdmap
    osdmaptool /tmp/osdmap --export-crush file

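If you want to read the extracted CRUSH map, ``crushtool`` can decompile the
binary map into editable text; the paths here are placeholders::

    ceph osd getcrushmap -o /tmp/crushmap
    crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
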
Dump the OSD map. Valid formats for ``-f`` are ``plain``, ``json``, ``json-pretty``,
``xml``, and ``xml-pretty``. If no ``--format`` option is given, the OSD map is
dumped as plain text. As above, JSON format is best for tools, scripting, and other automation. ::

    ceph osd dump [--format {format}]

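For example, to list the names of all pools from the JSON dump (a minimal
sketch; field names such as ``pool_name`` can differ between releases, so
check the ``json-pretty`` output first)::

    ceph osd dump --format json | jq -r '.pools[].pool_name'
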
Dump the OSD map as a tree with one line per OSD containing weight
and state. ::

    ceph osd tree [--format {format}]

Find out where a specific object is or would be stored in the system::

    ceph osd map <pool-name> <object-name>

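For example, using a hypothetical pool ``mypool`` and object ``myobject``::

    ceph osd map mypool myobject

The output reports the placement group to which the object maps, together with
the up and acting OSD sets.
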
Add or move a new item (OSD) with the given id/name/weight at the specified
location. ::

    ceph osd crush set {id} {weight} [{loc1} [{loc2} ...]]

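For example, to place ``osd.0`` with a CRUSH weight of 1.0 under a hypothetical
host bucket ``node1`` in the default root::

    ceph osd crush set osd.0 1.0 root=default host=node1
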
Remove an existing item (OSD) from the CRUSH map. ::

    ceph osd crush remove {name}

Remove an existing bucket from the CRUSH map. ::

    ceph osd crush remove {bucket-name}

Move an existing bucket from one position in the hierarchy to another. ::

    ceph osd crush move {id} {loc1} [{loc2} ...]

Set the weight of the item given by ``{name}`` to ``{weight}``. ::

    ceph osd crush reweight {name} {weight}

Mark an OSD as ``lost``. This may result in permanent data loss. Use with caution. ::

    ceph osd lost {id} [--yes-i-really-mean-it]

Create a new OSD. If no UUID is given, it will be set automatically when the OSD
starts up. ::

    ceph osd create [{uuid}]

Remove the given OSD(s). ::

    ceph osd rm [{id}...]

Query the current ``max_osd`` parameter in the OSD map. ::

    ceph osd getmaxosd

Import the given CRUSH map. ::

    ceph osd setcrushmap -i file

Set the ``max_osd`` parameter in the OSD map. This defaults to 10000 now, so
most admins will never need to adjust it. ::

    ceph osd setmaxosd

Mark OSD ``{osd-num}`` down. ::

    ceph osd down {osd-num}

Mark OSD ``{osd-num}`` out of the distribution (i.e. allocated no data). ::

    ceph osd out {osd-num}

Mark ``{osd-num}`` in the distribution (i.e. allocated data). ::

    ceph osd in {osd-num}

Set or clear the pause flags in the OSD map. If set, no IO requests
will be sent to any OSD. Clearing the flags via unpause results in
resending pending requests. ::

    ceph osd pause
    ceph osd unpause

Set the override weight (reweight) of ``{osd-num}`` to ``{weight}``. Two OSDs with the
same weight will receive roughly the same number of I/O requests and
store approximately the same amount of data. ``ceph osd reweight``
sets an override weight on the OSD. This value is in the range 0 to 1,
and forces CRUSH to re-place (1-weight) of the data that would
otherwise live on this drive. It does not change the weights assigned
to the buckets above the OSD in the CRUSH map, and is a corrective
measure in case the normal CRUSH distribution is not working out quite
right. For instance, if one of your OSDs is at 90% and the others are
at 50%, you could reduce this weight to compensate. ::

    ceph osd reweight {osd-num} {weight}

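For example, to move roughly 15% of the data away from a hypothetical overfull
OSD with ID 123::

    ceph osd reweight 123 0.85
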
Balance OSD fullness by reducing the override weight of OSDs which are
overly utilized. Note that these override values, also known as ``reweight``
values, default to 1.00000 and are relative only to each other; they are not absolute.
It is crucial to distinguish them from CRUSH weights, which reflect the
absolute capacity of a bucket in TiB. By default this command adjusts the
override weight of OSDs whose utilization differs from the average by more than
20%, but if you include a ``threshold`` that percentage will be used instead. ::

    ceph osd reweight-by-utilization [threshold [max_change [max_osds]]] [--no-increasing]

To limit the step by which any OSD's reweight will be changed, specify
``max_change``, which defaults to 0.05. To limit the number of OSDs that will
be adjusted, specify ``max_osds`` as well; the default is 4. Increasing these
parameters can speed leveling of OSD utilization, at the potential cost of
greater impact on client operations due to more data moving at once.

To determine which and how many PGs and OSDs will be affected by a given
invocation, you can test before executing. ::

    ceph osd test-reweight-by-utilization [threshold [max_change [max_osds]]] [--no-increasing]

Adding ``--no-increasing`` to either command prevents increasing any
override weights that are currently < 1.00000. This can be useful when
you are balancing in a hurry to remedy ``full`` or ``nearfull`` OSDs or
when some OSDs are being evacuated or slowly brought into service.

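As an illustrative sequence (the 110% threshold, 0.05 step, and limit of 8 OSDs
are assumed values, not recommendations), do a dry run first and then apply the
same parameters::

    ceph osd test-reweight-by-utilization 110 0.05 8 --no-increasing
    ceph osd reweight-by-utilization 110 0.05 8 --no-increasing
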
Deployments utilizing Nautilus (or later revisions of Luminous and Mimic)
that have no pre-Luminous clients may wish instead to enable the
``balancer`` module for ``ceph-mgr``.

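A minimal sketch of enabling it (the ``upmap`` mode assumes all clients are
Luminous or newer, and on recent releases the module may already be enabled)::

    ceph mgr module enable balancer
    ceph balancer mode upmap
    ceph balancer on
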
Add/remove an IP address or CIDR range to/from the blocklist.
When adding to the blocklist,
you can specify how long it should be blocklisted in seconds; otherwise,
it will default to 1 hour. A blocklisted address is prevented from
connecting to any OSD. If you blocklist an IP or range containing an OSD, be aware
that the OSD will also be prevented from performing operations on its peers where it
acts as a client. (This includes tiering and copy-from functionality.)

If you want to blocklist a range (in CIDR format), you may do so by
including the ``range`` keyword.

These commands are mostly only useful for failure testing, as
blocklists are normally maintained automatically and shouldn't need
manual intervention. ::

    ceph osd blocklist ["range"] add ADDRESS[:source_port][/netmask_bits] [TIME]
    ceph osd blocklist ["range"] rm ADDRESS[:source_port][/netmask_bits]

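For example, to blocklist a single (hypothetical) client address for ten
minutes, and a whole subnet for an hour::

    ceph osd blocklist add 198.51.100.7 600
    ceph osd blocklist range add 198.51.100.0/24 3600
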
Creates/deletes a snapshot of a pool. ::

    ceph osd pool mksnap {pool-name} {snap-name}
    ceph osd pool rmsnap {pool-name} {snap-name}

Creates/deletes/renames a storage pool. ::

    ceph osd pool create {pool-name} [pg_num [pgp_num]]
    ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]
    ceph osd pool rename {old-name} {new-name}

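For example, to create a hypothetical pool named ``mypool`` with 64 placement
groups and then rename it (the PG count is only illustrative; size it for your
cluster)::

    ceph osd pool create mypool 64 64
    ceph osd pool rename mypool mynewpool
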
Changes a pool setting. ::

    ceph osd pool set {pool-name} {field} {value}

Valid fields are:

* ``size``: Sets the number of copies of data in the pool.
* ``pg_num``: The number of placement groups in the pool.
* ``pgp_num``: The effective number of placement groups to use when calculating data placement.
* ``crush_rule``: The CRUSH rule to use for mapping object placement in the cluster.

Get the value of a pool setting. ::

    ceph osd pool get {pool-name} {field}

Valid fields are:

* ``pg_num``: The number of placement groups in the pool.
* ``pgp_num``: The effective number of placement groups in use when calculating data placement.

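For example, on a hypothetical replicated pool named ``mypool``, set the
replica count and then read back the PG count::

    ceph osd pool set mypool size 3
    ceph osd pool get mypool pg_num
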

Sends a scrub command to OSD ``{osd-num}``. To send the command to all OSDs, use ``*``. ::

    ceph osd scrub {osd-num}

Sends a repair command to OSD.N. To send the command to all OSDs, use ``*``. ::

    ceph osd repair N

Runs a simple throughput benchmark against OSD.N, writing ``TOTAL_DATA_BYTES``
in write requests of ``BYTES_PER_WRITE`` each. By default, the test
writes 1 GB in total in 4-MB increments.
The benchmark is non-destructive and will not overwrite existing live
OSD data, but might temporarily affect the performance of clients
concurrently accessing the OSD. ::

    ceph tell osd.N bench [TOTAL_DATA_BYTES] [BYTES_PER_WRITE]

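For example, to run a smaller benchmark against a hypothetical ``osd.0``,
writing 100 MB in 1 MB requests::

    ceph tell osd.0 bench 104857600 1048576
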
To clear an OSD's caches between benchmark runs, use the ``cache drop`` command ::

    ceph tell osd.N cache drop

To get the cache statistics of an OSD, use the ``cache status`` command ::

    ceph tell osd.N cache status

MDS Subsystem
=============

Change configuration parameters on a running MDS. ::

    ceph tell mds.{mds-id} config set {setting} {value}

Example::

    ceph tell mds.0 config set debug_ms 1

This example enables debug messages on the MDS.

Display the status of all metadata servers. ::

    ceph mds stat

Mark the active MDS as failed, triggering failover to a standby if present. ::

    ceph mds fail 0

.. todo:: ``ceph mds`` subcommands missing docs: set, dump, getmap, stop, setmap


Mon Subsystem
=============

Show monitor stats::

    ceph mon stat

    e2: 3 mons at {a=127.0.0.1:40000/0,b=127.0.0.1:40001/0,c=127.0.0.1:40002/0}, election epoch 6, quorum 0,1,2 a,b,c


The ``quorum`` list at the end lists monitor nodes that are part of the current quorum.

This is also available more directly::

    ceph quorum_status -f json-pretty

.. code-block:: javascript

    {
      "election_epoch": 6,
      "quorum": [
        0,
        1,
        2
      ],
      "quorum_names": [
        "a",
        "b",
        "c"
      ],
      "quorum_leader_name": "a",
      "monmap": {
        "epoch": 2,
        "fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc",
        "modified": "2016-12-26 14:42:09.288066",
        "created": "2016-12-26 14:42:03.573585",
        "features": {
          "persistent": [
            "kraken"
          ],
          "optional": []
        },
        "mons": [
          {
            "rank": 0,
            "name": "a",
            "addr": "127.0.0.1:40000\/0",
            "public_addr": "127.0.0.1:40000\/0"
          },
          {
            "rank": 1,
            "name": "b",
            "addr": "127.0.0.1:40001\/0",
            "public_addr": "127.0.0.1:40001\/0"
          },
          {
            "rank": 2,
            "name": "c",
            "addr": "127.0.0.1:40002\/0",
            "public_addr": "127.0.0.1:40002\/0"
          }
        ]
      }
    }


The above will block until a quorum is reached.

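For scripting, individual fields can be pulled out of the JSON; for example, to
print just the current leader (shown as ``quorum_leader_name`` in the sample
above)::

    ceph quorum_status -f json | jq -r '.quorum_leader_name'
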
For a status of just a single monitor::

    ceph tell mon.[name] mon_status

where the value of ``[name]`` can be taken from ``ceph quorum_status``. Sample
output::

    {
      "name": "b",
      "rank": 1,
      "state": "peon",
      "election_epoch": 6,
      "quorum": [
        0,
        1,
        2
      ],
      "features": {
        "required_con": "9025616074522624",
        "required_mon": [
          "kraken"
        ],
        "quorum_con": "1152921504336314367",
        "quorum_mon": [
          "kraken"
        ]
      },
      "outside_quorum": [],
      "extra_probe_peers": [],
      "sync_provider": [],
      "monmap": {
        "epoch": 2,
        "fsid": "ba807e74-b64f-4b72-b43f-597dfe60ddbc",
        "modified": "2016-12-26 14:42:09.288066",
        "created": "2016-12-26 14:42:03.573585",
        "features": {
          "persistent": [
            "kraken"
          ],
          "optional": []
        },
        "mons": [
          {
            "rank": 0,
            "name": "a",
            "addr": "127.0.0.1:40000\/0",
            "public_addr": "127.0.0.1:40000\/0"
          },
          {
            "rank": 1,
            "name": "b",
            "addr": "127.0.0.1:40001\/0",
            "public_addr": "127.0.0.1:40001\/0"
          },
          {
            "rank": 2,
            "name": "c",
            "addr": "127.0.0.1:40002\/0",
            "public_addr": "127.0.0.1:40002\/0"
          }
        ]
      }
    }

A dump of the monitor state::

    ceph mon dump

    dumped monmap epoch 2
    epoch 2
    fsid ba807e74-b64f-4b72-b43f-597dfe60ddbc
    last_changed 2016-12-26 14:42:09.288066
    created 2016-12-26 14:42:03.573585
    0: 127.0.0.1:40000/0 mon.a
    1: 127.0.0.1:40001/0 mon.b
    2: 127.0.0.1:40002/0 mon.c