===================
Manual Deployment
===================

All Ceph clusters require at least one monitor, and at least as many OSDs as
copies of an object stored on the cluster. Bootstrapping the initial monitor(s)
is the first step in deploying a Ceph Storage Cluster. Monitor deployment also
sets important criteria for the entire cluster, such as the number of replicas
for pools, the number of placement groups per OSD, the heartbeat intervals,
whether authentication is required, etc. Most of these values are set by
default, so it's useful to know about them when setting up your cluster for
production.

Following the same configuration as `Installation (Quick)`_, we will set up a
cluster with ``node1`` as the monitor node, and ``node2`` and ``node3`` for
OSD nodes.



.. ditaa::

           /------------------\         /----------------\
           |    Admin Node    |         |     node1      |
           |                  +-------->+                |
           |                  |         |      cCCC      |
           \---------+--------/         \----------------/
                     |
                     |                  /----------------\
                     |                  |     node2      |
                     +----------------->+                |
                     |                  |      cCCC      |
                     |                  \----------------/
                     |
                     |                  /----------------\
                     |                  |     node3      |
                     +----------------->|                |
                                        |      cCCC      |
                                        \----------------/


Monitor Bootstrapping
=====================

Bootstrapping a monitor (a Ceph Storage Cluster, in theory) requires
a number of things:

- **Unique Identifier:** The ``fsid`` is a unique identifier for the cluster,
  and stands for File System ID from the days when the Ceph Storage Cluster was
  principally for the Ceph Filesystem. Ceph now supports native interfaces,
  block devices, and object storage gateway interfaces too, so ``fsid`` is a
  bit of a misnomer.

- **Cluster Name:** Ceph clusters have a cluster name, which is a simple string
  without spaces. The default cluster name is ``ceph``, but you may specify
  a different cluster name. Overriding the default cluster name is
  especially useful when you are working with multiple clusters and you need to
  clearly understand which cluster you are working with.

  For example, when you run multiple clusters in a `federated architecture`_,
  the cluster name (e.g., ``us-west``, ``us-east``) identifies the cluster for
  the current CLI session. **Note:** To identify the cluster name on the
  command line interface, specify the Ceph configuration file with the
  cluster name (e.g., ``ceph.conf``, ``us-west.conf``, ``us-east.conf``, etc.).
  Also see CLI usage (``ceph --cluster {cluster-name}``); a brief example
  follows this list.

- **Monitor Name:** Each monitor instance within a cluster has a unique name.
  In common practice, the Ceph Monitor name is the host name (we recommend one
  Ceph Monitor per host, and no commingling of Ceph OSD Daemons with
  Ceph Monitors). You may retrieve the short hostname with ``hostname -s``.

- **Monitor Map:** Bootstrapping the initial monitor(s) requires you to
  generate a monitor map. The monitor map requires the ``fsid``, the cluster
  name (or uses the default), and at least one host name and its IP address.

- **Monitor Keyring**: Monitors communicate with each other via a
  secret key. You must generate a keyring with a monitor secret and provide
  it when bootstrapping the initial monitor(s).

- **Administrator Keyring**: To use the ``ceph`` CLI tools, you must have
  a ``client.admin`` user. So you must generate the admin user and keyring,
  and you must also add the ``client.admin`` user to the monitor keyring.

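As a quick illustration of the ``--cluster`` option mentioned above, the
following checks the status of a cluster named ``us-west`` (the name and its
``/etc/ceph/us-west.conf`` file are hypothetical)::

   ceph --cluster us-west -s
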
The foregoing requirements do not imply the creation of a Ceph Configuration
file. However, as a best practice, we recommend creating a Ceph configuration
file and populating it with the ``fsid``, the ``mon initial members`` and the
``mon host`` settings.

You can get and set all of the monitor settings at runtime as well. However,
a Ceph configuration file needs to contain only those settings that override
the default values. When you add settings to a Ceph configuration file, these
settings override the default settings. Maintaining those settings in a
Ceph configuration file makes it easier to maintain your cluster.

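For instance, you can query a running monitor through its admin socket and
inject a setting at runtime; settings changed this way are not persisted across
restarts, which is why the configuration file is still recommended. A small
sketch (the option shown is only an example; the monitor id ``node1`` follows
this guide's naming)::

   ceph daemon mon.node1 config show
   ceph daemon mon.node1 config get public_network
   ceph tell mon.* injectargs '--mon-allow-pool-delete=true'
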
The procedure is as follows:


#. Log in to the initial monitor node(s)::

      ssh {hostname}

   For example::

      ssh node1


#. Ensure you have a directory for the Ceph configuration file. By default,
   Ceph uses ``/etc/ceph``. When you install ``ceph``, the installer will
   create the ``/etc/ceph`` directory automatically. ::

      ls /etc/ceph

   **Note:** Deployment tools may remove this directory when purging a
   cluster (e.g., ``ceph-deploy purgedata {node-name}``, ``ceph-deploy purge
   {node-name}``).

#. Create a Ceph configuration file. By default, Ceph uses
   ``ceph.conf``, where ``ceph`` reflects the cluster name. ::

      sudo vim /etc/ceph/ceph.conf


#. Generate a unique ID (i.e., ``fsid``) for your cluster. ::

      uuidgen


#. Add the unique ID to your Ceph configuration file. ::

      fsid = {UUID}

   For example::

      fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993


#. Add the initial monitor(s) to your Ceph configuration file. ::

      mon initial members = {hostname}[,{hostname}]

   For example::

      mon initial members = node1


#. Add the IP address(es) of the initial monitor(s) to your Ceph configuration
   file and save the file. ::

      mon host = {ip-address}[,{ip-address}]

   For example::

      mon host = 192.168.0.1

   **Note:** You may use IPv6 addresses instead of IPv4 addresses, but
   you must set ``ms bind ipv6`` to ``true``. See `Network Configuration
   Reference`_ for details about network configuration.

#. Create a keyring for your cluster and generate a monitor secret key. ::

      ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'


#. Generate an administrator keyring, generate a ``client.admin`` user and add
   the user to the keyring. ::

      sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'


#. Add the ``client.admin`` key to the ``ceph.mon.keyring``. ::

      ceph-authtool /tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring


#. Generate a monitor map using the hostname(s), host IP address(es) and the FSID.
   Save it as ``/tmp/monmap``::

      monmaptool --create --add {hostname} {ip-address} --fsid {uuid} /tmp/monmap

   For example::

      monmaptool --create --add node1 192.168.0.1 --fsid a7f64266-0894-4f1e-a635-d0aeaca0e993 /tmp/monmap

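   You can print the resulting map to confirm that the ``fsid`` and the monitor
   address made it in; a quick sanity check before continuing::

      monmaptool --print /tmp/monmap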

#. Create a default data directory (or directories) on the monitor host(s). ::

      sudo mkdir /var/lib/ceph/mon/{cluster-name}-{hostname}

   For example::

      sudo mkdir /var/lib/ceph/mon/ceph-node1

   See `Monitor Config Reference - Data`_ for details.

#. Populate the monitor daemon(s) with the monitor map and keyring. ::

      sudo -u ceph ceph-mon [--cluster {cluster-name}] --mkfs -i {hostname} --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring

   For example::

      sudo -u ceph ceph-mon --mkfs -i node1 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring


#. Consider settings for a Ceph configuration file. Common settings include
   the following::

      [global]
      fsid = {cluster-id}
      mon initial members = {hostname}[, {hostname}]
      mon host = {ip-address}[, {ip-address}]
      public network = {network}[, {network}]
      cluster network = {network}[, {network}]
      auth cluster required = cephx
      auth service required = cephx
      auth client required = cephx
      osd journal size = {n}
      osd pool default size = {n}  # Write an object n times.
      osd pool default min size = {n} # Allow writing n copies in a degraded state.
      osd pool default pg num = {n}
      osd pool default pgp num = {n}
      osd crush chooseleaf type = {n}

   In the foregoing example, the ``[global]`` section of the configuration might
   look like this::

      [global]
      fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993
      mon initial members = node1
      mon host = 192.168.0.1
      public network = 192.168.0.0/24
      auth cluster required = cephx
      auth service required = cephx
      auth client required = cephx
      osd journal size = 1024
      osd pool default size = 2
      osd pool default min size = 1
      osd pool default pg num = 333
      osd pool default pgp num = 333
      osd crush chooseleaf type = 1

#. Touch the ``done`` file.

   Mark that the monitor is created and ready to be started::

      sudo touch /var/lib/ceph/mon/ceph-node1/done

#. Start the monitor(s).

   For Ubuntu, use Upstart::

      sudo start ceph-mon id=node1 [cluster={cluster-name}]

   In this case, to allow the start of the daemon at each reboot you
   must create an empty ``upstart`` file in the monitor's data directory::

      sudo touch /var/lib/ceph/mon/{cluster-name}-{hostname}/upstart

   For example::

      sudo touch /var/lib/ceph/mon/ceph-node1/upstart

   For Debian/CentOS/RHEL, use sysvinit::

      sudo /etc/init.d/ceph start mon.node1

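   If your distribution uses systemd (most current releases do), the packaged
   unit files are the more typical way to start the monitor and enable it at
   boot; a sketch, assuming the monitor id is ``node1``::

      sudo systemctl start ceph-mon@node1
      sudo systemctl enable ceph-mon@node1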

#. Verify that Ceph created the default pools. ::

      ceph osd lspools

   You should see output like this::

      0 data,1 metadata,2 rbd,


#. Verify that the monitor is running. ::

      ceph -s

   You should see output that the monitor you started is up and running, and
   you should see a health error indicating that placement groups are stuck
   inactive. It should look something like this::

      cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
       health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
       monmap e1: 1 mons at {node1=192.168.0.1:6789/0}, election epoch 1, quorum 0 node1
       osdmap e1: 0 osds: 0 up, 0 in
        pgmap v2: 192 pgs, 3 pools, 0 bytes data, 0 objects
              0 kB used, 0 kB / 0 kB avail
              192 creating

   **Note:** Once you add OSDs and start them, the placement group health errors
   should disappear. See the next section for details.

Manager daemon configuration
============================

On each node where you run a ceph-mon daemon, you should also set up a ceph-mgr daemon.

See :doc:`../mgr/administrator`
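
As a quick sketch (the linked guide is the authoritative reference; the
``node1`` id follows this guide's naming, and the capabilities shown are the
ones commonly granted to a manager daemon)::

   sudo -u ceph mkdir /var/lib/ceph/mgr/ceph-node1
   ceph auth get-or-create mgr.node1 mon 'allow profile mgr' osd 'allow *' mds 'allow *' | sudo -u ceph tee /var/lib/ceph/mgr/ceph-node1/keyring
   sudo -u ceph ceph-mgr -i node1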

Adding OSDs
===========

Once you have your initial monitor(s) running, you should add OSDs. Your cluster
cannot reach an ``active + clean`` state until you have enough OSDs to handle the
number of copies of an object (e.g., ``osd pool default size = 2`` requires at
least two OSDs). After bootstrapping your monitor, your cluster has a default
CRUSH map; however, the CRUSH map doesn't have any Ceph OSD Daemons mapped to
a Ceph Node.


Short Form
----------

Ceph provides the ``ceph-disk`` utility, which can prepare a disk, partition or
directory for use with Ceph. The ``ceph-disk`` utility creates the OSD ID by
incrementing the index. Additionally, ``ceph-disk`` will add the new OSD to the
CRUSH map under the host for you. Execute ``ceph-disk -h`` for CLI details.
The ``ceph-disk`` utility automates the steps of the `Long Form`_ below. To
create the first two OSDs with the short form procedure, execute the following
on ``node2`` and ``node3``:


#. Prepare the OSD. ::

      ssh {node-name}
      sudo ceph-disk prepare --cluster {cluster-name} --cluster-uuid {uuid} {data-path} [{journal-path}]

   For example::

      ssh node2
      sudo ceph-disk prepare --cluster ceph --cluster-uuid a7f64266-0894-4f1e-a635-d0aeaca0e993 --fs-type ext4 /dev/hdd1


#. Activate the OSD::

      sudo ceph-disk activate {data-path} [--activate-key {path}]

   For example::

      sudo ceph-disk activate /dev/hdd1

   **Note:** Use the ``--activate-key`` argument if you do not have a copy
   of ``/var/lib/ceph/bootstrap-osd/{cluster}.keyring`` on the Ceph Node.
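
To confirm that the OSDs were prepared and activated, you can list the devices
``ceph-disk`` knows about and check that the new OSDs appear in the CRUSH tree
(a quick check; the exact output depends on your hardware)::

   sudo ceph-disk list
   ceph osd tree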


Long Form
---------

Without the benefit of any helper utilities, create an OSD and add it to the
cluster and CRUSH map with the following procedure. To create the first two
OSDs with the long form procedure, execute the following on ``node2`` and
``node3``:

#. Connect to the OSD host. ::

      ssh {node-name}

#. Generate a UUID for the OSD. ::

      uuidgen


#. Create the OSD. If no UUID is given, it will be set automatically when the
   OSD starts up. The following command will output the OSD number, which you
   will need for subsequent steps. ::

      ceph osd create [{uuid} [{id}]]


#. Create the default directory on your new OSD. ::

      ssh {new-osd-host}
      sudo mkdir /var/lib/ceph/osd/{cluster-name}-{osd-number}


#. If the OSD is for a drive other than the OS drive, prepare it
   for use with Ceph, and mount it to the directory you just created::

      ssh {new-osd-host}
      sudo mkfs -t {fstype} /dev/{hdd}
      sudo mount -o user_xattr /dev/{hdd} /var/lib/ceph/osd/{cluster-name}-{osd-number}

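   For example, for an ext4 filesystem on a hypothetical ``/dev/sdb1``, mounted
   at the directory created for ``osd.0``::

      sudo mkfs -t ext4 /dev/sdb1
      sudo mount -o user_xattr /dev/sdb1 /var/lib/ceph/osd/ceph-0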

#. Initialize the OSD data directory. ::

      ssh {new-osd-host}
      sudo ceph-osd -i {osd-num} --mkfs --mkkey [--osd-uuid {uuid}]

   The directory must be empty before you can run ``ceph-osd`` with the
   ``--mkkey`` option. In addition, if your cluster name is not the default
   ``ceph``, you must pass it to ``ceph-osd`` with the ``--cluster`` option.


#. Register the OSD authentication key. The ``ceph-{osd-num}`` portion of the
   path follows the ``$cluster-$id`` convention; if your cluster name differs
   from ``ceph``, use your cluster name instead. ::

      sudo ceph auth add osd.{osd-num} osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/{cluster-name}-{osd-num}/keyring


#. Add your Ceph Node to the CRUSH map. ::

      ceph [--cluster {cluster-name}] osd crush add-bucket {hostname} host

   For example::

      ceph osd crush add-bucket node1 host


#. Place the Ceph Node under the root ``default``. ::

      ceph osd crush move node1 root=default


#. Add the OSD to the CRUSH map so that it can begin receiving data. You may
   also decompile the CRUSH map, add the OSD to the device list, add the host as a
   bucket (if it's not already in the CRUSH map), add the device as an item in the
   host, assign it a weight, recompile it and set it. ::

      ceph [--cluster {cluster-name}] osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]

   For example::

      ceph osd crush add osd.0 1.0 host=node1


#. After you add an OSD to Ceph, the OSD is in your configuration. However,
   it is not yet running. The OSD is ``down`` and ``in``. You must start
   your new OSD before it can begin receiving data.

   For Ubuntu, use Upstart::

      sudo start ceph-osd id={osd-num} [cluster={cluster-name}]

   For example::

      sudo start ceph-osd id=0
      sudo start ceph-osd id=1

   For Debian/CentOS/RHEL, use sysvinit::

      sudo /etc/init.d/ceph start osd.{osd-num} [--cluster {cluster-name}]

   For example::

      sudo /etc/init.d/ceph start osd.0
      sudo /etc/init.d/ceph start osd.1

   In this case, to allow the start of the daemon at each reboot you
   must create an empty file like this::

      sudo touch /var/lib/ceph/osd/{cluster-name}-{osd-num}/sysvinit

   For example::

      sudo touch /var/lib/ceph/osd/ceph-0/sysvinit
      sudo touch /var/lib/ceph/osd/ceph-1/sysvinit

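   If your distribution uses systemd, the packaged unit files are typically
   used instead; a sketch for the two OSD ids used above::

      sudo systemctl start ceph-osd@0
      sudo systemctl enable ceph-osd@0
      sudo systemctl start ceph-osd@1
      sudo systemctl enable ceph-osd@1
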
   Once you start your OSD, it is ``up`` and ``in``.



Adding MDS
==========

In the instructions below, ``{id}`` is an arbitrary name, such as the hostname of the machine.

#. Create the mds data directory. ::

      mkdir -p /var/lib/ceph/mds/{cluster-name}-{id}

#. Create a keyring. ::

      ceph-authtool --create-keyring /var/lib/ceph/mds/{cluster-name}-{id}/keyring --gen-key -n mds.{id}

#. Import the keyring and set caps. ::

      ceph auth add mds.{id} osd "allow rwx" mds "allow" mon "allow profile mds" -i /var/lib/ceph/mds/{cluster-name}-{id}/keyring

#. Add to ceph.conf. ::

      [mds.{id}]
      host = {id}

#. Start the daemon the manual way. ::

      ceph-mds --cluster {cluster-name} -i {id} -m {mon-hostname}:{mon-port} [-f]

#. Start the daemon the right way (using ceph.conf entry). ::

      service ceph start

#. If starting the daemon fails with this error::

      mds.-1.0 ERROR: failed to authenticate: (22) Invalid argument

   then make sure you do not have a ``keyring`` setting in the ``[global]``
   section of ceph.conf; move it to the ``[client]`` section, or add a keyring
   setting specific to this mds daemon. Also verify that the key in the mds
   data directory matches the output of ``ceph auth get mds.{id}``; the
   commands below show both.
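
   For example (compare the ``key =`` values printed by the two commands; the
   placeholders follow this section's naming)::

      ceph auth get mds.{id}
      sudo cat /var/lib/ceph/mds/{cluster-name}-{id}/keyring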

#. Now you are ready to `create a Ceph filesystem`_.


Summary
=======

Once you have your monitor and two OSDs up and running, you can watch the
placement groups peer by executing the following::

   ceph -w

To view the tree, execute the following::

   ceph osd tree

You should see output that looks something like this::

   # id    weight  type name       up/down reweight
   -1      2       root default
   -2      2               host node1
   0       1                       osd.0   up      1
   -3      1               host node2
   1       1                       osd.1   up      1

To add (or remove) additional monitors, see `Add/Remove Monitors`_.
To add (or remove) additional Ceph OSD Daemons, see `Add/Remove OSDs`_.


.. _federated architecture: ../../radosgw/federated-config
.. _Installation (Quick): ../../start
.. _Add/Remove Monitors: ../../rados/operations/add-or-rm-mons
.. _Add/Remove OSDs: ../../rados/operations/add-or-rm-osds
.. _Network Configuration Reference: ../../rados/configuration/network-config-ref
.. _Monitor Config Reference - Data: ../../rados/configuration/mon-config-ref#data
.. _create a Ceph filesystem: ../../cephfs/createfs