1 ==============================
2 Manual Deployment on FreeBSD
3 ==============================
4
This is largely a copy of the regular Manual Deployment guide, with FreeBSD
specifics. The difference lies in two parts: the underlying disk format, and
the way the tools are used.
8
9 All Ceph clusters require at least one monitor, and at least as many OSDs as
10 copies of an object stored on the cluster. Bootstrapping the initial monitor(s)
11 is the first step in deploying a Ceph Storage Cluster. Monitor deployment also
12 sets important criteria for the entire cluster, such as the number of replicas
13 for pools, the number of placement groups per OSD, the heartbeat intervals,
14 whether authentication is required, etc. Most of these values are set by
15 default, so it's useful to know about them when setting up your cluster for
16 production.
17
18 Following the same configuration as `Installation (Quick)`_, we will set up a
19 cluster with ``node1`` as the monitor node, and ``node2`` and ``node3`` for
20 OSD nodes.
21
22
23
.. ditaa::

           /------------------\         /----------------\
           |    Admin Node    |         |     node1      |
           |                  +-------->+                |
           |                  |         | cCCC           |
           \---------+--------/         \----------------/
                     |
                     |                  /----------------\
                     |                  |     node2      |
                     +----------------->+                |
                     |                  | cCCC           |
                     |                  \----------------/
                     |
                     |                  /----------------\
                     |                  |     node3      |
                     +----------------->|                |
                                        | cCCC           |
                                        \----------------/
42
43
44
Disk layout on FreeBSD
======================

The current implementation works on ZFS pools:
49
50 * All Ceph data is created in /var/lib/ceph
51 * Log files go into /var/log/ceph
* PID files go into /var/run
* One ZFS pool is allocated per OSD, like the following (a short verification
  sketch follows this list)::

    gpart create -s GPT ada1
    gpart add -t freebsd-zfs -l osd1 ada1
    zpool create -o mountpoint=/var/lib/ceph/osd/osd.1 osd1 gpt/osd1
58
* Some cache and log (ZIL) devices can be attached.
  Please note that this is different from the Ceph journals. Cache and log are
  totally transparent to Ceph, and only help the filesystem keep the system
  consistent and improve performance.
  Assuming that ada2 is an SSD::

    gpart create -s GPT ada2
    gpart add -t freebsd-zfs -l osd1-log -s 1G ada2
    zpool add osd1 log gpt/osd1-log
    gpart add -t freebsd-zfs -l osd1-cache -s 10G ada2
    zpool add osd1 cache gpt/osd1-cache
70
* Note: *UFS2 does not allow large xattrs*
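
After creating the pool(s), the layout can be checked before handing the
directory to Ceph. A short sketch, assuming the pool name ``osd1`` from the
examples above::

  zpool status osd1
  zfs list -o name,mountpoint osd1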
72
73
74 Configuration
75 -------------
76
As per the FreeBSD defaults, add-on software is installed under
``/usr/local/``. This means that the default location of ``ceph.conf`` is
``/usr/local/etc/ceph/ceph.conf``. The smartest thing to do is to create a
symbolic link from ``/etc/ceph`` to ``/usr/local/etc/ceph``::

  ln -s /usr/local/etc/ceph /etc/ceph

A sample file is provided in ``/usr/local/share/doc/ceph/sample.ceph.conf``.
Note that most tools will find ``/usr/local/etc/ceph/ceph.conf`` on their own;
linking it to ``/etc/ceph/ceph.conf`` helps with scripts and tools from other
sources (or from discussion lists) that expect the Linux default path.
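
If you want to start from the bundled sample (path as above; adjust if your
installation puts it elsewhere), copying it into place is enough::

  cp /usr/local/share/doc/ceph/sample.ceph.conf /usr/local/etc/ceph/ceph.conf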
88
89 Monitor Bootstrapping
90 =====================
91
92 Bootstrapping a monitor (a Ceph Storage Cluster, in theory) requires
93 a number of things:
94
95 - **Unique Identifier:** The ``fsid`` is a unique identifier for the cluster,
96 and stands for File System ID from the days when the Ceph Storage Cluster was
97 principally for the Ceph Filesystem. Ceph now supports native interfaces,
98 block devices, and object storage gateway interfaces too, so ``fsid`` is a
99 bit of a misnomer.
100
101 - **Cluster Name:** Ceph clusters have a cluster name, which is a simple string
102 without spaces. The default cluster name is ``ceph``, but you may specify
103 a different cluster name. Overriding the default cluster name is
104 especially useful when you are working with multiple clusters and you need to
clearly understand which cluster you are working with.
106
107 For example, when you run multiple clusters in a `federated architecture`_,
108 the cluster name (e.g., ``us-west``, ``us-east``) identifies the cluster for
109 the current CLI session. **Note:** To identify the cluster name on the
command line interface, specify a Ceph configuration file with the
111 cluster name (e.g., ``ceph.conf``, ``us-west.conf``, ``us-east.conf``, etc.).
112 Also see CLI usage (``ceph --cluster {cluster-name}``).
113
114 - **Monitor Name:** Each monitor instance within a cluster has a unique name.
115 In common practice, the Ceph Monitor name is the host name (we recommend one
116 Ceph Monitor per host, and no commingling of Ceph OSD Daemons with
117 Ceph Monitors). You may retrieve the short hostname with ``hostname -s``.
118
119 - **Monitor Map:** Bootstrapping the initial monitor(s) requires you to
120 generate a monitor map. The monitor map requires the ``fsid``, the cluster
121 name (or uses the default), and at least one host name and its IP address.
122
123 - **Monitor Keyring**: Monitors communicate with each other via a
124 secret key. You must generate a keyring with a monitor secret and provide
125 it when bootstrapping the initial monitor(s).
126
127 - **Administrator Keyring**: To use the ``ceph`` CLI tools, you must have
128 a ``client.admin`` user. So you must generate the admin user and keyring,
129 and you must also add the ``client.admin`` user to the monitor keyring.
130
131 The foregoing requirements do not imply the creation of a Ceph Configuration
132 file. However, as a best practice, we recommend creating a Ceph configuration
133 file and populating it with the ``fsid``, the ``mon initial members`` and the
134 ``mon host`` settings.
135
136 You can get and set all of the monitor settings at runtime as well. However,
137 a Ceph Configuration file may contain only those settings that override the
138 default values. When you add settings to a Ceph configuration file, these
139 settings override the default settings. Maintaining those settings in a
140 Ceph configuration file makes it easier to maintain your cluster.
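
As a sketch of what getting and setting at runtime looks like, run the
following on the monitor host once the daemon is up (the option name is just
an example; runtime changes are lost on restart unless they are also written
to the configuration file)::

  ceph daemon mon.node1 config get osd_pool_default_size
  ceph tell mon.node1 injectargs '--osd_pool_default_size=3'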
141
142 The procedure is as follows:
143
144
145 #. Log in to the initial monitor node(s)::
146
147 ssh {hostname}
148
149 For example::
150
151 ssh node1
152
153
154 #. Ensure you have a directory for the Ceph configuration file. By default,
155 Ceph uses ``/etc/ceph``. When you install ``ceph``, the installer will
156 create the ``/etc/ceph`` directory automatically. ::
157
158 ls /etc/ceph
159
160 **Note:** Deployment tools may remove this directory when purging a
161 cluster (e.g., ``ceph-deploy purgedata {node-name}``, ``ceph-deploy purge
162 {node-name}``).
163
164 #. Create a Ceph configuration file. By default, Ceph uses
165 ``ceph.conf``, where ``ceph`` reflects the cluster name. ::
166
167 sudo vim /etc/ceph/ceph.conf
168
169
170 #. Generate a unique ID (i.e., ``fsid``) for your cluster. ::
171
172 uuidgen
173
174
175 #. Add the unique ID to your Ceph configuration file. ::
176
177 fsid = {UUID}
178
179 For example::
180
181 fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993
182
183
184 #. Add the initial monitor(s) to your Ceph configuration file. ::
185
186 mon initial members = {hostname}[,{hostname}]
187
188 For example::
189
190 mon initial members = node1
191
192
193 #. Add the IP address(es) of the initial monitor(s) to your Ceph configuration
194 file and save the file. ::
195
196 mon host = {ip-address}[,{ip-address}]
197
198 For example::
199
200 mon host = 192.168.0.1
201
202 **Note:** You may use IPv6 addresses instead of IPv4 addresses, but
203 you must set ``ms bind ipv6`` to ``true``. See `Network Configuration
204 Reference`_ for details about network configuration.
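
For example, an IPv6 setup might look like this (the address is a
placeholder; see the `Network Configuration Reference`_ for the exact
syntax)::

   mon host = [2001:db8::1]:6789
   ms bind ipv6 = true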
205
206 #. Create a keyring for your cluster and generate a monitor secret key. ::
207
208 ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'
209
210
211 #. Generate an administrator keyring, generate a ``client.admin`` user and add
212 the user to the keyring. ::
213
214 sudo ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
215
216
217 #. Add the ``client.admin`` key to the ``ceph.mon.keyring``. ::
218
219 ceph-authtool /tmp/ceph.mon.keyring --import-keyring /etc/ceph/ceph.client.admin.keyring
220
221
222 #. Generate a monitor map using the hostname(s), host IP address(es) and the FSID.
223 Save it as ``/tmp/monmap``::
224
225 monmaptool --create --add {hostname} {ip-address} --fsid {uuid} /tmp/monmap
226
227 For example::
228
229 monmaptool --create --add node1 192.168.0.1 --fsid a7f64266-0894-4f1e-a635-d0aeaca0e993 /tmp/monmap
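
You can print the resulting map to confirm the ``fsid`` and the monitor
address before using it::

   monmaptool --print /tmp/monmap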
230
231
232 #. Create a default data directory (or directories) on the monitor host(s). ::
233
234 sudo mkdir /var/lib/ceph/mon/{cluster-name}-{hostname}
235
236 For example::
237
238 sudo mkdir /var/lib/ceph/mon/ceph-node1
239
240 See `Monitor Config Reference - Data`_ for details.
241
242 #. Populate the monitor daemon(s) with the monitor map and keyring. ::
243
244 sudo -u ceph ceph-mon [--cluster {cluster-name}] --mkfs -i {hostname} --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
245
246 For example::
247
248 sudo -u ceph ceph-mon --mkfs -i node1 --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
249
250
251 #. Consider settings for a Ceph configuration file. Common settings include
252 the following::
253
254 [global]
255 fsid = {cluster-id}
256 mon initial members = {hostname}[, {hostname}]
257 mon host = {ip-address}[, {ip-address}]
258 public network = {network}[, {network}]
259 cluster network = {network}[, {network}]
260 auth cluster required = cephx
261 auth service required = cephx
262 auth client required = cephx
263 osd journal size = {n}
264 osd pool default size = {n} # Write an object n times.
osd pool default min size = {n}  # Allow writing n copies in a degraded state.
266 osd pool default pg num = {n}
267 osd pool default pgp num = {n}
268 osd crush chooseleaf type = {n}
269
270 In the foregoing example, the ``[global]`` section of the configuration might
271 look like this::
272
273 [global]
274 fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993
275 mon initial members = node1
276 mon host = 192.168.0.1
277 public network = 192.168.0.0/24
278 auth cluster required = cephx
279 auth service required = cephx
280 auth client required = cephx
281 osd journal size = 1024
282 osd pool default size = 2
283 osd pool default min size = 1
284 osd pool default pg num = 333
285 osd pool default pgp num = 333
286 osd crush chooseleaf type = 1
287
288 #. Touch the ``done`` file.
289
290 Mark that the monitor is created and ready to be started::
291
292 sudo touch /var/lib/ceph/mon/ceph-node1/done
293
#. For FreeBSD, an entry for every monitor needs to be added to the config
   file. (This requirement will be removed in future releases.)

   The entry should look like::

     [mon]
     [mon.node1]
     host = node1        # this name must be resolvable
302
303
304 #. Start the monitor(s).
305
306 For Ubuntu, use Upstart::
307
308 sudo start ceph-mon id=node1 [cluster={cluster-name}]
309
In this case, to allow the start of the daemon at each reboot you
must create an empty file like this::
312
313 sudo touch /var/lib/ceph/mon/{cluster-name}-{hostname}/upstart
314
315 For example::
316
317 sudo touch /var/lib/ceph/mon/ceph-node1/upstart
318
319 For Debian/CentOS/RHEL, use sysvinit::
320
321 sudo /etc/init.d/ceph start mon.node1
322
For FreeBSD we use the rc.d init scripts (called bsdrc in Ceph)::

   sudo service ceph start mon.node1

For this to work, ``/etc/rc.conf`` also needs an entry to enable Ceph::

   echo 'ceph_enable="YES"' >> /etc/rc.conf
329
330
331 #. Verify that Ceph created the default pools. ::
332
333 ceph osd lspools
334
335 You should see output like this::
336
337 0 data,1 metadata,2 rbd,
338
339
340 #. Verify that the monitor is running. ::
341
342 ceph -s
343
344 You should see output that the monitor you started is up and running, and
345 you should see a health error indicating that placement groups are stuck
346 inactive. It should look something like this::
347
348 cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
349 health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
350 monmap e1: 1 mons at {node1=192.168.0.1:6789/0}, election epoch 1, quorum 0 node1
351 osdmap e1: 0 osds: 0 up, 0 in
352 pgmap v2: 192 pgs, 3 pools, 0 bytes data, 0 objects
353 0 kB used, 0 kB / 0 kB avail
354 192 creating
355
356 **Note:** Once you add OSDs and start them, the placement group health errors
357 should disappear. See the next section for details.
358
359
360 Adding OSDs
361 ===========
362
363 Once you have your initial monitor(s) running, you should add OSDs. Your cluster
364 cannot reach an ``active + clean`` state until you have enough OSDs to handle the
365 number of copies of an object (e.g., ``osd pool default size = 2`` requires at
366 least two OSDs). After bootstrapping your monitor, your cluster has a default
367 CRUSH map; however, the CRUSH map doesn't have any Ceph OSD Daemons mapped to
368 a Ceph Node.
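
As a quick check while adding OSDs, you can compare the number of OSDs that
are up with the replication size of a pool. A sketch, assuming the default
``rbd`` pool shown above::

   ceph osd stat
   ceph osd pool get rbd size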
369
370
371 Short Form
372 ----------
373
374 Ceph provides the ``ceph-disk`` utility, which can prepare a disk, partition or
375 directory for use with Ceph. The ``ceph-disk`` utility creates the OSD ID by
376 incrementing the index. Additionally, ``ceph-disk`` will add the new OSD to the
377 CRUSH map under the host for you. Execute ``ceph-disk -h`` for CLI details.
378 The ``ceph-disk`` utility automates the steps of the `Long Form`_ below. To
379 create the first two OSDs with the short form procedure, execute the following
380 on ``node2`` and ``node3``:
381
382
#. Prepare the OSD.

   On FreeBSD only existing directories can be used to create OSDs in::

     ssh {node-name}
     sudo ceph-disk prepare --cluster {cluster-name} --cluster-uuid {uuid} {path-to-ceph-osd-directory}
390
391 For example::
392
393 ssh node1
394 sudo ceph-disk prepare --cluster ceph --cluster-uuid a7f64266-0894-4f1e-a635-d0aeaca0e993 /var/lib/ceph/osd/osd.1
395
396
397 #. Activate the OSD::
398
399 sudo ceph-disk activate {data-path} [--activate-key {path}]
400
401 For example::
402
403 sudo ceph-disk activate /var/lib/ceph/osd/osd.1
404
405 **Note:** Use the ``--activate-key`` argument if you do not have a copy
406 of ``/var/lib/ceph/bootstrap-osd/{cluster}.keyring`` on the Ceph Node.
407
FreeBSD does not auto start the OSDs, and it also requires an entry in
``ceph.conf``, one for each OSD::

   [osd]
   [osd.1]
   host = node1        # this name must be resolvable
414
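With the ``ceph.conf`` entry in place, the OSD can then be started and
enabled for start at boot in the same way as described in the `Long Form`_
below::

   sudo service ceph start osd.{osd-num}
   sudo touch /var/lib/ceph/osd/{cluster-name}-{osd-num}/bsdrc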
415
416 Long Form
417 ---------
418
419 Without the benefit of any helper utilities, create an OSD and add it to the
420 cluster and CRUSH map with the following procedure. To create the first two
421 OSDs with the long form procedure, execute the following on ``node2`` and
422 ``node3``:
423
424 #. Connect to the OSD host. ::
425
426 ssh {node-name}
427
428 #. Generate a UUID for the OSD. ::
429
430 uuidgen
431
432
433 #. Create the OSD. If no UUID is given, it will be set automatically when the
434 OSD starts up. The following command will output the OSD number, which you
435 will need for subsequent steps. ::
436
437 ceph osd create [{uuid} [{id}]]
438
439
440 #. Create the default directory on your new OSD. ::
441
442 ssh {new-osd-host}
443 sudo mkdir /var/lib/ceph/osd/{cluster-name}-{osd-number}
444
On FreeBSD, use the ZFS instructions from earlier in this document to
create and mount this directory (a minimal sketch follows).
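
A sketch for OSD number ``0`` on a spare disk ``ada1`` (device, label and
pool names are examples; adapt them to your system)::

   gpart create -s GPT ada1
   gpart add -t freebsd-zfs -l osd0 ada1
   zpool create -o mountpoint=/var/lib/ceph/osd/ceph-0 osd0 gpt/osd0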
446
447
448 #. If the OSD is for a drive other than the OS drive, prepare it
449 for use with Ceph, and mount it to the directory you just created.
450
451
452 #. Initialize the OSD data directory. ::
453
454 ssh {new-osd-host}
455 sudo ceph-osd -i {osd-num} --mkfs --mkkey --osd-uuid [{uuid}]
456
The directory must be empty before you can run ``ceph-osd`` with the
``--mkkey`` option. In addition, if you use a custom cluster name, the
``ceph-osd`` tool requires it to be specified with the ``--cluster`` option.
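
For example, for OSD number ``0`` on a cluster with the default name,
reusing the UUID generated for this OSD earlier::

   sudo ceph-osd -i 0 --mkfs --mkkey --osd-uuid {uuid}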
460
461
#. Register the OSD authentication key. The ``ceph`` in the
   ``ceph-{osd-num}`` part of the path is the ``$cluster-$id``. If your
   cluster name differs from ``ceph``, use your cluster name instead. ::
465
466 sudo ceph auth add osd.{osd-num} osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/{cluster-name}-{osd-num}/keyring
467
468
469 #. Add your Ceph Node to the CRUSH map. ::
470
471 ceph [--cluster {cluster-name}] osd crush add-bucket {hostname} host
472
473 For example::
474
475 ceph osd crush add-bucket node1 host
476
477
478 #. Place the Ceph Node under the root ``default``. ::
479
480 ceph osd crush move node1 root=default
481
482
483 #. Add the OSD to the CRUSH map so that it can begin receiving data. You may
484 also decompile the CRUSH map, add the OSD to the device list, add the host as a
485 bucket (if it's not already in the CRUSH map), add the device as an item in the
486 host, assign it a weight, recompile it and set it. ::
487
488 ceph [--cluster {cluster-name}] osd crush add {id-or-name} {weight} [{bucket-type}={bucket-name} ...]
489
490 For example::
491
492 ceph osd crush add osd.0 1.0 host=node1
493
494
495 #. After you add an OSD to Ceph, the OSD is in your configuration. However,
496 it is not yet running. The OSD is ``down`` and ``in``. You must start
497 your new OSD before it can begin receiving data.
498
499 For Ubuntu, use Upstart::
500
501 sudo start ceph-osd id={osd-num} [cluster={cluster-name}]
502
503 For example::
504
505 sudo start ceph-osd id=0
506 sudo start ceph-osd id=1
507
508 For Debian/CentOS/RHEL, use sysvinit::
509
510 sudo /etc/init.d/ceph start osd.{osd-num} [--cluster {cluster-name}]
511
512 For example::
513
514 sudo /etc/init.d/ceph start osd.0
515 sudo /etc/init.d/ceph start osd.1
516
517 In this case, to allow the start of the daemon at each reboot you
518 must create an empty file like this::
519
520 sudo touch /var/lib/ceph/osd/{cluster-name}-{osd-num}/sysvinit
521
522 For example::
523
524 sudo touch /var/lib/ceph/osd/ceph-0/sysvinit
525 sudo touch /var/lib/ceph/osd/ceph-1/sysvinit
526
527 Once you start your OSD, it is ``up`` and ``in``.
528
For FreeBSD using rc.d init, after adding the OSD to ``ceph.conf``::

   sudo service ceph start osd.{osd-num}
534
535 For example::
536
537 sudo service ceph start osd.0
538 sudo service ceph start osd.1
539
540 In this case, to allow the start of the daemon at each reboot you
541 must create an empty file like this::
542
543 sudo touch /var/lib/ceph/osd/{cluster-name}-{osd-num}/bsdrc
544
545 For example::
546
547 sudo touch /var/lib/ceph/osd/ceph-0/bsdrc
548 sudo touch /var/lib/ceph/osd/ceph-1/bsdrc
549
550 Once you start your OSD, it is ``up`` and ``in``.
551
552
553
554 Adding MDS
555 ==========
556
557 In the below instructions, ``{id}`` is an arbitrary name, such as the hostname of the machine.
558
559 #. Create the mds data directory.::
560
561 mkdir -p /var/lib/ceph/mds/{cluster-name}-{id}
562
563 #. Create a keyring.::
564
565 ceph-authtool --create-keyring /var/lib/ceph/mds/{cluster-name}-{id}/keyring --gen-key -n mds.{id}
566
567 #. Import the keyring and set caps.::
568
569 ceph auth add mds.{id} osd "allow rwx" mds "allow" mon "allow profile mds" -i /var/lib/ceph/mds/{cluster}-{id}/keyring
570
571 #. Add to ceph.conf.::
572
573 [mds.{id}]
574 host = {id}
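
For example, with ``{id}`` set to the hostname ``node1``::

   [mds.node1]
   host = node1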
575
576 #. Start the daemon the manual way.::
577
578 ceph-mds --cluster {cluster-name} -i {id} -m {mon-hostname}:{mon-port} [-f]
579
580 #. Start the daemon the right way (using ceph.conf entry).::
581
582 service ceph start
583
584 #. If starting the daemon fails with this error::
585
586 mds.-1.0 ERROR: failed to authenticate: (22) Invalid argument
587
Then make sure you do not have a ``keyring`` setting in the ``[global]`` section of ``ceph.conf``; either move it to the client section, or add a keyring setting specific to this MDS daemon. Also verify that the key in the MDS data directory matches the output of ``ceph auth get mds.{id}``.
589
590 #. Now you are ready to `create a Ceph filesystem`_.
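
   In short, this means creating a data pool and a metadata pool and then
   creating the filesystem on top of them. A minimal sketch (pool names and
   placement-group counts are examples; see the linked page for details)::

      ceph osd pool create cephfs_data 64
      ceph osd pool create cephfs_metadata 64
      ceph fs new cephfs cephfs_metadata cephfs_data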
591
592
593 Summary
594 =======
595
596 Once you have your monitor and two OSDs up and running, you can watch the
597 placement groups peer by executing the following::
598
599 ceph -w
600
601 To view the tree, execute the following::
602
603 ceph osd tree
604
605 You should see output that looks something like this::
606
  # id    weight  type name       up/down reweight
  -1      2       root default
  -2      2               host node1
  0       1                       osd.0   up      1
  -3      1               host node2
  1       1                       osd.1   up      1
613
614 To add (or remove) additional monitors, see `Add/Remove Monitors`_.
615 To add (or remove) additional Ceph OSD Daemons, see `Add/Remove OSDs`_.
616
617
618 .. _federated architecture: ../../radosgw/federated-config
619 .. _Installation (Quick): ../../start
620 .. _Add/Remove Monitors: ../../rados/operations/add-or-rm-mons
621 .. _Add/Remove OSDs: ../../rados/operations/add-or-rm-osds
622 .. _Network Configuration Reference: ../../rados/configuration/network-config-ref
623 .. _Monitor Config Reference - Data: ../../rados/configuration/mon-config-ref#data
624 .. _create a Ceph filesystem: ../../cephfs/createfs