ceph/doc/dev/cephadm/developing-cephadm.rst

   1 =======================
   2 Developing with cephadm
   3 =======================
   4
   5 There are several ways to develop with cephadm.  Which you use depends
   6 on what you're trying to accomplish.
   7
   8 vstart --cephadm
   9 ================
  10
  11 - Start a cluster with vstart, with cephadm configured
  12 - Manage any additional daemons with cephadm
  13 - Requires compiled ceph binaries
  14
  15 In this case, the mon and manager at a minimum are running in the usual
  16 vstart way, not managed by cephadm.  But cephadm is enabled and the local
  17 host is added, so you can deploy additional daemons or add additional hosts.
  18
  19 This works well for developing cephadm itself, because any mgr/cephadm
  20 or cephadm/cephadm code changes can be applied by kicking ceph-mgr
  21 with ``ceph mgr fail x``.  (When the mgr (re)starts, it loads the
  22 cephadm/cephadm script into memory.)
  23
  24 ::
  25
  26    MON=1 MGR=1 OSD=0 MDS=0 ../src/vstart.sh -d -n -x --cephadm
  27
  28 - ``~/.ssh/id_dsa[.pub]`` is used as the cluster key.  It is assumed that
  29   this key is authorized to ssh with no passphrase to root@`hostname`.
  30 - cephadm does not try to manage any daemons started by vstart.sh (any
  31   nonzero number in the environment variables).  No service spec is defined
  32   for mon or mgr.
  33 - You'll see health warnings from cephadm about stray daemons--that's because
  34   the vstart-launched daemons aren't controlled by cephadm.
  35 - The default image is ``quay.io/ceph-ci/ceph:master``, but you can change
  36   this by passing ``-o container_image=...`` or ``ceph config set global container_image ...``.
  37
  38
  39 cstart and cpatch
  40 =================
  41
  42 The ``cstart.sh`` script will launch a cluster using cephadm and put the
  43 conf and keyring in your build dir, so that the ``bin/ceph ...`` CLI works
  44 (just like with vstart).  The ``ckill.sh`` script will tear it down.
  45
  46 - A unique but stable fsid is stored in ``fsid`` (in the build dir).
  47 - The mon port is random, just like with vstart.
  48 - The container image is ``quay.io/ceph-ci/ceph:$tag`` where $tag is
  49   the first 8 chars of the fsid.
  50 - If the container image doesn't exist yet when you run cstart for the
  51   first time, it is built with cpatch.
  52
  53 There are a few advantages here:
  54
  55 - The cluster is a "normal" cephadm cluster that looks and behaves
  56   just like a user's cluster would.  In contrast, vstart and teuthology
  57   clusters tend to be special in subtle (and not-so-subtle) ways (e.g.
  58   having the ``lockdep`` turned on).
  59
  60 To start a test cluster::
  61
  62   sudo ../src/cstart.sh
  63
  64 The last line of the output will be a line you can cut+paste to update
  65 the container image.  For instance::
  66
  67   sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e
  68
  69 By default, cpatch will patch everything it can think of from the local
  70 build dir into the container image.  If you are working on a specific
  71 part of the system, though, can you get away with smaller changes so that
  72 cpatch runs faster.  For instance::
  73
  74   sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e --py
  75
  76 will update the mgr modules (minus the dashboard).  Or::
  77
  78   sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e --core
  79
  80 will do most binaries and libraries.  Pass ``-h`` to cpatch for all options.
  81
  82 Once the container is updated, you can refresh/restart daemons by bouncing
  83 them with::
  84
  85   sudo systemctl restart ceph-`cat fsid`.target
  86
  87 When you're done, you can tear down the cluster with::
  88
  89   sudo ../src/ckill.sh   # or,
  90   sudo ../src/cephadm/cephadm rm-cluster --force --fsid `cat fsid`
  91
  92 cephadm bootstrap --shared_ceph_folder
  93 ======================================
  94
  95 Cephadm can also be used directly without compiled ceph binaries.
  96
  97 Run cephadm like so::
  98
  99   sudo ./cephadm bootstrap --mon-ip 127.0.0.1 \
 100     --ssh-private-key /home/<user>/.ssh/id_rsa \
 101     --skip-mon-network \
 102     --skip-monitoring-stack --single-host-defaults \
 103     --skip-dashboard \
 104     --shared_ceph_folder /home/<user>/path/to/ceph/
 105
 106 - ``~/.ssh/id_rsa`` is used as the cluster key.  It is assumed that
 107   this key is authorized to ssh with no passphrase to root@`hostname`.
 108
 109 Source code changes made in the ``pybind/mgr/`` directory then
 110 require a daemon restart to take effect.
 111
 112 Kcli: a virtualization management tool to make easy orchestrators development
 113 =============================================================================
 114 `Kcli <https://github.com/karmab/kcli>`_ is meant to interact with existing
 115 virtualization providers (libvirt, KubeVirt, oVirt, OpenStack, VMware vSphere,
 116 GCP and AWS) and to easily deploy and customize VMs from cloud images.
 117
 118 It allows you to setup an environment with several vms with your preferred
 119 configuration (memory, cpus, disks) and OS flavor.
 120
 121 main advantages:
 122 ----------------
 123   - Fast. Typically you can have a completely new Ceph cluster ready to debug
 124     and develop orchestrator features in less than 5 minutes.
 125   - "Close to production" lab. The resulting lab is close to "real" clusters
 126     in QE labs or even production. It makes it easy to test "real things" in
 127     an almost "real" environment.
 128   - Safe and isolated. Does not depend of the things you have installed in
 129     your machine. And the vms are isolated from your environment.
 130   - Easy to work "dev" environment. For "not compilated" software pieces,
 131     for example any mgr module. It is an environment that allow you to test your
 132     changes interactively.
 133
 134 Installation:
 135 -------------
 136 Complete documentation in `kcli installation <https://kcli.readthedocs.io/en/latest/#installation>`_
 137 but we suggest to use the container image approach.
 138
 139 So things to do:
 140   - 1. Review `requeriments <https://kcli.readthedocs.io/en/latest/#libvirt-hypervisor-requisites>`_
 141     and install/configure whatever is needed to meet them.
 142   - 2. get the kcli image and create one alias for executing the kcli command
 143     ::
 144
 145         # podman pull quay.io/karmab/kcli
 146         # alias kcli='podman run --net host -it --rm --security-opt label=disable -v $HOME/.ssh:/root/.ssh -v $HOME/.kcli:/root/.kcli -v /var/lib/libvirt/images:/var/lib/libvirt/images -v /var/run/libvirt:/var/run/libvirt -v $PWD:/workdir -v /var/tmp:/ignitiondir quay.io/karmab/kcli'
 147
 148 .. note:: This assumes that /var/lib/libvirt/images is your default libvirt pool.... Adjust if using a different path
 149
 150 .. note:: Once you have used your kcli tool to create and use different labs, we
 151    suggest you stick to a given container tag and update your kcli alias.
 152    Why? kcli uses a rolling release model and sticking to a specific
 153    container tag will improve overall stability.
 154    what we want is overall stability.
 155
 156 Test your kcli installation:
 157 ----------------------------
 158 See the kcli `basic usage workflow <https://kcli.readthedocs.io/en/latest/#basic-workflow>`_
 159
 160 Create a Ceph lab cluster
 161 -------------------------
 162 In order to make this task simple, we are going to use a "plan".
 163
 164 A "plan" is a file where you can define a set of vms with different settings.
 165 You can define hardware parameters (cpu, memory, disks ..), operating system and
 166 it also allows you to automate the installation and configuration of any
 167 software you want to have.
 168
 169 There is a `repository <https://github.com/karmab/kcli-plans>`_ with a collection of
 170 plans that can be used for different purposes. And we have predefined plans to
 171 install Ceph clusters using Ceph ansible or cephadm, so let's create our first Ceph
 172 cluster using cephadm::
 173
 174 # kcli create plan -u https://github.com/karmab/kcli-plans/blob/master/ceph/ceph_cluster.yml
 175
 176 This will create a set of three vms using the plan file pointed by the url.
 177 After a few minutes, let's check the cluster:
 178
 179 * Take a look to the vms created::
 180
 181   # kcli list vms
 182
 183 * Enter in the bootstrap node::
 184
 185   # kcli ssh ceph-node-00
 186
 187 * Take a look to the ceph cluster installed::
 188
 189   [centos@ceph-node-00 ~]$ sudo -i
 190   [root@ceph-node-00 ~]# cephadm version
 191   [root@ceph-node-00 ~]# cephadm shell
 192   [ceph: root@ceph-node-00 /]# ceph orch host ls
 193
 194 Create a Ceph cluster to make easy developing in mgr modules (Orchestrators and Dashboard)
 195 ------------------------------------------------------------------------------------------
 196 The cephadm kcli plan (and cephadm) are prepared to do that.
 197
 198 The idea behind this method is to replace several python mgr folders in each of
 199 the ceph daemons with the source code folders in your host machine.
 200 This "trick" will allow you to make changes in any orchestrator or dashboard
 201 module and test them intermediately. (only needed to disable/enable the mgr module)
 202
 203 So in order to create a ceph cluster for development purposes you must use the
 204 same cephadm plan but with a new parameter pointing to your Ceph source code folder::
 205
 206   # kcli create plan -u https://github.com/karmab/kcli-plans/blob/master/ceph/ceph_cluster.yml -P ceph_dev_folder=/home/mycodefolder/ceph
 207
 208 Ceph Dashboard development
 209 --------------------------
 210 Ceph dashboard module is not going to be loaded if previously you have not
 211 generated the frontend bundle.
 212
 213 For now, in order load properly the Ceph Dashboardmodule and to apply frontend
 214 changes you have to run "ng build" on your laptop::
 215
 216   # Start local frontend build with watcher (in background):
 217   sudo dnf install -y nodejs
 218   cd <path-to-your-ceph-repo>
 219   cd src/pybind/mgr/dashboard/frontend
 220   sudo chown -R <your-user>:root dist node_modules
 221   NG_CLI_ANALYTICS=false npm ci
 222   npm run build -- --deleteOutputPath=false --watch &
 223
 224 After saving your changes, the frontend bundle will be built again.
 225 When completed, you'll see::
 226
 227   "Localized bundle generation complete."
 228
 229 Then you can reload your Dashboard browser tab.
 230
 231 Cephadm DiD (Docker in Docker) box development environment
 232 ==========================================================
 233
 234 As kcli has a long startup time, we created an alternative which is faster using
 235 Docker inside Docker. This approach has its downsides too as we have to
 236 simulate the creation of osds and addition of devices with loopback devices.
 237
 238 Cephadm's DiD environment is a command which requires little to setup. The setup
 239 requires you to get the required docker images for what we call boxes and ceph.
 240 A box is the first layer of docker containers which can be either a seed or a
 241 host. A seed is the main box which holds cephadm and where you bootstrap the
 242 cluster. On the other hand, you have hosts with an ssh server setup so you can
 243 add those hosts to the cluster. The second layer, managed by cephadm, inside the
 244 seed box, requires the ceph image.
 245
 246 .. warning:: This development environment is still experimental and can have unexpected
 247              behaviour. Please take a look at the road map and the known issues section
 248              to see what the development progress.
 249
 250 Requirements
 251 ------------
 252
 253 * `docker-compose <https://docs.docker.com/compose/install/>`_
 254 * lvm
 255
 256 Setup
 257 -----
 258
 259 In order to setup Cephadm's box run::
 260
 261   cd src/cephadm/box
 262   sudo ln -sf "$PWD"/box.py /usr/bin/box
 263   sudo box -v cluster setup
 264
 265 .. note:: It is recommended to run box with verbose (-v).
 266
 267 After getting all needed images we can create a simple cluster without osds and hosts with::
 268
 269   sudo box -v cluster start
 270
 271 If you want to deploy the cluster with more osds and hosts::
 272   # 3 osds and 3 hosts by default
 273   sudo box -v cluster start --extended
 274   # explicitly change number of hosts and osds
 275   sudo box -v cluster start --extended --osds 5 --hosts 5
 276
 277 Without the extended option, explicitly adding either more hosts or osds won't change the state
 278 of the cluster.
 279
 280 .. note:: Cluster start will try to setup even if cluster setup was not called.
 281 .. note:: Osds are created with loopback devices and hence, sudo is needed to
 282    create loopback devices capable of holding osds.
 283 .. note::  Each osd will require 5GiB of space.
 284
 285 After bootstraping the cluster you can go inside the seed box in which you'll be
 286 able to run cehpadm commands::
 287
 288   box -v cluster sh
 289   [root@8d52a7860245] cephadm --help
 290   ...
 291
 292
 293 If you want to navigate to the dashboard you can find the ip address after running::
 294   docker ps
 295   docker inspect <container-id> | grep IPAddress
 296
 297 The address will be https://$IPADDRESS:8443
 298
 299 You can also find the hostname and ip of each box container with::
 300
 301   sudo box cluster list
 302
 303 and you'll see something like::
 304
 305   IP               Name            Hostname
 306   172.30.0.2       box_hosts_1     6283b7b51d91
 307   172.30.0.3       box_hosts_3     3dcf7f1b25a4
 308   172.30.0.4       box_seed_1      8d52a7860245
 309   172.30.0.5       box_hosts_2     c3c7b3273bf1
 310
 311 To remove the cluster and clean up run::
 312
 313   box cluster down
 314
 315 If you just want to clean up the last cluster created run::
 316
 317   box cluster cleanup
 318
 319 To check all available commands run::
 320
 321   box --help
 322
 323
 324 Known issues
 325 ------------
 326
 327 * If you get permission issues with cephadm because it cannot infer the keyring
 328   and configuration, please run cephadm like this example::
 329
 330     cephadm shell --config /etc/ceph/ceph.conf --keyring /etc/ceph/ceph.kerying
 331
 332 * Docker containers run with the --privileged flag enabled which has been seen
 333   to make some computers log out.
 334
 335 * Sometimes when starting a cluster the osds won't get deployed because cephadm
 336   takes a while to update the state. If this happens wait and call::
 337
 338     box -v osd deploy --vg vg1
 339
 340 Road map
 341 ------------
 342
 343 * Run containers without --privileged
 344 * Enable ceph-volume to mark loopback devices as a valid block device in
 345   the inventory.
 346 * Make DiD ready to run dashboard CI tests (including cluster expansion).
 347
 348 Note regarding network calls from CLI handlers
 349 ==============================================
 350
 351 Executing any cephadm CLI commands like ``ceph orch ls`` will block the
 352 mon command handler thread within the MGR, thus preventing any concurrent
 353 CLI calls. Note that pressing ``^C`` will not resolve this situation,
 354 as *only* the client will be aborted, but not execution of the command
 355 within the orchestrator manager module itself. This means, cephadm will
 356 be completely unresponsive until the execution of the CLI handler is
 357 fully completed. Note that even ``ceph orch ps`` will not respond while
 358 another handler is executing.
 359
 360 This means we should do very few synchronous calls to remote hosts.
 361 As a guideline, cephadm should do at most ``O(1)`` network calls in CLI handlers.
 362 Everything else should be done asynchronously in other threads, like ``serve()``.
 363
 364 Note regarding different variables used in the code
 365 ===================================================
 366
 367 * a ``service_type`` is something like mon, mgr, alertmanager etc defined
 368   in ``ServiceSpec``
 369 * a ``service_id`` is the name of the service. Some services don't have
 370   names.
 371 * a ``service_name`` is ``<service_type>.<service_id>``
 372 * a ``daemon_type`` is the same as the service_type, except for ingress,
 373   which has the haproxy and keepalived daemon types.
 374 * a ``daemon_id`` is typically ``<service_id>.<hostname>.<random-string>``.
 375   (Not the case for e.g. OSDs. OSDs are always called OSD.N)
 376 * a ``daemon_name`` is ``<daemon_type>.<daemon_id>``