=======================
Developing with cephadm
=======================

There are several ways to develop with cephadm. Which you use depends
on what you're trying to accomplish.

vstart --cephadm
================

- Start a cluster with vstart, with cephadm configured
- Manage any additional daemons with cephadm
- Requires compiled ceph binaries

In this case, the mon and manager at a minimum are running in the usual
vstart way, not managed by cephadm. But cephadm is enabled and the local
host is added, so you can deploy additional daemons or add additional hosts.

This works well for developing cephadm itself, because any mgr/cephadm
or cephadm/cephadm code changes can be applied by kicking ceph-mgr
with ``ceph mgr fail x``. (When the mgr (re)starts, it loads the
cephadm/cephadm script into memory.)

::

  MON=1 MGR=1 OSD=0 MDS=0 ../src/vstart.sh -d -n -x --cephadm

- ``~/.ssh/id_dsa[.pub]`` is used as the cluster key. It is assumed that
  this key is authorized to ssh with no passphrase to root@`hostname`.
- cephadm does not try to manage any daemons started by vstart.sh (any
  nonzero number in the environment variables). No service spec is defined
  for mon or mgr.
- You'll see health warnings from cephadm about stray daemons--that's because
  the vstart-launched daemons aren't controlled by cephadm.
- The default image is ``quay.io/ceph-ci/ceph:main``, but you can change
  this by passing ``-o container_image=...`` or ``ceph config set global container_image ...``.
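
For example, a typical iteration on mgr/cephadm code against a vstart cluster
(assuming the default vstart mgr name ``x``, as used above) might look like::

  # edit src/pybind/mgr/cephadm/..., then reload the module by failing the mgr
  bin/ceph mgr fail x
  # once the mgr has restarted, check that the orchestrator is responding
  bin/ceph orch status
  bin/ceph orch ps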


cstart and cpatch
=================

The ``cstart.sh`` script will launch a cluster using cephadm and put the
conf and keyring in your build dir, so that the ``bin/ceph ...`` CLI works
(just like with vstart). The ``ckill.sh`` script will tear it down.

- A unique but stable fsid is stored in ``fsid`` (in the build dir).
- The mon port is random, just like with vstart.
- The container image is ``quay.io/ceph-ci/ceph:$tag`` where $tag is
  the first 8 chars of the fsid.
- If the container image doesn't exist yet when you run cstart for the
  first time, it is built with cpatch.

There are a few advantages here:

- The cluster is a "normal" cephadm cluster that looks and behaves
  just like a user's cluster would. In contrast, vstart and teuthology
  clusters tend to be special in subtle (and not-so-subtle) ways (e.g.
  having the ``lockdep`` turned on).

To start a test cluster::

  sudo ../src/cstart.sh

The last line of the output will be a line you can cut+paste to update
the container image. For instance::

  sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e

By default, cpatch will patch everything it can think of from the local
build dir into the container image. If you are working on a specific
part of the system, though, you can get away with smaller changes so that
cpatch runs faster. For instance::

  sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e --py

will update the mgr modules (minus the dashboard). Or::

  sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e --core

will do most binaries and libraries. Pass ``-h`` to cpatch for all options.

Once the container is updated, you can refresh/restart daemons by bouncing
them with::

  sudo systemctl restart ceph-`cat fsid`.target

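If you only changed one daemon type (say, the mgr), restarting just that daemon
is quicker than bouncing the whole target. The cephadm-managed systemd units
typically follow the pattern ``ceph-<fsid>@<daemon-type>.<daemon-id>.service``,
so something along these lines should work::

  sudo systemctl list-units 'ceph-*'
  sudo systemctl restart ceph-`cat fsid`@mgr.<daemon-id>.service

(``<daemon-id>`` is a placeholder; copy the exact unit name from the
``list-units`` output.)
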
When you're done, you can tear down the cluster with::

  sudo ../src/ckill.sh   # or,
  sudo ../src/cephadm/cephadm rm-cluster --force --fsid `cat fsid`

cephadm bootstrap --shared_ceph_folder
======================================

Cephadm can also be used directly without compiled ceph binaries.

Run cephadm like so::

  sudo ./cephadm bootstrap --mon-ip 127.0.0.1 \
    --ssh-private-key /home/<user>/.ssh/id_rsa \
    --skip-mon-network \
    --skip-monitoring-stack --single-host-defaults \
    --skip-dashboard \
    --shared_ceph_folder /home/<user>/path/to/ceph/

- ``~/.ssh/id_rsa`` is used as the cluster key. It is assumed that
  this key is authorized to ssh with no passphrase to root@`hostname`.

Source code changes made in the ``pybind/mgr/`` directory then
require a daemon restart to take effect.

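For example, after editing an orchestrator module under ``pybind/mgr/``, one
way (a sketch, not the only one) to pick up the change is to fail over the
active mgr so the module code is reloaded::

  sudo ./cephadm shell -- ceph orch ps --daemon-type mgr   # find the mgr daemon name
  sudo ./cephadm shell -- ceph mgr fail <mgr-name>
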
Kcli: a virtualization management tool to make orchestrator development easier
================================================================================

`Kcli <https://github.com/karmab/kcli>`_ is meant to interact with existing
virtualization providers (libvirt, KubeVirt, oVirt, OpenStack, VMware vSphere,
GCP and AWS) and to easily deploy and customize VMs from cloud images.

It allows you to set up an environment with several VMs with your preferred
configuration (memory, cpus, disks) and OS flavor.

Main advantages:
----------------

 - Fast. Typically you can have a completely new Ceph cluster ready to debug
   and develop orchestrator features in less than 5 minutes.
 - "Close to production" lab. The resulting lab is close to "real" clusters
   in QE labs or even production. It makes it easy to test "real things" in
   an almost "real" environment.
 - Safe and isolated. It does not depend on what you have installed on your
   machine, and the VMs are isolated from your environment.
 - Convenient "dev" environment. For non-compiled pieces of software, for
   example any mgr module, it is an environment that allows you to test your
   changes interactively.

Installation:
-------------

Complete documentation is available in the `kcli installation <https://kcli.readthedocs.io/en/latest/#installation>`_
docs, but we suggest using the container image approach.

So, things to do:

 - 1. Review the `requirements <https://kcli.readthedocs.io/en/latest/#libvirt-hypervisor-requisites>`_
   and install/configure whatever is needed to meet them.
 - 2. Get the kcli image and create an alias for executing the kcli command::

     # podman pull quay.io/karmab/kcli
     # alias kcli='podman run --net host -it --rm --security-opt label=disable -v $HOME/.ssh:/root/.ssh -v $HOME/.kcli:/root/.kcli -v /var/lib/libvirt/images:/var/lib/libvirt/images -v /var/run/libvirt:/var/run/libvirt -v $PWD:/workdir -v /var/tmp:/ignitiondir quay.io/karmab/kcli'

.. note:: This assumes that /var/lib/libvirt/images is your default libvirt pool.
          Adjust if you are using a different path.

.. note:: Once you have used your kcli tool to create and use different labs, we
          suggest you stick to a given container tag and update your kcli alias.
          Why? kcli uses a rolling release model, and sticking to a specific
          container tag will improve overall stability.

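For example, to pin your alias to a specific tag (``<tag>`` below is a
placeholder for whatever tag you choose from the image registry), pull that tag
and reference it in the alias instead of the floating default::

  # podman pull quay.io/karmab/kcli:<tag>
  # alias kcli='podman run --net host -it --rm --security-opt label=disable -v $HOME/.ssh:/root/.ssh -v $HOME/.kcli:/root/.kcli -v /var/lib/libvirt/images:/var/lib/libvirt/images -v /var/run/libvirt:/var/run/libvirt -v $PWD:/workdir -v /var/tmp:/ignitiondir quay.io/karmab/kcli:<tag>'
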
Test your kcli installation:
----------------------------

See the kcli `basic usage workflow <https://kcli.readthedocs.io/en/latest/#basic-workflow>`_.

Create a Ceph lab cluster
-------------------------

In order to make this task simple, we are going to use a "plan".

A "plan" is a file where you can define a set of VMs with different settings.
You can define hardware parameters (cpu, memory, disks ...), operating system, and
it also allows you to automate the installation and configuration of any
software you want to have.

There is a `repository <https://github.com/karmab/kcli-plans>`_ with a collection of
plans that can be used for different purposes. And we have predefined plans to
install Ceph clusters using Ceph ansible or cephadm, so let's create our first Ceph
cluster using cephadm::

  # kcli create plan -u https://github.com/karmab/kcli-plans/blob/master/ceph/ceph_cluster.yml

This will create a set of three VMs using the plan file pointed to by the URL.
After a few minutes, let's check the cluster:

* Take a look at the VMs created::

    # kcli list vms

* Enter the bootstrap node::

    # kcli ssh ceph-node-00

* Take a look at the Ceph cluster installed::

    [centos@ceph-node-00 ~]$ sudo -i
    [root@ceph-node-00 ~]# cephadm version
    [root@ceph-node-00 ~]# cephadm shell
    [ceph: root@ceph-node-00 /]# ceph orch host ls

Create a Ceph cluster for development of mgr modules (Orchestrator and Dashboard)
------------------------------------------------------------------------------------------

The cephadm kcli plan (and cephadm) is prepared to do that.

The idea behind this method is to replace several python mgr folders in each of
the ceph daemons with the source code folders in your host machine.
This "trick" allows you to make changes in any orchestrator or dashboard
module and test them immediately (you only need to disable/enable the mgr module).

So, in order to create a ceph cluster for development purposes, you must use the
same cephadm plan but with a new parameter pointing to your Ceph source code folder::

  # kcli create plan -u https://github.com/karmab/kcli-plans/blob/master/ceph/ceph_cluster.yml -P ceph_dev_folder=/home/mycodefolder/ceph

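For example (a sketch of the disable/enable cycle mentioned above), after
changing a dashboard or orchestrator source file in your host's Ceph folder,
you could reload the module from the bootstrap node::

  # kcli ssh ceph-node-00
  [centos@ceph-node-00 ~]$ sudo cephadm shell
  [ceph: root@ceph-node-00 /]# ceph mgr module disable dashboard
  [ceph: root@ceph-node-00 /]# ceph mgr module enable dashboard
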
Ceph Dashboard development
--------------------------

The Ceph Dashboard module will not be loaded if you have not previously
generated the frontend bundle.

For now, in order to properly load the Ceph Dashboard module and to apply
frontend changes, you have to run "ng build" on your laptop::

  # Start local frontend build with watcher (in background):
  sudo dnf install -y nodejs
  cd <path-to-your-ceph-repo>
  cd src/pybind/mgr/dashboard/frontend
  sudo chown -R <your-user>:root dist node_modules
  NG_CLI_ANALYTICS=false npm ci
  npm run build -- --deleteOutputPath=false --watch &

After saving your changes, the frontend bundle will be built again.
When completed, you'll see::

  "Localized bundle generation complete."

Then you can reload your Dashboard browser tab.
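
If you are unsure which URL the Dashboard is being served on, the active mgr
reports it; for example, from a ``cephadm shell`` on the bootstrap node::

  # ceph mgr services

This should list the dashboard endpoint (typically ``https://<active-mgr-host>:8443``).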
Cephadm DiD (Docker in Docker) box development environment
==========================================================

As kcli has a long startup time, we created a faster alternative using
Docker inside Docker. This approach has its downsides too, as we have to
simulate the creation of osds and addition of devices with loopback devices.

Cephadm's DiD environment is a command which requires little setup. The setup
requires you to get the required docker images for what we call boxes and ceph.
A box is the first layer of docker containers, which can be either a seed or a
host. A seed is the main box which holds cephadm and where you bootstrap the
cluster. On the other hand, you have hosts with an ssh server set up so you can
add those hosts to the cluster. The second layer, managed by cephadm inside the
seed box, requires the ceph image.

.. warning:: This development environment is still experimental and can have unexpected
             behaviour. Please take a look at the road map and the known issues section
             to see the development progress.

Requirements
------------

* `docker-compose <https://docs.docker.com/compose/install/>`_
* lvm

Setup
-----

In order to set up Cephadm's box, run::

  cd src/cephadm/box
  sudo ln -sf "$PWD"/box.py /usr/bin/box
  sudo box -v cluster setup

.. note:: It is recommended to run box with verbose (-v).

After getting all the needed images we can create a simple cluster without osds and hosts with::

  sudo box -v cluster start

If you want to deploy the cluster with more osds and hosts::

  # 3 osds and 3 hosts by default
  sudo box -v cluster start --extended
  # explicitly change number of hosts and osds
  sudo box -v cluster start --extended --osds 5 --hosts 5

Without the extended option, explicitly adding either more hosts or osds won't change the state
of the cluster.

.. note:: Cluster start will try to set up the cluster even if cluster setup was not called.
.. note:: Osds are created with loopback devices and hence, sudo is needed to
          create loopback devices capable of holding osds.
.. note:: Each osd will require 5GiB of space.
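
The osds are backed by loopback devices created with ``sudo``. If you want to
inspect them, or confirm the host has enough free space before starting an
extended cluster, something along these lines should work on the host::

  sudo losetup -l
  df -h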

After bootstrapping the cluster you can go inside the seed box, in which you'll be
able to run cephadm commands::

  box -v cluster sh
  [root@8d52a7860245] cephadm --help
  ...


If you want to navigate to the dashboard you can find the ip address after running::

  docker ps
  docker inspect <container-id> | grep IPAddress

The address will be https://$IPADDRESS:8443

You can also find the hostname and ip of each box container with::

  sudo box cluster list

and you'll see something like::

  IP           Name          Hostname
  172.30.0.2   box_hosts_1   6283b7b51d91
  172.30.0.3   box_hosts_3   3dcf7f1b25a4
  172.30.0.4   box_seed_1    8d52a7860245
  172.30.0.5   box_hosts_2   c3c7b3273bf1

To remove the cluster and clean up run::

  box cluster down

If you just want to clean up the last cluster created run::

  box cluster cleanup

To check all available commands run::

  box --help


Known issues
------------

* If you get permission issues with cephadm because it cannot infer the keyring
  and configuration, please run cephadm as in this example::

    cephadm shell --config /etc/ceph/ceph.conf --keyring /etc/ceph/ceph.keyring

* Docker containers run with the --privileged flag enabled, which has been seen
  to make some computers log out.

* Sometimes when starting a cluster the osds won't get deployed because cephadm
  takes a while to update the state. If this happens, wait and call::

    box -v osd deploy --vg vg1

Road map
--------

* Run containers without --privileged
* Enable ceph-volume to mark loopback devices as a valid block device in
  the inventory.
* Make DiD ready to run dashboard CI tests (including cluster expansion).

Note regarding network calls from CLI handlers
==============================================

Executing any cephadm CLI commands like ``ceph orch ls`` will block the
mon command handler thread within the MGR, thus preventing any concurrent
CLI calls. Note that pressing ``^C`` will not resolve this situation,
as *only* the client will be aborted, but not execution of the command
within the orchestrator manager module itself. This means that cephadm will
be completely unresponsive until the execution of the CLI handler is
fully completed. Note that even ``ceph orch ps`` will not respond while
another handler is executing.

This means we should do very few synchronous calls to remote hosts.
As a guideline, cephadm should do at most ``O(1)`` network calls in CLI handlers.
Everything else should be done asynchronously in other threads, like ``serve()``.

Note regarding different variables used in the code
===================================================

* a ``service_type`` is something like mon, mgr, alertmanager etc. defined
  in ``ServiceSpec``
* a ``service_id`` is the name of the service. Some services don't have
  names.
* a ``service_name`` is ``<service_type>.<service_id>``
* a ``daemon_type`` is the same as the service_type, except for ingress,
  which has the haproxy and keepalived daemon types.
* a ``daemon_id`` is typically ``<service_id>.<hostname>.<random-string>``.
  (Not the case for e.g. OSDs. OSDs are always called OSD.N)
* a ``daemon_name`` is ``<daemon_type>.<daemon_id>``
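
As an illustration (the host and random suffix below are hypothetical), an MDS
daemon serving a filesystem called ``cephfs`` on host ``ceph-node-00`` would
break down roughly as follows::

  service_type:  mds
  service_id:    cephfs
  service_name:  mds.cephfs
  daemon_type:   mds
  daemon_id:     cephfs.ceph-node-00.abcdef
  daemon_name:   mds.cephfs.ceph-node-00.abcdef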