=======================
Developing with cephadm
=======================

There are several ways to develop with cephadm. Which you use depends
on what you're trying to accomplish.

vstart --cephadm
================

- Start a cluster with vstart, with cephadm configured
- Manage any additional daemons with cephadm
- Requires compiled ceph binaries

In this case, the mon and manager at a minimum are running in the usual
vstart way, not managed by cephadm. But cephadm is enabled and the local
host is added, so you can deploy additional daemons or add additional hosts.

This works well for developing cephadm itself, because any mgr/cephadm
or cephadm/cephadm code changes can be applied by kicking ceph-mgr
with ``ceph mgr fail x``. (When the mgr (re)starts, it loads the
cephadm/cephadm script into memory.)

::

  MON=1 MGR=1 OSD=0 MDS=0 ../src/vstart.sh -d -n -x --cephadm

- ``~/.ssh/id_dsa[.pub]`` is used as the cluster key. It is assumed that
  this key is authorized to ssh with no passphrase to root@`hostname`.
- cephadm does not try to manage any daemons started by vstart.sh (any
  nonzero number in the environment variables). No service spec is defined
  for mon or mgr.
- You'll see health warnings from cephadm about stray daemons--that's because
  the vstart-launched daemons aren't controlled by cephadm.
- The default image is ``quay.io/ceph-ci/ceph:main``, but you can change
  this by passing ``-o container_image=...`` or ``ceph config set global container_image ...``.
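
As a concrete sketch of the edit/reload loop described above (run from the
build dir; the mgr name ``x`` matches the ``ceph mgr fail x`` example)::

  # edit src/pybind/mgr/cephadm/... in your working tree, then:
  bin/ceph mgr fail x   # the restarted mgr reloads the changed module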


cstart and cpatch
=================

The ``cstart.sh`` script will launch a cluster using cephadm and put the
conf and keyring in your build dir, so that the ``bin/ceph ...`` CLI works
(just like with vstart). The ``ckill.sh`` script will tear it down.

- A unique but stable fsid is stored in ``fsid`` (in the build dir).
- The mon port is random, just like with vstart.
- The container image is ``quay.io/ceph-ci/ceph:$tag`` where $tag is
  the first 8 chars of the fsid.
- If the container image doesn't exist yet when you run cstart for the
  first time, it is built with cpatch.

There are a few advantages here:

- The cluster is a "normal" cephadm cluster that looks and behaves
  just like a user's cluster would. In contrast, vstart and teuthology
  clusters tend to be special in subtle (and not-so-subtle) ways (e.g.
  having ``lockdep`` turned on).

To start a test cluster::

  sudo ../src/cstart.sh

The last line of the output will be a line you can cut+paste to update
the container image. For instance::

  sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e

By default, cpatch will patch everything it can think of from the local
build dir into the container image. If you are working on a specific
part of the system, though, you can get away with smaller changes so that
cpatch runs faster. For instance::

  sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e --py

will update the mgr modules (minus the dashboard). Or::

  sudo ../src/script/cpatch -t quay.io/ceph-ci/ceph:8f509f4e --core

will do most binaries and libraries. Pass ``-h`` to cpatch for all options.

Once the container is updated, you can refresh/restart daemons by bouncing
them with::

  sudo systemctl restart ceph-`cat fsid`.target

When you're done, you can tear down the cluster with::

  sudo ../src/ckill.sh  # or,
  sudo ../src/cephadm/cephadm rm-cluster --force --fsid `cat fsid`

cephadm bootstrap --shared_ceph_folder
======================================

Cephadm can also be used directly without compiled ceph binaries.

Run cephadm like so::

  sudo ./cephadm bootstrap --mon-ip 127.0.0.1 \
    --ssh-private-key /home/<user>/.ssh/id_rsa \
    --skip-mon-network \
    --skip-monitoring-stack --single-host-defaults \
    --skip-dashboard \
    --shared_ceph_folder /home/<user>/path/to/ceph/

- ``~/.ssh/id_rsa`` is used as the cluster key. It is assumed that
  this key is authorized to ssh with no passphrase to root@`hostname`.

Source code changes made in the ``pybind/mgr/`` directory then
require a daemon restart to take effect.
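
For example, after changing an mgr module you can bounce the active mgr
(a sketch; ``ceph mgr fail`` without an argument fails over the currently
active mgr)::

  sudo ./cephadm shell -- ceph mgr fail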

Kcli: a virtualization management tool to make orchestrator development easier
================================================================================

`Kcli <https://github.com/karmab/kcli>`_ is meant to interact with existing
virtualization providers (libvirt, KubeVirt, oVirt, OpenStack, VMware vSphere,
GCP and AWS) and to easily deploy and customize VMs from cloud images.

It allows you to set up an environment with several vms with your preferred
configuration (memory, cpus, disks) and OS flavor.

Main advantages:
----------------

- Fast. Typically you can have a completely new Ceph cluster ready to debug
  and develop orchestrator features in less than 5 minutes.
- "Close to production" lab. The resulting lab is close to "real" clusters
  in QE labs or even production. It makes it easy to test "real things" in
  an almost "real" environment.
- Safe and isolated. It does not depend on the things you have installed on
  your machine, and the vms are isolated from your environment.
- Easy "dev" environment. For non-compiled software pieces, for example any
  mgr module, it is an environment that allows you to test your changes
  interactively.

Installation:
-------------

Complete documentation is in `kcli installation <https://kcli.readthedocs.io/en/latest/#installation>`_,
but we suggest using the container image approach.

So things to do:

1. Review the `requirements <https://kcli.readthedocs.io/en/latest/#libvirt-hypervisor-requisites>`_
   and install/configure whatever is needed to meet them.
2. Get the kcli image and create an alias for executing the kcli command::

     # podman pull quay.io/karmab/kcli
     # alias kcli='podman run --net host -it --rm --security-opt label=disable -v $HOME/.ssh:/root/.ssh -v $HOME/.kcli:/root/.kcli -v /var/lib/libvirt/images:/var/lib/libvirt/images -v /var/run/libvirt:/var/run/libvirt -v $PWD:/workdir -v /var/tmp:/ignitiondir quay.io/karmab/kcli'

.. note:: This assumes that ``/var/lib/libvirt/images`` is your default
   libvirt pool. Adjust if you are using a different path.

.. note:: Once you have used your kcli tool to create and use different labs, we
   suggest you stick to a given container tag and update your kcli alias.
   Why? kcli uses a rolling release model, and sticking to a specific
   container tag will improve overall stability.
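
For instance, a pinned alias is the same command as above with the image
reference changed to ``quay.io/karmab/kcli:<tag>``, where ``<tag>`` is a
placeholder for whichever release tag you have validated::

  # alias kcli='podman run --net host -it --rm ... quay.io/karmab/kcli:<tag>'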

Test your kcli installation:
----------------------------

See the kcli `basic usage workflow <https://kcli.readthedocs.io/en/latest/#basic-workflow>`_.

Create a Ceph lab cluster
-------------------------

In order to make this task simple, we are going to use a "plan".

A "plan" is a file where you can define a set of vms with different settings.
You can define hardware parameters (cpu, memory, disks, ...) and operating
system, and it also allows you to automate the installation and configuration
of any software you want to have.
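
If you want to inspect a plan before running it, you can fetch the raw file;
for example, for the cephadm plan used below (the raw URL is just the GitHub
"blob" URL rewritten to its raw form)::

  # curl -L https://raw.githubusercontent.com/karmab/kcli-plans/master/ceph/ceph_cluster.yml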

There is a `repository <https://github.com/karmab/kcli-plans>`_ with a collection of
plans that can be used for different purposes. We have predefined plans to
install Ceph clusters using Ceph ansible or cephadm, so let's create our first Ceph
cluster using cephadm::

  # kcli create plan -u https://github.com/karmab/kcli-plans/blob/master/ceph/ceph_cluster.yml

This will create a set of three vms using the plan file pointed to by the url.
After a few minutes, let's check the cluster:

* Take a look at the vms created::

    # kcli list vms

* Enter the bootstrap node::

    # kcli ssh ceph-node-00

* Take a look at the ceph cluster installed::

    [centos@ceph-node-00 ~]$ sudo -i
    [root@ceph-node-00 ~]# cephadm version
    [root@ceph-node-00 ~]# cephadm shell
    [ceph: root@ceph-node-00 /]# ceph orch host ls

Create a Ceph cluster for developing mgr modules (Orchestrators and Dashboard)
--------------------------------------------------------------------------------

The cephadm kcli plan (and cephadm) are prepared to do that.

The idea behind this method is to replace several python mgr folders in each of
the ceph daemons with the source code folders in your host machine.
This "trick" allows you to make changes in any orchestrator or dashboard
module and test them immediately (you only need to disable/enable the mgr
module).
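
For example, to reload the dashboard module after a change (run from inside
``cephadm shell``)::

  [ceph: root@ceph-node-00 /]# ceph mgr module disable dashboard
  [ceph: root@ceph-node-00 /]# ceph mgr module enable dashboard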

So in order to create a ceph cluster for development purposes you must use the
same cephadm plan but with a new parameter pointing to your Ceph source code folder::

  # kcli create plan -u https://github.com/karmab/kcli-plans/blob/master/ceph/ceph_cluster.yml -P ceph_dev_folder=/home/mycodefolder/ceph

Ceph Dashboard development
--------------------------

The Ceph Dashboard module will not be loaded if you have not previously
generated the frontend bundle.

For now, in order to properly load the Ceph Dashboard module and to apply
frontend changes you have to run ``ng build`` on your laptop::

  # Start local frontend build with watcher (in background):
  sudo dnf install -y nodejs
  cd <path-to-your-ceph-repo>
  cd src/pybind/mgr/dashboard/frontend
  sudo chown -R <your-user>:root dist node_modules
  NG_CLI_ANALYTICS=false npm ci
  npm run build -- --deleteOutputPath=false --watch &

After saving your changes, the frontend bundle will be built again.
When completed, you'll see::

  "Localized bundle generation complete."

Then you can reload your Dashboard browser tab.

Cephadm DiD (Docker in Docker) box development environment
==========================================================

As kcli has a long startup time, we created a faster alternative using
Docker inside Docker. This approach has its downsides too, as we have to
simulate the creation of osds and the addition of devices with loopback
devices.

Cephadm's DiD environment is a command which requires little setup. The setup
requires you to get the required docker images for what we call boxes and ceph.
A box is the first layer of docker containers, which can be either a seed or a
host. A seed is the main box, which holds cephadm and where you bootstrap the
cluster. On the other hand, you have hosts with an ssh server set up so you can
add those hosts to the cluster. The second layer, managed by cephadm inside the
seed box, requires the ceph image.
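
A rough sketch of the layers described above (illustrative only)::

  host machine (docker)
  └── boxes (first layer)
      ├── seed: holds cephadm; where you bootstrap the cluster
      └── hosts: run an ssh server so they can be added to the cluster
  inside the boxes: ceph daemon containers (second layer, managed by cephadm)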

.. warning:: This development environment is still experimental and can have
   unexpected behaviour. Please take a look at the road map and the known
   issues section to see the state of development progress.

Requirements
------------

* `docker-compose <https://docs.docker.com/compose/install/>`_
* lvm

Setup
-----

In order to set up Cephadm's box, run::

  cd src/cephadm/box
  sudo ln -sf "$PWD"/box.py /usr/bin/box
  sudo box -v cluster setup

.. note:: It is recommended to run box with verbose mode (``-v``).

After getting all the needed images we can create a simple cluster without
osds and hosts with::

  sudo box -v cluster start

If you want to deploy the cluster with more osds and hosts::

  # 3 osds and 3 hosts by default
  sudo box -v cluster start --extended
  # explicitly change the number of hosts and osds
  sudo box -v cluster start --extended --osds 5 --hosts 5

Without the extended option, explicitly adding either more hosts or osds won't
change the state of the cluster.

.. note:: ``cluster start`` will try to run the setup even if ``cluster setup``
   was not called.
.. note:: Osds are created with loopback devices, hence sudo is needed to
   create loopback devices capable of holding osds.
.. note:: Each osd will require 5GiB of space.
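
.. note:: If you want to inspect the loopback devices backing the osds, you
   can use standard util-linux tooling (not part of box itself), e.g.
   ``sudo losetup -l``.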

After bootstrapping the cluster you can go inside the seed box, in which
you'll be able to run cephadm commands::

  box -v cluster sh
  [root@8d52a7860245] cephadm --help
  ...


If you want to navigate to the dashboard, you can find the ip address after
running::

  docker ps
  docker inspect <container-id> | grep IPAddress

The address will be https://$IPADDRESS:8443

You can also find the hostname and ip of each box container with::

  sudo box cluster list

and you'll see something like::

  IP           Name          Hostname
  172.30.0.2   box_hosts_1   6283b7b51d91
  172.30.0.3   box_hosts_3   3dcf7f1b25a4
  172.30.0.4   box_seed_1    8d52a7860245
  172.30.0.5   box_hosts_2   c3c7b3273bf1

To remove the cluster and clean up, run::

  box cluster down

If you just want to clean up the last cluster created, run::

  box cluster cleanup

To check all available commands, run::

  box --help


Known issues
------------

* If you get permission issues with cephadm because it cannot infer the keyring
  and configuration, please run cephadm like in this example::

    cephadm shell --config /etc/ceph/ceph.conf --keyring /etc/ceph/ceph.keyring

* Docker containers run with the --privileged flag enabled, which has been seen
  to make some computers log out.

* Sometimes when starting a cluster the osds won't get deployed because cephadm
  takes a while to update the state. If this happens, wait and call::

    box -v osd deploy --vg vg1

Road map
--------

* Run containers without --privileged
* Enable ceph-volume to mark loopback devices as valid block devices in
  the inventory.
* Make DiD ready to run dashboard CI tests (including cluster expansion).

Note regarding network calls from CLI handlers
==============================================

Executing any cephadm CLI command like ``ceph orch ls`` will block the
mon command handler thread within the MGR, thus preventing any concurrent
CLI calls. Note that pressing ``^C`` will not resolve this situation,
as *only* the client will be aborted, not the execution of the command
within the orchestrator manager module itself. This means that cephadm will
be completely unresponsive until the execution of the CLI handler has
fully completed. Note that even ``ceph orch ps`` will not respond while
another handler is executing.

This means we should make very few synchronous calls to remote hosts.
As a guideline, cephadm should do at most ``O(1)`` network calls in CLI handlers.
Everything else should be done asynchronously in other threads, like ``serve()``.

Note regarding different variables used in the code
===================================================

* a ``service_type`` is something like mon, mgr, or alertmanager, as defined
  in ``ServiceSpec``
* a ``service_id`` is the name of the service. Some services don't have
  names.
* a ``service_name`` is ``<service_type>.<service_id>``
* a ``daemon_type`` is the same as the service_type, except for ingress,
  which has the haproxy and keepalived daemon types.
* a ``daemon_id`` is typically ``<service_id>.<hostname>.<random-string>``.
  (This is not the case for e.g. OSDs, which are always called OSD.N.)
* a ``daemon_name`` is ``<daemon_type>.<daemon_id>``
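
For example (a hypothetical illustration using the definitions above): an RGW
service with service_id ``foo`` has service_name ``rgw.foo``, and one of its
daemons might have daemon_id ``foo.myhost.lqwjts`` and daemon_name
``rgw.foo.myhost.lqwjts``.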