.. _testing-integration-tests:

Testing - Integration Tests
===========================

Ceph has two types of tests: :ref:`make check <make-check>` tests and integration tests.
When a test requires multiple machines, root access, or a long running time
(for example, to simulate a realistic Ceph deployment), it is deemed to be an
integration test. Integration tests are organized into "suites", which are
defined in the `ceph/qa sub-directory`_ and run with the ``teuthology-suite``
command.

The ``teuthology-suite`` command is part of the `teuthology framework`_.
The sections that follow provide a detailed introduction to that framework
from the perspective of a beginning Ceph developer.

Teuthology consumes packages
----------------------------

It may take some time to understand the significance of this fact, but it
is `very` significant. It means that automated tests can be conducted on
multiple platforms using the same packages (RPM, DEB) that can be
installed on any machine running those platforms.

Teuthology has a `list of platforms that it supports
<https://github.com/ceph/ceph/tree/master/qa/distros/supported>`_ (as
of September 2020 the list consisted of "RHEL/CentOS 8" and "Ubuntu 18.04"). It
expects to be provided pre-built Ceph packages for these platforms.
Teuthology deploys these platforms on machines (bare-metal or
cloud-provisioned), installs the packages on them, and deploys Ceph
clusters on them - all as called for by the test.

The Nightlies
-------------

A number of integration tests are run on a regular basis in the `Sepia
lab`_ against the official Ceph repositories (on the ``master`` development
branch and the stable branches). Traditionally, these tests are called "the
nightlies" because the Ceph core developers used to live and work in
the same time zone and from their perspective the tests were run overnight.

The results of the nightlies are published at http://pulpito.ceph.com/. The
nick of the developer who scheduled a run appears in the test results URL and
in the first column of the Pulpito dashboard. The results are also reported on
the `ceph-qa mailing list <https://ceph.com/irc/>`_ for analysis.

Testing Priority
----------------

The ``teuthology-suite`` command includes an almost mandatory option ``-p <N>``,
which specifies the priority of the jobs submitted to the queue. The lower
the value of ``N``, the higher the priority. The option is almost mandatory
because the default is ``1000``, which matches the priority of the nightlies.
Because of the volume of testing, nightlies are often cancelled before they
finish, so jobs submitted at the same priority may never complete. Therefore,
it is common to select a priority lower than 1000.

Job priority should be selected based on the following recommendations:

* **Priority < 10:** Use this if the sky is falling and some group of tests
  must be run ASAP.

* **10 <= Priority < 50:** Use this if your tests are urgent and blocking
  other important development.

* **50 <= Priority < 75:** Use this if you are testing a particular
  feature/fix and running fewer than about 25 jobs. This range can also be
  used for urgent release testing.

* **75 <= Priority < 100:** Tech Leads will regularly schedule integration
  tests with this priority to verify pull requests against master.

* **100 <= Priority < 150:** This priority is to be used for QE validation of
  point releases.

* **150 <= Priority < 200:** Use this priority for 100 jobs or fewer of a
  particular feature/fix that you'd like results on in a day or so.

* **200 <= Priority < 1000:** Use this priority for large test runs that can
  be done over the course of a week.
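
The recommendations above amount to a lookup from a priority value to a band.
The following is a minimal sketch of that lookup. ``PRIORITY_BANDS`` and
``describe_priority`` are hypothetical names used only for illustration; they
are not part of teuthology.

.. code-block:: python

    # Hypothetical helper: each entry is (exclusive upper bound, recommendation),
    # mirroring the bands listed in this section.
    PRIORITY_BANDS = [
        (10, "emergency: a group of tests must run as soon as possible"),
        (50, "urgent tests that block other important development"),
        (75, "one feature/fix with fewer than about 25 jobs, or urgent release testing"),
        (100, "Tech Lead runs that verify pull requests against master"),
        (150, "QE validation of point releases"),
        (200, "100 jobs or fewer, results wanted in a day or so"),
        (1000, "large test runs that can take about a week"),
    ]


    def describe_priority(priority):
        """Return the recommendation whose band contains ``priority``."""
        if priority < 0:
            raise ValueError("priority must be non-negative")
        for upper_bound, meaning in PRIORITY_BANDS:
            if priority < upper_bound:
                return meaning
        # 1000 is the default and matches the priority of the nightlies.
        return "default priority or lower: competes with the nightlies"


    print(describe_priority(60))

Under these assumptions, a run of about 20 jobs for a single fix would be
scheduled with a value such as ``-p 60``.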

If you don't know how many jobs a ``teuthology-suite`` command would
trigger, run it with ``--dry-run`` to get a count first, and then issue the
command again without ``--dry-run`` and with ``-p`` set to an appropriate
number.

To skip the priority check, use ``--force-priority``. Out of consideration
for other developers who also need to run tests, please use it only in
emergencies.

Suites Inventory
----------------

The ``suites`` directory of the `ceph/qa sub-directory`_ contains
all the integration tests, for all the Ceph components.

`ceph-deploy <https://github.com/ceph/ceph/tree/master/qa/suites/ceph-deploy>`_
  install a Ceph cluster with ``ceph-deploy`` (:ref:`ceph-deploy man page <ceph-deploy>`)

`dummy <https://github.com/ceph/ceph/tree/master/qa/suites/dummy>`_
  get a machine, do nothing and return success (commonly used to
  verify the :ref:`testing-integration-tests` infrastructure works as expected)

`fs <https://github.com/ceph/ceph/tree/master/qa/suites/fs>`_
  test CephFS mounted using kernel and FUSE clients, also with multiple MDSs.

`krbd <https://github.com/ceph/ceph/tree/master/qa/suites/krbd>`_
  test the RBD kernel module

`powercycle <https://github.com/ceph/ceph/tree/master/qa/suites/powercycle>`_
  verify that the Ceph cluster behaves correctly when machines are powered
  off and on again

`rados <https://github.com/ceph/ceph/tree/master/qa/suites/rados>`_
  run Ceph clusters including OSDs and MONs, under various conditions of
  stress

`rbd <https://github.com/ceph/ceph/tree/master/qa/suites/rbd>`_
  run RBD tests using actual Ceph clusters, with and without qemu

`rgw <https://github.com/ceph/ceph/tree/master/qa/suites/rgw>`_
  run RGW tests using actual Ceph clusters

`smoke <https://github.com/ceph/ceph/tree/master/qa/suites/smoke>`_
  run tests that exercise the Ceph API with an actual Ceph cluster

`teuthology <https://github.com/ceph/ceph/tree/master/qa/suites/teuthology>`_
  verify that teuthology can run integration tests, with and without OpenStack

`upgrade <https://github.com/ceph/ceph/tree/master/qa/suites/upgrade>`_
  for various versions of Ceph, verify that upgrades can happen
  without disrupting an ongoing workload

.. _`ceph-deploy man page`: ../../man/8/ceph-deploy

teuthology-describe-tests
-------------------------

In February 2016, a new feature called ``teuthology-describe-tests`` was
added to the `teuthology framework`_ to facilitate documentation and better
understanding of integration tests (`feature announcement
<http://article.gmane.org/gmane.comp.file-systems.ceph.devel/29287>`_).

The upshot is that tests can be documented by embedding ``meta:``
annotations in the yaml files used to define the tests. The results can be
seen in the `ceph-qa-suite wiki
<http://tracker.ceph.com/projects/ceph-qa-suite/wiki/>`_.

Many yaml files have yet to be annotated. Developers are encouraged to
improve the documentation, in terms of both coverage and quality.

How integration tests are run
-----------------------------

Given that - as a new Ceph developer - you will typically not have access
to the `Sepia lab`_, you may rightly ask how you can run the integration
tests in your own environment.

One option is to set up a teuthology cluster on bare metal. Though this is
a non-trivial task, it `is` possible. Here are `some notes
<http://docs.ceph.com/teuthology/docs/LAB_SETUP.html>`_ to get you started
if you decide to go this route.

If you have access to an OpenStack tenant, you have another option: the
`teuthology framework`_ has an OpenStack backend, which is documented `here
<https://github.com/dachary/teuthology/tree/openstack#openstack-backend>`__.
This OpenStack backend can build packages from a given git commit or
branch, provision VMs, install the packages and run integration tests
on those VMs. This process is controlled using a tool called
``ceph-workbench ceph-qa-suite``. This tool also automates publishing of
test results at http://teuthology-logs.public.ceph.com.

Running integration tests on your code contributions and publishing the
results allows reviewers to verify that changes to the code base do not
cause regressions, or to analyze test failures when they do occur.

Every teuthology cluster, whether bare-metal or cloud-provisioned, has a
so-called "teuthology machine" from which test suites are triggered using the
``teuthology-suite`` command.

A detailed and up-to-date description of each `teuthology-suite`_ option is
available by running the following command on the teuthology machine:

.. prompt:: bash $

   teuthology-suite --help

.. _teuthology-suite: http://docs.ceph.com/teuthology/docs/teuthology.suite.html

How integration tests are defined
---------------------------------

Integration tests are defined by yaml files found in the ``suites``
subdirectory of the `ceph/qa sub-directory`_ and implemented by python
code found in the ``tasks`` subdirectory. Some tests ("standalone tests")
are defined in a single yaml file, while other tests are defined by a
directory tree containing yaml files that are combined, at runtime, into a
larger yaml file.

Reading a standalone test
-------------------------

Let us first examine a standalone test, or "singleton".

Here is a commented example using the integration test
`rados/singleton/all/admin-socket.yaml
<https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/admin-socket.yaml>`_

.. code-block:: yaml

    roles:
    - - mon.a
      - osd.0
      - osd.1
    tasks:
    - install:
    - ceph:
    - admin_socket:
        osd.0:
          version:
          git_version:
          help:
          config show:
          config set filestore_dump_file /tmp/foo:
          perf dump:
          perf schema:

The ``roles`` array determines the composition of the cluster (how
many MONs, OSDs, etc.) on which this test is designed to run, as well
as how these roles will be distributed over the machines in the
testing cluster. In this case, there is only one element in the
top-level array: therefore, only one machine is allocated to the
test. The nested array declares that this machine shall run a MON with
id ``a`` (that is the ``mon.a`` in the list of roles) and two OSDs
(``osd.0`` and ``osd.1``).
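
To make the reading of the ``roles`` array concrete, the following sketch
counts the machines and the daemons of each type for the example above. It
assumes the yaml has already been loaded into nested Python lists;
``summarize_roles`` is a hypothetical helper, not a teuthology function.

.. code-block:: python

    from collections import Counter

    # The ``roles`` array of admin-socket.yaml as nested Python lists:
    # one inner list per machine.
    roles = [["mon.a", "osd.0", "osd.1"]]


    def summarize_roles(roles):
        """Return (machine count, Counter of daemon types) for a roles array."""
        daemon_types = Counter(
            role.split(".", 1)[0] for machine in roles for role in machine
        )
        return len(roles), daemon_types


    machines, daemons = summarize_roles(roles)
    print(machines, dict(daemons))  # 1 {'mon': 1, 'osd': 2}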

The body of the test is in the ``tasks`` array: each element is
evaluated in order, causing the corresponding python file found in the
``tasks`` subdirectory of the `teuthology repository`_ or
`ceph/qa sub-directory`_ to be run. "Running" in this case means calling
the ``task()`` function defined in that file.
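
The in-order evaluation described above can be pictured with a much simplified
sketch. It does not reproduce how teuthology locates or calls tasks: the
``TASK_REGISTRY`` dictionary, the stand-in task functions and their
single-argument signature are assumptions made only for illustration.

.. code-block:: python

    # Stand-ins for the python files found in the ``tasks`` subdirectories.
    # Real tasks take more arguments and do real work.
    def install(config):
        return f"install({config})"


    def ceph(config):
        return f"ceph({config})"


    # Hypothetical lookup table from task name to its function.
    TASK_REGISTRY = {"install": install, "ceph": ceph}


    def run_tasks(tasks):
        """Call the function registered for each entry of ``tasks``, in order."""
        results = []
        for entry in tasks:
            # Each entry is a one-key mapping: {task_name: task_config}.
            (name, config), = entry.items()
            results.append(TASK_REGISTRY[name](config))
        return results


    print(run_tasks([{"install": None}, {"ceph": None}]))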

In this case, the `install
<https://github.com/ceph/teuthology/blob/master/teuthology/task/install/__init__.py>`_
task comes first. It installs the Ceph packages on each machine (as
defined by the ``roles`` array). A full description of the ``install``
task is `found in the python file
<https://github.com/ceph/teuthology/blob/master/teuthology/task/install/__init__.py>`_
(search for "def task").

The ``ceph`` task, which is documented `here
<https://github.com/ceph/ceph/blob/master/qa/tasks/ceph.py>`__ (again,
search for "def task"), starts OSDs and MONs (and possibly MDSs as well)
as required by the ``roles`` array. In this example, it will start one MON
(``mon.a``) and two OSDs (``osd.0`` and ``osd.1``), all on the same
machine. Control moves to the next task when the Ceph cluster reaches
``HEALTH_OK`` state.

The next task is ``admin_socket`` (`source code
<https://github.com/ceph/ceph/blob/master/qa/tasks/admin_socket.py>`_).
The parameter of the ``admin_socket`` task (and any other task) is a
structure which is interpreted as documented in the task. In this example
the parameter is a set of commands to be sent to the admin socket of
``osd.0``. The task verifies that each of them returns success (i.e.
exit code zero).

This test can be run with

.. prompt:: bash $

   teuthology-suite --machine-type smithi --suite rados/singleton/all/admin-socket.yaml fs/ext4.yaml

Test descriptions
-----------------

Each test has a "test description", which is similar to a directory path,
but not the same. In the case of a standalone test, like the one in
`Reading a standalone test`_, the test description is identical to the
relative path (starting from the ``suites/`` directory of the
`ceph/qa sub-directory`_) of the yaml file defining the test.

Much more commonly, tests are defined not by a single yaml file, but by a
`directory tree of yaml files`. At runtime, the tree is walked and all yaml
files (facets) are combined into larger yaml "programs" that define the
tests. A full listing of the yaml defining the test is included at the
beginning of every test log.

In these cases, the description of each test consists of the
subdirectory under `suites/
<https://github.com/ceph/ceph/tree/master/qa/suites>`_ containing the
yaml facets, followed by an expression in curly braces (``{}``) consisting of
a list of yaml facets in order of concatenation. For instance the
test description::

  ceph-deploy/basic/{distros/centos_7.0.yaml tasks/ceph-deploy.yaml}

signifies the concatenation of two files:

* ceph-deploy/basic/distros/centos_7.0.yaml
* ceph-deploy/basic/tasks/ceph-deploy.yaml
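
The mapping from a test description back to its yaml files can be sketched as
follows. ``facet_paths`` is a hypothetical helper written for this guide; it
handles only the two forms of description shown in this section.

.. code-block:: python

    def facet_paths(description):
        """Expand a test description into the relative paths of its yaml files."""
        if "{" not in description:
            # Standalone test: the description is the path of the yaml file.
            return [description]
        suite_dir, _, rest = description.partition("/{")
        facets = rest.rstrip("}").split()
        return [f"{suite_dir}/{facet}" for facet in facets]


    for path in facet_paths(
        "ceph-deploy/basic/{distros/centos_7.0.yaml tasks/ceph-deploy.yaml}"
    ):
        print(path)

The two printed paths are relative to the ``suites/`` directory, in order of
concatenation.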

How tests are built from directories
------------------------------------

As noted in the previous section, most tests are not defined in a single
yaml file, but rather as a `combination` of files collected from a
directory tree within the ``suites/`` subdirectory of the `ceph/qa sub-directory`_.

The set of all tests defined by a given subdirectory of ``suites/`` is
called an "integration test suite", or a "teuthology suite".

Combination of yaml facets is controlled by special files (``%`` and
``+``) that are placed within the directory tree and can be thought of as
operators. The ``%`` file is the "convolution" operator and ``+``
signifies concatenation.

Convolution operator
^^^^^^^^^^^^^^^^^^^^

The convolution operator, implemented as an empty file called ``%``, tells
teuthology to construct a test matrix from yaml facets found in
subdirectories below the directory containing the operator.

For example, the `ceph-deploy suite
<https://github.com/ceph/ceph/tree/master/qa/suites/ceph-deploy/>`_ is
defined by the ``suites/ceph-deploy/`` tree. For the purposes of this
example, assume that the tree has the following structure

.. code-block:: none

    qa/suites/ceph-deploy
    └── basic
        ├── %
        ├── distros
        │   ├── centos_7.0.yaml
        │   └── ubuntu_16.04.yaml
        └── tasks
            └── ceph-deploy.yaml

This is interpreted as a 2x1 matrix consisting of two tests:

1. ceph-deploy/basic/{distros/centos_7.0.yaml tasks/ceph-deploy.yaml}
2. ceph-deploy/basic/{distros/ubuntu_16.04.yaml tasks/ceph-deploy.yaml}

i.e. the concatenation of centos_7.0.yaml and ceph-deploy.yaml and
the concatenation of ubuntu_16.04.yaml and ceph-deploy.yaml, respectively.
In human terms, this means that the task found in ``ceph-deploy.yaml`` is
intended to run on both CentOS 7.0 and Ubuntu 16.04.

Without the ``%`` file, the ``ceph-deploy`` tree would be interpreted as
three standalone tests:

* ceph-deploy/basic/distros/centos_7.0.yaml
* ceph-deploy/basic/distros/ubuntu_16.04.yaml
* ceph-deploy/basic/tasks/ceph-deploy.yaml

(which would of course be wrong in this case).
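
The 2x1 matrix above is a Cartesian product that takes one file from each
subdirectory. The sketch below reproduces it with ``itertools.product``. It is
an illustration under simplifying assumptions (a flat dictionary instead of a
directory walk, subdirectories taken in sorted order), not the teuthology
implementation, and ``convolve`` is a hypothetical name.

.. code-block:: python

    from itertools import product


    def convolve(suite_dir, facet_dirs):
        """Build one test description per combination of one file per subdirectory."""
        names = sorted(facet_dirs)
        tests = []
        for combo in product(*(facet_dirs[name] for name in names)):
            facets = " ".join(f"{name}/{f}" for name, f in zip(names, combo))
            tests.append("%s/{%s}" % (suite_dir, facets))
        return tests


    # Contents of the two subdirectories that sit next to the ``%`` file.
    basic = {
        "distros": ["centos_7.0.yaml", "ubuntu_16.04.yaml"],
        "tasks": ["ceph-deploy.yaml"],
    }
    for description in convolve("ceph-deploy/basic", basic):
        print(description)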

Referring to the `ceph/qa sub-directory`_, you will notice that the
``centos_7.0.yaml`` and ``ubuntu_16.04.yaml`` files in the
``suites/ceph-deploy/basic/distros/`` directory are implemented as symlinks.
By using symlinks instead of copying, a single file can appear in multiple
suites. This eases the maintenance of the test framework as a whole.

All the tests generated from the ``suites/ceph-deploy/`` directory tree
(also known as the "ceph-deploy suite") can be run with

.. prompt:: bash $

   teuthology-suite --machine-type smithi --suite ceph-deploy

An individual test from the `ceph-deploy suite`_ can be run by adding the
``--filter`` option

.. prompt:: bash $

   teuthology-suite \
      --machine-type smithi \
      --suite ceph-deploy/basic \
      --filter 'ceph-deploy/basic/{distros/ubuntu_16.04.yaml tasks/ceph-deploy.yaml}'

.. note:: To run a standalone test like the one in `Reading a standalone
   test`_, ``--suite`` alone is sufficient. If you want to run a single
   test from a suite that is defined as a directory tree, ``--suite`` must
   be combined with ``--filter``. This is because the ``--suite`` option
   understands POSIX relative paths only.

Concatenation operator
^^^^^^^^^^^^^^^^^^^^^^

For even greater flexibility in sharing yaml files between suites, the
special file plus (``+``) can be used to concatenate files within a
directory. For instance, consider the following abridged view of the
`suites/rbd/thrash
<https://github.com/ceph/ceph/tree/master/qa/suites/rbd/thrash>`_
tree

.. code-block:: none

    qa/suites/rbd/thrash
    ├── %
    ├── clusters
    │   ├── +
    │   ├── fixed-2.yaml
    │   └── openstack.yaml
    └── workloads
        ├── rbd_api_tests_copy_on_read.yaml
        └── rbd_api_tests.yaml

This creates two tests:

* rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}
* rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests.yaml}

Because the ``clusters/`` subdirectory contains the special file plus
(``+``), all the other files in that subdirectory (``fixed-2.yaml`` and
``openstack.yaml`` in this case) are concatenated together
and treated as a single file. Without the special file plus, they would
have been convolved with the files from the workloads directory to create
a 2x2 matrix:

* rbd/thrash/{clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}
* rbd/thrash/{clusters/openstack.yaml workloads/rbd_api_tests.yaml}
* rbd/thrash/{clusters/fixed-2.yaml workloads/rbd_api_tests_copy_on_read.yaml}
* rbd/thrash/{clusters/fixed-2.yaml workloads/rbd_api_tests.yaml}
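
The effect of the ``+`` file can be sketched by treating a subdirectory that
contains it as a single column entry of the matrix. As with any sketch in this
guide, this is an illustration rather than the teuthology implementation:
``build_matrix`` and the ``(files, has_plus)`` input format are assumptions.

.. code-block:: python

    from itertools import product


    def build_matrix(suite_dir, facet_dirs):
        """``facet_dirs`` maps a subdirectory name to (file names, has_plus_file)."""
        columns = []
        for name in sorted(facet_dirs):
            files, has_plus = facet_dirs[name]
            paths = [f"{name}/{f}" for f in files]
            # With ``+`` the whole subdirectory is one concatenated facet;
            # without it every file is a separate alternative.
            columns.append([" ".join(paths)] if has_plus else paths)
        return ["%s/{%s}" % (suite_dir, " ".join(combo)) for combo in product(*columns)]


    thrash = {
        "clusters": (["fixed-2.yaml", "openstack.yaml"], True),
        "workloads": (["rbd_api_tests_copy_on_read.yaml", "rbd_api_tests.yaml"], False),
    }
    for description in build_matrix("rbd/thrash", thrash):
        print(description)

Changing ``True`` to ``False`` for ``clusters`` in this sketch yields the four
combinations of the 2x2 matrix instead of two tests.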

The ``clusters/fixed-2.yaml`` file is shared among many suites to
define the following ``roles``

.. code-block:: yaml

    roles:
    - [mon.a, mon.c, osd.0, osd.1, osd.2, client.0]
    - [mon.b, osd.3, osd.4, osd.5, client.1]

The ``rbd/thrash`` suite as defined above, consisting of two tests,
can be run with

.. prompt:: bash $

   teuthology-suite --machine-type smithi --suite rbd/thrash

A single test from the rbd/thrash suite can be run by adding the
``--filter`` option

.. prompt:: bash $

   teuthology-suite \
      --machine-type smithi \
      --suite rbd/thrash \
      --filter 'rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}'

Filtering tests by their description
------------------------------------

When a few jobs fail and need to be run again, the ``--filter`` option
can be used to select tests with a matching description. For instance, if the
``rados`` suite fails the `all/peer.yaml
<https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/peer.yaml>`_
test, the following will only run the tests that contain this file

.. prompt:: bash $

   teuthology-suite --machine-type smithi --suite rados --filter all/peer.yaml

The ``--filter-out`` option does the opposite (it matches tests that do `not`
contain a given string), and can be combined with the ``--filter`` option.

Both ``--filter`` and ``--filter-out`` take a comma-separated list of strings
(which means the comma character is implicitly forbidden in filenames found in
the `ceph/qa sub-directory`_). For instance

.. prompt:: bash $

   teuthology-suite --machine-type smithi --suite rados --filter all/peer.yaml,all/rest-api.yaml

will run tests that contain either
`all/peer.yaml <https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/peer.yaml>`_
or
`all/rest-api.yaml <https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/rest-api.yaml>`_

Each string is looked up anywhere in the test description and has to
be an exact match: the strings are not regular expressions.
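
The matching rules described in this section (comma-separated strings, plain
substring matching, ``--filter-out`` applied together with ``--filter``) can be
sketched as follows. ``select_tests`` is a hypothetical function and the test
descriptions are made up for the example.

.. code-block:: python

    def select_tests(descriptions, filter_in=None, filter_out=None):
        """Keep descriptions that match ``filter_in`` and do not match ``filter_out``."""
        in_terms = filter_in.split(",") if filter_in else []
        out_terms = filter_out.split(",") if filter_out else []
        selected = []
        for description in descriptions:
            # Plain substring tests: the terms are not regular expressions.
            if in_terms and not any(term in description for term in in_terms):
                continue
            if any(term in description for term in out_terms):
                continue
            selected.append(description)
        return selected


    # Made-up descriptions, for illustration only.
    tests = [
        "rados/singleton/{all/peer.yaml}",
        "rados/singleton/{all/rest-api.yaml}",
        "rados/singleton/{all/admin-socket.yaml}",
    ]
    print(select_tests(tests, filter_in="all/peer.yaml,all/rest-api.yaml"))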

Reducing the number of tests
----------------------------

The ``rados`` suite generates tens or even hundreds of thousands of tests out
of a few hundred files. This happens because teuthology constructs test
matrices from subdirectories wherever it encounters a file named ``%``. For
instance, all tests in the `rados/basic suite
<https://github.com/ceph/ceph/tree/master/qa/suites/rados/basic>`_ run with
different messenger types: ``simple``, ``async`` and ``random``, because they
are combined (via the special file ``%``) with the `msgr directory
<https://github.com/ceph/ceph/tree/master/qa/suites/rados/basic/msgr>`_.
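
The growth in the number of tests is multiplicative: each subdirectory
convolved by ``%`` multiplies the size of the matrix by its number of
alternatives. The sketch below shows the arithmetic. Apart from the three
messenger types mentioned above, the counts are hypothetical and do not
describe the real ``rados/basic`` suite.

.. code-block:: python

    from math import prod


    def matrix_size(facet_counts):
        """Number of tests produced by convolving the given subdirectories."""
        return prod(facet_counts.values())


    # ``msgr`` has three alternatives (simple, async, random); the other
    # counts are made up for the example.
    facet_counts = {"msgr": 3, "clusters": 2, "objectstore": 4, "tasks": 10}
    print(matrix_size(facet_counts))  # 240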

All integration tests are required to be run before a Ceph release is
published. When merely verifying whether a contribution can be merged without
risking a trivial regression, it is enough to run a subset. The ``--subset``
option can be used to reduce the number of tests that are triggered. For
instance

.. prompt:: bash $

   teuthology-suite --machine-type smithi --suite rados --subset 0/4000

will run as few tests as possible. The tradeoff in this case is that
not all combinations of test variations will be run together,
but no matter how small a ratio is provided in ``--subset``,
teuthology will still ensure that all files in the suite are in at
least one test. Understanding the actual logic that drives this
requires reading the teuthology source code.

The ``--limit`` option only runs the first ``N`` tests in the suite:
this is rarely useful, however, because there is no way to control which
test will be first.

.. _ceph/qa sub-directory: https://github.com/ceph/ceph/tree/master/qa
.. _Sepia Lab: https://wiki.sepia.ceph.com/doku.php
.. _teuthology repository: https://github.com/ceph/teuthology
.. _teuthology framework: https://github.com/ceph/teuthology