1 ============================================
2 Contributing to Ceph: A Guide for Developers
3 ============================================
4
5 :Author: Loic Dachary
6 :Author: Nathan Cutler
7 :License: Creative Commons Attribution-ShareAlike (CC BY-SA)
8
9 .. note:: The old (pre-2016) developer documentation has been moved to :doc:`/dev/index-old`.
10
11 .. contents::
12 :depth: 3
13
14 Introduction
15 ============
16
17 This guide has two aims. First, it should lower the barrier to entry for
18 software developers who wish to get involved in the Ceph project. Second,
19 it should serve as a reference for Ceph developers.
20
21 We assume that readers are already familiar with Ceph (the distributed
22 object store and file system designed to provide excellent performance,
23 reliability and scalability). If not, please refer to the `project website`_
24 and especially the `publications list`_.
25
26 .. _`project website`: http://ceph.com
27 .. _`publications list`: https://ceph.com/resources/publications/
28
Since this document is to be consumed by developers, who are assumed to
have Internet access, topics covered elsewhere, either within the Ceph
documentation or elsewhere on the web, are handled by linking to them
rather than duplicating them here. If you notice that a link is broken,
or if you know of a better link, please `report it as a bug`_.
34
35 .. _`report it as a bug`: http://tracker.ceph.com/projects/ceph/issues/new
36
37 Essentials (tl;dr)
38 ==================
39
40 This chapter presents essential information that every Ceph developer needs
41 to know.
42
43 Leads
44 -----
45
46 The Ceph project is led by Sage Weil. In addition, each major project
47 component has its own lead. The following table shows all the leads and
48 their nicks on `GitHub`_:
49
50 .. _github: https://github.com/
51
52 ========= =============== =============
53 Scope Lead GitHub nick
54 ========= =============== =============
55 Ceph Sage Weil liewegas
56 RADOS Samuel Just athanatos
57 RGW Yehuda Sadeh yehudasa
58 RBD Jason Dillaman dillaman
59 CephFS John Spray jcsp
60 Build/Ops Ken Dreyer ktdreyer
61 ========= =============== =============
62
63 The Ceph-specific acronyms in the table are explained in
64 :doc:`/architecture`.
65
66 History
67 -------
68
69 See the `History chapter of the Wikipedia article`_.
70
71 .. _`History chapter of the Wikipedia article`: https://en.wikipedia.org/wiki/Ceph_%28software%29#History
72
73 Licensing
74 ---------
75
76 Ceph is free software.
77
78 Unless stated otherwise, the Ceph source code is distributed under the terms of
79 the LGPL2.1. For full details, see `the file COPYING in the top-level
80 directory of the source-code tree`_.
81
82 .. _`the file COPYING in the top-level directory of the source-code tree`:
83 https://github.com/ceph/ceph/blob/master/COPYING
84
85 Source code repositories
86 ------------------------
87
88 The source code of Ceph lives on `GitHub`_ in a number of repositories below
89 the `Ceph "organization"`_.
90
91 .. _`Ceph "organization"`: https://github.com/ceph
92
93 To make a meaningful contribution to the project as a developer, a working
94 knowledge of git_ is essential.
95
96 .. _git: https://git-scm.com/documentation
97
98 Although the `Ceph "organization"`_ includes several software repositories,
99 this document covers only one: https://github.com/ceph/ceph.
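
In particular, make sure git knows your real name and email address before
you start committing: they are used to build the ``Signed-off-by`` line
added by ``git commit --signoff`` (see `Submitting patches`_). A minimal
sketch (substitute your own details)::

    $ git config --global user.name "Your Name"
    $ git config --global user.email "you@example.com"
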
100
101 Redmine issue tracker
102 ---------------------
103
104 Although `GitHub`_ is used for code, Ceph-related issues (Bugs, Features,
105 Backports, Documentation, etc.) are tracked at http://tracker.ceph.com,
106 which is powered by `Redmine`_.
107
108 .. _Redmine: http://www.redmine.org
109
110 The tracker has a Ceph project with a number of subprojects loosely
111 corresponding to the various architectural components (see
112 :doc:`/architecture`).
113
114 Mere `registration`_ in the tracker automatically grants permissions
115 sufficient to open new issues and comment on existing ones.
116
117 .. _registration: http://tracker.ceph.com/account/register
118
119 To report a bug or propose a new feature, `jump to the Ceph project`_ and
120 click on `New issue`_.
121
122 .. _`jump to the Ceph project`: http://tracker.ceph.com/projects/ceph
123 .. _`New issue`: http://tracker.ceph.com/projects/ceph/issues/new
124
125 Mailing list
126 ------------
127
128 Ceph development email discussions take place on the mailing list
129 ``ceph-devel@vger.kernel.org``. The list is open to all. Subscribe by
130 sending a message to ``majordomo@vger.kernel.org`` with the line: ::
131
132 subscribe ceph-devel
133
134 in the body of the message.
135
136 There are also `other Ceph-related mailing lists`_.
137
138 .. _`other Ceph-related mailing lists`: https://ceph.com/irc/
139
140 IRC
141 ---
142
143 In addition to mailing lists, the Ceph community also communicates in real
144 time using `Internet Relay Chat`_.
145
146 .. _`Internet Relay Chat`: http://www.irchelp.org/
147
148 See https://ceph.com/irc/ for how to set up your IRC
149 client and a list of channels.
150
151 Submitting patches
152 ------------------
153
154 The canonical instructions for submitting patches are contained in the
155 `the file CONTRIBUTING.rst in the top-level directory of the source-code
156 tree`_. There may be some overlap between this guide and that file.
157
158 .. _`the file CONTRIBUTING.rst in the top-level directory of the source-code tree`:
159 https://github.com/ceph/ceph/blob/master/CONTRIBUTING.rst
160
161 All newcomers are encouraged to read that file carefully.
162
163 Building from source
164 --------------------
165
166 See instructions at :doc:`/install/build-ceph`.
167
168 Using ccache to speed up local builds
169 -------------------------------------
170
Rebuilds of the ceph source tree can benefit significantly from the use of
`ccache`_. When switching between branches, for example, you may see build
failures on older branches that are mostly due to stale build artifacts, and
the full rebuild that follows a clean checkout is exactly the kind of work
ccache speeds up. To fully clean the source tree, one could do::

    $ make clean

    # Note: the following will nuke everything in the source tree that
    # isn't tracked by git, so back up any log files or config changes first

    $ git clean -fdx; git submodule foreach git clean -fdx

ccache is available as a package in most distros. To build Ceph with ccache,
one can::
186
187 $ cmake -DWITH_CCACHE=ON ..
188
ccache can also be used to speed up all builds on the system. For more
details refer to the `run modes`_ section of the ccache manual. The default
settings of ``ccache`` can be displayed with ``ccache -s``.

.. note:: It is recommended to override ``max_size`` (the maximum size of the
   cache, which defaults to 10G) with a larger value, such as 25G. Refer to
   the `configuration`_ section of the ccache manual.
196
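For example, a quick way to raise the cache size limit to 25G (a sketch;
pick a size that suits your disk space)::

    $ ccache -M 25G
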
197 .. _`ccache`: https://ccache.samba.org/
198 .. _`run modes`: https://ccache.samba.org/manual.html#_run_modes
199 .. _`configuration`: https://ccache.samba.org/manual.html#_configuration
200
201 Development-mode cluster
202 ------------------------
203
204 See :doc:`/dev/quick_guide`.
205
206 Backporting
207 -----------
208
209 All bugfixes should be merged to the ``master`` branch before being backported.
210 To flag a bugfix for backporting, make sure it has a `tracker issue`_
211 associated with it and set the ``Backport`` field to a comma-separated list of
212 previous releases (e.g. "hammer,jewel") that you think need the backport.
213 The rest (including the actual backporting) will be taken care of by the
214 `Stable Releases and Backports`_ team.
215
216 .. _`tracker issue`: http://tracker.ceph.com/
217 .. _`Stable Releases and Backports`: http://tracker.ceph.com/projects/ceph-releases/wiki
218
219
220 What is merged where and when ?
221 ===============================
222
223 Commits are merged into branches according to criteria that change
224 during the lifecycle of a Ceph release. This chapter is the inventory
225 of what can be merged in which branch at a given point in time.
226
227 Development releases (i.e. x.0.z)
228 ---------------------------------
229
230 What ?
231 ^^^^^^
232
233 * features
234 * bug fixes
235
236 Where ?
237 ^^^^^^^
238
239 Features are merged to the master branch. Bug fixes should be merged
240 to the corresponding named branch (e.g. "jewel" for 10.0.z, "kraken"
241 for 11.0.z, etc.). However, this is not mandatory - bug fixes can be
242 merged to the master branch as well, since the master branch is
243 periodically merged to the named branch during the development
244 releases phase. In either case, if the bugfix is important it can also
245 be flagged for backport to one or more previous stable releases.
246
247 When ?
248 ^^^^^^
249
After the stable release candidate of the previous release enters
251 phase 2 (see below). For example: the "jewel" named branch was
252 created when the infernalis release candidates entered phase 2. From
253 this point on, master was no longer associated with infernalis. As
254 soon as the named branch of the next stable release is created, master
255 starts getting periodically merged into it.
256
257 Branch merges
258 ^^^^^^^^^^^^^
259
260 * The branch of the stable release is merged periodically into master.
261 * The master branch is merged periodically into the branch of the
262 stable release.
263 * The master is merged into the branch of the stable release
264 immediately after each development x.0.z release.
265
266 Stable release candidates (i.e. x.1.z) phase 1
267 ----------------------------------------------
268
269 What ?
270 ^^^^^^
271
272 * bug fixes only
273
274 Where ?
275 ^^^^^^^
276
The branch of the stable release (e.g. "jewel" for 10.1.z, "kraken"
for 11.1.z, etc.) or master. Bug fixes should be merged to the named
279 branch corresponding to the stable release candidate (e.g. "jewel" for
280 10.1.z) or to master. During this phase, all commits to master will be
281 merged to the named branch, and vice versa. In other words, it makes
282 no difference whether a commit is merged to the named branch or to
283 master - it will make it into the next release candidate either way.
284
285 When ?
286 ^^^^^^
287
288 After the first stable release candidate is published, i.e. after the
289 x.1.0 tag is set in the release branch.
290
291 Branch merges
292 ^^^^^^^^^^^^^
293
294 * The branch of the stable release is merged periodically into master.
295 * The master branch is merged periodically into the branch of the
296 stable release.
297 * The master is merged into the branch of the stable release
298 immediately after each x.1.z release candidate.
299
300 Stable release candidates (i.e. x.1.z) phase 2
301 ----------------------------------------------
302
303 What ?
304 ^^^^^^
305
306 * bug fixes only
307
308 Where ?
309 ^^^^^^^
310
The branch of the stable release (e.g. "jewel" for 10.1.z, "kraken"
for 11.1.z, etc.). During this phase, all commits to the named branch
313 will be merged into master. Cherry-picking to the named branch during
314 release candidate phase 2 is done manually since the official
315 backporting process only begins when the release is pronounced
316 "stable".
317
318 When ?
319 ^^^^^^
320
321 After Sage Weil decides it is time for phase 2 to happen.
322
323 Branch merges
324 ^^^^^^^^^^^^^
325
326 * The branch of the stable release is merged periodically into master.
327
328 Stable releases (i.e. x.2.z)
329 ----------------------------
330
331 What ?
332 ^^^^^^
333
334 * bug fixes
* features are sometimes accepted
336 * commits should be cherry-picked from master when possible
337 * commits that are not cherry-picked from master must be about a bug unique to the stable release
338 * see also `the backport HOWTO`_
339
340 .. _`the backport HOWTO`:
341 http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO#HOWTO
342
343 Where ?
344 ^^^^^^^
345
346 The branch of the stable release (hammer for 0.94.x, infernalis for 9.2.x, etc.)
347
348 When ?
349 ^^^^^^
350
351 After the stable release is published, i.e. after the "vx.2.0" tag is
352 set in the release branch.
353
354 Branch merges
355 ^^^^^^^^^^^^^
356
357 Never
358
359 Issue tracker
360 =============
361
362 See `Redmine issue tracker`_ for a brief introduction to the Ceph Issue Tracker.
363
364 Ceph developers use the issue tracker to
365
366 1. keep track of issues - bugs, fix requests, feature requests, backport
367 requests, etc.
368
369 2. communicate with other developers and keep them informed as work
370 on the issues progresses.
371
372 Issue tracker conventions
373 -------------------------
374
375 When you start working on an existing issue, it's nice to let the other
376 developers know this - to avoid duplication of labor. Typically, this is
377 done by changing the :code:`Assignee` field (to yourself) and changing the
378 :code:`Status` to *In progress*. Newcomers to the Ceph community typically do not
379 have sufficient privileges to update these fields, however: they can
380 simply update the issue with a brief note.
381
382 .. table:: Meanings of some commonly used statuses
383
384 ================ ===========================================
385 Status Meaning
386 ================ ===========================================
387 New Initial status
388 In Progress Somebody is working on it
389 Need Review Pull request is open with a fix
390 Pending Backport Fix has been merged, backport(s) pending
391 Resolved Fix and backports (if any) have been merged
392 ================ ===========================================
393
394 Basic workflow
395 ==============
396
397 The following chart illustrates basic development workflow:
398
399 .. ditaa::
400
401 Upstream Code Your Local Environment
402
403 /----------\ git clone /-------------\
404 | Ceph | -------------------------> | ceph/master |
405 \----------/ \-------------/
406 ^ |
407 | | git branch fix_1
408 | git merge |
409 | v
410 /----------------\ git commit --amend /-------------\
411 | make check |---------------------> | ceph/fix_1 |
412 | ceph--qa--suite| \-------------/
413 \----------------/ |
414 ^ | fix changes
415 | | test changes
416 | review | git commit
417 | |
418 | v
419 /--------------\ /-------------\
420 | github |<---------------------- | ceph/fix_1 |
421 | pull request | git push \-------------/
422 \--------------/
423
424 Below we present an explanation of this chart. The explanation is written
with the assumption that you, the reader, are a beginning developer who
has an idea for a bugfix but does not know exactly how to proceed.
427
428 Update the tracker
429 ------------------
430
431 Before you start, you should know the `Issue tracker`_ number of the bug
432 you intend to fix. If there is no tracker issue, now is the time to create
433 one.
434
435 The tracker is there to explain the issue (bug) to your fellow Ceph
436 developers and keep them informed as you make progress toward resolution.
437 To this end, then, provide a descriptive title as well as sufficient
438 information and details in the description.
439
440 If you have sufficient tracker permissions, assign the bug to yourself by
441 changing the ``Assignee`` field. If your tracker permissions have not yet
442 been elevated, simply add a comment to the issue with a short message like
443 "I am working on this issue".
444
445 Upstream code
446 -------------
447
448 This section, and the ones that follow, correspond to the nodes in the
449 above chart.
450
451 The upstream code lives in https://github.com/ceph/ceph.git, which is
452 sometimes referred to as the "upstream repo", or simply "upstream". As the
453 chart illustrates, we will make a local copy of this code, modify it, test
454 our modifications, and submit the modifications back to the upstream repo
455 for review.
456
457 A local copy of the upstream code is made by
458
459 1. forking the upstream repo on GitHub, and
460 2. cloning your fork to make a local working copy
461
See `the GitHub documentation
463 <https://help.github.com/articles/fork-a-repo/#platform-linux>`_ for
464 detailed instructions on forking. In short, if your GitHub username is
465 "mygithubaccount", your fork of the upstream repo will show up at
466 https://github.com/mygithubaccount/ceph. Once you have created your fork,
467 you clone it by doing:
468
469 .. code::
470
471 $ git clone https://github.com/mygithubaccount/ceph
472
473 While it is possible to clone the upstream repo directly, in this case you
474 must fork it first. Forking is what enables us to open a `GitHub pull
475 request`_.
476
477 For more information on using GitHub, refer to `GitHub Help
478 <https://help.github.com/>`_.
479
480 Local environment
481 -----------------
482
483 In the local environment created in the previous step, you now have a
484 copy of the ``master`` branch in ``remotes/origin/master``. Since the fork
485 (https://github.com/mygithubaccount/ceph.git) is frozen in time and the
486 upstream repo (https://github.com/ceph/ceph.git, typically abbreviated to
487 ``ceph/ceph.git``) is updated frequently by other developers, you will need
488 to sync your fork periodically. To do this, first add the upstream repo as
489 a "remote" and fetch it::
490
491 $ git remote add ceph https://github.com/ceph/ceph.git
492 $ git fetch ceph
493
494 Fetching downloads all objects (commits, branches) that were added since
495 the last sync. After running these commands, all the branches from
496 ``ceph/ceph.git`` are downloaded to the local git repo as
497 ``remotes/ceph/$BRANCH_NAME`` and can be referenced as
498 ``ceph/$BRANCH_NAME`` in certain git commands.
499
500 For example, your local ``master`` branch can be reset to the upstream Ceph
501 ``master`` branch by doing::
502
503 $ git fetch ceph
504 $ git checkout master
505 $ git reset --hard ceph/master
506
507 Finally, the ``master`` branch of your fork can then be synced to upstream
508 master by::
509
510 $ git push -u origin master
511
512 Bugfix branch
513 -------------
514
515 Next, create a branch for the bugfix:
516
517 .. code::
518
519 $ git checkout master
520 $ git checkout -b fix_1
521 $ git push -u origin fix_1
522
523 This creates a ``fix_1`` branch locally and in our GitHub fork. At this
524 point, the ``fix_1`` branch is identical to the ``master`` branch, but not
525 for long! You are now ready to modify the code.
526
527 Fix bug locally
528 ---------------
529
530 At this point, change the status of the tracker issue to "In progress" to
531 communicate to the other Ceph developers that you have begun working on a
532 fix. If you don't have permission to change that field, your comment that
533 you are working on the issue is sufficient.
534
535 Possibly, your fix is very simple and requires only minimal testing.
536 More likely, it will be an iterative process involving trial and error, not
537 to mention skill. An explanation of how to fix bugs is beyond the
538 scope of this document. Instead, we focus on the mechanics of the process
539 in the context of the Ceph project.
540
For a detailed discussion of the tools available for validating your
bugfixes, see the `Testing`_ chapter.
543
544 For now, let us just assume that you have finished work on the bugfix and
545 that you have tested it and believe it works. Commit the changes to your local
546 branch using the ``--signoff`` option::
547
548 $ git commit -as
549
550 and push the changes to your fork::
551
552 $ git push origin fix_1
553
554 GitHub pull request
555 -------------------
556
557 The next step is to open a GitHub pull request. The purpose of this step is
558 to make your bugfix available to the community of Ceph developers. They
559 will review it and may do additional testing on it.
560
561 In short, this is the point where you "go public" with your modifications.
562 Psychologically, you should be prepared to receive suggestions and
563 constructive criticism. Don't worry! In our experience, the Ceph project is
564 a friendly place!
565
566 If you are uncertain how to use pull requests, you may read
567 `this GitHub pull request tutorial`_.
568
569 .. _`this GitHub pull request tutorial`:
570 https://help.github.com/articles/using-pull-requests/
571
572 For some ideas on what constitutes a "good" pull request, see
573 the `Git Commit Good Practice`_ article at the `OpenStack Project Wiki`_.
574
575 .. _`Git Commit Good Practice`: https://wiki.openstack.org/wiki/GitCommitMessages
576 .. _`OpenStack Project Wiki`: https://wiki.openstack.org/wiki/Main_Page
577
578 Once your pull request (PR) is opened, update the `Issue tracker`_ by
579 adding a comment to the bug pointing the other developers to your PR. The
580 update can be as simple as::
581
582 *PR*: https://github.com/ceph/ceph/pull/$NUMBER_OF_YOUR_PULL_REQUEST
583
584 Automated PR validation
585 -----------------------
586
587 When your PR hits GitHub, the Ceph project's `Continuous Integration (CI)
588 <https://en.wikipedia.org/wiki/Continuous_integration>`_
589 infrastructure will test it automatically. At the time of this writing
590 (March 2016), the automated CI testing included a test to check that the
591 commits in the PR are properly signed (see `Submitting patches`_) and a
592 ``make check`` test.
593
594 The latter, ``make check``, builds the PR and runs it through a battery of
595 tests. These tests run on machines operated by the Ceph Continuous
596 Integration (CI) team. When the tests complete, the result will be shown
597 on GitHub in the pull request itself.
598
599 You can (and should) also test your modifications before you open a PR.
600 Refer to the `Testing`_ chapter for details.
601
602 Integration tests AKA ceph-qa-suite
603 -----------------------------------
604
605 Since Ceph is a complex beast, it may also be necessary to test your fix to
606 see how it behaves on real clusters running either on real or virtual
607 hardware. Tests designed for this purpose live in the `ceph/qa
608 sub-directory`_ and are run via the `teuthology framework`_.
609
610 .. _`ceph/qa sub-directory`: https://github.com/ceph/ceph/tree/master/qa/
611 .. _`teuthology repository`: https://github.com/ceph/teuthology
612 .. _`teuthology framework`: https://github.com/ceph/teuthology
613
614 If you have access to an OpenStack tenant, you are encouraged to run the
615 integration tests yourself using `ceph-workbench ceph-qa-suite`_,
616 and to post the test results to the PR.
617
618 .. _`ceph-workbench ceph-qa-suite`: http://ceph-workbench.readthedocs.org/
619
620 The Ceph community has access to the `Sepia lab
621 <http://ceph.github.io/sepia/>`_ where integration tests can be run on
622 real hardware. Other developers may add tags like "needs-qa" to your PR.
623 This allows PRs that need testing to be merged into a single branch and
624 tested all at the same time. Since teuthology suites can take hours
625 (even days in some cases) to run, this can save a lot of time.
626
627 Integration testing is discussed in more detail in the `Testing`_ chapter.
628
629 Code review
630 -----------
631
632 Once your bugfix has been thoroughly tested, or even during this process,
633 it will be subjected to code review by other developers. This typically
634 takes the form of correspondence in the PR itself, but can be supplemented
635 by discussions on `IRC`_ and the `Mailing list`_.
636
637 Amending your PR
638 ----------------
639
640 While your PR is going through `Testing`_ and `Code review`_, you can
641 modify it at any time by editing files in your local branch.
642
643 After the changes are committed locally (to the ``fix_1`` branch in our
644 example), they need to be pushed to GitHub so they appear in the PR.
645
646 Modifying the PR is done by adding commits to the ``fix_1`` branch upon
647 which it is based, often followed by rebasing to modify the branch's git
648 history. See `this tutorial
649 <https://www.atlassian.com/git/tutorials/rewriting-history>`_ for a good
650 introduction to rebasing. When you are done with your modifications, you
651 will need to force push your branch with:
652
653 .. code::
654
655 $ git push --force origin fix_1
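
If review results in several small follow-up changes, you may prefer to
squash them into the commits they amend before pushing. One common pattern
(a sketch, assuming the upstream remote is named ``ceph`` as set up in
`Local environment`_, and that the follow-up amends the most recent commit)
is::

    $ git commit -a --fixup HEAD             # record the follow-up as a "fixup!" commit
    $ git rebase -i --autosquash ceph/master # squash it into the commit it amends

followed by the force push shown above.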
656
657 Merge
658 -----
659
660 The bugfixing process culminates when one of the project leads decides to
661 merge your PR.
662
663 When this happens, it is a signal for you (or the lead who merged the PR)
664 to change the `Issue tracker`_ status to "Resolved". Some issues may be
665 flagged for backporting, in which case the status should be changed to
666 "Pending Backport" (see the `Backporting`_ chapter for details).
667
668
669 Testing
670 =======
671
672 Ceph has two types of tests: "make check" tests and integration tests.
The former are run via `GNU Make <https://www.gnu.org/software/make/>`_,
674 and the latter are run via the `teuthology framework`_. The following two
675 chapters examine the "make check" and integration tests in detail.
676
677 Testing - make check
678 ====================
679
680 After compiling Ceph, the ``make check`` command can be used to run the
681 code through a battery of tests covering various aspects of Ceph. For
682 inclusion in "make check", a test must:
683
684 * bind ports that do not conflict with other tests
685 * not require root access
686 * not require more than one machine to run
687 * complete within a few minutes
688
689 While it is possible to run ``make check`` directly, it can be tricky to
690 correctly set up your environment. Fortunately, a script is provided to
make it easier to run "make check" on your code. It can be run from the
692 top-level directory of the Ceph source tree by doing::
693
694 $ ./run-make-check.sh
695
696 You will need a minimum of 8GB of RAM and 32GB of free disk space for this
697 command to complete successfully on x86_64 (other architectures may have
698 different constraints). Depending on your hardware, it can take from 20
699 minutes to three hours to complete, but it's worth the wait.
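
If you are using a CMake build and want to re-run an individual test after a
full run, tests registered with CTest can usually be invoked directly from
the ``build/`` directory (a sketch; ``unittest_str_map`` is a hypothetical
test name - substitute one that exists in your build)::

    $ cd build
    $ ctest -R unittest_str_map --output-on-failure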
700
701 Future sections
702 ---------------
703
704 * Principles of make check tests
705 * Where to find test results
706 * How to interpret test results
707 * Find the corresponding source code
708 * Writing make check tests
709 * Make check caveats
710
711 Testing - integration tests
712 ===========================
713
When a test requires multiple machines, root access, or a long run time
(for example, to simulate a realistic Ceph deployment), it
716 is deemed to be an integration test. Integration tests are organized into
717 "suites", which are defined in the `ceph/qa sub-directory`_ and run with
718 the ``teuthology-suite`` command.
719
720 The ``teuthology-suite`` command is part of the `teuthology framework`_.
721 In the sections that follow we attempt to provide a detailed introduction
722 to that framework from the perspective of a beginning Ceph developer.
723
724 Teuthology consumes packages
725 ----------------------------
726
727 It may take some time to understand the significance of this fact, but it
728 is `very` significant. It means that automated tests can be conducted on
729 multiple platforms using the same packages (RPM, DEB) that can be
730 installed on any machine running those platforms.
731
732 Teuthology has a `list of platforms that it supports
733 <https://github.com/ceph/ceph/tree/master/qa/distros/supported>`_ (as
734 of March 2016 the list consisted of "CentOS 7.2" and "Ubuntu 14.04"). It
735 expects to be provided pre-built Ceph packages for these platforms.
736 Teuthology deploys these platforms on machines (bare-metal or
737 cloud-provisioned), installs the packages on them, and deploys Ceph
738 clusters on them - all as called for by the test.
739
740 The nightlies
741 -------------
742
743 A number of integration tests are run on a regular basis in the `Sepia
744 lab`_ against the official Ceph repositories (on the ``master`` development
745 branch and the stable branches). Traditionally, these tests are called "the
746 nightlies" because the Ceph core developers used to live and work in
747 the same time zone and from their perspective the tests were run overnight.
748
749 The results of the nightlies are published at http://pulpito.ceph.com/ and
750 http://pulpito.ovh.sepia.ceph.com:8081/. The developer nick shows in the
751 test results URL and in the first column of the Pulpito dashboard. The
752 results are also reported on the `ceph-qa mailing list
753 <https://ceph.com/irc/>`_ for analysis.
754
755 Suites inventory
756 ----------------
757
758 The ``suites`` directory of the `ceph/qa sub-directory`_ contains
759 all the integration tests, for all the Ceph components.
760
761 `ceph-deploy <https://github.com/ceph/ceph/tree/master/qa/suites/ceph-deploy>`_
762 install a Ceph cluster with ``ceph-deploy`` (`ceph-deploy man page`_)
763
764 `ceph-disk <https://github.com/ceph/ceph/tree/master/qa/suites/ceph-disk>`_
765 verify init scripts (upstart etc.) and udev integration with
766 ``ceph-disk`` (`ceph-disk man page`_), with and without `dmcrypt
767 <https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt>`_ support.
768
769 `dummy <https://github.com/ceph/ceph/tree/master/qa/suites/dummy>`_
770 get a machine, do nothing and return success (commonly used to
771 verify the integration testing infrastructure works as expected)
772
773 `fs <https://github.com/ceph/ceph/tree/master/qa/suites/fs>`_
774 test CephFS
775
776 `kcephfs <https://github.com/ceph/ceph/tree/master/qa/suites/kcephfs>`_
777 test the CephFS kernel module
778
779 `krbd <https://github.com/ceph/ceph/tree/master/qa/suites/krbd>`_
780 test the RBD kernel module
781
782 `powercycle <https://github.com/ceph/ceph/tree/master/qa/suites/powercycle>`_
783 verify the Ceph cluster behaves when machines are powered off
784 and on again
785
786 `rados <https://github.com/ceph/ceph/tree/master/qa/suites/rados>`_
787 run Ceph clusters including OSDs and MONs, under various conditions of
788 stress
789
790 `rbd <https://github.com/ceph/ceph/tree/master/qa/suites/rbd>`_
791 run RBD tests using actual Ceph clusters, with and without qemu
792
793 `rgw <https://github.com/ceph/ceph/tree/master/qa/suites/rgw>`_
794 run RGW tests using actual Ceph clusters
795
796 `smoke <https://github.com/ceph/ceph/tree/master/qa/suites/smoke>`_
797 run tests that exercise the Ceph API with an actual Ceph cluster
798
799 `teuthology <https://github.com/ceph/ceph/tree/master/qa/suites/teuthology>`_
800 verify that teuthology can run integration tests, with and without OpenStack
801
802 `upgrade <https://github.com/ceph/ceph/tree/master/qa/suites/upgrade>`_
803 for various versions of Ceph, verify that upgrades can happen
804 without disrupting an ongoing workload
805
806 .. _`ceph-deploy man page`: ../../man/8/ceph-deploy
807 .. _`ceph-disk man page`: ../../man/8/ceph-disk
808
809 teuthology-describe-tests
810 -------------------------
811
812 In February 2016, a new feature called ``teuthology-describe-tests`` was
813 added to the `teuthology framework`_ to facilitate documentation and better
814 understanding of integration tests (`feature announcement
815 <http://article.gmane.org/gmane.comp.file-systems.ceph.devel/29287>`_).
816
817 The upshot is that tests can be documented by embedding ``meta:``
818 annotations in the yaml files used to define the tests. The results can be
819 seen in the `ceph-qa-suite wiki
820 <http://tracker.ceph.com/projects/ceph-qa-suite/wiki/>`_.
821
822 Since this is a new feature, many yaml files have yet to be annotated.
823 Developers are encouraged to improve the documentation, in terms of both
824 coverage and quality.
825
826 How integration tests are run
827 -----------------------------
828
829 Given that - as a new Ceph developer - you will typically not have access
830 to the `Sepia lab`_, you may rightly ask how you can run the integration
831 tests in your own environment.
832
833 One option is to set up a teuthology cluster on bare metal. Though this is
834 a non-trivial task, it `is` possible. Here are `some notes
835 <http://docs.ceph.com/teuthology/docs/LAB_SETUP.html>`_ to get you started
836 if you decide to go this route.
837
838 If you have access to an OpenStack tenant, you have another option: the
839 `teuthology framework`_ has an OpenStack backend, which is documented `here
840 <https://github.com/dachary/teuthology/tree/openstack#openstack-backend>`__.
841 This OpenStack backend can build packages from a given git commit or
842 branch, provision VMs, install the packages and run integration tests
843 on those VMs. This process is controlled using a tool called
844 `ceph-workbench ceph-qa-suite`_. This tool also automates publishing of
845 test results at http://teuthology-logs.public.ceph.com.
846
847 Running integration tests on your code contributions and publishing the
848 results allows reviewers to verify that changes to the code base do not
849 cause regressions, or to analyze test failures when they do occur.
850
851 Every teuthology cluster, whether bare-metal or cloud-provisioned, has a
852 so-called "teuthology machine" from which tests suites are triggered using the
853 ``teuthology-suite`` command.
854
855 A detailed and up-to-date description of each `teuthology-suite`_ option is
856 available by running the following command on the teuthology machine::
857
858 $ teuthology-suite --help
859
860 .. _teuthology-suite: http://docs.ceph.com/teuthology/docs/teuthology.suite.html
861
862 How integration tests are defined
863 ---------------------------------
864
865 Integration tests are defined by yaml files found in the ``suites``
866 subdirectory of the `ceph/qa sub-directory`_ and implemented by python
867 code found in the ``tasks`` subdirectory. Some tests ("standalone tests")
868 are defined in a single yaml file, while other tests are defined by a
869 directory tree containing yaml files that are combined, at runtime, into a
870 larger yaml file.
871
872 Reading a standalone test
873 -------------------------
874
875 Let us first examine a standalone test, or "singleton".
876
877 Here is a commented example using the integration test
878 `rados/singleton/all/admin-socket.yaml
879 <https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/admin-socket.yaml>`_
880 ::
881
882 roles:
883 - - mon.a
884 - osd.0
885 - osd.1
886 tasks:
887 - install:
888 - ceph:
889 - admin_socket:
890 osd.0:
891 version:
892 git_version:
893 help:
894 config show:
895 config set filestore_dump_file /tmp/foo:
896 perf dump:
897 perf schema:
898
899 The ``roles`` array determines the composition of the cluster (how
900 many MONs, OSDs, etc.) on which this test is designed to run, as well
901 as how these roles will be distributed over the machines in the
902 testing cluster. In this case, there is only one element in the
903 top-level array: therefore, only one machine is allocated to the
904 test. The nested array declares that this machine shall run a MON with
905 id ``a`` (that is the ``mon.a`` in the list of roles) and two OSDs
906 (``osd.0`` and ``osd.1``).
907
908 The body of the test is in the ``tasks`` array: each element is
909 evaluated in order, causing the corresponding python file found in the
910 ``tasks`` subdirectory of the `teuthology repository`_ or
911 `ceph/qa sub-directory`_ to be run. "Running" in this case means calling
912 the ``task()`` function defined in that file.
913
914 In this case, the `install
915 <https://github.com/ceph/teuthology/blob/master/teuthology/task/install/__init__.py>`_
916 task comes first. It installs the Ceph packages on each machine (as
917 defined by the ``roles`` array). A full description of the ``install``
918 task is `found in the python file
919 <https://github.com/ceph/teuthology/blob/master/teuthology/task/install/__init__.py>`_
920 (search for "def task").
921
922 The ``ceph`` task, which is documented `here
923 <https://github.com/ceph/ceph/blob/master/qa/tasks/ceph.py>`__ (again,
924 search for "def task"), starts OSDs and MONs (and possibly MDSs as well)
925 as required by the ``roles`` array. In this example, it will start one MON
926 (``mon.a``) and two OSDs (``osd.0`` and ``osd.1``), all on the same
927 machine. Control moves to the next task when the Ceph cluster reaches
928 ``HEALTH_OK`` state.
929
930 The next task is ``admin_socket`` (`source code
931 <https://github.com/ceph/ceph/blob/master/qa/tasks/admin_socket.py>`_).
932 The parameter of the ``admin_socket`` task (and any other task) is a
933 structure which is interpreted as documented in the task. In this example
934 the parameter is a set of commands to be sent to the admin socket of
``osd.0``. The task verifies that each of them succeeds (i.e. returns
exit code zero).
937
938 This test can be run with::
939
940 $ teuthology-suite --suite rados/singleton/all/admin-socket.yaml fs/ext4.yaml
941
942 Test descriptions
943 -----------------
944
945 Each test has a "test description", which is similar to a directory path,
946 but not the same. In the case of a standalone test, like the one in
947 `Reading a standalone test`_, the test description is identical to the
948 relative path (starting from the ``suites/`` directory of the
949 `ceph/qa sub-directory`_) of the yaml file defining the test.
950
951 Much more commonly, tests are defined not by a single yaml file, but by a
952 `directory tree of yaml files`. At runtime, the tree is walked and all yaml
953 files (facets) are combined into larger yaml "programs" that define the
954 tests. A full listing of the yaml defining the test is included at the
955 beginning of every test log.
956
957 In these cases, the description of each test consists of the
958 subdirectory under `suites/
959 <https://github.com/ceph/ceph/tree/master/qa/suites>`_ containing the
960 yaml facets, followed by an expression in curly braces (``{}``) consisting of
961 a list of yaml facets in order of concatenation. For instance the
962 test description::
963
964 ceph-disk/basic/{distros/centos_7.0.yaml tasks/ceph-disk.yaml}
965
966 signifies the concatenation of two files:
967
968 * ceph-disk/basic/distros/centos_7.0.yaml
969 * ceph-disk/basic/tasks/ceph-disk.yaml
970
971 How are tests built from directories?
972 -------------------------------------
973
974 As noted in the previous section, most tests are not defined in a single
975 yaml file, but rather as a `combination` of files collected from a
976 directory tree within the ``suites/`` subdirectory of the `ceph/qa sub-directory`_.
977
978 The set of all tests defined by a given subdirectory of ``suites/`` is
979 called an "integration test suite", or a "teuthology suite".
980
981 Combination of yaml facets is controlled by special files (``%`` and
982 ``+``) that are placed within the directory tree and can be thought of as
983 operators. The ``%`` file is the "convolution" operator and ``+``
984 signifies concatenation.
985
986 Convolution operator
987 --------------------
988
989 The convolution operator, implemented as an empty file called ``%``, tells
990 teuthology to construct a test matrix from yaml facets found in
991 subdirectories below the directory containing the operator.
992
993 For example, the `ceph-disk suite
994 <https://github.com/ceph/ceph/tree/jewel/qa/suites/ceph-disk/>`_ is
995 defined by the ``suites/ceph-disk/`` tree, which consists of the files and
996 subdirectories in the following structure::
997
998 directory: ceph-disk/basic
999 file: %
1000 directory: distros
1001 file: centos_7.0.yaml
1002 file: ubuntu_14.04.yaml
1003 directory: tasks
1004 file: ceph-disk.yaml
1005
1006 This is interpreted as a 2x1 matrix consisting of two tests:
1007
1008 1. ceph-disk/basic/{distros/centos_7.0.yaml tasks/ceph-disk.yaml}
1009 2. ceph-disk/basic/{distros/ubuntu_14.04.yaml tasks/ceph-disk.yaml}
1010
1011 i.e. the concatenation of centos_7.0.yaml and ceph-disk.yaml and
1012 the concatenation of ubuntu_14.04.yaml and ceph-disk.yaml, respectively.
1013 In human terms, this means that the task found in ``ceph-disk.yaml`` is
1014 intended to run on both CentOS 7.0 and Ubuntu 14.04.
1015
Without the special file percent (``%``), the ``ceph-disk`` tree would be
interpreted as three standalone tests:
1018
1019 * ceph-disk/basic/distros/centos_7.0.yaml
1020 * ceph-disk/basic/distros/ubuntu_14.04.yaml
1021 * ceph-disk/basic/tasks/ceph-disk.yaml
1022
1023 (which would of course be wrong in this case).
1024
1025 Referring to the `ceph/qa sub-directory`_, you will notice that the
1026 ``centos_7.0.yaml`` and ``ubuntu_14.04.yaml`` files in the
1027 ``suites/ceph-disk/basic/distros/`` directory are implemented as symlinks.
1028 By using symlinks instead of copying, a single file can appear in multiple
1029 suites. This eases the maintenance of the test framework as a whole.
1030
1031 All the tests generated from the ``suites/ceph-disk/`` directory tree
1032 (also known as the "ceph-disk suite") can be run with::
1033
1034 $ teuthology-suite --suite ceph-disk
1035
1036 An individual test from the `ceph-disk suite`_ can be run by adding the
1037 ``--filter`` option::
1038
1039 $ teuthology-suite \
1040 --suite ceph-disk/basic \
1041 --filter 'ceph-disk/basic/{distros/ubuntu_14.04.yaml tasks/ceph-disk.yaml}'
1042
.. note:: To run a standalone test like the one in `Reading a standalone
   test`_, ``--suite`` alone is sufficient. If you want to run a single
   test from a suite that is defined as a directory tree, ``--suite`` must
   be combined with ``--filter``. This is because the ``--suite`` option
   understands POSIX relative paths only.
1048
1049 Concatenation operator
1050 ----------------------
1051
1052 For even greater flexibility in sharing yaml files between suites, the
1053 special file plus (``+``) can be used to concatenate files within a
1054 directory. For instance, consider the `suites/rbd/thrash
1055 <https://github.com/ceph/ceph/tree/master/qa/suites/rbd/thrash>`_
1056 tree::
1057
1058 directory: rbd/thrash
1059 file: %
1060 directory: clusters
1061 file: +
1062 file: fixed-2.yaml
1063 file: openstack.yaml
1064 directory: workloads
1065 file: rbd_api_tests_copy_on_read.yaml
1066 file: rbd_api_tests.yaml
1067
1068 This creates two tests:
1069
1070 * rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}
1071 * rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests.yaml}
1072
1073 Because the ``clusters/`` subdirectory contains the special file plus
1074 (``+``), all the other files in that subdirectory (``fixed-2.yaml`` and
1075 ``openstack.yaml`` in this case) are concatenated together
1076 and treated as a single file. Without the special file plus, they would
1077 have been convolved with the files from the workloads directory to create
1078 a 2x2 matrix:
1079
1080 * rbd/thrash/{clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}
1081 * rbd/thrash/{clusters/openstack.yaml workloads/rbd_api_tests.yaml}
1082 * rbd/thrash/{clusters/fixed-2.yaml workloads/rbd_api_tests_copy_on_read.yaml}
1083 * rbd/thrash/{clusters/fixed-2.yaml workloads/rbd_api_tests.yaml}
1084
1085 The ``clusters/fixed-2.yaml`` file is shared among many suites to
1086 define the following ``roles``::
1087
1088 roles:
1089 - [mon.a, mon.c, osd.0, osd.1, osd.2, client.0]
1090 - [mon.b, osd.3, osd.4, osd.5, client.1]
1091
The ``rbd/thrash`` suite, as defined above, consisting of two tests,
1093 can be run with::
1094
1095 $ teuthology-suite --suite rbd/thrash
1096
1097 A single test from the rbd/thrash suite can be run by adding the
1098 ``--filter`` option::
1099
1100 $ teuthology-suite \
1101 --suite rbd/thrash \
1102 --filter 'rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}'
1103
1104 Filtering tests by their description
1105 ------------------------------------
1106
1107 When a few jobs fail and need to be run again, the ``--filter`` option
1108 can be used to select tests with a matching description. For instance, if the
1109 ``rados`` suite fails the `all/peer.yaml <https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/peer.yaml>`_ test, the following will only run the tests that contain this file::
1110
1111 teuthology-suite --suite rados --filter all/peer.yaml
1112
1113 The ``--filter-out`` option does the opposite (it matches tests that do
1114 `not` contain a given string), and can be combined with the ``--filter``
1115 option.
1116
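For example (a hypothetical combination), the following would run the tests
whose description contains ``all/peer.yaml`` while skipping any whose
description also contains ``centos``::

    teuthology-suite --suite rados --filter all/peer.yaml --filter-out centos
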
1117 Both ``--filter`` and ``--filter-out`` take a comma-separated list of strings (which
1118 means the comma character is implicitly forbidden in filenames found in the
1119 `ceph/qa sub-directory`_). For instance::
1120
1121 teuthology-suite --suite rados --filter all/peer.yaml,all/rest-api.yaml
1122
1123 will run tests that contain either
1124 `all/peer.yaml <https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/peer.yaml>`_
1125 or
1126 `all/rest-api.yaml <https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/rest-api.yaml>`_
1127
1128 Each string is looked up anywhere in the test description and has to
1129 be an exact match: they are not regular expressions.
1130
1131 Reducing the number of tests
1132 ----------------------------
1133
1134 The ``rados`` suite generates thousands of tests out of a few hundred
1135 files. This happens because teuthology constructs test matrices from
1136 subdirectories wherever it encounters a file named ``%``. For instance,
1137 all tests in the `rados/basic suite
1138 <https://github.com/ceph/ceph/tree/master/qa/suites/rados/basic>`_
1139 run with different messenger types: ``simple``, ``async`` and
1140 ``random``, because they are combined (via the special file ``%``) with
1141 the `msgr directory
<https://github.com/ceph/ceph/tree/master/qa/suites/rados/basic/msgr>`_.
1143
1144 All integration tests are required to be run before a Ceph release is published.
1145 When merely verifying whether a contribution can be merged without
1146 risking a trivial regression, it is enough to run a subset. The ``--subset`` option can be used to
1147 reduce the number of tests that are triggered. For instance::
1148
1149 teuthology-suite --suite rados --subset 0/4000
1150
1151 will run as few tests as possible. The tradeoff in this case is that
1152 some tests will only run on ``xfs`` and not on ``ext4`` or ``btrfs``,
1153 but no matter how small a ratio is provided in the ``--subset``,
1154 teuthology will still ensure that all files in the suite are in at
1155 least one test. Understanding the actual logic that drives this
1156 requires reading the teuthology source code.
1157
1158 The ``--limit`` option only runs the first ``N`` tests in the suite:
1159 this is rarely useful, however, because there is no way to control which
1160 test will be first.
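
For completeness, a sketch of its usage::

    teuthology-suite --suite rados --limit 2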
1161
1162 Testing in the cloud
1163 ====================
1164
In this chapter, we will explain in detail how to use an OpenStack
1166 tenant as an environment for Ceph integration testing.
1167
1168 Assumptions and caveat
1169 ----------------------
1170
1171 We assume that:
1172
1173 1. you are the only person using the tenant
1174 2. you have the credentials
1175 3. the tenant supports the ``nova`` and ``cinder`` APIs
1176
1177 Caveat: be aware that, as of this writing (July 2016), testing in
1178 OpenStack clouds is a new feature. Things may not work as advertised.
1179 If you run into trouble, ask for help on `IRC`_ or the `Mailing list`_, or
1180 open a bug report at the `ceph-workbench bug tracker`_.
1181
1182 .. _`ceph-workbench bug tracker`: http://ceph-workbench.dachary.org/root/ceph-workbench/issues
1183
1184 Prepare tenant
1185 --------------
1186
1187 If you have not tried to use ``ceph-workbench`` with this tenant before,
1188 proceed to the next step.
1189
To start with a clean slate, log in to your tenant via the Horizon dashboard
(or use the equivalent ``openstack`` CLI commands sketched below) and:
1191
1192 * terminate the ``teuthology`` and ``packages-repository`` instances, if any
1193 * delete the ``teuthology`` and ``teuthology-worker`` security groups, if any
1194 * delete the ``teuthology`` and ``teuthology-myself`` key pairs, if any
1195
1196 Also do the above if you ever get key-related errors ("invalid key", etc.) when
1197 trying to schedule suites.
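
If you prefer the command line over Horizon, roughly equivalent cleanup can
be done with the ``openstack`` client (a sketch; the resource names are the
defaults listed above and may not all exist in your tenant)::

    $ openstack server delete teuthology packages-repository
    $ openstack security group delete teuthology teuthology-worker
    $ openstack keypair delete teuthology teuthology-myself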
1198
1199 Getting ceph-workbench
1200 ----------------------
1201
1202 Since testing in the cloud is done using the `ceph-workbench
1203 ceph-qa-suite`_ tool, you will need to install that first. It is designed
1204 to be installed via Docker, so if you don't have Docker running on your
development machine, take care of that first. You can follow `the official
tutorial <https://docs.docker.com/engine/installation/>`_ to install Docker
if you have not done so already.
1208
1209 Once Docker is up and running, install ``ceph-workbench`` by following the
1210 `Installation instructions in the ceph-workbench documentation
1211 <http://ceph-workbench.readthedocs.org/en/latest/#installation>`_.
1212
1213 Linking ceph-workbench with your OpenStack tenant
1214 -------------------------------------------------
1215
1216 Before you can trigger your first teuthology suite, you will need to link
1217 ``ceph-workbench`` with your OpenStack account.
1218
First, download an ``openrc.sh`` file by clicking on the "Download OpenStack
1220 RC File" button, which can be found in the "API Access" tab of the "Access
1221 & Security" dialog of the OpenStack Horizon dashboard.
1222
1223 Second, create a ``~/.ceph-workbench`` directory, set its permissions to
1224 700, and move the ``openrc.sh`` file into it. Make sure that the filename
1225 is exactly ``~/.ceph-workbench/openrc.sh``.
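
In shell terms, that is roughly (a sketch; adjust the download path to
wherever your browser saved the file)::

    $ mkdir -p ~/.ceph-workbench
    $ chmod 700 ~/.ceph-workbench
    $ mv ~/Downloads/openrc.sh ~/.ceph-workbench/openrc.sh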
1226
1227 Third, edit the file so it does not ask for your OpenStack password
1228 interactively. Comment out the relevant lines and replace them with
1229 something like::
1230
1231 export OS_PASSWORD="aiVeth0aejee3eep8rogho3eep7Pha6ek"
1232
1233 When `ceph-workbench ceph-qa-suite`_ connects to your OpenStack tenant for
1234 the first time, it will generate two keypairs: ``teuthology-myself`` and
1235 ``teuthology``.
1236
1237 .. If this is not the first time you have tried to use
1238 .. `ceph-workbench ceph-qa-suite`_ with this tenant, make sure to delete any
1239 .. stale keypairs with these names!
1240
1241 Run the dummy suite
1242 -------------------
1243
1244 You are now ready to take your OpenStack teuthology setup for a test
1245 drive::
1246
1247 $ ceph-workbench ceph-qa-suite --suite dummy
1248
1249 Be forewarned that the first run of `ceph-workbench ceph-qa-suite`_ on a
1250 pristine tenant will take a long time to complete because it downloads a VM
1251 image and during this time the command may not produce any output.
1252
1253 The images are cached in OpenStack, so they are only downloaded once.
1254 Subsequent runs of the same command will complete faster.
1255
Although the ``dummy`` suite does not run any tests, in all other respects it
1257 behaves just like a teuthology suite and produces some of the same
1258 artifacts.
1259
1260 The last bit of output should look something like this::
1261
1262 pulpito web interface: http://149.202.168.201:8081/
1263 ssh access : ssh -i /home/smithfarm/.ceph-workbench/teuthology-myself.pem ubuntu@149.202.168.201 # logs in /usr/share/nginx/html
1264
1265 What this means is that `ceph-workbench ceph-qa-suite`_ triggered the test
1266 suite run. It does not mean that the suite run has completed. To monitor
1267 progress of the run, check the Pulpito web interface URL periodically, or
1268 if you are impatient, ssh to the teuthology machine using the ssh command
1269 shown and do::
1270
1271 $ tail -f /var/log/teuthology.*
1272
The ``/usr/share/nginx/html`` directory contains the complete logs of the
1274 test suite. If we had provided the ``--upload`` option to the
1275 `ceph-workbench ceph-qa-suite`_ command, these logs would have been
1276 uploaded to http://teuthology-logs.public.ceph.com.
1277
1278 Run a standalone test
1279 ---------------------
1280
1281 The standalone test explained in `Reading a standalone test`_ can be run
1282 with the following command::
1283
1284 $ ceph-workbench ceph-qa-suite --suite rados/singleton/all/admin-socket.yaml
1285
1286 This will run the suite shown on the current ``master`` branch of
1287 ``ceph/ceph.git``. You can specify a different branch with the ``--ceph``
1288 option, and even a different git repo with the ``--ceph-git-url`` option. (Run
1289 ``ceph-workbench ceph-qa-suite --help`` for an up-to-date list of available
1290 options.)
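
For instance, to run the same standalone test against the "jewel" branch
instead of master (a sketch based on the options listed above)::

    $ ceph-workbench ceph-qa-suite --ceph jewel \
        --suite rados/singleton/all/admin-socket.yaml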
1291
The first run of a suite will also take a long time, because Ceph packages
have to be built first. Again, the packages so built are cached and
1294 `ceph-workbench ceph-qa-suite`_ will not build identical packages a second
1295 time.
1296
1297 Interrupt a running suite
1298 -------------------------
1299
1300 Teuthology suites take time to run. From time to time one may wish to
1301 interrupt a running suite. One obvious way to do this is::
1302
1303 ceph-workbench ceph-qa-suite --teardown
1304
1305 This destroys all VMs created by `ceph-workbench ceph-qa-suite`_ and
1306 returns the OpenStack tenant to a "clean slate".
1307
1308 Sometimes you may wish to interrupt the running suite, but keep the logs,
1309 the teuthology VM, the packages-repository VM, etc. To do this, you can
1310 ``ssh`` to the teuthology VM (using the ``ssh access`` command reported
1311 when you triggered the suite -- see `Run the dummy suite`_) and, once
1312 there::
1313
1314 sudo /etc/init.d/teuthology restart
1315
1316 This will keep the teuthology machine, the logs and the packages-repository
1317 instance but nuke everything else.
1318
1319 Upload logs to archive server
1320 -----------------------------
1321
1322 Since the teuthology instance in OpenStack is only semi-permanent, with limited
1323 space for storing logs, ``teuthology-openstack`` provides an ``--upload``
1324 option which, if included in the ``ceph-workbench ceph-qa-suite`` command,
1325 will cause logs from all failed jobs to be uploaded to the log archive server
1326 maintained by the Ceph project. The logs will appear at the URL::
1327
1328 http://teuthology-logs.public.ceph.com/$RUN
1329
1330 where ``$RUN`` is the name of the run. It will be a string like this::
1331
1332 ubuntu-2016-07-23_16:08:12-rados-hammer-backports---basic-openstack
1333
Even if you do not provide the ``--upload`` option, however, all the logs can
still be found on the teuthology machine in the directory
1336 ``/usr/share/nginx/html``.
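
For example, the dummy suite from `Run the dummy suite`_ could be
re-triggered with uploading enabled like so (a sketch)::

    $ ceph-workbench ceph-qa-suite --suite dummy --upload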
1337
1338 Provision VMs ad hoc
1339 --------------------
1340
1341 From the teuthology VM, it is possible to provision machines on an "ad hoc"
1342 basis, to use however you like. The magic incantation is::
1343
1344 teuthology-lock --lock-many $NUMBER_OF_MACHINES \
1345 --os-type $OPERATING_SYSTEM \
1346 --os-version $OS_VERSION \
1347 --machine-type openstack \
1348 --owner $EMAIL_ADDRESS
1349
1350 The command must be issued from the ``~/teuthology`` directory. The possible
values for ``OPERATING_SYSTEM`` and ``OS_VERSION`` can be found by examining
1352 the contents of the directory ``teuthology/openstack/``. For example::
1353
1354 teuthology-lock --lock-many 1 --os-type ubuntu --os-version 16.04 \
1355 --machine-type openstack --owner foo@example.com
1356
1357 When you are finished with the machine, find it in the list of machines::
1358
1359 openstack server list
1360
1361 to determine the name or ID, and then terminate it with::
1362
1363 openstack server delete $NAME_OR_ID
1364
1365 Deploy a cluster for manual testing
1366 -----------------------------------
1367
1368 The `teuthology framework`_ and `ceph-workbench ceph-qa-suite`_ are
1369 versatile tools that automatically provision Ceph clusters in the cloud and
1370 run various tests on them in an automated fashion. This enables a single
1371 engineer, in a matter of hours, to perform thousands of tests that would
1372 keep dozens of human testers occupied for days or weeks if conducted
1373 manually.
1374
1375 However, there are times when the automated tests do not cover a particular
1376 scenario and manual testing is desired. It turns out that it is simple to
1377 adapt a test to stop and wait after the Ceph installation phase, and the
1378 engineer can then ssh into the running cluster. Simply add the following
1379 snippet in the desired place within the test YAML and schedule a run with the
1380 test::
1381
1382 tasks:
1383 - exec:
1384 client.0:
1385 - sleep 1000000000 # forever
1386
1387 (Make sure you have a ``client.0`` defined in your ``roles`` stanza or adapt
1388 accordingly.)
1389
1390 The same effect can be achieved using the ``interactive`` task::
1391
1392 tasks:
1393 - interactive
1394
1395 By following the test log, you can determine when the test cluster has entered
1396 the "sleep forever" condition. At that point, you can ssh to the teuthology
1397 machine and from there to one of the target VMs (OpenStack) or teuthology
worker machines (Sepia) where the test cluster is running.
1399
1400 The VMs (or "instances" in OpenStack terminology) created by
1401 `ceph-workbench ceph-qa-suite`_ are named as follows:
1402
1403 ``teuthology`` - the teuthology machine
1404
1405 ``packages-repository`` - VM where packages are stored
1406
1407 ``ceph-*`` - VM where packages are built
1408
1409 ``target*`` - machines where tests are run
1410
1411 The VMs named ``target*`` are used by tests. If you are monitoring the
1412 teuthology log for a given test, the hostnames of these target machines can
1413 be found out by searching for the string ``Locked targets``::
1414
1415 2016-03-20T11:39:06.166 INFO:teuthology.task.internal:Locked targets:
1416 target149202171058.teuthology: null
1417 target149202171059.teuthology: null
1418
1419 The IP addresses of the target machines can be found by running ``openstack
1420 server list`` on the teuthology machine, but the target VM hostnames (e.g.
1421 ``target149202171058.teuthology``) are resolvable within the teuthology
1422 cluster.
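
Once you know a target hostname, reaching it from the teuthology machine is
a plain ssh hop, for example (hostname taken from the log excerpt above; the
login user depends on the image and is an assumption here, e.g. ``ubuntu``
on Ubuntu targets)::

    $ ssh ubuntu@target149202171058.teuthology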
1423
1424
1425 Testing - how to run s3-tests locally
1426 =====================================
1427
1428 RGW code can be tested by building Ceph locally from source, starting a vstart
1429 cluster, and running the "s3-tests" suite against it.
1430
1431 The following instructions should work on jewel and above.
1432
1433 Step 1 - build Ceph
1434 -------------------
1435
Refer to :doc:`/install/build-ceph`.
1437
1438 You can do step 2 separately while it is building.
1439
1440 Step 2 - vstart
1441 ---------------
1442
1443 When the build completes, and still in the top-level directory of the git
1444 clone where you built Ceph, do the following, for cmake builds::
1445
1446 cd build/
1447 RGW=1 ../vstart.sh -n
1448
1449 This will produce a lot of output as the vstart cluster is started up. At the
1450 end you should see a message like::
1451
1452 started. stop.sh to stop. see out/* (e.g. 'tail -f out/????') for debug output.
1453
1454 This means the cluster is running.
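
Before moving on, you can optionally confirm that the cluster is healthy.
``vstart.sh`` writes a ``ceph.conf`` into the ``build/`` directory, so from
there a status check is simply (a sketch)::

    $ ./bin/ceph -s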
1455
1456
1457 Step 3 - run s3-tests
1458 ---------------------
1459
To run the s3-tests suite, do the following::
1461
1462 $ ../qa/workunits/rgw/run-s3tests.sh
1463
1464 .. WIP
1465 .. ===
1466 ..
1467 .. Building RPM packages
1468 .. ---------------------
1469 ..
1470 .. Ceph is regularly built and packaged for a number of major Linux
1471 .. distributions. At the time of this writing, these included CentOS, Debian,
1472 .. Fedora, openSUSE, and Ubuntu.
1473 ..
1474 .. Architecture
1475 .. ============
1476 ..
.. Ceph is a collection of components built on top of RADOS that provide
1478 .. services (RBD, RGW, CephFS) and APIs (S3, Swift, POSIX) for the user to
1479 .. store and retrieve data.
1480 ..
1481 .. See :doc:`/architecture` for an overview of Ceph architecture. The
1482 .. following sections treat each of the major architectural components
1483 .. in more detail, with links to code and tests.
1484 ..
1485 .. FIXME The following are just stubs. These need to be developed into
1486 .. detailed descriptions of the various high-level components (RADOS, RGW,
1487 .. etc.) with breakdowns of their respective subcomponents.
1488 ..
1489 .. FIXME Later, in the Testing chapter I would like to take another look
1490 .. at these components/subcomponents with a focus on how they are tested.
1491 ..
1492 .. RADOS
1493 .. -----
1494 ..
1495 .. RADOS stands for "Reliable, Autonomic Distributed Object Store". In a Ceph
1496 .. cluster, all data are stored in objects, and RADOS is the component responsible
1497 .. for that.
1498 ..
1499 .. RADOS itself can be further broken down into Monitors, Object Storage Daemons
1500 .. (OSDs), and client APIs (librados). Monitors and OSDs are introduced at
1501 .. :doc:`/start/intro`. The client library is explained at
1502 .. :doc:`/rados/api/index`.
1503 ..
1504 .. RGW
1505 .. ---
1506 ..
1507 .. RGW stands for RADOS Gateway. Using the embedded HTTP server civetweb_ or
1508 .. Apache FastCGI, RGW provides a REST interface to RADOS objects.
1509 ..
1510 .. .. _civetweb: https://github.com/civetweb/civetweb
1511 ..
1512 .. A more thorough introduction to RGW can be found at :doc:`/radosgw/index`.
1513 ..
1514 .. RBD
1515 .. ---
1516 ..
1517 .. RBD stands for RADOS Block Device. It enables a Ceph cluster to store disk
1518 .. images, and includes in-kernel code enabling RBD images to be mounted.
1519 ..
1520 .. To delve further into RBD, see :doc:`/rbd/rbd`.
1521 ..
1522 .. CephFS
1523 .. ------
1524 ..
1525 .. CephFS is a distributed file system that enables a Ceph cluster to be used as a NAS.
1526 ..
1527 .. File system metadata is managed by Meta Data Server (MDS) daemons. The Ceph
1528 .. file system is explained in more detail at :doc:`/cephfs/index`.
1529 ..