============================================
Contributing to Ceph: A Guide for Developers
============================================

:Author: Loic Dachary
:Author: Nathan Cutler
:License: Creative Commons Attribution-ShareAlike (CC BY-SA)

.. note:: The old (pre-2016) developer documentation has been moved to :doc:`/dev/index-old`.

.. contents::
   :depth: 3

Introduction
============

This guide has two aims. First, it should lower the barrier to entry for
software developers who wish to get involved in the Ceph project. Second,
it should serve as a reference for Ceph developers.

We assume that readers are already familiar with Ceph (the distributed
object store and file system designed to provide excellent performance,
reliability and scalability). If not, please refer to the `project website`_
and especially the `publications list`_.

.. _`project website`: http://ceph.com
.. _`publications list`: https://ceph.com/resources/publications/

Since this document is to be consumed by developers, who are assumed to
have Internet access, topics covered elsewhere, either within the Ceph
documentation or elsewhere on the web, are treated by linking. If you
notice that a link is broken or if you know of a better link, please
`report it as a bug`_.

.. _`report it as a bug`: http://tracker.ceph.com/projects/ceph/issues/new

Essentials (tl;dr)
==================

This chapter presents essential information that every Ceph developer needs
to know.

Leads
-----

The Ceph project is led by Sage Weil. In addition, each major project
component has its own lead. The following table shows all the leads and
their nicks on `GitHub`_:

.. _github: https://github.com/

========= =============== =============
Scope     Lead            GitHub nick
========= =============== =============
Ceph      Sage Weil       liewegas
RADOS     Samuel Just     athanatos
RGW       Yehuda Sadeh    yehudasa
RBD       Jason Dillaman  dillaman
CephFS    John Spray      jcsp
Build/Ops Ken Dreyer      ktdreyer
========= =============== =============

The Ceph-specific acronyms in the table are explained in
:doc:`/architecture`.

History
-------

See the `History chapter of the Wikipedia article`_.

.. _`History chapter of the Wikipedia article`: https://en.wikipedia.org/wiki/Ceph_%28software%29#History

Licensing
---------

Ceph is free software.

Unless stated otherwise, the Ceph source code is distributed under the terms of
the LGPL2.1. For full details, see `the file COPYING in the top-level
directory of the source-code tree`_.

.. _`the file COPYING in the top-level directory of the source-code tree`:
   https://github.com/ceph/ceph/blob/master/COPYING

Source code repositories
------------------------

The source code of Ceph lives on `GitHub`_ in a number of repositories below
the `Ceph "organization"`_.

.. _`Ceph "organization"`: https://github.com/ceph

To make a meaningful contribution to the project as a developer, a working
knowledge of git_ is essential.

.. _git: https://git-scm.com/documentation

Although the `Ceph "organization"`_ includes several software repositories,
this document covers only one: https://github.com/ceph/ceph.

Redmine issue tracker
---------------------

Although `GitHub`_ is used for code, Ceph-related issues (Bugs, Features,
Backports, Documentation, etc.) are tracked at http://tracker.ceph.com,
which is powered by `Redmine`_.

.. _Redmine: http://www.redmine.org

The tracker has a Ceph project with a number of subprojects loosely
corresponding to the various architectural components (see
:doc:`/architecture`).

Mere `registration`_ in the tracker automatically grants permissions
sufficient to open new issues and comment on existing ones.

.. _registration: http://tracker.ceph.com/account/register

To report a bug or propose a new feature, `jump to the Ceph project`_ and
click on `New issue`_.

.. _`jump to the Ceph project`: http://tracker.ceph.com/projects/ceph
.. _`New issue`: http://tracker.ceph.com/projects/ceph/issues/new

Mailing list
------------

Ceph development email discussions take place on the mailing list
``ceph-devel@vger.kernel.org``. The list is open to all. Subscribe by
sending a message to ``majordomo@vger.kernel.org`` with the line::

    subscribe ceph-devel

in the body of the message.

There are also `other Ceph-related mailing lists`_.

.. _`other Ceph-related mailing lists`: https://ceph.com/irc/

IRC
---

In addition to mailing lists, the Ceph community also communicates in real
time using `Internet Relay Chat`_.

.. _`Internet Relay Chat`: http://www.irchelp.org/

See https://ceph.com/irc/ for how to set up your IRC
client and a list of channels.

Submitting patches
------------------

The canonical instructions for submitting patches are contained in
`the file CONTRIBUTING.rst in the top-level directory of the source-code
tree`_. There may be some overlap between this guide and that file.

.. _`the file CONTRIBUTING.rst in the top-level directory of the source-code tree`:
   https://github.com/ceph/ceph/blob/master/CONTRIBUTING.rst

All newcomers are encouraged to read that file carefully.

Building from source
--------------------

See instructions at :doc:`/install/build-ceph`.

Using ccache to speed up local builds
-------------------------------------

Rebuilds of the Ceph source tree can benefit significantly from `ccache`_.
When switching between branches, you might see build failures on certain
older branches, mostly due to stale build artifacts; such rebuilds benefit
from ccache in particular. For a fully clean source tree, you can do::

    $ make clean

    # Note: the following will remove everything in the source tree that
    # isn't tracked by git, so back up any log files or conf options first.

    $ git clean -fdx; git submodule foreach git clean -fdx

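Since ``git clean -fdx`` destroys all untracked files, it can be worth previewing what would be removed first. Here is a minimal sketch using a throwaway repository (the repository and file names are invented for illustration); the ``-n`` flag makes git list what it *would* remove without deleting anything:

```shell
set -e
demo=$(mktemp -d)                  # throwaway repo, not your real tree
cd "$demo"
git init -q .
echo 'int main() {}' > main.c
git add main.c
git -c user.name=demo -c user.email=demo@example.com commit -qm 'initial'
touch main.o stale_artifact.log    # simulate untracked build debris
git clean -fdxn                    # -n: dry run; lists files without deleting
```

Running this prints lines such as ``Would remove main.o``; once the list looks right, drop the ``-n`` to actually clean.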
ccache is available as a package in most distros. To build Ceph with ccache,
run::

    $ cmake -DWITH_CCACHE=ON ..

ccache can also be used to speed up all builds in the system. For more
details, refer to the `run modes`_ section of the ccache manual. The default
settings of ``ccache`` can be displayed with ``ccache -s``.

.. note:: It is recommended to override ``max_size``, the size of the cache
   (which defaults to 10G), to a larger value such as 25G. Refer to the
   `configuration`_ section of the ccache manual.

.. _`ccache`: https://ccache.samba.org/
.. _`run modes`: https://ccache.samba.org/manual.html#_run_modes
.. _`configuration`: https://ccache.samba.org/manual.html#_configuration

Development-mode cluster
------------------------

See :doc:`/dev/quick_guide`.

Backporting
-----------

All bugfixes should be merged to the ``master`` branch before being backported.
To flag a bugfix for backporting, make sure it has a `tracker issue`_
associated with it and set the ``Backport`` field to a comma-separated list of
previous releases (e.g. "hammer,jewel") that you think need the backport.
The rest (including the actual backporting) will be taken care of by the
`Stable Releases and Backports`_ team.

.. _`tracker issue`: http://tracker.ceph.com/
.. _`Stable Releases and Backports`: http://tracker.ceph.com/projects/ceph-releases/wiki


What is merged where and when?
==============================

Commits are merged into branches according to criteria that change
during the lifecycle of a Ceph release. This chapter is the inventory
of what can be merged in which branch at a given point in time.

Development releases (i.e. x.0.z)
---------------------------------

What?
^^^^^

* features
* bug fixes

Where?
^^^^^^

Features are merged to the master branch. Bug fixes should be merged
to the corresponding named branch (e.g. "jewel" for 10.0.z, "kraken"
for 11.0.z, etc.). However, this is not mandatory - bug fixes can be
merged to the master branch as well, since the master branch is
periodically merged to the named branch during the development
releases phase. In either case, if the bugfix is important it can also
be flagged for backport to one or more previous stable releases.

When?
^^^^^

After the stable release candidates of the previous release enter
phase 2 (see below). For example: the "jewel" named branch was
created when the infernalis release candidates entered phase 2. From
this point on, master was no longer associated with infernalis. As
soon as the named branch of the next stable release is created, master
starts getting periodically merged into it.

Branch merges
^^^^^^^^^^^^^

* The branch of the stable release is merged periodically into master.
* The master branch is merged periodically into the branch of the
  stable release.
* The master is merged into the branch of the stable release
  immediately after each development x.0.z release.

Stable release candidates (i.e. x.1.z) phase 1
----------------------------------------------

What?
^^^^^

* bug fixes only

Where?
^^^^^^

Bug fixes should be merged to the named branch corresponding to the
stable release candidate (e.g. "jewel" for 10.1.z) or to master. During
this phase, all commits to master will be merged to the named branch,
and vice versa. In other words, it makes no difference whether a commit
is merged to the named branch or to master - it will make it into the
next release candidate either way.

When?
^^^^^

After the first stable release candidate is published, i.e. after the
x.1.0 tag is set in the release branch.

Branch merges
^^^^^^^^^^^^^

* The branch of the stable release is merged periodically into master.
* The master branch is merged periodically into the branch of the
  stable release.
* The master is merged into the branch of the stable release
  immediately after each x.1.z release candidate.

Stable release candidates (i.e. x.1.z) phase 2
----------------------------------------------

What?
^^^^^

* bug fixes only

Where?
^^^^^^

The branch of the stable release (e.g. "jewel" for 10.0.z, "kraken"
for 11.0.z, etc.). During this phase, all commits to the named branch
will be merged into master. Cherry-picking to the named branch during
release candidate phase 2 is done manually since the official
backporting process only begins when the release is pronounced
"stable".

When?
^^^^^

After Sage Weil decides it is time for phase 2 to happen.

Branch merges
^^^^^^^^^^^^^

* The branch of the stable release is merged periodically into master.

Stable releases (i.e. x.2.z)
----------------------------

What?
^^^^^

* bug fixes
* features are sometimes accepted
* commits should be cherry-picked from master when possible
* commits that are not cherry-picked from master must be about a bug unique to the stable release
* see also `the backport HOWTO`_

.. _`the backport HOWTO`:
   http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO#HOWTO

Where?
^^^^^^

The branch of the stable release (hammer for 0.94.x, infernalis for 9.2.x,
etc.)

When?
^^^^^

After the stable release is published, i.e. after the "vx.2.0" tag is
set in the release branch.

Branch merges
^^^^^^^^^^^^^

Never

Issue tracker
=============

See `Redmine issue tracker`_ for a brief introduction to the Ceph Issue Tracker.

Ceph developers use the issue tracker to

1. keep track of issues - bugs, fix requests, feature requests, backport
   requests, etc.

2. communicate with other developers and keep them informed as work
   on the issues progresses.

Issue tracker conventions
-------------------------

When you start working on an existing issue, it's nice to let the other
developers know this - to avoid duplication of labor. Typically, this is
done by changing the :code:`Assignee` field (to yourself) and changing the
:code:`Status` to *In progress*. Newcomers to the Ceph community typically do not
have sufficient privileges to update these fields, however: they can
simply update the issue with a brief note.

.. table:: Meanings of some commonly used statuses

   ================ ===========================================
   Status           Meaning
   ================ ===========================================
   New              Initial status
   In Progress      Somebody is working on it
   Need Review      Pull request is open with a fix
   Pending Backport Fix has been merged, backport(s) pending
   Resolved         Fix and backports (if any) have been merged
   ================ ===========================================

Basic workflow
==============

The following chart illustrates the basic development workflow:

.. ditaa::

        Upstream Code                       Your Local Environment

       /----------\        git clone           /-------------\
       |   Ceph   | -------------------------> | ceph/master |
       \----------/                            \-------------/
            ^                                    |
            |                                    | git branch fix_1
            | git merge                          |
            |                                    v
       /----------------\  git commit --amend  /-------------\
       |  make check    |--------------------> | ceph/fix_1  |
       | ceph--qa--suite|                      \-------------/
       \----------------/                        |
            ^                                    | fix changes
            |                                    | test changes
            | review                             | git commit
            |                                    |
            |                                    v
       /--------------\                        /-------------\
       |   github     |<---------------------- | ceph/fix_1  |
       | pull request |   git push             \-------------/
       \--------------/

Below we present an explanation of this chart. The explanation is written
with the assumption that you, the reader, are a beginning developer who
has an idea for a bugfix but does not know exactly how to proceed.

Update the tracker
------------------

Before you start, you should know the `Issue tracker`_ number of the bug
you intend to fix. If there is no tracker issue, now is the time to create
one.

The tracker is there to explain the issue (bug) to your fellow Ceph
developers and keep them informed as you make progress toward resolution.
To this end, then, provide a descriptive title as well as sufficient
information and details in the description.

If you have sufficient tracker permissions, assign the bug to yourself by
changing the ``Assignee`` field. If your tracker permissions have not yet
been elevated, simply add a comment to the issue with a short message like
"I am working on this issue".

Upstream code
-------------

This section, and the ones that follow, correspond to the nodes in the
above chart.

The upstream code lives in https://github.com/ceph/ceph.git, which is
sometimes referred to as the "upstream repo", or simply "upstream". As the
chart illustrates, we will make a local copy of this code, modify it, test
our modifications, and submit the modifications back to the upstream repo
for review.

A local copy of the upstream code is made by

1. forking the upstream repo on GitHub, and
2. cloning your fork to make a local working copy

See the `GitHub documentation
<https://help.github.com/articles/fork-a-repo/#platform-linux>`_ for
detailed instructions on forking. In short, if your GitHub username is
"mygithubaccount", your fork of the upstream repo will show up at
https://github.com/mygithubaccount/ceph. Once you have created your fork,
you clone it by doing:

.. code::

    $ git clone https://github.com/mygithubaccount/ceph

While it is possible to clone the upstream repo directly, in this case you
must fork it first. Forking is what enables us to open a `GitHub pull
request`_.

For more information on using GitHub, refer to `GitHub Help
<https://help.github.com/>`_.

Local environment
-----------------

In the local environment created in the previous step, you now have a
copy of the ``master`` branch in ``remotes/origin/master``. Since the fork
(https://github.com/mygithubaccount/ceph.git) is frozen in time and the
upstream repo (https://github.com/ceph/ceph.git, typically abbreviated to
``ceph/ceph.git``) is updated frequently by other developers, you will need
to sync your fork periodically. To do this, first add the upstream repo as
a "remote" and fetch it::

    $ git remote add ceph https://github.com/ceph/ceph.git
    $ git fetch ceph

Fetching downloads all objects (commits, branches) that were added since
the last sync. After running these commands, all the branches from
``ceph/ceph.git`` are downloaded to the local git repo as
``remotes/ceph/$BRANCH_NAME`` and can be referenced as
``ceph/$BRANCH_NAME`` in certain git commands.

For example, your local ``master`` branch can be reset to the upstream Ceph
``master`` branch by doing::

    $ git fetch ceph
    $ git checkout master
    $ git reset --hard ceph/master

Finally, the ``master`` branch of your fork can then be synced to upstream
master by::

    $ git push -u origin master

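After setting up the two remotes, you can sanity-check the configuration with ``git remote -v``. A self-contained sketch, using a scratch repository so it does not touch your real clone (the ``mygithubaccount`` fork URL is the hypothetical one from above):

```shell
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
# "origin" is your fork; "ceph" is the upstream repo, as in the text above
git remote add origin https://github.com/mygithubaccount/ceph.git
git remote add ceph https://github.com/ceph/ceph.git
git remote -v
```

Each remote is listed twice, once for fetch and once for push; if ``ceph`` is missing or points at the wrong URL, ``git fetch ceph`` will not bring in upstream branches.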
Bugfix branch
-------------

Next, create a branch for the bugfix:

.. code::

    $ git checkout master
    $ git checkout -b fix_1
    $ git push -u origin fix_1

This creates a ``fix_1`` branch locally and in our GitHub fork. At this
point, the ``fix_1`` branch is identical to the ``master`` branch, but not
for long! You are now ready to modify the code.

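Before editing, it is worth double-checking which branch you are actually on. A minimal sketch in a scratch repository (``git rev-parse --abbrev-ref HEAD`` prints the current branch name; ``git status`` shows it too):

```shell
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m 'initial'
git checkout -qb fix_1            # create the bugfix branch and switch to it
git rev-parse --abbrev-ref HEAD   # prints the current branch name: fix_1
```

Committing to ``master`` by mistake is a common newcomer error; this one-liner check avoids it.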
Fix bug locally
---------------

At this point, change the status of the tracker issue to "In progress" to
communicate to the other Ceph developers that you have begun working on a
fix. If you don't have permission to change that field, your comment that
you are working on the issue is sufficient.

Possibly, your fix is very simple and requires only minimal testing.
More likely, it will be an iterative process involving trial and error, not
to mention skill. An explanation of how to fix bugs is beyond the
scope of this document. Instead, we focus on the mechanics of the process
in the context of the Ceph project.

For a detailed discussion of the tools available for validating your
bugfixes, see the `Testing`_ chapter.

For now, let us just assume that you have finished work on the bugfix and
that you have tested it and believe it works. Commit the changes to your local
branch using the ``--signoff`` option::

    $ git commit -as

and push the changes to your fork::

    $ git push origin fix_1

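If you are curious what ``--signoff`` (the ``-s`` in ``-as``) actually does: it appends a ``Signed-off-by:`` trailer to the commit message, which is what the "commits are properly signed" CI check looks for. A minimal sketch in a scratch repository (the name and email are placeholders):

```shell
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git config user.name "Random Developer"
git config user.email "rdev@example.com"
echo 'fix' > bug.txt
git add bug.txt
git commit -q -s -m 'osd: fix the bug'   # -s appends the sign-off trailer
git log -1 --format=%B                   # show the full commit message
```

The message body ends with ``Signed-off-by: Random Developer <rdev@example.com>``, taken from the committer's configured name and email.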
GitHub pull request
-------------------

The next step is to open a GitHub pull request. The purpose of this step is
to make your bugfix available to the community of Ceph developers. They
will review it and may do additional testing on it.

In short, this is the point where you "go public" with your modifications.
Psychologically, you should be prepared to receive suggestions and
constructive criticism. Don't worry! In our experience, the Ceph project is
a friendly place!

If you are uncertain how to use pull requests, you may read
`this GitHub pull request tutorial`_.

.. _`this GitHub pull request tutorial`:
   https://help.github.com/articles/using-pull-requests/

For some ideas on what constitutes a "good" pull request, see
the `Git Commit Good Practice`_ article at the `OpenStack Project Wiki`_.

.. _`Git Commit Good Practice`: https://wiki.openstack.org/wiki/GitCommitMessages
.. _`OpenStack Project Wiki`: https://wiki.openstack.org/wiki/Main_Page

Once your pull request (PR) is opened, update the `Issue tracker`_ by
adding a comment to the bug pointing the other developers to your PR. The
update can be as simple as::

    *PR*: https://github.com/ceph/ceph/pull/$NUMBER_OF_YOUR_PULL_REQUEST

Automated PR validation
-----------------------

When your PR hits GitHub, the Ceph project's `Continuous Integration (CI)
<https://en.wikipedia.org/wiki/Continuous_integration>`_
infrastructure will test it automatically. At the time of this writing
(March 2016), the automated CI testing included a test to check that the
commits in the PR are properly signed (see `Submitting patches`_) and a
`make check`_ test.

The latter, `make check`_, builds the PR and runs it through a battery of
tests. These tests run on machines operated by the Ceph Continuous
Integration (CI) team. When the tests complete, the result will be shown
on GitHub in the pull request itself.

You can (and should) also test your modifications before you open a PR.
Refer to the `Testing`_ chapter for details.

Notes on PR make check test
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The GitHub `make check`_ test is driven by a Jenkins instance.

Jenkins merges the PR branch into the latest version of the base branch before
starting the build, so you don't have to rebase the PR to pick up any fixes.

You can trigger the PR tests at any time by adding a comment to the PR - the
comment should contain the string "test this please". Since a human subscribed
to the PR might interpret that as a request for him or her to test the PR, it's
good to write the request as "Jenkins, test this please".

The `make check`_ log is the place to go if there is a failure and you're not
sure what caused it. To reach it, first click on "details" (next to the `make
check`_ test in the PR) to get into the Jenkins web GUI, and then click on
"Console Output" (on the left).

Jenkins is set up to grep the log for strings known to have been associated
with `make check`_ failures in the past. However, there is no guarantee that
the strings are associated with any given `make check`_ failure. You have to
dig into the log to be sure.

Integration tests AKA ceph-qa-suite
-----------------------------------

Since Ceph is a complex beast, it may also be necessary to test your fix to
see how it behaves on real clusters running either on real or virtual
hardware. Tests designed for this purpose live in the `ceph/qa
sub-directory`_ and are run via the `teuthology framework`_.

.. _`ceph/qa sub-directory`: https://github.com/ceph/ceph/tree/master/qa/
.. _`teuthology repository`: https://github.com/ceph/teuthology
.. _`teuthology framework`: https://github.com/ceph/teuthology

If you have access to an OpenStack tenant, you are encouraged to run the
integration tests yourself using `ceph-workbench ceph-qa-suite`_,
and to post the test results to the PR.

.. _`ceph-workbench ceph-qa-suite`: http://ceph-workbench.readthedocs.org/

The Ceph community has access to the `Sepia lab
<http://ceph.github.io/sepia/>`_ where integration tests can be run on
real hardware. Other developers may add tags like "needs-qa" to your PR.
This allows PRs that need testing to be merged into a single branch and
tested all at the same time. Since teuthology suites can take hours
(even days in some cases) to run, this can save a lot of time.

Integration testing is discussed in more detail in the `Testing`_ chapter.

Code review
-----------

Once your bugfix has been thoroughly tested, or even during this process,
it will be subjected to code review by other developers. This typically
takes the form of correspondence in the PR itself, but can be supplemented
by discussions on `IRC`_ and the `Mailing list`_.

Amending your PR
----------------

While your PR is going through `Testing`_ and `Code review`_, you can
modify it at any time by editing files in your local branch.

After the changes are committed locally (to the ``fix_1`` branch in our
example), they need to be pushed to GitHub so they appear in the PR.

Modifying the PR is done by adding commits to the ``fix_1`` branch upon
which it is based, often followed by rebasing to modify the branch's git
history. See `this tutorial
<https://www.atlassian.com/git/tutorials/rewriting-history>`_ for a good
introduction to rebasing. When you are done with your modifications, you
will need to force push your branch with:

.. code::

    $ git push --force origin fix_1

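As an illustration of the history rewriting mentioned above, ``git commit --amend`` folds new changes into the previous commit instead of creating a new one, which is why the subsequent push must be forced. A sketch in a scratch repository (file and message names are made up):

```shell
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git config user.name demo
git config user.email demo@example.com
echo 'v1' > fix.txt
git add fix.txt
git commit -qm 'osd: fix the bug'
echo 'v2' > fix.txt               # address a review comment
git add fix.txt
git commit -q --amend --no-edit   # fold the change into the last commit
git rev-list --count HEAD         # still a single commit: prints 1
```

Because amending replaces the old commit with a new one (a different SHA), the remote branch no longer matches local history, and only ``git push --force`` will update it.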
Merge
-----

The bugfixing process culminates when one of the project leads decides to
merge your PR.

When this happens, it is a signal for you (or the lead who merged the PR)
to change the `Issue tracker`_ status to "Resolved". Some issues may be
flagged for backporting, in which case the status should be changed to
"Pending Backport" (see the `Backporting`_ chapter for details).


Testing
=======

Ceph has two types of tests: `make check`_ tests and integration tests.
The former are run via `GNU Make <https://www.gnu.org/software/make/>`_,
and the latter are run via the `teuthology framework`_. The following two
chapters examine the `make check`_ and integration tests in detail.

.. _`make check`:

Testing - make check
====================

After compiling Ceph, the `make check`_ command can be used to run the
code through a battery of tests covering various aspects of Ceph. For
inclusion in `make check`_, a test must:

* bind ports that do not conflict with other tests
* not require root access
* not require more than one machine to run
* complete within a few minutes

While it is possible to run `make check`_ directly, it can be tricky to
correctly set up your environment. Fortunately, a script is provided to
make it easier to run `make check`_ on your code. It can be run from the
top-level directory of the Ceph source tree by doing::

    $ ./run-make-check.sh

You will need a minimum of 8GB of RAM and 32GB of free disk space for this
command to complete successfully on x86_64 (other architectures may have
different constraints). Depending on your hardware, it can take from 20
minutes to three hours to complete, but it's worth the wait.

Caveats
-------

1. Unlike the various Ceph daemons and ``ceph-fuse``, the `make check`_ tests
   are linked against the default memory allocator (glibc) unless explicitly
   linked against something else. This enables tools like valgrind to be used
   in the tests.

734 | Testing - integration tests | |
735 | =========================== | |
736 | ||
737 | When a test requires multiple machines, root access or lasts for a | |
738 | longer time (for example, to simulate a realistic Ceph deployment), it | |
739 | is deemed to be an integration test. Integration tests are organized into | |
740 | "suites", which are defined in the `ceph/qa sub-directory`_ and run with | |
741 | the ``teuthology-suite`` command. | |
742 | ||
743 | The ``teuthology-suite`` command is part of the `teuthology framework`_. | |
744 | In the sections that follow we attempt to provide a detailed introduction | |
745 | to that framework from the perspective of a beginning Ceph developer. | |
746 | ||
747 | Teuthology consumes packages | |
748 | ---------------------------- | |
749 | ||
750 | It may take some time to understand the significance of this fact, but it | |
751 | is `very` significant. It means that automated tests can be conducted on | |
752 | multiple platforms using the same packages (RPM, DEB) that can be | |
753 | installed on any machine running those platforms. | |
754 | ||
755 | Teuthology has a `list of platforms that it supports | |
756 | <https://github.com/ceph/ceph/tree/master/qa/distros/supported>`_ (as | |
757 | of March 2016 the list consisted of "CentOS 7.2" and "Ubuntu 14.04"). It | |
758 | expects to be provided pre-built Ceph packages for these platforms. | |
759 | Teuthology deploys these platforms on machines (bare-metal or | |
760 | cloud-provisioned), installs the packages on them, and deploys Ceph | |
761 | clusters on them - all as called for by the test. | |
762 | ||
763 | The nightlies | |
764 | ------------- | |
765 | ||
766 | A number of integration tests are run on a regular basis in the `Sepia | |
767 | lab`_ against the official Ceph repositories (on the ``master`` development | |
768 | branch and the stable branches). Traditionally, these tests are called "the | |
769 | nightlies" because the Ceph core developers used to live and work in | |
770 | the same time zone and from their perspective the tests were run overnight. | |
771 | ||
772 | The results of the nightlies are published at http://pulpito.ceph.com/ and | |
773 | http://pulpito.ovh.sepia.ceph.com:8081/. The developer nick shows in the | |
774 | test results URL and in the first column of the Pulpito dashboard. The | |
775 | results are also reported on the `ceph-qa mailing list | |
31f18b77 | 776 | <https://ceph.com/irc/>`_ for analysis. |
7c673cae FG |
777 | |
778 | Suites inventory | |
779 | ---------------- | |
780 | ||
781 | The ``suites`` directory of the `ceph/qa sub-directory`_ contains | |
782 | all the integration tests, for all the Ceph components. | |
783 | ||
784 | `ceph-deploy <https://github.com/ceph/ceph/tree/master/qa/suites/ceph-deploy>`_ | |
785 | install a Ceph cluster with ``ceph-deploy`` (`ceph-deploy man page`_) | |
786 | ||
787 | `ceph-disk <https://github.com/ceph/ceph/tree/master/qa/suites/ceph-disk>`_ | |
788 | verify init scripts (upstart etc.) and udev integration with | |
789 | ``ceph-disk`` (`ceph-disk man page`_), with and without `dmcrypt | |
790 | <https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt>`_ support. | |
791 | ||
792 | `dummy <https://github.com/ceph/ceph/tree/master/qa/suites/dummy>`_ | |
793 | get a machine, do nothing and return success (commonly used to | |
794 | verify the integration testing infrastructure works as expected) | |
795 | ||
796 | `fs <https://github.com/ceph/ceph/tree/master/qa/suites/fs>`_ | |
797 | test CephFS | |
798 | ||
799 | `kcephfs <https://github.com/ceph/ceph/tree/master/qa/suites/kcephfs>`_ | |
800 | test the CephFS kernel module | |
801 | ||
802 | `krbd <https://github.com/ceph/ceph/tree/master/qa/suites/krbd>`_ | |
803 | test the RBD kernel module | |
804 | ||
805 | `powercycle <https://github.com/ceph/ceph/tree/master/qa/suites/powercycle>`_ | |
806 | verify the Ceph cluster behaves when machines are powered off | |
807 | and on again | |
808 | ||
809 | `rados <https://github.com/ceph/ceph/tree/master/qa/suites/rados>`_ | |
810 | run Ceph clusters including OSDs and MONs, under various conditions of | |
811 | stress | |
812 | ||
813 | `rbd <https://github.com/ceph/ceph/tree/master/qa/suites/rbd>`_ | |
814 | run RBD tests using actual Ceph clusters, with and without qemu | |
815 | ||
816 | `rgw <https://github.com/ceph/ceph/tree/master/qa/suites/rgw>`_ | |
817 | run RGW tests using actual Ceph clusters | |
818 | ||
819 | `smoke <https://github.com/ceph/ceph/tree/master/qa/suites/smoke>`_ | |
820 | run tests that exercise the Ceph API with an actual Ceph cluster | |
821 | ||
822 | `teuthology <https://github.com/ceph/ceph/tree/master/qa/suites/teuthology>`_ | |
823 | verify that teuthology can run integration tests, with and without OpenStack | |
824 | ||
825 | `upgrade <https://github.com/ceph/ceph/tree/master/qa/suites/upgrade>`_ | |
826 | for various versions of Ceph, verify that upgrades can happen | |
827 | without disrupting an ongoing workload | |
828 | ||
829 | .. _`ceph-deploy man page`: ../../man/8/ceph-deploy | |
830 | .. _`ceph-disk man page`: ../../man/8/ceph-disk | |
831 | ||
832 | teuthology-describe-tests | |
833 | ------------------------- | |
834 | ||
835 | In February 2016, a new feature called ``teuthology-describe-tests`` was | |
836 | added to the `teuthology framework`_ to facilitate documentation and better | |
837 | understanding of integration tests (`feature announcement | |
838 | <http://article.gmane.org/gmane.comp.file-systems.ceph.devel/29287>`_). | |
839 | ||
840 | The upshot is that tests can be documented by embedding ``meta:`` | |
841 | annotations in the yaml files used to define the tests. The results can be | |
842 | seen in the `ceph-qa-suite wiki | |
843 | <http://tracker.ceph.com/projects/ceph-qa-suite/wiki/>`_. | |
844 | ||
845 | Since this is a new feature, many yaml files have yet to be annotated. | |
846 | Developers are encouraged to improve the documentation, in terms of both | |
847 | coverage and quality. | |
848 | ||
849 | How integration tests are run | |
850 | ----------------------------- | |
851 | ||
852 | Given that - as a new Ceph developer - you will typically not have access | |
853 | to the `Sepia lab`_, you may rightly ask how you can run the integration | |
854 | tests in your own environment. | |
855 | ||
856 | One option is to set up a teuthology cluster on bare metal. Though this is | |
857 | a non-trivial task, it `is` possible. Here are `some notes | |
858 | <http://docs.ceph.com/teuthology/docs/LAB_SETUP.html>`_ to get you started | |
859 | if you decide to go this route. | |
860 | ||
861 | If you have access to an OpenStack tenant, you have another option: the | |
862 | `teuthology framework`_ has an OpenStack backend, which is documented `here | |
863 | <https://github.com/dachary/teuthology/tree/openstack#openstack-backend>`__. | |
864 | This OpenStack backend can build packages from a given git commit or | |
865 | branch, provision VMs, install the packages and run integration tests | |
866 | on those VMs. This process is controlled using a tool called | |
867 | `ceph-workbench ceph-qa-suite`_. This tool also automates publishing of | |
868 | test results at http://teuthology-logs.public.ceph.com. | |
869 | ||
870 | Running integration tests on your code contributions and publishing the | |
871 | results allows reviewers to verify that changes to the code base do not | |
872 | cause regressions, or to analyze test failures when they do occur. | |
873 | ||
874 | Every teuthology cluster, whether bare-metal or cloud-provisioned, has a | |
875 | so-called "teuthology machine" from which tests suites are triggered using the | |
876 | ``teuthology-suite`` command. | |
877 | ||
878 | A detailed and up-to-date description of each `teuthology-suite`_ option is | |
879 | available by running the following command on the teuthology machine:: | |
880 | ||
881 | $ teuthology-suite --help | |
882 | ||
883 | .. _teuthology-suite: http://docs.ceph.com/teuthology/docs/teuthology.suite.html | |
884 | ||
885 | How integration tests are defined | |
886 | --------------------------------- | |
887 | ||
888 | Integration tests are defined by yaml files found in the ``suites`` | |
889 | subdirectory of the `ceph/qa sub-directory`_ and implemented by python | |
890 | code found in the ``tasks`` subdirectory. Some tests ("standalone tests") | |
891 | are defined in a single yaml file, while other tests are defined by a | |
892 | directory tree containing yaml files that are combined, at runtime, into a | |
893 | larger yaml file. | |
894 | ||
895 | Reading a standalone test | |
896 | ------------------------- | |
897 | ||
898 | Let us first examine a standalone test, or "singleton". | |
899 | ||
900 | Here is a commented example using the integration test | |
901 | `rados/singleton/all/admin-socket.yaml | |
902 | <https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/admin-socket.yaml>`_ | |
903 | :: | |
904 | ||
905 | roles: | |
906 | - - mon.a | |
907 | - osd.0 | |
908 | - osd.1 | |
909 | tasks: | |
910 | - install: | |
911 | - ceph: | |
912 | - admin_socket: | |
913 | osd.0: | |
914 | version: | |
915 | git_version: | |
916 | help: | |
917 | config show: | |
918 | config set filestore_dump_file /tmp/foo: | |
919 | perf dump: | |
920 | perf schema: | |
921 | ||
922 | The ``roles`` array determines the composition of the cluster (how | |
923 | many MONs, OSDs, etc.) on which this test is designed to run, as well | |
924 | as how these roles will be distributed over the machines in the | |
925 | testing cluster. In this case, there is only one element in the | |
926 | top-level array: therefore, only one machine is allocated to the | |
927 | test. The nested array declares that this machine shall run a MON with | |
928 | id ``a`` (that is the ``mon.a`` in the list of roles) and two OSDs | |
929 | (``osd.0`` and ``osd.1``). | |
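
Read this way, the allocation implied by the ``roles`` array can be
sketched in a few lines of Python. This is only an illustrative reading
of the yaml, not teuthology's actual scheduling code::

    from collections import Counter

    # The roles array from admin-socket.yaml: one top-level element,
    # so one machine is allocated to the test.
    roles = [["mon.a", "osd.0", "osd.1"]]

    machines_needed = len(roles)

    # Tally daemon types per machine by splitting "type.id" role names.
    per_machine = [Counter(r.split(".")[0] for r in machine)
                   for machine in roles]

    print(machines_needed)   # 1
    print(dict(per_machine[0]))  # one mon and two osds on the machine
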

The body of the test is in the ``tasks`` array: each element is
evaluated in order, causing the corresponding python file found in the
``tasks`` subdirectory of the `teuthology repository`_ or
`ceph/qa sub-directory`_ to be run. "Running" in this case means calling
the ``task()`` function defined in that file.

In this case, the `install
<https://github.com/ceph/teuthology/blob/master/teuthology/task/install/__init__.py>`_
task comes first. It installs the Ceph packages on each machine (as
defined by the ``roles`` array). A full description of the ``install``
task is `found in the python file
<https://github.com/ceph/teuthology/blob/master/teuthology/task/install/__init__.py>`_
(search for "def task").

The ``ceph`` task, which is documented `here
<https://github.com/ceph/ceph/blob/master/qa/tasks/ceph.py>`__ (again,
search for "def task"), starts OSDs and MONs (and possibly MDSs as well)
as required by the ``roles`` array. In this example, it will start one MON
(``mon.a``) and two OSDs (``osd.0`` and ``osd.1``), all on the same
machine. Control moves to the next task when the Ceph cluster reaches
``HEALTH_OK`` state.

The next task is ``admin_socket`` (`source code
<https://github.com/ceph/ceph/blob/master/qa/tasks/admin_socket.py>`_).
The parameter of the ``admin_socket`` task (and of any other task) is a
structure which is interpreted as documented in the task. In this example
the parameter is a set of commands to be sent to the admin socket of
``osd.0``. The task verifies that each of them returns success (i.e.
exit code zero).

This test can be run with::

    $ teuthology-suite --suite rados/singleton/all/admin-socket.yaml fs/ext4.yaml

Test descriptions
-----------------

Each test has a "test description", which is similar to a directory path,
but not the same. In the case of a standalone test, like the one in
`Reading a standalone test`_, the test description is identical to the
relative path (starting from the ``suites/`` directory of the
`ceph/qa sub-directory`_) of the yaml file defining the test.

Much more commonly, tests are defined not by a single yaml file, but by a
`directory tree of yaml files`. At runtime, the tree is walked and all yaml
files (facets) are combined into larger yaml "programs" that define the
tests. A full listing of the yaml defining the test is included at the
beginning of every test log.

In these cases, the description of each test consists of the
subdirectory under `suites/
<https://github.com/ceph/ceph/tree/master/qa/suites>`_ containing the
yaml facets, followed by an expression in curly braces (``{}``) consisting of
a list of yaml facets in order of concatenation. For instance the
test description::

    ceph-disk/basic/{distros/centos_7.0.yaml tasks/ceph-disk.yaml}

signifies the concatenation of two files:

* ceph-disk/basic/distros/centos_7.0.yaml
* ceph-disk/basic/tasks/ceph-disk.yaml
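
The rule just described can be captured in a small helper that expands a
test description into the yaml files it concatenates. This is a sketch of
the naming convention only, not code from teuthology itself::

    def description_to_files(desc):
        """Expand a test description into the yaml files it concatenates.

        A description is either a plain relative path (standalone test) or
        "prefix/{facet1 facet2 ...}" for tests built from a directory tree.
        """
        if "{" not in desc:
            return [desc]
        prefix, _, rest = desc.partition("{")
        facets = rest.rstrip("}").split()
        return [prefix + facet for facet in facets]

    files = description_to_files(
        "ceph-disk/basic/{distros/centos_7.0.yaml tasks/ceph-disk.yaml}")
    print(files)
    # ['ceph-disk/basic/distros/centos_7.0.yaml',
    #  'ceph-disk/basic/tasks/ceph-disk.yaml']
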

How are tests built from directories?
-------------------------------------

As noted in the previous section, most tests are not defined in a single
yaml file, but rather as a `combination` of files collected from a
directory tree within the ``suites/`` subdirectory of the `ceph/qa sub-directory`_.

The set of all tests defined by a given subdirectory of ``suites/`` is
called an "integration test suite", or a "teuthology suite".

Combination of yaml facets is controlled by special files (``%`` and
``+``) that are placed within the directory tree and can be thought of as
operators. The ``%`` file is the "convolution" operator and ``+``
signifies concatenation.

Convolution operator
--------------------

The convolution operator, implemented as an empty file called ``%``, tells
teuthology to construct a test matrix from the yaml facets found in
subdirectories below the directory containing the operator.

For example, the `ceph-disk suite
<https://github.com/ceph/ceph/tree/jewel/qa/suites/ceph-disk/>`_ is
defined by the ``suites/ceph-disk/`` tree, which consists of the files and
subdirectories in the following structure::

    directory: ceph-disk/basic
        file: %
        directory: distros
            file: centos_7.0.yaml
            file: ubuntu_14.04.yaml
        directory: tasks
            file: ceph-disk.yaml

This is interpreted as a 2x1 matrix consisting of two tests:

1. ceph-disk/basic/{distros/centos_7.0.yaml tasks/ceph-disk.yaml}
2. ceph-disk/basic/{distros/ubuntu_14.04.yaml tasks/ceph-disk.yaml}

i.e. the concatenation of centos_7.0.yaml and ceph-disk.yaml and
the concatenation of ubuntu_14.04.yaml and ceph-disk.yaml, respectively.
In human terms, this means that the task found in ``ceph-disk.yaml`` is
intended to run on both CentOS 7.0 and Ubuntu 14.04.

Without the ``%`` file, the ``ceph-disk`` tree would be interpreted as
three standalone tests:

* ceph-disk/basic/distros/centos_7.0.yaml
* ceph-disk/basic/distros/ubuntu_14.04.yaml
* ceph-disk/basic/tasks/ceph-disk.yaml

(which would of course be wrong in this case).
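
The effect of the ``%`` file can be modeled as a cross product over the
facet subdirectories. The following is an illustrative model only;
teuthology's real implementation lives in its own source tree::

    from itertools import product

    # Facets found in the subdirectories below the directory containing "%".
    distros = ["distros/centos_7.0.yaml", "distros/ubuntu_14.04.yaml"]
    tasks = ["tasks/ceph-disk.yaml"]

    # "%" means: one test per element of the cross product of the
    # facet subdirectories.
    tests = ["ceph-disk/basic/{%s}" % " ".join(combo)
             for combo in product(distros, tasks)]

    for t in tests:
        print(t)
    # ceph-disk/basic/{distros/centos_7.0.yaml tasks/ceph-disk.yaml}
    # ceph-disk/basic/{distros/ubuntu_14.04.yaml tasks/ceph-disk.yaml}
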

Referring to the `ceph/qa sub-directory`_, you will notice that the
``centos_7.0.yaml`` and ``ubuntu_14.04.yaml`` files in the
``suites/ceph-disk/basic/distros/`` directory are implemented as symlinks.
By using symlinks instead of copying, a single file can appear in multiple
suites. This eases the maintenance of the test framework as a whole.

All the tests generated from the ``suites/ceph-disk/`` directory tree
(also known as the "ceph-disk suite") can be run with::

    $ teuthology-suite --suite ceph-disk

An individual test from the `ceph-disk suite`_ can be run by adding the
``--filter`` option::

    $ teuthology-suite \
        --suite ceph-disk/basic \
        --filter 'ceph-disk/basic/{distros/ubuntu_14.04.yaml tasks/ceph-disk.yaml}'

.. note:: To run a standalone test like the one in `Reading a standalone
   test`_, ``--suite`` alone is sufficient. If you want to run a single
   test from a suite that is defined as a directory tree, ``--suite`` must
   be combined with ``--filter``. This is because the ``--suite`` option
   understands POSIX relative paths only.

Concatenation operator
----------------------

For even greater flexibility in sharing yaml files between suites, the
special file plus (``+``) can be used to concatenate files within a
directory. For instance, consider the `suites/rbd/thrash
<https://github.com/ceph/ceph/tree/master/qa/suites/rbd/thrash>`_
tree::

    directory: rbd/thrash
        file: %
        directory: clusters
            file: +
            file: fixed-2.yaml
            file: openstack.yaml
        directory: workloads
            file: rbd_api_tests_copy_on_read.yaml
            file: rbd_api_tests.yaml

This creates two tests:

* rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}
* rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests.yaml}

Because the ``clusters/`` subdirectory contains the special file plus
(``+``), all the other files in that subdirectory (``fixed-2.yaml`` and
``openstack.yaml`` in this case) are concatenated together
and treated as a single file. Without the special file plus, they would
have been convolved with the files from the workloads directory to create
a 2x2 matrix:

* rbd/thrash/{clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}
* rbd/thrash/{clusters/openstack.yaml workloads/rbd_api_tests.yaml}
* rbd/thrash/{clusters/fixed-2.yaml workloads/rbd_api_tests_copy_on_read.yaml}
* rbd/thrash/{clusters/fixed-2.yaml workloads/rbd_api_tests.yaml}
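
In terms of the cross-product model above, the ``+`` file collapses a
subdirectory into a single combined facet before the product is taken.
Again, this is an illustrative sketch, not teuthology's actual code::

    from itertools import product

    clusters = ["clusters/fixed-2.yaml", "clusters/openstack.yaml"]
    workloads = ["workloads/rbd_api_tests_copy_on_read.yaml",
                 "workloads/rbd_api_tests.yaml"]

    # The "+" file in clusters/ glues its files into one facet group...
    clusters_concatenated = [" ".join(clusters)]

    # ...so the cross product yields 1 x 2 = 2 tests instead of 2 x 2 = 4.
    tests = ["rbd/thrash/{%s}" % " ".join(combo)
             for combo in product(clusters_concatenated, workloads)]

    print(len(tests))  # 2
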

The ``clusters/fixed-2.yaml`` file is shared among many suites to
define the following ``roles``::

    roles:
    - [mon.a, mon.c, osd.0, osd.1, osd.2, client.0]
    - [mon.b, osd.3, osd.4, osd.5, client.1]

The ``rbd/thrash`` suite as defined above, consisting of two tests,
can be run with::

    $ teuthology-suite --suite rbd/thrash

A single test from the rbd/thrash suite can be run by adding the
``--filter`` option::

    $ teuthology-suite \
        --suite rbd/thrash \
        --filter 'rbd/thrash/{clusters/fixed-2.yaml clusters/openstack.yaml workloads/rbd_api_tests_copy_on_read.yaml}'

Filtering tests by their description
------------------------------------

When a few jobs fail and need to be run again, the ``--filter`` option
can be used to select tests with a matching description. For instance, if the
``rados`` suite fails the `all/peer.yaml
<https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/peer.yaml>`_
test, the following will only run the tests that contain this file::

    teuthology-suite --suite rados --filter all/peer.yaml

The ``--filter-out`` option does the opposite (it matches tests that do
`not` contain a given string), and can be combined with the ``--filter``
option.

Both ``--filter`` and ``--filter-out`` take a comma-separated list of strings (which
means the comma character is implicitly forbidden in filenames found in the
`ceph/qa sub-directory`_). For instance::

    teuthology-suite --suite rados --filter all/peer.yaml,all/rest-api.yaml

will run tests that contain either
`all/peer.yaml <https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/peer.yaml>`_
or
`all/rest-api.yaml <https://github.com/ceph/ceph/blob/master/qa/suites/rados/singleton/all/rest-api.yaml>`_.

Each string is looked up anywhere in the test description and has to
be an exact match: they are not regular expressions.
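
The matching rule (plain substring match against the description,
comma-separated alternatives) can be sketched as follows; this models the
documented behavior rather than reproducing teuthology's code::

    def matches(description, filter_spec, filter_out_spec=None):
        """Return True if a test description passes --filter/--filter-out.

        Each spec is a comma-separated list of plain substrings (no
        regexes); --filter keeps a test if ANY of its strings occurs in
        the description, and --filter-out drops it if ANY occurs.
        """
        if filter_spec and not any(s in description
                                   for s in filter_spec.split(",")):
            return False
        if filter_out_spec and any(s in description
                                   for s in filter_out_spec.split(",")):
            return False
        return True

    desc = "rados/singleton/all/peer.yaml"
    print(matches(desc, "all/peer.yaml,all/rest-api.yaml"))  # True
    print(matches(desc, "all/rest-api.yaml"))                # False
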

Reducing the number of tests
----------------------------

The ``rados`` suite generates thousands of tests out of a few hundred
files. This happens because teuthology constructs test matrices from
subdirectories wherever it encounters a file named ``%``. For instance,
all tests in the `rados/basic suite
<https://github.com/ceph/ceph/tree/master/qa/suites/rados/basic>`_
run with different messenger types: ``simple``, ``async`` and
``random``, because they are combined (via the special file ``%``) with
the `msgr directory
<https://github.com/ceph/ceph/tree/master/qa/suites/rados/basic/msgr>`_.

All integration tests are required to be run before a Ceph release is published.
When merely verifying whether a contribution can be merged without
risking a trivial regression, it is enough to run a subset. The ``--subset`` option can be used to
reduce the number of tests that are triggered. For instance::

    teuthology-suite --suite rados --subset 0/4000

will run as few tests as possible. The tradeoff in this case is that
not all combinations of test variations will be run,
but no matter how small a ratio is provided in the ``--subset``,
teuthology will still ensure that all files in the suite are in at
least one test. Understanding the actual logic that drives this
requires reading the teuthology source code.
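
As a mental model only, ``--subset m/n`` can be pictured as keeping one of
``n`` interleaved slices of the full test list. This toy model omits the
facet-coverage guarantee mentioned above, which the real algorithm
additionally enforces::

    def subset(tests, m, n):
        """Toy model of --subset m/n: keep every n-th test, offset by m."""
        return [t for i, t in enumerate(tests) if i % n == m]

    all_tests = ["test-%04d" % i for i in range(10000)]
    picked = subset(all_tests, 0, 4000)
    print(len(picked))  # 3 tests out of 10000
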
1180 | ||
1181 | The ``--limit`` option only runs the first ``N`` tests in the suite: | |
1182 | this is rarely useful, however, because there is no way to control which | |
1183 | test will be first. | |
1184 | ||
1185 | Testing in the cloud | |
1186 | ==================== | |
1187 | ||
1188 | In this chapter, we will explain in detail how use an OpenStack | |
1189 | tenant as an environment for Ceph integration testing. | |
1190 | ||
1191 | Assumptions and caveat | |
1192 | ---------------------- | |
1193 | ||
1194 | We assume that: | |
1195 | ||
1196 | 1. you are the only person using the tenant | |
1197 | 2. you have the credentials | |
1198 | 3. the tenant supports the ``nova`` and ``cinder`` APIs | |
1199 | ||
1200 | Caveat: be aware that, as of this writing (July 2016), testing in | |
1201 | OpenStack clouds is a new feature. Things may not work as advertised. | |
1202 | If you run into trouble, ask for help on `IRC`_ or the `Mailing list`_, or | |
1203 | open a bug report at the `ceph-workbench bug tracker`_. | |
1204 | ||
1205 | .. _`ceph-workbench bug tracker`: http://ceph-workbench.dachary.org/root/ceph-workbench/issues | |
1206 | ||
1207 | Prepare tenant | |
1208 | -------------- | |
1209 | ||
1210 | If you have not tried to use ``ceph-workbench`` with this tenant before, | |
1211 | proceed to the next step. | |
1212 | ||
1213 | To start with a clean slate, login to your tenant via the Horizon dashboard and: | |
1214 | ||
1215 | * terminate the ``teuthology`` and ``packages-repository`` instances, if any | |
1216 | * delete the ``teuthology`` and ``teuthology-worker`` security groups, if any | |
1217 | * delete the ``teuthology`` and ``teuthology-myself`` key pairs, if any | |
1218 | ||
1219 | Also do the above if you ever get key-related errors ("invalid key", etc.) when | |
1220 | trying to schedule suites. | |
1221 | ||
1222 | Getting ceph-workbench | |
1223 | ---------------------- | |
1224 | ||
1225 | Since testing in the cloud is done using the `ceph-workbench | |
1226 | ceph-qa-suite`_ tool, you will need to install that first. It is designed | |
1227 | to be installed via Docker, so if you don't have Docker running on your | |
31f18b77 | 1228 | development machine, take care of that first. You can follow `the official |
224ce89b | 1229 | tutorial <https://docs.docker.com/engine/installation/>`_ to install if |
31f18b77 | 1230 | you have not installed yet. |
7c673cae FG |
1231 | |
1232 | Once Docker is up and running, install ``ceph-workbench`` by following the | |
1233 | `Installation instructions in the ceph-workbench documentation | |
1234 | <http://ceph-workbench.readthedocs.org/en/latest/#installation>`_. | |
1235 | ||
1236 | Linking ceph-workbench with your OpenStack tenant | |
1237 | ------------------------------------------------- | |
1238 | ||
1239 | Before you can trigger your first teuthology suite, you will need to link | |
1240 | ``ceph-workbench`` with your OpenStack account. | |
1241 | ||
1242 | First, download a ``openrc.sh`` file by clicking on the "Download OpenStack | |
1243 | RC File" button, which can be found in the "API Access" tab of the "Access | |
1244 | & Security" dialog of the OpenStack Horizon dashboard. | |
1245 | ||
1246 | Second, create a ``~/.ceph-workbench`` directory, set its permissions to | |
1247 | 700, and move the ``openrc.sh`` file into it. Make sure that the filename | |
1248 | is exactly ``~/.ceph-workbench/openrc.sh``. | |
1249 | ||
1250 | Third, edit the file so it does not ask for your OpenStack password | |
1251 | interactively. Comment out the relevant lines and replace them with | |
1252 | something like:: | |
1253 | ||
1254 | export OS_PASSWORD="aiVeth0aejee3eep8rogho3eep7Pha6ek" | |
1255 | ||
1256 | When `ceph-workbench ceph-qa-suite`_ connects to your OpenStack tenant for | |
1257 | the first time, it will generate two keypairs: ``teuthology-myself`` and | |
1258 | ``teuthology``. | |
1259 | ||
1260 | .. If this is not the first time you have tried to use | |
1261 | .. `ceph-workbench ceph-qa-suite`_ with this tenant, make sure to delete any | |
1262 | .. stale keypairs with these names! | |
1263 | ||
1264 | Run the dummy suite | |
1265 | ------------------- | |
1266 | ||
1267 | You are now ready to take your OpenStack teuthology setup for a test | |
1268 | drive:: | |
1269 | ||
1270 | $ ceph-workbench ceph-qa-suite --suite dummy | |
1271 | ||
1272 | Be forewarned that the first run of `ceph-workbench ceph-qa-suite`_ on a | |
1273 | pristine tenant will take a long time to complete because it downloads a VM | |
1274 | image and during this time the command may not produce any output. | |
1275 | ||
1276 | The images are cached in OpenStack, so they are only downloaded once. | |
1277 | Subsequent runs of the same command will complete faster. | |
1278 | ||
1279 | Although ``dummy`` suite does not run any tests, in all other respects it | |
1280 | behaves just like a teuthology suite and produces some of the same | |
1281 | artifacts. | |
1282 | ||
1283 | The last bit of output should look something like this:: | |
1284 | ||
1285 | pulpito web interface: http://149.202.168.201:8081/ | |
1286 | ssh access : ssh -i /home/smithfarm/.ceph-workbench/teuthology-myself.pem ubuntu@149.202.168.201 # logs in /usr/share/nginx/html | |
1287 | ||
1288 | What this means is that `ceph-workbench ceph-qa-suite`_ triggered the test | |
1289 | suite run. It does not mean that the suite run has completed. To monitor | |
1290 | progress of the run, check the Pulpito web interface URL periodically, or | |
1291 | if you are impatient, ssh to the teuthology machine using the ssh command | |
1292 | shown and do:: | |
1293 | ||
1294 | $ tail -f /var/log/teuthology.* | |
1295 | ||
1296 | The `/usr/share/nginx/html` directory contains the complete logs of the | |
1297 | test suite. If we had provided the ``--upload`` option to the | |
1298 | `ceph-workbench ceph-qa-suite`_ command, these logs would have been | |
1299 | uploaded to http://teuthology-logs.public.ceph.com. | |
1300 | ||
1301 | Run a standalone test | |
1302 | --------------------- | |
1303 | ||
1304 | The standalone test explained in `Reading a standalone test`_ can be run | |
1305 | with the following command:: | |
1306 | ||
1307 | $ ceph-workbench ceph-qa-suite --suite rados/singleton/all/admin-socket.yaml | |
1308 | ||
1309 | This will run the suite shown on the current ``master`` branch of | |
1310 | ``ceph/ceph.git``. You can specify a different branch with the ``--ceph`` | |
1311 | option, and even a different git repo with the ``--ceph-git-url`` option. (Run | |
1312 | ``ceph-workbench ceph-qa-suite --help`` for an up-to-date list of available | |
1313 | options.) | |
1314 | ||
1315 | The first run of a suite will also take a long time, because ceph packages | |
1316 | have to be built, first. Again, the packages so built are cached and | |
1317 | `ceph-workbench ceph-qa-suite`_ will not build identical packages a second | |
1318 | time. | |
1319 | ||
1320 | Interrupt a running suite | |
1321 | ------------------------- | |
1322 | ||
1323 | Teuthology suites take time to run. From time to time one may wish to | |
1324 | interrupt a running suite. One obvious way to do this is:: | |
1325 | ||
1326 | ceph-workbench ceph-qa-suite --teardown | |
1327 | ||
1328 | This destroys all VMs created by `ceph-workbench ceph-qa-suite`_ and | |
1329 | returns the OpenStack tenant to a "clean slate". | |
1330 | ||
1331 | Sometimes you may wish to interrupt the running suite, but keep the logs, | |
1332 | the teuthology VM, the packages-repository VM, etc. To do this, you can | |
1333 | ``ssh`` to the teuthology VM (using the ``ssh access`` command reported | |
1334 | when you triggered the suite -- see `Run the dummy suite`_) and, once | |
1335 | there:: | |
1336 | ||
1337 | sudo /etc/init.d/teuthology restart | |
1338 | ||
1339 | This will keep the teuthology machine, the logs and the packages-repository | |
1340 | instance but nuke everything else. | |
1341 | ||
Upload logs to archive server
-----------------------------

Since the teuthology instance in OpenStack is only semi-permanent, with limited
space for storing logs, ``teuthology-openstack`` provides an ``--upload``
option which, if included in the ``ceph-workbench ceph-qa-suite`` command,
will cause logs from all failed jobs to be uploaded to the log archive server
maintained by the Ceph project. The logs will appear at the URL::

  http://teuthology-logs.public.ceph.com/$RUN

where ``$RUN`` is the name of the run. It will be a string like this::

  ubuntu-2016-07-23_16:08:12-rados-hammer-backports---basic-openstack

Even if you don't provide the ``--upload`` option, however, all the logs can
still be found on the teuthology machine in the directory
``/usr/share/nginx/html``.

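As a trivial illustration, the archive URL can be assembled mechanically from
the run name (using the example run name from above):

```shell
# Assemble the archive URL from a run name, as described above.
# The run name is the example string from this section.
RUN="ubuntu-2016-07-23_16:08:12-rados-hammer-backports---basic-openstack"
URL="http://teuthology-logs.public.ceph.com/$RUN"
echo "$URL"
```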
Provision VMs ad hoc
--------------------

From the teuthology VM, it is possible to provision machines on an "ad hoc"
basis, to use however you like. The magic incantation is::

  teuthology-lock --lock-many $NUMBER_OF_MACHINES \
      --os-type $OPERATING_SYSTEM \
      --os-version $OS_VERSION \
      --machine-type openstack \
      --owner $EMAIL_ADDRESS

The command must be issued from the ``~/teuthology`` directory. The possible
values for ``OPERATING_SYSTEM`` and ``OS_VERSION`` can be found by examining
the contents of the directory ``teuthology/openstack/``. For example::

  teuthology-lock --lock-many 1 --os-type ubuntu --os-version 16.04 \
      --machine-type openstack --owner foo@example.com

When you are finished with the machine, find it in the list of machines::

  openstack server list

to determine the name or ID, and then terminate it with::

  openstack server delete $NAME_OR_ID

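The lock/list/delete lifecycle above can be sketched as a short script. This
is only an illustration: the owner address is a placeholder, and ``DRY_RUN=1``
(the default here) prints each command rather than executing it, since
``teuthology-lock`` and the ``openstack`` CLI are only available on the
teuthology VM.

```shell
#!/bin/sh
# Sketch of the ad hoc provision/teardown lifecycle described above.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 on the
# teuthology VM, where teuthology-lock and the openstack CLI exist.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

OWNER="foo@example.com"   # placeholder owner address

# Lock one Ubuntu 16.04 machine (run from ~/teuthology)
run teuthology-lock --lock-many 1 --os-type ubuntu --os-version 16.04 \
    --machine-type openstack --owner "$OWNER"

# ... use the machine, then find its name or ID ...
run openstack server list

# Terminate it when done (substitute the real name or ID)
run openstack server delete '$NAME_OR_ID'
```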
Deploy a cluster for manual testing
-----------------------------------

The `teuthology framework`_ and `ceph-workbench ceph-qa-suite`_ are
versatile tools that automatically provision Ceph clusters in the cloud and
run various tests on them in an automated fashion. This enables a single
engineer, in a matter of hours, to perform thousands of tests that would
keep dozens of human testers occupied for days or weeks if conducted
manually.

However, there are times when the automated tests do not cover a particular
scenario and manual testing is desired. It turns out that it is simple to
adapt a test to stop and wait after the Ceph installation phase, and the
engineer can then ssh into the running cluster. Simply add the following
snippet in the desired place within the test YAML and schedule a run with the
test::

  tasks:
  - exec:
      client.0:
        - sleep 1000000000 # forever

(Make sure you have a ``client.0`` defined in your ``roles`` stanza or adapt
accordingly.)

The same effect can be achieved using the ``interactive`` task::

  tasks:
  - interactive

By following the test log, you can determine when the test cluster has entered
the "sleep forever" condition. At that point, you can ssh to the teuthology
machine and from there to one of the target VMs (OpenStack) or teuthology
worker machines (Sepia) where the test cluster is running.

The VMs (or "instances" in OpenStack terminology) created by
`ceph-workbench ceph-qa-suite`_ are named as follows:

``teuthology`` - the teuthology machine

``packages-repository`` - VM where packages are stored

``ceph-*`` - VM where packages are built

``target*`` - machines where tests are run

The VMs named ``target*`` are used by tests. If you are monitoring the
teuthology log for a given test, the hostnames of these target machines can
be found by searching for the string ``Locked targets``::

  2016-03-20T11:39:06.166 INFO:teuthology.task.internal:Locked targets:
    target149202171058.teuthology: null
    target149202171059.teuthology: null

The IP addresses of the target machines can be found by running ``openstack
server list`` on the teuthology machine, but the target VM hostnames (e.g.
``target149202171058.teuthology``) are resolvable within the teuthology
cluster.

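If you have a saved copy of the teuthology log, the target hostnames can also
be extracted mechanically. A small sketch, fed with the sample ``Locked
targets`` excerpt from above:

```shell
# Extract targetNNN.teuthology hostnames from a teuthology log.
# The sample log content below is the "Locked targets" excerpt from above.
cat > /tmp/teuthology-sample.log <<'EOF'
2016-03-20T11:39:06.166 INFO:teuthology.task.internal:Locked targets:
  target149202171058.teuthology: null
  target149202171059.teuthology: null
EOF

# Keep everything from the marker line on, then pick out the hostnames.
# prints:
#   target149202171058.teuthology
#   target149202171059.teuthology
sed -n '/Locked targets:/,$p' /tmp/teuthology-sample.log \
    | grep -o 'target[0-9]*\.teuthology'
```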
Testing - how to run s3-tests locally
=====================================

RGW code can be tested by building Ceph locally from source, starting a vstart
cluster, and running the "s3-tests" suite against it.

The following instructions should work on jewel and above.

Step 1 - build Ceph
-------------------

Refer to :doc:`/install/build-ceph`.

You can do step 2 separately while it is building.

Step 2 - vstart
---------------

When the build completes, and still in the top-level directory of the git
clone where you built Ceph, run the following (for cmake builds)::

  cd build/
  RGW=1 ../vstart.sh -n

This will produce a lot of output as the vstart cluster is started up. At the
end you should see a message like::

  started. stop.sh to stop. see out/* (e.g. 'tail -f out/????') for debug output.

This means the cluster is running.


Step 3 - run s3-tests
---------------------

To run the s3-tests suite, do the following::

  $ ../qa/workunits/rgw/run-s3tests.sh

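The steps above can be strung together into one script. This is only a sketch
mirroring the commands in this section: ``DRY_RUN=1`` (the default here)
prints each command instead of running it, since the real invocations only
make sense from the ``build/`` directory of a finished Ceph build.

```shell
#!/bin/sh
# Sketch of the vstart + s3-tests flow described above, meant to be run
# from build/ in a finished cmake build. DRY_RUN=1 (the default) only
# prints the commands so the flow can be read end to end.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 2 - start a vstart cluster with an RGW instance
run env RGW=1 ../vstart.sh -n

# Step 3 - run the s3-tests suite against it
run ../qa/workunits/rgw/run-s3tests.sh
```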
.. WIP
.. ===
..
.. Building RPM packages
.. ---------------------
..
.. Ceph is regularly built and packaged for a number of major Linux
.. distributions. At the time of this writing, these included CentOS, Debian,
.. Fedora, openSUSE, and Ubuntu.
..
.. Architecture
.. ============
..
.. Ceph is a collection of components built on top of RADOS that provide
.. services (RBD, RGW, CephFS) and APIs (S3, Swift, POSIX) for the user to
.. store and retrieve data.
..
.. See :doc:`/architecture` for an overview of Ceph architecture. The
.. following sections treat each of the major architectural components
.. in more detail, with links to code and tests.
..
.. FIXME The following are just stubs. These need to be developed into
.. detailed descriptions of the various high-level components (RADOS, RGW,
.. etc.) with breakdowns of their respective subcomponents.
..
.. FIXME Later, in the Testing chapter I would like to take another look
.. at these components/subcomponents with a focus on how they are tested.
..
.. RADOS
.. -----
..
.. RADOS stands for "Reliable, Autonomic Distributed Object Store". In a Ceph
.. cluster, all data are stored in objects, and RADOS is the component
.. responsible for that.
..
.. RADOS itself can be further broken down into Monitors, Object Storage Daemons
.. (OSDs), and client APIs (librados). Monitors and OSDs are introduced at
.. :doc:`/start/intro`. The client library is explained at
.. :doc:`/rados/api/index`.
..
.. RGW
.. ---
..
.. RGW stands for RADOS Gateway. Using the embedded HTTP server civetweb_ or
.. Apache FastCGI, RGW provides a REST interface to RADOS objects.
..
.. .. _civetweb: https://github.com/civetweb/civetweb
..
.. A more thorough introduction to RGW can be found at :doc:`/radosgw/index`.
..
.. RBD
.. ---
..
.. RBD stands for RADOS Block Device. It enables a Ceph cluster to store disk
.. images, and includes in-kernel code enabling RBD images to be mounted.
..
.. To delve further into RBD, see :doc:`/rbd/rbd`.
..
.. CephFS
.. ------
..
.. CephFS is a distributed file system that enables a Ceph cluster to be used
.. as a NAS.
..
.. File system metadata is managed by Metadata Server (MDS) daemons. The Ceph
.. file system is explained in more detail at :doc:`/cephfs/index`.
..