=======
crimson
=======

Crimson is the code name of crimson-osd, the next-generation ceph-osd.
It targets fast networking and storage devices, leveraging state-of-the-art
technologies such as DPDK and SPDK for better performance. It will keep
supporting HDDs and low-end SSDs via BlueStore. Crimson will try to be
backward compatible with the classic OSD.

.. highlight:: console

Building Crimson
================

Crimson is not enabled by default. To enable it::

  $ WITH_SEASTAR=true ./install-deps.sh
  $ mkdir build && cd build
  $ cmake -DWITH_SEASTAR=ON ..

Note that `ASan`_ is enabled by default if crimson is built from a source
tree cloned using git.

Also, Seastar uses its own lockless allocator, which does not play well with
the alien threads. To use the alienstore / bluestore backend, you might want to
pass ``-DSeastar_CXX_FLAGS=-DSEASTAR_DEFAULT_ALLOCATOR`` to ``cmake`` when
configuring the project, so that the libc allocator is used instead::

  $ cmake -DWITH_SEASTAR=ON -DSeastar_CXX_FLAGS=-DSEASTAR_DEFAULT_ALLOCATOR ..

.. _ASan: https://github.com/google/sanitizers/wiki/AddressSanitizer

Running Crimson
===============

As you might expect, crimson is not yet on par with its predecessor
feature-wise.

object store backend
--------------------

At the moment ``crimson-osd`` offers two object store backends:

- CyanStore: CyanStore is modeled after memstore in classic OSD.
- AlienStore: AlienStore is short for Alienized BlueStore.

Seastore is still under active development.

daemonize
---------

Unlike ``ceph-osd``, ``crimson-osd`` does not daemonize itself even if the
``daemonize`` option is enabled. To read this option, ``crimson-osd``
needs to ready its config sharded service, but this sharded service lives
in the Seastar reactor. If we fork a child process and exit the parent after
starting the Seastar engine, that will leave us with a single thread which is
the replica of the thread that called `fork()`_. Tackling this problem in
crimson would unnecessarily complicate the code.

Since a lot of GNU/Linux distros use systemd nowadays, which is able to
daemonize the application, there is no need to daemonize by ourselves.
Those who use sysvinit can use ``start-stop-daemon`` to daemonize
``crimson-osd``. If this is not acceptable, we can whip up a helper utility
to do the trick.
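
For sysvinit-style setups, a minimal sketch of such an invocation might look
like the following. The pid file path and the OSD id ``0`` are placeholders
chosen for illustration, and the options are those of Debian's
``start-stop-daemon``; adjust them to your environment::

  $ start-stop-daemon --start --background \
      --make-pidfile --pidfile /var/run/crimson-osd.0.pid \
      --exec /usr/bin/crimson-osd -- -i 0

Here ``--background`` makes ``start-stop-daemon`` fork before starting the
process, which is what allows a program that stays in the foreground, like
``crimson-osd``, to run as a daemon.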


.. _fork(): http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html

logging
-------

Currently, ``crimson-osd`` uses the logging utility offered by Seastar. See
``src/common/dout.h`` for the mapping from Ceph logging levels to
the severity levels in Seastar. For instance, the messages sent to ``derr``
will be printed using ``logger::error()``, and the messages with a debug level
over ``20`` will be printed using ``logger::trace()``.

+---------+---------+
| ceph    | seastar |
+---------+---------+
| < 0     | error   |
+---------+---------+
| 0       | warn    |
+---------+---------+
| [1, 5)  | info    |
+---------+---------+
| [5, 20] | debug   |
+---------+---------+
| > 20    | trace   |
+---------+---------+
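
As a quick way to check which Seastar severity a given Ceph debug level ends
up at, the table above can be written as a small shell function. This is only
an illustrative sketch: the function name ``ceph_to_seastar_level`` is
hypothetical, and the authoritative mapping is the one in
``src/common/dout.h``.

.. code-block:: bash

  # Map a Ceph debug level (an integer) to the Seastar severity listed in
  # the table above. Hypothetical helper, not part of Ceph.
  ceph_to_seastar_level() {
    local level=$1
    if [ "$level" -lt 0 ]; then
      echo error
    elif [ "$level" -eq 0 ]; then
      echo warn
    elif [ "$level" -lt 5 ]; then
      echo info      # [1, 5)
    elif [ "$level" -le 20 ]; then
      echo debug     # [5, 20]
    else
      echo trace     # > 20
    fi
  }

  # Print the mapping for the boundary values of each range.
  for level in -1 0 1 5 20 21; do
    printf '%3d -> %s\n' "$level" "$(ceph_to_seastar_level "$level")"
  done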

Note that ``crimson-osd`` does not send logging messages to the specified
``log_file``. It writes them to stdout and/or syslog. This behavior can be
changed using the ``--log-to-stdout`` and ``--log-to-syslog`` command line
options. By default, ``log-to-stdout`` is enabled and ``log-to-syslog`` is
disabled.


vstart.sh
---------

To facilitate the development of crimson, the following options are handy when
using ``vstart.sh``:

``--crimson``
    start ``crimson-osd`` instead of ``ceph-osd``

``--nodaemon``
    do not daemonize the service

``--redirect-output``
    redirect the stdout and stderr of service to ``out/$type.$num.stdout``.

``--osd-args``
    pass extra command line options to crimson-osd or ceph-osd. It's quite
    useful for passing Seastar options to crimson-osd. For instance, you could
    use ``--osd-args "--memory 2G"`` to set the memory to use. Please refer
    to the output of::

      crimson-osd --help-seastar

    for more Seastar specific command line options.

``--memstore``
    use the CyanStore as the object store backend.

``--bluestore``
    use the AlienStore as the object store backend. This is the default setting,
    if not specified otherwise.

So, a typical command to start a single-crimson-node cluster is::

  $ MGR=1 MON=1 OSD=1 MDS=0 RGW=0 ../src/vstart.sh -n -x \
      --without-dashboard --memstore \
      --crimson --nodaemon --redirect-output \
      --osd-args "--memory 4G"

Here we assign 4 GiB of memory and a single thread running on core 0 to
crimson-osd.

You could stop the vstart cluster using::

  $ ../src/stop.sh --crimson


CBT Based Testing
=================

We can use `cbt`_ for performing perf tests::

  $ git checkout master
  $ make crimson-osd
  $ ../src/script/run-cbt.sh --cbt ~/dev/cbt -a /tmp/baseline ../src/test/crimson/cbt/radosbench_4K_read.yaml
  $ git checkout yet-another-pr
  $ make crimson-osd
  $ ../src/script/run-cbt.sh --cbt ~/dev/cbt -a /tmp/yap ../src/test/crimson/cbt/radosbench_4K_read.yaml
  $ ~/dev/cbt/compare.py -b /tmp/baseline -a /tmp/yap -v
  19:48:23 - INFO - cbt - prefill/gen8/0: bandwidth: (or (greater) (near 0.05)):: 0.183165/0.186155 => accepted
  19:48:23 - INFO - cbt - prefill/gen8/0: iops_avg: (or (greater) (near 0.05)):: 46.0/47.0 => accepted
  19:48:23 - WARNING - cbt - prefill/gen8/0: iops_stddev: (or (less) (near 0.05)):: 10.4403/6.65833 => rejected
  19:48:23 - INFO - cbt - prefill/gen8/0: latency_avg: (or (less) (near 0.05)):: 0.340868/0.333712 => accepted
  19:48:23 - INFO - cbt - prefill/gen8/1: bandwidth: (or (greater) (near 0.05)):: 0.190447/0.177619 => accepted
  19:48:23 - INFO - cbt - prefill/gen8/1: iops_avg: (or (greater) (near 0.05)):: 48.0/45.0 => accepted
  19:48:23 - INFO - cbt - prefill/gen8/1: iops_stddev: (or (less) (near 0.05)):: 6.1101/9.81495 => accepted
  19:48:23 - INFO - cbt - prefill/gen8/1: latency_avg: (or (less) (near 0.05)):: 0.325163/0.350251 => accepted
  19:48:23 - INFO - cbt - seq/gen8/0: bandwidth: (or (greater) (near 0.05)):: 1.24654/1.22336 => accepted
  19:48:23 - INFO - cbt - seq/gen8/0: iops_avg: (or (greater) (near 0.05)):: 319.0/313.0 => accepted
  19:48:23 - INFO - cbt - seq/gen8/0: iops_stddev: (or (less) (near 0.05)):: 0.0/0.0 => accepted
  19:48:23 - INFO - cbt - seq/gen8/0: latency_avg: (or (less) (near 0.05)):: 0.0497733/0.0509029 => accepted
  19:48:23 - INFO - cbt - seq/gen8/1: bandwidth: (or (greater) (near 0.05)):: 1.22717/1.11372 => accepted
  19:48:23 - INFO - cbt - seq/gen8/1: iops_avg: (or (greater) (near 0.05)):: 314.0/285.0 => accepted
  19:48:23 - INFO - cbt - seq/gen8/1: iops_stddev: (or (less) (near 0.05)):: 0.0/0.0 => accepted
  19:48:23 - INFO - cbt - seq/gen8/1: latency_avg: (or (less) (near 0.05)):: 0.0508262/0.0557337 => accepted
  19:48:23 - WARNING - cbt - 1 tests failed out of 16

Here we compile and run the same test against two branches: ``master`` and
``yet-another-pr``. We then compare the test results. Along with every test
case, a set of rules is defined to check whether we have performance
regressions when comparing the two sets of test results. If a possible
regression is found, the rule and the corresponding test results are
highlighted.
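
To make the rule notation in the output above more concrete, here is a rough
sketch of how a rule such as ``(or (greater) (near 0.05))`` could be
evaluated for a ``first/second`` pair of values. This is not cbt's
implementation: the function name ``check_rule`` is hypothetical, and reading
``near 0.05`` as "within 5% of the second value" is an assumption that merely
agrees with the sample output shown above.

.. code-block:: bash

  # check_rule KIND FIRST SECOND [TOLERANCE]
  # KIND is "greater" or "less". Prints "accepted" if FIRST compares to SECOND
  # as KIND demands, or if the two values are within TOLERANCE (relative to
  # SECOND) of each other; prints "rejected" otherwise.
  # Hypothetical helper, an assumed reading of the rule, not cbt code.
  check_rule() {
    awk -v kind="$1" -v a="$2" -v b="$3" -v tol="${4:-0.05}" 'BEGIN {
      a += 0; b += 0; tol += 0          # force numeric comparison
      diff = a - b; if (diff < 0) diff = -diff
      base = (b < 0) ? -b : b
      if (base == 0) near = (diff == 0)
      else near = (diff / base <= tol)
      if (kind == "greater") ok = (a > b) || near
      else ok = (a < b) || near
      print (ok ? "accepted" : "rejected")
    }'
  }

  # Two pairs taken from the prefill/gen8/0 lines of the output above.
  check_rule greater 0.183165 0.186155
  check_rule less 10.4403 6.65833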

.. _cbt: https://github.com/ceph/cbt

Hacking Crimson
===============


Seastar Documents
-----------------

See `Seastar Tutorial <https://github.com/scylladb/seastar/blob/master/doc/tutorial.md>`_.
Or build a browsable version and start an HTTP server::

  $ cd seastar
  $ ./configure.py --mode debug
  $ ninja -C build/debug docs
  $ python3 -m http.server -d build/debug/doc/html

You might want to install ``pandoc`` and other dependencies beforehand.

Debugging Crimson
=================

Debugging with GDB
------------------

The `tips`_ for debugging Scylla also apply to Crimson.

.. _tips: https://github.com/scylladb/scylla/blob/master/docs/debugging.md#tips-and-tricks

Human-readable backtraces with addr2line
----------------------------------------

When a Seastar application crashes, it leaves us with a series of addresses, like::

  Segmentation fault.
  Backtrace:
    0x00000000108254aa
    0x00000000107f74b9
    0x00000000105366cc
    0x000000001053682c
    0x00000000105d2c2e
    0x0000000010629b96
    0x0000000010629c31
    0x00002a02ebd8272f
    0x00000000105d93ee
    0x00000000103eff59
    0x000000000d9c1d0a
    /lib/x86_64-linux-gnu/libc.so.6+0x000000000002409a
    0x000000000d833ac9
  Segmentation fault

``seastar-addr2line`` offered by Seastar can be used to decipher these
addresses. Once started, the script waits for input from stdin,
so we need to copy and paste the above addresses, then send EOF by pressing
``control-D`` in the terminal::

  $ ../src/seastar/scripts/seastar-addr2line -e bin/crimson-osd

    0x00000000108254aa
    0x00000000107f74b9
    0x00000000105366cc
    0x000000001053682c
    0x00000000105d2c2e
    0x0000000010629b96
    0x0000000010629c31
    0x00002a02ebd8272f
    0x00000000105d93ee
    0x00000000103eff59
    0x000000000d9c1d0a
    0x00000000108254aa
  [Backtrace #0]
  seastar::backtrace_buffer::append_backtrace() at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:1136
  seastar::print_with_backtrace(seastar::backtrace_buffer&) at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:1157
  seastar::print_with_backtrace(char const*) at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:1164
  seastar::sigsegv_action() at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:5119
  seastar::install_oneshot_signal_handler<11, &seastar::sigsegv_action>()::{lambda(int, siginfo_t*, void*)#1}::operator()(int, siginfo_t*, void*) const at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:5105
  seastar::install_oneshot_signal_handler<11, &seastar::sigsegv_action>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:5101
  ?? ??:0
  seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config) at /home/kefu/dev/ceph/build/../src/seastar/src/core/reactor.cc:5418
  seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /home/kefu/dev/ceph/build/../src/seastar/src/core/app-template.cc:173 (discriminator 5)
  main at /home/kefu/dev/ceph/build/../src/crimson/osd/main.cc:131 (discriminator 1)

Note that ``seastar-addr2line`` is able to extract the addresses from
the input, so you can also paste log messages like::

  2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr:Backtrace:
  2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr:  0x0000000000e78dbc
  2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr:  0x0000000000e3e7f0
  2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr:  0x0000000000e3e8b8
  2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr:  0x0000000000e3e985
  2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr:  /lib64/libpthread.so.0+0x0000000000012dbf
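
To illustrate what extracting the addresses from such log lines amounts to,
the following sketch pulls the address tokens out with ``grep``. It is only an
approximation for illustration, not the logic used by ``seastar-addr2line``;
the function name ``extract_addrs`` and the regular expression are
assumptions.

.. code-block:: bash

  # Print only the address tokens found on stdin, one per line.
  # Hypothetical helper; the pattern is an assumption, not taken from Seastar.
  extract_addrs() {
    # -o prints just the matching part of each line. The optional prefix
    # keeps "module+" in front of offsets into shared libraries.
    grep -oE '([^[:space:]:]+\+)?0x[0-9a-fA-F]+'
  }

  extract_addrs <<'EOF'
  2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr:Backtrace:
  2020-07-22T11:37:04.500 INFO:teuthology.orchestra.run.smithi061.stderr:  0x0000000000e78dbc
  2020-07-22T11:37:04.501 INFO:teuthology.orchestra.run.smithi061.stderr:  /lib64/libpthread.so.0+0x0000000000012dbf
  EOF

Lines without an address, such as the ``Backtrace:`` header, produce no
output, so the result is a compact list that is easy to attach to a bug
report.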

Unlike the classic OSD, crimson does not print a human-readable backtrace when
it handles fatal signals like ``SIGSEGV`` or ``SIGABRT``. It is more
complicated when it comes to a stripped binary. So, until a signal handler for
those signals is planted in crimson, we could use
``script/ceph-debug-docker.sh`` to parse the addresses in the backtrace::

  # assuming you are under the source tree of ceph
  $ ./src/script/ceph-debug-docker.sh --flavor crimson master:27e237c137c330ebb82627166927b7681b20d0aa centos:8
  ....
  [root@3deb50a8ad51 ~]# wget -q https://raw.githubusercontent.com/scylladb/seastar/master/scripts/seastar-addr2line
  [root@3deb50a8ad51 ~]# dnf install -q -y file
  [root@3deb50a8ad51 ~]# python3 seastar-addr2line -e /usr/bin/crimson-osd
  # paste the backtrace here