=========================
 Tracing Ceph With LTTng
=========================

Configuring Ceph with LTTng
===========================

Use the -DWITH_LTTNG option (default: ON)::

  ./do_cmake -DWITH_LTTNG=ON

The config option for each tracing subsystem must be set to ``true`` in
ceph.conf. The following options are currently available::

  bluestore_tracing
  event_tracing (-DWITH_EVENTTRACE)
  osd_function_tracing (-DWITH_OSD_INSTRUMENT_FUNCTIONS)
  osd_objectstore_tracing (actually filestore tracing)
  rbd_tracing
  osd_tracing
  rados_tracing
  rgw_op_tracing
  rgw_rados_tracing

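For example, to enable OSD tracing through the config file rather than on
the vstart command line, ceph.conf might contain (a minimal sketch; putting
the option under ``[global]`` is one reasonable choice)::

  [global]
  osd_tracing = true
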
Testing Trace
=============

Start the LTTng daemon::

  lttng-sessiond --daemonize

Run a vstart cluster with the trace options enabled::

  ../src/vstart.sh -d -n -l -e -o "osd_tracing = true"

List available tracepoints::

  lttng list --userspace

You will get something like::

  UST events:
  -------------
  PID: 100859 - Name: /path/to/ceph-osd
    pg:queue_op (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
    osd:do_osd_op_post (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
    osd:do_osd_op_pre_unknown (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
    osd:do_osd_op_pre_copy_from (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
    osd:do_osd_op_pre_copy_get (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
    ...

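With several Ceph daemons running, the listing can get long. A small sketch
for narrowing it to the ``osd:`` provider and keeping just the event names
(the sample lines are copied from the listing above; the leading-whitespace
layout is an assumption about ``lttng list`` output)::

  # Keep only osd:* tracepoint names from a listing in the format above.
  printf '%s\n' \
    '      pg:queue_op (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)' \
    '      osd:do_osd_op_post (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)' \
    '      osd:do_osd_op_pre_unknown (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)' \
  | sed -n 's/^ *\(osd:[a-z_]*\).*/\1/p'

Against a live system, pipe the real listing through the same filter:
``lttng list --userspace | sed -n 's/^ *\(osd:[a-z_]*\).*/\1/p'``.
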
Create a tracing session, enable the tracepoints and start tracing::

  lttng create trace-test
  lttng enable-event --userspace 'osd:*'
  lttng start

Perform some Ceph operations (this example assumes a pool named ``ec`` exists)::

  rados bench -p ec 5 write

Stop tracing and view the result::

  lttng stop
  lttng view

Destroy the tracing session::

  lttng destroy


=========================
 Tracing Ceph With Blkin
=========================

Ceph can use Blkin, a library created by Marios Kogias and others,
which enables tracking a specific request from the time it enters
the system at higher levels until it is finally served by RADOS.

In general, Blkin implements the Dapper_ tracing semantics
in order to show the causal relationships between the different
processing phases that an IO request may trigger. The goal is an
end-to-end visualisation of the request's route through the system,
accompanied by information concerning latencies in each processing
phase. Thanks to LTTng this can happen with minimal overhead and
in realtime. The LTTng traces can then be visualized with Twitter's
Zipkin_.

.. _Dapper: http://static.googleusercontent.com/media/research.google.com/el//pubs/archive/36356.pdf
.. _Zipkin: https://zipkin.io/


Configuring Ceph with Blkin
===========================

Use the -DWITH_BLKIN option (which requires -DWITH_LTTNG)::

  ./do_cmake -DWITH_LTTNG=ON -DWITH_BLKIN=ON

The config option for blkin must be set to ``true`` in ceph.conf.
The following options are currently available::

  rbd_blkin_trace_all
  osd_blkin_trace_all
  osdc_blkin_trace_all

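For example, to trace all RBD requests through the config file, ceph.conf
might contain (a minimal sketch; the ``[global]`` placement and the explicit
``= true`` value are assumptions)::

  [global]
  rbd_blkin_trace_all = true
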
Testing Blkin
=============

It's easy to test Ceph's Blkin tracing. Let's assume you don't have
Ceph already running, and you compiled Ceph with Blkin support but
you didn't install it. Then launch Ceph with the ``vstart.sh`` script
in Ceph's src directory so you can see the possible tracepoints::

  OSD=3 MON=3 RGW=1 ../src/vstart.sh -n -o "rbd_blkin_trace_all"
  lttng list --userspace

You'll see something like the following::

  UST events:
  -------------
  PID: 8987 - Name: ./ceph-osd
    zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
    zipkin:keyval_integer (loglevel: TRACE_WARNING (4)) (type: tracepoint)
    zipkin:keyval_string (loglevel: TRACE_WARNING (4)) (type: tracepoint)
    lttng_ust_tracelog:TRACE_DEBUG (loglevel: TRACE_DEBUG (14)) (type: tracepoint)

  PID: 8407 - Name: ./ceph-mon
    zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
    zipkin:keyval_integer (loglevel: TRACE_WARNING (4)) (type: tracepoint)
    zipkin:keyval_string (loglevel: TRACE_WARNING (4)) (type: tracepoint)
    lttng_ust_tracelog:TRACE_DEBUG (loglevel: TRACE_DEBUG (14)) (type: tracepoint)

  ...

Next, stop Ceph so that the tracepoints can be enabled::

  ../src/stop.sh

Start up an LTTng session and enable the tracepoints::

  lttng create blkin-test
  lttng enable-event --userspace zipkin:timestamp
  lttng enable-event --userspace zipkin:keyval_integer
  lttng enable-event --userspace zipkin:keyval_string
  lttng start

Then start up Ceph again::

  OSD=3 MON=3 RGW=1 ../src/vstart.sh -n -o "rbd_blkin_trace_all"

You may want to check that Ceph is up::

  ceph status

Now put something in using rados, check that it made it, get it back, and remove it::

  ceph osd pool create test-blkin
  rados put test-object-1 ../src/vstart.sh --pool=test-blkin
  rados -p test-blkin ls
  ceph osd map test-blkin test-object-1
  rados get test-object-1 ./vstart-copy.sh --pool=test-blkin
  md5sum vstart*
  rados rm test-object-1 --pool=test-blkin

You could also use the example in ``examples/librados/`` or ``rados bench``.

Then stop the LTTng session and see what was collected::

  lttng stop
  lttng view

You'll see something like::

  [15:33:08.884275486] (+0.000225472) ubuntu zipkin:timestamp: { cpu_id = 53 }, { trace_name = "op", service_name = "Objecter", port_no = 0, ip = "0.0.0.0", trace_id = 5485970765435202833, span_id = 5485970765435202833, parent_span_id = 0, event = "osd op reply" }
  [15:33:08.884614135] (+0.000002839) ubuntu zipkin:keyval_integer: { cpu_id = 10 }, { trace_name = "", service_name = "Messenger", port_no = 6805, ip = "0.0.0.0", trace_id = 7381732770245808782, span_id = 7387710183742669839, parent_span_id = 1205040135881905799, key = "tid", val = 2 }
  [15:33:08.884616431] (+0.000002296) ubuntu zipkin:keyval_string: { cpu_id = 10 }, { trace_name = "", service_name = "Messenger", port_no = 6805, ip = "0.0.0.0", trace_id = 7381732770245808782, span_id = 7387710183742669839, parent_span_id = 1205040135881905799, key = "entity type", val = "client" }

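When correlating events by hand, the trace_id and span_id fields are the
ones that matter: events sharing a trace_id belong to the same request,
and parent_span_id links a span to the phase that spawned it. A small
sketch for pulling the pair out of one line of ``lttng view`` output (the
sample line is the keyval_integer event above; the field layout is an
assumption based on that output)::

  # Extract "trace_id span_id" from a single trace line.
  line='[15:33:08.884614135] (+0.000002839) ubuntu zipkin:keyval_integer: { cpu_id = 10 }, { trace_name = "", service_name = "Messenger", port_no = 6805, ip = "0.0.0.0", trace_id = 7381732770245808782, span_id = 7387710183742669839, parent_span_id = 1205040135881905799, key = "tid", val = 2 }'
  printf '%s\n' "$line" | sed -n 's/.*trace_id = \([0-9]*\), span_id = \([0-9]*\).*/\1 \2/p'

Against a live trace, pipe ``lttng view`` through the same sed filter
instead of the sample line.
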

Install Zipkin
==============

One of the points of using Blkin is so that you can look at the traces
using Zipkin. Users should run Zipkin as a tracepoint collector and
also as a web service. The executable jar runs a collector on port 9410 and
the web interface on port 9411.

Download the Zipkin package::

  git clone https://github.com/openzipkin/zipkin && cd zipkin
  wget -O zipkin.jar 'https://search.maven.org/remote_content?g=io.zipkin.java&a=zipkin-server&v=LATEST&c=exec'
  java -jar zipkin.jar

Or launch the docker image::

  docker run -d -p 9411:9411 openzipkin/zipkin

Show Ceph's Blkin Traces in Zipkin-web
======================================

Download the babeltrace-zipkin project. This project takes the traces
generated with blkin and sends them to a Zipkin collector using scribe::

  git clone https://github.com/vears91/babeltrace-zipkin
  cd babeltrace-zipkin

Send lttng data to Zipkin::

  python3 babeltrace_zipkin.py ${lttng-traces-dir}/${blkin-test}/ust/uid/0/64-bit/ -p ${zipkin-collector-port(9410 by default)} -s ${zipkin-collector-ip}

Example::

  python3 babeltrace_zipkin.py ~/lttng-traces-dir/blkin-test-20150225-160222/ust/uid/0/64-bit/ -p 9410 -s 127.0.0.1

Check the Ceph traces on the web page::

  Browse http://${zipkin-collector-ip}:9411
  Click "Find traces"