=========================
 Tracing Ceph With LTTng
=========================

Configuring Ceph with LTTng
===========================

Use the -DWITH_LTTNG option (default: ON)::

    ./do_cmake.sh -DWITH_LTTNG=ON

The config option for the tracing you want must be set to ``true`` in
ceph.conf. The following options are currently available::

    bluestore_tracing
    event_tracing (-DWITH_EVENTTRACE)
    osd_function_tracing (-DWITH_OSD_INSTRUMENT_FUNCTIONS)
    osd_objectstore_tracing (actually filestore tracing)
    rbd_tracing
    osd_tracing
    rados_tracing
    rgw_op_tracing
    rgw_rados_tracing

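For example, a minimal ceph.conf sketch that turns on OSD and RADOS
tracing (the ``[global]`` placement is only an illustration; any section
that applies to the daemons you want to trace will do)::

    [global]
        osd_tracing = true
        rados_tracing = true
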
Testing Trace
=============

Start the LTTng daemon::

    lttng-sessiond --daemonize

Run a vstart cluster with the trace options enabled::

    ../src/vstart.sh -d -n -l -e -o "osd_tracing = true"

List the available tracepoints::

    lttng list --userspace

You will get something like::

    UST events:
    -------------
    PID: 100859 - Name: /path/to/ceph-osd
        pg:queue_op (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
        osd:do_osd_op_post (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
        osd:do_osd_op_pre_unknown (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
        osd:do_osd_op_pre_copy_from (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
        osd:do_osd_op_pre_copy_get (loglevel: TRACE_DEBUG_LINE (13)) (type: tracepoint)
        ...

Create a tracing session, enable the tracepoints and start the trace::

    lttng create trace-test
    lttng enable-event --userspace osd:*
    lttng start

Perform some Ceph operations::

    rados bench -p ec 5 write

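The bench command above assumes a pool named ``ec`` already exists. If it
does not, you can create one first, for example as an erasure-coded pool
with the default profile (the pool name and settings here are only an
illustration; any existing pool works)::

    ceph osd pool create ec 8 8 erasure
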
Stop tracing and view the result::

    lttng stop
    lttng view

Destroy the tracing session::

    lttng destroy

=========================
 Tracing Ceph With Blkin
=========================

Ceph can use Blkin, a library created by Marios Kogias and others,
which enables tracking a specific request from the time it enters
the system at higher levels until it is finally served by RADOS.

In general, Blkin implements the Dapper_ tracing semantics
in order to show the causal relationships between the different
processing phases that an IO request may trigger. The goal is an
end-to-end visualisation of the request's route in the system,
accompanied by information concerning latencies in each processing
phase. Thanks to LTTng this can happen with minimal overhead and
in real time. The LTTng traces can then be visualized with Twitter's
Zipkin_.

.. _Dapper: http://static.googleusercontent.com/media/research.google.com/el//pubs/archive/36356.pdf
.. _Zipkin: https://zipkin.io/


Configuring Ceph with Blkin
===========================

Use the -DWITH_BLKIN option (which requires -DWITH_LTTNG)::

    ./do_cmake.sh -DWITH_LTTNG=ON -DWITH_BLKIN=ON

The config option for Blkin must be set to ``true`` in ceph.conf.
The following options are currently available::

    rbd_blkin_trace_all
    osd_blkin_trace_all
    osdc_blkin_trace_all

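For example, a minimal ceph.conf sketch that enables Blkin tracing for
all RBD requests (the ``[client]`` placement is again only an
illustration)::

    [client]
        rbd_blkin_trace_all = true
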
Testing Blkin
=============

It's easy to test Ceph's Blkin tracing. Let's assume you don't have
Ceph already running, and you compiled Ceph with Blkin support but
did not install it. Then launch Ceph with the ``vstart.sh`` script
in Ceph's src directory so you can see the possible tracepoints::

    OSD=3 MON=3 RGW=1 ../src/vstart.sh -n -o "rbd_blkin_trace_all"
    lttng list --userspace

You'll see something like the following::

    UST events:
    -------------
    PID: 8987 - Name: ./ceph-osd
        zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval_integer (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval_string (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        lttng_ust_tracelog:TRACE_DEBUG (loglevel: TRACE_DEBUG (14)) (type: tracepoint)

    PID: 8407 - Name: ./ceph-mon
        zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval_integer (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval_string (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        lttng_ust_tracelog:TRACE_DEBUG (loglevel: TRACE_DEBUG (14)) (type: tracepoint)

    ...

Next, stop Ceph so that the tracepoints can be enabled::

    ../src/stop.sh

Start up an LTTng session and enable the tracepoints::

    lttng create blkin-test
    lttng enable-event --userspace zipkin:timestamp
    lttng enable-event --userspace zipkin:keyval_integer
    lttng enable-event --userspace zipkin:keyval_string
    lttng start

Then start up Ceph again::

    OSD=3 MON=3 RGW=1 ../src/vstart.sh -n -o "rbd_blkin_trace_all"

You may want to check that Ceph is up::

    ceph status

Now put something in using rados, check that it made it, get it back, and remove it::

    ceph osd pool create test-blkin
    rados put test-object-1 ../src/vstart.sh --pool=test-blkin
    rados -p test-blkin ls
    ceph osd map test-blkin test-object-1
    rados get test-object-1 ./vstart-copy.sh --pool=test-blkin
    md5sum vstart*
    rados rm test-object-1 --pool=test-blkin

You could also use the example in ``examples/librados/`` or ``rados bench``,
as in the sketch below.

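For instance, a short ``rados bench`` write run against the same pool will
generate plenty of traced requests (the duration and pool name are just an
example)::

    rados bench -p test-blkin 5 write
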
Then stop the LTTng session and see what was collected::

    lttng stop
    lttng view

You'll see something like::

    [15:33:08.884275486] (+0.000225472) ubuntu zipkin:timestamp: { cpu_id = 53 }, { trace_name = "op", service_name = "Objecter", port_no = 0, ip = "0.0.0.0", trace_id = 5485970765435202833, span_id = 5485970765435202833, parent_span_id = 0, event = "osd op reply" }
    [15:33:08.884614135] (+0.000002839) ubuntu zipkin:keyval_integer: { cpu_id = 10 }, { trace_name = "", service_name = "Messenger", port_no = 6805, ip = "0.0.0.0", trace_id = 7381732770245808782, span_id = 7387710183742669839, parent_span_id = 1205040135881905799, key = "tid", val = 2 }
    [15:33:08.884616431] (+0.000002296) ubuntu zipkin:keyval_string: { cpu_id = 10 }, { trace_name = "", service_name = "Messenger", port_no = 6805, ip = "0.0.0.0", trace_id = 7381732770245808782, span_id = 7387710183742669839, parent_span_id = 1205040135881905799, key = "entity type", val = "client" }


Install Zipkin
==============

One of the points of using Blkin is so that you can look at the traces
using Zipkin. You should run Zipkin both as a trace collector and as a
web service. The executable jar runs the collector on port 9410 and the
web interface on port 9411.

Download the Zipkin package::

    git clone https://github.com/openzipkin/zipkin && cd zipkin
    wget -O zipkin.jar 'https://search.maven.org/remote_content?g=io.zipkin.java&a=zipkin-server&v=LATEST&c=exec'
    java -jar zipkin.jar

Or, launch the Docker image::

    docker run -d -p 9411:9411 openzipkin/zipkin

Show Ceph's Blkin Traces in Zipkin-web
======================================

Download the babeltrace-zipkin project. This project takes the traces
generated with Blkin and sends them to a Zipkin collector using Scribe::

    git clone https://github.com/vears91/babeltrace-zipkin
    cd babeltrace-zipkin

Send the LTTng data to Zipkin::

    python3 babeltrace_zipkin.py ${lttng-traces-dir}/${blkin-test}/ust/uid/0/64-bit/ -p ${zipkin-collector-port(9410 by default)} -s ${zipkin-collector-ip}

Example::

    python3 babeltrace_zipkin.py ~/lttng-traces-dir/blkin-test-20150225-160222/ust/uid/0/64-bit/ -p 9410 -s 127.0.0.1

Check the Ceph traces on the web page::

    Browse http://${zipkin-collector-ip}:9411
    Click "Find traces"