=========================
 Tracing Ceph With Blkin
=========================

Ceph can use Blkin, a library created by Marios Kogias and others,
which enables tracking a specific request from the time it enters
the system at higher levels until it is finally served by RADOS.

In general, Blkin implements the Dapper_ tracing semantics
in order to show the causal relationships between the different
processing phases that an IO request may trigger. The goal is an
end-to-end visualisation of the request's route in the system,
accompanied by information concerning latencies in each processing
phase. Thanks to LTTng this can happen with minimal overhead and
in real time. The LTTng traces can then be visualized with Twitter's
Zipkin_.

.. _Dapper: http://static.googleusercontent.com/media/research.google.com/el//pubs/archive/36356.pdf
.. _Zipkin: https://zipkin.io/

You can install Marios Kogias' upstream Blkin_ by hand, or build
distribution packages using DistroReadyBlkin_, which also comes with
pkg-config support. If you choose the latter, you must generate the
configure and make files first.

.. _Blkin: https://github.com/marioskogias/blkin
.. _DistroReadyBlkin: https://github.com/agshew/blkin

Configuring Ceph with Blkin
===========================

If you built and installed Blkin by hand, rather than building and
installing packages, then set these variables before configuring
Ceph::

  export BLKIN_CFLAGS=-Iblkin/
  export BLKIN_LIBS=-lzipkin-cpp

Blkin support in Ceph is disabled by default, so you may
want to configure with something like::

  ./do_cmake.sh -DWITH_BLKIN=ON

The ``rbd_blkin_trace_all`` option must be set to ``true`` in ``ceph.conf``
to get traces from RBD through OSDC and the OSD::

  rbd_blkin_trace_all = true

It's easy to test Ceph's Blkin tracing. Let's assume you don't have
Ceph already running, and you compiled Ceph with Blkin support but
you didn't install it. Then launch Ceph with the ``vstart.sh`` script
in Ceph's src directory so you can see the possible tracepoints::

  OSD=3 MON=3 RGW=1 ./vstart.sh -n
  lttng list --userspace

You'll see something like the following::

  PID: 8987 - Name: ./ceph-osd
        zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval_integer (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval_string (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        lttng_ust_tracelog:TRACE_DEBUG (loglevel: TRACE_DEBUG (14)) (type: tracepoint)

  PID: 8407 - Name: ./ceph-mon
        zipkin:timestamp (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval_integer (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        zipkin:keyval_string (loglevel: TRACE_WARNING (4)) (type: tracepoint)
        lttng_ust_tracelog:TRACE_DEBUG (loglevel: TRACE_DEBUG (14)) (type: tracepoint)

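
On a cluster with many daemons the listing gets long, so it can help to
reduce it to the distinct Blkin tracepoint names. The following is a minimal
sketch that assumes the listing format shown above;
``list_zipkin_tracepoints`` is a hypothetical helper name, not part of Ceph
or LTTng.

```shell
# Hypothetical helper (not part of Ceph or LTTng): read the output of
# `lttng list --userspace` on stdin and print each distinct zipkin
# tracepoint name once, sorted.
list_zipkin_tracepoints() {
    grep -o 'zipkin:[a-z_]*' | sort -u
}

# Usage (needs a running Ceph that was built with Blkin support):
#   lttng list --userspace | list_zipkin_tracepoints
```

If the helper prints nothing, no running process has registered the Blkin
tracepoints, which usually suggests Ceph was built without Blkin support.
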
Next, stop Ceph so that the tracepoints can be enabled.

Start up an LTTng session and enable the tracepoints::

  lttng create blkin-test
  lttng enable-event --userspace zipkin:timestamp
  lttng enable-event --userspace zipkin:keyval_integer
  lttng enable-event --userspace zipkin:keyval_string
  lttng start

Then start up Ceph again::

  OSD=3 MON=3 RGW=1 ./vstart.sh -n

You may want to check that Ceph is up.

Now put something in using rados, check that it made it, get it back,
and remove it::

  ./ceph osd pool create test-blkin 8
  ./rados put test-object-1 ./vstart.sh --pool=test-blkin
  ./rados -p test-blkin ls
  ./ceph osd map test-blkin test-object-1
  ./rados get test-object-1 ./vstart-copy.sh --pool=test-blkin
  ./rados rm test-object-1 --pool=test-blkin

You could also use the example in ``examples/librados/`` or ``rados bench``.

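
To confirm that the object read back from the pool matches the file that was
stored, the two local files can be compared byte for byte. This is a small
sketch; ``same_contents`` is a hypothetical helper name, and the file names
in the usage comment are the ones from the ``rados put`` and ``rados get``
commands above.

```shell
# Hypothetical helper: succeed (exit status 0) only if the two files
# are byte-identical. `cmp -s` compares silently and reports the result
# through its exit status.
same_contents() {
    cmp -s "$1" "$2"
}

# Usage, with the file names from the rados commands above:
#   same_contents ./vstart.sh ./vstart-copy.sh && echo "round trip OK"
```
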
Then stop the LTTng session and see what was collected.

You'll see something like::

  [15:33:08.884275486] (+0.000225472) ubuntu zipkin:timestamp: { cpu_id = 53 }, { trace_name = "op", service_name = "Objecter", port_no = 0, ip = "0.0.0.0", trace_id = 5485970765435202833, span_id = 5485970765435202833, parent_span_id = 0, event = "osd op reply" }
  [15:33:08.884614135] (+0.000002839) ubuntu zipkin:keyval_integer: { cpu_id = 10 }, { trace_name = "", service_name = "Messenger", port_no = 6805, ip = "0.0.0.0", trace_id = 7381732770245808782, span_id = 7387710183742669839, parent_span_id = 1205040135881905799, key = "tid", val = 2 }
  [15:33:08.884616431] (+0.000002296) ubuntu zipkin:keyval_string: { cpu_id = 10 }, { trace_name = "", service_name = "Messenger", port_no = 6805, ip = "0.0.0.0", trace_id = 7381732770245808782, span_id = 7387710183742669839, parent_span_id = 1205040135881905799, key = "entity type", val = "client" }

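
The raw records are wide, so a quick text filter can make the sequence of
events easier to follow before loading anything into Zipkin. The sketch
below assumes the record layout shown above and keeps only
``zipkin:timestamp`` records, printing the time, trace id, service and
event. ``summarize_zipkin_timestamps`` is a hypothetical helper name.

```shell
# Hypothetical helper: read trace text on stdin and, for every
# zipkin:timestamp record, print "<time> <trace_id> <service>: <event>".
# Other record types (keyval_integer, keyval_string) are skipped.
# Assumes the field order shown in the sample output above.
summarize_zipkin_timestamps() {
    sed -n 's/^\[\([0-9:.]*\)].*zipkin:timestamp:.*service_name = "\([^"]*\)".*trace_id = \([0-9]*\),.*event = "\([^"]*\)".*$/\1 \3 \2: \4/p'
}

# Usage, assuming the trace text is produced by `lttng view`:
#   lttng view | summarize_zipkin_timestamps
```

Sorting or grepping the result by the second column groups the events that
belong to the same trace.
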
One of the main reasons to use Blkin is that you can look at the traces
using Zipkin. Run Zipkin both as a tracepoint collector and as a web
service. The executable jar runs a collector on port 9410 and
the web interface on port 9411.

Download Zipkin Package::

  git clone https://github.com/openzipkin/zipkin && cd zipkin
  wget -O zipkin.jar 'https://search.maven.org/remote_content?g=io.zipkin.java&a=zipkin-server&v=LATEST&c=exec'

Show Ceph's Blkin Traces in Zipkin-web
======================================

Download the babeltrace-zipkin project. This project takes the traces
generated with Blkin and sends them to a Zipkin collector using Scribe::

  git clone https://github.com/vears91/babeltrace-zipkin

Send LTTng data to Zipkin::

  python3 babeltrace_zipkin.py ${lttng-traces-dir}/${blkin-test}/ust/uid/0/64-bit/ -p ${zipkin-collector-port(9410 by default)} -s ${zipkin-collector-ip}

For example::

  python3 babeltrace_zipkin.py ~/lttng-traces-dir/blkin-test-20150225-160222/ust/uid/0/64-bit/ -p 9410 -s 127.0.0.1

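
Because LTTng appends a timestamp to the session name, the trace directory
changes on every run. The following sketch picks the most recently modified
directory for a session so the path does not have to be typed by hand.
``latest_trace_dir`` is a hypothetical helper name, and the traces directory
passed to it is an assumption taken from the example above.

```shell
# Hypothetical helper: print the most recently modified trace directory
# for a session (with a trailing slash), or nothing if none exists.
#   $1: directory that holds the traces
#   $2: session name, for example blkin-test
latest_trace_dir() {
    ls -dt "$1/$2"-*/ 2>/dev/null | head -n 1
}

# Usage, reusing the paths and ports from the example above:
#   python3 babeltrace_zipkin.py \
#       "$(latest_trace_dir ~/lttng-traces-dir blkin-test)ust/uid/0/64-bit/" \
#       -p 9410 -s 127.0.0.1
```
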
View the Ceph traces in a web browser::

  Browse http://${zipkin-collector-ip}:9411