===============
Perf counters
===============

The perf counters provide a generic internal infrastructure for gauges and counters. The counted values can be either integers or floats. There is also an "average" type (normally float) that combines a sum counter and a count counter; dividing the sum by the count gives the average.

The intention is that this data will be collected and aggregated by a tool like ``collectd`` or ``statsd`` and fed into a tool like ``graphite`` for graphing and analysis. See also :doc:`../mgr/prometheus`.

Access
------

The perf counter data is accessed via the admin socket. For example::

  ceph daemon osd.0 perf schema
  ceph daemon osd.0 perf dump

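A Python-based collector might run the same commands and parse their JSON output. This is a minimal sketch, assuming the ``ceph`` CLI is on the ``PATH`` and the caller has access to the daemon's admin socket; the helper name ``admin_socket_json`` and its ``runner`` parameter are hypothetical.

```python
import json
import subprocess
from typing import Any, Callable, Dict


def admin_socket_json(daemon: str, command: str,
                      runner: Callable[..., Any] = subprocess.run) -> Dict[str, Any]:
    """Run ``ceph daemon <daemon> perf <command>`` and parse its JSON output.

    ``command`` is expected to be "schema" or "dump". ``runner`` defaults to
    subprocess.run; it is a hypothetical hook for substituting a fake in tests.
    """
    argv = ["ceph", "daemon", daemon, "perf", command]
    # check=True raises CalledProcessError if the command exits non-zero;
    # text=True decodes stdout to str so json.loads can parse it directly.
    result = runner(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, ``admin_socket_json("osd.0", "dump")`` would return the parsed dump as a ``dict``. Passing ``runner`` explicitly makes the helper testable without a running daemon.
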
Collections
-----------

The values are grouped into named collections, normally representing a subsystem or an instance of a subsystem. For example, the internal ``throttle`` mechanism reports statistics on how it is throttling, and each instance is named something like::

  throttle-msgr_dispatch_throttler-hbserver
  throttle-msgr_dispatch_throttler-client
  throttle-filestore_bytes
  ...

Schema
------

The ``perf schema`` command dumps a JSON description of which values are available and what their types are. Each named value has a ``type`` bitfield, with the following bits defined.

+------+-------------------------------------+
| bit  | meaning                             |
+======+=====================================+
| 1    | floating point value                |
+------+-------------------------------------+
| 2    | unsigned 64-bit integer value       |
+------+-------------------------------------+
| 4    | average (sum + count pair)          |
+------+-------------------------------------+
| 8    | counter (vs gauge)                  |
+------+-------------------------------------+

Every value will have either bit 1 or bit 2 set to indicate the type
(float or integer).

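The bitfield can be decoded with plain bitwise tests. A minimal sketch follows; the constant and type names are chosen for illustration and are not taken from the Ceph source, and any bits beyond the four in the table are ignored.

```python
from typing import NamedTuple

# Bit values from the table above (the names are illustrative only).
TYPE_FLOAT = 1     # floating point value
TYPE_U64 = 2       # unsigned 64-bit integer value
TYPE_AVERAGE = 4   # average (sum + count pair)
TYPE_COUNTER = 8   # counter (vs gauge)


class PerfType(NamedTuple):
    is_float: bool
    is_integer: bool
    is_average: bool
    is_counter: bool


def decode_type(type_bits: int) -> PerfType:
    """Split a schema ``type`` bitfield into its individual flags."""
    decoded = PerfType(
        is_float=bool(type_bits & TYPE_FLOAT),
        is_integer=bool(type_bits & TYPE_U64),
        is_average=bool(type_bits & TYPE_AVERAGE),
        is_counter=bool(type_bits & TYPE_COUNTER),
    )
    # Every value should carry exactly one of the two data-type bits (1 or 2).
    if decoded.is_float == decoded.is_integer:
        raise ValueError(f"expected exactly one of bits 1 and 2 in {type_bits}")
    return decoded
```

For example, ``decode_type(10)`` (8 + 2) describes an integer counter, and ``decode_type(5)`` (4 + 1) describes a floating point average.
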
If bit 8 is set (counter), the value is monotonically increasing, and
the reader may want to subtract off the previously read value to get
the delta during the previous interval.

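A minimal sketch of the delta computation for a counter follows. How to handle a value that goes backwards is an assumption: this sketch treats it as a reset (for example, after a daemon restart) and counts the whole new value as the delta.

```python
def counter_delta(previous: float, current: float) -> float:
    """Return how much a monotonically increasing counter grew between reads.

    Assumption: if the counter went backwards, it was reset (for example the
    daemon restarted), so the whole current value is counted as new activity.
    """
    if current < previous:
        return current
    return current - previous
```
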
If bit 4 is set (average), there will be two values to read: a sum and
a count. If it is a counter, the average for the previous interval is
the sum delta (since the previous read) divided by the count delta.
Alternatively, dividing the values outright gives the lifetime
average. Normally these are used to measure latencies (the number of
requests and the sum of request latencies), and the average for the
previous interval is what is interesting.

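The two averages described above can be computed from reads of an average value. This sketch assumes each read is a mapping with ``sum`` and ``avgcount`` fields, the shape that ``perf dump`` uses for grouped averages, and returns ``None`` when there is nothing to divide by.

```python
from typing import Mapping, Optional


def interval_average(previous: Mapping[str, float],
                     current: Mapping[str, float]) -> Optional[float]:
    """Average over the interval between two reads of an average counter.

    Returns None if the count did not increase (no events in the interval,
    or the counter went backwards), which also avoids dividing by zero.
    """
    count_delta = current["avgcount"] - previous["avgcount"]
    if count_delta <= 0:
        return None
    return (current["sum"] - previous["sum"]) / count_delta


def lifetime_average(current: Mapping[str, float]) -> Optional[float]:
    """Average over the whole lifetime of the counter (sum / count)."""
    if current["avgcount"] == 0:
        return None
    return current["sum"] / current["avgcount"]
```
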
Instead of interpreting the bit fields, readers can use the
``metric_type`` property, which has a value of either ``gauge`` or
``counter``, and the ``value_type`` property, which is one of
``real``, ``integer``, ``real-integer-pair`` (for a real sum + integer
count pair), or ``integer-integer-pair`` (for an integer sum + integer
count pair).

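As a sketch of using these properties, the function below walks a parsed ``perf schema`` result and records, for each value, whether the reader should take deltas (``metric_type`` is ``counter``) and whether it is a sum + count pair. The function name and the shape of the returned plan are hypothetical; other fields in each schema entry are ignored.

```python
from typing import Dict, Tuple

# value_type strings that denote a sum + count pair.
PAIR_TYPES = {"real-integer-pair", "integer-integer-pair"}


def reading_plan(schema: Dict[str, Dict[str, dict]]) -> Dict[Tuple[str, str], Tuple[bool, bool]]:
    """Map (collection, value name) to (take_delta, is_pair)."""
    plan: Dict[Tuple[str, str], Tuple[bool, bool]] = {}
    for collection, values in schema.items():
        for name, info in values.items():
            take_delta = info["metric_type"] == "counter"
            is_pair = info["value_type"] in PAIR_TYPES
            plan[(collection, name)] = (take_delta, is_pair)
    return plan
```
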
Here is an example of the schema output::

  {
      "throttle-bluestore_throttle_bytes": {
          "val": {
              "type": 2,
              "metric_type": "gauge",
              "value_type": "integer",
              "description": "Currently available throttle",
              "nick": ""
          },
          "max": {
              "type": 2,
              "metric_type": "gauge",
              "value_type": "integer",
              "description": "Max value for throttle",
              "nick": ""
          },
          "get_started": {
              "type": 10,
              "metric_type": "counter",
              "value_type": "integer",
              "description": "Number of get calls, increased before wait",
              "nick": ""
          },
          "get": {
              "type": 10,
              "metric_type": "counter",
              "value_type": "integer",
              "description": "Gets",
              "nick": ""
          },
          "get_sum": {
              "type": 10,
              "metric_type": "counter",
              "value_type": "integer",
              "description": "Got data",
              "nick": ""
          },
          "get_or_fail_fail": {
              "type": 10,
              "metric_type": "counter",
              "value_type": "integer",
              "description": "Get blocked during get_or_fail",
              "nick": ""
          },
          "get_or_fail_success": {
              "type": 10,
              "metric_type": "counter",
              "value_type": "integer",
              "description": "Successful get during get_or_fail",
              "nick": ""
          },
          "take": {
              "type": 10,
              "metric_type": "counter",
              "value_type": "integer",
              "description": "Takes",
              "nick": ""
          },
          "take_sum": {
              "type": 10,
              "metric_type": "counter",
              "value_type": "integer",
              "description": "Taken data",
              "nick": ""
          },
          "put": {
              "type": 10,
              "metric_type": "counter",
              "value_type": "integer",
              "description": "Puts",
              "nick": ""
          },
          "put_sum": {
              "type": 10,
              "metric_type": "counter",
              "value_type": "integer",
              "description": "Put data",
              "nick": ""
          },
          "wait": {
              "type": 5,
              "metric_type": "gauge",
              "value_type": "real-integer-pair",
              "description": "Waiting latency",
              "nick": ""
          }
      }
  }

Dump
----

The ``perf dump`` output is organized like the schema, but reports the current values instead of their type information. Average values are grouped: each one is reported as an object holding its ``avgcount`` and ``sum``. For example::

  {
      "throttle-msgr_dispatch_throttler-hbserver" : {
          "get_or_fail_fail" : 0,
          "get_sum" : 0,
          "max" : 104857600,
          "put" : 0,
          "val" : 0,
          "take" : 0,
          "get_or_fail_success" : 0,
          "wait" : {
              "avgcount" : 0,
              "sum" : 0
          },
          "get" : 0,
          "take_sum" : 0,
          "put_sum" : 0
      },
      "throttle-msgr_dispatch_throttler-client" : {
          "get_or_fail_fail" : 0,
          "get_sum" : 82760,
          "max" : 104857600,
          "put" : 2637,
          "val" : 0,
          "take" : 0,
          "get_or_fail_success" : 0,
          "wait" : {
              "avgcount" : 0,
              "sum" : 0
          },
          "get" : 2637,
          "take_sum" : 0,
          "put_sum" : 82760
      }
  }

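To feed a tool like ``graphite``, a collector might flatten the nested dump into dotted metric paths. This sketch assumes a ``<collection>.<value>`` naming scheme, with ``.avgcount`` and ``.sum`` suffixes for grouped averages and an optional prefix; these are illustrative choices, not a convention defined by Ceph.

```python
from typing import Dict, Iterator, Tuple, Union

Number = Union[int, float]


def flatten_dump(dump: Dict[str, Dict[str, object]],
                 prefix: str = "") -> Iterator[Tuple[str, Number]]:
    """Yield (metric path, value) pairs from a parsed ``perf dump`` result.

    Grouped averages ({"avgcount": ..., "sum": ...}) become two paths,
    "<collection>.<value>.avgcount" and "<collection>.<value>.sum".
    Output is sorted by name so that repeated runs are deterministic.
    """
    for collection, values in sorted(dump.items()):
        for name, value in sorted(values.items()):
            path = f"{prefix}{collection}.{name}"
            if isinstance(value, dict):
                # Average values are grouped; emit each field separately.
                for field, sub_value in sorted(value.items()):
                    yield f"{path}.{field}", sub_value
            else:
                yield path, value
```

Each pair could then be sent to Graphite, for example as a ``<path> <value> <timestamp>`` line of its plaintext protocol.
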