]> git.proxmox.com Git - mirror_ubuntu-artful-kernel.git/blame - Documentation/trace/events.txt
tracing: Add support for named triggers
[mirror_ubuntu-artful-kernel.git] / Documentation / trace / events.txt
CommitLineData
abd41443
TT
1 Event Tracing
2
3 Documentation written by Theodore Ts'o
95b69608 4 Updated by Li Zefan and Tom Zanussi
abd41443 5
143c145e
LZ
61. Introduction
7===============
abd41443
TT
8
9Tracepoints (see Documentation/trace/tracepoints.txt) can be used
10without creating custom kernel modules to register probe functions
11using the event tracing infrastructure.
12
13Not all tracepoints can be traced using the event tracing system;
14the kernel developer must provide code snippets which define how the
15tracing information is saved into the tracing buffer, and how the
143c145e 16tracing information should be printed.
abd41443 17
143c145e
LZ
182. Using Event Tracing
19======================
20
212.1 Via the 'set_event' interface
22---------------------------------
abd41443
TT
23
24The events which are available for tracing can be found in the file
52ad51e7 25/sys/kernel/debug/tracing/available_events.
abd41443
TT
26
27To enable a particular event, such as 'sched_wakeup', simply echo it
52ad51e7 28to /sys/kernel/debug/tracing/set_event. For example:
abd41443 29
52ad51e7 30 # echo sched_wakeup >> /sys/kernel/debug/tracing/set_event
abd41443 31
143c145e
LZ
32[ Note: '>>' is necessary, otherwise it will firstly disable
33 all the events. ]
abd41443
TT
34
35To disable an event, echo the event name to the set_event file prefixed
36with an exclamation point:
37
52ad51e7 38 # echo '!sched_wakeup' >> /sys/kernel/debug/tracing/set_event
143c145e
LZ
39
40To disable all events, echo an empty line to the set_event file:
41
52ad51e7 42 # echo > /sys/kernel/debug/tracing/set_event
abd41443 43
143c145e 44To enable all events, echo '*:*' or '*:' to the set_event file:
abd41443 45
52ad51e7 46 # echo *:* > /sys/kernel/debug/tracing/set_event
abd41443
TT
47
48The events are organized into subsystems, such as ext4, irq, sched,
49etc., and a full event name looks like this: <subsystem>:<event>. The
50subsystem name is optional, but it is displayed in the available_events
51file. All of the events in a subsystem can be specified via the syntax
52"<subsystem>:*"; for example, to enable all irq events, you can use the
53command:
54
52ad51e7 55 # echo 'irq:*' > /sys/kernel/debug/tracing/set_event
143c145e
LZ
56
572.2 Via the 'enable' toggle
58---------------------------
59
52ad51e7 60The events available are also listed in /sys/kernel/debug/tracing/events/ hierarchy
143c145e
LZ
61of directories.
62
63To enable event 'sched_wakeup':
64
52ad51e7 65 # echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
143c145e
LZ
66
67To disable it:
68
52ad51e7 69 # echo 0 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
143c145e
LZ
70
71To enable all events in sched subsystem:
72
52ad51e7 73 # echo 1 > /sys/kernel/debug/tracing/events/sched/enable
143c145e 74
6afb1c65 75To enable all events:
143c145e 76
52ad51e7 77 # echo 1 > /sys/kernel/debug/tracing/events/enable
143c145e
LZ
78
79When reading one of these enable files, there are four results:
80
81 0 - all events this file affects are disabled
82 1 - all events this file affects are enabled
83 X - there is a mixture of events enabled and disabled
84 ? - this file does not affect any event
85
020e5f85
LZ
862.3 Boot option
87---------------
88
89In order to facilitate early boot debugging, use boot option:
90
91 trace_event=[event-list]
92
03d646e6
LZ
93event-list is a comma separated list of events. See section 2.1 for event
94format.
020e5f85 95
143c145e
LZ
963. Defining an event-enabled tracepoint
97=======================================
98
99See The example provided in samples/trace_events
100
95b69608
TZ
1014. Event formats
102================
103
104Each trace event has a 'format' file associated with it that contains
105a description of each field in a logged event. This information can
106be used to parse the binary trace stream, and is also the place to
107find the field names that can be used in event filters (see section 5).
108
109It also displays the format string that will be used to print the
110event in text mode, along with the event name and ID used for
111profiling.
112
113Every event has a set of 'common' fields associated with it; these are
114the fields prefixed with 'common_'. The other fields vary between
115events and correspond to the fields defined in the TRACE_EVENT
116definition for that event.
117
118Each field in the format has the form:
119
120 field:field-type field-name; offset:N; size:N;
121
122where offset is the offset of the field in the trace record and size
123is the size of the data item, in bytes.
124
125For example, here's the information displayed for the 'sched_wakeup'
126event:
127
bfc672dc 128# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format
95b69608
TZ
129
130name: sched_wakeup
131ID: 60
132format:
133 field:unsigned short common_type; offset:0; size:2;
134 field:unsigned char common_flags; offset:2; size:1;
135 field:unsigned char common_preempt_count; offset:3; size:1;
136 field:int common_pid; offset:4; size:4;
137 field:int common_tgid; offset:8; size:4;
138
139 field:char comm[TASK_COMM_LEN]; offset:12; size:16;
140 field:pid_t pid; offset:28; size:4;
141 field:int prio; offset:32; size:4;
142 field:int success; offset:36; size:4;
143 field:int cpu; offset:40; size:4;
144
145print fmt: "task %s:%d [%d] success=%d [%03d]", REC->comm, REC->pid,
146 REC->prio, REC->success, REC->cpu
147
148This event contains 10 fields, the first 5 common and the remaining 5
149event-specific. All the fields for this event are numeric, except for
150'comm' which is a string, a distinction important for event filtering.
151
1525. Event filtering
153==================
154
155Trace events can be filtered in the kernel by associating boolean
156'filter expressions' with them. As soon as an event is logged into
157the trace buffer, its fields are checked against the filter expression
158associated with that event type. An event with field values that
159'match' the filter will appear in the trace output, and an event whose
160values don't match will be discarded. An event with no filter
161associated with it matches everything, and is the default when no
162filter has been set for an event.
163
1645.1 Expression syntax
165---------------------
166
167A filter expression consists of one or more 'predicates' that can be
168combined using the logical operators '&&' and '||'. A predicate is
169simply a clause that compares the value of a field contained within a
170logged event with a constant value and returns either 0 or 1 depending
171on whether the field value matched (1) or didn't match (0):
172
173 field-name relational-operator value
174
175Parentheses can be used to provide arbitrary logical groupings and
176double-quotes can be used to prevent the shell from interpreting
177operators as shell metacharacters.
178
179The field-names available for use in filters can be found in the
180'format' files for trace events (see section 4).
181
182The relational-operators depend on the type of the field being tested:
183
184The operators available for numeric fields are:
185
1a891cf1 186==, !=, <, <=, >, >=, &
95b69608
TZ
187
188And for string fields they are:
189
c3e13c7c 190==, !=, ~
95b69608 191
c3e13c7c
SRRH
192The glob (~) only accepts a wild card character (*) at the start and or
193end of the string. For example:
194
195 prev_comm ~ "*sh"
196 prev_comm ~ "sh*"
197 prev_comm ~ "*sh*"
198
199But does not allow for it to be within the string:
200
201 prev_comm ~ "ba*sh" <-- is invalid
95b69608 202
95b69608
TZ
2035.2 Setting filters
204-------------------
205
206A filter for an individual event is set by writing a filter expression
207to the 'filter' file for the given event.
208
209For example:
210
bfc672dc 211# cd /sys/kernel/debug/tracing/events/sched/sched_wakeup
95b69608
TZ
212# echo "common_preempt_count > 4" > filter
213
214A slightly more involved example:
215
bfc672dc 216# cd /sys/kernel/debug/tracing/events/signal/signal_generate
95b69608
TZ
217# echo "((sig >= 10 && sig < 15) || sig == 17) && comm != bash" > filter
218
219If there is an error in the expression, you'll get an 'Invalid
220argument' error when setting it, and the erroneous string along with
221an error message can be seen by looking at the filter e.g.:
222
bfc672dc 223# cd /sys/kernel/debug/tracing/events/signal/signal_generate
95b69608
TZ
224# echo "((sig >= 10 && sig < 15) || dsig == 17) && comm != bash" > filter
225-bash: echo: write error: Invalid argument
226# cat filter
227((sig >= 10 && sig < 15) || dsig == 17) && comm != bash
228^
229parse_error: Field not found
230
231Currently the caret ('^') for an error always appears at the beginning of
232the filter string; the error message should still be useful though
233even without more accurate position info.
234
2355.3 Clearing filters
236--------------------
237
238To clear the filter for an event, write a '0' to the event's filter
239file.
240
241To clear the filters for all events in a subsystem, write a '0' to the
242subsystem's filter file.
243
2445.3 Subsystem filters
245---------------------
246
247For convenience, filters for every event in a subsystem can be set or
248cleared as a group by writing a filter expression into the filter file
88393161 249at the root of the subsystem. Note however, that if a filter for any
95b69608
TZ
250event within the subsystem lacks a field specified in the subsystem
251filter, or if the filter can't be applied for any other reason, the
252filter for that event will retain its previous setting. This can
253result in an unintended mixture of filters which could lead to
254confusing (to the user who might think different filters are in
255effect) trace output. Only filters that reference just the common
256fields can be guaranteed to propagate successfully to all events.
257
258Here are a few subsystem filter examples that also illustrate the
259above points:
260
88393161 261Clear the filters on all events in the sched subsystem:
95b69608
TZ
262
263# cd /sys/kernel/debug/tracing/events/sched
264# echo 0 > filter
265# cat sched_switch/filter
266none
267# cat sched_wakeup/filter
268none
269
270Set a filter using only common fields for all events in the sched
88393161 271subsystem (all events end up with the same filter):
95b69608
TZ
272
273# cd /sys/kernel/debug/tracing/events/sched
274# echo common_pid == 0 > filter
275# cat sched_switch/filter
276common_pid == 0
277# cat sched_wakeup/filter
278common_pid == 0
279
280Attempt to set a filter using a non-common field for all events in the
88393161 281sched subsystem (all events but those that have a prev_pid field retain
95b69608
TZ
282their old filters):
283
284# cd /sys/kernel/debug/tracing/events/sched
285# echo prev_pid == 0 > filter
286# cat sched_switch/filter
287prev_pid == 0
288# cat sched_wakeup/filter
289common_pid == 0
ac38fb85 290
627645fd
SRRH
2915.4 PID filtering
292-----------------
293
294The set_event_pid file in the same directory as the top events directory
295exists, will filter all events from tracing any task that does not have the
296PID listed in the set_event_pid file.
297
298# cd /sys/kernel/debug/tracing
299# echo $$ > set_event_pid
300# echo 1 > events/enabled
301
302Will only trace events for the current task.
303
304To add more PIDs without losing the PIDs already included, use '>>'.
305
306# echo 123 244 1 >> set_event_pid
307
308
ac38fb85
TZ
3096. Event triggers
310=================
311
312Trace events can be made to conditionally invoke trigger 'commands'
313which can take various forms and are described in detail below;
314examples would be enabling or disabling other trace events or invoking
315a stack trace whenever the trace event is hit. Whenever a trace event
316with attached triggers is invoked, the set of trigger commands
317associated with that event is invoked. Any given trigger can
318additionally have an event filter of the same form as described in
319section 5 (Event filtering) associated with it - the command will only
320be invoked if the event being invoked passes the associated filter.
321If no filter is associated with the trigger, it always passes.
322
323Triggers are added to and removed from a particular event by writing
324trigger expressions to the 'trigger' file for the given event.
325
326A given event can have any number of triggers associated with it,
327subject to any restrictions that individual commands may have in that
328regard.
329
330Event triggers are implemented on top of "soft" mode, which means that
331whenever a trace event has one or more triggers associated with it,
332the event is activated even if it isn't actually enabled, but is
333disabled in a "soft" mode. That is, the tracepoint will be called,
334but just will not be traced, unless of course it's actually enabled.
335This scheme allows triggers to be invoked even for events that aren't
336enabled, and also allows the current event filter implementation to be
337used for conditionally invoking triggers.
338
339The syntax for event triggers is roughly based on the syntax for
340set_ftrace_filter 'ftrace filter commands' (see the 'Filter commands'
341section of Documentation/trace/ftrace.txt), but there are major
342differences and the implementation isn't currently tied to it in any
343way, so beware about making generalizations between the two.
344
3456.1 Expression syntax
346---------------------
347
348Triggers are added by echoing the command to the 'trigger' file:
349
350 # echo 'command[:count] [if filter]' > trigger
351
352Triggers are removed by echoing the same command but starting with '!'
353to the 'trigger' file:
354
355 # echo '!command[:count] [if filter]' > trigger
356
357The [if filter] part isn't used in matching commands when removing, so
358leaving that off in a '!' command will accomplish the same thing as
359having it in.
360
361The filter syntax is the same as that described in the 'Event
362filtering' section above.
363
364For ease of use, writing to the trigger file using '>' currently just
365adds or removes a single trigger and there's no explicit '>>' support
366('>' actually behaves like '>>') or truncation support to remove all
367triggers (you have to use '!' for each one added.)
368
3696.2 Supported trigger commands
370------------------------------
371
372The following commands are supported:
373
374- enable_event/disable_event
375
376 These commands can enable or disable another trace event whenever
377 the triggering event is hit. When these commands are registered,
378 the other trace event is activated, but disabled in a "soft" mode.
379 That is, the tracepoint will be called, but just will not be traced.
380 The event tracepoint stays in this mode as long as there's a trigger
381 in effect that can trigger it.
382
383 For example, the following trigger causes kmalloc events to be
384 traced when a read system call is entered, and the :1 at the end
385 specifies that this enablement happens only once:
386
387 # echo 'enable_event:kmem:kmalloc:1' > \
388 /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
389
390 The following trigger causes kmalloc events to stop being traced
391 when a read system call exits. This disablement happens on every
392 read system call exit:
393
394 # echo 'disable_event:kmem:kmalloc' > \
395 /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
396
397 The format is:
398
399 enable_event:<system>:<event>[:count]
400 disable_event:<system>:<event>[:count]
401
402 To remove the above commands:
403
404 # echo '!enable_event:kmem:kmalloc:1' > \
405 /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
406
407 # echo '!disable_event:kmem:kmalloc' > \
408 /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
409
410 Note that there can be any number of enable/disable_event triggers
411 per triggering event, but there can only be one trigger per
412 triggered event. e.g. sys_enter_read can have triggers enabling both
413 kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc
414 versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if
415 bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they
416 could be combined into a single filter on kmem:kmalloc though).
417
418- stacktrace
419
420 This command dumps a stacktrace in the trace buffer whenever the
421 triggering event occurs.
422
423 For example, the following trigger dumps a stacktrace every time the
424 kmalloc tracepoint is hit:
425
426 # echo 'stacktrace' > \
427 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
428
429 The following trigger dumps a stacktrace the first 5 times a kmalloc
430 request happens with a size >= 64K
431
432 # echo 'stacktrace:5 if bytes_req >= 65536' > \
433 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
434
435 The format is:
436
437 stacktrace[:count]
438
439 To remove the above commands:
440
441 # echo '!stacktrace' > \
442 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
443
444 # echo '!stacktrace:5 if bytes_req >= 65536' > \
445 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
446
447 The latter can also be removed more simply by the following (without
448 the filter):
449
450 # echo '!stacktrace:5' > \
451 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
452
453 Note that there can be only one stacktrace trigger per triggering
454 event.
455
456- snapshot
457
458 This command causes a snapshot to be triggered whenever the
459 triggering event occurs.
460
461 The following command creates a snapshot every time a block request
462 queue is unplugged with a depth > 1. If you were tracing a set of
463 events or functions at the time, the snapshot trace buffer would
c98be0c9 464 capture those events when the trigger event occurred:
ac38fb85
TZ
465
466 # echo 'snapshot if nr_rq > 1' > \
467 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
468
469 To only snapshot once:
470
471 # echo 'snapshot:1 if nr_rq > 1' > \
472 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
473
474 To remove the above commands:
475
476 # echo '!snapshot if nr_rq > 1' > \
477 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
478
479 # echo '!snapshot:1 if nr_rq > 1' > \
480 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
481
482 Note that there can be only one snapshot trigger per triggering
483 event.
484
485- traceon/traceoff
486
487 These commands turn tracing on and off when the specified events are
488 hit. The parameter determines how many times the tracing system is
489 turned on and off. If unspecified, there is no limit.
490
491 The following command turns tracing off the first time a block
492 request queue is unplugged with a depth > 1. If you were tracing a
493 set of events or functions at the time, you could then examine the
494 trace buffer to see the sequence of events that led up to the
495 trigger event:
496
497 # echo 'traceoff:1 if nr_rq > 1' > \
498 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
499
500 To always disable tracing when nr_rq > 1 :
501
502 # echo 'traceoff if nr_rq > 1' > \
503 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
504
505 To remove the above commands:
506
507 # echo '!traceoff:1 if nr_rq > 1' > \
508 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
509
510 # echo '!traceoff if nr_rq > 1' > \
511 /sys/kernel/debug/tracing/events/block/block_unplug/trigger
512
513 Note that there can be only one traceon or traceoff trigger per
514 triggering event.
0fc3813c
TZ
515
516- hist
517
518 This command aggregates event hits into a hash table keyed on one or
519 more trace event format fields (or stacktrace) and a set of running
520 totals derived from one or more trace event format fields and/or
521 event counts (hitcount).
522
523 The format of a hist trigger is as follows:
524
525 hist:keys=<field1[,field2,...]>[:values=<field1[,field2,...]>]
526 [:sort=<field1[,field2,...]>][:size=#entries][:pause][:continue]
527 [:clear] [if <filter>]
528
529 When a matching event is hit, an entry is added to a hash table
530 using the key(s) and value(s) named. Keys and values correspond to
531 fields in the event's format description. Values must correspond to
532 numeric fields - on an event hit, the value(s) will be added to a
533 sum kept for that field. The special string 'hitcount' can be used
534 in place of an explicit value field - this is simply a count of
535 event hits. If 'values' isn't specified, an implicit 'hitcount'
536 value will be automatically created and used as the only value.
537 Keys can be any field, or the special string 'stacktrace', which
538 will use the event's kernel stacktrace as the key. The keywords
539 'keys' or 'key' can be used to specify keys, and the keywords
540 'values', 'vals', or 'val' can be used to specify values. Compound
541 keys consisting of up to two fields can be specified by the 'keys'
542 keyword. Hashing a compound key produces a unique entry in the
543 table for each unique combination of component keys, and can be
544 useful for providing more fine-grained summaries of event data.
545 Additionally, sort keys consisting of up to two fields can be
546 specified by the 'sort' keyword. If more than one field is
547 specified, the result will be a 'sort within a sort': the first key
548 is taken to be the primary sort key and the second the secondary
549 key.
550
551 'hist' triggers add a 'hist' file to each event's subdirectory.
552 Reading the 'hist' file for the event will dump the hash table in
52a7f16d
TZ
553 its entirety to stdout. If there are multiple hist triggers
554 attached to an event, there will be a table for each trigger in the
555 output. Each printed hash table entry is a simple list of the keys
556 and values comprising the entry; keys are printed first and are
557 delineated by curly braces, and are followed by the set of value
558 fields for the entry. By default, numeric fields are displayed as
559 base-10 integers. This can be modified by appending any of the
560 following modifiers to the field name:
0fc3813c
TZ
561
562 .hex display a number as a hex value
563 .sym display an address as a symbol
564 .sym-offset display an address as a symbol and offset
565 .syscall display a syscall id as a system call name
566 .execname display a common_pid as a program name
567
568 Note that in general the semantics of a given field aren't
569 interpreted when applying a modifier to it, but there are some
570 restrictions to be aware of in this regard:
571
572 - only the 'hex' modifier can be used for values (because values
573 are essentially sums, and the other modifiers don't make sense
574 in that context).
575 - the 'execname' modifier can only be used on a 'common_pid'. The
576 reason for this is that the execname is simply the 'comm' value
577 saved for the 'current' process when an event was triggered,
578 which is the same as the common_pid value saved by the event
579 tracing code. Trying to apply that comm value to other pid
580 values wouldn't be correct, and typically events that care save
581 pid-specific comm fields in the event itself.
582
583 A typical usage scenario would be the following to enable a hist
584 trigger, read its current contents, and then turn it off:
585
586 # echo 'hist:keys=skbaddr.hex:vals=len' > \
587 /sys/kernel/debug/tracing/events/net/netif_rx/trigger
588
589 # cat /sys/kernel/debug/tracing/events/net/netif_rx/hist
590
591 # echo '!hist:keys=skbaddr.hex:vals=len' > \
592 /sys/kernel/debug/tracing/events/net/netif_rx/trigger
593
594 The trigger file itself can be read to show the details of the
595 currently attached hist trigger. This information is also displayed
596 at the top of the 'hist' file when read.
597
598 By default, the size of the hash table is 2048 entries. The 'size'
599 parameter can be used to specify more or fewer than that. The units
600 are in terms of hashtable entries - if a run uses more entries than
601 specified, the results will show the number of 'drops', the number
602 of hits that were ignored. The size should be a power of 2 between
603 128 and 131072 (any non- power-of-2 number specified will be rounded
604 up).
605
606 The 'sort' parameter can be used to specify a value field to sort
607 on. The default if unspecified is 'hitcount' and the default sort
608 order is 'ascending'. To sort in the opposite direction, append
609 .descending' to the sort key.
610
611 The 'pause' parameter can be used to pause an existing hist trigger
612 or to start a hist trigger but not log any events until told to do
613 so. 'continue' or 'cont' can be used to start or restart a paused
614 hist trigger.
615
616 The 'clear' parameter will clear the contents of a running hist
617 trigger and leave its current paused/active state.
618
619 Note that the 'pause', 'cont', and 'clear' parameters should be
620 applied using 'append' shell operator ('>>') if applied to an
621 existing trigger, rather than via the '>' operator, which will cause
622 the trigger to be removed through truncation.
623
624- enable_hist/disable_hist
625
626 The enable_hist and disable_hist triggers can be used to have one
627 event conditionally start and stop another event's already-attached
628 hist trigger. Any number of enable_hist and disable_hist triggers
629 can be attached to a given event, allowing that event to kick off
630 and stop aggregations on a host of other events.
631
632 The format is very similar to the enable/disable_event triggers:
633
634 enable_hist:<system>:<event>[:count]
635 disable_hist:<system>:<event>[:count]
636
637 Instead of enabling or disabling the tracing of the target event
638 into the trace buffer as the enable/disable_event triggers do, the
639 enable/disable_hist triggers enable or disable the aggregation of
640 the target event into a hash table.
641
642 A typical usage scenario for the enable_hist/disable_hist triggers
643 would be to first set up a paused hist trigger on some event,
644 followed by an enable_hist/disable_hist pair that turns the hist
645 aggregation on and off when conditions of interest are hit:
646
647 # echo 'hist:keys=skbaddr.hex:vals=len:pause' > \
648 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
649
650 # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \
651 /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
652
653 # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \
654 /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
655
656 The above sets up an initially paused hist trigger which is unpaused
657 and starts aggregating events when a given program is executed, and
658 which stops aggregating when the process exits and the hist trigger
659 is paused again.
660
661 The examples below provide a more concrete illustration of the
662 concepts and typical usage patterns discussed above.
663
664
6656.2 'hist' trigger examples
666---------------------------
667
668 The first set of examples creates aggregations using the kmalloc
669 event. The fields that can be used for the hist trigger are listed
670 in the kmalloc event's format file:
671
672 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/format
673 name: kmalloc
674 ID: 374
675 format:
676 field:unsigned short common_type; offset:0; size:2; signed:0;
677 field:unsigned char common_flags; offset:2; size:1; signed:0;
678 field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
679 field:int common_pid; offset:4; size:4; signed:1;
680
681 field:unsigned long call_site; offset:8; size:8; signed:0;
682 field:const void * ptr; offset:16; size:8; signed:0;
683 field:size_t bytes_req; offset:24; size:8; signed:0;
684 field:size_t bytes_alloc; offset:32; size:8; signed:0;
685 field:gfp_t gfp_flags; offset:40; size:4; signed:0;
686
687 We'll start by creating a hist trigger that generates a simple table
688 that lists the total number of bytes requested for each function in
689 the kernel that made one or more calls to kmalloc:
690
691 # echo 'hist:key=call_site:val=bytes_req' > \
692 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
693
694 This tells the tracing system to create a 'hist' trigger using the
695 call_site field of the kmalloc event as the key for the table, which
696 just means that each unique call_site address will have an entry
697 created for it in the table. The 'val=bytes_req' parameter tells
698 the hist trigger that for each unique entry (call_site) in the
699 table, it should keep a running total of the number of bytes
700 requested by that call_site.
701
702 We'll let it run for awhile and then dump the contents of the 'hist'
703 file in the kmalloc event's subdirectory (for readability, a number
704 of entries have been omitted):
705
706 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
707 # trigger info: hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active]
708
709 { call_site: 18446744072106379007 } hitcount: 1 bytes_req: 176
710 { call_site: 18446744071579557049 } hitcount: 1 bytes_req: 1024
711 { call_site: 18446744071580608289 } hitcount: 1 bytes_req: 16384
712 { call_site: 18446744071581827654 } hitcount: 1 bytes_req: 24
713 { call_site: 18446744071580700980 } hitcount: 1 bytes_req: 8
714 { call_site: 18446744071579359876 } hitcount: 1 bytes_req: 152
715 { call_site: 18446744071580795365 } hitcount: 3 bytes_req: 144
716 { call_site: 18446744071581303129 } hitcount: 3 bytes_req: 144
717 { call_site: 18446744071580713234 } hitcount: 4 bytes_req: 2560
718 { call_site: 18446744071580933750 } hitcount: 4 bytes_req: 736
719 .
720 .
721 .
722 { call_site: 18446744072106047046 } hitcount: 69 bytes_req: 5576
723 { call_site: 18446744071582116407 } hitcount: 73 bytes_req: 2336
724 { call_site: 18446744072106054684 } hitcount: 136 bytes_req: 140504
725 { call_site: 18446744072106224230 } hitcount: 136 bytes_req: 19584
726 { call_site: 18446744072106078074 } hitcount: 153 bytes_req: 2448
727 { call_site: 18446744072106062406 } hitcount: 153 bytes_req: 36720
728 { call_site: 18446744071582507929 } hitcount: 153 bytes_req: 37088
729 { call_site: 18446744072102520590 } hitcount: 273 bytes_req: 10920
730 { call_site: 18446744071582143559 } hitcount: 358 bytes_req: 716
731 { call_site: 18446744072106465852 } hitcount: 417 bytes_req: 56712
732 { call_site: 18446744072102523378 } hitcount: 485 bytes_req: 27160
733 { call_site: 18446744072099568646 } hitcount: 1676 bytes_req: 33520
734
735 Totals:
736 Hits: 4610
737 Entries: 45
738 Dropped: 0
739
740 The output displays a line for each entry, beginning with the key
741 specified in the trigger, followed by the value(s) also specified in
742 the trigger. At the beginning of the output is a line that displays
743 the trigger info, which can also be displayed by reading the
744 'trigger' file:
745
746 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
747 hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active]
748
749 At the end of the output are a few lines that display the overall
750 totals for the run. The 'Hits' field shows the total number of
751 times the event trigger was hit, the 'Entries' field shows the total
752 number of used entries in the hash table, and the 'Dropped' field
753 shows the number of hits that were dropped because the number of
754 used entries for the run exceeded the maximum number of entries
755 allowed for the table (normally 0, but if not a hint that you may
756 want to increase the size of the table using the 'size' parameter).
757
758 Notice in the above output that there's an extra field, 'hitcount',
759 which wasn't specified in the trigger. Also notice that in the
760 trigger info output, there's a parameter, 'sort=hitcount', which
761 wasn't specified in the trigger either. The reason for that is that
762 every trigger implicitly keeps a count of the total number of hits
763 attributed to a given entry, called the 'hitcount'. That hitcount
764 information is explicitly displayed in the output, and in the
765 absence of a user-specified sort parameter, is used as the default
766 sort field.
767
768 The value 'hitcount' can be used in place of an explicit value in
769 the 'values' parameter if you don't really need to have any
770 particular field summed and are mainly interested in hit
771 frequencies.
772
773 To turn the hist trigger off, simply call up the trigger in the
774 command history and re-execute it with a '!' prepended:
775
776 # echo '!hist:key=call_site:val=bytes_req' > \
777 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
778
779 Finally, notice that the call_site as displayed in the output above
780 isn't really very useful. It's an address, but normally addresses
781 are displayed in hex. To have a numeric field displayed as a hex
782 value, simply append '.hex' to the field name in the trigger:
783
784 # echo 'hist:key=call_site.hex:val=bytes_req' > \
785 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
786
787 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
788 # trigger info: hist:keys=call_site.hex:vals=bytes_req:sort=hitcount:size=2048 [active]
789
790 { call_site: ffffffffa026b291 } hitcount: 1 bytes_req: 433
791 { call_site: ffffffffa07186ff } hitcount: 1 bytes_req: 176
792 { call_site: ffffffff811ae721 } hitcount: 1 bytes_req: 16384
793 { call_site: ffffffff811c5134 } hitcount: 1 bytes_req: 8
794 { call_site: ffffffffa04a9ebb } hitcount: 1 bytes_req: 511
795 { call_site: ffffffff8122e0a6 } hitcount: 1 bytes_req: 12
796 { call_site: ffffffff8107da84 } hitcount: 1 bytes_req: 152
797 { call_site: ffffffff812d8246 } hitcount: 1 bytes_req: 24
798 { call_site: ffffffff811dc1e5 } hitcount: 3 bytes_req: 144
799 { call_site: ffffffffa02515e8 } hitcount: 3 bytes_req: 648
800 { call_site: ffffffff81258159 } hitcount: 3 bytes_req: 144
801 { call_site: ffffffff811c80f4 } hitcount: 4 bytes_req: 544
802 .
803 .
804 .
805 { call_site: ffffffffa06c7646 } hitcount: 106 bytes_req: 8024
806 { call_site: ffffffffa06cb246 } hitcount: 132 bytes_req: 31680
807 { call_site: ffffffffa06cef7a } hitcount: 132 bytes_req: 2112
808 { call_site: ffffffff8137e399 } hitcount: 132 bytes_req: 23232
809 { call_site: ffffffffa06c941c } hitcount: 185 bytes_req: 171360
810 { call_site: ffffffffa06f2a66 } hitcount: 185 bytes_req: 26640
811 { call_site: ffffffffa036a70e } hitcount: 265 bytes_req: 10600
812 { call_site: ffffffff81325447 } hitcount: 292 bytes_req: 584
813 { call_site: ffffffffa072da3c } hitcount: 446 bytes_req: 60656
814 { call_site: ffffffffa036b1f2 } hitcount: 526 bytes_req: 29456
815 { call_site: ffffffffa0099c06 } hitcount: 1780 bytes_req: 35600
816
817 Totals:
818 Hits: 4775
819 Entries: 46
820 Dropped: 0
821
822 Even that's only marginally more useful - while hex values do look
823 more like addresses, what users are typically more interested in
824 when looking at text addresses are the corresponding symbols
825 instead. To have an address displayed as symbolic value instead,
826 simply append '.sym' or '.sym-offset' to the field name in the
827 trigger:
828
829 # echo 'hist:key=call_site.sym:val=bytes_req' > \
830 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
831
832 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
833 # trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=hitcount:size=2048 [active]
834
835 { call_site: [ffffffff810adcb9] syslog_print_all } hitcount: 1 bytes_req: 1024
836 { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8
837 { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7
838 { call_site: [ffffffff8154acbe] usb_alloc_urb } hitcount: 1 bytes_req: 192
839 { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7
840 { call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40
841 { call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128
842 { call_site: [ffffffff811febd5] fsnotify_alloc_group } hitcount: 2 bytes_req: 528
843 { call_site: [ffffffff81440f58] __tty_buffer_request_room } hitcount: 2 bytes_req: 2624
844 { call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 2 bytes_req: 96
845 { call_site: [ffffffffa05e19af] ieee80211_start_tx_ba_session [mac80211] } hitcount: 2 bytes_req: 464
846 { call_site: [ffffffff81672406] tcp_get_metrics } hitcount: 2 bytes_req: 304
847 { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128
848 { call_site: [ffffffff81089b05] sched_create_group } hitcount: 2 bytes_req: 1424
849 .
850 .
851 .
852 { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1185 bytes_req: 123240
853 { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 1185 bytes_req: 104280
854 { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 1402 bytes_req: 190672
855 { call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 1518 bytes_req: 146208
856 { call_site: [ffffffffa029070e] drm_vma_node_allow [drm] } hitcount: 1746 bytes_req: 69840
857 { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 2021 bytes_req: 792312
858 { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 2592 bytes_req: 145152
859 { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2629 bytes_req: 378576
860 { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2629 bytes_req: 3783248
861 { call_site: [ffffffff81325607] apparmor_file_alloc_security } hitcount: 5192 bytes_req: 10384
862 { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 5529 bytes_req: 110584
863 { call_site: [ffffffff8131ebf7] aa_alloc_task_context } hitcount: 21943 bytes_req: 702176
864 { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 55759 bytes_req: 5074265
865
866 Totals:
867 Hits: 109928
868 Entries: 71
869 Dropped: 0
870
871 Because the default sort key above is 'hitcount', the above shows a
872 the list of call_sites by increasing hitcount, so that at the bottom
873 we see the functions that made the most kmalloc calls during the
874 run. If instead we we wanted to see the top kmalloc callers in
875 terms of the number of bytes requested rather than the number of
876 calls, and we wanted the top caller to appear at the top, we can use
877 the 'sort' parameter, along with the 'descending' modifier:
878
879 # echo 'hist:key=call_site.sym:val=bytes_req:sort=bytes_req.descending' > \
880 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
881
882 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
883 # trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=bytes_req.descending:size=2048 [active]
884
885 { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 2186 bytes_req: 3397464
886 { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1790 bytes_req: 712176
887 { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 8132 bytes_req: 513135
888 { call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 106 bytes_req: 440128
889 { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 2186 bytes_req: 314784
890 { call_site: [ffffffff812891ca] ext4_find_extent } hitcount: 2174 bytes_req: 208992
891 { call_site: [ffffffff811ae8e1] __kmalloc } hitcount: 8 bytes_req: 131072
892 { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 859 bytes_req: 116824
893 { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 1834 bytes_req: 102704
894 { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 972 bytes_req: 101088
895 { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl [drm] } hitcount: 972 bytes_req: 85536
896 { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 3333 bytes_req: 66664
897 { call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 209 bytes_req: 61632
898 .
899 .
900 .
901 { call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128
902 { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128
903 { call_site: [ffffffff812d8406] copy_semundo } hitcount: 2 bytes_req: 48
904 { call_site: [ffffffff81200ba6] inotify_new_group } hitcount: 1 bytes_req: 48
905 { call_site: [ffffffffa027121a] drm_getmagic [drm] } hitcount: 1 bytes_req: 48
906 { call_site: [ffffffff811e3a25] __seq_open_private } hitcount: 1 bytes_req: 40
907 { call_site: [ffffffff811c52f4] bprm_change_interp } hitcount: 2 bytes_req: 16
908 { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8
909 { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7
910 { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7
911
912 Totals:
913 Hits: 32133
914 Entries: 81
915 Dropped: 0
916
917 To display the offset and size information in addition to the symbol
918 name, just use 'sym-offset' instead:
919
920 # echo 'hist:key=call_site.sym-offset:val=bytes_req:sort=bytes_req.descending' > \
921 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
922
923 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
924 # trigger info: hist:keys=call_site.sym-offset:vals=bytes_req:sort=bytes_req.descending:size=2048 [active]
925
926 { call_site: [ffffffffa046041c] i915_gem_execbuffer2+0x6c/0x2c0 [i915] } hitcount: 4569 bytes_req: 3163720
927 { call_site: [ffffffffa0489a66] intel_ring_begin+0xc6/0x1f0 [i915] } hitcount: 4569 bytes_req: 657936
928 { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23+0x694/0x1020 [i915] } hitcount: 1519 bytes_req: 472936
929 { call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23+0x516/0x1020 [i915] } hitcount: 3050 bytes_req: 211832
930 { call_site: [ffffffff811e2a1b] seq_buf_alloc+0x1b/0x50 } hitcount: 34 bytes_req: 148384
931 { call_site: [ffffffffa04a580c] intel_crtc_page_flip+0xbc/0x870 [i915] } hitcount: 1385 bytes_req: 144040
932 { call_site: [ffffffff811ae8e1] __kmalloc+0x191/0x1b0 } hitcount: 8 bytes_req: 131072
933 { call_site: [ffffffffa0287592] drm_mode_page_flip_ioctl+0x282/0x360 [drm] } hitcount: 1385 bytes_req: 121880
934 { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc+0x32/0x100 [drm] } hitcount: 1848 bytes_req: 103488
935 { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state+0x2c/0xa0 [i915] } hitcount: 461 bytes_req: 62696
936 { call_site: [ffffffffa029070e] drm_vma_node_allow+0x2e/0xd0 [drm] } hitcount: 1541 bytes_req: 61640
937 { call_site: [ffffffff815f8d7b] sk_prot_alloc+0xcb/0x1b0 } hitcount: 57 bytes_req: 57456
938 .
939 .
940 .
941 { call_site: [ffffffff8109524a] alloc_fair_sched_group+0x5a/0x1a0 } hitcount: 2 bytes_req: 128
942 { call_site: [ffffffffa027b921] drm_vm_open_locked+0x31/0xa0 [drm] } hitcount: 3 bytes_req: 96
943 { call_site: [ffffffff8122e266] proc_self_follow_link+0x76/0xb0 } hitcount: 8 bytes_req: 96
944 { call_site: [ffffffff81213e80] load_elf_binary+0x240/0x1650 } hitcount: 3 bytes_req: 84
945 { call_site: [ffffffff8154bc62] usb_control_msg+0x42/0x110 } hitcount: 1 bytes_req: 8
946 { call_site: [ffffffffa00bf6fe] hidraw_send_report+0x7e/0x1a0 [hid] } hitcount: 1 bytes_req: 7
947 { call_site: [ffffffffa00bf1ca] hidraw_report_event+0x8a/0x120 [hid] } hitcount: 1 bytes_req: 7
948
949 Totals:
950 Hits: 26098
951 Entries: 64
952 Dropped: 0
953
954 We can also add multiple fields to the 'values' parameter. For
955 example, we might want to see the total number of bytes allocated
956 alongside bytes requested, and display the result sorted by bytes
957 allocated in a descending order:
958
959 # echo 'hist:keys=call_site.sym:values=bytes_req,bytes_alloc:sort=bytes_alloc.descending' > \
960 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
961
962 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
963 # trigger info: hist:keys=call_site.sym:vals=bytes_req,bytes_alloc:sort=bytes_alloc.descending:size=2048 [active]
964
965 { call_site: [ffffffffa046041c] i915_gem_execbuffer2 [i915] } hitcount: 7403 bytes_req: 4084360 bytes_alloc: 5958016
966 { call_site: [ffffffff811e2a1b] seq_buf_alloc } hitcount: 541 bytes_req: 2213968 bytes_alloc: 2228224
967 { call_site: [ffffffffa0489a66] intel_ring_begin [i915] } hitcount: 7404 bytes_req: 1066176 bytes_alloc: 1421568
968 { call_site: [ffffffffa045e7c4] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 1565 bytes_req: 557368 bytes_alloc: 1037760
969 { call_site: [ffffffff8125847d] ext4_htree_store_dirent } hitcount: 9557 bytes_req: 595778 bytes_alloc: 695744
970 { call_site: [ffffffffa045e646] i915_gem_do_execbuffer.isra.23 [i915] } hitcount: 5839 bytes_req: 430680 bytes_alloc: 470400
971 { call_site: [ffffffffa04c4a3c] intel_plane_duplicate_state [i915] } hitcount: 2388 bytes_req: 324768 bytes_alloc: 458496
972 { call_site: [ffffffffa02911f2] drm_modeset_lock_crtc [drm] } hitcount: 3911 bytes_req: 219016 bytes_alloc: 250304
973 { call_site: [ffffffff815f8d7b] sk_prot_alloc } hitcount: 235 bytes_req: 236880 bytes_alloc: 240640
974 { call_site: [ffffffff8137e559] sg_kmalloc } hitcount: 557 bytes_req: 169024 bytes_alloc: 221760
975 { call_site: [ffffffffa00b7c06] hid_report_raw_event [hid] } hitcount: 9378 bytes_req: 187548 bytes_alloc: 206312
976 { call_site: [ffffffffa04a580c] intel_crtc_page_flip [i915] } hitcount: 1519 bytes_req: 157976 bytes_alloc: 194432
977 .
978 .
979 .
980 { call_site: [ffffffff8109bd3b] sched_autogroup_create_attach } hitcount: 2 bytes_req: 144 bytes_alloc: 192
981 { call_site: [ffffffff81097ee8] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
982 { call_site: [ffffffff8109524a] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
983 { call_site: [ffffffff81095225] alloc_fair_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
984 { call_site: [ffffffff81097ec2] alloc_rt_sched_group } hitcount: 2 bytes_req: 128 bytes_alloc: 128
985 { call_site: [ffffffff81213e80] load_elf_binary } hitcount: 3 bytes_req: 84 bytes_alloc: 96
986 { call_site: [ffffffff81079a2e] kthread_create_on_node } hitcount: 1 bytes_req: 56 bytes_alloc: 64
987 { call_site: [ffffffffa00bf6fe] hidraw_send_report [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8
988 { call_site: [ffffffff8154bc62] usb_control_msg } hitcount: 1 bytes_req: 8 bytes_alloc: 8
989 { call_site: [ffffffffa00bf1ca] hidraw_report_event [hid] } hitcount: 1 bytes_req: 7 bytes_alloc: 8
990
991 Totals:
992 Hits: 66598
993 Entries: 65
994 Dropped: 0
995
996 Finally, to finish off our kmalloc example, instead of simply having
997 the hist trigger display symbolic call_sites, we can have the hist
998 trigger additionally display the complete set of kernel stack traces
999 that led to each call_site. To do that, we simply use the special
1000 value 'stacktrace' for the key parameter:
1001
1002 # echo 'hist:keys=stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \
1003 /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
1004
1005 The above trigger will use the kernel stack trace in effect when an
1006 event is triggered as the key for the hash table. This allows the
1007 enumeration of every kernel callpath that led up to a particular
1008 event, along with a running total of any of the event fields for
1009 that event. Here we tally bytes requested and bytes allocated for
1010 every callpath in the system that led up to a kmalloc (in this case
1011 every callpath to a kmalloc for a kernel compile):
1012
1013 # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
1014 # trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
1015
1016 { stacktrace:
1017 __kmalloc_track_caller+0x10b/0x1a0
1018 kmemdup+0x20/0x50
1019 hidraw_report_event+0x8a/0x120 [hid]
1020 hid_report_raw_event+0x3ea/0x440 [hid]
1021 hid_input_report+0x112/0x190 [hid]
1022 hid_irq_in+0xc2/0x260 [usbhid]
1023 __usb_hcd_giveback_urb+0x72/0x120
1024 usb_giveback_urb_bh+0x9e/0xe0
1025 tasklet_hi_action+0xf8/0x100
1026 __do_softirq+0x114/0x2c0
1027 irq_exit+0xa5/0xb0
1028 do_IRQ+0x5a/0xf0
1029 ret_from_intr+0x0/0x30
1030 cpuidle_enter+0x17/0x20
1031 cpu_startup_entry+0x315/0x3e0
1032 rest_init+0x7c/0x80
1033 } hitcount: 3 bytes_req: 21 bytes_alloc: 24
1034 { stacktrace:
1035 __kmalloc_track_caller+0x10b/0x1a0
1036 kmemdup+0x20/0x50
1037 hidraw_report_event+0x8a/0x120 [hid]
1038 hid_report_raw_event+0x3ea/0x440 [hid]
1039 hid_input_report+0x112/0x190 [hid]
1040 hid_irq_in+0xc2/0x260 [usbhid]
1041 __usb_hcd_giveback_urb+0x72/0x120
1042 usb_giveback_urb_bh+0x9e/0xe0
1043 tasklet_hi_action+0xf8/0x100
1044 __do_softirq+0x114/0x2c0
1045 irq_exit+0xa5/0xb0
1046 do_IRQ+0x5a/0xf0
1047 ret_from_intr+0x0/0x30
1048 } hitcount: 3 bytes_req: 21 bytes_alloc: 24
1049 { stacktrace:
1050 kmem_cache_alloc_trace+0xeb/0x150
1051 aa_alloc_task_context+0x27/0x40
1052 apparmor_cred_prepare+0x1f/0x50
1053 security_prepare_creds+0x16/0x20
1054 prepare_creds+0xdf/0x1a0
1055 SyS_capset+0xb5/0x200
1056 system_call_fastpath+0x12/0x6a
1057 } hitcount: 1 bytes_req: 32 bytes_alloc: 32
1058 .
1059 .
1060 .
1061 { stacktrace:
1062 __kmalloc+0x11b/0x1b0
1063 i915_gem_execbuffer2+0x6c/0x2c0 [i915]
1064 drm_ioctl+0x349/0x670 [drm]
1065 do_vfs_ioctl+0x2f0/0x4f0
1066 SyS_ioctl+0x81/0xa0
1067 system_call_fastpath+0x12/0x6a
1068 } hitcount: 17726 bytes_req: 13944120 bytes_alloc: 19593808
1069 { stacktrace:
1070 __kmalloc+0x11b/0x1b0
1071 load_elf_phdrs+0x76/0xa0
1072 load_elf_binary+0x102/0x1650
1073 search_binary_handler+0x97/0x1d0
1074 do_execveat_common.isra.34+0x551/0x6e0
1075 SyS_execve+0x3a/0x50
1076 return_from_execve+0x0/0x23
1077 } hitcount: 33348 bytes_req: 17152128 bytes_alloc: 20226048
1078 { stacktrace:
1079 kmem_cache_alloc_trace+0xeb/0x150
1080 apparmor_file_alloc_security+0x27/0x40
1081 security_file_alloc+0x16/0x20
1082 get_empty_filp+0x93/0x1c0
1083 path_openat+0x31/0x5f0
1084 do_filp_open+0x3a/0x90
1085 do_sys_open+0x128/0x220
1086 SyS_open+0x1e/0x20
1087 system_call_fastpath+0x12/0x6a
1088 } hitcount: 4766422 bytes_req: 9532844 bytes_alloc: 38131376
1089 { stacktrace:
1090 __kmalloc+0x11b/0x1b0
1091 seq_buf_alloc+0x1b/0x50
1092 seq_read+0x2cc/0x370
1093 proc_reg_read+0x3d/0x80
1094 __vfs_read+0x28/0xe0
1095 vfs_read+0x86/0x140
1096 SyS_read+0x46/0xb0
1097 system_call_fastpath+0x12/0x6a
1098 } hitcount: 19133 bytes_req: 78368768 bytes_alloc: 78368768
1099
1100 Totals:
1101 Hits: 6085872
1102 Entries: 253
1103 Dropped: 0
1104
1105 If you key a hist trigger on common_pid, in order for example to
1106 gather and display sorted totals for each process, you can use the
1107 special .execname modifier to display the executable names for the
1108 processes in the table rather than raw pids. The example below
1109 keeps a per-process sum of total bytes read:
1110
1111 # echo 'hist:key=common_pid.execname:val=count:sort=count.descending' > \
1112 /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
1113
1114 # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/hist
1115 # trigger info: hist:keys=common_pid.execname:vals=count:sort=count.descending:size=2048 [active]
1116
1117 { common_pid: gnome-terminal [ 3196] } hitcount: 280 count: 1093512
1118 { common_pid: Xorg [ 1309] } hitcount: 525 count: 256640
1119 { common_pid: compiz [ 2889] } hitcount: 59 count: 254400
1120 { common_pid: bash [ 8710] } hitcount: 3 count: 66369
1121 { common_pid: dbus-daemon-lau [ 8703] } hitcount: 49 count: 47739
1122 { common_pid: irqbalance [ 1252] } hitcount: 27 count: 27648
1123 { common_pid: 01ifupdown [ 8705] } hitcount: 3 count: 17216
1124 { common_pid: dbus-daemon [ 772] } hitcount: 10 count: 12396
1125 { common_pid: Socket Thread [ 8342] } hitcount: 11 count: 11264
1126 { common_pid: nm-dhcp-client. [ 8701] } hitcount: 6 count: 7424
1127 { common_pid: gmain [ 1315] } hitcount: 18 count: 6336
1128 .
1129 .
1130 .
1131 { common_pid: postgres [ 1892] } hitcount: 2 count: 32
1132 { common_pid: postgres [ 1891] } hitcount: 2 count: 32
1133 { common_pid: gmain [ 8704] } hitcount: 2 count: 32
1134 { common_pid: upstart-dbus-br [ 2740] } hitcount: 21 count: 21
1135 { common_pid: nm-dispatcher.a [ 8696] } hitcount: 1 count: 16
1136 { common_pid: indicator-datet [ 2904] } hitcount: 1 count: 16
1137 { common_pid: gdbus [ 2998] } hitcount: 1 count: 16
1138 { common_pid: rtkit-daemon [ 2052] } hitcount: 1 count: 8
1139 { common_pid: init [ 1] } hitcount: 2 count: 2
1140
1141 Totals:
1142 Hits: 2116
1143 Entries: 51
1144 Dropped: 0
1145
1146 Similarly, if you key a hist trigger on syscall id, for example to
1147 gather and display a list of systemwide syscall hits, you can use
1148 the special .syscall modifier to display the syscall names rather
1149 than raw ids. The example below keeps a running total of syscall
1150 counts for the system during the run:
1151
1152 # echo 'hist:key=id.syscall:val=hitcount' > \
1153 /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
1154
1155 # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
1156 # trigger info: hist:keys=id.syscall:vals=hitcount:sort=hitcount:size=2048 [active]
1157
1158 { id: sys_fsync [ 74] } hitcount: 1
1159 { id: sys_newuname [ 63] } hitcount: 1
1160 { id: sys_prctl [157] } hitcount: 1
1161 { id: sys_statfs [137] } hitcount: 1
1162 { id: sys_symlink [ 88] } hitcount: 1
1163 { id: sys_sendmmsg [307] } hitcount: 1
1164 { id: sys_semctl [ 66] } hitcount: 1
1165 { id: sys_readlink [ 89] } hitcount: 3
1166 { id: sys_bind [ 49] } hitcount: 3
1167 { id: sys_getsockname [ 51] } hitcount: 3
1168 { id: sys_unlink [ 87] } hitcount: 3
1169 { id: sys_rename [ 82] } hitcount: 4
1170 { id: unknown_syscall [ 58] } hitcount: 4
1171 { id: sys_connect [ 42] } hitcount: 4
1172 { id: sys_getpid [ 39] } hitcount: 4
1173 .
1174 .
1175 .
1176 { id: sys_rt_sigprocmask [ 14] } hitcount: 952
1177 { id: sys_futex [202] } hitcount: 1534
1178 { id: sys_write [ 1] } hitcount: 2689
1179 { id: sys_setitimer [ 38] } hitcount: 2797
1180 { id: sys_read [ 0] } hitcount: 3202
1181 { id: sys_select [ 23] } hitcount: 3773
1182 { id: sys_writev [ 20] } hitcount: 4531
1183 { id: sys_poll [ 7] } hitcount: 8314
1184 { id: sys_recvmsg [ 47] } hitcount: 13738
1185 { id: sys_ioctl [ 16] } hitcount: 21843
1186
1187 Totals:
1188 Hits: 67612
1189 Entries: 72
1190 Dropped: 0
1191
1192 The syscall counts above provide a rough overall picture of system
1193 call activity on the system; we can see for example that the most
1194 popular system call on this system was the 'sys_ioctl' system call.
1195
1196 We can use 'compound' keys to refine that number and provide some
1197 further insight as to which processes exactly contribute to the
1198 overall ioctl count.
1199
1200 The command below keeps a hitcount for every unique combination of
1201 system call id and pid - the end result is essentially a table
1202 that keeps a per-pid sum of system call hits. The results are
1203 sorted using the system call id as the primary key, and the
1204 hitcount sum as the secondary key:
1205
1206 # echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount' > \
1207 /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
1208
1209 # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
1210 # trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 [active]
1211
1212 { id: sys_read [ 0], common_pid: rtkit-daemon [ 1877] } hitcount: 1
1213 { id: sys_read [ 0], common_pid: gdbus [ 2976] } hitcount: 1
1214 { id: sys_read [ 0], common_pid: console-kit-dae [ 3400] } hitcount: 1
1215 { id: sys_read [ 0], common_pid: postgres [ 1865] } hitcount: 1
1216 { id: sys_read [ 0], common_pid: deja-dup-monito [ 3543] } hitcount: 2
1217 { id: sys_read [ 0], common_pid: NetworkManager [ 890] } hitcount: 2
1218 { id: sys_read [ 0], common_pid: evolution-calen [ 3048] } hitcount: 2
1219 { id: sys_read [ 0], common_pid: postgres [ 1864] } hitcount: 2
1220 { id: sys_read [ 0], common_pid: nm-applet [ 3022] } hitcount: 2
1221 { id: sys_read [ 0], common_pid: whoopsie [ 1212] } hitcount: 2
1222 .
1223 .
1224 .
1225 { id: sys_ioctl [ 16], common_pid: bash [ 8479] } hitcount: 1
1226 { id: sys_ioctl [ 16], common_pid: bash [ 3472] } hitcount: 12
1227 { id: sys_ioctl [ 16], common_pid: gnome-terminal [ 3199] } hitcount: 16
1228 { id: sys_ioctl [ 16], common_pid: Xorg [ 1267] } hitcount: 1808
1229 { id: sys_ioctl [ 16], common_pid: compiz [ 2994] } hitcount: 5580
1230 .
1231 .
1232 .
1233 { id: sys_waitid [247], common_pid: upstart-dbus-br [ 2690] } hitcount: 3
1234 { id: sys_waitid [247], common_pid: upstart-dbus-br [ 2688] } hitcount: 16
1235 { id: sys_inotify_add_watch [254], common_pid: gmain [ 975] } hitcount: 2
1236 { id: sys_inotify_add_watch [254], common_pid: gmain [ 3204] } hitcount: 4
1237 { id: sys_inotify_add_watch [254], common_pid: gmain [ 2888] } hitcount: 4
1238 { id: sys_inotify_add_watch [254], common_pid: gmain [ 3003] } hitcount: 4
1239 { id: sys_inotify_add_watch [254], common_pid: gmain [ 2873] } hitcount: 4
1240 { id: sys_inotify_add_watch [254], common_pid: gmain [ 3196] } hitcount: 6
1241 { id: sys_openat [257], common_pid: java [ 2623] } hitcount: 2
1242 { id: sys_eventfd2 [290], common_pid: ibus-ui-gtk3 [ 2760] } hitcount: 4
1243 { id: sys_eventfd2 [290], common_pid: compiz [ 2994] } hitcount: 6
1244
1245 Totals:
1246 Hits: 31536
1247 Entries: 323
1248 Dropped: 0
1249
1250 The above list does give us a breakdown of the ioctl syscall by
1251 pid, but it also gives us quite a bit more than that, which we
1252 don't really care about at the moment. Since we know the syscall
1253 id for sys_ioctl (16, displayed next to the sys_ioctl name), we
1254 can use that to filter out all the other syscalls:
1255
1256 # echo 'hist:key=id.syscall,common_pid.execname:val=hitcount:sort=id,hitcount if id == 16' > \
1257 /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
1258
1259 # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
1260 # trigger info: hist:keys=id.syscall,common_pid.execname:vals=hitcount:sort=id.syscall,hitcount:size=2048 if id == 16 [active]
1261
1262 { id: sys_ioctl [ 16], common_pid: gmain [ 2769] } hitcount: 1
1263 { id: sys_ioctl [ 16], common_pid: evolution-addre [ 8571] } hitcount: 1
1264 { id: sys_ioctl [ 16], common_pid: gmain [ 3003] } hitcount: 1
1265 { id: sys_ioctl [ 16], common_pid: gmain [ 2781] } hitcount: 1
1266 { id: sys_ioctl [ 16], common_pid: gmain [ 2829] } hitcount: 1
1267 { id: sys_ioctl [ 16], common_pid: bash [ 8726] } hitcount: 1
1268 { id: sys_ioctl [ 16], common_pid: bash [ 8508] } hitcount: 1
1269 { id: sys_ioctl [ 16], common_pid: gmain [ 2970] } hitcount: 1
1270 { id: sys_ioctl [ 16], common_pid: gmain [ 2768] } hitcount: 1
1271 .
1272 .
1273 .
1274 { id: sys_ioctl [ 16], common_pid: pool [ 8559] } hitcount: 45
1275 { id: sys_ioctl [ 16], common_pid: pool [ 8555] } hitcount: 48
1276 { id: sys_ioctl [ 16], common_pid: pool [ 8551] } hitcount: 48
1277 { id: sys_ioctl [ 16], common_pid: avahi-daemon [ 896] } hitcount: 66
1278 { id: sys_ioctl [ 16], common_pid: Xorg [ 1267] } hitcount: 26674
1279 { id: sys_ioctl [ 16], common_pid: compiz [ 2994] } hitcount: 73443
1280
1281 Totals:
1282 Hits: 101162
1283 Entries: 103
1284 Dropped: 0
1285
1286 The above output shows that 'compiz' and 'Xorg' are far and away
1287 the heaviest ioctl callers (which might lead to questions about
1288 whether they really need to be making all those calls and to
1289 possible avenues for further investigation.)
1290
1291 The compound key examples used a key and a sum value (hitcount) to
1292 sort the output, but we can just as easily use two keys instead.
1293 Here's an example where we use a compound key composed of the the
1294 common_pid and size event fields. Sorting with pid as the primary
1295 key and 'size' as the secondary key allows us to display an
1296 ordered summary of the recvfrom sizes, with counts, received by
1297 each process:
1298
1299 # echo 'hist:key=common_pid.execname,size:val=hitcount:sort=common_pid,size' > \
1300 /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/trigger
1301
1302 # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_recvfrom/hist
1303 # trigger info: hist:keys=common_pid.execname,size:vals=hitcount:sort=common_pid.execname,size:size=2048 [active]
1304
1305 { common_pid: smbd [ 784], size: 4 } hitcount: 1
1306 { common_pid: dnsmasq [ 1412], size: 4096 } hitcount: 672
1307 { common_pid: postgres [ 1796], size: 1000 } hitcount: 6
1308 { common_pid: postgres [ 1867], size: 1000 } hitcount: 10
1309 { common_pid: bamfdaemon [ 2787], size: 28 } hitcount: 2
1310 { common_pid: bamfdaemon [ 2787], size: 14360 } hitcount: 1
1311 { common_pid: compiz [ 2994], size: 8 } hitcount: 1
1312 { common_pid: compiz [ 2994], size: 20 } hitcount: 11
1313 { common_pid: gnome-terminal [ 3199], size: 4 } hitcount: 2
1314 { common_pid: firefox [ 8817], size: 4 } hitcount: 1
1315 { common_pid: firefox [ 8817], size: 8 } hitcount: 5
1316 { common_pid: firefox [ 8817], size: 588 } hitcount: 2
1317 { common_pid: firefox [ 8817], size: 628 } hitcount: 1
1318 { common_pid: firefox [ 8817], size: 6944 } hitcount: 1
1319 { common_pid: firefox [ 8817], size: 408880 } hitcount: 2
1320 { common_pid: firefox [ 8822], size: 8 } hitcount: 2
1321 { common_pid: firefox [ 8822], size: 160 } hitcount: 2
1322 { common_pid: firefox [ 8822], size: 320 } hitcount: 2
1323 { common_pid: firefox [ 8822], size: 352 } hitcount: 1
1324 .
1325 .
1326 .
1327 { common_pid: pool [ 8923], size: 1960 } hitcount: 10
1328 { common_pid: pool [ 8923], size: 2048 } hitcount: 10
1329 { common_pid: pool [ 8924], size: 1960 } hitcount: 10
1330 { common_pid: pool [ 8924], size: 2048 } hitcount: 10
1331 { common_pid: pool [ 8928], size: 1964 } hitcount: 4
1332 { common_pid: pool [ 8928], size: 1965 } hitcount: 2
1333 { common_pid: pool [ 8928], size: 2048 } hitcount: 6
1334 { common_pid: pool [ 8929], size: 1982 } hitcount: 1
1335 { common_pid: pool [ 8929], size: 2048 } hitcount: 1
1336
1337 Totals:
1338 Hits: 2016
1339 Entries: 224
1340 Dropped: 0
1341
1342 The above example also illustrates the fact that although a compound
1343 key is treated as a single entity for hashing purposes, the sub-keys
1344 it's composed of can be accessed independently.
1345
1346 The next example uses a string field as the hash key and
1347 demonstrates how you can manually pause and continue a hist trigger.
1348 In this example, we'll aggregate fork counts and don't expect a
1349 large number of entries in the hash table, so we'll drop it to a
1350 much smaller number, say 256:
1351
1352 # echo 'hist:key=child_comm:val=hitcount:size=256' > \
1353 /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
1354
1355 # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
1356 # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active]
1357
1358 { child_comm: dconf worker } hitcount: 1
1359 { child_comm: ibus-daemon } hitcount: 1
1360 { child_comm: whoopsie } hitcount: 1
1361 { child_comm: smbd } hitcount: 1
1362 { child_comm: gdbus } hitcount: 1
1363 { child_comm: kthreadd } hitcount: 1
1364 { child_comm: dconf worker } hitcount: 1
1365 { child_comm: evolution-alarm } hitcount: 2
1366 { child_comm: Socket Thread } hitcount: 2
1367 { child_comm: postgres } hitcount: 2
1368 { child_comm: bash } hitcount: 3
1369 { child_comm: compiz } hitcount: 3
1370 { child_comm: evolution-sourc } hitcount: 4
1371 { child_comm: dhclient } hitcount: 4
1372 { child_comm: pool } hitcount: 5
1373 { child_comm: nm-dispatcher.a } hitcount: 8
1374 { child_comm: firefox } hitcount: 8
1375 { child_comm: dbus-daemon } hitcount: 8
1376 { child_comm: glib-pacrunner } hitcount: 10
1377 { child_comm: evolution } hitcount: 23
1378
1379 Totals:
1380 Hits: 89
1381 Entries: 20
1382 Dropped: 0
1383
1384 If we want to pause the hist trigger, we can simply append :pause to
1385 the command that started the trigger. Notice that the trigger info
1386 displays as [paused]:
1387
1388 # echo 'hist:key=child_comm:val=hitcount:size=256:pause' >> \
1389 /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
1390
1391 # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
1392 # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [paused]
1393
1394 { child_comm: dconf worker } hitcount: 1
1395 { child_comm: kthreadd } hitcount: 1
1396 { child_comm: dconf worker } hitcount: 1
1397 { child_comm: gdbus } hitcount: 1
1398 { child_comm: ibus-daemon } hitcount: 1
1399 { child_comm: Socket Thread } hitcount: 2
1400 { child_comm: evolution-alarm } hitcount: 2
1401 { child_comm: smbd } hitcount: 2
1402 { child_comm: bash } hitcount: 3
1403 { child_comm: whoopsie } hitcount: 3
1404 { child_comm: compiz } hitcount: 3
1405 { child_comm: evolution-sourc } hitcount: 4
1406 { child_comm: pool } hitcount: 5
1407 { child_comm: postgres } hitcount: 6
1408 { child_comm: firefox } hitcount: 8
1409 { child_comm: dhclient } hitcount: 10
1410 { child_comm: emacs } hitcount: 12
1411 { child_comm: dbus-daemon } hitcount: 20
1412 { child_comm: nm-dispatcher.a } hitcount: 20
1413 { child_comm: evolution } hitcount: 35
1414 { child_comm: glib-pacrunner } hitcount: 59
1415
1416 Totals:
1417 Hits: 199
1418 Entries: 21
1419 Dropped: 0
1420
1421 To manually continue having the trigger aggregate events, append
1422 :cont instead. Notice that the trigger info displays as [active]
1423 again, and the data has changed:
1424
1425 # echo 'hist:key=child_comm:val=hitcount:size=256:cont' >> \
1426 /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
1427
1428 # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
1429 # trigger info: hist:keys=child_comm:vals=hitcount:sort=hitcount:size=256 [active]
1430
1431 { child_comm: dconf worker } hitcount: 1
1432 { child_comm: dconf worker } hitcount: 1
1433 { child_comm: kthreadd } hitcount: 1
1434 { child_comm: gdbus } hitcount: 1
1435 { child_comm: ibus-daemon } hitcount: 1
1436 { child_comm: Socket Thread } hitcount: 2
1437 { child_comm: evolution-alarm } hitcount: 2
1438 { child_comm: smbd } hitcount: 2
1439 { child_comm: whoopsie } hitcount: 3
1440 { child_comm: compiz } hitcount: 3
1441 { child_comm: evolution-sourc } hitcount: 4
1442 { child_comm: bash } hitcount: 5
1443 { child_comm: pool } hitcount: 5
1444 { child_comm: postgres } hitcount: 6
1445 { child_comm: firefox } hitcount: 8
1446 { child_comm: dhclient } hitcount: 11
1447 { child_comm: emacs } hitcount: 12
1448 { child_comm: dbus-daemon } hitcount: 22
1449 { child_comm: nm-dispatcher.a } hitcount: 22
1450 { child_comm: evolution } hitcount: 35
1451 { child_comm: glib-pacrunner } hitcount: 59
1452
1453 Totals:
1454 Hits: 206
1455 Entries: 21
1456 Dropped: 0
1457
1458 The previous example showed how to start and stop a hist trigger by
1459 appending 'pause' and 'continue' to the hist trigger command. A
1460 hist trigger can also be started in a paused state by initially
1461 starting the trigger with ':pause' appended. This allows you to
1462 start the trigger only when you're ready to start collecting data
1463 and not before. For example, you could start the trigger in a
1464 paused state, then unpause it and do something you want to measure,
1465 then pause the trigger again when done.
1466
1467 Of course, doing this manually can be difficult and error-prone, but
1468 it is possible to automatically start and stop a hist trigger based
1469 on some condition, via the enable_hist and disable_hist triggers.
1470
1471 For example, suppose we wanted to take a look at the relative
1472 weights in terms of skb length for each callpath that leads to a
1473 netif_receieve_skb event when downloading a decent-sized file using
1474 wget.
1475
1476 First we set up an initially paused stacktrace trigger on the
1477 netif_receive_skb event:
1478
1479 # echo 'hist:key=stacktrace:vals=len:pause' > \
1480 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1481
1482 Next, we set up an 'enable_hist' trigger on the sched_process_exec
1483 event, with an 'if filename==/usr/bin/wget' filter. The effect of
1484 this new trigger is that it will 'unpause' the hist trigger we just
1485 set up on netif_receive_skb if and only if it sees a
1486 sched_process_exec event with a filename of '/usr/bin/wget'. When
1487 that happens, all netif_receive_skb events are aggregated into a
1488 hash table keyed on stacktrace:
1489
1490 # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \
1491 /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
1492
1493 The aggregation continues until the netif_receive_skb is paused
1494 again, which is what the following disable_hist event does by
1495 creating a similar setup on the sched_process_exit event, using the
1496 filter 'comm==wget':
1497
1498 # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \
1499 /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
1500
1501 Whenever a process exits and the comm field of the disable_hist
1502 trigger filter matches 'comm==wget', the netif_receive_skb hist
1503 trigger is disabled.
1504
1505 The overall effect is that netif_receive_skb events are aggregated
1506 into the hash table for only the duration of the wget. Executing a
1507 wget command and then listing the 'hist' file will display the
1508 output generated by the wget command:
1509
1510 $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz
1511
1512 # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
1513 # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
1514
1515 { stacktrace:
1516 __netif_receive_skb_core+0x46d/0x990
1517 __netif_receive_skb+0x18/0x60
1518 netif_receive_skb_internal+0x23/0x90
1519 napi_gro_receive+0xc8/0x100
1520 ieee80211_deliver_skb+0xd6/0x270 [mac80211]
1521 ieee80211_rx_handlers+0xccf/0x22f0 [mac80211]
1522 ieee80211_prepare_and_rx_handle+0x4e7/0xc40 [mac80211]
1523 ieee80211_rx+0x31d/0x900 [mac80211]
1524 iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm]
1525 iwl_rx_dispatch+0x8e/0xf0 [iwldvm]
1526 iwl_pcie_irq_handler+0xe3c/0x12f0 [iwlwifi]
1527 irq_thread_fn+0x20/0x50
1528 irq_thread+0x11f/0x150
1529 kthread+0xd2/0xf0
1530 ret_from_fork+0x42/0x70
1531 } hitcount: 85 len: 28884
1532 { stacktrace:
1533 __netif_receive_skb_core+0x46d/0x990
1534 __netif_receive_skb+0x18/0x60
1535 netif_receive_skb_internal+0x23/0x90
1536 napi_gro_complete+0xa4/0xe0
1537 dev_gro_receive+0x23a/0x360
1538 napi_gro_receive+0x30/0x100
1539 ieee80211_deliver_skb+0xd6/0x270 [mac80211]
1540 ieee80211_rx_handlers+0xccf/0x22f0 [mac80211]
1541 ieee80211_prepare_and_rx_handle+0x4e7/0xc40 [mac80211]
1542 ieee80211_rx+0x31d/0x900 [mac80211]
1543 iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm]
1544 iwl_rx_dispatch+0x8e/0xf0 [iwldvm]
1545 iwl_pcie_irq_handler+0xe3c/0x12f0 [iwlwifi]
1546 irq_thread_fn+0x20/0x50
1547 irq_thread+0x11f/0x150
1548 kthread+0xd2/0xf0
1549 } hitcount: 98 len: 664329
1550 { stacktrace:
1551 __netif_receive_skb_core+0x46d/0x990
1552 __netif_receive_skb+0x18/0x60
1553 process_backlog+0xa8/0x150
1554 net_rx_action+0x15d/0x340
1555 __do_softirq+0x114/0x2c0
1556 do_softirq_own_stack+0x1c/0x30
1557 do_softirq+0x65/0x70
1558 __local_bh_enable_ip+0xb5/0xc0
1559 ip_finish_output+0x1f4/0x840
1560 ip_output+0x6b/0xc0
1561 ip_local_out_sk+0x31/0x40
1562 ip_send_skb+0x1a/0x50
1563 udp_send_skb+0x173/0x2a0
1564 udp_sendmsg+0x2bf/0x9f0
1565 inet_sendmsg+0x64/0xa0
1566 sock_sendmsg+0x3d/0x50
1567 } hitcount: 115 len: 13030
1568 { stacktrace:
1569 __netif_receive_skb_core+0x46d/0x990
1570 __netif_receive_skb+0x18/0x60
1571 netif_receive_skb_internal+0x23/0x90
1572 napi_gro_complete+0xa4/0xe0
1573 napi_gro_flush+0x6d/0x90
1574 iwl_pcie_irq_handler+0x92a/0x12f0 [iwlwifi]
1575 irq_thread_fn+0x20/0x50
1576 irq_thread+0x11f/0x150
1577 kthread+0xd2/0xf0
1578 ret_from_fork+0x42/0x70
1579 } hitcount: 934 len: 5512212
1580
1581 Totals:
1582 Hits: 1232
1583 Entries: 4
1584 Dropped: 0
1585
1586 The above shows all the netif_receive_skb callpaths and their total
1587 lengths for the duration of the wget command.
1588
1589 The 'clear' hist trigger param can be used to clear the hash table.
1590 Suppose we wanted to try another run of the previous example but
1591 this time also wanted to see the complete list of events that went
1592 into the histogram. In order to avoid having to set everything up
1593 again, we can just clear the histogram first:
1594
1595 # echo 'hist:key=stacktrace:vals=len:clear' >> \
1596 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1597
1598 Just to verify that it is in fact cleared, here's what we now see in
1599 the hist file:
1600
1601 # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
1602 # trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
1603
1604 Totals:
1605 Hits: 0
1606 Entries: 0
1607 Dropped: 0
1608
1609 Since we want to see the detailed list of every netif_receive_skb
1610 event occurring during the new run, which are in fact the same
1611 events being aggregated into the hash table, we add some additional
1612 'enable_event' events to the triggering sched_process_exec and
1613 sched_process_exit events as such:
1614
1615 # echo 'enable_event:net:netif_receive_skb if filename==/usr/bin/wget' > \
1616 /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
1617
1618 # echo 'disable_event:net:netif_receive_skb if comm==wget' > \
1619 /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
1620
1621 If you read the trigger files for the sched_process_exec and
1622 sched_process_exit triggers, you should see two triggers for each:
1623 one enabling/disabling the hist aggregation and the other
1624 enabling/disabling the logging of events:
1625
1626 # cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
1627 enable_event:net:netif_receive_skb:unlimited if filename==/usr/bin/wget
1628 enable_hist:net:netif_receive_skb:unlimited if filename==/usr/bin/wget
1629
1630 # cat /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
1631 enable_event:net:netif_receive_skb:unlimited if comm==wget
1632 disable_hist:net:netif_receive_skb:unlimited if comm==wget
1633
1634 In other words, whenever either of the sched_process_exec or
1635 sched_process_exit events is hit and matches 'wget', it enables or
1636 disables both the histogram and the event log, and what you end up
1637 with is a hash table and set of events just covering the specified
1638 duration. Run the wget command again:
1639
1640 $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz
1641
1642 Displaying the 'hist' file should show something similar to what you
1643 saw in the last run, but this time you should also see the
1644 individual events in the trace file:
1645
1646 # cat /sys/kernel/debug/tracing/trace
1647
1648 # tracer: nop
1649 #
1650 # entries-in-buffer/entries-written: 183/1426 #P:4
1651 #
1652 # _-----=> irqs-off
1653 # / _----=> need-resched
1654 # | / _---=> hardirq/softirq
1655 # || / _--=> preempt-depth
1656 # ||| / delay
1657 # TASK-PID CPU# |||| TIMESTAMP FUNCTION
1658 # | | | |||| | |
1659 wget-15108 [000] ..s1 31769.606929: netif_receive_skb: dev=lo skbaddr=ffff88009c353100 len=60
1660 wget-15108 [000] ..s1 31769.606999: netif_receive_skb: dev=lo skbaddr=ffff88009c353200 len=60
1661 dnsmasq-1382 [000] ..s1 31769.677652: netif_receive_skb: dev=lo skbaddr=ffff88009c352b00 len=130
1662 dnsmasq-1382 [000] ..s1 31769.685917: netif_receive_skb: dev=lo skbaddr=ffff88009c352200 len=138
1663 ##### CPU 2 buffer started ####
1664 irq/29-iwlwifi-559 [002] ..s. 31772.031529: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433d00 len=2948
1665 irq/29-iwlwifi-559 [002] ..s. 31772.031572: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432200 len=1500
1666 irq/29-iwlwifi-559 [002] ..s. 31772.032196: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433100 len=2948
1667 irq/29-iwlwifi-559 [002] ..s. 31772.032761: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433000 len=2948
1668 irq/29-iwlwifi-559 [002] ..s. 31772.033220: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432e00 len=1500
1669 .
1670 .
1671 .
52a7f16d
TZ
1672
1673 The following example demonstrates how multiple hist triggers can be
1674 attached to a given event. This capability can be useful for
1675 creating a set of different summaries derived from the same set of
1676 events, or for comparing the effects of different filters, among
1677 other things.
1678
1679 # echo 'hist:keys=skbaddr.hex:vals=len if len < 0' >> \
1680 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1681 # echo 'hist:keys=skbaddr.hex:vals=len if len > 4096' >> \
1682 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1683 # echo 'hist:keys=skbaddr.hex:vals=len if len == 256' >> \
1684 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1685 # echo 'hist:keys=skbaddr.hex:vals=len' >> \
1686 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1687 # echo 'hist:keys=len:vals=common_preempt_count' >> \
1688 /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
1689
1690 The above set of commands create four triggers differing only in
1691 their filters, along with a completely different though fairly
1692 nonsensical trigger. Note that in order to append multiple hist
1693 triggers to the same file, you should use the '>>' operator to
1694 append them ('>' will also add the new hist trigger, but will remove
1695 any existing hist triggers beforehand).
1696
1697 Displaying the contents of the 'hist' file for the event shows the
1698 contents of all five histograms:
1699
1700 # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
1701
1702 # event histogram
1703 #
1704 # trigger info: hist:keys=len:vals=hitcount,common_preempt_count:sort=hitcount:size=2048 [active]
1705 #
1706
1707 { len: 176 } hitcount: 1 common_preempt_count: 0
1708 { len: 223 } hitcount: 1 common_preempt_count: 0
1709 { len: 4854 } hitcount: 1 common_preempt_count: 0
1710 { len: 395 } hitcount: 1 common_preempt_count: 0
1711 { len: 177 } hitcount: 1 common_preempt_count: 0
1712 { len: 446 } hitcount: 1 common_preempt_count: 0
1713 { len: 1601 } hitcount: 1 common_preempt_count: 0
1714 .
1715 .
1716 .
1717 { len: 1280 } hitcount: 66 common_preempt_count: 0
1718 { len: 116 } hitcount: 81 common_preempt_count: 40
1719 { len: 708 } hitcount: 112 common_preempt_count: 0
1720 { len: 46 } hitcount: 221 common_preempt_count: 0
1721 { len: 1264 } hitcount: 458 common_preempt_count: 0
1722
1723 Totals:
1724 Hits: 1428
1725 Entries: 147
1726 Dropped: 0
1727
1728
1729 # event histogram
1730 #
1731 # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 [active]
1732 #
1733
1734 { skbaddr: ffff8800baee5e00 } hitcount: 1 len: 130
1735 { skbaddr: ffff88005f3d5600 } hitcount: 1 len: 1280
1736 { skbaddr: ffff88005f3d4900 } hitcount: 1 len: 1280
1737 { skbaddr: ffff88009fed6300 } hitcount: 1 len: 115
1738 { skbaddr: ffff88009fe0ad00 } hitcount: 1 len: 115
1739 { skbaddr: ffff88008cdb1900 } hitcount: 1 len: 46
1740 { skbaddr: ffff880064b5ef00 } hitcount: 1 len: 118
1741 { skbaddr: ffff880044e3c700 } hitcount: 1 len: 60
1742 { skbaddr: ffff880100065900 } hitcount: 1 len: 46
1743 { skbaddr: ffff8800d46bd500 } hitcount: 1 len: 116
1744 { skbaddr: ffff88005f3d5f00 } hitcount: 1 len: 1280
1745 { skbaddr: ffff880100064700 } hitcount: 1 len: 365
1746 { skbaddr: ffff8800badb6f00 } hitcount: 1 len: 60
1747 .
1748 .
1749 .
1750 { skbaddr: ffff88009fe0be00 } hitcount: 27 len: 24677
1751 { skbaddr: ffff88009fe0a400 } hitcount: 27 len: 23052
1752 { skbaddr: ffff88009fe0b700 } hitcount: 31 len: 25589
1753 { skbaddr: ffff88009fe0b600 } hitcount: 32 len: 27326
1754 { skbaddr: ffff88006a462800 } hitcount: 68 len: 71678
1755 { skbaddr: ffff88006a463700 } hitcount: 70 len: 72678
1756 { skbaddr: ffff88006a462b00 } hitcount: 71 len: 77589
1757 { skbaddr: ffff88006a463600 } hitcount: 73 len: 71307
1758 { skbaddr: ffff88006a462200 } hitcount: 81 len: 81032
1759
1760 Totals:
1761 Hits: 1451
1762 Entries: 318
1763 Dropped: 0
1764
1765
1766 # event histogram
1767 #
1768 # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len == 256 [active]
1769 #
1770
1771
1772 Totals:
1773 Hits: 0
1774 Entries: 0
1775 Dropped: 0
1776
1777
1778 # event histogram
1779 #
1780 # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len > 4096 [active]
1781 #
1782
1783 { skbaddr: ffff88009fd2c300 } hitcount: 1 len: 7212
1784 { skbaddr: ffff8800d2bcce00 } hitcount: 1 len: 7212
1785 { skbaddr: ffff8800d2bcd700 } hitcount: 1 len: 7212
1786 { skbaddr: ffff8800d2bcda00 } hitcount: 1 len: 21492
1787 { skbaddr: ffff8800ae2e2d00 } hitcount: 1 len: 7212
1788 { skbaddr: ffff8800d2bcdb00 } hitcount: 1 len: 7212
1789 { skbaddr: ffff88006a4df500 } hitcount: 1 len: 4854
1790 { skbaddr: ffff88008ce47b00 } hitcount: 1 len: 18636
1791 { skbaddr: ffff8800ae2e2200 } hitcount: 1 len: 12924
1792 { skbaddr: ffff88005f3e1000 } hitcount: 1 len: 4356
1793 { skbaddr: ffff8800d2bcdc00 } hitcount: 2 len: 24420
1794 { skbaddr: ffff8800d2bcc200 } hitcount: 2 len: 12996
1795
1796 Totals:
1797 Hits: 14
1798 Entries: 12
1799 Dropped: 0
1800
1801
1802 # event histogram
1803 #
1804 # trigger info: hist:keys=skbaddr.hex:vals=hitcount,len:sort=hitcount:size=2048 if len < 0 [active]
1805 #
1806
1807
1808 Totals:
1809 Hits: 0
1810 Entries: 0
1811 Dropped: 0