]> git.proxmox.com Git - mirror_ubuntu-bionic-kernel.git/log
mirror_ubuntu-bionic-kernel.git
8 years agoperf bpf: Rename bpf config to program config
Wang Nan [Fri, 27 Nov 2015 08:47:37 +0000 (08:47 +0000)]
perf bpf: Rename bpf config to program config

Following patches are going to introduce BPF object level configuration
to enable setting values into BPF maps. To avoid confusion, this patch
renames existing 'config' in bpf-loader.c to 'program config'. Following
patches would introduce 'object config'.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1448614067-197576-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agotools lib bpf: Extract and collect map names from BPF object file
Wang Nan [Fri, 27 Nov 2015 08:47:36 +0000 (08:47 +0000)]
tools lib bpf: Extract and collect map names from BPF object file

This patch collects name of maps in BPF object files and saves them into
'maps' field in 'struct bpf_object'. 'bpf_object__get_map_by_name' is
introduced to retrive fd and definitions of a map through its name.

Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1448614067-197576-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agotools lib bpf: Collect map definition in bpf_object
Wang Nan [Fri, 27 Nov 2015 08:47:35 +0000 (08:47 +0000)]
tools lib bpf: Collect map definition in bpf_object

This patch collects more information from maps sections in BPF object
files into 'struct bpf_object', enables later patches access those
information (such as the type and size of the map).

In this patch, a new handler 'struct bpf_map' is extracted in parallel
with bpf_object and bpf_program. Its iterator and accessor is also
created.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1448614067-197576-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf hists browser: Update nr entries regardless of min percent
Namhyung Kim [Fri, 27 Nov 2015 17:32:39 +0000 (02:32 +0900)]
perf hists browser: Update nr entries regardless of min percent

When perf report on TUI was called with -S symbol filter, it should
update nr entries even if min_pcnt is 0.  IIRC the reason was to update
nr entries after applying minimum percent threshold.  But if symbol
filter was given on command line (with -S option), it should use
hists->nr_non_filtered_entries instead of hists->nr_entries.

So this patch fixes a bug of navigating hists browser that the cursor
goes beyond the number of entries when -S (or similar) option is used.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1448645559-31167-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf hists: Do not skip elided fields when processing samples
Namhyung Kim [Fri, 27 Nov 2015 17:32:38 +0000 (02:32 +0900)]
perf hists: Do not skip elided fields when processing samples

If user gives a filter, perf marks the corresponding column elided and
omits the output.  But it should process and aggregates samples using
the field, otherwise samples will be aggregated as if the column was not
there resulted in incorrect output.

For example, I'd like to set a filter on native_write_msr_safe.  The
original overhead of the function is negligible.

  $ perf report | grep native_write_msr_safe
      0.00%  swapper  [kernel.vmlinux]  native_write_msr_safe
      0.00%  perf     [kernel.vmlinux]  native_write_msr_safe

However adding -S option gives different output.

  $ perf report -S native_write_msr_safe --percentage absolute | \
  > grep -e swapper -e perf
     51.47%  swapper  [kernel.vmlinux]
      4.14%  perf     [kernel.vmlinux]

Since it aggregated samples using comm and dso only.  In fact, the above
values are same when it sorts with -s comm,dso.

  $ perf report -s comm,dso | grep -e swapper -e perf
     51.47%  swapper  [kernel.vmlinux]
      4.14%  perf     [kernel.vmlinux]

This resulted in TUI failure with -ERANGE since it tries to increase
sample hit count for annotation with wrong symbols due to incorrect
aggregation.

This patch fixes it not to skip elided fields when comparing samples in
order to insert them to the hists.

Commiter note:

After the patch, with a different workloads:

  # perf report --show-total-period -S native_write_msr_safe --stdio
  #
  # symbol: native_write_msr_safe
  #
  # Samples: 455  of event 'cycles:pp'
  # Event count (approx.): 134787489
  #
  # Overhead Period Command         Shared Object
  # ........ ...... ............... ................
  #
       0.22% 293081 qemu-system-x86 [vmlinux]
       0.19% 255914 swapper         [vmlinux]
       0.00%   2054 Timer           [vmlinux]
       0.00%   1021 firefox         [vmlinux]
       0.00%      2 perf            [vmlinux]

  # perf report --show-total-period | grep native_write_msr_safe
  Failed to open /tmp/perf-14838.map, continuing without symbols
       0.22% 293081 qemu-system-x86 [vmlinux]  [k] native_write_msr_safe
       0.19% 255914 swapper         [vmlinux]  [k] native_write_msr_safe
       0.00%   2054 Timer           [vmlinux]  [k] native_write_msr_safe
       0.00%   1021 firefox         [vmlinux]  [k] native_write_msr_safe
       0.00%      2 perf            [vmlinux]  [k] native_write_msr_safe
  #

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1448645559-31167-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf report: Show error message when processing sample fails
Namhyung Kim [Fri, 27 Nov 2015 17:32:37 +0000 (02:32 +0900)]
perf report: Show error message when processing sample fails

Currently when perf fails to process samples for some reason, it doesn't
show any message about the failure.  This is very inconvenient for users
especially on TUI as screen is reset after the failure.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1448645559-31167-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf list: Robustify event printing routine
Arnaldo Carvalho de Melo [Fri, 27 Nov 2015 19:04:58 +0000 (16:04 -0300)]
perf list: Robustify event printing routine

When a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") added
PERF_COUNT_SW_BPF_OUTPUT we ended up with a new entry in the event_symbols_sw
array that wasn't initialized, thus set to NULL, fix print_symbol_events()
to check for that case so that we don't crash if this happens again.

  (gdb) bt
  #0  __match_glob (ignore_space=false, pat=<optimized out>, str=<optimized out>) at util/string.c:198
  #1  strglobmatch (str=<optimized out>, pat=pat@entry=0x7fffffffe61d "stall") at util/string.c:252
  #2  0x00000000004993a5 in print_symbol_events (type=1, syms=0x872880 <event_symbols_sw+160>, max=11, name_only=false, event_glob=0x7fffffffe61d "stall")
      at util/parse-events.c:1615
  #3  print_events (event_glob=event_glob@entry=0x7fffffffe61d "stall", name_only=false) at util/parse-events.c:1675
  #4  0x000000000042c79e in cmd_list (argc=1, argv=0x7fffffffe390, prefix=<optimized out>) at builtin-list.c:68
  #5  0x00000000004788a5 in run_builtin (p=p@entry=0x871758 <commands+120>, argc=argc@entry=2, argv=argv@entry=0x7fffffffe390) at perf.c:370
  #6  0x0000000000420ab0 in handle_internal_command (argv=0x7fffffffe390, argc=2) at perf.c:429
  #7  run_argv (argv=0x7fffffffe110, argcp=0x7fffffffe11c) at perf.c:473
  #8  main (argc=2, argv=0x7fffffffe390) at perf.c:588
  (gdb) p event_symbols_sw[PERF_COUNT_SW_BPF_OUTPUT]
  $4 = {symbol = 0x0, alias = 0x0}
  (gdb)

A patch to robustify perf to not segfault when the next counter gets added in
the kernel will follow this one.

Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-57wysblcjfrseb0zg5u7ek10@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf list: Add support for PERF_COUNT_SW_BPF_OUT
Arnaldo Carvalho de Melo [Fri, 27 Nov 2015 18:54:33 +0000 (15:54 -0300)]
perf list: Add support for PERF_COUNT_SW_BPF_OUT

When PERF_COUNT_SW_BPF_OUTPUT was added to the kernel we should've
added it to tools/perf, where it is used just to list events.

This ended up causing a segfault in commands like "perf list stall".

Fix it by adding that new software counter.

A patch to robustify perf to not segfault when the next counter gets
added in the kernel will follow this one.

Reported-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-uya354upi3eprsey6mi5962d@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf test: 'unwind' test should create kernel maps
Jiri Olsa [Fri, 27 Nov 2015 08:21:21 +0000 (09:21 +0100)]
perf test: 'unwind' test should create kernel maps

The 'perf test unwind' is failing because it forgot to create the kernel
maps, fix it.

After the patch:

  # perf test unwind
  40: Test dwarf unwind         : Ok

Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20151127082121.GA24503@krava.brq.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf buildid-list: Show running kernel build id fix
Michael Petlan [Fri, 27 Nov 2015 13:48:09 +0000 (14:48 +0100)]
perf buildid-list: Show running kernel build id fix

The --kernel option of perf buildid-list tool should show the running
kernel buildid.  The functionality has been lost during other changes of
the related code.

The build_id__sprintf() function should return length of the build-id
string,  but it was the length of the build-id raw data instead. Due to
that, some return value checking caused that the final string was not
printed out.

With this patch the build_id__sprintf() returns the correct value, so
the --kernel option works again.

Before:

# perf buildid-list --kernel
#

After:

# perf buildid-list --kernel
972c1edab5bdc06cc224af45d510af662a3c6972
#

Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
LPU-Reference: 1448632089.24573.114.camel@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoMerge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git...
Ingo Molnar [Fri, 27 Nov 2015 07:28:44 +0000 (08:28 +0100)]
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

  - Add 'vmlinux.debug' to the vmlinux search path (Ekaterina Tumanova)

  - Do not show sample_(type|period) in the perf_event_attr dump when using
    -v with 'perf stat' (Jiri Olsa)

  - Display the WEIGHT sample bit, when set, in 'perf evlist -v' (Jiri Olsa)

  - Honour --hide-unresolved in 'report', will honour it as well in 'top'
    when --hide-unresolved gets supported in that tool (Namhyung Kim)

  - Fix freeze wit h--call-graph 'flat/folded' due to not properly
    reinitializing the callchain rb_tree (Namhyumg Kim)

  - Set dso->long_name when a module name is passed as a parameter
    to tools like 'perf probe' but the 'struct dso' associated to that module
    still doesn't have the full path for the module, just the '[name]' one
    obtained from /proc/modules (Wang Nan)

  - Fix anon_hugepage mmaps detection using scanf on /proc/PID/smaps (Yannick Brosseau)

Infrastructure changes:

  - Add helper function for updating bpf maps elements (He Kuang)

  - Fix traceevents plugins build race (Jiri Olsa)

  - Add the $OUTPUT path prefix with 'fixdep' (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agobpf tools: Add helper function for updating bpf maps elements
He Kuang [Tue, 24 Nov 2015 13:36:08 +0000 (13:36 +0000)]
bpf tools: Add helper function for updating bpf maps elements

Add bpf_map_update_elem() helper function which calls the sys_bpf
syscall to update elements in bpf maps. Upcoming patches will use it to
adjust data in map through the perf command line.

Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1448372181-151723-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf evlist: Display WEIGHT sample type bit
Jiri Olsa [Wed, 25 Nov 2015 15:36:55 +0000 (16:36 +0100)]
perf evlist: Display WEIGHT sample type bit

Adding WIEGHT bit_name call to display sample_type properly.

  $ perf evlist -v
  cpu/mem-loads/pp: ...SNIP... sample_type: IP|TID|TIME|ADDR|ID|CPU|DATA_SRC|WEIGHT ...

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1448465815-27404-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf stat: Clear sample_(type|period) for counting
Jiri Olsa [Wed, 25 Nov 2015 15:36:54 +0000 (16:36 +0100)]
perf stat: Clear sample_(type|period) for counting

Clear sample_(type|period) for counting, as it only confuses debug
output with unwanted sampling details:

Before:

  $ sudo perf stat -e 'raw_syscalls:sys_enter' -vv ls
  ------------------------------------------------------------
  perf_event_attr:
    type                             2
    size                             112
    config                           0x11
    { sample_period, sample_freq }   1
    sample_type                      TIME|CPU|PERIOD|RAW
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ...

After:
  $ sudo perf stat -e 'raw_syscalls:sys_enter' -vv ls
  ------------------------------------------------------------
  perf_event_attr:
    type                             2
    size                             112
    config                           0x11
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ...

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1448465815-27404-1-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf symbols: Add the path to vmlinux.debug
Ekaterina Tumanova [Wed, 25 Nov 2015 16:32:46 +0000 (17:32 +0100)]
perf symbols: Add the path to vmlinux.debug

Currently when debuginfo is separated to vmlinux.debug, it's contents
get ignored. Let's change that and add it to the vmlinux_path list.

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Acked-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1448469166-61363-3-git-send-email-tumanova@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf symbols: Refactor vmlinux_path__init() to ease path additions
Ekaterina Tumanova [Wed, 25 Nov 2015 16:32:45 +0000 (17:32 +0100)]
perf symbols: Refactor vmlinux_path__init() to ease path additions

Refactor vmlinux_path__init() to ease subsequent additions of new
vmlinux locations.

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Acked-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1448469166-61363-2-git-send-email-tumanova@linux.vnet.ibm.com
[ Rename vmlinux_path__update() to vmlinux_path__add() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agotools build: Use fixdep with OUTPUT path prefix
Jiri Olsa [Thu, 26 Nov 2015 18:50:55 +0000 (19:50 +0100)]
tools build: Use fixdep with OUTPUT path prefix

Adding OUTPUT path prefix for fixdep target so we use it properly in out
of tree builds.

If the fixdep already existed in the tree, the out of tree build would
see it already exist and did not build the out of tree version, as
reported by Arnaldo:

  [acme@zoo linux]$ make O=/tmp/build/perf -C tools/perf
  make: Entering directory '/home/git/linux/tools/perf'
    BUILD:   Doing 'make -j4' parallel build
  make[2]: Nothing to be done for 'fixdep'.
  make: Leaving directory '/home/git/linux/tools/perf'

Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20151126185055.GC19410@krava.brq.redhat.com
[ Fixed conflict with 5725dd8fa888 ("tools build: Clean CFLAGS and LDFLAGS for fixdep") ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf script: Pass perf_script into process_event
Jiri Olsa [Thu, 26 Nov 2015 17:55:21 +0000 (18:55 +0100)]
perf script: Pass perf_script into process_event

Passing perf_script struct into process_event function, so we could
process configuration data for event printing.

It will be used in following patch to get event name string width.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20151126175521.GA18979@krava.brq.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Correctly identify anon_hugepage when generating map (v2)
Yannick Brosseau [Thu, 26 Nov 2015 11:42:32 +0000 (03:42 -0800)]
perf tools: Correctly identify anon_hugepage when generating map (v2)

When parsing /proc/xxx/maps, the sscanf in perf_event__synthesize_mmap_events
truncate the map name at the space in "/anon_hugepage (deleted)".

is_anon_memory() then only receives the string "/anon_hugepage" and does
not detect it.  We change is_anon_memory() to only compare the first
part of the string, effectively ignoring if " (deleted)" is there.

Signed-off-by: Yannick Brosseau <scientist@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Joshua Zhu <zhu.wen-jie@hp.com>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/1448538152-2898-1-git-send-email-scientist@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf machine: Adjust dso->long_name for offline module
Wang Nan [Thu, 26 Nov 2015 03:59:57 +0000 (03:59 +0000)]
perf machine: Adjust dso->long_name for offline module

Something unexpected may happen if copy statically linked perf to a
production environment:

  # ./perf probe -m ./mymodule.ko my_func
  [mymodule] with build id 326ab42550ef3d24944f53c817533728367effeb not found, continuing without symbols
  Failed to find symbol my_func in /home/wangnan/kmodule/mymodule.ko
    Error: Failed to add events.
  # ./perf buildid-cache -a ./mymodule.ko
  # ./perf probe -m ./mymodule.ko my_func
  Added new event:
    probe:my_func        (on my_func in /home/wangnan/kmodule/mymodule.ko)

  You can now use it in all perf tools, such as:

   perf record -e probe:my_func -aR sleep 1

Where:

  # ldd ./perf
  not a dynamic executable
  # strace -e open ./perf probe -m ./mymodule.ko my_func
  ...
  open("/home/wangnan/kmodule/mymodule.ko", O_RDONLY) = 3
  open("/home/wangnan/kmodule/../lib64/elfutils/libebl_x86_64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
  ...
  open("/lib64/tls/libebl_x86_64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
  open("/lib64/libebl_x86_64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
  open("/usr/lib64/tls/libebl_x86_64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
  open("/usr/lib64/libebl_x86_64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
  open("[mymodule]", O_RDONLY)            = -1 ENOENT (No such file or directory)
  open("/home/wangnan/.debug/.build-id/32/6ab42550ef3d24944f53c817533728367effeb", O_RDONLY) = -1 ENOENT (No such file or directory)
  open("[mymodule]", O_RDONLY)            = -1 ENOENT (No such file or directory)

In the above example, probe fails before we put the module into
buildid-cache. However, user would expect it success in both case
because perf is able to find probe points actually.

The reason is because perf won't utilize module's full path if it failed
to open debuginfo. In:

     convert_to_probe_trace_events ->
        find_probe_trace_events_from_map ->
            get_target_map ->
                kernel_get_module_map ->
                    machine__findnew_module_map ->
                        map_groups__find_by_name

map_groups__find_by_name() is able to find the map of that module, but
this information is found from /proc/module before it knows the real
path of the offline module. Therefore, the map->dso->long_name is set to
something like '[mymodule]', which prevent dso__load() find the real
path of the module file.

In another aspect, if dso__load() can get the offline module through
buildid cache, it can read symble table from that ko. Even if debuginfo
is not available, 'perf probe' can success if the '.symtab' can be
found.

This patch improves machine__findnew_module_map(): when dso->long_name
is leading with '[' (doesn't find path of module when parsing
/proc/modules), fixes it by dso__set_long_name(), so following
dso__load() is possible to find the symbol table.

This patch won't interfere with buildid matching. Here is the test
result:

  # ./perf probe -m ./mymodule.ko my_func
  Added new event:
    probe:my_func        (on my_func in /home/wangnan/kmodule/mymodule.ko)

  You can now use it in all perf tools, such as:

   perf record -e probe:my_func -aR sleep 1

  # ./perf probe -d '*'
  Removed event: probe:my_func
  # mv ./mymodule.{ko,.bak}
  # mv ./moduleb.ko mymodule.ko
  # ./perf probe -m ./mymodule.ko my_func
  /home/wangnan/kmodule/mymodule.ko with build id 326ab42550ef3d24944f53c817533728367effeb not found, continuing without symbols
  Failed to find symbol my_func in /home/wangnan/kmodule/mymodule.ko
    Error: Failed to add events.

  # ./perf probe -v -m ./mymodule.ko my_func
  probe-definition(0): my_func
  symbol:my_func file:(null) line:0 offset:0 return:0 lazy:(null)
  0 arguments
  Could not open debuginfo. Try to use symbols.
  symsrc__init: build id mismatch for /home/wangnan/kmodule/mymodule.ko.
  /home/wangnan/kmodule/mymodule.ko with build id 326ab42550ef3d24944f53c817533728367effeb not found, continuing without symbols
  Failed to find symbol my_func in /home/wangnan/kmodule/mymodule.ko
    Error: Failed to add events. Reason: No such file or directory (Code: -2)

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1448510397-187965-1-git-send-email-wangnan0@huawei.com
[ Renamed adjust_dso_long_name() do dso__adjust_kmod_long_name() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf build: Fix traceevent plugins build race
Jiri Olsa [Thu, 26 Nov 2015 13:54:04 +0000 (14:54 +0100)]
perf build: Fix traceevent plugins build race

Ingo reported following build failure:

  $ make clean install
  ...
    CC       plugin_kmem.o
  fixdep: error opening depfile: ./.plugin_hrtimer.o.d: No such file or directory
  /home/mingo/tip/tools/build/Makefile.build:77: recipe for target
  'plugin_hrtimer.o' failed
  make[3]: *** [plugin_hrtimer.o] Error 2
  Makefile:189: recipe for target 'plugin_hrtimer-in.o' failed
  make[2]: *** [plugin_hrtimer-in.o] Error 2
  Makefile.perf:414: recipe for target 'libtraceevent_plugins' failed
  make[1]: *** [libtraceevent_plugins] Error 2
  make[1]: *** Waiting for unfinished jobs....

Currently we have the install-traceevent-plugins target being dependent
on $(LIBTRACEEVENT), which will actualy not build any plugin. So the
install-traceevent-plugins target itself will try to build plugins,
but..

Plugins built is also triggered by perf build itself via
libtraceevent_plugins target.

This might cause a race having one make thread removing temp files from
another and result in above error. Fixing this by having proper plugins
build dependency before installing plugins.

Reported-and-Tested-by:: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1448546044-28973-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf script: Remove default_scripting_ops
Jiri Olsa [Thu, 26 Nov 2015 13:55:23 +0000 (14:55 +0100)]
perf script: Remove default_scripting_ops

The default script handler (the one that displays samples on screen) is
implemented scripting_ops instance with process_event callback.

This way we can't pass any script config into display function, because
we don't want perl or python handlers to be depended on perf script
internals.

Removing the default_scripting_ops and calling process event function
directly. This way it's possible to pass perf_script struct and process
configuration data in following commit.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1448546125-29245-1-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf top: Fix freeze on --call-graph flat/folded
Namhyung Kim [Thu, 26 Nov 2015 07:08:18 +0000 (16:08 +0900)]
perf top: Fix freeze on --call-graph flat/folded

The callchain rbtree is rebuilt periodically, so it needs to
reinitialize the root everytime.  Otherwise it can be stuck in the
rbtree insertion with stale pointers.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1448521700-32062-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf callchain: Honor hide_unresolved
Namhyung Kim [Thu, 26 Nov 2015 07:08:20 +0000 (16:08 +0900)]
perf callchain: Honor hide_unresolved

If user requested to hide unresolved entries, skip unresolved callchains
as well as hist entries.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1448521700-32062-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoMerge tag 'perf-core-for-mingo-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Ingo Molnar [Thu, 26 Nov 2015 08:13:50 +0000 (09:13 +0100)]
Merge tag 'perf-core-for-mingo-2' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

  - Fix to free temporal Dwarf_Frame correctly in 'perf probe', fixing a
    regression introduced in perf/core that prevented, at least, adding
    an uprobe collecting function parameter values (Masami Hiramatsu)

  - Fix output of %llu for 64 bit values read on 32 bit machines in
    libtraceevent (Steven Rostedt)

Developer visible:

  - Clean CFLAGS and LDFLAGS for fixdep in tools/build (Wang Nan)

  - Don't do a feature check when cleaning tools/lib/bpf (Wang Nan)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agotools lib bpf: Don't do a feature check when cleaning
Wang Nan [Tue, 24 Nov 2015 13:36:07 +0000 (13:36 +0000)]
tools lib bpf: Don't do a feature check when cleaning

Before this patch libbpf always do feature check even when cleaning.

For example:

  $ cd kernel/tools/lib/bpf
  $ make

  Auto-detecting system features:
  ...                        libelf: [ on  ]
  ...                           bpf: [ on  ]

    CC       libbpf.o
    CC       bpf.o
    LD       libbpf-in.o
    LINK     libbpf.a
    LINK     libbpf.so
  $ make clean
    CLEAN    libbpf
    CLEAN    core-gen
  $ make clean

  Auto-detecting system features:
  ...                        libelf: [ on  ]
  ...                           bpf: [ on  ]

    CLEAN    libbpf
    CLEAN    core-gen
  $

Although the first 'make clean' doesn't show feature check result, it
still does the check. No output because check result is similar to
FEATURE-DUMP.libbpf.

This patch uses same method as perf to turn off feature checking when
'make clean'.

Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1448372181-151723-3-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agotools build: Clean CFLAGS and LDFLAGS for fixdep
Wang Nan [Tue, 24 Nov 2015 13:36:06 +0000 (13:36 +0000)]
tools build: Clean CFLAGS and LDFLAGS for fixdep

Sometimes passing variables to tools/build is dangerous. For example, on
my platform there is a gcc problem (gcc 4.8.1):

It passes the stackprotector-all feature check:

  $ gcc -fstack-protector-all -c ./test.c
  $ echo $?
  0

But requires LDFLAGS support if separate compiling and linking:
  $ gcc -fstack-protector-all -c ./test.c
  $ gcc ./test.o
  ./test.o: In function `main':
  test.c:(.text+0xb): undefined reference to `__stack_chk_guard'
  test.c:(.text+0x21): undefined reference to `__stack_chk_guard'
  collect2: error: ld returned 1 exit status
  $ gcc -fstack-protector-all ./test.o
  $ echo $?
  0
  $ gcc ./test.o -lssp
  $ echo $?
  0
  $

In this environment building perf throws an error:

  $ make
    BUILD:   Doing 'make -j24' parallel build
  config/Makefile:344: No libunwind found. Please install libunwind-dev[el] >= 1.1 and/or set LIBUNWIND_DIR
  config/Makefile:403: No libaudit.h found, disables 'trace' tool, please install audit-libs-devel or libaudit-dev
  config/Makefile:418: slang not found, disables TUI support. Please install slang-devel or libslang-dev
  config/Makefile:432: GTK2 not found, disables GTK2 support. Please install gtk2-devel or libgtk2.0-dev
  config/Makefile:564: No bfd.h/libbfd found, please install binutils-dev[el]/zlib-static/libiberty-dev to gain symbol demangling
  config/Makefile:606: No numa.h found, disables 'perf bench numa mem' benchmark, please install numactl-devel/libnuma-devel/libnuma-dev
    CC       fixdep.o
    LD       fixdep-in.o
    LINK     fixdep
  fixdep-in.o: In function `parse_dep_file':
  /kernel/tools/build/fixdep.c:47: undefined reference to `__stack_chk_guard'
  /kernel/tools/build/fixdep.c:117: undefined reference to `__stack_chk_guard'
  fixdep-in.o: In function `main':
  /kernel-hydrogen/tools/build/fixdep.c:156: undefined reference to `__stack_chk_guard'
  /kernel/tools/build/fixdep.c:168: undefined reference to `__stack_chk_guard'
  collect2: error: ld returned 1 exit status
  make[2]: *** [fixdep] Error 1
  make[1]: *** [fixdep] Error 2
  make: *** [all] Error 2

This is because the CFLAGS used in building perf pollutes the CFLAGS
used for fixdep, passing -fstack-protector-all to buiold fixdep which is
obviously not required. Since fixdep is a small host side tool, we
should keep its CFLAGS/LDFLAGS simple and clean.

This patch clears the CFLAGS and LDFLAGS passed when building fixdep, so
such gcc problem won't block the perf build process.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1448372181-151723-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf probe: Fix to free temporal Dwarf_Frame correctly
Masami Hiramatsu [Wed, 25 Nov 2015 10:34:32 +0000 (19:34 +0900)]
perf probe: Fix to free temporal Dwarf_Frame correctly

The commit 05c8d802fa52 ("perf probe: Fix to free temporal Dwarf_Frame")
tried to fix the memory leak of Dwarf_Frame, but it released the frame
at wrong point. Since the dwarf_frame_cfa(frame, &pf->fb_ops, &nops) can
return an address inside the frame data structure to pf->fb_ops, we can
not release the frame before using pf->fb_ops.

This reverts the commit and releases the frame afterwards (right before
returning from call_probe_finder) correctly.

Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Reported-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 05c8d802fa52 ("perf probe: Fix to free temporal Dwarf_Frame")
LPU-Reference: 20151125103432.1473.31009.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoMerge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git...
Ingo Molnar [Tue, 24 Nov 2015 08:07:28 +0000 (09:07 +0100)]
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

  - Allow callchain order (caller, callee) to the libdw and libunwind based DWARF
    unwinders (Jiri Olsa)

  - Add missing parent_val initialization in the callchain code, fixing a
    SEGFAULT when using callchains with 'perf top' (Jiri Olsa)

  - Add initial 'perf config' command, for now just with a --list command to the
    contents of the configuration file in use and a basic man page describing
    its format, commands for doing edits and detailed documentation are being
    reviewed and proof-read. (Taeung Song)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agotools lib traceevent: Fix output of %llu for 64 bit values read on 32 bit machines
Steven Rostedt [Mon, 16 Nov 2015 22:25:16 +0000 (17:25 -0500)]
tools lib traceevent: Fix output of %llu for 64 bit values read on 32 bit machines

When a long value is read on 32 bit machines for 64 bit output, the
parsing needs to change "%lu" into "%llu", as the value is read
natively.

Unfortunately, if "%llu" is already there, the code will add another "l"
to it and fail to parse it properly.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20151116172516.4b79b109@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf callchain: Add missing parent_val initialization
Jiri Olsa [Sat, 21 Nov 2015 10:23:55 +0000 (11:23 +0100)]
perf callchain: Add missing parent_val initialization

Adding missing parent_val callchain_node initialization.
It's causing segfault in perf top:

  $ sudo perf top -g
  perf: Segmentation fault
  -------- backtrace --------
  free_callchain_node(+0x29) in perf [0x4a4b3e]
  free_callchain(+0x29) in perf [0x4a5a83]
  hist_entry__delete(+0x126) in perf [0x4c6649]
  hists__delete_entry(+0x6e) in perf [0x4c66dc]
  hists__decay_entries(+0x7d) in perf [0x4c6776]
  perf_top__sort_new_samples(+0x7c) in perf [0x436a78]
  hist_browser__run(+0xf2) in perf [0x507760]
  perf_evsel__hists_browse(+0x1da) in perf [0x507c8d]
  perf_evlist__tui_browse_hists(+0x3e) in perf [0x5088cf]
  display_thread_tui(+0x7f) in perf [0x437953]
  start_thread(+0xc5) in libpthread-2.21.so [0x7f7068fbb555]
  __clone(+0x6d) in libc-2.21.so [0x7f7066fc3b9d]
  [0x0]

Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 4b3a3212233a ("perf hists browser: Support flat callchains")
Link: http://lkml.kernel.org/r/20151121102355.GA17313@krava.local
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf config: Add initial man page
Taeung Song [Sun, 22 Nov 2015 10:11:56 +0000 (19:11 +0900)]
perf config: Add initial man page

Add perf-config document to describe the perf configuration and a
'list’ subcommand.

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/63AD9B57-7B8C-46F8-8F18-0FFEB9A6A1BC@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf tools: Add 'perf config' command
Taeung Song [Tue, 17 Nov 2015 13:53:21 +0000 (22:53 +0900)]
perf tools: Add 'perf config' command

The perf configuration file contains many variables to change various
aspects of each of its tools, including output, disk usage, etc.

But looking at the state of configuration is difficult and there's no
documentation about config variables except for the variables in
perfconfig.example exist.

So this patch adds a 'perf-config' command with a '--list' option.

    perf config [options]

    display current perf config variables.
    # perf config -l | --list

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1447768424-17327-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf callchain: Add order support for libdw DWARF unwinder
Jiri Olsa [Thu, 19 Nov 2015 13:01:19 +0000 (14:01 +0100)]
perf callchain: Add order support for libdw DWARF unwinder

As reported by Milian, currently for DWARF unwind (both libdw and
libunwind) we display callchain in callee order only.

Adding the support to follow callchain order setup to libdw DWARF
unwinder, so we could get following output for report:

  $ perf record --call-graph dwarf ls
  ...

  $ perf report --no-children --stdio

    21.12%  ls       libc-2.21.so      [.] __strcoll_l
                 |
                 ---__strcoll_l
                    mpsort_with_tmp
                    mpsort_with_tmp
                    mpsort_with_tmp
                    sort_files
                    main
                    __libc_start_main
                    _start

  $ perf report --stdio --no-children -g caller

    21.12%  ls       libc-2.21.so      [.] __strcoll_l
                 |
                 ---_start
                    __libc_start_main
                    main
                    sort_files
                    mpsort_with_tmp
                    mpsort_with_tmp
                    mpsort_with_tmp
                    __strcoll_l

Reported-and-Tested-by: Milian Wolff <milian.wolff@kdab.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jan Kratochvil <jkratoch@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20151119130119.GA26617@krava.brq.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf test: Add callchain order setup for DWARF unwinder test
Jiri Olsa [Tue, 17 Nov 2015 15:05:39 +0000 (16:05 +0100)]
perf test: Add callchain order setup for DWARF unwinder test

Adding callchain order setup for DWARF unwinder test. The test now runs
unwinder for both callee and caller orders.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1447772739-18471-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf callchain: Add order support for libunwind DWARF unwinder
Jiri Olsa [Wed, 18 Nov 2015 07:52:47 +0000 (08:52 +0100)]
perf callchain: Add order support for libunwind DWARF unwinder

As reported by Milian, currently for DWARF unwind (both libdw and
libunwind) we display callchain in callee order only.

Adding the support to follow callchain order setup to libunwind DWARF
unwinder, so we could get following output for report:

  $ perf record --call-graph dwarf ls
  ...
  $ perf report --no-children --stdio

    39.26%  ls       libc-2.21.so      [.] __strcoll_l
                 |
                 ---__strcoll_l
                    mpsort_with_tmp
                    mpsort_with_tmp
                    sort_files
                    main
                    __libc_start_main
                    _start
                    0

  $ perf report -g caller --no-children --stdio
    ...
    39.26%  ls       libc-2.21.so      [.] __strcoll_l
                 |
                 ---0
                    _start
                    __libc_start_main
                    main
                    sort_files
                    mpsort_with_tmp
                    mpsort_with_tmp
                    __strcoll_l

Based-on-patch-by: Milian Wolff <milian.wolff@kdab.com>
Reported-and-Tested-by: Milian Wolff <milian.wolff@kdab.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Wang Nan <wangnan0@huawei.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20151118075247.GA5416@krava.brq.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf callchain: Move initial entry call into get_entries function
Jiri Olsa [Tue, 17 Nov 2015 15:05:37 +0000 (16:05 +0100)]
perf callchain: Move initial entry call into get_entries function

Moving initial entry call into get_entries function so all entries
processing is on one place. It will be useful for next change that adds
ordering logic.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1447772739-18471-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
8 years agoperf/x86: Handle multiple umask bits for BDW CYCLE_ACTIVITY.*
Andi Kleen [Tue, 17 Nov 2015 00:21:07 +0000 (16:21 -0800)]
perf/x86: Handle multiple umask bits for BDW CYCLE_ACTIVITY.*

The earlier constraint fix for Broadwell CYCLE_ACTIVITY.*
forced umask 8 to counter 2. For this it used UEVENT,
to match the complete umask.

The event list for Broadwell has an additional
STALLS_L1D_PENDIND event that uses umask 8, but also
sets other bits in the umask.  The earlier strict umask match
didn't handle this case.

Add a new UBIT_EVENT constraint macro that only matches
the specified bits in the umask. Then use that macro
to handle CYCLE_ACTIVITY.* on Broadwell.

The documented event also uses cmask, but there's no
need to let the event scheduler know about the cmask,
as the scheduling restriction is only tied to the umask.

Reported-by: Grant Ayers <ayers@cs.stanford.edu>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1447719667-9998-1-git-send-email-andi@firstfloor.org
[ Filled in the missing email address of Grant Ayers - hopefully I got the right one. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoperf, x86: Stop Intel PT before kdump starts
Takao Indoh [Wed, 4 Nov 2015 05:22:33 +0000 (14:22 +0900)]
perf, x86: Stop Intel PT before kdump starts

This patch stops Intel PT logging and saves its registers in memory
before kdump is started. This feature is needed to prevent Intel PT from
overwriting its log buffer after panic, and saved registers are needed to
find the last position where Intel PT wrote data.

After the crash dump is captured by kdump, users can retrieve the log buffer
from the vmcore and use it to investigate bad kernel behavior.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin<alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H.Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1446614553-6072-3-git-send-email-indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoperf/x86/intel/pt: Add interface to stop Intel PT logging
Takao Indoh [Wed, 4 Nov 2015 05:22:32 +0000 (14:22 +0900)]
perf/x86/intel/pt: Add interface to stop Intel PT logging

This patch add a function for external components to stop Intel PT.
Basically this function is used when kernel panic occurs. When it is
called, the intel_pt driver disables Intel PT and saves its registers
using pt_event_stop(), which is also used by pmu.stop handler.

This function stops Intel PT on the CPU where it is working, therefore
users of it need to call it for each CPU to stop all logging.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin<alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H.Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1446614553-6072-2-git-send-email-indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoperf/x86: Add option to disable reading branch flags/cycles
Andi Kleen [Tue, 20 Oct 2015 18:46:34 +0000 (11:46 -0700)]
perf/x86: Add option to disable reading branch flags/cycles

With LBRv5 reading the extra LBR flags like mispredict, TSX, cycles is
not free anymore, as it has moved to a separate MSR.

For callstack mode we don't need any of this information; so we can
avoid the unnecessary MSR read. Add flags to the perf interface where
perf record can request not collecting this information.

Add branch_sample_type flags for CYCLES and FLAGS. It's a bit unusual
for branch_sample_types to be negative (disable), not positive (enable),
but since the legacy ABI reported the flags we need some form of
explicit disabling to avoid breaking the ABI.

After we have the flags the x86 perf code can keep track if any users
need the flags. If noone needs it the information is not collected.

This cuts down the cost of LBR callstack on Skylake significantly.
Profiling a kernel build with LBR call stack the average run time of
the PMI handler drops by 43%.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1445366797-30894-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoperf/x86: Optimize stack walk user accesses
Andi Kleen [Thu, 22 Oct 2015 22:07:21 +0000 (15:07 -0700)]
perf/x86: Optimize stack walk user accesses

Change the perf user stack walking to use the new
__copy_from_user_nmi(), and split each access into word sized transfer
sizes. This allows to inline the complete access and optimize it all
into a single load.

The main advantage is that this avoids the overhead of double page
faults.  When normal copy_from_user() fails it reexecutes the copy to
compute an accurate number of non copied bytes. This leads to
executing the expensive page fault twice.

While walking stacks having a fault at some point is relatively common
(typically when some part of the program isn't compiled with frame
pointers), so this is a large overhead.

With the optimized copies we avoid this problem because they only do
all accesses once. And of course they're much faster too when the
access does not fault because they're just single instructions instead
of complex function calls.

While profiling a kernel build with -g, the patch brings down the
average time of the PMI handler from 966ns to 552ns (-43%).

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1445551641-13379-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agox86: Add an inlined __copy_from_user_nmi() variant
Andi Kleen [Thu, 22 Oct 2015 22:07:20 +0000 (15:07 -0700)]
x86: Add an inlined __copy_from_user_nmi() variant

Add a inlined __ variant of copy_from_user_nmi. The inlined variant allows
the user to:

 - batch the access_ok() check for multiple accesses

 - avoid having a pagefault_disable/enable() on every access if the
   caller already ensures disabled page faults due to its context.

 - get all the optimizations in copy_*_user() for small constant sized
   transfers

It is just a define to __copy_from_user_inatomic().

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445551641-13379-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoMerge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git...
Ingo Molnar [Mon, 23 Nov 2015 08:13:48 +0000 (09:13 +0100)]
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

  - Allows BPF scriptlets specify arguments to be fetched using
    DWARF info, using a prologue generated at compile/build time (He Kuang, Wang Nan)

  - Allow attaching BPF scriptlets to module symbols (Wang Nan)

  - Allow attaching BPF scriptlets to userspace code using uprobe (Wang Nan)

  - BPF programs now can specify 'perf probe' tunables via its section name,
    separating key=val values using semicolons (Wang Nan)

Testing some of these new BPF features:

  Use case: get callchains when receiving SSL packets, filter then in the
            kernel, at arbitrary place.

  # cat ssl.bpf.c
  #define SEC(NAME) __attribute__((section(NAME), used))

  struct pt_regs;

  SEC("func=__inet_lookup_established hnum")
  int func(struct pt_regs *ctx, int err, unsigned short port)
  {
          return err == 0 && port == 443;
  }

  char _license[] SEC("license") = "GPL";
  int  _version   SEC("version") = LINUX_VERSION_CODE;
  #
  # perf record -a -g -e ssl.bpf.c
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ]
  # perf script | head -30
  swapper     0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
 8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
 896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
 8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
 855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
 8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux)
 856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux)
 2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux)
 2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux)
 96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux)
 969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux)
 2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux)
 95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux)
1163ffa start_kernel ([kernel.vmlinux].init.text)
11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text)
1163623 x86_64_start_kernel ([kernel.vmlinux].init.text)

  qemu-system-x86  9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
 8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
 896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
 8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
 855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
 856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux)
 8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux)
   430a br_handle_frame_finish ([bridge])
   48bc br_handle_frame ([bridge])
 855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
 8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
  #

    Use 'perf probe' various options to list functions, see what variables can
    be collected at any given point, experiment first collecting without a filter,
    then filter, use it together with 'perf trace', 'perf top', with or without
    callchains, if it explodes, please tell us!

  - Introduce a new callchain mode: "folded", that will list per line
    representations of all callchains for a give histogram entry, facilitating
    'perf report' output processing by other tools, such as Brendan Gregg's
    flamegraph tools (Namhyung Kim)

  E.g:

 # perf report | grep -v ^# | head
    18.37%     0.00%  swapper  [kernel.kallsyms]   [k] cpu_startup_entry
                    |
                    ---cpu_startup_entry
                       |
                       |--12.07%--start_secondary
                       |
                        --6.30%--rest_init
                                  start_kernel
                                  x86_64_start_reservations
                                  x86_64_start_kernel
  #

 Becomes, in "folded" mode:

 # perf report -g folded | grep -v ^# | head -5
     18.37%     0.00%  swapper [kernel.kallsyms]   [k] cpu_startup_entry
   12.07% cpu_startup_entry;start_secondary
    6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
     16.90%     0.00%  swapper [kernel.kallsyms]   [k] call_cpuidle
   11.23% call_cpuidle;cpu_startup_entry;start_secondary
    5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
     16.90%     0.00%  swapper [kernel.kallsyms]   [k] cpuidle_enter
   11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
    5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
     15.12%     0.00%  swapper [kernel.kallsyms]   [k] cpuidle_enter_state
  #

   The user can also select one of "count", "period" or "percent" as the first column.

Infrastructure changes:

  - Fix multiple leaks found with Valgrind and a refcount
    debugger (Masami Hiramatsu)

  - Add further 'perf test' entries for BPF and LLVM (Wang Nan)

  - Improve 'perf test' to suport subtests, so that the series of tests
    performed in the LLVM and BPF main tests appear in the default 'perf test'
    output (Wang Nan)

  - Move memdup() from tools/perf to tools/lib/string.c (Arnaldo Carvalho de Melo)

  - Adopt strtobool() from the kernel into tools/lib/ (Wang Nan)

  - Fix selftests_install tools/ Makefile rule (Kevin Hilman)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agotreewide: Remove old email address
Peter Zijlstra [Mon, 16 Nov 2015 10:08:45 +0000 (11:08 +0100)]
treewide: Remove old email address

There were still a number of references to my old Red Hat email
address in the kernel source. Remove these while keeping the
Red Hat copyright notices intact.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoperf/x86: Fix LBR call stack save/restore
Andi Kleen [Tue, 20 Oct 2015 18:46:33 +0000 (11:46 -0700)]
perf/x86: Fix LBR call stack save/restore

This fixes a bug I added in the following commit:

  90405aa02247 ("perf/x86/intel/lbr: Limit LBR accesses to TOS in callstack mode")

The bug could lead to lost LBR call stacks. When restoring the LBR state
we need to use the TOS of the previous context, not the current context.
To do that we need to save/restore the TOS.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1445366797-30894-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoperf: Update email address in MAINTAINERS
Peter Zijlstra [Mon, 16 Nov 2015 10:07:44 +0000 (11:07 +0100)]
perf: Update email address in MAINTAINERS

While still valid, I'm trying to phase out this email address.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoperf/core: Robustify the perf_cgroup_from_task() RCU checks
Stephane Eranian [Thu, 12 Nov 2015 10:00:04 +0000 (11:00 +0100)]
perf/core: Robustify the perf_cgroup_from_task() RCU checks

This patch reinforces the lockdep checks performed by
perf_cgroup_from_tsk() by passing the perf_event_context
whenever possible. It is okay to not hold the RCU read lock
when we know we hold the ctx->lock. This patch makes sure this
property holds.

In some functions, such as perf_cgroup_sched_in(), we do not
pass the context because we are sure we are holding the RCU
read lock.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: edumazet@google.com
Link: http://lkml.kernel.org/r/1447322404-10920-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoperf/core: Fix RCU problem with cgroup context switching code
Stephane Eranian [Thu, 12 Nov 2015 10:00:03 +0000 (11:00 +0100)]
perf/core: Fix RCU problem with cgroup context switching code

The RCU checker detected RCU violation in the cgroup switching routines
perf_cgroup_sched_in() and perf_cgroup_sched_out(). We were dereferencing
cgroup from task without holding the RCU lock.

Fix this by holding the RCU read lock. We move the locking from
perf_cgroup_switch() to avoid double locking.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: edumazet@google.com
Link: http://lkml.kernel.org/r/1447322404-10920-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
8 years agoLinux 4.4-rc2
Linus Torvalds [Mon, 23 Nov 2015 00:45:59 +0000 (16:45 -0800)]
Linux 4.4-rc2

8 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sun, 22 Nov 2015 23:21:40 +0000 (15:21 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge slub bulk allocator updates from Andrew Morton:
 "This missed the merge window because I was waiting for some repairs to
  come in.  Nothing actually uses the bulk allocator yet and the changes
  to other code paths are pretty small.  And the net guys are waiting
  for this so they can start merging the client code"

More comments from Jesper Dangaard Brouer:
 "The kmem_cache_alloc_bulk() call, in mm/slub.c, were included in
  previous kernel.  The present version contains a bug.  Vladimir
  Davydov noticed it contained a bug, when kernel is compiled with
  CONFIG_MEMCG_KMEM (see commit 03ec0ed57ffc: "slub: fix kmem cgroup
  bug in kmem_cache_alloc_bulk").  Plus the mem cgroup counterpart in
  kmem_cache_free_bulk() were missing (see commit 033745189b1b "slub:
  add missing kmem cgroup support to kmem_cache_free_bulk").

  I don't consider the fix stable-material because there are no in-tree
  users of the API.

  But with known bugs (for memcg) I cannot start using the API in the
  net-tree"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  slab/slub: adjust kmem_cache_alloc_bulk API
  slub: add missing kmem cgroup support to kmem_cache_free_bulk
  slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  slub: optimize bulk slowpath free by detached freelist
  slub: support for bulk free with SLUB freelists

8 years agoMerge tag 'tty-4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 22 Nov 2015 23:10:57 +0000 (15:10 -0800)]
Merge tag 'tty-4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are a few small tty/serial driver fixes for 4.4-rc2 that resolve
  some reported problems.

  All have been in linux-next, full details are in the shortlog below"

* tag 'tty-4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: export fsl8250_handle_irq
  serial: 8250_mid: Add missing dependency
  tty: audit: Fix audit source
  serial: etraxfs-uart: Fix crash
  serial: fsl_lpuart: Fix earlycon support
  bcm63xx_uart: Use the device name when registering an interrupt
  tty: Fix direct use of tty buffer work
  tty: Fix tty_send_xchar() lock order inversion

8 years agoMerge tag 'staging-4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 22 Nov 2015 21:26:24 +0000 (13:26 -0800)]
Merge tag 'staging-4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/IIO fixes from Greg KH:
 "Here are some staging and iio driver fixes for 4.4-rc2.  All of these
  are in response to issues that have been reported and have been in
  linux-next for a while"

* tag 'staging-4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  Revert "Staging: wilc1000: coreconfigurator: Drop unneeded wrapper functions"
  iio: adc: xilinx: Fix VREFN scale
  iio: si7020: Swap data byte order
  iio: adc: vf610_adc: Fix division by zero error
  iio:ad7793: Fix ad7785 product ID
  iio: ad5064: Fix ad5629/ad5669 shift
  iio:ad5064: Make sure ad5064_i2c_write() returns 0 on success
  iio: lpc32xx_adc: fix warnings caused by enabling unprepared clock
  staging: iio: select IRQ_WORK for IIO_DUMMY_EVGEN
  vf610_adc: Fix internal temperature calculation

8 years agoMerge tag 'usb-4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 22 Nov 2015 21:15:05 +0000 (13:15 -0800)]
Merge tag 'usb-4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are a number of USB fixes and new device ids for 4.4-rc2.  All
  have been in linux-next and the details are in the shortlog"

* tag 'usb-4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
  usblp: do not set TASK_INTERRUPTIBLE before lock
  USB: MAINTAINERS: cxacru
  usb: kconfig: fix warning of select USB_OTG
  USB: option: add XS Stick W100-2 from 4G Systems
  xhci: Fix a race in usb2 LPM resume, blocking U3 for usb2 devices
  usb: xhci: fix checking ep busy for CFC
  xhci: Workaround to get Intel xHCI reset working more reliably
  usb: chipidea: imx: fix a possible NULL dereference
  usb: chipidea: usbmisc_imx: fix a possible NULL dereference
  usb: chipidea: otg: gadget module load and unload support
  usb: chipidea: debug: disable usb irq while role switch
  ARM: dts: imx27.dtsi: change the clock information for usb
  usb: chipidea: imx: refine clock operations to adapt for all platforms
  usb: gadget: atmel_usba_udc: Expose correct device speed
  usb: musb: enable usb_dma parameter
  usb: phy: phy-mxs-usb: fix a possible NULL dereference
  usb: dwc3: gadget: let us set lower max_speed
  usb: musb: fix tx fifo flush handling
  usb: gadget: f_loopback: fix the warning during the enumeration
  usb: dwc2: host: Fix remote wakeup when not in DWC2_L2
  ...

8 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds [Sun, 22 Nov 2015 20:59:46 +0000 (12:59 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:

 - Fix a flood of annoying build warnings

 - A number of fixes for Atheros 79xx platforms

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: ath79: Add a machine entry for booting OF machines
  MIPS: ath79: Fix the size of the MISC INTC registers in ar9132.dtsi
  MIPS: ath79: Fix the DDR control initialization on ar71xx and ar934x
  MIPS: Fix flood of warnings about comparsion being always true.

8 years agoMerge branch 'parisc-4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Sun, 22 Nov 2015 20:50:58 +0000 (12:50 -0800)]
Merge branch 'parisc-4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc update from Helge Deller:
 "This patchset adds Huge Page and HUGETLBFS support for parisc"

Honestly, the hugepage support should have gone through in the merge
window, and is not really an rc-time fix.  But it only touches
arch/parisc, and I cannot find it in myself to care.  If one of the
three parisc users notices a breakage, I will point at Helge and make
rude farting noises.

* 'parisc-4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Map kernel text and data on huge pages
  parisc: Add Huge Page and HUGETLBFS support
  parisc: Use long branch to do_syscall_trace_exit
  parisc: Increase initial kernel mapping to 32MB on 64bit kernel
  parisc: Initialize the fault vector earlier in the boot process.
  parisc: Add defines for Huge page support
  parisc: Drop unused MADV_xxxK_PAGES flags from asm/mman.h
  parisc: Drop definition of start_thread_som for HP-UX SOM binaries
  parisc: Fix wrong comment regarding first pmd entry flags

8 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 22 Nov 2015 20:37:20 +0000 (12:37 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf tool fixes from Thomas Gleixner:
 "A couple of fixes for perf tools:

   - Build system updates

   - Plug a memory leak in an error path of perf probe

   - Tear down probes correctly when adding fails

   - Fixes to the perf symbol handling

   - Fix ordering of event processing in buildid-list

   - Fix per DSO filtering in the histogram browser"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf probe: Clear probe_trace_event when add_probe_trace_event() fails
  perf probe: Fix memory leaking on failure by clearing all probe_trace_events
  perf inject: Also re-pipe lost_samples event
  perf buildid-list: Requires ordered events
  perf symbols: Fix dso lookup by long name and missing buildids
  perf symbols: Allow forcing reading of non-root owned files by root
  perf hists browser: The dso can be obtained from popup_action->ms.map->dso
  perf hists browser: Fix 'd' hotkey action to filter by DSO
  perf symbols: Rebuild rbtree when adjusting symbols for kcore
  tools: Add a "make all" rule
  tools: Actually install tmon in the install rule

8 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 22 Nov 2015 20:00:12 +0000 (12:00 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "This update contains:

   - MPX updates for handling 32bit processes

   - A fix for a long standing bug in 32bit signal frame handling
     related to FPU/XSAVE state

   - Handle get_xsave_addr() correctly in KVM

   - Fix SMAP check under paravirtualization

   - Add a comment to the static function trace entry to avoid further
     confusion about the difference to dynamic tracing"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Fix SMAP check in PVOPS environments
  x86/ftrace: Add comment on static function tracing
  x86/fpu: Fix get_xsave_addr() behavior under virtualization
  x86/fpu: Fix 32-bit signal frame handling
  x86/mpx: Fix 32-bit address space calculation
  x86/mpx: Do proper get_user() when running 32-bit binaries on 64-bit kernels

8 years agoslab/slub: adjust kmem_cache_alloc_bulk API
Jesper Dangaard Brouer [Fri, 20 Nov 2015 23:57:58 +0000 (15:57 -0800)]
slab/slub: adjust kmem_cache_alloc_bulk API

Adjust kmem_cache_alloc_bulk API before we have any real users.

Adjust API to return type 'int' instead of previously type 'bool'.  This
is done to allow future extension of the bulk alloc API.

A future extension could be to allow SLUB to stop at a page boundary, when
specified by a flag, and then return the number of objects.

The advantage of this approach, would make it easier to make bulk alloc
run without local IRQs disabled.  With an approach of cmpxchg "stealing"
the entire c->freelist or page->freelist.  To avoid overshooting we would
stop processing at a slab-page boundary.  Else we always end up returning
some objects at the cost of another cmpxchg.

To keep compatible with future users of this API linking against an older
kernel when using the new flag, we need to return the number of allocated
objects with this API change.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoslub: add missing kmem cgroup support to kmem_cache_free_bulk
Jesper Dangaard Brouer [Fri, 20 Nov 2015 23:57:55 +0000 (15:57 -0800)]
slub: add missing kmem cgroup support to kmem_cache_free_bulk

Initial implementation missed support for kmem cgroup support in
kmem_cache_free_bulk() call, add this.

If CONFIG_MEMCG_KMEM is not enabled, the compiler should be smart enough
to not add any asm code.

Incoming bulk free objects can belong to different kmem cgroups, and
object free call can happen at a later point outside memcg context.  Thus,
we need to keep the orig kmem_cache, to correctly verify if a memcg object
match against its "root_cache" (s->memcg_params.root_cache).

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoslub: fix kmem cgroup bug in kmem_cache_alloc_bulk
Jesper Dangaard Brouer [Fri, 20 Nov 2015 23:57:52 +0000 (15:57 -0800)]
slub: fix kmem cgroup bug in kmem_cache_alloc_bulk

The call slab_pre_alloc_hook() interacts with kmemgc and is not allowed to
be called several times inside the bulk alloc for loop, due to the call to
memcg_kmem_get_cache().

This would result in hitting the VM_BUG_ON in __memcg_kmem_get_cache.

As suggested by Vladimir Davydov, change slab_post_alloc_hook() to be able
to handle an array of objects.

A subtle detail is, loop iterator "i" in slab_post_alloc_hook() must have
same type (size_t) as size argument.  This helps the compiler to easier
realize that it can remove the loop, when all debug statements inside loop
evaluates to nothing.  Note, this is only an issue because the kernel is
compiled with GCC option: -fno-strict-overflow

In slab_alloc_node() the compiler inlines and optimizes the invocation of
slab_post_alloc_hook(s, flags, 1, &object) by removing the loop and access
object directly.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reported-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoslub: optimize bulk slowpath free by detached freelist
Jesper Dangaard Brouer [Fri, 20 Nov 2015 23:57:49 +0000 (15:57 -0800)]
slub: optimize bulk slowpath free by detached freelist

This change focus on improving the speed of object freeing in the
"slowpath" of kmem_cache_free_bulk.

The calls slab_free (fastpath) and __slab_free (slowpath) have been
extended with support for bulk free, which amortize the overhead of
the (locked) cmpxchg_double.

To use the new bulking feature, we build what I call a detached
freelist.  The detached freelist takes advantage of three properties:

 1) the free function call owns the object that is about to be freed,
    thus writing into this memory is synchronization-free.

 2) many freelist's can co-exist side-by-side in the same slab-page
    each with a separate head pointer.

 3) it is the visibility of the head pointer that needs synchronization.

Given these properties, the brilliant part is that the detached
freelist can be constructed without any need for synchronization.  The
freelist is constructed directly in the page objects, without any
synchronization needed.  The detached freelist is allocated on the
stack of the function call kmem_cache_free_bulk.  Thus, the freelist
head pointer is not visible to other CPUs.

All objects in a SLUB freelist must belong to the same slab-page.
Thus, constructing the detached freelist is about matching objects
that belong to the same slab-page.  The bulk free array is scanned is
a progressive manor with a limited look-ahead facility.

Kmem debug support is handled in call of slab_free().

Notice kmem_cache_free_bulk no longer need to disable IRQs. This
only slowed down single free bulk with approx 3 cycles.

Performance data:
 Benchmarked[1] obj size 256 bytes on CPU i7-4790K @ 4.00GHz

SLUB fastpath single object quick reuse: 47 cycles(tsc) 11.931 ns

To get stable and comparable numbers, the kernel have been booted with
"slab_merge" (this also improve performance for larger bulk sizes).

Performance data, compared against fallback bulking:

bulk -  fallback bulk            - improvement with this patch
   1 -  62 cycles(tsc) 15.662 ns - 49 cycles(tsc) 12.407 ns- improved 21.0%
   2 -  55 cycles(tsc) 13.935 ns - 30 cycles(tsc) 7.506 ns - improved 45.5%
   3 -  53 cycles(tsc) 13.341 ns - 23 cycles(tsc) 5.865 ns - improved 56.6%
   4 -  52 cycles(tsc) 13.081 ns - 20 cycles(tsc) 5.048 ns - improved 61.5%
   8 -  50 cycles(tsc) 12.627 ns - 18 cycles(tsc) 4.659 ns - improved 64.0%
  16 -  49 cycles(tsc) 12.412 ns - 17 cycles(tsc) 4.495 ns - improved 65.3%
  30 -  49 cycles(tsc) 12.484 ns - 18 cycles(tsc) 4.533 ns - improved 63.3%
  32 -  50 cycles(tsc) 12.627 ns - 18 cycles(tsc) 4.707 ns - improved 64.0%
  34 -  96 cycles(tsc) 24.243 ns - 23 cycles(tsc) 5.976 ns - improved 76.0%
  48 -  83 cycles(tsc) 20.818 ns - 21 cycles(tsc) 5.329 ns - improved 74.7%
  64 -  74 cycles(tsc) 18.700 ns - 20 cycles(tsc) 5.127 ns - improved 73.0%
 128 -  90 cycles(tsc) 22.734 ns - 27 cycles(tsc) 6.833 ns - improved 70.0%
 158 -  99 cycles(tsc) 24.776 ns - 30 cycles(tsc) 7.583 ns - improved 69.7%
 250 - 104 cycles(tsc) 26.089 ns - 37 cycles(tsc) 9.280 ns - improved 64.4%

Performance data, compared current in-kernel bulking:

bulk - curr in-kernel  - improvement with this patch
   1 -  46 cycles(tsc) - 49 cycles(tsc) - improved (cycles:-3) -6.5%
   2 -  27 cycles(tsc) - 30 cycles(tsc) - improved (cycles:-3) -11.1%
   3 -  21 cycles(tsc) - 23 cycles(tsc) - improved (cycles:-2) -9.5%
   4 -  18 cycles(tsc) - 20 cycles(tsc) - improved (cycles:-2) -11.1%
   8 -  17 cycles(tsc) - 18 cycles(tsc) - improved (cycles:-1) -5.9%
  16 -  18 cycles(tsc) - 17 cycles(tsc) - improved (cycles: 1)  5.6%
  30 -  18 cycles(tsc) - 18 cycles(tsc) - improved (cycles: 0)  0.0%
  32 -  18 cycles(tsc) - 18 cycles(tsc) - improved (cycles: 0)  0.0%
  34 -  78 cycles(tsc) - 23 cycles(tsc) - improved (cycles:55) 70.5%
  48 -  60 cycles(tsc) - 21 cycles(tsc) - improved (cycles:39) 65.0%
  64 -  49 cycles(tsc) - 20 cycles(tsc) - improved (cycles:29) 59.2%
 128 -  69 cycles(tsc) - 27 cycles(tsc) - improved (cycles:42) 60.9%
 158 -  79 cycles(tsc) - 30 cycles(tsc) - improved (cycles:49) 62.0%
 250 -  86 cycles(tsc) - 37 cycles(tsc) - improved (cycles:49) 57.0%

Performance with normal SLUB merging is significantly slower for
larger bulking.  This is believed to (primarily) be an effect of not
having to share the per-CPU data-structures, as tuning per-CPU size
can achieve similar performance.

bulk - slab_nomerge   -  normal SLUB merge
   1 -  49 cycles(tsc) - 49 cycles(tsc) - merge slower with cycles:0
   2 -  30 cycles(tsc) - 30 cycles(tsc) - merge slower with cycles:0
   3 -  23 cycles(tsc) - 23 cycles(tsc) - merge slower with cycles:0
   4 -  20 cycles(tsc) - 20 cycles(tsc) - merge slower with cycles:0
   8 -  18 cycles(tsc) - 18 cycles(tsc) - merge slower with cycles:0
  16 -  17 cycles(tsc) - 17 cycles(tsc) - merge slower with cycles:0
  30 -  18 cycles(tsc) - 23 cycles(tsc) - merge slower with cycles:5
  32 -  18 cycles(tsc) - 22 cycles(tsc) - merge slower with cycles:4
  34 -  23 cycles(tsc) - 22 cycles(tsc) - merge slower with cycles:-1
  48 -  21 cycles(tsc) - 22 cycles(tsc) - merge slower with cycles:1
  64 -  20 cycles(tsc) - 48 cycles(tsc) - merge slower with cycles:28
 128 -  27 cycles(tsc) - 57 cycles(tsc) - merge slower with cycles:30
 158 -  30 cycles(tsc) - 59 cycles(tsc) - merge slower with cycles:29
 250 -  37 cycles(tsc) - 56 cycles(tsc) - merge slower with cycles:19

Joint work with Alexander Duyck.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c

[akpm@linux-foundation.org: BUG_ON -> WARN_ON;return]
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoslub: support for bulk free with SLUB freelists
Jesper Dangaard Brouer [Fri, 20 Nov 2015 23:57:46 +0000 (15:57 -0800)]
slub: support for bulk free with SLUB freelists

Make it possible to free a freelist with several objects by adjusting API
of slab_free() and __slab_free() to have head, tail and an objects counter
(cnt).

Tail being NULL indicate single object free of head object.  This allow
compiler inline constant propagation in slab_free() and
slab_free_freelist_hook() to avoid adding any overhead in case of single
object free.

This allows a freelist with several objects (all within the same
slab-page) to be free'ed using a single locked cmpxchg_double in
__slab_free() and with an unlocked cmpxchg_double in slab_free().

Object debugging on the free path is also extended to handle these
freelists.  When CONFIG_SLUB_DEBUG is enabled it will also detect if
objects don't belong to the same slab-page.

These changes are needed for the next patch to bulk free the detached
freelists it introduces and constructs.

Micro benchmarking showed no performance reduction due to this change,
when debugging is turned off (compiled with CONFIG_SLUB_DEBUG).

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoparisc: Map kernel text and data on huge pages
Helge Deller [Sat, 21 Nov 2015 23:07:44 +0000 (00:07 +0100)]
parisc: Map kernel text and data on huge pages

Adjust the linker script and map_pages() to map kernel text and data on
physical 1MB huge/large pages.

Signed-off-by: Helge Deller <deller@gmx.de>
8 years agoparisc: Add Huge Page and HUGETLBFS support
Helge Deller [Sat, 21 Nov 2015 23:07:06 +0000 (00:07 +0100)]
parisc: Add Huge Page and HUGETLBFS support

This patch adds huge page support to allow userspace to allocate huge
pages and to use hugetlbfs filesystem on 32- and 64-bit Linux kernels.
A later patch will add kernel support to map kernel text and data on
huge pages.

The only requirement is, that the kernel needs to be compiled for a
PA8X00 CPU (PA2.0 architecture). Older PA1.X CPUs do not support
variable page sizes. 64bit Kernels are compiled for PA2.0 by default.

Technically on parisc multiple physical huge pages may be needed to
emulate standard 2MB huge pages.

Signed-off-by: Helge Deller <deller@gmx.de>
8 years agoparisc: Use long branch to do_syscall_trace_exit
Helge Deller [Fri, 20 Nov 2015 10:22:32 +0000 (11:22 +0100)]
parisc: Use long branch to do_syscall_trace_exit

Use the 22bit instead of the 17bit branch instruction on a 64bit kernel
to reach the do_syscall_trace_exit function from the gateway page.
A huge page enabled kernel may need the additional branch distance bits.

Signed-off-by: Helge Deller <deller@gmx.de>
8 years agoparisc: Increase initial kernel mapping to 32MB on 64bit kernel
Helge Deller [Fri, 20 Nov 2015 10:17:27 +0000 (11:17 +0100)]
parisc: Increase initial kernel mapping to 32MB on 64bit kernel

For the 64bit kernel the initially 16 MB kernel memory might become too
small if you build a kernel with many modules built-in and with kernel
text and data areas mapped on huge pages.

This patch increases the initial mapping to 32MB for 64bit kernels and
keeps 16MB for 32bit kernels.

Signed-off-by: Helge Deller <deller@gmx.de>
8 years agoparisc: Initialize the fault vector earlier in the boot process.
Helge Deller [Fri, 20 Nov 2015 09:50:01 +0000 (10:50 +0100)]
parisc: Initialize the fault vector earlier in the boot process.

A fault vector on parisc needs to be 2K aligned.  Furthermore the
checksum of the fault vector needs to sum up to 0 which is being
calculated and written at runtime.

Up to now we aligned both PA20 and PA11 fault vectors on the same 4K
page in order to easily write the checksum after having mapped the
kernel read-only (by mapping this page only as read-write).
But when we want to map the kernel text and data on huge pages this
makes things harder.
So, simplify it by aligning both fault vectors on 2K boundries and write
the checksum before we map the page read-only.

Signed-off-by: Helge Deller <deller@gmx.de>
8 years agoparisc: Add defines for Huge page support
Helge Deller [Fri, 20 Nov 2015 14:46:52 +0000 (15:46 +0100)]
parisc: Add defines for Huge page support

Huge pages on parisc will have the same size as one pmd table, which
is on a 64bit kernel 2MB on a kernel with 4K kernel page sizes, and
on a 32bit kernel 4MB when used with 4K kernel pages.

Since parisc does not physically supports 2MB huge page sizes, emulate
it with two consecutive 1MB page sizes instead. Keeping the same huge
page size as one pmd will allow us to add transparent huge page support
later on.

Bit 21 in the pte flags was unused and will now be used to mark a page
as huge page (_PAGE_HPAGE_BIT).

Signed-off-by: Helge Deller <deller@gmx.de>
8 years agoparisc: Drop unused MADV_xxxK_PAGES flags from asm/mman.h
Helge Deller [Sun, 22 Nov 2015 11:14:14 +0000 (12:14 +0100)]
parisc: Drop unused MADV_xxxK_PAGES flags from asm/mman.h

Drop the MADV_xxK_PAGES flags, which were never used and were from a proposed
API which was never integrated into the generic Linux kernel code.

Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
8 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 21 Nov 2015 18:49:13 +0000 (10:49 -0800)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "A bunch of fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  slub: mark the dangling ifdef #else of CONFIG_SLUB_DEBUG
  slub: avoid irqoff/on in bulk allocation
  slub: create new ___slab_alloc function that can be called with irqs disabled
  mm: fix up sparse warning in gfpflags_allow_blocking
  ocfs2: fix umask ignored issue
  PM/OPP: add entry in MAINTAINERS
  kernel/panic.c: turn off locks debug before releasing console lock
  kernel/signal.c: unexport sigsuspend()
  kasan: fix kmemleak false-positive in kasan_module_alloc()
  fat: fix fake_offset handling on error path
  mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes
  mm/page-writeback.c: initialize m_dirty to avoid compile warning
  various: fix pci_set_dma_mask return value checking
  mm: loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
  mm: vmalloc: don't remove inexistent guard hole in remove_vm_area()
  tools/vm/page-types.c: support KPF_IDLE
  ncpfs: don't allow negative timeouts
  configfs: allow dynamic group creation
  MAINTAINERS: add Moritz as reviewer for FPGA Manager Framework
  slab.h: sprinkle __assume_aligned attributes

8 years agoMerge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 21 Nov 2015 18:26:24 +0000 (10:26 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "Two timer fixlets from Arnd:

   - Use proper constant size in the FSL timer driver
   - Prevent a build error for legacy platforms"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource: Disallow drivers for ARCH_USES_GETTIMEOFFSET
  clocksource/fsl: Avoid harmless 64-bit warnings

8 years agoMerge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 21 Nov 2015 18:19:15 +0000 (10:19 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:
 "Three fixes for the ARM GIC interrupt controller from Marc addressing
  various shortcomings versus boot initialization and suspend/resume"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic: Add save/restore of the active state
  irqchip/gic: Clear enable bits before restoring them
  irqchip/gic: Make sure all interrupts are deactivated at boot

8 years agoMerge tag 'for-linus-20151120' of git://git.infradead.org/linux-mtd
Linus Torvalds [Sat, 21 Nov 2015 17:52:07 +0000 (09:52 -0800)]
Merge tag 'for-linus-20151120' of git://git.infradead.org/linux-mtd

Pull MTD fixes from Brian Norris:

 - MAINTAINERS updates for brcmnand driver

 - Fix reboot hangs seen when multiple NAND flash chips are registered
   with the same controller

 - Fix build issues on jz4740 NAND driver; the error was introduced in
   4.3, so I guess nobody really cared, but we might as well fix it

* tag 'for-linus-20151120' of git://git.infradead.org/linux-mtd:
  MAINTAINERS: brcmnand: Add co-maintainer for Broadcom SoCs
  MAINTAINERS: brcmnand: Add Broadcom internal mailing-list
  mtd: nand: fix shutdown/reboot for multi-chip systems
  mtd: jz4740_nand: fix build on jz4740 after removing gpio.h

8 years agoserial: export fsl8250_handle_irq
Arnd Bergmann [Mon, 16 Nov 2015 15:48:09 +0000 (16:48 +0100)]
serial: export fsl8250_handle_irq

fsl8250_handle_irq is now used by the of_serial driver, and that fails
if it is a loadable module:

ERROR: "fsl8250_handle_irq" [drivers/tty/serial/of_serial.ko] undefined!

This exports the symbol to avoid randconfig errors.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: d43b54d269d2 ("serial: Enable Freescale 16550 workaround on arm")
Cc: Scott Wood <scottwood@freescale.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 years agoserial: 8250_mid: Add missing dependency
Heikki Krogerus [Thu, 12 Nov 2015 13:21:23 +0000 (15:21 +0200)]
serial: 8250_mid: Add missing dependency

8250_mid uses rational_best_approximation() function, so the
driver needs to select CONFIG_RATIONAL option.

This fixes build error when CONFIG_RATIONAL is not enabled:

drivers/built-in.o: In function `mid8250_set_termios':
8250_mid.c:(.text+0x10169a): undefined reference to `rational_best_approximation'

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 years agotty: audit: Fix audit source
Peter Hurley [Sun, 8 Nov 2015 13:52:31 +0000 (08:52 -0500)]
tty: audit: Fix audit source

The data to audit/record is in the 'from' buffer (ie., the input
read buffer).

Fixes: 72586c6061ab ("n_tty: Fix auditing support for cannonical mode")
Cc: stable <stable@vger.kernel.org> # 4.1+
Cc: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Laura Abbott <labbott@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 years agoserial: etraxfs-uart: Fix crash
Guenter Roeck [Mon, 2 Nov 2015 02:32:56 +0000 (18:32 -0800)]
serial: etraxfs-uart: Fix crash

Since commit 7d8c70d8048c ("serial: mctrl-gpio: rename init function"),
crisv32 either do not build or crash as follows.

Unable to handle kernel NULL pointer dereference
Linux 4.3.0-rc7-next-20151101 #1 Sun Nov 1 11:41:28 PST 2015
...
Call Trace: [<c0004a0e>] show_stack+0x0/0x9e
[<c004c0c0>] printk+0x0/0x2c
[<c00059d4>] show_registers+0x14a/0x1c2
[<c004c0c0>] printk+0x0/0x2c
[<c0004b52>] die_if_kernel+0x7c/0x9e
[<c0005346>] do_page_fault+0x32e/0x3e6
[<c01dc59c>] of_get_property+0x0/0x2c
[<c01e0558>] of_irq_parse_raw+0x12a/0x376
[<c01dc59c>] of_get_property+0x0/0x2c
[<c0053aca>] get_page_from_freelist+0x73e/0x856
[<c01dc59c>] of_get_property+0x0/0x2c
[<c0008912>] d_mmu_refill+0x10a/0x112
[<c01b488c>] devm_kmalloc+0x40/0x56
[<c01b47d0>] add_dr+0xc/0x1c
[<c01b4800>] devm_add_action+0x2/0x4e
[<c01abdbc>] mctrl_gpio_init_noauto+0x1c/0x76
[<c01abf9e>] mctrl_gpio_init+0x22/0x110

The function call in the etraxfs-uart driver was not renamed,
possibly due to interference with commit 7b9c5162c182 ("serial:
etraxfs-uart: use mctrl_gpio helpers for handling modem signals").

Fixes: 7d8c70d8048c ("serial: mctrl-gpio: rename init function")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Niklas Cassel <nks@flawful.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 years agoserial: fsl_lpuart: Fix earlycon support
Peter Hurley [Tue, 20 Oct 2015 13:55:21 +0000 (09:55 -0400)]
serial: fsl_lpuart: Fix earlycon support

Earlycon support for Freescale lpuart should only be enabled when
console support is enabled.

Fixes: 1d59b382f1c4 ("serial: fsl_lpuart: add earlycon support")
Acked-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 years agobcm63xx_uart: Use the device name when registering an interrupt
Simon Arlott [Sun, 15 Nov 2015 16:14:32 +0000 (16:14 +0000)]
bcm63xx_uart: Use the device name when registering an interrupt

Use the device name when registering an interrupt so that multiple
ports don't all have the same interrupt name.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 years agotty: Fix direct use of tty buffer work
Peter Hurley [Sun, 8 Nov 2015 12:53:06 +0000 (07:53 -0500)]
tty: Fix direct use of tty buffer work

Recent abstraction of tty buffer work introduced api to manage
tty input kworker; use it.

Fixes: e176058f0de5 ("tty: Abstract tty buffer work")
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 years agotty: Fix tty_send_xchar() lock order inversion
Peter Hurley [Wed, 11 Nov 2015 13:03:54 +0000 (08:03 -0500)]
tty: Fix tty_send_xchar() lock order inversion

The correct lock order is atomic_write_lock => termios_rwsem, as
established by tty_write() => n_tty_write().

Fixes: c274f6ef1c666 ("tty: Hold termios_rwsem for tcflow(TCIxxx)")
Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 years agoslub: mark the dangling ifdef #else of CONFIG_SLUB_DEBUG
Jesper Dangaard Brouer [Fri, 20 Nov 2015 23:57:41 +0000 (15:57 -0800)]
slub: mark the dangling ifdef #else of CONFIG_SLUB_DEBUG

The #ifdef of CONFIG_SLUB_DEBUG is located very far from the associated
#else.  For readability mark it with a comment.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoslub: avoid irqoff/on in bulk allocation
Christoph Lameter [Fri, 20 Nov 2015 23:57:38 +0000 (15:57 -0800)]
slub: avoid irqoff/on in bulk allocation

Use the new function that can do allocation while interrupts are disabled.
Avoids irq on/off sequences.

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoslub: create new ___slab_alloc function that can be called with irqs disabled
Christoph Lameter [Fri, 20 Nov 2015 23:57:35 +0000 (15:57 -0800)]
slub: create new ___slab_alloc function that can be called with irqs disabled

Bulk alloc needs a function like that because it enables interrupts before
calling __slab_alloc which promptly disables them again using the expensive
local_irq_save().

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: fix up sparse warning in gfpflags_allow_blocking
Jeff Layton [Fri, 20 Nov 2015 23:57:32 +0000 (15:57 -0800)]
mm: fix up sparse warning in gfpflags_allow_blocking

sparse says:

    include/linux/gfp.h:274:26: warning: incorrect type in return expression (different base types)
    include/linux/gfp.h:274:26:    expected bool
    include/linux/gfp.h:274:26:    got restricted gfp_t

...add a forced cast to silence the warning.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoocfs2: fix umask ignored issue
Junxiao Bi [Fri, 20 Nov 2015 23:57:30 +0000 (15:57 -0800)]
ocfs2: fix umask ignored issue

New created file's mode is not masked with umask, and this makes umask not
work for ocfs2 volume.

Fixes: 702e5bc ("ocfs2: use generic posix ACL infrastructure")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoPM/OPP: add entry in MAINTAINERS
Viresh Kumar [Fri, 20 Nov 2015 23:57:27 +0000 (15:57 -0800)]
PM/OPP: add entry in MAINTAINERS

Add entry for operating performance points into MAINTAINERS file.  This
will also allow get_maintainers to list OPP stakeholders properly.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Rafael Wysocki <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agokernel/panic.c: turn off locks debug before releasing console lock
Vitaly Kuznetsov [Fri, 20 Nov 2015 23:57:24 +0000 (15:57 -0800)]
kernel/panic.c: turn off locks debug before releasing console lock

Commit 08d78658f393 ("panic: release stale console lock to always get the
logbuf printed out") introduced an unwanted bad unlock balance report when
panic() is called directly and not from OOPS (e.g.  from out_of_memory()).
The difference is that in case of OOPS we disable locks debug in
oops_enter() and on direct panic call nobody does that.

Fixes: 08d78658f393 ("panic: release stale console lock to always get the logbuf printed out")
Reported-by: kernel test robot <ying.huang@linux.intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Baoquan He <bhe@redhat.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agokernel/signal.c: unexport sigsuspend()
Richard Weinberger [Fri, 20 Nov 2015 23:57:21 +0000 (15:57 -0800)]
kernel/signal.c: unexport sigsuspend()

sigsuspend() is nowhere used except in signal.c itself, so we can mark it
static do not pollute the global namespace.

But this patch is more than a boring cleanup patch, it fixes a real issue
on UserModeLinux.  UML has a special console driver to display ttys using
xterm, or other terminal emulators, on the host side.  Vegard reported
that sometimes UML is unable to spawn a xterm and he's facing the
following warning:

  WARNING: CPU: 0 PID: 908 at include/linux/thread_info.h:128 sigsuspend+0xab/0xc0()

It turned out that this warning makes absolutely no sense as the UML
xterm code calls sigsuspend() on the host side, at least it tries.  But
as the kernel itself offers a sigsuspend() symbol the linker choose this
one instead of the glibc wrapper.  Interestingly this code used to work
since ever but always blocked signals on the wrong side.  Some recent
kernel change made the WARN_ON() trigger and uncovered the bug.

It is a wonderful example of how much works by chance on computers. :-)

Fixes: 68f3f16d9ad0f1 ("new helper: sigsuspend()")
Signed-off-by: Richard Weinberger <richard@nod.at>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org> [3.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agokasan: fix kmemleak false-positive in kasan_module_alloc()
Andrey Ryabinin [Fri, 20 Nov 2015 23:57:18 +0000 (15:57 -0800)]
kasan: fix kmemleak false-positive in kasan_module_alloc()

Kmemleak reports the following leak:

unreferenced object 0xfffffbfff41ea000 (size 20480):
comm "modprobe", pid 65199, jiffies 4298875551 (age 542.568s)
hex dump (first 32 bytes):
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
backtrace:
  [<ffffffff82354f5e>] kmemleak_alloc+0x4e/0xc0
  [<ffffffff8152e718>] __vmalloc_node_range+0x4b8/0x740
  [<ffffffff81574072>] kasan_module_alloc+0x72/0xc0
  [<ffffffff810efe68>] module_alloc+0x78/0xb0
  [<ffffffff812f6a24>] module_alloc_update_bounds+0x14/0x70
  [<ffffffff812f8184>] layout_and_allocate+0x16f4/0x3c90
  [<ffffffff812faa1f>] load_module+0x2ff/0x6690
  [<ffffffff813010b6>] SyS_finit_module+0x136/0x170
  [<ffffffff8239bbc9>] system_call_fastpath+0x16/0x1b
  [<ffffffffffffffff>] 0xffffffffffffffff

kasan_module_alloc() allocates shadow memory for module and frees it on
module unloading.  It doesn't store the pointer to allocated shadow memory
because it could be calculated from the shadowed address, i.e.
kasan_mem_to_shadow(addr).

Since kmemleak cannot find pointer to allocated shadow, it thinks that
memory leaked.

Use kmemleak_ignore() to tell kmemleak that this is not a leak and shadow
memory doesn't contain any pointers.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agofat: fix fake_offset handling on error path
OGAWA Hirofumi [Fri, 20 Nov 2015 23:57:15 +0000 (15:57 -0800)]
fat: fix fake_offset handling on error path

For the root directory, .  and ..  are faked (using dir_emit_dots()) and
ctx->pos is reset from 2 to 0.

A corrupted root directory could cause fat_get_entry() to fail, but
->iterate() (fat_readdir()) reports progress to the VFS (with ctx->pos
rewound to 0), so any following calls to ->iterate() continue to return
the same entries again and again.

The result is that userspace will never see the end of the directory,
causing e.g.  'ls' to hang in a getdents() loop.

[hirofumi@mail.parknet.co.jp: cleanup and make sure to correct fake_offset]
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Richard Weinberger <richard.weinberger@gmail.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes
Mike Kravetz [Fri, 20 Nov 2015 23:57:13 +0000 (15:57 -0800)]
mm/hugetlbfs: fix bugs in fallocate hole punch of areas with holes

Hugh Dickins pointed out problems with the new hugetlbfs fallocate hole
punch code.  These problems are in the routine remove_inode_hugepages and
mostly occur in the case where there are holes in the range of pages to be
removed.  These holes could be the result of a previous hole punch or
simply sparse allocation.  The current code could access pages outside the
specified range.

remove_inode_hugepages handles both hole punch and truncate operations.
Page index handling was fixed/cleaned up so that the loop index always
matches the page being processed.  The code now only makes a single pass
through the range of pages as it was determined page faults could not race
with truncate.  A cond_resched() was added after removing up to
PAGEVEC_SIZE pages.

Some totally unnecessary code in hugetlbfs_fallocate() that remained from
early development was also removed.

Tested with fallocate tests submitted here:
http://librelist.com/browser//libhugetlbfs/2015/6/25/patch-tests-add-tests-for-fallocate-system-call/
And, some ftruncate tests under development

Fixes: b5cec28d36f5 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Hillf Danton" <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org> [4.3]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm/page-writeback.c: initialize m_dirty to avoid compile warning
Yang Shi [Fri, 20 Nov 2015 23:57:10 +0000 (15:57 -0800)]
mm/page-writeback.c: initialize m_dirty to avoid compile warning

When building kernel with gcc 5.2, the below warning is raised:

  mm/page-writeback.c: In function 'balance_dirty_pages.isra.10':
  mm/page-writeback.c:1545:17: warning: 'm_dirty' may be used uninitialized in this function [-Wmaybe-uninitialized]
     unsigned long m_dirty, m_thresh, m_bg_thresh;

The m_dirty{thresh, bg_thresh} are initialized in the block of "if
(mdtc)", so if mdts is null, they won't be initialized before being used.
Initialize m_dirty to zero, also initialize m_thresh and m_bg_thresh to
keep consistency.

They are used later by if condition: !mdtc || m_dirty <=
dirty_freerun_ceiling(m_thresh, m_bg_thresh)

If mdtc is null, dirty_freerun_ceiling will not be called at all, so the
initialization will not change any behavior other than just ceasing the
compile warning.

(akpm: the patch actually reduces .text size by ~20 bytes on gcc-4.x.y)

[akpm@linux-foundation.org: add comment]
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agovarious: fix pci_set_dma_mask return value checking
Christoph Hellwig [Fri, 20 Nov 2015 23:57:07 +0000 (15:57 -0800)]
various: fix pci_set_dma_mask return value checking

pci_set_dma_mask returns a negative errno value, not a bool like
pci_dma_supported.  This of course was just a giant test for attention :)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Jongman Heo <jongman.heo@samsung.com>
Tested-by: Jongman Heo <jongman.heo@samsung.com> [pcnet32]
Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: Antti Palosaari <crope@iki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390
Jason J. Herne [Fri, 20 Nov 2015 23:57:04 +0000 (15:57 -0800)]
mm: loosen MADV_NOHUGEPAGE to enable Qemu postcopy on s390

MADV_NOHUGEPAGE processing is too restrictive.  kvm already disables
hugepage but hugepage_madvise() takes the error path when we ask to turn
on the MADV_NOHUGEPAGE bit and the bit is already on.  This causes Qemu's
new postcopy migration feature to fail on s390 because its first action is
to madvise the guest address space as NOHUGEPAGE.  This patch modifies the
code so that the operation succeeds without error now.

For consistency reasons do the same for MADV_HUGEPAGE.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agomm: vmalloc: don't remove inexistent guard hole in remove_vm_area()
Jerome Marchand [Fri, 20 Nov 2015 23:57:02 +0000 (15:57 -0800)]
mm: vmalloc: don't remove inexistent guard hole in remove_vm_area()

Commit 71394fe50146 ("mm: vmalloc: add flag preventing guard hole
allocation") missed a spot.  Currently remove_vm_area() decreases vm->size
to "remove" the guard hole page, even when it isn't present.  All but one
users just free the vm_struct rigth away and never access vm->size anyway.

Don't touch the size in remove_vm_area() and have __vunmap() use the
proper get_vm_area_size() helper.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agotools/vm/page-types.c: support KPF_IDLE
Naoya Horiguchi [Fri, 20 Nov 2015 23:56:59 +0000 (15:56 -0800)]
tools/vm/page-types.c: support KPF_IDLE

PageIdle is exported in include/uapi/linux/kernel-page-flags.h, so let's
make page-types.c tool handle it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoncpfs: don't allow negative timeouts
Dan Carpenter [Fri, 20 Nov 2015 23:56:56 +0000 (15:56 -0800)]
ncpfs: don't allow negative timeouts

This code causes a static checker warning because it's a user controlled
variable where we cap the upper bound but not the lower bound.  Let's
return an -EINVAL for negative timeouts.

[akpm@linux-foundation.org: remove unneeded `else']
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: David Howells <dhowells@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8 years agoconfigfs: allow dynamic group creation
Daniel Baluta [Fri, 20 Nov 2015 23:56:53 +0000 (15:56 -0800)]
configfs: allow dynamic group creation

This patchset introduces IIO software triggers, offers a way of configuring
them via configfs and adds the IIO hrtimer based interrupt source to be used
with software triggers.

The architecture is now split in 3 parts, to remove all IIO trigger specific
parts from IIO configfs core:

(1) IIO configfs - creates the root of the IIO configfs subsys.
(2) IIO software triggers - software trigger implementation, dynamically
    creating /config/iio/triggers group.
(3) IIO hrtimer trigger - is the first interrupt source for software triggers
    (with syfs to follow). Each trigger type can implement its own set of
    attributes.

Lockdep seems to be happy with the locking in configfs patch.

This patch (of 5):

We don't want to hardcode default groups at subsystem
creation time. We export:
* configfs_register_group
* configfs_unregister_group
to allow drivers to programatically create/destroy groups
later, after module init time.

This is needed for IIO configfs support.

(akpm: the other 4 patches to be merged via the IIO tree)

Signed-off-by: Daniel Baluta <daniel.baluta@intel.com>
Suggested-by: Lars-Peter Clausen <lars@metafoo.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Adriana Reus <adriana.reus@intel.com>
Cc: Cristina Opriceana <cristina.opriceana@gmail.com>
Cc: Peter Meerwald <pmeerw@pmeerw.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>