git.proxmox.com Git - mirror_ubuntu-hirsute-kernel.git/log

perf tools: use XSI-complaint version of strerror_r() instead of GNU-specific

Perf uses GNU-specific version of strerror_r(). The GNU-specific strerror_r()
returns a pointer to a string containing the error message. This may be either
a pointer to a string that the function stores in buf, or a pointer to some
(immutable) static string (in which case buf is unused).

In glibc-2.16 GNU version was marked with attribute warn_unused_result. It
triggers few warnings in perf:

util/target.c: In function ‘perf_target__strerror’:
util/target.c:114:13: error: ignoring return value of ‘strerror_r’, declared with attribute warn_unused_result [-Werror=unused-result]
ui/browsers/hists.c: In function ‘hist_browser__dump’:
ui/browsers/hists.c:981:13: error: ignoring return value of ‘strerror_r’, declared with attribute warn_unused_result [-Werror=unused-result]

They are bugs.

Let's fix strerror_r() usage.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Ulrich Drepper <drepper@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/20120723210654.GA25248@shutemov.name
[ committer note: s/assert/BUG_ON/g ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Make the breakpoint events sample period default to 1

There have one problem about hw_breakpoint perf event, as watched, the
events reported to userspace is not correctly, sometime one trigger
bp_event report several events, sometime bp_event cannot go through to
user.

The root cause is attr->freq is 1 passed to kernel defaultly in bp
events, this make kernel calculate event period not as expect, make
sample period to 1 will change attr->freq to 0, to fix this problem.

This patch is similar with commit f92128 about tracepoint events:
perf: Make the trace events sample period default to 1

Signed-off-by: Jovi Zhang <bookjovi@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/CACV3sbLF8taiCq_VYW-sgRJyupeMzg58C7ZXfMe3xZUiH_Mx6w@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf test: Add dso data caching tests

Adding automated test for DSO data reading. Testing raw/cached reads
from different file/cache locations.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1342959280-5361-18-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf symbols: Add dso data caching

Adding dso data caching so we don't need to open/read/close, each time
we want dso data.

The DSO data caching affects following functions:
  dso__data_read_offset
  dso__data_read_addr

Each DSO read tries to find the data (based on offset) inside the cache.
If it's not present it fills the cache from file, and returns the data.
If it is present, data are returned with no file read.

Each data read is cached by reading cache page sized/aligned amount of
DSO data. The cache page size is hardcoded to 4096.  The cache is using
RB tree with file offset as a sort key.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1342959280-5361-17-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf symbols: Add interface to read DSO image data

Adding following interface for DSO object to allow
reading of DSO image data:

  dso__data_fd
    - opens DSO and returns file descriptor
      Binary types are used to locate/open DSO in following order:
        DSO_BINARY_TYPE__BUILD_ID_CACHE
        DSO_BINARY_TYPE__SYSTEM_PATH_DSO
      In other word we first try to open DSO build-id path,
      and if that fails we try to open DSO system path.

  dso__data_read_offset
    - reads DSO data from specified offset

  dso__data_read_addr
    - reads DSO data from specified address/map.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1342959280-5361-11-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf symbols: Factor DSO symtab types to generic binary types

Adding interface to access DSOs so it could be used
from another place.

New DSO binary type is added - making current SYMTAB__*
types more general:
DSO_BINARY_TYPE__* = SYMTAB__*

Following function is added to return path based on the specified
binary type:
dso__binary_type_file

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1342959280-5361-10-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf hists: Print newline between hists callchains

Tiny cosmetic fix. The lack of a newline between hists callchains was
looking slightly messy.

Before:

     0.24%      swapper  [kernel.kallsyms]  [k] _raw_spin_lock_irq
                |
                --- _raw_spin_lock_irq
                    run_timer_softirq
                    __do_softirq
                    call_softirq
                    do_softirq
                    irq_exit
                    smp_apic_timer_interrupt
                    apic_timer_interrupt
                    default_idle
                    amd_e400_idle
                    cpu_idle
                    start_secondary
     0.10%         perf  [kernel.kallsyms]  [k] lock_is_held
                   |
                   --- lock_is_held
                       __might_sleep
                       mutex_lock_nested
                       perf_event_for_each_child
                       perf_ioctl
                       do_vfs_ioctl
                       sys_ioctl
                       system_call_fastpath
                       ioctl
                       cmd_record
                       run_builtin
                       main
                       __libc_start_main

After:

     0.24%      swapper  [kernel.kallsyms]  [k] _raw_spin_lock_irq
                |
                --- _raw_spin_lock_irq
                    run_timer_softirq
                    __do_softirq
                    call_softirq
                    do_softirq
                    irq_exit
                    smp_apic_timer_interrupt
                    apic_timer_interrupt
                    default_idle
                    amd_e400_idle
                    cpu_idle
                    start_secondary

     0.10%         perf  [kernel.kallsyms]  [k] lock_is_held
                   |
                   --- lock_is_held
                       __might_sleep
                       mutex_lock_nested
                       perf_event_for_each_child
                       perf_ioctl
                       do_vfs_ioctl
                       sys_ioctl
                       system_call_fastpath
                       ioctl
                       cmd_record
                       run_builtin
                       main
                       __libc_start_main

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342631456-7233-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Fix trace events storms due to weight demux

Trace events have a period (weight) of 1 by default. This can be
overriden on events definition by using the __perf_count() macro.

For example, the sched_stat_runtime() is weighted with the runtime of
the task that fired the event.

By default, perf handles such weighted event by dividing it into
individual events carrying a weight of 1. For example if
sched_stat_runtime is fired and the task has run 5000000 nsecs, perf
divides it into 5000000 events in the buffer.

This behaviour makes weighted events unusable because they quickly
fullfill the buffers and we lose most events.

The commit 5d81e5cfb37a174e8ddc0413e2e70cdf05807ace ("events: Don't
divide events if it has field period") solves this problem by sending
only one event when PERF_SAMPLE_PERIOD flag is set. The weight is
carried in the sample itself such that we don't need to demultiplex it
anymore.

This patch provides the last missing piece to use this feature by
setting PERF_SAMPLE_PERIOD from perf tools when we deal with trace
events.

Before:
$ ./perf record -e sched:* -a sleep 1
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 1.619 MB perf.data (~70749 samples) ]
Warning:
Processed 16909 events and lost 1 chunks!

Check IO/CPU overload!

$ ./perf script
perf  1894 [003]   824.898327: sched_migrate_task: comm=perf pid=1898 prio=120 orig_cpu=2 dest_cpu=0
perf  1894 [003]   824.898335: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
perf  1894 [003]   824.898336: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
perf  1894 [003]   824.898337: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
perf  1894 [003]   824.898338: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
perf  1894 [003]   824.898339: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
perf  1894 [003]   824.898340: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
perf  1894 [003]   824.898341: sched_stat_sleep: comm=perf pid=1898 delay=113179500 [ns]
[...]

After:
$ ./perf record -e sched:* -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.074 MB perf.data (~3228 samples) ]

$ ./perf script

perf  1461 [000]   554.286957: sched_migrate_task: comm=perf pid=1465 prio=120 orig_cpu=3 dest_cpu=1
perf  1461 [000]   554.286964: sched_stat_sleep: comm=perf pid=1465 delay=133047190 [ns]
perf  1461 [000]   554.286967: sched_wakeup: comm=perf pid=1465 prio=120 success=1 target_cpu=001
swapper     0 [001]   554.286976: sched_stat_wait: comm=perf pid=1465 delay=0 [ns]
swapper     0 [001]   554.286983: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf
[...]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342631456-7233-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf hists: Return correct number of characters printed in callchain

Include the omitted number of characters printed for the first entry.

Not that it really matters because nobody seem to care about the number
of printed characters for now. But just in case.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342631456-7233-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf tools: Dump exclude_{guest,host}, precise_ip header info too

Adds the attributes to the event line in the header dump.
From:
event : name = cycles, type = 0, config = 0x0, config1 = 0x0,
      config2 = 0x0, excl_usr = 0, excl_kern = 0, ...
to
  event : name = cycles, type = 0, config = 0x0, config1 = 0x0,
      config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0,
      excl_guest = 0, precise_ip = 0, ...

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1342826756-64663-8-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf kvm: Limit repetitive guestmount message to once per directory

After 7ed97ad use of the guestmount option without a subdir for *each*
VM generates an error message for each sample related to that VM. Once
per VM is enough.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1342826756-64663-7-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf kvm: Fix bug resolving guest kernel syms

Guest kernel symbols are not resolved despite passing the information
needed to resolve them. e.g.,

perf kvm --guest --guestmount=/tmp/guest-mount record -a -- sleep 1
perf kvm --guest --guestmount=/tmp/guest-mount report --stdio

    36.55%  [guest/11399]  [unknown]         [g] 0xffffffff81600bc8
    33.19%  [guest/10474]  [unknown]         [g] 0x00000000c0116e00
    30.26%  [guest/11094]  [unknown]         [g] 0xffffffff8100a288
    43.69%  [guest/10474]  [unknown]         [g] 0x00000000c0103d90
    37.38%  [guest/11399]  [unknown]         [g] 0xffffffff81600bc8
    12.24%  [guest/11094]  [unknown]         [g] 0xffffffff810aa91d
     6.69%  [guest/11094]  [unknown]         [u] 0x00007fa784d721c3

which is just pathetic.

After a maddening 2 days sifting through perf minutia I found it --
id_hdr_size is not initialized for guest machines. This shows up on the
report side as random garbage for the cpu and timestamp, e.g.,

29816 7310572949125804849 0x1ac0 [0x50]: PERF_RECORD_MMAP ...

That messes up the sample sorting such that synthesized guest maps are
processed last.

With this patch you get a much more helpful report:

  12.11%  [guest/11399]  [guest.kernel.kallsyms.11399]  [g] irqtime_account_process_tick
  10.58%  [guest/11399]  [guest.kernel.kallsyms.11399]  [g] run_timer_softirq
   6.95%  [guest/11094]  [guest.kernel.kallsyms.11094]  [g] printk_needs_cpu
   6.50%  [guest/11094]  [guest.kernel.kallsyms.11094]  [g] do_timer
   6.45%  [guest/11399]  [guest.kernel.kallsyms.11399]  [g] idle_balance
   4.90%  [guest/11094]  [guest.kernel.kallsyms.11094]  [g] native_read_tsc
    ...

v2:
- changed rbtree walk to use rb_first per Namhyung's suggestion

Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1342826756-64663-5-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf kvm: Guest userspace samples should not be lumped with host uspace

e.g., perf kvm --host  --guest report -i perf.data --stdio -D
shows:

1 599127912065356 0x143b8 [0x48]: PERF_RECORD_SAMPLE(IP, 5): 5671/5676: 0x7fdf95a061c0 period: 1 addr: 0
... chain: nr:2
.....  0: ffffffffffffff80
.....  1: fffffffffffffe00
... thread: qemu-kvm:5671
...... dso: <not found>

(IP, 5) means sample in guest userspace. Those samples should not be lumped
into the VMM's host thread. i.e, the report output:

    56.86%  qemu-kvm  [unknown]         [u] 0x00007fdf95a061c0

With this patch the output emphasizes it is a guest userspace hit:

    56.86%  [guest/5671]  [unknown]         [u] 0x00007fdf95a061c0

Looking at 3 VMs (2 64-bit, 1 32-bit) with each running a CPU bound
process (openssl speed), perf report currently shows:

  93.84%  117726   qemu-kvm  [unknown]   [u] 0x00007fd7dcaea8e5

which is wrong. With this patch you get:

  31.50%   39258   [guest/18772]  [unknown]   [u] 0x00007fd7dcaea8e5
  31.50%   39236   [guest/11230]  [unknown]   [u] 0x0000000000a57340
  30.84%   39232   [guest/18395]  [unknown]   [u] 0x00007f66f641e107

Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1342826756-64663-4-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf kvm: Set name for VM process in guest machine

COMM events are not generated in the context of a guest machine, so the
thread name is never set for the VMM process. For example, the qemu-kvm
name applies to the process in the host machine, not the guest machine.
So, samples for guest machines are currently displayed as:

99.67% :5671 [unknown] [g] 0xffffffff81366b41

where 5671 is the pid of the VMM. With this patch the samples in the guest
machine are shown as:

18.43% [guest/5671] [unknown] [g] 0xffffffff810d68b7

Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1342826756-64663-3-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf symbols: Add machine id to modules debug message

Current debug message is:
Problems creating module maps, continuing anyway...

When running multiple VMs it would be nice to know which machine the
message is referring to:

$ perf kvm --guest --guestmount=/tmp/guest-mount record -av -- sleep 10
Problems creating module maps for guest 6613, continuing anyway...

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1342826756-64663-2-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core

Pull tracing fix from Steve Rostedt.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'linus' into perf/core

Pick up the latest ring-buffer fixes, before applying a new fix.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) IPVS oops'ers:
   a) Should not reset skb->nf_bridge in forwarding hook (Lin Ming)
   b) 3.4 commit can cause ip_vs_control_cleanup to be invoked after
      the ipvs_core_ops are unregistered during rmmod (Julian ANastasov)

2) ixgbevf bringup failure can crash in TX descriptor cleanup
    (Alexander Duyck)

3) AX25 switch missing break statement hoses ROSE sockets (Alan Cox)

4) CAIF accesses freed per-net memory (Sjur Brandeland)

5) Network cgroup code has out-or-bounds accesses (Eric DUmazet), and
    accesses freed memory (Gao Feng)

6) Fix a crash in SCTP reported by Dave Jones caused by freeing an
    association still on a list (Neil HOrman)

7) __netdev_alloc_skb() regresses on GFP_DMA using drivers because that
    GFP flag is not being retained for the allocation (Eric Dumazet).

8) Missing NULL hceck in sch_sfb netlink message parsing (Alan Cox)

9) bnx2 crashes because TX index iteration is not bounded correctly
    (Michael Chan)

10) IPoIB generates warnings in TCP queue collapsing (via
    skb_try_coalesce) because it does not set skb->truesize correctly
    (Eric Dumazet)

11) vlan_info objects leak for the implicit vlan with ID 0 (Amir
    Hanania)

12) A fix for TX time stamp handling in gianfar does not transfer socket
    ownership from one packet to another correctly, resulting in a
    socket write space imbalance (Eric Dumazet)

13) Julia Lawall found several cases where we do a list iteration, and
    then at the loop termination unconditionally assume we ended up with
    real list object, rather than the list head itself (CNIC, RXRPC,
    mISDN).

14) The bonding driver handles procfs moving incorrectly when a device
    it manages is moved from one namespace to another (Eric Biederman)

15) Missing memory barriers in stmmac descriptor accesses result in
    various crashes (Deepak Sikri)

16) Fix handling of broadcast packets in batman-adv (Simon Wunderlich)

17) Properly check the sanity of sendmsg() lengths in ieee802154's
    dgram_sendmsg().  Dave Jones and others have hit and reported this
    bug (Sasha Levin)

18) Some drivers (b44 and b43legacy) on 64-bit machines stopped working
    because of how netdev_alloc_skb() was adjusted.  Such drivers should
    now use alloc_skb() for obtaining bounce buffers.  (Eric Dumazet)

19) atl1c mis-managed it's link state in that it stops the queue by hand
    on link down.  The generic networking takes care of that and this
    double stop locks the queue down.  So simply removing the driver's
    queue stop call fixes the problem (Cloud Ren)

20) Fix out-of-memory due to mis-accounting in net_em packet scheduler
    (Eric Dumazet)

21) If DCB and SR-IOV are configured at the same time in IXGBE the chip
    will hang because this is not supported (Alexander Duyck)

22) A commit to stop drivers using netdev->base_addr broke the CNIC
    driver (Michael Chan)

23) Timeout regression in ipset caused by an attempt to fix an overflow
    bug (Jozsef Kadlecsik).

24) mac80211 minstrel code allocates memory using incorrect size
    (Thomas Huehn)

25) llcp_sock_getname() needs to check for a NULL device otherwise we
    OOPS (Sasha Levin)

26) mwifiex leaks memory (Bing Zhao)

27) Propagate iwlwifi fix to iwlegacy, even when we're not associated
    we need to monitor for stuck queues in the watchdog handler
    (Stanislaw Geuszka)

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  ipvs: fix oops in ip_vs_dst_event on rmmod
  ipvs: fix oops on NAT reply in br_nf context
  ixgbevf: Fix panic when loading driver
  ax25: Fix missing break
  MAINTAINERS: reflect actual changes in IEEE 802.15.4 maintainership
  caif: Fix access to freed pernet memory
  net: cgroup: fix access the unallocated memory in netprio cgroup
  ixgbevf: Prevent RX/TX statistics getting reset to zero
  sctp: Fix list corruption resulting from freeing an association on a list
  net: respect GFP_DMA in __netdev_alloc_skb()
  e1000e: fix test for PHY being accessible on 82577/8/9 and I217
  e1000e: Correct link check logic for 82571 serdes
  sch_sfb: Fix missing NULL check
  bnx2: Fix bug in bnx2_free_tx_skbs().
  IPoIB: fix skb truesize underestimatiom
  net: Fix memory leak - vlan_info struct
  gianfar: fix potential sk_wmem_alloc imbalance
  drivers/net/ethernet/broadcom/cnic.c: remove invalid reference to list iterator variable
  net/rxrpc/ar-peer.c: remove invalid reference to list iterator variable
  drivers/isdn/mISDN/stack.c: remove invalid reference to list iterator variable
  ...

Merge tag 'single-rpmsg-3.5-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg

Pull rpmsg fix from Ohad Ben-Cohen:
"A single rpmsg fix for 3.5, coming from Federico Fuga, which
eliminates the dependency on arbitrary initialization orders."

* tag 'single-rpmsg-3.5-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg:
rpmsg: fix dependency on initialization order

Merge branch 'fixes-for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping

Pull CMA and DMA-mapping fixes from Marek Szyprowski:
"Another set of minor fixups for recently merged Contiguous Memory
  Allocator and ARM DMA-mapping changes.  Those patches fix mysterious
  crashes on systems with CMA and Himem enabled as well as some corner
  cases caused by typical off-by-one bug."

* 'fixes-for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  ARM: dma-mapping: modify condition check while freeing pages
  mm: cma: fix condition check when setting global cma area
  mm: cma: don't replace lowmem pages with highmem

Merge branch 'master' of git://1984.lsi.us.es/nf

Pablo Neira Ayuso says:

====================
I know that we're in fairly late stage to request pulls, but the IPVS people
pinged me with little patches with oops fixes last week.

One of them was recently introduced (during the 3.4 development cycle) while
cleaning up the IPVS netns support. They are:

* Fix one regression introduced in 3.4 while cleaning up the
  netns support for IPVS, from Julian Anastasov.

* Fix one oops triggered due to resetting the conntrack attached to the skb
  instead of just putting it in the forward hook, from Lin Ming. This problem
  seems to be there since 2.6.37 according to Simon Horman.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rpmsg: fix dependency on initialization order

When rpmsg drivers are built into the kernel, they must not initialize
before the rpmsg bus does, otherwise they'd trigger a BUG() in
drivers/base/driver.c line 169 (driver_register()).

To fix that, and to stop depending on arbitrary linkage ordering of
those built-in rpmsg drivers, we make the rpmsg bus initialize at
subsys_initcall.

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Federico Fuga <fuga@studiofuga.com>
[ohad: rewrite the commit log]
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>

ipvs: fix oops in ip_vs_dst_event on rmmod

After commit 39f618b4fd95ae243d940ec64c961009c74e3333 (3.4)
"ipvs: reset ipvs pointer in netns" we can oops in
ip_vs_dst_event on rmmod ip_vs because ip_vs_control_cleanup
is called after the ipvs_core_ops subsys is unregistered and
net->ipvs is NULL. Fix it by exiting early from ip_vs_dst_event
if ipvs is NULL. It is safe because all services and dests
for the net are already freed.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: fix oops on NAT reply in br_nf context

IPVS should not reset skb->nf_bridge in FORWARD hook
by calling nf_reset for NAT replies. It triggers oops in
br_nf_forward_finish.

[  579.781508] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[  579.781669] IP: [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
[  579.781792] PGD 218f9067 PUD 0
[  579.781865] Oops: 0000 [#1] SMP
[  579.781945] CPU 0
[  579.781983] Modules linked in:
[  579.782047]
[  579.782080]
[  579.782114] Pid: 4644, comm: qemu Tainted: G        W    3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard  /30E8
[  579.782300] RIP: 0010:[<ffffffff817b1ca5>]  [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
[  579.782455] RSP: 0018:ffff88007b003a98  EFLAGS: 00010287
[  579.782541] RAX: 0000000000000008 RBX: ffff8800762ead00 RCX: 000000000001670a
[  579.782653] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8800762ead00
[  579.782845] RBP: ffff88007b003ac8 R08: 0000000000016630 R09: ffff88007b003a90
[  579.782957] R10: ffff88007b0038e8 R11: ffff88002da37540 R12: ffff88002da01a02
[  579.783066] R13: ffff88002da01a80 R14: ffff88002d83c000 R15: ffff88002d82a000
[  579.783177] FS:  0000000000000000(0000) GS:ffff88007b000000(0063) knlGS:00000000f62d1b70
[  579.783306] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[  579.783395] CR2: 0000000000000004 CR3: 00000000218fe000 CR4: 00000000000027f0
[  579.783505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  579.783684] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  579.783795] Process qemu (pid: 4644, threadinfo ffff880021b20000, task ffff880021aba760)
[  579.783919] Stack:
[  579.783959]  ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00
[  579.784110]  ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7
[  579.784260]  ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0
[  579.784477] Call Trace:
[  579.784523]  <IRQ>
[  579.784562]
[  579.784603]  [<ffffffff817b26c7>] br_nf_forward_ip+0x275/0x2c8
[  579.784707]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
[  579.784797]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
[  579.784906]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
[  579.784995]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
[  579.785175]  [<ffffffff8187fa95>] ? _raw_write_unlock_bh+0x19/0x1b
[  579.785179]  [<ffffffff817ac417>] __br_forward+0x97/0xa2
[  579.785179]  [<ffffffff817ad366>] br_handle_frame_finish+0x1a6/0x257
[  579.785179]  [<ffffffff817b2386>] br_nf_pre_routing_finish+0x26d/0x2cb
[  579.785179]  [<ffffffff817b2cf0>] br_nf_pre_routing+0x55d/0x5c1
[  579.785179]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
[  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
[  579.785179]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
[  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
[  579.785179]  [<ffffffff81551525>] ? sky2_poll+0xb35/0xb54
[  579.785179]  [<ffffffff817ad62a>] br_handle_frame+0x213/0x229
[  579.785179]  [<ffffffff817ad417>] ? br_handle_frame_finish+0x257/0x257
[  579.785179]  [<ffffffff816e3b47>] __netif_receive_skb+0x2b4/0x3f1
[  579.785179]  [<ffffffff816e69fc>] process_backlog+0x99/0x1e2
[  579.785179]  [<ffffffff816e6800>] net_rx_action+0xdf/0x242
[  579.785179]  [<ffffffff8107e8a8>] __do_softirq+0xc1/0x1e0
[  579.785179]  [<ffffffff8135a5ba>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[  579.785179]  [<ffffffff8188812c>] call_softirq+0x1c/0x30

The steps to reproduce as follow,

1. On Host1, setup brige br0(192.168.1.106)
2. Boot a kvm guest(192.168.1.105) on Host1 and start httpd
3. Start IPVS service on Host1
   ipvsadm -A -t 192.168.1.106:80 -s rr
   ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m
4. Run apache benchmark on Host2(192.168.1.101)
   ab -n 1000 http://192.168.1.106/

ip_vs_reply4
  ip_vs_out
    handle_response
      ip_vs_notrack
        nf_reset()
        {
          skb->nf_bridge = NULL;
        }

Actually, IPVS wants in this case just to replace nfct
with untracked version. So replace the nf_reset(skb) call
in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call.

Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ixgbevf: Fix panic when loading driver

This patch addresses a kernel panic seen when setting up the interface.
Specifically we see a NULL pointer dereference on the Tx descriptor cleanup
path when enabling interrupts. This change corrects that so it cannot
occur.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ax25: Fix missing break

At least there seems to be no reason to disallow ROSE sockets when
NETROM is loaded.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: reflect actual changes in IEEE 802.15.4 maintainership

As the life flows, developers priorities shifts a bit. Reflect actual
changes in the maintainership of IEEE 802.15.4 code: Sergey mostly
stopped cared about this piece of code. Most of the work recently was
done by Alexander, so put him to the MAINTAINERS file to reflect his
status and to ease the life of respective patches.

Also add new net/mac802154/ directory to the list of maintained files.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net

Jeff Kirsher says:

====================
This series contains fixes to e1000e.
...
Bruce Allan (1):
e1000e: fix test for PHY being accessible on 82577/8/9 and I217

Tushar Dave (1):
e1000e: Correct link check logic for 82571 serdes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Fix access to freed pernet memory

unregister_netdevice_notifier() must be called before
unregister_pernet_subsys() to avoid accessing already freed
pernet memory. This fixes the following oops when doing rmmod:

Call Trace:
[<ffffffffa0f802bd>] caif_device_notify+0x4d/0x5a0 [caif]
[<ffffffff81552ba9>] unregister_netdevice_notifier+0xb9/0x100
[<ffffffffa0f86dcc>] caif_device_exit+0x1c/0x250 [caif]
[<ffffffff810e7734>] sys_delete_module+0x1a4/0x300
[<ffffffff810da82d>] ? trace_hardirqs_on_caller+0x15d/0x1e0
[<ffffffff813517de>] ? trace_hardirqs_on_thunk+0x3a/0x3
[<ffffffff81696bad>] system_call_fastpath+0x1a/0x1f

RIP
[<ffffffffa0f7f561>] caif_get+0x51/0xb0 [caif]

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cgroup: fix access the unallocated memory in netprio cgroup

there are some out of bound accesses in netprio cgroup.

now before accessing the dev->priomap.priomap array,we only check
if the dev->priomap exist.and because we don't want to see
additional bound checkings in fast path, so we should make sure
that dev->priomap is null or array size of dev->priomap.priomap
is equal to max_prioidx + 1;

so in write_priomap logic,we should call extend_netdev_table when
dev->priomap is null and dev->priomap.priomap_len < max_len.
and in cgrp_create->update_netdev_tables logic,we should call
extend_netdev_table only when dev->priomap exist and
dev->priomap.priomap_len < max_len.

and it's not needed to call update_netdev_tables in write_priomap,
we can only allocate the net device's priomap which we change through
net_prio.ifpriomap.

this patch also add a return value for update_netdev_tables &
extend_netdev_table, so when new_priomap is allocated failed,
write_priomap will stop to access the priomap,and return -ENOMEM
back to the userspace to tell the user what happend.

Change From v3:
1. add rtnl protect when reading max_prioidx in write_priomap.

2. only call extend_netdev_table when map->priomap_len < max_len,
this will make sure array size of dev->map->priomap always
bigger than any prioidx.

3. add a function write_update_netdev_table to make codes clear.

Change From v2:
1. protect extend_netdev_table by RTNL.
2. when extend_netdev_table failed,call dev_put to reduce device's refcount.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbevf: Prevent RX/TX statistics getting reset to zero

The commit 4197aa7bb81877ebb06e4f2cc1b5fea2da23a7bd implements 64 bit
per ring statistics. But the driver resets the 'total_bytes' and
'total_packets' from RX and TX rings in the RX and TX interrupt
handlers to zero. This results in statistics being lost and user space
reporting RX and TX statistics as zero. This patch addresses the
issue by preventing the resetting of RX and TX ring statistics to
zero.

Signed-off-by: Narendra K <narendra_k@dell.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Fix list corruption resulting from freeing an association on a list

A few days ago Dave Jones reported this oops:

[22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
[22766.295376] CPU 0
[22766.295384] Modules linked in:
[22766.387137]  ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90
ffff880147c03a74
[22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
[22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
[22766.387137] Stack:
[22766.387140]  ffff880147c03a10
[22766.387140]  ffffffffa169f2b6
[22766.387140]  ffff88013ed95728
[22766.387143]  0000000000000002
[22766.387143]  0000000000000000
[22766.387143]  ffff880003fad062
[22766.387144]  ffff88013c120000
[22766.387144]
[22766.387145] Call Trace:
[22766.387145]  <IRQ>
[22766.387150]  [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0
[sctp]
[22766.387154]  [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
[22766.387157]  [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
[22766.387161]  [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
[22766.387163]  [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
[22766.387166]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387168]  [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
[22766.387169]  [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387171]  [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
[22766.387172]  [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
[22766.387174]  [<ffffffff81590c54>] ip_rcv+0x214/0x320
[22766.387176]  [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
[22766.387178]  [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
[22766.387180]  [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
[22766.387182]  [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
[22766.387183]  [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
[22766.387185]  [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
[22766.387187]  [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
[22766.387218]  [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
[22766.387242]  [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
[22766.387266]  [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
[22766.387268]  [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
[22766.387270]  [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
[22766.387273]  [<ffffffff810734d0>] __do_softirq+0xe0/0x420
[22766.387275]  [<ffffffff8169826c>] call_softirq+0x1c/0x30
[22766.387278]  [<ffffffff8101db15>] do_softirq+0xd5/0x110
[22766.387279]  [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
[22766.387281]  [<ffffffff81698b03>] do_IRQ+0x63/0xd0
[22766.387283]  [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
[22766.387283]  <EOI>
[22766.387284]
[22766.387285]  [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
[22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48
89 e5 48 83
ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00
48 89 fb
49 89 f5 66 c1 c0 08 66 39 46 02
[22766.387307]
[22766.387307] RIP
[22766.387311]  [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
[22766.387311]  RSP <ffff880147c039b0>
[22766.387142]  ffffffffa16ab120
[22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
[22766.601221] Kernel panic - not syncing: Fatal exception in interrupt

It appears from his analysis and some staring at the code that this is likely
occuring because an association is getting freed while still on the
sctp_assoc_hashtable.  As a result, we get a gpf when traversing the hashtable
while a freed node corrupts part of the list.

Nominally I would think that an mibalanced refcount was responsible for this,
but I can't seem to find any obvious imbalance.  What I did note however was
that the two places where we create an association using
sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths
which free a newly created association after calling sctp_primitive_ASSOCIATE.
sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which
issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to
the aforementioned hash table.  the sctp command interpreter that process side
effects has not way to unwind previously processed commands, so freeing the
association from the __sctp_connect or sctp_sendmsg error path would lead to a
freed association remaining on this hash table.

I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init,
which allows us to proerly use hlist_unhashed to check if the node is on a
hashlist safely during a delete.  That in turn alows us to safely call
sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths
before freeing them, regardles of what the associations state is on the hash
list.

I noted, while I was doing this, that the __sctp_unhash_endpoint was using
hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes
pointers to make that function work properly, so I fixed that up in a simmilar
fashion.

I attempted to test this using a virtual guest running the SCTP_RR test from
netperf in a loop while running the trinity fuzzer, both in a loop.  I wasn't
able to recreate the problem prior to this fix, nor was I able to trigger the
failure after (neither of which I suppose is suprising).  Given the trace above
however, I think its likely that this is what we hit.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: davej@redhat.com
CC: davej@redhat.com
CC: "David S. Miller" <davem@davemloft.net>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'gma500' (Alan's GMA patches)

Merge gma500 patches from Alan Cox.

* Merge emailed patches from Alan Cox <alan@lxorguk.ukuu.org.uk>: (3 commits)
  gma500,cdv: Fix the brightness base
  gma500: move the ASLE enable
  gma500: Fix lid related crash

Merge tag 'for-linus-v3.5-rc7' of git://oss.sgi.com/xfs/xfs

Pull xfs regression fixes from Ben Myers:
- Really fix a cursor leak in xfs_alloc_ag_vextent_near
- Fix a performance regression related to doing allocation in
   workqueues
- Prevent recursion in xfs_buf_iorequest which is causing stack
   overflows
- Don't call xfs_bdstrat_cb in xfs_buf_iodone callbacks

* tag 'for-linus-v3.5-rc7' of git://oss.sgi.com/xfs/xfs:
  xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks
  xfs: prevent recursion in xfs_buf_iorequest
  xfs: don't defer metadata allocation to the workqueue
  xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near

timekeeping: Add missing update call in timekeeping_resume()

The leap second rework unearthed another issue of inconsistent data.

On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.

This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for quite
some time.

Add the missing update call, so all the data is consistent everywhere.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Reported-and-tested-by: Martin Steigerwald <Martin@lichtvoll.de>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Linux PM list <linux-pm@vger.kernel.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gma500,cdv: Fix the brightness base

Some desktop environments carefully save and restore the brightness
settings from the previous boot. Unfortunately they don't all check to
see if the range has changed. The end result is that they restore a
brightness of 100/lots not 100/100.

As the old driver and the non-free GMA36xx driver both use 0-100 we thus
need to go back doing the same thing to avoid users getting a mysterious
black screen after boot.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gma500: move the ASLE enable

Otherwise we end up getting the masks wrong, can get events before we
are doing power control and other ungood things. Again this is a
regression fix where the ordering of handling was disturbed by other
work, and the user experience on some boxes is a blank screen.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gma500: Fix lid related crash

We now set up the lid timer before we set up the backlight. On some
devices that causes a crash as we do a backlight change before or during
the setup.

As this fixes a crash on boot regression on some setups it ought to go
in ASAP, especially as all the user gets is a blank screen.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Tested-by: Mattia Dongili <malattia@linux.it>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86

Pull x86 platform tree fixes from Matthew Garrett:
"Small fixes to a couple of drivers plus a slightly larger number for
  sony-laptop that the maintainer thinks are appropriate, most of which
  fix problems with the earlier 3.5 updates.  These have been in -next
  for a while without complaint."

* 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86:
  intel_ips: blacklist HP ProBook laptops
  ideapad: uninitialized data in ideapad_acpi_add()
  sony-laptop: correct find_snc_handle failure checks
  sony-laptop: fix a couple signedness bugs
  sony-laptop: fix sony_nc_sysfs_store()
  sony-laptop: input initialization should be done before SNC
  sony-laptop: add lid backlight support for handle 0x143
  sony-laptop: store battery care limits on batteries
  sony-laptop: notify userspace of GFX switch position changes
  sony-laptop: use an enum for SNC event types

fifo: Do not restart open() if it already found a partner

If a parent and child process open the two ends of a fifo, and the
child immediately exits, the parent may receive a SIGCHLD before its
open() returns.  In that case, we need to make sure that open() will
return successfully after the SIGCHLD handler returns, instead of
throwing EINTR or being restarted.  Otherwise, the restarted open()
would incorrectly wait for a second partner on the other end.

The following test demonstrates the EINTR that was wrongly thrown from
the parent’s open().  Change .sa_flags = 0 to .sa_flags = SA_RESTART
to see a deadlock instead, in which the restarted open() waits for a
second reader that will never come.  (On my systems, this happens
pretty reliably within about 5 to 500 iterations.  Others report that
it manages to loop ~forever sometimes; YMMV.)

  #include <sys/stat.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <fcntl.h>
  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  #define CHECK(x) do if ((x) == -1) {perror(#x); abort();} while(0)

  void handler(int signum) {}

  int main()
  {
      struct sigaction act = {.sa_handler = handler, .sa_flags = 0};
      CHECK(sigaction(SIGCHLD, &act, NULL));
      CHECK(mknod("fifo", S_IFIFO | S_IRWXU, 0));
      for (;;) {
          int fd;
          pid_t pid;
          putc('.', stderr);
          CHECK(pid = fork());
          if (pid == 0) {
              CHECK(fd = open("fifo", O_RDONLY));
              _exit(0);
          }
          CHECK(fd = open("fifo", O_WRONLY));
          CHECK(close(fd));
          CHECK(waitpid(pid, NULL, 0));
      }
  }

This is what I suspect was causing the Git test suite to fail in
t9010-svn-fe.sh:

http://bugs.debian.org/678852

Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'fixes-for-v3.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull late pinctrl fixes from Linus Walleij:
- Two fixes to the i.MX driver

* tag 'fixes-for-v3.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: pinctrl-imx6q: add missed mux function for USBOTG_ID
pinctrl: pinctrl-imx: only print debug message when DEBUG is defined

net: respect GFP_DMA in __netdev_alloc_skb()

Few drivers use GFP_DMA allocations, and netdev_alloc_frag()
doesn't allocate pages in DMA zone.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dma-mapping: modify condition check while freeing pages

WARNING: at mm/vmalloc.c:1471 __iommu_free_buffer+0xcc/0xd0()
Trying to vfree() nonexistent vm area (ef095000)
Modules linked in:
[<c0015a18>] (unwind_backtrace+0x0/0xfc) from [<c0025a94>] (warn_slowpath_common+0x54/0x64)
[<c0025a94>] (warn_slowpath_common+0x54/0x64) from [<c0025b38>] (warn_slowpath_fmt+0x30/0x40)
[<c0025b38>] (warn_slowpath_fmt+0x30/0x40) from [<c0016de0>] (__iommu_free_buffer+0xcc/0xd0)
[<c0016de0>] (__iommu_free_buffer+0xcc/0xd0) from [<c0229a5c>] (exynos_drm_free_buf+0xe4/0x138)
[<c0229a5c>] (exynos_drm_free_buf+0xe4/0x138) from [<c022b358>] (exynos_drm_gem_destroy+0x80/0xfc)
[<c022b358>] (exynos_drm_gem_destroy+0x80/0xfc) from [<c0211230>] (drm_gem_object_free+0x28/0x34)
[<c0211230>] (drm_gem_object_free+0x28/0x34) from [<c0211bd0>] (drm_gem_object_release_handle+0xcc/0xd8)
[<c0211bd0>] (drm_gem_object_release_handle+0xcc/0xd8) from [<c01abe10>] (idr_for_each+0x74/0xb8)
[<c01abe10>] (idr_for_each+0x74/0xb8) from [<c02114e4>] (drm_gem_release+0x1c/0x30)
[<c02114e4>] (drm_gem_release+0x1c/0x30) from [<c0210ae8>] (drm_release+0x608/0x694)
[<c0210ae8>] (drm_release+0x608/0x694) from [<c00b75a0>] (fput+0xb8/0x228)
[<c00b75a0>] (fput+0xb8/0x228) from [<c00b40c4>] (filp_close+0x64/0x84)
[<c00b40c4>] (filp_close+0x64/0x84) from [<c0029d54>] (put_files_struct+0xe8/0x104)
[<c0029d54>] (put_files_struct+0xe8/0x104) from [<c002b930>] (do_exit+0x608/0x774)
[<c002b930>] (do_exit+0x608/0x774) from [<c002bae4>] (do_group_exit+0x48/0xb4)
[<c002bae4>] (do_group_exit+0x48/0xb4) from [<c002bb60>] (sys_exit_group+0x10/0x18)
[<c002bb60>] (sys_exit_group+0x10/0x18) from [<c000ee80>] (ret_fast_syscall+0x0/0x30)

This patch modifies the condition while freeing to match the condition
used while allocation. This fixes the above warning which arises when
array size is equal to PAGE_SIZE where allocation is done using kzalloc
but free is done using vfree.

Signed-off-by: Prathyush K <prathyush.k@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

Linux 3.5-rc7

blk: fix wrong idr_pre_get() error check in loop.c

The idr_pre_get() function never returns a value < 0. It returns 0 (no
memory) or 1 (OK).

Reported-by: Silva Paulo <psdasilva@yahoo.com>
[ Rewrote Silva's patch, but attributing it to Silva anyway - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

pinctrl: pinctrl-imx6q: add missed mux function for USBOTG_ID

The original pin registers table is derived from u-boot mainline,
but somehow it was found missing some mux functions for USBOTG_ID.
We added it at the bottom by following the exist pin function ids,
then it will not break the exist using of pin function id in dts file.

Reported-by: Richard Zhao <richard.zhao@freescale.com>
Signed-off-by: Dong Aisheng <dong.aisheng@linaro.org>

pinctrl: pinctrl-imx: only print debug message when DEBUG is defined

Fix regression for commit 3a86a5f8 (pinctrl: pinctrl-imx: free allocated
pinctrl_map structure only once and use kernel facilities for IMX_PMX_DUMP)
introduced in 3.5-rc3.
With above commit, the debug code will alway be excuted.
Change to excute it only when DEBUG is defined.

Signed-off-by: Dong Aisheng <dong.aisheng@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Merge tag 'sound-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Containing the regression fixes for USB-audio due to the transition to
  the new streaming logic, mostly found on Logitech webcams."

* tag 'sound-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: snd-usb: move calls to usb_set_interface
  ALSA: usb-audio: Fix the first PCM interface assignment

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull ACPI patch from Len Brown.

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
ACPICA: Fix possible fault in return package object repair code

vsyscall_64: add missing ifdef CONFIG_SECCOMP

vsyscall_seccomp introduced a dependency on __secure_computing. On
configurations with CONFIG_SECCOMP disabled, compilation will fail.

Reported-by: feng xiangjun <fengxj325@gmail.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'cpufreq-for-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull cpufreq fix from Rafael Wysocki:
"This fixes a regression preventing the ACPI cpufreq driver from
  loading on some systems where it worked previously without any
  problems."

* tag 'cpufreq-for-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq / ACPI: Fix not loading acpi-cpufreq driver regression

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM Samsung SoC fixes from Arnd Bergmann.

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: S3C24XX: Correct CAMIF interrupt definitions
  ARM: S3C24XX: Correct AC97 clock control bit for S3C2440
  ARM: SAMSUNG: fix race in s3c_adc_start for ADC
  ARM: SAMSUNG: Update default rate for xusbxti clock
  ARM: EXYNOS: register devices in 'need_restore' state for pm_domains
  ARM: EXYNOS: read initial state of power domain from hw registers

Merge branches 'core-urgent-for-linus', 'perf-urgent-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU, perf, and scheduler fixes from Ingo Molnar.

The RCU fix is a revert for an optimization that could cause deadlocks.

One of the scheduler commits (164c33c6adee "sched: Fix fork() error path
to not crash") is correct but not complete (some architectures like Tile
are not covered yet) - the resulting additional fixes are still WIP and
Ingo did not want to delay these pending fixes.  See this thread on
lkml:

  [PATCH] fork: fix error handling in dup_task()

The perf fixes are just trivial oneliners.

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "rcu: Move PREEMPT_RCU preemption to switch_to() invocation"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf kvm: Fix segfault with report and mixed guestmount use
  perf kvm: Fix regression with guest machine creation
  perf script: Fix format regression due to libtraceevent merge
  ring-buffer: Fix accounting of entries when removing pages
  ring-buffer: Fix crash due to uninitialized new_pages list head

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS/sched: Update scheduler file pattern
  sched/nohz: Rewrite and fix load-avg computation -- again
  sched: Fix fork() error path to not crash

ACPICA: Fix possible fault in return package object repair code

Fixes a problem that can occur when a lone package object is
wrapped with an outer package object in order to conform to
the ACPI specification. Can affect these predefined names:
_ALR,_MLS,_PSS,_TRT,_TSS,_PRT,_HPX,_DLM,_CSD,_PSD,_TSD

https://bugzilla.kernel.org/show_bug.cgi?id=44171

This problem was introduced in 3.4-rc1 by commit
6a99b1c94d053b3420eaa4a4bc8b2883dd90a2f9
(ACPICA: Object repair code: Support to add Package wrappers)

Reported-by: Vlastimil Babka <caster@gentoo.org>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: <stable@vger.kernel.org> # 3.4
Signed-off-by: Len Brown <len.brown@intel.com>

e1000e: fix test for PHY being accessible on 82577/8/9 and I217

Occasionally, the PHY can be initially inaccessible when the first read of
a PHY register, e.g. PHY_ID1, happens (signified by the returned value
0xFFFF) but subsequent accesses of the PHY work as expected. Add a retry
counter similar to how it is done in the generic e1000_get_phy_id().

Also, when the PHY is completely inaccessible (i.e. when subsequent reads
of the PHY_IDx registers returns all F's) and the MDIO access mode must be
set to slow before attempting to read the PHY ID again, the functions that
do these latter two actions expect the SW/FW/HW semaphore is not already
set so the semaphore must be released before and re-acquired after calling
them otherwise there is an unnecessarily inordinate amount of delay during
device initialization.

Reported-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Correct link check logic for 82571 serdes

SYNCH bit and IV bit of RXCW register are sticky. Before examining these bits,
RXCW should be read twice to filter out one-time false events and have correct
values for these bits. Incorrect values of these bits in link check logic can
cause weird link stability issues if auto-negotiation fails.

CC: stable <stable@vger.kernel.org> [2.6.38+]
Reported-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Tushar Dave <tushar.n.dave@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'v3.5-samsung-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into fixes

From Kukjin Kim <kgene.kim@samsung.com>:

* 'v3.5-samsung-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
  ARM: S3C24XX: Correct CAMIF interrupt definitions
  ARM: S3C24XX: Correct AC97 clock control bit for S3C2440
  ARM: SAMSUNG: fix race in s3c_adc_start for ADC
  ARM: SAMSUNG: Update default rate for xusbxti clock
  ARM: EXYNOS: register devices in 'need_restore' state for pm_domains
  ARM: EXYNOS: read initial state of power domain from hw registers

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Merge tag 'md-3.5-fixes' of git://neil.brown.name/md

Pull use-after-free RAID1 bugfix from NeilBrown.

* tag 'md-3.5-fixes' of git://neil.brown.name/md:
md/raid1: fix use-after-free bug in RAID1 data-check code.

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull the leap second fixes from Thomas Gleixner:
"It's a rather large series, but well discussed, refined and reviewed.
  It got a massive testing by John, Prarit and tip.

  In theory we could split it into two parts.  The first two patches

    f55a6faa3843: hrtimer: Provide clock_was_set_delayed()
    4873fa070ae8: timekeeping: Fix leapsecond triggered load spike issue

  are merely preventing the stuff loops forever issues, which people
  have observed.

  But there is no point in delaying the other 4 commits which achieve
  full correctness into 3.6 as they are tagged for stable anyway.  And I
  rather prefer to have the full fixes merged in bulk than a "prevent
  the observable wreckage and deal with the hidden fallout later"
  approach."

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Update hrtimer base offsets each hrtimer_interrupt
  timekeeping: Provide hrtimer update function
  hrtimers: Move lock held region in hrtimer_interrupt()
  timekeeping: Maintain ktime_t based offsets for hrtimers
  timekeeping: Fix leapsecond triggered load spike issue
  hrtimer: Provide clock_was_set_delayed()

x86/vsyscall: allow seccomp filter in vsyscall=emulate

If a seccomp filter program is installed, older static binaries and
distributions with older libc implementations (glibc 2.13 and earlier)
that rely on vsyscall use will be terminated regardless of the filter
program policy when executing time, gettimeofday, or getcpu. This is
only the case when vsyscall emulation is in use (vsyscall=emulate is the
default).

This patch emulates system call entry inside a vsyscall=emulate by
populating regs->ax and regs->orig_ax with the system call number prior
to calling into seccomp such that all seccomp-dependencies function
normally. Additionally, system call return behavior is emulated in line
with other vsyscall entrypoints for the trace/trap cases.

[ v2: fixed ip and sp on SECCOMP_RET_TRAP/TRACE (thanks to luto@mit.edu) ]
Reported-and-tested-by: Owen Kibel <qmewlo@gmail.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks

xfs_bdstrat_cb only adds a check for a shutdown filesystem over
xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down
filesystem a little earlier. In addition the shutdown handling in
xfs_bdstrat_cb is not very suitable for this caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: prevent recursion in xfs_buf_iorequest

If the b_iodone handler is run in calling context in xfs_buf_iorequest we
can run into a recursion where xfs_buf_iodone_callbacks keeps calling back
into xfs_buf_iorequest because an I/O error happened, which keeps calling
back into xfs_buf_iorequest. This chain will usually not take long
because the filesystem gets shut down because of log I/O errors, but even
over a short time it can cause stack overflows if run on the same context.

As a short term workaround make sure we always call the iodone handler in
workqueue context.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: don't defer metadata allocation to the workqueue

Almost all metadata allocations come from shallow stack usage
situations. Avoid the overhead of switching the allocation to a
workqueue as we are not in danger of running out of stack when
making these allocations. Metadata allocations are already marked
through the args that are passed down, so this is trivial to do.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-by: Mel Gorman <mgorman@suse.de>
Tested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near

The current cursor is reallocated when retrying the allocation, so
the existing cursor needs to be destroyed in both the restart and
the failure cases.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

Please pull one hwmon subsystem fix from Jean Delvare.

* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: (it87) Preserve configuration register bits on init

Merge tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
- Fix an NFSv4 mount regression
- Fix O_DIRECT list manipulation snafus

* tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Fix an NFSv4 mount regression
NFS: Fix list manipulation snafus in fs/nfs/direct.c

Remove easily user-triggerable BUG from generic_setlease

This can be trivially triggered from userspace by passing in something unexpected.

    kernel BUG at fs/locks.c:1468!
    invalid opcode: 0000 [#1] SMP
    RIP: 0010:generic_setlease+0xc2/0x100
    Call Trace:
      __vfs_setlease+0x35/0x40
      fcntl_setlease+0x76/0x150
      sys_fcntl+0x1c6/0x810
      system_call_fastpath+0x1a/0x1f

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org # 3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input layer fixes from Dmitry Torokhov:
"The changes are limited to adding new VID/PID combinations to drivers
  to enable support for new versions of hardware, most notably hardware
  found in new MacBook Pro Retina boxes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: xpad - add Andamiro Pump It Up pad
  Input: xpad - add signature for Razer Onza Tournament Edition
  Input: xpad - handle all variations of Mad Catz Beat Pad
  Input: bcm5974 - Add support for 2012 MacBook Pro Retina
  HID: add support for 2012 MacBook Pro Retina

Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
- Some regression fixes at the audio part for devices with
   cx23885/cx25840
- A DMA corruption fix at cx231xx
- two fixes at the winbond IR driver
- Several fixes for the EXYNOS media driver (s5p)
- two fixes at the OMAP3 preview driver
- one fix at the dvb core failure path
- an include missing (slab.h) at smiapp-core causing compilation
   breakage
- em28xx was not loading the IR driver driver anymore.

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (31 commits)
  [media] Revert "[media] V4L: JPEG class documentation corrections"
  [media] s5p-fimc: Add missing FIMC-LITE file operations locking
  [media] omap3isp: preview: Fix contrast and brightness handling
  [media] omap3isp: preview: Fix output size computation depending on input format
  [media] winbond-cir: Initialise timeout, driver_type and allowed_protos
  [media] winbond-cir: Fix txandrx module info
  [media] cx23885: Silence unknown command warnings
  [media] cx23885: add support for HVR-1255 analog (cx23888 variant)
  [media] cx23885: make analog support work for HVR_1250 (cx23885 variant)
  [media] cx25840: fix vsrc/hsrc usage on cx23888 designs
  [media] cx25840: fix regression in HVR-1800 analog audio
  [media] cx25840: fix regression in analog support hue/saturation controls
  [media] cx25840: fix regression in HVR-1800 analog support
  [media] s5p-mfc: Fixed setup of custom controls in decoder and encoder
  [media] cx231xx: don't DMA to random addresses
  [media] em28xx: fix em28xx-rc load
  [media] dvb-core: Release semaphore on error path dvb_register_device()
  [media] s5p-fimc: Stop media entity pipeline if fimc_pipeline_validate fails
  [media] s5p-fimc: Fix compiler warning in fimc-lite.c
  [media] s5p-fimc: media_entity_pipeline_start() may fail
  ...

Merge tag 'mmc-fixes-for-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc

Pull MMC fixes from Chris Ball:
- Revert a patch that made failing to select power class fatal;
   it turns out that it fails non-fatally on Tegra boards.
   Regression against 3.5-rc1.
- Add the IRQF_ONESHOT flag to the cd-gpio driver, which turned
   into a regression in 3.5-rc1 when IRQF_ONESHOT became required
   for threaded IRQs with no handler.

* tag 'mmc-fixes-for-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  mmc: cd-gpio: pass IRQF_ONESHOT to request_threaded_irq()
  mmc: core: Revert "skip card initialization if power class selection fails"

Merge tag 'for-linus-20120712' of git://git.infradead.org/linux-mtd

Pull late MTD fixes from David Woodhouse:
- fix 'sparse warning fix' regression which totally breaks MXC NAND
- fix GPMI NAND regression when used with UBI
- update/correct sysfs documentation for new 'bitflip_threshold' field
- fix nandsim build failure

* tag 'for-linus-20120712' of git://git.infradead.org/linux-mtd:
  mtd: nandsim: don't open code a do_div helper
  mtd: ABI documentation: clarification of bitflip_threshold
  mtd: gpmi-nand: fix read page when reading to vmalloced area
  mtd: mxc_nand: use 32bit copy functions

Merge tag 'mfd-for-linus-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6

Pull MFD Fixes from Samuel Ortiz:
- Three Palmas fixes, One of them being a build error fix.
- Two mc13xx fixes.  One for fixing an SPI regmap configuration and
   another one for working around an i.Mx hardware bug.
- One omap-usb regression fix.
- One twl6040 build breakage fix.
- One file deletion (ab5500-core.h) that was overlooked during the last
   merge window.

* tag 'mfd-for-linus-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
  mfd: Add missing hunk to change palmas irq to clear on read
  mfd: Fix palmas regulator pdata missing
  mfd: USB: Fix the omap-usb EHCI ULPI PHY reset fix issues.
  mfd: Update twl6040 Kconfig to avoid build breakage
  mfd: Delete ab5500-core.h
  mfd: mc13xxx workaround SPI hardware bug on i.Mx
  mfd: Fix mc13xxx SPI regmap
  mfd: Add terminating entry for i2c_device_id palmas table

Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh

Pull SuperH fixes from Paul Mundt.

* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh:
SH: Convert out[bwl] macros to inline functions
sh: Fix up se7721 GPIOLIB=y build warnings.

Merge git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull a couple of KVM fixes from Avi Kivity:
"One is an adjustment for an irq layer change that affected device
  assignment, the other a one-liner ppc fix."

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
  powerpc/kvm: Fix "PR" KVM implementation of H_CEDE
  KVM: Fix device assignment threaded irq handler

block: fix infinite loop in __getblk_slow

Commit 080399aaaf35 ("block: don't mark buffers beyond end of disk as
mapped") exposed a bug in __getblk_slow that causes mount to hang as it
loops infinitely waiting for a buffer that lies beyond the end of the
disk to become uptodate.

The problem was initially reported by Torsten Hilbrich here:

    https://lkml.org/lkml/2012/6/18/54

and also reported independently here:

    http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511

and then Richard W.M.  Jones and Marcos Mello noted a few separate
bugzillas also associated with the same issue.  This patch has been
confirmed to fix:

    https://bugzilla.redhat.com/show_bug.cgi?id=835019

The main problem is here, in __getblk_slow:

        for (;;) {
                struct buffer_head * bh;
                int ret;

                bh = __find_get_block(bdev, block, size);
                if (bh)
                        return bh;

                ret = grow_buffers(bdev, block, size);
                if (ret < 0)
                        return NULL;
                if (ret == 0)
                        free_more_memory();
        }

__find_get_block does not find the block, since it will not be marked as
mapped, and so grow_buffers is called to fill in the buffers for the
associated page.  I believe the for (;;) loop is there primarily to
retry in the case of memory pressure keeping grow_buffers from
succeeding.  However, we also continue to loop for other cases, like the
block lying beond the end of the disk.  So, the fix I came up with is to
only loop when grow_buffers fails due to memory allocation issues
(return value of 0).

The attached patch was tested by myself, Torsten, and Rich, and was
found to resolve the problem in call cases.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Josh Boyer <jwboyer@redhat.com>
Cc: Stable <stable@vger.kernel.org> # 3.0+
[ Jens is on vacation, taking this directly  - Linus ]
--
Stable Notes: this patch requires backport to 3.0, 3.2 and 3.3.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ARM: S3C24XX: Correct CAMIF interrupt definitions

Properly define the CAMIF interrupt resources. This device have two
interrupts - corresponding to the "codec" and "preview" data paths.
IRQ_CAM is handled internally by the architecture and demultiplexed
to IRQ_S3C2440_CAM_C and IRQ_S3C2440_CAM_P - these interrupts only
should be handled in the driver.

Signed-off-by: Sylwester Nawrocki <sylvester.nawrocki@gmail.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

ARM: S3C24XX: Correct AC97 clock control bit for S3C2440

Use correct gate control bit for AC97 clock which is
S3C2440_CLKCON_AC97, not S3C2440_CLKCON_CAMERA.

Signed-off-by: Sylwester Nawrocki <sylvester.nawrocki@gmail.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

ALSA: snd-usb: move calls to usb_set_interface

The rework of the snd-usb endpoint logic moved the calls to
snd_usb_set_interface() into the snd_usb_endpoint implemenation. This
changed the order in which these calls are issued to the device, and
thereby caused regressions for some webcams.

Fix this by moving the calls back to pcm.c for now to make it work again
and use snd_usb_endpoint_activate() to really tear down all remaining
URBs in the flight, consequently fixing another regression caused by USB
packets on the wire after altsetting 0 has been selected.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-and-tested-by: Philipp Dreimann <philipp@dreimann.net>
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Input: xpad - add Andamiro Pump It Up pad

I couldn't find the vendor ID in any of the online databases, but this
mat has a Pump It Up logo on the top side of the controller compartment,
and a disclaimer stating that Andamiro will not be liable on the bottom.

Signed-off-by: Yuri Khan <yurivkhan@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

ARM: SAMSUNG: fix race in s3c_adc_start for ADC

Checking for adc->ts_pend already claimed should be done with the
lock held.

Signed-off-by: Todd Poynor <toddpoynor@google.com>
Acked-by: Ben Dooks <ben-linux@fluff.org>
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

ARM: SAMSUNG: Update default rate for xusbxti clock

The rate of xusbxti clock is set in individual machine files.
The default value should be defined at the clock definition
and individual machine files should modify it if required.

Division by zero in kernel.
[<c0011849>] (unwind_backtrace+0x1/0x9c) from [<c022c663>] (Ldiv0+0x9/0x12)
[<c022c663>] (Ldiv0+0x9/0x12) from [<c001a3c3>] (s3c_setrate_clksrc+0x33/0x78)
[<c001a3c3>] (s3c_setrate_clksrc+0x33/0x78) from [<c0019e67>] (clk_set_rate+0x2f/0x78)

Signed-off-by: Tushar Behera <tushar.behera@linaro.org>
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

hwmon: (it87) Preserve configuration register bits on init

We were accidentally losing one bit in the configuration register on
device initialization. It was reported to freeze one specific system
right away. Properly preserve all bits we don't explicitly want to
change in order to prevent that.

Reported-by: Stevie Trujillo <stevie.trujillo@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>

cpufreq / ACPI: Fix not loading acpi-cpufreq driver regression

Commit d640113fe80e45ebd4a5b420b introduced a regression on SMP
systems where the processor core with ACPI id zero is disabled
(typically should be the case because of hyperthreading).
The regression got spread through stable kernels.
On 3.0.X it got introduced via 3.0.18.

Such platforms may be rare, but do exist.
Look out for a disabled processor with acpi_id 0 in dmesg:
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x10] disabled)

This problem has been observed on a:
HP Proliant BL280c G6 blade

This patch restricts the introduced workaround to platforms
with nr_cpu_ids <= 1.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: stable@vger.kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

sch_sfb: Fix missing NULL check

Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=44461

Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: EXYNOS: register devices in 'need_restore' state for pm_domains

Commit ca1d72f033 ('PM / Domains: Make it possible to add devices to
inactive domains') introduced possibility to add devices to inactive
power domains and added pm_genpd_dev_need_restore() function which lets
platform core to notify power domain core that the specified device must
be restored (with its runtime_resume() callback) before first use.

This patch adds the pm_genpd_dev_need_restore() call what brings back
the suspend/resume behaviour for the client devices known from the
previous power domain driver (removed by commit 91cfbd4ee0 - 'ARM:
EXYNOS: Hook up power domains to generic power domain infrastructure').
Client device drivers relay on that suspend/resume behaviour, thus this
patch fixes runtime pm operation for client devices.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

ARM: EXYNOS: read initial state of power domain from hw registers

Some bootloaders disable unused power domains to reduce power
consuption. Power domain driver can easily read the actual state from
the hardware registers instead of assuming that their initial state is
always 'on'.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>

SH: Convert out[bwl] macros to inline functions

The macros just called BUG(), but that results in unused variable
warnings all over the place, like in the IPMI driver. The build
regression emails were annoying me, so here's the fix. I have
not even compile tested this, but it's rather obvious.

[ port type mangled to unsigned long ]

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

tracing: Check for allocation failure in __tracing_open()

Clean up and return -ENOMEM on if the kzalloc() fails.

This also prevents a potential crash, as the pointer that failed to
allocate would be later used.

Link: http://lkml.kernel.org/r/20120711063507.GF11812@elgon.mountain
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Merge tag 'fbdev-fixes-for-3.5-2' of git://github.com/schandinat/linux-2.6

Pull fbdev fixes from Florian Tobias Schandinat:
"Two fixes for OMAPDSS by Tomi Valkeinen:
   - one to avoid warnings when runtime PM is not enabled
   - one workaround to dependancy issues during suspend/resume"

* tag 'fbdev-fixes-for-3.5-2' of git://github.com/schandinat/linux-2.6:
  OMAPDSS: fix warnings if CONFIG_PM_RUNTIME=n
  OMAPDSS: Use PM notifiers for system suspend

Merge branch 'akpm' (Andrew's patch-bomb)

Merge random patches from Andrew Morton.

* Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (32 commits)
  memblock: free allocated memblock_reserved_regions later
  mm: sparse: fix usemap allocation above node descriptor section
  mm: sparse: fix section usemap placement calculation
  xtensa: fix incorrect memset
  shmem: cleanup shmem_add_to_page_cache
  shmem: fix negative rss in memcg memory.stat
  tmpfs: revert SEEK_DATA and SEEK_HOLE
  drivers/rtc/rtc-twl.c: fix threaded IRQ to use IRQF_ONESHOT
  fat: fix non-atomic NFS i_pos read
  MAINTAINERS: add OMAP CPUfreq driver to OMAP Power Management section
  sgi-xp: nested calls to spin_lock_irqsave()
  fs: ramfs: file-nommu: add SetPageUptodate()
  drivers/rtc/rtc-mxc.c: fix irq enabled interrupts warning
  mm/memory_hotplug.c: release memory resources if hotadd_new_pgdat() fails
  h8300/uaccess: add mising __clear_user()
  h8300/uaccess: remove assignment to __gu_val in unhandled case of get_user()
  h8300/time: add missing #include <asm/irq_regs.h>
  h8300/signal: fix typo "statis"
  h8300/pgtable: add missing #include <asm-generic/pgtable.h>
  drivers/rtc/rtc-ab8500.c: ensure correct probing of the AB8500 RTC when Device Tree is enabled
  ...

memblock: free allocated memblock_reserved_regions later

memblock_free_reserved_regions() calls memblock_free(), but
memblock_free() would double reserved.regions too, so we could free the
old range for reserved.regions.

Also tj said there is another bug which could be related to this.

| I don't think we're saving any noticeable
| amount by doing this "free - give it to page allocator - reserve
| again" dancing.  We should just allocate regions aligned to page
| boundaries and free them later when memblock is no longer in use.

in that case, when DEBUG_PAGEALLOC, will get panic:

     memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
  BUG: unable to handle kernel paging request at ffff88102febd948
  IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
  PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  CPU 0
  Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
  RIP: 0010:[<ffffffff836a5774>]  [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155

See the discussion at https://lkml.org/lkml/2012/6/13/469

So try to allocate with PAGE_SIZE alignment and free it later.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: sparse: fix usemap allocation above node descriptor section

After commit f5bf18fa22f8 ("bootmem/sparsemem: remove limit constraint
in alloc_bootmem_section"), usemap allocations may easily be placed
outside the optimal section that holds the node descriptor, even if
there is space available in that section. This results in unnecessary
hotplug dependencies that need to have the node unplugged before the
section holding the usemap.

The reason is that the bootmem allocator doesn't guarantee a linear
search starting from the passed allocation goal but may start out at a
much higher address absent an upper limit.

Fix this by trying the allocation with the limit at the section end,
then retry without if that fails. This keeps the fix from f5bf18fa22f8
of not panicking if the allocation does not fit in the section, but
still makes sure to try to stay within the section at first.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.3.x, 3.4.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: sparse: fix section usemap placement calculation

Commit 238305bb4d41 ("mm: remove sparsemem allocation details from the
bootmem allocator") introduced a bug in the allocation goal calculation
that put section usemaps not in the same section as the node
descriptors, creating unnecessary hotplug dependencies between them:

  node 0 must be removed before remove section 16399
  node 1 must be removed before remove section 16399
  node 2 must be removed before remove section 16399
  node 3 must be removed before remove section 16399
  node 4 must be removed before remove section 16399
  node 5 must be removed before remove section 16399
  node 6 must be removed before remove section 16399

The reason is that it applies PAGE_SECTION_MASK to the physical address
of the node descriptor when finding a suitable place to put the usemap,
when this mask is actually intended to be used with PFNs.  Because the
PFN mask is wider, the target address will point beyond the wanted
section holding the node descriptor and the node must be offlined before
the section holding the usemap can go.

Fix this by extending the mask to address width before use.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xtensa: fix incorrect memset

Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=43871

Reported-by: <dcb314@hotmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

shmem: cleanup shmem_add_to_page_cache

shmem_add_to_page_cache() has three callsites, but only one of them wants
the radix_tree_preload() (an exceptional entry guarantees that the radix
tree node is present in the other cases), and only that site can achieve
mem_cgroup_uncharge_cache_page() (PageSwapCache makes it a no-op in the
other cases). We did it this way originally to reflect
add_to_page_cache_locked(); but it's confusing now, so move the radix_tree
preloading and mem_cgroup uncharging to that one caller.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

shmem: fix negative rss in memcg memory.stat

When adding the page_private checks before calling shmem_replace_page(), I
did realize that there is a further race, but thought it too unlikely to
need a hurried fix.

But independently I've been chasing why a mem cgroup's memory.stat
sometimes shows negative rss after all tasks have gone: I expected it to
be a stats gathering bug, but actually it's shmem swapping's fault.

It's an old surprise, that when you lock_page(lookup_swap_cache(swap)),
the page may have been removed from swapcache before getting the lock; or
it may have been freed and reused and be back in swapcache; and it can
even be using the same swap location as before (page_private same).

The swapoff case is already secure against this (swap cannot be reused
until the whole area has been swapped off, and a new swapped on); and
shmem_getpage_gfp() is protected by shmem_add_to_page_cache()'s check for
the expected radix_tree entry - but a little too late.

By that time, we might have already decided to shmem_replace_page(): I
don't know of a problem from that, but I'd feel more at ease not to do so
spuriously.  And we have already done mem_cgroup_cache_charge(), on
perhaps the wrong mem cgroup: and this charge is not then undone on the
error path, because PageSwapCache ends up preventing that.

It's this last case which causes the occasional negative rss in
memory.stat: the page is charged here as cache, but (sometimes) found to
be anon when eventually it's uncharged - and in between, it's an
undeserved charge on the wrong memcg.

Fix this by adding an earlier check on the radix_tree entry: it's
inelegant to descend the tree twice, but swapping is not the fast path,
and a better solution would need a pair (try+commit) of memcg calls, and a
rework of shmem_replace_page() to keep out of the swapcache.

We can use the added shmem_confirm_swap() function to replace the
find_get_page+page_cache_release we were already doing on the error path.
And add a comment on that -EEXIST: it seems a peculiar errno to be using,
but originates from its use in radix_tree_insert().

[It can be surprising to see positive rss left in a memcg's memory.stat
after all tasks have gone, since it is supposed to count anonymous but not
shmem.  Aside from sharing anon pages via fork with a task in some other
memcg, it often happens after swapping: because a swap page can't be freed
while under writeback, nor while locked.  So it's not an error, and these
residual pages are easily freed once pressure demands.]

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tmpfs: revert SEEK_DATA and SEEK_HOLE

Revert 4fb5ef089b28 ("tmpfs: support SEEK_DATA and SEEK_HOLE").  I believe
it's correct, and it's been nice to have from rc1 to rc6; but as the
original commit said:

I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
would be of any use to them on tmpfs.  This code adds 92 lines and 752
bytes on x86_64 - is that bloat or worthwhile?

Nobody asked for it, so I conclude that it's bloat: let's revert tmpfs to
the dumb generic support for v3.5.  We can always reinstate it later if
useful, and anyone needing it in a hurry can just get it out of git.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-twl.c: fix threaded IRQ to use IRQF_ONESHOT

Requesting a threaded interrupt without a primary handler and without
IRQF_ONESHOT is dangerous, and after commit 1c6c6952 ("genirq: Reject
bogus threaded irq requests"), these requests are rejected. This causes
->probe() to fail, and the RTC driver not to be availble.

To fix, add IRQF_ONESHOT to the IRQ flags.

Tested on OMAP3730/OveroSTORM and OMAP4430/Panda board using rtcwake to
wake from system suspend multiple times.

Signed-off-by: Kevin Hilman <khilman@ti.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fat: fix non-atomic NFS i_pos read

fat_encode_fh() can fetch an invalid i_pos value on systems where 64-bit
accesses are not atomic. Make it use the same accessor as the rest of the
FAT code.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: add OMAP CPUfreq driver to OMAP Power Management section

Add the OMAP CPUFreq driver to the list of files in the OMAP Power
Management section.

I've already been maintaining this driver, this just makes it official.

Signed-off-by: Kevin Hilman <khilman@ti.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>