git.proxmox.com Git - mirror_ubuntu-jammy-kernel.git/log

memcg: check that kmem_cache has memcg_params before accessing it

If the system had a few memory groups and all of them were destroyed,
memcg_limited_groups_array_size has non-zero value, but all new caches
are created without memcg_params, because memcg_kmem_enabled() returns
false.

We try to enumirate child caches in a few places and all of them are
potentially dangerous.

For example my kernel is compiled with CONFIG_SLAB and it crashed when I
tryed to mount a NFS share after a few experiments with kmemcg.

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: [<ffffffff8118166a>] do_tune_cpucache+0x8a/0xd0
  PGD b942a067 PUD b999f067 PMD 0
  Oops: 0000 [#1] SMP
  Modules linked in: fscache(+) ip6table_filter ip6_tables iptable_filter ip_tables i2c_piix4 pcspkr virtio_net virtio_balloon i2c_core floppy
  CPU: 0 PID: 357 Comm: modprobe Not tainted 3.11.0-rc7+ #59
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff8800b9f98240 ti: ffff8800ba32e000 task.ti: ffff8800ba32e000
  RIP: 0010:[<ffffffff8118166a>]  [<ffffffff8118166a>] do_tune_cpucache+0x8a/0xd0
  RSP: 0018:ffff8800ba32fb70  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
  RDX: 0000000000000000 RSI: ffff8800b9f98910 RDI: 0000000000000246
  RBP: ffff8800ba32fba0 R08: 0000000000000002 R09: 0000000000000004
  R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000010
  R13: 0000000000000008 R14: 00000000000000d0 R15: ffff8800375d0200
  FS:  00007f55f1378740(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007f24feba57a0 CR3: 0000000037b51000 CR4: 00000000000006f0
  Call Trace:
    enable_cpucache+0x49/0x100
    setup_cpu_cache+0x215/0x280
    __kmem_cache_create+0x2fa/0x450
    kmem_cache_create_memcg+0x214/0x350
    kmem_cache_create+0x2b/0x30
    fscache_init+0x19b/0x230 [fscache]
    do_one_initcall+0xfa/0x1b0
    load_module+0x1c41/0x26d0
    SyS_finit_module+0x86/0xb0
    system_call_fastpath+0x16/0x1b

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/base/memory.c: fix show_mem_removable() to handle missing sections

"cat /sys/devices/system/memory/memory*/removable" crashed the system.

The problem is that show_mem_removable() is passing a
bad pfn to is_mem_section_removable(), which causes

    if (!node_online(page_to_nid(page)))

to blow up.  Why is it passing in a bad pfn?

The reason is that show_mem_removable() will loop sections_per_block
times.  sections_per_block is 16, but mem->section_count is 8,
indicating holes in this memory block.  Checking that the memory section
is present before checking to see if the memory section is removable
fixes the problem.

   harp5-sys:~ # cat /sys/devices/system/memory/memory*/removable
   0
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   BUG: unable to handle kernel paging request at ffffea00c3200000
   IP: [<ffffffff81117ed1>] is_pageblock_removable_nolock+0x1/0x90
   PGD 83ffd4067 PUD 37bdfce067 PMD 0
   Oops: 0000 [#1] SMP
   Modules linked in: autofs4 binfmt_misc rdma_ucm rdma_cm iw_cm ib_addr ib_srp scsi_transport_srp scsi_tgt ib_ipoib ib_cm ib_uverbs ib_umad iw_cxgb3 cxgb3 mdio mlx4_en mlx4_ib ib_sa mlx4_core ib_mthca ib_mad ib_core fuse nls_iso8859_1 nls_cp437 vfat fat joydev loop hid_generic usbhid hid hwperf(O) numatools(O) dm_mod iTCO_wdt ipv6 iTCO_vendor_support igb i2c_i801 ioatdma i2c_algo_bit ehci_pci pcspkr lpc_ich i2c_core ehci_hcd ptp sg mfd_core dca rtc_cmos pps_core mperf button xhci_hcd sd_mod crc_t10dif usbcore usb_common scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh_rdac scsi_dh gru(O) xvma(O) xfs crc32c libcrc32c thermal sata_nv processor piix mptsas mptscsih scsi_transport_sas mptbase megaraid_sas fan thermal_sys hwmon ext3 jbd ata_piix ahci libahci libata scsi_mod
   CPU: 4 PID: 5991 Comm: cat Tainted: G           O 3.11.0-rc5-rja-uv+ #10
   Hardware name: SGI UV2000/ROMLEY, BIOS SGI UV 2000/3000 series BIOS 01/15/2013
   task: ffff88081f034580 ti: ffff880820022000 task.ti: ffff880820022000
   RIP: 0010:[<ffffffff81117ed1>]  [<ffffffff81117ed1>] is_pageblock_removable_nolock+0x1/0x90
   RSP: 0018:ffff880820023df8  EFLAGS: 00010287
   RAX: 0000000000040000 RBX: ffffea00c3200000 RCX: 0000000000000004
   RDX: ffffea00c30b0000 RSI: 00000000001c0000 RDI: ffffea00c3200000
   RBP: ffff880820023e38 R08: 0000000000000000 R09: 0000000000000001
   R10: 0000000000000000 R11: 0000000000000001 R12: ffffea00c33c0000
   R13: 0000160000000000 R14: 6db6db6db6db6db7 R15: 0000000000000001
   FS:  00007ffff7fb2700(0000) GS:ffff88083fc80000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: ffffea00c3200000 CR3: 000000081b954000 CR4: 00000000000407e0
   Call Trace:
     show_mem_removable+0x41/0x70
     dev_attr_show+0x2a/0x60
     sysfs_read_file+0xf7/0x1c0
     vfs_read+0xc8/0x130
     SyS_read+0x5d/0xa0
     system_call_fastpath+0x16/0x1b

Signed-off-by: Russ Anderson <rja@sgi.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

IPC: bugfix for msgrcv with msgtyp < 0

According to 'man msgrcv': "If msgtyp is less than 0, the first message of
the lowest type that is less than or equal to the absolute value of msgtyp
shall be received."

Bug: The kernel only returns a message if its type is 1; other messages
with type < abs(msgtype) will never get returned.

Fix: After having traversed the list to find the first message with the
lowest type, we need to actually return that message.

This regression was introduced by commit daaf74cf0867 ("ipc: refactor
msg list search into separate function")

Signed-off-by: Svenning Soerensen <sss@secomea.dk>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Omnikey Cardman 4000: pull in ioctl.h in user header

This file uses the ioctl helpers (_IOR/_IOW/etc...), so include ioctl.h
for the definitions.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Harald Welte <laforge@gnumonks.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

timer_list: correct the iterator for timer_list

Correct an issue with /proc/timer_list reported by Holger.

When reading from the proc file with a sufficiently small buffer, 2k so
not really that small, there was one could get hung trying to read the
file a chunk at a time.

The timer_list_start function failed to account for the possibility that
the offset was adjusted outside the timer_list_next.

Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Reported-by: Holger Hans Peter Freyther <holger@freyther.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Berke Durak <berke.durak@xiphos.com>
Cc: Jeff Layton <jlayton@redhat.com>
Tested-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org> # 3.10.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'regmap-v3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
"Two changes here:

   - Fix a bug in the rbtree code which could cause it to create two
     different cache entries for the same register by adding a single
     register at a time to the cache.  This isn't awesome for
     performance but it's non-invasive which we need for this late in
     the release cycle and the I/O costs we're trying to avoid are high.

   - Add another header used in the !CONFIG_REGMAP stubs where we had
     been relying on implicit inclusion"

* tag 'regmap-v3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: rbtree: Fix overlapping rbnodes.
  regmap: Add another missing header for !CONFIG_REGMAP stubs

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

Pull powerpc fixes from Ben Herrenschmidt:
"Here are 3 bug fixes that should probably go into 3.11 since I'm also
  tagging them for stable.

  Once fixes our old /proc/powerpc/lparcfg file which provides partition
  informations when running under our hypervisor and also acts as a
  user-triggerable Oops when hot :-(

  The other two respectively are a one liner to fix a HVSI protocol
  handshake problem causing the console to fail to show up on a bunch of
  machines until we reach userspace, which I deem annoying enough to
  warrant going to stable, and a nasty gcc miscompile causing us to pass
  virtual instead of physical addresses to the firmware under some
  circumstances"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/hvsi: Increase handshake timeout from 200ms to 400ms.
  powerpc: Work around gcc miscompilation of __pa() on 64-bit
  powerpc: Don't Oops when accessing /proc/powerpc/lparcfg without hypervisor

mm: move_ptes -- Set soft dirty bit depending on pte type

Dave reported corrupted swap entries

| [ 4588.541886] swap_free: Unused swap offset entry 00002d15
| [ 4588.541952] BUG: Bad page map in process trinity-kid12 pte:005a2a80 pmd:22c01f067

and Hugh pointed that in move_ptes _PAGE_SOFT_DIRTY bit set regardless
the type of entry pte consists of. The trick here is that when we carry
soft dirty status in swap entries we are to use _PAGE_SWP_SOFT_DIRTY
instead, because this is the only place in pte which can be used for own
needs without intersecting with bits owned by swap entry type/offset.

Reported-and-tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Analyzed-by: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

powerpc/hvsi: Increase handshake timeout from 200ms to 400ms.

This solves a problem observed in kexec'ed kernel where 200ms timeout is
too short and bootconsole fails to initialize. Console did eventually
become workable but much later into the boot process.

Observed timeout was around 260ms, but I decided to make it a little bigger
for more reliability.

This has been tested on Power7 machine with Petitboot as a primary
bootloader and PowerNV firmware.

CC: <stable@vger.kernel.org>
Signed-off-by: Eugene Surovegin <surovegin@google.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Work around gcc miscompilation of __pa() on 64-bit

On 64-bit, __pa(&static_var) gets miscompiled by recent versions of
gcc as something like:

        addis 3,2,.LANCHOR1+4611686018427387904@toc@ha
        addi 3,3,.LANCHOR1+4611686018427387904@toc@l

This ends up effectively ignoring the offset, since its bottom 32 bits
are zero, and means that the result of __pa() still has 0xC in the top
nibble.  This happens with gcc 4.8.1, at least.

To work around this, for 64-bit we make __pa() use an AND operator,
and for symmetry, we make __va() use an OR operator.  Using an AND
operator rather than a subtraction ends up with slightly shorter code
since it can be done with a single clrldi instruction, whereas it
takes three instructions to form the constant (-PAGE_OFFSET) and add
it on.  (Note that MEMORY_START is always 0 on 64-bit.)

CC: <stable@vger.kernel.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Don't Oops when accessing /proc/powerpc/lparcfg without hypervisor

/proc/powerpc/lparcfg is an ancient facility (though still actively used)
which allows access to some informations relative to the partition when
running underneath a PAPR compliant hypervisor.

It makes no sense on non-pseries machines. However, currently, not only
can it be created on these if the kernel has pseries support, but accessing
it on such a machine will crash due to trying to do hypervisor calls.

In fact, it should also not do HV calls on older pseries that didn't have
an hypervisor either.

Finally, it has the plumbing to be a module but is a "bool" Kconfig option.

This fixes the whole lot by turning it into a machine_device_initcall
that is only created on pseries, and adding the necessary hypervisor
check before calling the H_GET_EM_PARMS hypercall

CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Merge tag 'usb-3.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB bugfix from Greg KH:
"Here is a single bugfix that resolves the "can not build the OHCI
  driver with CONFIG_PM disabled" problem that lots of people have been
  reporting with 3.11-rc7.  Sorry about that one, it missed my build
  tests, and it seems, a number of others as well.

  Thank goodness for Guenter :)"

* tag 'usb-3.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: OHCI: fix build error related to ohci_suspend/resume

Merge tag 'jfs-3.11-rc8' of git://github.com/kleikamp/linux-shaggy

Pull jfs fix from Dave Kleikamp:
"One JFS patch to fix an incompatibility with NFSv4 resulting in the
nfs client reporting a readdir loop"

* tag 'jfs-3.11-rc8' of git://github.com/kleikamp/linux-shaggy:
jfs: fix readdir cookie incompatibility with NFSv4

USB: OHCI: fix build error related to ohci_suspend/resume

Commit 9a11899c5e69 (USB: OHCI: add missing PCI PM callbacks to
ohci-pci.c) added missing ohci_suspend and ohci_resume callback
pointers, but forgot that these callbacks are declared and defined
only when CONFIG_PM is enabled.

This patch adds a preprocessor conditional to avoid build errors when
PM is disabled.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Meelis Roos <mroos@linux.ee>,
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux 3.11-rc7

Merge tag 'staging-3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging fixes from Greg KH:
"Here are two tiny staging tree fixes (well, one is for an iio driver,
  but those updates come through the staging tree due to dependancies)

  One fixes a problem with an IIO driver, and the other fixes a bug in
  the comedi driver core"

* tag 'staging-3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: comedi: bug-fix NULL pointer dereference on failed attach
  iio: adjd_s311: Fix non-scan mode data read

Merge tag 'usb-3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are two USB fixes for 3.11-rc7

  One fixes a reported regression in the OHCI driver, and the other
  fixes a reported build breakage in the USB phy drivers"

* tag 'usb-3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: phy: fix build breakage
  USB: OHCI: add missing PCI PM callbacks to ohci-pci.c

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM fixes from Russell King:
"This round of fixes is smaller than previous: a couple more updates
  for the security fixes, and a one-liner kexec fix"

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7816/1: CONFIG_KUSER_HELPERS: fix help text
  ARM: 7815/1: kexec: offline non panic CPUs on Kdump panic
  ARM: 7819/1: fiq: Cast the first argument of flush_icache_range()

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
"Assorted fixes from the last week or so"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: collect_mounts() should return an ERR_PTR
  bfs: iget_locked() doesn't return an ERR_PTR
  efs: iget_locked() doesn't return an ERR_PTR()
  proc: kill the extra proc_readfd_common()->dir_emit_dots()
  cope with potentially long ->d_dname() output for shmem/hugetlb

Merge tag 'acpi-3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"I really hoped that it wouldn't be necessary to change anything in
  ACPI at this point, but it turns out that we need to revert one more
  ACPI video commit causing trouble.

  This reverts a change in the ACPI video driver that caused the ACPI
  backlight initialization to be carried out even if acpi_backlight=vendor
  is passed in the kernel command line which turns out to break things
  at least on one system"

* tag 'acpi-3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "ACPI / video: Always call acpi_video_init_brightness() on init"

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"This is a set of small bug fixes for lpfc and zfcp and a fix for a
  fairly nasty bug in sg where a process which cancels I/O completes in
  a kernel thread which would then try to write back to the now gone
  userspace and end up writing to a random kernel address instead"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  [SCSI] zfcp: remove access control tables interface (keep sysfs files)
  [SCSI] zfcp: fix schedule-inside-lock in scsi_device list loops
  [SCSI] zfcp: fix lock imbalance by reworking request queue locking
  [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal
  [SCSI] lpfc: Don't force CONFIG_GENERIC_CSUM on

ARC: [lib] strchr breakage in Big-endian configuration

For a search buffer, 2 byte aligned, strchr() was returning pointer
outside of buffer (buf - 1)

------------->8----------------
    // Input buffer (default 4 byte aigned)
    char *buffer = "1AA_";

    // actual search start (to mimick 2 byte alignment)
    char *current_line = &(buffer[2]);

    // Character to search for
    char c = 'A';

    char *c_pos = strchr(current_line, c);

    printf("%s\n", c_pos) --> 'AA_' as oppose to 'A_'
------------->8----------------

Reported-by: Anton Kolesov <Anton.Kolesov@synopsys.com>
Debugged-by: Anton Kolesov <Anton.Kolesov@synopsys.com>
Cc: <stable@vger.kernel.org> # [3.9 and 3.10]
Cc: Noam Camus <noamc@ezchip.com>
Signed-off-by: Joern Rennecke <joern.rennecke@embecosm.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

VFS: collect_mounts() should return an ERR_PTR

This should actually be returning an ERR_PTR on error instead of NULL.
That was how it was designed and all the callers expect it.

[AV: actually, that's what "VFS: Make clone_mnt()/copy_tree()/collect_mounts()
return errors" missed - originally collect_mounts() was expected to return
NULL on failure]

Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bfs: iget_locked() doesn't return an ERR_PTR

iget_locked() returns a NULL on error, it doesn't return an ERR_PTR.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

efs: iget_locked() doesn't return an ERR_PTR()

The iget_locked() function returns NULL on error and never an ERR_PTR.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

proc: kill the extra proc_readfd_common()->dir_emit_dots()

proc_readfd_common() does dir_emit_dots() twice in a row,
we need to do this only once.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cope with potentially long ->d_dname() output for shmem/hugetlb

dynamic_dname() is both too much and too little for those - the
output may be well in excess of 64 bytes dynamic_dname() assumes
to be enough (thanks to ashmem feeding really long names to
shmem_file_setup()) and vsnprintf() is an overkill for those
guys.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge branch 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

Pull libata fixes from Tejun Heo:
"This contains three commits all of which are updates for specific
  devices which aren't too widespread.  Pretty limited scope and nothing
  too interesting or dangerous"

* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  sata_fsl: save irqs while coalescing
  libata: apply behavioral quirks to sil3826 PMP
  sata, highbank: fix ordering of SGPIO signals

Merge branch 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fix from Tejun Heo:
"A late fix for cgroup.

  This fixes a behavior regression visible to userland which was created
  by a commit merged during -rc1.  While the behavior change isn't too
  likely to be noticeable, the fix is relatively low risk and we'll need
  to backport it through -stable anyway if the bug gets released"

* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cpuset: fix a regression in validating config change

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Ben was on holidays for a week so a few nouveau regression fixes
  backed up, but they all seem necessary.

  Otherwise one i915 and one gma500 fix"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  gma500: Fix SDVO turning off randomly
  drm/nv04/disp: fix framebuffer pin refcounting
  drm/nouveau/mc: fix race condition between constructor and request_irq()
  drm/nouveau: fix reclocking on nv40
  drm/nouveau/ltcg: fix allocating memory as free
  drm/nouveau/ltcg: fix ltcg memory initialization after suspend
  drm/nouveau/fb: fix null derefs in nv49 and nv4e init
  drm/i915: Invalidate TLBs for the rings after a reset

usb: phy: fix build breakage

Commit 94ae9843 (usb: phy: rename all phy drivers to phy-$name-usb.c)
renamed drivers/usb/phy/otg_fsm.h to drivers/usb/phy/phy-fsm-usb.h
but changed drivers/usb/phy/phy-fsm-usb.c to include not existing
"phy-otg-fsm.h" instead of new "phy-fsm-usb.h". This breaks building:
  ...
  drivers/usb/phy/phy-fsm-usb.c:32:25: fatal error: phy-otg-fsm.h: No such file or directory
  compilation terminated.
  make[3]: *** [drivers/usb/phy/phy-fsm-usb.o] Error 1

This commit also missed to modify drivers/usb/phy/phy-fsl-usb.h
to include new "phy-fsm-usb.h" instead of "otg_fsm.h" resulting
in another build breakage:
  ...
  In file included from drivers/usb/phy/phy-fsl-usb.c:46:0:
  drivers/usb/phy/phy-fsl-usb.h:18:21: fatal error: otg_fsm.h: No such file or directory
  compilation terminated.
  make[3]: *** [drivers/usb/phy/phy-fsl-usb.o] Error 1

Fix both issues.

Signed-off-by: Anatolij Gustschin <agust@denx.de>
Cc: stable <stable@vger.kernel.org> # 3.10+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

USB: OHCI: add missing PCI PM callbacks to ohci-pci.c

Commit c1117afb8589 (USB: OHCI: make ohci-pci a separate driver)
neglected to preserve the entries for the pci_suspend and pci_resume
driver callbacks. As a result, OHCI controllers don't work properly
during suspend and after hibernation.

This patch adds the missing callbacks to the driver.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Steve Cotton <steve@s.cotton.clara.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

staging: comedi: bug-fix NULL pointer dereference on failed attach

Commit dcd7b8bd63cb81c5b973bf86510ca3c80bbbd162 ("staging: comedi: put
module _after_ detach" by myself) reversed a couple of calls in
`comedi_device_attach()` when recovering from an error returned by the
low-level driver's 'attach' handler. Unfortunately, that introduced a
NULL pointer dereference bug as `dev->driver` is NULL after the call to
`comedi_device_detach()`. We still have a pointer to the low-level
comedi driver structure in the `driv` variable, so use that instead.

Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Merge networking fixes from David Miller:

1) Revert Johannes Berg's genetlink locking fix, because it causes
    regressions.

    Johannes and Pravin Shelar are working on fixing things properly.

2) Do not drop ipv6 ICMP messages without a redirected header option,
    they are legal.  From Duan Jiong.

3) Missing error return propagation in probing of via-ircc driver.
    From Alexey Khoroshilov.

4) Do not clear out broadcast/multicast/unicast/WOL bits in r8169 when
    initializing, from Peter Wu.

5) realtek phy driver programs wrong interrupt status bit, from
    Giuseppe CAVALLARO.

6) Fix statistics regression in AF_PACKET code, from Willem de Bruijn.

7) Bridge code uses wrong bitmap length, from Toshiaki Makita.

8) SFC driver uses wrong indexes to look up MAC filters, from Ben
    Hutchings.

9) Don't pass stack buffers into usb control operations in hso driver,
    from Daniel Gimpelevich.

10) Multiple ipv6 fragmentation headers in one packet is illegal and
    such packets should be dropped, from Hannes Frederic Sowa.

11) When TCP sockets are "repaired" as part of checkpoint/restart, the
    timestamp field of SKBs need to be refreshed otherwise RTOs can be
    wildly off.  From Andrey Vagin.

12) Fix memcpy args (uses 'address of pointer' instead of 'pointer') in
    hostp driver.  From Dan Carpenter.

13) nl80211hdr_put() doesn't return an ERR_PTR, but some code believes
    it does.  From Dan Carpenter.

14) Fix regression in wireless SME disconnects, from Johannes Berg.

15) Don't use a stack buffer for DMA in zd1201 USB wireless driver, from
    Jussi Kivilinna.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
  ipv4: expose IPV4_DEVCONF
  ipv6: handle Redirect ICMP Message with no Redirected Header option
  be2net: fix disabling TX in be_close()
  Revert "genetlink: fix family dump race"
  hso: Fix stack corruption on some architectures
  hso: Earlier catch of error condition
  sfc: Fix lookup of default RX MAC filters when steered using ethtool
  bridge: Use the correct bit length for bitmap functions in the VLAN code
  packet: restore packet statistics tp_packets to include drops
  net: phy: rtl8211: fix interrupt on status link change
  r8169: remember WOL preferences on driver load
  via-ircc: don't return zero if via_ircc_open() failed
  macvtap: Ignore tap features when VNET_HDR is off
  macvtap: Correctly set tap features when IFF_VNET_HDR is disabled.
  macvtap: simplify usage of tap_features
  tcp: set timestamps for restored skb-s
  bnx2x: set VF DMAE when first function has 0 supported VFs
  bnx2x: Protect against VFs' ndos when SR-IOV is disabled
  bnx2x: prevent VF benign attentions
  bnx2x: Consider DCBX remote error
  ...

Merge branch 'akpm' (patches from Andrew Morton)

Merge fixes from Andrew Morton:
"A few fixes.  One is a licensing change and I don't do licensing, so
  please eyeball that one"

Licensing eye-balled.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  lib/lz4: correct the LZ4 license
  memcg: get rid of swapaccount leftovers
  nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection
  nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error
  drivers/platform/olpc/olpc-ec.c: initialise earlier

lib/lz4: correct the LZ4 license

The LZ4 code is listed as using the "BSD 2-Clause License".

Signed-off-by: Richard Laager <rlaager@wiktel.com>
Acked-by: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: Chanho Min <chanho.min@lge.com>
Cc: Richard Yao <ryao@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ The 2-clause BSD can be just converted into GPL, but that's rude and
pointless, so don't do it - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: get rid of swapaccount leftovers

The swapaccount kernel parameter without any values has been removed by
commit a2c8990aed5a ("memsw: remove noswapaccount kernel parameter") but
it seems that we didn't get rid of all the left overs.

Make sure that menuconfig help text and kernel-parameters.txt are clear
about value for the paramter and remove the stalled comment which is not
very much useful on its own.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Gergely Risko <gergely@risko.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

nilfs2: fix issue with counting number of bio requests for BIO_EOPNOTSUPP error detection

Fix the issue with improper counting number of flying bio requests for
BIO_EOPNOTSUPP error detection case.

The sb_nbio must be incremented exactly the same number of times as
complete() function was called (or will be called) because
nilfs_segbuf_wait() will call wail_for_completion() for the number of
times set to sb_nbio:

  do {
      wait_for_completion(&segbuf->sb_bio_event);
  } while (--segbuf->sb_nbio > 0);

Two functions complete() and wait_for_completion() must be called the
same number of times for the same sb_bio_event.  Otherwise,
wait_for_completion() will hang or leak.

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

nilfs2: remove double bio_put() in nilfs_end_bio_write() for BIO_EOPNOTSUPP error

Remove double call of bio_put() in nilfs_end_bio_write() for the case of
BIO_EOPNOTSUPP error detection. The issue was found by Dan Carpenter
and he suggests first version of the fix too.

Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/platform/olpc/olpc-ec.c: initialise earlier

Being a low-level component, various drivers (e.g.  olpc-battery) assume
that it is ok to communicate with the OLPC Embedded Controller during
probe.  Therefore the OLPC EC driver must be initialised before other
drivers try to use it.  This was the case until it was recently moved
out of arch/x86 and restructured around commits ac2504151f5a ("Platform:
OLPC: turn EC driver into a platform_driver") and 85f90cf6ca56 ("x86:
OLPC: switch over to using new EC driver on x86").

Use arch_initcall so that olpc-ec is readied earlier, matching the
previous behaviour.

Fixes a regression introduced in Linux-3.6 where various drivers such as
olpc-battery and olpc-xo1-sci failed to load due to an inability to
communicate with the EC.  The user-visible effect was a lack of battery
monitoring, missing ebook/lid switch input devices, etc.

Signed-off-by: Daniel Drake <dsd@laptop.org>
Cc: Andres Salomon <dilinger@queued.net>
Cc: Paul Fox <pgf@laptop.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'drm-intel-fixes-2013-08-23' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes

Just one patch that soaked for quite a bit to fix a resume issue,
resulting in gpu hangs (or worse) due to tlb containing garbage.

* tag 'drm-intel-fixes-2013-08-23' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: Invalidate TLBs for the rings after a reset

ipv4: expose IPV4_DEVCONF

IP sends device configuration (see inet_fill_link_af) as an array
in the netlink information, but the indices in that array are not
exposed to userspace through any current santized header file.

It was available back in 2.6.32 (in /usr/include/linux/sysctl.h)
but was broken by:
  commit 02291680ffba92e5b5865bc0c5e7d1f3056b80ec
  Author: Eric W. Biederman <ebiederm@xmission.com>
  Date:   Sun Feb 14 03:25:51 2010 +0000

    net ipv4: Decouple ipv4 interface parameters from binary sysctl numbers

Eric was solving the sysctl problem but then the indices were re-exposed
by a later addition of devconf support for IPV4

  commit 9f0f7272ac9506f4c8c05cc597b7e376b0b9f3e4
  Author: Thomas Graf <tgraf@infradead.org>
  Date:   Tue Nov 16 04:32:48 2010 +0000

    ipv4: AF_INET link address family

Putting them in /usr/include/linux/ip.h seemed the logical match
for the DEVCONF_ definitions for IPV6 in /usr/include/linux/ip6.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: handle Redirect ICMP Message with no Redirected Header option

rfc 4861 says the Redirected Header option is optional, so
the kernel should not drop the Redirect Message that has no
Redirected Header option. In this patch, the function
ip6_redirect_no_header() is introduced to deal with that
condition.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

be2net: fix disabling TX in be_close()

commit fba875591 ("disable TX in be_close()") disabled TX in be_close()
to protect be_xmit() from touching freed up queues in the AER recovery
flow. But, TX must be disabled *before* cleaning up TX completions in
the close() path, not after. This allows be_tx_compl_clean() to free up
all TX-req skbs that were notified to the HW.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "ACPI / video: Always call acpi_video_init_brightness() on init"

Revert commit c04c697 (ACPI / video: Always call acpi_video_init_brightness()
on init), because it breaks eDP backlight at 1920x1080 on Acer Aspire S3
for Trevor Bortins.

References: https://bugs.freedesktop.org/show_bug.cgi?id=68355
Reported-and-bisected-by: Trevor Bortins <enabfluw@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge branch 'sfc-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc

Merge in a fix for RX MAC address filter programming bug in the sfc
driver.

Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "genetlink: fix family dump race"

This reverts commit 58ad436fcf49810aa006016107f494c9ac9013db.

It turns out that the change introduced a potential deadlock
by causing a locking dependency with netlink's cb_mutex. I
can't seem to find a way to resolve this without doing major
changes to the locking, so revert this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'linux-next' of git://cavan.codon.org.uk/platform-drivers-x86

Pull x86 platform driver fixes from Matthew Garrett:
"Three trivial fixes - the first reverts a patch that's broken some
  other devices (again - I'm trying to figure out a clean way to
  implement this), the other two fix minor issues in the sony-laptop
  driver"

* 'linux-next' of git://cavan.codon.org.uk/platform-drivers-x86:
  Revert "hp-wmi: Enable hotkeys on some systems"
  sony-laptop: Fix reporting of gfx_switch_status
  sony-laptop: return a negative error code in sonypi_compat_init()

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"A handful of fixes for 3.11 are still trickling in.  These are:
   - A couple of fixes for older OMAP platforms
   - Another few fixes for at91 (lateish due to European summer
     vacations)
   - A late-found problem with USB on Tegra, fix is to keep VBUS
     regulator on at all times
   - One fix for Exynos 5440 dealing with CPU detection
   - One MAINTAINERS update"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: tegra: always enable USB VBUS regulators
  ARM: davinci: nand: specify ecc strength
  ARM: OMAP: rx51: change musb mode to OTG
  ARM: OMAP2: fix musb usage for n8x0
  MAINTAINERS: Update email address for Benoit Cousson
  ARM: at91/DT: fix at91sam9n12ek memory node
  ARM: at91: add missing uart clocks DT entries
  ARM: SAMSUNG: fix to support for missing cpu specific map_io
  ARM: at91/DT: at91sam9x5ek: fix USB host property to enable port C

Merge tag 'devicetree-fixes-for-3.11' of git://sources.calxeda.com/kernel/linux

Pull device tree fix from Rob Herring:
"For DT unflattening, add missing memory initialization.

  This is needed for arches like PPC that use memblock_alloc.  This
  appears to have been an issue for some time, but is a somewhat limited
  usecase of OF_DYNAMIC"

* tag 'devicetree-fixes-for-3.11' of git://sources.calxeda.com/kernel/linux:
  of: fdt: fix memory initialization for expanded DT

Merge tag 'dm-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fix from Mike Snitzer:
"A patch to fix dm-cache-policy-mq's remove_mapping() conflict with
sparc32"

* tag 'dm-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: avoid conflicting remove_mapping() in mq policy

x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member

This is the updated version of df54d6fa5427 ("x86 get_unmapped_area():
use proper mmap base for bottom-up direction") that only randomizes the
mmap base address once.

Signed-off-by: Radu Caragea <sinaelgl@gmail.com>
Reported-and-tested-by: Jeff Shorey <shoreyjeff@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Adrian Sendroiu <molecula2788@gmail.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Revert "x86 get_unmapped_area(): use proper mmap base for bottom-up direction"

This reverts commit df54d6fa54275ce59660453e29d1228c2b45a826.

The commit isn't necessarily wrong, but because it recalculates the
random mmap_base every time, it seems to confuse user memory allocators
that expect contiguous mmap allocations even when the mmap address isn't
specified.

In particular, the MATLAB Java runtime seems to be unhappy. See

https://bugzilla.kernel.org/show_bug.cgi?id=60774

So we'll want to apply the random offset only once, and Radu has a patch
for that. Revert this older commit in order to apply the other one.

Reported-by: Jeff Shorey <shoreyjeff@gmail.com>
Cc: Radu Caragea <sinaelgl@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[SCSI] zfcp: remove access control tables interface (keep sysfs files)

By popular demand, this patch brings back a couple of sysfs attributes
removed by commit 663e0890e31cb85f0cca5ac1faaee0d2d52880b5
"[SCSI] zfcp: remove access control tables interface".
The content has been irrelevant for years, but the files must be
there forever for whatever user space tools that may rely on them.

Since these files always return a constant value, a new stripped
down show-macro was required. Otherwise build warnings would have
been introduced.

Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

[SCSI] zfcp: fix schedule-inside-lock in scsi_device list loops

BUG: sleeping function called from invalid context at kernel/workqueue.c:2752
in_atomic(): 1, irqs_disabled(): 1, pid: 360, name: zfcperp0.0.1700
CPU: 1 Not tainted 3.9.3+ #69
Process zfcperp0.0.1700 (pid: 360, task: 0000000075b7e080, ksp: 000000007476bc30)
<snip>
Call Trace:
([<00000000001165de>] show_trace+0x106/0x154)
[<00000000001166a0>] show_stack+0x74/0xf4
[<00000000006ff646>] dump_stack+0xc6/0xd4
[<000000000017f3a0>] __might_sleep+0x128/0x148
[<000000000015ece8>] flush_work+0x54/0x1f8
[<00000000001630de>] __cancel_work_timer+0xc6/0x128
[<00000000005067ac>] scsi_device_dev_release_usercontext+0x164/0x23c
[<0000000000161816>] execute_in_process_context+0x96/0xa8
[<00000000004d33d8>] device_release+0x60/0xc0
[<000000000048af48>] kobject_release+0xa8/0x1c4
[<00000000004f4bf2>] __scsi_iterate_devices+0xfa/0x130
[<000003ff801b307a>] zfcp_erp_strategy+0x4da/0x1014 [zfcp]
[<000003ff801b3caa>] zfcp_erp_thread+0xf6/0x2b0 [zfcp]
[<000000000016b75a>] kthread+0xf2/0xfc
[<000000000070c9de>] kernel_thread_starter+0x6/0xc
[<000000000070c9d8>] kernel_thread_starter+0x0/0xc

Apparently, the ref_count for some scsi_device drops down to zero,
triggering device removal through execute_in_process_context(), while
the lldd error recovery thread iterates through a scsi device list.
Unfortunately, execute_in_process_context() decides to immediately
execute that device removal function, instead of scheduling asynchronous
execution, since it detects process context and thinks it is safe to do
so. But almost all calls to shost_for_each_device() in our lldd are
inside spin_lock_irq, even in thread context. Obviously, schedule()
inside spin_lock_irq sections is a bad idea.

Change the lldd to use the proper iterator function,
__shost_for_each_device(), in combination with required locking.

Occurences that need to be changed include all calls in zfcp_erp.c,
since those might be executed in zfcp error recovery thread context
with a lock held.

Other occurences of shost_for_each_device() in zfcp_fsf.c do not
need to be changed (no process context, no surrounding locking).

The problem was introduced in Linux 2.6.37 by commit
b62a8d9b45b971a67a0f8413338c230e3117dff5
"[SCSI] zfcp: Use SCSI device data zfcp_scsi_dev instead of zfcp_unit".

Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org #2.6.37+
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

[SCSI] zfcp: fix lock imbalance by reworking request queue locking

This patch adds wait_event_interruptible_lock_irq_timeout(), which is a
straight-forward descendant of wait_event_interruptible_timeout() and
wait_event_interruptible_lock_irq().

The zfcp driver used to call wait_event_interruptible_timeout()
in combination with some intricate and error-prone locking. Using
wait_event_interruptible_lock_irq_timeout() as a replacement
nicely cleans up that locking.

This rework removes a situation that resulted in a locking imbalance
in zfcp_qdio_sbal_get():

BUG: workqueue leaked lock or atomic: events/1/0xffffff00/10
last function: zfcp_fc_wka_port_offline+0x0/0xa0 [zfcp]

It was introduced by commit c2af7545aaff3495d9bf9a7608c52f0af86fb194
"[SCSI] zfcp: Do not wait for SBALs on stopped queue", which had a new
code path related to ZFCP_STATUS_ADAPTER_QDIOUP that took an early exit
without a required lock being held. The problem occured when a
special, non-SCSI I/O request was being submitted in process context,
when the adapter's queues had been torn down. In this case the bug
surfaced when the Fibre Channel port connection for a well-known address
was closed during a concurrent adapter shut-down procedure, which is a
rare constellation.

This patch also fixes these warnings from the sparse tool (make C=1):

drivers/s390/scsi/zfcp_qdio.c:224:12: warning: context imbalance in
'zfcp_qdio_sbal_check' - wrong count at exit
drivers/s390/scsi/zfcp_qdio.c:244:5: warning: context imbalance in
'zfcp_qdio_sbal_get' - unexpected unlock

Last but not least, we get rid of that crappy lock-unlock-lock
sequence at the beginning of the critical section.

It is okay to call zfcp_erp_adapter_reopen() with req_q_lock held.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org #2.6.35+
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

ARM: tegra: always enable USB VBUS regulators

This fixes a regression exposed during the merge window by commit
9f310de "ARM: tegra: fix VBUS regulator GPIO polarity in DT"; namely that
USB VBUS doesn't get turned on, so USB devices are not detected. This
affects the internal USB port on TrimSlice (i.e. the USB->SATA bridge, to
which the SSD is connected) and the external port(s) on Seaboard/
Springbank and Whistler.

The Tegra DT as written in v3.11 allows two paths to enable USB VBUS:

1) Via the legacy DT binding for the USB controller; it can directly
acquire a VBUS GPIO and activate it.

2) Via a regulator for VBUS, which is referenced by the new DT binding
for the USB controller.

Those two methods both use the same GPIO, and hence whichever of the
USB controller and regulator gets probed first ends up owning the GPIO.
In practice, the USB driver only supports path (1) above, since the
patches to support the new USB binding are not present until v3.12:-(

In practice, the regulator ends up being probed first and owning the
GPIO. Since nothing enables the regulator (the USB driver code is not
yet present), the regulator ends up being turned off. This originally
caused no problem, because the polarity in the regulator definition was
incorrect, so attempting to turn off the regulator actually turned it
on, and everything worked:-(

However, when testing the new USB driver code in v3.12, I noticed the
incorrect polarity and fixed it in commit 9f310de "ARM: tegra: fix VBUS
regulator GPIO polarity in DT". In the context of v3.11, this patch then
caused the USB VBUS to actually turn off, which broke USB ports with VBUS
control. I got this patch included in v3.11-rc1 since it fixed a bug in
device tree (incorrect polarity specification), and hence was suitable to
be included early in the rc series. I evidently did not test the patch at
all, or correctly, in the context of v3.11, and hence did not notice the
issue that I have explained above:-(

Fix this by making the USB VBUS regulators always enabled. This way, if
the regulator owns the GPIO, it will always be turned on, even if there
is no USB driver code to request the regulator be turned on. Even
ignoring this bug, this is a reasonable way to configure the HW anyway.

If this patch is applied to v3.11, it will cause a couple pretty trivial
conflicts in tegra20-{trimslice,seaboard}.dts when creating v3.12, since
the context right above the added lines changed in patches destined for
v3.12.

Reported-by: Kyle McMartin <kmcmarti@redhat.com>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

hso: Fix stack corruption on some architectures

As Sergei Shtylyov explained in the #mipslinux IRC channel:
[Mon 2013-08-19 12:28:21 PM PDT] <headless> guys, are you sure it's not "DMA off stack" case?
[Mon 2013-08-19 12:28:35 PM PDT] <headless> it's a known stack corruptor on non-coherent arches
[Mon 2013-08-19 12:31:48 PM PDT] <DonkeyHotei> headless: for usb/ehci?
[Mon 2013-08-19 12:34:11 PM PDT] <DonkeyHotei> headless: explain
[Mon 2013-08-19 12:35:38 PM PDT] <headless> usb_control_msg() (or other such func) should not use buffer on stack. DMA from/to stack is prohibited
[Mon 2013-08-19 12:35:58 PM PDT] <headless> and EHCI uses DMA on control xfers (as well as all the others)

Signed-off-by: Daniel Gimpelevich <daniel@gimpelevich.san-francisco.ca.us>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

hso: Earlier catch of error condition

There is no need to get an interface specification if we know it's the
wrong one.

Signed-off-by: Daniel Gimpelevich <daniel@gimpelevich.san-francisco.ca.us>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

of: fdt: fix memory initialization for expanded DT

Already existing property flags are filled wrong for properties created from
initial FDT. This could cause problems if this DYNAMIC device-tree functions
are used later, i.e. properties are attached/detached/replaced. Simply dumping
flags from the running system show, that some initial static (not allocated via
kzmalloc()) nodes are marked as dynamic.

I putted some debug extensions to property_proc_show(..) :
..
+       if (OF_IS_DYNAMIC(pp))
+               pr_err("DEBUG: xxx : OF_IS_DYNAMIC\n");
+       if (OF_IS_DETACHED(pp))
+               pr_err("DEBUG: xxx : OF_IS_DETACHED\n");

when you operate on the nodes (e.g.: ~$ cat /proc/device-tree/*some_node*) you
will see that those flags are filled wrong, basically in most cases it will dump
a DYNAMIC or DETACHED status, which is in not true.
(BTW. this OF_IS_DETACHED is a own define for debug purposes which which just
make a test_bit(OF_DETACHED, &x->_flags)

If nodes are dynamic kernel is allowed to kfree() them. But it will crash
attempting to do so on the nodes from FDT -- they are not allocated via
kzmalloc().

Signed-off-by: Wladislav Wiebe <wladislav.kw@gmail.com>
Acked-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Cc: stable@vger.kernel.org
Signed-off-by: Rob Herring <rob.herring@calxeda.com>

gma500: Fix SDVO turning off randomly

Some Poulsbo cards seem to incorrectly report SDVO_CMD_STATUS_TARGET_NOT_SPECIFIED instead of SDVO_CMD_STATUS_PENDING, which causes the display to be turned off.

Signed-off-by: Guillaume Clement <gclement@baobob.org>
Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'drm-nouveau-next' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes

regression fixes and null derefs and oops fixes.

* 'drm-nouveau-next' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
  drm/nv04/disp: fix framebuffer pin refcounting
  drm/nouveau/mc: fix race condition between constructor and request_irq()
  drm/nouveau: fix reclocking on nv40
  drm/nouveau/ltcg: fix allocating memory as free
  drm/nouveau/ltcg: fix ltcg memory initialization after suspend
  drm/nouveau/fb: fix null derefs in nv49 and nv4e init

Merge tag 'stable/for-linus-3.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
- On ARM did not have balanced calls to get/put_cpu.
- Fix to make tboot + Xen + Linux correctly.
- Fix events VCPU binding issues.
- Fix a vCPU online race where IPIs are sent to not-yet-online vCPU.

* tag 'stable/for-linus-3.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/smp: initialize IPI vectors before marking CPU online
  xen/events: mask events when changing their VCPU binding
  xen/events: initialize local per-cpu mask for all possible events
  x86/xen: do not identity map UNUSABLE regions in the machine E820
  xen/arm: missing put_cpu in xen_percpu_init

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull MIPS fix from Ralf Baechle:
"Just a single patch which fixes a special case in the MIPS FPU
  emulator which is always required, even on CPUs with FPU.  There is
  the rare special case that an FPU (or certain other instructions) in a
  branch delay slot is causing an exception and then the branch
  instruction will need to be emulated by the kernel before resuming
  execution.  This is working great except if the branch instruction is
  an Octeon BBIT instruction.

  The boring disclaimer - all MIPS defconfigs build tested and no
  regressions and runtime tested on Octeon, no known issues"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Handle OCTEON BBIT instructions in FPU emulator.

Merge tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64

Pull arm64 perf fixes from Catalin Marinas:
"Perf backend fixes for arm64 where the user can cause kernel panic
  (discovered with Vince's fuzzing tool)"

* tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
  arm64: perf: fix event validation for software group leaders
  arm64: perf: fix array out of bounds access in armpmu_map_hw_event()

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Fixes for ARM and aarch64.

  This pull request is coming a bit later than I would have preferred,
  because I and Gleb happened to have holidays around the same weeks of
  August...  sorry about that"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: ARM: Squash len warning
  arm64: KVM: use 'int' instead of 'u32' for variable 'target' in kvm_host.h.
  arm64: KVM: add missing dsb before invalidating Stage-2 TLBs
  arm64: KVM: perform save/restore of PAR_EL1
  arm64: KVM: fix 2-level page tables unmapping
  ARM: KVM: Fix unaligned unmap_range leak
  ARM: KVM: Fix 64-bit coprocessor handling

Merge tag 'pinctrl-for-v3.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pinctrl fixes from Linus Walleij:
"Fixes for the sunxi (AllWinner) pin control driver.  This was a new
  driver in this merge window, so some post-merge hardening is
  happening"

[ I had completely missed this pull request for some reason, it was sent
  over a week ago but my mailbox is chaotic ]

* tag 'pinctrl-for-v3.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: sunxi: Add spinlocks
  pinctrl: sunxi: Fix gpio_set behaviour
  pinctrl: sunxi: Read register before writing to it in irq_set_type

[SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
leads to one process writing data into the address space of some other
random unrelated process if the ioctl is interrupted by a signal.
What happens is the following:

- A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
   underlying SCSI command will transfer data from the SCSI device to
   the buffer provided in the ioctl)

- Before the command finishes, a signal is sent to the process waiting
   in the ioctl.  This will end up waking up the sg_ioctl() code:

result = wait_event_interruptible(sfp->read_wait,
(srp_done(sfp, srp) || sdp->detached));

   but neither srp_done() nor sdp->detached is true, so we end up just
   setting srp->orphan and returning to userspace:

srp->orphan = 1;
write_unlock_irq(&sfp->rq_list_lock);
return result; /* -ERESTARTSYS because signal hit process */

   At this point the original process is done with the ioctl and
   blithely goes ahead handling the signal, reissuing the ioctl, etc.

- Eventually, the SCSI command issued by the first ioctl finishes and
   ends up in sg_rq_end_io().  At the end of that function, we run through:

write_lock_irqsave(&sfp->rq_list_lock, iflags);
if (unlikely(srp->orphan)) {
if (sfp->keep_orphan)
srp->sg_io_owned = 0;
else
done = 0;
}
srp->done = done;
write_unlock_irqrestore(&sfp->rq_list_lock, iflags);

if (likely(done)) {
/* Now wake up any sg_read() that is waiting for this
* packet.
*/
wake_up_interruptible(&sfp->read_wait);
kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
kref_put(&sfp->f_ref, sg_remove_sfp);
} else {
INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
schedule_work(&srp->ew.work);
}

   Since srp->orphan *is* set, we set done to 0 (assuming the
   userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
   ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
   to run in a workqueue.

- In workqueue context we go through sg_rq_end_io_usercontext() ->
   sg_finish_rem_req() -> blk_rq_unmap_user() -> ... ->
   bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user().

   The key point here is that we are doing copy_to_user() on a
   workqueue -- that is, we're on a kernel thread with current->mm
   equal to whatever random previous user process was scheduled before
   this kernel thread.  So we end up copying whatever data the SCSI
   command returned to the virtual address of the buffer passed into
   the original ioctl, but it's quite likely we do this copying into a
   different address space!

As suggested by James Bottomley <James.Bottomley@hansenpartnership.com>,
add a check for current->mm (which is NULL if we're on a kernel thread
without a real userspace address space) in bio_uncopy_user(), and skip
the copy if we're on a kernel thread.

There's no reason that I can think of for any caller of bio_uncopy_user()
to want to do copying on a kernel thread with a random active userspace
address space.

Huge thanks to Costa Sapuntzakis <costa@purestorage.com> for the
original pointer to this bug in the sg code.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Tested-by: David Milburn <dmilburn@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

[SCSI] lpfc: Don't force CONFIG_GENERIC_CSUM on

We want ppc64 to be able to select between optimised assembly
checksum routines in big endian and the generic lib/checksum.c
routines in little endian.

The lpfc driver is forcing CONFIG_GENERIC_CSUM on which means
we are unable to make the decision to enable it in the arch
Kconfig. If the option exists it is always forced on.

This got introduced in 3.10 via commit 6a7252fdb0c3 ([SCSI] lpfc:
fix up Kconfig dependencies). I spoke to Randy about it and
the original issue was with CRC_T10DIF not being defined.

As such, remove the select of CONFIG_GENERIC_CSUM.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@vger.kernel.org> # 3.10
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

Merge remote-tracking branch 'regmap/fix/rbtree' into regmap-linus

Merge remote-tracking branch 'regmap/fix/header' into regmap-linus

regmap: rbtree: Fix overlapping rbnodes.

Avoid overlapping register regions by making the initial blklen of a new
node 1. If a register write occurs to a yet uncached register, that is
lower than but near an existing node's base_reg, a new node is created
and it's blklen is set to an arbitrary value (sizeof(*rbnode)). That may
cause this node to overlap with another node. Those nodes should be merged,
but this merge doesn't happen yet, so this patch at least makes the initial
blklen small enough to avoid hitting the wrong node, which may otherwise
lead to severe breakage.

Signed-off-by: David Jander <david@protonic.nl>
Signed-off-by: Mark Brown <broonie@linaro.org>
Cc: stable@vger.kernel.org

sfc: Fix lookup of default RX MAC filters when steered using ethtool

commit 385904f819e3 ('sfc: Don't use
efx_filter_{build,hash,increment}() for default MAC filters') used the
wrong name to find the index of default RX MAC filters at insertion/
update time. This could result in memory corruption and would in any
case silently fail to update the filter.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

cpuset: fix a regression in validating config change

It's not allowed to clear masks of a cpuset if there're tasks in it,
but it's broken:

  # mkdir /cgroup/sub
  # echo 0 > /cgroup/sub/cpuset.cpus
  # echo 0 > /cgroup/sub/cpuset.mems
  # echo $$ > /cgroup/sub/tasks
  # echo > /cgroup/sub/cpuset.cpus
  (should fail)

This bug was introduced by commit 88fa523bff295f1d60244a54833480b02f775152
("cpuset: allow to move tasks to empty cpusets").

tj: Dropped temp bool variables and nestes the conditionals directly.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

bridge: Use the correct bit length for bitmap functions in the VLAN code

The VLAN code needs to know the length of the per-port VLAN bitmap to
perform its most basic operations (retrieving VLAN informations, removing
VLANs, forwarding database manipulation, etc). Unfortunately, in the
current implementation we are using a macro that indicates the bitmap
size in longs in places where the size in bits is expected, which in
some cases can cause what appear to be random failures.
Use the correct macro.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

drm/nv04/disp: fix framebuffer pin refcounting

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/mc: fix race condition between constructor and request_irq()

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

drm/nouveau: fix reclocking on nv40

In commit 77145f1cbdf8d28b46ff8070ca749bad821e0774 was introduced
error which cause that reclocking on nv40 not working anymore.
There is missing assigment of return value from pll_calc to ret.

Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Martin Peres <martin.peres@labri.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/ltcg: fix allocating memory as free

Allocating type=0 marks the memory as free. This allows the ltcg memory
to be allocated twice.

Add a BUG_ON in core/mm.c to prevent this ever happening again.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/ltcg: fix ltcg memory initialization after suspend

Some registers were not initialized in init, this causes them to be
uninitialized after suspend.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

drm/nouveau/fb: fix null derefs in nv49 and nv4e init

Commit dceef5d87 (drm/nouveau/fb: initialise vram controller as pfb
sub-object) moved some code around and introduced these null derefs.
pfb->ram is set to the new ram object outside of this ctor.

Reported-by: Ronald Uitermark <ronald645@gmail.com>
Tested-by: Ronald Uitermark <ronald645@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless

John W. Linville says:

====================
Regarding the iwlwifi bits, Johannes says:

"We revert an rfkill bugfix that unfortunately caused more bugs, shuffle
some code to avoid touching the PCIe device before it's enabled and
disconnect if firmware fails to do our bidding. I also have Stanislaw's
fix to not crash in some channel switch scenarios."

As for the mac80211 bits, Johannes says:

"This time, I have one fix from Dan Carpenter for users of
nl80211hdr_put(), and one fix from myself fixing a regression with the
libertas driver."

Along with the above...

Dan Carpenter fixes some incorrectly placed "address of" operators
in hostap that caused copying of junk data.

Jussi Kivilinna corrects zd1201 to use an allocated buffer rather
than the stack for a URB operation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

packet: restore packet statistics tp_packets to include drops

getsockopt PACKET_STATISTICS returns tp_packets + tp_drops. Commit
ee80fbf301 ("packet: account statistics only in tpacket_stats_u")
cleaned up the getsockopt PACKET_STATISTICS code.
This also changed semantics. Historically, tp_packets included
tp_drops on return. The commit removed the line that adds tp_drops
into tp_packets.

This patch reinstates the old semantics.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: rtl8211: fix interrupt on status link change

This is to fix a problem in the rtl8211 where the driver
wasn't properly enabled the interrupt on link change status.
it has to enable the ineterrupt on the bit 10 in the register 18
(INER).

Reported-by: Sharma Bhupesh <B45370@freescale.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included change:
- Check if the skb has been correctly prepared before going on

r8169: remember WOL preferences on driver load

Do not clear Broadcast/Multicast/Unicast Wake Flag or LanWake in
Config5. This is necessary to preserve WOL state when the driver is
loaded. Although the r8168 vendor driver does not write Config5 (it has
been commented out), Hayes Wang from Realtek said that masking bits like
this is more sensible.

Signed-off-by: Peter Wu <lekensteyn@gmail.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

via-ircc: don't return zero if via_ircc_open() failed

If via_ircc_open() fails, data structures of the driver left uninitialized,
but probe (via_init_one()) returns zero. That can lead to null pointer dereference
in via_remove_one(), since it does not check drvdata for NULL.

The patch implements proper error code propagation.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: Ignore tap features when VNET_HDR is off

When the user turns off VNET_HDR support on the
macvtap device, there is no way to provide any
offload information to the user. So, it's safer
to ignore offload setting then depend on the user
setting them correctly.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: Correctly set tap features when IFF_VNET_HDR is disabled.

When the user turns off IFF_VNET_HDR flag, attempts to change
offload features via TUNSETOFFLOAD do not work. This could cause
GSO packets to be delivered to the user when the user is
not prepared to handle them.

To solve, allow processing of TUNSETOFFLOAD when IFF_VNET_HDR is
disabled.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: simplify usage of tap_features

In macvtap, tap_features specific the features of that the user
has specified via ioctl(). If we treat macvtap as a macvlan+tap
then we could all the tap a pseudo-device and give it other features
like SG and GSO. Then we can stop using the features of lower
device (macvlan) when forwarding the traffic the tap.

This solves the issue of possible checksum offload mismatch between
tap feature and macvlan features.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: set timestamps for restored skb-s

When the repair mode is turned off, the write queue seqs are
updated so that the whole queue is considered to be 'already sent.

The "when" field must be set for such skb. It's used in tcp_rearm_rto
for example. If the "when" field isn't set, the retransmit timeout can
be calculated incorrectly and a tcp connected can stop for two minutes
(TCP_RTO_MAX).

Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

MIPS: Handle OCTEON BBIT instructions in FPU emulator.

The branch emulation needs to handle the OCTEON BBIT instructions,
otherwise we get SIGILL instead of emulation.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5726/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

xen/smp: initialize IPI vectors before marking CPU online

An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with:

kernel BUG at drivers/xen/events.c:1328!

RCU has detected that a CPU has not entered a quiescent state within the
grace period.  It needs to send the CPU a reschedule IPI if it is not
offline.  rcu_implicit_offline_qs() does this check:

/*
* If the CPU is offline, it is in a quiescent state.  We can
* trust its state not to change because interrupts are disabled.
*/
if (cpu_is_offline(rdp->cpu)) {
rdp->offline_fqs++;
return 1;
}

Else the CPU is online.  Send it a reschedule IPI.

The CPU is in the middle of being hot-plugged and has been marked online
(!cpu_is_offline()).  See start_secondary():

set_cpu_online(smp_processor_id(), true);
...
per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;

start_secondary() then waits for the CPU bringing up the hot-plugged CPU to
mark it as active:

/*
* Wait until the cpu which brought this one up marked it
* online before enabling interrupts. If we don't do that then
* we can end up waking up the softirq thread before this cpu
* reached the active state, which makes the scheduler unhappy
* and schedule the softirq thread on the wrong cpu. This is
* only observable with forced threaded interrupts, but in
* theory it could also happen w/o them. It's just way harder
* to achieve.
*/
while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
cpu_relax();

/* enable local interrupts */
local_irq_enable();

The CPU being hot-plugged will be marked active after it has been fully
initialized by the CPU managing the hot-plug.  In the Xen PVHVM case
xen_smp_intr_init() is called to set up the hot-plugged vCPU's
XEN_RESCHEDULE_VECTOR.

The hot-plugging CPU is marked online, not marked active and does not have
its IPI vectors set up.  rcu_implicit_offline_qs() sees the hot-plugging
cpu is !cpu_is_offline() and tries to send it a reschedule IPI:
This will lead to:

kernel BUG at drivers/xen/events.c:1328!

xen_send_IPI_one()
xen_smp_send_reschedule()
rcu_implicit_offline_qs()
rcu_implicit_dynticks_qs()
force_qs_rnp()
force_quiescent_state()
__rcu_process_callbacks()
rcu_process_callbacks()
__do_softirq()
call_softirq()
do_softirq()
irq_exit()
xen_evtchn_do_upcall()

because xen_send_IPI_one() will attempt to use an uninitialized IRQ for
the XEN_RESCHEDULE_VECTOR.

There is at least one other place that has caused the same crash:

xen_smp_send_reschedule()
wake_up_idle_cpu()
add_timer_on()
clocksource_watchdog()
call_timer_fn()
run_timer_softirq()
__do_softirq()
call_softirq()
do_softirq()
irq_exit()
xen_evtchn_do_upcall()
xen_hvm_callback_vector()

clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle
a watchdog timer:

/*
* Cycle through CPUs to check if the CPUs stay synchronized
* to each other.
*/
next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
if (next_cpu >= nr_cpu_ids)
next_cpu = cpumask_first(cpu_online_mask);
watchdog_timer.expires += WATCHDOG_INTERVAL;
add_timer_on(&watchdog_timer, next_cpu);

This resulted in an attempt to send an IPI to a hot-plugging CPU that
had not initialized its reschedule vector. One option would be to make
the RCU code check to not check for CPU offline but for CPU active.
As becoming active is done after a CPU is online (in older kernels).

But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been
completely reworked - in the online path, cpu_active is set *before* cpu_online,
and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING
notification instead of CPU_DOWN_PREPARE." Drilling in this the bring-up
path: "[brought up CPU].. send out a CPU_STARTING notification, and in response
to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask
is better left to the scheduler alone, since it has the intelligence to use it
judiciously."

The conclusion was that:
"
1. At the IPI sender side:

   It is incorrect to send an IPI to an offline CPU (cpu not present in
   the cpu_online_mask). There are numerous places where we check this
   and warn/complain.

2. At the IPI receiver side:

   It is incorrect to let the world know of our presence (by setting
   ourselves in global bitmasks) until our initialization steps are complete
   to such an extent that we can handle the consequences (such as
   receiving interrupts without crashing the sender etc.)
" (from Srivatsa)

As the native code enables the interrupts at some point we need to be
able to service them. In other words a CPU must have valid IPI vectors
if it has been marked online.

It doesn't need to handle the IPI (interrupts may be disabled) but needs
to have valid IPI vectors because another CPU may find it in cpu_online_mask
and attempt to send it an IPI.

This patch will change the order of the Xen vCPU bring-up functions so that
Xen vectors have been set up before start_secondary() is called.
It also will not continue to bring up a Xen vCPU if xen_smp_intr_init() fails
to initialize it.

Orabug 13823853
Signed-off-by Chuck Anderson <chuck.anderson@oracle.com>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/events: mask events when changing their VCPU binding

When a event is being bound to a VCPU there is a window between the
EVTCHNOP_bind_vpcu call and the adjustment of the local per-cpu masks
where an event may be lost. The hypervisor upcalls the new VCPU but
the kernel thinks that event is still bound to the old VCPU and
ignores it.

There is even a problem when the event is being bound to the same VCPU
as there is a small window beween the clear_bit() and set_bit() calls
in bind_evtchn_to_cpu(). When scanning for pending events, the kernel
may read the bit when it is momentarily clear and ignore the event.

Avoid this by masking the event during the whole bind operation.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
CC: stable@vger.kernel.org

xen/events: initialize local per-cpu mask for all possible events

The sizeof() argument in init_evtchn_cpu_bindings() is incorrect
resulting in only the first 64 (or 32 in 32-bit guests) ports having
their bindings being initialized to VCPU 0.

In most cases this does not cause a problem as request_irq() will set
the irq affinity which will set the correct local per-cpu mask.
However, if the request_irq() is called on a VCPU other than 0, there
is a window between the unmasking of the event and the affinity being
set were an event may be lost because it is not locally unmasked on
any VCPU. If request_irq() is called on VCPU 0 then local irqs are
disabled during the window and the race does not occur.

Fix this by initializing all NR_EVENT_CHANNEL bits in the local
per-cpu masks.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: stable@vger.kernel.org

x86/xen: do not identity map UNUSABLE regions in the machine E820

If there are UNUSABLE regions in the machine memory map, dom0 will
attempt to map them 1:1 which is not permitted by Xen and the kernel
will crash.

There isn't anything interesting in the UNUSABLE region that the dom0
kernel needs access to so we can avoid making the 1:1 mapping and
treat it as RAM.

We only do this for dom0, as that is where tboot case shows up.
A PV domU could have an UNUSABLE region in its pseudo-physical map
and would need to be handled in another patch.

This fixes a boot failure on hosts with tboot.

tboot marks a region in the e820 map as unusable and the dom0 kernel
would attempt to map this region and Xen does not permit unusable
regions to be mapped by guests.

  (XEN)  0000000000000000 - 0000000000060000 (usable)
  (XEN)  0000000000060000 - 0000000000068000 (reserved)
  (XEN)  0000000000068000 - 000000000009e000 (usable)
  (XEN)  0000000000100000 - 0000000000800000 (usable)
  (XEN)  0000000000800000 - 0000000000972000 (unusable)

tboot marked this region as unusable.

  (XEN)  0000000000972000 - 00000000cf200000 (usable)
  (XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
  (XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
  (XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
  (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
  (XEN)  00000000fe000000 - 0000000100000000 (reserved)
  (XEN)  0000000100000000 - 0000000630000000 (usable)

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[v1: Altered the patch and description with domU's with UNUSABLE regions]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

sata_fsl: save irqs while coalescing

Before this patch, I was seeing the following lockdep splat on my
MPC8315 (PPC32) target:

  [    9.086051] =================================
  [    9.090393] [ INFO: inconsistent lock state ]
  [    9.094744] 3.9.7-ajf-gc39503d #1 Not tainted
  [    9.099087] ---------------------------------
  [    9.103432] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
  [    9.109431] scsi_eh_1/39 [HC1[1]:SC0[0]:HE0:SE1] takes:
  [    9.114642]  (&(&host->lock)->rlock){?.+...}, at: [<c02f4168>] sata_fsl_interrupt+0x50/0x250
  [    9.123137] {HARDIRQ-ON-W} state was registered at:
  [    9.128004]   [<c006cdb8>] lock_acquire+0x90/0xf4
  [    9.132737]   [<c043ef04>] _raw_spin_lock+0x34/0x4c
  [    9.137645]   [<c02f3560>] fsl_sata_set_irq_coalescing+0x68/0x100
  [    9.143750]   [<c02f36a0>] sata_fsl_init_controller+0xa8/0xc0
  [    9.149505]   [<c02f3f10>] sata_fsl_probe+0x17c/0x2e8
  [    9.154568]   [<c02acc90>] driver_probe_device+0x90/0x248
  [    9.159987]   [<c02acf0c>] __driver_attach+0xc4/0xc8
  [    9.164964]   [<c02aae74>] bus_for_each_dev+0x5c/0xa8
  [    9.170028]   [<c02ac218>] bus_add_driver+0x100/0x26c
  [    9.175091]   [<c02ad638>] driver_register+0x88/0x198
  [    9.180155]   [<c0003a24>] do_one_initcall+0x58/0x1b4
  [    9.185226]   [<c05aeeac>] kernel_init_freeable+0x118/0x1c0
  [    9.190823]   [<c0004110>] kernel_init+0x18/0x108
  [    9.195542]   [<c000f6b8>] ret_from_kernel_thread+0x64/0x6c
  [    9.201142] irq event stamp: 160
  [    9.204366] hardirqs last  enabled at (159): [<c043f778>] _raw_spin_unlock_irq+0x30/0x50
  [    9.212469] hardirqs last disabled at (160): [<c000f414>] reenable_mmu+0x30/0x88
  [    9.219867] softirqs last  enabled at (144): [<c002ae5c>] __do_softirq+0x168/0x218
  [    9.227435] softirqs last disabled at (137): [<c002b0d4>] irq_exit+0xa8/0xb4
  [    9.234481]
  [    9.234481] other info that might help us debug this:
  [    9.240995]  Possible unsafe locking scenario:
  [    9.240995]
  [    9.246898]        CPU0
  [    9.249337]        ----
  [    9.251776]   lock(&(&host->lock)->rlock);
  [    9.255878]   <Interrupt>
  [    9.258492]     lock(&(&host->lock)->rlock);
  [    9.262765]
  [    9.262765]  *** DEADLOCK ***
  [    9.262765]
  [    9.268684] no locks held by scsi_eh_1/39.
  [    9.272767]
  [    9.272767] stack backtrace:
  [    9.277117] Call Trace:
  [    9.279589] [cfff9da0] [c0008504] show_stack+0x48/0x150 (unreliable)
  [    9.285972] [cfff9de0] [c0447d5c] print_usage_bug.part.35+0x268/0x27c
  [    9.292425] [cfff9e10] [c006ace4] mark_lock+0x2ac/0x658
  [    9.297660] [cfff9e40] [c006b7e4] __lock_acquire+0x754/0x1840
  [    9.303414] [cfff9ee0] [c006cdb8] lock_acquire+0x90/0xf4
  [    9.308745] [cfff9f20] [c043ef04] _raw_spin_lock+0x34/0x4c
  [    9.314250] [cfff9f30] [c02f4168] sata_fsl_interrupt+0x50/0x250
  [    9.320187] [cfff9f70] [c0079ff0] handle_irq_event_percpu+0x90/0x254
  [    9.326547] [cfff9fc0] [c007a1fc] handle_irq_event+0x48/0x78
  [    9.332220] [cfff9fe0] [c007c95c] handle_level_irq+0x9c/0x104
  [    9.337981] [cfff9ff0] [c000d978] call_handle_irq+0x18/0x28
  [    9.343568] [cc7139f0] [c000608c] do_IRQ+0xf0/0x1a8
  [    9.348464] [cc713a20] [c000fc8c] ret_from_except+0x0/0x14
  [    9.353983] --- Exception: 501 at _raw_spin_unlock_irq+0x40/0x50
  [    9.353983]     LR = _raw_spin_unlock_irq+0x30/0x50
  [    9.364839] [cc713af0] [c043db10] wait_for_common+0xac/0x188
  [    9.370513] [cc713b30] [c02ddee4] ata_exec_internal_sg+0x2b0/0x4f0
  [    9.376699] [cc713be0] [c02de18c] ata_exec_internal+0x68/0xa8
  [    9.382454] [cc713c20] [c02de4b8] ata_dev_read_id+0x158/0x594
  [    9.388205] [cc713ca0] [c02ec244] ata_eh_recover+0xd88/0x13d0
  [    9.393962] [cc713d20] [c02f2520] sata_pmp_error_handler+0xc0/0x8ac
  [    9.400234] [cc713dd0] [c02ecdc8] ata_scsi_port_error_handler+0x464/0x5e8
  [    9.407023] [cc713e10] [c02ecfd0] ata_scsi_error+0x84/0xb8
  [    9.412528] [cc713e40] [c02c4974] scsi_error_handler+0xd8/0x47c
  [    9.418457] [cc713eb0] [c004737c] kthread+0xa8/0xac
  [    9.423355] [cc713f40] [c000f6b8] ret_from_kernel_thread+0x64/0x6c

This fix was suggested by Bhushan Bharat <R65777@freescale.com>, and
was discussed in email at:

  http://linuxppc.10917.n7.nabble.com/MPC8315-reboot-failure-lockdep-splat-possibly-related-tp75162.html

Same patch successfully tested with 3.9.7.  linux-next compiled but
not tested on hardware.

This patch is based off linux-next tag next-20130819
(which is commit 66a01bae29d11916c09f9f5a937cafe7d402e4a5 )

Signed-off-by: Anthony Foiani <anthony.foiani@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org

arm64: perf: fix event validation for software group leaders

This is a port of c95eb3184ea1 ("ARM: 7809/1: perf: fix event validation
for software group leaders") to arm64, which fixes a panic in the arm64
perf backend found as a result of Vince's fuzzing tool.

Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64: perf: fix array out of bounds access in armpmu_map_hw_event()

This is a port of d9f966357b14 ("ARM: 7810/1: perf: Fix array out of
bounds access in armpmu_map_hw_event()") to arm64, which fixes an oops
in the arm64 perf backend found as a result of Vince's fuzzing tool.

Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

bnx2x: set VF DMAE when first function has 0 supported VFs

There are possible HW configurations in which PFs will have SR-IOV capability
but will have Max VFs set to 0 - this happens when there are Multi-Function
devices where the VFs are allocated to only some of the PFs.

DMAE is configured to support VFs only if the configuring PF has supported VFs.
In case the first PF to be loaded will be one without supported VFs, it will
not configure DMAE to the VF-supporting mode. When VFs of other PFs will be
loaded later on, they will not be able to communicate with their PF.

This changes the requirement for configuring DMAE for VF-supporting mode;
If the device has SR-IOV capabilities there must be some PF that has
max supported VFs > 0, thus it will configure the DMAE for supporting VFs.

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>