James Hogan [Tue, 15 Nov 2016 00:06:05 +0000 (00:06 +0000)]
KVM: MIPS/T&E: active_mm = init_mm in guest context
Set init_mm as the active_mm and update mm_cpumask(current->mm) to
reflect that it isn't active when in guest context. This prevents cache
management code from attempting cache flushes on host virtual addresses
while in guest context, for example due to a cache management IPIs or
later when writing of dynamically translated code hits copy on write.
We do this using helpers in static kernel code to avoid having to export
init_mm to modules.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
James Hogan [Fri, 18 Nov 2016 13:25:24 +0000 (13:25 +0000)]
KVM: MIPS/T&E: Restore host asid on return to host
We only need the guest ASID loaded while in guest context, i.e. while
running guest code and while handling guest exits. We load the guest
ASID when entering the guest, however we restore the host ASID later
than necessary, when the VCPU state is saved i.e. vcpu_put() or slightly
earlier if preempted after returning to the host.
This mismatch is both unpleasant and causes redundant host ASID restores
in kvm_trap_emul_vcpu_put(). Lets explicitly restore the host ASID when
returning to the host, and don't bother restoring the host ASID on
context switch in unless we're already in guest context.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
Add implementation callbacks for entering the guest (vcpu_run()) and
reentering the guest (vcpu_reenter()), allowing implementation specific
operations to be performed before entering the guest or after returning
to the host without cluttering kvm_arch_vcpu_ioctl_run().
This allows the T&E specific lazy user GVA flush to be moved into
trap_emul.c, along with disabling of the HTW. We also move
kvm_mips_deliver_interrupts() as VZ will need to restore the guest timer
state prior to delivering interrupts.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
James Hogan [Tue, 11 Oct 2016 22:14:39 +0000 (23:14 +0100)]
KVM: MIPS: Remove duplicated ASIDs from vcpu
The kvm_vcpu_arch structure contains both mm_structs for allocating MMU
contexts (primarily the ASID) but it also copies the resulting ASIDs
into guest_{user,kernel}_asid[] arrays which are referenced from uasm
generated code.
This duplication doesn't seem to serve any purpose, and it gets in the
way of generalising the ASID handling across guest kernel/user modes, so
lets just extract the ASID straight out of the mm_struct on demand, and
in fact there are convenient cpu_context() and cpu_asid() macros for
doing so.
To reduce the verbosity of this code we do also add kern_mm and user_mm
local variables where the kernel and user mm_structs are used.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
James Hogan [Wed, 16 Nov 2016 23:48:56 +0000 (23:48 +0000)]
KVM: MIPS/MMU: Move preempt/ASID handling to implementation
The MIPS KVM host and guest GVA ASIDs may need regenerating when
scheduling a process in guest context, which is done from the
kvm_arch_vcpu_load() / kvm_arch_vcpu_put() functions in mmu.c.
However this is a fairly implementation specific detail. VZ for example
may use GuestIDs instead of normal ASIDs to distinguish mappings
belonging to different guests, and even on VZ without GuestID the root
TLB will be used differently to trap & emulate.
Trap & emulate GVA ASIDs only relate to the user part of the full
address space, so can be left active during guest exit handling (guest
context) to allow guest instructions to be easily read and translated.
VZ root ASIDs however are for GPA mappings so can't be left active
during normal kernel code. They also aren't useful for accessing guest
virtual memory, and we should have CP0_BadInstr[P] registers available
to provide encodings of trapping guest instructions anyway.
Therefore move the ASID preemption handling into the implementation
callback.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
James Hogan [Sat, 12 Nov 2016 00:00:13 +0000 (00:00 +0000)]
KVM: MIPS: Convert get/set_regs -> vcpu_load/put
Convert the get_regs() and set_regs() callbacks to vcpu_load() and
vcpu_put(), which provide a cpu argument and more closely match the
kvm_arch_vcpu_load() / kvm_arch_vcpu_put() that they are called by.
This is in preparation for moving ASID management into the
implementations.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
James Hogan [Fri, 13 Mar 2015 15:54:08 +0000 (15:54 +0000)]
KVM: MIPS/MMU: Simplify ASID restoration
KVM T&E uses an ASID for guest kernel mode and an ASID for guest user
mode. The current ASID is saved when the guest is scheduled out, and
restored when scheduling back in, with checks for whether the ASID needs
to be regenerated.
This isn't really necessary as the ASID can be easily determined by the
current guest mode, so lets simplify it to just read the required ASID
from guest_kernel_asid or guest_user_asid even if the ASID hasn't been
regenerated.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
James Hogan [Wed, 4 Jan 2017 22:05:22 +0000 (22:05 +0000)]
KVM: MIPS: Drop partial KVM_NMI implementation
MIPS incompletely implements the KVM_NMI ioctl to supposedly perform a
CPU reset, but all it actually does is invalidate the ASIDs. It doesn't
expose the KVM_CAP_USER_NMI capability which is supposed to indicate the
presence of the KVM_NMI ioctl, and no user software actually uses it on
MIPS.
Since this is dead code that would technically need updating for GVA
page table handling in upcoming patches, remove it now. If we wanted to
implement NMI injection later it can always be done properly along with
the KVM_CAP_USER_NMI capability, and if we wanted to implement a proper
CPU reset it would be better done with a separate ioctl.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
James Hogan [Mon, 28 Nov 2016 16:38:01 +0000 (16:38 +0000)]
MIPS: Add return errors to protected cache ops
The protected cache ops contain no out of line fixup code to return an
error code in the event of a fault, with the cache op being skipped in
that case. For KVM however we'd like to detect this case as page
faulting will be disabled so it could happen during normal operation if
the GVA page tables were flushed, and need to be handled by the caller.
Add the out-of-line fixup code to load the error value -EFAULT into the
return variable, and adapt the protected cache line functions to pass
the error back to the caller.
James Hogan [Sat, 10 Sep 2016 22:55:07 +0000 (23:55 +0100)]
MIPS: Export some tlbex internals for KVM to use
Export to TLB exception code generating functions so that KVM can
construct a fast TLB refill handler for guest context without
reinventing the wheel quite so much.
James Hogan [Sat, 10 Sep 2016 22:53:57 +0000 (23:53 +0100)]
MIPS: uasm: Add include guards in asm/uasm.h
Add include guards in asm/uasm.h to allow it to be safely used by a new
header asm/tlbex.h in the next patch to expose TLB exception building
functions for KVM to use.
James Hogan [Fri, 16 Oct 2015 15:33:13 +0000 (16:33 +0100)]
MIPS: Export pgd/pmd symbols for KVM
Export pmd_init(), invalid_pmd_table and tlbmiss_handler_setup_pgd to
GPL kernel modules so that MIPS KVM can use the inline page table
management functions and switch between page tables:
- pmd_init() will be used directly by KVM to initialise newly allocated
pmd tables with invalid lower level table pointers.
- invalid_pmd_table is used by pud_present(), pud_none(), and
pud_clear(), which KVM will use to test and clear pud entries.
- tlbmiss_handler_setup_pgd() will be called by KVM entry code to switch
to the appropriate GVA page tables.
James Hogan [Thu, 2 Feb 2017 01:21:35 +0000 (01:21 +0000)]
MIPS: Move pgd_alloc() out of header
pgd_alloc() references init_mm which is not exported to modules. In
order for KVM to be able to use pgd_alloc() to allocate GVA page tables,
move pgd_alloc() into a new pgtable.c file and export it to modules.
Markus Elfring [Thu, 19 Jan 2017 10:10:26 +0000 (11:10 +0100)]
MIPS: KVM: Return directly after a failed copy_from_user() in kvm_arch_vcpu_ioctl()
* Return directly after a call of the function "copy_from_user" failed
in a case block.
* Delete the jump label "out" which became unnecessary with
this refactoring.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: James Hogan <james.hogan@imgtec.com>
Piotr Luc [Tue, 10 Jan 2017 17:34:03 +0000 (18:34 +0100)]
kvm: x86: Expose Intel VPOPCNTDQ feature to guest
Vector population count instructions for dwords and qwords are to be
used in future Intel Xeon & Xeon Phi processors. The bit 14 of
CPUID[level:0x07, ECX] indicates that the new instructions are
supported by a processor.
The spec can be found in the Intel Software Developer Manual (SDM)
or in the Instruction Set Extensions Programming Reference (ISE).
Signed-off-by: Piotr Luc <piotr.luc@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Piotr Luc [Tue, 10 Jan 2017 17:34:02 +0000 (18:34 +0100)]
x86/cpufeature: Add AVX512_VPOPCNTDQ feature
Vector population count instructions for dwords and qwords are going to be
available in future Intel Xeon & Xeon Phi processors. Bit 14 of
CPUID[level:0x07, ECX] indicates that the instructions are supported by a
processor.
The specification can be found in the Intel Software Developer Manual (SDM)
and in the Instruction Set Extensions Programming Reference (ISE).
Populate the feature bit and clear it when xsave is disabled.
Signed-off-by: Piotr Luc <piotr.luc@intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Link: http://lkml.kernel.org/r/20170110173403.6010-2-piotr.luc@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Linus Torvalds [Mon, 16 Jan 2017 00:09:50 +0000 (16:09 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace fixes from Eric Biederman:
"This tree contains 4 fixes.
The first is a fix for a race that can causes oopses under the right
circumstances, and that someone just recently encountered.
Past that are several small trivial correct fixes. A real issue that
was blocking development of an out of tree driver, but does not appear
to have caused any actual problems for in-tree code. A potential
deadlock that was reported by lockdep. And a deadlock people have
experienced and took the time to track down caused by a cleanup that
removed the code to drop a reference count"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
sysctl: Drop reference added by grab_header in proc_sys_readdir
pid: fix lockdep deadlock warning due to ucount_lock
libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount
mnt: Protect the mountpoint hashtable with mount_lock
Linus Torvalds [Sun, 15 Jan 2017 20:40:53 +0000 (12:40 -0800)]
Merge tag 'char-misc-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 4.10-rc4 that resolve
some reported issues.
The MEI driver issue resolves a lot of problems that people have been
having, as does the mem driver fix. The other minor fixes resolve
other reported issues.
All of these have been in linux-next for a while"
* tag 'char-misc-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
vme: Fix wrong pointer utilization in ca91cx42_slave_get
auxdisplay: fix new ht16k33 build errors
ppdev: don't print a free'd string
extcon: return error code on failure
drivers: char: mem: Fix thinkos in kmem address checks
mei: bus: enable OS version only for SPT and newer
Linus Torvalds [Sun, 15 Jan 2017 20:38:53 +0000 (12:38 -0800)]
Merge tag 'driver-core-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single patch being reverted to remove a feature that was
added in 4.10-rc1 that isn't quite ready for release.
It will be redone as a debugfs file instead of a sysfs file in the
future"
* tag 'driver-core-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Revert "driver core: Add deferred_probe attribute to devices in sysfs"
Linus Torvalds [Sun, 15 Jan 2017 20:34:35 +0000 (12:34 -0800)]
Merge tag 'usb-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a few small USB driver fixes for 4.10-rc4 to resolve some
reported issues.
The "largest" here is a number of bugs being fixed in the ch341
usb-serial driver, to hopefully resolve the mess of different devices
floating around that use this driver that have been having problems
with the 4.10-rc1 release.
There's also a tiny musb fix that I missed in the last pull request,
as well as the traditional xhci fix rounding out the batch.
All have been in linux-next with no reported issues"
* tag 'usb-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: fix deadlock at host remove by running watchdog correctly
USB: serial: ch341: fix control-message error handling
usb: musb: fix runtime PM in debugfs
wusbcore: Fix one more crypto-on-the-stack bug
USB: serial: kl5kusb105: fix line-state error handling
USB: serial: ch341: fix baud rate and line-control handling
USB: serial: ch341: fix line settings after reset-resume
USB: serial: ch341: fix resume after reset
USB: serial: ch341: fix open error handling
USB: serial: ch341: fix modem-control and B0 handling
USB: serial: ch341: fix open and resume after B0
USB: serial: ch341: fix initial modem-control state
Linus Torvalds [Sun, 15 Jan 2017 20:28:14 +0000 (12:28 -0800)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Bugfixes for I2C. Mostly core this time which is a bit unusual but
nothing really scary in there"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: piix4: Avoid race conditions with IMC
i2c: fix spelling mistake: "insufficent" -> "insufficient"
i2c: print correct device invalid address
i2c: do not enable fall back to Host Notify by default
i2c: fix kernel memory disclosure in dev interface
Linus Torvalds [Sun, 15 Jan 2017 20:03:11 +0000 (12:03 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- unwinder fixes
- AMD CPU topology enumeration fixes
- microcode loader fixes
- x86 embedded platform fixes
- fix for a bootup crash that may trigger when clearcpuid= is used
with invalid values"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mpx: Use compatible types in comparison to fix sparse error
x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc()
x86/entry: Fix the end of the stack for newly forked tasks
x86/unwind: Include __schedule() in stack traces
x86/unwind: Disable KASAN checks for non-current tasks
x86/unwind: Silence warnings for non-current tasks
x86/microcode/intel: Use correct buffer size for saving microcode data
x86/microcode/intel: Fix allocation size of struct ucode_patch
x86/microcode/intel: Add a helper which gives the microcode revision
x86/microcode: Use native CPUID to tickle out microcode revision
x86/CPU: Add native CPUID variants returning a single datum
x86/boot: Add missing declaration of string functions
x86/CPU/AMD: Fix Bulldozer topology
x86/platform/intel-mid: Rename 'spidev' to 'mrfld_spidev'
x86/cpu: Fix typo in the comment for Anniedale
x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option
Linus Torvalds [Sun, 15 Jan 2017 20:00:37 +0000 (12:00 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull NOHZ fix from Ingo Molnar:
"This fixes an old NOHZ race where we incorrectly calculate the next
timer interrupt in certain circumstances where hrtimers are pending,
that can cause hard to reproduce stalled-values artifacts in
/proc/stat"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
nohz: Fix collision between tick and other hrtimers
Linus Torvalds [Sun, 15 Jan 2017 19:37:43 +0000 (11:37 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc race fixes uncovered by fuzzing efforts, a Sparse fix, two PMU
driver fixes, plus miscellanous tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Reject non sampling events with precise_ip
perf/x86/intel: Account interrupts for PEBS errors
perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
perf/core: Fix sys_perf_event_open() vs. hotplug
perf/x86/intel: Use ULL constant to prevent undefined shift behaviour
perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
perf/x86: Set pmu->module in Intel PMU modules
perf probe: Fix to probe on gcc generated symbols for offline kernel
perf probe: Fix --funcs to show correct symbols for offline module
perf symbols: Robustify reading of build-id from sysfs
perf tools: Install tools/lib/traceevent plugins with install-bin
tools lib traceevent: Fix prev/next_prio for deadline tasks
perf record: Fix --switch-output documentation and comment
perf record: Make __record_options static
tools lib subcmd: Add OPT_STRING_OPTARG_SET option
perf probe: Fix to get correct modname from elf header
samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include
samples/bpf sock_example: Avoid getting ethhdr from two includes
perf sched timehist: Show total scheduling time
Linus Torvalds [Sun, 15 Jan 2017 18:54:39 +0000 (10:54 -0800)]
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"A number of regression fixes:
- Fix a boot hang on machines that have somewhat unusual memory map
entries of phys_addr=0x0 num_pages=0, which broke due to a recent
commit. This commit got cherry-picked from the v4.11 queue because
the bug is affecting real machines.
- Fix a boot hang also reported by KASAN, caused by incorrect init
ordering introduced by a recent optimization.
- Fix a recent robustification fix to allocate_new_fdt_and_exit_boot()
that introduced an invalid assumption. Neither bugs were seen in
the wild AFAIK"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/x86: Prune invalid memory map entries and fix boot regression
x86/efi: Don't allocate memmap through memblock after mm_init()
efi/libstub/arm*: Pass latest memory map to the kernel
Linus Torvalds [Sun, 15 Jan 2017 01:13:28 +0000 (17:13 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.
The most notable fix here is probably the fix for a splice regression
("fix a fencepost error in pipe_advance()") noticed by Alan Wylie.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix a fencepost error in pipe_advance()
coredump: Ensure proper size of sparse core files
aio: fix lock dep warning
tmpfs: clear S_ISGID when setting posix ACLs
Linus Torvalds [Sun, 15 Jan 2017 01:07:04 +0000 (17:07 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- the virtio_blk stack DMA corruption fix from Christoph, fixing and
issue with VMAP stacks.
- O_DIRECT blkbits calculation fix from Chandan.
- discard regression fix from Christoph.
- queue init error handling fixes for nbd and virtio_blk, from Omar and
Jeff.
- two small nvme fixes, from Christoph and Guilherme.
- rename of blk_queue_zone_size and bdev_zone_size to _sectors instead,
to more closely follow what we do in other places in the block layer.
This interface is new for this series, so let's get the naming right
before releasing a kernel with this feature. From Damien.
* 'for-linus' of git://git.kernel.dk/linux-block:
block: don't try to discard from __blkdev_issue_zeroout
sd: remove __data_len hack for WRITE SAME
nvme: use blk_rq_payload_bytes
scsi: use blk_rq_payload_bytes
block: add blk_rq_payload_bytes
block: Rename blk_queue_zone_size and bdev_zone_size
nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too
nvme-rdma: fix nvme_rdma_queue_is_ready
virtio_blk: fix panic in initialization error path
nbd: blk_mq_init_queue returns an error code on failure, not NULL
virtio_blk: avoid DMA to stack for the sense buffer
do_direct_IO: Use inode->i_blkbits to compute block count to be cleaned
Al Viro [Sun, 15 Jan 2017 00:33:08 +0000 (19:33 -0500)]
fix a fencepost error in pipe_advance()
The logics in pipe_advance() used to release all buffers past the new
position failed in cases when the number of buffers to release was equal
to pipe->buffers. If that happened, none of them had been released,
leaving pipe full. Worse, it was trivial to trigger and we end up with
pipe full of uninitialized pages. IOW, it's an infoleak.
Cc: stable@vger.kernel.org # v4.9 Reported-by: "Alan J. Wylie" <alan@wylie.me.uk> Tested-by: "Alan J. Wylie" <alan@wylie.me.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dave Kleikamp [Wed, 11 Jan 2017 19:25:00 +0000 (13:25 -0600)]
coredump: Ensure proper size of sparse core files
If the last section of a core file ends with an unmapped or zero page,
the size of the file does not correspond with the last dump_skip() call.
gdb complains that the file is truncated and can be confusing to users.
After all of the vma sections are written, make sure that the file size
is no smaller than the current file position.
This problem can be demonstrated with gdb's bigcore testcase on the
sparc architecture.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dmitry Monakhov <dmonakhov@openvz.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 14 Jan 2017 19:09:24 +0000 (11:09 -0800)]
Merge tag 'dmaengine-fix-4.10-rc4' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"The fixes this time around are spread over drivers, pretty normal
update:
- PCI ID for SKL ioatdma, workaround for SKX and
ioat_alloc_chan_resources sleepy allocation fix
- dw kconfig typo fix
- null pointer deref for stm32
- MAINTAINERS Update for at_hdmac
- pl330 runtime pm fixes
- omap-dma port window fix
- rcar-dmac unmap slave resource fix"
* tag 'dmaengine-fix-4.10-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: rcar-dmac: unmap slave resource when channel is freed
dmaengine: omap-dma: Fix the port_window support
dmaengine: iota: ioat_alloc_chan_resources should not perform sleeping allocations.
dmaengine: pl330: Fix runtime PM support for terminated transfers
MAINTAINERS: dmaengine: Update + Hand over the at_hdmac driver to Ludovic
dmaengine: omap-dma: Fix dynamic lch_map allocation
dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
dmaengine: stm32-dma: Fix null pointer dereference in stm32_dma_tx_status
dmaengine: stm32-dma: Set correct args number for DMA request from DT
dmaengine: dw: fix typo in Kconfig
dmaengine: ioatdma: workaround SKX ioatdma version
dmaengine: ioatdma: Add Skylake PCI Dev ID
This is clearly wrong, and also not as informative as it could be. This
patch changes it so that if we find obviously invalid memory map
entries, we print an error and skip those entries. It also detects the
display of the address range calculation overflow, so the new output is:
It also detects memory map sizes that would overflow the physical
address, for example phys_addr=0xfffffffffffff000 and
num_pages=0x0200000000000001, and prints:
It then removes these entries from the memory map.
Signed-off-by: Peter Jones <pjones@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT] Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
[Matt: Include bugzilla info in commit log] Cc: <stable@vger.kernel.org> # v4.9+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121 Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Wed, 28 Dec 2016 13:31:03 +0000 (14:31 +0100)]
perf/x86/intel: Account interrupts for PEBS errors
It's possible to set up PEBS events to get only errors and not
any data, like on SNB-X (model 45) and IVB-EP (model 62)
via 2 perf commands running simultaneously:
taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10
This leads to a soft lock up, because the error path of the
intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt
for error PEBS interrupts, so in case you're getting ONLY
errors you don't have a way to stop the event when it's over
the max_samples_per_tick limit:
Add perf_event_account_interrupt() which does the interrupt
and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s
error path.
We keep the pending_kill and pending_wakeup logic only in the
__perf_event_overflow() path, because they make sense only if
there's any data to deliver.
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 11 Jan 2017 20:09:50 +0000 (21:09 +0100)]
perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
Di Shen reported a race between two concurrent sys_perf_event_open()
calls where both try and move the same pre-existing software group
into a hardware context.
... where, while we wait for a ctx->mutex acquisition, the event->ctx
relation can have changed under us.
That very same commit failed to recognise sys_perf_event_context() as an
external access vector to the events and thereby didn't apply the
established locking rules correctly.
So while one sys_perf_event_open() call is stuck waiting on
mutex_lock_double(), the other (which owns said locks) moves the group
about. So by the time the former sys_perf_event_open() acquires the
locks, the context we've acquired is stale (and possibly dead).
Apply the established locking rules as per perf_event_ctx_lock_nested()
to the mutex_lock_double() for the 'move_group' case. This obviously means
we need to validate state after we acquire the locks.
Reported-by: Di Shen (Keen Lab) Tested-by: John Dias <joaodias@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Min Chong <mchong@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: f63a8daa5812 ("perf: Fix event->ctx locking") Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 9 Dec 2016 13:59:00 +0000 (14:59 +0100)]
perf/core: Fix sys_perf_event_open() vs. hotplug
There is problem with installing an event in a task that is 'stuck' on
an offline CPU.
Blocked tasks are not dis-assosciated from offlined CPUs, after all, a
blocked task doesn't run and doesn't require a CPU etc.. Only on
wakeup do we ammend the situation and place the task on a available
CPU.
If we hit such a task with perf_install_in_context() we'll loop until
either that task wakes up or the CPU comes back online, if the task
waking depends on the event being installed, we're stuck.
While looking into this issue, I also spotted another problem, if we
hit a task with perf_install_in_context() that is in the middle of
being migrated, that is we observe the old CPU before sending the IPI,
but run the IPI (on the old CPU) while the task is already running on
the new CPU, things also go sideways.
Rework things to rely on task_curr() -- outside of rq->lock -- which
is rather tricky. Imagine the following scenario where we're trying to
install the first event into our task 't':
CPU0 CPU1 CPU2
(current == t)
t->perf_event_ctxp[] = ctx;
smp_mb();
cpu = task_cpu(t);
switch(t, n);
migrate(t, 2);
switch(p, t);
ctx = t->perf_event_ctxp[]; // must not be NULL
smp_function_call(cpu, ..);
generic_exec_single()
func();
spin_lock(ctx->lock);
if (task_curr(t)) // false
add_event_to_ctx();
spin_unlock(ctx->lock);
perf_event_context_sched_in();
spin_lock(ctx->lock);
// sees event
So its CPU0's store of t->perf_event_ctxp[] that must not go 'missing'.
Because if CPU2's load of that variable were to observe NULL, it would
not try to schedule the ctx and we'd have a task running without its
counter, which would be 'bad'.
As long as we observe !NULL, we'll acquire ctx->lock. If we acquire it
first and not see the event yet, then CPU0 must observe task_curr()
and retry. If the install happens first, then we must see the event on
sched-in and all is well.
I think we can translate the first part (until the 'must not be NULL')
of the scenario to a litmus test like:
Where:
x is perf_event_ctxp[],
y is our tasks's CPU, and
z is our task being placed on the rq of CPU2.
The P0 smp_mb() is the one added by this patch, ordering the store to
perf_event_ctxp[] from find_get_context() and the load of task_cpu()
in task_function_call().
The smp_store_release/smp_load_acquire model the RCpc locking of the
rq->lock and the smp_mb() of P2 is the context switch switching from
whatever CPU2 was running to our task 't'.
Linus Torvalds [Sat, 14 Jan 2017 01:40:22 +0000 (17:40 -0800)]
Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"These are all over the place.
The tracepoint part of the pull fixes a crash and adds a little more
information to two tracepoints, while the rest are good old fashioned
fixes"
* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: make tracepoint format strings more compact
Btrfs: add truncated_len for ordered extent tracepoints
Btrfs: add 'inode' for extent map tracepoint
btrfs: fix crash when tracepoint arguments are freed by wq callbacks
Btrfs: adjust outstanding_extents counter properly when dio write is split
Btrfs: fix lockdep warning about log_mutex
Btrfs: use down_read_nested to make lockdep silent
btrfs: fix locking when we put back a delayed ref that's too new
btrfs: fix error handling when run_delayed_extent_op fails
btrfs: return the actual error value from from btrfs_uuid_tree_iterate
Linus Torvalds [Sat, 14 Jan 2017 01:35:43 +0000 (17:35 -0800)]
Merge tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
- Cleanups and bug fixes for the mtty sample driver (Dan Carpenter)
- Export and make use of has_capability() to fix incorrect use of
ns_capable() for testing task capabilities (Jike Song)
* tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio:
vfio/type1: Remove pid_namespace.h include
vfio iommu type1: fix the testing of capability for remote task
capability: export has_capability
vfio-mdev: remove some dead code
vfio-mdev: buffer overflow in ioctl()
vfio-mdev: return -EFAULT if copy_to_user() fails
Linus Torvalds [Sat, 14 Jan 2017 01:00:42 +0000 (17:00 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Fix huge_ptep_set_access_flags() to return "changed" when any of the
ptes in the contiguous range is changed, not just the last one
- Fix the adr_l assembly macro to work in modules under KASLR
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: assembler: make adr_l work in modules under KASLR
arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flags
block: don't try to discard from __blkdev_issue_zeroout
Discard can return -EIO asynchronously if the alignment for the request
isn't suitable for the driver, which makes a proper fallback to other
methods in __blkdev_issue_zeroout impossible. Thus only issue a sync
discard from blkdev_issue_zeroout an don't try discard at all from
__blkdev_issue_zeroout as a non-invasive workaround.
One more reason why abusing discard for zeroing must die..
Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Eryu Guan <eguan@redhat.com> Fixes: e73c23ff ("block: add async variant of blkdev_issue_zeroout") Signed-off-by: Jens Axboe <axboe@fb.com>
Now that we have the blk_rq_payload_bytes helper available to determine
the actual I/O size we don't need to mess around with __data_len for
WRITE SAME.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
Linus Torvalds [Fri, 13 Jan 2017 20:38:36 +0000 (12:38 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The major fix is the bfa firmware, since the latest 10Gb cards fail
probing with the current firmware.
The rest is a set of minor fixes: one missed Kconfig dependency
causing randconfig failures, a missed error return on an error leg, a
change for how multiqueue waits on a blocked device and a don't reset
while in reset fix"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: bfa: Increase requested firmware version to 3.2.5.1
scsi: snic: Return error code on memory allocation failure
scsi: fnic: Avoid sending reset to firmware when another reset is in progress
scsi: qedi: fix build, depends on UIO
scsi: scsi-mq: Wait for .queue_rq() if necessary
Linus Torvalds [Fri, 13 Jan 2017 19:49:34 +0000 (11:49 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
"Small driver fixups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elants_i2c - avoid divide by 0 errors on bad touchscreen data
Input: adxl34x - make it enumerable in ACPI environment
Input: ALPS - fix TrackStick Y axis handling for SS5 hardware
Input: synaptics-rmi4 - fix F03 build error when serio is module
Input: xpad - use correct product id for x360w controllers
Input: synaptics_i2c - change msleep to usleep_range for small msecs
Input: i8042 - add Pegatron touchpad to noloop table
Input: joydev - remove unused linux/miscdevice.h include
Niklas Söderlund [Wed, 11 Jan 2017 14:39:31 +0000 (15:39 +0100)]
dmaengine: rcar-dmac: unmap slave resource when channel is freed
The slave mapping should be removed together with other channel
resources when the channel is freed. If it's not unmapped it will hang
around forever after the channel is freed.
Jike Song [Thu, 12 Jan 2017 08:52:03 +0000 (16:52 +0800)]
vfio iommu type1: fix the testing of capability for remote task
Before the mdev enhancement type1 iommu used capable() to test the
capability of current task; in the course of mdev development a
new requirement, testing for another task other than current, was
raised. ns_capable() was used for this purpose, however it still
tests current, the only difference is, in a specified namespace.
Fix it by using has_capability() instead, which tests the cap for
specified task in init_user_ns, the same namespace as capable().
Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Jike Song <jike.song@intel.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Linus Torvalds [Thu, 12 Jan 2017 22:45:59 +0000 (14:45 -0800)]
Merge tag 'sound-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This time we got a few more fixes than the previous rc's, and most of
commits were about ASoC.
The only significant change in the core side is the regression fix wrt
the aux device list handling, and all the rest are driver-specific
small / trivial fixes"
* tag 'sound-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Add a quirk for Plantronics BT600
ASoC: rt5645: set sel_i2s_pre_div1 to 2
ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused
ASoC: Intel: Skylake: Release FW ctx in cleanup
ASoC: Intel: bytcr-rt5640: fix settings in internal clock mode
ASoC: fsl_ssi: set fifo watermark to more reliable value
ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL
ASoC: nau8825: correct the function name of register
ASoC: Intel: Skylake: Fix to fail safely if module not available in path
ASoC: tlv320aic3x: Mark the RESET register as volatile
ASoC: Fix binding and probing of auxiliary components
ASoC: wm_adsp: Don't overrun firmware file buffer when reading region data
ASoC: Intel: bytcr_rt5640: fallback mechanism if MCLK is not enabled
ASoC: hdmi-codec: use unsigned type to structure members with bit-field
ASoC: topology: kfree kcontrol->private_value before freeing kcontrol
ASoC: rsnd: don't double free kctrl
ASoC: dwc: Fix PIO mode initialization
Ricardo Ribalda [Wed, 11 Jan 2017 09:11:44 +0000 (10:11 +0100)]
i2c: piix4: Avoid race conditions with IMC
On AMD's SB800 and upwards, the SMBus is shared with the Integrated
Micro Controller (IMC).
The platform provides a hardware semaphore to avoid race conditions
among them. (Check page 288 of the SB800-Series Southbridges Register
Reference Guide http://support.amd.com/TechDocs/45482.pdf)
Without this patch, many access to the SMBus end with an invalid
transaction or even with the bus stalled.
Reported-by: Alexandre Desnoyers <alex@qtec.com> Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>: Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Dmitry Torokhov [Thu, 5 Jan 2017 04:57:22 +0000 (20:57 -0800)]
i2c: do not enable fall back to Host Notify by default
Falling back unconditionally to HostNotify as primary client's interrupt
breaks some drivers which alter their functionality depending on whether
interrupt is present or not, so let's introduce a board flag telling I2C
core explicitly if we want wired interrupt or HostNotify-based one:
I2C_CLIENT_HOST_NOTIFY.
For DT-based systems we introduce "host-notify" property that we convert
to I2C_CLIENT_HOST_NOTIFY board flag.
Tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Pali Rohár <pali.rohar@gmail.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
i2c: fix kernel memory disclosure in dev interface
i2c_smbus_xfer() does not always fill an entire block, allowing
kernel stack memory disclosure through the temp variable. Clear
it before it's read to.
Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
Linus Torvalds [Thu, 12 Jan 2017 19:00:22 +0000 (11:00 -0800)]
Merge tag 'rproc-v4.10-fixes' of git://github.com/andersson/remoteproc
Pull remoteproc fixes from Bjorn Andersson:
"This fixes two regressions that have been reported to be introduced in
v4.10-rc1.
- correct an incorrect usage of the kref api
- revert the change to make the resource table read-only. As the
space each vdev resource is used as virtio device config space it
must be shared with the remote"
* tag 'rproc-v4.10-fixes' of git://github.com/andersson/remoteproc:
Revert "remoteproc: Merge table_ptr and cached_table pointers"
remoteproc: fix vdev reference management
Linus Torvalds [Thu, 12 Jan 2017 18:58:16 +0000 (10:58 -0800)]
Merge tag 'rpmsg-v4.10-fixes' of git://github.com/andersson/remoteproc
Pull rpmsg fixes from Bjorn Andersson:
"This fixes a regression introduced in v4.10-rc1 that prohibits
multiple channels with the same name but different endpoint addresses
to be used"
* tag 'rpmsg-v4.10-fixes' of git://github.com/andersson/remoteproc:
rpmsg: virtio_rpmsg_bus: fix channel creation
Linus Torvalds [Thu, 12 Jan 2017 18:55:28 +0000 (10:55 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
- device descriptor length validation fix to hid-cypress driver from
Greg
- introduction of a short delay into i2c-hid, which is not really
mandated by the spec, but fixes Asus Touchpads
- Petzl USB connectable flashlight quirk from myself
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: i2c-hid: Add sleep between POWER ON and RESET
HID: hid-cypress: validate length of report
HID: ignore Petzl USB headlamp
Linus Torvalds [Thu, 12 Jan 2017 18:17:59 +0000 (10:17 -0800)]
Merge tag 'md/4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Pull md fixes from Shaohua Li:
"Basically one fix for raid5 cache which is merged in this cycle,
others are trival fixes"
* tag 'md/4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
md/raid5: Use correct IS_ERR() variation on pointer check
md: cleanup mddev flag clear for takeover
md/r5cache: fix spelling mistake on "recoverying"
md/r5cache: assign conf->log before r5l_load_log()
md/r5cache: simplify handling of sh->log_start in recovery
md/raid5-cache: removes unnecessary write-through mode judgments
md/raid10: Refactor raid10_make_request
md/raid1: Refactor raid1_make_request
Ard Biesheuvel [Wed, 11 Jan 2017 14:54:53 +0000 (14:54 +0000)]
arm64: assembler: make adr_l work in modules under KASLR
When CONFIG_RANDOMIZE_MODULE_REGION_FULL=y, the offset between loaded
modules and the core kernel may exceed 4 GB, putting symbols exported
by the core kernel out of the reach of the ordinary adrp/add instruction
pairs used to generate relative symbol references. So make the adr_l
macro emit a movz/movk sequence instead when executing in module context.
While at it, remove the pointless special case for the stack pointer.
Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Merge tag 'usb-serial-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for v4.10-rc4
These fixes address a number of issues in the ch341 driver and includes
a partial revert of a change in how we set the line settings that went
into 4.10-rc1 but which turned out to have undesired side effects. This
included deasserting the modem-control lines when configuring the
device, but also prevented a certain class of CH340 devices from working
with the driver.
Included are also two fixes for two minor information leaks in
kl5kusb105 and ch341 due to failures to detect short control transfers.
Damien Le Moal [Thu, 12 Jan 2017 14:58:32 +0000 (07:58 -0700)]
block: Rename blk_queue_zone_size and bdev_zone_size
All block device data fields and functions returning a number of 512B
sectors are by convention named xxx_sectors while names in the form
xxx_size are generally used for a number of bytes. The blk_queue_zone_size
and bdev_zone_size functions were not following this convention so rename
them.
No functional change is introduced by this patch.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Collapsed the two patches, they were nonsensically split and broke
bisection.
Paolo Bonzini [Thu, 12 Jan 2017 14:02:32 +0000 (15:02 +0100)]
KVM: x86: fix emulation of "MOV SS, null selector"
This is CVE-2017-2583. On Intel this causes a failed vmentry because
SS's type is neither 3 nor 7 (even though the manual says this check is
only done for usable SS, and the dmesg splat says that SS is unusable!).
On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.
The fix fabricates a data segment descriptor when SS is set to a null
selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
this in turn ensures CPL < 3 because RPL must be equal to CPL.
Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
the bug and deciphering the manuals.
The syzkaller folks reported a NULL pointer dereference due to
ENABLE_CAP succeeding even without an irqchip. The Hyper-V
synthetic interrupt controller is activated, resulting in a
wrong request to rescan the ioapic and a NULL pointer dereference.
The syzkaller folks reported a NULL pointer dereference that due to
unregister an consumer which fails registration before. The syzkaller
creates two VMs w/ an equal eventfd occasionally. So the second VM
fails to register an irqbypass consumer. It will make irqfd as inactive
and queue an workqueue work to shutdown irqfd and unregister the irqbypass
consumer when eventfd is closed. However, the second consumer has been
initialized though it fails registration. So the token(same as the first
VM's) is taken to unregister the consumer through the workqueue, the
consumer of the first VM is found and unregistered, then NULL deref incurred
in the path of deleting consumer from the consumers list.
This patch fixes it by making irq_bypass_register/unregister_consumer()
looks for the consumer entry based on consumer pointer itself instead of
token matching.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Steve Rutherford [Thu, 12 Jan 2017 02:28:29 +0000 (18:28 -0800)]
KVM: x86: Introduce segmented_write_std
Introduces segemented_write_std.
Switches from emulated reads/writes to standard read/writes in fxsave,
fxrstor, sgdt, and sidt. This fixes CVE-2017-2584, a longstanding
kernel memory leak.
Since commit 283c95d0e389 ("KVM: x86: emulate FXSAVE and FXRSTOR",
2016-11-09), which is luckily not yet in any final release, this would
also be an exploitable kernel memory *write*!
Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: 96051572c819194c37a8367624b285be10297eca Fixes: 283c95d0e3891b64087706b344a4b545d04a6e62 Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Steve Rutherford <srutherford@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled).
These are implemented with delayed_work structs which can still be
pending when the KVM module is unloaded. We've seen this cause kernel
panics when the kvm_intel module is quickly reloaded.
Use the new static_key_deferred_flush() API to flush pending updates on
module unload.
Signed-off-by: David Matlack <dmatlack@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Matlack [Fri, 16 Dec 2016 22:30:35 +0000 (14:30 -0800)]
jump_labels: API for flushing deferred jump label updates
Modules that use static_key_deferred need a way to synchronize with
any delayed work that is still pending when the module is unloaded.
Introduce static_key_deferred_flush() which flushes any pending
jump label updates.
Signed-off-by: David Matlack <dmatlack@google.com> Cc: stable@vger.kernel.org Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Josh Poimboeuf [Mon, 9 Jan 2017 18:00:25 +0000 (12:00 -0600)]
x86/entry: Fix the end of the stack for newly forked tasks
When unwinding a task, the end of the stack is always at the same offset
right below the saved pt_regs, regardless of which syscall was used to
enter the kernel. That convention allows the unwinder to verify that a
stack is sane.
However, newly forked tasks don't always follow that convention, as
reported by the following unwinder warning seen by Dave Jones:
WARNING: kernel stack frame pointer at ffffc90001443f30 in kworker/u8:8:30468 has bad value (null)
The problem is that ret_from_fork() doesn't create a stack frame before
calling other functions. Fix that by carefully using the frame pointer
macros.
In addition to conforming to the end of stack convention, this also
makes related stack traces more sensible by making it clear to the user
that ret_from_fork() was involved.
Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8854cdaab980e9700a81e9ebf0d4238e4bbb68ef.1483978430.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Mon, 9 Jan 2017 18:00:24 +0000 (12:00 -0600)]
x86/unwind: Include __schedule() in stack traces
In the following commit:
0100301bfdf5 ("sched/x86: Rewrite the switch_to() code")
... the layout of the 'inactive_task_frame' struct was designed to have
a frame pointer header embedded in it, so that the unwinder could use
the 'bp' and 'ret_addr' fields to report __schedule() on the stack (or
ret_from_fork() for newly forked tasks which haven't actually run yet).
Finish the job by changing get_frame_pointer() to return a pointer to
inactive_task_frame's 'bp' field rather than 'bp' itself. This allows
the unwinder to start one frame higher on the stack, so that it properly
reports __schedule().
Reported-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/598e9f7505ed0aba86e8b9590aa528c6c7ae8dcd.1483978430.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Mon, 9 Jan 2017 18:00:23 +0000 (12:00 -0600)]
x86/unwind: Disable KASAN checks for non-current tasks
There are a handful of callers to save_stack_trace_tsk() and
show_stack() which try to unwind the stack of a task other than current.
In such cases, it's remotely possible that the task is running on one
CPU while the unwinder is reading its stack from another CPU, causing
the unwinder to see stack corruption.
These cases seem to be mostly harmless. The unwinder has checks which
prevent it from following bad pointers beyond the bounds of the stack.
So it's not really a bug as long as the caller understands that
unwinding another task will not always succeed.
In such cases, it's possible that the unwinder may read a KASAN-poisoned
region of the stack. Account for that by using READ_ONCE_NOCHECK() when
reading the stack of another task.
Use READ_ONCE() when reading the stack of the current task, since KASAN
warnings can still be useful for finding bugs in that case.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4c575eb288ba9f73d498dfe0acde2f58674598f1.1483978430.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Josh Poimboeuf [Mon, 9 Jan 2017 18:00:22 +0000 (12:00 -0600)]
x86/unwind: Silence warnings for non-current tasks
There are a handful of callers to save_stack_trace_tsk() and
show_stack() which try to unwind the stack of a task other than current.
In such cases, it's remotely possible that the task is running on one
CPU while the unwinder is reading its stack from another CPU, causing
the unwinder to see stack corruption.
These cases seem to be mostly harmless. The unwinder has checks which
prevent it from following bad pointers beyond the bounds of the stack.
So it's not really a bug as long as the caller understands that
unwinding another task will not always succeed.
Since stack "corruption" on another task's stack isn't necessarily a
bug, silence the warnings when unwinding tasks other than current.
Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/00d8c50eea3446c1524a2a755397a3966629354c.1483978430.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Brendan McGrath [Fri, 6 Jan 2017 21:01:38 +0000 (08:01 +1100)]
HID: i2c-hid: Add sleep between POWER ON and RESET
Support for the Asus Touchpad was recently added. It turns out this
device can fail initialisation (and become unusable) when the RESET
command is sent too soon after the POWER ON command.
Unfortunately the i2c-hid specification does not specify the need for
a delay between these two commands. But it was discovered the Windows
driver has a 1ms delay.
As a result, this patch modifies the i2c-hid module to add a sleep
inbetween the POWER ON and RESET commands which lasts between 1ms and 5ms.
See https://github.com/vlasenko/hid-asus-dkms/issues/24 for further
details.
Linus Torvalds [Wed, 11 Jan 2017 19:15:15 +0000 (11:15 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"27 fixes.
There are three patches that aren't actually fixes. They're simple
function renamings which are nice-to-have in mainline as ongoing net
development depends on them."
* akpm: (27 commits)
timerfd: export defines to userspace
mm/hugetlb.c: fix reservation race when freeing surplus pages
mm/slab.c: fix SLAB freelist randomization duplicate entries
zram: support BDI_CAP_STABLE_WRITES
zram: revalidate disk under init_lock
mm: support anonymous stable page
mm: add documentation for page fragment APIs
mm: rename __page_frag functions to __page_frag_cache, drop order from drain
mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
mm, memcg: fix the active list aging for lowmem requests when memcg is enabled
mm: don't dereference struct page fields of invalid pages
mailmap: add codeaurora.org names for nameless email commits
signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
mm: pmd dirty emulation in page fault handler
ipc/sem.c: fix incorrect sem_lock pairing
lib/Kconfig.debug: fix frv build failure
mm: get rid of __GFP_OTHER_NODE
mm: fix remote numa hits statistics
mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
ocfs2: fix crash caused by stale lvb with fsdlm plugin
...
Dan Carpenter [Sat, 7 Jan 2017 06:30:08 +0000 (09:30 +0300)]
vfio-mdev: remove some dead code
We set info.count to 1 in mtty_get_irq_info() so static checkers
complain that, "Why do we have impossible conditions?" The answer is
that it seems to be left over dead code that can be safely removed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Dan Carpenter [Sat, 7 Jan 2017 06:28:40 +0000 (09:28 +0300)]
vfio-mdev: buffer overflow in ioctl()
This is a sample driver for documentation so the impact is probably
pretty low. But we should check that bar_index is valid so we
don't write beyond the end of the mdev_state->region_info[] array.
Fixes: 9d1a546c53b4 ("docs: Sample driver to demonstrate how to use Mediated device framework.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Dan Carpenter [Sat, 7 Jan 2017 06:27:49 +0000 (09:27 +0300)]
vfio-mdev: return -EFAULT if copy_to_user() fails
The copy_to_user() function returns the number of bytes which it wasn't
able to copy but we want to return a negative error code.
Fixes: 9d1a546c53b4 ("docs: Sample driver to demonstrate how to use Mediated device framework.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Takashi Iwai [Wed, 11 Jan 2017 18:49:27 +0000 (19:49 +0100)]
Merge tag 'asoc-fix-v4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v4.10
As well as the usual smattering of driver specific fixes collected since
the merge window this has one particularly important fix to the core for
handling of aux_devs which was broken during the merge window by some of
the componentization refactoring.
Jan Kara [Wed, 11 Jan 2017 18:20:04 +0000 (10:20 -0800)]
xfs: Timely free truncated dirty pages
Commit 99579ccec4e2 "xfs: skip dirty pages in ->releasepage()" started
to skip dirty pages in xfs_vm_releasepage() which also has the effect
that if a dirty page is truncated, it does not get freed by
block_invalidatepage() and is lingering in LRU list waiting for reclaim.
So a simple loop like:
while true; do
dd if=/dev/zero of=file bs=1M count=100
rm file
done
will keep using more and more memory until we hit low watermarks and
start pagecache reclaim which will eventually reclaim also the truncate
pages. Keeping these truncated (and thus never usable) pages in memory
is just a waste of memory, is unnecessarily stressing page cache
reclaim, and reportedly also leads to anonymous mmap(2) returning ENOMEM
prematurely.
So instead of just skipping dirty pages in xfs_vm_releasepage(), return
to old behavior of skipping them only if they have delalloc or unwritten
buffers and fix the spurious warnings by warning only if the page is
clean.
CC: stable@vger.kernel.org CC: Brian Foster <bfoster@redhat.com> CC: Vlastimil Babka <vbabka@suse.cz> Reported-by: Petr Tůma <petr.tuma@d3s.mff.cuni.cz> Fixes: 99579ccec4e271c3d4d4e7c946058766812afdab Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2) Memory disclosure in appletalk ipddp routing code, from Vlad
Tsyrklevich.
3) r8152 can erroneously split an RX packet into multiple URBs if the
Rx FIFO is not empty when we suspend. Fix this by waiting for the
FIFO to empty before suspending. From Hayes Wang.
4) Two GRO fixes (enter slow path when not enough SKB tail room exists,
disable frag0 optimizations when there are IPV6 extension headers)
from Eric Dumazet and Herbert Xu.
5) A series of mlx5e bug fixes (do source udp port offloading for
tunnels properly, Ip fragment matching fixes, handling firmware
errors properly when installing TC rules, etc.) from Saeed Mahameed,
Or Gerlitz, Roi Dayan, Hadar Hen Zion, Gil Rockah, and Daniel
Jurgens.
6) Two VRF fixes from David Ahern (don't skip multipath selection for
VRF paths, disallow VRF to be configured with table ID 0).
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
net: vrf: do not allow table id 0
net: phy: marvell: fix Marvell 88E1512 used in SGMII mode
sctp: Fix spelling mistake: "Atempt" -> "Attempt"
net: ipv4: Fix multipath selection with vrf
cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig
gro: use min_t() in skb_gro_reset_offset()
net/mlx5: Only cancel recovery work when cleaning up device
net/mlx5e: Remove WARN_ONCE from adaptive moderation code
net/mlx5e: Un-register uplink representor on nic_disable
net/mlx5e: Properly handle FW errors while adding TC rules
net/mlx5e: Fix kbuild warnings for uninitialized parameters
net/mlx5e: Set inline mode requirements for matching on IP fragments
net/mlx5e: Properly get address type of encapsulation IP headers
net/mlx5e: TC ipv4 tunnel encap offload error flow fixes
net/mlx5e: Warn when rejecting offload attempts of IP tunnels
net/mlx5e: Properly handle offloading of source udp port for IP tunnels
gro: Disable frag0 optimization on IPv6 ext headers
gro: Enter slow-path if there is no tailroom
mlx4: Return EOPNOTSUPP instead of ENOTSUPP
net/af_iucv: don't use paged skbs for TX on HiperSockets
...
nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too
Commit 54adc01055b7 ("nvme/quirk: Add a delay before checking for adapter
readiness") introduced a quirk to adapters that cannot read the bit
NVME_CSTS_RDY right after register NVME_REG_CC is set; these adapters
need a delay or else the action of reading the bit NVME_CSTS_RDY could
somehow corrupt adapter's registers state and it never recovers.
When this quirk was added, we checked ctrl->tagset in order to avoid
quirking in probe time, supposing we would never require such delay
during probe. Well, it was too optimistic; we in fact need this quirk
at probe time in some cases, like after a kexec.
In some experiments, after abnormal shutdown of machine (aka power cord
unplug), we booted into our bootloader in Power, which is a Linux kernel,
and kexec'ed into another distro. If this kexec is too quick, we end up
reaching the probe of NVMe adapter in that distro when adapter is in
bad state (not fully initialized on our bootloader). What happens next
is that nvme_wait_ready() is unable to complete, except if the quirk is
enabled.
So, this patch removes the original ctrl->tagset verification in order
to enable the quirk even on probe time.
Fixes: 54adc01055b7 ("nvme/quirk: Add a delay before checking for adapter readiness") Reported-by: Andrew Byrne <byrneadw@ie.ibm.com> Reported-by: Jaime A. H. Gomez <jahgomez@mx1.ibm.com> Reported-by: Zachary D. Myers <zdmyers@us.ibm.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Acked-by: Jeffrey Lien <Jeff.Lien@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Now that we don't abuse the cmd field in struct request for nvme command
passthrough this function needs to be converted to the proper accessor
as well.
Fixes: d49187e97e ("nvme: introduce struct nvme_request") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>