Linus Torvalds [Sat, 5 Mar 2016 01:51:16 +0000 (17:51 -0800)]
Merge tag 'pm+acpi-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"Two build fixes for cpufreq drivers (including one for breakage
introduced recently) and a fix for a graph tracer crash when used over
suspend-to-RAM on x86.
Specifics:
- Prevent the graph tracer from crashing when used over suspend-to-
RAM on x86 by pausing it before invoking do_suspend_lowlevel() and
un-pausing it when that function has returned (Todd Brandt).
- Fix build issues in the qoriq and mediatek cpufreq drivers related
to broken dependencies on THERMAL (Arnd Bergmann)"
* tag 'pm+acpi-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / sleep / x86: Fix crash on graph trace through x86 suspend
cpufreq: mediatek: allow building as a module
cpufreq: qoriq: allow building as module with THERMAL=m
Linus Torvalds [Sat, 5 Mar 2016 01:43:40 +0000 (17:43 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
"Arm64 fix for -rc7. Without it, our struct page array can overflow
the vmemmap region on systems with a large PHYS_OFFSET.
Nothing else on the radar at the moment, so hopefully that's it for
4.5 from us.
Summary: Ensure struct page array fits within vmemmap area"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: vmemmap: use virtual projection of linear region
Linus Torvalds [Sat, 5 Mar 2016 01:31:32 +0000 (17:31 -0800)]
Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
"Filipe nailed down a problem where tree log replay would do some work
that orphan code wasn't expecting to be done yet, leading to BUG_ON"
* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix loading of orphan roots leading to BUG_ON
Linus Torvalds [Sat, 5 Mar 2016 00:57:04 +0000 (16:57 -0800)]
Merge tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"A feature was added in 4.3 that allowed users to filter trace points
on a tasks "comm" field. But this prevented filtering on a comm field
that is within a trace event (like sched_migrate_task).
When trying to filter on when a program migrated, this change
prevented the filtering of the sched_migrate_task.
To fix this, the event fields are examined first, and then the extra
fields like "comm" and "cpu" are examined. Also, instead of testing
to assign the comm filter function based on the field's name, the
generic comm field is given a new filter type (FILTER_COMM). When
this field is used to filter the type is checked. The same is done
for the cpu filter field.
Two new special filter types are added: "COMM" and "CPU". This allows
users to still filter the tasks comm for events that have "comm" as
one of their fields, in cases that users would like to filter
sched_migrate_task on the comm of the task that called the event, and
not the comm of the task that is being migrated"
* tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Do not have 'comm' filter override event 'comm' field
tracing: Do not have 'comm' filter override event 'comm' field
Commit 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and
process names" added a 'comm' filter that will filter events based on the
current tasks struct 'comm'. But this now hides the ability to filter events
that have a 'comm' field too. For example, sched_migrate_task trace event.
That has a 'comm' field of the task to be migrated.
will now filter all sched_migrate_task events for tasks named "bash" that
migrates other tasks (in interrupt context), instead of seeing when "bash"
itself gets migrated.
This fix requires a couple of changes.
1) Change the look up order for filter predicates to look at the events
fields before looking at the generic filters.
2) Instead of basing the filter function off of the "comm" name, have the
generic "comm" filter have its own filter_type (FILTER_COMM). Test
against the type instead of the name to assign the filter function.
3) Add a new "COMM" filter that works just like "comm" but will filter based
on the current task, even if the trace event contains a "comm" field.
Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".
Cc: stable@vger.kernel.org # v4.3+ Fixes: 9f61668073a8d "tracing: Allow triggers to filter for CPU ids and process names" Reported-by: Matt Fleming <matt@codeblueprint.co.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Filipe Manana [Wed, 2 Mar 2016 15:49:38 +0000 (15:49 +0000)]
Btrfs: fix loading of orphan roots leading to BUG_ON
When looking for orphan roots during mount we can end up hitting a
BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is
replayed and qgroups are enabled. This is because after a log tree is
replayed, a transaction commit is made, which triggers qgroup extent
accounting which in turn does backref walking which ends up reading and
inserting all roots in the radix tree fs_info->fs_root_radix, including
orphan roots (deleted snapshots). So after the log tree is replayed, when
finding orphan roots we hit the BUG_ON with the following trace:
So fix this by not treating the error -EEXIST, returned when attempting
to insert a root already inserted by the backref walking code, as an error.
The following test case for xfstests reproduces the bug:
seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1 # failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15
_cleanup()
{
_cleanup_flakey
cd /
rm -f $tmp.*
}
# get standard environment, filters and checks
. ./common/rc
. ./common/filter
. ./common/dmflakey
# real QA test starts here
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_dm_target flakey
_require_metadata_journaling $SCRATCH_DEV
# Create 2 directories with one file in one of them.
# We use these just to trigger a transaction commit later, moving the file from
# directory a to directory b and doing an fsync against directory a.
mkdir $SCRATCH_MNT/a
mkdir $SCRATCH_MNT/b
touch $SCRATCH_MNT/a/f
sync
# Create a snapshot and delete it. This doesn't really delete the snapshot
# immediately, just makes it inaccessible and invisible to user space, the
# snapshot is deleted later by a dedicated kernel thread (cleaner kthread)
# which is woke up at the next transaction commit.
# A root orphan item is inserted into the tree of tree roots, so that if a
# power failure happens before the dedicated kernel thread does the snapshot
# deletion, the next time the filesystem is mounted it resumes the snapshot
# deletion.
_run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
_run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap
# Now overwrite half of the extents we wrote before. Because we made a snapshpot
# before, which isn't really deleted yet (since no transaction commit happened
# after we did the snapshot delete request), the non overwritten extents get
# referenced twice, once by the default subvolume and once by the snapshot.
$XFS_IO_PROG -c "pwrite -S 0xbb 4K 8K" $SCRATCH_MNT/foobar | _filter_xfs_io
# Now move file f from directory a to directory b and fsync directory a.
# The fsync on the directory a triggers a transaction commit (because a file
# was moved from it to another directory) and the file fsync leaves a log tree
# with file extent items to replay.
mv $SCRATCH_MNT/a/f $SCRATCH_MNT/a/b
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
echo "File digest before power failure:"
md5sum $SCRATCH_MNT/foobar | _filter_scratch
# Now simulate a power failure and mount the filesystem to replay the log tree.
# After the log tree was replayed, we used to hit a BUG_ON() when processing
# the root orphan item for the deleted snapshot. This is because when processing
# an orphan root the code expected to be the first code inserting the root into
# the fs_info->fs_root_radix radix tree, while in reallity it was the second
# caller attempting to do it - the first caller was the transaction commit that
# took place after replaying the log tree, when updating the qgroup counters.
_flakey_drop_and_remount
echo "File digest before after failure:"
# Must match what he got before the power failure.
md5sum $SCRATCH_MNT/foobar | _filter_scratch
_unmount_flakey
status=0
exit
Fixes: 2d9e97761087 ("Btrfs: use btrfs_get_fs_root in resolve_indirect_ref") Cc: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
Linus Torvalds [Thu, 3 Mar 2016 19:54:56 +0000 (11:54 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
- ARM/MIPS: Fixes for ioctls when copy_from_user returns nonzero
- x86: Small fix for Skylake TSC scaling
- x86: Improved fix for last week's missed hardware breakpoint bug
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86: Update tsc multiplier on change.
mips/kvm: fix ioctl error handling
arm/arm64: KVM: Fix ioctl error handling
KVM: x86: fix root cause for missed hardware breakpoints
Linus Torvalds [Thu, 3 Mar 2016 19:39:11 +0000 (11:39 -0800)]
Merge tag 'gpio-v4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull late GPIO fix from Linus Walleij:
"Regressions never arrive when you want them to, so here is a late fix
for the Renesas RCAR GPIO driver. It only affects that driver on the
very specific Renesas platforms:
- Fix a runtime PM suspend/resume bug in the RCAR driver"
* tag 'gpio-v4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: rcar: Add Runtime PM handling for interrupts
Linus Torvalds [Thu, 3 Mar 2016 19:32:13 +0000 (11:32 -0800)]
Merge tag 'iommu-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"One fix for Intel VT-d:
- Use BUS_NOTIFY_REMOVED_DEVICE notifier to unbind a device from its
domain _after_ it has been unbound from its driver. This fixes a
BUG_ON being triggered in the PCI hotplug path.
And three for AMD IOMMU:
- Add a workaround for a hardware issue with ATS in use
- Fix ATS enable/disable balance when a device is removed
- Fix a boot warning being triggered when the system has IOMMU
performance counters and PCI device 00:00.0 is not covered by the
IOMMU"
* tag 'iommu-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Use BUS_NOTIFY_REMOVED_DEVICE in hotplug path
iommu/amd: Detach device from domain before removal
iommu/amd: Apply workaround for ATS write permission check
iommu/amd: Fix boot warning when device 00:00.0 is not iommu covered
Linus Torvalds [Thu, 3 Mar 2016 19:07:32 +0000 (11:07 -0800)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull minor virtio/vhost fixes from Michael Tsirkin:
"This fixes two minor bugs: error handling in vhost, and capability
processing in virtio"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: fix error path in vhost_init_used()
virtio-pci: read the right virtio_pci_notify_cap field
Todd E Brandt [Thu, 3 Mar 2016 00:05:29 +0000 (16:05 -0800)]
PM / sleep / x86: Fix crash on graph trace through x86 suspend
Pause/unpause graph tracing around do_suspend_lowlevel as it has
inconsistent call/return info after it jumps to the wakeup vector.
The graph trace buffer will otherwise become misaligned and
may eventually crash and hang on suspend.
To reproduce the issue and test the fix:
Run a function_graph trace over suspend/resume and set the graph
function to suspend_devices_and_enter. This consistently hangs the
system without this fix.
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Wed, 2 Mar 2016 17:46:19 +0000 (09:46 -0800)]
Merge branch 'parisc-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"We wire up the copy_file_range syscall, fix two bugs in the parisc
ptrace code and have a trivial fix for floppy.h to clarify an
expression with parentheses"
* 'parisc-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Wire up copy_file_range syscall
parisc: Fix ptrace syscall number and return value modification
parisc: Use parentheses around expression in floppy.h
Linus Torvalds [Wed, 2 Mar 2016 17:15:21 +0000 (09:15 -0800)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Various small CIFS/SMB3 fixes for stable:
Fixes address oops that can occur when accessing Macs with SMB3, and
another problem found to Samba when read responses queued (e.g. with
gluster under Samba)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix duplicate line introduced by clone_file_range patch
Fix cifs_uniqueid_to_ino_t() function for s390x
CIFS: Fix SMB2+ interim response processing for read requests
cifs: fix out-of-bounds access in lease parsing
Linus Torvalds [Tue, 1 Mar 2016 19:56:22 +0000 (11:56 -0800)]
userfaultfd: don't block on the last VM updates at exit time
The exit path will do some final updates to the VM of an exiting process
to inform others of the fact that the process is going away.
That happens, for example, for robust futex state cleanup, but also if
the parent has asked for a TID update when the process exits (we clear
the child tid field in user space).
However, at the time we do those final VM accesses, we've already
stopped accepting signals, so the usual "stop waiting for userfaults on
signal" code in fs/userfaultfd.c no longer works, and the process can
become an unkillable zombie waiting for something that will never
happen.
To solve this, just make handle_userfault() abort any user fault
handling if we're already in the exit path past the signal handling
state being dead (marked by PF_EXITING).
This VM special case is pretty ugly, and it is possible that we should
look at finalizing signals later (or move the VM final accesses
earlier). But in the meantime this is a fairly minimally intrusive fix.
Ladi Prosek [Mon, 1 Feb 2016 18:36:31 +0000 (19:36 +0100)]
virtio-pci: read the right virtio_pci_notify_cap field
Looks like a copy-paste bug. The value is used as an optimization and a
wrong value probably isn't causing any serious damage. Found when
porting this code to Windows.
Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Owen Hofmann [Tue, 1 Mar 2016 21:36:13 +0000 (13:36 -0800)]
kvm: x86: Update tsc multiplier on change.
vmx.c writes the TSC_MULTIPLIER field in vmx_vcpu_load, but only when a
vcpu has migrated physical cpus. Record the last value written and
update in vmx_vcpu_load on any change, otherwise a cpu migration must
occur for TSC frequency scaling to take effect.
Returning directly whatever copy_to_user(...) or copy_from_user(...)
returns may not do the right thing if there's a pagefault:
copy_to_user/copy_from_user return the number of bytes not copied in
this case, but ioctls need to return -EFAULT instead.
Fix up kvm on mips to do
return copy_to_user(...)) ? -EFAULT : 0;
and
return copy_from_user(...)) ? -EFAULT : 0;
everywhere.
Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Tue, 1 Mar 2016 23:30:45 +0000 (15:30 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull d_inode/d_flags race fix from Al Viro.
I love this fix. Not only does it fix the race in the dentry type
handling, it entirely gets rid of the nasty and subtle memory ordering
rules for d_type and d_inode, and replaces them with the basic dentry
locking rules (sequence numbers under RCU, d_lock elsewhere).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
use ->d_seq to get coherency between ->d_inode and ->d_flags
Helge Deller [Tue, 19 Jan 2016 15:08:49 +0000 (16:08 +0100)]
parisc: Fix ptrace syscall number and return value modification
Mike Frysinger reported that his ptrace testcase showed strange
behaviour on parisc: It was not possible to avoid a syscall and the
return value of a syscall couldn't be changed.
To modify a syscall number, we were missing to save the new syscall
number to gr20 which is then picked up later in assembly again.
The effect that the return value couldn't be changed is a side-effect of
another bug in the assembly code. When a process is ptraced, userspace
expects each syscall to report entrance and exit of a syscall. If a
syscall number was given which doesn't exist, we jumped to the normal
syscall exit code instead of informing userspace that the (non-existant)
syscall exits. This unexpected behaviour confuses userspace and thus the
bug was misinterpreted as if we can't change the return value.
This patch fixes both problems and was tested on 64bit kernel with
32bit userspace.
Signed-off-by: Helge Deller <deller@gmx.de> Cc: Mike Frysinger <vapier@gentoo.org> Cc: stable@vger.kernel.org # v4.0+ Tested-by: Mike Frysinger <vapier@gentoo.org>
Helge Deller [Wed, 3 Feb 2016 19:54:56 +0000 (20:54 +0100)]
parisc: Use parentheses around expression in floppy.h
David Binderman reported a style issue in the floppy.h header file:
arch/parisc/include/asm/floppy.h:221: (style) Boolean result is used in bitwise
operation. Clarify expression with parentheses.
Reported-by: David Binderman <dcb314@hotmail.com> Cc: David Binderman <dcb314@hotmail.com> Signed-off-by: Helge Deller <deller@gmx.de>
David S. Miller [Sun, 17 Jan 2016 16:47:29 +0000 (11:47 -0500)]
sparc32: Add -Wa,-Av8 to KBUILD_CFLAGS.
Binutils used to be (erroneously) extremely permissive about
instruction usage. But that got fixed and if you don't properly tell
it to accept classes of instructions it will fail.
This uncovered a specs bug on sparc in gcc where it wouldn't pass the
proper options to binutils options.
Deal with this in the kernel build by adding -Wa,-Av8 to KBUILD_CFLAGS.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 29 Feb 2016 16:04:21 +0000 (17:04 +0100)]
cpufreq: mediatek: allow building as a module
The MT8173 cpufreq driver can currently only be built-in, but
it has a Kconfig dependency on the thermal core. THERMAL
can be a loadable module, which in turn makes this driver
impossible to build.
It is nicer to make the cpufreq driver a module as well, so
this patch turns the option in to a 'tristate' and adapts
the dependency accordingly.
The driver has no module_exit() function, so it will continue
to not support unloading, but it can be built as a module
and loaded at runtime now.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 5269e7067cd6 (cpufreq: Add ARM_MT8173_CPUFREQ dependency on THERMAL) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Arnd Bergmann [Mon, 29 Feb 2016 16:04:20 +0000 (17:04 +0100)]
cpufreq: qoriq: allow building as module with THERMAL=m
My previous patch to avoid link errors with the qoriq cpufreq
driver disallowed all of the broken cases, but also prevented
the driver from being built when CONFIG_THERMAL is a module.
This changes the dependency to allow the cpufreq driver to
also be a module in this case, just not built-in.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 8ae1702a0df5 (cpufreq: qoriq: Register cooling device based on device tree) Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Minghuan Lian [Mon, 29 Feb 2016 23:24:15 +0000 (17:24 -0600)]
PCI: layerscape: Fix MSG TLP drop setting
Some kinds of Layerscape PCIe controllers will forward the received message
TLPs to system application address space, which could corrupt system memory
or lead to a system hang. Enable MSG_DROP to fix this issue.
Signed-off-by: Minghuan Lian <Minghuan.Lian@nxp.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Murali Karicheri [Mon, 29 Feb 2016 23:18:22 +0000 (17:18 -0600)]
PCI: keystone: Fix MSI code that retrieves struct pcie_port pointer
Commit cbce7900598c ("PCI: designware: Make driver arch-agnostic") changed
the host bridge sysdata pointer from the ARM pci_sys_data to the DesignWare
pcie_port structure, and changed pcie-designware.c to reflect that. But it
did not change the corresponding code in pci-keystone-dw.c, so it caused
crashes on Keystone:
Unable to handle kernel NULL pointer dereference at virtual address 00000030
pgd = c0003000
[00000030] *pgd=80000800004003, *pmd=00000000
Internal error: Oops: 206 [#1] PREEMPT SMP ARM
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.2-00139-gb74f926 #2
Hardware name: Keystone
PC is at ks_dw_pcie_msi_irq_unmask+0x24/0x58
Change pci-keystone-dw.c to expect sysdata to be the struct pcie_port
pointer.
Joerg Roedel [Mon, 29 Feb 2016 22:49:47 +0000 (23:49 +0100)]
iommu/vt-d: Use BUS_NOTIFY_REMOVED_DEVICE in hotplug path
In the PCI hotplug path of the Intel IOMMU driver, replace
the usage of the BUS_NOTIFY_DEL_DEVICE notifier, which is
executed before the driver is unbound from the device, with
BUS_NOTIFY_REMOVED_DEVICE, which runs after that.
This fixes a kernel BUG being triggered in the VT-d code
when the device driver tries to unmap DMA buffers and the
VT-d driver already destroyed all mappings.
Al Viro [Mon, 29 Feb 2016 17:12:46 +0000 (12:12 -0500)]
use ->d_seq to get coherency between ->d_inode and ->d_flags
Games with ordering and barriers are way too brittle. Just
bump ->d_seq before and after updating ->d_inode and ->d_flags
type bits, so that verifying ->d_seq would guarantee they are
coherent.
Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Calling return copy_to_user(...) in an ioctl will not
do the right thing if there's a pagefault:
copy_to_user returns the number of bytes not copied
in this case.
Fix up kvm to do
return copy_to_user(...)) ? -EFAULT : 0;
everywhere.
Cc: stable@vger.kernel.org Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Yadan Fan [Mon, 29 Feb 2016 06:44:57 +0000 (14:44 +0800)]
Fix cifs_uniqueid_to_ino_t() function for s390x
This issue is caused by commit 02323db17e3a7 ("cifs: fix
cifs_uniqueid_to_ino_t not to ever return 0"), when BITS_PER_LONG
is 64 on s390x, the corresponding cifs_uniqueid_to_ino_t()
function will cast 64-bit fileid to 32-bit by using (ino_t)fileid,
because ino_t (typdefed __kernel_ino_t) is int type.
It's defined in arch/s390/include/uapi/asm/posix_types.h
#ifndef __s390x__
typedef unsigned long __kernel_ino_t;
...
#else /* __s390x__ */
typedef unsigned int __kernel_ino_t;
So the #ifdef condition is wrong for s390x, we can just still use
one cifs_uniqueid_to_ino_t() function with comparing sizeof(ino_t)
and sizeof(u64) to choose the correct execution accordingly.
Signed-off-by: Yadan Fan <ydfan@suse.com> CC: stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
Pavel Shilovsky [Sat, 27 Feb 2016 08:58:18 +0000 (11:58 +0300)]
CIFS: Fix SMB2+ interim response processing for read requests
For interim responses we only need to parse a header and update
a number credits. Now it is done for all SMB2+ command except
SMB2_READ which is wrong. Fix this by adding such processing.
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Tested-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
Justin Maggard [Tue, 9 Feb 2016 23:52:08 +0000 (15:52 -0800)]
cifs: fix out-of-bounds access in lease parsing
When opening a file, SMB2_open() attempts to parse the lease state from the
SMB2 CREATE Response. However, the parsing code was not careful to ensure
that the create contexts are not empty or invalid, which can lead to out-
of-bounds memory access. This can be seen easily by trying
to read a file from a OSX 10.11 SMB3 server. Here is sample crash output:
Linus Torvalds [Sun, 28 Feb 2016 15:52:00 +0000 (07:52 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A rather largish series of 12 patches addressing a maze of race
conditions in the perf core code from Peter Zijlstra"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Robustify task_function_call()
perf: Fix scaling vs. perf_install_in_context()
perf: Fix scaling vs. perf_event_enable()
perf: Fix scaling vs. perf_event_enable_on_exec()
perf: Fix ctx time tracking by introducing EVENT_TIME
perf: Cure event->pending_disable race
perf: Fix race between event install and jump_labels
perf: Fix cloning
perf: Only update context time when active
perf: Allow perf_release() with !event->ctx
perf: Do not double free
perf: Close install vs. exit race
Linus Torvalds [Sun, 28 Feb 2016 15:49:23 +0000 (07:49 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"This update contains:
- Hopefully the last ASM CLAC fixups
- A fix for the Quark family related to the IMR lock which makes
kexec work again
- A off-by-one fix in the MPX code. Ironic, isn't it?
- A fix for X86_PAE which addresses once more an unsigned long vs
phys_addr_t hickup"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mpx: Fix off-by-one comparison with nr_registers
x86/mm: Fix slow_virt_to_phys() for X86_PAE again
x86/entry/compat: Add missing CLAC to entry_INT80_32
x86/entry/32: Add an ASM_CLAC to entry_SYSENTER_32
x86/platform/intel/quark: Change the kernel's IMR lock bit to false
Linus Torvalds [Sun, 28 Feb 2016 15:39:15 +0000 (07:39 -0800)]
Merge tag 'staging-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/android fix from Greg KH:
"Here is one patch, for the android binder driver, to resolve a
reported problem. Turns out it has been around for a while (since
3.15), so it is good to finally get it resolved.
It has been in linux-next for a while with no reported issues"
* tag 'staging-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
drivers: android: correct the size of struct binder_uintptr_t for BC_DEAD_BINDER_DONE
Linus Torvalds [Sun, 28 Feb 2016 15:37:30 +0000 (07:37 -0800)]
Merge tag 'usb-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a few USB fixes for 4.5-rc6
They fix a reported bug for some USB 3 devices by reverting the recent
patch, a MAINTAINERS change for some drivers, some new device ids, and
of course, the usual bunch of USB gadget driver fixes.
All have been in linux-next for a while with no reported issues"
* tag 'usb-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
MAINTAINERS: drop OMAP USB and MUSB maintainership
usb: musb: fix DMA for host mode
usb: phy: msm: Trigger USB state detection work in DRD mode
usb: gadget: net2280: fix endpoint max packet for super speed connections
usb: gadget: gadgetfs: unregister gadget only if it got successfully registered
usb: gadget: remove driver from pending list on probe error
Revert "usb: hub: do not clear BOS field during reset device"
usb: chipidea: fix return value check in ci_hdrc_pci_probe()
usb: chipidea: error on overflow for port_test_write
USB: option: add "4G LTE usb-modem U901"
USB: cp210x: add IDs for GE B650V3 and B850V3 boards
USB: option: add support for SIM7100E
usb: musb: Fix DMA desired mode for Mentor DMA engine
usb: gadget: fsl_qe_udc: fix IS_ERR_VALUE usage
usb: dwc2: USB_DWC2 should depend on HAS_DMA
usb: dwc2: host: fix the data toggle error in full speed descriptor dma
usb: dwc2: host: fix logical omissions in dwc2_process_non_isoc_desc
usb: dwc3: Fix assignment of EP transfer resources
usb: dwc2: Add extra delay when forcing dr_mode
Calling return copy_to_user(...) in an ioctl will not
do the right thing if there's a pagefault:
copy_to_user returns the number of bytes not copied
in this case.
Fix up vfio to do
return copy_to_user(...)) ?
-EFAULT : 0;
everywhere.
Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Linus Torvalds [Sun, 28 Feb 2016 01:10:32 +0000 (17:10 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
do_last(): ELOOP failure exit should be done after leaving RCU mode
should_follow_link(): validate ->d_seq after having decided to follow
namei: ->d_inode of a pinned dentry is stable only for positives
do_last(): don't let a bogus return value from ->open() et.al. to confuse us
fs: return -EOPNOTSUPP if clone is not supported
hpfs: don't truncate the file when delete fails
Linus Torvalds [Sun, 28 Feb 2016 00:58:32 +0000 (16:58 -0800)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"We didn't have a batch last week, so this one is slightly larger.
None of them are scary though, a handful of fixes for small DT pieces,
replacing properties with newer conventions.
Highlights:
- N900 fix for setting system revision
- onenand init fix to avoid filesystem corruption
- Clock fix for audio on Beaglebone-x15
- Fixes on shmobile to deal with CONFIG_DEBUG_RODATA (default y in 4.6)
+ misc smaller stuff"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
MAINTAINERS: Extend info, add wiki and ml for meson arch
MAINTAINERS: alpine: add a new maintainer and update the entry
ARM: at91/dt: fix typo in sama5d2 pinmux descriptions
ARM: OMAP2+: Fix onenand initialization to avoid filesystem corruption
Revert "regulator: tps65217: remove tps65217.dtsi file"
ARM: shmobile: Remove shmobile_boot_arg
ARM: shmobile: Move shmobile_smp_{mpidr, fn, arg}[] from .text to .bss
ARM: shmobile: r8a7779: Remove remainings of removed SCU boot setup code
ARM: shmobile: Move shmobile_scu_base from .text to .bss
ARM: OMAP2+: Fix omap_device for module reload on PM runtime forbid
ARM: OMAP2+: Improve omap_device error for driver writers
ARM: DTS: am57xx-beagle-x15: Select SYS_CLK2 for audio clocks
ARM: dts: am335x/am57xx: replace gpio-key,wakeup with wakeup-source property
ARM: OMAP2+: Set system_rev from ATAGS for n900
ARM: dts: orion5x: fix the missing mtd flash on linkstation lswtgl
ARM: dts: kirkwood: use unique machine name for ds112
ARM: dts: imx6: remove bogus interrupt-parent from CAAM node
Al Viro [Sun, 28 Feb 2016 00:23:16 +0000 (19:23 -0500)]
namei: ->d_inode of a pinned dentry is stable only for positives
both do_last() and walk_component() risk picking a NULL inode out
of dentry about to become positive, *then* checking its flags and
seeing that it's not negative anymore and using (already stale by
then) value they'd fetched earlier. Usually ends up oopsing soon
after that...
Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sun, 28 Feb 2016 00:17:33 +0000 (19:17 -0500)]
do_last(): don't let a bogus return value from ->open() et.al. to confuse us
... into returning a positive to path_openat(), which would interpret that
as "symlink had been encountered" and proceed to corrupt memory, etc.
It can only happen due to a bug in some ->open() instance or in some LSM
hook, etc., so we report any such event *and* make sure it doesn't trick
us into further unpleasantness.
Cc: stable@vger.kernel.org # v3.6+, at least Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Mikulas Patocka [Thu, 25 Feb 2016 17:17:38 +0000 (18:17 +0100)]
hpfs: don't truncate the file when delete fails
The delete opration can allocate additional space on the HPFS filesystem
due to btree split. The HPFS driver checks in advance if there is
available space, so that it won't corrupt the btree if we run out of space
during splitting.
If there is not enough available space, the HPFS driver attempted to
truncate the file, but this results in a deadlock since the commit 7dd29d8d865efdb00c0542a5d2c87af8c52ea6c7 ("HPFS: Introduce a global mutex
and lock it on every callback from VFS").
This patch removes the code that tries to truncate the file and -ENOSPC is
returned instead. If the user hits -ENOSPC on delete, he should try to
delete other files (that are stored in a leaf btree node), so that the
delete operation will make some space for deleting the file stored in
non-leaf btree node.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: stable@vger.kernel.org # 2.6.39+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sat, 27 Feb 2016 20:46:16 +0000 (12:46 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
dax: move writeback calls into the filesystems
dax: give DAX clearing code correct bdev
ext4: online defrag not supported with DAX
ext2, ext4: only set S_DAX for regular inodes
block: disable block device DAX by default
ocfs2: unlock inode if deleting inode from orphan fails
mm: ASLR: use get_random_long()
drivers: char: random: add get_random_long()
mm: numa: quickly fail allocations for NUMA balancing on full nodes
mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
Linus Torvalds [Sat, 27 Feb 2016 20:40:49 +0000 (12:40 -0800)]
Merge tag 'tags/ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext2/4 DAX fix from Ted Ts'o:
"This fixes a file system corruption bug with DAX"
* tag 'tags/ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite()
Linus Torvalds [Sat, 27 Feb 2016 20:33:42 +0000 (12:33 -0800)]
Merge tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Enumeration:
Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas)
Marvell MVEBU host bridge driver:
Restrict build to 32-bit ARM (Thierry Reding)"
* tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: mvebu: Restrict build to 32-bit ARM
Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"
Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed"
Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"
Ross Zwisler [Sat, 27 Feb 2016 19:01:13 +0000 (14:01 -0500)]
ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite()
As it is currently written ext4_dax_mkwrite() assumes that the call into
__dax_mkwrite() will not have to do a block allocation so it doesn't create
a journal entry. For a read that creates a zero page to cover a hole
followed by a write that actually allocates storage this is incorrect. The
ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
get_blocks() to allocate storage.
Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
as this function already has all the logic needed to allocate a journal
entry and call __dax_fault().
Also update the ext2 fault handlers in this same way to remove duplicate
code and keep the logic between ext2 and ext4 the same.
Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ross Zwisler [Fri, 26 Feb 2016 23:19:55 +0000 (15:19 -0800)]
dax: move writeback calls into the filesystems
Previously calls to dax_writeback_mapping_range() for all DAX filesystems
(ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().
dax_writeback_mapping_range() needs a struct block_device, and it used
to get that from inode->i_sb->s_bdev. This is correct for normal inodes
mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
block devices and for XFS real-time files.
Instead, call dax_writeback_mapping_range() directly from the filesystem
->writepages function so that it can supply us with a valid block
device. This also fixes DAX code to properly flush caches in response
to sync(2).
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Fri, 26 Feb 2016 23:19:52 +0000 (15:19 -0800)]
dax: give DAX clearing code correct bdev
dax_clear_blocks() needs a valid struct block_device and previously it
was using inode->i_sb->s_bdev in all cases. This is correct for normal
inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for
DAX raw block devices and for XFS real-time devices.
Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change
its arguments to take a bdev and a sector instead of an inode and a
block. This better reflects what the function does, and it allows the
filesystem and raw block device code to pass in an appropriate struct
block_device.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Fri, 26 Feb 2016 23:19:49 +0000 (15:19 -0800)]
ext4: online defrag not supported with DAX
Online defrag operations for ext4 are hard coded to use the page cache.
See ext4_ioctl() -> ext4_move_extents() -> move_extent_per_page()
When combined with DAX I/O, which circumvents the page cache, this can
result in data corruption. This was observed with xfstests ext4/307 and
ext4/308.
Fix this by only allowing online defrag for non-DAX files.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Fri, 26 Feb 2016 23:19:46 +0000 (15:19 -0800)]
ext2, ext4: only set S_DAX for regular inodes
When S_DAX is set on an inode we assume that if there are pages attached
to the mapping (mapping->nrpages != 0), those pages are clean zero pages
that were used to service reads from holes. Any dirty data associated
with the inode should be in the form of DAX exceptional entries
(mapping->nrexceptional) that is written back via
dax_writeback_mapping_range().
With the current code, though, this isn't always true. For example,
ext2 and ext4 directory inodes can have S_DAX set, but have their dirty
data stored as dirty page cache entries. For these types of inodes,
having S_DAX set doesn't really make sense since their I/O doesn't
actually happen through the DAX code path.
Instead, only allow S_DAX to be set for regular inodes for ext2 and
ext4. This allows us to have strict DAX vs non-DAX paths in the
writeback code.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Fri, 26 Feb 2016 23:19:43 +0000 (15:19 -0800)]
block: disable block device DAX by default
The recent *sync enabling discovered that we are inserting into the
block_device pagecache counter to the expectations of the dirty data
tracking for dax mappings. This can lead to data corruption.
We want to support DAX for block devices eventually, but it requires
wider changes to properly manage the pagecache.
Mark the support broken so its disabled by default, but otherwise still
available for testing.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@fb.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Guozhonghua [Fri, 26 Feb 2016 23:19:40 +0000 (15:19 -0800)]
ocfs2: unlock inode if deleting inode from orphan fails
When doing append direct io cleanup, if deleting inode fails, it goes
out without unlocking inode, which will cause the inode deadlock.
This issue was introduced by commit cf1776a9e834 ("ocfs2: fix a tiny
race when truncate dio orohaned entry").
Signed-off-by: Guozhonghua <guozhonghua@h3c.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Cashman [Fri, 26 Feb 2016 23:19:37 +0000 (15:19 -0800)]
mm: ASLR: use get_random_long()
Replace calls to get_random_int() followed by a cast to (unsigned long)
with calls to get_random_long(). Also address shifting bug which, in
case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits.
Signed-off-by: Daniel Cashman <dcashman@android.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Kralevich <nnk@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Cashman [Fri, 26 Feb 2016 23:19:34 +0000 (15:19 -0800)]
drivers: char: random: add get_random_long()
Commit d07e22597d1d ("mm: mmap: add new /proc tunable for mmap_base
ASLR") added the ability to choose from a range of values to use for
entropy count in generating the random offset to the mmap_base address.
The maximum value on this range was set to 32 bits for 64-bit x86
systems, but this value could be increased further, requiring more than
the 32 bits of randomness provided by get_random_int(), as is already
possible for arm64. Add a new function: get_random_long() which more
naturally fits with the mmap usage of get_random_int() but operates
exactly the same as get_random_int().
Also, fix the shifting constant in mmap_rnd() to be an unsigned long so
that values greater than 31 bits generate an appropriate mask without
overflow. This is especially important on x86, as its shift instruction
uses a 5-bit mask for the shift operand, which meant that any value for
mmap_rnd_bits over 31 acts as a no-op and effectively disables mmap_base
randomization.
Finally, replace calls to get_random_int() with get_random_long() where
appropriate.
This patch (of 2):
Add get_random_long().
Signed-off-by: Daniel Cashman <dcashman@android.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Kralevich <nnk@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 26 Feb 2016 23:19:31 +0000 (15:19 -0800)]
mm: numa: quickly fail allocations for NUMA balancing on full nodes
Commit 4167e9b2cf10 ("mm: remove GFP_THISNODE") removed the GFP_THISNODE
flag combination due to confusing semantics. It noted that
alloc_misplaced_dst_page() was one such user after changes made by
commit e97ca8e5b864 ("mm: fix GFP_THISNODE callers and clarify").
Unfortunately when GFP_THISNODE was removed, users of
alloc_misplaced_dst_page() started waking kswapd and entering direct
reclaim because the wrong GFP flags are cleared. The consequence is
that workloads that used to fit into memory now get reclaimed which is
addressed by this patch.
The problem can be demonstrated with "mutilate" that exercises memcached
which is software dedicated to memory object caching. The configuration
uses 80% of memory and is run 3 times for varying numbers of clients.
The results on a 4-socket NUMA box are
The metric is queries/second with the more the better. The results are
way outside of the noise and the reason for the improvement is obvious
from some of the vmstats
The vanilla kernel is swapping like crazy with large amounts of direct
reclaim and kswapd activity. The figures are aggregate but it's known
that the bad activity is throughout the entire test.
Note that simple streaming anon/file memory consumers also see this
problem but it's not as obvious. In those cases, kswapd is awake when
it should not be.
As there are at least two reclaim-related bugs out there, it's worth
spelling out the user-visible impact. This patch only addresses bugs
related to excessive reclaim on NUMA hardware when the working set is
larger than a NUMA node. There is a bug related to high kswapd CPU
usage but the reports are against laptops and other UMA hardware and is
not addressed by this patch.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [4.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Fri, 26 Feb 2016 23:19:28 +0000 (15:19 -0800)]
mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessy (but atomically) detect when a pmd is a regular
(stable) pmd or when the pmd is unstable and can infinitely transition
from pmd_none() and pmd_trans_huge() from under us, while only holding
the mmap_sem for reading (for writing not).
While holding the mmap_sem only for reading, MADV_DONTNEED can run from
under us and so before we can assume the pmd to be a regular stable pmd
we need to compare it against pmd_none() and pmd_trans_huge() in an
atomic way, with pmd_trans_unstable(). The old pmd_trans_huge() left a
tiny window for a race.
Useful applications are unlikely to notice the difference as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.
[akpm@linux-foundation.org: tidy up comment grammar/layout] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thierry Reding [Thu, 18 Feb 2016 13:32:10 +0000 (14:32 +0100)]
PCI: mvebu: Restrict build to 32-bit ARM
This driver uses PCI glue that is only available on 32-bit ARM. This used
to work fine as long as ARCH_MVEBU and ARCH_DOVE were exclusively 32-bit,
but there's a patch in the pipe to make ARCH_MVEBU also available on 64-bit
ARM.
[bhelgaas: changelog; patch is coming but not merged yet] Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Bjorn Helgaas [Wed, 17 Feb 2016 18:26:42 +0000 (12:26 -0600)]
Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"
991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and
pcibios_free_irq()") appeared in v4.3 and helps support IOAPIC hotplug.
Олег reported that the Elcus-1553 TA1-PCI driver worked in v4.2 but not
v4.3 and bisected it to 991de2e59090. Sunjin reported that the RocketRAID
272x driver worked in v4.2 but not v4.3. In both cases booting with
"pci=routirq" is a workaround.
I think the problem is that after 991de2e59090, we no longer call
pcibios_enable_irq() for upstream bridges. Prior to 991de2e59090, when a
driver called pci_enable_device(), we recursively called
pcibios_enable_irq() for upstream bridges via pci_enable_bridge().
After 991de2e59090, we call pcibios_enable_irq() from pci_device_probe()
instead of the pci_enable_device() path, which does *not* call
pcibios_enable_irq() for upstream bridges.
Revert 991de2e59090 to fix these driver regressions.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211 Fixes: 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()") Reported-and-tested-by: Олег Мороз <oleg.moroz@mcc.vniiem.ru> Reported-by: Sunjin Yang <fan4326@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> CC: Jiang Liu <jiang.liu@linux.intel.com>
Colin Ian King [Fri, 26 Feb 2016 18:55:31 +0000 (18:55 +0000)]
x86/mpx: Fix off-by-one comparison with nr_registers
In the unlikely event that regno == nr_registers then we get an array
overrun on regoff because the invalid register check is currently
off-by-one. Fix this with a check that regno is >= nr_registers instead.
Detected with static analysis using CoverityScan.
Fixes: fcc7ffd67991 "x86, mpx: Decode MPX instruction to get bound violation information" Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1456512931-3388-1-git-send-email-colin.king@canonical.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ard Biesheuvel [Fri, 26 Feb 2016 16:57:13 +0000 (17:57 +0100)]
arm64: vmemmap: use virtual projection of linear region
Commit dd006da21646 ("arm64: mm: increase VA range of identity map") made
some changes to the memory mapping code to allow physical memory to reside
at an offset that exceeds the size of the virtual mapping.
However, since the size of the vmemmap area is proportional to the size of
the VA area, but it is populated relative to the physical space, we may
end up with the struct page array being mapped outside of the vmemmap
region. For instance, on my Seattle A0 box, I can see the following output
in the dmesg log.
We can fix this by deciding that the vmemmap region is not a projection of
the physical space, but of the virtual space above PAGE_OFFSET, i.e., the
linear region. This way, we are guaranteed that the vmemmap region is of
sufficient size, and we can even reduce the size by half.
Cc: <stable@vger.kernel.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
Linus Torvalds [Fri, 26 Feb 2016 17:35:03 +0000 (09:35 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"There are two small messenger bug fixes and a log spam regression fix"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: don't spam dmesg with stray reply warnings
libceph: use the right footer size when skipping a message
libceph: don't bail early from try_read() when skipping a message
Linus Torvalds [Fri, 26 Feb 2016 17:27:21 +0000 (09:27 -0800)]
Merge tag 'sound-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Things got calmed down for rc6, as it seems, and we have only a few
HD-audio fixes at this time: a fix for Skylake codec probe errors, a
fix for missing interrupt handling, and a few Dell and HP quirks"
* tag 'sound-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Loop interrupt handling until really cleared
ALSA: hda - Fix headset support and noise on HP EliteBook 755 G2
ALSA: hda - Fixup speaker pass-through control for nid 0x14 on ALC225
ALSA: hda - Fixing background noise on Dell Inspiron 3162
ALSA: hda - Apply clock gate workaround to Skylake, too
Linus Torvalds [Fri, 26 Feb 2016 17:21:48 +0000 (09:21 -0800)]
Merge tag 'pm+acpi-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These are two reverts of recent PCI-related ACPI core changes (one of
which caused some systems to crash on boot and the other was a cleanup
on top of it) and a devfreq fix for Tegra.
Specifics:
- Revert an ACPI core change related to IRQ management in PCI that
introduced code relying on the use of kmalloc() which turned out to
also run during early init when that's not available yet and caused
some systems to crash on boot for this reason along with a cleanup
on top of it (Rafael Wysocki).
- Prevent devfreq from flooding the kernel log with useless messages
on Tegra (which started to happen after some recent changes in the
devfreq core) by fixing the driver to follow the documentation and
the core's expectations in its ->target callback (Tomeu Vizoso)"
* tag 'pm+acpi-4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI, PCI, irq: remove interrupt count restriction"
Revert "ACPI / PCI: Simplify acpi_penalize_isa_irq()"
PM / devfreq: tegra: Set freq in rate callback
Paolo Bonzini [Fri, 26 Feb 2016 11:28:40 +0000 (12:28 +0100)]
KVM: x86: fix root cause for missed hardware breakpoints
Commit 172b2386ed16 ("KVM: x86: fix missed hardware breakpoints",
2016-02-10) worked around a case where the debug registers are not loaded
correctly on preemption and on the first entry to KVM_RUN.
However, Xiao Guangrong pointed out that the root cause must be that
KVM_DEBUGREG_BP_ENABLED is not being set correctly. This can indeed
happen due to the lazy debug exit mechanism, which does not call
kvm_update_dr7. Fix it by replacing the existing loop (more or less
equivalent to kvm_update_dr0123) with calls to all the kvm_update_dr*
functions.
two attempts have been made at fixing a possible hang caused by
cursor_timer_handler. That function registers a timer to be triggered at
"jiffies + fbcon_ops.cur_blink_jiffies".
A new case had been encountered during initialisation of clcd-pl11x:
If we take an softirq anywhere between A and B (and we do),
cursor_timer_handler executes indefinitely.
Instead of patching all possible paths that lead to this case one at a
time, fix the issue at the source and initialise cur_blink_jiffies to
200ms when allocating fbcon_ops. This was its default value before
aforesaid commit. fbcon_cursor or fbcon_init will refine this value
downstream.
Takashi Iwai [Tue, 23 Feb 2016 14:54:47 +0000 (15:54 +0100)]
ALSA: hda - Loop interrupt handling until really cleared
Currently the interrupt handler of HD-audio driver assumes that no irq
update is needed while processing the irq. But in reality, it has
been confirmed that the HW irq is issued even during the irq
handling. Since we clear the irq status at the beginning, process the
interrupt, then exits from the handler, the lately issued interrupt is
left untouched without being properly processed.
This patch changes the interrupt handler code to loop over the
check-and-process. The handler tries repeatedly as long as the IRQ
status are turned on, and either stream or CORB/RIRB is handled.
For checking the stream handling, snd_hdac_bus_handle_stream_irq()
returns a value indicating the stream indices bits. Other than that,
the change is only in the irq handler itself.
Reported-by: Libin Yang <libin.yang@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Linus Torvalds [Fri, 26 Feb 2016 04:12:09 +0000 (20:12 -0800)]
Merge tag 'trace-fixes-v4.5-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Another small bug reported to me by Chunyu Hu.
When perf added a "reg" function to the function tracing event (not a
tracepoint), it caused that event to be displayed as a tracepoint and
could cause errors in tracepoint handling. That was solved by adding
a flag to ignore ftrace non-tracepoint events. But that flag was
missed when displaying events in available_events, which should only
contain tracepoint events.
This broke a documented way to enable all events with:
cat available_events > set_event
As the function non-tracepoint event would cause that to error out.
The commit here fixes that by having the available_events file not
list events that have the ignore flag set"
* tag 'trace-fixes-v4.5-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix showing function event in available_events
Linus Torvalds [Fri, 26 Feb 2016 03:53:54 +0000 (19:53 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"KVM/ARM fixes:
- Fix per-vcpu vgic bitmap allocation
- Do not give copy random memory on MMIO read
- Fix GICv3 APR register restore order
KVM/x86 fixes:
- Fix ubsan warning
- Fix hardware breakpoints in a guest vs. preempt notifiers
- Fix Hurd
Generic:
- use __GFP_NOWARN together with GFP_NOWAIT"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: MMU: fix ubsan index-out-of-range warning
arm64: KVM: vgic-v3: Restore ICH_APR0Rn_EL2 before ICH_APR1Rn_EL2
KVM: async_pf: do not warn on page allocation failures
KVM: x86: fix conversion of addresses to linear in 32-bit protected mode
KVM: x86: fix missed hardware breakpoints
arm/arm64: KVM: Feed initialized memory to MMIO accesses
KVM: arm/arm64: vgic: Ensure bitmaps are long enough
Linus Torvalds [Fri, 26 Feb 2016 03:47:01 +0000 (19:47 -0800)]
Merge tag 'renesas-sh-drivers-fixes-for-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas
Pull SuperH driver fix from Simon Horman:
"Restore legacy clock domain on SuperH platforms"
* tag 'renesas-sh-drivers-fixes-for-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
drivers: sh: Restore legacy clock domain on SuperH platforms
Linus Torvalds [Fri, 26 Feb 2016 03:41:53 +0000 (19:41 -0800)]
Merge tag 'powerpc-4.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- eeh: Fix partial hotplug criterion from Gavin Shan
- mm: Clear the invalid slot information correctly from Aneesh Kumar K.V
* tag 'powerpc-4.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm/hash: Clear the invalid slot information correctly
powerpc/eeh: Fix partial hotplug criterion
Linus Torvalds [Fri, 26 Feb 2016 03:36:33 +0000 (19:36 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 bugfixes from Martin Schwidefsky:
"Two critical bug fixes for the signal handling"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/fpu: signals vs. floating point control register
s390/compat: correct restore of high gprs on signal return
Linus Torvalds [Fri, 26 Feb 2016 03:31:01 +0000 (19:31 -0800)]
Merge tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfix from Bruce Fields:
"One fix for a bug that could cause a NULL write past the end of a
buffer in case of unusually long writes to some system interfaces used
by mountd and other nfs support utilities"
* tag 'nfsd-4.5-1' of git://linux-nfs.org/~bfields/linux:
sunrpc/cache: fix off-by-one in qword_get()
Linus Torvalds [Fri, 26 Feb 2016 03:01:42 +0000 (19:01 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This is a bit larger than Id like, but I asked the Intel guys to pull
in some Skylake fixes in the possibly vain hope that Skylake might be
more functional now that I'm seeing production hardware shipping.
For i915, it's mostly the same patch in a few places, making sure the
hw doesn't turn off when we are programming it.
Apart from that are two nouveau fixes, one for a module defer bug, and
one for using nouveau on new Lenovo P50 models.
Then there are a bunch of AMDGPU fixes, one is a fix for v4.4 vblank
regressions, and some PM fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (26 commits)
drm/nouveau/disp/dp: ensure sink is powered up before attempting link training
drm/nouveau: platform: Fix deferred probe
drm/amdgpu: disable direct VM updates when vm_debug is set
amdgpu: fix NULL pointer dereference at tonga_check_states_equal
drm/i915/gen9: Verify and enforce dc6 state writes
drm/i915/gen9: Check for DC state mismatch
drm/radeon/pm: adjust display configuration after powerstate
drm/amdgpu/pm: adjust display configuration after powerstate
drm/amdgpu/pm: add some checks for PX
drm/amdgpu: fix locking in force performance level
drm/amdgpu/gfx8: fix priv reg interrupt enable
drm/i915/skl: Ensure HW is powered during DDB HW state readout
drm/i915/lvds: Ensure the HW is powered during HW state readout
drm/i915/hdmi: Ensure the HW is powered during HW state readout
drm/i915/dsi: Ensure the HW is powered during HW state readout
drm/i915/dp: Ensure the HW is powered during HW state readout
drm/i915: Ensure the HW is powered when accessing the CRC HW block
drm/i915/ddi: Ensure the HW is powered during HW state readout
drm/i915/crt: Ensure the HW is powered during HW state readout
drm/i915: Ensure the HW is powered during HW access in assert_pipe
...
Linus Torvalds [Fri, 26 Feb 2016 02:54:53 +0000 (18:54 -0800)]
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
- Two fixes for compatibility with the ACPI 6.1 specification.
Without these fixes multi-interface DIMMs will fail to be probed, and
address range scrub commands to find memory errors will give results
that the kernel will mis-interpret. For multi-interface DIMMs Linux
will accept either the original 6.0 implementation or 6.1.
For address range scrub we'll only support 6.1 since ACPI formalized
this DSM differently than the original example [1] implemented in
v4.2. The expectation is that production systems will only ever ship
the ACPI 6.1 address range scrub command definition.
- The wider async address range scrub work targeting 4.6 discovered
that the original synchronous implementation in 4.5 is not sizing its
return buffer correctly.
- Arnd caught that my recent fix to the size of the pfn_t flags missed
updating the flags variable used in the pmem driver.
- Toshi found that we mishandle the memremap() return value in
devm_memremap().
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm: use 'u64' for pfn flags
devm_memremap: Fix error value when memremap failed
nfit: update address range scrub commands to the acpi 6.1 format
libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing
nfit: fix multi-interface dimm handling, acpi6.1 compatibility
Linus Torvalds [Fri, 26 Feb 2016 02:42:08 +0000 (18:42 -0800)]
Merge tag 'for-v4.5-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel:
"Add a regression fix for changed sysfs path of bq27xxx_battery and
update MAINTAINERS file"
* tag 'for-v4.5-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: bq27xxx_battery: Restore device name
MAINTAINERS: update bq27xxx driver
Dexuan Cui [Thu, 25 Feb 2016 09:58:12 +0000 (01:58 -0800)]
x86/mm: Fix slow_virt_to_phys() for X86_PAE again
"d1cd12108346: x86, pageattr: Prevent overflow in slow_virt_to_phys() for
X86_PAE" was unintentionally removed by the recent "34437e67a672: x86/mm: Fix
slow_virt_to_phys() to handle large PAT bit".
And, the variable 'phys_addr' was defined as "unsigned long" by mistake -- it should
be "phys_addr_t".
As a result, Hyper-V network driver in 32-PAE Linux guest can't work again.
Fixes: commit 34437e67a672: "x86/mm: Fix slow_virt_to_phys() to handle large PAT bit" Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Cc: olaf@aepfle.de Cc: gregkh@linuxfoundation.org Cc: jasowang@redhat.com Cc: driverdev-devel@linuxdriverproject.org Cc: linux-mm@kvack.org Cc: apw@canonical.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Link: http://lkml.kernel.org/r/1456394292-9030-1-git-send-email-decui@microsoft.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jay Cornwall [Wed, 10 Feb 2016 21:48:01 +0000 (15:48 -0600)]
iommu/amd: Apply workaround for ATS write permission check
The AMD Family 15h Models 30h-3Fh (Kaveri) BIOS and Kernel Developer's
Guide omitted part of the BIOS IOMMU L2 register setup specification.
Without this setup the IOMMU L2 does not fully respect write permissions
when handling an ATS translation request.
The IOMMU L2 will set PTE dirty bit when handling an ATS translation with
write permission request, even when PTE RW bit is clear. This may occur by
direct translation (which would cause a PPR) or by prefetch request from
the ATC.
This is observed in practice when the IOMMU L2 modifies a PTE which maps a
pagecache page. The ext4 filesystem driver BUGs when asked to writeback
these (non-modified) pages.
Enable ATS write permission check in the Kaveri IOMMU L2 if BIOS has not.
iommu/amd: Fix boot warning when device 00:00.0 is not iommu covered
The setup code for the performance counters in the AMD IOMMU driver
tests whether the counters can be written. It tests to setup a counter
for device 00:00.0, which fails on systems where this particular device
is not covered by the IOMMU.
Fix this by not relying on device 00:00.0 but only on the IOMMU being
present.
gpio: rcar: Add Runtime PM handling for interrupts
The R-Car GPIO driver handles Runtime PM for requested GPIOs only.
When using a GPIO purely as an interrupt source, no Runtime PM handling
is done, and the GPIO module's clock may not be enabled.
To fix this:
- Add .irq_request_resources() and .irq_release_resources() callbacks
to handle Runtime PM when an interrupt is requested,
- Add irq_bus_lock() and sync_unlock() callbacks to handle Runtime PM
when e.g. disabling/enabling an interrupt, or configuring the
interrupt type.