Eric Biggers [Sun, 30 Apr 2017 04:01:02 +0000 (00:01 -0400)]
ext4: remove ext4_xattr_check_entry()
ext4_xattr_check_entry() was redundant with validation of the full xattr
entries list in ext4_xattr_check_entries(), which all callers also did.
ext4_xattr_check_entry() also didn't actually do correct validation;
specifically, it never checked that the value doesn't overlap the xattr
names, nor did it account for padding when checking whether the xattr
value overflows the available space. So remove it to eliminate any
potential confusion.
Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Biggers [Sun, 30 Apr 2017 03:56:52 +0000 (23:56 -0400)]
ext4: rename ext4_xattr_check_names() to ext4_xattr_check_entries()
ext4_xattr_check_names() actually validates both the xattr names and
values, not just the names. So rename it to ext4_xattr_check_entries()
to avoid confusion.
Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Biggers [Sun, 30 Apr 2017 03:53:17 +0000 (23:53 -0400)]
ext4: merge ext4_xattr_list() into ext4_listxattr()
There's no difference between ext4_xattr_list() and ext4_listxattr(), so
merge them together and just have ext4_listxattr(). Some years ago they
took different arguments, but that's no longer the case.
Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Biggers [Sun, 30 Apr 2017 03:27:26 +0000 (23:27 -0400)]
ext4: trim return value and 'dir' argument from ext4_insert_dentry()
In the initial implementation of ext4 encryption, the filename was
encrypted in ext4_insert_dentry(), which could fail and also required
access to the 'dir' inode. Since then ext4 filename encryption has been
changed to encrypt the filename earlier, so we can revert the additions
to ext4_insert_dentry().
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Sun, 30 Apr 2017 01:07:30 +0000 (21:07 -0400)]
jbd2: fix dbench4 performance regression for 'nobarrier' mounts
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_FUA implementation. Since
JBD2 strips REQ_FUA and REQ_FLUSH flags from submitted IO when the
filesystem is mounted with nobarrier mount option, journal superblock
writes ended up being async writes after this patch and that caused
heavy performance regression for dbench4 benchmark with high number of
processes. In my test setup with HP RAID array with non-volatile write
cache and 32 GB ram, dbench4 runs with 8 processes regressed by ~25%.
Fix the problem by making sure journal superblock writes are always
treated as synchronous since they generally block progress of the
journalling machinery and thus the whole filesystem.
Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3 CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Jan Kara [Sun, 30 Apr 2017 00:12:16 +0000 (20:12 -0400)]
jbd2: Fix lockdep splat with generic/270 test
I've hit a lockdep splat with generic/270 test complaining that:
3216.fsstress.b/3533 is trying to acquire lock:
(jbd2_handle){++++..}, at: [<ffffffff813152e0>] jbd2_log_wait_commit+0x0/0x150
but task is already holding lock:
(jbd2_handle){++++..}, at: [<ffffffff8130bd3b>] start_this_handle+0x35b/0x850
The underlying problem is that jbd2_journal_force_commit_nested()
(called from ext4_should_retry_alloc()) may get called while a
transaction handle is started. In such case it takes care to not wait
for commit of the running transaction (which would deadlock) but only
for a commit of a transaction that is already committing (which is safe
as that doesn't wait for any filesystem locks).
In fact there are also other callers of jbd2_log_wait_commit() that take
care to pass tid of a transaction that is already committing and for
those cases, the lockdep instrumentation is too restrictive and leading
to false positive reports. Fix the problem by calling
jbd2_might_wait_for_commit() from jbd2_log_wait_commit() only if the
transaction isn't already committing.
Fixes: 1eaa566d368b214d99cbb973647c1b0b8102a9ae Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
mm: retry writepages() on ENOMEM when doing an data integrity writeback
Currently, file system's writepages() function must not fail with an
ENOMEM, since if they do, it's possible for buffered data to be lost.
This is because on a data integrity writeback writepages() gets called
but once, and if it returns ENOMEM, if you're lucky the error will get
reflected back to the userspace process calling fsync(). If you
aren't lucky, the user is unmounting the file system, and the dirty
pages will simply be lost.
For this reason, file system code generally will use GFP_NOFS, and in
some cases, will retry the allocation in a loop, on the theory that
"kernel livelocks are temporary; data loss is forever".
Unfortunately, this can indeed cause livelocks, since inside the
writepages() call, the file system is holding various mutexes, and
these mutexes may prevent the OOM killer from killing its targetted
victim if it is also holding on to those mutexes.
A better solution would be to allow writepages() to call the memory
allocator with flags that give greater latitude to the allocator to
fail, and then release its locks and return ENOMEM, and in the case of
background writeback, the writes can be retried at a later time. In
the case of data-integrity writeback retry after waiting a brief
amount of time.
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
"This is a set of CIFS/SMB3 fixes for stable.
There is another set of four SMB3 reconnect fixes for stable in
progress but they are still being reviewed/tested, so didn't want to
wait any longer to send these five below"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
Reset TreeId to zero on SMB2 TREE_CONNECT
CIFS: Fix build failure with smb2
Introduce cifs_copy_file_range()
SMB3: Rename clone_range to copychunk_range
Handle mismatched open calls
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"A number of ARM fixes:
- prevent oopses caused by dma_get_sgtable() and declared DMA
coherent memory
- fix boot failure on nommu caused by ID_PFR1 access
- a number of kprobes fixes from Jon Medhurst and Masami Hiramatsu"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8665/1: nommu: access ID_PFR1 only if CPUID scheme
ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
arm: kprobes: Align stack to 8-bytes in test code
arm: kprobes: Fix the return address of multiple kretprobes
arm: kprobes: Skip single-stepping in recursing path if possible
arm: kprobes: Allow to handle reentered kprobe on single-stepping
Merge tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are 3 small fixes for 4.11-rc6.
One resolves a reported issue with sysfs files that NeilBrown found,
one is a documenatation fix for the stable kernel rules, and the last
is a small MAINTAINERS file update for kernfs"
* tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
MAINTAINERS: separate out kernfs maintainership
sysfs: be careful of error returns from ops->show()
Documentation: stable-kernel-rules: fix stable-tag format
Merge tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO driver rfixes from Greg KH:
"Here are a number of small IIO and staging driver fixes for 4.11-rc6.
Nothing big here, just iio fixes for reported issues, and an ashmem
fix for a very old bug that has been reported by a number of Android
vendors"
* tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
iio: hid-sensor-attributes: Fix sensor property setting failure.
iio: accel: hid-sensor-accel-3d: Fix duplicate scan index error
iio: core: Fix IIO_VAL_FRACTIONAL_LOG2 for negative values
iio: st_pressure: initialize lps22hb bootime
iio: bmg160: reset chip when probing
iio: cros_ec_sensors: Fix return value to get raw and calibbias data.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS fixes from Al Viro:
"statx followup fixes and a fix for stack-smashing on alpha"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
alpha: fix stack smashing in old_adjtimex(2)
statx: Include a mask for stx_attributes in struct statx
statx: Reserve the top bit of the mask for future struct expansion
xfs: report crtime and attribute flags to statx
ext4: Add statx support
statx: optimize copy of struct statx to userspace
statx: remove incorrect part of vfs_statx() comment
statx: reject unknown flags when using NULL path
Documentation/filesystems: fix documentation for ->getattr()
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Here's a pull request for 4.11-rc, fixing a set of issues mostly
centered around the new scheduling framework. These have been brewing
for a while, but split up into what we absolutely need in 4.11, and
what we can defer until 4.12. These are well tested, on both single
queue and multiqueue setups, and with and without shared tags. They
fix several hangs that have happened in testing.
This is obviously larger than I would have preferred at this point in
time, but I don't think we can shave much off this and still get the
desired results.
In detail, this pull request contains:
- a set of five fixes for NVMe, mostly from Christoph and one from
Roland.
- a series from Bart, fixing issues with dm-mq and SCSI shared tags
and scheduling. Note that one of those patches commit messages may
read like an optimization, but it is in fact an important fix for
queue restarts in particular.
- a series from Omar, most importantly fixing a hang with multiple
hardware queues when we fail to get a driver tag. Another important
fix in there is for resizing hardware queues, which nbd does when
handling multiple sockets for one connection.
- fixing an imbalance in putting the ctx for hctx request allocations
from Minchan"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: Restart a single queue if tag sets are shared
dm rq: Avoid that request processing stalls sporadically
scsi: Avoid that SCSI queues get stuck
blk-mq: Introduce blk_mq_delay_run_hw_queue()
blk-mq: remap queues when adding/removing hardware queues
blk-mq-sched: fix crash in switch error path
blk-mq-sched: set up scheduler tags when bringing up new queues
blk-mq-sched: refactor scheduler initialization
blk-mq: use the right hctx when getting a driver tag fails
nvmet: fix byte swap in nvmet_parse_io_cmd
nvmet: fix byte swap in nvmet_execute_write_zeroes
nvmet: add missing byte swap in nvmet_get_smart_log
nvme: add missing byte swap in nvme_setup_discard
nvme: Correct NVMF enum values to match NVMe-oF rev 1.0
block: do not put mq context in blk_mq_alloc_request_hctx
Merge tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fix from Linus Walleij:
"This late fix for pin control is hopefully the last I send this cycle.
The problem was detected early in the v4.11 release cycle and there
has been some back and forth on how to solve it. Sadly the proper fix
arrives late, but at least not too late.
An issue was detected with pin control on the Freescale i.MX after the
refactorings for more general group and function handling.
We now have the proper fix for this"
* tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: core: Fix pinctrl_register_and_init() with pinctrl_enable()
Merge tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 4.11:
Headed to stable:
- disable HFSCR[TM] if TM is not supported, fixes a potential host
kernel crash triggered by a hostile guest, but only in
configurations that no one uses
- don't try to fix up misaligned load-with-reservation instructions
- fix flush_(d|i)cache_range() called from modules on little endian
kernels
- add missing global TLB invalidate if cxl is active
- fix missing preempt_disable() in crc32c-vpmsum
And a fix for selftests build changes that went in this release:
- selftests/powerpc: Fix standalone powerpc build
Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran,
Paul Mackerras"
* tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()
powerpc/mm: Add missing global TLB invalidate if cxl is active
powerpc/64: Fix flush_(d|i)cache_range() called from modules
powerpc: Don't try to fix up misaligned load-with-reservation instructions
powerpc: Disable HFSCR[TM] if TM is not supported
selftests/powerpc: Fix standalone powerpc build
Chris Salls [Sat, 8 Apr 2017 06:48:11 +0000 (23:48 -0700)]
mm/mempolicy.c: fix error handling in set_mempolicy and mbind.
In the case that compat_get_bitmap fails we do not want to copy the
bitmap to the user as it will contain uninitialized stack data and leak
sensitive data.
Signed-off-by: Chris Salls <salls@cs.ucsb.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysfs: be careful of error returns from ops->show()
ops->show() can return a negative error code.
Commit 65da3484d9be ("sysfs: correctly handle short reads on PREALLOC attrs.")
(in v4.4) caused this to be stored in an unsigned 'size_t' variable, so errors
would look like large numbers.
As a result, if an error is returned, sysfs_kf_read() will return the
value of 'count', typically 4096.
Commit 17d0774f8068 ("sysfs: correctly handle read offset on PREALLOC attrs")
(in v4.8) extended this error to use the unsigned large 'len' as a size for
memmove().
Consequently, if ->show returns an error, then the first read() on the
sysfs file will return 4096 and could return uninitialized memory to
user-space.
If the application performs a subsequent read, this will trigger a memmove()
with extremely large count, and is likely to crash the machine is bizarre ways.
This bug can currently only be triggered by reading from an md
sysfs attribute declared with __ATTR_PREALLOC() during the
brief period between when mddev_put() deletes an mddev from
the ->all_mddevs list, and when mddev_delayed_delete() - which is
scheduled on a workqueue - completes.
Before this, an error won't be returned by the ->show()
After this, the ->show() won't be called.
I can reproduce it reliably only by putting delay like
usleep_range(500000,700000);
early in mddev_delayed_delete(). Then after creating an
md device md0 run
echo clear > /sys/block/md0/md/array_state; cat /sys/block/md0/md/array_state
The bug can be triggered without the usleep.
Fixes: 65da3484d9be ("sysfs: correctly handle short reads on PREALLOC attrs.") Fixes: 17d0774f8068 ("sysfs: correctly handle read offset on PREALLOC attrs") Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan Hovold [Mon, 3 Apr 2017 13:53:34 +0000 (15:53 +0200)]
Documentation: stable-kernel-rules: fix stable-tag format
A patch documenting how to specify which kernels a particular fix should
be backported to (seemingly) inadvertently added a minus sign after the
kernel version. This particular stable-tag format had never been used
prior to this patch, and was neither present when the patch in question
was first submitted (it was added in v2 without any comment).
Drop the minus sign to avoid any confusion.
Fixes: fdc81b7910ad ("stable_kernel_rules: Add clause about specification of kernel versions to patch.") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
vfs_llseek will check whether the file mode has
FMODE_LSEEK, no return failure. But ashmem can be
lseek, so add FMODE_LSEEK to ashmem file.
Comment From Greg Hackmann:
ashmem_llseek() passes the llseek() call through to the backing
shmem file. 91360b02ab48 ("ashmem: use vfs_llseek()") changed
this from directly calling the file's llseek() op into a VFS
layer call. This also adds a check for the FMODE_LSEEK bit, so
without that bit ashmem_llseek() now always fails with -ESPIPE.
Pull sparc fixes from David Miller:
"Several fixes here, mostly having to due with either build errors or
memory corruptions depending upon whether you have THP enabled or not"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: remove unused wp_works_ok macro
sparc32: Export vac_cache_size to fix build error
sparc64: Fix memory corruption when THP is enabled
sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
arch/sparc: Avoid DCTI Couples
sparc64: kern_addr_valid regression
sparc64: Add support for 2G hugepages
sparc64: Fix size check in huge_pte_alloc
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix a problem with GICv3 userspace save/restore
- Clarify GICv2 userspace save/restore ABI
- Be more careful in clearing GIC LRs
- Add missing synchronization primitive to our MMU handling code
PPC:
- Check for a NULL return from kzalloc
s390:
- Prevent translation exception errors on valid page tables for the
instruction-exection-protection support
x86:
- Fix Page-Modification Logging when running a nested guest"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl
KVM: nVMX: initialize PML fields in vmcs02
KVM: nVMX: do not leak PML full vmexit to L1
KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI
KVM: arm64: Ensure LRs are clear when they should be
kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
KVM: s390: remove change-recording override support
arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region
arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit
Pull audit cleanup from Paul Moore:
"A week later than I had hoped, but as promised, here is the audit
uninline-fix we talked about during the last audit pull request.
The patch is slightly different than what we originally discussed as
it made more sense to keep the audit_signal_info() function in
auditsc.c rather than move it and bunch of other related
variables/definitions into audit.c/audit.h.
At some point in the future I need to look at how the audit code is
organized across kernel/audit*, I suspect we could do things a bit
better, but it doesn't seem like a -rc release is a good place for
that ;)
Regardless, this patch passes our tests without problem and looks good
for v4.11"
* 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit:
audit: move audit_signal_info() into kernel/auditsc.c
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: move pcp and lru-pcp draining into single wq
mailmap: update Yakir Yang email address
mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
dax: fix radix tree insertion race
mm, thp: fix setting of defer+madvise thp defrag mode
ptrace: fix PTRACE_LISTEN race corrupting task->state
vmlinux.lds: add missing VMLINUX_SYMBOL macros
mm/page_alloc.c: fix print order in show_free_areas()
userfaultfd: report actual registered features in fdinfo
mm: fix page_vma_mapped_walk() for ksm pages
Michal Hocko [Fri, 7 Apr 2017 23:05:05 +0000 (16:05 -0700)]
mm: move pcp and lru-pcp draining into single wq
We currently have 2 specific WQ_RECLAIM workqueues in the mm code.
vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to drain
per cpu lru caches. This seems more than necessary because both can run
on a single WQ. Both do not block on locks requiring a memory
allocation nor perform any allocations themselves. We will save one
rescuer thread this way.
On the other hand drain_all_pages() queues work on the system wq which
doesn't have rescuer and so this depend on memory allocation (when all
workers are stuck allocating and new ones cannot be created).
Initially we thought this would be more of a theoretical problem but
Hugh Dickins has reported:
: 4.11-rc has been giving me hangs after hours of swapping load. At
: first they looked like memory leaks ("fork: Cannot allocate memory");
: but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
: before looking at /proc/meminfo one time, and the stat_refresh stuck
: in D state, waiting for completion of flush_work like many kworkers.
: kthreadd waiting for completion of flush_work in drain_all_pages().
This worker should be using WQ_RECLAIM as well in order to guarantee a
forward progress. We can reuse the same one as for lru draining and
vmstat.
Link: http://lkml.kernel.org/r/20170307131751.24936-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Mel Gorman <mgorman@suse.de> Tested-by: Yang Li <pku.leo@gmail.com> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Fri, 7 Apr 2017 23:04:57 +0000 (16:04 -0700)]
dax: fix radix tree insertion race
While running generic/340 in my test setup I hit the following race. It
can happen with kernels that support FS DAX PMDs, so v4.10 thru
v4.11-rc5.
Thread 1 Thread 2
-------- --------
dax_iomap_pmd_fault()
grab_mapping_entry()
spin_lock_irq()
get_unlocked_mapping_entry()
'entry' is NULL, can't call lock_slot()
spin_unlock_irq()
radix_tree_preload()
dax_iomap_pmd_fault()
grab_mapping_entry()
spin_lock_irq()
get_unlocked_mapping_entry()
...
lock_slot()
spin_unlock_irq()
dax_pmd_insert_mapping()
<inserts a PMD mapping>
spin_lock_irq()
__radix_tree_insert() fails with -EEXIST
<fall back to 4k fault, and die horribly
when inserting a 4k entry where a PMD exists>
The issue is that we have to drop mapping->tree_lock while calling
radix_tree_preload(), but since we didn't have a radix tree entry to
lock (unlike in the pmd_downgrade case) we have no protection against
Thread 2 coming along and inserting a PMD at the same index. For 4k
entries we handled this with a special-case response to -EEXIST coming
from the __radix_tree_insert(), but this doesn't save us for PMDs
because the -EEXIST case can also mean that we collided with a 4k entry
in the radix tree at a different index, but one that is covered by our
PMD range.
So, correctly handle both the 4k and 2M collision cases by explicitly
re-checking the radix tree for an entry at our index once we reacquire
mapping->tree_lock.
This patch has made it through a clean xfstests run with the current
v4.11-rc5 based linux/master, and it also ran generic/340 500 times in a
loop. It used to fail within the first 10 iterations.
Link: http://lkml.kernel.org/r/20170406212944.2866-1-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: <stable@vger.kernel.org> [4.10+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Fri, 7 Apr 2017 23:04:54 +0000 (16:04 -0700)]
mm, thp: fix setting of defer+madvise thp defrag mode
Setting thp defrag mode of "defer+madvise" actually sets "defer" in the
kernel due to the name similarity and the out-of-order way the string is
checked in defrag_store().
Check the string in the correct order so that
TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG is set appropriately for
"defer+madvise".
Fixes: 21440d7eb904 ("mm, thp: add new defer+madvise defrag option") Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704051814420.137626@chino.kir.corp.google.com Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In PT_SEIZED + LISTEN mode STOP/CONT signals cause a wakeup against
__TASK_TRACED. If this races with the ptrace_unfreeze_traced at the end
of a PTRACE_LISTEN, this can wake the task /after/ the check against
__TASK_TRACED, but before the reset of state to TASK_TRACED. This
causes it to instead clobber TASK_WAKING, allowing a subsequent wakeup
against TRACED while the task is still on the rq wake_list, corrupting
it.
Oleg said:
"The kernel can crash or this can lead to other hard-to-debug problems.
In short, "task->state = TASK_TRACED" in ptrace_unfreeze_traced()
assumes that nobody else can wake it up, but PTRACE_LISTEN breaks the
contract. Obviusly it is very wrong to manipulate task->state if this
task is already running, or WAKING, or it sleeps again"
[akpm@linux-foundation.org: coding-style fixes] Fixes: 9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL") Link: http://lkml.kernel.org/r/xm26y3vfhmkp.fsf_-_@bsegall-linux.mtv.corp.google.com Signed-off-by: Ben Segall <bsegall@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When __{start,end}_ro_after_init is referenced from C code, we run into
the following build errors on blackfin:
kernel/extable.c:169: undefined reference to `__start_ro_after_init'
kernel/extable.c:169: undefined reference to `__end_ro_after_init'
The build error is due to the fact that blackfin is one of the few
arches that prepends an underscore '_' to all symbols defined in C.
Fix this by wrapping __{start,end}_ro_after_init in vmlinux.lds.h with
VMLINUX_SYMBOL(), which adds the necessary prefix for arches that have
HAVE_UNDERSCORE_SYMBOL_PREFIX.
Mike Rapoport [Fri, 7 Apr 2017 23:04:42 +0000 (16:04 -0700)]
userfaultfd: report actual registered features in fdinfo
fdinfo for userfault file descriptor reports UFFD_API_FEATURES. Up
until recently, the UFFD_API_FEATURES was defined as 0, therefore
corresponding field in fdinfo always contained zero. Now, with
introduction of several additional features, UFFD_API_FEATURES is not
longer 0 and it seems better to report actual features requested for the
userfaultfd object described by the fdinfo.
First, the applications that were using userfault will still see zero at
the features field in fdinfo. Next, reporting actual features rather
than available features, gives clear indication of what userfault
features are used by an application.
Link: http://lkml.kernel.org/r/1491140181-22121-1-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Just as observed in commit 4b0ece6fa016 ("mm: migrate: fix
remove_migration_pte() for ksm pages"), you cannot use page->index
calculations on ksm pages.
page_vma_mapped_walk() is relying on __vma_address(), where a ksm page
can lead it off the end of the page table, and into whatever nonsense is
in the next page, ending as an oops inside check_pte()'s pte_page().
KSM tells page_vma_mapped_walk() exactly where to look for the page, it
does not need any page->index calculation: and that's so also for all
the normal and file and anon pages - just not for THPs and their
subpages. Get out early in most cases: instead of a PageKsm test, move
down the earlier not-THP-page test, as suggested by Kirill.
I'm also slightly worried that this loop can stray into other vmas, so
added a vm_end test to prevent surprises; though I have not imagined
anything worse than a very contrived case, in which a page mlocked in
the next vma might be reclaimed because it is not mlocked in this vma.
orangefs: move features validation to fix filesystem hang
Without this fix (and another to the userspace component itself
described later), the kernel will be unable to process any OrangeFS
requests after the userspace component is restarted (due to a crash or
at the administrator's behest).
The bug here is that inside orangefs_remount, the orangefs_request_mutex
is locked. When the userspace component restarts while the filesystem
is mounted, it sends a ORANGEFS_DEV_REMOUNT_ALL ioctl to the device,
which causes the kernel to send it a few requests aimed at synchronizing
the state between the two. While this is happening the
orangefs_request_mutex is locked to prevent any other requests going
through.
This is only half of the bugfix. The other half is in the userspace
component which outright ignores(!) requests made before it considers
the filesystem remounted, which is after the ioctl returns. Of course
the ioctl doesn't return until after the userspace component responds to
the request it ignores. The userspace component has been changed to
allow ORANGEFS_VFS_OP_FEATURES regardless of the mount status.
Mike Marshall says:
"I've tested this patch against the fixed userspace part. This patch is
real important, I hope it can make it into 4.11...
Here's what happens when the userspace daemon is restarted, without
the patch:
=============================================
[ INFO: possible recursive locking detected ]
[ 4.10.0-00007-ge98bdb3 #1 Not tainted ]
---------------------------------------------
pvfs2-client-co/29032 is trying to acquire lock:
(orangefs_request_mutex){+.+.+.}, at: service_operation+0x3c7/0x7b0 [orangefs]
but task is already holding lock:
(orangefs_request_mutex){+.+.+.}, at: dispatch_ioctl_command+0x1bf/0x330 [orangefs]
blk-mq: Restart a single queue if tag sets are shared
To improve scalability, if hardware queues are shared, restart
a single hardware queue in round-robin fashion. Rename
blk_mq_sched_restart_queues() to reflect the new semantics.
Remove blk_mq_sched_mark_restart_queue() because this function
has no callers. Remove flag QUEUE_FLAG_RESTART because this
patch removes the code that uses this flag.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
dm rq: Avoid that request processing stalls sporadically
While running the srp-test software I noticed that request
processing stalls sporadically at the beginning of a test, namely
when mkfs is run against a dm-mpath device. Every time when that
happened the following command was sufficient to resume request
processing:
echo run >/sys/kernel/debug/block/dm-0/state
This patch avoids that such request processing stalls occur. The
test I ran is as follows:
while srp-test/run_tests -d -r 30 -t 02-mq; do :; done
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Jens Axboe <axboe@fb.com>
If a .queue_rq() function returns BLK_MQ_RQ_QUEUE_BUSY then the block
driver that implements that function is responsible for rerunning the
hardware queue once requests can be queued again successfully.
commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped
queues") removed the blk_mq_stop_hw_queue() call from scsi_queue_rq()
for the BLK_MQ_RQ_QUEUE_BUSY case. Hence change all calls to functions
that are intended to rerun a busy queue such that these examine all
hardware queues instead of only stopped queues.
Since no other functions than scsi_internal_device_block() and
scsi_internal_device_unblock() should ever stop or restart a SCSI
queue, change the blk_mq_delay_queue() call into a
blk_mq_delay_run_hw_queue() call.
Fixes: commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped queues") Fixes: commit 7e79dadce222 ("blk-mq: stop hardware queue in blk_mq_delay_queue()") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Long Li <longli@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Introduce a function that runs a hardware queue unconditionally
after a delay. Note: there is already a function that stops and
restarts a hardware queue after a delay, namely blk_mq_delay_queue().
This function will be used in the next patch in this series.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Long Li <longli@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"We've got a regression fix for the signal raised when userspace makes
an unsupported unaligned access and a revert of the contiguous
(hugepte) support for hugetlb, which has once again been found to be
broken. One day, maybe, we'll get it right.
Summary:
- restore previous SIGBUS behaviour for unhandled unaligned user
accesses
- revert broken support for the contiguous bit in hugetlb (again...)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
Revert "Revert "arm64: hugetlb: partial revert of 66b3923a1a0f""
arm64: mm: unaligned access by user-land should be received as SIGBUS
Merge tag 'metag-for-v4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
Pull metag usercopy fixes from James Hogan:
"Metag usercopy fault handling fixes
These patches fix a bunch of longstanding (some over a decade old)
metag user copy fault handling bugs. Thanks go to Al Viro for spotting
some of the questionable code in the first place"
* tag 'metag-for-v4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
metag/usercopy: Add missing fixups
metag/usercopy: Fix src fixup in from user rapf loops
metag/usercopy: Set flags before ADDZ
metag/usercopy: Zero rest of buffer from copy_from_user
metag/usercopy: Add early abort to copy_to_user
metag/usercopy: Fix alignment error checking
metag/usercopy: Drop unused macros
Merge tag 'acpi-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"This fixes a core device enumeration code change made in 4.10, in
order to address a reported issue, that went too far.
Specifics:
- Refine the check for the existence of _HID in find_child_checks()
so that it doesn't trigger for device objects with device IDs made
up by the kernel (Rafael Wysocki)"
* tag 'acpi-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / scan: Prefer devices without _HID for _ADR matching
sysctl: don't print negative flag for proc_douintvec
I saw some very confusing sysctl output on my system:
# cat /proc/sys/net/core/xfrm_aevent_rseqth
-2
# cat /proc/sys/net/core/xfrm_aevent_etime
-10
# cat /proc/sys/net/ipv4/tcp_notsent_lowat
-4294967295
Because we forget to set the *negp flag in proc_douintvec, so it will
become a garbage value.
Since the value related to proc_douintvec is always an unsigned integer,
so we can set *negp to false explictily to fix this issue.
Commit e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32
fields") introduced the proc_douintvec helper function, but it forgot to
add the related sanity check when doing register_sysctl_table. So add
it now.
blk-mq: remap queues when adding/removing hardware queues
blk_mq_update_nr_hw_queues() used to remap hardware queues, which is the
behavior that drivers expect. However, commit 4e68a011428a changed
blk_mq_queue_reinit() to not remap queues for the case of CPU
hotplugging, inadvertently making blk_mq_update_nr_hw_queues() not remap
queues as well. This breaks, for example, NBD's multi-connection mode,
leaving the added hardware queues unused. Fix it by making
blk_mq_update_nr_hw_queues() explicitly remap the queues.
Fixes: 4e68a011428a ("blk-mq: don't redistribute hardware queues on a CPU hotplug event") Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
In elevator_switch(), if blk_mq_init_sched() fails, we attempt to fall
back to the original scheduler. However, at this point, we've already
torn down the original scheduler's tags, so this causes a crash. Doing
the fallback like the legacy elevator path is much harder for mq, so fix
it by just falling back to none, instead.
blk-mq-sched: set up scheduler tags when bringing up new queues
If a new hardware queue is added at runtime, we don't allocate scheduler
tags for it, leading to a crash. This hooks up the scheduler framework
to blk_mq_{init,exit}_hctx() to make sure everything gets properly
initialized/freed.
blk-mq: use the right hctx when getting a driver tag fails
While dispatching requests, if we fail to get a driver tag, we mark the
hardware queue as waiting for a tag and put the requests on a
hctx->dispatch list to be run later when a driver tag is freed. However,
blk_mq_dispatch_rq_list() may dispatch requests from multiple hardware
queues if using a single-queue scheduler with a multiqueue device. If
blk_mq_get_driver_tag() fails, it doesn't update the hardware queue we
are processing. This means we end up using the hardware queue of the
previous request, which may or may not be the same as that of the
current request. If it isn't, the wrong hardware queue will end up
waiting for a tag, and the requests will be on the wrong dispatch list,
leading to a hang.
The fix is twofold:
1. Make sure we save which hardware queue we were trying to get a
request for in blk_mq_get_driver_tag() regardless of whether it
succeeds or not.
2. Make blk_mq_dispatch_rq_list() take a request_queue instead of a
blk_mq_hw_queue to make it clear that it must handle multiple
hardware queues, since I've already messed this up on a couple of
occasions.
This didn't appear in testing with nvme and mq-deadline because nvme has
more driver tags than the default number of scheduler tags. However,
with the blk_mq_update_nr_hw_queues() fix, it showed up with nbd.
Tobias Regnery [Thu, 30 Mar 2017 10:34:14 +0000 (12:34 +0200)]
CIFS: Fix build failure with smb2
I saw the following build error during a randconfig build:
fs/cifs/smb2ops.c: In function 'smb2_new_lease_key':
fs/cifs/smb2ops.c:1104:2: error: implicit declaration of function 'generate_random_uuid' [-Werror=implicit-function-declaration]
Explicit include the right header to fix this issue.
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>
Sachin Prabhu [Fri, 10 Feb 2017 10:33:51 +0000 (16:03 +0530)]
Introduce cifs_copy_file_range()
The earlier changes to copy range for cifs unintentionally disabled the more
common form of server side copy.
The patch introduces the file_operations helper cifs_copy_file_range()
which is used by the syscall copy_file_range. The new file operations
helper allows us to perform server side copies for SMB2.0 and 2.1
servers as well as SMB 3.0+ servers which do not support the ioctl
FSCTL_DUPLICATE_EXTENTS_TO_FILE.
The new helper uses the ioctl FSCTL_SRV_COPYCHUNK_WRITE to perform
server side copies. The helper is called by vfs_copy_file_range() only
once an attempt to clone the file using the ioctl
FSCTL_DUPLICATE_EXTENTS_TO_FILE has failed.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
Server side copy is one of the most important mechanisms smb2/smb3
supports and it was unintentionally disabled for most use cases.
Renaming calls to reflect the underlying smb2 ioctl called. This is
similar to the name duplicate_extents used for a similar ioctl which is
also used to duplicate files by reusing fs blocks. The name change is to
avoid confusion.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Sachin Prabhu [Fri, 3 Mar 2017 23:41:38 +0000 (15:41 -0800)]
Handle mismatched open calls
A signal can interrupt a SendReceive call which result in incoming
responses to the call being ignored. This is a problem for calls such as
open which results in the successful response being ignored. This
results in an open file resource on the server.
The patch looks into responses which were cancelled after being sent and
in case of successful open closes the open fids.
For this patch, the check is only done in SendReceive2()
Will Deacon [Fri, 31 Mar 2017 11:23:43 +0000 (12:23 +0100)]
Revert "Revert "arm64: hugetlb: partial revert of 66b3923a1a0f""
The use of the contiguous bit by our hugetlb implementation violates
the break-before-make requirements of the architecture and can lead to
silent data corruption or TLB conflict aborts. Once again, disable these
hugetlb sizes whilst it gets worked out.
It used to be sufficient just to call pagefault_disable(), because that
also disabled preemption. But the two were decoupled in commit 8222dbe21e79
("sched/preempt, mm/fault: Decouple preemption from the page fault
logic") in mid 2015.
So add the missing preempt_disable/enable(). We should also call
disable_kernel_fp(), although it does nothing by default, there is a
debug switch to make it active and all enables should be paired with
disables.
Tony Lindgren [Thu, 30 Mar 2017 16:16:39 +0000 (09:16 -0700)]
pinctrl: core: Fix pinctrl_register_and_init() with pinctrl_enable()
Recent pinctrl changes to allow dynamic allocation of pins exposed one
more issue with the pinctrl pins claimed early by the controller itself.
This caused a regression for IMX6 pinctrl hogs.
Before enabling the pin controller driver we need to wait until it has
been properly initialized, then claim the hogs, and only then enable it.
To fix the regression, split the code into pinctrl_claim_hogs() and
pinctrl_enable(). And then let's require that pinctrl_enable() is always
called by the pin controller driver when ready after calling
pinctrl_register_and_init().
Depends-on: 950b0d91dc10 ("pinctrl: core: Fix regression caused by delayed
work for hogs") Fixes: df61b366af26 ("pinctrl: core: Use delayed work for hogs") Fixes: e566fc11ea76 ("pinctrl: imx: use generic pinctrl helpers for
managing groups") Cc: Haojian Zhuang <haojian.zhuang@linaro.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Nishanth Menon <nm@ti.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Stefan Agner <stefan@agner.ch> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Gary Bisson <gary.bisson@boundarydevices.com> Tested-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Merge tag 'xfs-4.11-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull XFS fixes from Darrick Wong:
"Here are three more fixes for 4.11.
The first one reworks the inline directory verifier to check the
working copy of the directory metadata and to avoid triggering a
periodic crash in xfs/348. The second patch fixes a regression in hole
punching at EOF that corrupts files; and the third patch closes a
kernel memory disclosure bug.
Summary:
- rework the inline directory verifier to avoid crashes on disk
corruption
- don't change file size when punching holes w/ KEEP_SIZE
- close a kernel memory exposure bug"
* tag 'xfs-4.11-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix kernel memory exposure problems
xfs: Honor FALLOC_FL_KEEP_SIZE when punching ends of files
xfs: rework the inline directory verifiers
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"Lantiq:
- Fix adding xbar resoures causing a panic
Loongson3:
- Some Loongson 3A don't identify themselves as having an FTLB so
hardwire that knowledge into CPU probing.
- Handle Loongson 3 TLB peculiarities in the fast path of the RDHWR
emulation.
- Fix invalid FTLB entries with huge page on VTLB+FTLB platforms
- Add missing calculation of S-cache and V-cache cache-way size
Ralink:
- Fix typos in rt3883 pinctrl data
Generic:
- Force o32 fp64 support on 32bit MIPS64r6 kernels
- Yet another build fix after the linux/sched.h changes
- Wire up statx system call
- Fix stack unwinding after introduction of IRQ stack
- Fix spinlock code to build even for microMIPS with recent binutils
SMP-CPS:
- Fix retrieval of VPE mask on big endian CPUs"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: IRQ Stack: Unwind IRQ stack onto task stack
MIPS: c-r4k: Fix Loongson-3's vcache/scache waysize calculation
MIPS: Flush wrong invalid FTLB entry for huge page
MIPS: Check TLB before handle_ri_rdhwr() for Loongson-3
MIPS: Add MIPS_CPU_FTLB for Loongson-3A R2
MIPS: Lantiq: fix missing xbar kernel panic
MIPS: smp-cps: Fix retrieval of VPE mask on big endian CPUs
MIPS: Wire up statx system call
MIPS: Include asm/ptrace.h now linux/sched.h doesn't
MIPS: ralink: Fix typos in rt3883 pinctrl
MIPS: End spinlocks with .insn
MIPS: Force o32 fp64 support on 32bit MIPS64r6 kernels
Merge tag 'trace-v4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Wei Yongjun fixed a long standing bug in the ring buffer startup test.
If for some unknown reason, the kthread that is created fails to be
created, the return from kthread_create() is an PTR_ERR and not a
NULL. The test incorrectly checks for NULL instead of an error"
* tag 'trace-v4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Fix return value check in test_ringbuffer()
It's unused for ages, used to be required for ksyms.c back in the v1.1
times.
Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: cb8864559631 ("infiniband: Fix alignment of mmap cookies ...") Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Doug Ledford <dledford@redhat.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Nitin Gupta [Fri, 31 Mar 2017 22:48:53 +0000 (15:48 -0700)]
sparc64: Fix memory corruption when THP is enabled
The memory corruption was happening due to incorrect
TLB/TSB flushing of hugepages.
Reported-by: David S. Miller <davem@davemloft.net> Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Hromatka [Fri, 31 Mar 2017 22:31:42 +0000 (16:31 -0600)]
sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
This commit moves sparc64's prototype of pmd_write() outside
of the CONFIG_TRANSPARENT_HUGEPAGE ifdef.
In 2013, commit a7b9403f0e6d ("sparc64: Encode huge PMDs using PTE
encoding.") exposed a path where pmd_write() could be called without
CONFIG_TRANSPARENT_HUGEPAGE defined. This can result in the panic below.
The diff is awkward to read, but the changes are straightforward.
pmd_write() was moved outside of #ifdef CONFIG_TRANSPARENT_HUGEPAGE.
Also, __HAVE_ARCH_PMD_WRITE was defined.
Dan Carpenter [Fri, 17 Mar 2017 20:41:14 +0000 (23:41 +0300)]
KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl
kzalloc() won't actually fail because sizeof(*resize) is small, but
static checkers complain.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
1) Reject invalid updates to netfilter expectation policies, from Pablo
Neira Ayuso.
2) Fix memory leak in nfnl_cthelper, from Jeffy Chen.
3) Don't do stupid things if we get a neigh_probe() on a neigh entry
whose ops lack a solicit method. From Eric Dumazet.
4) Don't transmit packets in r8152 driver when the carrier is off, from
Hayes Wang.
5) Fix ipv6 packet type detection in aquantia driver, from Pavel
Belous.
6) Don't write uninitialized data into hw registers in bna driver, from
Arnd Bergmann.
7) Fix locking in ping_unhash(), from Eric Dumazet.
8) Make BPF verifier range checks able to understand certain sequences
emitted by LLVM, from Alexei Starovoitov.
9) Fix use after free in ipconfig, from Mark Rutland.
10) Fix refcount leak on force commit in openvswitch, from Jarno
Rajahalme.
11) Fix various overflow checks in AF_PACKET, from Andrey Konovalov.
12) Fix endianness bug in be2net driver, from Suresh Reddy.
13) Don't forget to wake TX queues when processing a timeout, from
Grygorii Strashko.
14) ARP header on-stack storage is wrong in flow dissector, from Simon
Horman.
15) Lost retransmit and reordering SNMP stats in TCP can be
underreported. From Yuchung Cheng.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (82 commits)
nfp: fix potential use after free on xdp prog
tcp: fix reordering SNMP under-counting
tcp: fix lost retransmit SNMP under-counting
sctp: get sock from transport in sctp_transport_update_pmtu
net: ethernet: ti: cpsw: fix race condition during open()
l2tp: fix PPP pseudo-wire auto-loading
bnx2x: fix spelling mistake in macros HW_INTERRUT_ASSERT_SET_*
l2tp: take reference on sessions being dumped
tcp: minimize false-positives on TCP/GRO check
sctp: check for dst and pathmtu update in sctp_packet_config
flow dissector: correct size of storage for ARP
net: ethernet: ti: cpsw: wake tx queues on ndo_tx_timeout
l2tp: take a reference on sessions used in genetlink handlers
l2tp: hold session while sending creation notifications
l2tp: fix duplicate session creation
l2tp: ensure session can't get removed during pppol2tp_session_ioctl()
l2tp: fix race in l2tp_recv_common()
sctp: use right in and out stream cnt
bpf: add various verifier test cases for self-tests
bpf, verifier: fix rejection of unaligned access checks for map_value_adj
...
Jakub Kicinski [Tue, 4 Apr 2017 22:56:55 +0000 (15:56 -0700)]
nfp: fix potential use after free on xdp prog
We should unregister the net_device first, before we give back
our reference on xdp_prog. Otherwise xdp_prog may be freed
before .ndo_stop() disabled the datapath. Found by code inspection.
Fixes: ecd63a0217d5 ("nfp: add XDP support in the driver") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the reordering SNMP counters only increase if a connection
sees a higher degree then it has previously seen. It ignores if the
reordering degree is not greater than the default system threshold.
This significantly under-counts the number of reordering events
and falsely convey that reordering is rare on the network.
This patch properly and faithfully records the number of reordering
events detected by the TCP stack, just like the comment says "this
exciting event is worth to be remembered". Note that even so TCP
still under-estimate the actual reordering events because TCP
requires TS options or certain packet sequences to detect reordering
(i.e. ACKing never-retransmitted sequence in recovery or disordered
state).
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The lost retransmit SNMP stat is under-counting retransmission
that uses segment offloading. This patch fixes that so all
retransmission related SNMP counters are consistent.
Fixes: 10d3be569243 ("tcp-tso: do not split TSO packets at retransmit time") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'kbuild-fixes-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- hand-off primary maintainership of Kbuild
- fix build warnings
- fix build error when GCOV is enabled with old compiler
- fix HAVE_ASM_GOTO check when GCC plugin is enabled
* tag 'kbuild-fixes-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
gconfig: remove misleading parentheses around a condition
jump label: fix passing kbuild_cflags when checking for asm goto support
Kbuild: use cc-disable-warning consistently for maybe-uninitialized
kbuild: external module build warnings when KBUILD_OUTPUT set and W=1
MAINTAINERS: add Masahiro Yamada as a Kbuild maintainer
Merge tag 'kvm-arm-for-v4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
From: Christoffer Dall <cdall@linaro.org>
KVM/ARM Fixes for v4.11-rc6
Fixes include:
- Fix a problem with GICv3 userspace save/restore
- Clarify GICv2 userspace save/restore ABI
- Be more careful in clearing GIC LRs
- Add missing synchronization primitive to our MMU handling code
James Hogan [Tue, 4 Apr 2017 07:51:34 +0000 (08:51 +0100)]
metag/usercopy: Add missing fixups
The rapf copy loops in the Meta usercopy code is missing some extable
entries for HTP cores with unaligned access checking enabled, where
faults occur on the instruction immediately after the faulting access.
Add the fixup labels and extable entries for these cases so that corner
case user copy failures don't cause kernel crashes.
James Hogan [Mon, 3 Apr 2017 16:41:40 +0000 (17:41 +0100)]
metag/usercopy: Fix src fixup in from user rapf loops
The fixup code to rewind the source pointer in
__asm_copy_from_user_{32,64}bit_rapf_loop() always rewound the source by
a single unit (4 or 8 bytes), however this is insufficient if the fault
didn't occur on the first load in the loop, as the source pointer will
have been incremented but nothing will have been stored until all 4
register [pairs] are loaded.
Read the LSM_STEP field of TXSTATUS (which is already loaded into a
register), a bit like the copy_to_user versions, to determine how many
iterations of MGET[DL] have taken place, all of which need rewinding.
James Hogan [Tue, 4 Apr 2017 10:43:26 +0000 (11:43 +0100)]
metag/usercopy: Set flags before ADDZ
The fixup code for the copy_to_user rapf loops reads TXStatus.LSM_STEP
to decide how far to rewind the source pointer. There is a special case
for the last execution of an MGETL/MGETD, since it leaves LSM_STEP=0
even though the number of MGETLs/MGETDs attempted was 4. This uses ADDZ
which is conditional upon the Z condition flag, but the AND instruction
which masked the TXStatus.LSM_STEP field didn't set the condition flags
based on the result.
Fix that now by using ANDS which does set the flags, and also marking
the condition codes as clobbered by the inline assembly.
James Hogan [Fri, 31 Mar 2017 10:14:02 +0000 (11:14 +0100)]
metag/usercopy: Zero rest of buffer from copy_from_user
Currently we try to zero the destination for a failed read from userland
in fixup code in the usercopy.c macros. The rest of the destination
buffer is then zeroed from __copy_user_zeroing(), which is used for both
copy_from_user() and __copy_from_user().
Unfortunately we fail to zero in the fixup code as D1Ar1 is set to 0
before the fixup code entry labels, and __copy_from_user() shouldn't even
be zeroing the rest of the buffer.
Move the zeroing out into copy_from_user() and rename
__copy_user_zeroing() to raw_copy_from_user() since it no longer does
any zeroing. This also conveniently matches the name needed for
RAW_COPY_USER support in a later patch.
Fixes: 373cd784d0fc ("metag: Memory handling") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-metag@vger.kernel.org Cc: stable@vger.kernel.org
Xin Long [Tue, 4 Apr 2017 05:39:55 +0000 (13:39 +0800)]
sctp: get sock from transport in sctp_transport_update_pmtu
This patch is almost to revert commit 02f3d4ce9e81 ("sctp: Adjust PMTU
updates to accomodate route invalidation."). As t->asoc can't be NULL
in sctp_transport_update_pmtu, it could get sk from asoc, and no need
to pass sk into that function.
It is also to remove some duplicated codes from that function.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
James Hogan [Fri, 31 Mar 2017 12:35:01 +0000 (13:35 +0100)]
metag/usercopy: Add early abort to copy_to_user
When copying to userland on Meta, if any faults are encountered
immediately abort the copy instead of continuing on and repeatedly
faulting, and worse potentially copying further bytes successfully to
subsequent valid pages.
Fixes: 373cd784d0fc ("metag: Memory handling") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: linux-metag@vger.kernel.org Cc: stable@vger.kernel.org
James Hogan [Fri, 31 Mar 2017 10:23:18 +0000 (11:23 +0100)]
metag/usercopy: Fix alignment error checking
Fix the error checking of the alignment adjustment code in
raw_copy_from_user(), which mistakenly considers it safe to skip the
error check when aligning the source buffer on a 2 or 4 byte boundary.
If the destination buffer was unaligned it may have started to copy
using byte or word accesses, which could well be at the start of a new
(valid) source page. This would result in it appearing to have copied 1
or 2 bytes at the end of the first (invalid) page rather than none at
all.
James Hogan [Fri, 31 Mar 2017 09:37:44 +0000 (10:37 +0100)]
metag/usercopy: Drop unused macros
Metag's lib/usercopy.c has a bunch of copy_from_user macros for larger
copies between 5 and 16 bytes which are completely unused. Before fixing
zeroing lets drop these macros so there is less to fix.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-metag@vger.kernel.org Cc: stable@vger.kernel.org
Wei Yongjun [Fri, 17 Jun 2016 17:33:59 +0000 (17:33 +0000)]
ring-buffer: Fix return value check in test_ringbuffer()
In case of error, the function kthread_run() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check
should be replaced with IS_ERR().
Vic Yang [Fri, 24 Mar 2017 17:44:01 +0000 (18:44 +0100)]
mfd: cros-ec: Fix host command buffer size
For SPI, we can get up to 32 additional bytes for response preamble.
The current overhead (2 bytes) may cause problems when we try to receive
a big response. Update it to 32 bytes.
Without this fix we could see a kernel BUG when we receive a big response
from the Chrome EC when is connected via SPI.
Signed-off-by: Vic Yang <victoryang@google.com> Tested-by: Enric Balletbo i Serra <enric.balletbo.collabora.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
Frederic Barrat [Wed, 29 Mar 2017 17:19:42 +0000 (19:19 +0200)]
powerpc/mm: Add missing global TLB invalidate if cxl is active
Commit 4c6d9acce1f4 ("powerpc/mm: Add hooks for cxl") converted local
TLB invalidates to global if the cxl driver is active. This is necessary
because the CAPP snoops invalidations to forward them to the PSL on the
cxl adapter. However one path was forgotten. native_flush_hash_range()
still does local TLB invalidates, as found out the hard way recently.
This patch fixes it by following the same logic as previously: if the
cxl driver is active, the local TLB invalidates are 'upgraded' to
global.
powerpc/64: Fix flush_(d|i)cache_range() called from modules
When the kernel is compiled to use 64bit ABIv2 the _GLOBAL() macro does
not include a global entry point. A function's global entry point is
used when the function is called from a different TOC context and in the
kernel this typically means a call from a module into the vmlinux (or
vice-versa).
There are a few exported asm functions declared with _GLOBAL() and
calling them from a module will likely crash the kernel since any TOC
relative load will yield garbage.
flush_icache_range() and flush_dcache_range() are both exported to
modules, and use the TOC, so must use _GLOBAL_TOC().
Fixes: 721aeaa9fdf3 ("powerpc: Build little endian ppc64 kernel with ABIv2") Cc: stable@vger.kernel.org # v3.16+ Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Merge tag 'gpio-v4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull late GPIO fixes from Linus Walleij:
"Some late coming ACPI fixes for GPIO.
We're dealing with ACPI issues here. The first is related to wake IRQs
on Bay Trail/Cherry Trail CPUs which are common in laptops. The second
is about proper probe deferral when reading _CRS properties.
For my untrained eye it seems there was some quarrel between the BIOS
and the kernel about who is supposed to deal with wakeups from GPIO
lines"
* tag 'gpio-v4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
ACPI / gpio: do not fall back to parsing _CRS when we get a deferral
gpio: acpi: Call enable_irq_wake for _IAE GpioInts with Wake set
Sekhar Nori [Mon, 3 Apr 2017 12:04:28 +0000 (17:34 +0530)]
net: ethernet: ti: cpsw: fix race condition during open()
TI's cpsw driver handles both OF and non-OF case for phy
connect. Unfortunately of_phy_connect() returns NULL on
error while phy_connect() returns ERR_PTR().
To handle this, cpsw_slave_open() overrides the return value
from phy_connect() to make it NULL or error.
This leaves a small window, where cpsw_adjust_link() may be
invoked for a slave while slave->phy pointer is temporarily
set to -ENODEV (or some other error) before it is finally set
to NULL.
_cpsw_adjust_link() only handles the NULL case, and an oops
results when ERR_PTR() is seen by it.
Note that cpsw_adjust_link() checks PHY status for each
slave whenever it is invoked. It can so happen that even
though phy_connect() for a given slave returns error,
_cpsw_adjust_link() is still called for that slave because
the link status of another slave changed.
Fix this by using a temporary pointer to store return value
of {of_}phy_connect() and do a one-time write to slave->phy.
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reported-by: Yan Liu <yan-liu@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'drm-fixes-for-v4.11-rc6' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This is just mostly stuff that missed rc5, from vmwgfx and msm
drivers"
* tag 'drm-fixes-for-v4.11-rc6' of git://people.freedesktop.org/~airlied/linux:
drm/msm: Make sure to detach the MMU during GPU cleanup
drm/msm/hdmi: redefinitions of macros not required
drm/msm/mdp5: Update SSPP_MAX value
drm/msm/dsi: Fix bug in dsi_mgr_phy_enable
drm/msm: Don't allow zero sized buffer objects
drm/msm: Fix wrong pointer check in a5xx_destroy
drm/msm: adreno: fix build error without debugfs
drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl()
drm/vmwgfx: Remove getparam error message
drm/ttm: Avoid calling drm_ht_remove from atomic context
drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces
drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl()
drm/vmwgfx: NULL pointer dereference in vmw_surface_define_ioctl()
drm/vmwgfx: Type-check lookups of fence objects
Fixes: f1f39f911027 ("l2tp: auto load type modules") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Mon, 3 Apr 2017 10:19:10 +0000 (11:19 +0100)]
bnx2x: fix spelling mistake in macros HW_INTERRUT_ASSERT_SET_*
Trival fix, rename HW_INTERRUT_ASSERT_SET_* to HW_INTERRUPT_ASSERT_SET_*
Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>