Chuck Lever [Mon, 11 May 2015 18:02:25 +0000 (14:02 -0400)]
SUNRPC: Transport fault injection
It has been exceptionally useful to exercise the logic that handles
local immediate errors and RDMA connection loss. To enable
developers to test this regularly and repeatably, add logic to
simulate connection loss every so often.
Fault injection is disabled by default. It is enabled with
where "xxx" is a large positive number of transport method calls
before a disconnect. A value of several thousand is usually a good
number that allows reasonable forward progress while still causing a
lot of connection drops.
These hooks are disabled when SUNRPC_DEBUG is turned off.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Wed, 10 Jun 2015 20:54:28 +0000 (16:54 -0400)]
NFS: Remove unused nfs_rw_ops->rw_release() function
This was only ever set to nfs_writeback_release_common(), a function
which is completely empty. Let's just drop this function pointer and
simplify the code a bit.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 3 Jun 2015 20:14:29 +0000 (16:14 -0400)]
sunrpc: turn swapper_enable/disable functions into rpc_xprt_ops
RDMA xprts don't have a sock_xprt, but an rdma_xprt, so the
xs_swapper_enable/disable functions will likely oops when fed an RDMA
xprt. Turn these functions into rpc_xprt_ops so that that doesn't
occur. For now the RDMA versions are no-ops that just return -EINVAL
on an attempt to swapon.
Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 3 Jun 2015 20:14:28 +0000 (16:14 -0400)]
sunrpc: lock xprt before trying to set memalloc on the sockets
It's possible that we could race with a call to xs_reset_transport, in
which case the xprt->inet pointer could be zeroed out while we're
accessing it. Lock the xprt before we try to set memalloc on it.
Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 3 Jun 2015 20:14:27 +0000 (16:14 -0400)]
sunrpc: if we're closing down a socket, clear memalloc on it first
We currently increment the memalloc_socks counter if we have a xprt that
is associated with a swapfile. That socket can be replaced however
during a reconnect event, and the memalloc_socks counter is never
decremented if that occurs.
When tearing down a xprt socket, check to see if the xprt is set up for
swapping and sk_clear_memalloc before releasing the socket if so.
Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 3 Jun 2015 20:14:26 +0000 (16:14 -0400)]
sunrpc: make xprt->swapper an atomic_t
Split xs_swapper into enable/disable functions and eliminate the
"enable" flag.
Currently, it's racy if you have multiple swapon/swapoff operations
running in parallel over the same xprt. Also fix it so that we only
set it to a memalloc socket on a 0->1 transition and only clear it
on a 1->0 transition.
Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 3 Jun 2015 20:14:25 +0000 (16:14 -0400)]
sunrpc: keep a count of swapfiles associated with the rpc_clnt
Jerome reported seeing a warning pop when working with a swapfile on
NFS. The nfs_swap_activate can end up calling sk_set_memalloc while
holding the rcu_read_lock and that function can sleep.
To fix that, we need to take a reference to the xprt while holding the
rcu_read_lock, set the socket up for swapping and then drop that
reference. But, xprt_put is not exported and having NFS deal with the
underlying xprt is a bit of layering violation anyway.
Fix this by adding a set of activate/deactivate functions that take a
rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and
nfs_swap_deactivate call those.
Also, add a per-rpc_clnt atomic counter to keep track of the number of
active swapfiles associated with it. When the counter does a 0->1
transition, we enable swapping on the xprt, when we do a 1->0 transition
we disable swapping on it.
This also allows us to be a bit more selective with the RPC_TASK_SWAPPER
flag. If non-swapper and swapper clnts are sharing a xprt, then we only
need to flag the tasks from the swapper clnt with that flag.
Acked-by: Mel Gorman <mgorman@suse.de> Reported-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Thu, 4 Jun 2015 19:37:10 +0000 (15:37 -0400)]
SUNRPC: Fix a backchannel race
We need to allow the server to send a new request immediately after we've
replied to the previous one. Right now, there is a window between the
send and the release of the old request in rpc_put_task(), where the
server could send us a new backchannel RPC call, and we have no
request to service it.
Chuck Lever [Tue, 26 May 2015 15:53:52 +0000 (11:53 -0400)]
NFS: Fix size of NFSACL SETACL operations
When encoding the NFSACL SETACL operation, reserve just the estimated
size of the ACL rather than a fixed maximum. This eliminates needless
zero padding on the wire that the server ignores.
Fixes: ee5dc7732bd5 ('NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NeilBrown [Fri, 8 May 2015 03:10:40 +0000 (13:10 +1000)]
NFS: report more appropriate block size for directories.
In glibc 2.21 (and several previous), a call to opendir() will
result in a 32K (BUFSIZ*4) buffer being allocated and passed to
getdents.
However a call to fdopendir() results in an 'fstat' request to
determine block size and a matching buffer allocated for subsequent
use with getdents. This will typically be 1M.
The first getdents call on an NFS directory will always use
READDIR_PLUS (or NFSv4 equivalent) if available. Subsequent getdents
calls only use this more expensive version if some 'stat' requests are
made between the getdents calls.
For this reason it is good to keep at least that first getdents call
relatively short. When fdopendir() and readdir() is used on a large
directory, it takes approximately 32 times as long to complete as
using "opendir". Current versions of 'find' use fdopendir() and
demonstrate this slowness.
'stat' on a directory currently returns the 'wsize'. This number has
no meaning on directories.
Actual READDIR requests are limited to ->dtsize, which itself is
capped at 4 pages, coincidently the same as BUFSIZ*4.
So this is a meaningful number to use as the blocksize on directories,
and has the effect of making 'find' on large directories go a lot
faster.
Trond Myklebust [Mon, 1 Jun 2015 14:30:11 +0000 (10:30 -0400)]
NFSv4: Always drain the slot table before re-establishing the lease
While the NFSv4.1 code has always drained the slot tables in order to stop
non-recovery related RPC calls when doing lease recovery, the NFSv4 code
did not.
The reason for the difference in behaviour is that NFSv4 does not have
session state, and so RPC calls can in theory proceed while recovery is
happening. In practice, however, anything I/O or state related needs to
wait until recovery is over.
This patch changes the behaviour of NFSv4 to match that of NFSv4.1 so that
we can simplify the state recovery code by assuming that we do not have to
deal with races between recovery and ordinary I/O.
Problem: When an operation like WRITE receives a BAD_STATEID, even though
recovery code clears the RECLAIM_NOGRACE recovery flag before recovering
the open state, because of clearing delegation state for the associated
inode, nfs_inode_find_state_and_recover() gets called and it makes the
same state with RECLAIM_NOGRACE flag again. As a results, when we restart
looking over the open states, we end up in the infinite loop instead of
breaking out in the next test of state flags.
Solution: unset the RECLAIM_NOGRACE set because of
calling of nfs_inode_find_state_and_recover() after returning from calling
recover_open() function.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Linus Torvalds [Sun, 31 May 2015 23:00:34 +0000 (16:00 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fix from Al Viro:
"Off-by-one in d_walk()/__dentry_kill() race fix.
It's very hard to hit; possible in the same conditions as the original
bug, except that you need the skipped branch to contain all the
remaining evictables, so that the d_walk()-calling loop in
d_invalidate() decides there's nothing more to do and doesn't go for
another pass - otherwise that next pass will sweep the sucker.
So it's not too urgent, but seeing that the fix is obvious and the
original commit has spread into all -stable branches..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
d_walk() might skip too much
Linus Torvalds [Sun, 31 May 2015 19:03:42 +0000 (12:03 -0700)]
Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux
Pull MIPS fixes from Ralf Baechle:
"MIPS fixes for 4.1 all across the tree"
* 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux:
MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
MIPS: BMIPS: Fix bmips_wr_vec()
MIPS: ath79: fix build problem if CONFIG_BLK_DEV_INITRD is not set
MIPS: Fuloong 2E: Replace CONFIG_USB_ISP1760_HCD by CONFIG_USB_ISP1760
MIPS: irq: Use DECLARE_BITMAP
ttyFDC: Fix to use native endian MMIO reads
MIPS: Fix CDMM to use native endian MMIO reads
Linus Torvalds [Sun, 31 May 2015 18:39:25 +0000 (11:39 -0700)]
Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat tool fixes from Len Brown:
"Just one minor kernel dependency in this batch -- added a #define to
msr-index.h"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: update version number to 4.7
tools/power turbostat: allow running without cpu0
tools/power turbostat: correctly decode of ENERGY_PERFORMANCE_BIAS
tools/power turbostat: enable turbostat to support Knights Landing (KNL)
tools/power turbostat: correctly display more than 2 threads/core
Pull SCSI target fixes from Nicholas Bellinger:
"These are mostly minor fixes, with the exception of the following that
address fall-out from recent v4.1-rc1 changes:
- regression fix related to the big fabric API registration changes
and configfs_depend_item() usage, that required cherry-picking one
of HCH's patches from for-next to address the issue for v4.1 code.
- remaining TCM-USER -v2 related changes to enforce full CDB
passthrough from Andy + Ilias.
Also included is a target_core_pscsi driver fix from Andy that
addresses a long standing issue with a Scsi_Host reference being
leaked on PSCSI device shutdown"
* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
iser-target: Fix error path in isert_create_pi_ctx()
target: Use a PASSTHROUGH flag instead of transport_types
target: Move passthrough CDB parsing into a common function
target/user: Only support full command pass-through
target/user: Update example code for new ABI requirements
target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST
target: Fix se_tpg_tfo->tf_subsys regression + remove tf_subsystem
target: Drop signal_pending checks after interruptible lock acquire
target: Add missing parentheses
target: Fix bidi command handling
target/user: Disallow full passthrough (pass_level=0)
ISCSI: fix minor memory leak
Linus Torvalds [Sun, 31 May 2015 18:24:49 +0000 (11:24 -0700)]
Merge tag 'hwmon-for-linus-v4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Some late hwmon patches, all headed for -stable
- fix sysfs attribute initialization in nct6775 and nct6683 drivers
- do not attempt to auto-detect tmp435 on I2C address 0x37
- ensure iio channel is of type IIO_VOLTAGE in ntc_thermistor driver"
* tag 'hwmon-for-linus-v4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (nct6683) Add missing sysfs attribute initialization
hwmon: (nct6775) Add missing sysfs attribute initialization
hwmon: (tmp401) Do not auto-detect chip on I2C address 0x37
hwmon: (ntc_thermistor) Ensure iio channel is of type IIO_VOLTAGE
Roland Dreier [Sat, 30 May 2015 06:12:10 +0000 (23:12 -0700)]
iser-target: Fix error path in isert_create_pi_ctx()
We don't assign pi_ctx to desc->pi_ctx until we're certain to succeed
in the function. That means the cleanup path should use the local
pi_ctx variable, not desc->pi_ctx.
Andy Grover [Tue, 19 May 2015 21:44:41 +0000 (14:44 -0700)]
target: Use a PASSTHROUGH flag instead of transport_types
It seems like we only care if a transport is passthrough or not. Convert
transport_type to a flags field and replace TRANSPORT_PLUGIN_* with a
flag, TRANSPORT_FLAG_PASSTHROUGH.
Signed-off-by: Andy Grover <agrover@redhat.com> Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Andy Grover [Tue, 19 May 2015 21:44:40 +0000 (14:44 -0700)]
target: Move passthrough CDB parsing into a common function
Aside from whether they handle BIDI ops or not, parsing of the CDB by
kernel and user SCSI passthrough modules should be identical. Move this
into a new passthrough_parse_cdb() and call it from tcm-pscsi and tcm-user.
Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Andy Grover [Tue, 19 May 2015 21:44:39 +0000 (14:44 -0700)]
target/user: Only support full command pass-through
After much discussion, give up on only passing a subset of SCSI commands
to userspace and pass them all. Based on what pscsi is doing, make sure
to set SCF_SCSI_DATA_CDB for I/O ops, and define attributes identical to
pscsi.
Make hw_block_size configurable via dev param.
Remove mention of command filtering from tcmu-design.txt.
Signed-off-by: Andy Grover <agrover@redhat.com> Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Andy Grover [Fri, 22 May 2015 21:07:44 +0000 (14:07 -0700)]
target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST
See https://bugzilla.redhat.com/show_bug.cgi?id=1025672
We need to put() the reference to the scsi host that we got in
pscsi_configure_device(). In VIRTUAL_HOST mode it is associated with
the dev_virt, not the hba_virt.
Signed-off-by: Andy Grover <agrover@redhat.com> Cc: stable@vger.kernel.org # 2.6.38+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
There is just one configfs subsystem in the target code, so we might as
well add two helpers to reference / unreference it from the core code
instead of passing pointers to it around.
This fixes a regression introduced for v4.1-rc1 with commit 9ac8928e6,
where configfs_depend_item() callers using se_tpg_tfo->tf_subsys would
fail, because the assignment from the original target_core_subsystem[]
is no longer happening at target_register_template() time.
The following error message is seen when loading the nct6683 driver
with DEBUG_LOCK_ALLOC enabled.
BUG: key ffff88040b2f0030 not in .data!
------------[ cut here ]------------
WARNING: CPU: 0 PID: 186 at kernel/locking/lockdep.c:2988
lockdep_init_map+0x469/0x630()
DEBUG_LOCKS_WARN_ON(1)
Caused by a missing call to sysfs_attr_init() when initializing
sysfs attributes.
The following error message is seen when loading the nct6775 driver
with DEBUG_LOCK_ALLOC enabled.
BUG: key ffff88040b2f0030 not in .data!
------------[ cut here ]------------
WARNING: CPU: 0 PID: 186 at kernel/locking/lockdep.c:2988
lockdep_init_map+0x469/0x630()
DEBUG_LOCKS_WARN_ON(1)
Caused by a missing call to sysfs_attr_init() when initializing
sysfs attributes.
Linus Torvalds [Sat, 30 May 2015 00:09:39 +0000 (17:09 -0700)]
Merge tag 'acpi-pci-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull PCI / ACPI fix from Rafael Wysocki:
"This fixes a bug uncovered by a recent driver core change that
modified the implementation of the ACPI_COMPANION_SET() macro to
strictly rely on its second argument to be either NULL or a valid
pointer to struct acpi_device.
As it turns out, pcibios_root_bridge_prepare() on x86 and ia64 works
with the assumption that the only code path calling pci_create_root_bus()
is pci_acpi_scan_root() and therefore the sysdata argument passed to
it will always match the expectations of pcibios_root_bridge_prepare().
That need not be the case, however, and in particular it is not the
case for the Xen pcifront driver that passes a pointer to its own
private data strcture as sysdata to pci_scan_bus_parented() which then
passes it to pci_create_root_bus() and it ends up being used incorrectly
by pcibios_root_bridge_prepare()"
* tag 'acpi-pci-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PCI / ACPI: Do not set ACPI companions for host bridges with parents
Linus Torvalds [Fri, 29 May 2015 23:45:45 +0000 (16:45 -0700)]
Merge tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs fixes from Dave Chinner:
"This is a little larger than I'd like late in the release cycle, but
all the fixes are for regressions introduced in the 4.1-rc1 merge, or
are needed back in -stable kernels fairly quickly as they are
filesystem corruption or userspace visible correctness issues.
Changes in this update:
- regression fix for new rename whiteout code
- regression fixes for new superblock generic per-cpu counter code
- fix for incorrect error return sign introduced in 3.17
- metadata corruption fixes that need to go back to -stable kernels"
* tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: fix broken i_nlink accounting for whiteout tmpfile inode
xfs: xfs_iozero can return positive errno
xfs: xfs_attr_inactive leaves inconsistent attr fork state behind
xfs: extent size hints can round up extents past MAXEXTLEN
xfs: inode and free block counters need to use __percpu_counter_compare
percpu_counter: batch size aware __percpu_counter_compare()
xfs: use percpu_counter_read_positive for mp->m_icount
Linus Torvalds [Fri, 29 May 2015 21:39:24 +0000 (14:39 -0700)]
Merge tag 'dm-4.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device-mapper fixes from Mike Snitzer:
"Quite a few fixes for DM's blk-mq support thanks to extra DM multipath
testing from Junichi Nomura and Bart Van Assche.
Also fix a casting bug in dm_merge_bvec() that could cause only a
single page to be added to a bio (Joe identified this while testing
dm-cache writeback)"
* tag 'dm-4.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix casting bug in dm_merge_bvec()
dm: fix reload failure of 0 path multipath mapping on blk-mq devices
dm: fix false warning in free_rq_clone() for unmapped requests
dm: requeue from blk-mq dm_mq_queue_rq() using BLK_MQ_RQ_QUEUE_BUSY
dm mpath: fix leak of dm_mpath_io structure in blk-mq .queue_rq error path
dm: fix NULL pointer when clone_and_map_rq returns !DM_MAPIO_REMAPPED
dm: run queue on re-queue
Linus Torvalds [Fri, 29 May 2015 20:28:57 +0000 (13:28 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
scripts/gdb: fix lx-lsmod refcnt
omfs: fix potential integer overflow in allocator
omfs: fix sign confusion for bitmap loop counter
omfs: set error return when d_make_root() fails
fs, omfs: add NULL terminator in the end up the token list
MAINTAINERS: update CAPABILITIES pattern
fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappings
tracing/mm: don't trace mm_page_pcpu_drain on offline cpus
tracing/mm: don't trace mm_page_free on offline cpus
tracing/mm: don't trace kmem_cache_free on offline cpus
Linus Torvalds [Fri, 29 May 2015 18:24:28 +0000 (11:24 -0700)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull fixes for cpumask and modules from Rusty Russell:
"** NOW WITH TESTING! **
Two fixes which got lost in my recent distraction. One is a weird
cpumask function which needed to be rewritten, the other is a module
bug which is cc:stable"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
cpumask_set_cpu_local_first => cpumask_local_spread, lament
module: Call module notifier on failure after complete_formation()
MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
Correct a regression introduced with 8453eebd [MIPS: Fix strnlen_user()
return value in case of overlong strings.] causing assembler warnings
and broken code generated in __strnlen_kernel_nocheck_asm:
arch/mips/lib/strnlen_user.S: Assembler messages:
arch/mips/lib/strnlen_user.S:64: Warning: Macro instruction expanded into multiple instructions in a branch delay slot
with the CPU_DADDI_WORKAROUNDS option set, resulting in the function
looping indefinitely upon mounting NFS root.
Use conditional assembly to avoid a microMIPS code size regression.
Using $at unconditionally would cause such a regression as there are no
16-bit instruction encodings available for ALU operations using this
register. Using $v1 unconditionally would produce short microMIPS
encodings, but would prevent this register from being used across calls
to this function.
The extra LI operation introduced is free, replacing a NOP originally
scheduled into the delay slot of the branch that follows.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10205/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Linus Torvalds [Fri, 29 May 2015 17:52:15 +0000 (10:52 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This is made up 4 groups of fixes detailed below.
vgem:
Due to some misgivings about possible bad use cases this allow,
backout a chunk of the interface to stop those use cases for now.
radeon:
Fix for an oops regression in the audio code, and a partial revert
for a fix that was cauing problems.
nouveau:
regression fix for Fermi, and display-less Maxwell boot fixes.
drm core:
a fix for i915 cursor vblank waiting in the atomic helpers"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau/gr/gm204: remove a stray printk
drm/nouveau/devinit/gm100-: force devinit table execution on boards without PDISP
drm/nouveau/devinit/gf100: make the force-post condition more obvious
drm/nouveau/gr/gf100-: fix wrong constant definition
drm/radeon: partially revert "fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling"
drm/radeon/audio: make sure connector is valid in hotplug case
Revert "drm/radeon: only mark audio as connected if the monitor supports it (v3)"
drm/radeon: don't share plls if monitors differ in audio support
drm/vgem: drop DRIVER_PRIME (v2)
drm/plane-helper: Adapt cursor hack to transitional helpers
Linus Torvalds [Fri, 29 May 2015 17:43:27 +0000 (10:43 -0700)]
Merge tag 'sound-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"No big surprise here, just a bunch of small fixes for HD-audio and
USB-audio:
- partial revert of widget power-saving for IDT codecs
- revert mute-LED enum ctl for Thinkpads due to confusion
- a quirk for a new Radeon HDMI controller
- Realtek codec name fix for Dell
- a workaround for headphone mic boost on some laptops
- stream_pm ops setup (and its fix for regression)
- another quirk for MS LifeCam USB-audio"
* tag 'sound-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix lost sound due to stream_pm ops cleanup
ALSA: hda - Disable Headphone Mic boost for ALC662
ALSA: hda - Disable power_save_node for IDT92HD71bxx
ALSA: hda - Fix noise on AMD radeon 290x controller
ALSA: hda - Set stream_pm ops automatically by generic parser
ALSA: hda/realtek - Add ALC256 alias name for Dell
Revert "ALSA: hda - Add mute-LED mode control to Thinkpad"
ALSA: usb-audio: Add quirk for MS LifeCam HD-3000
Joe Thornber [Fri, 29 May 2015 13:52:51 +0000 (14:52 +0100)]
dm: fix casting bug in dm_merge_bvec()
dm_merge_bvec() was originally added in f6fccb ("dm: introduce
merge_bvec_fn"). In that commit a value in sectors is converted to
bytes using << 9, and then assigned to an int. This code made
assumptions about the value of BIO_MAX_SECTORS.
A later commit 148e51 ("dm: improve documentation and code clarity in
dm_merge_bvec") was meant to have no functional change but it removed
the use of BIO_MAX_SECTORS in favor of using queue_max_sectors(). At
this point the cast from sector_t to int resulted in a zero value. The
fallout being dm_merge_bvec() would only allow a single page to be added
to a bio.
This interim fix is minimal for the benefit of stable@ because the more
comprehensive cleanup of passing a sector_t to all DM targets' merge
function will impact quite a few DM targets.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.19+
With following kernel message:
device-mapper: ioctl: can't change device type after initial table load.
DM tries to inherit the current table type using dm_table_set_type()
but it doesn't work as expected because of unnecessary check about
whether the target type is hybrid or not.
Hybrid type is for targets that work as either request-based or bio-based
and not required for blk-mq or non blk-mq checking.
Fixes: 65803c205983 ("dm table: train hybrid target type detection to select blk-mq if appropriate") Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Linus Torvalds [Fri, 29 May 2015 17:35:21 +0000 (10:35 -0700)]
Merge tag 'md/4.1-rc5-fixes' of git://neil.brown.name/md
Pull m,ore md bugfixes gfrom Neil Brown:
"Assorted fixes for new RAID5 stripe-batching functionality.
Unfortunately this functionality was merged a little prematurely. The
necessary testing and code review is now complete (or as complete as
it can be) and to code passes a variety of tests and looks quite
sensible.
Also a fix for some recent locking changes - a race was introduced
which causes a reshape request to sometimes fail. No data safety
issues"
* tag 'md/4.1-rc5-fixes' of git://neil.brown.name/md:
md: fix race when unfreezing sync_action
md/raid5: break stripe-batches when the array has failed.
md/raid5: call break_stripe_batch_list from handle_stripe_clean_event
md/raid5: be more selective about distributing flags across batch.
md/raid5: add handle_flags arg to break_stripe_batch_list.
md/raid5: duplicate some more handle_stripe_clean_event code in break_stripe_batch_list
md/raid5: remove condition test from check_break_stripe_batch_list.
md/raid5: Ensure a batch member is not handled prematurely.
md/raid5: close race between STRIPE_BIT_DELAY and batching.
md/raid5: ensure whole batch is delayed for all required bitmap updates.
Mike Snitzer [Thu, 28 May 2015 19:12:52 +0000 (15:12 -0400)]
dm: fix false warning in free_rq_clone() for unmapped requests
When stacking request-based dm device on non blk-mq device and
device-mapper target could not map the request (error target is used,
multipath target with all paths down, etc), the WARN_ON_ONCE() in
free_rq_clone() will trigger when it shouldn't.
The warning was added by commit aa6df8d ("dm: fix free_rq_clone() NULL
pointer when requeueing unmapped request"). But free_rq_clone() with
clone->q == NULL is valid usage for the case where
dm_kill_unmapped_request() initiates request cleanup.
Fix this false warning by just removing the WARN_ON -- it only generated
false positives and was never useful in catching the intended case
(completing clone request not being mapped e.g. clone->q being NULL).
Fixes: aa6df8d ("dm: fix free_rq_clone() NULL pointer when requeueing unmapped request") Reported-by: Bart Van Assche <bart.vanassche@sandisk.com> Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
ARM: multi_v7_defconfig: Replace CONFIG_USB_ISP1760_HCD by CONFIG_USB_ISP1760
Since commit 100832abf065bc18 ("usb: isp1760: Make HCD support
optional"), CONFIG_USB_ISP1760_HCD is automatically selected when
needed. Enabling that option in the defconfig is now a no-op, and no
longer enables ISP1760 HCD support.
Re-enable the ISP1760 driver in the defconfig by enabling
USB_ISP1760_HOST_ROLE instead.
Arnd Bergmann [Fri, 29 May 2015 12:30:35 +0000 (14:30 +0200)]
Merge tag 'imx-fixes-4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes
Merge "The i.MX fixes for 4.1, 3rd round" from Shawn Guo:
It includes a couple of fixes for i.MX6 GPC code to let the new kernel
be able to boot with old DTBs:
- Booting v4.1-rc kernel with old DTBs will fail with a fat warning
(require low-level debug to be seen), due to the adoption of stacked
IRQ domain. The first fix improves the situation by allowing kernel
boot up with old DTBs, although suspend/resume still breaks.
- Booting new kernel with old DTBs that do not have power-domain info
will result in a hang. The second patch fixes the hang by skipping
the kernel power-domain registration if DTB has no power-domain info.
* tag 'imx-fixes-4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: imx6: gpc: don't register power domain if DT data is missing
ARM: imx6: allow booting with old DT
Arnd Bergmann [Fri, 29 May 2015 12:29:37 +0000 (14:29 +0200)]
Merge tag 'samsung-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into fixes
Merge "Samsung fix for v4.1" from Kukjin Kim:
- Set display clock correctly for exynos4412-trats2
: fix the following error
exynos-drm: No connectors reported connected with modes
[drm] Cannot find any crtc or sizes - going 1024x768
* tag 'samsung-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: dts: set display clock correctly for exynos4412-trats2
Takashi Iwai [Fri, 29 May 2015 07:43:29 +0000 (09:43 +0200)]
ALSA: hda - Fix lost sound due to stream_pm ops cleanup
The commit [49fb18972581: ALSA: hda - Set stream_pm ops automatically
by generic parser] resulted in regressions on some Realtek and VIA
codecs because these drivers set patch_ops after calling the generic
parser, thus stream_pm got cleared to NULL again. I haven't noticed
since I tested with IDT codec.
Restore (partial revert) the stream_pm ops for them to fix the
regression.
Fixes: 49fb18972581 ('ALSA: hda - Set stream_pm ops automatically by generic parser') Reported-by: Jeremiah Mahler <jmmahler@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Al Viro [Fri, 29 May 2015 03:09:19 +0000 (23:09 -0400)]
d_walk() might skip too much
when we find that a child has died while we'd been trying to ascend,
we should go into the first live sibling itself, rather than its sibling.
Off-by-one in question had been introduced in "deal with deadlock in
d_walk()" and the fix needs to be backported to all branches this one
has been backported to.
Cc: stable@vger.kernel.org # 3.2 and later Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit 2f35c41f58a9 ("module: Replace module_ref with atomic_t refcnt")
changes the way refcnt is handled but did not update the gdb script to
use the new variable.
Since refcnt is not per-cpu anymore, we can directly read its value.
Signed-off-by: Adrien Schildknecht <adrien+dev@schischi.me> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Pantelis Koukousoulas <pktoss@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bob Copeland [Thu, 28 May 2015 22:44:35 +0000 (15:44 -0700)]
omfs: fix sign confusion for bitmap loop counter
The count variable is used to iterate down to (below) zero from the size
of the bitmap and handle the one-filling the remainder of the last
partial bitmap block. The loop conditional expects count to be signed
in order to detect when the final block is processed, after which count
goes negative.
Unfortunately, a recent change made this unsigned along with some other
related fields. The result of is this is that during mount,
omfs_get_imap will overrun the bitmap array and corrupt memory unless
number of blocks happens to be a multiple of 8 * blocksize.
Fix by changing count back to signed: it is guaranteed to fit in an s32
without overflow due to an enforced limit on the number of blocks in the
filesystem.
Signed-off-by: Bob Copeland <me@bobcopeland.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sasha Levin [Thu, 28 May 2015 22:44:29 +0000 (15:44 -0700)]
fs, omfs: add NULL terminator in the end up the token list
match_token() expects a NULL terminator at the end of the token list so
that it would know where to stop. Not having one causes it to overrun
to invalid memory.
In practice, passing a mount option that omfs didn't recognize would
sometimes panic the system.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Bob Copeland <me@bobcopeland.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 28 May 2015 22:44:24 +0000 (15:44 -0700)]
fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappings
load_elf_binary() returns `retval', not `error'.
Fixes: a87938b2e246b81b4fb ("fs/binfmt_elf.c: fix bug in loading of PIE binaries") Reported-by: James Hogan <james.hogan@imgtec.com> Cc: Michael Davidson <md@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tracing/mm: don't trace mm_page_pcpu_drain on offline cpus
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_mm_page_pcpu_drain can be called on an offline cpu
in this scenario caught by LOCKDEP:
tracing/mm: don't trace mm_page_free on offline cpus
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_mm_page_free can be called on an offline cpu in this
scenario caught by LOCKDEP:
tracing/mm: don't trace kmem_cache_free on offline cpus
Since tracepoints use RCU for protection, they must not be called on
offline cpus. trace_kmem_cache_free can be called on an offline cpu in
this scenario caught by LOCKDEP:
Dave Airlie [Fri, 29 May 2015 01:13:52 +0000 (11:13 +1000)]
Merge branch 'linux-4.1' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes
Regression fix for Fermi acceleration, and fixes important to bringing
up display-less Maxwell boards.
* 'linux-4.1' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau/gr/gm204: remove a stray printk
drm/nouveau/devinit/gm100-: force devinit table execution on boards without PDISP
drm/nouveau/devinit/gf100: make the force-post condition more obvious
drm/nouveau/gr/gf100-: fix wrong constant definition
Commit 3740c82590d8 ("drm/nouveau/gr/gf100-: add symbolic names for
classes") introduced a wrong macro definition causing acceleration setup
to fail. Fix it.
Signed-off-by: Lars Seipel <ls@slrz.net> Fixes: 3740c82590d8 ("drm/nouveau/gr/gf100-: add symbolic names for classes") Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Brian Foster [Thu, 28 May 2015 22:14:55 +0000 (08:14 +1000)]
xfs: fix broken i_nlink accounting for whiteout tmpfile inode
XFS uses the internal tmpfile() infrastructure for the whiteout inode
used for RENAME_WHITEOUT operations. For tmpfile inodes, XFS allocates
the inode, drops di_nlink, adds the inode to the agi unlinked list,
calls d_tmpfile() which correspondingly drops i_nlink of the vfs inode,
and then finishes the common inode setup (e.g., clear I_NEW and unlock).
The d_tmpfile() call was originally made inxfs_create_tmpfile(), but was
pulled up out of that function as part of the following commit to
resolve a deadlock issue:
330033d6 xfs: fix tmpfile/selinux deadlock and initialize security
As a result, callers of xfs_create_tmpfile() are responsible for either
calling d_tmpfile() or fixing up i_nlink appropriately. The whiteout
tmpfile allocation helper does neither. As a result, the vfs ->i_nlink
becomes inconsistent with the on-disk ->di_nlink once xfs_rename() links
it back into the source dentry and calls xfs_bumplink().
Update the assert in xfs_rename() to help detect this problem in the
future and update xfs_rename_alloc_whiteout() to decrement the link
count as part of the manual tmpfile inode setup.
Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Thu, 28 May 2015 21:40:32 +0000 (07:40 +1000)]
xfs: xfs_iozero can return positive errno
It was missed when we converted everything in XFs to use negative error
numbers, so fix it now. Bug introduced in 3.17 by commit 2451337 ("xfs: global
error sign conversion"), and should go back to stable kernels.
Thanks to Brian Foster for noticing it.
cc: <stable@vger.kernel.org> # 3.17, 3.18, 3.19, 4.0 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Thu, 28 May 2015 21:40:08 +0000 (07:40 +1000)]
xfs: xfs_attr_inactive leaves inconsistent attr fork state behind
xfs_attr_inactive() is supposed to clean up the attribute fork when
the inode is being freed. While it removes attribute fork extents,
it completely ignores attributes in local format, which means that
there can still be active attributes on the inode after
xfs_attr_inactive() has run.
This leads to problems with concurrent inode writeback - the in-core
inode attribute fork is removed without locking on the assumption
that nothing will be attempting to access the attribute fork after a
call to xfs_attr_inactive() because it isn't supposed to exist on
disk any more.
To fix this, make xfs_attr_inactive() completely remove all traces
of the attribute fork from the inode, regardless of it's state.
Further, also remove the in-core attribute fork structure safely so
that there is nothing further that needs to be done by callers to
clean up the attribute fork. This means we can remove the in-core
and on-disk attribute forks atomically.
Also, on error simply remove the in-memory attribute fork. There's
nothing that can be done with it once we have failed to remove the
on-disk attribute fork, so we may as well just blow it away here
anyway.
cc: <stable@vger.kernel.org> # 3.12 to 4.0 Reported-by: Waiman Long <waiman.long@hp.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Clearly indicating that the extent length is greater than MAXEXTLEN,
which is 2097151. A prior trace point shows the allocation was an
exact size match and that a length greater than MAXEXTLEN was asked
for:
We don't see this problem with extent size hints through the IO path
because we can't do single IOs large enough to trigger MAXEXTLEN
allocation. fallocate(), OTOH, is not limited in it's allocation
sizes and so needs help here.
The issue is that the extent size hint alignment is rounding up the
extent size past MAXEXTLEN, because xfs_bmapi_write() is not taking
into account extent size hints when calculating the maximum extent
length to allocate. xfs_bmapi_reserve_delalloc() is already doing
this, but direct extent allocation is not.
Unfortunately, the calculation in xfs_bmapi_reserve_delalloc() is
wrong, and it works only because delayed allocation extents are not
limited in size to MAXEXTLEN in the in-core extent tree. hence this
calculation does not work for direct allocation, and the delalloc
code needs fixing. This may, in fact be the underlying bug that
occassionally causes transaction overruns in delayed allocation
extent conversion, so now we know it's wrong we should fix it, too.
Many thanks to Brian Foster for finding this problem during review
of this patch.
Hence the fix, after much code reading, is to allow
xfs_bmap_extsize_align() to align partial extents when full
alignment would extend the alignment past MAXEXTLEN. We can safely
do this because all callers have higher layer allocation loops that
already handle short allocations, and so will simply run another
allocation to cover the remainder of the requested allocation range
that we ignored during alignment. The advantage of this approach is
that it also removes the need for callers to do anything other than
limit their requests to MAXEXTLEN - they don't really need to be
aware of extent size hints at all.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
Dave Chinner [Thu, 28 May 2015 21:39:34 +0000 (07:39 +1000)]
xfs: inode and free block counters need to use __percpu_counter_compare
Because the counters use a custom batch size, the comparison
functions need to be aware of that batch size otherwise the
comparison does not work correctly. This leads to ASSERT failures
on generic/027 like this:
XFS uses non-stanard batch sizes for avoiding frequent global
counter updates on it's allocated inode counters, as they increment
or decrement in batches of 64 inodes. Hence the standard percpu
counter batch of 32 means that the counter is effectively a global
counter. Currently Xfs uses a batch size of 128 so that it doesn't
take the global lock on every single modification.
However, Xfs also needs to compare accurately against zero, which
means we need to use percpu_counter_compare(), and that has a
hard-coded batch size of 32, and hence will spuriously fail to
detect when it is supposed to use precise comparisons and hence
the accounting goes wrong.
Add __percpu_counter_compare() to take a custom batch size so we can
use it sanely in XFS and factor percpu_counter_compare() to use it.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
George Wang [Thu, 28 May 2015 21:39:34 +0000 (07:39 +1000)]
xfs: use percpu_counter_read_positive for mp->m_icount
Function percpu_counter_read just return the current counter, which can be
negative. This will cause the checking of "allocated inode
counts <= m_maxicount" false positive. Use percpu_counter_read_positive can
solve this problem, and be consistent with the purpose to introduce percpu
mechanism to xfs.
Signed-off-by: George Wang <xuw2015@gmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
ALSA: hda - Disable Headphone Mic boost for ALC662
When headphone mic boost is above zero, some 10 - 20 second delay
might occur before the headphone mic is operational.
Therefore disable the headphone mic boost control (recording gain is
sufficient even without it).
(Note: this patch is not about the headset mic, it's about the less
common mic-in only mode.)
BugLink: https://bugs.launchpad.net/bugs/1454235 Suggested-by: Kailang Yang <kailang@realtek.com> Signed-off-by: David Henningsson <david.henningsson@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
NeilBrown [Thu, 28 May 2015 07:53:29 +0000 (17:53 +1000)]
md: fix race when unfreezing sync_action
A recent change removed the need for locking around writing
to "sync_action" (and various other places), but introduced a
subtle race.
When e.g. setting 'reshape' on a 'frozen' array, the 'frozen'
flag is cleared before 'reshape' is set, so the md thread can
get in and start trying recovery - which isn't wanted.
So instead of clearing MD_RECOVERY_FROZEN for any command
except 'frozen', only clear it when each specific command
is parsed. This allows the handling of 'reshape' to clear
the bit while a lock is held.
Also remove some places where we set MD_RECOVERY_NEEDED,
as it is always set on non-error exit of the function.
Signed-off-by: NeilBrown <neilb@suse.de> Fixes: 6791875e2e53 ("md: make reconfig_mutex optional for writes to md sysfs files.")
NeilBrown [Thu, 21 May 2015 02:56:41 +0000 (12:56 +1000)]
md/raid5: call break_stripe_batch_list from handle_stripe_clean_event
Now that the code in break_stripe_batch_list() is nearly identical
to the end of handle_stripe_clean_event, replace the later
with a function call.
The only remaining difference of any interest is the masking that is
applieds to dev[i].flags copied from head_sh.
R5_WriteError certainly isn't wanted as it is set per-stripe, not
per-patch. R5_Overlap isn't wanted as it is explicitly handled.
NeilBrown [Thu, 21 May 2015 02:40:26 +0000 (12:40 +1000)]
md/raid5: be more selective about distributing flags across batch.
When a batch of stripes is broken up, we keep some of the flags
that were per-stripe, and copy other flags from the head to all
others.
This only happens while a stripe is being handled, so many of the
flags are irrelevant.
The "SYNC_FLAGS" (which I've renamed to make it clear there are
several) and STRIPE_DEGRADED are set per-stripe and so need to be
preserved. STRIPE_INSYNC is the only flag that is set on the head
that needs to be propagated to all others.
For safety, add a WARN_ON if others are set, except:
STRIPE_HANDLE - this is safe and per-stripe and we are going to set
in several cases anyway
STRIPE_INSYNC
STRIPE_IO_STARTED - this is just a hint and doesn't hurt.
STRIPE_ON_PLUG_LIST
STRIPE_ON_RELEASE_LIST - It is a point pointless for a batched
stripe to be on one of these lists, but it can happen
as can be safely ignored.
NeilBrown [Thu, 21 May 2015 02:20:36 +0000 (12:20 +1000)]
md/raid5: add handle_flags arg to break_stripe_batch_list.
When we break a stripe_batch_list we sometimes want to set
STRIPE_HANDLE on the individual stripes, and sometimes not.
So pass a 'handle_flags' arg. If it is zero, always set STRIPE_HANDLE
(on non-head stripes). If not zero, only set it if any of the given
flags are present.
NeilBrown [Thu, 21 May 2015 02:00:47 +0000 (12:00 +1000)]
md/raid5: duplicate some more handle_stripe_clean_event code in break_stripe_batch_list
break_stripe_batch list didn't clear head_sh->batch_head.
This was probably a bug.
Also clear all R5_Overlap flags and if any were cleared, wake up
'wait_for_overlap'.
This isn't always necessary but the worst effect is a little
extra checking for code that is waiting on wait_for_overlap.
Also, don't use wake_up_nr() because that does the wrong thing
if 'nr' is zero, and it number of flags cleared doesn't
strongly correlate with the number of threads to wake.
NeilBrown [Thu, 21 May 2015 01:50:16 +0000 (11:50 +1000)]
md/raid5: remove condition test from check_break_stripe_batch_list.
handle_stripe_clean_event() contains a chunk of code very
similar to check_break_stripe_batch_list().
If we make the latter more like the former, we can end up
with just one copy of this code.
This first step removed the condition (and the 'check_') part
of the name. This has the added advantage of making it clear
what check is being performed at the point where the function is
called.
NeilBrown [Fri, 22 May 2015 05:20:04 +0000 (15:20 +1000)]
md/raid5: Ensure a batch member is not handled prematurely.
If a stripe is a member of a batch, but not the head, it must
not be handled separately from the rest of the batch.
'clear_batch_ready()' handles this requirement to some
extent but not completely. If a member is passed to handle_stripe()
a second time it returns '0' indicating the stripe can be handled,
which is wrong.
So add an extra test.
da91309e0a7e (cpumask: Utility function to set n'th cpu...) created a
genuinely weird function. I never saw it before, it went through DaveM.
(He only does this to make us other maintainers feel better about our own
mistakes.)
cpumask_set_cpu_local_first's purpose is say "I need to spread things
across N online cpus, choose the ones on this numa node first"; you call
it in a loop.
It can fail. One of the two callers ignores this, the other aborts and
fails the device open.
It can fail in two ways: allocating the off-stack cpumask, or through a
convoluted codepath which AFAICT can only occur if cpu_online_mask
changes. Which shouldn't happen, because if cpu_online_mask can change
while you call this, it could return a now-offline cpu anyway.
It contains a nonsensical test "!cpumask_of_node(numa_node)". This was
drawn to my attention by Geert, who said this causes a warning on Sparc.
It sets a single bit in a cpumask instead of returning a cpu number,
because that's what the callers want.
It could be made more efficient by passing the previous cpu rather than
an index, but that would be more invasive to the callers.
Fixes: da91309e0a7e8966d916a74cce42ed170fde06bf Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (then rebased) Tested-by: Amir Vadai <amirv@mellanox.com> Acked-by: Amir Vadai <amirv@mellanox.com> Acked-by: David S. Miller <davem@davemloft.net>
NeilBrown [Tue, 26 May 2015 22:43:45 +0000 (08:43 +1000)]
md/raid5: close race between STRIPE_BIT_DELAY and batching.
When we add a write to a stripe we need to make sure the bitmap
bit is set. While doing that the stripe is not locked so it could
be added to a batch after which further changes to STRIPE_BIT_DELAY
and ->bm_seq are ineffective.
So we need to hold off adding to a stripe until bitmap_startwrite has
completed at least once, and we need to avoid further changes to
STRIPE_BIT_DELAY once the stripe has been added to a batch.
If a bitmap_startwrite() completes after the stripe was added to a
batch, it will not have set the bit, only incremented a counter, so no
extra delay of the stripe is needed.
Reported-by: Shaohua Li <shli@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de>
Dave Airlie [Thu, 28 May 2015 00:37:35 +0000 (10:37 +1000)]
Merge branch 'drm-fixes-4.1' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
one revert, and two regression fixes for audio/hdmi
* 'drm-fixes-4.1' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon/audio: make sure connector is valid in hotplug case
Revert "drm/radeon: only mark audio as connected if the monitor supports it (v3)"
drm/radeon: don't share plls if monitors differ in audio support
PCI / ACPI: Do not set ACPI companions for host bridges with parents
Commit 97badf873ab6 (device property: Make it possible to use
secondary firmware nodes) uncovered a bug in the x86 (and ia64) PCI
host bridge initialization code that assumes bridge->bus->sysdata
to always point to a struct pci_sysdata object which need not be
the case (in particular, the Xen PCI frontend driver sets it to point
to a different data type). If it is not the case, an incorrect
pointer (or a piece of data that is not a pointer at all) will be
passed to ACPI_COMPANION_SET() and that may cause interesting
breakage to happen going forward.
To work around this problem use the observation that the ACPI
host bridge initialization always passes NULL as parent to
pci_create_root_bus(), so if pcibios_root_bridge_prepare() sees
a non-NULL parent of the bridge, it should not attempt to set
an ACPI companion for it, because that means that
pci_create_root_bus() has been called by someone else.
Fixes: 97badf873ab6 (device property: Make it possible to use secondary firmware nodes) Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com>