huhai [Fri, 18 May 2018 14:32:30 +0000 (08:32 -0600)]
blk-mq: clear hctx->dispatch_from when mappings change
When the number of hardware queues is changed, the drivers will call
blk_mq_update_nr_hw_queues() to remap hardware queues. This changes
the ctx mappings, but the current code doesn't clear the
->dispatch_from hint. This can result in dispatch_from pointing to
a ctx that isn't mapped to the hctx anymore.
Fixes: b347689ffbca ("blk-mq-sched: improve dispatching from sw queue") Signed-off-by: huhai <huhai@kylinos.cn> Reviewed-by: Ming Lei <ming.lei@redhat.com>
Moved the placement of the clearing to where we clear other items
pertaining to the existing mapping, added Fixes line, and reworded
the commit message.
Josef Bacik [Wed, 16 May 2018 18:51:22 +0000 (14:51 -0400)]
nbd: call nbd_bdev_reset instead of bd_set_size on disconnect
We need to make sure we don't just set the size of the bdev to 0 while
it's being used by a file system. We have the appropriate check in
nbd_bdev_reset, simply use that helper instead.
Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Josef Bacik [Wed, 16 May 2018 18:51:21 +0000 (14:51 -0400)]
nbd: fix how we set bd_invalidated
bd_invalidated is kind of a pain wrt partitions as it really only
triggers the partition rescan if it is set after bd_ops->open() runs, so
setting it when we reset the device isn't useful. We also sporadically
would still have partitions left over in some disconnect cases, so fix
this by always setting bd_invalidated on open if there's no
configuration or if we've had a disconnect action happen, that way the
partition table gets invalidated and rescanned properly.
Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Josef Bacik [Wed, 16 May 2018 18:51:20 +0000 (14:51 -0400)]
nbd: clear_sock on netlink disconnect
This is what the ioctl based nbd disconnect does as well. Without this
the device will just sit there and wait for the connection to go away
(or IO to occur) before the device gets torn down. Instead clear
everything up on our end so the configuration goes away as quickly as
possible.
Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Josef Bacik [Wed, 16 May 2018 18:51:19 +0000 (14:51 -0400)]
nbd: use bd_set_size when updating disk size
When we stopped relying on the bdev everywhere I broke updating the
block device size on the fly, which ceph relies on. We can't just do
set_capacity, we also have to do bd_set_size so things like parted will
notice the device size change.
Fixes: 29eaadc ("nbd: stop using the bdev everywhere")
cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Josef Bacik [Wed, 16 May 2018 18:51:18 +0000 (14:51 -0400)]
nbd: update size when connected
I messed up changing the size of an NBD device while it was connected by
not actually updating the device or doing the uevent. Fix this by
updating everything if we're connected and we change the size.
cc: stable@vger.kernel.org Fixes: 639812a ("nbd: don't set the device size until we're connected") Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Josef Bacik [Wed, 16 May 2018 18:51:17 +0000 (14:51 -0400)]
nbd: fix nbd device deletion
This fixes a use after free bug, we shouldn't be doing disk->queue right
after we do del_gendisk(disk). Save the queue and do the cleanup after
the del_gendisk.
Josef Bacik [Wed, 16 May 2018 18:36:01 +0000 (14:36 -0400)]
block: fix MAINTAINERS email for nbd
I've been missing stuff because it's been going into my work email which
is a black hole. Update to the email I actually use so I stop missing
patches and bug reports.
Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
huhai [Wed, 16 May 2018 14:21:21 +0000 (08:21 -0600)]
blk-mq: remove redundant insert case in blk_mq_make_request()
We can use blk_mq_sched_insert_request() even if we don't have
an IO scheduler attached, since that case will end up being
exactly the same as what blk_mq_queue_io() was doing now.
Kent Overstreet [Wed, 9 May 2018 01:33:56 +0000 (21:33 -0400)]
block: Add warning for bi_next not NULL in bio_endio()
Recently found a bug where a driver left bi_next not NULL and then
called bio_endio(), and then the submitter of the bio used
bio_copy_data() which was treating src and dst as lists of bios.
Fixed that bug by splitting out bio_list_copy_data(), but in case other
things are depending on bi_next in weird ways, add a warning to help
avoid more bugs like that in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Kent Overstreet [Wed, 9 May 2018 01:33:54 +0000 (21:33 -0400)]
block: Split out bio_list_copy_data()
Found a bug (with ASAN) where we were passing a bio to bio_copy_data()
with bi_next not NULL, when it should have been - a driver had left
bi_next set to something after calling bio_endio().
Since the normal case is only copying single bios, split out
bio_list_copy_data() to avoid more bugs like this in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 14 May 2018 18:17:31 +0000 (12:17 -0600)]
sbitmap: fix race in wait batch accounting
If we have multiple callers of sbq_wake_up(), we can end up in a
situation where the wait_cnt will continually go more and more
negative. Consider the case where our wake batch is 1, hence
wait_cnt will start out as 1.
This ends up with wait_cnt being 0, we'll wakeup immediately
next time. Going through the same loop as above again, and
we'll have wait_cnt -1.
For the case where we have a larger wake batch, the only
difference is that the starting point will be higher. We'll
still end up with continually smaller batch wakeups, which
defeats the purpose of the rolling wakeups.
Always reset the wait_cnt to the batch value. Then it doesn't
matter who wins the race. But ensure that whomever does win
the race is the one that increments the ws index and wakes up
our batch count, loser gets to call __sbq_wake_up() again to
account his wakeups towards the next active wait state index.
Fixes: 6c0ca7ae292a ("sbitmap: fix wakeup hang after sbq resize") Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
All in-tree host drivers set up a proper dma mask and use the dma-mapping
helpers. This means they will be able to deal with any address that we
are throwing at them.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
DAC960 just sets the block bounce limit to the dma mask, which means
that the iommu or swiotlb already take care of the bounce buffering,
and the block bouncing can be removed.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
mtip32xx just sets the block bounce limit to the dma mask, which means
that the iommu or swiotlb already take care of the bounce buffering,
and the block bouncing can be removed.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 9 May 2018 19:55:14 +0000 (13:55 -0600)]
kyber-iosched: update shallow depth when setting up hardware queue
We don't expect the async depth to be smaller than the wake batch
count for sbitmap, but just in case, inform sbitmap of what shallow
depth kyber may use.
Omar Sandoval [Thu, 10 May 2018 00:16:31 +0000 (17:16 -0700)]
sbitmap: fix missed wakeups caused by sbitmap_queue_get_shallow()
The sbitmap queue wake batch is calculated such that once allocations
start blocking, all of the bits which are already allocated must be
enough to fulfill the batch counters of all of the waitqueues. However,
the shallow allocation depth can break this invariant, since we block
before our full depth is being utilized. Add
sbitmap_queue_min_shallow_depth(), which saves the minimum shallow depth
the sbq will use, and update sbq_calc_wake_batch() to take it into
account.
Paolo Valente [Fri, 4 May 2018 17:17:01 +0000 (19:17 +0200)]
block, bfq: postpone rq preparation to insert or merge
When invoked for an I/O request rq, the prepare_request hook of bfq
increments reference counters in the destination bfq_queue for rq. In
this respect, after this hook has been invoked, rq may still be
transformed into a request with no icq attached, i.e., for bfq, a
request not associated with any bfq_queue. No further hook is invoked
to signal this tranformation to bfq (in general, to the destination
elevator for rq). This leads bfq into an inconsistent state, because
bfq has no chance to correctly lower these counters back. This
inconsistency may in its turn cause incorrect scheduling and hangs. It
certainly causes memory leaks, by making it impossible for bfq to free
the involved bfq_queue.
On the bright side, no transformation can still happen for rq after rq
has been inserted into bfq, or merged with another, already inserted,
request. Exploiting this fact, this commit addresses the above issue
by delaying the preparation of an I/O request to when the request is
inserted or merged.
This change also gives a performance bonus: a lock-contention point
gets removed. To prepare a request, bfq needs to hold its scheduler
lock. After postponing request preparation to insertion or merging, no
lock needs to be grabbed any longer in the prepare_request hook, while
the lock already taken to perform insertion or merging is used to
preparare the request as well.
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, struct request has four timestamp fields:
- A start time, set at get_request time, in jiffies, used for iostats
- An I/O start time, set at start_request time, in ktime nanoseconds,
used for blk-stats (i.e., wbt, kyber, hybrid polling)
- Another start time and another I/O start time, used for cfq and bfq
These can all be consolidated into one start time and one I/O start
time, both in ktime nanoseconds, shaving off up to 16 bytes from struct
request depending on the kernel config.
Omar Sandoval [Wed, 9 May 2018 09:08:51 +0000 (02:08 -0700)]
block: use ktime_get_ns() instead of sched_clock() for cfq and bfq
cfq and bfq have some internal fields that use sched_clock() which can
trivially use ktime_get_ns() instead. Their timestamp fields in struct
request can also use ktime_get_ns(), which resolves the 8 year old
comment added by commit 28f4197e5d47 ("block: disable preemption before
using sched_clock()").
Omar Sandoval [Wed, 9 May 2018 09:08:50 +0000 (02:08 -0700)]
block: get rid of struct blk_issue_stat
struct blk_issue_stat squashes three things into one u64:
- The time the driver started working on a request
- The original size of the request (for the io.low controller)
- Flags for writeback throttling
It turns out that on x86_64, we have a 4 byte hole in struct request
which we can fill with the non-timestamp fields from blk_issue_stat,
simplifying things quite a bit.
Omar Sandoval [Wed, 9 May 2018 09:08:49 +0000 (02:08 -0700)]
block: replace bio->bi_issue_stat with bio-specific type
struct blk_issue_stat is going away, and bio->bi_issue_stat doesn't even
use the blk-stats interface, so we can provide a separate implementation
specific for bios. The helpers work the same way as the blk-stats
helpers.
Omar Sandoval [Wed, 9 May 2018 09:08:48 +0000 (02:08 -0700)]
block: pass struct request instead of struct blk_issue_stat to wbt
issue_stat is going to go away, so first make writeback throttling take
the containing request, update the internal wbt helpers accordingly, and
change rwb->sync_cookie to be the request pointer instead of the
issue_stat pointer. No functional change.
Omar Sandoval [Wed, 9 May 2018 09:08:47 +0000 (02:08 -0700)]
block: move some wbt helpers to blk-wbt.c
A few helpers are only used from blk-wbt.c, so move them there, and put
wbt_track() behind the CONFIG_BLK_WBT typedef. This is in preparation
for changing how the wbt flags are tracked.
Jens Axboe [Mon, 7 May 2018 16:03:23 +0000 (10:03 -0600)]
blk-wbt: throttle discards like background writes
Throttle discards like we would any background write. Discards should
be background activity, so if they are impacting foreground IO, then
we will throttle them down.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 7 May 2018 15:57:08 +0000 (09:57 -0600)]
blk-wbt: pass in enum wbt_flags to get_rq_wait()
This is in preparation for having more write queues, in which
case we would have needed to pass in more information than just
a simple 'is_kswapd' boolean.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tetsuo Handa [Fri, 4 May 2018 16:58:09 +0000 (10:58 -0600)]
loop: remember whether sysfs_create_group() was done
syzbot is hitting WARN() triggered by memory allocation fault
injection [1] because loop module is calling sysfs_remove_group()
when sysfs_create_group() failed.
Fix this by remembering whether sysfs_create_group() succeeded.
Thomas Gleixner [Fri, 4 May 2018 14:32:47 +0000 (16:32 +0200)]
block: Shorten interrupt disabled regions
Commit 9c40cef2b799 ("sched: Move blk_schedule_flush_plug() out of
__schedule()") moved the blk_schedule_flush_plug() call out of the
interrupt/preempt disabled region in the scheduler. This allows to replace
local_irq_save/restore(flags) by local_irq_disable/enable() in
blk_flush_plug_list().
But it makes more sense to disable interrupts explicitly when the request
queue is locked end reenable them when the request to is unlocked. This
shortens the interrupt disabled section which is important when the plug
list contains requests for more than one queue. The comment which claims
that disabling interrupts around the loop is misleading as the called
functions can reenable interrupts unconditionally anyway and obfuscates the
scope badly:
Aside of that the detached interrupt disabling is a constant pain for
PREEMPT_RT as it requires patching and special casing when RT is enabled
while with the spin_*_irq() variants this happens automatically.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20110622174919.025446432@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 2fff8a924d4c ("block: Check locking assumptions at runtime") added a
lockdep_assert_held(q->queue_lock) which makes the WARN_ON() redundant
because lockdep will detect and warn about context violations.
The unconditional WARN_ON() does not provide real additional value, so it
can be removed.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
block: don't disable interrupts during kmap_atomic()
bounce_copy_vec() disables interrupts around kmap_atomic(). This is a
leftover from the old kmap_atomic() implementation which relied on fixed
mapping slots, so the caller had to make sure that the same slot could not
be reused from an interrupting context.
kmap_atomic() was changed to dynamic slots long ago and commit 1ec9c5ddc17a
("include/linux/highmem.h: remove the second argument of k[un]map_atomic()")
removed the slot assignements, but the callers were not checked for now
redundant interrupt disabling.
Remove the conditional interrupt disable.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sun, 6 May 2018 15:46:29 +0000 (05:46 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pll KVM fixes from Radim Krčmář:
"ARM:
- Fix proxying of GICv2 CPU interface accesses
- Fix crash when switching to BE
- Track source vcpu git GICv2 SGIs
- Fix an outdated bit of documentation
x86:
- Speed up injection of expired timers (for stable)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: remove APIC Timer periodic/oneshot spikes
arm64: vgic-v2: Fix proxying of cpuif access
KVM: arm/arm64: vgic_init: Cleanup reference to process_maintenance
KVM: arm64: Fix order of vcpu_write_sys_reg() arguments
KVM: arm/arm64: vgic: Fix source vcpu issues for GICv2 SGI
Linus Torvalds [Sun, 6 May 2018 15:42:24 +0000 (05:42 -1000)]
Merge tag 'iommu-fixes-v4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- fix a compile warning in the AMD IOMMU driver with irq remapping
disabled
- fix for VT-d interrupt remapping and invalidation size (caused a
BUG_ON when trying to invalidate more than 4GB)
- build fix and a regression fix for broken graphics with old DTS for
the rockchip iommu driver
- a revert in the PCI window reservation code which fixes a regression
with VFIO.
* tag 'iommu-fixes-v4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: rockchip: fix building without CONFIG_OF
iommu/vt-d: Use WARN_ON_ONCE instead of BUG_ON in qi_flush_dev_iotlb()
iommu/vt-d: fix shift-out-of-bounds in bug checking
iommu/dma: Move PCI window region reservation back into dma specific path.
iommu/rockchip: Make clock handling optional
iommu/amd: Hide unused iommu_table_lock
iommu/vt-d: Fix usage of force parameter in intel_ir_reconfigure_irte()
Linus Torvalds [Sun, 6 May 2018 15:37:24 +0000 (05:37 -1000)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"Unbreak the CPUID CPUID_8000_0008_EBX reload which got dropped when
the evaluation of physical and virtual bits which uses the same CPUID
leaf was moved out of get_cpu_cap()"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Restore CPUID_8000_0008_EBX reload
Linus Torvalds [Sun, 6 May 2018 15:35:23 +0000 (05:35 -1000)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull clocksource fixes from Thomas Gleixner:
"The recent addition of the early TSC clocksource breaks on machines
which have an unstable TSC because in case that TSC is disabled, then
the clocksource selection logic falls back to the early TSC which is
obviously bogus.
That also unearthed a few robustness issues in the clocksource
derating code which are addressed as well"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Rework stale comment
clocksource: Consistent de-rate when marking unstable
x86/tsc: Fix mark_tsc_unstable()
clocksource: Initialize cs->wd_list
clocksource: Allow clocksource_mark_unstable() on unregistered clocksources
x86/tsc: Always unregister clocksource_tsc_early
Linus Torvalds [Sun, 6 May 2018 15:34:06 +0000 (05:34 -1000)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix to prevent false positives in the spurious interrupt
detector when more than a single demultiplex register is evaluated in
the Qualcom irq combiner driver"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/qcom: Fix check for spurious interrupts
Linus Torvalds [Sun, 6 May 2018 03:30:58 +0000 (17:30 -1000)]
Merge tag 'platform-drivers-x86-v4.17-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Darren Hart:
- We missed a case in the Dell config dependencies resulting in a
possible bad configuration, resolve it by giving up on trying to keep
DELL_LAPTOP visible in the menu and make it depend on DELL_SMBIOS.
- Fix a null pointer dereference at module unload for the asus-wireless
driver.
* tag 'platform-drivers-x86-v4.17-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: Kconfig: Fix dell-laptop dependency chain.
platform/x86: asus-wireless: Fix NULL pointer dereference
Linus Torvalds [Sun, 6 May 2018 03:28:08 +0000 (17:28 -1000)]
Merge tag 'usb-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB driver fixes for 4.17-rc4.
The majority of them are some USB gadget fixes that missed my last
pull request. The "largest" patch in here is a fix for the old visor
driver that syzbot found 6 months or so ago and I finally remembered
to fix it.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Revert "usb: host: ehci: Use dma_pool_zalloc()"
usb: typec: tps6598x: handle block reads separately with plain-I2C adapters
usb: typec: tcpm: Release the role mux when exiting
USB: Accept bulk endpoints with 1024-byte maxpacket
xhci: Fix use-after-free in xhci_free_virt_device
USB: serial: visor: handle potential invalid device configuration
USB: serial: option: adding support for ublox R410M
usb: musb: trace: fix NULL pointer dereference in musb_g_tx()
usb: musb: host: fix potential NULL pointer dereference
usb: gadget: composite Allow for larger configuration descriptors
usb: dwc3: gadget: Fix list_del corruption in dwc3_ep_dequeue
usb: dwc3: gadget: dwc3_gadget_del_and_unmap_request() can be static
usb: dwc2: pci: Fix error return code in dwc2_pci_probe()
usb: dwc2: WA for Full speed ISOC IN in DDMA mode.
usb: dwc2: dwc2_vbus_supply_init: fix error check
usb: gadget: f_phonet: fix pn_net_xmit()'s return type
Since the commit "8003c9ae204e: add APIC Timer periodic/oneshot mode VMX
preemption timer support", a Windows 10 guest has some erratic timer
spikes.
Here the results on a 150000 times 1ms timer without any load:
Before 8003c9ae204e | After 8003c9ae204e
Max 1834us | 86000us
Mean 1100us | 1021us
Deviation 59us | 149us
Here the results on a 150000 times 1ms timer with a cpu-z stress test:
Before 8003c9ae204e | After 8003c9ae204e
Max 32000us | 140000us
Mean 1006us | 1997us
Deviation 140us | 11095us
The root cause of the problem is starting hrtimer with an expiry time
already in the past can take more than 20 milliseconds to trigger the
timer function. It can be solved by forward such past timers
immediately, rather than submitting them to hrtimer_start().
In case the timer is periodic, update the target expiration and call
hrtimer_start with it.
v2: Check if the tsc deadline is already expired. Thank you Mika.
v3: Execute the past timers immediately rather than submitting them to
hrtimer_start().
v4: Rearm the periodic timer with advance_periodic_target_expiration() a
simpler version of set_target_expiration(). Thank you Paolo.
Radim Krčmář [Sat, 5 May 2018 21:05:31 +0000 (23:05 +0200)]
Merge tag 'kvmarm-fixes-for-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/arm fixes for 4.17, take #2
- Fix proxying of GICv2 CPU interface accesses
- Fix crash when switching to BE
- Track source vcpu git GICv2 SGIs
- Fix an outdated bit of documentation
Linus Torvalds [Sat, 5 May 2018 07:15:25 +0000 (21:15 -1000)]
Merge tag 'kbuild-fixes-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- remove state comment in modpost
- extend MAINTAINERS entry to cover modpost and more makefiles
- fix missed building of SANCOV gcc-plugin
- replace left-over 'bison' with $(YACC)
- display short log when generating parer of genksyms
* tag 'kbuild-fixes-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
genksyms: fix typo in parse.tab.{c,h} generation rules
kbuild: replace hardcoded bison in cmd_bison_h with $(YACC)
gcc-plugins: fix build condition of SANCOV plugin
MAINTAINERS: Update Kbuild entry with a few paths
modpost: delete stale comment
Linus Torvalds [Sat, 5 May 2018 07:12:06 +0000 (21:12 -1000)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes froom Stephen Boyd:
"A handful of fixes for the stm32mp1 clk driver came in during the
merge window for the driver that got merged in the merge window.
Plus a warning fix for unused PM ops and a couple fixes for the meson
clk driver clk names that went unnoticed with the regmap rework.
There's also another fix in here for the mux rounding flag which
wasn't doing what it said it did, but now it does"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: meson: meson8b: fix meson8b_cpu_clk parent clock name
clk: meson: meson8b: fix meson8b_fclk_div3_div clock name
clk: meson: drop meson_aoclk_gate_regmap_ops
clk: meson: honor CLK_MUX_ROUND_CLOSEST in clk_regmap
clk: honor CLK_MUX_ROUND_CLOSEST in generic clk mux
clk: cs2000: mark resume function as __maybe_unused
clk: stm32mp1: remove ck_apb_dbg clock
clk: stm32mp1: set stgen_k clock as critical
clk: stm32mp1: add missing tzc2 clock
clk: stm32mp1: fix SAI3 & SAI4 clocks
clk: stm32mp1: remove unused dfsdm_src[] const
clk: stm32mp1: add missing static
Linus Torvalds [Sat, 5 May 2018 07:05:12 +0000 (21:05 -1000)]
Merge tag 'drm-fixes-for-v4.17-rc4' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"vmwgfx, i915, vc4, vga dac fixes.
This seems eerily quiet, so I expect it will explode next week or
something.
One i915 model firmware, two vmwgfx fixes, one vc4 fix and one bridge
leak fix"
* tag 'drm-fixes-for-v4.17-rc4' of git://people.freedesktop.org/~airlied/linux:
drm/bridge: vga-dac: Fix edid memory leak
drm/vc4: Make sure vc4_bo_{inc,dec}_usecnt() calls are balanced
drm/i915/glk: Add MODULE_FIRMWARE for Geminilake
drm/vmwgfx: Fix a buffer object leak
drm/vmwgfx: Clean up fbdev modeset locking
Linus Torvalds [Sat, 5 May 2018 06:57:28 +0000 (20:57 -1000)]
Merge tag 'trace-v4.17-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Some of the files in the tracing directory show file mode 0444 when
they are writable by root. To fix the confusion, they should be 0644.
Note, either case root can still write to them.
Zhengyuan asked why I never applied that patch (the first one is from
2014!). I simply forgot about it. /me lowers head in shame"
* tag 'trace-v4.17-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix the file mode of stack tracer
ftrace: Have set_graph_* files have normal file modes
Linus Torvalds [Sat, 5 May 2018 06:51:10 +0000 (20:51 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
"This is our first pull request of the rc cycle. It's not that it's
been overly quiet, we were just waiting on a few things before sending
this off.
For instance, the 6 patch series from Intel for the hfi1 driver had
actually been pulled in on Tuesday for a Wednesday pull request, only
to have Jason notice something I missed, so we held off for some
testing, and then on Thursday had to respin the series because the
very first patch needed a minor fix (unnecessary cast is all).
There is a sizable hns patch series in here, as well as a reasonably
largish hfi1 patch series, then all of the lines of uapi updates are
just the change to the new official Linux-OpenIB SPDX tag (a bunch of
our files had what amounts to a BSD-2-Clause + MIT Warranty statement
as their license as a result of the initial code submission years ago,
and the SPDX folks decided it was unique enough to warrant a unique
tag), then the typical mlx4 and mlx5 updates, and finally some cxgb4
and core/cache/cma updates to round out the bunch.
None of it was overly large by itself, but in the 2 1/2 weeks we've
been collecting patches, it has added up :-/.
As best I can tell, it's been through 0day (I got a notice about my
last for-next push, but not for my for-rc push, but Jason seems to
think that failure messages are prioritized and success messages not
so much). It's also been through linux-next. And yes, we did notice in
the context portion of the CMA query gid fix patch that there is a
dubious BUG_ON() in the code, and have plans to audit our BUG_ON usage
and remove it anywhere we can.
Summary:
- Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)
- SPDX license tag cleanups (new tag Linux-OpenIB)
- RoCE GID fixes related to default GIDs
- Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
mlx4, mlx5, and hfi1 (medium batch)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (52 commits)
RDMA/cma: Do not query GID during QP state transition to RTR
IB/mlx4: Fix integer overflow when calculating optimal MTT size
IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
IB/hfi1: Fix loss of BECN with AHG
IB/hfi1 Use correct type for num_user_context
IB/hfi1: Fix handling of FECN marked multicast packet
IB/core: Make ib_mad_client_id atomic
iw_cxgb4: Atomically flush per QP HW CQEs
IB/uverbs: Fix kernel crash during MR deregistration flow
IB/uverbs: Prevent reregistration of DM_MR to regular MR
RDMA/mlx4: Add missed RSS hash inner header flag
RDMA/hns: Fix a couple misspellings
RDMA/hns: Submit bad wr
RDMA/hns: Update assignment method for owner field of send wqe
RDMA/hns: Adjust the order of cleanup hem table
RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set
RDMA/hns: Remove some unnecessary attr_mask judgement
RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set
...
Linus Torvalds [Sat, 5 May 2018 06:41:44 +0000 (20:41 -1000)]
Merge tag 'for-linus-20180504' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes that should to into this release. This contains:
- Set of bcache fixes from Coly, fixing regression in patches that
went into this series.
- Set of NVMe fixes by way of Keith.
- Set of bdi related fixes, one from Jan and two from Tetsuo Handa,
fixing various issues around device addition/removal.
- Two block inflight fixes from Omar, fixing issues around the
transition to using tags for blk-mq inflight accounting that we
did a few releases ago"
* tag 'for-linus-20180504' of git://git.kernel.dk/linux-block:
bdi: Fix oops in wb_workfn()
nvmet: switch loopback target state to connecting when resetting
nvme/multipath: Fix multipath disabled naming collisions
nvme/multipath: Disable runtime writable enabling parameter
nvme: Set integrity flag for user passthrough commands
nvme: fix potential memory leak in option parsing
bdi: Fix use after free bug in debugfs_remove()
bdi: wake up concurrent wb_shutdown() callers.
bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set
bcache: set dc->io_disable to true in conditional_stop_bcache_device()
bcache: add wait_for_kthread_stop() in bch_allocator_thread()
bcache: count backing device I/O error for writeback I/O
bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()
bcache: store disk name in struct cache and struct cached_dev
blk-mq: fix sysfs inflight counter
blk-mq: count allocated but not started requests in iostats inflight
Linus Torvalds [Sat, 5 May 2018 06:36:50 +0000 (20:36 -1000)]
Merge tag 'xfs-4.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"I've got one more bug fix for xfs for 4.17-rc4, which caps the amount
of data we try to handle in one dedupe request so that userspace can't
livelock the kernel.
This series has been run through a full xfstests run during the week
and through a quick xfstests run against this morning's master, with
no ajor failures reported.
Summary:
- Cap the maximum length of a deduplication request at MAX_RW_COUNT/2
to avoid kernel livelock due to excessively large IO requests"
* tag 'xfs-4.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: cap the length of deduplication requests
Linus Torvalds [Sat, 5 May 2018 06:32:18 +0000 (20:32 -1000)]
Merge tag 'for-4.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Two regression fixes and one fix for stable"
* tag 'for-4.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: send, fix missing truncate for inode with prealloc extent past eof
btrfs: Take trans lock before access running trans in check_delayed_ref
btrfs: Fix wrong first_key parameter in replace_path
Since commit d677a4d60193 ("Makefile: support flag
-fsanitizer-coverage=trace-cmp"), you miss to build the SANCOV
plugin under some circumstances.
CONFIG_KCOV=y
CONFIG_KCOV_ENABLE_COMPARISONS=y
Your compiler does not support -fsanitize-coverage=trace-pc
Your compiler does not support -fsanitize-coverage=trace-cmp
Under this condition, $(CFLAGS_KCOV) is not empty but contains a
space, so the following ifeq-conditional is false.
ifeq ($(CFLAGS_KCOV),)
Then, scripts/Makefile.gcc-plugins misses to add sancov_plugin.so to
gcc-plugin-y while the SANCOV plugin is necessary as an alternative
means.
Fixes: d677a4d60193 ("Makefile: support flag -fsanitizer-coverage=trace-cmp") Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Kees Cook <keescook@chromium.org>
Rasmus Villemoes [Thu, 22 Mar 2018 20:58:27 +0000 (21:58 +0100)]
MAINTAINERS: Update Kbuild entry with a few paths
I managed to send some modpost patches to old addresses of both
Masahiro and Michal, and omitted linux-kbuild from cc, because my
tried and trusted scripts/get_maintainer wrapper failed me. Add the
modpost directory to the MAINTAINERS entry, and while at it make the
Makefile glob match scripts/Makefile itself, and add one matching the
Kbuild.include file as well.
Alan writes:
What you can't see just from reading the patch is that in both
cases (ehci->itd_pool and ehci->sitd_pool) there are two
allocation paths -- the two branches of an "if" statement -- and
only one of the paths calls dma_pool_[z]alloc. However, the
memset is needed for both paths, and so it can't be eliminated.
Given that it must be present, there's no advantage to calling
dma_pool_zalloc rather than dma_pool_alloc.
When the module is removed the led workqueue is destroyed in the remove
callback, before the led device is unregistered from the led subsystem.
This leads to a NULL pointer derefence when the led device is
unregistered automatically later as part of the module removal cleanup.
Bellow is the backtrace showing the problem.
Reported-by: Dun Hum <bitter.taste@gmx.com> Cc: stable@vger.kernel.org Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
James Morse [Fri, 4 May 2018 15:19:24 +0000 (16:19 +0100)]
arm64: vgic-v2: Fix proxying of cpuif access
Proxying the cpuif accesses at EL2 makes use of vcpu_data_guest_to_host
and co, which check the endianness, which call into vcpu_read_sys_reg...
which isn't mapped at EL2 (it was inlined before, and got moved OoL
with the VHE optimizations).
The result is of course a nice panic. Let's add some specialized
cruft to keep the broken platforms that require this hack alive.
But, this code used vcpu_data_guest_to_host(), which expected us to
write the value to host memory, instead we have trapped the guest's
read or write to an mmio-device, and are about to replay it using the
host's readl()/writel() which also perform swabbing based on the host
endianness. This goes wrong when both host and guest are big-endian,
as readl()/writel() will undo the guest's swabbing, causing the
big-endian value to be written to device-memory.
What needs doing?
A big-endian guest will have pre-swabbed data before storing, undo this.
If its necessary for the host, writel() will re-swab it.
For a read a big-endian guest expects to swab the data after the load.
The hosts's readl() will correct for host endianness, giving us the
device-memory's value in the register. For a big-endian guest, swab it
as if we'd only done the load.
For a little-endian guest, nothing needs doing as readl()/writel() leave
the correct device-memory value in registers.
Tested on Juno with that rarest of things: a big-endian 64K host.
Based on a patch from Marc Zyngier.
Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com> Fixes: bf8feb39642b ("arm64: KVM: vgic-v2: Add GICV access from HYP") Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
KVM: arm/arm64: vgic_init: Cleanup reference to process_maintenance
One comment still mentioned process_maintenance operations after
commit af0614991ab6 ("KVM: arm/arm64: vgic: Get rid of unnecessary
process_maintenance operation")
Update the comment to point to vgic_fold_lr_state instead, which
is where maintenance interrupts are taken care of.
Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
James Morse [Wed, 2 May 2018 11:17:02 +0000 (12:17 +0100)]
KVM: arm64: Fix order of vcpu_write_sys_reg() arguments
A typo in kvm_vcpu_set_be()'s call:
| vcpu_write_sys_reg(vcpu, SCTLR_EL1, sctlr)
causes us to use the 32bit register value as an index into the sys_reg[]
array, and sail off the end of the linear map when we try to bring up
big-endian secondaries.
| Unable to handle kernel paging request at virtual address ffff80098b982c00
| Mem abort info:
| ESR = 0x96000045
| Exception class = DABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| Data abort info:
| ISV = 0, ISS = 0x00000045
| CM = 0, WnR = 1
| swapper pgtable: 4k pages, 48-bit VAs, pgdp = 000000002ea0571a
| [ffff80098b982c00] pgd=00000009ffff8803, pud=0000000000000000
| Internal error: Oops: 96000045 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 2 PID: 1561 Comm: kvm-vcpu-0 Not tainted 4.17.0-rc3-00001-ga912e2261ca6-dirty #1323
| Hardware name: ARM Juno development board (r1) (DT)
| pstate: 60000005 (nZCv daif -PAN -UAO)
| pc : vcpu_write_sys_reg+0x50/0x134
| lr : vcpu_write_sys_reg+0x50/0x134
Fixes: 8d404c4c24613 ("KVM: arm64: Rewrite system register accessors to read/write functions") CC: Christoffer Dall <cdall@cs.columbia.edu> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Linus Torvalds [Fri, 4 May 2018 15:44:50 +0000 (05:44 -1000)]
Merge tag 'pm-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"This fixes a regression from the 4.14 cycle in the CPPC cpufreq driver
causing it to use an incorrect transition delay value which leads to a
very high rate of frequency change requests when the schedutil
governor is in use (Prashanth Prakash)"
* tag 'pm-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq / CPPC: Set platform specific transition_delay_us
Linus Torvalds [Fri, 4 May 2018 15:43:33 +0000 (05:43 -1000)]
Merge tag 'acpi-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"This fixes an ACPICA utilities (acpidump) build regression from the
4.16 cycle by setting LD in the CFLAGS passed to the linker to $(CC)
again (Jiri Slaby)"
* tag 'acpi-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
tools: power/acpi, revert to LD = gcc
Linus Torvalds [Fri, 4 May 2018 15:38:51 +0000 (05:38 -1000)]
Merge tag 'media/v4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- a trivial one-line fix addressing a PTR_ERR() getting value from a
wrong var at imx driver
- a patch changing my e-mail at the Kernel tree to mchehab@kernel.org.
no code changes
* tag 'media/v4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
MAINTAINERS & files: Canonize the e-mails I use at files
media: imx-media-csi: Fix inconsistent IS_ERR and PTR_ERR
Linus Torvalds [Fri, 4 May 2018 15:37:22 +0000 (05:37 -1000)]
Merge tag 'sound-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes, all deserved for stable.
Two are about core API fixes for the bugs that were triggered by
ever-growing fuzzers, while others are driver-specific fixes"
* tag 'sound-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: pcm: Check PCM state at xfern compat ioctl
ALSA: aloop: Add missing cable lock to ctl API callbacks
ALSA: dice: fix kernel NULL pointer dereference due to invalid calculation for array index
ALSA: seq: Fix races at MIDI encoding in snd_virmidi_output_trigger()
ALSA: hda - Fix incorrect usage of IS_REACHABLE()
MAINTAINERS & files: Canonize the e-mails I use at files
From now on, I'll start using my @kernel.org as my development e-mail.
As such, let's remove the entries that point to the old
mchehab@s-opensource.com at MAINTAINERS file.
For the files written with a copyright with mchehab@s-opensource,
let's keep Samsung on their names, using mchehab+samsung@kernel.org,
in order to keep pointing to my employer, with sponsors the work.
For the files written before I join Samsung (on July, 4 2013),
let's just use mchehab@kernel.org.
For bug reports, we can simply point to just kernel.org, as
this will reach my mchehab+samsung inbox anyway.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Brian Warner <brian.warner@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
media: imx-media-csi: Fix inconsistent IS_ERR and PTR_ERR
Fix inconsistent IS_ERR and PTR_ERR in imx_csi_probe.
The proper pointer to be passed as argument is pinctrl
instead of priv->vdev.
This issue was detected with the help of Coccinelle.
Fixes: 52e17089d185 ("media: imx: Don't initialize vars that won't be used") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Tested-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>