Jens Axboe [Mon, 14 May 2018 18:17:31 +0000 (12:17 -0600)]
sbitmap: fix race in wait batch accounting
If we have multiple callers of sbq_wake_up(), we can end up in a
situation where the wait_cnt will continually go more and more
negative. Consider the case where our wake batch is 1, hence
wait_cnt will start out as 1.
This ends up with wait_cnt being 0, we'll wakeup immediately
next time. Going through the same loop as above again, and
we'll have wait_cnt -1.
For the case where we have a larger wake batch, the only
difference is that the starting point will be higher. We'll
still end up with continually smaller batch wakeups, which
defeats the purpose of the rolling wakeups.
Always reset the wait_cnt to the batch value. Then it doesn't
matter who wins the race. But ensure that whomever does win
the race is the one that increments the ws index and wakes up
our batch count, loser gets to call __sbq_wake_up() again to
account his wakeups towards the next active wait state index.
Fixes: 6c0ca7ae292a ("sbitmap: fix wakeup hang after sbq resize") Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
All in-tree host drivers set up a proper dma mask and use the dma-mapping
helpers. This means they will be able to deal with any address that we
are throwing at them.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
DAC960 just sets the block bounce limit to the dma mask, which means
that the iommu or swiotlb already take care of the bounce buffering,
and the block bouncing can be removed.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
mtip32xx just sets the block bounce limit to the dma mask, which means
that the iommu or swiotlb already take care of the bounce buffering,
and the block bouncing can be removed.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 9 May 2018 19:55:14 +0000 (13:55 -0600)]
kyber-iosched: update shallow depth when setting up hardware queue
We don't expect the async depth to be smaller than the wake batch
count for sbitmap, but just in case, inform sbitmap of what shallow
depth kyber may use.
Omar Sandoval [Thu, 10 May 2018 00:16:31 +0000 (17:16 -0700)]
sbitmap: fix missed wakeups caused by sbitmap_queue_get_shallow()
The sbitmap queue wake batch is calculated such that once allocations
start blocking, all of the bits which are already allocated must be
enough to fulfill the batch counters of all of the waitqueues. However,
the shallow allocation depth can break this invariant, since we block
before our full depth is being utilized. Add
sbitmap_queue_min_shallow_depth(), which saves the minimum shallow depth
the sbq will use, and update sbq_calc_wake_batch() to take it into
account.
Paolo Valente [Fri, 4 May 2018 17:17:01 +0000 (19:17 +0200)]
block, bfq: postpone rq preparation to insert or merge
When invoked for an I/O request rq, the prepare_request hook of bfq
increments reference counters in the destination bfq_queue for rq. In
this respect, after this hook has been invoked, rq may still be
transformed into a request with no icq attached, i.e., for bfq, a
request not associated with any bfq_queue. No further hook is invoked
to signal this tranformation to bfq (in general, to the destination
elevator for rq). This leads bfq into an inconsistent state, because
bfq has no chance to correctly lower these counters back. This
inconsistency may in its turn cause incorrect scheduling and hangs. It
certainly causes memory leaks, by making it impossible for bfq to free
the involved bfq_queue.
On the bright side, no transformation can still happen for rq after rq
has been inserted into bfq, or merged with another, already inserted,
request. Exploiting this fact, this commit addresses the above issue
by delaying the preparation of an I/O request to when the request is
inserted or merged.
This change also gives a performance bonus: a lock-contention point
gets removed. To prepare a request, bfq needs to hold its scheduler
lock. After postponing request preparation to insertion or merging, no
lock needs to be grabbed any longer in the prepare_request hook, while
the lock already taken to perform insertion or merging is used to
preparare the request as well.
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Tested-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, struct request has four timestamp fields:
- A start time, set at get_request time, in jiffies, used for iostats
- An I/O start time, set at start_request time, in ktime nanoseconds,
used for blk-stats (i.e., wbt, kyber, hybrid polling)
- Another start time and another I/O start time, used for cfq and bfq
These can all be consolidated into one start time and one I/O start
time, both in ktime nanoseconds, shaving off up to 16 bytes from struct
request depending on the kernel config.
Omar Sandoval [Wed, 9 May 2018 09:08:51 +0000 (02:08 -0700)]
block: use ktime_get_ns() instead of sched_clock() for cfq and bfq
cfq and bfq have some internal fields that use sched_clock() which can
trivially use ktime_get_ns() instead. Their timestamp fields in struct
request can also use ktime_get_ns(), which resolves the 8 year old
comment added by commit 28f4197e5d47 ("block: disable preemption before
using sched_clock()").
Omar Sandoval [Wed, 9 May 2018 09:08:50 +0000 (02:08 -0700)]
block: get rid of struct blk_issue_stat
struct blk_issue_stat squashes three things into one u64:
- The time the driver started working on a request
- The original size of the request (for the io.low controller)
- Flags for writeback throttling
It turns out that on x86_64, we have a 4 byte hole in struct request
which we can fill with the non-timestamp fields from blk_issue_stat,
simplifying things quite a bit.
Omar Sandoval [Wed, 9 May 2018 09:08:49 +0000 (02:08 -0700)]
block: replace bio->bi_issue_stat with bio-specific type
struct blk_issue_stat is going away, and bio->bi_issue_stat doesn't even
use the blk-stats interface, so we can provide a separate implementation
specific for bios. The helpers work the same way as the blk-stats
helpers.
Omar Sandoval [Wed, 9 May 2018 09:08:48 +0000 (02:08 -0700)]
block: pass struct request instead of struct blk_issue_stat to wbt
issue_stat is going to go away, so first make writeback throttling take
the containing request, update the internal wbt helpers accordingly, and
change rwb->sync_cookie to be the request pointer instead of the
issue_stat pointer. No functional change.
Omar Sandoval [Wed, 9 May 2018 09:08:47 +0000 (02:08 -0700)]
block: move some wbt helpers to blk-wbt.c
A few helpers are only used from blk-wbt.c, so move them there, and put
wbt_track() behind the CONFIG_BLK_WBT typedef. This is in preparation
for changing how the wbt flags are tracked.
Jens Axboe [Mon, 7 May 2018 16:03:23 +0000 (10:03 -0600)]
blk-wbt: throttle discards like background writes
Throttle discards like we would any background write. Discards should
be background activity, so if they are impacting foreground IO, then
we will throttle them down.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 7 May 2018 15:57:08 +0000 (09:57 -0600)]
blk-wbt: pass in enum wbt_flags to get_rq_wait()
This is in preparation for having more write queues, in which
case we would have needed to pass in more information than just
a simple 'is_kswapd' boolean.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tetsuo Handa [Fri, 4 May 2018 16:58:09 +0000 (10:58 -0600)]
loop: remember whether sysfs_create_group() was done
syzbot is hitting WARN() triggered by memory allocation fault
injection [1] because loop module is calling sysfs_remove_group()
when sysfs_create_group() failed.
Fix this by remembering whether sysfs_create_group() succeeded.
Thomas Gleixner [Fri, 4 May 2018 14:32:47 +0000 (16:32 +0200)]
block: Shorten interrupt disabled regions
Commit 9c40cef2b799 ("sched: Move blk_schedule_flush_plug() out of
__schedule()") moved the blk_schedule_flush_plug() call out of the
interrupt/preempt disabled region in the scheduler. This allows to replace
local_irq_save/restore(flags) by local_irq_disable/enable() in
blk_flush_plug_list().
But it makes more sense to disable interrupts explicitly when the request
queue is locked end reenable them when the request to is unlocked. This
shortens the interrupt disabled section which is important when the plug
list contains requests for more than one queue. The comment which claims
that disabling interrupts around the loop is misleading as the called
functions can reenable interrupts unconditionally anyway and obfuscates the
scope badly:
Aside of that the detached interrupt disabling is a constant pain for
PREEMPT_RT as it requires patching and special casing when RT is enabled
while with the spin_*_irq() variants this happens automatically.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20110622174919.025446432@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 2fff8a924d4c ("block: Check locking assumptions at runtime") added a
lockdep_assert_held(q->queue_lock) which makes the WARN_ON() redundant
because lockdep will detect and warn about context violations.
The unconditional WARN_ON() does not provide real additional value, so it
can be removed.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
block: don't disable interrupts during kmap_atomic()
bounce_copy_vec() disables interrupts around kmap_atomic(). This is a
leftover from the old kmap_atomic() implementation which relied on fixed
mapping slots, so the caller had to make sure that the same slot could not
be reused from an interrupting context.
kmap_atomic() was changed to dynamic slots long ago and commit 1ec9c5ddc17a
("include/linux/highmem.h: remove the second argument of k[un]map_atomic()")
removed the slot assignements, but the callers were not checked for now
redundant interrupt disabling.
Remove the conditional interrupt disable.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sun, 6 May 2018 15:46:29 +0000 (05:46 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pll KVM fixes from Radim Krčmář:
"ARM:
- Fix proxying of GICv2 CPU interface accesses
- Fix crash when switching to BE
- Track source vcpu git GICv2 SGIs
- Fix an outdated bit of documentation
x86:
- Speed up injection of expired timers (for stable)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: remove APIC Timer periodic/oneshot spikes
arm64: vgic-v2: Fix proxying of cpuif access
KVM: arm/arm64: vgic_init: Cleanup reference to process_maintenance
KVM: arm64: Fix order of vcpu_write_sys_reg() arguments
KVM: arm/arm64: vgic: Fix source vcpu issues for GICv2 SGI
Linus Torvalds [Sun, 6 May 2018 15:42:24 +0000 (05:42 -1000)]
Merge tag 'iommu-fixes-v4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- fix a compile warning in the AMD IOMMU driver with irq remapping
disabled
- fix for VT-d interrupt remapping and invalidation size (caused a
BUG_ON when trying to invalidate more than 4GB)
- build fix and a regression fix for broken graphics with old DTS for
the rockchip iommu driver
- a revert in the PCI window reservation code which fixes a regression
with VFIO.
* tag 'iommu-fixes-v4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: rockchip: fix building without CONFIG_OF
iommu/vt-d: Use WARN_ON_ONCE instead of BUG_ON in qi_flush_dev_iotlb()
iommu/vt-d: fix shift-out-of-bounds in bug checking
iommu/dma: Move PCI window region reservation back into dma specific path.
iommu/rockchip: Make clock handling optional
iommu/amd: Hide unused iommu_table_lock
iommu/vt-d: Fix usage of force parameter in intel_ir_reconfigure_irte()
Linus Torvalds [Sun, 6 May 2018 15:37:24 +0000 (05:37 -1000)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"Unbreak the CPUID CPUID_8000_0008_EBX reload which got dropped when
the evaluation of physical and virtual bits which uses the same CPUID
leaf was moved out of get_cpu_cap()"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Restore CPUID_8000_0008_EBX reload
Linus Torvalds [Sun, 6 May 2018 15:35:23 +0000 (05:35 -1000)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull clocksource fixes from Thomas Gleixner:
"The recent addition of the early TSC clocksource breaks on machines
which have an unstable TSC because in case that TSC is disabled, then
the clocksource selection logic falls back to the early TSC which is
obviously bogus.
That also unearthed a few robustness issues in the clocksource
derating code which are addressed as well"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: Rework stale comment
clocksource: Consistent de-rate when marking unstable
x86/tsc: Fix mark_tsc_unstable()
clocksource: Initialize cs->wd_list
clocksource: Allow clocksource_mark_unstable() on unregistered clocksources
x86/tsc: Always unregister clocksource_tsc_early
Linus Torvalds [Sun, 6 May 2018 15:34:06 +0000 (05:34 -1000)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix to prevent false positives in the spurious interrupt
detector when more than a single demultiplex register is evaluated in
the Qualcom irq combiner driver"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/qcom: Fix check for spurious interrupts
Linus Torvalds [Sun, 6 May 2018 03:30:58 +0000 (17:30 -1000)]
Merge tag 'platform-drivers-x86-v4.17-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Darren Hart:
- We missed a case in the Dell config dependencies resulting in a
possible bad configuration, resolve it by giving up on trying to keep
DELL_LAPTOP visible in the menu and make it depend on DELL_SMBIOS.
- Fix a null pointer dereference at module unload for the asus-wireless
driver.
* tag 'platform-drivers-x86-v4.17-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: Kconfig: Fix dell-laptop dependency chain.
platform/x86: asus-wireless: Fix NULL pointer dereference
Linus Torvalds [Sun, 6 May 2018 03:28:08 +0000 (17:28 -1000)]
Merge tag 'usb-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB driver fixes for 4.17-rc4.
The majority of them are some USB gadget fixes that missed my last
pull request. The "largest" patch in here is a fix for the old visor
driver that syzbot found 6 months or so ago and I finally remembered
to fix it.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Revert "usb: host: ehci: Use dma_pool_zalloc()"
usb: typec: tps6598x: handle block reads separately with plain-I2C adapters
usb: typec: tcpm: Release the role mux when exiting
USB: Accept bulk endpoints with 1024-byte maxpacket
xhci: Fix use-after-free in xhci_free_virt_device
USB: serial: visor: handle potential invalid device configuration
USB: serial: option: adding support for ublox R410M
usb: musb: trace: fix NULL pointer dereference in musb_g_tx()
usb: musb: host: fix potential NULL pointer dereference
usb: gadget: composite Allow for larger configuration descriptors
usb: dwc3: gadget: Fix list_del corruption in dwc3_ep_dequeue
usb: dwc3: gadget: dwc3_gadget_del_and_unmap_request() can be static
usb: dwc2: pci: Fix error return code in dwc2_pci_probe()
usb: dwc2: WA for Full speed ISOC IN in DDMA mode.
usb: dwc2: dwc2_vbus_supply_init: fix error check
usb: gadget: f_phonet: fix pn_net_xmit()'s return type
Since the commit "8003c9ae204e: add APIC Timer periodic/oneshot mode VMX
preemption timer support", a Windows 10 guest has some erratic timer
spikes.
Here the results on a 150000 times 1ms timer without any load:
Before 8003c9ae204e | After 8003c9ae204e
Max 1834us | 86000us
Mean 1100us | 1021us
Deviation 59us | 149us
Here the results on a 150000 times 1ms timer with a cpu-z stress test:
Before 8003c9ae204e | After 8003c9ae204e
Max 32000us | 140000us
Mean 1006us | 1997us
Deviation 140us | 11095us
The root cause of the problem is starting hrtimer with an expiry time
already in the past can take more than 20 milliseconds to trigger the
timer function. It can be solved by forward such past timers
immediately, rather than submitting them to hrtimer_start().
In case the timer is periodic, update the target expiration and call
hrtimer_start with it.
v2: Check if the tsc deadline is already expired. Thank you Mika.
v3: Execute the past timers immediately rather than submitting them to
hrtimer_start().
v4: Rearm the periodic timer with advance_periodic_target_expiration() a
simpler version of set_target_expiration(). Thank you Paolo.
Radim Krčmář [Sat, 5 May 2018 21:05:31 +0000 (23:05 +0200)]
Merge tag 'kvmarm-fixes-for-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/arm fixes for 4.17, take #2
- Fix proxying of GICv2 CPU interface accesses
- Fix crash when switching to BE
- Track source vcpu git GICv2 SGIs
- Fix an outdated bit of documentation
Linus Torvalds [Sat, 5 May 2018 07:15:25 +0000 (21:15 -1000)]
Merge tag 'kbuild-fixes-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- remove state comment in modpost
- extend MAINTAINERS entry to cover modpost and more makefiles
- fix missed building of SANCOV gcc-plugin
- replace left-over 'bison' with $(YACC)
- display short log when generating parer of genksyms
* tag 'kbuild-fixes-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
genksyms: fix typo in parse.tab.{c,h} generation rules
kbuild: replace hardcoded bison in cmd_bison_h with $(YACC)
gcc-plugins: fix build condition of SANCOV plugin
MAINTAINERS: Update Kbuild entry with a few paths
modpost: delete stale comment
Linus Torvalds [Sat, 5 May 2018 07:12:06 +0000 (21:12 -1000)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes froom Stephen Boyd:
"A handful of fixes for the stm32mp1 clk driver came in during the
merge window for the driver that got merged in the merge window.
Plus a warning fix for unused PM ops and a couple fixes for the meson
clk driver clk names that went unnoticed with the regmap rework.
There's also another fix in here for the mux rounding flag which
wasn't doing what it said it did, but now it does"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: meson: meson8b: fix meson8b_cpu_clk parent clock name
clk: meson: meson8b: fix meson8b_fclk_div3_div clock name
clk: meson: drop meson_aoclk_gate_regmap_ops
clk: meson: honor CLK_MUX_ROUND_CLOSEST in clk_regmap
clk: honor CLK_MUX_ROUND_CLOSEST in generic clk mux
clk: cs2000: mark resume function as __maybe_unused
clk: stm32mp1: remove ck_apb_dbg clock
clk: stm32mp1: set stgen_k clock as critical
clk: stm32mp1: add missing tzc2 clock
clk: stm32mp1: fix SAI3 & SAI4 clocks
clk: stm32mp1: remove unused dfsdm_src[] const
clk: stm32mp1: add missing static
Linus Torvalds [Sat, 5 May 2018 07:05:12 +0000 (21:05 -1000)]
Merge tag 'drm-fixes-for-v4.17-rc4' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"vmwgfx, i915, vc4, vga dac fixes.
This seems eerily quiet, so I expect it will explode next week or
something.
One i915 model firmware, two vmwgfx fixes, one vc4 fix and one bridge
leak fix"
* tag 'drm-fixes-for-v4.17-rc4' of git://people.freedesktop.org/~airlied/linux:
drm/bridge: vga-dac: Fix edid memory leak
drm/vc4: Make sure vc4_bo_{inc,dec}_usecnt() calls are balanced
drm/i915/glk: Add MODULE_FIRMWARE for Geminilake
drm/vmwgfx: Fix a buffer object leak
drm/vmwgfx: Clean up fbdev modeset locking
Linus Torvalds [Sat, 5 May 2018 06:57:28 +0000 (20:57 -1000)]
Merge tag 'trace-v4.17-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Some of the files in the tracing directory show file mode 0444 when
they are writable by root. To fix the confusion, they should be 0644.
Note, either case root can still write to them.
Zhengyuan asked why I never applied that patch (the first one is from
2014!). I simply forgot about it. /me lowers head in shame"
* tag 'trace-v4.17-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix the file mode of stack tracer
ftrace: Have set_graph_* files have normal file modes
Linus Torvalds [Sat, 5 May 2018 06:51:10 +0000 (20:51 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
"This is our first pull request of the rc cycle. It's not that it's
been overly quiet, we were just waiting on a few things before sending
this off.
For instance, the 6 patch series from Intel for the hfi1 driver had
actually been pulled in on Tuesday for a Wednesday pull request, only
to have Jason notice something I missed, so we held off for some
testing, and then on Thursday had to respin the series because the
very first patch needed a minor fix (unnecessary cast is all).
There is a sizable hns patch series in here, as well as a reasonably
largish hfi1 patch series, then all of the lines of uapi updates are
just the change to the new official Linux-OpenIB SPDX tag (a bunch of
our files had what amounts to a BSD-2-Clause + MIT Warranty statement
as their license as a result of the initial code submission years ago,
and the SPDX folks decided it was unique enough to warrant a unique
tag), then the typical mlx4 and mlx5 updates, and finally some cxgb4
and core/cache/cma updates to round out the bunch.
None of it was overly large by itself, but in the 2 1/2 weeks we've
been collecting patches, it has added up :-/.
As best I can tell, it's been through 0day (I got a notice about my
last for-next push, but not for my for-rc push, but Jason seems to
think that failure messages are prioritized and success messages not
so much). It's also been through linux-next. And yes, we did notice in
the context portion of the CMA query gid fix patch that there is a
dubious BUG_ON() in the code, and have plans to audit our BUG_ON usage
and remove it anywhere we can.
Summary:
- Various build fixes (USER_ACCESS=m and ADDR_TRANS turned off)
- SPDX license tag cleanups (new tag Linux-OpenIB)
- RoCE GID fixes related to default GIDs
- Various fixes to: cxgb4, uverbs, cma, iwpm, rxe, hns (big batch),
mlx4, mlx5, and hfi1 (medium batch)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (52 commits)
RDMA/cma: Do not query GID during QP state transition to RTR
IB/mlx4: Fix integer overflow when calculating optimal MTT size
IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
IB/hfi1: Fix loss of BECN with AHG
IB/hfi1 Use correct type for num_user_context
IB/hfi1: Fix handling of FECN marked multicast packet
IB/core: Make ib_mad_client_id atomic
iw_cxgb4: Atomically flush per QP HW CQEs
IB/uverbs: Fix kernel crash during MR deregistration flow
IB/uverbs: Prevent reregistration of DM_MR to regular MR
RDMA/mlx4: Add missed RSS hash inner header flag
RDMA/hns: Fix a couple misspellings
RDMA/hns: Submit bad wr
RDMA/hns: Update assignment method for owner field of send wqe
RDMA/hns: Adjust the order of cleanup hem table
RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set
RDMA/hns: Remove some unnecessary attr_mask judgement
RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set
...
Linus Torvalds [Sat, 5 May 2018 06:41:44 +0000 (20:41 -1000)]
Merge tag 'for-linus-20180504' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes that should to into this release. This contains:
- Set of bcache fixes from Coly, fixing regression in patches that
went into this series.
- Set of NVMe fixes by way of Keith.
- Set of bdi related fixes, one from Jan and two from Tetsuo Handa,
fixing various issues around device addition/removal.
- Two block inflight fixes from Omar, fixing issues around the
transition to using tags for blk-mq inflight accounting that we
did a few releases ago"
* tag 'for-linus-20180504' of git://git.kernel.dk/linux-block:
bdi: Fix oops in wb_workfn()
nvmet: switch loopback target state to connecting when resetting
nvme/multipath: Fix multipath disabled naming collisions
nvme/multipath: Disable runtime writable enabling parameter
nvme: Set integrity flag for user passthrough commands
nvme: fix potential memory leak in option parsing
bdi: Fix use after free bug in debugfs_remove()
bdi: wake up concurrent wb_shutdown() callers.
bcache: use pr_info() to inform duplicated CACHE_SET_IO_DISABLE set
bcache: set dc->io_disable to true in conditional_stop_bcache_device()
bcache: add wait_for_kthread_stop() in bch_allocator_thread()
bcache: count backing device I/O error for writeback I/O
bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()
bcache: store disk name in struct cache and struct cached_dev
blk-mq: fix sysfs inflight counter
blk-mq: count allocated but not started requests in iostats inflight
Linus Torvalds [Sat, 5 May 2018 06:36:50 +0000 (20:36 -1000)]
Merge tag 'xfs-4.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"I've got one more bug fix for xfs for 4.17-rc4, which caps the amount
of data we try to handle in one dedupe request so that userspace can't
livelock the kernel.
This series has been run through a full xfstests run during the week
and through a quick xfstests run against this morning's master, with
no ajor failures reported.
Summary:
- Cap the maximum length of a deduplication request at MAX_RW_COUNT/2
to avoid kernel livelock due to excessively large IO requests"
* tag 'xfs-4.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: cap the length of deduplication requests
Linus Torvalds [Sat, 5 May 2018 06:32:18 +0000 (20:32 -1000)]
Merge tag 'for-4.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Two regression fixes and one fix for stable"
* tag 'for-4.17-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: send, fix missing truncate for inode with prealloc extent past eof
btrfs: Take trans lock before access running trans in check_delayed_ref
btrfs: Fix wrong first_key parameter in replace_path
Since commit d677a4d60193 ("Makefile: support flag
-fsanitizer-coverage=trace-cmp"), you miss to build the SANCOV
plugin under some circumstances.
CONFIG_KCOV=y
CONFIG_KCOV_ENABLE_COMPARISONS=y
Your compiler does not support -fsanitize-coverage=trace-pc
Your compiler does not support -fsanitize-coverage=trace-cmp
Under this condition, $(CFLAGS_KCOV) is not empty but contains a
space, so the following ifeq-conditional is false.
ifeq ($(CFLAGS_KCOV),)
Then, scripts/Makefile.gcc-plugins misses to add sancov_plugin.so to
gcc-plugin-y while the SANCOV plugin is necessary as an alternative
means.
Fixes: d677a4d60193 ("Makefile: support flag -fsanitizer-coverage=trace-cmp") Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Kees Cook <keescook@chromium.org>
Rasmus Villemoes [Thu, 22 Mar 2018 20:58:27 +0000 (21:58 +0100)]
MAINTAINERS: Update Kbuild entry with a few paths
I managed to send some modpost patches to old addresses of both
Masahiro and Michal, and omitted linux-kbuild from cc, because my
tried and trusted scripts/get_maintainer wrapper failed me. Add the
modpost directory to the MAINTAINERS entry, and while at it make the
Makefile glob match scripts/Makefile itself, and add one matching the
Kbuild.include file as well.
Alan writes:
What you can't see just from reading the patch is that in both
cases (ehci->itd_pool and ehci->sitd_pool) there are two
allocation paths -- the two branches of an "if" statement -- and
only one of the paths calls dma_pool_[z]alloc. However, the
memset is needed for both paths, and so it can't be eliminated.
Given that it must be present, there's no advantage to calling
dma_pool_zalloc rather than dma_pool_alloc.
When the module is removed the led workqueue is destroyed in the remove
callback, before the led device is unregistered from the led subsystem.
This leads to a NULL pointer derefence when the led device is
unregistered automatically later as part of the module removal cleanup.
Bellow is the backtrace showing the problem.
Reported-by: Dun Hum <bitter.taste@gmx.com> Cc: stable@vger.kernel.org Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
James Morse [Fri, 4 May 2018 15:19:24 +0000 (16:19 +0100)]
arm64: vgic-v2: Fix proxying of cpuif access
Proxying the cpuif accesses at EL2 makes use of vcpu_data_guest_to_host
and co, which check the endianness, which call into vcpu_read_sys_reg...
which isn't mapped at EL2 (it was inlined before, and got moved OoL
with the VHE optimizations).
The result is of course a nice panic. Let's add some specialized
cruft to keep the broken platforms that require this hack alive.
But, this code used vcpu_data_guest_to_host(), which expected us to
write the value to host memory, instead we have trapped the guest's
read or write to an mmio-device, and are about to replay it using the
host's readl()/writel() which also perform swabbing based on the host
endianness. This goes wrong when both host and guest are big-endian,
as readl()/writel() will undo the guest's swabbing, causing the
big-endian value to be written to device-memory.
What needs doing?
A big-endian guest will have pre-swabbed data before storing, undo this.
If its necessary for the host, writel() will re-swab it.
For a read a big-endian guest expects to swab the data after the load.
The hosts's readl() will correct for host endianness, giving us the
device-memory's value in the register. For a big-endian guest, swab it
as if we'd only done the load.
For a little-endian guest, nothing needs doing as readl()/writel() leave
the correct device-memory value in registers.
Tested on Juno with that rarest of things: a big-endian 64K host.
Based on a patch from Marc Zyngier.
Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com> Fixes: bf8feb39642b ("arm64: KVM: vgic-v2: Add GICV access from HYP") Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
KVM: arm/arm64: vgic_init: Cleanup reference to process_maintenance
One comment still mentioned process_maintenance operations after
commit af0614991ab6 ("KVM: arm/arm64: vgic: Get rid of unnecessary
process_maintenance operation")
Update the comment to point to vgic_fold_lr_state instead, which
is where maintenance interrupts are taken care of.
Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
James Morse [Wed, 2 May 2018 11:17:02 +0000 (12:17 +0100)]
KVM: arm64: Fix order of vcpu_write_sys_reg() arguments
A typo in kvm_vcpu_set_be()'s call:
| vcpu_write_sys_reg(vcpu, SCTLR_EL1, sctlr)
causes us to use the 32bit register value as an index into the sys_reg[]
array, and sail off the end of the linear map when we try to bring up
big-endian secondaries.
| Unable to handle kernel paging request at virtual address ffff80098b982c00
| Mem abort info:
| ESR = 0x96000045
| Exception class = DABT (current EL), IL = 32 bits
| SET = 0, FnV = 0
| EA = 0, S1PTW = 0
| Data abort info:
| ISV = 0, ISS = 0x00000045
| CM = 0, WnR = 1
| swapper pgtable: 4k pages, 48-bit VAs, pgdp = 000000002ea0571a
| [ffff80098b982c00] pgd=00000009ffff8803, pud=0000000000000000
| Internal error: Oops: 96000045 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 2 PID: 1561 Comm: kvm-vcpu-0 Not tainted 4.17.0-rc3-00001-ga912e2261ca6-dirty #1323
| Hardware name: ARM Juno development board (r1) (DT)
| pstate: 60000005 (nZCv daif -PAN -UAO)
| pc : vcpu_write_sys_reg+0x50/0x134
| lr : vcpu_write_sys_reg+0x50/0x134
Fixes: 8d404c4c24613 ("KVM: arm64: Rewrite system register accessors to read/write functions") CC: Christoffer Dall <cdall@cs.columbia.edu> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Linus Torvalds [Fri, 4 May 2018 15:44:50 +0000 (05:44 -1000)]
Merge tag 'pm-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"This fixes a regression from the 4.14 cycle in the CPPC cpufreq driver
causing it to use an incorrect transition delay value which leads to a
very high rate of frequency change requests when the schedutil
governor is in use (Prashanth Prakash)"
* tag 'pm-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq / CPPC: Set platform specific transition_delay_us
Linus Torvalds [Fri, 4 May 2018 15:43:33 +0000 (05:43 -1000)]
Merge tag 'acpi-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"This fixes an ACPICA utilities (acpidump) build regression from the
4.16 cycle by setting LD in the CFLAGS passed to the linker to $(CC)
again (Jiri Slaby)"
* tag 'acpi-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
tools: power/acpi, revert to LD = gcc
Linus Torvalds [Fri, 4 May 2018 15:38:51 +0000 (05:38 -1000)]
Merge tag 'media/v4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- a trivial one-line fix addressing a PTR_ERR() getting value from a
wrong var at imx driver
- a patch changing my e-mail at the Kernel tree to mchehab@kernel.org.
no code changes
* tag 'media/v4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
MAINTAINERS & files: Canonize the e-mails I use at files
media: imx-media-csi: Fix inconsistent IS_ERR and PTR_ERR
Linus Torvalds [Fri, 4 May 2018 15:37:22 +0000 (05:37 -1000)]
Merge tag 'sound-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes, all deserved for stable.
Two are about core API fixes for the bugs that were triggered by
ever-growing fuzzers, while others are driver-specific fixes"
* tag 'sound-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: pcm: Check PCM state at xfern compat ioctl
ALSA: aloop: Add missing cable lock to ctl API callbacks
ALSA: dice: fix kernel NULL pointer dereference due to invalid calculation for array index
ALSA: seq: Fix races at MIDI encoding in snd_virmidi_output_trigger()
ALSA: hda - Fix incorrect usage of IS_REACHABLE()
MAINTAINERS & files: Canonize the e-mails I use at files
From now on, I'll start using my @kernel.org as my development e-mail.
As such, let's remove the entries that point to the old
mchehab@s-opensource.com at MAINTAINERS file.
For the files written with a copyright with mchehab@s-opensource,
let's keep Samsung on their names, using mchehab+samsung@kernel.org,
in order to keep pointing to my employer, with sponsors the work.
For the files written before I join Samsung (on July, 4 2013),
let's just use mchehab@kernel.org.
For bug reports, we can simply point to just kernel.org, as
this will reach my mchehab+samsung inbox anyway.
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Brian Warner <brian.warner@samsung.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
media: imx-media-csi: Fix inconsistent IS_ERR and PTR_ERR
Fix inconsistent IS_ERR and PTR_ERR in imx_csi_probe.
The proper pointer to be passed as argument is pinctrl
instead of priv->vdev.
This issue was detected with the help of Coccinelle.
Fixes: 52e17089d185 ("media: imx: Don't initialize vars that won't be used") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Tested-by: Philipp Zabel <p.zabel@pengutronix.de> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Commit 7ed1c1901fe5 (tools: fix cross-compile var clobbering) removed
setting of LD to $(CROSS_COMPILE)gcc. This broke build of acpica
(acpidump) in power/acpi:
ld: unrecognized option '-D_LINUX'
The tools pass CFLAGS to the linker (incl. -D_LINUX), so revert this
particular change and let LD be $(CC) again. Note that the old behaviour
was a bit different, it used $(CROSS_COMPILE)gcc which was eliminated by
the commit 7ed1c1901fe5. We use $(CC) for that reason.
Fixes: 7ed1c1901fe5 (tools: fix cross-compile var clobbering) Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: 4.16+ <stable@vger.kernel.org> # 4.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Fri, 4 May 2018 05:26:51 +0000 (19:26 -1000)]
Merge tag 'linux-kselftest-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"This Kselftest update for 4.17-rc4 consists of a fix for a syntax
error in the script that runs selftests. Mathieu Desnoyers found this
bug in the script on systems running GNU Make 3.8 or older"
* tag 'linux-kselftest-4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: Fix lib.mk run_tests target shell script
1) Various sockmap fixes from John Fastabend (pinned map handling,
blocking in recvmsg, double page put, error handling during redirect
failures, etc.)
2) Fix dead code handling in x86-64 JIT, from Gianluca Borello.
3) Missing device put in RDS IB code, from Dag Moxnes.
4) Don't process fast open during repair mode in TCP< from Yuchung
Cheng.
5) Move address/port comparison fixes in SCTP, from Xin Long.
6) Handle add a bond slave's master into a bridge properly, from
Hangbin Liu.
7) IPv6 multipath code can operate on unitialized memory due to an
assumption that the icmp header is in the linear SKB area. Fix from
Eric Dumazet.
8) Don't invoke do_tcp_sendpages() recursively via TLS, from Dave
Watson.
9) Fix memory leaks in x86-64 JIT, from Daniel Borkmann.
10) RDS leaks kernel memory to userspace, from Eric Dumazet.
11) DCCP can invoke a tasklet on a freed socket, take a refcount. Also
from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits)
dccp: fix tasklet usage
smc: fix sendpage() call
net/smc: handle unregistered buffers
net/smc: call consolidation
qed: fix spelling mistake: "offloded" -> "offloaded"
net/mlx5e: fix spelling mistake: "loobpack" -> "loopback"
tcp: restore autocorking
rds: do not leak kernel memory to user land
qmi_wwan: do not steal interfaces from class drivers
ipv4: fix fnhe usage by non-cached routes
bpf: sockmap, fix error handling in redirect failures
bpf: sockmap, zero sg_size on error when buffer is released
bpf: sockmap, fix scatterlist update on error path in send with apply
net_sched: fq: take care of throttled flows before reuse
ipv6: Revert "ipv6: Allow non-gateway ECMP for IPv6"
bpf, x64: fix memleak when not converging on calls
bpf, x64: fix memleak when not converging after image
net/smc: restrict non-blocking connect finish
8139too: Use disable_irq_nosync() in rtl8139_poll_controller()
sctp: fix the issue that the cookie-ack with auth can't get processed
...
Linus Torvalds [Fri, 4 May 2018 04:31:19 +0000 (18:31 -1000)]
Merge branch 'parisc-4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"Fix two section mismatches, convert to read_persistent_clock64(), add
further documentation regarding the HPMC crash handler and make
bzImage the default build target"
* 'parisc-4.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix section mismatches
parisc: drivers.c: Fix section mismatches
parisc: time: Convert read_persistent_clock() to read_persistent_clock64()
parisc: Document rules regarding checksum of HPMC handler
parisc: Make bzImage default build target
Dave Airlie [Fri, 4 May 2018 00:03:27 +0000 (10:03 +1000)]
Merge branch 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux into drm-fixes
Two fixes for now, one for a long standing problem uncovered by a commit
in the 4.17 merge window, one for a regression introduced by a previous
bugfix, Cc'd stable.
* 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: Fix a buffer object leak
drm/vmwgfx: Clean up fbdev modeset locking
Jan Kara [Thu, 3 May 2018 16:26:26 +0000 (18:26 +0200)]
bdi: Fix oops in wb_workfn()
Syzbot has reported that it can hit a NULL pointer dereference in
wb_workfn() due to wb->bdi->dev being NULL. This indicates that
wb_workfn() was called for an already unregistered bdi which should not
happen as wb_shutdown() called from bdi_unregister() should make sure
all pending writeback works are completed before bdi is unregistered.
Except that wb_workfn() itself can requeue the work with:
mod_delayed_work(bdi_wq, &wb->dwork, 0);
and if this happens while wb_shutdown() is waiting in:
flush_delayed_work(&wb->dwork);
the dwork can get executed after wb_shutdown() has finished and
bdi_unregister() has cleared wb->bdi->dev.
Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
the necessary precautions against racing with bdi unregistration.
CC: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> CC: Tejun Heo <tj@kernel.org> Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977 Reported-by: syzbot <syzbot+9873874c735f2892e7e9@syzkaller.appspotmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Parav Pandit [Wed, 2 May 2018 10:18:59 +0000 (13:18 +0300)]
RDMA/cma: Do not query GID during QP state transition to RTR
When commit [1] was added, SGID was queried to derive the SMAC address.
Then, later on during a refactor [2], SMAC was no longer needed. However,
the now useless GID query remained. Then during additional code changes
later on, the GID query was being done in such a way that it caused iWARP
queries to start breaking. Remove the useless GID query and resolve the
iWARP breakage at the same time.
This is discussed in [3].
[1] commit dd5f03beb4f7 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
[2] commit 5c266b2304fb ("IB/cm: Remove the usage of smac and vid of qp_attr and cm_av")
[3] https://www.spinics.net/lists/linux-rdma/msg63951.html
IB/mlx4: Fix integer overflow when calculating optimal MTT size
When the kernel was compiled using the UBSAN option,
we saw the following stack trace:
[ 1184.827917] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx4/mr.c:349:27
[ 1184.828114] signed integer overflow:
[ 1184.828247] -2147483648 - 1 cannot be represented in type 'int'
The problem was caused by calling round_up in procedure
mlx4_ib_umem_calc_optimal_mtt_size (on line 349, as noted in the stack
trace) with the second parameter (1 << block_shift) (which is an int).
The second parameter should have been (1ULL << block_shift) (which
is an unsigned long long).
(1 << block_shift) is treated by the compiler as an int (because 1 is
an integer).
Now, local variable block_shift is initialized to 31.
If block_shift is 31, 1 << block_shift is 1 << 31 = 0x80000000=-214748368.
This is the most negative int value.
Inside the round_up macro, there is a cast applied to ((1 << 31) - 1).
However, this cast is applied AFTER ((1 << 31) - 1) is calculated.
Since (1 << 31) is treated as an int, we get the negative overflow
identified by UBSAN in the process of calculating ((1 << 31) - 1).
The fix is to change (1 << block_shift) to (1ULL << block_shift) on
line 349.
Fixes: 9901abf58368 ("IB/mlx4: Use optimal numbers of MTT entries") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
When IRQ affinity is set and the interrupt type is unknown, a cpu
mask allocated within the function is never freed. Fix this memory
leak by allocating memory within the scope where it is used.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
When allocating device data, if there's an allocation failure, the
already allocated memory won't be freed such as per-cpu counters.
Fix memory leaks in exception path by creating a common reentrant
clean up function hfi1_clean_devdata() to be used at driver unload
time and device data allocation failure.
To accomplish this, free_platform_config() and clean_up_i2c() are
changed to be reentrant to remove dependencies when they are called
in different order. This helps avoid NULL pointer dereferences
introduced by this patch if those two functions weren't reentrant.
In addition, set dd->int_counter, dd->rcv_limit,
dd->send_schedule and dd->tx_opstats to NULL after they're freed in
hfi1_clean_devdata(), so that hfi1_clean_devdata() is fully reentrant.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
When an invalid num_vls is used as a module parameter, the code
execution follows an exception path where the macro dd_dev_err()
expects dd->pcidev->dev not to be NULL in hfi1_init_dd(). This
causes a NULL pointer dereference.
Fix hfi1_init_dd() by initializing dd->pcidev and dd->pcidev->dev
earlier in the code. If a dd exists, then dd->pcidev and
dd->pcidev->dev always exists.
Cc: <stable@vger.kernel.org> # 4.9.x Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
AHG may be armed to use the stored header, which by design is limited
to edits in the PSN/A 32 bit word (bth2).
When the code is trying to send a BECN, the use of the stored header
will lose the BECN bit.
Fix by avoiding AHG when getting ready to send a BECN. This is
accomplished by always claiming the packet is not a middle packet which
is an AHG precursor. BECNs are not a normal case and this should not
hurt AHG optimizations.
Cc: <stable@vger.kernel.org> # 4.14.x Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Michael J. Ruhl [Tue, 1 May 2018 12:35:43 +0000 (05:35 -0700)]
IB/hfi1 Use correct type for num_user_context
The module parameter num_user_context is defined as 'int' and
defaults to -1. The module_param_named() says that it is uint.
Correct module_param_named() type information and update the modinfo
text to reflect the default value.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Fix handling of FECN marked multicast packet
The code for handling a marked UD packet unconditionally returns the
dlid in the header of the FECN marked packet. This is not correct
for multicast packets where the DLID is in the multicast range.
The subsequent attempt to send the CNP with the multicast lid will
cause the chip to halt the ack send context because the source
lid doesn't match the chip programming. The send context will
be halted and flush any other pending packets in the pio ring causing
the CNP to not be sent.
A part of investigating the fix, it was determined that the 16B work
broke the FECN routine badly with inconsistent use of 16 bit and 32 bits
types for lids and pkeys. Since the port's source lid was correctly 32
bits the type mixmatches need to be dealt with at the same time as
fixing the CNP header issue.
Fix these issues by:
- Using the ports lid for as the SLID for responding to FECN marked UD
packets
- Insure pkey is always 16 bit in this and subordinate routines
- Insure lids are 32 bits in this and subordinate routines
Cc: <stable@vger.kernel.org> # 4.14.x Fixes: 88733e3b8450 ("IB/hfi1: Add 16B UD support") Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
David S. Miller [Thu, 3 May 2018 18:47:32 +0000 (14:47 -0400)]
Merge branch 'smc-fixes'
Ursula Braun says:
====================
net/smc: fixes 2018/05/03
here are smc fixes for 2 problems:
* receive buffers in SMC must be registered. If registration fails
these buffers must not be kept within the link group for reuse.
Patch 1 is a preparational patch; patch 2 contains the fix.
* sendpage: do not hold the sock lock when calling kernel_sendpage()
or sock_no_sendpage()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>