Sameeh Jubran [Sun, 23 Jun 2019 07:11:10 +0000 (10:11 +0300)]
net: ena: Fix bug where ring allocation backoff stopped too late
The current code of create_queues_with_size_backoff() allows the ring size
to become as small as ENA_MIN_RING_SIZE/2. This is a bug since we don't
want the queue ring to be smaller than ENA_MIN_RING_SIZE
In this commit we change the loop's termination condition to look at the
queue size of the next iteration instead of that of the current one,
so that the minimal queue size again becomes ENA_MIN_RING_SIZE.
Fixes: eece4d2ab9d2 ("net: ena: add ethtool function for changing io queue sizes") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 20 Jun 2019 13:27:51 +0000 (14:27 +0100)]
hinic: fix dereference of pointer hwdev before it is null checked
Currently pointer hwdev is dereferenced when assigning hwif before
hwdev is null checked. Fix this by only derefencing hwdev after the
null check.
Addresses-Coverity: ("Dereference before null check") Fixes: 4fdc51bb4e92 ("hinic: add support for rss parameters with ethtool") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: mediatek: Add MT7621 TRGMII mode support
Like many other mediatek SOCs, the MT7621 SOC and the internal MT7530
switch both supports TRGMII mode. MT7621 TRGMII speed is fix 1200MBit.
v1->v2:
- Fix breakage on non MT7621 SOC
- Support 25MHz and 40MHz XTAL as MT7530 clocksource
====================
Tested-by: "Frank Wunderlich" <frank-w@public-files.de> Acked-by: Sean Wang <sean.wang@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: René van Dorst <opensource@vdorst.com> Tested-by: Frank Wunderlich <frank-w@public-files.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Thu, 20 Jun 2019 11:24:40 +0000 (19:24 +0800)]
netns: restore ops before calling ops_exit_list
ops has been iterated to first element when call pre_exit, and
it needs to restore from save_ops, not save ops to save_ops
Fixes: d7d99872c144 ("netns: add pre_exit method to struct pernet_operations") Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
fjes: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: "David S. Miller" <davem@davemloft.net> Cc: Yangtao Li <tiny.windzz@gmail.com> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ard Biesheuvel [Wed, 19 Jun 2019 21:46:28 +0000 (23:46 +0200)]
net: fastopen: robustness and endianness fixes for SipHash
Some changes to the TCP fastopen code to make it more robust
against future changes in the choice of key/cookie size, etc.
- Instead of keeping the SipHash key in an untyped u8[] buffer
and casting it to the right type upon use, use the correct
type directly. This ensures that the key will appear at the
correct alignment if we ever change the way these data
structures are allocated. (Currently, they are only allocated
via kmalloc so they always appear at the correct alignment)
- Use DIV_ROUND_UP when sizing the u64[] array to hold the
cookie, so it is always of sufficient size, even if
TCP_FASTOPEN_COOKIE_MAX is no longer a multiple of 8.
- Drop the 'len' parameter from the tcp_fastopen_reset_cipher()
function, which is no longer used.
- Add endian swabbing when setting the keys and calculating the hash,
to ensure that cookie values are the same for a given key and
source/destination address pair regardless of the endianness of
the server.
Note that none of these are functional changes wrt the current
state of the code, with the exception of the swabbing, which only
affects big endian systems.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Fix leak of unqueued fragments in ipv6 nf_defrag, from Guillaume
Nault.
2) Don't access the DDM interface unless the transceiver implements it
in bnx2x, from Mauro S. M. Rodrigues.
3) Don't double fetch 'len' from userspace in sock_getsockopt(), from
JingYi Hou.
4) Sign extension overflow in lio_core, from Colin Ian King.
5) Various netem bug fixes wrt. corrupted packets from Jakub Kicinski.
6) Fix epollout hang in hvsock, from Sunil Muthuswamy.
7) Fix regression in default fib6_type, from David Ahern.
8) Handle memory limits in tcp_fragment more appropriately, from Eric
Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
tcp: refine memory limit test in tcp_fragment()
inet: clear num_timeout reqsk_alloc()
net: mvpp2: debugfs: Add pmap to fs dump
ipv6: Default fib6_type to RTN_UNICAST when not set
net: hns3: Fix inconsistent indenting
net/af_iucv: always register net_device notifier
net/af_iucv: build proper skbs for HiperTransport
net/af_iucv: remove GFP_DMA restriction for HiperTransport
net: dsa: mv88e6xxx: fix shift of FID bits in mv88e6185_g1_vtu_loadpurge()
hvsock: fix epollout hang from race condition
net/udp_gso: Allow TX timestamp with UDP GSO
net: netem: fix use after free and double free with packet corruption
net: netem: fix backlog accounting for corrupted GSO frames
net: lio_core: fix potential sign-extension overflow on large shift
tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb
ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL
ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
tun: wake up waitqueues after IFF_UP is set
net: remove duplicate fetch in sock_getsockopt
tipc: fix issues with early FAILOVER_MSG from peer
...
====================
PCI: let pci_disable_link_state propagate errors
Drivers like r8169 rely on pci_disable_link_state() having disabled
certain ASPM link states. If OS can't control ASPM then
pci_disable_link_state() turns into a no-op w/o informing the caller.
The driver therefore may falsely assume the respective ASPM link
states are disabled. Let pci_disable_link_state() propagate errors
to the caller, enabling the caller to react accordingly.
I'd propose to let this series go through the netdev tree if the PCI
core extension is acked by the PCI people.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Tue, 18 Jun 2019 21:14:50 +0000 (23:14 +0200)]
r8169: don't activate ASPM in chip if OS can't control ASPM
Certain chip version / board combinations have massive problems if
ASPM is active. If BIOS enables ASPM and doesn't let OS control it,
then we may have a problem with the current code. Therefore check the
return code of pci_disable_link_state() and don't enable ASPM in the
network chip if OS can't control ASPM.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Tue, 18 Jun 2019 21:13:48 +0000 (23:13 +0200)]
PCI: let pci_disable_link_state propagate errors
Drivers may rely on pci_disable_link_state() having disabled certain
ASPM link states. If OS can't control ASPM then pci_disable_link_state()
turns into a no-op w/o informing the caller. The driver therefore may
falsely assume the respective ASPM link states are disabled.
Let pci_disable_link_state() propagate errors to the caller, enabling
the caller to react accordingly.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Jun 2019 13:09:55 +0000 (06:09 -0700)]
tcp: refine memory limit test in tcp_fragment()
tcp_fragment() might be called for skbs in the write queue.
Memory limits might have been exceeded because tcp_sendmsg() only
checks limits at full skb (64KB) boundaries.
Therefore, we need to make sure tcp_fragment() wont punish applications
that might have setup very low SO_SNDBUF values.
Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Christoph Paasch <cpaasch@apple.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 21 Jun 2019 21:47:09 +0000 (14:47 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
"This is probably our last -rc pull request. We don't have anything
else outstanding at the moment anyway, and with the summer months on
us and people taking trips, I expect the next weeks leading up to the
merge window to be pretty calm and sedate.
This has two simple, no brainer fixes for the EFA driver.
Then it has ten not quite so simple fixes for the hfi1 driver. The
problem with them is that they aren't simply one liner typo fixes.
They're still fixes, but they're more complex issues like livelock
under heavy load where the answer was to change work queue usage and
spinlock usage to resolve the problem, or issues with orphaned
requests during certain types of failures like link down which
required some more complex work to fix too. They all look like
legitimate fixes to me, they just aren't small like I wish they were.
Summary:
- 2 minor EFA fixes
- 10 hfi1 fixes related to scaling issues"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/efa: Handle mmap insertions overflow
RDMA/efa: Fix success return value in case of error
IB/hfi1: Handle port down properly in pio
IB/hfi1: Handle wakeup of orphaned QPs for pio
IB/hfi1: Wakeup QPs orphaned on wait list after flush
IB/hfi1: Use aborts to trigger RC throttling
IB/hfi1: Create inline to get extended headers
IB/hfi1: Silence txreq allocation warnings
IB/hfi1: Avoid hardlockup with flushlist_lock
IB/hfi1: Correct tid qp rcd to match verbs context
IB/hfi1: Close PSM sdma_progress sleep window
IB/hfi1: Validate fault injection opcode user input
Linus Torvalds [Fri, 21 Jun 2019 20:45:41 +0000 (13:45 -0700)]
Merge tag 'nfs-for-5.2-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull more NFS client fixes from Anna Schumaker:
"These are mostly refcounting issues that people have found recently.
The revert fixes a suspend recovery performance issue.
- SUNRPC: Fix a credential refcount leak
- Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
- SUNRPC: Fix xps refcount imbalance on the error path
- NFS4: Only set creation opendata if O_CREAT"
* tag 'nfs-for-5.2-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
SUNRPC: Fix a credential refcount leak
Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
net :sunrpc :clnt :Fix xps refcount imbalance on the error path
NFS4: Only set creation opendata if O_CREAT
Andy Lutomirski [Fri, 21 Jun 2019 15:43:04 +0000 (08:43 -0700)]
x86/vdso: Prevent segfaults due to hoisted vclock reads
GCC 5.5.0 sometimes cleverly hoists reads of the pvclock and/or hvclock
pages before the vclock mode checks. This creates a path through
vclock_gettime() in which no vclock is enabled at all (due to disabled
TSC on old CPUs, for example) but the pvclock or hvclock page
nevertheless read. This will segfault on bare metal.
This fixes commit 459e3a21535a ("gcc-9: properly declare the
{pv,hv}clock_page storage") in the sense that, before that commit, GCC
didn't seem to generate the offending code. There was nothing wrong
with that commit per se, and -stable maintainers should backport this to
all supported kernels regardless of whether the offending commit was
present, since the same crash could just as easily be triggered by the
phase of the moon.
On GCC 9.1.1, this doesn't seem to affect the generated code at all, so
I'm not too concerned about performance regressions from this fix.
Trond Myklebust [Thu, 20 Jun 2019 14:47:40 +0000 (10:47 -0400)]
SUNRPC: Fix a credential refcount leak
All callers of __rpc_clone_client() pass in a value for args->cred,
meaning that the credential gets assigned and referenced in
the call to rpc_new_client().
Reported-by: Ido Schimmel <idosch@idosch.org> Fixes: 79caa5fad47c ("SUNRPC: Cache cred of process creating the rpc_client") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Anna Schumaker [Tue, 18 Jun 2019 18:57:33 +0000 (14:57 -0400)]
Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
Jon Hunter reports:
"I have been noticing intermittent failures with a system suspend test on
some of our machines that have a NFS mounted root file-system. Bisecting
this issue points to your commit 431235818bc3 ("SUNRPC: Declare RPC
timers as TIMER_DEFERRABLE") and reverting this on top of v5.2-rc3 does
appear to resolve the problem.
The cause of the suspend failure appears to be a long delay observed
sometimes when resuming from suspend, and this is causing our test to
timeout."
We can end up in nfs4_opendata_alloc during task exit, in which case
current->fs has already been cleaned up. This leads to a crash in
current_umask().
Fix this by only setting creation opendata if we are actually doing an open
with O_CREAT. We can drop the check for NULL nfs4_open_createattrs, since
O_CREAT will never be set for the recovery path.
Suggested-by: Trond Myklebust <trondmy@hammerspace.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Linus Torvalds [Fri, 21 Jun 2019 18:03:33 +0000 (11:03 -0700)]
Merge tag 'drm-fixes-2019-06-21' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Just catching up on the week since back from holidays, everything
seems quite sane.
core:
- copy_to_user fix for really legacy codepaths.
vmwgfx:
- two dma fixes
- one virt hw interaction fix
i915:
- modesetting fix
- gvt fix
panfrost:
- BO unmapping fix
imx:
- image converter fixes"
* tag 'drm-fixes-2019-06-21' of git://anongit.freedesktop.org/drm/drm:
drm/i915: Don't clobber M/N values during fastset check
drm: return -EFAULT if copy_to_user() fails
drm/panfrost: Make sure a BO is only unmapped when appropriate
drm/i915/gvt: ignore unexpected pvinfo write
gpu: ipu-v3: image-convert: Fix image downsize coefficients
gpu: ipu-v3: image-convert: Fix input bytesperline for packed formats
gpu: ipu-v3: image-convert: Fix input bytesperline width/height align
drm/vmwgfx: fix a warning due to missing dma_parms
drm/vmwgfx: Honor the sg list segment size limitation
drm/vmwgfx: Use the backdoor port if the HB port is not available
Linus Torvalds [Fri, 21 Jun 2019 17:20:19 +0000 (10:20 -0700)]
Merge tag 'staging-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO/counter fixes from Greg KH:
"Here are some small driver bugfixes for some staging/iio/counter
drivers.
Staging and IIO have been lumped together for a while, as those
subsystems cross the areas a log, and counter is used by IIO, so
that's why they are all in one pull request here.
These are small fixes for reported issues in some iio drivers, the
erofs filesystem, and a build issue for counter code.
All have been in linux-next with no reported issues"
* tag 'staging-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: erofs: add requirements field in superblock
counter/ftm-quaddec: Add missing dependencies in Kconfig
staging: iio: adt7316: Fix build errors when GPIOLIB is not set
iio: temperature: mlx90632 Relax the compatibility check
iio: imu: st_lsm6dsx: fix PM support for st_lsm6dsx i2c controller
staging:iio:ad7150: fix threshold mode config bit
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
habanalabs: use u64_to_user_ptr() for reading user pointers
doc: fix documentation about UIO_MEM_LOGICAL using
MAINTAINERS / Documentation: Thorsten Scherer is the successor of Gavin Schenk
docs: fb: Add TER16x32 to the available font names
MAINTAINERS: fpga: hand off maintainership to Moritz
thunderbolt: Implement CIO reset correctly for Titan Ridge
binder: fix possible UAF when freeing buffer
thunderbolt: Make sure device runtime resume completes before taking domain lock
soundwire: intel: set dai min and max channels correctly
soundwire: stream: fix bad unlock balance
soundwire: stream: fix out of boundary access on port properties
Linus Torvalds [Fri, 21 Jun 2019 17:16:41 +0000 (10:16 -0700)]
Merge tag 'usb-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are four small USB fixes for 5.2-rc6.
They include two xhci bugfixes, a chipidea fix, and a small dwc2 fix.
Nothing major, just nice things to get resolved for reported issues.
All have been in linux-next with no reported issues"
* tag 'usb-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: detect USB 3.2 capable host controllers correctly
usb: xhci: Don't try to recover an endpoint if port is in error state.
usb: dwc2: Use generic PHY width in params setup
usb: chipidea: udc: workaround for endpoint conflict issue
Linus Torvalds [Fri, 21 Jun 2019 16:58:42 +0000 (09:58 -0700)]
Merge tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull still more SPDX updates from Greg KH:
"Another round of SPDX updates for 5.2-rc6
Here is what I am guessing is going to be the last "big" SPDX update
for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates
that were "easy" to determine by pattern matching. The ones after this
are going to be a bit more difficult and the people on the spdx list
will be discussing them on a case-by-case basis now.
Another 5000+ files are fixed up, so our overall totals are:
Files checked: 64545
Files with SPDX: 45529
Compared to the 5.1 kernel which was:
Files checked: 63848
Files with SPDX: 22576
This is a huge improvement.
Also, we deleted another 20000 lines of boilerplate license crud,
always nice to see in a diffstat"
* tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits)
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485
...
Linus Torvalds [Fri, 21 Jun 2019 16:51:44 +0000 (09:51 -0700)]
Merge tag '5.2-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Four small SMB3 fixes, all for stable"
* tag '5.2-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix GlobalMid_Lock bug in cifs_reconnect
SMB3: retry on STATUS_INSUFFICIENT_RESOURCES instead of failing write
cifs: add spinlock for the openFileList to cifsInodeInfo
cifs: fix panic in smb2_reconnect
Dave Airlie [Fri, 21 Jun 2019 01:44:20 +0000 (11:44 +1000)]
Merge tag 'imx-drm-fixes-2019-06-20' of git://git.pengutronix.de/git/pza/linux into drm-fixes
drm/imx: ipu-v3 image converter fixes
This series fixes input buffer alignment and downsizer configuration
to adhere to IPU mem2mem CSC/scaler hardware restrictions in certain
downscaling ratios.
Dave Airlie [Fri, 21 Jun 2019 01:26:59 +0000 (11:26 +1000)]
Merge branch 'vmwgfx-fixes-5.2' of git://people.freedesktop.org/~thomash/linux into drm-fixes
A couple of fixes for vmwgfx. Two fixes for a DMA sg-list debug warning
message. These are not cc'd stable since there is no evidence of actual
breakage.
On fix for the high-bandwidth backdoor port which is cc'd stable due to
upcoming hardware, on which the code would otherwise break.
ARM: 8867/1: vdso: pass --be8 to linker if necessary
The commit fe00e50b2db8 ("ARM: 8858/1: vdso: use $(LD) instead of $(CC)
to link VDSO") removed the passing of CFLAGS, since ld doesn't take
those directly. However, prior, big-endian ARM was relying on gcc to
translate its -mbe8 option into ld's --be8 option. Lacking this, ld
generated be32 code, making the VDSO generate SIGILL when called by
userspace.
This commit passes --be8 if CONFIG_CPU_ENDIAN_BE8 is enabled.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Linus Torvalds [Thu, 20 Jun 2019 21:19:34 +0000 (14:19 -0700)]
Merge tag 'ovl-fixes-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"Fix two regressions in this cycle, and a couple of older bugs"
* tag 'ovl-fixes-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: make i_ino consistent with st_ino in more cases
ovl: fix typo in MODULE_PARM_DESC
ovl: fix bogus -Wmaybe-unitialized warning
ovl: don't fail with disconnected lower NFS
ovl: fix wrong flags check in FS_IOC_FS[SG]ETXATTR ioctls
Linus Torvalds [Thu, 20 Jun 2019 21:16:16 +0000 (14:16 -0700)]
Merge tag 'fuse-fixes-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fix from Miklos Szeredi:
"Just a single revert, fixing a regression in -rc1"
* tag 'fuse-fixes-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
Revert "fuse: require /dev/fuse reads to have enough buffer capacity"
Linus Torvalds [Thu, 20 Jun 2019 20:50:37 +0000 (13:50 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Fixes for ARM and x86, plus selftest patches and nicer structs for
nested state save/restore"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: reorganize initial steps of vmx_set_nested_state
KVM: arm/arm64: Fix emulated ptimer irq injection
tests: kvm: Check for a kernel warning
kvm: tests: Sort tests in the Makefile alphabetically
KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT
KVM: x86: Modify struct kvm_nested_state to have explicit fields for data
KVM: fix typo in documentation
KVM: nVMX: use correct clean fields when copying from eVMCS
KVM: arm/arm64: vgic: Fix kvm_device leak in vgic_its_destroy
KVM: arm64: Filter out invalid core register IDs in KVM_GET_REG_LIST
KVM: arm64: Implement vq_present() as a macro
Linus Torvalds [Thu, 20 Jun 2019 19:04:57 +0000 (12:04 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"This is mainly a couple of email address updates to MAINTAINERS, but
we've also fixed a UAPI build issue with musl libc and an accidental
double-initialisation of our pgd_cache due to a naming conflict with a
weak symbol.
There are a couple of outstanding issues that have been reported, but
it doesn't look like they're new and we're still a long way off from
fully debugging them.
Summary:
- Fix use of #include in UAPI headers for compatability with musl libc
- Update email addresses in MAINTAINERS
- Fix initialisation of pgd_cache due to name collision with weak symbol"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/mm: don't initialize pgd_cache twice
MAINTAINERS: Update my email address
arm64/sve: <uapi/asm/ptrace.h> should not depend on <uapi/linux/prctl.h>
arm64: ssbd: explicitly depend on <linux/prctl.h>
MAINTAINERS: Update my email address to use @kernel.org
Linus Torvalds [Thu, 20 Jun 2019 19:03:41 +0000 (12:03 -0700)]
Merge tag 's390-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Disable address-of-packed-member warning in s390 specific boot code
to get rid of a gcc9 warning which otherwise is already disabled for
the whole kernel.
- Fix yet another compiler error seen with CONFIG_OPTIMIZE_INLINING
enabled.
- Fix memory leak in vfio-ccw code on module exit.
* tag 's390-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
vfio-ccw: Destroy kmem cache region on module exit
s390/ctl_reg: mark __ctl_set_bit and __ctl_clear_bit as __always_inline
s390/boot: disable address-of-packed-member warning
Linus Torvalds [Thu, 20 Jun 2019 17:12:53 +0000 (10:12 -0700)]
Merge tag 'for_v5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull two misc vfs fixes from Jan Kara:
"One small quota fix fixing spurious EDQUOT errors and one fanotify fix
fixing a bug in the new fanotify FID reporting code"
* tag 'for_v5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: update connector fsid cache on add mark
quota: fix a problem about transfer quota
Linus Torvalds [Thu, 20 Jun 2019 17:08:38 +0000 (10:08 -0700)]
Merge tag 'mmc-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"Here's quite a few MMC fixes intended for v5.2-rc6. This time it also
contains fixes for a WiFi driver, which device is attached to the SDIO
interface. Patches for the WiFi driver have been acked by the
corresponding maintainers.
Summary:
MMC core:
- Make switch to eMMC HS400 more robust for some controllers
- Add two SDIO func API to manage re-tuning constraints
- Prevent processing SDIO IRQs when the card is suspended
MMC host:
- sdhi: Disallow broken HS400 for M3-W ES1.2, RZ/G2M and V3H
- mtk-sd: Fixup support for SDIO IRQs
- sdhci-pci-o2micro: Fixup support for tuning
Wireless BRCMFMAC (SDIO):
- Deal with expected transmission errors related to the idle states
(handled by the Always-On-Subsystem or AOS) on the SDIO-based WiFi
on rk3288-veyron-minnie, rk3288-veyron-speedy and
rk3288-veyron-mickey"
* tag 'mmc-v5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: core: Prevent processing SDIO IRQs when the card is suspended
mmc: sdhci: sdhci-pci-o2micro: Correctly set bus width when tuning
brcmfmac: sdio: Don't tune while the card is off
mmc: core: Add sdio_retune_hold_now() and sdio_retune_release()
brcmfmac: sdio: Disable auto-tuning around commands expected to fail
mmc: core: API to temporarily disable retuning for SDIO CRC errors
Revert "brcmfmac: disable command decode in sdio_aos"
mmc: mediatek: fix SDIO IRQ detection issue
mmc: mediatek: fix SDIO IRQ interrupt handle flow
mmc: core: complete HS400 before checking status
mmc: sdhi: disallow HS400 for M3-W ES1.2, RZ/G2M, and V3H
Linus Torvalds [Thu, 20 Jun 2019 16:58:35 +0000 (09:58 -0700)]
Merge tag 'for-linus-20190620' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Three fixes that should go into this series.
One is a set of two patches from Christoph, fixing a page leak on same
page merges. Boiled down version of a bigger fix, but this one is more
appropriate for this late in the cycle (and easier to backport to
stable).
The last patch is for a divide error in MD, from Mariusz (via Song)"
* tag 'for-linus-20190620' of git://git.kernel.dk/linux-block:
md: fix for divide error in status_resync
block: fix page leak when merging to same page
block: return from __bio_try_merge_page if merging occured in the same page
Paolo Bonzini [Thu, 20 Jun 2019 16:24:18 +0000 (18:24 +0200)]
Merge tag 'kvmarm-fixes-for-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm fixes for 5.2, take #2
- SVE cleanup killing a warning with ancient GCC versions
- Don't report non-existent system registers to userspace
- Fix memory leak when freeing the vgic ITS
- Properly lower the interrupt on the emulated physical timer
Paolo Bonzini [Wed, 19 Jun 2019 14:52:27 +0000 (16:52 +0200)]
KVM: nVMX: reorganize initial steps of vmx_set_nested_state
Commit 332d079735f5 ("KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS
state before setting new state", 2019-05-02) broke evmcs_test because the
eVMCS setup must be performed even if there is no VMXON region defined,
as long as the eVMCS bit is set in the assist page.
While the simplest possible fix would be to add a check on
kvm_state->flags & KVM_STATE_NESTED_EVMCS in the initial "if" that
covers kvm_state->hdr.vmx.vmxon_pa == -1ull, that is quite ugly.
Instead, this patch moves checks earlier in the function and
conditionalizes them on kvm_state->hdr.vmx.vmxon_pa, so that
vmx_set_nested_state always goes through vmx_leave_nested
and nested_enable_evmcs.
Fixes: 332d079735f5 ("KVM: nVMX: KVM_SET_NESTED_STATE - Tear down old EVMCS state before setting new state") Cc: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge tag 'fixes-for-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus
Felipe writes:
usb: fixes for v5.2-rc5
A single fix to take into account the PHY width during initialization of
dwc2 driver. This change allows deviceTree to pass PHY width if
necessary.
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
* tag 'fixes-for-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
usb: dwc2: Use generic PHY width in params setup
Arnd Bergmann [Mon, 17 Jun 2019 12:41:33 +0000 (14:41 +0200)]
habanalabs: use u64_to_user_ptr() for reading user pointers
We cannot cast a 64-bit integer to a pointer on 32-bit architectures
without a warning:
drivers/misc/habanalabs/habanalabs_ioctl.c: In function 'debug_coresight':
drivers/misc/habanalabs/habanalabs_ioctl.c:143:23: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
input = memdup_user((const void __user *) args->input_ptr,
Use the macro that was defined for this purpose.
Fixes: 315bc055ed56 ("habanalabs: add new IOCTL for debug, tracing and profiling") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com> Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
page_pool: fix compile warning when CONFIG_PAGE_POOL is disabled
Kbuild test robot reported compile warning:
warning: no return statement in function returning non-void
in function page_pool_request_shutdown, when CONFIG_PAGE_POOL is disabled.
The fix makes the code a little more verbose, with a descriptive variable.
Fixes: 99c07c43c4ea ("xdp: tracking page_pool resources and safe removal") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
J. Bruce Fields [Wed, 19 Jun 2019 21:06:24 +0000 (17:06 -0400)]
nfsd: replace Jeff by Chuck as nfsd co-maintainer
Jeff's picking up more responsibilities elsewhere, and Chuck's agreed to
take over.
For now, as before, nothing's changing day-to-day, but I want to have a
co-maintainer if only for bus factor.
Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Dumazet [Wed, 19 Jun 2019 16:38:38 +0000 (09:38 -0700)]
inet: clear num_timeout reqsk_alloc()
KMSAN caught uninit-value in tcp_create_openreq_child() [1]
This is caused by a recent change, combined by the fact
that TCP cleared num_timeout, num_retrans and sk fields only
when a request socket was about to be queued.
Under syncookie mode, a temporary request socket is used,
and req->num_timeout could contain garbage.
Lets clear these three fields sooner, there is really no
point trying to defer this and risk other bugs.
Commit ce4ab73ab0c27c ("net: stmmac: drop the reset delays from struct
stmmac_mdio_bus_data") moved the reset delay array from struct
stmmac_mdio_bus_data to a stack variable.
The values from the array inside struct stmmac_mdio_bus_data were
previously initialized to 0 because the struct was allocated using
devm_kzalloc(). The array on the stack has to be initialized
explicitly, else we might be reading garbage values.
Initialize all reset delays to 0 to ensure that the values are 0 if the
"snps,reset-delays-us" property is not defined.
This fixes booting at least two boards (MIPS pistachio marduk and ARM
sun8i H2+ Orange Pi Zero). These are hanging during boot when
initializing the stmmac Ethernet controller (as found by Kernel CI).
Both have in common that they don't define the "snps,reset-delays-us"
property.
Fixes: ce4ab73ab0c27c ("net: stmmac: drop the reset delays from struct stmmac_mdio_bus_data") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reported-by: "kernelci.org bot" <bot@kernelci.org> Signed-off-by: David S. Miller <davem@davemloft.net>
empty_child_inc/dec() use the ternary operator for conditional
operations. The conditions involve the post/pre in/decrement
operator and the operation is only performed when the condition
is *not* true. This is hard to parse for humans, use a regular
'if' construct instead and perform the in/decrement separately.
This also fixes two warnings that are emitted about the value
of the ternary expression being unused, when building the kernel
with clang + "kbuild: Remove unnecessary -Wno-unused-value"
(https://lore.kernel.org/patchwork/patch/1089869/):
CC net/ipv4/fib_trie.o
net/ipv4/fib_trie.c:351:2: error: expression result unused [-Werror,-Wunused-value]
++tn_info(n)->empty_children ? : ++tn_info(n)->full_children;
Fixes: 95f60ea3e99a ("fib_trie: Add collapse() and should_collapse() to resize") Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
There was an unused variable 'mvpp2_dbgfs_prs_pmap_fops'
Added a usage consistent with other fops to dump pmap
to userspace.
Cc: clang-built-linux@googlegroups.com Link: https://github.com/ClangBuiltLinux/linux/issues/529 Signed-off-by: Nathan Huckleberry <nhuck@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 19 Jun 2019 17:50:24 +0000 (10:50 -0700)]
ipv6: Default fib6_type to RTN_UNICAST when not set
A user reported that routes are getting installed with type 0 (RTN_UNSPEC)
where before the routes were RTN_UNICAST. One example is from accel-ppp
which apparently still uses the ioctl interface and does not set
rtmsg_type. Another is the netlink interface where ipv6 does not require
rtm_type to be set (v4 does). Prior to the commit in the Fixes tag the
ipv6 stack converted type 0 to RTN_UNICAST, so restore that behavior.
Fixes: e8478e80e5a7 ("net/ipv6: Save route type in rt6_info") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
I spent a few cycles on transmit problems for af_iucv over regular
netdevices - please apply the following fixes to -net.
The first patch allows for skb allocations outside of GFP_DMA, while the
second patch respects that drivers might use skb_cow_head() and/or want
additional dev->needed_headroom.
Patch 3 is for a separate issue, where we didn't setup some of the
netdevice-specific infrastructure when running as a z/VM guest.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 18 Jun 2019 18:43:01 +0000 (20:43 +0200)]
net/af_iucv: always register net_device notifier
Even when running as VM guest (ie pr_iucv != NULL), af_iucv can still
open HiperTransport-based connections. For robust operation these
connections require the af_iucv_netdev_notifier, so register it
unconditionally.
Also handle any error that register_netdevice_notifier() returns.
Fixes: 9fbd87d41392 ("af_iucv: handle netdev events") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 18 Jun 2019 18:43:00 +0000 (20:43 +0200)]
net/af_iucv: build proper skbs for HiperTransport
The HiperSockets-based transport path in af_iucv is still too closely
entangled with qeth.
With commit a647a02512ca ("s390/qeth: speed-up L3 IQD xmit"), the
relevant xmit code in qeth has begun to use skb_cow_head(). So to avoid
unnecessary skb head expansions, af_iucv must learn to
1) respect dev->needed_headroom when allocating skbs, and
2) drop the header reference before cloning the skb.
While at it, also stop hard-coding the LL-header creation stage and just
use the appropriate helper.
Fixes: a647a02512ca ("s390/qeth: speed-up L3 IQD xmit") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Tue, 18 Jun 2019 18:42:59 +0000 (20:42 +0200)]
net/af_iucv: remove GFP_DMA restriction for HiperTransport
af_iucv sockets over z/VM IUCV require that their skbs are allocated
in DMA memory. This restriction doesn't apply to connections over
HiperSockets. So only set this limit for z/VM IUCV sockets, thereby
increasing the likelihood that the large (and linear!) allocations for
HiperTransport messages succeed.
Fixes: 3881ac441f64 ("af_iucv: add HiperSockets transport") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 19 Jun 2019 18:44:04 +0000 (11:44 -0700)]
Merge tag 'pm-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Prevent PCI bridges in general (and PCIe ports in particular) from
being put into low-power states during system-wide suspend transitions
if there are any devices in D0 below them and refine the handling of
PCI devices in D0 during suspend-to-idle cycles"
* tag 'pm-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PCI: PM: Skip devices in D0 for suspend-to-idle
Linus Torvalds [Wed, 19 Jun 2019 18:39:00 +0000 (11:39 -0700)]
Merge tag 'apparmor-pr-2019-06-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
Pull apparmor bug fixes from John Johansen:
- fix PROFILE_MEDIATES for untrusted input
- enforce nullbyte at end of tag string
- reset pos on failure to unpack for various functions
* tag 'apparmor-pr-2019-06-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: reset pos on failure to unpack for various functions
apparmor: enforce nullbyte at end of tag string
apparmor: fix PROFILE_MEDIATES for untrusted input
Linus Torvalds [Wed, 19 Jun 2019 18:26:09 +0000 (11:26 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
"Just a few small fixups and switching a couple of Thinkpads to SMBus
for touchpads as PS/2 emulation is not working well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: synaptics - enable SMBus on ThinkPad E480 and E580
Input: imx_keypad - make sure keyboard can always wake up system
Input: iqs5xx - get axis info before calling input_mt_init_slots()
Input: uinput - add compat ioctl number translation for UI_*_FF_UPLOAD
Input: silead - add MSSL0017 to acpi_device_id
Input: elantech - enable middle button support on 2 ThinkPads
Input: elan_i2c - increment wakeup count if wake source
Yang Yingliang [Sat, 15 Jun 2019 09:41:29 +0000 (17:41 +0800)]
doc: fix documentation about UIO_MEM_LOGICAL using
After commit d4fc5069a394 ("mm: switch s_mem and slab_cache in struct page")
page->mapping will be re-used by slab allocations and page->mapping->host
will be used in balance_dirty_pages_ratelimited() as an inode member but
it's not an inode in fact and leads an oops.
Merge tag 'thunderbolt-fixes-for-v5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into char-misc-linus
Mika writes:
thunderbolt: Fixes for v5.2-rc6
This includes two fixes for issues found during the current release
cycle:
- Fix runtime PM regression when device is authorized after the
controller is runtime suspended.
- Correct CIO reset flow for Titan Ridge.
* tag 'thunderbolt-fixes-for-v5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
thunderbolt: Implement CIO reset correctly for Titan Ridge
thunderbolt: Make sure device runtime resume completes before taking domain lock
====================
inet: fix defrag units dismantle races
This series add a new pre_exit() method to struct pernet_operations
to solve a race in defrag units dismantle, without adding extra
delays to netns dismantles.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 18 Jun 2019 18:09:00 +0000 (11:09 -0700)]
inet: fix various use-after-free in defrags units
syzbot reported another issue caused by my recent patches. [1]
The issue here is that fqdir_exit() is initiating a work queue
and immediately returns. A bit later cleanup_net() was able
to free the MIB (percpu data) and the whole struct net was freed,
but we had active frag timers that fired and triggered use-after-free.
We need to make sure that timers can catch fqdir->dead being set,
to bailout.
Since RCU is used for the reader side, this means
we want to respect an RCU grace period between these operations :
1) qfdir->dead = 1;
2) netns dismantle (freeing of various data structure)
This patch uses new new (struct pernet_operations)->pre_exit
infrastructure to ensures a full RCU grace period
happens between fqdir_pre_exit() and fqdir_exit()
This also means we can use a regular work queue, we no
longer need rcu_work.
Tested:
$ time for i in {1..1000}; do unshare -n /bin/false;done
real 0m2.585s
user 0m0.160s
sys 0m2.214s
[1]
BUG: KASAN: use-after-free in ip_expire+0x73e/0x800 net/ipv4/ip_fragment.c:152
Read of size 8 at addr ffff88808b9fe330 by task syz-executor.4/11860
Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
xdp: page_pool fixes and in-flight accounting
This patchset fix page_pool API and users, such that drivers can use it for
DMA-mapping. A number of places exist, where the DMA-mapping would not get
released/unmapped, all these are fixed. This occurs e.g. when an xdp_frame
gets converted to an SKB. As network stack doesn't have any callback for XDP
memory models.
The patchset also address a shutdown race-condition. Today removing a XDP
memory model, based on page_pool, is only delayed one RCU grace period. This
isn't enough as redirected xdp_frames can still be in-flight on different
queues (remote driver TX, cpumap or veth).
We stress that when drivers use page_pool for DMA-mapping, then they MUST
use one packet per page. This might change in the future, but more work lies
ahead, before we can lift this restriction.
This patchset change the page_pool API to be more strict, as in-flight page
accounting is added.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
For DMA mapping use-case the page_pool keeps a pointer
to the struct device, which is used in DMA map/unmap calls.
For our in-flight handling, we also need to make sure that
the struct device have not disappeared. This is assured
via using get_device/put_device API.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reported-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
page_pool: add tracepoints for page_pool with details need by XDP
The xdp tracepoints for mem id disconnect don't carry information about, why
it was not safe_to_remove. The tracepoint page_pool:page_pool_inflight in
this patch can be used for extract this info for further debugging.
This patchset also adds tracepoint for the pages_state_* release/hold
transitions, including a pointer to the page. This can be used for stats
about in-flight pages, or used to debug page leakage via keeping track of
page pointer and combining this with kprobe for __put_page().
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
These tracepoints make it easier to troubleshoot XDP mem id disconnect.
The xdp:mem_disconnect tracepoint cannot be replaced via kprobe. It is
placed at the last stable place for the pointer to struct xdp_mem_allocator,
just before it's scheduled for RCU removal. It also extract info on
'safe_to_remove' and 'force'.
Detailed info about in-flight pages is not available at this layer. The next
patch will added tracepoints needed at the page_pool layer for this.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
xdp: force mem allocator removal and periodic warning
If bugs exists or are introduced later e.g. by drivers misusing the API,
then we want to warn about the issue, such that developer notice. This patch
will generate a bit of noise in form of periodic pr_warn every 30 seconds.
It is not nice to have this stall warning running forever. Thus, this patch
will (after 120 attempts) force disconnect the mem id (from the rhashtable)
and free the page_pool object. This will cause fallback to the put_page() as
before, which only potentially leak DMA-mappings, if objects are really
stuck for this long. In that unlikely case, a WARN_ONCE should show us the
call stack.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
xdp: tracking page_pool resources and safe removal
This patch is needed before we can allow drivers to use page_pool for
DMA-mappings. Today with page_pool and XDP return API, it is possible to
remove the page_pool object (from rhashtable), while there are still
in-flight packet-pages. This is safely handled via RCU and failed lookups in
__xdp_return() fallback to call put_page(), when page_pool object is gone.
In-case page is still DMA mapped, this will result in page note getting
correctly DMA unmapped.
To solve this, the page_pool is extended with tracking in-flight pages. And
XDP disconnect system queries page_pool and waits, via workqueue, for all
in-flight pages to be returned.
To avoid killing performance when tracking in-flight pages, the implement
use two (unsigned) counters, that in placed on different cache-lines, and
can be used to deduct in-flight packets. This is done by mapping the
unsigned "sequence" counters onto signed Two's complement arithmetic
operations. This is e.g. used by kernel's time_after macros, described in
kernel commit 1ba3aab3033b and 5a581b367b5, and also explained in RFC1982.
The trick is these two incrementing counters only need to be read and
compared, when checking if it's safe to free the page_pool structure. Which
will only happen when driver have disconnected RX/alloc side. Thus, on a
non-fast-path.
It is chosen that page_pool tracking is also enabled for the non-DMA
use-case, as this can be used for statistics later.
After this patch, using page_pool requires more strict resource "release",
e.g. via page_pool_release_page() that was introduced in this patchset, and
previous patches implement/fix this more strict requirement.
Drivers no-longer call page_pool_destroy(). Drivers already call
xdp_rxq_info_unreg() which call xdp_rxq_info_unreg_mem_model(), which will
attempt to disconnect the mem id, and if attempt fails schedule the
disconnect for later via delayed workqueue.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
The mlx5 driver is using page_pool, but not for DMA-mapping (currently), and
is a little too relaxed about returning or releasing page resources, as it
is not strictly necessary, when not using DMA-mappings.
As this patchset is working towards tracking page_pool resources, to know
about in-flight frames on shutdown. Then fix places where mlx5 leak
page_pool resource.
In case of dma_mapping_error, then recycle into page_pool.
In mlx5e_free_rq() moved the page_pool_destroy() call to after the
mlx5e_page_release() calls, as it is more correct.
In mlx5e_page_release() when no recycle was requested, then release page
from the page_pool, via page_pool_release_page().
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
page_pool: introduce page_pool_free and use in mlx5
In case driver fails to register the page_pool with XDP return API (via
xdp_rxq_info_reg_mem_model()), then the driver can free the page_pool
resources more directly than calling page_pool_destroy(), which does a
unnecessarily RCU free procedure.
This patch is preparing for removing page_pool_destroy(), from driver
invocation.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When converting an xdp_frame into an SKB, and sending this into the network
stack, then the underlying XDP memory model need to release associated
resources, because the network stack don't have callbacks for XDP memory
models. The only memory model that needs this is page_pool, when a driver
use the DMA-mapping feature.
Introduce page_pool_release_page(), which basically does the same as
page_pool_unmap_page(). Add xdp_release_frame() as the XDP memory model
interface for calling it, if the memory model match MEM_TYPE_PAGE_POOL, to
save the function call overhead for others. Have cpumap call
xdp_release_frame() before xdp_scrub_frame().
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
xdp: fix leak of IDA cyclic id if rhashtable_insert_slow fails
Fix error handling case, where inserting ID with rhashtable_insert_slow
fails in xdp_rxq_info_reg_mem_model, which leads to never releasing the IDA
ID, as the lookup in xdp_rxq_info_unreg_mem_model fails and thus
ida_simple_remove() is never called.
Fix by releasing ID via ida_simple_remove(), and mark xdp_rxq->mem.id with
zero, which is already checked in xdp_rxq_info_unreg_mem_model().
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilias Apalodimas [Tue, 18 Jun 2019 13:05:17 +0000 (15:05 +0200)]
net: page_pool: add helper function to unmap dma addresses
On a previous patch dma addr was stored in 'struct page'.
Use that to unmap DMA addresses used by network drivers
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilias Apalodimas [Tue, 18 Jun 2019 13:05:12 +0000 (15:05 +0200)]
net: page_pool: add helper function to retrieve dma addresses
On a previous patch dma addr was stored in 'struct page'.
Use that to retrieve DMA addresses used by network drivers
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Gleixner [Tue, 4 Jun 2019 08:11:40 +0000 (10:11 +0200)]
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of version 2 of the gnu general public license as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 59 temple place suite 330 boston ma 02111
1307 usa the full gnu general public license is included in this
distribution in the file called license
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 8 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081207.801261482@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Gleixner [Tue, 4 Jun 2019 08:11:37 +0000 (10:11 +0200)]
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504
Based on 1 normalized pattern(s):
this file is free software you can redistribute it and or modify it
under the terms of version 2 of the gnu general public license as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin st fifth floor boston ma 02110
1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 8 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081207.443595178@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Gleixner [Tue, 4 Jun 2019 08:11:36 +0000 (10:11 +0200)]
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license this
program is distributed in the hope that it will be useful but
without any warranty without even the implied warranty of
merchantability or fitness for a particular purpose see the gnu
general public license for more details you should have received a
copy of the gnu general public license along with this program if
not write to the free software foundation 51 franklin street fifth
floor boston ma 02110 1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 1 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081207.308909165@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Gleixner [Tue, 4 Jun 2019 08:11:35 +0000 (10:11 +0200)]
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation version 2 of the license this program
is distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 51 franklin st fifth floor boston ma 02110
1301 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 1 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081207.195075312@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Gleixner [Tue, 4 Jun 2019 08:11:34 +0000 (10:11 +0200)]
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation see readme and copying for
more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 9 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081207.060259192@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Gleixner [Tue, 4 Jun 2019 08:11:33 +0000 (10:11 +0200)]
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Gleixner [Tue, 4 Jun 2019 08:11:28 +0000 (10:11 +0200)]
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the version 2 of the gnu general public
license as published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 10 file(s).
Thomas Gleixner [Tue, 4 Jun 2019 08:11:24 +0000 (10:11 +0200)]
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 3 file(s).