mm: use helper functions for allocating and freeing vm_area structs
The vm_area_struct is one of the most fundamental memory management
objects, but the management of it is entirely open-coded evertwhere,
ranging from allocation and freeing (using kmem_cache_[z]alloc and
kmem_cache_free) to initializing all the fields.
We want to unify this in order to end up having some unified
initialization of the vmas, and the first step to this is to at least
have basic allocation functions.
Right now those functions are literally just wrappers around the
kmem_cache_*() calls. This is a purely mechanical conversion:
# new vma:
kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()
# copy old vma
kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)
to the point where the old vma passed in to the vm_area_dup() function
isn't even used yet (because I've left all the old manual initialization
alone).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: memcg: fix use after free in mem_cgroup_iter()
mm/huge_memory.c: fix data loss when splitting a file pmd
fat: fix memory allocation failure handling of match_strdup()
MAINTAINERS: Peter has moved
mm/memblock: add missing include <linux/bootmem.h>
mm: memcg: fix use after free in mem_cgroup_iter()
It was reported that a kernel crash happened in mem_cgroup_iter(), which
can be triggered if the legacy cgroup-v1 non-hierarchical mode is used.
Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b8f
......
Call trace:
mem_cgroup_iter+0x2e0/0x6d4
shrink_zone+0x8c/0x324
balance_pgdat+0x450/0x640
kswapd+0x130/0x4b8
kthread+0xe8/0xfc
ret_from_fork+0x10/0x20
mem_cgroup_iter():
......
if (css_tryget(css)) <-- crash here
break;
......
The crashing reason is that mem_cgroup_iter() uses the memcg object whose
pointer is stored in iter->position, which has been freed before and
filled with POISON_FREE(0x6b).
And the root cause of the use-after-free issue is that
invalidate_reclaim_iterators() fails to reset the value of iter->position
to NULL when the css of the memcg is released in non- hierarchical mode.
Link: http://lkml.kernel.org/r/1531994807-25639-1-git-send-email-jing.xia@unisoc.com Fixes: 6df38689e0e9 ("mm: memcontrol: fix possible memcg leak due to interrupted reclaim") Signed-off-by: Jing Xia <jing.xia.mail@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <chunyan.zhang@unisoc.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/huge_memory.c: fix data loss when splitting a file pmd
__split_huge_pmd_locked() must check if the cleared huge pmd was dirty,
and propagate that to PageDirty: otherwise, data may be lost when a huge
tmpfs page is modified then split then reclaimed.
How has this taken so long to be noticed? Because there was no problem
when the huge page is written by a write system call (shmem_write_end()
calls set_page_dirty()), nor when the page is allocated for a write fault
(fault_dirty_shared_page() calls set_page_dirty()); but when allocated for
a read fault (which MAP_POPULATE simulates), no set_page_dirty().
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1807111741430.1106@eggly.anvils Fixes: d21b9e57c74c ("thp: handle file pages in split_huge_pmd()") Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Ashwin Chaugule <ashwinch@google.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: <stable@vger.kernel.org> [4.8+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fat: fix memory allocation failure handling of match_strdup()
In parse_options(), if match_strdup() failed, parse_options() leaves
opts->iocharset in unexpected state (i.e. still pointing the freed
string). And this can be the cause of double free.
To fix, this initialize opts->iocharset always when freeing.
and commit ea1f5f3712af ("mm: define memblock_virt_alloc_try_nid_raw")
introduced the following function definition:
memblock_virt_alloc_try_nid_raw()
This commit adds an include of header file <linux/bootmem.h> to provide
the missing function prototypes. This silences the following gcc warning
(W=1):
mm/memblock.c:1334:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_raw' [-Wmissing-prototypes]
mm/memblock.c:1371:15: warning: no previous prototype for `memblock_virt_alloc_try_nid_nopanic' [-Wmissing-prototypes]
mm/memblock.c:1407:15: warning: no previous prototype for `memblock_virt_alloc_try_nid' [-Wmissing-prototypes]
Also adds #ifdef blockers to prevent compilation failure on mips/ia64
where CONFIG_NO_BOOTMEM=n as could be seen in commit commit 6cc22dc08a24
("revert "mm/memblock: add missing include <linux/bootmem.h>"").
Merge tag 'for-4.18/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fix from Mike Snitzer:
"Fix DM writecache target to allow an optional offset to the start of
the data and metadata area.
This allows userspace tools (e.g. LVM2) to place a header and metadata
at the front of the writecache device for its use"
* tag 'for-4.18/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm writecache: support optional offset for start of device
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A set of 8 obvious fixes.
Three (2 qla2xxx and the cxlflash oopses) are regressions, two from
4.17 and one from the merge window. The hpsa change is user visible,
but it fixes an error users have complained about"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: cxlflash: fix assignment of the backend operations
scsi: qedi: Send driver state to MFW
scsi: qedf: Send the driver state to MFW
scsi: hpsa: correct enclosure sas address
scsi: sd_zbc: Fix variable type and bogus comment
scsi: qla2xxx: Fix NULL pointer dereference for fcport search
scsi: qla2xxx: Fix kernel crash due to late workqueue allocation
scsi: qla2xxx: Fix inconsistent DMA mem alloc/free
Merge tag 'iommu-fixes-v4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fix from Joerg Roedel:
"Only one revert, for an an Intel VT-d patch that caused issues with
the i915 GPU driver"
* tag 'iommu-fixes-v4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
Revert "iommu/vt-d: Clean up pasid quirk for pre-production devices"
Merge tag 'platform-drivers-x86-v4.18-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Andy Shevchenko:
"The Dell laptop ACPI video brightness control is now back after fixing
a regression brought by SMM refactoring"
* tag 'platform-drivers-x86-v4.18-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: dell-laptop: Fix backlight detection
Merge tag 'nds32-for-linus-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux
Pull nds32 updates from Greentime Hu:
"Bug fixes and build ixes for nds32"
* tag 'nds32-for-linus-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
nds32: fix build error "relocation truncated to fit: R_NDS32_25_PCREL_RELA" when make allyesconfig
nds32: To simplify the implementation of update_mmu_cache()
nds32: Fix the dts pointer is not passed correctly issue.
nds32: To implement these icache invalidation APIs since nds32 cores don't snoop data cache. This issue is found by Guo Ren. Based on the Documentation/core-api/cachetlb.rst and it says:
nds32: Fix build error caused by configuration flag rename
nds32: define __NDS32_E[BL]__ for sparse
Merge tag 'pm-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix a relatively old initialization issue in intel_pstate causing the
pcc-cpufreq driver to be used instead of it on some HP Proliant
systems.
This turned into a functional regression during the 4.17 cycle,
because pcc-cpufreq is a scalability disaster and that was amplified
by the idle loop rework done at that time (Rafael Wysocki).
* tag 'pm-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Register when ACPI PCCH is present
Merge tag 'acpi-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Extend the recently added suspend-to-idle quirk for Thinkpad X1 Carbon
6th to other systems from that familiy which turned out to need it too
(Robin Johnson)"
* tag 'acpi-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / EC: Use ec_no_wakeup on more Thinkpad X1 Carbon 6th systems
The commit ab96746aaa34 ("iommu/vt-d: Clean up pasid quirk for
pre-production devices") triggers ECS mode on some platforms
which have broken ECS support. As the result, graphic device
will be inoperable on boot.
Merge tag 'drm-fixes-2018-07-20' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Just two sets of driver fixes this week to follow up on the set from
earlier in the week and hopefully get me realigned schedule wise.
amdgpu:
- ACP fix for boards with multiple I2S instances
- DP fix for CZ, vega
- hybrid laptop fixes
- Resume regression fix
nouveau:
- large memory systems and Pascal fix
- MST race fixes
- runtime PM fix"
* tag 'drm-fixes-2018-07-20' of git://anongit.freedesktop.org/drm/drm:
drm/nouveau/fb/gp100-: disable address remapper
drm/amd/amdgpu: creating two I2S instances for stoney/cz (v2)
drm/amdgpu: add another ATPX quirk for TOPAZ
drm/amd/display: Fix DP HBR2 Eye Diagram Pattern on Carrizo
drm/amdgpu: Make sure IB tests flushed after IP resume
drm/nouveau: Set DRIVER_ATOMIC cap earlier to fix debugfs
drm/nouveau: Remove bogus crtc check in pmops_runtime_idle
drm/nouveau/drm/nouveau: Fix runtime PM leak in nv50_disp_atomic_commit()
drm/nouveau: Avoid looping through fake MST connectors
drm/nouveau: Use drm_connector_list_iter_* for iterating connectors
drm/nouveau/gem: off by one bugs in nouveau_gem_pushbuf_reloc_apply()
drm/nouveau/kms/nv50-: ensure window updates are submitted when flushing mst disables
Dave Airlie [Fri, 20 Jul 2018 00:23:37 +0000 (10:23 +1000)]
Merge branch 'drm-fixes-4.18' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Fixes for 4.18. The ACP patch is a bit bigger than I would like
at this point, but it should have gone in long ago, it just fell
through the cracks. The others are pretty small and straight-forward.
- ACP fix for boards with 2 I2S instances
- DP fix for CZ, vega
- Fix for a hybrid graphics laptop
- Fix a resume regression
Merge tag 'pci-v4.18-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Fix crashes that happen when PHY drivers are left disabled in the V3
Semiconductor, MediaTek, Faraday, Aardvark, DesignWare, Versatile,
and X-Gene host controller drivers (Sergei Shtylyov)
- Fix a NULL pointer dereference in the endpoint library configfs
support (Kishon Vijay Abraham I)
- Fix a race condition in Hyper-V IRQ handling (Dexuan Cui)
* tag 'pci-v4.18-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: v3-semi: Fix I/O space page leak
PCI: mediatek: Fix I/O space page leak
PCI: faraday: Fix I/O space page leak
PCI: aardvark: Fix I/O space page leak
PCI: designware: Fix I/O space page leak
PCI: versatile: Fix I/O space page leak
PCI: xgene: Fix I/O space page leak
PCI: OF: Fix I/O space page leak
PCI: endpoint: Fix NULL pointer dereference error when CONFIGFS is disabled
PCI: hv: Disable/enable IRQs rather than BH in hv_compose_msi_msg()
This manifsted as strace segfaulting on HSDK because gcc was targetting
the accumulator registers as GPRs, which kernek was not saving/restoring
by default.
Merge tag 'sound-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A rawmidi race fix and three trivial HD-audio quirks"
* tag 'sound-4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Yet another Clevo P950 quirk entry
ALSA: rawmidi: Change resized buffers atomically
ALSA: hda/realtek - Add Panasonic CF-SZ6 headset jack quirk
ALSA: hda: add mute led support for HP ProBook 455 G5
Robin H. Johnson [Fri, 13 Jul 2018 20:50:47 +0000 (20:50 +0000)]
ACPI / EC: Use ec_no_wakeup on more Thinkpad X1 Carbon 6th systems
The ec_no_wakeup matcher added for Thinkpad X1 Carbon 6th gen systems
beyond matched only a single DMI model (20KGS3JF01), that didn't cover
my laptop (20KH002JUS). Change to match based on DMI product family to
cover all X1 6th gen systems.
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ben Skeggs [Wed, 18 Jul 2018 06:10:58 +0000 (16:10 +1000)]
drm/nouveau/fb/gp100-: disable address remapper
This was causing problems on a system with a large amount of RAM, where
display push buffers were being fetched incorrectly when placed in high
system memory addresses.
While this commit will resolve the issue on that particular system, the
issue will be avoided completely with another patch to more fully solve
problems with display and large amounts of system memory on Pascal.
It's still probably a good idea to disable this to prevent weird issues
in the future.
Randy Dunlap [Wed, 18 Jul 2018 01:27:45 +0000 (18:27 -0700)]
tcp: identify cryptic messages as TCP seq # bugs
Attempt to make cryptic TCP seq number error messages clearer by
(1) identifying the source of the message as "TCP", (2) identifying the
errors as "seq # bug", and (3) grouping the field identifiers and values
by separating them with commas.
Suggested-by: 積丹尼 Dan Jacobson <jidanni@jidanni.org> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
It seems that a *break* is missing in order to avoid falling through
to the default case. Otherwise, checking *chan* makes no sense.
Fixes: 72df7a7244c0 ("ptp: Allow reassigning calibration pin function") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
hv_netvsc: Fix napi reschedule while receive completion is busy
If out ring is full temporarily and receive completion cannot go out,
we may still need to reschedule napi if certain conditions are met.
Otherwise the napi poll might be stopped forever, and cause network
disconnect.
Fixes: 7426b1a51803 ("netvsc: optimize receive completions") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The Vitaly Bordug's email bounces ("ru.mvista.com: Name or service not
known") and there was no activity (ack, review, sign) since 2009.
Cc: Vitaly Bordug <vitb@kernel.crashing.org> Cc: Pantelis Antoniou <pantelis.antoniou@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: cavium: Add fine-granular dependencies on PCI
Add dependencies on PCI where necessary.
Fixes: 7e2bc7fb65 ("net: cavium: Drop dependency of NET_VENDOR_CAVIUM on PCI") Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Wahren [Wed, 18 Jul 2018 06:31:44 +0000 (08:31 +0200)]
net: qca_spi: Make sure the QCA7000 reset is triggered
In case the SPI thread is not running, a simple reset of sync
state won't fix the transmit timeout. We also need to wake up the kernel
thread.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Fixes: ed7d42e24eff ("net: qca_spi: fix transmit queue timeout handling") Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Wahren [Wed, 18 Jul 2018 06:31:43 +0000 (08:31 +0200)]
net: qca_spi: Avoid packet drop during initial sync
As long as the synchronization with the QCA7000 isn't finished, we
cannot accept packets from the upper layers. So let the SPI thread
enable the TX queue after sync and avoid unwanted packet drop.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 17 Jul 2018 16:12:39 +0000 (17:12 +0100)]
ipv6: fix useless rol32 call on hash
The rol32 call is currently rotating hash but the rol'd value is
being discarded. I believe the current code is incorrect and hash
should be assigned the rotated value returned from rol32.
Thanks to David Lebrun for spotting this.
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 17 Jul 2018 15:52:54 +0000 (16:52 +0100)]
ipv6: sr: fix useless rol32 call on hash
The rol32 call is currently rotating hash but the rol'd value is
being discarded. I believe the current code is incorrect and hash
should be assigned the rotated value returned from rol32.
Detected by CoverityScan, CID#1468411 ("Useless call")
Fixes: b5facfdba14c ("ipv6: sr: Compute flowlabel for outer IPv6 header of seg6 encap mode") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: dlebrun@google.com Signed-off-by: David S. Miller <davem@davemloft.net>
It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.
The V3 Semiconductor PCI driver has the same issue.
Replace devm_pci_remap_iospace() with its devm_ managed version to fix
the bug.
It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.
The MediaTek PCIe driver has the same issue.
Replace devm_pci_remap_iospace() with its devm_ managed counterpart
to fix the bug.
It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.
The Faraday PCI driver has the same issue. Replace pci_remap_iospace()
with its devm_ managed version to fix the bug.
It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.
The Aardvark PCI controller driver has the same issue.
Replace pci_remap_iospace() with its devm_ managed version to fix the bug.
Fixes: 8c39d710363c ("PCI: aardvark: Add Aardvark PCI host controller driver") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
[lorenzo.pieralisi@arm.com: updated the commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.
The DesignWare PCIe controller driver has the same issue.
Replace devm_pci_remap_iospace() with a devm_ managed version to fix the
bug.
Fixes: cbce7900598c ("PCI: designware: Make driver arch-agnostic") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
[lorenzo.pieralisi@arm.com: updated the commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jingoo Han <jingoohan1@gmail.com>
It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.
The Versatile PCI controller driver has the same issue.
Replace pci_remap_iospace() with the devm_ managed version to fix the bug.
Fixes: b7e78170efd4 ("PCI: versatile: Add DT-based ARM Versatile PB PCIe host driver") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
[lorenzo.pieralisi@arm.com: updated the commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.
The X-Gene PCI controller driver has the same issue.
Replace pci_remap_iospace() with the devm_ managed version so that the
pages get unmapped automagically on any probe failure.
Fixes: 5f6b6ccdbe1c ("PCI: xgene: Add APM X-Gene PCIe driver") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
[lorenzo.pieralisi@arm.com: updated the commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
net: usb: asix: replace mii_nway_restart in resume path
mii_nway_restart is not pm aware which results in a rtnl deadlock.
Implement mii_nway_restart manual by setting BMCR_ANRESTART if
BMCR_ANENABLE is set.
To reproduce:
* plug an asix based usb network interface
* wait until the device enters PM (~5 sec)
* `ip link set eth1 up` will never return
Fixes: d9fe64e51114 ("net: asix: Add in_pm parameter") Signed-off-by: Alexander Couzens <lynxis@fe80.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
It turned out that pci_remap_iospace() wasn't undone when the driver's
probe failed, and since devm_phy_optional_get() returned -EPROBE_DEFER,
the probe was retried, finally causing the BUG due to trying to remap
already remapped pages.
Introduce the devm_pci_remap_iospace() managed API and replace the
pci_remap_iospace() call with it to fix the bug.
Fixes: dbf9826d5797 ("PCI: generic: Convert to DT resource parsing API") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
[lorenzo.pieralisi@arm.com: split commit/updated the commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Fix this by sanitizing t.qset_idx before using it to index
adapter->msix_info
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
lib/rhashtable: consider param->min_size when setting initial table size
rhashtable_init() currently does not take into account the user-passed
min_size parameter unless param->nelem_hint is set as well. As such,
the default size (number of buckets) will always be HASH_DEFAULT_SIZE
even if the smallest allowed size is larger than that. Remediate this
by unconditionally calling into rounded_hashtable_size() and handling
things accordingly.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix this by sanitizing info.index before indirectly using it to index
vdev->region
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
Merge tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Three regression fixes. They're few-liners and fixing some corner
cases missed in the origial patches"
* tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block()
btrfs: fix use-after-free of cmp workspace pages
btrfs: restore uuid_mutex in btrfs_open_devices
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Miscellaneous bugfixes, plus a small patchlet related to Spectre v2"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvmclock: fix TSC calibration for nested guests
KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled
KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer
KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel.
x86/kvmclock: set pvti_cpu0_va after enabling kvmclock
x86/kvm/Kconfig: Ensure CRYPTO_DEV_CCP_DD state at minimum matches KVM_AMD
kvm: nVMX: Restore exit qual for VM-entry failure due to MSR loading
x86/kvm/vmx: don't read current->thread.{fs,gs}base of legacy tasks
KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR
David S. Miller [Wed, 18 Jul 2018 17:58:27 +0000 (10:58 -0700)]
Merge branch 'smc-fixes'
Ursula Braun says:
====================
net/smc: fixes 2018-07-18
here are small fixes for SMC: The first patch speeds up unidirectional
traffic, the second patch increases security, and the third patch
fixes a problem for fallback cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
During clc handshake the receive timeout is set to CLC_WAIT_TIME.
Remember and reset the original timeout value after the receive calls,
and remove a duplicate assignment of CLC_WAIT_TIME.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
For security reasons the return code of get_user() should always be
checked.
Fixes: 01d2f7e2cdd31 ("net/smc: sockopts TCP_NODELAY and TCP_CORK") Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The SMC protocol requires to send a separate consumer cursor update,
if it cannot be piggybacked to updates of the producer cursor.
Currently the decision to send a separate consumer cursor update
just considers the amount of data already received by the socket
program. It does not consider the amount of data already arrived, but
not yet consumed by the receiver. Basing the decision on the
difference between already confirmed and already arrived data
(instead of difference between already confirmed and already consumed
data), may lead to a somewhat earlier consumer cursor update send in
fast unidirectional traffic scenarios, and thus to better throughput.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Suggested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
syzbot is reporting stalls at nfc_llcp_send_ui_frame() [1]. This is
because nfc_llcp_send_ui_frame() is retrying the loop without any delay
when nonblocking nfc_alloc_send_skb() returned NULL.
Since there is no need to use MSG_DONTWAIT if we retry until
sock_alloc_send_pskb() succeeds, let's use blocking call.
Also, in case an unexpected error occurred, let's break the loop
if blocking nfc_alloc_send_skb() failed.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+d29d18215e477cfbfbdd@syzkaller.appspotmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We almost never run into this by accident because randconfig builds
end up selecting DST_CACHE from some other tunnel protocol, and this
one appears to be the only one missing the explicit 'select'.
>From all I can tell, this problem first appeared in linux-4.9
when dst_cache support got added to ILA.
Fixes: 79ff2fc31e0f ("ila: Cache a route to translated address") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
cpufreq: intel_pstate: Register when ACPI PCCH is present
Currently, intel_pstate doesn't register if _PSS is not present on
HP Proliant systems, because it expects the firmware to take over
CPU performance scaling in that case. However, if ACPI PCCH is
present, the firmware expects the kernel to use it for CPU
performance scaling and the pcc-cpufreq driver is loaded for that.
Unfortunately, the firmware interface used by that driver is not
scalable for fundamental reasons, so pcc-cpufreq is way suboptimal
on systems with more than just a few CPUs. In fact, it is better to
avoid using it at all.
For this reason, modify intel_pstate to look for ACPI PCCH if _PSS
is not present and register if it is there. Also prevent the
pcc-cpufreq driver from trying to initialize itself if intel_pstate
has been registered already.
Fixes: fbbcdc0744da (intel_pstate: skip the driver if ACPI has power mgmt option) Reported-by: Andreas Herrmann <aherrmann@suse.com> Reviewed-by: Andreas Herrmann <aherrmann@suse.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: Andreas Herrmann <aherrmann@suse.com> Cc: 4.16+ <stable@vger.kernel.org> # 4.16+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Inside a nested guest, access to hardware can be slow enough that
tsc_read_refs always return ULLONG_MAX, causing tsc_refine_calibration_work
to be called periodically and the nested guest to spend a lot of time
reading the ACPI timer.
However, if the TSC frequency is available from the pvclock page,
we can just set X86_FEATURE_TSC_KNOWN_FREQ and avoid the recalibration.
'refine' operation.
Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
[Commit message rewritten. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Liran Alon [Fri, 29 Jun 2018 19:59:04 +0000 (22:59 +0300)]
KVM: VMX: Mark VMXArea with revision_id of physical CPU even when eVMCS enabled
When eVMCS is enabled, all VMCS allocated to be used by KVM are marked
with revision_id of KVM_EVMCS_VERSION instead of revision_id reported
by MSR_IA32_VMX_BASIC.
However, even though not explictly documented by TLFS, VMXArea passed
as VMXON argument should still be marked with revision_id reported by
physical CPU.
This issue was found by the following setup:
* L0 = KVM which expose eVMCS to it's L1 guest.
* L1 = KVM which consume eVMCS reported by L0.
This setup caused the following to occur:
1) L1 execute hardware_enable().
2) hardware_enable() calls kvm_cpu_vmxon() to execute VMXON.
3) L0 intercept L1 VMXON and execute handle_vmon() which notes
vmxarea->revision_id != VMCS12_REVISION and therefore fails with
nested_vmx_failInvalid() which sets RFLAGS.CF.
4) L1 kvm_cpu_vmxon() don't check RFLAGS.CF for failure and therefore
hardware_enable() continues as usual.
5) L1 hardware_enable() then calls ept_sync_global() which executes
INVEPT.
6) L0 intercept INVEPT and execute handle_invept() which notes
!vmx->nested.vmxon and thus raise a #UD to L1.
7) Raised #UD caused L1 to panic.
Paolo Bonzini [Mon, 28 May 2018 11:31:13 +0000 (13:31 +0200)]
KVM: irqfd: fix race between EPOLLHUP and irq_bypass_register_consumer
A comment warning against this bug is there, but the code is not doing what
the comment says. Therefore it is possible that an EPOLLHUP races against
irq_bypass_register_consumer. The EPOLLHUP handler schedules irqfd_shutdown,
and if that runs soon enough, you get a use-after-free.
Reported-by: syzbot <syzkaller@googlegroups.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>
Lan Tianyu [Fri, 22 Dec 2017 02:10:36 +0000 (21:10 -0500)]
KVM/Eventfd: Avoid crash when assign and deassign specific eventfd in parallel.
Syzbot reports crashes in kvm_irqfd_assign(), caused by use-after-free
when kvm_irqfd_assign() and kvm_irqfd_deassign() run in parallel
for one specific eventfd. When the assign path hasn't finished but irqfd
has been added to kvm->irqfds.items list, another thead may deassign the
eventfd and free struct kvm_kernel_irqfd(). The assign path then uses
the struct kvm_kernel_irqfd that has been freed by deassign path. To avoid
such issue, keep irqfd under kvm->irq_srcu protection after the irqfd
has been added to kvm->irqfds.items list, and call synchronize_srcu()
in irq_shutdown() to make sure that irqfd has been fully initialized in
the assign path.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Tianyu Lan <tianyu.lan@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
octeon_mgmt: Fix MIX registers configuration on MTU setup
octeon_mgmt driver doesn't drop RX frames that are 1-4 bytes bigger than
MTU set for the corresponding interface. The problem is in the
AGL_GMX_RX0/1_FRM_MAX register setting, which should not account for VLAN
tagging.
According to Octeon HW manual:
"For tagged frames, MAX increases by four bytes for each VLAN found up to a
maximum of two VLANs, or MAX + 8 bytes."
OCTEON_FRAME_HEADER_LEN "define" is fine for ring buffer management, but
should not be used for AGL_GMX_RX0/1_FRM_MAX.
The problem could be easily reproduced using "ping" command. If affected
system has default MTU 1500, other host (having MTU >= 1504) can
successfully "ping" the affected system with payload size 1473-1476,
resulting in IP packets of size 1501-1504 accepted by the mgmt driver.
Fixed system still accepts IP packets of 1500 bytes even with VLAN tagging,
because the limits are lifted in HW as expected, for every VLAN tag.
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
drm/amd/display: Fix DP HBR2 Eye Diagram Pattern on Carrizo
[why] dp hbr2 eye diagram pattern for raven asic is not stabled.
workaround is to use tp4 pattern. But this should not be
applied to asic before raven.
[how] add new bool varilable in asic caps. for raven asic,
use the workaround. for carrizo, vega, do not use workaround.
Signed-off-by: Hersen Wu <hersenxs.wu@amd.com> Reviewed-by: Harry Wentland <Harry.Wentland@amd.com> Acked-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leo Liu [Fri, 13 Jul 2018 15:26:28 +0000 (11:26 -0400)]
drm/amdgpu: Make sure IB tests flushed after IP resume
Fixes: 2c773de2 (drm/amdgpu: defer test IBs on the rings at boot (V3)) Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Linus Torvalds [Mon, 8 Jan 2018 19:51:04 +0000 (11:51 -0800)]
Mark HI and TASKLET softirq synchronous
Way back in 4.9, we committed 4cd13c21b207 ("softirq: Let ksoftirqd do
its job"), and ever since we've had small nagging issues with it. For
example, we've had:
1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred") 8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run") 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")
all of which worked around some of the effects of that commit.
The DVB people have also complained that the commit causes excessive USB
URB latencies, which seems to be due to the USB code using tasklets to
schedule USB traffic. This seems to be an issue mainly when already
living on the edge, but waiting for ksoftirqd to handle it really does
seem to cause excessive latencies.
Now Hanna Hawa reports that this issue isn't just limited to USB URB and
DVB, but also causes timeout problems for the Marvell SoC team:
"I'm facing kernel panic issue while running raid 5 on sata disks
connected to Macchiatobin (Marvell community board with Armada-8040
SoC with 4 ARMv8 cores of CA72) Raid 5 built with Marvell DMA engine
and async_tx mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2)
uses a tasklet to clean the done descriptors from the queue"
The latency problem causes a panic:
mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
We've discussed simply just reverting the original commit entirely, and
also much more involved solutions (with per-softirq threads etc). This
patch is intentionally stupid and fairly limited, because the issue
still remains, and the other solutions either got sidetracked or had
other issues.
We should probably also consider the timer softirqs to be synchronous
and not be delayed to ksoftirqd (since they were the issue with the
earlier watchdog problems), but that should be done as a separate patch.
This does only the tasklet cases.
Reported-and-tested-by: Hanna Hawa <hannah@marvell.com> Reported-and-tested-by: Josef Griebichler <griebichler.josef@gmx.at> Reported-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The SNDRV_RAWMIDI_IOCTL_PARAMS ioctl may resize the buffers and the
current code is racy. For example, the sequencer client may write to
buffer while it being resized.
As a simple workaround, let's switch to the resized buffer inside the
stream runtime lock.
btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block()
In commit ac0b4145d662 ("btrfs: scrub: Don't use inode pages for device
replace") we removed the branch of copy_nocow_pages() to avoid
corruption for compressed nodatasum extents.
However above commit only solves the problem in scrub_extent(), if
during scrub_pages() we failed to read some pages,
sctx->no_io_error_seen will be non-zero and we go to fixup function
scrub_handle_errored_block().
In scrub_handle_errored_block(), for sctx without csum (no matter if
we're doing replace or scrub) we go to scrub_fixup_nodatasum() routine,
which does the similar thing with copy_nocow_pages(), but does it
without the extra check in copy_nocow_pages() routine.
So for test cases like btrfs/100, where we emulate read errors during
replace/scrub, we could corrupt compressed extent data again.
This patch will fix it just by avoiding any "optimization" for
nodatasum, just falls back to the normal fixup routine by try read from
any good copy.
This also solves WARN_ON() or dead lock caused by lame backref iteration
in scrub_fixup_nodatasum() routine.
The deadlock or WARN_ON() won't be triggered before commit ac0b4145d662
("btrfs: scrub: Don't use inode pages for device replace") since
copy_nocow_pages() have better locking and extra check for data extent,
and it's already doing the fixup work by try to read data from any good
copy, so it won't go scrub_fixup_nodatasum() anyway.
This patch disables the faulty code and will be removed completely in a
followup patch.
Fixes: ac0b4145d662 ("btrfs: scrub: Don't use inode pages for device replace") Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
SMC ioctl processing requires the sock lock to work properly in
all thinkable scenarios.
Problem has been found with RaceFuzzer and fixes:
KASAN: null-ptr-deref Read in smc_ioctl
Reported-by: Byoungyoung Lee <lifeasageek@gmail.com> Reported-by: syzbot+35b2c5aa76fd398b9fd4@syzkaller.appspotmail.com Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch has fix for TX timeout while running bi-directional
traffic with 100 Mbps using 5762.
Signed-off-by: Sanjeev Bansal <sanjeevb.bansal@broadcom.com> Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Allen [Mon, 16 Jul 2018 15:29:30 +0000 (10:29 -0500)]
ibmvnic: Fix error recovery on login failure
Testing has uncovered a failure case that is not handled properly. In the
event that a login fails and we are not able to recover on the spot, we
return 0 from do_reset, preventing any error recovery code from being
triggered. Additionally, the state is set to "probed" meaning that when we
are able to trigger the error recovery, the driver always comes up in the
probed state. To handle the case properly, we need to return a failure code
here and set the adapter state to the state that we entered the reset in
indicating the state that we would like to come out of the recovery reset
in.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Wahren [Sun, 15 Jul 2018 19:53:20 +0000 (21:53 +0200)]
net: lan78xx: Fix race in tx pending skb size calculation
The skb size calculation in lan78xx_tx_bh is in race with the start_xmit,
which could lead to rare kernel oopses. So protect the whole skb walk with
a spin lock. As a benefit we can unlink the skb directly.
This patch was tested on Raspberry Pi 3B+
Link: https://github.com/raspberrypi/linux/issues/2608 Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet") Cc: stable <stable@vger.kernel.org> Signed-off-by: Floris Bos <bos@je-eigen-domein.nl> Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sun, 15 Jul 2018 16:35:19 +0000 (09:35 -0700)]
net/ipv6: Do not allow device only routes via the multipath API
Eric reported that reverting the patch that fixed and simplified IPv6
multipath routes means reverting back to invalid userspace notifications.
eg.,
$ ip -6 route add 2001:db8:1::/64 nexthop dev eth0 nexthop dev eth1
only generates a single notification:
2001:db8:1::/64 dev eth0 metric 1024 pref medium
While working on a fix for this problem I found another case that is just
broken completely - a multipath route with a gateway followed by device
followed by gateway:
$ ip -6 ro add 2001:db8:103::/64
nexthop via 2001:db8:1::64
nexthop dev dummy2
nexthop via 2001:db8:3::64
In this case the device only route is dropped completely - no notification
to userpsace but no addition to the FIB either:
$ ip -6 ro ls
2001:db8:1::/64 dev dummy1 proto kernel metric 256 pref medium
2001:db8:2::/64 dev dummy2 proto kernel metric 256 pref medium
2001:db8:3::/64 dev dummy3 proto kernel metric 256 pref medium
2001:db8:103::/64 metric 1024
nexthop via 2001:db8:1::64 dev dummy1 weight 1
nexthop via 2001:db8:3::64 dev dummy3 weight 1 pref medium
fe80::/64 dev dummy1 proto kernel metric 256 pref medium
fe80::/64 dev dummy2 proto kernel metric 256 pref medium
fe80::/64 dev dummy3 proto kernel metric 256 pref medium
Really, IPv6 multipath is just FUBAR'ed beyond repair when it comes to
device only routes, so do not allow it all.
This change will break any scripts relying on the mpath api for insert,
but I don't see any other way to handle the permutations. Besides, since
the routes are added to the FIB as standalone (non-multipath) routes the
kernel is not doing what the user requested, so it might as well tell the
user that.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Baranoff [Sun, 15 Jul 2018 15:36:37 +0000 (11:36 -0400)]
tcp: Fix broken repair socket window probe patch
Correct previous bad attempt at allowing sockets to come out of TCP
repair without sending window probes. To avoid changing size of
the repair variable in struct tcp_sock, this lets the decision for
sending probes or not to be made when coming out of repair by
introducing two ways to turn it off.
v2:
* Remove erroneous comment; defines now make behavior clear
Fixes: 70b7ff130224 ("tcp: allow user to create repair socket without window probes") Signed-off-by: Stefan Baranoff <sbaranoff@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When a new rx packet arrives, the rx path will decide whether to reuse
the remainder of the page or not according to one of the below conditions:
1. frag_info->frag_stride == PAGE_SIZE / 2
2. frags->page_offset + frag_info->frag_size > PAGE_SIZE;
The first condition is no met for when XDP is set.
For XDP, page_offset is always set to priv->rx_headroom which is
XDP_PACKET_HEADROOM and frag_info->frag_size is around mtu size + some
padding, still the 2nd release condition will hold since
XDP_PACKET_HEADROOM + 1536 < PAGE_SIZE, as a result the page will not
be released and will be _wrongly_ reused for next free rx descriptor.
In XDP there is an assumption to have a page per packet and reuse can
break such assumption and might cause packet data corruptions.
Fix this by adding an extra condition (!priv->rx_headroom) to the 2nd
case to avoid page reuse when XDP is set, since rx_headroom is set to 0
for non XDP setup and set to XDP_PACKET_HEADROOM for XDP setup.
No additional cache line is required for the new condition.
Fixes: 34db548bfb95 ("mlx4: add page recycling in receive path") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Suggested-by: Martin KaFai Lau <kafai@fb.com> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
CC [M] drivers/net/ethernet/freescale/fman/fman.o
In file included from ../drivers/net/ethernet/freescale/fman/fman.c:35:
../include/linux/fsl/guts.h: In function 'guts_set_dmacr':
../include/linux/fsl/guts.h:165:2: error: implicit declaration of function 'clrsetbits_be32' [-Werror=implicit-function-declaration]
clrsetbits_be32(&guts->dmacr, 3 << shift, device << shift);
^~~~~~~~~~~~~~~
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Madalin Bucur <madalin.bucur@nxp.com> Cc: netdev@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: David S. Miller <davem@davemloft.net>
hv/netvsc: fix handling of fallback to single queue mode
The netvsc device may need to fallback to running in single queue
mode if host side only wants to support single queue.
Recent change for handling mtu broke this in setup logic.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 3ffe64f1a641 ("hv_netvsc: split sub-channel setup into async and sync") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Fri, 13 Jul 2018 17:03:32 +0000 (12:03 -0500)]
ibmvnic: Revise RX/TX queue error messages
During a device failover, there may be latency between the loss
of the current backing device and a notification from firmware that
a failover has occurred. This latency can result in a large amount of
error printouts as firmware returns outgoing traffic with a generic
error code. These are not necessarily errors in this case as the
firmware is busy swapping in a new backing adapter and is not ready
to send packets yet. This patch reclassifies those error codes as
warnings with an explanation that a failover may be pending. All
other return codes will be considered errors.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: make DAD fail with enhanced DAD when nonce length differs
Commit adc176c54722 ("ipv6 addrconf: Implemented enhanced DAD (RFC7527)")
added enhanced DAD with a nonce length of 6 bytes. However, RFC7527
doesn't specify the length of the nonce, other than being 6 + 8*k bytes,
with integer k >= 0 (RFC3971 5.3.2). The current implementation simply
assumes that the nonce will always be 6 bytes, but others systems are
free to choose different sizes.
If another system sends a nonce of different length but with the same 6
bytes prefix, it shouldn't be considered as the same nonce. Thus, check
that the length of the received nonce is the same as the length we sent.
Ugly scapy test script running on veth0:
def loop():
pkt=sniff(iface="veth0", filter="icmp6", count=1)
pkt = pkt[0]
b = bytearray(pkt[Raw].load)
b[1] += 1
b += b'\xde\xad\xbe\xef\xde\xad\xbe\xef'
pkt[Raw].load = bytes(b)
pkt[IPv6].plen += 8
# fixup checksum after modifying the payload
pkt[IPv6].payload.cksum -= 0x3b44
if pkt[IPv6].payload.cksum < 0:
pkt[IPv6].payload.cksum += 0xffff
sendp(pkt, iface="veth0")
This should result in DAD failure for any address added to veth0's peer,
but is currently ignored.
Fixes: adc176c54722 ("ipv6 addrconf: Implemented enhanced DAD (RFC7527)") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch remove the following documentation warning
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:103: warning: Excess function parameter 'priv' description in 'stmmac_axi_setup'
It was introduced in commit afea03656add7 ("stmmac: rework DMA bus setting and introduce new platform AXI structure")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
A KASAN:use-after-free bug was found related to ip6-erspan
while running selftests/net/ip6_gre_headroom.sh
It happens because of following sequence:
- ipv6hdr pointer is obtained from skb
- skb_cow_head() is called, skb->head memory is reallocated
- old data is accessed using ipv6hdr pointer
skb_cow_head() call was added in e41c7c68ea77 ("ip6erspan: make sure
enough headroom at xmit."), but looking at the history there was a
chance of similar bug because gre_handle_offloads() and pskb_trim()
can also reallocate skb->head memory. Fixes tag points to commit
which introduced possibility of this bug.
This patch moves ipv6hdr pointer assignment after skb_cow_head() call.
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
On XDP_TX we need to free up the frame only when tun_xdp_tx() returns a
negative value. A positive value indicates that the packet is
successfully enqueued to the ptr_ring, so freeing the page causes
use-after-free.
Fixes: 735fc4054b3a ("xdp: change ndo_xdp_xmit API to support bulking") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Watson [Thu, 12 Jul 2018 15:03:43 +0000 (08:03 -0700)]
tls: Stricter error checking in zerocopy sendmsg path
In the zerocopy sendmsg() path, there are error checks to revert
the zerocopy if we get any error code. syzkaller has discovered
that tls_push_record can return -ECONNRESET, which is fatal, and
happens after the point at which it is safe to revert the iter,
as we've already passed the memory to do_tcp_sendpages.
Previously this code could return -ENOMEM and we would want to
revert the iter, but AFAIK this no longer returns ENOMEM after a447da7d004 ("tls: fix waitall behavior in tls_sw_recvmsg"),
so we fail for all error codes.
Reported-by: syzbot+c226690f7b3126c5ee04@syzkaller.appspotmail.com Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com Signed-off-by: Dave Watson <davejwatson@fb.com> Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Constantine Shulyupin <const@MakeLinux.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Biggers [Wed, 11 Jul 2018 17:46:29 +0000 (10:46 -0700)]
KEYS: DNS: fix parsing multiple options
My recent fix for dns_resolver_preparse() printing very long strings was
incomplete, as shown by syzbot which still managed to hit the
WARN_ONCE() in set_precision() by adding a crafted "dns_resolver" key:
precision 50001 too large
WARNING: CPU: 7 PID: 864 at lib/vsprintf.c:2164 vsnprintf+0x48a/0x5a0
The bug this time isn't just a printing bug, but also a logical error
when multiple options ("#"-separated strings) are given in the key
payload. Specifically, when separating an option string into name and
value, if there is no value then the name is incorrectly considered to
end at the end of the key payload, rather than the end of the current
option. This bypasses validation of the option length, and also means
that specifying multiple options is broken -- which presumably has gone
unnoticed as there is currently only one valid option anyway.
A similar problem also applied to option values, as the kstrtoul() when
parsing the "dnserror" option will read past the end of the current
option and into the next option.
Fix these bugs by correctly computing the length of the option name and
by copying the option value, null-terminated, into a temporary buffer.
Reproducer for "dnserror" option being parsed incorrectly (expected
behavior is to fail when seeing the unknown option "foo", actual
behavior was to read the dnserror value as "1#foo" and fail there):
Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: 4a2d789267e0 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
multicast: init as INCLUDE when join SSM INCLUDE group
Based on RFC3376 5.1 and RFC3810 6.1, we should init as INCLUDE when join SSM
INCLUDE group. In my first version I only clear the group change record. But
this is not enough as when a new group join, it will init as EXCLUDE and
trigger an filter mode change in ip/ip6_mc_add_src(), which will clear all
source addresses' sf_crcount. This will prevent early joined address sending
state change records if multi source addresses joined at the same time.
In this v2 patchset, I fixed it by directly initializing the mode to INCLUDE
for SSM JOIN_SOURCE_GROUP. I also split the original patch into two separated
patches for IPv4 and IPv6.
Test: test by myself and customer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>