Eric Dumazet [Thu, 20 Dec 2018 23:28:56 +0000 (15:28 -0800)]
tcp: fix a race in inet_diag_dump_icsk()
Alexei reported use after frees in inet_diag_dump_icsk() [1]
Because we use refcount_set() when various sockets are setup and
inserted into ehash, we also need to make sure inet_diag_dump_icsk()
wont race with the refcount_set() operations.
Jonathan Lemon sent a patch changing net_twsk_hashdance() but
other spots would need risky changes.
Instead, fix inet_diag_dump_icsk() as this bug came with
linux-4.10 only.
[1] Quoting Alexei :
First something iterating over sockets finds already freed tw socket:
Fixes: 67db3e4bfbc9 ("tcp: no longer hold ehash lock while calling tcp_get_info()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Alexei Starovoitov <ast@kernel.org> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 20 Dec 2018 13:20:10 +0000 (21:20 +0800)]
ipv6: frags: Fix bogus skb->sk in reassembled packets
It was reported that IPsec would crash when it encounters an IPv6
reassembled packet because skb->sk is non-zero and not a valid
pointer.
This is because skb->sk is now a union with ip_defrag_offset.
This patch fixes this by resetting skb->sk when exiting from
the reassembly code.
Reported-by: Xiumei Mu <xmu@redhat.com> Fixes: 219badfaade9 ("ipv6: frags: get rid of ip6frag_skb_cb/...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Allan W. Nielsen [Thu, 20 Dec 2018 08:37:17 +0000 (09:37 +0100)]
mscc: Configured MAC entries should be locked.
The MAC table in Ocelot supports auto aging (normal) and static entries.
MAC entries that is manually configured should be static and not subject
to aging.
Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Allan Nielsen <allan.nielsen@microchip.com> Reviewed-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 20 Dec 2018 15:33:09 +0000 (07:33 -0800)]
Merge tag 'kbuild-fixes-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fix from Masahiro Yamada:
"Fix false positive warning/error about missing library for objtool"
* tag 'kbuild-fixes-v4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: fix false positive warning/error about missing libelf
Linus Torvalds [Thu, 20 Dec 2018 15:30:37 +0000 (07:30 -0800)]
Merge tag 'char-misc-4.20-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are three tiny last-minute driver fixes for 4.20-rc8 that resolve
some reported issues, and one MAINTAINERS file update.
All of them are related to the hyper-v subsystem, it seems people are
actually testing and using it now, which is nice to see :)
The fixes are:
- uio_hv_generic: fix for opening multiple times
- Remove PCI dependancy on hyperv drivers
- return proper error code for an unopened channel.
And Sasha has signed up to help out with the hyperv maintainership.
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-4.20-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels
x86, hyperv: remove PCI dependency
MAINTAINERS: Patch monkey for the Hyper-V code
uio_hv_generic: set callbacks on open
Linus Torvalds [Thu, 20 Dec 2018 15:29:11 +0000 (07:29 -0800)]
Merge tag 'tty-4.20-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fix from Greg KH:
"Here is a single fix, a revert, for the 8250 serial driver to resolve
a reported problem.
There was some attempted patches to fix the issue, but people are
arguing about them, so reverting the patch to revert back to the 4.19
and older behavior is the best thing to do at this late in the release
cycle.
The revert has been in linux-next with no reported issues"
* tag 'tty-4.20-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "serial: 8250: Fix clearing FIFOs in RS485 mode again"
Linus Torvalds [Thu, 20 Dec 2018 15:25:31 +0000 (07:25 -0800)]
Merge tag 'mmc-v4.20-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Restore code to allow BKOPS and CACHE ctrl even if no HPI support
- Reset HPI enabled state during re-init
- Use a default minimum timeout when enabling CACHE ctrl
MMC host:
- omap_hsmmc: Fix DMA API warning
- sdhci-tegra: Fix dt parsing of SDMMC pads autocal values
- Correct register accesses when enabling v4 mode"
* tag 'mmc-v4.20-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: core: Use a minimum 1600ms timeout when enabling CACHE ctrl
mmc: core: Allow BKOPS and CACHE ctrl even if no HPI support
mmc: core: Reset HPI enabled state during re-init and in case of errors
mmc: omap_hsmmc: fix DMA API warning
mmc: tegra: Fix for SDMMC pads autocal parsing from dt
mmc: sdhci: Fix sdhci_do_enable_v4_mode
The reverted commit added page reference counting to iomap page
structures that are used to track block size < page size state. This
was supposed to align the code with page migration page accounting
assumptions, but what it has done instead is break XFS filesystems.
Every fstests run I've done on sub-page block size XFS filesystems
has since picking up this commit 2 days ago has failed with bad page
state errors such as:
generic/038 454s ...
run fstests generic/038 at 2018-12-20 18:43:05
XFS (sdc): Unmounting Filesystem
XFS (sdc): Mounting V5 Filesystem
XFS (sdc): Ending clean mount
BUG: Bad page state in process kswapd0 pfn:3a7fa
page:ffffea0000ccbeb0 count:0 mapcount:0 mapping:ffff88800d9b6360 index:0x1
flags: 0xfffffc0000000()
raw: 000fffffc0000000dead000000000100dead000000000200ffff88800d9b6360
raw: 0000000000000001000000000000000000000000ffffffff
page dumped because: non-NULL mapping
CPU: 0 PID: 676 Comm: kswapd0 Not tainted 4.20.0-rc6-dgc+ #915
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
Call Trace:
dump_stack+0x67/0x90
bad_page.cold.116+0x8a/0xbd
free_pcppages_bulk+0x4bf/0x6a0
free_unref_page_list+0x10f/0x1f0
shrink_page_list+0x49d/0xf50
shrink_inactive_list+0x19d/0x3b0
shrink_node_memcg.constprop.77+0x398/0x690
? shrink_slab.constprop.81+0x278/0x3f0
shrink_node+0x7a/0x2f0
kswapd+0x34b/0x6d0
? node_reclaim+0x240/0x240
kthread+0x11f/0x140
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x24/0x30
Disabling lock debugging due to kernel taint
....
The failures are from anyway that frees pages and empties the
per-cpu page magazines, so it's not a predictable failure or an easy
to debug failure.
generic/038 is a reliable reproducer of this problem - it has a 9 in
10 failure rate on one of my test machines. Failure on other
machines have been at random points in fstests runs but every run
has ended up tripping this problem. Hence generic/038 was used to
bisect the failure because it was the most reliable failure.
It is too close to the 4.20 release (not to mention holidays) to
try to diagnose, fix and test the underlying cause of the problem,
so reverting the commit is the only option we have right now. The
revert has been tested against a current tot 4.20-rc7+ kernel across
multiple machines running sub-page block size XFs filesystems and
none of the bad page state failures have been seen.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Cc: Piotr Jaroszynski <pjaroszynski@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Brian Foster <bfoster@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1) Off by one in netlink parsing of mac802154_hwsim, from Alexander
Aring.
2) nf_tables RCU usage fix from Taehee Yoo.
3) Flow dissector needs nhoff and thoff clamping, from Stanislav
Fomichev.
4) Missing sin6_flowinfo initialization in SCTP, from Xin Long.
5) Spectrev1 in ipmr and ip6mr, from Gustavo A. R. Silva.
6) Fix r8169 crash when DEBUG_SHIRQ is enabled, from Heiner Kallweit.
7) Fix SKB leak in rtlwifi, from Larry Finger.
8) Fix state pruning in bpf verifier, from Jakub Kicinski.
9) Don't handle completely duplicate fragments as overlapping, from
Michal Kubecek.
10) Fix memory corruption with macb and 64-bit DMA, from Anssi Hannula.
11) Fix TCP fallback socket release in smc, from Myungho Jung.
12) gro_cells_destroy needs to napi_disable, from Lorenzo Bianconi.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (130 commits)
rds: Fix warning.
neighbor: NTF_PROXY is a valid ndm_flag for a dump request
net: mvpp2: fix the phylink mode validation
net/sched: cls_flower: Remove old entries from rhashtable
net/tls: allocate tls context using GFP_ATOMIC
iptunnel: make TUNNEL_FLAGS available in uapi
gro_cell: add napi_disable in gro_cells_destroy
lan743x: Remove MAC Reset from initialization
net/mlx5e: Remove the false indication of software timestamping support
net/mlx5: Typo fix in del_sw_hw_rule
net/mlx5e: RX, Fix wrong early return in receive queue poll
ipv6: explicitly initialize udp6_addr in udp_sock_create6()
bnxt_en: Fix ethtool self-test loopback.
net/rds: remove user triggered WARN_ON in rds_sendmsg
net/rds: fix warn in rds_message_alloc_sgs
ath10k: skip sending quiet mode cmd for WCN3990
mac80211: free skb fraglist before freeing the skb
nl80211: fix memory leak if validate_pae_over_nl80211() fails
net/smc: fix TCP fallback socket release
vxge: ensure data0 is initialized in when fetching firmware version information
...
David S. Miller [Thu, 20 Dec 2018 04:53:18 +0000 (20:53 -0800)]
rds: Fix warning.
>> net/rds/send.c:1109:42: warning: Using plain integer as NULL pointer
Fixes: ea010070d0a7 ("net/rds: fix warn in rds_message_alloc_sgs") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 20 Dec 2018 02:38:54 +0000 (18:38 -0800)]
Merge tag 'nfs-for-4.20-6' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Fix TCP socket disconnection races by ensuring we always call
xprt_disconnect_done() after releasing the socket.
- Fix a race when clearing both XPRT_CONNECTING and XPRT_LOCKED
- Remove xprt_connect_status() so it does not mask errors that should
be handled by call_connect_status()
* tag 'nfs-for-4.20-6' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Remove xprt_connect_status()
SUNRPC: Fix a race with XPRT_CONNECTING
SUNRPC: Fix disconnection races
Linus Torvalds [Thu, 20 Dec 2018 02:27:58 +0000 (18:27 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
- One nasty use-after-free bugfix, from this merge window however
- A less nasty use-after-free that can only zero some words at the
beginning of the page, and hence is not really exploitable
- A NULL pointer dereference
- A dummy implementation of an AMD chicken bit MSR that Windows uses
for some unknown reason
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: x86: Add AMD's EX_CFG to the list of ignored MSRs
KVM: X86: Fix NULL deref in vcpu_scan_ioapic
KVM: Fix UAF in nested posted interrupt processing
KVM: fix unregistering coalesced mmio zone from wrong bus
Linus Torvalds [Thu, 20 Dec 2018 02:16:17 +0000 (18:16 -0800)]
Merge tag 'dma-mapping-4.20-4' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
"Fix a regression in dma-direct that didn't take account the magic AMD
memory encryption mask in the DMA address"
* tag 'dma-mapping-4.20-4' of git://git.infradead.org/users/hch/dma-mapping:
dma-direct: do not include SME mask in the DMA supported check
David Ahern [Thu, 20 Dec 2018 00:54:38 +0000 (16:54 -0800)]
neighbor: NTF_PROXY is a valid ndm_flag for a dump request
When dumping proxy entries the dump request has NTF_PROXY set in
ndm_flags. strict mode checking needs to be updated to allow this
flag.
Fixes: 51183d233b5a ("net/neighbor: Update neigh_dump_info for strict data checking") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Wed, 19 Dec 2018 17:00:12 +0000 (18:00 +0100)]
net: mvpp2: fix the phylink mode validation
The mvpp2_phylink_validate() sets all modes that are supported by a
given PPv2 port. An mistake made the 10000baseT_Full mode being
advertised in some cases when a port wasn't configured to perform at
10G. This patch fixes this.
Fixes: d97c9f4ab000 ("net: mvpp2: 1000baseX support") Reported-by: Russell King <linux@armlinux.org.uk> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roi Dayan [Wed, 19 Dec 2018 16:07:56 +0000 (18:07 +0200)]
net/sched: cls_flower: Remove old entries from rhashtable
When replacing a rule we add the new rule to the rhashtable
but only remove the old if not in skip_sw.
This commit fix this and remove the old rule anyway.
Fixes: 35cc3cefc4de ("net/sched: cls_flower: Reject duplicated rules also under skip_sw") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: df9d4a178022 ("net/tls: sleeping function from invalid context") Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Wed, 19 Dec 2018 22:23:00 +0000 (23:23 +0100)]
gro_cell: add napi_disable in gro_cells_destroy
Add napi_disable routine in gro_cells_destroy since starting from
commit c42858eaf492 ("gro_cells: remove spinlock protecting receive
queues") gro_cell_poll and gro_cells_destroy can run concurrently on
napi_skbs list producing a kernel Oops if the tunnel interface is
removed while gro_cell_poll is running. The following Oops has been
triggered removing a vxlan device while the interface is receiving
traffic
Fixes: c42858eaf492 ("gro_cells: remove spinlock protecting receive queues") Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bryan Whitehead [Wed, 19 Dec 2018 21:55:15 +0000 (16:55 -0500)]
lan743x: Remove MAC Reset from initialization
The MAC Reset was noticed to erase important EEPROM settings.
It is also unnecessary since a chip wide reset was done earlier
in initialization, and that reset preserves EEPROM settings.
There for this patch removes the unnecessary MAC specific reset.
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Sun, 2 Dec 2018 13:45:53 +0000 (15:45 +0200)]
net/mlx5e: RX, Fix wrong early return in receive queue poll
When the completion queue of the RQ is empty, do not immediately return.
If left-over decompressed CQEs (from the previous cycle) were processed,
need to go to the finalization part of the poll function.
Bug exists only when CQE compression is turned ON.
This solves the following issue:
mlx5_core 0000:82:00.1: mlx5_eq_int:544:(pid 0): CQ error on CQN 0xc08, syndrome 0x1
mlx5_core 0000:82:00.1 p4p2: mlx5e_cq_error_event: cqn=0x000c08 event=0x04
Fixes: 4b7dfc992514 ("net/mlx5e: Early-return on empty completion queues") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cong Wang [Wed, 19 Dec 2018 05:17:44 +0000 (21:17 -0800)]
ipv6: explicitly initialize udp6_addr in udp_sock_create6()
syzbot reported the use of uninitialized udp6_addr::sin6_scope_id.
We can just set ::sin6_scope_id to zero, as tunnels are unlikely
to use an IPv6 address that needs a scope id and there is no
interface to bind in this context.
For net-next, it looks different as we have cfg->bind_ifindex there
so we can probably call ipv6_iface_scope_id().
Same for ::sin6_flowinfo, tunnels don't use it.
Fixes: 8024e02879dd ("udp: Add udp_sock_create for UDP tunnels to open listener socket") Reported-by: syzbot+c56449ed3652e6720f30@syzkaller.appspotmail.com Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 19 Dec 2018 18:46:50 +0000 (13:46 -0500)]
bnxt_en: Fix ethtool self-test loopback.
The current code has 2 problems. It assumes that the RX ring for
the loopback packet is combined with the TX ring. This is not
true if the ethtool channels are set to non-combined mode. The
second problem is that it won't work on 57500 chips without
adjusting the logic to get the proper completion ring (cpr) pointer.
Fix both issues by locating the proper cpr pointer through the RX
ring.
Fixes: e44758b78ae8 ("bnxt_en: Use bnxt_cp_ring_info struct pointer as parameter for RX path.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Dec 2018 18:27:58 +0000 (10:27 -0800)]
Merge branch 'rds-fixes'
Shamir Rabinovitch says:
====================
WARNING in rds_message_alloc_sgs
This patch set fix google syzbot rds bug found in linux-next.
The first patch solve the syzbot issue.
The second patch fix issue mentioned by Leon Romanovsky that
drivers should not call WARN_ON as result from user input.
syzbot bug report can be foud here: https://lkml.org/lkml/2018/10/31/28
v1->v2:
- patch 1: make rds_iov_vector fields name more descriptive (Hakon)
- patch 1: fix potential mem leak in rds_rm_size if krealloc fail
(Hakon)
v2->v3:
- patch 2: harden rds_sendmsg for invalid number of sgs (Gerd)
v3->v4
- Santosh a.b. on both patches + repost to net-dev
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net/rds: remove user triggered WARN_ON in rds_sendmsg
per comment from Leon in rdma mailing list
https://lkml.org/lkml/2018/10/31/312 :
Please don't forget to remove user triggered WARN_ON.
https://lwn.net/Articles/769365/
"Greg Kroah-Hartman raised the problem of core kernel API code that will
use WARN_ON_ONCE() to complain about bad usage; that will not generate
the desired result if WARN_ON_ONCE() is configured to crash the machine.
He was told that the code should just call pr_warn() instead, and that
the called function should return an error in such situations. It was
generally agreed that any WARN_ON() or WARN_ON_ONCE() calls that can be
triggered from user space need to be fixed."
in addition harden rds_sendmsg to detect and overcome issues with
invalid sg count and fail the sendmsg.
Suggested-by: Leon Romanovsky <leon@kernel.org> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
redundant copy_from_user in rds_sendmsg system call expose rds
to issue where rds_rdma_extra_size walk the rds iovec and and
calculate the number pf pages (sgs) it need to add to the tail of
rds message and later rds_cmsg_rdma_args copy the rds iovec again
and re calculate the same number and get different result causing
WARN_ON in rds_message_alloc_sgs.
fix this by doing the copy_from_user only once per rds_sendmsg
system call.
Reported-by: syzbot+26de17458aeda9d305d8@syzkaller.appspotmail.com Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Dec 2018 16:34:46 +0000 (08:34 -0800)]
Merge tag 'mac80211-for-davem-2018-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just three fixes:
* fix a memory leak in an error path
* fix TXQs in interface teardown
* free fraglist if we used it internally
before returning SKB
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As the kernel must run in the memory chunk with the lowest address,
ST-RAM is ignored, and removed from the m68k_memory[] array.
However, it is not removed from memblock, causing a crash later.
More investigation shows that there are 3 places where memory chunks are
ignored, all after the calls to memblock_add() in m68k_parse_bootinfo(),
and thus causing crashes:
1. On classic m68k CPUs with a MMU, paging_init() ignores all memory
chunks below the first chunk, cfr. above,
2. On Amigas equipped with a Zorro III bus, config_amiga() ignores all
Zorro II memory,
3. If CONFIG_SINGLE_MEMORY_CHUNK=y, m68k_parse_bootinfo() ignores all
but the first memory chunk.
Fix this by moving the calls to memblock_add() from
m68k_parse_bootinfo() to paging_init(), after all ignored memory chunks
have been removed from m68k_memory[].
Reported-by: Andreas Schwab <schwab@linux-m68k.org> Fixes: 1008a11590b966b4 ("m68k: switch to MEMBLOCK + NO_BOOTMEM") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Masahiro Yamada [Tue, 18 Dec 2018 05:25:41 +0000 (14:25 +0900)]
kbuild: fix false positive warning/error about missing libelf
For the same reason as commit 25896d073d8a ("x86/build: Fix compiler
support check for CONFIG_RETPOLINE"), you cannot put this $(error ...)
into the parse stage of the top Makefile.
Perhaps I'd propose a more sophisticated solution later, but this is
the best I can do for now.
Link: https://lkml.org/lkml/2017/12/25/211 Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Reported-by: Bernd Edlinger <bernd.edlinger@hotmail.de> Reported-by: Qian Cai <cai@lca.pw> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Tested-by: Qian Cai <cai@lca.pw>
Rakesh Pillai [Fri, 14 Dec 2018 10:17:46 +0000 (12:17 +0200)]
ath10k: skip sending quiet mode cmd for WCN3990
HL2.0 firmware does not support setting quiet mode. If the host driver sends
the quiet mode setting command to the HL2.0 firmware, it crashes with the below
signature.
The quiet mode command support is exposed by the firmware via thermal throttle
wmi service. Enable ath10k thermal support if thermal throttle wmi service bit
is set. 10.x firmware versions support this feature by default, but
unfortunately do not advertise the support via service flags, hence have to
manually set the service flag in ath10k_core_compat_services().
Tested on QCA988X with 10.2.4.70.9-2. Also tested on WCN3990.
Co-developed-by: Govind Singh <govinds@codeaurora.org> Co-developed-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Rakesh Pillai <pillair@codeaurora.org> Signed-off-by: Govind Singh <govinds@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Sara Sharon [Sat, 15 Dec 2018 09:03:06 +0000 (11:03 +0200)]
mac80211: free skb fraglist before freeing the skb
mac80211 uses the frag list to build AMSDU. When freeing
the skb, it may not be really freed, since someone is still
holding a reference to it.
In that case, when TCP skb is being retransmitted, the
pointer to the frag list is being reused, while the data
in there is no longer valid.
Since we will never get frag list from the network stack,
as mac80211 doesn't advertise the capability, we can safely
free and nullify it before releasing the SKB.
Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Myungho Jung [Tue, 18 Dec 2018 17:02:25 +0000 (09:02 -0800)]
net/smc: fix TCP fallback socket release
clcsock can be released while kernel_accept() references it in TCP
listen worker. Also, clcsock needs to wake up before released if TCP
fallback is used and the clcsock is blocked by accept. Add a lock to
safely release clcsock and call kernel_sock_shutdown() to wake up
clcsock from accept in smc_release().
Reported-by: syzbot+0bf2e01269f1274b4b03@syzkaller.appspotmail.com Reported-by: syzbot+e3132895630f957306bc@syzkaller.appspotmail.com Signed-off-by: Myungho Jung <mhjungk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 18 Dec 2018 15:19:47 +0000 (15:19 +0000)]
vxge: ensure data0 is initialized in when fetching firmware version information
Currently variable data0 is not being initialized so a garbage value is
being passed to vxge_hw_vpath_fw_api and this value is being written to
the rts_access_steer_data0 register. There are other occurrances where
data0 is being initialized to zero (e.g. in function
vxge_hw_upgrade_read_version) so I think it makes sense to ensure data0
is initialized likewise to 0.
Detected by CoverityScan, CID#140696 ("Uninitialized scalar variable")
Fixes: 8424e00dfd52 ("vxge: serialize access to steering control register") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Juergen Gross [Tue, 18 Dec 2018 15:06:19 +0000 (16:06 +0100)]
xen/netfront: tolerate frags with no data
At least old Xen net backends seem to send frags with no real data
sometimes. In case such a fragment happens to occur with the frag limit
already reached the frontend will BUG currently even if this situation
is easily recoverable.
Modify the BUG_ON() condition accordingly.
Tested-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kunihiko Hayashi [Tue, 18 Dec 2018 07:57:04 +0000 (16:57 +0900)]
net: phy: Fix the issue that netif always links up after resuming
Even though the link is down before entering hibernation,
there is an issue that the network interface always links up after resuming
from hibernation.
If the link is still down before enabling the network interface,
and after resuming from hibernation, the phydev->state is forcibly set
to PHY_UP in mdio_bus_phy_restore(), and the link becomes up.
In suspend sequence, only if the PHY is attached, mdio_bus_phy_suspend()
calls phy_stop_machine(), and mdio_bus_phy_resume() calls
phy_start_machine().
In resume sequence, it's enough to do the same as mdio_bus_phy_resume()
because the state has been preserved.
This patch fixes the issue by calling phy_start_machine() in
mdio_bus_phy_restore() in the same way as mdio_bus_phy_resume().
Fixes: bc87922ff59d ("phy: Move PHY PM operations into phy_device") Suggested-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Martinsen [Tue, 18 Dec 2018 05:38:22 +0000 (05:38 +0000)]
lan78xx: Resolve issue with changing MAC address
Current state for the lan78xx driver does not allow for changing the
MAC address of the interface, without either removing the module (if
you compiled it that way) or rebooting the machine. If you attempt to
change the MAC address, ifconfig will show the new address, however,
the system/interface will not respond to any traffic using that
configuration. A few short-term options to work around this are to
unload the module and reload it with the new MAC address, change the
interface to "promisc", or reboot with the correct configuration to
change the MAC.
This patch enables the ability to change the MAC address via fairly normal means...
ifdown <interface>
modify entry in /etc/network/interfaces OR a similar method
ifup <interface>
Then test via any network communication, such as ICMP requests to gateway.
My only test platform for this patch has been a raspberry pi model 3b+.
Signed-off-by: Jason Martinsen <jasonmartinsen@msn.com>
-----
Signed-off-by: David S. Miller <davem@davemloft.net>
Bryan Whitehead [Mon, 17 Dec 2018 21:44:50 +0000 (16:44 -0500)]
lan743x: Expand phy search for LAN7431
The LAN7431 uses an external phy, and it can be found anywhere in
the phy address space. This patch uses phy address 1 for LAN7430
only. And searches all addresses otherwise.
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Dec 2018 05:18:26 +0000 (21:18 -0800)]
Merge branch 'vxlan-Various-fixes'
Petr Machata says:
====================
vxlan: Various fixes
This patch set contains three fixes for the vxlan driver.
Patch #1 fixes handling of offload mark on replaced VXLAN FDB entries. A
way to trigger this is to replace the FDB entry with one that can not be
offloaded. A future patch set should make it possible to veto such FDB
changes. However the FDB might still fail to be offloaded due to another
issue, and the offload mark should reflect that.
Patch #2 fixes problems in __vxlan_dev_create() when a call to
rtnl_configure_link() fails. These failures would be tricky to hit on a
real system, the most likely vector is through an error in vxlan_open().
However, with the abovementioned vetoing patchset, vetoing the created
entry would trigger the same problems (and be easier to reproduce).
Patch #3 fixes a problem in vxlan_changelink(). In situations where the
default remote configured in the FDB table (if any) does not exactly
match the remote address configured at the VXLAN device, changing the
remote address breaks the default FDB entry. Patch #4 is then a self
test for this issue.
v3:
- Patch #2:
- Reuse the same errout block for both cleanup paths. Use a bool to
decide whether the unregister_netdevice() call should be made.
v2:
- Drop former patch #3
- Patch #2:
- Delete the default entry before calling unregister_netdevice(). That
takes care of former patch #3, hence tweak the commit message to
mention that problem as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 18 Dec 2018 13:16:02 +0000 (13:16 +0000)]
vxlan: changelink: Fix handling of default remotes
Default remotes are stored as FDB entries with an Ethernet address of
00:00:00:00:00:00. When a request is made to change a remote address of
a VXLAN device, vxlan_changelink() first deletes the existing default
remote, and then creates a new FDB entry.
This works well as long as the list of default remotes matches exactly
the configuration of a VXLAN remote address. Thus when the VXLAN device
has a remote of X, there should be exactly one default remote FDB entry
X. If the VXLAN device has no remote address, there should be no such
entry.
Besides using "ip link set", it is possible to manipulate the list of
default remotes by using the "bridge fdb". It is therefore easy to break
the above condition. Under such circumstances, the __vxlan_fdb_delete()
call doesn't delete the FDB entry itself, but just one remote. The
following vxlan_fdb_create() then creates a new FDB entry, leading to a
situation where two entries exist for the address 00:00:00:00:00:00,
each with a different subset of default remotes.
An even more obvious breakage rooted in the same cause can be observed
when a remote address is configured for a VXLAN device that did not have
one before. In that case vxlan_changelink() doesn't remove any remote,
and just creates a new FDB entry for the new address:
$ ip link add name vx up type vxlan id 2000 dstport 4789
$ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.20 self permanent
$ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.30 self permanent
$ ip link set dev vx type vxlan remote 192.0.2.30
$ bridge fdb sh dev vx | grep 00:00:00:00:00:00
00:00:00:00:00:00 dst 192.0.2.30 self permanent <- new entry, 1 rdst
00:00:00:00:00:00 dst 192.0.2.20 self permanent <- orig. entry, 2 rdsts
00:00:00:00:00:00 dst 192.0.2.30 self permanent
To fix this, instead of calling vxlan_fdb_create() directly, defer to
vxlan_fdb_update(). That has logic to handle the duplicates properly.
Additionally, it also handles notifications, so drop that call from
changelink as well.
Fixes: 0241b836732f ("vxlan: fix default fdb entry netlink notify ordering during netdev create") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 18 Dec 2018 13:16:00 +0000 (13:16 +0000)]
vxlan: Fix error path in __vxlan_dev_create()
When a failure occurs in rtnl_configure_link(), the current code
calls unregister_netdevice() to roll back the earlier call to
register_netdevice(), and jumps to errout, which calls
vxlan_fdb_destroy().
However unregister_netdevice() calls transitively ndo_uninit, which is
vxlan_uninit(), and that already takes care of deleting the default FDB
entry by calling vxlan_fdb_delete_default(). Since the entry added
earlier in __vxlan_dev_create() is exactly the default entry, the
cleanup code in the errout block always leads to double free and thus a
panic.
Besides, since vxlan_fdb_delete_default() always destroys the FDB entry
with notification enabled, the deletion of the default entry is notified
even before the addition was notified.
Instead, move the unregister_netdevice() call after the manual destroy,
which solves both problems.
Fixes: 0241b836732f ("vxlan: fix default fdb entry netlink notify ordering during netdev create") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Tue, 18 Dec 2018 13:15:59 +0000 (13:15 +0000)]
vxlan: Unmark offloaded bit on replaced FDB entries
When rdst of an offloaded FDB entry is replaced, it certainly isn't
offloaded anymore. Drivers are notified about such replacements, and can
re-mark the entry as offloaded again if they so wish. However until a
driver does so explicitly, assume a replaced FDB entry is not offloaded.
Note that replaces coming via vxlan_fdb_external_learn_add() are always
immediately followed by an explicit offload marking.
Fixes: 0efe11733356 ("vxlan: Support marking RDSTs as offloaded") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Here are a couple of race condition fixes for the macb driver. The first
two are for issues observed at runtime on real HW.
v2:
- added received Tested-bys and Acked-bys to the first two patches
- in patch 3/3, moved the timestamp protection barrier closer to the
timestamp reads
- in patch 3/3, removed unnecessary move of the addr assignment in
gem_rx() to keep the patch minimal for maximum clarity
- in patch 3/3, clarified commit message and comments
The 3/3 is the same one I improperly sent last week as a standalone
patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Anssi Hannula [Mon, 17 Dec 2018 13:05:41 +0000 (15:05 +0200)]
net: macb: add missing barriers when reading descriptors
When reading buffer descriptors on RX or on TX completion, an
RX_USED/TX_USED bit is checked first to ensure that the descriptors have
been populated, i.e. the ownership has been transferred. However, there
are no memory barriers to ensure that the data protected by the
RX_USED/TX_USED bit is up-to-date with respect to that bit.
Specifically:
- TX timestamp descriptors may be loaded before ctrl is loaded for the
TX_USED check, which is racy as the descriptors may be updated between
the loads, causing old timestamp descriptor data to be used.
- RX ctrl may be loaded before addr is loaded for the RX_USED check,
which is racy as a new frame may be written between the loads, causing
old ctrl descriptor data to be used.
This issue exists for both macb_rx() and gem_rx() variants.
Fix the races by adding DMA read memory barriers on those paths and
reordering the reads in macb_rx().
I have not observed any actual problems in practice caused by these
being missing, though.
Tested on a ZynqMP based system.
Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver") Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Cc: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Anssi Hannula [Mon, 17 Dec 2018 13:05:40 +0000 (15:05 +0200)]
net: macb: fix dropped RX frames due to a race
Bit RX_USED set to 0 in the address field allows the controller to write
data to the receive buffer descriptor.
The driver does not ensure the ctrl field is ready (cleared) when the
controller sees the RX_USED=0 written by the driver. The ctrl field might
only be cleared after the controller has already updated it according to
a newly received frame, causing the frame to be discarded in gem_rx() due
to unexpected ctrl field contents.
A message is logged when the above scenario occurs:
macb ff0b0000.ethernet eth0: not whole frame pointed by descriptor
Fix the issue by ensuring that when the controller sees RX_USED=0 the
ctrl field is already cleared.
This issue was observed on a ZynqMP based system.
Fixes: 4df95131ea80 ("net/macb: change RX path for GEM") Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com> Cc: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Anssi Hannula [Mon, 17 Dec 2018 13:05:39 +0000 (15:05 +0200)]
net: macb: fix random memory corruption on RX with 64-bit DMA
64-bit DMA addresses are split in upper and lower halves that are
written in separate fields on GEM. For RX, bit 0 of the address is used
as the ownership bit (RX_USED). When the RX_USED bit is unset the
controller is allowed to write data to the buffer.
The driver does not guarantee that the controller already sees the upper
half when the RX_USED bit is cleared, possibly resulting in the
controller writing an incoming frame to an address with an incorrect
upper half and therefore possibly corrupting unrelated system memory.
Fix that by adding the necessary DMA memory barrier between the writes.
This corruption was observed on a ZynqMP based system.
Fixes: fff8019a08b6 ("net: macb: Add 64 bit addressing support for GEM") Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Acked-by: Harini Katakam <harini.katakam@xilinx.com> Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com> Cc: Nicolas Ferre <nicolas.ferre@microchip.com> Cc: Michal Simek <michal.simek@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Mon, 17 Dec 2018 10:26:38 +0000 (11:26 +0100)]
net: Use __kernel_clockid_t in uapi net_stamp.h
Herton reports the following error when building a userspace program that
includes net_stamp.h:
In file included from foo.c:2:
/usr/include/linux/net_tstamp.h:158:2: error: unknown type name
‘clockid_t’
clockid_t clockid; /* reference clockid */
^~~~~~~~~
Fix it by using __kernel_clockid_t in place of clockid_t.
Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.") Cc: Timothy Redaelli <tredaelli@redhat.com> Reported-by: Herton R. Krzesinski <herton@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Tested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Beznea [Mon, 17 Dec 2018 10:02:42 +0000 (10:02 +0000)]
net: macb: restart tx after tx used bit read
On some platforms (currently detected only on SAMA5D4) TX might stuck
even the pachets are still present in DMA memories and TX start was
issued for them. This happens due to race condition between MACB driver
updating next TX buffer descriptor to be used and IP reading the same
descriptor. In such a case, the "TX USED BIT READ" interrupt is asserted.
GEM/MACB user guide specifies that if a "TX USED BIT READ" interrupt
is asserted TX must be restarted. Restart TX if used bit is read and
packets are present in software TX queue. Packets are removed from software
TX queue if TX was successful for them (see macb_tx_interrupt()).
Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 17 Dec 2018 08:06:06 +0000 (11:06 +0300)]
net: stmmac: Fix an error code in probe()
The function should return an error if create_singlethread_workqueue()
fails.
Fixes: 34877a15f787 ("net: stmmac: Rework and fix TX Timeout code") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 17 Dec 2018 07:25:12 +0000 (23:25 -0800)]
tipc: check group dests after tipc_wait_for_cond()
Similar to commit 143ece654f9f ("tipc: check tsk->group in tipc_wait_for_cond()")
we have to reload grp->dests too after we re-take the sock lock.
This means we need to move the dsts check after tipc_wait_for_cond()
too.
Fixes: 75da2163dbb6 ("tipc: introduce communication groups") Reported-and-tested-by: syzbot+99f20222fc5018d2b97a@syzkaller.appspotmail.com Cc: Ying Xue <ying.xue@windriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 17 Dec 2018 07:05:13 +0000 (10:05 +0300)]
qed: Fix an error code qed_ll2_start_xmit()
We accidentally deleted the code to set "rc = -ENOMEM;" and this patch
adds it back.
Fixes: d2201a21598a ("qed: No need for LL2 frags indication") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Tue, 11 Dec 2018 16:32:28 +0000 (17:32 +0100)]
net: mvpp2: 10G modes aren't supported on all ports
The mvpp2_phylink_validate() function sets all modes that are
supported by a given PPv2 port. A recent change made all ports to
advertise they support 10G modes in certain cases. This is not true,
as only the port #0 can do so. This patch fixes it.
Fixes: 01b3fd5ac97c ("net: mvpp2: fix detection of 10G SFP modules") Cc: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eduardo Habkost [Tue, 18 Dec 2018 00:34:18 +0000 (22:34 -0200)]
kvm: x86: Add AMD's EX_CFG to the list of ignored MSRs
Some guests OSes (including Windows 10) write to MSR 0xc001102c
on some cases (possibly while trying to apply a CPU errata).
Make KVM ignore reads and writes to that MSR, so the guest won't
crash.
The MSR is documented as "Execution Unit Configuration (EX_CFG)",
at AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family
15h Models 00h-0Fh Processors".
Cc: stable@vger.kernel.org Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT14 msr
and triggers scan ioapic logic to load synic vectors into EOI exit bitmap.
However, irqchip is not initialized by this simple testcase, ioapic/apic
objects should not be accessed.
This patch fixes it by also considering whether or not apic is present.
Reported-by: syzbot+39810e6c400efadfef71@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cfir Cohen [Tue, 18 Dec 2018 16:18:41 +0000 (08:18 -0800)]
KVM: Fix UAF in nested posted interrupt processing
nested_get_vmcs12_pages() processes the posted_intr address in vmcs12. It
caches the kmap()ed page object and pointer, however, it doesn't handle
errors correctly: it's possible to cache a valid pointer, then release
the page and later dereference the dangling pointer.
I was able to reproduce with the following steps:
1. Call vmlaunch with valid posted_intr_desc_addr but an invalid
MSR_EFER. This causes nested_get_vmcs12_pages() to cache the kmap()ed
pi_desc_page and pi_desc. Later the invalid EFER value fails
check_vmentry_postreqs() which fails the first vmlaunch.
2. Call vmlanuch with a valid EFER but an invalid posted_intr_desc_addr
(I set it to 2G - 0x80). The second time we call nested_get_vmcs12_pages
pi_desc_page is unmapped and released and pi_desc_page is set to NULL
(the "shouldn't happen" clause). Due to the invalid
posted_intr_desc_addr, kvm_vcpu_gpa_to_page() fails and
nested_get_vmcs12_pages() returns. It doesn't return an error value so
vmlaunch proceeds. Note that at this time we have a dangling pointer in
vmx->nested.pi_desc and POSTED_INTR_DESC_ADDR in L0's vmcs.
3. Issue an IPI in L2 guest code. This triggers a call to
vmx_complete_nested_posted_interrupt() and pi_test_and_clear_on() which
dereferences the dangling pointer.
Vulnerable code requires nested and enable_apicv variables to be set to
true. The host CPU must also support posted interrupts.
Fixes: 5e2f30b756a37 "KVM: nVMX: get rid of nested_get_page()" Cc: stable@vger.kernel.org Reviewed-by: Andy Honig <ahonig@google.com> Signed-off-by: Cfir Cohen <cfir@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Eric Biggers [Mon, 17 Dec 2018 17:36:19 +0000 (09:36 -0800)]
KVM: fix unregistering coalesced mmio zone from wrong bus
If you register a kvm_coalesced_mmio_zone with '.pio = 0' but then
unregister it with '.pio = 1', KVM_UNREGISTER_COALESCED_MMIO will try to
unregister it from KVM_PIO_BUS rather than KVM_MMIO_BUS, which is a
no-op. But it frees the kvm_coalesced_mmio_dev anyway, causing a
use-after-free.
Fix it by only unregistering and freeing the zone if the correct value
of 'pio' is provided.
Reported-by: syzbot+f87f60bb6f13f39b54e3@syzkaller.appspotmail.com Fixes: 0804c849f1df ("kvm/x86 : add coalesced pio support") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jorgen Hansen [Tue, 18 Dec 2018 08:34:06 +0000 (00:34 -0800)]
VSOCK: Send reset control packet when socket is partially bound
If a server side socket is bound to an address, but not in the listening
state yet, incoming connection requests should receive a reset control
packet in response. However, the function used to send the reset
silently drops the reset packet if the sending socket isn't bound
to a remote address (as is the case for a bound socket not yet in
the listening state). This change fixes this by using the src
of the incoming packet as destination for the reset packet in
this case.
Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reviewed-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Vishnu Dasa <vdasa@vmware.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Fix error return code in xfrm_output_one()
when no dst_entry is attached to the skb.
From Wei Yongjun.
2) The xfrm state hash bucket count reported to
userspace is off by one. Fix from Benjamin Poirier.
3) Fix NULL pointer dereference in xfrm_input when
skb_dst_force clears the dst_entry.
4) Fix freeing of xfrm states on acquire. We use a
dedicated slab cache for the xfrm states now,
so free it properly with kmem_cache_free.
From Mathias Krause.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 18 Dec 2018 17:38:34 +0000 (09:38 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three fixes: The t10-pi one is a regression from the 4.19 release, the
qla2xxx one is a 4.20 merge window regression and the bnx2fc is a very
old bug"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: t10-pi: Return correct ref tag when queue has no integrity profile
scsi: bnx2fc: Fix NULL dereference in error handling
Revert "scsi: qla2xxx: Fix NVMe Target discovery"
====================
mlxsw: VXLAN and firmware flashing fixes
Patch #1 fixes firmware flashing failures by increasing the time period
after which the driver fails the transaction with the firmware. The
problem is explained in detail in the commit message.
Patch #2 adds a missing trap for decapsulated ARP packets. It is
necessary for VXLAN routing to work.
Patch #3 fixes a memory leak during driver reload caused by NULLing a
pointer before kfree().
Please consider patch #1 for 4.19.y
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 6e6030bd5412 ("mlxsw: spectrum_nve: Implement common NVE core") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 18 Dec 2018 15:59:22 +0000 (15:59 +0000)]
mlxsw: spectrum: Add trap for decapsulated ARP packets
After a packet was decapsulated it is classified to the relevant FID
based on its VNI and undergoes L2 forwarding.
Unlike regular (non-encapsulated) ARP packets, Spectrum does not trap
decapsulated ARP packets during L2 forwarding and instead can only trap
such packets in the underlay router during decapsulation.
Add this missing packet trap, which is required for VXLAN routing when
the MAC of the target host is not known.
Fixes: b02597d513a9 ("mlxsw: spectrum: Add NVE packet traps") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Tue, 18 Dec 2018 15:59:20 +0000 (15:59 +0000)]
mlxsw: core: Increase timeout during firmware flash process
During the firmware flash process, some of the EMADs get timed out, which
causes the driver to send them again with a limit of 5 retries. There are
some situations in which 5 retries is not enough and the EMAD access fails.
If the failed EMAD was related to the flashing process, the driver fails
the flashing.
The reason for these timeouts during firmware flashing is cache misses in
the CPU running the firmware. In case the CPU needs to fetch instructions
from the flash when a firmware is flashed, it needs to wait for the
flashing to complete. Since flashing takes time, it is possible for pending
EMADs to timeout.
Fix by increasing EMADs' timeout while flashing firmware.
Fixes: ce6ef68f433f ("mlxsw: spectrum: Implement the ethtool flash_device callback") Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Trond Myklebust [Mon, 17 Dec 2018 22:33:33 +0000 (17:33 -0500)]
SUNRPC: Remove xprt_connect_status()
Over the years, xprt_connect_status() has been superseded by
call_connect_status(), which now handles all the errors that
xprt_connect_status() does and more. Since the latter converts
all errors that it doesn't recognise to EIO, then it is time
for it to be retired.
Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
Trond Myklebust [Mon, 17 Dec 2018 22:38:51 +0000 (17:38 -0500)]
SUNRPC: Fix a race with XPRT_CONNECTING
Ensure that we clear XPRT_CONNECTING before releasing the XPRT_LOCK so that
we don't have races between the (asynchronous) socket setup code and
tasks in xprt_connect().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
Trond Myklebust [Mon, 17 Dec 2018 18:34:59 +0000 (13:34 -0500)]
SUNRPC: Fix disconnection races
When the socket is closed, we need to call xprt_disconnect_done() in order
to clean up the XPRT_WRITE_SPACE flag, and wake up the sleeping tasks.
However, we also want to ensure that we don't wake them up before the socket
is closed, since that would cause thundering herd issues with everyone
piling up to retransmit before the TCP shutdown dance has completed.
Only the task that holds XPRT_LOCKED needs to wake up early in order to
allow the close to complete.
Reported-by: Dave Wysochanski <dwysocha@redhat.com> Reported-by: Scott Mayhew <smayhew@redhat.com> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
Sara Sharon [Sat, 15 Dec 2018 09:03:10 +0000 (11:03 +0200)]
mac80211: fix a kernel panic when TXing after TXQ teardown
Recently TXQ teardown was moved earlier in ieee80211_unregister_hw(),
to avoid a use-after-free of the netdev data. However, interfaces
aren't fully removed at the point, and cfg80211_shutdown_all_interfaces
can for example, TX a deauth frame. Move the TXQ teardown to the
point between cfg80211_shutdown_all_interfaces and the free of
netdev queues, so we can be sure they are torn down before netdev
is freed, but after there is no ongoing TX.
Fixes: 77cfaf52eca5 ("mac80211: Run TXQ teardown code before de-registering interfaces") Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Vivien Didelot [Mon, 17 Dec 2018 21:05:21 +0000 (16:05 -0500)]
net: dsa: mv88e6xxx: set ethtool regs version
Currently the ethtool_regs version is set to 0 for all DSA drivers.
Use this field to store the chip ID to simplify the pretty dump of
any interfaces registered by the "dsa" driver.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Mon, 17 Dec 2018 17:24:00 +0000 (12:24 -0500)]
net: add missing SOF_TIMESTAMPING_OPT_ID support
SOF_TIMESTAMPING_OPT_ID is supported on TCP, UDP and RAW sockets.
But it was missing on RAW with IPPROTO_IP, PF_PACKET and CAN.
Add skb_setup_tx_timestamp that configures both tx_flags and tskey
for these paths that do not need corking or use bytestream keys.
Fixes: 09c2d251b707 ("net-timestamp: add key to disambiguate concurrent datagrams") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Mon, 17 Dec 2018 17:23:59 +0000 (12:23 -0500)]
ipv6: add missing tx timestamping on IPPROTO_RAW
Raw sockets support tx timestamping, but one case is missing.
IPPROTO_RAW takes a separate packet construction path. raw_send_hdrinc
has an explicit call to sock_tx_timestamp, but rawv6_send_hdrinc does
not. Add it.
Fixes: 11878b40ed5c ("net-timestamp: SOCK_RAW and PING timestamping") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 17 Dec 2018 19:39:57 +0000 (11:39 -0800)]
security: don't use a negative Opt_err token index
The code uses a bitmap to check for duplicate tokens during parsing, and
that doesn't work at all for the negative Opt_err token case.
There is absolutely no reason to make Opt_err be negative, and in fact
it only confuses things, since some of the affected functions actually
return a positive Opt_xyz enum _or_ a regular negative error code (eg
-EINVAL), and using -1 for Opt_err makes no sense.
There are similar problems in ima_policy.c and key encryption, but they
don't have the immediate bug wrt bitmap handing, and ima_policy.c in
particular needs a different patch to make the enum values match the
token array index. Mimi is sending that separately.
Reported-by: syzbot+a22e0dc07567662c50bc@syzkaller.appspotmail.com Reported-by: Eric Biggers <ebiggers@kernel.org> Fixes: 5208cc83423d ("keys, trusted: fix: *do not* allow duplicate key options") Fixes: 00d60fd3b932 ("KEYS: Provide keyctls to drive the new key type ops for asymmetric keys [ver #2]") Cc: James Morris James Morris <jmorris@namei.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Sun, 16 Dec 2018 15:04:40 +0000 (00:04 +0900)]
bpf: promote bpf_perf_event.h to mandatory UAPI header
Since commit c895f6f703ad ("bpf: correct broken uapi for
BPF_PROG_TYPE_PERF_EVENT program type"), all architectures
(except um) are required to have bpf_perf_event.h in uapi/asm.
Add it to mandatory-y so "make headers_install" can check it.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Lendacky, Thomas [Mon, 17 Dec 2018 14:39:16 +0000 (14:39 +0000)]
dma-direct: do not include SME mask in the DMA supported check
The dma_direct_supported() function intends to check the DMA mask against
specific values. However, the phys_to_dma() function includes the SME
encryption mask, which defeats the intended purpose of the check. This
results in drivers that support less than 48-bit DMA (SME encryption mask
is bit 47) from being able to set the DMA mask successfully when SME is
active, which results in the driver failing to initialize.
Change the function used to check the mask from phys_to_dma() to
__phys_to_dma() so that the SME encryption mask is not part of the check.
Fixes: c1d0af1a1d5d ("kernel/dma/direct: take DMA offset into account in dma_direct_supported") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Paul Burton [Sun, 16 Dec 2018 20:10:01 +0000 (20:10 +0000)]
Revert "serial: 8250: Fix clearing FIFOs in RS485 mode again"
Commit f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode
again") makes a change to FIFO clearing code which its commit message
suggests was intended to be specific to use with RS485 mode, however:
1) The change made does not just affect __do_stop_tx_rs485(), it also
affects other uses of serial8250_clear_fifos() including paths for
starting up, shutting down or auto-configuring a port regardless of
whether it's an RS485 port or not.
2) It makes the assumption that resetting the FIFOs is a no-op when
FIFOs are disabled, and as such it checks for this case & explicitly
avoids setting the FIFO reset bits when the FIFO enable bit is
clear. A reading of the PC16550D manual would suggest that this is
OK since the FIFO should automatically be reset if it is later
enabled, but we support many 16550-compatible devices and have never
required this auto-reset behaviour for at least the whole git era.
Starting to rely on it now seems risky, offers no benefit, and
indeed breaks at least the Ingenic JZ4780's UARTs which reads
garbage when the RX FIFO is enabled if we don't explicitly reset it.
3) By only resetting the FIFOs if they're enabled, the behaviour of
serial8250_do_startup() during boot now depends on what the value of
FCR is before the 8250 driver is probed. This in itself seems
questionable and leaves us with FCR=0 & no FIFO reset if the UART
was used by 8250_early, otherwise it depends upon what the
bootloader left behind.
4) Although the naming of serial8250_clear_fifos() may be unclear, it
is clear that callers of it expect that it will disable FIFOs. Both
serial8250_do_startup() & serial8250_do_shutdown() contain comments
to that effect, and other callers explicitly re-enable the FIFOs
after calling serial8250_clear_fifos(). The premise of that patch
that disabling the FIFOs is incorrect therefore seems wrong.
For these reasons, this reverts commit f6aa5beb45be ("serial: 8250: Fix
clearing FIFOs in RS485 mode again").
Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: f6aa5beb45be ("serial: 8250: Fix clearing FIFOs in RS485 mode again"). Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Daniel Jedrychowski <avistel@gmail.com> Cc: Marek Vasut <marex@denx.de> Cc: linux-mips@vger.kernel.org Cc: linux-serial@vger.kernel.org Cc: stable <stable@vger.kernel.org> # 4.10+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ulf Hansson [Mon, 10 Dec 2018 16:52:38 +0000 (17:52 +0100)]
mmc: core: Use a minimum 1600ms timeout when enabling CACHE ctrl
Some eMMCs from Micron have been reported to need ~800 ms timeout, while
enabling the CACHE ctrl after running sudden power failure tests. The
needed timeout is greater than what the card specifies as its generic CMD6
timeout, through the EXT_CSD register, hence the problem.
Normally we would introduce a card quirk to extend the timeout for these
specific Micron cards. However, due to the rather complicated debug process
needed to find out the error, let's simply use a minimum timeout of 1600ms,
the double of what has been reported, for all cards when enabling CACHE
ctrl.
Reported-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk> Reported-by: Andreas Dannenberg <dannenberg@ti.com> Reported-by: Faiz Abbas <faiz_abbas@ti.com> Cc: <stable@vger.kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Mon, 10 Dec 2018 16:52:37 +0000 (17:52 +0100)]
mmc: core: Allow BKOPS and CACHE ctrl even if no HPI support
In commit 5320226a0512 ("mmc: core: Disable HPI for certain Hynix eMMC
cards"), then intent was to prevent HPI from being used for some eMMC
cards, which didn't properly support it. However, that went too far, as
even BKOPS and CACHE ctrl became prevented. Let's restore those parts and
allow BKOPS and CACHE ctrl even if HPI isn't supported.
Fixes: 5320226a0512 ("mmc: core: Disable HPI for certain Hynix eMMC cards") Cc: Pratibhasagar V <pratibha@codeaurora.org> Cc: <stable@vger.kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ulf Hansson [Mon, 10 Dec 2018 16:52:36 +0000 (17:52 +0100)]
mmc: core: Reset HPI enabled state during re-init and in case of errors
During a re-initialization of the eMMC card, we may fail to re-enable HPI.
In these cases, that isn't properly reflected in the card->ext_csd.hpi_en
bit, as it keeps being set. This may cause following attempts to use HPI,
even if's not enabled. Let's fix this!
Dmitry V. Levin [Sun, 16 Dec 2018 01:49:52 +0000 (04:49 +0300)]
uapi: linux/blkzoned.h: fix BLKGETZONESZ and BLKGETNRZONES definitions
According to the documentation in include/uapi/asm-generic/ioctl.h,
_IOW means userspace is writing and kernel is reading, and
_IOR means userspace is reading and kernel is writing.
In case of these two ioctls, kernel is writing and userspace is reading,
so they have to be _IOR instead of _IOW.
Fixes: 72cd87576d1d8 ("block: Introduce BLKGETZONESZ ioctl") Fixes: 65e4e3eee83d7 ("block: Introduce BLKGETNRZONES ioctl") Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>