git.proxmox.com Git - mirror_ubuntu-kernels.git/log

bpf, sparc: fix usage of wrong reg for load_skb_regs after call

When LD_ABS/IND is used in the program, and we have a BPF helper
call that changes packet data (bpf_helper_changes_pkt_data() returns
true), then in case of sparc JIT, we try to reload cached skb data
from bpf2sparc[BPF_REG_6]. However, there is no such guarantee or
assumption that skb sits in R6 at this point, all helpers changing
skb data only have a guarantee that skb sits in R1. Therefore,
store BPF R1 in L7 temporarily and after procedure call use L7 to
reload cached skb data. skb sitting in R6 is only true at the time
when LD_ABS/IND is executed.

Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: guarantee r1 to be ctx in case of bpf_helper_changes_pkt_data

Some JITs don't cache skb context on stack in prologue, so when
LD_ABS/IND is used and helper calls yield bpf_helper_changes_pkt_data()
as true, then they temporarily save/restore skb pointer. However,
the assumption that skb always has to be in r1 is a bit of a
gamble. Right now it turned out to be true for all helpers listed
in bpf_helper_changes_pkt_data(), but lets enforce that from verifier
side, so that we make this a guarantee and bail out if the func
proto is misconfigured in future helpers.

In case of BPF helper calls from cBPF, bpf_helper_changes_pkt_data()
is completely unrelevant here (since cBPF is context read-only) and
therefore always false.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, ppc64: do not reload skb pointers in non-skb context

The assumption of unconditionally reloading skb pointers on
BPF helper calls where bpf_helper_changes_pkt_data() holds
true is wrong. There can be different contexts where the helper
would enforce a reload such as in case of XDP. Here, we do
have a struct xdp_buff instead of struct sk_buff as context,
thus this will access garbage.

JITs only ever need to deal with cached skb pointer reload
when ld_abs/ind was seen, therefore guard the reload behind
SEEN_SKB.

Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, s390x: do not reload skb pointers in non-skb context

The assumption of unconditionally reloading skb pointers on
BPF helper calls where bpf_helper_changes_pkt_data() holds
true is wrong. There can be different contexts where the
BPF helper would enforce a reload such as in case of XDP.
Here, we do have a struct xdp_buff instead of struct sk_buff
as context, thus this will access garbage.

JITs only ever need to deal with cached skb pointer reload
when ld_abs/ind was seen, therefore guard the reload behind
SEEN_SKB only. Tested on s390x.

Fixes: 9db7f2b81880 ("s390/bpf: recache skb->data/hlen for skb_vlan_push/pop")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

xdp: linearize skb in netif_receive_generic_xdp()

In netif_receive_generic_xdp(), it is necessary to linearize all
nonlinear skb. However, in current implementation, skb with
troom <= 0 are not linearized. This patch fixes this by calling
skb_linearize() for all nonlinear skb.

Fixes: de8f3a83b0a0 ("bpf: add meta pointer for direct access")
Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2017-12-13

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Addition of explicit scheduling points to map alloc/free
   in order to avoid having to hold the CPU for too long,
   from Eric.

2) Fixing of a corruption in overlapping perf_event_output
   calls from different BPF prog types on the same CPU out
   of different contexts, from Daniel.

3) Fallout fixes for recent correction of broken uapi for
   BPF_PROG_TYPE_PERF_EVENT. um had a missing asm header
   that needed to be pulled in from asm-generic and for
   BPF selftests the asm-generic include did not work,
   so similar asm include scheme was adapted for that
   problematic header that perf is having with other
   header files under tools, from Daniel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4-misc-fixes'

Tariq Toukan says:

====================
mlx4 misc fixes

This patchset contains misc bug fixes from the team
to the mlx4 Core and Eth drivers.

Patch 1 by Eugenia fixes an MTU issue in selftest.
Patch 2 by Eran fixes an accounting issue in the resource tracker.
Patch 3 by Eran fixes a race condition that causes counter inconsistency.

Series generated against net commit:
200809716aed fou: fix some member types in guehdr

v2:
Patch 2: Add reviewer credit, rephrase commit message.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fill all counters under one call of stats lock

Before this patch, the stats_lock was acquired twice. In between the
locks Driver sent command to gather some more statistics (per priority
and counter statistics). If the stats lock was acquired by get
statistics NDO in between we would have report out of sync counters.

Fix this by collecting all stats from Firmware in advance and then
fill the Software structs under one lock.

Fixes: 0b131561a7d6 ("net/mlx4_en: Add Flow control statistics display via ethtool")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Fix wrong calculation of free counters

The field res_free indicates the total number of counters which are
available for allocation (reserved and unreserved). Fixed a bug where
the reserved counters were subtracted from res_free before any
allocation was performed.

Before this fix, free counters which were not reserved could not be
allocated.

Fixes: 9de92c60beaa ("net/mlx4_core: Adjust counter grant policy in the resource tracker")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix selftest for small MTUs

Set the minimal MTU threshold for running loopback selftest.
MTU should be big enough to include packet payload, NET_IP_ALIGN,
Ethernet headers and preamble length.

Fixes: e7c1c2c46201 ("mlx4_en: Added self diagnostics test implementation")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: avoid configuring fiber page for SGMII-to-Copper

When in SGMII-to-Copper mode, the fiber page is used for the MAC facing
link, and does not require configuration of the fiber auto-negotiation
settings. Avoid trying.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dwc-xlgmac: Add co-maintainer

Jose Abreu will join to maintain dwc-xlgmac.
He will help with new feature development for
this driver. Thanks Jose and welcome on board!

Signed-off-by: Jie Deng <jiedeng@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: refresh tcp_mstamp from timers callbacks

Only the retransmit timer currently refreshes tcp_mstamp

We should do the same for delayed acks and keepalives.

Even if RFC 7323 does not request it, this is consistent to what linux
did in the past, when TS values were based on jiffies.

Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Mike Maloney <maloney@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Mike Maloney <maloney@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix potential underestimation on rcv_rtt

When ms timestamp is used, current logic uses 1us in
tcp_rcv_rtt_update() when the real rcv_rtt is within 1 - 999us.
This could cause rcv_rtt underestimation.
Fix it by always using a min value of 1ms if ms timestamp is used.

Fixes: 645f4c6f2ebd ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

skge: remove redundunt free_irq under spinlock

The code to handle multi-port SKGE boards was freeing IRQ
twice. The first one was under lock and might sleep.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: meson-gxl: make function meson_gxl_read_status static

The function meson_gxl_read_status is local to the source and does
not need to be in global scope, so make it static.

Cleans up sparse warning:
symbol 'meson_gxl_read_status' was not declared. Should it be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

of_mdio / mdiobus: ensure mdio devices have fwnode correctly populated

Ensure that all mdio devices populate the struct device fwnode pointer
as well as the of_node pointer to allow drivers that wish to use
fwnode APIs to work.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: fix resume handling

When a PHY has the BMCR_PDOWN bit set, it may decide to ignore writes
to other registers, or reset the registers to power-on defaults.
Micrel PHYs do this for their interrupt registers.

The current structure of phylib tries to enable interrupts before
resuming (and releasing) the BMCR_PDOWN bit. This fails, causing
Micrel PHYs to stop working after a suspend/resume sequence if they
are using interrupts.

Fix this by ensuring that the PHY driver resume methods do not take
the phydev->lock mutex themselves, but the callers of phy_resume()
take that lock. This then allows us to move the call to phy_resume()
before we enable interrupts in phy_start().

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: vf610-zii-dev: use XAUI for DSA link ports

Use XAUI rather than XGMII for DSA link ports, as this is the interface
mode that the switches actually use. XAUI is the 4 lane bus with clock
per direction, whereas XGMII is a 32 bit bus with clock.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: allow XAUI phy interface mode

XGMII is a 32-bit bus plus two clock signals per direction.  XAUI is
four serial lanes per direction.  The 88e6190 supports XAUI but not
XGMII as it doesn't have enough pins.  The same is true of 88e6176.

Match on PHY_INTERFACE_MODE_XAUI for the XAUI port type, but keep
accepting XGMII for backwards compatibility.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

hippi: Fix a Fix a possible sleep-in-atomic bug in rr_close

The driver may sleep under a spinlock.
The function call path is:
rr_close (acquire the spinlock)
free_irq --> may sleep

To fix it, free_irq is moved to the place without holding the spinlock.

This bug is found by my static analysis tool(DSAC) and checked by my code review.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The follow patchset contains Netfilter fixes for your net tree,
they are:

1) Fix compilation warning in x_tables with clang due to useless
   redundant reassignment, from Colin Ian King.

2) Add bugtrap to net_exit to catch uninitialized lists, patch
   from Vasily Averin.

3) Fix out of bounds memory reads in H323 conntrack helper, this
   comes with an initial patch to remove replace the obscure
   CHECK_BOUND macro as a dependency. From Eric Sesterhenn.

4) Reduce retransmission timeout when window is 0 in TCP conntrack,
   from Florian Westphal.

6) ctnetlink clamp timeout to INT_MAX if timeout is too large,
   otherwise timeout wraps around and it results in killing the
   entry that is being added immediately.

7) Missing CAP_NET_ADMIN checks in cthelper and xt_osf, due to
   no netns support. From Kevin Cernekee.

8) Missing maximum number of instructions checks in xt_bpf, patch
   from Jann Horn.

9) With no CONFIG_PROC_FS ipt_CLUSTERIP compilation breaks,
   patch from Arnd Bergmann.

10) Missing netlink attribute policy in nftables exthdr, from
    Florian Westphal.

11) Enable conntrack with IPv6 MASQUERADE rules, as a357b3f80bc8
    should have done in first place, from Konstantin Khlebnikov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: arc: fix error handling in emac_rockchip_probe

If clk_set_rate() fails, we should disable clk before return.
Found by Linux Driver Verification project (linuxtesting.org).

Changes since v2 [1]:
* Merged with latest code changes

Changes since v1:
Update made thanks to David's review, much appreciated David.
* Improved inconsistent failure handling of clock rate setting
* For completeness of usecase, added arc_emac_probe error handling

Signed-off-by: Branislav Radocaj <branislav@radocaj.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qmi_wwan: add Sierra EM7565 1199:9091

Sierra Wireless EM7565 is an Qualcomm MDM9x50 based M.2 modem.
The USB id is added to qmi_wwan.c to allow QMI communication
with the EM7565.

Signed-off-by: Sebastian Sjoholm <ssjoholm@mac.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: igmp: Use correct source address on IGMPv3 reports

Closing a multicast socket after the final IPv4 address is deleted
from an interface can generate a membership report that uses the
source IP from a different interface.  The following test script, run
from an isolated netns, reproduces the issue:

    #!/bin/bash

    ip link add dummy0 type dummy
    ip link add dummy1 type dummy
    ip link set dummy0 up
    ip link set dummy1 up
    ip addr add 10.1.1.1/24 dev dummy0
    ip addr add 192.168.99.99/24 dev dummy1

    tcpdump -U -i dummy0 &
    socat EXEC:"sleep 2" \
        UDP4-DATAGRAM:239.101.1.68:8889,ip-add-membership=239.0.1.68:10.1.1.1 &

    sleep 1
    ip addr del 10.1.1.1/24 dev dummy0
    sleep 5
    kill %tcpdump

RFC 3376 specifies that the report must be sent with a valid IP source
address from the destination subnet, or from address 0.0.0.0.  Add an
extra check to make sure this is the case.

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: eliminate potential memory leak

In the function tipc_sk_mcast_rcv() we call refcount_dec(&skb->users)
on received sk_buffers. Since the reference counter might hit zero at
this point, we have a potential memory leak.

We fix this by replacing refcount_dec() with kfree_skb().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove duplicate includes

These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.

Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: igmp: guard against silly MTU values

IPv4 stack reacts to changes to small MTU, by disabling itself under
RTNL.

But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in igmp code where it is
assumed the mtu is suitable.

Fix this by reading device mtu once and checking IPv4 minimal MTU.

This patch adds missing IPV4_MIN_MTU define, to not abuse
ETH_MIN_MTU anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: mcast: better catch silly mtu values

syzkaller reported crashes in IPv6 stack [1]

Xin Long found that lo MTU was set to silly values.

IPv6 stack reacts to changes to small MTU, by disabling itself under
RTNL.

But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in mld code where it is assumed
the mtu is suitable.

Fix this by reading device mtu once and checking IPv6 minimal MTU.

[1]
skbuff: skb_over_panic: text:0000000010b86b8d len:196 put:20
head:000000003b477e60 data:000000000e85441e tail:0xd4 end:0xc0 dev:lo
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc2-mm1+ #39
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:skb_panic+0x15c/0x1f0 net/core/skbuff.c:100
RSP: 0018:ffff8801db307508 EFLAGS: 00010286
RAX: 0000000000000082 RBX: ffff8801c517e840 RCX: 0000000000000000
RDX: 0000000000000082 RSI: 1ffff1003b660e61 RDI: ffffed003b660e95
RBP: ffff8801db307570 R08: 1ffff1003b660e23 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff85bd4020
R13: ffffffff84754ed2 R14: 0000000000000014 R15: ffff8801c4e26540
FS:  0000000000000000(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000463610 CR3: 00000001c6698000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  <IRQ>
  skb_over_panic net/core/skbuff.c:109 [inline]
  skb_put+0x181/0x1c0 net/core/skbuff.c:1694
  add_grhead.isra.24+0x42/0x3b0 net/ipv6/mcast.c:1695
  add_grec+0xa55/0x1060 net/ipv6/mcast.c:1817
  mld_send_cr net/ipv6/mcast.c:1903 [inline]
  mld_ifc_timer_expire+0x4d2/0x770 net/ipv6/mcast.c:2448
  call_timer_fn+0x23b/0x840 kernel/time/timer.c:1320
  expire_timers kernel/time/timer.c:1357 [inline]
  __run_timers+0x7e1/0xb60 kernel/time/timer.c:1660
  run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686
  __do_softirq+0x29d/0xbb2 kernel/softirq.c:285
  invoke_softirq kernel/softirq.c:365 [inline]
  irq_exit+0x1d3/0x210 kernel/softirq.c:405
  exiting_irq arch/x86/include/asm/apic.h:540 [inline]
  smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
  apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:920

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "ravb: add workaround for clock when resuming with WoL enabled"

This reverts commit fbf3d034f2ff6264183cfa6845770e8cc2a986c8.

As of commit 560869100b99a3da ("clk: renesas: cpg-mssr: Restore module
clocks during resume"), the workaround is no longer needed.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add schedule points to map alloc/free

While using large percpu maps, htab_map_alloc() can hold
cpu for hundreds of ms.

This patch adds cond_resched() calls to percpu alloc/free
call sites, all running in process context.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf-misc-fixes'

Daniel Borkmann says:

====================
Couple of outstanding fixes for BPF tree: 1) fixes a perf RB
corruption, 2) and 3) fixes a few build issues from the recent
bpf_perf_event.h uapi corrections. Thanks!
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: fix broken BPF selftest build

At least on x86_64, the kernel's BPF selftests seemed to have stopped
to build due to 618e165b2a8e ("selftests/bpf: sync kernel headers and
introduce arch support in Makefile"):

  [...]
  In file included from test_verifier.c:29:0:
  ../../../include/uapi/linux/bpf_perf_event.h:11:32:
     fatal error: asm/bpf_perf_event.h: No such file or directory
   #include <asm/bpf_perf_event.h>
                                ^
  compilation terminated.
  [...]

While pulling in tools/arch/*/include/uapi/asm/bpf_perf_event.h seems
to work fine, there's no automated fall-back logic right now that would
do the same out of tools/include/uapi/asm-generic/bpf_perf_event.h. The
usual convention today is to add a include/[uapi/]asm/ equivalent that
would pull in the correct arch header or generic one as fall-back, all
ifdef'ed based on compiler target definition. It's similarly done also
in other cases such as tools/include/asm/barrier.h, thus adapt the same
here.

Fixes: 618e165b2a8e ("selftests/bpf: sync kernel headers and introduce arch support in Makefile")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: fix build issues on um due to mising bpf_perf_event.h

Since c895f6f703ad ("bpf: correct broken uapi for
BPF_PROG_TYPE_PERF_EVENT program type") um (uml) won't build
on i386 or x86_64:

  [...]
    CC      init/main.o
  In file included from ../include/linux/perf_event.h:18:0,
                   from ../include/linux/trace_events.h:10,
                   from ../include/trace/syscall.h:7,
                   from ../include/linux/syscalls.h:82,
                   from ../init/main.c:20:
  ../include/uapi/linux/bpf_perf_event.h:11:32: fatal error:
  asm/bpf_perf_event.h: No such file or directory #include
  <asm/bpf_perf_event.h>
  [...]

Lets add missing bpf_perf_event.h also to um arch. This seems
to be the only one still missing.

Fixes: c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Richard Weinberger <richard@sigma-star.at>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Richard Weinberger <richard@sigma-star.at>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: fix corruption on concurrent perf_event_output calls

When tracing and networking programs are both attached in the
system and both use event-output helpers that eventually call
into perf_event_output(), then we could end up in a situation
where the tracing attached program runs in user context while
a cls_bpf program is triggered on that same CPU out of softirq
context.

Since both rely on the same per-cpu perf_sample_data, we could
potentially corrupt it. This can only ever happen in a combination
of the two types; all tracing programs use a bpf_prog_active
counter to bail out in case a program is already running on
that CPU out of a different context. XDP and cls_bpf programs
by themselves don't have this issue as they run in the same
context only. Therefore, split both perf_sample_data so they
cannot be accessed from each other.

Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data")
Reported-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Song Liu <songliubraving@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tcp md5sig: Use skb's saddr when replying to an incoming segment

The MD5-key that belongs to a connection is identified by the peer's
IP-address. When we are in tcp_v4(6)_reqsk_send_ack(), we are replying
to an incoming segment from tcp_check_req() that failed the seq-number
checks.

Thus, to find the correct key, we need to use the skb's saddr and not
the daddr.

This bug seems to have been there since quite a while, but probably got
unnoticed because the consequences are not catastrophic. We will call
tcp_v4_reqsk_send_ack only to send a challenge-ACK back to the peer,
thus the connection doesn't really fail.

Fixes: 9501f9722922 ("tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fou: fix some member types in guehdr

guehdr struct is used to build or parse gue packets, which
are always in big endian. It's better to define all guehdr
members as __beXX types.

Also, in validate_gue_flags it's not good to use a __be32
variable for both Standard flags(__be16) and Private flags
(__be32), and pass it to other funcions.

This patch could fix a bunch of sparse warnings from fou.

Fixes: 5024c33ac354 ("gue: Add infrastructure for flags and options")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: make sure stream nums can match optlen in sctp_setsockopt_reset_streams

Now in sctp_setsockopt_reset_streams, it only does the check
optlen < sizeof(*params) for optlen. But it's not enough, as
params->srs_number_streams should also match optlen.

If the streams in params->srs_stream_list are less than stream
nums in params->srs_number_streams, later when dereferencing
the stream list, it could cause a slab-out-of-bounds crash, as
reported by syzbot.

This patch is to fix it by also checking the stream numbers in
sctp_setsockopt_reset_streams to make sure at least it's not
greater than the streams in the list.

Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: fix for a race condition in raw_sendmsg

inet->hdrincl is racy, and could lead to uninitialized stack pointer
usage, so its value should be read only once.

Fixes: c008ba5bdc9f ("ipv4: Avoid reading user iov twice after raw_probe_proto_opt")
Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Add netns check on taps

Currently, a nlmon link inside a child namespace can observe systemwide
netlink activity.  Filter the traffic so that nlmon can only sniff
netlink messages from its own netns.

Test case:

    vpnns -- bash -c "ip link add nlmon0 type nlmon; \
                      ip link set nlmon0 up; \
                      tcpdump -i nlmon0 -q -w /tmp/nlmon.pcap -U" &
    sudo ip xfrm state add src 10.1.1.1 dst 10.1.1.2 proto esp \
        spi 0x1 mode transport \
        auth sha1 0x6162633132330000000000000000000000000000 \
        enc aes 0x00000000000000000000000000000000
    grep --binary abc123 /tmp/nlmon.pcap

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sh_eth: do not advertise Gigabit capabilities when not available

Not all variants of the sh_eth hardware have Gigabit
support. Unfortunately, the current driver doesn't tell the PHY about
the limited MAC capabilities. Due to this, if you have a Gigabit
capable PHY, the PHY will advertise its Gigabit capability and
establish a link at 1Gbit/s, even though the MAC doesn't support it.

In order to avoid this, we use the recently introduced
phy_set_max_speed() to tell the PHY to not advertise speed higher than
100 MBit/s.

Tested on a SH7786 platform, with a Gigabit PHY.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: meson-gxl: detect LPA corruption

The purpose of this change is to fix the incorrect detection of the link
partner (LP) advertised capabilities which sometimes happens with this PHY
(roughly 1 time in a dozen)

This issue may cause the link to be negotiated at 10Mbps/Full or
10Mbps/Half when 100MBps/Full is actually possible. In some case, the link
is even completely broken and no communication is possible.

To detect the corruption, we must look for a magic undocumented bit in the
WOL bank (hint given by the SoC vendor kernel) but this is not enough to
cover all cases. We also have to look at the LPA ack. If the LP supports
Aneg but did not ack our base code when aneg is completed, we assume
something went wrong.

The detection of a corrupted LPA triggers a restart of the aneg process.
This solves the problem but may take up to 6 retries to complete.

Fixes: 7334b3e47aee ("net: phy: Add Meson GXL Internal PHY driver")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: ip6t_MASQUERADE: add dependency on conntrack module

After commit 4d3a57f23dec ("netfilter: conntrack: do not enable connection
tracking unless needed") conntrack is disabled by default unless some
module explicitly declares dependency in particular network namespace.

Fixes: a357b3f80bc8 ("netfilter: nat: add dependencies on conntrack module")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ptr_ring: add barriers

Users of ptr_ring expect that it's safe to give the
data structure a pointer and have it be available
to consumers, but that actually requires an smb_wmb
or a stronger barrier.

In absence of such barriers and on architectures that reorder writes,
consumer might read an un=initialized value from an skb pointer stored
in the skb array. This was observed causing crashes.

To fix, add memory barriers. The barrier we use is a wmb, the
assumption being that producers do not need to read the value so we do
not need to order these reads.

Reported-by: George Cherian <george.cherian@cavium.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-for-davem-2017-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Three fixes:
* for certificate C file generation, don't use hexdump as it's
   not always installed by default, use pure posix instead (od/sed)
* for certificate C file generation, don't write the file if
   anything fails, so the build abort will not cause a bad build
   upon a second attempt
* fix locking in ieee80211_sta_tear_down_BA_sessions() which had
   been causing lots of locking warnings
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: exthdr: add missign attributes to policy

Add missing netlink attribute policy.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

mac80211: fix locking in ieee80211_sta_tear_down_BA_sessions

Due to overlap between
commit 1281103770e9 ("mac80211: Simplify locking in ieee80211_sta_tear_down_BA_sessions()")
and the way that Luca modified
commit 72e2c3438ba3 ("mac80211: tear down RX aggregations first")
when sending it upstream from Intel's internal tree, we get
the following warning:

WARNING: CPU: 0 PID: 5472 at net/mac80211/agg-tx.c:315 ___ieee80211_stop_tx_ba_session+0x158/0x1f0

since there's no appropriate locking around the call to
___ieee80211_stop_tx_ba_session; Sara's original just had
a call to the locked __ieee80211_stop_tx_ba_session (one
less underscore) but it looks like Luca modified both of
the calls when fixing it up for upstream, leading to the
problem at hand.

Move the locking appropriately to fix this problem.

Reported-by: Kalle Valo <kvalo@codeaurora.org>
Reported-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

kmemcheck: rip it out for real

Commit 4675ff05de2d ("kmemcheck: rip it out") has removed the code but
for some reason SPDX header stayed in place. This looks like a rebase
mistake in the mmotm tree or the merge mistake. Let's drop those
leftovers as well.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb
    drivers).

2) Revert returning -EEXIST from __dev_alloc_name() as this propagates
    to userspace and broke some apps. From Johannes Berg.

3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong
    Wang.

4) Gianfar MAC can't do EEE so don't advertise it by default, from
    Claudiu Manoil.

5) Relax strict netlink attribute validation, but emit a warning. From
    David Ahern.

6) Fix regression in checksum offload of thunderx driver, from Florian
    Westphal.

7) Fix UAPI bpf issues on s390, from Hendrik Brueckner.

8) New card support in iwlwifi, from Ihab Zhaika.

9) BBR congestion control bug fixes from Neal Cardwell.

10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren.

11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan.

12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni.

13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
  net: mvpp2: fix the RSS table entry offset
  tcp: evaluate packet losses upon RTT change
  tcp: fix off-by-one bug in RACK
  tcp: always evaluate losses in RACK upon undo
  tcp: correctly test congestion state in RACK
  bnxt_en: Fix sources of spurious netpoll warnings
  tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
  tcp_bbr: reset full pipe detection on loss recovery undo
  tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
  sfc: pass valid pointers from efx_enqueue_unwind
  gianfar: Disable EEE autoneg by default
  tcp: invalidate rate samples during SACK reneging
  can: peak/pcie_fd: fix potential bug in restarting tx queue
  can: usb_8dev: cancel urb on -EPIPE and -EPROTO
  can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
  can: esd_usb2: cancel urb on -EPIPE and -EPROTO
  can: ems_usb: cancel urb on -EPIPE and -EPROTO
  can: mcba_usb: cancel urb on -EPROTO
  usbnet: fix alignment for frames with no ethernet header
  tcp: use current time in tcp_rcv_space_adjust()
  ...

Merge tag 'media/v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

"A series of fixes for the media subsytem:

   - The largest amount of fixes in this series is with regards to
     comments that aren't kernel-doc, but start with "/**".

     A new check added for 4.15 makes it to produce a *huge* amount of
     new warnings (I'm compiling here with W=1). Most of the patches in
     this series fix those.

     No code changes - just comment changes at the source files

   - rc: some fixed in order to better handle RC repetition codes

   - v4l-async: use the v4l2_dev from the root notifier when matching
     sub-devices

   - v4l2-fwnode: Check subdev count after checking port

   - ov 13858 and et8ek8: compilation fix with randconfigs

   - usbtv: a trivial new USB ID addition

   - dibusb-common: don't do DMA on stack on firmware load

   - imx274: Fix error handling, add MAINTAINERS entry

   - sir_ir: detect presence of port"

* tag 'media/v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (50 commits)
  media: imx274: Fix error handling, add MAINTAINERS entry
  media: v4l: async: use the v4l2_dev from the root notifier when matching sub-devices
  media: v4l2-fwnode: Check subdev count after checking port
  media: et8ek8: select V4L2_FWNODE
  media: ov13858: Select V4L2_FWNODE
  media: rc: partial revert of "media: rc: per-protocol repeat period"
  media: dvb: i2c transfers over usb cannot be done from stack
  media: dvb-frontends: complete kernel-doc markups
  media: docs: add documentation for frontend attach info
  media: dvb_frontends: fix kernel-doc macros
  media: drivers: remove "/**" from non-kernel-doc comments
  media: lm3560: add a missing kernel-doc parameter
  media: rcar_jpu: fix two kernel-doc markups
  media: vsp1: add a missing kernel-doc parameter
  media: soc_camera: fix a kernel-doc markup
  media: mt2063: fix some kernel-doc warnings
  media: radio-wl1273: fix a parameter name at kernel-doc macro
  media: s3c-camif: add missing description at s3c_camif_find_format()
  media: mtk-vpu: add description for wdt fields at struct mtk_vpu
  media: vdec: fix some kernel-doc warnings
  ...

Merge tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"This pull is a bit larger than I'd like but a large bunch of it is
  license fixes, AMD wanted to fix the licenses for a bunch of files
  that were missing them,

Otherwise a bunch of TTM regression fix since the hugepage support,
some i915 and gvt fixes, a core connector free in a safe context fix,
and one bridge fix"

* tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux: (26 commits)
  drm/bridge: analogix dp: Fix runtime PM state in get_modes() callback
  Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk"
  drm/vc4: Fix false positive WARN() backtrace on refcount_inc() usage
  drm/i915: Call i915_gem_init_userptr() before taking struct_mutex
  drm/exynos: remove unnecessary function declaration
  drm/exynos: remove unnecessary descrptions
  drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU
  drm/exynos: Fix dma-buf import
  drm/ttm: swap consecutive allocated pooled pages v4
  drm: safely free connectors from connector_iter
  drm/i915/gvt: set max priority for gvt context
  drm/i915/gvt: Don't mark vgpu context as inactive when preempted
  drm/i915/gvt: Limit read hw reg to active vgpu
  drm/i915/gvt: Export intel_gvt_render_mmio_to_ring_id()
  drm/i915/gvt: Emulate PCI expansion ROM base address register
  drm/ttm: swap consecutive allocated cached pages v3
  drm/ttm: roundup the shrink request to prevent skip huge pool
  drm/ttm: add page order support in ttm_pages_put
  drm/ttm: add set_pages_wb for handling page order more than zero
  drm/ttm: add page order in page pool
  ...

Merge tag 'md/4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md

Pull md fixes from Shaohua Li:
"Some MD fixes.

  The notable one is a raid5-cache deadlock bug with dm-raid, others are
  not significant"

* tag 'md/4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  md/raid1/10: add missed blk plug
  md: limit mdstat resync progress to max_sectors
  md/r5cache: move mddev_lock() out of r5c_journal_mode_set()
  md/raid5: correct degraded calculation in raid5_error

Merge tag 'devicetree-fixes-for-4.15-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull DeviceTree fixes from Rob Herring:
"Another set of DT fixes:

   - Fixes from overlay code rework. A trifecta of fixes to the locking,
     an out of bounds access, and a memory leak in of_overlay_apply()

   - Clean-up at25 eeprom binding document

   - Remove leading '0x' in unit-addresses from binding docs"

* tag 'devicetree-fixes-for-4.15-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  of: overlay: Make node skipping in init_overlay_changeset() clearer
  of: overlay: Fix out-of-bounds write in init_overlay_changeset()
  of: overlay: Fix (un)locking in of_overlay_apply()
  of: overlay: Fix memory leak in of_overlay_apply() error path
  dt-bindings: eeprom: at25: Document device-specific compatible values
  dt-bindings: eeprom: at25: Grammar s/are can/can/
  dt-bindings: Remove leading 0x from bindings notation
  of: overlay: Remove else after goto
  of: Spelling s/changset/changeset/
  of: unittest: Remove bogus overlay mutex release from overlay_data_add()

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio bugfixes from Michael Tsirkin:
"A couple of minor bugfixes"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_net: fix return value check in receive_mergeable()
  virtio_mmio: add cleanup for virtio_mmio_remove
  virtio_mmio: add cleanup for virtio_mmio_probe

Merge tag 'for-linus-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
"Just two small fixes for the new pvcalls frontend driver"

* tag 'for-linus-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/pvcalls: Fix a check in pvcalls_front_remove()
xen/pvcalls: check for xenbus_read() errors

Merge tag 'powerpc-4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

"One notable fix for kexec on Power9, where we were not clearing MMU
  PID properly which sometimes leads to hangs. Finally debugged to a
  root cause by Nick.

  A revert of a patch which tried to rework our panic handling to get
  more output on the console, but inadvertently broke reporting the
  panic to the hypervisor, which apparently people care about.

  Then a fix for an oops in the PMU code, and finally some s/%p/%px/ in
  xmon.

  Thanks to: David Gibson, Nicholas Piggin, Ravi Bangoria"

* tag 'powerpc-4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/xmon: Don't print hashed pointers in xmon
  powerpc/64s: Initialize ISAv3 MMU registers before setting partition table
  Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"
  powerpc/perf: Fix oops when grouping different pmu events

Merge tag 'linux-can-fixes-for-4.15-20171208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2017-12-08

this is a pull request of 6 patches for net/master.

Martin Kelly provides 5 patches for various USB based CAN drivers, that
properly cancel the URBs on adapter unplug, so that the driver doesn't
end up in an endless loop. Stephane Grosjean provides a patch to restart
the tx queue if zero length packages are transmitted.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-for-davem-2017-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.15

Second set of fixes for 4.15. This time a lot of iwlwifi patches and
two brcmfmac patches. Most important here are the MIC and IVC fixes
for iwlwifi to unbreak 9000 series.

iwlwifi

* fix rate-scaling to not start lowest possible rate

* fix the TX queue hang detection for AP/GO modes

* fix the TX queue hang timeout in monitor interfaces

* fix packet injection

* remove a wrong error message when dumping PCI registers

* fix race condition with RF-kill

* tell mac80211 when the MIC has been stripped (9000 series)

* tell mac80211 when the IVC has been stripped (9000 series)

* add 2 new PCI IDs, one for 9000 and one for 22000

* fix a queue hang due during a P2P Remain-on-Channel operation

brcmfmac

* fix a race which sometimes caused a crash during sdio unbind

* fix a kernel-doc related build error
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: fix the RSS table entry offset

The macro used to access or set an RSS table entry was using an offset
of 8, while it should use an offset of 0. This lead to wrongly configure
the RSS table, not accessing the right entries.

Fixes: 1d7d15d79fb4 ("net: mvpp2: initialize the RSS tables")
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-RACK-loss-recovery-bug-fixes'

Yuchung Cheng says:

====================
tcp: RACK loss recovery bug fixes

This patch set has four minor bug fixes in TCP RACK loss recovery.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: evaluate packet losses upon RTT change

RACK skips an ACK unless it advances the most recently delivered
TX timestamp (rack.mstamp). Since RACK also uses the most recent
RTT to decide if a packet is lost, RACK should still run the
loss detection whenever the most recent RTT changes. For example,
an ACK that does not advance the timestamp but triggers the cwnd
undo due to reordering, would then use the most recent (higher)
RTT measurement to detect further losses.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix off-by-one bug in RACK

RACK should mark a packet lost when remaining wait time is zero.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: always evaluate losses in RACK upon undo

When sender detects spurious retransmission, all packets
marked lost are remarked to be in-flight. However some may
be considered lost based on its timestamps in RACK. This patch
forces RACK to re-evaluate, which may be skipped previously if
the ACK does not advance RACK timestamp.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: correctly test congestion state in RACK

RACK does not test the loss recovery state correctly to compute
the reordering window. It assumes if lost_out is zero then TCP is
not in loss recovery. But it can be zero during recovery before
calling tcp_rack_detect_loss(): when an ACK acknowledges all
packets marked lost before receiving this ACK, but has not yet
to discover new ones by tcp_rack_detect_loss(). The fix is to
simply test the congestion state directly.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Fix sources of spurious netpoll warnings

After applying 2270bc5da3497945 ("bnxt_en: Fix netpoll handling") and
903649e718f80da2 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."),
we still see the following WARN fire:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160
  bnxt_poll+0x0/0xd0 exceeded budget in poll
  <snip>
  Call Trace:
   [<ffffffff814be5cd>] dump_stack+0x4d/0x70
   [<ffffffff8107e013>] __warn+0xd3/0xf0
   [<ffffffff8107e07f>] warn_slowpath_fmt+0x4f/0x60
   [<ffffffff8179519a>] netpoll_poll_dev+0x15a/0x160
   [<ffffffff81795f38>] netpoll_send_skb_on_dev+0x168/0x250
   [<ffffffff817962fc>] netpoll_send_udp+0x2dc/0x440
   [<ffffffff815fa9be>] write_ext_msg+0x20e/0x250
   [<ffffffff810c8125>] call_console_drivers.constprop.23+0xa5/0x110
   [<ffffffff810c9549>] console_unlock+0x339/0x5b0
   [<ffffffff810c9a88>] vprintk_emit+0x2c8/0x450
   [<ffffffff810c9d5f>] vprintk_default+0x1f/0x30
   [<ffffffff81173df5>] printk+0x48/0x50
   [<ffffffffa0197713>] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core]
   [<ffffffffa0197b9b>] edac_mc_handle_error+0x42b/0x6e0 [edac_core]
   [<ffffffffa01c3a60>] sbridge_mce_output_error+0x410/0x10d0 [sb_edac]
   [<ffffffffa01c47cc>] sbridge_check_error+0xac/0x130 [sb_edac]
   [<ffffffffa0197f3c>] edac_mc_workq_function+0x3c/0x90 [edac_core]
   [<ffffffff81095f8b>] process_one_work+0x19b/0x480
   [<ffffffff810967ca>] worker_thread+0x6a/0x520
   [<ffffffff8109c7c4>] kthread+0xe4/0x100
   [<ffffffff81884c52>] ret_from_fork+0x22/0x40

This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting
in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually
given a non-zero budget.

Signed-off-by: Calvin Owens <calvinowens@fb.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp-bbr-sampling-fixes'

Neal Cardwell says:

====================
TCP BBR sampling fixes for loss recovery undo

This patch series has a few minor bug fixes for cases where spurious
loss recoveries can trick BBR estimators into estimating that the
available bandwidth is much lower than the true available bandwidth.
In both cases the fix here is to just reset the estimator upon loss
recovery undo.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_bbr: reset long-term bandwidth sampling on loss recovery undo

Fix BBR so that upon notification of a loss recovery undo BBR resets
long-term bandwidth sampling.

Under high reordering, reordering events can be interpreted as loss.
If the reordering and spurious loss estimates are high enough, this
can cause BBR to spuriously estimate that we are seeing loss rates
high enough to trigger long-term bandwidth estimation. To avoid that
problem, this commit resets long-term bandwidth sampling on loss
recovery undo events.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_bbr: reset full pipe detection on loss recovery undo

Fix BBR so that upon notification of a loss recovery undo BBR resets
the full pipe detection (STARTUP exit) state machine.

Under high reordering, reordering events can be interpreted as loss.
If the reordering and spurious loss estimates are high enough, this
could previously cause BBR to spuriously estimate that the pipe is
full.

Since spurious loss recovery means that our overall sending will have
slowed down spuriously, this commit gives a flow more time to probe
robustly for bandwidth and decide the pipe is really full.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_bbr: record "full bw reached" decision in new full_bw_reached bit

This commit records the "full bw reached" decision in a new
full_bw_reached bit. This is a pure refactor that does not change the
current behavior, but enables subsequent fixes and improvements.

In particular, this enables simple and clean fixes because the full_bw
and full_bw_cnt can be unconditionally zeroed without worrying about
forgetting that we estimated we filled the pipe in Startup. And it
enables future improvements because multiple code paths can be used
for estimating that we filled the pipe in Startup; any new code paths
only need to set this bit when they think the pipe is full.

Note that this fix intentionally reduces the width of the full_bw_cnt
counter, since we have never used the most significant bit.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: pass valid pointers from efx_enqueue_unwind

The bytes_compl and pkts_compl pointers passed to efx_dequeue_buffers
cannot be NULL. Add a paranoid warning to check this condition and fix
the one case where they were NULL.

efx_enqueue_unwind() is called very rarely, during error handling.
Without this fix it would fail with a NULL pointer dereference in
efx_dequeue_buffer, with efx_enqueue_skb in the call stack.

Fixes: e9117e5099ea ("sfc: Firmware-Assisted TSO version 2")
Reported-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Tested-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Disable EEE autoneg by default

This controller does not support EEE, but it may connect to a PHY
which supports EEE and advertises EEE by default, while its link
partner also advertises EEE. If this happens, the PHY enters low
power mode when the traffic rate is low and causes packet loss.
This patch disables EEE advertisement by default for any PHY that
gianfar connects to, to prevent the above unwanted outcome.

Signed-off-by: Shaohui Xie <Shaohui.Xie@nxp.com>
Tested-by: Yangbo Lu <Yangbo.lu@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:

- three more patches in regard to the SPDX license tags. The missing
   tags for the files in arch/s390/kvm will be merged via the KVM tree.
   With that all s390 related files should have their SPDX tags.

- a patch to get rid of 'struct timespec' in the DASD driver.

- bug fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: fix compat system call table
  s390/mm: fix off-by-one bug in 5-level page table handling
  s390: Remove redudant license text
  s390: add a few more SPDX identifiers
  s390/dasd: prevent prefix I/O error
  s390: always save and restore all registers on context switch
  s390/dasd: remove 'struct timespec' usage
  s390/qdio: restrict target-full handling to IQDIO
  s390/qdio: consider ERROR buffers for inbound-full condition
  s390/virtio: add BSD license to virtio-ccw

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
"Fix some more FP register fallout from the SVE patches and also some
  problems with the PGD tracking in our software PAN emulation code,
  after we received a crash report from a 3.18 kernel running a
  backport.

  Summary:

   - fix SW PAN pgd shadowing for kernel threads, EFI and exiting user
     tasks

   - fix FP register leak when a task_struct is re-allocated

   - fix potential use-after-free in FP state tracking used by KVM"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/sve: Avoid dereference of dead task_struct in KVM guest entry
  arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb
  arm64: SW PAN: Point saved ttbr0 at the zero page when switching to init_mm
  arm64: fpsimd: Abstract out binding of task's fpsimd context to the cpu.
  arm64: fpsimd: Prevent registers leaking from dead tasks

Merge tag 'acpi-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"This fixes an out of bounds warning from KASAN in the ACPI CPPC
driver"

* tag 'acpi-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / CPPC: Fix KASAN global out of bounds warning

Merge tag 'pm-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
"This fixes an issue in the device runtime PM framework that prevents
  customer devices from resuming if runtime PM is disabled for one or
  more of their supplier devices (as reflected by device links between
  those devices)"

* tag 'pm-4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / runtime: Fix handling of suppliers with disabled runtime PM

of: overlay: Make node skipping in init_overlay_changeset() clearer

Make it more clear that nodes without "__overlay__" subnodes are
skipped, by reverting the logic and using continue.
This also reduces indentation level.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rob Herring <robh@kernel.org>

of: overlay: Fix out-of-bounds write in init_overlay_changeset()

If an overlay has no "__symbols__" node, but it has nodes without
"__overlay__" subnodes at the end (e.g. a "__fixups__" node), after
filling in all fragments for nodes with "__overlay__" subnodes,
"fragment = &fragments[cnt]" will point beyond the end of the allocated
array.

Hence writing to "fragment->overlay" will overwrite unallocated memory,
which may lead to a crash later.

Fix this by deferring both the assignment to "fragment" and the
offending write afterwards until we know for sure the node has an
"__overlay__" subnode, and thus a valid entry in "fragments[]".

Fixes: 61b4de4e0b384f4a ("of: overlay: minor restructuring")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rob Herring <robh@kernel.org>

tcp: invalidate rate samples during SACK reneging

Mark tcp_sock during a SACK reneging event and invalidate rate samples
while marked. Such rate samples may overestimate bw by including packets
that were SACKed before reneging.

< ack 6001 win 10000 sack 7001:38001
< ack 7001 win 0 sack 8001:38001 // Reneg detected
> seq 7001:8001 // RTO, SACK cleared.
< ack 38001 win 10000

In above example the rate sample taken after the last ack will count
7001-38001 as delivered while the actual delivery rate likely could
be much lower i.e. 7001-8001.

This patch adds a new field tcp_sock.sack_reneg and marks it when we
declare SACK reneging and entering TCP_CA_Loss, and unmarks it after
the last rate sample was taken before moving back to TCP_CA_Open. This
patch also invalidates rate samples taken while tcp_sock.is_sack_reneg
is set.

Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection")
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

can: peak/pcie_fd: fix potential bug in restarting tx queue

Don't rely on can_get_echo_skb() return value to wake the network tx
queue up: can_get_echo_skb() returns 0 if the echo array slot was not
occupied, but also when the DLC of the released echo frame was 0.

Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: usb_8dev: cancel urb on -EPIPE and -EPROTO

In mcba_usb, we have observed that when you unplug the device, the driver will
endlessly resubmit failing URBs, which can cause CPU stalls. This issue
is fixed in mcba_usb by catching the codes seen on device disconnect
(-EPIPE and -EPROTO).

This driver also resubmits in the case of -EPIPE and -EPROTO, so fix it
in the same way.

Signed-off-by: Martin Kelly <mkelly@xevo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_usb: cancel urb on -EPIPE and -EPROTO

In mcba_usb, we have observed that when you unplug the device, the driver will
endlessly resubmit failing URBs, which can cause CPU stalls. This issue
is fixed in mcba_usb by catching the codes seen on device disconnect
(-EPIPE and -EPROTO).

This driver also resubmits in the case of -EPIPE and -EPROTO, so fix it
in the same way.

Signed-off-by: Martin Kelly <mkelly@xevo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: esd_usb2: cancel urb on -EPIPE and -EPROTO

In mcba_usb, we have observed that when you unplug the device, the driver will
endlessly resubmit failing URBs, which can cause CPU stalls. This issue
is fixed in mcba_usb by catching the codes seen on device disconnect
(-EPIPE and -EPROTO).

This driver also resubmits in the case of -EPIPE and -EPROTO, so fix it
in the same way.

Signed-off-by: Martin Kelly <mkelly@xevo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: ems_usb: cancel urb on -EPIPE and -EPROTO

In mcba_usb, we have observed that when you unplug the device, the driver will
endlessly resubmit failing URBs, which can cause CPU stalls. This issue
is fixed in mcba_usb by catching the codes seen on device disconnect
(-EPIPE and -EPROTO).

This driver also resubmits in the case of -EPIPE and -EPROTO, so fix it
in the same way.

Signed-off-by: Martin Kelly <mkelly@xevo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: mcba_usb: cancel urb on -EPROTO

When we unplug the device, we can see both -EPIPE and -EPROTO depending
on exact timing and what system we run on. If we continue to resubmit
URBs, they will immediately fail, and they can cause stalls, especially
on slower CPUs.

Fix this by not resubmitting on -EPROTO, as we already do on -EPIPE.

Signed-off-by: Martin Kelly <mkelly@xevo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

netfilter: ipt_CLUSTERIP: fix clusterip_net_exit build regression

The added check produces a build error when CONFIG_PROC_FS is
disabled:

net/ipv4/netfilter/ipt_CLUSTERIP.c: In function 'clusterip_net_exit':
net/ipv4/netfilter/ipt_CLUSTERIP.c:822:28: error: 'cn' undeclared (first use in this function)

This moves the variable declaration out of the #ifdef to make it
available to the WARN_ON_ONCE().

Fixes: 613d0776d3fe ("netfilter: exit_net cleanup check added")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge tag 'drm-misc-fixes-2017-12-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

regression fix for vc4 + rpm stable fix for analogix bridge

* tag 'drm-misc-fixes-2017-12-07' of git://anongit.freedesktop.org/drm/drm-misc:
drm/bridge: analogix dp: Fix runtime PM state in get_modes() callback
drm/vc4: Fix false positive WARN() backtrace on refcount_inc() usage

Merge tag 'drm-intel-fixes-2017-12-07' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

- Fix for fd.o bug #103997 CNL eDP + HDMI causing a machine hard hang (James)
- Fix to allow suspending with a wedged GPU to hopefully unwedge it (Chris)
- Fix for Gen2 vblank timestap/frame counter jumps (Ville)
- Revert of a W/A for enabling FBC on CNL/GLK for certain images
  and sizes (Rodrigo)
- Lockdep fix for i915 userptr code (Chris)

gvt-fixes-2017-12-06

- Fix invalid hw reg read value for vGPU (Xiong)
- Fix qemu warning on PCI ROM bar missing (Changbin)
- Workaround preemption regression (Zhenyu)

* tag 'drm-intel-fixes-2017-12-07' of git://anongit.freedesktop.org/drm/drm-intel:
  Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk"
  drm/i915: Call i915_gem_init_userptr() before taking struct_mutex
  drm/i915/gvt: set max priority for gvt context
  drm/i915/gvt: Don't mark vgpu context as inactive when preempted
  drm/i915/gvt: Limit read hw reg to active vgpu
  drm/i915/gvt: Export intel_gvt_render_mmio_to_ring_id()
  drm/i915/gvt: Emulate PCI expansion ROM base address register
  drm/i915/cnl: Mask previous DDI - PLL mapping
  drm/i915: Fix vblank timestamp/frame counter jumps on gen2
  drm/i915: Skip switch-to-kernel-context on suspend when wedged

Merge tag 'exynos-drm-fixes-for-v4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes

- fix page fault issue due to using wrong device object in prime import.
- drop NONCONTIG flag without IOMMU support.
- remove unnecessary members and declaration.

* tag 'exynos-drm-fixes-for-v4.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos:
  drm/exynos: remove unnecessary function declaration
  drm/exynos: remove unnecessary descrptions
  drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU
  drm/exynos: Fix dma-buf import

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2017-12-06

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fixing broken uapi for BPF tracing programs for s390 and arm64
   architectures due to pt_regs being in-kernel only, and not part
   of uapi right now. A wrapper is added that exports pt_regs in
   an asm-generic way. For arm64 this maps to existing user_pt_regs
   structure and for s390 a user_pt_regs structure exporting the
   beginning of pt_regs is added and uapi-exported, thus fixing the
   BPF issues seen in perf (and BPF selftests), all from Hendrik.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

usbnet: fix alignment for frames with no ethernet header

The qmi_wwan minidriver support a 'raw-ip' mode where frames are
received without any ethernet header. This causes alignment issues
because the skbs allocated by usbnet are "IP aligned".

Fix by allowing minidrivers to disable the additional alignment
offset. This is implemented using a per-device flag, since the same
minidriver also supports 'ethernet' mode.

Fixes: 32f7adf633b9 ("net: qmi_wwan: support "raw IP" mode")
Reported-and-tested-by: Jay Foster <jay@systech.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: use current time in tcp_rcv_space_adjust()

When I switched rcv_rtt_est to high resolution timestamps, I forgot
that tp->tcp_mstamp needed to be refreshed in tcp_rcv_space_adjust()

Using an old timestamp leads to autotuning lags.

Fixes: 645f4c6f2ebd ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: Relax attr validation for fixed length types

Commit 28033ae4e0f5 ("net: netlink: Update attr validation to require
exact length for some types") requires attributes using types NLA_U* and
NLA_S* to have an exact length. This change is exposing bugs in various
userspace commands that are sending attributes with an invalid length
(e.g., attribute has type NLA_U8 and userspace sends NLA_U32). While
the commands are clearly broken and need to be fixed, users are arguing
that the sudden change in enforcement is breaking older commands on
newer kernels for use cases that otherwise "worked".

Relax the validation to print a warning mesage similar to what is done
for messages containing extra bytes after parsing.

Fixes: 28033ae4e0f5 ("net: netlink: Update attr validation to require exact length for some types")
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

adding missing rcu_read_unlock in ipxip6_rcv

commit 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
introduced new exit point in ipxip6_rcv. however rcu_read_unlock is
missing there. this diff is fixing this

v1->v2:
instead of doing rcu_read_unlock in place, we are going to "drop"
section (to prevent skb leakage)

Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mv88e6xxx-error-patch-fixes'

Andrew Lunn says:

====================
mv88e6xxx error patch fixes

While trying to bring up a new PHY on a board, i exercised the error
paths a bit, and discovered some bugs. The unwind for interrupt
handling deadlocks, and the MDIO code hits a BUG() when a registered
MDIO device is freed without first being unregistered.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Unregister MDIO bus on error path

The MDIO busses need to be unregistered before they are freed,
otherwise BUG() is called. Add a call to the unregister code if the
registration fails, since we can have multiple busses, of which some
may correctly register before one fails. This requires moving the code
around a little.

Fixes: a3c53be55c95 ("net: dsa: mv88e6xxx: Support multiple MDIO busses")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: Fix interrupt masking on removal

When removing the interrupt handling code, we should mask the
generation of interrupts. The code however unmasked all
interrupts. This can then cause a new interrupt. We then get into a
deadlock where the interrupt thread is waiting to run, and the code
continues, trying to remove the interrupt handler, which means waiting
for the thread to complete. On a UP machine this deadlocks.

Fix so we really mask interrupts in the hardware. The same error is
made in the error path when install the interrupt handling code.

Fixes: 3460a5770ce9 ("net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: arc: fix error handling in emac_rockchip_probe

If clk_set_rate() fails, we should disable clk before return.
Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Branislav Radocaj <branislav@radocaj.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case

add appropriate calls to clk_disable_unprepare() by jumping to out_mdio
in case orion_mdio_probe() returns -EPROBE_DEFER.

Found by Linux Driver Verification project (linuxtesting.org).

Fixes: 3d604da1e954 ("net: mvmdio: get and enable optional clock")
Signed-off-by: Tobias Jordan <Tobias.Jordan@elektrobit.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: fix return value check in receive_mergeable()

The function virtqueue_get_buf_ctx() could return NULL, the return
value 'buf' need to be checked with NULL, not value 'ctx'.

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio_mmio: add cleanup for virtio_mmio_remove

cleanup all resource allocated by virtio_mmio_probe.

Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>