]> git.proxmox.com Git - mirror_ubuntu-kernels.git/log
mirror_ubuntu-kernels.git
7 years agomac80211: free netdev on dev_alloc_name() error
Johannes Berg [Fri, 9 Jun 2017 19:33:09 +0000 (21:33 +0200)]
mac80211: free netdev on dev_alloc_name() error

The change to remove free_netdev() from ieee80211_if_free()
erroneously didn't add the necessary free_netdev() for when
ieee80211_if_free() is called directly in one place, rather
than as the priv_destructor. Add the missing call.

Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: rps: send out pending IPI's on CPU hotplug
ashwanth@codeaurora.org [Fri, 9 Jun 2017 08:54:58 +0000 (14:24 +0530)]
net: rps: send out pending IPI's on CPU hotplug

IPI's from the victim cpu are not handled in dev_cpu_callback.
So these pending IPI's would be sent to the remote cpu only when
NET_RX is scheduled on the victim cpu and since this trigger is
unpredictable it would result in packet latencies on the remote cpu.

This patch add support to send the pending ipi's of victim cpu.

Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agostmmac: fix for hw timestamp of GMAC3 unit
Mario Molitor [Thu, 8 Jun 2017 21:03:09 +0000 (23:03 +0200)]
stmmac: fix for hw timestamp of GMAC3 unit

1.) Bugfix of function stmmac_get_tx_hwtstamp.
    Corrected the tx timestamp available check (same as 4.8 and older)
    Change printout from info syslevel to debug.

2.) Bugfix of function stmmac_get_rx_hwtstamp.
    Corrected the rx timestamp available check (same as 4.8 and older)
    Change printout from info syslevel to debug.

Fixes: ba1ffd74df74 ("stmmac: fix PTP support for GMAC4")
Signed-off-by: Mario Molitor <mario_molitor@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agostmmac: fix ptp header for GMAC3 hw timestamp
Mario Molitor [Thu, 8 Jun 2017 20:41:02 +0000 (22:41 +0200)]
stmmac: fix ptp header for GMAC3 hw timestamp

According the CYCLON V documention only the bit 16 of snaptypesel should
set.
(more information see Table 17-20 (cv_5v4.pdf) :
 Timestamp Snapshot Dependency on Register Bits)

Fixes: d2042052a0aa ("stmmac: update the PTP header file")
Signed-off-by: Mario Molitor <mario_molitor@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoFix an intermittent pr_emerg warning about lo becoming free.
Krister Johansen [Thu, 8 Jun 2017 20:12:38 +0000 (13:12 -0700)]
Fix an intermittent pr_emerg warning about lo becoming free.

It looks like this:

Message from syslogd@flamingo at Apr 26 00:45:00 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 4

They seem to coincide with net namespace teardown.

The message is emitted by netdev_wait_allrefs().

Forced a kdump in netdev_run_todo, but found that the refcount on the lo
device was already 0 at the time we got to the panic.

Used bcc to check the blocking in netdev_run_todo.  The only places
where we're off cpu there are in the rcu_barrier() and msleep() calls.
That behavior is expected.  The msleep time coincides with the amount of
time we spend waiting for the refcount to reach zero; the rcu_barrier()
wait times are not excessive.

After looking through the list of callbacks that the netdevice notifiers
invoke in this path, it appears that the dst_dev_event is the most
interesting.  The dst_ifdown path places a hold on the loopback_dev as
part of releasing the dev associated with the original dst cache entry.
Most of our notifier callbacks are straight-forward, but this one a)
looks complex, and b) places a hold on the network interface in
question.

I constructed a new bcc script that watches various events in the
liftime of a dst cache entry.  Note that dst_ifdown will take a hold on
the loopback device until the invalidated dst entry gets freed.

[      __dst_free] on DST: ffff883ccabb7900 IF tap1008300eth0 invoked at 1282115677036183
    __dst_free
    rcu_nocb_kthread
    kthread
    ret_from_fork
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoaf_unix: Add sockaddr length checks before accessing sa_family in bind and connect...
Mateusz Jurczyk [Thu, 8 Jun 2017 09:13:36 +0000 (11:13 +0200)]
af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers

Verify that the caller-provided sockaddr structure is large enough to
contain the sa_family field, before accessing it in bind() and connect()
handlers of the AF_UNIX socket. Since neither syscall enforces a minimum
size of the corresponding memory region, very short sockaddrs (zero or
one byte long) result in operating on uninitialized memory while
referencing .sa_family.

Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: add missing SPEED_14000
Joe Perches [Fri, 9 Jun 2017 13:45:32 +0000 (15:45 +0200)]
net: phy: add missing SPEED_14000

Fixes: 0d7e2d2166f6 ("IB/ipoib: add get_link_ksettings in ethtool")
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: vrf: Make add_fib_rules per network namespace flag
David Ahern [Thu, 8 Jun 2017 17:31:11 +0000 (11:31 -0600)]
net: vrf: Make add_fib_rules per network namespace flag

Commit 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")
adds the l3mdev FIB rule the first time a VRF device is created. However,
it only creates the rule once and only in the namespace the first device
is created - which may not be init_net. Fix by using the net_generic
capability to make the add_fib_rules flag per network namespace.

Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")
Reported-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf, tests: fix endianness selection
Daniel Borkmann [Thu, 8 Jun 2017 17:06:25 +0000 (19:06 +0200)]
bpf, tests: fix endianness selection

I noticed that test_l4lb was failing in selftests:

  # ./test_progs
  test_pkt_access:PASS:ipv4 77 nsec
  test_pkt_access:PASS:ipv6 44 nsec
  test_xdp:PASS:ipv4 2933 nsec
  test_xdp:PASS:ipv6 1500 nsec
  test_l4lb:PASS:ipv4 377 nsec
  test_l4lb:PASS:ipv6 544 nsec
  test_l4lb:FAIL:stats 6297600000 200000
  test_tcp_estats:PASS: 0 nsec
  Summary: 7 PASSED, 1 FAILED

Tracking down the issue actually revealed that endianness selection
in bpf_endian.h is broken when compiled with clang with bpf target.
test_pkt_access.c, test_l4lb.c is compiled with __BYTE_ORDER as
__BIG_ENDIAN, test_xdp.c as __LITTLE_ENDIAN! test_l4lb noticeably
fails, because the test accounts bytes via bpf_ntohs(ip6h->payload_len)
and bpf_ntohs(iph->tot_len), and compares them against a defined
value and given a wrong endianness, the test outcome is different,
of course.

Turns out that there are actually two bugs: i) when we do __BYTE_ORDER
comparison with __LITTLE_ENDIAN/__BIG_ENDIAN, then depending on the
include order we see different outcomes. Reason is that __BYTE_ORDER
is undefined due to missing endian.h include. Before we include the
asm/byteorder.h (e.g. through linux/in.h), then __BYTE_ORDER equals
__LITTLE_ENDIAN since both are undefined, after the include which
correctly pulls in linux/byteorder/little_endian.h, __LITTLE_ENDIAN
is defined, but given __BYTE_ORDER is still undefined, we match on
__BYTE_ORDER equals to __BIG_ENDIAN since __BIG_ENDIAN is also
undefined at that point, sigh. ii) But even that would be wrong,
since when compiling the test cases with clang, one can select between
bpfeb and bpfel targets for cross compilation. Hence, we can also not
rely on what the system's endian.h provides, but we need to look at
the compiler's defined endianness. The compiler defines __BYTE_ORDER__,
and we can match __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__,
which also reflects targets bpf (native), bpfel, bpfeb correctly,
thus really only rely on that. After patch:

  # ./test_progs
  test_pkt_access:PASS:ipv4 74 nsec
  test_pkt_access:PASS:ipv6 42 nsec
  test_xdp:PASS:ipv4 2340 nsec
  test_xdp:PASS:ipv6 1461 nsec
  test_l4lb:PASS:ipv4 400 nsec
  test_l4lb:PASS:ipv6 530 nsec
  test_tcp_estats:PASS: 0 nsec
  Summary: 7 PASSED, 0 FAILED

Fixes: 43bcf707ccdc ("bpf: fix _htons occurences in test_progs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoethtool.h: remind to update 802.3ad when adding new speeds
Nicolas Dichtel [Thu, 8 Jun 2017 09:18:13 +0000 (11:18 +0200)]
ethtool.h: remind to update 802.3ad when adding new speeds

Each time a new speed is added, the bonding 802.3ad isn't updated. Add a
comment to remind the developer to update this driver.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobonding: fix 802.3ad support for 14G speed
Nicolas Dichtel [Thu, 8 Jun 2017 09:18:12 +0000 (11:18 +0200)]
bonding: fix 802.3ad support for 14G speed

This patch adds 14 Gbps enum definition, and fixes
aggregated bandwidth calculation based on above slave links.

Fixes: 0d7e2d2166f6 ("IB/ipoib: add get_link_ksettings in ethtool")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobonding: fix 802.3ad support for 5G and 50G speeds
Thibaut Collet [Thu, 8 Jun 2017 09:18:11 +0000 (11:18 +0200)]
bonding: fix 802.3ad support for 5G and 50G speeds

This patch adds [5|50] Gbps enum definition, and fixes
aggregated bandwidth calculation based on above slave links.

Fixes: c9a70d43461d ("net-next: ethtool: Added port speed macros.")
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoopenvswitch: warn about missing first netlink attribute
Nicolas Dichtel [Thu, 8 Jun 2017 08:37:45 +0000 (10:37 +0200)]
openvswitch: warn about missing first netlink attribute

The first netlink attribute (value 0) must always be defined
as none/unspec.

Because we cannot change an existing UAPI, I add a comment to point the
mistake and avoid to propagate it in a new ovs API in the future.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoila_xlat: add missing hash secret initialization
Arnd Bergmann [Thu, 8 Jun 2017 07:54:24 +0000 (09:54 +0200)]
ila_xlat: add missing hash secret initialization

While discussing the possible merits of clang warning about unused initialized
functions, I found one function that was clearly meant to be called but
never actually is.

__ila_hash_secret_init() initializes the hash value for the ila locator,
apparently this is intended to prevent hash collision attacks, but this ends
up being a read-only zero constant since there is no caller. I could find
no indication of why it was never called, the earliest patch submission
for the module already was like this. If my interpretation is right, we
certainly want to backport the patch to stable kernels as well.

I considered adding it to the ila_xlat_init callback, but for best effect
the random data is read as late as possible, just before it is first used.
The underlying net_get_random_once() is already highly optimized to avoid
overhead when called frequently.

Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility")
Cc: stable@vger.kernel.org
Link: https://www.spinics.net/lists/kernel/msg2527243.html
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: Fix build regression in rtl8723bs staging driver.
David S. Miller [Thu, 8 Jun 2017 15:51:59 +0000 (11:51 -0400)]
net: Fix build regression in rtl8723bs staging driver.

drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c: In function ‘rtw_cfg80211_add_monitor_if’:
drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c:2670:10: error: ‘struct net_device’ has no member named ‘destructor’
  mon_ndev->destructor = rtw_ndev_destructor;
          ^

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'netvsc-bug-fixes'
David S. Miller [Thu, 8 Jun 2017 15:45:49 +0000 (11:45 -0400)]
Merge branch 'netvsc-bug-fixes'

Stephen Hemminger says:

====================
netvsc: bug fixes

These are bugfixes for netvsc driver in 4.12.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonetvsc: move filter setting to rndis_device
stephen hemminger [Wed, 7 Jun 2017 22:53:49 +0000 (15:53 -0700)]
netvsc: move filter setting to rndis_device

The work queue and handling of network filter parameters should
be in rndis_device. This gets rid of warning from RCU checks,
eliminates a race and cleans up code.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonetvsc: fix net poll mode
stephen hemminger [Wed, 7 Jun 2017 22:53:48 +0000 (15:53 -0700)]
netvsc: fix net poll mode

The ndo_poll_controller function needs to schedule NAPI to pick
up arriving packets and send completions. Otherwise no data
will ever be received. For simple case of netconsole, it also
will allow send completions to happen.  Without this netpoll
will eventually get stuck.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonetvsc: fix rcu dereference warning from ethtool
stephen hemminger [Wed, 7 Jun 2017 22:53:47 +0000 (15:53 -0700)]
netvsc: fix rcu dereference warning from ethtool

The ethtool info command calls the netvsc get_sset_count with RTNL
but not with RCU. Which causes warning:

drivers/net/hyperv/netvsc_drv.c:1010 suspicious rcu_dereference_check() usage!

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ipv6: Release route when device is unregistering
David Ahern [Wed, 7 Jun 2017 18:26:23 +0000 (12:26 -0600)]
net: ipv6: Release route when device is unregistering

Roopa reported attempts to delete a bond device that is referenced in a
multipath route is hanging:

$ ifdown bond2    # ifupdown2 command that deletes virtual devices
unregister_netdevice: waiting for bond2 to become free. Usage count = 2

Steps to reproduce:
    echo 1 > /proc/sys/net/ipv6/conf/all/ignore_routes_with_linkdown
    ip link add dev bond12 type bond
    ip link add dev bond13 type bond
    ip addr add 2001:db8:2::0/64 dev bond12
    ip addr add 2001:db8:3::0/64 dev bond13
    ip route add 2001:db8:33::0/64 nexthop via 2001:db8:2::2 nexthop via 2001:db8:3::2
    ip link del dev bond12
    ip link del dev bond13

The root cause is the recent change to keep routes on a linkdown. Update
the check to detect when the device is unregistering and release the
route for that case.

Fixes: a1a22c12060e4 ("net: ipv6: Keep nexthop of multipath route on admin down")
Reported-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: Zero ifla_vf_info in rtnl_fill_vfinfo()
Mintz, Yuval [Wed, 7 Jun 2017 18:00:33 +0000 (21:00 +0300)]
net: Zero ifla_vf_info in rtnl_fill_vfinfo()

Some of the structure's fields are not initialized by the
rtnetlink. If driver doesn't set those in ndo_get_vf_config(),
they'd leak memory to user.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
CC: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodecnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb
Mateusz Jurczyk [Wed, 7 Jun 2017 14:14:29 +0000 (16:14 +0200)]
decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb

Verify that the length of the socket buffer is sufficient to cover the
nlmsghdr structure before accessing the nlh->nlmsg_len field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < sizeof(*nlh) expression.

The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.

Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoRevert "decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb"
David S. Miller [Thu, 8 Jun 2017 14:50:18 +0000 (10:50 -0400)]
Revert "decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb"

This reverts commit 85eac2ba35a2dbfbdd5767c7447a4af07444a5b4.

There is an updated version of this fix which we should
use instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: emac: fix and unify emac_mdio functions
Christian Lamparter [Wed, 7 Jun 2017 13:51:16 +0000 (15:51 +0200)]
net: emac: fix and unify emac_mdio functions

emac_mdio_read_link() was not copying the requested phy settings
back into the emac driver's own phy api. This has caused a link
speed mismatch issue for the AR8035 as the emac driver kept
trying to connect with 10/100MBps on a 1GBit/s link.

This patch also unifies shared code between emac_setup_aneg()
and emac_mdio_setup_forced(). And furthermore it removes
a chunk of emac_mdio_init_phy(), that was copying the same
data into itself.

Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: emac: fix reset timeout with AR8035 phy
Christian Lamparter [Wed, 7 Jun 2017 13:51:15 +0000 (15:51 +0200)]
net: emac: fix reset timeout with AR8035 phy

This patch fixes a problem where the AR8035 PHY can't be
detected on an Cisco Meraki MR24, if the ethernet cable is
not connected on boot.

Russell Senior provided steps to reproduce the issue:
|Disconnect ethernet cable, apply power, wait until device has booted,
|plug in ethernet, check for interfaces, no eth0 is listed.
|
|This appears to be a problem during probing of the AR8035 Phy chip.
|When ethernet has no link, the phy detection fails, and eth0 is not
|created. Plugging ethernet later has no effect, because there is no
|interface as far as the kernel is concerned. The relevant part of
|the boot log looks like this:
|this is the failing case:
|
|[    0.876611] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
|[    0.882532] /plb/opb/ethernet@ef600c00: reset timeout
|[    0.888546] /plb/opb/ethernet@ef600c00: can't find PHY!
|and the succeeding case:
|
|[    0.876672] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
|[    0.883952] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:01:..
|[    0.890822] eth0: found Atheros 8035 Gigabit Ethernet PHY (0x01)

Based on the comment and the commit message of
commit 23fbb5a87c56 ("emac: Fix EMAC soft reset on 460EX/GT").
This is because the AR8035 PHY doesn't provide the TX Clock,
if the ethernet cable is not attached. This causes the reset
to timeout and the PHY detection code in emac_init_phy() is
unable to detect the AR8035 PHY. As a result, the emac driver
bails out early and the user left with no ethernet.

In order to stay compatible with existing configurations, the driver
tries the current reset approach at first. Only if the first attempt
timed out, it does perform one more retry with the clock temporarily
switched to the internal source for just the duration of the reset.

LEDE-Bug: #687 <https://bugs.lede-project.org/index.php?do=details&task_id=687>

Cc: Chris Blake <chrisrblake93@gmail.com>
Reported-by: Russell Senior <russell@personaltelco.net>
Fixes: 23fbb5a87c56e98 ("emac: Fix EMAC soft reset on 460EX/GT")
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodecnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb
Mateusz Jurczyk [Wed, 7 Jun 2017 13:14:29 +0000 (15:14 +0200)]
decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb

Verify that the length of the socket buffer is sufficient to cover the
entire nlh->nlmsg_len field before accessing that field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < sizeof(*nlh) expression.

The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.

Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agohsi: Fix build regression due to netdev destructor fix.
David S. Miller [Thu, 8 Jun 2017 14:16:05 +0000 (10:16 -0400)]
hsi: Fix build regression due to netdev destructor fix.

> ../drivers/hsi/clients/ssi_protocol.c:1069:5: error: 'struct net_device' has no member named 'destructor'

Reported-by: Mark Brown <broonie@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: s390: fix up for "Fix inconsistent teardown and release of private netdev state"
Stephen Rothwell [Thu, 8 Jun 2017 09:06:29 +0000 (19:06 +1000)]
net: s390: fix up for "Fix inconsistent teardown and release of private netdev state"

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: Fix inconsistent teardown and release of private netdev state.
David S. Miller [Mon, 8 May 2017 16:52:56 +0000 (12:52 -0400)]
net: Fix inconsistent teardown and release of private netdev state.

Network devices can allocate reasources and private memory using
netdev_ops->ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf, arm64: use separate register for state in stxr
Daniel Borkmann [Wed, 7 Jun 2017 11:45:37 +0000 (13:45 +0200)]
bpf, arm64: use separate register for state in stxr

Will reported that in BPF_XADD we must use a different register in stxr
instruction for the status flag due to otherwise CONSTRAINED UNPREDICTABLE
behavior per architecture. Reference manual says [1]:

  If s == t, then one of the following behaviors must occur:

   * The instruction is UNDEFINED.
   * The instruction executes as a NOP.
   * The instruction performs the store to the specified address, but
     the value stored is UNKNOWN.

Thus, use a different temporary register for the status flag to fix it.

Disassembly extract from test 226/STX_XADD_DW from test_bpf.ko:

  [...]
  0000003c:  c85f7d4b  ldxr x11, [x10]
  00000040:  8b07016b  add x11, x11, x7
  00000044:  c80c7d4b  stxr w12, x11, [x10]
  00000048:  35ffffac  cbnz w12, 0x0000003c
  [...]

  [1] https://static.docs.arm.com/ddi0487/b/DDI0487B_a_armv8_arm.pdf, p.6132

Fixes: 85f68fe89832 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: do not bypass the mvpp22_port_mii_set function
Antoine Ténart [Wed, 7 Jun 2017 06:17:50 +0000 (08:17 +0200)]
net: mvpp2: do not bypass the mvpp22_port_mii_set function

The mvpp22_port_mii_set() function was added by 2697582144dd, but the
function directly returns without doing anything. This return was used
when debugging and wasn't removed before sending the patch. Fix this.

Fixes: 2697582144dd ("net: mvpp2: handle misc PPv2.1/PPv2.2 differences")
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoibmvnic: Return failure on attempted mtu change
John Allen [Tue, 6 Jun 2017 21:55:52 +0000 (16:55 -0500)]
ibmvnic: Return failure on attempted mtu change

Changing the mtu is currently not supported in the ibmvnic driver.

Implement .ndo_change_mtu in the driver so that attempting to use ifconfig
to change the mtu will fail and present the user with an error message.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: fix up hash documentation
Michael S. Tsirkin [Tue, 6 Jun 2017 16:01:37 +0000 (19:01 +0300)]
net: fix up hash documentation

commit 61b905da33 ("net: Rename skb->rxhash to skb->hash")
didn't update the documentation, fix this up.

Cc: Tom Herbert <therbert@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobnx2x: fix pf2vf bulletin DMA mapping leak
Michal Schmidt [Tue, 6 Jun 2017 14:30:31 +0000 (16:30 +0200)]
bnx2x: fix pf2vf bulletin DMA mapping leak

When freeing VF's DMA mappings, an already NULLed pointer was checked
again due to an apparent copy&paste error. Consequently, the pf2vf
bulletin DMA mapping was not freed.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: don't call strlen on non-terminated string in dev_set_alias()
Alexander Potapenko [Tue, 6 Jun 2017 13:56:54 +0000 (15:56 +0200)]
net: don't call strlen on non-terminated string in dev_set_alias()

KMSAN reported a use of uninitialized memory in dev_set_alias(),
which was caused by calling strlcpy() (which in turn called strlen())
on the user-supplied non-terminated string.

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Tue, 6 Jun 2017 21:30:17 +0000 (14:30 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Made TCP congestion control documentation match current reality,
    from Anmol Sarma.

 2) Various build warning and failure fixes from Arnd Bergmann.

 3) Fix SKB list leak in ipv6_gso_segment().

 4) Use after free in ravb driver, from Eugeniu Rosca.

 5) Don't use udp_poll() in ping protocol driver, from Eric Dumazet.

 6) Don't crash in PCI error recovery of cxgb4 driver, from Guilherme
    Piccoli.

 7) _SRC_NAT_DONE_BIT needs to be cleared using atomics, from Liping
    Zhang.

 8) Use after free in vxlan deletion, from Mark Bloch.

 9) Fix ordering of NAPI poll enabled in ethoc driver, from Max
    Filippov.

10) Fix stmmac hangs with TSO, from Niklas Cassel.

11) Fix crash in CALIPSO ipv6, from Richard Haines.

12) Clear nh_flags properly on mpls link up. From Roopa Prabhu.

13) Fix regression in sk_err socket error queue handling, noticed by
    ping applications. From Soheil Hassas Yeganeh.

14) Update mlx4/mlx5 MAINTAINERS information.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits)
  net: stmmac: fix a broken u32 less than zero check
  net: stmmac: fix completely hung TX when using TSO
  net: ethoc: enable NAPI before poll may be scheduled
  net: bridge: fix a null pointer dereference in br_afspec
  ravb: Fix use-after-free on `ifconfig eth0 down`
  net/ipv6: Fix CALIPSO causing GPF with datagram support
  net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value
  Revert "sit: reload iphdr in ipip6_rcv"
  i40e/i40evf: proper update of the page_offset field
  i40e: Fix state flags for bit set and clean operations of PF
  iwlwifi: fix host command memory leaks
  iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265
  iwlwifi: mvm: clear new beacon command template struct
  iwlwifi: mvm: don't fail when removing a key from an inexisting sta
  iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3
  iwlwifi: mvm: fix firmware debug restart recording
  iwlwifi: tt: move ucode_loaded check under mutex
  iwlwifi: mvm: support ibss in dqa mode
  iwlwifi: mvm: Fix command queue number on d0i3 flow
  iwlwifi: mvm: rs: start using LQ command color
  ...

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Tue, 6 Jun 2017 21:28:18 +0000 (14:28 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:

 1) Fix TLB context wrap races, from Pavel Tatashin.

 2) Cure some gcc-7 build issues.

 3) Handle invalid setup_hugepagesz command line values properly, from
    Liam R Howlett.

 4) Copy TSB using the correct address shift for the huge TSB, from Mike
    Kravetz.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: delete old wrap code
  sparc64: new context wrap
  sparc64: add per-cpu mm of secondary contexts
  sparc64: redefine first version
  sparc64: combine activate_mm and switch_mm
  sparc64: reset mm cpumask after wrap
  sparc/mm/hugepages: Fix setup_hugepagesz for invalid values.
  sparc: Machine description indices can vary
  sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
  arch/sparc: support NR_CPUS = 4096
  sparc64: Add __multi3 for gcc 7.x and later.
  sparc64: Fix build warnings with gcc 7.
  arch/sparc: increase CONFIG_NODES_SHIFT on SPARC64 to 5

7 years agocompiler, clang: suppress warning for unused static inline functions
David Rientjes [Tue, 6 Jun 2017 20:36:24 +0000 (13:36 -0700)]
compiler, clang: suppress warning for unused static inline functions

GCC explicitly does not warn for unused static inline functions for
-Wunused-function.  The manual states:

Warn whenever a static function is declared but not defined or
a non-inline static function is unused.

Clang does warn for static inline functions that are unused.

It turns out that suppressing the warnings avoids potentially complex
#ifdef directives, which also reduces LOC.

Suppress the warning for clang.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMerge branch 'sparc64-context-wrap-fixes'
David S. Miller [Tue, 6 Jun 2017 20:45:48 +0000 (13:45 -0700)]
Merge branch 'sparc64-context-wrap-fixes'

Pavel Tatashin says:

====================
sparc64: context wrap fixes

This patch series contains fixes for context wrap: when we are out of
context ids, and need to get a new version.

It fixes memory corruption issues which happen when more than number of
context ids (currently set to 8K) number of processes are started
simultaneously, and processes can get a wrong context.

sparc64: new context wrap:
- contains explanation of new wrap method, and also explanation of races
  that it solves
sparc64: reset mm cpumask after wrap
- explains issue of not reseting cpu mask on a wrap
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc64: delete old wrap code
Pavel Tatashin [Wed, 31 May 2017 15:25:25 +0000 (11:25 -0400)]
sparc64: delete old wrap code

The old method that is using xcall and softint to get new context id is
deleted, as it is replaced by a method of using per_cpu_secondary_mm
without xcall to perform the context wrap.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc64: new context wrap
Pavel Tatashin [Wed, 31 May 2017 15:25:24 +0000 (11:25 -0400)]
sparc64: new context wrap

The current wrap implementation has a race issue: it is called outside of
the ctx_alloc_lock, and also does not wait for all CPUs to complete the
wrap.  This means that a thread can get a new context with a new version
and another thread might still be running with the same context. The
problem is especially severe on CPUs with shared TLBs, like sun4v. I used
the following test to very quickly reproduce the problem:
- start over 8K processes (must be more than context IDs)
- write and read values at a  memory location in every process.

Very quickly memory corruptions start happening, and what we read back
does not equal what we wrote.

Several approaches were explored before settling on this one:

Approach 1:
Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
every process to complete the wrap. (Note: every CPU must WAIT before
leaving smp_new_mmu_context_version_client() until every one arrives).

This approach ends up with deadlocks, as some threads own locks which other
threads are waiting for, and they never receive softint until these threads
exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
deadlock happens.

Approach 2:
Handle wrap right during mondo interrupt. Use etrap/rtrap to enter into
into C code, and issue new versions to every CPU.
This approach adds some overhead to runtime: in switch_mm() we must add
some checks to make sure that versions have not changed due to wrap while
we were loading the new secondary context. (could be protected by PSTATE_IE
but that degrades performance as on M7 and older CPUs as it takes 50 cycles
for each access). Also, we still need a global per-cpu array of MMs to know
where we need to load new contexts, otherwise we can change context to a
thread that is going way (if we received mondo between switch_mm() and
switch_to() time). Finally, there are some issues with window registers in
rtrap() when context IDs are changed during CPU mondo time.

The approach in this patch is the simplest and has almost no impact on
runtime.  We use the array with mm's where last secondary contexts were
loaded onto CPUs and bump their versions to the new generation without
changing context IDs. If a new process comes in to get a context ID, it
will go through get_new_mmu_context() because of version mismatch. But the
running processes do not need to be interrupted. And wrap is quicker as we
do not need to xcall and wait for everyone to receive and complete wrap.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc64: add per-cpu mm of secondary contexts
Pavel Tatashin [Wed, 31 May 2017 15:25:23 +0000 (11:25 -0400)]
sparc64: add per-cpu mm of secondary contexts

The new wrap is going to use information from this array to figure out
mm's that currently have valid secondary contexts setup.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc64: redefine first version
Pavel Tatashin [Wed, 31 May 2017 15:25:22 +0000 (11:25 -0400)]
sparc64: redefine first version

CTX_FIRST_VERSION defines the first context version, but also it defines
first context. This patch redefines it to only include the first context
version.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc64: combine activate_mm and switch_mm
Pavel Tatashin [Wed, 31 May 2017 15:25:21 +0000 (11:25 -0400)]
sparc64: combine activate_mm and switch_mm

The only difference between these two functions is that in activate_mm we
unconditionally flush context. However, there is no need to keep this
difference after fixing a bug where cpumask was not reset on a wrap. So, in
this patch we combine these.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc64: reset mm cpumask after wrap
Pavel Tatashin [Wed, 31 May 2017 15:25:20 +0000 (11:25 -0400)]
sparc64: reset mm cpumask after wrap

After a wrap (getting a new context version) a process must get a new
context id, which means that we would need to flush the context id from
the TLB before running for the first time with this ID on every CPU. But,
we use mm_cpumask to determine if this process has been running on this CPU
before, and this mask is not reset after a wrap. So, there are two possible
fixes for this issue:

1. Clear mm cpumask whenever mm gets a new context id
2. Unconditionally flush context every time process is running on a CPU

This patch implements the first solution

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc/mm/hugepages: Fix setup_hugepagesz for invalid values.
Liam R. Howlett [Tue, 30 May 2017 19:45:00 +0000 (15:45 -0400)]
sparc/mm/hugepages: Fix setup_hugepagesz for invalid values.

hugetlb_bad_size needs to be called on invalid values.  Also change the
pr_warn to a pr_err to better align with other platforms.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc: Machine description indices can vary
James Clarke [Mon, 29 May 2017 19:17:56 +0000 (20:17 +0100)]
sparc: Machine description indices can vary

VIO devices were being looked up by their index in the machine
description node block, but this often varies over time as devices are
added and removed. Instead, store the ID and look up using the type,
config handle and ID.

Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112541
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc64: mm: fix copy_tsb to correctly copy huge page TSBs
Mike Kravetz [Fri, 2 Jun 2017 21:51:12 +0000 (14:51 -0700)]
sparc64: mm: fix copy_tsb to correctly copy huge page TSBs

When a TSB grows beyond its current capacity, a new TSB is allocated
and copy_tsb is called to copy entries from the old TSB to the new.
A hash shift based on page size is used to calculate the index of an
entry in the TSB.  copy_tsb has hard coded PAGE_SHIFT in these
calculations.  However, for huge page TSBs the value REAL_HPAGE_SHIFT
should be used.  As a result, when copy_tsb is called for a huge page
TSB the entries are placed at the incorrect index in the newly
allocated TSB.  When doing hardware table walk, the MMU does not
match these entries and we end up in the TSB miss handling code.
This code will then create and write an entry to the correct index
in the TSB.  We take a performance hit for the table walk miss and
recreation of these entries.

Pass a new parameter to copy_tsb that is the page size shift to be
used when copying the TSB.

Suggested-by: Anthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoarch/sparc: support NR_CPUS = 4096
Jane Chu [Tue, 6 Jun 2017 20:32:29 +0000 (14:32 -0600)]
arch/sparc: support NR_CPUS = 4096

Linux SPARC64 limits NR_CPUS to 4064 because init_cpu_send_mondo_info()
only allocates a single page for NR_CPUS mondo entries. Thus we cannot
use all 4096 CPUs on some SPARC platforms.

To fix, allocate (2^order) pages where order is set according to the size
of cpu_list for possible cpus. Since cpu_list_pa and cpu_mondo_block_pa
are not used in asm code, there are no imm13 offsets from the base PA
that will break because they can only reach one page.

Orabug: 25505750

Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Atish Patra <atish.patra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: stmmac: fix a broken u32 less than zero check
Colin Ian King [Tue, 6 Jun 2017 13:10:49 +0000 (14:10 +0100)]
net: stmmac: fix a broken u32 less than zero check

The check that queue is less or equal to zero is always true
because queue is a u32; queue is decremented and will wrap around
and never go -ve. Fix this by making queue an int.

Detected by CoverityScan, CID#1428988 ("Unsigned compared against 0")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: stmmac: fix completely hung TX when using TSO
Niklas Cassel [Tue, 6 Jun 2017 07:25:00 +0000 (09:25 +0200)]
net: stmmac: fix completely hung TX when using TSO

stmmac_tso_allocator can fail to set the Last Descriptor bit
on a descriptor that actually was the last descriptor.

This happens when the buffer of the last descriptor ends
up having a size of exactly TSO_MAX_BUFF_SIZE.

When the IP eventually reaches the next last descriptor,
which actually has the bit set, the DMA will hang.

When the DMA hangs, we get a tx timeout, however,
since stmmac does not do a complete reset of the IP
in stmmac_tx_timeout, we end up in a state with
completely hung TX.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ethoc: enable NAPI before poll may be scheduled
Max Filippov [Tue, 6 Jun 2017 01:31:16 +0000 (18:31 -0700)]
net: ethoc: enable NAPI before poll may be scheduled

ethoc_reset enables device interrupts, ethoc_interrupt may schedule a
NAPI poll before NAPI is enabled in the ethoc_open, which results in
device being unable to send or receive anything until it's closed and
reopened. In case the device is flooded with ingress packets it may be
unable to recover at all.
Move napi_enable above ethoc_reset in the ethoc_open to fix that.

Fixes: a1702857724f ("net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: bridge: fix a null pointer dereference in br_afspec
Nikolay Aleksandrov [Mon, 5 Jun 2017 22:26:24 +0000 (01:26 +0300)]
net: bridge: fix a null pointer dereference in br_afspec

We might call br_afspec() with p == NULL which is a valid use case if
the action is on the bridge device itself, but the bridge tunnel code
dereferences the p pointer without checking, so check if p is null
first.

Reported-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoravb: Fix use-after-free on `ifconfig eth0 down`
Eugeniu Rosca [Mon, 5 Jun 2017 22:08:10 +0000 (00:08 +0200)]
ravb: Fix use-after-free on `ifconfig eth0 down`

Commit a47b70ea86bd ("ravb: unmap descriptors when freeing rings") has
introduced the issue seen in [1] reproduced on H3ULCB board.

Fix this by relocating the RX skb ringbuffer free operation, so that
swiotlb page unmapping can be done first. Freeing of aligned TX buffers
is not relevant to the issue seen in [1]. Still, reposition TX free
calls as well, to have all kfree() operations performed consistently
_after_ dma_unmap_*()/dma_free_*().

[1] Console screenshot with the problem reproduced:

salvator-x login: root
root@salvator-x:~# ifconfig eth0 up
Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: \
       attached PHY driver [Micrel KSZ9031 Gigabit PHY]   \
       (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=235)
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
root@salvator-x:~#
root@salvator-x:~# ifconfig eth0 down

==================================================================
BUG: KASAN: use-after-free in swiotlb_tbl_unmap_single+0xc4/0x35c
Write of size 1538 at addr ffff8006d884f780 by task ifconfig/1649

CPU: 0 PID: 1649 Comm: ifconfig Not tainted 4.12.0-rc4-00004-g112eb07287d1 #32
Hardware name: Renesas H3ULCB board based on r8a7795 (DT)
Call trace:
[<ffff20000808f11c>] dump_backtrace+0x0/0x3a4
[<ffff20000808f4d4>] show_stack+0x14/0x1c
[<ffff20000865970c>] dump_stack+0xf8/0x150
[<ffff20000831f8b0>] print_address_description+0x7c/0x330
[<ffff200008320010>] kasan_report+0x2e0/0x2f4
[<ffff20000831eac0>] check_memory_region+0x20/0x14c
[<ffff20000831f054>] memcpy+0x48/0x68
[<ffff20000869ed50>] swiotlb_tbl_unmap_single+0xc4/0x35c
[<ffff20000869fcf4>] unmap_single+0x90/0xa4
[<ffff20000869fd14>] swiotlb_unmap_page+0xc/0x14
[<ffff2000080a2974>] __swiotlb_unmap_page+0xcc/0xe4
[<ffff2000088acdb8>] ravb_ring_free+0x514/0x870
[<ffff2000088b25dc>] ravb_close+0x288/0x36c
[<ffff200008aaf8c4>] __dev_close_many+0x14c/0x174
[<ffff200008aaf9b4>] __dev_close+0xc8/0x144
[<ffff200008ac2100>] __dev_change_flags+0xd8/0x194
[<ffff200008ac221c>] dev_change_flags+0x60/0xb0
[<ffff200008ba2dec>] devinet_ioctl+0x484/0x9d4
[<ffff200008ba7b78>] inet_ioctl+0x190/0x194
[<ffff200008a78c44>] sock_do_ioctl+0x78/0xa8
[<ffff200008a7a128>] sock_ioctl+0x110/0x3c4
[<ffff200008365a70>] vfs_ioctl+0x90/0xa0
[<ffff200008365dbc>] do_vfs_ioctl+0x148/0xc38
[<ffff2000083668f0>] SyS_ioctl+0x44/0x74
[<ffff200008083770>] el0_svc_naked+0x24/0x28

The buggy address belongs to the page:
page:ffff7e001b6213c0 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x4000000000000000()
raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: 0000000000000000 ffff7e001b6213e0 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8006d884f680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff8006d884f700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff8006d884f780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                   ^
 ffff8006d884f800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff8006d884f880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
Disabling lock debugging due to kernel taint
root@salvator-x:~#

Fixes: a47b70ea86bd ("ravb: unmap descriptors when freeing rings")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/ipv6: Fix CALIPSO causing GPF with datagram support
Richard Haines [Mon, 5 Jun 2017 15:44:40 +0000 (16:44 +0100)]
net/ipv6: Fix CALIPSO causing GPF with datagram support

When using CALIPSO with IPPROTO_UDP it is possible to trigger a GPF as the
IP header may have moved.

Also update the payload length after adding the CALIPSO option.

Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: stmmac: ensure jumbo_frm error return is correctly checked for -ve value
Colin Ian King [Mon, 5 Jun 2017 09:04:52 +0000 (10:04 +0100)]
net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value

The current comparison of entry < 0 will never be true since entry is an
unsigned integer. Make entry an int to ensure -ve error return values
from the call to jumbo_frm are correctly being caught.

Detected by CoverityScan, CID#1238760 ("Macro compares unsigned to 0")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'wireless-drivers-for-davem-2017-06-06' of git://git.kernel.org/pub/scm...
David S. Miller [Tue, 6 Jun 2017 16:53:20 +0000 (12:53 -0400)]
Merge tag 'wireless-drivers-for-davem-2017-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.12

It has been a slow start of cycle and this the first set of fixes for
4.12. Nothing really major here.

wcn36xx

* fix an issue with module reload

brcmfmac

* fix aligment regression on 64 bit systems

iwlwifi

* fixes for memory leaks, runtime PM, memory initialisation and other
  smaller problems

* fix IBSS on devices using DQA mode (7260 and up)

* fix the minimum firmware API requirement for 7265D, 3168, 8000 and
  8265
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'media/v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Tue, 6 Jun 2017 16:37:44 +0000 (09:37 -0700)]
Merge tag 'media/v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
 "Some bug fixes:

   - Don't fail build if atomisp has warnings

   - Some CEC Kconfig changes to allow it to be used by DRM without
     media dependencies

   - A race fix at RC initialization code

   - A driver fix at rainshadow-cec

  IMHO, the one that affects most people in this series is a build fix:
  if you try to build the Kernel with W=1 or using gcc7 and
  all[yes|mod]config, build will fail due to -Werror at atomisp
  makefiles"

* tag 'media/v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] rc-core: race condition during ir_raw_event_register()
  [media] cec: drop MEDIA_CEC_DEBUG
  [media] cec: rename MEDIA_CEC_NOTIFIER to CEC_NOTIFIER
  [media] cec: select CEC_CORE instead of depend on it
  [media] rainshadow-cec: ensure exit_loop is intialized
  [media] atomisp: don't treat warnings as errors

7 years agoMerge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net...
David S. Miller [Tue, 6 Jun 2017 16:12:57 +0000 (12:12 -0400)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2017-06-06

This series contains fixes to i40e and i40evf only.

Mauro S. M. Rodrigues fixes a flood in the kernel log which was introduced
in a previous commit because of a mistaken substitution of __I40E_VSI_DOWN
instead of __I40E_DOWN when testing the state of the PF.

Björn Töpel fixes an issue introduced in a previous commit where the
offset was incorrect and could lead to data corruption for architectures
using PAGE_SIZE larger than 8191.  Fixed the issue by updating the
page_offset correctly using the proper setting for truesize.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoRevert "sit: reload iphdr in ipip6_rcv"
David S. Miller [Tue, 6 Jun 2017 15:34:06 +0000 (11:34 -0400)]
Revert "sit: reload iphdr in ipip6_rcv"

This reverts commit b699d0035836f6712917a41e7ae58d84359b8ff9.

As per Eric Dumazet, the pskb_may_pull() is a NOP in this
particular case, so the 'iph' reload is unnecessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoi40e/i40evf: proper update of the page_offset field
Björn Töpel [Mon, 15 May 2017 04:52:00 +0000 (06:52 +0200)]
i40e/i40evf: proper update of the page_offset field

In f8b45b74cc62 ("i40e/i40evf: Use build_skb to build frames")
i40e_build_skb updates the page_offset field with an incorrect offset,
which can lead to data corruption. This patch updates page_offset
correctly, by properly setting truesize.

Note that the bug only appears on architectures where PAGE_SIZE is
8192 or larger.

Fixes: f8b45b74cc62 ("i40e/i40evf: Use build_skb to build frames")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoi40e: Fix state flags for bit set and clean operations of PF
Mauro S. M. Rodrigues [Sat, 13 May 2017 02:26:56 +0000 (23:26 -0300)]
i40e: Fix state flags for bit set and clean operations of PF

Commit 0da36b9774cc ("i40e: use DECLARE_BITMAP for state fields")
introduced changes in the way i40e works with state flags converting
them to bitmaps using kernel bitmap API. This change introduced a
regression due to a mistaken substitution using __I40E_VSI_DOWN instead
of __I40E_DOWN when testing state of a PF at i40e_reset_subtask()
function. This caused a flood in the kernel log with the follow message:

[49.013] i40e 0002:01:00.0: bad reset request 0x00000020

Commit d19cb64b9222 ("i40e: separate PF and VSI state flags")
also introduced some misuse of the VSI and PF flags, so both could be
considered as the offenders.

This patch simply fixes the flags where it makes sense by changing
__I40E_VSI_DOWN to __I40E_DOWN.

Fixes: 0da36b9774cc ("i40e: use DECLARE_BITMAP for state fields")
Fixes: d19cb64b9222 ("i40e: separate PF and VSI state flags")
Reviewed-by: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
7 years agoMerge branch 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj...
Linus Torvalds [Mon, 5 Jun 2017 22:37:03 +0000 (15:37 -0700)]
Merge branch 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
 "Two cgroup fixes. One to address RCU delay of cpuset removal affecting
  userland visible behaviors. The other fixes a race condition between
  controller disable and cgroup removal"

* 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cpuset: consider dying css as offline
  cgroup: Prevent kill_css() from being called more than once

7 years agoMerge branch 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj...
Linus Torvalds [Mon, 5 Jun 2017 22:31:14 +0000 (15:31 -0700)]
Merge branch 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

Pull libata fixes from Tejun Heo:

 - Revert of sata_mv devm_ioremap_resource() conversion. It made init
   fail if there are overlapping resources which led to detection
   failures on some setups.

 - A workaround for an Acer laptop which sometimes reports corrupt port
   map.

 - Other non-critical fixes.

* 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: fix error checking in in ata_parse_force_one()
  Revert "ata: sata_mv: Convert to devm_ioremap_resource()"
  ata: libahci: properly propagate return value of platform_get_irq()
  ata: sata_rcar: Handle return value of clk_prepare_enable
  ahci: Acer SA5-271 SSD Not Detected Fix

7 years agoMerge tag 'iwlwifi-for-kalle-2017-06-05' of git://git.kernel.org/pub/scm/linux/kernel...
Kalle Valo [Mon, 5 Jun 2017 19:21:25 +0000 (22:21 +0300)]
Merge tag 'iwlwifi-for-kalle-2017-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes

Fixes for 4.12:

* Some memory leaks;
* IBSS support;
* Some bugzilla bugs;
* Some runtime PM fixes;
* Rate-scaling issues;
* Some locking problems;

7 years agoiwlwifi: fix host command memory leaks
Shahar S Matityahu [Thu, 6 Apr 2017 10:35:38 +0000 (13:35 +0300)]
iwlwifi: fix host command memory leaks

Sending host command with CMD_WANT_SKB flag demands the release of the
response buffer with iwl_free_resp function.
The patch adds the memory release in all the relevant places

Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
7 years agoiwlwifi: fix min API version for 7265D, 3168, 8000 and 8265
Luca Coelho [Tue, 25 Apr 2017 07:18:10 +0000 (10:18 +0300)]
iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265

In a previous commit, we removed support for API versions earlier than
22 for these NICs.  By mistake, the *_UCODE_API_MIN definitions were
set to 17.  Fix that.

Fixes: 4b87e5af638b ("iwlwifi: remove support for fw older than -17 and -22")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
7 years agoiwlwifi: mvm: clear new beacon command template struct
Johannes Berg [Fri, 31 Mar 2017 08:47:35 +0000 (10:47 +0200)]
iwlwifi: mvm: clear new beacon command template struct

Clear the struct so that all reserved fields are zero when we
send the struct down to the device.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
7 years agoiwlwifi: mvm: don't fail when removing a key from an inexisting sta
Luca Coelho [Thu, 30 Mar 2017 09:04:47 +0000 (12:04 +0300)]
iwlwifi: mvm: don't fail when removing a key from an inexisting sta

The iwl_mvm_remove_sta_key() function handles removing a key when the
sta doesn't exist anymore.  Mistakenly, this was changed to return an
error while fixing another bug.

If the mvm_sta doesn't exist, we continue normally, but just don't try
to remove the igtk key.

Fixes: cd4d23c1ea9b ("iwlwifi: mvm: Fix removal of IGTK")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
7 years agoiwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3
Luca Coelho [Fri, 24 Mar 2017 09:01:45 +0000 (11:01 +0200)]
iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3

We only need to handle d0i3 entry and exit during suspend resume if
system_pm is set to IWL_PLAT_PM_MODE_D0I3, otherwise d0i3 entry
failures will cause suspend to fail.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=194791

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
7 years agoiwlwifi: mvm: fix firmware debug restart recording
Emmanuel Grumbach [Wed, 29 Mar 2017 07:21:09 +0000 (10:21 +0300)]
iwlwifi: mvm: fix firmware debug restart recording

When we want to stop the recording of the firmware debug
and restart it later without reloading the firmware we
don't need to resend the configuration that comes with
host commands.
Sending those commands confused the hardware and led to
an NMI 0x66.

Change the flow as following:
* read the relevant registers (DBGC_IN_SAMPLE, DBGC_OUT_CTRL)
* clear those registers
* wait for the hardware to complete its write to the buffer
* get the data
* restore the value of those registers (to restart the
  recording)

For early start (where the configuration is already
compiled in the firmware), we don't need to set those
registers after the firmware has been loaded, but only
when we want to restart the recording without having
restarted the firmware.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
7 years agoiwlwifi: tt: move ucode_loaded check under mutex
Johannes Berg [Wed, 22 Mar 2017 21:00:10 +0000 (22:00 +0100)]
iwlwifi: tt: move ucode_loaded check under mutex

The ucode_loaded check should be under the mutex, since it can
otherwise change state after we looked at it and before we got
the mutex. Fix that.

Fixes: 5c89e7bc557e ("iwlwifi: mvm: add registration to cooling device")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
7 years agoiwlwifi: mvm: support ibss in dqa mode
Liad Kaufman [Tue, 21 Mar 2017 15:13:16 +0000 (17:13 +0200)]
iwlwifi: mvm: support ibss in dqa mode

Allow working IBSS also when working in DQA mode.
This is done by setting it to treat the queues the
same as a BSS AP treats the queues.

Fixes: 7948b87308a4 ("iwlwifi: mvm: enable dynamic queue allocation mode")
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
7 years agoiwlwifi: mvm: Fix command queue number on d0i3 flow
Haim Dreyfuss [Thu, 16 Mar 2017 15:26:03 +0000 (17:26 +0200)]
iwlwifi: mvm: Fix command queue number on d0i3 flow

During d0i3 flow we flush all the queue except from the command queue.
Currently, in this flow the command queue is hard coded to 9.
In DQA the command queue number has changed from 9 to 0.
Fix that.

This fixes a problem in runtime PM resume flow.

Fixes: 097129c9e625 ("iwlwifi: mvm: move cmd queue to be #0 in dqa mode")
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
7 years agoiwlwifi: mvm: rs: start using LQ command color
Gregory Greenman [Mon, 6 Mar 2017 09:15:41 +0000 (11:15 +0200)]
iwlwifi: mvm: rs: start using LQ command color

Up until now, the driver was comparing the rate reported by the FW and
the rate of the latest LQ command to avoid processing data belonging
to the old LQ command. Recently, FW changed the meaning of the initial
rate field in tx response and it holds the actual rate (which is not
necessarily the initial rate of LQ's rate table). Use instead LQ cmd
color to be able to filter out tx responses/BA notifications which
where sent during earlier LQ commands' time frame.

This fixes some throughput degradation in noisy environments.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
7 years agosparc64: Add __multi3 for gcc 7.x and later.
David S. Miller [Mon, 5 Jun 2017 18:28:57 +0000 (11:28 -0700)]
sparc64: Add __multi3 for gcc 7.x and later.

Reported-by: Waldemar Brodkorb <wbx@openadk.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Linus Torvalds [Mon, 5 Jun 2017 18:19:40 +0000 (11:19 -0700)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
 "Three fixes this time around:

   - Two fixes for noMMU, fixing the decompressor header layout, and
     preventing a build error with some configurations.

   - Fixing the hyp-stub updates that went in during the merge window
     for platforms that use MCPM"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8677/1: boot/compressed: fix decompressor header layout for v7-M
  ARM: 8676/1: NOMMU: provide pgprot_device() macro
  ARM: 8675/1: MCPM: ensure not to enter __hyp_soft_restart from loopback and cpu_power_down

7 years agonet/mlx4: Check if Granular QoS per VF has been enabled before updating QP qos_vport
Ido Shamay [Mon, 5 Jun 2017 07:44:56 +0000 (10:44 +0300)]
net/mlx4: Check if Granular QoS per VF has been enabled before updating QP qos_vport

The Granular QoS per VF feature must be enabled in FW before it can be
used.

Thus, the driver cannot modify a QP's qos_vport value (via the UPDATE_QP FW
command) if the feature has not been enabled -- the FW returns an error if
this is attempted.

Fixes: 08068cd5683f ("net/mlx4: Added qos_vport QP configuration in VST mode")
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: fix kernel-doc warnings
Randy Dunlap [Mon, 5 Jun 2017 02:46:53 +0000 (19:46 -0700)]
net: phy: fix kernel-doc warnings

Fix kernel-doc warnings (typo) in drivers/net/phy/phy.c:

..//drivers/net/phy/phy.c:259: warning: No description found for parameter 'features'
..//drivers/net/phy/phy.c:259: warning: Excess function parameter 'feature' description in 'phy_lookup_setting'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodevlink: fix potential memort leak
Haishuang Yan [Mon, 5 Jun 2017 00:57:21 +0000 (08:57 +0800)]
devlink: fix potential memort leak

We must free allocated skb when genlmsg_put() return fails.

Fixes: 1555d204e743 ("devlink: Support for pipeline debug (dpipe)")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: Update TCP congestion control documentation
Anmol Sarma [Sat, 3 Jun 2017 12:10:54 +0000 (17:40 +0530)]
net: Update TCP congestion control documentation

Update tcp.txt to fix mandatory congestion control ops and default
CCA selection. Also, fix comment in tcp.h for undo_cwnd.

Signed-off-by: Anmol Sarma <me@anmolsarma.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoARM: 8677/1: boot/compressed: fix decompressor header layout for v7-M
Ard Biesheuvel [Wed, 24 May 2017 14:31:57 +0000 (15:31 +0100)]
ARM: 8677/1: boot/compressed: fix decompressor header layout for v7-M

As reported by Patrice, the header layout of the decompressor is
incorrect when building for v7-M. In this case, the __nop macro
resolves to 'mov r0, r0', which is emitted as a narrow encoding,
resulting in the header data fields to end up at lower offsets than
required.

Given the variety of targets we need to support with the same code,
the startup sequence is a bit of a jumble, and uses instructions
and macros whose encoding widths cannot be specified (badr), or only
exist in a narrow encoding (bx)

So force the use of a wide encoding in __nop, and replace the start
sequence with a simple jump to the label marking the start of code,
preceded by a Thumb2 mode switch if required (using explicit wide
encodings where appropriate). The label itself can be moved to the
start of code [where it belongs] due to the larger range of branch
instructions as compared to adr instructions.

Reported-by: Patrice CHOTARD <patrice.chotard@st.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
7 years agoARM: 8676/1: NOMMU: provide pgprot_device() macro
Vladimir Murzin [Wed, 24 May 2017 09:30:18 +0000 (10:30 +0100)]
ARM: 8676/1: NOMMU: provide pgprot_device() macro

NOMMU build leads to the following error:

  CC      drivers/pci/mmap.o
drivers/pci/mmap.c: In function 'pci_mmap_resource_range':
drivers/pci/mmap.c:60:3: error: implicit declaration of function 'pgprot_device' [-Werror=implicit-function-declaration]
   vma->vm_page_prot = pgprot_device(vma->vm_page_prot);
   ^

cc1: some warnings being treated as errors
scripts/Makefile.build:302: recipe for target 'drivers/pci/mmap.o' failed
make[2]: *** [drivers/pci/mmap.o] Error 1
scripts/Makefile.build:561: recipe for target 'drivers/pci' failed
make[1]: *** [drivers/pci] Error 2
Makefile:1016: recipe for target 'drivers' failed
make: *** [drivers] Error 2

Fix it with support of pgprot_device() macro for NOMMU.

Fixes: 00d2904ffeac ("ARM/PCI: Use generic pci_mmap_resource_range()")
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
7 years agonet/mlx4: Fix the check in attaching steering rules
Talat Batheesh [Sun, 4 Jun 2017 11:30:07 +0000 (14:30 +0300)]
net/mlx4: Fix the check in attaching steering rules

Our previous patch (cited below) introduced a regression
for RAW Eth QPs.

Fix it by checking if the QP number provided by user-space
exists, hence allowing steering rules to be added for valid
QPs only.

Fixes: 89c557687a32 ("net/mlx4_en: Avoid adding steering rules with invalid ring")
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosit: reload iphdr in ipip6_rcv
Haishuang Yan [Sun, 4 Jun 2017 06:43:43 +0000 (14:43 +0800)]
sit: reload iphdr in ipip6_rcv

Since iptunnel_pull_header() can call pskb_may_pull(),
we must reload any pointer that was related to skb->head.

Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: ping: do not abuse udp_poll()
Eric Dumazet [Sat, 3 Jun 2017 16:29:25 +0000 (09:29 -0700)]
net: ping: do not abuse udp_poll()

Alexander reported various KASAN messages triggered in recent kernels

The problem is that ping sockets should not use udp_poll() in the first
place, and recent changes in UDP stack finally exposed this old bug.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Solar Designer <solar@openwall.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Acked-By: Lorenzo Colitti <lorenzo@google.com>
Tested-By: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: Fix stale cpu_switch reference after unbind then bind
Florian Fainelli [Sat, 3 Jun 2017 05:05:23 +0000 (22:05 -0700)]
net: dsa: Fix stale cpu_switch reference after unbind then bind

Commit 9520ed8fb841 ("net: dsa: use cpu_switch instead of ds[0]")
replaced the use of dst->ds[0] with dst->cpu_switch since that is
functionally equivalent, however, we can now run into an use after free
scenario after unbinding then rebinding the switch driver.

The use after free happens because we do correctly initialize
dst->cpu_switch the first time we probe in dsa_cpu_parse(), then we
unbind the driver: dsa_dst_unapply() is called, and we rebind again.
dst->cpu_switch now points to a freed "ds" structure, and so when we
finally dereference it in dsa_cpu_port_ethtool_setup(), we oops.

To fix this, simply set dst->cpu_switch to NULL in dsa_dst_unapply()
which guarantees that we always correctly re-assign dst->cpu_switch in
dsa_cpu_parse().

Fixes: 9520ed8fb841 ("net: dsa: use cpu_switch instead of ds[0]")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: Fix leak in ipv6_gso_segment().
David S. Miller [Mon, 5 Jun 2017 01:41:10 +0000 (21:41 -0400)]
ipv6: Fix leak in ipv6_gso_segment().

If ip6_find_1stfragopt() fails and we return an error we have to free
up 'segs' because nobody else is going to.

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agogeneve: fix needed_headroom and max_mtu for collect_metadata
Eric Garver [Fri, 2 Jun 2017 18:54:10 +0000 (14:54 -0400)]
geneve: fix needed_headroom and max_mtu for collect_metadata

Since commit 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
when using COLLECT_METADATA geneve devices are created with too small of
a needed_headroom and too large of a max_mtu. This is because
ip_tunnel_info_af() is not valid with the device level info when using
COLLECT_METADATA and we mistakenly fall into the IPv4 case.

For COLLECT_METADATA, always use the worst case of ipv6 since both
sockets are created.

Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosock: reset sk_err when the error queue is empty
Soheil Hassas Yeganeh [Fri, 2 Jun 2017 16:38:22 +0000 (12:38 -0400)]
sock: reset sk_err when the error queue is empty

Prior to f5f99309fa74 (sock: do not set sk_err in
sock_dequeue_err_skb), sk_err was reset to the error of
the skb on the head of the error queue.

Applications, most notably ping, are relying on this
behavior to reset sk_err for ICMP packets.

Set sk_err to the ICMP error when there is an ICMP packet
at the head of the error queue.

Fixes: f5f99309fa74 (sock: do not set sk_err in sock_dequeue_err_skb)
Reported-by: Cyril Hrubis <chrubis@suse.cz>
Tested-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoamd-xgbe: use PAGE_ALLOC_COSTLY_ORDER in xgbe_map_rx_buffer
Michal Hocko [Fri, 2 Jun 2017 15:54:08 +0000 (17:54 +0200)]
amd-xgbe: use PAGE_ALLOC_COSTLY_ORDER in xgbe_map_rx_buffer

xgbe_map_rx_buffer is rather confused about what PAGE_ALLOC_COSTLY_ORDER
means. It uses PAGE_ALLOC_COSTLY_ORDER-1 assuming that
PAGE_ALLOC_COSTLY_ORDER is the first costly order which is not the case
actually because orders larger than that are costly. And even that
applies only to sleeping allocations which is not the case here. We
simply do not perform any costly operations like reclaim or compaction
for those. Simplify the code by dropping the order calculation and use
PAGE_ALLOC_COSTLY_ORDER directly.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoip6_tunnel: fix traffic class routing for tunnels
Liam McBirnie [Thu, 1 Jun 2017 05:36:01 +0000 (15:36 +1000)]
ip6_tunnel: fix traffic class routing for tunnels

ip6_route_output() requires that the flowlabel contains the traffic
class for policy routing.

Commit 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on
encapsulated packets") removed the code which previously added the
traffic class to the flowlabel.

The traffic class is added here because only route lookup needs the
flowlabel to contain the traffic class.

Fixes: 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
Signed-off-by: Liam McBirnie <liam.mcbirnie@boeing.com>
Acked-by: Peter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoLinux 4.12-rc4
Linus Torvalds [Sun, 4 Jun 2017 23:47:43 +0000 (16:47 -0700)]
Linux 4.12-rc4

7 years agofs/ufs: Set UFS default maximum bytes per file
Richard Narron [Sun, 4 Jun 2017 23:23:18 +0000 (16:23 -0700)]
fs/ufs: Set UFS default maximum bytes per file

This fixes a problem with reading files larger than 2GB from a UFS-2
file system:

    https://bugzilla.kernel.org/show_bug.cgi?id=195721

The incorrect UFS s_maxsize limit became a problem as of commit
c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
which started using s_maxbytes to avoid a page index overflow in
do_generic_file_read().

That caused files to be truncated on UFS-2 file systems because the
default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it.

Here I simply increase the default to a common value used by other file
systems.

Signed-off-by: Richard Narron <comet.berkeley@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Will B <will.brokenbourgh2877@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: <stable@vger.kernel.org> # v4.9 and backports of c2a9737f45e2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agonet: qcom/emac: do not use hardware mdio automatic polling
Timur Tabi [Thu, 1 Jun 2017 21:08:13 +0000 (16:08 -0500)]
net: qcom/emac: do not use hardware mdio automatic polling

Use software polling (PHY_POLL) to check for link state changes instead
of relying on the EMAC's hardware polling feature.  Some PHY drivers
are unable to get a functioning link because the HW polling is not
robust enough.

The EMAC is able to poll the PHY on the MDIO bus looking for link state
changes (via the Link Status bit in the Status Register at address 0x1).
When the link state changes, the EMAC triggers an interrupt and tells the
driver what the new state is.  The feature eliminates the need for
software to poll the MDIO bus.

Unfortunately, this feature is incompatible with phylib, because it
ignores everything that the PHY core and PHY drivers are trying to do.
In particular:

1. It assumes a compatible register set, so PHYs with different registers
   may not work.

2. It doesn't allow for hardware errata that have work-arounds implemented
   in the PHY driver.

3. It doesn't support multiple register pages. If the PHY core switches
   the register set to another page, the EMAC won't know the page has
   changed and will still attempt to read the same PHY register.

4. It only checks the copper side of the link, not the SGMII side.  Some
   PHY drivers (e.g. at803x) may also check the SGMII side, and
   report the link as not ready during autonegotiation if the SGMII link
   is still down.  Phylib then waits for another interrupt to query
   the PHY again, but the EMAC won't send another interrupt because it
   thinks the link is up.

Cc: stable@vger.kernel.org # 4.11.x
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Sun, 4 Jun 2017 18:56:53 +0000 (11:56 -0700)]
Merge tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Bugfixes include:

   - Fix a typo in commit e092693443b ("NFS append COMMIT after
     synchronous COPY") that breaks copy offload

   - Fix the connect error propagation in xs_tcp_setup_socket()

   - Fix a lock leak in nfs40_walk_client_list

   - Verify that pNFS requests lie within the offset range of the layout
     segment"

* tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: Mark unnecessarily extern functions as static
  SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()
  NFSv4.0: Fix a lock leak in nfs40_walk_client_list
  pnfs: Fix the check for requests in range of layout segment
  xprtrdma: Delete an error message for a failed memory allocation in xprt_rdma_bc_setup()
  pNFS/flexfiles: missing error code in ff_layout_alloc_lseg()
  NFS fix COMMIT after COPY

7 years agoMerge tag 'tty-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sun, 4 Jun 2017 18:41:41 +0000 (11:41 -0700)]
Merge tag 'tty-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty fix from Greg KH:
 "Here is a single tty core fix for 4.12-rc4. It reverts a patch that a
  lot of people reported as causing lockdep and other warnings.

  Right after I reverted this in my tree, it seems like another
  "correct" fix might have shown up, but it's too late in the release
  cycle to be messing with tty core locking, so let's just revert this
  for now to go back how things always have been and try it again for
  4.13.

  This has not been in linux-next as I only reverted it a few hours ago"

* tag 'tty-4.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "tty: fix port buffer locking"

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Sun, 4 Jun 2017 18:37:42 +0000 (11:37 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input subsystem fixes from Dmitry Torokhov:

 - a couple of regression fixes in synaptics and axp20x-pek drivers

 - try to ease transition from PS/2 to RMI for Synaptics touchpad users
   by ensuring we do not try to activate RMI mode when RMI SMBus support
   is not enabled, and nag users a bit to enable it

 - plus a couple of other changes that seemed worthwhile for this
   release

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: axp20x-pek - switch to acpi_dev_present and check for ACPI0011 too
  Input: axp20x-pek - only check for "INTCFD9" ACPI device on Cherry Trail
  Input: tm2-touchkey - use LEN_ON as boolean value instead of LED_FULL
  Input: synaptics - tell users to report when they should be using rmi-smbus
  Input: synaptics - warn the users when there is a better mode
  Input: synaptics - keep PS/2 around when RMI4_SMB is not enabled
  Input: synaptics - clear device info before filling in
  Input: silead - disable interrupt during suspend

7 years agoMerge tag 'rtc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni...
Linus Torvalds [Sun, 4 Jun 2017 18:29:32 +0000 (11:29 -0700)]
Merge tag 'rtc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC fixlet from Alexandre Belloni:
 "A single patch, not really a fix but I don't think there is any reason
  to delay it.

  Change the mailing list address"

* tag 'rtc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  MAINTAINERS: update RTC mailing list

7 years ago[media] rc-core: race condition during ir_raw_event_register()
Sean Young [Wed, 24 May 2017 09:24:51 +0000 (06:24 -0300)]
[media] rc-core: race condition during ir_raw_event_register()

A rc device can call ir_raw_event_handle() after rc_allocate_device(),
but before rc_register_device() has completed. This is racey because
rcdev->raw is set before rcdev->raw->thread has a valid value.

Cc: stable@kernel.org
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>