Mintz, Yuval [Fri, 9 Jun 2017 14:17:01 +0000 (17:17 +0300)]
bnx2x: Allow vfs to disable txvlan offload
VF clients are configured as enforced, meaning firmware is validating
the correctness of their ethertype/vid during transmission.
Once txvlan is disabled, VF would start getting SKBs for transmission
here vlan is on the payload - but it'll pass the packet's ethertype
instead of the vid, leading to firmware declaring it as malicious.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 9 Jun 2017 19:41:57 +0000 (15:41 -0400)]
Merge tag 'linux-can-fixes-for-4.12-20170609' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2017-06-09
this is a pull request of 6 patches for net/master.
There's a patch by Stephane Grosjean that fixes an uninitialized symbol warning
in the peak_canfd driver. A patch by Johan Hovold to fix the product-id
endianness in an error message in the the peak_usb driver. A patch by Oliver
Hartkopp to enable CAN FD for virtual CAN devices by default. Three patches by
me, one makes the helper function can_change_state() robust to be called with
cf == NULL. The next patch fixes a memory leak in the gs_usb driver. And the
last one fixes a lockdep splat by properly initialize the per-net
can_rcvlists_lock spin_lock.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Fri, 9 Jun 2017 19:33:09 +0000 (21:33 +0200)]
mac80211: free netdev on dev_alloc_name() error
The change to remove free_netdev() from ieee80211_if_free()
erroneously didn't add the necessary free_netdev() for when
ieee80211_if_free() is called directly in one place, rather
than as the priv_destructor. Add the missing call.
Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
IPI's from the victim cpu are not handled in dev_cpu_callback.
So these pending IPI's would be sent to the remote cpu only when
NET_RX is scheduled on the victim cpu and since this trigger is
unpredictable it would result in packet latencies on the remote cpu.
This patch add support to send the pending ipi's of victim cpu.
Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Mario Molitor [Thu, 8 Jun 2017 21:03:09 +0000 (23:03 +0200)]
stmmac: fix for hw timestamp of GMAC3 unit
1.) Bugfix of function stmmac_get_tx_hwtstamp.
Corrected the tx timestamp available check (same as 4.8 and older)
Change printout from info syslevel to debug.
2.) Bugfix of function stmmac_get_rx_hwtstamp.
Corrected the rx timestamp available check (same as 4.8 and older)
Change printout from info syslevel to debug.
Fixes: ba1ffd74df74 ("stmmac: fix PTP support for GMAC4") Signed-off-by: Mario Molitor <mario_molitor@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Mario Molitor [Thu, 8 Jun 2017 20:41:02 +0000 (22:41 +0200)]
stmmac: fix ptp header for GMAC3 hw timestamp
According the CYCLON V documention only the bit 16 of snaptypesel should
set.
(more information see Table 17-20 (cv_5v4.pdf) :
Timestamp Snapshot Dependency on Register Bits)
Fixes: d2042052a0aa ("stmmac: update the PTP header file") Signed-off-by: Mario Molitor <mario_molitor@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix an intermittent pr_emerg warning about lo becoming free.
It looks like this:
Message from syslogd@flamingo at Apr 26 00:45:00 ...
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 4
They seem to coincide with net namespace teardown.
The message is emitted by netdev_wait_allrefs().
Forced a kdump in netdev_run_todo, but found that the refcount on the lo
device was already 0 at the time we got to the panic.
Used bcc to check the blocking in netdev_run_todo. The only places
where we're off cpu there are in the rcu_barrier() and msleep() calls.
That behavior is expected. The msleep time coincides with the amount of
time we spend waiting for the refcount to reach zero; the rcu_barrier()
wait times are not excessive.
After looking through the list of callbacks that the netdevice notifiers
invoke in this path, it appears that the dst_dev_event is the most
interesting. The dst_ifdown path places a hold on the loopback_dev as
part of releasing the dev associated with the original dst cache entry.
Most of our notifier callbacks are straight-forward, but this one a)
looks complex, and b) places a hold on the network interface in
question.
I constructed a new bcc script that watches various events in the
liftime of a dst cache entry. Note that dst_ifdown will take a hold on
the loopback device until the invalidated dst entry gets freed.
[ __dst_free] on DST: ffff883ccabb7900 IF tap1008300eth0 invoked at 1282115677036183
__dst_free
rcu_nocb_kthread
kthread
ret_from_fork Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mateusz Jurczyk [Thu, 8 Jun 2017 09:13:36 +0000 (11:13 +0200)]
af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers
Verify that the caller-provided sockaddr structure is large enough to
contain the sa_family field, before accessing it in bind() and connect()
handlers of the AF_UNIX socket. Since neither syscall enforces a minimum
size of the corresponding memory region, very short sockaddrs (zero or
one byte long) result in operating on uninitialized memory while
referencing .sa_family.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Fri, 9 Jun 2017 13:45:32 +0000 (15:45 +0200)]
net: phy: add missing SPEED_14000
Fixes: 0d7e2d2166f6 ("IB/ipoib: add get_link_ksettings in ethtool") Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Hartkopp [Fri, 2 Jun 2017 17:37:30 +0000 (19:37 +0200)]
can: enable CAN FD for virtual CAN devices by default
CAN FD capable CAN interfaces can handle (classic) CAN 2.0 frames too.
New users usually fail at their first attempt to explore CAN FD on
virtual CAN interfaces due to the current CAN_MTU default.
Set the MTU to CANFD_MTU by default to reduce this confusion.
If someone *really* needs a 'classic CAN'-only device this can be set
with the 'ip' tool with e.g. 'ip link set vcan0 mtu 16' as before.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch uses spin_lock_init() instead of __SPIN_LOCK_UNLOCKED() to
initialize the per namespace net->can.can_rcvlists_lock lock to fix this
lockdep warning:
| INFO: trying to register non-static key.
| the code is fine but needs lockdep annotation.
| turning off the locking correctness validator.
| CPU: 0 PID: 186 Comm: candump Not tainted 4.12.0-rc3+ #47
| Hardware name: Marvell Kirkwood (Flattened Device Tree)
| [<c0016644>] (unwind_backtrace) from [<c00139a8>] (show_stack+0x18/0x1c)
| [<c00139a8>] (show_stack) from [<c0058c8c>] (register_lock_class+0x1e4/0x55c)
| [<c0058c8c>] (register_lock_class) from [<c005bdfc>] (__lock_acquire+0x148/0x1990)
| [<c005bdfc>] (__lock_acquire) from [<c005deec>] (lock_acquire+0x174/0x210)
| [<c005deec>] (lock_acquire) from [<c04a6780>] (_raw_spin_lock+0x50/0x88)
| [<c04a6780>] (_raw_spin_lock) from [<bf02116c>] (can_rx_register+0x94/0x15c [can])
| [<bf02116c>] (can_rx_register [can]) from [<bf02a868>] (raw_enable_filters+0x60/0xc0 [can_raw])
| [<bf02a868>] (raw_enable_filters [can_raw]) from [<bf02ac14>] (raw_enable_allfilters+0x2c/0xa0 [can_raw])
| [<bf02ac14>] (raw_enable_allfilters [can_raw]) from [<bf02ad38>] (raw_bind+0xb0/0x250 [can_raw])
| [<bf02ad38>] (raw_bind [can_raw]) from [<c03b5fb8>] (SyS_bind+0x70/0xac)
| [<c03b5fb8>] (SyS_bind) from [<c000f8c0>] (ret_fast_syscall+0x0/0x1c)
Cc: Mario Kicherer <dev@kicherer.org> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
can: peak_canfd: fix uninitialized symbol warnings
This patch fixes two uninitialized symbol warnings in the new code adding
support of the PEAK-System PCAN-PCI Express FD boards, in the socket-CAN
network protocol family.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
can: dev: make can_change_state() robust to be called with cf == NULL
In OOM situations where no skb can be allocated, can_change_state() may
be called with cf == NULL. As this function updates the state and error
statistics it's not an option to skip the call to can_change_state() in
OOM situations.
This patch makes can_change_state() robust, so that it can be called
with cf == NULL.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
David Ahern [Thu, 8 Jun 2017 17:31:11 +0000 (11:31 -0600)]
net: vrf: Make add_fib_rules per network namespace flag
Commit 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")
adds the l3mdev FIB rule the first time a VRF device is created. However,
it only creates the rule once and only in the namespace the first device
is created - which may not be init_net. Fix by using the net_generic
capability to make the add_fib_rules flag per network namespace.
Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create") Reported-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tracking down the issue actually revealed that endianness selection
in bpf_endian.h is broken when compiled with clang with bpf target.
test_pkt_access.c, test_l4lb.c is compiled with __BYTE_ORDER as
__BIG_ENDIAN, test_xdp.c as __LITTLE_ENDIAN! test_l4lb noticeably
fails, because the test accounts bytes via bpf_ntohs(ip6h->payload_len)
and bpf_ntohs(iph->tot_len), and compares them against a defined
value and given a wrong endianness, the test outcome is different,
of course.
Turns out that there are actually two bugs: i) when we do __BYTE_ORDER
comparison with __LITTLE_ENDIAN/__BIG_ENDIAN, then depending on the
include order we see different outcomes. Reason is that __BYTE_ORDER
is undefined due to missing endian.h include. Before we include the
asm/byteorder.h (e.g. through linux/in.h), then __BYTE_ORDER equals
__LITTLE_ENDIAN since both are undefined, after the include which
correctly pulls in linux/byteorder/little_endian.h, __LITTLE_ENDIAN
is defined, but given __BYTE_ORDER is still undefined, we match on
__BYTE_ORDER equals to __BIG_ENDIAN since __BIG_ENDIAN is also
undefined at that point, sigh. ii) But even that would be wrong,
since when compiling the test cases with clang, one can select between
bpfeb and bpfel targets for cross compilation. Hence, we can also not
rely on what the system's endian.h provides, but we need to look at
the compiler's defined endianness. The compiler defines __BYTE_ORDER__,
and we can match __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__,
which also reflects targets bpf (native), bpfel, bpfeb correctly,
thus really only rely on that. After patch:
Fixes: 43bcf707ccdc ("bpf: fix _htons occurences in test_progs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Thu, 8 Jun 2017 09:18:13 +0000 (11:18 +0200)]
ethtool.h: remind to update 802.3ad when adding new speeds
Each time a new speed is added, the bonding 802.3ad isn't updated. Add a
comment to remind the developer to update this driver.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Thu, 8 Jun 2017 09:18:12 +0000 (11:18 +0200)]
bonding: fix 802.3ad support for 14G speed
This patch adds 14 Gbps enum definition, and fixes
aggregated bandwidth calculation based on above slave links.
Fixes: 0d7e2d2166f6 ("IB/ipoib: add get_link_ksettings in ethtool") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Thibaut Collet [Thu, 8 Jun 2017 09:18:11 +0000 (11:18 +0200)]
bonding: fix 802.3ad support for 5G and 50G speeds
This patch adds [5|50] Gbps enum definition, and fixes
aggregated bandwidth calculation based on above slave links.
Fixes: c9a70d43461d ("net-next: ethtool: Added port speed macros.") Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 8 Jun 2017 07:54:24 +0000 (09:54 +0200)]
ila_xlat: add missing hash secret initialization
While discussing the possible merits of clang warning about unused initialized
functions, I found one function that was clearly meant to be called but
never actually is.
__ila_hash_secret_init() initializes the hash value for the ila locator,
apparently this is intended to prevent hash collision attacks, but this ends
up being a read-only zero constant since there is no caller. I could find
no indication of why it was never called, the earliest patch submission
for the module already was like this. If my interpretation is right, we
certainly want to backport the patch to stable kernels as well.
I considered adding it to the ila_xlat_init callback, but for best effect
the random data is read as late as possible, just before it is first used.
The underlying net_get_random_once() is already highly optimized to avoid
overhead when called frequently.
Fixes: 7f00feaf1076 ("ila: Add generic ILA translation facility") Cc: stable@vger.kernel.org Link: https://www.spinics.net/lists/kernel/msg2527243.html Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 8 Jun 2017 15:51:59 +0000 (11:51 -0400)]
net: Fix build regression in rtl8723bs staging driver.
drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c: In function ‘rtw_cfg80211_add_monitor_if’:
drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c:2670:10: error: ‘struct net_device’ has no member named ‘destructor’
mon_ndev->destructor = rtw_ndev_destructor;
^
Signed-off-by: David S. Miller <davem@davemloft.net>
The work queue and handling of network filter parameters should
be in rndis_device. This gets rid of warning from RCU checks,
eliminates a race and cleans up code.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The ndo_poll_controller function needs to schedule NAPI to pick
up arriving packets and send completions. Otherwise no data
will ever be received. For simple case of netconsole, it also
will allow send completions to happen. Without this netpoll
will eventually get stuck.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 7 Jun 2017 18:26:23 +0000 (12:26 -0600)]
net: ipv6: Release route when device is unregistering
Roopa reported attempts to delete a bond device that is referenced in a
multipath route is hanging:
$ ifdown bond2 # ifupdown2 command that deletes virtual devices
unregister_netdevice: waiting for bond2 to become free. Usage count = 2
Steps to reproduce:
echo 1 > /proc/sys/net/ipv6/conf/all/ignore_routes_with_linkdown
ip link add dev bond12 type bond
ip link add dev bond13 type bond
ip addr add 2001:db8:2::0/64 dev bond12
ip addr add 2001:db8:3::0/64 dev bond13
ip route add 2001:db8:33::0/64 nexthop via 2001:db8:2::2 nexthop via 2001:db8:3::2
ip link del dev bond12
ip link del dev bond13
The root cause is the recent change to keep routes on a linkdown. Update
the check to detect when the device is unregistering and release the
route for that case.
Fixes: a1a22c12060e4 ("net: ipv6: Keep nexthop of multipath route on admin down") Reported-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Wed, 7 Jun 2017 18:00:33 +0000 (21:00 +0300)]
net: Zero ifla_vf_info in rtnl_fill_vfinfo()
Some of the structure's fields are not initialized by the
rtnetlink. If driver doesn't set those in ndo_get_vf_config(),
they'd leak memory to user.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> CC: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mateusz Jurczyk [Wed, 7 Jun 2017 14:14:29 +0000 (16:14 +0200)]
decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb
Verify that the length of the socket buffer is sufficient to cover the
nlmsghdr structure before accessing the nlh->nlmsg_len field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < sizeof(*nlh) expression.
The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
emac_mdio_read_link() was not copying the requested phy settings
back into the emac driver's own phy api. This has caused a link
speed mismatch issue for the AR8035 as the emac driver kept
trying to connect with 10/100MBps on a 1GBit/s link.
This patch also unifies shared code between emac_setup_aneg()
and emac_mdio_setup_forced(). And furthermore it removes
a chunk of emac_mdio_init_phy(), that was copying the same
data into itself.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a problem where the AR8035 PHY can't be
detected on an Cisco Meraki MR24, if the ethernet cable is
not connected on boot.
Russell Senior provided steps to reproduce the issue:
|Disconnect ethernet cable, apply power, wait until device has booted,
|plug in ethernet, check for interfaces, no eth0 is listed.
|
|This appears to be a problem during probing of the AR8035 Phy chip.
|When ethernet has no link, the phy detection fails, and eth0 is not
|created. Plugging ethernet later has no effect, because there is no
|interface as far as the kernel is concerned. The relevant part of
|the boot log looks like this:
|this is the failing case:
|
|[ 0.876611] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
|[ 0.882532] /plb/opb/ethernet@ef600c00: reset timeout
|[ 0.888546] /plb/opb/ethernet@ef600c00: can't find PHY!
|and the succeeding case:
|
|[ 0.876672] /plb/opb/emac-rgmii@ef601500: input 0 in RGMII mode
|[ 0.883952] eth0: EMAC-0 /plb/opb/ethernet@ef600c00, MAC 00:01:..
|[ 0.890822] eth0: found Atheros 8035 Gigabit Ethernet PHY (0x01)
Based on the comment and the commit message of
commit 23fbb5a87c56 ("emac: Fix EMAC soft reset on 460EX/GT").
This is because the AR8035 PHY doesn't provide the TX Clock,
if the ethernet cable is not attached. This causes the reset
to timeout and the PHY detection code in emac_init_phy() is
unable to detect the AR8035 PHY. As a result, the emac driver
bails out early and the user left with no ethernet.
In order to stay compatible with existing configurations, the driver
tries the current reset approach at first. Only if the first attempt
timed out, it does perform one more retry with the clock temporarily
switched to the internal source for just the duration of the reset.
Cc: Chris Blake <chrisrblake93@gmail.com> Reported-by: Russell Senior <russell@personaltelco.net> Fixes: 23fbb5a87c56e98 ("emac: Fix EMAC soft reset on 460EX/GT") Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Mateusz Jurczyk [Wed, 7 Jun 2017 13:14:29 +0000 (15:14 +0200)]
decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb
Verify that the length of the socket buffer is sufficient to cover the
entire nlh->nlmsg_len field before accessing that field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < sizeof(*nlh) expression.
The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 May 2017 16:52:56 +0000 (12:52 -0400)]
net: Fix inconsistent teardown and release of private netdev state.
Network devices can allocate reasources and private memory using
netdev_ops->ndo_init(). However, the release of these resources
can occur in one of two different places.
Either netdev_ops->ndo_uninit() or netdev->destructor().
The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.
netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.
netdev->destructor(), on the other hand, does not run until the
netdev references all go away.
Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().
This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.
If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit(). But
it is not able to invoke netdev->destructor().
This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.
However, this means that the resources that would normally be released
by netdev->destructor() will not be.
Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.
Many drivers do not try to deal with this, and instead we have leaks.
Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().
netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().
netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().
Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().
And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 7 Jun 2017 11:45:37 +0000 (13:45 +0200)]
bpf, arm64: use separate register for state in stxr
Will reported that in BPF_XADD we must use a different register in stxr
instruction for the status flag due to otherwise CONSTRAINED UNPREDICTABLE
behavior per architecture. Reference manual says [1]:
If s == t, then one of the following behaviors must occur:
* The instruction is UNDEFINED.
* The instruction executes as a NOP.
* The instruction performs the store to the specified address, but
the value stored is UNKNOWN.
Thus, use a different temporary register for the status flag to fix it.
Disassembly extract from test 226/STX_XADD_DW from test_bpf.ko:
Fixes: 85f68fe89832 ("bpf, arm64: implement jiting of BPF_XADD") Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Ténart [Wed, 7 Jun 2017 06:17:50 +0000 (08:17 +0200)]
net: mvpp2: do not bypass the mvpp22_port_mii_set function
The mvpp22_port_mii_set() function was added by 2697582144dd, but the
function directly returns without doing anything. This return was used
when debugging and wasn't removed before sending the patch. Fix this.
Fixes: 2697582144dd ("net: mvpp2: handle misc PPv2.1/PPv2.2 differences") Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Tue, 6 Jun 2017 14:30:31 +0000 (16:30 +0200)]
bnx2x: fix pf2vf bulletin DMA mapping leak
When freeing VF's DMA mappings, an already NULLed pointer was checked
again due to an apparent copy&paste error. Consequently, the pf2vf
bulletin DMA mapping was not freed.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: don't call strlen on non-terminated string in dev_set_alias()
KMSAN reported a use of uninitialized memory in dev_set_alias(),
which was caused by calling strlcpy() (which in turn called strlen())
on the user-supplied non-terminated string.
Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Made TCP congestion control documentation match current reality,
from Anmol Sarma.
2) Various build warning and failure fixes from Arnd Bergmann.
3) Fix SKB list leak in ipv6_gso_segment().
4) Use after free in ravb driver, from Eugeniu Rosca.
5) Don't use udp_poll() in ping protocol driver, from Eric Dumazet.
6) Don't crash in PCI error recovery of cxgb4 driver, from Guilherme
Piccoli.
7) _SRC_NAT_DONE_BIT needs to be cleared using atomics, from Liping
Zhang.
8) Use after free in vxlan deletion, from Mark Bloch.
9) Fix ordering of NAPI poll enabled in ethoc driver, from Max
Filippov.
10) Fix stmmac hangs with TSO, from Niklas Cassel.
11) Fix crash in CALIPSO ipv6, from Richard Haines.
12) Clear nh_flags properly on mpls link up. From Roopa Prabhu.
13) Fix regression in sk_err socket error queue handling, noticed by
ping applications. From Soheil Hassas Yeganeh.
14) Update mlx4/mlx5 MAINTAINERS information.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits)
net: stmmac: fix a broken u32 less than zero check
net: stmmac: fix completely hung TX when using TSO
net: ethoc: enable NAPI before poll may be scheduled
net: bridge: fix a null pointer dereference in br_afspec
ravb: Fix use-after-free on `ifconfig eth0 down`
net/ipv6: Fix CALIPSO causing GPF with datagram support
net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value
Revert "sit: reload iphdr in ipip6_rcv"
i40e/i40evf: proper update of the page_offset field
i40e: Fix state flags for bit set and clean operations of PF
iwlwifi: fix host command memory leaks
iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265
iwlwifi: mvm: clear new beacon command template struct
iwlwifi: mvm: don't fail when removing a key from an inexisting sta
iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3
iwlwifi: mvm: fix firmware debug restart recording
iwlwifi: tt: move ucode_loaded check under mutex
iwlwifi: mvm: support ibss in dqa mode
iwlwifi: mvm: Fix command queue number on d0i3 flow
iwlwifi: mvm: rs: start using LQ command color
...
1) Fix TLB context wrap races, from Pavel Tatashin.
2) Cure some gcc-7 build issues.
3) Handle invalid setup_hugepagesz command line values properly, from
Liam R Howlett.
4) Copy TSB using the correct address shift for the huge TSB, from Mike
Kravetz.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: delete old wrap code
sparc64: new context wrap
sparc64: add per-cpu mm of secondary contexts
sparc64: redefine first version
sparc64: combine activate_mm and switch_mm
sparc64: reset mm cpumask after wrap
sparc/mm/hugepages: Fix setup_hugepagesz for invalid values.
sparc: Machine description indices can vary
sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
arch/sparc: support NR_CPUS = 4096
sparc64: Add __multi3 for gcc 7.x and later.
sparc64: Fix build warnings with gcc 7.
arch/sparc: increase CONFIG_NODES_SHIFT on SPARC64 to 5
David S. Miller [Tue, 6 Jun 2017 20:45:48 +0000 (13:45 -0700)]
Merge branch 'sparc64-context-wrap-fixes'
Pavel Tatashin says:
====================
sparc64: context wrap fixes
This patch series contains fixes for context wrap: when we are out of
context ids, and need to get a new version.
It fixes memory corruption issues which happen when more than number of
context ids (currently set to 8K) number of processes are started
simultaneously, and processes can get a wrong context.
sparc64: new context wrap:
- contains explanation of new wrap method, and also explanation of races
that it solves
sparc64: reset mm cpumask after wrap
- explains issue of not reseting cpu mask on a wrap
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Tatashin [Wed, 31 May 2017 15:25:25 +0000 (11:25 -0400)]
sparc64: delete old wrap code
The old method that is using xcall and softint to get new context id is
deleted, as it is replaced by a method of using per_cpu_secondary_mm
without xcall to perform the context wrap.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Tatashin [Wed, 31 May 2017 15:25:24 +0000 (11:25 -0400)]
sparc64: new context wrap
The current wrap implementation has a race issue: it is called outside of
the ctx_alloc_lock, and also does not wait for all CPUs to complete the
wrap. This means that a thread can get a new context with a new version
and another thread might still be running with the same context. The
problem is especially severe on CPUs with shared TLBs, like sun4v. I used
the following test to very quickly reproduce the problem:
- start over 8K processes (must be more than context IDs)
- write and read values at a memory location in every process.
Very quickly memory corruptions start happening, and what we read back
does not equal what we wrote.
Several approaches were explored before settling on this one:
Approach 1:
Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
every process to complete the wrap. (Note: every CPU must WAIT before
leaving smp_new_mmu_context_version_client() until every one arrives).
This approach ends up with deadlocks, as some threads own locks which other
threads are waiting for, and they never receive softint until these threads
exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
deadlock happens.
Approach 2:
Handle wrap right during mondo interrupt. Use etrap/rtrap to enter into
into C code, and issue new versions to every CPU.
This approach adds some overhead to runtime: in switch_mm() we must add
some checks to make sure that versions have not changed due to wrap while
we were loading the new secondary context. (could be protected by PSTATE_IE
but that degrades performance as on M7 and older CPUs as it takes 50 cycles
for each access). Also, we still need a global per-cpu array of MMs to know
where we need to load new contexts, otherwise we can change context to a
thread that is going way (if we received mondo between switch_mm() and
switch_to() time). Finally, there are some issues with window registers in
rtrap() when context IDs are changed during CPU mondo time.
The approach in this patch is the simplest and has almost no impact on
runtime. We use the array with mm's where last secondary contexts were
loaded onto CPUs and bump their versions to the new generation without
changing context IDs. If a new process comes in to get a context ID, it
will go through get_new_mmu_context() because of version mismatch. But the
running processes do not need to be interrupted. And wrap is quicker as we
do not need to xcall and wait for everyone to receive and complete wrap.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Tatashin [Wed, 31 May 2017 15:25:23 +0000 (11:25 -0400)]
sparc64: add per-cpu mm of secondary contexts
The new wrap is going to use information from this array to figure out
mm's that currently have valid secondary contexts setup.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Tatashin [Wed, 31 May 2017 15:25:22 +0000 (11:25 -0400)]
sparc64: redefine first version
CTX_FIRST_VERSION defines the first context version, but also it defines
first context. This patch redefines it to only include the first context
version.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Tatashin [Wed, 31 May 2017 15:25:21 +0000 (11:25 -0400)]
sparc64: combine activate_mm and switch_mm
The only difference between these two functions is that in activate_mm we
unconditionally flush context. However, there is no need to keep this
difference after fixing a bug where cpumask was not reset on a wrap. So, in
this patch we combine these.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Tatashin [Wed, 31 May 2017 15:25:20 +0000 (11:25 -0400)]
sparc64: reset mm cpumask after wrap
After a wrap (getting a new context version) a process must get a new
context id, which means that we would need to flush the context id from
the TLB before running for the first time with this ID on every CPU. But,
we use mm_cpumask to determine if this process has been running on this CPU
before, and this mask is not reset after a wrap. So, there are two possible
fixes for this issue:
1. Clear mm cpumask whenever mm gets a new context id
2. Unconditionally flush context every time process is running on a CPU
This patch implements the first solution
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Steven Sistare <steven.sistare@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
James Clarke [Mon, 29 May 2017 19:17:56 +0000 (20:17 +0100)]
sparc: Machine description indices can vary
VIO devices were being looked up by their index in the machine
description node block, but this often varies over time as devices are
added and removed. Instead, store the ID and look up using the type,
config handle and ID.
Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112541 Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Kravetz [Fri, 2 Jun 2017 21:51:12 +0000 (14:51 -0700)]
sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
When a TSB grows beyond its current capacity, a new TSB is allocated
and copy_tsb is called to copy entries from the old TSB to the new.
A hash shift based on page size is used to calculate the index of an
entry in the TSB. copy_tsb has hard coded PAGE_SHIFT in these
calculations. However, for huge page TSBs the value REAL_HPAGE_SHIFT
should be used. As a result, when copy_tsb is called for a huge page
TSB the entries are placed at the incorrect index in the newly
allocated TSB. When doing hardware table walk, the MMU does not
match these entries and we end up in the TSB miss handling code.
This code will then create and write an entry to the correct index
in the TSB. We take a performance hit for the table walk miss and
recreation of these entries.
Pass a new parameter to copy_tsb that is the page size shift to be
used when copying the TSB.
Suggested-by: Anthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jane Chu [Tue, 6 Jun 2017 20:32:29 +0000 (14:32 -0600)]
arch/sparc: support NR_CPUS = 4096
Linux SPARC64 limits NR_CPUS to 4064 because init_cpu_send_mondo_info()
only allocates a single page for NR_CPUS mondo entries. Thus we cannot
use all 4096 CPUs on some SPARC platforms.
To fix, allocate (2^order) pages where order is set according to the size
of cpu_list for possible cpus. Since cpu_list_pa and cpu_mondo_block_pa
are not used in asm code, there are no imm13 offsets from the base PA
that will break because they can only reach one page.
Signed-off-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Bob Picco <bob.picco@oracle.com> Reviewed-by: Atish Patra <atish.patra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 6 Jun 2017 13:10:49 +0000 (14:10 +0100)]
net: stmmac: fix a broken u32 less than zero check
The check that queue is less or equal to zero is always true
because queue is a u32; queue is decremented and will wrap around
and never go -ve. Fix this by making queue an int.
Detected by CoverityScan, CID#1428988 ("Unsigned compared against 0")
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Tue, 6 Jun 2017 07:25:00 +0000 (09:25 +0200)]
net: stmmac: fix completely hung TX when using TSO
stmmac_tso_allocator can fail to set the Last Descriptor bit
on a descriptor that actually was the last descriptor.
This happens when the buffer of the last descriptor ends
up having a size of exactly TSO_MAX_BUFF_SIZE.
When the IP eventually reaches the next last descriptor,
which actually has the bit set, the DMA will hang.
When the DMA hangs, we get a tx timeout, however,
since stmmac does not do a complete reset of the IP
in stmmac_tx_timeout, we end up in a state with
completely hung TX.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Acked-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Max Filippov [Tue, 6 Jun 2017 01:31:16 +0000 (18:31 -0700)]
net: ethoc: enable NAPI before poll may be scheduled
ethoc_reset enables device interrupts, ethoc_interrupt may schedule a
NAPI poll before NAPI is enabled in the ethoc_open, which results in
device being unable to send or receive anything until it's closed and
reopened. In case the device is flooded with ingress packets it may be
unable to recover at all.
Move napi_enable above ethoc_reset in the ethoc_open to fix that.
Fixes: a1702857724f ("net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.") Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Reviewed-by: Tobias Klauser <tklauser@distanz.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: bridge: fix a null pointer dereference in br_afspec
We might call br_afspec() with p == NULL which is a valid use case if
the action is on the bridge device itself, but the bridge tunnel code
dereferences the p pointer without checking, so check if p is null
first.
Reported-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eugeniu Rosca [Mon, 5 Jun 2017 22:08:10 +0000 (00:08 +0200)]
ravb: Fix use-after-free on `ifconfig eth0 down`
Commit a47b70ea86bd ("ravb: unmap descriptors when freeing rings") has
introduced the issue seen in [1] reproduced on H3ULCB board.
Fix this by relocating the RX skb ringbuffer free operation, so that
swiotlb page unmapping can be done first. Freeing of aligned TX buffers
is not relevant to the issue seen in [1]. Still, reposition TX free
calls as well, to have all kfree() operations performed consistently
_after_ dma_unmap_*()/dma_free_*().
[1] Console screenshot with the problem reproduced:
salvator-x login: root
root@salvator-x:~# ifconfig eth0 up
Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: \
attached PHY driver [Micrel KSZ9031 Gigabit PHY] \
(mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=235)
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
root@salvator-x:~#
root@salvator-x:~# ifconfig eth0 down
==================================================================
BUG: KASAN: use-after-free in swiotlb_tbl_unmap_single+0xc4/0x35c
Write of size 1538 at addr ffff8006d884f780 by task ifconfig/1649
Fixes: a47b70ea86bd ("ravb: unmap descriptors when freeing rings") Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Haines [Mon, 5 Jun 2017 15:44:40 +0000 (16:44 +0100)]
net/ipv6: Fix CALIPSO causing GPF with datagram support
When using CALIPSO with IPPROTO_UDP it is possible to trigger a GPF as the
IP header may have moved.
Also update the payload length after adding the CALIPSO option.
Signed-off-by: Richard Haines <richard_c_haines@btinternet.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Huw Davies <huw@codeweavers.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Mon, 5 Jun 2017 09:04:52 +0000 (10:04 +0100)]
net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value
The current comparison of entry < 0 will never be true since entry is an
unsigned integer. Make entry an int to ensure -ve error return values
from the call to jumbo_frm are correctly being caught.
Detected by CoverityScan, CID#1238760 ("Macro compares unsigned to 0")
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 6 Jun 2017 16:37:44 +0000 (09:37 -0700)]
Merge tag 'media/v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Some bug fixes:
- Don't fail build if atomisp has warnings
- Some CEC Kconfig changes to allow it to be used by DRM without
media dependencies
- A race fix at RC initialization code
- A driver fix at rainshadow-cec
IMHO, the one that affects most people in this series is a build fix:
if you try to build the Kernel with W=1 or using gcc7 and
all[yes|mod]config, build will fail due to -Werror at atomisp
makefiles"
* tag 'media/v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] rc-core: race condition during ir_raw_event_register()
[media] cec: drop MEDIA_CEC_DEBUG
[media] cec: rename MEDIA_CEC_NOTIFIER to CEC_NOTIFIER
[media] cec: select CEC_CORE instead of depend on it
[media] rainshadow-cec: ensure exit_loop is intialized
[media] atomisp: don't treat warnings as errors
David S. Miller [Tue, 6 Jun 2017 16:12:57 +0000 (12:12 -0400)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2017-06-06
This series contains fixes to i40e and i40evf only.
Mauro S. M. Rodrigues fixes a flood in the kernel log which was introduced
in a previous commit because of a mistaken substitution of __I40E_VSI_DOWN
instead of __I40E_DOWN when testing the state of the PF.
Björn Töpel fixes an issue introduced in a previous commit where the
offset was incorrect and could lead to data corruption for architectures
using PAGE_SIZE larger than 8191. Fixed the issue by updating the
page_offset correctly using the proper setting for truesize.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Björn Töpel [Mon, 15 May 2017 04:52:00 +0000 (06:52 +0200)]
i40e/i40evf: proper update of the page_offset field
In f8b45b74cc62 ("i40e/i40evf: Use build_skb to build frames")
i40e_build_skb updates the page_offset field with an incorrect offset,
which can lead to data corruption. This patch updates page_offset
correctly, by properly setting truesize.
Note that the bug only appears on architectures where PAGE_SIZE is
8192 or larger.
Fixes: f8b45b74cc62 ("i40e/i40evf: Use build_skb to build frames") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e: Fix state flags for bit set and clean operations of PF
Commit 0da36b9774cc ("i40e: use DECLARE_BITMAP for state fields")
introduced changes in the way i40e works with state flags converting
them to bitmaps using kernel bitmap API. This change introduced a
regression due to a mistaken substitution using __I40E_VSI_DOWN instead
of __I40E_DOWN when testing state of a PF at i40e_reset_subtask()
function. This caused a flood in the kernel log with the follow message:
[49.013] i40e 0002:01:00.0: bad reset request 0x00000020
Commit d19cb64b9222 ("i40e: separate PF and VSI state flags")
also introduced some misuse of the VSI and PF flags, so both could be
considered as the offenders.
This patch simply fixes the flags where it makes sense by changing
__I40E_VSI_DOWN to __I40E_DOWN.
Fixes: 0da36b9774cc ("i40e: use DECLARE_BITMAP for state fields") Fixes: d19cb64b9222 ("i40e: separate PF and VSI state flags") Reviewed-by: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com> Signed-off-by: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Linus Torvalds [Mon, 5 Jun 2017 22:37:03 +0000 (15:37 -0700)]
Merge branch 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"Two cgroup fixes. One to address RCU delay of cpuset removal affecting
userland visible behaviors. The other fixes a race condition between
controller disable and cgroup removal"
* 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: consider dying css as offline
cgroup: Prevent kill_css() from being called more than once
Linus Torvalds [Mon, 5 Jun 2017 22:31:14 +0000 (15:31 -0700)]
Merge branch 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
- Revert of sata_mv devm_ioremap_resource() conversion. It made init
fail if there are overlapping resources which led to detection
failures on some setups.
- A workaround for an Acer laptop which sometimes reports corrupt port
map.
- Other non-critical fixes.
* 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: fix error checking in in ata_parse_force_one()
Revert "ata: sata_mv: Convert to devm_ioremap_resource()"
ata: libahci: properly propagate return value of platform_get_irq()
ata: sata_rcar: Handle return value of clk_prepare_enable
ahci: Acer SA5-271 SSD Not Detected Fix
Sending host command with CMD_WANT_SKB flag demands the release of the
response buffer with iwl_free_resp function.
The patch adds the memory release in all the relevant places
Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265
In a previous commit, we removed support for API versions earlier than
22 for these NICs. By mistake, the *_UCODE_API_MIN definitions were
set to 17. Fix that.
Fixes: 4b87e5af638b ("iwlwifi: remove support for fw older than -17 and -22") Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Luca Coelho [Thu, 30 Mar 2017 09:04:47 +0000 (12:04 +0300)]
iwlwifi: mvm: don't fail when removing a key from an inexisting sta
The iwl_mvm_remove_sta_key() function handles removing a key when the
sta doesn't exist anymore. Mistakenly, this was changed to return an
error while fixing another bug.
If the mvm_sta doesn't exist, we continue normally, but just don't try
to remove the igtk key.
Luca Coelho [Fri, 24 Mar 2017 09:01:45 +0000 (11:01 +0200)]
iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3
We only need to handle d0i3 entry and exit during suspend resume if
system_pm is set to IWL_PLAT_PM_MODE_D0I3, otherwise d0i3 entry
failures will cause suspend to fail.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=194791
When we want to stop the recording of the firmware debug
and restart it later without reloading the firmware we
don't need to resend the configuration that comes with
host commands.
Sending those commands confused the hardware and led to
an NMI 0x66.
Change the flow as following:
* read the relevant registers (DBGC_IN_SAMPLE, DBGC_OUT_CTRL)
* clear those registers
* wait for the hardware to complete its write to the buffer
* get the data
* restore the value of those registers (to restart the
recording)
For early start (where the configuration is already
compiled in the firmware), we don't need to set those
registers after the firmware has been loaded, but only
when we want to restart the recording without having
restarted the firmware.
Haim Dreyfuss [Thu, 16 Mar 2017 15:26:03 +0000 (17:26 +0200)]
iwlwifi: mvm: Fix command queue number on d0i3 flow
During d0i3 flow we flush all the queue except from the command queue.
Currently, in this flow the command queue is hard coded to 9.
In DQA the command queue number has changed from 9 to 0.
Fix that.
This fixes a problem in runtime PM resume flow.
Fixes: 097129c9e625 ("iwlwifi: mvm: move cmd queue to be #0 in dqa mode") Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Up until now, the driver was comparing the rate reported by the FW and
the rate of the latest LQ command to avoid processing data belonging
to the old LQ command. Recently, FW changed the meaning of the initial
rate field in tx response and it holds the actual rate (which is not
necessarily the initial rate of LQ's rate table). Use instead LQ cmd
color to be able to filter out tx responses/BA notifications which
where sent during earlier LQ commands' time frame.
This fixes some throughput degradation in noisy environments.
Linus Torvalds [Mon, 5 Jun 2017 18:19:40 +0000 (11:19 -0700)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Three fixes this time around:
- Two fixes for noMMU, fixing the decompressor header layout, and
preventing a build error with some configurations.
- Fixing the hyp-stub updates that went in during the merge window
for platforms that use MCPM"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8677/1: boot/compressed: fix decompressor header layout for v7-M
ARM: 8676/1: NOMMU: provide pgprot_device() macro
ARM: 8675/1: MCPM: ensure not to enter __hyp_soft_restart from loopback and cpu_power_down
Ido Shamay [Mon, 5 Jun 2017 07:44:56 +0000 (10:44 +0300)]
net/mlx4: Check if Granular QoS per VF has been enabled before updating QP qos_vport
The Granular QoS per VF feature must be enabled in FW before it can be
used.
Thus, the driver cannot modify a QP's qos_vport value (via the UPDATE_QP FW
command) if the feature has not been enabled -- the FW returns an error if
this is attempted.
Fixes: 08068cd5683f ("net/mlx4: Added qos_vport QP configuration in VST mode") Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Randy Dunlap [Mon, 5 Jun 2017 02:46:53 +0000 (19:46 -0700)]
net: phy: fix kernel-doc warnings
Fix kernel-doc warnings (typo) in drivers/net/phy/phy.c:
..//drivers/net/phy/phy.c:259: warning: No description found for parameter 'features'
..//drivers/net/phy/phy.c:259: warning: Excess function parameter 'feature' description in 'phy_lookup_setting'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Haishuang Yan [Mon, 5 Jun 2017 00:57:21 +0000 (08:57 +0800)]
devlink: fix potential memort leak
We must free allocated skb when genlmsg_put() return fails.
Fixes: 1555d204e743 ("devlink: Support for pipeline debug (dpipe)") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ard Biesheuvel [Wed, 24 May 2017 14:31:57 +0000 (15:31 +0100)]
ARM: 8677/1: boot/compressed: fix decompressor header layout for v7-M
As reported by Patrice, the header layout of the decompressor is
incorrect when building for v7-M. In this case, the __nop macro
resolves to 'mov r0, r0', which is emitted as a narrow encoding,
resulting in the header data fields to end up at lower offsets than
required.
Given the variety of targets we need to support with the same code,
the startup sequence is a bit of a jumble, and uses instructions
and macros whose encoding widths cannot be specified (badr), or only
exist in a narrow encoding (bx)
So force the use of a wide encoding in __nop, and replace the start
sequence with a simple jump to the label marking the start of code,
preceded by a Thumb2 mode switch if required (using explicit wide
encodings where appropriate). The label itself can be moved to the
start of code [where it belongs] due to the larger range of branch
instructions as compared to adr instructions.
Reported-by: Patrice CHOTARD <patrice.chotard@st.com> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Vladimir Murzin [Wed, 24 May 2017 09:30:18 +0000 (10:30 +0100)]
ARM: 8676/1: NOMMU: provide pgprot_device() macro
NOMMU build leads to the following error:
CC drivers/pci/mmap.o
drivers/pci/mmap.c: In function 'pci_mmap_resource_range':
drivers/pci/mmap.c:60:3: error: implicit declaration of function 'pgprot_device' [-Werror=implicit-function-declaration]
vma->vm_page_prot = pgprot_device(vma->vm_page_prot);
^
cc1: some warnings being treated as errors
scripts/Makefile.build:302: recipe for target 'drivers/pci/mmap.o' failed
make[2]: *** [drivers/pci/mmap.o] Error 1
scripts/Makefile.build:561: recipe for target 'drivers/pci' failed
make[1]: *** [drivers/pci] Error 2
Makefile:1016: recipe for target 'drivers' failed
make: *** [drivers] Error 2
Fix it with support of pgprot_device() macro for NOMMU.
Fixes: 00d2904ffeac ("ARM/PCI: Use generic pci_mmap_resource_range()") Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Talat Batheesh [Sun, 4 Jun 2017 11:30:07 +0000 (14:30 +0300)]
net/mlx4: Fix the check in attaching steering rules
Our previous patch (cited below) introduced a regression
for RAW Eth QPs.
Fix it by checking if the QP number provided by user-space
exists, hence allowing steering rules to be added for valid
QPs only.
Fixes: 89c557687a32 ("net/mlx4_en: Avoid adding steering rules with invalid ring") Reported-by: Or Gerlitz <gerlitz.or@gmail.com> Signed-off-by: Talat Batheesh <talatb@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Haishuang Yan [Sun, 4 Jun 2017 06:43:43 +0000 (14:43 +0800)]
sit: reload iphdr in ipip6_rcv
Since iptunnel_pull_header() can call pskb_may_pull(),
we must reload any pointer that was related to skb->head.
Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 3 Jun 2017 16:29:25 +0000 (09:29 -0700)]
net: ping: do not abuse udp_poll()
Alexander reported various KASAN messages triggered in recent kernels
The problem is that ping sockets should not use udp_poll() in the first
place, and recent changes in UDP stack finally exposed this old bug.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Sasha Levin <alexander.levin@verizon.com> Cc: Solar Designer <solar@openwall.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Lorenzo Colitti <lorenzo@google.com> Acked-By: Lorenzo Colitti <lorenzo@google.com> Tested-By: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: Fix stale cpu_switch reference after unbind then bind
Commit 9520ed8fb841 ("net: dsa: use cpu_switch instead of ds[0]")
replaced the use of dst->ds[0] with dst->cpu_switch since that is
functionally equivalent, however, we can now run into an use after free
scenario after unbinding then rebinding the switch driver.
The use after free happens because we do correctly initialize
dst->cpu_switch the first time we probe in dsa_cpu_parse(), then we
unbind the driver: dsa_dst_unapply() is called, and we rebind again.
dst->cpu_switch now points to a freed "ds" structure, and so when we
finally dereference it in dsa_cpu_port_ethtool_setup(), we oops.
To fix this, simply set dst->cpu_switch to NULL in dsa_dst_unapply()
which guarantees that we always correctly re-assign dst->cpu_switch in
dsa_cpu_parse().
Fixes: 9520ed8fb841 ("net: dsa: use cpu_switch instead of ds[0]") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 5 Jun 2017 01:41:10 +0000 (21:41 -0400)]
ipv6: Fix leak in ipv6_gso_segment().
If ip6_find_1stfragopt() fails and we return an error we have to free
up 'segs' because nobody else is going to.
Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options") Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Garver [Fri, 2 Jun 2017 18:54:10 +0000 (14:54 -0400)]
geneve: fix needed_headroom and max_mtu for collect_metadata
Since commit 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
when using COLLECT_METADATA geneve devices are created with too small of
a needed_headroom and too large of a max_mtu. This is because
ip_tunnel_info_af() is not valid with the device level info when using
COLLECT_METADATA and we mistakenly fall into the IPv4 case.
For COLLECT_METADATA, always use the worst case of ipv6 since both
sockets are created.
Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") Signed-off-by: Eric Garver <e@erig.me> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to f5f99309fa74 (sock: do not set sk_err in
sock_dequeue_err_skb), sk_err was reset to the error of
the skb on the head of the error queue.
Applications, most notably ping, are relying on this
behavior to reset sk_err for ICMP packets.
Set sk_err to the ICMP error when there is an ICMP packet
at the head of the error queue.
Fixes: f5f99309fa74 (sock: do not set sk_err in sock_dequeue_err_skb) Reported-by: Cyril Hrubis <chrubis@suse.cz> Tested-by: Cyril Hrubis <chrubis@suse.cz> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Hocko [Fri, 2 Jun 2017 15:54:08 +0000 (17:54 +0200)]
amd-xgbe: use PAGE_ALLOC_COSTLY_ORDER in xgbe_map_rx_buffer
xgbe_map_rx_buffer is rather confused about what PAGE_ALLOC_COSTLY_ORDER
means. It uses PAGE_ALLOC_COSTLY_ORDER-1 assuming that
PAGE_ALLOC_COSTLY_ORDER is the first costly order which is not the case
actually because orders larger than that are costly. And even that
applies only to sleeping allocations which is not the case here. We
simply do not perform any costly operations like reclaim or compaction
for those. Simplify the code by dropping the order calculation and use
PAGE_ALLOC_COSTLY_ORDER directly.
Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Liam McBirnie [Thu, 1 Jun 2017 05:36:01 +0000 (15:36 +1000)]
ip6_tunnel: fix traffic class routing for tunnels
ip6_route_output() requires that the flowlabel contains the traffic
class for policy routing.
Commit 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on
encapsulated packets") removed the code which previously added the
traffic class to the flowlabel.
The traffic class is added here because only route lookup needs the
flowlabel to contain the traffic class.
Fixes: 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets") Signed-off-by: Liam McBirnie <liam.mcbirnie@boeing.com> Acked-by: Peter Dawson <peter.a.dawson@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>