mlxsw: spectrum: Forbid linking to devices that have uppers
The mlxsw driver relies on NETDEV_CHANGEUPPER events to configure the
device in case a port is enslaved to a master netdev such as bridge or
bond.
Since the driver ignores events unrelated to its ports and their
uppers, it's possible to engineer situations in which the device's data
path differs from the kernel's.
One example of such a situation is when a port is enslaved to a bond
that is already enslaved to a bridge. When the bond was enslaved the
driver ignored the event - as the bond wasn't one of its uppers - and
therefore a bridge port instance isn't created in the device.
Until such configurations are supported, forbid them by checking that the
upper device doesn't have uppers of its own.
Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Nogah Frankel <nogahf@mellanox.com> Tested-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
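The check described in this commit can be pictured with a small user-space model. This is only a sketch under stated assumptions: `struct netdev_model`, `has_any_upper()` and `can_link_to_upper()` are hypothetical names made up for illustration, not the driver's code. In the kernel, a helper such as `netdev_has_any_upper_dev()` answers the same "does this device have an upper?" question.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a net device: it only tracks its master (upper). */
struct netdev_model {
	const char *name;
	struct netdev_model *upper;	/* NULL when the device has no upper */
};

static bool has_any_upper(const struct netdev_model *dev)
{
	return dev->upper != NULL;
}

/* Refuse to enslave a port to an upper that is itself already enslaved. */
static int can_link_to_upper(const struct netdev_model *upper)
{
	if (has_any_upper(upper))
		return -EINVAL;
	return 0;
}
```

With a bond that already sits under a bridge, `can_link_to_upper()` returns `-EINVAL`; a stand-alone bond is accepted.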
Cong Wang [Thu, 31 Aug 2017 14:47:43 +0000 (16:47 +0200)]
wl1251: add a missing spin_lock_init()
This fixes the following kernel warning:
[ 5668.771453] BUG: spinlock bad magic on CPU#0, kworker/u2:3/9745
[ 5668.771850] lock: 0xce63ef20, .magic: 00000000, .owner: <none>/-1,
.owner_cpu: 0
[ 5668.772277] CPU: 0 PID: 9745 Comm: kworker/u2:3 Tainted: G W 4.12.0-03002-gec979a4-dirty #40
[ 5668.772796] Hardware name: Nokia RX-51 board
[ 5668.773071] Workqueue: phy1 wl1251_irq_work
[ 5668.773345] [<c010c9e4>] (unwind_backtrace) from [<c010a274>]
(show_stack+0x10/0x14)
[ 5668.773803] [<c010a274>] (show_stack) from [<c01545a4>]
(do_raw_spin_lock+0x6c/0xa0)
[ 5668.774230] [<c01545a4>] (do_raw_spin_lock) from [<c06ca578>]
(_raw_spin_lock_irqsave+0x10/0x18)
[ 5668.774658] [<c06ca578>] (_raw_spin_lock_irqsave) from [<c048c010>]
(wl1251_op_tx+0x38/0x5c)
[ 5668.775115] [<c048c010>] (wl1251_op_tx) from [<c06a12e8>]
(ieee80211_tx_frags+0x188/0x1c0)
[ 5668.775543] [<c06a12e8>] (ieee80211_tx_frags) from [<c06a138c>]
(__ieee80211_tx+0x6c/0x130)
[ 5668.775970] [<c06a138c>] (__ieee80211_tx) from [<c06a3dbc>]
(ieee80211_tx+0xdc/0x104)
[ 5668.776367] [<c06a3dbc>] (ieee80211_tx) from [<c06a4af0>]
(__ieee80211_subif_start_xmit+0x454/0x8c8)
[ 5668.776824] [<c06a4af0>] (__ieee80211_subif_start_xmit) from
[<c06a4f94>] (ieee80211_subif_start_xmit+0x30/0x2fc)
[ 5668.777343] [<c06a4f94>] (ieee80211_subif_start_xmit) from
[<c0578848>] (dev_hard_start_xmit+0x80/0x118)
...
by adding the missing spin_lock_init().
Reported-by: Pavel Machek <pavel@ucw.cz> Cc: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Pavel Machek <pavel@ucw.cz> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
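As a rough illustration of why a missing init produces a "bad magic" report, here is a user-space model of a lock that carries a magic value. All names (`struct model_lock`, `model_lock_init()`, `model_lock_acquire()`, `MODEL_LOCK_MAGIC`) are hypothetical and the constant is illustrative; this is not the kernel's spinlock implementation.

```c
#include <stdbool.h>

#define MODEL_LOCK_MAGIC 0xdead4eadu	/* illustrative value for this model */

struct model_lock {
	unsigned int magic;
	bool locked;
};

static void model_lock_init(struct model_lock *l)
{
	l->magic = MODEL_LOCK_MAGIC;
	l->locked = false;
}

/* Returns false ("bad magic") when the lock was never initialised. */
static bool model_lock_acquire(struct model_lock *l)
{
	if (l->magic != MODEL_LOCK_MAGIC)
		return false;
	l->locked = true;
	return true;
}
```

A zero-filled lock, like the `.magic: 00000000` one in the trace above, fails the check until `model_lock_init()` has run.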
Florian Fainelli [Thu, 31 Aug 2017 00:49:29 +0000 (17:49 -0700)]
Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
This reverts commit 7ad813f208533cebfcc32d3d7474dc1677d1b09a ("net: phy:
Correctly process PHY_HALTED in phy_stop_machine()") because it is
creating the possibility for a NULL pointer dereference.
David Daney provided the following call trace and diagram of events:
The original motivation for this change originated from Marc Gonzales
indicating that his network driver did not have its adjust_link callback
executing with phydev->link = 0 while he was expecting it.
PHYLIB has never made any such guarantee, because phy_stop() merely
tells the workqueue to move into the PHY_HALTED state, which happens
asynchronously.
Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Reported-by: David Daney <ddaney.cavm@gmail.com> Fixes: 7ad813f20853 ("net: phy: Correctly process PHY_HALTED in phy_stop_machine()") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kernels >= 4.8
net/mlx5e: Fix inline header size for small packets
net/mlx5: E-Switch, Unload the representors in the correct order
net/mlx5: Fix arm SRQ command for ISSI version 0
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 30 Aug 2017 19:39:33 +0000 (12:39 -0700)]
net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278
BCM7278 has only 128 entries while BCM7445 has the full 256 entries set,
fix that.
Fixes: 7318166cacad ("net: dsa: bcm_sf2: Add support for ethtool::rxnfc") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 30 Aug 2017 16:29:31 +0000 (09:29 -0700)]
kcm: do not attach PF_KCM sockets to avoid deadlock
syzkaller had no problem triggering a deadlock by attaching a KCM socket
to another one (or to itself). (The original syzkaller report was a very
confusing lockdep splat during a sendmsg().)
It seems KCM claims to only support TCP, but no enforcement is done,
so we might need to add additional checks.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
I went over all qdiscs' init, destroy and reset callbacks and found the
issues fixed in each patch. Mostly they are null pointer dereferences due
to uninitialized timer (qdisc watchdog) or double frees due to ->destroy
cleaning up a second time. There's more information in each patch.
I've tested these by either sending wrong attributes from user-space, no
attributes or by simulating memory alloc failure where applicable. I also
tried all of the qdiscs as a default qdisc.
Most of these bugs were present before commit 87b60cfacf9f; I've tried to
include proper Fixes tags in each patch.
I haven't included individual patch acks in the set; I'd appreciate it if
you take another look and resend them.
====================
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sch_tbf: fix two null pointer dereferences on init failure
sch_tbf calls qdisc_watchdog_cancel() in both its ->reset and ->destroy
callbacks, but ->init may fail before the timer is initialized due to missing
options (either not supplied by user-space or set as a default qdisc).
q->qdisc is also used by ->reset and ->destroy, so it needs to be initialized too.
Reproduce:
$ sysctl net.core.default_qdisc=tbf
$ ip l set ethX up
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Fixes: 0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sch_sfq: fix null pointer dereference on init failure
Currently only a memory allocation failure can lead to this, so let's
initialize the timer first.
Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sch_netem: avoid null pointer deref on init failure
netem can fail in ->init due to missing options (either not supplied by
user-space or used as a default qdisc) causing a timer->base null
pointer deref in its ->destroy() and ->reset() callbacks.
Reproduce:
$ sysctl net.core.default_qdisc=netem
$ ip l set ethX up
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Fixes: 0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
It is very unlikely to happen but the backlogs memory allocation
could fail and will free q->flows, but then ->destroy() will free
q->flows too. For correctness remove the first free and let ->destroy
clean up.
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sch_cbq: fix null pointer dereferences on init failure
CBQ can fail in ->init because of wrong netlink attributes or because none
were supplied, e.g. if it's set as a default qdisc then TCA_OPTIONS (opt)
will be NULL when it is activated. The first thing init does is parse opt,
so it dereferences a null pointer when used as a default qdisc. In addition,
init failure of a default qdisc invokes ->reset(), which cancels all timers,
so we also dereference two more null pointers (timer->base) because the
timers were never initialized.
To reproduce:
$ sysctl net.core.default_qdisc=cbq
$ ip l set ethX up
Fixes: 0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sch_hfsc: fix null pointer deref and double free on init failure
Depending on where ->init fails we can get a null pointer deref due to
uninitialized hires timer (watchdog) or a double free of the qdisc hash
because it is already freed by ->destroy().
Fixes: 8d5537387505 ("net/sched/hfsc: allocate tcf block for hfsc root class") Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sch_hhf: fix null pointer dereference on init failure
If sch_hhf fails in its ->init() function (either due to wrong
user-space arguments as below or memory alloc failure of hh_flows) it
will do a null pointer deref of q->hh_flows in its ->destroy() function.
To reproduce the crash:
$ tc qdisc add dev eth0 root hhf quantum 2000000 non_hh_weight 10000000
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The below commit added a call to ->destroy() on init failure, but multiq
still frees ->queues on error in init. Since ->queues is also freed by
->destroy(), we get a double free and corrupted memory.
Very easy to reproduce (eth0 not multiqueue):
$ tc qdisc add dev eth0 root multiq
RTNETLINK answers: Operation not supported
$ ip l add dumdum type dummy
(crash)
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Fixes: f07d1501292b ("multiq: Further multiqueue cleanup") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
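The double free described in this commit goes away once only one place owns the cleanup. The following user-space sketch shows that shape. The names (`struct mq_model`, `mq_init()`, `mq_destroy()`, `mq_create()`) and the `fail_after_alloc` switch are hypothetical, chosen for illustration; it is not the multiq code itself.

```c
#include <errno.h>
#include <stdlib.h>

struct mq_model {
	void **queues;
	int frees;	/* counts how often the queue array was released */
};

static void mq_destroy(struct mq_model *q)
{
	if (q->queues) {
		free(q->queues);
		q->queues = NULL;
		q->frees++;
	}
}

/* On failure, leave cleanup to mq_destroy() instead of freeing here too. */
static int mq_init(struct mq_model *q, int bands, int fail_after_alloc)
{
	q->queues = calloc(bands, sizeof(*q->queues));
	if (!q->queues)
		return -ENOMEM;
	if (fail_after_alloc)
		return -EOPNOTSUPP;
	return 0;
}

/* Models the core behaviour described above: destroy runs on init failure. */
static int mq_create(struct mq_model *q, int bands, int fail_after_alloc)
{
	int err = mq_init(q, bands, fail_after_alloc);

	if (err)
		mq_destroy(q);
	return err;
}
```

Because `mq_init()` never frees on its own error path, the array is released exactly once, by `mq_destroy()`.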
The commit below added a call to the ->destroy() callback for all qdiscs
which failed in their ->init(), but some were not prepared for such a
change and can't handle a partially initialized qdisc. HTB is one of them:
if any error occurs before the qdisc watchdog timer and qdisc work are
initialized, then we can hit either a null ptr deref (timer->base) when
canceling in ->destroy, or a lockdep error about trying to register
a non-static key along with a stack dump. To fix these two, move the
watchdog timer and workqueue init before anything that can err out.
To reproduce, user space needs to send a broken HTB qdisc create request;
this was tested with a modified tc (q_htb.c).
Note that this bug probably goes further back, because the default qdisc
handling always calls ->destroy on init failure too.
Fixes: 87b60cfacf9f ("net_sched: fix error recovery at qdisc creation") Fixes: 0fbbeb1ba43b ("[PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt()") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
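The "initialize the timer before anything that can err out" ordering used throughout this series can be sketched in user space as follows. This is a simplified model with hypothetical names (`struct timer_model`, `struct qdisc_model`, `watchdog_init()`, `watchdog_cancel()`, `qdisc_init()`); the `base` pointer merely stands in for the timer->base mentioned above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct timer_model {
	int *base;	/* NULL until the timer is initialised */
};

struct qdisc_model {
	struct timer_model watchdog;
	int timer_base;
};

static void watchdog_init(struct qdisc_model *q)
{
	q->watchdog.base = &q->timer_base;
}

/* Returns false where the real code would dereference a NULL timer base. */
static bool watchdog_cancel(struct qdisc_model *q)
{
	if (!q->watchdog.base)
		return false;
	*q->watchdog.base = 0;
	return true;
}

/* Fixed ordering: set up the timer before any step that can fail. */
static int qdisc_init(struct qdisc_model *q, const void *opt)
{
	watchdog_init(q);
	if (!opt)
		return -EINVAL;
	return 0;
}
```

Even when `qdisc_init()` fails on missing options, a later `watchdog_cancel()` from the destroy or reset path finds an initialized timer.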
Tal Gilboa [Mon, 28 Aug 2017 15:45:08 +0000 (18:45 +0300)]
net/mlx5e: Fix CQ moderation mode not set properly
cq_period_mode assignment was mistakenly removed so it was always set to "0",
which is EQE based moderation, regardless of the device CAPs and
requested value in ethtool.
Fixes: 6a9764efb255 ("net/mlx5e: Isolate open_channels from priv->params") Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Shahar Klein [Tue, 1 Aug 2017 12:29:55 +0000 (15:29 +0300)]
net/mlx5: E-Switch, Unload the representors in the correct order
When changing from switchdev to legacy mode, all the representor port
devices (uplink nic and reps) are cleaned up. Part of this cleaning
process is removing the neigh entries and the hash table containing them.
However, a representor neigh entry might be linked to the uplink port
hash table and if the uplink nic is cleaned first the cleaning of the
representor will end up in null deref.
Fix that by unloading the representors in the opposite order of load.
Fixes: cb67b832921c ("net/mlx5e: Introduce SRIOV VF representors") Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently if vxlan tunnel ipv6 src isn't supplied the driver fails to
resolve it as part of the route lookup. The resulting encap header
is left with a zeroed out ipv6 src address so the packets are sent
with this src ip.
Use an appropriate route lookup API that also resolves the source
ipv6 address if it's not supplied.
Fixes: ce99f6b97fcd ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels') Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Inbar Karmy [Mon, 14 Aug 2017 13:12:16 +0000 (16:12 +0300)]
net/mlx5e: Don't override user RSS upon set channels
Currently, increasing the number of combined channels changes
the RSS spread to use the newly created channels.
Prevent the RSS spread change if the user has explicitly declared it,
to avoid overriding user configuration.
Tested:
when RSS default:
# ethtool -L ens8 combined 4
RSS spread will change and point to 4 channels.
# ethtool -X ens8 equal 4
# ethtool -L ens8 combined 6
RSS will not change after increasing the number of the channels.
Fixes: 8bf368620486 ('ethtool: ensure channel counts are within bounds during SCHANNELS') Signed-off-by: Inbar Karmy <inbark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eran Ben Elisha [Wed, 16 Aug 2017 11:37:11 +0000 (14:37 +0300)]
net/mlx5e: Fix dangling page pointer on DMA mapping error
Function mlx5e_dealloc_rx_wqe uses the page pointer value as an
indication of a valid DMA mapping. If the mapping failed, we
released the page but kept the dangling pointer. Store the page pointer
only after the DMA mapping has succeeded, to avoid an invalid page DMA unmap.
Fixes: bc77b240b3c5 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
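The "publish the pointer only after the mapping succeeded" rule can be shown with a small user-space model. Everything here is an assumption made for illustration: `struct rx_slot`, `map_for_dma()`, `slot_alloc()` and `slot_dealloc()` are hypothetical, and `malloc()` stands in for page allocation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct rx_slot {
	void *page;	/* non-NULL means "mapped, must be unmapped later" */
};

/* Hypothetical mapping step; `ok` lets the caller force a failure. */
static bool map_for_dma(void *page, bool ok)
{
	return page && ok;
}

static int slot_alloc(struct rx_slot *slot, bool mapping_ok)
{
	void *page = malloc(4096);

	if (!page)
		return -ENOMEM;
	if (!map_for_dma(page, mapping_ok)) {
		free(page);
		return -ENOMEM;	/* slot->page stays untouched */
	}
	slot->page = page;	/* publish only after the mapping succeeded */
	return 0;
}

static void slot_dealloc(struct rx_slot *slot)
{
	if (!slot->page)
		return;		/* nothing was mapped, nothing to undo */
	free(slot->page);
	slot->page = NULL;
}
```

After a failed mapping the slot still holds NULL, so the dealloc path has nothing to unmap or free.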
Huy Nguyen [Tue, 8 Aug 2017 18:17:00 +0000 (13:17 -0500)]
net/mlx5: Skip mlx5_unload_one if mlx5_load_one fails
There is an issue where the firmware fails during mlx5_load_one:
the health_care timer detects the issue and schedules a health_care call.
mlx5_load_one then detects the issue, cleans up and quits. Finally,
health_care starts and calls mlx5_unload_one to clean up resources
that no longer exist, which causes a kernel panic.
The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set
after mlx5_load_one fails. The solution is to remove the bit
MLX5_INTERFACE_STATE_DOWN and to quit mlx5_unload_one if the
bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN
is redundant and we can use MLX5_INTERFACE_STATE_UP instead.
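A minimal user-space sketch of the single-bit scheme, assuming one UP flag is enough to decide whether unload has anything to do. `struct dev_model`, `load_one()`, `unload_one()` and `IFACE_STATE_UP` are hypothetical stand-ins, not the mlx5 code.

```c
#include <stdbool.h>

enum { IFACE_STATE_UP = 1u << 0 };	/* stand-in for MLX5_INTERFACE_STATE_UP */

struct dev_model {
	unsigned long state;
	int unloads;	/* how many times resources were actually torn down */
};

static int load_one(struct dev_model *d, bool fw_ok)
{
	if (!fw_ok)
		return -1;	/* cleaned up already; UP is never set */
	d->state |= IFACE_STATE_UP;
	return 0;
}

static void unload_one(struct dev_model *d)
{
	if (!(d->state & IFACE_STATE_UP))
		return;		/* load failed or already unloaded: skip */
	d->state &= ~(unsigned long)IFACE_STATE_UP;
	d->unloads++;
}
```

An unload that arrives after a failed load, or a second unload, becomes a no-op instead of touching resources that are already gone.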
Support for ISSI version 0 was recently broken as the arm_srq_cmd
command, which is used only for ISSI version 0, was given the opcode
for ISSI version 1 instead of ISSI version 0.
Change arm_srq_cmd to use the correct command opcode for ISSI version
0.
net/mlx5e: Fix DCB_CAP_ATTR_DCBX capability for DCBNL getcap.
Current code doesn't report the DCB_CAP_DCBX_HOST capability when queried
through getcap. User space lldptool expects the capability to have HOST mode
set when it wants to configure DCBX CEE mode. In the absence of the HOST mode
capability, lldptool fails to switch to CEE mode.
This fix returns DCB_CAP_DCBX_HOST capability when port's DCBX
controlled mode is under software control.
net/mlx5e: Check for qos capability in dcbnl_initialize
qos capability is the master capability bit that determines
if the DCBX is supported for the PCI function. If this bit is off,
driver cannot run any dcbx code.
Fixes: e207b7e99176 ("net/mlx5e: ConnectX-4 firmware support for DCBX") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Sekhar Nori [Wed, 30 Aug 2017 08:07:13 +0000 (13:37 +0530)]
net: ti: cpsw-common: dont print error if ti_cm_get_macid() fails
It is quite common for ti_cm_get_macid() to fail on some of the
platforms it is invoked on. They include any platform where
mac address is not part of SoC register space.
On these platforms, mac address is read and populated in
device-tree by bootloader. An example is TI DA850.
Downgrade the severity of message to "information", so it does
not spam logs when 'quiet' boot is desired.
Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The phy is connected at early stage of probe but not properly
disconnected if error occurs. This patch fixes the issue.
Also changing the return type of xgene_enet_check_phy_handle(),
since this function always returns success.
Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 29 Aug 2017 19:15:16 +0000 (22:15 +0300)]
nfp: double free on error in probe
Both the nfp_net_pf_app_start() and the nfp_net_pci_probe() functions
call nfp_net_pf_app_stop_ctrl(pf) so there is a double free. The free
should be done from the probe function because it's allocated there so
I have removed the call from nfp_net_pf_app_start().
Fixes: 02082701b974 ("nfp: create control vNICs and wire up rx/tx") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Belous [Mon, 28 Aug 2017 18:52:13 +0000 (21:52 +0300)]
net:ethernet:aquantia: Show info message if bad firmware version detected.
We should inform user about wrong firmware version
by printing message in dmesg.
Fixes: 3d2ff7eebe26 ("net: ethernet: aquantia: Atlantic hardware abstraction layer") Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Mon, 28 Aug 2017 18:52:12 +0000 (21:52 +0300)]
net:ethernet:aquantia: Fix for multicast filter handling.
Since the HW supports up to 32 multicast filters, we should
track the count of multicast filters to avoid overflow.
If we attempt to add more than 32 multicast filters, just set the
NETIF_ALLMULTI flag instead.
Fixes: 94f6c9e4cdf6 ("net: ethernet: aquantia: Support for NIC-specific code") Signed-off-by: Igor Russkikh <Igor.Russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
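The fallback described in this commit amounts to a bounds check on the requested filter count. A user-space sketch, with hypothetical names (`struct mc_model`, `mc_set_list()`, `MODEL_MC_FILTERS_MAX`) and a plain boolean standing in for the all-multicast flag:

```c
#include <stdbool.h>

#define MODEL_MC_FILTERS_MAX 32	/* hardware limit quoted in the commit text */

struct mc_model {
	unsigned int count;	/* filters actually programmed */
	bool allmulti;		/* fall back to "accept all multicast" */
};

static void mc_set_list(struct mc_model *m, unsigned int requested)
{
	if (requested > MODEL_MC_FILTERS_MAX) {
		m->count = 0;
		m->allmulti = true;
	} else {
		m->count = requested;
		m->allmulti = false;
	}
}
```

Requesting exactly 32 filters still programs them individually; 33 or more switches to the all-multicast fallback.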
Pavel Belous [Mon, 28 Aug 2017 18:52:11 +0000 (21:52 +0300)]
net:ethernet:aquantia: Fix for incorrect speed index.
The driver chooses the optimal interrupt throttling settings depending
on the current link speed.
Due to this bug, the link_status field from aq_hw is never updated and,
as a result, the same interrupt throttling values are always used.
Fixes: 3d2ff7eebe26 ("net: ethernet: aquantia: Atlantic hardware abstraction layer") Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Belous [Mon, 28 Aug 2017 18:52:10 +0000 (21:52 +0300)]
net:ethernet:aquantia: Workaround for HW checksum bug.
The hardware has a HW Checksum Offload bug: small TCP packets
(with length <= 60 bytes) have a wrong "checksum valid" bit.
The solution is to ignore the checksum valid bit for small packets
(with length <= 60 bytes) and mark them as CHECKSUM_NONE to allow the
network stack to recalculate the checksum itself.
Fixes: ccf9a5ed14be ("net: ethernet: aquantia: Atlantic A0 and B0 specific functions.") Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
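The workaround boils down to a length check in front of the hardware's verdict. The following is a sketch only: `enum csum_state`, `rx_csum_state()` and `SMALL_PKT_MAX_LEN` are hypothetical names, and the two enum values merely model "let the stack verify" versus "trust the hardware".

```c
#include <stdbool.h>

enum csum_state { CSUM_NONE, CSUM_UNNECESSARY };	/* model of skb->ip_summed */

#define SMALL_PKT_MAX_LEN 60u	/* threshold from the commit text */

static enum csum_state rx_csum_state(unsigned int pkt_len, bool hw_csum_valid)
{
	/* Small packets: the hardware bit is unreliable, let the stack verify. */
	if (pkt_len <= SMALL_PKT_MAX_LEN)
		return CSUM_NONE;
	return hw_csum_valid ? CSUM_UNNECESSARY : CSUM_NONE;
}
```

A 60-byte packet is always handed to the stack for verification, while a 61-byte packet follows the hardware's "checksum valid" bit.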
Pavel Belous [Mon, 28 Aug 2017 18:52:09 +0000 (21:52 +0300)]
net:ethernet:aquantia: Fix for number of RSS queues.
The number of RSS queues should not be more than the number of CPUs.
Having more makes no sense for performance and also causes problems on
some motherboards.
Fixes: 94f6c9e4cdf6 ("net: ethernet: aquantia: Support for NIC-specific code") Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Belous [Mon, 28 Aug 2017 18:52:08 +0000 (21:52 +0300)]
net:ethernet:aquantia: Extra spinlocks removed.
This patch removes datapath spinlocks which do not perform any
useful work.
Fixes: 6e70637f9f1e ("net: ethernet: aquantia: Add ring support code") Signed-off-by: Pavel Belous <Pavel.Belous@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin Poirier [Mon, 28 Aug 2017 18:29:41 +0000 (14:29 -0400)]
packet: Don't write vnet header beyond end of buffer
... which may happen with certain values of tp_reserve and maclen.
Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv") Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
For a bond slave device as a tipc bearer, the dev represents the bond
interface and orig_dev represents the slave in tipc_l2_rcv_msg().
Since we decode the tipc_ptr from bonding device (dev), we fail to
find the bearer and thus tipc links are not established.
In this commit, we register the tipc protocol callback per device and
look for tipc bearer from both the devices.
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The warn_on happened when sk->sk_rmem_alloc wasn't 0 in inet_sock_destruct.
After commit f970bd9e3a06 ("udp: implement memory accounting helpers"),
udp uses udp_destruct_sock as sk_destruct, where it would
udp_rmem_release all rmem.
But the IPV6_ADDRFORM sockopt sets sk_destruct to inet_sock_destruct after
changing the family to PF_INET. If rmem is not 0 at that time, and there is
no place to release rmem before calling inet_sock_destruct, the warn_on
will be triggered.
This patch fixes it by no longer setting sk_destruct in the IPV6_ADDRFORM
sockopt. IPV6_ADDRFORM only works for tcp and udp: a TCP sock has already
set its sk_destruct to inet_sock_destruct, and UDP has set it to
udp_destruct_sock, since they were created.
Fixes: f970bd9e3a06 ("udp: implement memory accounting helpers") Reported-by: ChunYu Wang <chunwang@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 29 Aug 2017 00:10:51 +0000 (17:10 -0700)]
net: dsa: Don't dereference dst->cpu_dp->netdev
If we do not have a master network device attached dst->cpu_dp will be
NULL and accessing cpu_dp->netdev will create a trace similar to the one
below. The correct check is on dst->cpu_dp period.
[ 1.004650] DSA: switch 0 0 parsed
[ 1.008078] Unable to handle kernel NULL pointer dereference at
virtual address 00000010
[ 1.016195] pgd = c0003000
[ 1.018918] [00000010] *pgd=80000000004003, *pmd=00000000
[ 1.024349] Internal error: Oops: 206 [#1] SMP ARM
[ 1.029157] Modules linked in:
[ 1.032228] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc6-00071-g45b45afab9bd-dirty #7
[ 1.040772] Hardware name: Broadcom STB (Flattened Device Tree)
[ 1.046704] task: ee08f840 task.stack: ee090000
[ 1.051258] PC is at dsa_register_switch+0x5e0/0x9dc
[ 1.056234] LR is at dsa_register_switch+0x5d0/0x9dc
[ 1.061211] pc : [<c08fb28c>] lr : [<c08fb27c>] psr: 60000213
[ 1.067491] sp : ee091d88 ip : 00000000 fp : 0000000c
[ 1.072728] r10: 00000000 r9 : 00000001 r8 : ee208010
[ 1.077965] r7 : ee2b57b0 r6 : ee2b5780 r5 : 00000000 r4 : ee208e0c
[ 1.084506] r3 : 00000000 r2 : 00040d00 r1 : 2d1b2000 r0 : 00000016
[ 1.091050] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM
Segment user
[ 1.098199] Control: 32c5387d Table: 00003000 DAC: fffffffd
[ 1.103957] Process swapper/0 (pid: 1, stack limit = 0xee090210)
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 6d3c8c0dd88a ("net: dsa: Remove master_netdev and use dst->cpu_dp->netdev") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Sun, 27 Aug 2017 04:13:48 +0000 (21:13 -0700)]
bridge: check for null fdb->dst before notifying switchdev drivers
Current switchdev drivers don't seem to support offloading fdb
entries pointing to the bridge device, which have fdb->dst
not set to any port. This patch adds a NULL fdb->dst check in
the switchdev notifier code.
This patch fixes the below NULL ptr dereference:
$ bridge fdb add 00:02:00:00:00:33 dev br0 self
Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 26 Aug 2017 12:10:10 +0000 (20:10 +0800)]
ipv6: set dst.obsolete when a cached route has expired
Currently ipv6's dst_ops->check() doesn't check for cached route
expiration, because it trusts dst_gc to clean the cached route up
when it has expired.
The problem is that dst_gc cleans the cached route up only when
its refcount is 1. Some other module (like xfrm) may keep holding
it and only release it when dst_ops->check() fails.
But without checking for the cached route expiration, .check()
may always return true. Meanwhile, without the cached route being
released, dst_gc can't delete it. As a result, this cached route
never expires.
This patch sets dst.obsolete to DST_OBSOLETE_KILL in .gc
when the route has expired, and checks obsolete != DST_OBSOLETE_FORCE_CHK
in .check.
Note that this will still be needed if the ipv6 dst_gc timer is removed
one day. dst.obsolete would then be set in .redirect and .update_pmtu
instead, and cached route expiration would be checked when getting it,
just like the ipv4 route code does.
Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Fri, 25 Aug 2017 22:03:10 +0000 (15:03 -0700)]
ipv6: fix sparse warning on rt6i_node
Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This
generates a new sparse warning on rt->rt6i_node related code:
net/ipv6/route.c:1394:30: error: incompatible types in comparison
expression (different address spaces)
./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
expression (different address spaces)
This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
rcu API is used for it.
After this fix, sparse no longer generates the above warning.
Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Brivio [Fri, 25 Aug 2017 20:48:48 +0000 (22:48 +0200)]
cxgb4: Fix stack out-of-bounds read due to wrong size to t4_record_mbox()
Passing commands for logging to t4_record_mbox() with size
MBOX_LEN, when the actual command size is smaller,
causes out-of-bounds stack accesses in t4_record_mbox() while
copying command words here:
	for (i = 0; i < size / 8; i++)
		entry->cmd[i] = be64_to_cpu(cmd[i]);
Up to 48 bytes from the stack are then leaked to debugfs.
When we call t4_record_mbox() to log a command reply, a MBOX_LEN
size can be used though, as get_mbox_rpl() will fill cmd_rpl up
completely.
Fixes: 7f080c3f2ff0 ("cxgb4: Add support to enable logging of firmware mailbox commands") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
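To make the size argument concrete, here is a user-space model of a logging helper that copies only as many 64-bit words as the caller says the command holds. It is a sketch under assumptions: `MODEL_MBOX_LEN`, `struct mbox_log_entry` and `record_mbox()` are hypothetical, the byte-order conversion of the original loop is left out, and zero-filling the unused tail is a choice made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MBOX_LEN 64	/* bytes; full mailbox size in this model */

struct mbox_log_entry {
	uint64_t cmd[MODEL_MBOX_LEN / 8];
};

/* Copy only `size` bytes' worth of 64-bit words and zero the remainder. */
static void record_mbox(struct mbox_log_entry *entry, const uint64_t *cmd,
			size_t size)
{
	size_t i, words = size / 8;

	if (words > MODEL_MBOX_LEN / 8)
		words = MODEL_MBOX_LEN / 8;
	for (i = 0; i < words; i++)
		entry->cmd[i] = cmd[i];
	for (; i < MODEL_MBOX_LEN / 8; i++)
		entry->cmd[i] = 0;
}
```

Called with `sizeof(cmd)` for a 16-byte command, the helper reads two words from the caller's buffer and nothing beyond it.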
Maxime Ripard [Fri, 25 Aug 2017 19:12:17 +0000 (21:12 +0200)]
net: stmmac: sun8i: Remove the compatibles
Since the bindings have been controversial, and we follow the DT stable ABI
rule, we shouldn't let a driver with a DT binding that might change slip
through in a stable release.
Remove the compatibles to make sure the driver will not probe and no-one
will start using the binding currently implemented. This commit will
obviously need to be reverted in due time.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Aug 2017 22:20:25 +0000 (15:20 -0700)]
Merge branch 'nfp-flow-dissector-layer'
Pieter Jansen van Vuuren says:
====================
nfp: fix layer calculation and flow dissector use
Previously when calculating the supported key layers MPLS, IPv4/6
TTL and TOS were not considered. Formerly flow dissectors were referenced
without first checking that they are in use and correctly populated by TC.
Additionally this patch set fixes the incorrect use of mask field for vlan
matching.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
nfp: remove incorrect mask check for vlan matching
Previously the vlan tci field was incorrectly exact matched. This patch
fixes this by using the flow dissector to populate the vlan tci field.
Fixes: 5571e8c9f241 ("nfp: extend flower matching capabilities") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Previously when calculating the supported key layers MPLS, IPv4/6
TTL and TOS were not considered. This patch checks that the TTL and
TOS fields are masked out before offloading. Additionally this patch
checks that MPLS packets are correctly handled, by not offloading them.
Fixes: af9d842c1354 ("nfp: extend flower add flow offload") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Previously flow dissectors were referenced without first checking that
they are in use and correctly populated by TC. This patch fixes this by
checking each flow dissector key before referencing them.
Fixes: 5571e8c9f241 ("nfp: extend flower matching capabilities") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 28 Aug 2017 18:34:59 +0000 (11:34 -0700)]
Merge branch 'l2tp-tunnel-refs'
Guillaume Nault says:
====================
l2tp: fix some l2tp_tunnel_find() issues in l2tp_netlink
Since l2tp_tunnel_find() doesn't take a reference on the tunnel it
returns, its users are almost guaranteed to be racy.
This series defines l2tp_tunnel_get() which can be used as a safe
replacement, and converts some of l2tp_tunnel_find() users in the
l2tp_netlink module.
Other users often combine this issue with other more or less subtle
races. They will be fixed incrementally in followup series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 25 Aug 2017 14:51:46 +0000 (16:51 +0200)]
l2tp: hold tunnel used while creating sessions with netlink
Use l2tp_tunnel_get() to retrieve tunnel, so that it can't go away on
us. Otherwise l2tp_tunnel_destruct() might release the last reference
count concurrently, thus freeing the tunnel while we're using it.
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 25 Aug 2017 14:51:43 +0000 (16:51 +0200)]
l2tp: hold tunnel while handling genl TUNNEL_GET commands
Use l2tp_tunnel_get() instead of l2tp_tunnel_find() so that we get
a reference on the tunnel, preventing l2tp_tunnel_destruct() from
freeing it from under us.
Also move l2tp_tunnel_get() below nlmsg_new() so that we only take
the reference when needed.
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 25 Aug 2017 14:51:42 +0000 (16:51 +0200)]
l2tp: hold tunnel while handling genl tunnel updates
We need to make sure the tunnel is not going to be destroyed by
l2tp_tunnel_destruct() concurrently.
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 25 Aug 2017 14:51:42 +0000 (16:51 +0200)]
l2tp: hold tunnel while processing genl delete command
l2tp_nl_cmd_tunnel_delete() needs to take a reference on the tunnel, to
prevent it from being concurrently freed by l2tp_tunnel_destruct().
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 25 Aug 2017 14:51:40 +0000 (16:51 +0200)]
l2tp: hold tunnel while looking up sessions in l2tp_netlink
l2tp_tunnel_find() doesn't take a reference on the returned tunnel.
Therefore, it's unsafe to use it because the returned tunnel can go
away on us anytime.
Fix this by defining l2tp_tunnel_get(), which works like
l2tp_tunnel_find(), but takes a reference on the returned tunnel.
Caller then has to drop this reference using l2tp_tunnel_dec_refcount().
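
The get/put pattern described above can be sketched in userspace C. This is only an illustrative model: `struct tunnel`, `tunnel_get()` and `tunnel_put()` are hypothetical stand-ins rather than the l2tp code, a plain array replaces the kernel's tunnel list, and refusing to take a reference once the count has reached zero is an assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted tunnel object. */
struct tunnel {
    unsigned int id;
    atomic_int refcnt;   /* 0 means the object is being torn down */
};

/* Look up a tunnel by id and take a reference before returning it.
 * Returns NULL if no entry matches or the entry is already dying. */
static struct tunnel *tunnel_get(struct tunnel *table, size_t n,
                                 unsigned int id)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].id != id)
            continue;

        int old = atomic_load(&table[i].refcnt);

        /* Only increment while the count is still non-zero. */
        while (old > 0) {
            if (atomic_compare_exchange_weak(&table[i].refcnt,
                                             &old, old + 1))
                return &table[i];
        }
        return NULL;
    }
    return NULL;
}

/* Drop a reference. Returns 1 when the caller released the last one,
 * which is the point where the object could be freed. */
static int tunnel_put(struct tunnel *t)
{
    int old = atomic_fetch_sub(&t->refcnt, 1);

    assert(old > 0);
    return old == 1;
}
```

A caller pairs every successful `tunnel_get()` with exactly one `tunnel_put()`, so the object cannot disappear between the lookup and its last use.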
As l2tp_tunnel_dec_refcount() needs to be moved to l2tp_core.h, let's
simplify the patch and not move the L2TP_REFCNT_DEBUG part. This code
has been broken (not even compiling) in May 2012 by
commit a4ca44fa578c ("net: l2tp: Standardize logging styles")
and fixed more than two years later by
commit 29abe2fda54f ("l2tp: fix missing line continuation"). So it
doesn't appear to be used by anyone.
Same thing for l2tp_tunnel_free(); instead of moving it to l2tp_core.h,
let's just simplify things and call kfree_rcu() directly in
l2tp_tunnel_dec_refcount(). Extra assertions and debugging code
provided by l2tp_tunnel_free() didn't help catching any of the
reference counting and socket handling issues found while working on
this series.
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 25 Aug 2017 14:22:17 +0000 (16:22 +0200)]
l2tp: initialise session's refcount before making it reachable
Sessions must be fully initialised before calling
l2tp_session_add_to_tunnel(). Otherwise, there's a short time frame
where partially initialised sessions can be accessed by external users.
Fixes: dbdbc73b4478 ("l2tp: fix duplicate session creation") Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Fri, 25 Aug 2017 14:14:17 +0000 (16:14 +0200)]
net: mvpp2: fix the mac address used when using PPv2.2
The mac address is only retrieved from h/w when using PPv2.1. Otherwise
the variable holding it is still checked and used if it contains a valid
value. As the variable isn't initialized to an invalid mac address
value, we end up with random mac addresses which can be the same for all
the ports handled by this PPv2 driver.
Fix this by initializing the h/w mac address variable to {0}, which is
an invalid mac address value. This way the random assignment fallback
is called and all ports end up with their own addresses.
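
A minimal userspace sketch of why a zeroed variable triggers the fallback. `mac_is_valid()` and `pick_mac()` are hypothetical helpers, not the mvpp2 code; the validity rule (not all-zero, not multicast) mirrors what the kernel's is_valid_ether_addr() checks, and `rand()` stands in for the kernel's random source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Valid means: not all-zero and not multicast (low bit of octet 0 clear). */
static bool mac_is_valid(const uint8_t mac[ETH_ALEN])
{
    uint8_t any = 0;

    for (int i = 0; i < ETH_ALEN; i++)
        any |= mac[i];
    return any != 0 && (mac[0] & 0x01) == 0;
}

/* Use the hardware address when it is valid; otherwise generate a
 * random, locally administered unicast address. */
static void pick_mac(const uint8_t hw_mac[ETH_ALEN], uint8_t out[ETH_ALEN])
{
    assert(hw_mac != NULL && out != NULL);

    if (mac_is_valid(hw_mac)) {
        memcpy(out, hw_mac, ETH_ALEN);
        return;
    }
    for (int i = 0; i < ETH_ALEN; i++)
        out[i] = (uint8_t)rand();
    out[0] &= 0xfe;   /* clear the multicast bit */
    out[0] |= 0x02;   /* set the locally administered bit */
}
```

With an uninitialized `hw_mac`, the validity check may pass on stack garbage; starting from {0} guarantees it fails and the random path runs.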
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com> Fixes: 2697582144dd ("net: mvpp2: handle misc PPv2.1/PPv2.2 differences") Signed-off-by: David S. Miller <davem@davemloft.net>
The u-blox TOBY-L4 is a LTE Advanced (Cat 6) module with HSPA+ and 2G
fallback.
Unlike the TOBY-L2, this module has one single USB layout and exposes
several TTYs for control and a NCM interface for data. Connecting this
module may be done just by activating the desired PDP context with
'AT+CGACT=1,<cid>' and then running DHCP on the NCM interface.
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es> Signed-off-by: David S. Miller <davem@davemloft.net>
net: missing call of trace_napi_poll in busy_poll_stop
Noticed that busy_poll_stop() also invokes the driver's napi->poll()
function pointer, but doesn't have an associated call to trace_napi_poll()
like all other call sites.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mathias Krause [Sat, 26 Aug 2017 15:09:00 +0000 (17:09 +0200)]
xfrm_user: fix info leak in build_aevent()
The memory reserved to dump the ID of the xfrm state includes a padding
byte in struct xfrm_usersa_id added by the compiler for alignment. To
prevent the heap info leak, memset(0) the sa_id before filling it.
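
The padding problem can be illustrated in plain C. The struct below is a hypothetical, reduced layout, not the real struct xfrm_usersa_id; the point is only that member-by-member assignment leaves compiler-inserted padding untouched, so the buffer has to be cleared first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 7 bytes of members, so compilers typically
 * append 1 padding byte to keep the 4-byte alignment. */
struct sa_id {
    uint32_t spi;
    uint16_t family;
    uint8_t  proto;
};

/* Fill an object that will later be copied out byte for byte. */
static void fill_sa_id(struct sa_id *id, uint32_t spi, uint16_t family,
                       uint8_t proto)
{
    assert(id != NULL);

    memset(id, 0, sizeof(*id));   /* clears padding as well as members */
    id->spi = spi;
    id->family = family;
    id->proto = proto;
}
```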
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Fixes: d51d081d6504 ("[IPSEC]: Sync series - user") Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Mathias Krause [Sat, 26 Aug 2017 15:08:59 +0000 (17:08 +0200)]
xfrm_user: fix info leak in build_expire()
The memory reserved to dump the expired xfrm state includes padding
bytes in struct xfrm_user_expire added by the compiler for alignment. To
prevent the heap info leak, memset(0) the remainder of the struct.
Initializing the whole structure isn't needed as copy_to_user_state()
already takes care of clearing the padding bytes within the 'state'
member.
Mathias Krause [Sat, 26 Aug 2017 15:08:58 +0000 (17:08 +0200)]
xfrm_user: fix info leak in xfrm_notify_sa()
The memory reserved to dump the ID of the xfrm state includes a padding
byte in struct xfrm_usersa_id added by the compiler for alignment. To
prevent the heap info leak, memset(0) the whole struct before filling
it.
Cc: Herbert Xu <herbert@gondor.apana.org.au> Fixes: 0603eac0d6b7 ("[IPSEC]: Add XFRMA_SA/XFRMA_POLICY for delete notification") Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Mathias Krause [Sat, 26 Aug 2017 15:08:57 +0000 (17:08 +0200)]
xfrm_user: fix info leak in copy_user_offload()
The memory reserved to dump the xfrm offload state includes padding
bytes of struct xfrm_user_offload added by the compiler for alignment.
Add an explicit memset(0) before filling the buffer to avoid the heap
info leak.
Paolo Abeni [Fri, 25 Aug 2017 12:31:01 +0000 (14:31 +0200)]
udp6: set rx_dst_cookie on rx_dst updates
Currently, in the udp6 code, the dst cookie is not initialized/updated
concurrently with the RX dst used by early demux.
As a result, the dst_check() in the early_demux path always fails,
the rx dst cache is always invalidated, and we can't really
leverage significant gain from the demux lookup.
Fix it by adding a udp6-specific variant of sk_rx_dst_set() and using it
to set the dst cookie when the dst entry is really changed.
The issue is there since the introduction of early demux for ipv6.
Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast") Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 26 Aug 2017 02:13:28 +0000 (19:13 -0700)]
Merge branch 'r8169-Be-drop-monitor-friendly'
Florian Fainelli says:
====================
r8169: Be drop monitor friendly
First patch may be questionable but no other driver appears to be
doing that and while it is defendable to account for left packets as
dropped during TX clean, this appears misleading. I picked Stanislaw
changes which brings us back to 2010, but this was present from
pre-git days as well.
Second patch fixes the two missing calls to dev_consume_skb_any().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 25 Aug 2017 01:34:44 +0000 (18:34 -0700)]
r8169: Be drop monitor friendly
rtl_tx() is the TX reclamation process whereas rtl8169_tx_clear_range() does
the TX ring cleaning during shutdown. Both of these functions should call
dev_consume_skb_any() to be drop monitor friendly.
Fixes: cac4b22f3d6a ("r8169: do not account fragments as packets") Fixes: eb781397904e ("r8169: Do not use dev_kfree_skb in xmit path") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 25 Aug 2017 01:34:43 +0000 (18:34 -0700)]
r8169: Do not increment tx_dropped in TX ring cleaning
rtl8169_tx_clear_range() is responsible for cleaning up the TX ring
during interface shutdown. Incrementing tx_dropped for every SKB that
was left in the ring at that time is misleading.
Fixes: cac4b22f3d6a ("r8169: do not account fragments as packets") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Fri, 25 Aug 2017 11:10:12 +0000 (13:10 +0200)]
tcp: fix refcnt leak with ebpf congestion control
There are a few bugs around refcnt handling in the new BPF congestion
control setsockopt:
- The new ca is assigned to icsk->icsk_ca_ops even in the case where we
cannot get a reference on it. This would lead to a use after free,
since that ca is going away soon.
- Changing the congestion control case doesn't release the refcnt on
the previous ca.
- In the reinit case, we first leak a reference on the old ca, then we
call tcp_reinit_congestion_control on the ca that we have just
assigned, leading to deinitializing the wrong ca (->release of the
new ca on the old ca's data) and releasing the refcount on the ca
that we actually want to use.
This is visible by building (for example) BIC as a module and setting
net.ipv4.tcp_congestion_control=bic, and using tcp_cong_kern.c from
samples/bpf.
This patch fixes the refcount issues, and moves reinit back into tcp
core to avoid passing a ca pointer back to BPF.
Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert [Fri, 25 Aug 2017 07:05:42 +0000 (09:05 +0200)]
ipv6: Fix may be used uninitialized warning in rt6_check
rt_cookie might be used uninitialized, fix this by
initializing it.
Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert [Fri, 25 Aug 2017 05:34:35 +0000 (07:34 +0200)]
esp: Fix skb tailroom calculation
We use skb_availroom to calculate the skb tailroom for the
ESP trailer. skb_availroom calculates the tailroom and
subtracts reserved_tailroom from it. However,
reserved_tailroom is in a union with the skb mark. This means
that we subtract the skb mark from the tailroom if it is set.
Fix this by using skb_tailroom instead.
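
The effect of the union can be shown with a heavily reduced model. `struct buf`, `tailroom()` and `availroom()` are hypothetical names that only mimic the relationship described above; they are not the sk_buff definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the fields involved. */
struct buf {
    uint32_t tail;   /* offset of the end of the data */
    uint32_t end;    /* offset of the end of the buffer */
    union {
        uint32_t mark;               /* packet mark set by the user */
        uint32_t reserved_tailroom;  /* shares storage with mark */
    };
};

/* Free space after the data, ignoring any reservation. */
static int tailroom(const struct buf *b)
{
    assert(b->end >= b->tail);
    return (int)(b->end - b->tail);
}

/* Free space minus the reservation; wrong whenever mark is in use. */
static int availroom(const struct buf *b)
{
    return tailroom(b) - (int)b->reserved_tailroom;
}
```

With 200 bytes of real tailroom and a mark of 150, `availroom()` reports only 50 bytes, although nothing was actually reserved.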
Steffen Klassert [Fri, 25 Aug 2017 05:16:07 +0000 (07:16 +0200)]
esp: Fix locking on page fragment allocation
We allocate the page fragment for the ESP trailer inside
a spinlock, but consume it outside of the lock. This
is racy as some other CPU could get the same page fragment
then. Fix this by consuming the page fragment inside the
lock too.
netvsc: fix deadlock between link status and removal
There is a deadlock possible when canceling the link status
delayed work queue. The removal process is run with RTNL held,
and the link status callback is acquiring RTNL.
Resolve the issue by using trylock and rescheduling.
If a cancel is in progress, that blocks it from happening.
Fixes: 122a5f6410f4 ("staging: hv: use delayed_work for netvsc_send_garp()") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tipc: reassign pointers after skb reallocation / linearization
In tipc_msg_reverse(), we assign skb attributes to local pointers
on the stack at startup. This is followed by skb_linearize() and, for
cloned buffers, skb reallocation using pskb_expand_head().
Both of these methods may update the skb attributes, making
the pointers incorrect.
In this commit, we fix this error by ensuring that the pointers
are re-assigned after any of these skb operations.
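
The same hazard exists in userspace with realloc(), which makes for a compact illustration. `struct msgbuf` and both helpers are hypothetical and only model the "cache a pointer, reallocate, reload the pointer" sequence; they are not the tipc code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct msgbuf {
    unsigned char *data;
    size_t len;
};

/* Grow the buffer by 'extra' bytes. The data pointer may move. */
static int msgbuf_expand(struct msgbuf *m, size_t extra)
{
    unsigned char *p = realloc(m->data, m->len + extra);

    if (!p)
        return -1;
    memset(p + m->len, 0, extra);
    m->data = p;
    m->len += extra;
    return 0;
}

/* Expand the buffer, then write a tag into the first header byte. */
static int msgbuf_mark_header(struct msgbuf *m, size_t extra,
                              unsigned char tag)
{
    unsigned char *hdr = m->data;   /* pointer cached before expansion */

    assert(m->data != NULL && m->len > 0);
    if (msgbuf_expand(m, extra))
        return -1;
    hdr = m->data;                  /* re-assign: old value may dangle */
    hdr[0] = tag;
    return 0;
}
```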
Fixes: 29042e19f2c60 ("tipc: let function tipc_msg_reverse() expand header when needed") Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tipc: perform skb_linearize() before parsing the inner header
In tipc_rcv(), we linearize only the header and usually the packets
are consumed as the nodes permit direct reception. However, if the
skb contains tunnelled message due to fail over or synchronization
we parse it in tipc_node_check_state() without performing
linearization. This will cause link disturbances if the skb was
non linear.
In this commit, we perform linearization for the above messages.
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 7b9364050246 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Reshetova, Elena <elena.reshetova@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 24 Aug 2017 23:01:13 +0000 (16:01 -0700)]
net: systemport: Free DMA coherent descriptors on errors
In case bcm_sysport_init_tx_ring() is not able to allocate ring->cbs, we
would return with an error, and call bcm_sysport_fini_tx_ring() and it
would see that ring->cbs is NULL and do nothing. This would leak the
coherent DMA descriptor area, so we need to free it on error before
returning.
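
A sketch of the error-path rule, using heap allocations in place of coherent DMA memory. `struct ring`, `ring_init()` and the `fail_cbs` switch (which simulates the second allocation failing) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct ring {
    void *desc;   /* stands in for the coherent descriptor area */
    void *cbs;    /* per-descriptor control blocks */
};

/* Allocate both areas. If the second allocation fails, release the
 * first one here, because the teardown path bails out on cbs == NULL. */
static int ring_init(struct ring *r, size_t n, int fail_cbs)
{
    assert(r != NULL && n > 0);

    r->desc = calloc(n, 16);
    if (!r->desc)
        return -ENOMEM;

    r->cbs = fail_cbs ? NULL : calloc(n, sizeof(void *));
    if (!r->cbs) {
        free(r->desc);    /* without this the descriptor area leaks */
        r->desc = NULL;
        return -ENOMEM;
    }
    return 0;
}
```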
Reported-by: Eric Dumazet <edumazet@gmail.com> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 24 Aug 2017 22:56:29 +0000 (15:56 -0700)]
net: bcmgenet: Be drop monitor friendly
There are 3 spots where we call dev_kfree_skb() but we are actually
just doing a normal SKB consumption: __bcmgenet_tx_reclaim() for normal
TX reclamation, bcmgenet_alloc_rx_buffers() during the initial RX ring
setup and bcmgenet_free_rx_buffers() during RX ring cleanup.
Fixes: d6707bec5986 ("net: bcmgenet: rewrite bcmgenet_rx_refill()") Fixes: f48bed16a756 ("net: bcmgenet: Free skb after last Tx frag") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 24 Aug 2017 22:20:41 +0000 (15:20 -0700)]
net: systemport: Be drop monitor friendly
Utilize dev_consume_skb_any(cb->skb) in bcm_sysport_free_cb() which is
used when a TX packet is completed, as well as when the RX ring is
cleaned on shutdown. Neither of these two cases is a packet drop, so be
drop monitor friendly.
Suggested-by: Eric Dumazet <edumazet@gmail.com> Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bob Peterson [Wed, 23 Aug 2017 14:43:02 +0000 (10:43 -0400)]
tipc: Fix tipc_sk_reinit handling of -EAGAIN
In 9dbbfb0ab6680c6a85609041011484e6658e7d3c function tipc_sk_reinit
had additional logic added to loop in the event that function
rhashtable_walk_next() returned -EAGAIN. No worries.
However, if rhashtable_walk_start returns -EAGAIN, it does "continue",
and therefore skips the call to rhashtable_walk_stop(). That has
the effect of calling rcu_read_lock() without its paired call to
rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem
may not be apparent for a while, especially since resize events may
be rare. But the comments to rhashtable_walk_start() state:
* ...Note that we take the RCU lock in all
* cases including when we return an error. So you must always call
* rhashtable_walk_stop to clean up.
This patch replaces the continue with a goto and label to ensure a
matching call to rhashtable_walk_stop().
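
The control-flow change can be modelled with mock walker functions. `walk_start()`, `walk_stop()` and `lock_depth` below are hypothetical stand-ins for rhashtable_walk_start(), rhashtable_walk_stop() and the RCU nesting count; only the pairing logic is meant to carry over.

```c
#include <assert.h>
#include <errno.h>

static int lock_depth;   /* models the RCU read-side nesting level */

/* Mock start: always takes the lock, even when it returns -EAGAIN. */
static int walk_start(int *eagain_left)
{
    lock_depth++;
    if (*eagain_left > 0) {
        (*eagain_left)--;
        return -EAGAIN;
    }
    return 0;
}

/* Mock stop: releases the lock taken by the matching start. */
static void walk_stop(void)
{
    assert(lock_depth > 0);
    lock_depth--;
}

/* Retry loop in which every start is paired with a stop.
 * Returns the number of passes that were needed. */
static int walk_all(int eagain_count)
{
    int passes = 0;
    int err;

    do {
        err = walk_start(&eagain_count);
        passes++;
        if (err == -EAGAIN)
            goto stop;   /* "continue" would jump past walk_stop() */

        /* ... visit the table entries here ... */
stop:
        walk_stop();
    } while (err == -EAGAIN);

    return passes;
}
```

In a do/while loop, `continue` jumps straight to the loop condition, which is why the original code skipped the stop call.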
Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 23 Aug 2017 13:59:49 +0000 (15:59 +0200)]
qlge: avoid memcpy buffer overflow
gcc-8.0.0 (snapshot) points out that we copy a variable-length string
into a fixed length field using memcpy() with the destination length,
and that ends up copying whatever follows the string:
inlined from 'ql_core_dump' at drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:1106:2:
drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:708:2: error: 'memcpy' reading 15 bytes from a region of size 14 [-Werror=stringop-overflow=]
memcpy(seg_hdr->description, desc, (sizeof(seg_hdr->description)) - 1);
Changing it to use strncpy() will instead zero-pad the destination,
which seems to be the right thing to do here.
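
A small userspace illustration of the difference. `struct seg_hdr` and `DESC_LEN` are hypothetical here, and the explicit terminator on the last byte is an addition of this sketch rather than part of the patch.

```c
#include <assert.h>
#include <string.h>

#define DESC_LEN 16

struct seg_hdr {
    char description[DESC_LEN];
};

/* Copy a description of unknown length into the fixed-size field.
 * Unlike memcpy() with the destination length, strncpy() stops reading
 * at the source terminator and zero-fills the rest of the field. */
static void set_description(struct seg_hdr *h, const char *desc)
{
    assert(h != NULL && desc != NULL);

    strncpy(h->description, desc, sizeof(h->description) - 1);
    h->description[sizeof(h->description) - 1] = '\0';
}
```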
The bug is probably harmless, but it seems like a good idea to address
it in stable kernels as well, if only for the purpose of building with
gcc-8 without warnings.
Fixes: a61f80261306 ("qlge: Add ethtool register dump function.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Fix use after free of struct proc_dir_entry in ipt_CLUSTERIP, patch
from Sabrina Dubroca.
2) Fix spurious EINVAL errors from iptables over nft compatibility layer.
3) Reload pointer to ip header only if there is non-terminal verdict,
ie. XT_CONTINUE, otherwise invalid memory access may happen, patch
from Taehee Yoo.
4) Fix interaction between SYNPROXY and NAT, SYNPROXY adds sequence
adjustment already, however nf_nat_setup() assumes there is none.
Patch from Xin Long.
5) Fix burst arithmetics in nft_limit as Joe Stringer mentioned during
NFWS in Faro. Patch from Andy Zhou.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Current implementation treats the burst configuration the same as
rate configuration. This can cause the per packet cost to be lower
than configured. In effect, this bug causes the token bucket to be
refilled at a higher rate than the user has specified.
This patch changes the implementation so that the token bucket size
is controlled by "rate + burst", while keeping the token bucket
refill rate the same as the user specified.
Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Xin Long [Thu, 10 Aug 2017 02:22:24 +0000 (10:22 +0800)]
netfilter: check for seqadj ext existence before adding it in nf_nat_setup_info
Commit 4440a2ab3b9f ("netfilter: synproxy: Check oom when adding synproxy
and seqadj ct extensions") wanted to drop the packet when it fails to add
seqadj ext due to no memory by checking if nfct_seqadj_ext_add returns
NULL.
But nfct_seqadj_ext_add also returns NULL when the seqadj ext already
exists in a nf_conn. As a result, a userspace protocol doesn't work
when both dnat and snat are configured.
On a router, when both dnat and snat are added, nf_nat_setup_info will be
called twice. The packet can be dropped the 2nd time, for DNAT, because
the seqadj ext was already added the 1st time, for SNAT.
This patch is to fix it by checking for seqadj ext existence before adding
it, so that the packet will not be dropped if seqadj ext already exists.
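
The "check before add" logic can be modelled as follows. `struct conn`, `ext_find()`, `ext_add()` and `nat_setup()` are hypothetical; `ext_add()` deliberately returns NULL both when memory is exhausted and when the extension already exists, which is the ambiguity described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct conn {
    int  seqadj;       /* storage for the extension */
    bool has_seqadj;   /* extension already attached? */
    bool oom;          /* simulate an allocation failure */
};

/* Return the extension if it is already attached, else NULL. */
static int *ext_find(struct conn *c)
{
    return c->has_seqadj ? &c->seqadj : NULL;
}

/* Attach the extension. NULL means "no memory" OR "already there". */
static int *ext_add(struct conn *c)
{
    if (c->has_seqadj || c->oom)
        return NULL;
    c->has_seqadj = true;
    return &c->seqadj;
}

/* Returns true to accept the packet, false to drop it. */
static bool nat_setup(struct conn *c)
{
    assert(c != NULL);

    /* Only a failed add on a connection without the ext is fatal. */
    if (!ext_find(c) && !ext_add(c))
        return false;
    return true;
}
```

Calling `nat_setup()` twice on the same connection, as happens with SNAT followed by DNAT, now succeeds both times.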
Note that, as Florian mentioned, in the long term we should review ext_add()
behaviour; it would be better to return a pointer to the existing ext instead.
Fixes: 4440a2ab3b9f ("netfilter: synproxy: Check oom when adding synproxy and seqadj ct extensions") Reported-by: Li Shuang <shuali@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Luca Coelho [Tue, 22 Aug 2017 07:37:29 +0000 (10:37 +0300)]
iwlwifi: pcie: move rx workqueue initialization to iwl_trans_pcie_alloc()
Work queues cannot be allocated when a mutex is held because the mutex
may be in use and that would make it sleep. Doing so generates the
following splat with 4.13+:
[ 19.513298] ======================================================
[ 19.513429] WARNING: possible circular locking dependency detected
[ 19.513557] 4.13.0-rc5+ #6 Not tainted
[ 19.513638] ------------------------------------------------------
[ 19.513767] cpuhp/0/12 is trying to acquire lock:
[ 19.513867] (&tz->lock){+.+.+.}, at: [<ffffffff924afebb>] thermal_zone_get_temp+0x5b/0xb0
[ 19.514047]
[ 19.514047] but task is already holding lock:
[ 19.514166] (cpuhp_state){+.+.+.}, at: [<ffffffff91cc4baa>] cpuhp_thread_fun+0x3a/0x210
[ 19.514338]
[ 19.514338] which lock already depends on the new lock.
This lock dependency already existed with previous kernel versions,
but it was not detected until commit 49dfe2a67797 ("cpuhotplug: Link
lock stacks for hotplug callbacks") was introduced.
Reported-by: David Weinehall <david.weinehall@intel.com> Reported-by: Jiri Kosina <jikos@kernel.org> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Lorenzo Colitti [Wed, 23 Aug 2017 08:14:39 +0000 (17:14 +0900)]
net: xfrm: don't double-hold dst when sk_policy in use.
While removing dst_entry garbage collection, commit 52df157f17e5
("xfrm: take refcnt of dst when creating struct xfrm_dst bundle")
changed xfrm_resolve_and_create_bundle so it returns an xdst with
a refcount of 1 instead of 0.
However, it did not delete the dst_hold performed by xfrm_lookup
when a per-socket policy is in use. This means that when a
socket policy is in use, dst entries returned by xfrm_lookup have
a refcount of 2, and are not freed when no longer in use.
Cc: Wei Wang <weiwan@google.com> Fixes: 52df157f17 ("xfrm: take refcnt of dst when creating struct xfrm_dst bundle")
Tested: https://android-review.googlesource.com/417481
Tested: https://android-review.googlesource.com/418659
Tested: https://android-review.googlesource.com/424463
Tested: https://android-review.googlesource.com/452776 passes on net-next Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
David S. Miller [Thu, 24 Aug 2017 05:42:43 +0000 (22:42 -0700)]
Merge branch 'bnxt_en-bug-fixes'
Michael Chan says:
====================
bnxt_en: bug fixes.
3 bug fixes related to XDP ring accounting in bnxt_setup_tc(), freeing
MSIX vectors when bnxt_re unregisters, and preserving the user-administered
PF MAC address when disabling SRIOV.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 23 Aug 2017 23:34:05 +0000 (19:34 -0400)]
bnxt_en: Do not setup MAC address in bnxt_hwrm_func_qcaps().
bnxt_hwrm_func_qcaps() is called during probe to get all device
resources and it also sets up the factory MAC address. The same function
is called when SRIOV is disabled to reclaim all resources. If
the MAC address has been overridden by a user administered MAC
address, calling this function will overwrite it.
Separate the logic that sets up the default MAC address into a new
function bnxt_init_mac_addr() that is only called during probe time.
Fixes: 4a21b49b34c0 ("bnxt_en: Improve VF resource accounting.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 23 Aug 2017 23:34:04 +0000 (19:34 -0400)]
bnxt_en: Free MSIX vectors when unregistering the device from bnxt_re.
Take back ownership of the MSIX vectors when unregistering the device
from bnxt_re.
Fixes: a588e4580a7e ("bnxt_en: Add interface to support RDMA driver.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 23 Aug 2017 23:34:03 +0000 (19:34 -0400)]
bnxt_en: Fix .ndo_setup_tc() to include XDP rings.
When the number of TX rings is changed in bnxt_setup_tc(), we need to
include the XDP rings in the total TX ring count.
Fixes: 38413406277f ("bnxt_en: Add support for XDP_TX action.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 23 Aug 2017 21:41:50 +0000 (14:41 -0700)]
nfp: TX time stamp packets before HW doorbell is rung
TX completion may happen any time after HW queue was kicked.
We can't access the skb afterwards. Move the time stamping
before ringing the doorbell.
Fixes: 4c3523623dc0 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Brivio [Wed, 23 Aug 2017 11:27:13 +0000 (13:27 +0200)]
sctp: Avoid out-of-bounds reads from address storage
inet_diag_msg_sctp{,l}addr_fill() and sctp_get_sctp_info() copy
sizeof(sockaddr_storage) bytes to fill in sockaddr structs used
to export diagnostic information to userspace.
However, the memory allocated to store sockaddr information is
smaller than that and depends on the address family, so we leak
up to 100 uninitialized bytes to userspace. Just use the size of
the source structs instead, in all the three cases this is what
userspace expects. Zero out the remaining memory.
Unused bytes (i.e. when IPv4 addresses are used) in source
structs sctp_sockaddr_entry and sctp_transport are already
cleared by sctp_add_bind_addr() and sctp_transport_new(),
respectively.
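
A userspace sketch of the "copy only the source size, zero the rest" approach. `export_addr()` is a hypothetical helper, not one of the sctp functions named above.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Copy an address of src_len bytes into a sockaddr_storage that will
 * be exported, and clear everything after it so no stale bytes leak. */
static void export_addr(struct sockaddr_storage *dst, const void *src,
                        size_t src_len)
{
    assert(dst != NULL && src != NULL);
    assert(src_len <= sizeof(*dst));

    memcpy(dst, src, src_len);
    memset((char *)dst + src_len, 0, sizeof(*dst) - src_len);
}
```

Copying sizeof(struct sockaddr_storage) bytes from a smaller source object would instead read past its end, which is the out-of-bounds read being fixed.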
Noticed while testing KASAN-enabled kernel with 'ss':