flow_dissector: allow access only to a subset of __sk_buff fields
Use whitelist instead of a blacklist and allow only a small set of
fields that might be relevant in the context of flow dissector:
* data
* data_end
* flow_keys
This is required for the eth_get_headlen case where we have only a
chunk of data to dissect (i.e. trying to read the other skb fields
doesn't make sense).
Note, that it is a breaking API change! However, we've provided
flow_keys->n_proto as a substitute for skb->protocol; and there is
no need to manually handle skb->vlan_present. So even if we
break somebody, the migration is trivial. Unfortunately, we can't
support eth_get_headlen use-case without those breaking changes.
Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
flow_dissector: fix clamping of BPF flow_keys for non-zero nhoff
Don't allow BPF program to set flow_keys->nhoff to less than initial
value. We currently don't read the value afterwards in anything but
the tests, but it's still a good practice to return consistent
values to the test programs.
Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
selftests/bpf: fix vlan handling in flow dissector program
When we tail call PROG(VLAN) from parse_eth_proto we don't need to peek
back to handle vlan proto because we didn't adjust nhoff/thoff yet. Use
flow_keys->n_proto, that we set in parse_eth_proto instead and
properly increment nhoff as well.
Also, always use skb->protocol and don't look at skb->vlan_present.
skb->vlan_present indicates that vlan information is stored out-of-band
in skb->vlan_{tci,proto} and vlan header is already pulled from skb.
That means, skb->vlan_present == true is not relevant for BPF flow
dissector.
Add simple test cases with VLAN tagged frames:
* single vlan for ipv4
* double vlan for ipv6
Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The device type for ip6 tunnels is set to
ARPHRD_TUNNEL6. However, the ip4ip6_err function
is expecting the device type of the tunnel to be
ARPHRD_TUNNEL. Since the device types do not
match, the function exits and the ICMP error
packet is not sent to the originating host. Note
that the device type for IPv4 tunnels is set to
ARPHRD_TUNNEL.
Fix is to expect a tunnel device type of
ARPHRD_TUNNEL6 instead. Now the tunnel device
type matches and the ICMP error packet is sent
to the originating host.
Signed-off-by: Sheena Mira-ato <sheena.mira-ato@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
If dccp_feat_push_change fails, we forget free the mem
which is alloced by kmemdup in dccp_feat_clone_sp_val.
Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: e8ef967a54f4 ("dccp: Registration routines for changing feature values") Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sun, 31 Mar 2019 08:58:15 +0000 (16:58 +0800)]
sctp: initialize _pad of sockaddr_in before copying to user memory
Syzbot report a kernel-infoleak:
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
Call Trace:
_copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
copy_to_user include/linux/uaccess.h:174 [inline]
sctp_getsockopt_peer_addrs net/sctp/socket.c:5911 [inline]
sctp_getsockopt+0x1668e/0x17f70 net/sctp/socket.c:7562
...
Uninit was stored to memory at:
sctp_transport_init net/sctp/transport.c:61 [inline]
sctp_transport_new+0x16d/0x9a0 net/sctp/transport.c:115
sctp_assoc_add_peer+0x532/0x1f70 net/sctp/associola.c:637
sctp_process_param net/sctp/sm_make_chunk.c:2548 [inline]
sctp_process_init+0x1a1b/0x3ed0 net/sctp/sm_make_chunk.c:2361
...
Bytes 8-15 of 16 are uninitialized
It was caused by that th _pad field (the 8-15 bytes) of a v4 addr (saved in
struct sockaddr_in) wasn't initialized, but directly copied to user memory
in sctp_getsockopt_peer_addrs().
So fix it by calling memset(addr->v4.sin_zero, 0, 8) to initialize _pad of
sockaddr_in before copying it to user memory in sctp_v4_addr_to_user(), as
sctp_v6_addr_to_user() does.
Reported-by: syzbot+86b5c7c236a22616a72f@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Tested-by: Alexander Potapenko <glider@google.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
nfp: flower: fix matching and pushing vlan CFI bit
This patch clears up some confusion around the meaning of bit 12
for FW messages related to VLAN and flower offload.
Pieter says:
It fixes issues with matching, pushing and popping vlan tags.
We replace the vlan CFI bit with a vlan present bit that
indicates the presence of a vlan tag. We also no longer set
the CFI when pushing vlan tags.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
nfp: flower: remove vlan CFI bit from push vlan action
We no longer set CFI when pushing vlan tags, therefore we remove
the CFI bit from push vlan.
Fixes: 1a1e586f54bf ("nfp: add basic action capabilities to flower offloads") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Replace vlan CFI bit with a vlan present bit that indicates the
presence of a vlan tag. Previously the driver incorrectly assumed
that an vlan id of 0 is not matchable, therefore we indicate vlan
presence with a vlan present bit.
Fixes: 5571e8c9f241 ("nfp: extend flower matching capabilities") Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Louis Peens <louis.peens@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Slaby [Fri, 29 Mar 2019 11:19:46 +0000 (12:19 +0100)]
kcm: switch order of device registration to fix a crash
When kcm is loaded while many processes try to create a KCM socket, a
crash occurs:
BUG: unable to handle kernel NULL pointer dereference at 000000000000000e
IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0
Oops: 0002 [#1] SMP KASAN PTI
CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased)
RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
RSP: 0018:ffff88000d487a00 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719
...
CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0
Call Trace:
kcm_create+0x600/0xbf0 [kcm]
__sock_create+0x324/0x750 net/socket.c:1272
...
This is due to race between sock_create and unfinished
register_pernet_device. kcm_create tries to do "net_generic(net,
kcm_net_id)". but kcm_net_id is not initialized yet.
So switch the order of the two to close the race.
This can be reproduced with mutiple processes doing socket(PF_KCM, ...)
and one process doing module removal.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: sched: fix stats accounting for child NOLOCK qdiscs
Currently, stats accounting for NOLOCK qdisc enslaved to classful (lock)
qdiscs is buggy. Per CPU values are ignored in most places, as a result,
stats dump in the above scenario always report 0 length backlog and parent
backlog len is not updated correctly on NOLOCK qdisc removal.
The first patch address stats dumping, and the second one child qdisc removal.
I'm targeting the net tree as this is a bugfix, but it could be moved to
net-next due to the relatively large diffstat.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 28 Mar 2019 15:53:13 +0000 (16:53 +0100)]
net: sched: introduce and use qdisc tree flush/purge helpers
The same code to flush qdisc tree and purge the qdisc queue
is duplicated in many places and in most cases it does not
respect NOLOCK qdisc: the global backlog len is used and the
per CPU values are ignored.
This change addresses the above, factoring-out the relevant
code and using the helpers introduced by the previous patch
to fetch the correct backlog len.
Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 28 Mar 2019 15:53:12 +0000 (16:53 +0100)]
net: sched: introduce and use qstats read helpers
Classful qdiscs can't access directly the child qdiscs backlog
length: if such qdisc is NOLOCK, per CPU values should be
accounted instead.
Most qdiscs no not respect the above. As a result, qstats fetching
for most classful qdisc is currently incorrect: if the child qdisc is
NOLOCK, it always reports 0 len backlog.
This change introduces a pair of helpers to safely fetch
both backlog and qlen and use them in stats class dumping
functions, fixing the above issue and cleaning a bit the code.
DRR needs also to access the child qdisc queue length, so it
needs custom handling.
Fixes: c5ad119fb6c0 ("net: sched: pfifo_fast use skb_array") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Thu, 28 Mar 2019 09:35:06 +0000 (10:35 +0100)]
net/sched: fix ->get helper of the matchall cls
It returned always NULL, thus it was never possible to get the filter.
Example:
$ ip link add foo type dummy
$ ip link add bar type dummy
$ tc qdisc add dev foo clsact
$ tc filter add dev foo protocol all pref 1 ingress handle 1234 \
matchall action mirred ingress mirror dev bar
Before the patch:
$ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
Error: Specified filter handle not found.
We have an error talking to the kernel
After:
$ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2
not_in_hw
action order 1: mirred (Ingress Mirror to device bar) pipe
index 1 ref 1 bind 1
CC: Yotam Gigi <yotamg@mellanox.com> CC: Jiri Pirko <jiri@mellanox.com> Fixes: fd62d9f5c575 ("net/sched: matchall: Fix configuration race") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
vrf: check accept_source_route on the original netdevice
Configuration check to accept source route IP options should be made on
the incoming netdevice when the skb->dev is an l3mdev master. The route
lookup for the source route next hop also needs the incoming netdev.
v2->v3:
- Simplify by passing the original netdevice down the stack (per David
Ahern).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Martin Habets <mhabets@solarflare.com> Signed-off-by: Bert Kenward <bkenward@solarflare.com> Acked-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dust Li [Mon, 1 Apr 2019 08:04:53 +0000 (16:04 +0800)]
tcp: fix a potential NULL pointer dereference in tcp_sk_exit
When tcp_sk_init() failed in inet_ctl_sock_create(),
'net->ipv4.tcp_congestion_control' will be left
uninitialized, but tcp_sk_exit() hasn't check for
that.
This patch add checking on 'net->ipv4.tcp_congestion_control'
in tcp_sk_exit() to prevent NULL-ptr dereference.
Fixes: 6670e1524477 ("tcp: Namespace-ify sysctl_tcp_default_congestion_control") Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Uninit was created at:
__alloc_skb+0x309/0xa20 net/core/skbuff.c:208
alloc_skb include/linux/skbuff.h:1012 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
It was supposed to be fixed on commit 974cb0e3e7c9 ("tipc: fix uninit-value
in tipc_nl_compat_name_table_dump") by checking TLV_GET_DATA_LEN(msg->req)
in cmd->header()/tipc_nl_compat_name_table_dump_header(), which is called
ahead of tipc_nl_compat_name_table_dump().
However, tipc_nl_compat_dumpit() doesn't handle the error returned from cmd
header function. It means even when the check added in that fix fails, it
won't stop calling tipc_nl_compat_name_table_dump(), and the issue will be
triggered again.
So this patch is to add the process for the err returned from cmd header
function in tipc_nl_compat_dumpit().
Reported-by: syzbot+3ce8520484b0d4e260a5@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sun, 31 Mar 2019 14:50:09 +0000 (22:50 +0800)]
tipc: check link name with right length in tipc_nl_compat_link_set
A similar issue as fixed by Patch "tipc: check bearer name with right
length in tipc_nl_compat_bearer_enable" was also found by syzbot in
tipc_nl_compat_link_set().
The length to check with should be 'TLV_GET_DATA_LEN(msg->req) -
offsetof(struct tipc_link_config, name)'.
Reported-by: syzbot+de00a87b8644a582ae79@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Uninit was created at:
__alloc_skb+0x309/0xa20 net/core/skbuff.c:208
alloc_skb include/linux/skbuff.h:1012 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
It was triggered when the bearer name size < TIPC_MAX_BEARER_NAME,
it would check with a wrong len/TLV_GET_DATA_LEN(msg->req), which
also includes priority and disc_domain length.
This patch is to fix it by checking it with a right length:
'TLV_GET_DATA_LEN(msg->req) - offsetof(struct tipc_bearer_config, name)'.
Reported-by: syzbot+8b707430713eb46e1e45@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: stmmac: fix handling of oversized frames
I accidentally had MTU size mismatch (9000 vs. 1500) in my network,
and I noticed I could kill a system using stmmac & 1500 MTU simply
by pinging it with "ping -s 2000 ...".
While testing a fix I encountered also some other issues that need fixing.
I have tested these only with enhanced descriptors, so the normal
descriptor changes need a careful review.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Aaro Koskinen [Wed, 27 Mar 2019 20:35:39 +0000 (22:35 +0200)]
net: stmmac: fix dropping of multi-descriptor RX frames
Packets without the last descriptor set should be dropped early. If we
receive a frame larger than the DMA buffer, the HW will continue using the
next descriptor. Driver mistakes these as individual frames, and sometimes
a truncated frame (without the LD set) may look like a valid packet.
This fixes a strange issue where the system replies to 4098-byte ping
although the MTU/DMA buffer size is set to 4096, and yet at the same
time it's logging an oversized packet.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Aaro Koskinen [Wed, 27 Mar 2019 20:35:38 +0000 (22:35 +0200)]
net: stmmac: don't overwrite discard_frame status
If we have error bits set, the discard_frame status will get overwritten
by checksum bit checks, which might set the status back to good one.
Fix by checking the COE status only if the frame is good.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Aaro Koskinen [Wed, 27 Mar 2019 20:35:37 +0000 (22:35 +0200)]
net: stmmac: don't stop NAPI processing when dropping a packet
Currently, if we drop a packet, we exit from NAPI loop before the budget
is consumed. In some situations this will make the RX processing stall
e.g. when flood pinging the system with oversized packets, as the
errorneous packets are not dropped efficiently.
If we drop a packet, we should just continue to the next one as long as
the budget allows.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Aaro Koskinen [Wed, 27 Mar 2019 20:35:35 +0000 (22:35 +0200)]
net: stmmac: use correct DMA buffer size in the RX descriptor
We always program the maximum DMA buffer size into the receive descriptor,
although the allocated size may be less. E.g. with the default MTU size
we allocate only 1536 bytes. If somebody sends us a bigger frame, then
memory may get corrupted.
Fix by using exact buffer sizes.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 30 Mar 2019 16:13:24 +0000 (17:13 +0100)]
r8169: disable default rx interrupt coalescing on RTL8168
It was reported that re-introducing ASPM, in combination with RX
interrupt coalescing, results in significantly increased packet
latency, see [0]. Disabling ASPM or RX interrupt coalescing fixes
the issue. Therefore change the driver's default to disable RX
interrupt coalescing. Users still have the option to enable RX
coalescing via ethtool.
Fixes: a99790bf5c7f ("r8169: Reinstate ASPM Support") Reported-by: Mike Crowe <mac@mcrowe.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Fri, 29 Mar 2019 01:18:02 +0000 (09:18 +0800)]
net: ethtool: not call vzalloc for zero sized memory request
NULL or ZERO_SIZE_PTR will be returned for zero sized memory
request, and derefencing them will lead to a segfault
so it is unnecessory to call vzalloc for zero sized memory
request and not call functions which maybe derefence the
NULL allocated memory
this also fixes a possible memory leak if phy_ethtool_get_stats
returns error, memory should be freed before exit
Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Wang Li <wangli39@baidu.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 28 Mar 2019 21:54:43 +0000 (14:54 -0700)]
net: tls: prevent false connection termination with offload
Only decrypt_internal() performs zero copy on rx, all paths
which don't hit decrypt_internal() must set zc to false,
otherwise tls_sw_recvmsg() may return 0 causing the application
to believe that that connection got closed.
Currently this happens with device offload when new record
is first read from.
Fixes: d069b780e367 ("tls: Fix tls_device receive") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Thu, 28 Mar 2019 19:40:36 +0000 (19:40 +0000)]
hv_netvsc: Fix unwanted wakeup after tx_disable
After queue stopped, the wakeup mechanism may wake it up again
when ring buffer usage is lower than a threshold. This may cause
send path panic on NULL pointer when we stopped all tx queues in
netvsc_detach and start removing the netvsc device.
This patch fix it by adding a tx_disable flag to prevent unwanted
queue wakeup.
Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic") Reported-by: Mohammed Gamal <mgamal@redhat.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Britstein [Mon, 18 Mar 2019 09:25:59 +0000 (09:25 +0000)]
net/mlx5e: Consider tunnel type for encap contexts
The driver allocates an encap context based on the tunnel properties,
and reuse that context for all flows using the same tunnel properties.
Commit df2ef3bff193 ("net/mlx5e: Add GRE protocol offloading")
introduced another tunnel protocol other than the single VXLAN
previously supported. A flow that uses a tunnel with the same tunnel
properties but with a different tunnel type (GRE vs VXLAN for example)
would mistakenly reuse the previous alocated context, causing the
traffic to be sent with the wrong encapsulation. Fix that by
considering the tunnel type for encap contexts.
Fixes: df2ef3bff193 ("net/mlx5e: Add GRE protocol offloading") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Omri Kahalon [Sun, 24 Feb 2019 14:31:08 +0000 (16:31 +0200)]
net/mlx5: E-Switch, Fix esw manager vport indication for more vport commands
Traditionally, the PF (Physical Function) which resides on vport 0 was
the E-switch manager. Since the ECPF (Embedded CPU Physical Function),
which resides on vport 0xfffe, was introduced as the E-Switch manager,
the assumption that the E-switch manager is on vport 0 is incorrect.
Since the eswitch code already uses the actual vport value, all we
need is to always set other_vport=1.
Roi Dayan [Thu, 21 Mar 2019 22:51:35 +0000 (15:51 -0700)]
net/mlx5: E-Switch, Protect from invalid memory access in offload fdb table
The esw offloads structures share a union with the legacy mode structs.
Reset the offloads struct to zero in init to protect from null
assumptions made by the legacy mode code.
Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tonghao Zhang [Tue, 26 Feb 2019 12:28:32 +0000 (04:28 -0800)]
net/mlx5e: Correctly use the namespace type when allocating pedit action
The capacity of FDB offloading and NIC offloading table are
different, and when allocating the pedit actions, we should
use the correct namespace type.
Fixes: c500c86b0c75d ("net/mlx5e: support for two independent packet edit actions") Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Roi Dayan [Thu, 7 Mar 2019 07:27:18 +0000 (09:27 +0200)]
net/mlx5: E-Switch, Fix access to invalid memory when toggling esw modes
The esw fdb table has a union of legacy and offloads members.
So if we were in a certain esw mode we could set some memebers and not
set null which is fine as on destroy path and don't care.
But then moving from legacy to switchdev a second time, the cleanup flow
of legacy mode checks if a struct member was in use if it's not null so
we need to make sure to reset the code to null when we init legacy mode.
Fixes: 8da202b24913 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Aya Levin [Thu, 28 Feb 2019 07:39:02 +0000 (09:39 +0200)]
net/mlx5: ethtool, Allow legacy link-modes configuration via non-extended ptys
Allow configuration of legacy link-modes even when extended link-modes
are supported. This requires reading of legacy advertisement even when
extended link-modes are supported. Since legacy and extended
advertisement are mutually excluded, wait for empty reply from extended
advertisement before reading legacy advertisement.
Fixes: 6a897372417e ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Aya Levin [Thu, 28 Feb 2019 07:27:33 +0000 (09:27 +0200)]
net/mlx5: ethtool, Fix type analysis of advertised link-mode
Ethtool option set_link_ksettings allows setting of legacy link-modes
or extended link-modes. Refine the decision of which type of link-modes
is set.
Fixes: 6a897372417e ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Dmytro Linkin [Mon, 4 Feb 2019 09:45:47 +0000 (09:45 +0000)]
net/mlx5e: Allow IPv4 ttl & IPv6 hop_limit rewrite for all L4 protocols
For some protocols we are not allowing IP header rewrite offload, since
the HW is not capable to properly adjust the l4 checksum. However, TTL
& HOPLIMIT modification can be done for all IP protocols, because they
are not part of the pseudo header taken into account for checksum.
Fixes: 738678817573 ("drivers: net: use flow action infrastructure") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Gavi Teitz [Mon, 11 Mar 2019 09:56:34 +0000 (11:56 +0200)]
net/mlx5e: Fix error handling when refreshing TIRs
Previously, a false positive would be caught if the TIRs list is
empty, since the err value was initialized to -ENOMEM, and was only
updated if a TIR is refreshed. This is resolved by initializing the
err value to zero.
Artemy Kovalyov [Tue, 19 Mar 2019 09:24:38 +0000 (11:24 +0200)]
net/mlx5: Decrease default mr cache size
Delete initialization of high order entries in mr cache to decrease initial
memory footprint. When required, the administrator can populate the
entries with memory keys via the /sys interface.
This approach is very helpful to significantly reduce the per HW function
memory footprint in virtualization environments such as SRIOV.
Fixes: 9603b61de1ee ("mlx5: Move pci device handling from mlx5_ib to mlx5_core") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reported-by: Shalom Toledo <shalomt@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We want to avoid leaking pointer info from xdp_frame (that is placed in
top of frame) like commit 6dfb970d3dbd ("xdp: avoid leaking info stored in
frame data on page reuse"), and followup commit 97e19cce05e5 ("bpf:
reserve xdp_frame size in xdp headroom") that reserve this headroom.
These changes also affected how cpumap constructed SKBs, as xdpf->headroom
size changed, the skb data starting point were in-effect shifted with 32
bytes (sizeof xdp_frame). This was still okay, as the cpumap frame_size
calculation also included xdpf->headroom which were reduced by same amount.
A bug was introduced in commit 77ea5f4cbe20 ("bpf/cpumap: make sure
frame_size for build_skb is aligned if headroom isn't"), where the
xdpf->headroom became part of the SKB_DATA_ALIGN rounding up. This
round-up to find the frame_size is in principle still correct as it does
not exceed the 2048 bytes frame_size (which is max for ixgbe and i40e),
but the 32 bytes offset of pkt_data_start puts this over the 2048 bytes
limit. This cause skb_shared_info to spill into next frame. It is a little
hard to trigger, as the SKB need to use above 15 skb_shinfo->frags[] as
far as I calculate. This does happen in practise for TCP streams when
skb_try_coalesce() kicks in.
KASAN can be used to detect these wrong memory accesses, I've seen:
BUG: KASAN: use-after-free in skb_try_coalesce+0x3cb/0x760
BUG: KASAN: wild-memory-access in skb_release_data+0xe2/0x250
Driver veth also construct a SKB from xdp_frame in this way, but is not
affected, as it doesn't reserve/deduct the room (used by xdp_frame) from
the SKB headroom. Instead is clears the pointers via xdp_scrub_frame(),
and allows SKB to use this area.
The fix in this patch is to do like veth and instead allow SKB to (re)use
the area occupied by xdp_frame, by clearing via xdp_scrub_frame(). (This
does kill the idea of the SKB being able to access (mem) info from this
area, but I guess it was a bad idea anyhow, and it was already killed by
the veth changes.)
Fixes: 77ea5f4cbe20 ("bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
net: core: netif_receive_skb_list: unlist skb before passing to pt->func
__netif_receive_skb_list_ptype() leaves skb->next poisoned before passing
it to pt_prev->func handler, what may produce (in certain cases, e.g. DSA
setup) crashes like:
Fix this by pulling skb off the sublist and zeroing skb->next pointer
before calling ptype callback.
Fixes: 88eb1944e18c ("net: core: propagate SKB lists through packet_type lookup") Reviewed-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
Mao Wenan [Thu, 28 Mar 2019 09:10:56 +0000 (17:10 +0800)]
net: rds: force to destroy connection if t_sock is NULL in rds_tcp_kill_sock().
When it is to cleanup net namespace, rds_tcp_exit_net() will call
rds_tcp_kill_sock(), if t_sock is NULL, it will not call
rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free
connection, and the worker cp_conn_w is not stopped, afterwards the net is freed in
net_drop_ns(); While cp_conn_w rds_connect_worker() will call rds_tcp_conn_path_connect()
and reference 'net' which has already been freed.
In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() will set t_sock = sock before
sock->ops->connect, but if connect() is failed, it will call
rds_tcp_restore_callbacks() and set t_sock = NULL, if connect is always
failed, rds_connect_worker() will try to reconnect all the time, so
rds_tcp_kill_sock() will never to cancel worker cp_conn_w and free the
connections.
Therefore, the condition !tc->t_sock is not needed if it is going to do
cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock, because tc->t_sock is always
NULL, and there is on other path to cancel cp_conn_w and free
connection. So this patch is to fix this.
rds_tcp_kill_sock():
...
if (net != c_net || !tc->t_sock)
... Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
==================================================================
BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28
net/ipv4/af_inet.c:340
Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721
Fixes: 467fa15356ac("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Righi [Thu, 28 Mar 2019 06:36:00 +0000 (07:36 +0100)]
openvswitch: fix flow actions reallocation
The flow action buffer can be resized if it's not big enough to contain
all the requested flow actions. However, this resize doesn't take into
account the new requested size, the buffer is only increased by a factor
of 2x. This might be not enough to contain the new data, causing a
buffer overflow, for example:
Fix by making sure the new buffer is properly resized to contain all the
requested data.
BugLink: https://bugs.launchpad.net/bugs/1813244 Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
nfp: fix retcode and disable netpoll on representors
This series avoids a potential crash on nfp representor devices
when netpoll is in use. If transmitting the frame through underlying
vNIC fails we'd return an error code (by passing on error code from
__dev_queue_xmit()) and cause double free in netpoll code.
Fix the error code and disable netpoll on reprs altogether.
IRQ-safety of locking the queues and calling __dev_queue_xmit()
is questionable.
Big thanks to John Hurley for debugging and narrowing down
the trace log after I gave up! :)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 27 Mar 2019 18:38:39 +0000 (11:38 -0700)]
nfp: disable netpoll on representors
NFP reprs are software device on top of the PF's vNIC.
The comment above __dev_queue_xmit() sayeth:
When calling this method, interrupts MUST be enabled. This is because
the BH enable code must have IRQs enabled so that it will not deadlock.
For netconsole we can't guarantee IRQ state, let's just
disable netpoll on representors to be on the safe side.
When the initial implementation of NFP reprs was added by the
commit 5de73ee46704 ("nfp: general representor implementation")
.ndo_poll_controller was required for netpoll to be enabled.
Fixes: ac3d9dd034e5 ("netpoll: make ndo_poll_controller() optional") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 27 Mar 2019 18:38:38 +0000 (11:38 -0700)]
nfp: validate the return code from dev_queue_xmit()
dev_queue_xmit() may return error codes as well as netdev_tx_t,
and it always consumes the skb. Make sure we always return a
correct netdev_tx_t value.
Fixes: eadfa4c3be99 ("nfp: add stats and xmit helpers for representors") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 27 Mar 2019 15:21:30 +0000 (08:21 -0700)]
netns: provide pure entropy for net_hash_mix()
net_hash_mix() currently uses kernel address of a struct net,
and is used in many places that could be used to reveal this
address to a patient attacker, thus defeating KASLR, for
the typical case (initial net namespace, &init_net is
not dynamically allocated)
I believe the original implementation tried to avoid spending
too many cycles in this function, but security comes first.
Also provide entropy regardless of CONFIG_NET_NS.
Fixes: 0b4419162aa6 ("netns: introduce the net_hash_mix "salt" for hashes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Amit Klein <aksecurity@gmail.com> Reported-by: Benny Pinkas <benny@pinkas.net> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 28 Mar 2019 19:59:54 +0000 (12:59 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Fixes 2019-03-26
This series contains updates to igb, ixgbe, i40e and fm10k.
Jake fixes an issue with PTP in i40e where a previous commit resulted
in a regression where the driver would interpret small negative
adjustments as large positive additions, resulting in incorrect
behavior.
Arvind Sankar fixes an issue in igb where a previous commit would cause
a warning in the PCI pm core and resulted in pci_pm_runtime_suspend
would not call pci_save_state or pci_finish_runtime_suspend.
Ivan Vecera fixes MDIO bus registration with ixgbe, where the driver was
ignoring errors returned when registering and would leave the pointer in
a NULL state which triggered a BUG when un-registering.
Stefan Assmann fixes the check for Wake-On-LAN for i40e, which only
supports magic packet.
Yue Haibing fixes a potential NULL pointer de-reference in fm10k by
adding a simple check if the value is NULL.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Björn Töpel [Wed, 27 Mar 2019 13:51:14 +0000 (14:51 +0100)]
libbpf: add libelf dependency to shared library build
The DPDK project is moving forward with its AF_XDP PMD, and during
that process some libbpf issues surfaced [1]: When libbpf was built
as a shared library, libelf was not included in the linking phase.
Since libelf is an internal depedency to libbpf, libelf should be
included. This patch adds '-lelf' to resolve that.
[1] https://patches.dpdk.org/patch/50704/#93571
Fixes: 1b76c13e4b36 ("bpf tools: Introduce 'bpf' library and add bpf feature check") Suggested-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Björn Töpel [Wed, 27 Mar 2019 13:51:13 +0000 (14:51 +0100)]
libbpf: add xsk.h to install_headers target
The xsk.h header file was missing from the install_headers target in
the Makefile. This patch simply adds xsk.h to the set of installed
headers.
Fixes: 1cad07884239 ("libbpf: add support for using AF_XDP sockets") Reported-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sabrina Dubroca [Tue, 26 Mar 2019 17:22:16 +0000 (18:22 +0100)]
vrf: prevent adding upper devices
VRF devices don't work with upper devices. Currently, it's possible to
add a VRF device to a bridge or team, and to create macvlan, macsec, or
ipvlan devices on top of a VRF (bond and vlan are prevented respectively
by the lack of an ndo_set_mac_address op and the NETIF_F_VLAN_CHALLENGED
feature flag).
Fix this by setting the IFF_NO_RX_HANDLER flag (introduced in commit f5426250a6ec ("net: introduce IFF_NO_RX_HANDLER")).
Cc: David Ahern <dsahern@gmail.com> Fixes: 193125dbd8eb ("net: Introduce VRF device driver") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In attempting to optimize receive buffer page recycling for XDP, commit 773225388dae15e72790 ("net: thunderx: Optimize page recycling for XDP")
inadvertently introduced two problems for the non-XDP case, that will be
addressed by this patch series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dean Nelson [Tue, 26 Mar 2019 15:53:26 +0000 (11:53 -0400)]
thunderx: eliminate extra calls to put_page() for pages held for recycling
For the non-XDP case, commit 773225388dae15e72790 ("net: thunderx: Optimize
page recycling for XDP") added code to nicvf_free_rbdr() that, when releasing
the additional receive buffer page reference held for recycling, repeatedly
calls put_page() until the page's _refcount goes to zero. Which results in
the page being freed.
This is not okay if the page's _refcount was greater than 1 (in the non-XDP
case), because nicvf_free_rbdr() should not be subtracting more than what
nicvf_alloc_page() had previously added to the page's _refcount, which was
only 1 (in the non-XDP case).
This can arise if a received packet is still being processed and the receive
buffer (i.e., skb->head) has not yet been freed via skb_free_head() when
nicvf_free_rbdr() is spinning through the aforementioned put_page() loop.
If this should occur, when the received packet finishes processing and
skb_free_head() is called, various problems can ensue. Exactly what, depends on
whether the page has already been reallocated or not, anything from "BUG: Bad
page state ... ", to "Unable to handle kernel NULL pointer dereference ..." or
"Unable to handle kernel paging request...".
So this patch changes nicvf_free_rbdr() to only call put_page() once for pages
held for recycling (in the non-XDP case).
Fixes: 773225388dae ("net: thunderx: Optimize page recycling for XDP") Signed-off-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dean Nelson [Tue, 26 Mar 2019 15:53:19 +0000 (11:53 -0400)]
thunderx: enable page recycling for non-XDP case
Commit 773225388dae15e72790 ("net: thunderx: Optimize page recycling for XDP")
added code to nicvf_alloc_page() that inadvertently disables receive buffer
page recycling for the non-XDP case by always NULL'ng the page pointer.
This patch corrects two if-conditionals to allow for the recycling of non-XDP
mode pages by only setting the page pointer to NULL when the page is not ready
for recycling.
Fixes: 773225388dae ("net: thunderx: Optimize page recycling for XDP") Signed-off-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Tue, 26 Mar 2019 09:48:57 +0000 (11:48 +0200)]
net: mii: Fix PAUSE cap advertisement from linkmode_adv_to_lcl_adv_t() helper
With a recent link mode advertisement code update this helper
providing local pause capability translation used for flow
control link mode negotiation got broken.
For eth drivers using this helper, the issue is apparent only
if either PAUSE or ASYM_PAUSE is being advertised.
Fixes: 3c1bcc8614db ("net: ethernet: Convert phydev advertize and supported from u32 to link mode") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 26 Mar 2019 05:50:14 +0000 (13:50 +0800)]
ila: Fix rhashtable walker list corruption
ila_xlat_nl_cmd_flush uses rhashtable walkers allocated from the
stack but it never frees them. This corrupts the walker list of
the hash table.
This patch fixes it.
Reported-by: syzbot+dae72a112334aa65a159@syzkaller.appspotmail.com Fixes: b6e71bdebb12 ("ila: Flush netlink command to clear xlat...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 25 Mar 2019 21:35:30 +0000 (14:35 -0700)]
MAINTAINERS: Fix documentation file name for PHY Library
MAINTAINERS still pointed to phy.txt after moving this file into the rst
format, fix this.
Reported-by: Joe Perches <joe@perches.com> Fixes: 25fe02d00a1e ("Documentation: net: phy: switch documentation to rst format") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Mon, 25 Mar 2019 13:18:06 +0000 (14:18 +0100)]
net: datagram: fix unbounded loop in __skb_try_recv_datagram()
Christoph reported a stall while peeking datagram with an offset when
busy polling is enabled. __skb_try_recv_datagram() uses as the loop
termination condition 'queue empty'. When peeking, the socket
queue can be not empty, even when no additional packets are received.
Address the issue explicitly checking for receive queue changes,
as currently done by __skb_wait_for_more_packets().
Fixes: 2b5cd0dfa384 ("net: Change return type of sk_busy_loop from bool to void") Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Sat, 23 Mar 2019 18:41:32 +0000 (19:41 +0100)]
net: dsa: mv88e6xxx: fix few issues in mv88e6390x_port_set_cmode
This patches fixes few issues in mv88e6390x_port_set_cmode().
1. When entering the function the old cmode may be 0, in this case
mv88e6390x_serdes_get_lane() returns -ENODEV. As result we bail
out and have no chance to set a new mode. Therefore deal properly
with -ENODEV.
2. Once we have disabled power and irq, let's set the cached cmode to 0.
This reflects the actual status and is cleaner if we bail out with an
error in the following function calls.
3. The cached cmode is used by mv88e6390x_serdes_get_lane(),
mv88e6390_serdes_power_lane() and mv88e6390_serdes_irq_enable().
Currently we set the cached mode to the new one at the very end of
the function only, means until then we use the old one what may be
wrong.
4. When calling mv88e6390_serdes_irq_enable() we use the lane value
belonging to the old cmode. Get the lane belonging to the new cmode
before calling this function.
It's hard to provide a good "Fixes" tag because quite a few smaller
changes have been done to the code in question recently.
Fixes: d235c48b40d3 ("net: dsa: mv88e6xxx: power serdes on/off for 10G interfaces on 6390X") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Fixes here and there, a couple new device IDs, as usual:
1) Fix BQL race in dpaa2-eth driver, from Ioana Ciornei.
2) Fix 64-bit division in iwlwifi, from Arnd Bergmann.
3) Fix documentation for some eBPF helpers, from Quentin Monnet.
4) Some UAPI bpf header sync with tools, also from Quentin Monnet.
5) Set descriptor ownership bit at the right time for jumbo frames in
stmmac driver, from Aaro Koskinen.
6) Set IFF_UP properly in tun driver, from Eric Dumazet.
7) Fix load/store doubleword instruction generation in powerpc eBPF
JIT, from Naveen N. Rao.
8) nla_nest_start() return value checks all over, from Kangjie Lu.
9) Fix asoc_id handling in SCTP after the SCTP_*_ASSOC changes this
merge window. From Marcelo Ricardo Leitner and Xin Long.
10) Fix memory corruption with large MTUs in stmmac, from Aaro
Koskinen.
11) Do not use ipv4 header for ipv6 flows in TCP and DCCP, from Eric
Dumazet.
12) Fix topology subscription cancellation in tipc, from Erik Hugne.
13) Memory leak in genetlink error path, from Yue Haibing.
14) Valid control actions properly in packet scheduler, from Davide
Caratti.
15) Even if we get EEXIST, we still need to rehash if a shrink was
delayed. From Herbert Xu.
16) Fix interrupt mask handling in interrupt handler of r8169, from
Heiner Kallweit.
17) Fix leak in ehea driver, from Wen Yang"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (168 commits)
dpaa2-eth: fix race condition with bql frame accounting
chelsio: use BUG() instead of BUG_ON(1)
net: devlink: skip info_get op call if it is not defined in dumpit
net: phy: bcm54xx: Encode link speed and activity into LEDs
tipc: change to check tipc_own_id to return in tipc_net_stop
net: usb: aqc111: Extend HWID table by QNAP device
net: sched: Kconfig: update reference link for PIE
net: dsa: qca8k: extend slave-bus implementations
net: dsa: qca8k: remove leftover phy accessors
dt-bindings: net: dsa: qca8k: support internal mdio-bus
dt-bindings: net: dsa: qca8k: fix example
net: phy: don't clear BMCR in genphy_soft_reset
bpf, libbpf: clarify bump in libbpf version info
bpf, libbpf: fix version info and add it to shared object
rxrpc: avoid clang -Wuninitialized warning
tipc: tipc clang warning
net: sched: fix cleanup NULL pointer exception in act_mirr
r8169: fix cable re-plugging issue
net: ethernet: ti: fix possible object reference leak
net: ibm: fix possible object reference leak
...
====================
This patch set fixes bug in btf_dedup_is_equiv() check mishandling equivalence
comparison between VOID kind in candidate type graph versus anonymous non-VOID
kind in canonical type graph.
Patch #1 fixes bug, by comparing candidate and canonical kinds for equality,
before proceeding to kind-specific checks.
Patch #2 adds a test case testing this specific scenario.
====================
Andrii Nakryiko [Wed, 27 Mar 2019 05:00:07 +0000 (22:00 -0700)]
selftests/bpf: add btf_dedup test for VOID equivalence check
This patch adds specific test exposing bug in btf_dedup_is_equiv() when
comparing candidate VOID type to a non-VOID canonical type. It's
important for canonical type to be anonymous, otherwise name equality
check will do the right thing and will exit early.
Andrii Nakryiko [Wed, 27 Mar 2019 05:00:06 +0000 (22:00 -0700)]
libbpf: fix btf_dedup equivalence check handling of different kinds
btf_dedup_is_equiv() used to compare btf_type->info fields, before doing
kind-specific equivalence check. This comparsion implicitly verified
that candidate and canonical types are of the same kind. With enum fwd
resolution logic this check couldn't be done generically anymore, as for
enums info contains vlen, which differs between enum fwd and
fully-defined enum, so this check was subsumed by kind-specific
equivalence checks.
This change caused btf_dedup_is_equiv() to let through VOID vs other
types check to reach switch, which was never meant to be handing VOID
kind, as VOID kind is always pre-resolved to itself and is only
equivalent to itself, which is checked early in btf_dedup_is_equiv().
This change adds back BTF kind equality check in place of more generic
btf_type->info check, still defering further kind-specific checks to
a per-kind switch.
Stefan Assmann [Thu, 21 Mar 2019 12:45:35 +0000 (13:45 +0100)]
i40e: fix WoL support check
The current check for WoL on i40e is broken. Code comment says only
magic packet is supported, so only check for that.
Fixes: 540a152da762 (i40e/ixgbe/igb: fail on new WoL flag setting WAKE_MAGICSECURE) Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ivan Vecera [Fri, 15 Mar 2019 08:45:15 +0000 (09:45 +0100)]
ixgbe: fix mdio bus registration
The ixgbe ignores errors returned from mdiobus_register() and leaves
adapter->mii_bus non-NULL and MDIO bus state as MDIOBUS_ALLOCATED.
This triggers a BUG from mdiobus_unregister() during ixgbe_remove() call.
Fixes: 8fa10ef01260 ("ixgbe: register a mdiobus") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Arvind Sankar [Sat, 2 Mar 2019 16:01:17 +0000 (11:01 -0500)]
igb: Fix WARN_ONCE on runtime suspend
The runtime_suspend device callbacks are not supposed to save
configuration state or change the power state. Commit fb29f76cc566
("igb: Fix an issue that PME is not enabled during runtime suspend")
changed the driver to not save configuration state during runtime
suspend, however the driver callback still put the device into a
low-power state. This causes a warning in the pci pm core and results in
pci_pm_runtime_suspend not calling pci_save_state or pci_finish_runtime_suspend.
Fix this by not changing the power state either, leaving that to pci pm
core, and make the same change for suspend callback as well.
Also move a couple of defines into the appropriate header file instead
of inline in the .c file.
Fixes: fb29f76cc566 ("igb: Fix an issue that PME is not enabled during runtime suspend") Signed-off-by: Arvind Sankar <niveditas98@gmail.com> Reviewed-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 25 Feb 2019 19:20:05 +0000 (11:20 -0800)]
i40e: fix i40e_ptp_adjtime when given a negative delta
Commit 0ac30ce43323 ("i40e: fix up 32 bit timespec references",
2017-07-26) claims to be cleaning up references to 32-bit timespecs.
The actual contents of the commit make no sense, as it converts a call
to timespec64_add into timespec64_add_ns. This would seem ok, if (a) the
change was documented in the commit message, and (b) timespec64_add_ns
supported negative numbers.
timespec64_add_ns doesn't work with signed deltas, because the
implementation is based around iter_div_u64_rem. This change resulted in
a regression where i40e_ptp_adjtime would interpret small negative
adjustments as large positive additions, resulting in incorrect
behavior.
This commit doesn't appear to fix anything, is not well explained, and
introduces a bug, so lets just revert it.
Reverts: 0ac30ce43323 ("i40e: fix up 32 bit timespec references", 2017-07-26) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Linus Torvalds [Tue, 26 Mar 2019 21:25:48 +0000 (14:25 -0700)]
Merge tag 'nfs-for-5.1-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- Fix nfs4_lock_state refcounting in nfs4_alloc_{lock,unlock}data()
- fix mount/umount race in nlmclnt.
- NFSv4.1 don't free interrupted slot on open
Bugfixes:
- Don't let RPC_SOFTCONN tasks time out if the transport is connected
- Fix a typo in nfs_init_timeout_values()
- Fix layoutstats handling during read failovers
- fix uninitialized variable warning"
* tag 'nfs-for-5.1-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: fix uninitialized variable warning
pNFS/flexfiles: Fix layoutstats handling during read failovers
NFS: Fix a typo in nfs_init_timeout_values()
SUNRPC: Don't let RPC_SOFTCONN tasks time out if the transport is connected
NFSv4.1 don't free interrupted slot on open
NFS: fix mount/umount race in nlmclnt.
NFS: Fix nfs4_lock_state refcounting in nfs4_alloc_{lock,unlock}data()
Section 2.2.1 BTF_KIND_INT a bullet list was collapsed due to
text reflow in commit 9ab5305dbe3f ("docs/btf: reflow text to
fill up to 78 characters").
This patch correct the mistake. Also adjust next bullet list,
which is used for comparison, to get rendered the same way.
Fixes: 9ab5305dbe3f ("docs/btf: reflow text to fill up to 78 characters") Link: https://www.kernel.org/doc/html/latest/bpf/btf.html#btf-kind-int Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alakesh Haloi [Tue, 26 Mar 2019 02:00:01 +0000 (02:00 +0000)]
SUNRPC: fix uninitialized variable warning
Avoid following compiler warning on uninitialized variable
net/sunrpc/xprtsock.c: In function ‘xs_read_stream_request.constprop’:
net/sunrpc/xprtsock.c:525:10: warning: ‘read’ may be used uninitialized in this function [-Wmaybe-uninitialized]
return read;
^~~~
net/sunrpc/xprtsock.c:529:23: warning: ‘ret’ may be used uninitialized in this function [-Wmaybe-uninitialized]
return ret < 0 ? ret : read;
~~~~~~~~~~~~~~^~~~~~
====================
The BPF verifier checks the maximum number of call stack frames twice,
first in the main CFG traversal (do_check) and then in a subsequent
traversal (check_max_stack_depth). If the second check fails, it logs a
'verifier bug' warning and errors out, as the number of call stack frames
should have been verified already.
However, the second check may fail without indicating a verifier bug: if
the excessive function calls reside in dead code, the main CFG traversal
may not visit them; the subsequent traversal visits all instructions,
including dead code.
This case raises the question of how invalid dead code should be treated.
The first patch implements the conservative option and rejects such code;
the second adds a test case.
====================
Paul Chaignon [Wed, 20 Mar 2019 12:58:27 +0000 (13:58 +0100)]
bpf: remove incorrect 'verifier bug' warning
The BPF verifier checks the maximum number of call stack frames twice,
first in the main CFG traversal (do_check) and then in a subsequent
traversal (check_max_stack_depth). If the second check fails, it logs a
'verifier bug' warning and errors out, as the number of call stack frames
should have been verified already.
However, the second check may fail without indicating a verifier bug: if
the excessive function calls reside in dead code, the main CFG traversal
may not visit them; the subsequent traversal visits all instructions,
including dead code.
This case raises the question of how invalid dead code should be treated.
This patch implements the conservative option and rejects such code.
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com> Tested-by: Xiao Han <xiao.han@orange.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ioana Ciornei [Mon, 25 Mar 2019 13:06:22 +0000 (13:06 +0000)]
dpaa2-eth: fix race condition with bql frame accounting
It might happen that Tx conf acknowledges a frame before it was
subscribed in bql, as subscribing was previously done after the enqueue
operation.
This patch moves the netdev_tx_sent_queue call before the actual frame
enqueue, so that this can never happen.
Fixes: 569dac6a5a0d ("dpaa2-eth: bql support") Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 25 Mar 2019 12:49:16 +0000 (13:49 +0100)]
chelsio: use BUG() instead of BUG_ON(1)
clang warns about possible bugs in a dead code branch after
BUG_ON(1) when CONFIG_PROFILE_ALL_BRANCHES is enabled:
drivers/net/ethernet/chelsio/cxgb4/sge.c:479:3: error: variable 'buf_size' is used uninitialized whenever 'if'
condition is false [-Werror,-Wsometimes-uninitialized]
BUG_ON(1);
^~~~~~~~~
include/asm-generic/bug.h:61:36: note: expanded from macro 'BUG_ON'
#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
^~~~~~~~~~~~~~~~~~~
include/linux/compiler.h:48:23: note: expanded from macro 'unlikely'
# define unlikely(x) (__branch_check__(x, 0, __builtin_constant_p(x)))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/chelsio/cxgb4/sge.c:482:9: note: uninitialized use occurs here
return buf_size;
^~~~~~~~
drivers/net/ethernet/chelsio/cxgb4/sge.c:479:3: note: remove the 'if' if its condition is always true
BUG_ON(1);
^
include/asm-generic/bug.h:61:32: note: expanded from macro 'BUG_ON'
#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
^
drivers/net/ethernet/chelsio/cxgb4/sge.c:459:14: note: initialize the variable 'buf_size' to silence this warning
int buf_size;
^
= 0
Use BUG() here to create simpler code that clang understands
correctly.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sat, 23 Mar 2019 23:21:03 +0000 (00:21 +0100)]
net: devlink: skip info_get op call if it is not defined in dumpit
In dumpit, unlike doit, the check for info_get op being defined
is missing. Add it and avoid null pointer dereference in case driver
does not define this op.
Fixes: f9cf22882c60 ("devlink: add device information API") Reported-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 23 Mar 2019 22:18:46 +0000 (00:18 +0200)]
net: phy: bcm54xx: Encode link speed and activity into LEDs
Previously the green and amber LEDs on this quad PHY were solid, to
indicate an encoding of the link speed (10/100/1000).
This keeps the LEDs always on just as before, but now they flash on
Rx/Tx activity.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
It was caused by the netns freed without deleting the discoverer timer,
while later on the netns would be accessed in the timer handler.
The timer should have been deleted by tipc_net_stop() when cleaning up a
netns. However, tipc has been able to enable a bearer and start d->timer
without the local node_addr set since Commit 52dfae5c85a4 ("tipc: obtain
node identity from interface by default"), which caused the timer not to
be deleted in tipc_net_stop() then.
So fix it in tipc_net_stop() by changing to check local node_id instead
of local node_addr, as Jon suggested.
While at it, remove the calling of tipc_nametbl_withdraw() there, since
tipc_nametbl_stop() will take of the nametbl's freeing after.
Fixes: 52dfae5c85a4 ("tipc: obtain node identity from interface by default") Reported-by: syzbot+a25307ad099309f1c2b9@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>