Xin Long [Mon, 15 Jan 2018 09:02:00 +0000 (17:02 +0800)]
sctp: do not allow the v4 socket to bind a v4mapped v6 address
The check in sctp_sockaddr_af is not robust enough to forbid binding a
v4mapped v6 addr on a v4 socket.
The worse thing is that v4 socket's bind_verify would not convert this
v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4
socket bound a v6 addr.
This patch is to fix it by doing the common sa.sa_family check first,
then AF_INET check for v4mapped v6 addrs.
Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.") Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 15 Jan 2018 09:01:36 +0000 (17:01 +0800)]
sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
After commit cea0cc80a677 ("sctp: use the right sk after waking up from
wait_buf sleep"), it may change to lock another sk if the asoc has been
peeled off in sctp_wait_for_sndbuf.
However, the asoc's new sk could be already closed elsewhere, as it's in
the sendmsg context of the old sk that can't avoid the new sk's closing.
If the sk's last one refcnt is held by this asoc, later on after putting
this asoc, the new sk will be freed, while under it's own lock.
This patch is to revert that commit, but fix the old issue by returning
error under the old sk's lock.
Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep") Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 15 Jan 2018 09:01:19 +0000 (17:01 +0800)]
sctp: reinit stream if stream outcnt has been change by sinit in sendmsg
After introducing sctp_stream structure, sctp uses stream->outcnt as the
out stream nums instead of c.sinit_num_ostreams.
However when users use sinit in cmsg, it only updates c.sinit_num_ostreams
in sctp_sendmsg. At that moment, stream->outcnt is still using previous
value. If it's value is not updated, the sinit_num_ostreams of sinit could
not really work.
This patch is to fix it by updating stream->outcnt and reiniting stream
if stream outcnt has been change by sinit in sendmsg.
Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Thu, 11 Jan 2018 01:39:52 +0000 (19:39 -0600)]
ibmvnic: Fix pending MAC address changes
Due to architecture limitations, the IBM VNIC client driver is unable
to perform MAC address changes unless the device has "logged in" to
its backing device. Currently, pending MAC changes are handled before
login, resulting in an error and failure to change the MAC address.
Moving that chunk to the end of the ibmvnic_login function, when we are
sure that it was successful, fixes that.
The MAC address can be changed when the device is up or down, so
only check if the device is in a "PROBED" state before setting the
MAC address.
Fixes: c26eba03e407 ("ibmvnic: Update reset infrastructure to support tunable parameters") Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Reviewed-by: John Allen <jallen@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
NL_SET_ERR_MSG() and NL_SET_ERR_MSG_ATTR() lead to the following warning
in newer versions of gcc:
warning: array initialized from parenthesized string constant
Just remove the parentheses, they're not needed in this context since
anyway since there can be no operator precendence issues or similar.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
This used to be the previous behavior in older kernels but became broken in a263b3093641f (ipv4: Make neigh lookups directly in output packet path)
and then later removed because it was broken in 0bb4087cbec0 (ipv4: Fix neigh
lookup keying over loopback/point-to-point devices)
Not having this results in there being an arp entry for every remote ip
address that the device talks to. Given a fairly active device it can
cause the arp table to become huge and/or having to add/purge large number
of entires to keep within table size thresholds.
$ ip -4 neigh show nud noarp | grep tun | wc -l
55850
Jim Westfall [Sun, 14 Jan 2018 12:18:51 +0000 (04:18 -0800)]
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices
to avoid making an entry for every remote ip the device needs to talk to.
This used the be the old behavior but became broken in a263b3093641f
(ipv4: Make neigh lookups directly in output packet path) and later removed
in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point
devices) because it was broken.
Signed-off-by: Jim Westfall <jwestfall@surrealistic.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 13 Jan 2018 17:22:01 +0000 (20:22 +0300)]
sh_eth: fix dumping ARSTR
ARSTR is always located at the start of the TSU register region, thus
using add_reg() instead of add_tsu_reg() in __sh_eth_get_regs() to dump it
causes EDMR or EDSR (depending on the register layout) to be dumped instead
of ARSTR. Use the correct condition/macro there...
Fixes: 6b4b4fead342 ("sh_eth: Implement ethtool register dump operations") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr should be designed
as a nested attribute to support all ERSPAN v1 and v2's fields.
The current attr is a be32 supporting only one field. Thus, this
patch reverts it and later patch will redo it using nested attr.
Signed-off-by: William Tu <u9012063@gmail.com> Cc: Jiri Benc <jbenc@redhat.com> Cc: Pravin Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
r.hering@avm.de [Fri, 12 Jan 2018 14:42:06 +0000 (15:42 +0100)]
net/tls: Fix inverted error codes to avoid endless loop
sendfile() calls can hang endless with using Kernel TLS if a socket error occurs.
Socket error codes must be inverted by Kernel TLS before returning because
they are stored with positive sign. If returned non-inverted they are
interpreted as number of bytes sent, causing endless looping of the
splice mechanic behind sendfile().
Signed-off-by: Robert Hering <r.hering@avm.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 12 Jan 2018 06:31:18 +0000 (22:31 -0800)]
ipv6: ip6_make_skb() needs to clear cork.base.dst
In my last patch, I missed fact that cork.base.dst was not initialized
in ip6_make_skb() :
If ip6_setup_cork() returns an error, we might attempt a dst_release()
on some random pointer.
Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Colitti [Thu, 11 Jan 2018 09:36:26 +0000 (18:36 +0900)]
net: ipv4: Make "ip route get" match iif lo rules again.
Commit 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu
versions of route lookup") broke "ip route get" in the presence
of rules that specify iif lo.
Host-originated traffic always has iif lo, because
ip_route_output_key_hash and ip6_route_output_flags set the flow
iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a
convenient way to select only originated traffic and not forwarded
traffic.
inet_rtm_getroute used to match these rules correctly because
even though it sets the flow iif to 0, it called
ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX.
But now that it calls ip_route_output_key_hash_rcu, the ifindex
will remain 0 and not match the iif lo in the rule. As a result,
"ip route get" will return ENETUNREACH.
Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup")
Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork_test.py passes again Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Wed, 10 Jan 2018 21:00:39 +0000 (13:00 -0800)]
netlink: extack needs to be reset each time through loop
syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value.
The problem is that netlink_rcv_skb loops over the skb repeatedly invoking
the callback and without resetting the extack leaving potentially stale
data. Initializing each time through avoids the WARN_ON.
Fixes: 2d4bc93368f5a ("netlink: extended ACK reporting") Reported-by: syzbot+315fa6766d0f7c359327@syzkaller.appspotmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 10 Jan 2018 20:50:25 +0000 (12:50 -0800)]
tipc: fix a memory leak in tipc_nl_node_get_link()
When tipc_node_find_by_name() fails, the nlmsg is not
freed.
While on it, switch to a goto label to properly
free it.
Fixes: be9c086715c ("tipc: narrow down exposure of struct tipc_node") Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Maloney [Wed, 10 Jan 2018 17:45:10 +0000 (12:45 -0500)]
ipv6: fix udpv6 sendmsg crash caused by too small MTU
The logic in __ip6_append_data() assumes that the MTU is at least large
enough for the headers. A device's MTU may be adjusted after being
added while sendmsg() is processing data, resulting in
__ip6_append_data() seeing any MTU. For an mtu smaller than the size of
the fragmentation header, the math results in a negative 'maxfraglen',
which causes problems when refragmenting any previous skb in the
skb_write_queue, leaving it possibly malformed.
Instead sendmsg returns EINVAL when the mtu is calculated to be less
than IPV6_MIN_MTU.
Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Mike Maloney <maloney@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Wed, 10 Jan 2018 15:24:45 +0000 (16:24 +0100)]
ppp: unlock all_ppp_mutex before registering device
ppp_dev_uninit(), which is the .ndo_uninit() handler of PPP devices,
needs to lock pn->all_ppp_mutex. Therefore we mustn't call
register_netdevice() with pn->all_ppp_mutex already locked, or we'd
deadlock in case register_netdevice() fails and calls .ndo_uninit().
Fortunately, we can unlock pn->all_ppp_mutex before calling
register_netdevice(). This lock protects pn->units_idr, which isn't
used in the device registration process.
However, keeping pn->all_ppp_mutex locked during device registration
did ensure that no device in transient state would be published in
pn->units_idr. In practice, unlocking it before calling
register_netdevice() doesn't change this property: ppp_unit_register()
is called with 'ppp_mutex' locked and all searches done in
pn->units_idr hold this lock too.
Fixes: 8cb775bc0a34 ("ppp: fix device unregistration upon netns deletion") Reported-and-tested-by: syzbot+367889b9c9e279219175@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
This explains why is the net usage of __ptr_ring_peek
actually ok without locks.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The 9P of Xen module is missing required license and module information.
See https://bugzilla.kernel.org/show_bug.cgi?id=198109
Reported-by: Alan Bartlett <ajb@elrepo.org> Fixes: 868eb122739a ("xen/9pfs: introduce Xen 9pfs transport driver") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Mon, 15 Jan 2018 08:58:27 +0000 (09:58 +0100)]
cfg80211: check dev_set_name() return value
syzbot reported a warning from rfkill_alloc(), and after a while
I think that the reason is that it was doing fault injection and
the dev_set_name() failed, leaving the name NULL, and we didn't
check the return value and got to rfkill_alloc() with a NULL name.
Since we really don't want a NULL name, we ought to check the
return value.
Fixes: fb28ad35906a ("net: struct device - replace bus_id with dev_name(), dev_set_name()") Reported-by: syzbot+1ddfb3357e1d7bb5b5d3@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Mon, 15 Jan 2018 08:32:36 +0000 (09:32 +0100)]
mac80211_hwsim: validate number of different channels
When creating a new radio on the fly, hwsim allows this
to be done with an arbitrary number of channels, but
cfg80211 only supports a limited number of simultaneous
channels, leading to a warning.
Fix this by validating the number - this requires moving
the define for the maximum out to a visible header file.
Reported-by: syzbot+8dd9051ff19940290931@syzkaller.appspotmail.com Fixes: b59ec8dd4394 ("mac80211_hwsim: fix number of channels in interface combinations") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
mac80211_hwsim: add workqueue to wait for deferred radio deletion on mod unload
When closing multiple wmediumd instances with many radios and try to
unload the mac80211_hwsim module, it may happen that the work items live
longer than the module. To wait especially for this deletion work items,
add a work queue, otherwise flush_scheduled_work would be necessary.
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
nl80211: take RCU read lock when calling ieee80211_bss_get_ie()
As ieee80211_bss_get_ie() derefences an RCU to return ssid_ie, both
the call to this function and any operation on this variable need
protection by the RCU read lock.
Fixes: 44905265bc15 ("nl80211: don't expose wdev->ssid for most interfaces") Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Mon, 15 Jan 2018 08:12:05 +0000 (09:12 +0100)]
cfg80211: fully initialize old channel for event
Paul reported that he got a report about undefined behaviour
that seems to me to originate in using uninitialized memory
when the channel structure here is used in the event code in
nl80211 later.
He never reported whether this fixed it, and I wasn't able
to trigger this so far, but we should do the right thing and
fully initialize the on-stack structure anyway.
Reported-by: Paul Menzel <pmenzel+linux-wireless@molgen.mpg.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Follow-up fix to the recent BPF out-of-bounds speculation
fix that prevents max_entries overflows and an undefined
behavior on 32 bit archs on index_mask calculation, from
Daniel.
2) Reject unsupported BPF_ARSH opcode in 32 bit ALU mode that
was otherwise throwing an unknown opcode warning in the
interpreter, from Daniel.
3) Typo fix in one of the user facing verbose() messages that
was added during the BPF out-of-bounds speculation fix,
from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The following series includes fixes to mlx5 core and netdev driver.
To highlight we have two critical fixes in this series:
1st patch from Eran to address a fix for Host2BMC Breakage.
2nd patch from Saeed to address the RDMA IRQ vector affinity settings query
issue, the patch provides the correct mlx5_core implementation for RDMA to
correctly query vector affinity.
I sent this patch privately to Sagi a week a go, so he could to test it
but I didn't hear from him.
All other patches are trivial misc fixes.
Please pull and let me know if there's any problem.
for -stable v4.14-y and later:
("net/mlx5: Fix get vector affinity helper function")
("{net,ib}/mlx5: Don't disable local loopback multicast traffic when needed")
Note: Merging this series with net-next will produce the following conflict:
<<<<<<< HEAD
u8 disable_local_lb[0x1];
u8 reserved_at_3e2[0x1];
u8 log_min_hairpin_wq_data_sz[0x5];
u8 reserved_at_3e8[0x3];
=======
u8 disable_local_lb_uc[0x1];
u8 disable_local_lb_mc[0x1];
u8 reserved_at_3e3[0x8];
>>>>>>> 359c96447ac2297fabe15ef30b60f3b4b71e7fd0
To resolve, use the following hunk:
i.e:
<<<<<<
u8 disable_local_lb_uc[0x1];
u8 disable_local_lb_mc[0x1];
u8 log_min_hairpin_wq_data_sz[0x5];
u8 reserved_at_3e8[0x3];
>>>>>>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Don't allow to change the encap type on state updates.
The encap type is set on state initialization and
should not change anymore. From Herbert Xu.
2) Skip dead policies when rehashing to fix a
slab-out-of-bounds bug in xfrm_hash_rebuild.
From Florian Westphal.
3) Two buffer overread fixes in pfkey.
From Eric Biggers.
4) Fix rcu usage in xfrm_get_type_offload,
request_module can sleep, so can't be used
under rcu_read_lock. From Sabrina Dubroca.
5) Fix an uninitialized lock in xfrm_trans_queue.
Use __skb_queue_tail instead of skb_queue_tail
in xfrm_trans_queue as we don't need the lock.
From Herbert Xu.
6) Currently it is possible to create an xfrm state with an
unknown encap type in ESP IPv4. Fix this by returning an
error on unknown encap types. Also from Herbert Xu.
7) Fix sleeping inside a spinlock in xfrm_policy_cache_flush.
From Florian Westphal.
8) Fix ESP GRO when the headers not fully in the linear part
of the skb. We need to pull before we can access them.
9) Fix a skb leak on error in key_notify_policy.
10) Fix a race in the xdst pcpu cache, we need to
run the resolver routines with bottom halfes
off like the old flowcache did.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 12 Jan 2018 00:57:32 +0000 (16:57 -0800)]
Merge tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two rbd fixes for 4.12 and 4.2 issues respectively, marked for
stable"
* tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client:
rbd: set max_segments to USHRT_MAX
rbd: reacquire lock should update lock owner client id
Linus Torvalds [Fri, 12 Jan 2018 00:54:35 +0000 (16:54 -0800)]
Merge tag 'gpio-v4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fix from Linus Walleij:
"Fix a raw vs elaborate GPIO descriptor bug introduced by yours truly"
* tag 'gpio-v4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: Add missing open drain/source handling to gpiod_set_value_cansleep()
Feras Daoud [Mon, 8 Jan 2018 08:01:04 +0000 (10:01 +0200)]
net/mlx5e: Remove timestamp set from netdevice open flow
To avoid configuration override, timestamp set call will
be moved from the netdevice open flow to the init flow.
By this, a close-open procedure will not override the timestamp
configuration.
In addition, the change will rename mlx5e_timestamp_set function
to be mlx5e_timestamp_init.
PPS event did not update ptp_clock_event fields, therefore,
timestamp value was not updated correctly. This fix updates the
event source and the timestamp value for each PPS event.
Gal Pressman [Wed, 10 Jan 2018 15:11:11 +0000 (17:11 +0200)]
net/mlx5e: Don't override netdev features field unless in error flow
Set features function sets dev->features in order to keep track of which
features were successfully changed and which weren't (in case the user
asks for more than one change in a single command).
This breaks the logic in __netdev_update_features which assumes that
dev->features is not changed on success and checks for diffs between
features and dev->features (diffs that might not exist at this point
because of the driver override).
The solution is to keep track of successful/failed feature changes and
assign them to dev->features in case of failure only.
Fixes: 0e405443e803 ("net/mlx5e: Improve set features ndo resiliency") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Gal Pressman [Tue, 26 Dec 2017 11:44:49 +0000 (13:44 +0200)]
net/mlx5e: Keep updating ethtool statistics when the interface is down
ethtool statistics should be updated even when the interface is down
since it shows more than just netdev counters, which might change while
the logical link is down.
One useful use case, for example, is when running RoCE traffic over the
interface (while the logical link is down, but physical link is up) and
examining rx_prioX_bytes.
Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality") Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eran Ben Elisha [Mon, 20 Nov 2017 07:58:01 +0000 (09:58 +0200)]
net/mlx5: Fix mlx5_get_uars_page to return error code
Change mlx5_get_uars_page to return ERR_PTR in case of
allocation failure. Change all callers accordingly to
check the IS_ERR(ptr) instead of NULL.
Fixes: 59211bd3b632 ("net/mlx5: Split the load/unload flow into hardware and software flows") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Saeed Mahameed [Thu, 4 Jan 2018 02:35:51 +0000 (04:35 +0200)]
net/mlx5: Fix get vector affinity helper function
mlx5_get_vector_affinity used to call pci_irq_get_affinity and after
reverting the patch that sets the device affinity via PCI_IRQ_AFFINITY
API, calling pci_irq_get_affinity becomes useless and it breaks RDMA
mlx5 users. To fix this, this patch provides an alternative way to
retrieve IRQ vector affinity using legacy IRQ API, following
smp_affinity read procfs implementation.
Eran Ben Elisha [Tue, 9 Jan 2018 09:41:10 +0000 (11:41 +0200)]
{net,ib}/mlx5: Don't disable local loopback multicast traffic when needed
There are systems platform information management interfaces (such as
HOST2BMC) for which we cannot disable local loopback multicast traffic.
Separate disable_local_lb_mc and disable_local_lb_uc capability bits so
driver will not disable multicast loopback traffic if not supported.
(It is expected that Firmware will not set disable_local_lb_mc if
HOST2BMC is running for example.)
Function mlx5_nic_vport_update_local_lb will do best effort to
disable/enable UC/MC loopback traffic and return success only in case it
succeeded to changed all allowed by Firmware.
Adapt mlx5_ib and mlx5e to support the new cap bits.
Fixes: 2c43c5a036be ("net/mlx5e: Enable local loopback in loopback selftest") Fixes: c85023e153e3 ("IB/mlx5: Add raw ethernet local loopback support") Fixes: bded747bb432 ("net/mlx5: Add raw ethernet local loopback firmware command") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Cc: kernel-team@fb.com Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
1) BPF speculation prevention and BPF_JIT_ALWAYS_ON, from Alexei
Starovoitov.
2) Revert dev_get_random_name() changes as adjust the error code
returns seen by userspace definitely breaks stuff.
3) Fix TX DMA map/unmap on older iwlwifi devices, from Emmanuel
Grumbach.
4) From wrong AF family when requesting sock diag modules, from Andrii
Vladyka.
5) Don't add new ipv6 routes attached to the null_entry, from Wei Wang.
6) Some SCTP sockopt length fixes from Marcelo Ricardo Leitner.
7) Don't leak when removing VLAN ID 0, from Cong Wang.
8) Hey there's a potential leak in ipv6_make_skb() too, from Eric
Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
ipv6: sr: fix TLVs not being copied using setsockopt
ipv6: fix possible mem leaks in ipv6_make_skb()
mlxsw: spectrum_qdisc: Don't use variable array in mlxsw_sp_tclass_congestion_enable
mlxsw: pci: Wait after reset before accessing HW
nfp: always unmask aux interrupts at init
8021q: fix a memory leak for VLAN 0 device
of_mdio: avoid MDIO bus removal when a PHY is missing
caif_usb: use strlcpy() instead of strncpy()
doc: clarification about setting SO_ZEROCOPY
net: gianfar_ptp: move set_fipers() to spinlock protecting area
sctp: make use of pre-calculated len
sctp: add a ceiling to optlen in some sockopts
sctp: GFP_ATOMIC is not needed in sctp_setsockopt_events
bpf: introduce BPF_JIT_ALWAYS_ON config
bpf: avoid false sharing of map refcount with max_entries
ipv6: remove null_entry before adding default route
SolutionEngine771x: add Ether TSU resource
SolutionEngine771x: fix Ether platform data
docs-rst: networking: wire up msg_zerocopy
net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()
...
Al Viro [Wed, 10 Jan 2018 23:47:05 +0000 (18:47 -0500)]
Fix a leak in socket(2) when we fail to allocate a file descriptor.
Got broken by "make sock_alloc_file() do sock_release() on failures" -
cleanup after sock_map_fd() failure got pulled all the way into
sock_alloc_file(), but it used to serve the case when sock_map_fd()
failed *before* getting to sock_alloc_file() as well, and that got
lost. Trivial to fix, fortunately.
Fixes: 8e1611e23579 (make sock_alloc_file() do sock_release() on failures) Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Daniel Borkmann [Wed, 10 Jan 2018 22:25:05 +0000 (23:25 +0100)]
bpf, array: fix overflow in max_entries and undefined behavior in index_mask
syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
and thus unprivileged. With the recently added logic in b2157399cc98
("bpf: prevent out-of-bounds speculation") we round this up to the next
power of two value for max_entries for unprivileged such that we can
apply proper masking into potentially zeroed out map slots.
However, this will generate an index_mask of 0xffffffff, and therefore
a + 1 will let this overflow into new max_entries of 0. This will pass
allocation, etc, and later on map access we still enforce on the original
attr->max_entries value which was 0xfffffffd, therefore triggering GPF
all over the place. Thus bail out on overflow in such case.
Moreover, on 32 bit archs roundup_pow_of_two() can also not be used,
since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit
space is undefined. Therefore, do this by hand in a 64 bit variable.
This fixes all the issues triggered by syzkaller's reproducers.
Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
generation, not all of them do and interpreter does neither. We can
leave existing ones and implement it later in bpf-next for the
remaining ones, but reject this properly in verifier for the time
being.
Mathieu Xhonneux [Wed, 10 Jan 2018 13:35:49 +0000 (13:35 +0000)]
ipv6: sr: fix TLVs not being copied using setsockopt
Function ipv6_push_rthdr4 allows to add an IPv6 Segment Routing Header
to a socket through setsockopt, but the current implementation doesn't
copy possible TLVs at the end of the SRH received from userspace.
Therefore, the execution of the following branch if (sr_has_hmac(sr_phdr))
{ ... } will never complete since the len and type fields of a possible
HMAC TLV are not copied, hence seg6_get_tlv_hmac will return an error,
and the HMAC will not be computed.
This commit adds a memcpy in case TLVs have been appended to the SRH.
Fixes: a149e7c7ce81 ("ipv6: sr: add support for SRH injection through setsockopt") Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 10 Jan 2018 11:45:49 +0000 (03:45 -0800)]
ipv6: fix possible mem leaks in ipv6_make_skb()
ip6_setup_cork() might return an error, while memory allocations have
been done and must be rolled back.
Fixes: 6422398c2ab0 ("ipv6: introduce ipv6_make_skb") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Reported-by: Mike Maloney <maloney@google.com> Acked-by: Mike Maloney <maloney@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 10 Jan 2018 10:42:44 +0000 (11:42 +0100)]
mlxsw: spectrum_qdisc: Don't use variable array in mlxsw_sp_tclass_congestion_enable
Resolve the sparse warning:
"sparse: Variable length array is used."
Use 2 arrays for 2 PRM register accesses.
Fixes: 96f17e0776c2 ("mlxsw: spectrum: Support RED qdisc offload") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 10 Jan 2018 10:42:43 +0000 (11:42 +0100)]
mlxsw: pci: Wait after reset before accessing HW
After performing reset driver polls on HW indication until learning
that the reset is done, but immediately after reset the device becomes
unresponsive which might lead to completion timeout on the first read.
Wait for 100ms before starting the polling.
Fixes: 233fa44bd67a ("mlxsw: pci: Implement reset done check") Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 10 Jan 2018 02:14:28 +0000 (18:14 -0800)]
nfp: always unmask aux interrupts at init
The link state and exception interrupts may be masked when we probe.
The firmware should in theory prevent sending (and automasking) those
interrupts if the device is disabled, but if my reading of the FW code
is correct there are firmwares out there with race conditions in this
area. The interrupt may also be masked if previous driver which used
the device was malfunctioning and we didn't load the FW (there is no
other good way to comprehensively reset the PF).
Note that FW unmasks the data interrupts by itself when vNIC is
enabled, such helpful operation is not performed for LSC/EXN interrupts.
Always unmask the auxiliary interrupts after request_irq(). On the
remove path add missing PCI write flush before free_irq().
Fixes: 4c3523623dc0 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Tue, 9 Jan 2018 21:40:41 +0000 (13:40 -0800)]
8021q: fix a memory leak for VLAN 0 device
A vlan device with vid 0 is allow to creat by not able to be fully
cleaned up by unregister_vlan_dev() which checks for vlan_id!=0.
Also, VLAN 0 is probably not a valid number and it is kinda
"reserved" for HW accelerating devices, but it is probably too
late to reject it from creation even if makes sense. Instead,
just remove the check in unregister_vlan_dev().
Reported-by: Dmitry Vyukov <dvyukov@google.com> Fixes: ad1afb003939 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)") Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur [Tue, 9 Jan 2018 12:43:34 +0000 (14:43 +0200)]
of_mdio: avoid MDIO bus removal when a PHY is missing
If one of the child devices is missing the of_mdiobus_register_phy()
call will return -ENODEV. When a missing device is encountered the
registration of the remaining PHYs is stopped and the MDIO bus will
fail to register. Propagate all errors except ENODEV to avoid it.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Xiongfeng Wang [Tue, 9 Jan 2018 11:58:18 +0000 (19:58 +0800)]
caif_usb: use strlcpy() instead of strncpy()
gcc-8 reports
net/caif/caif_usb.c: In function 'cfusbl_device_notify':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]
The compiler require that the input param 'len' of strncpy() should be
greater than the length of the src string, so that '\0' is copied as
well. We can just use strlcpy() to avoid this warning.
Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kornilios Kourtis <kou@zurich.ibm.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yangbo Lu [Tue, 9 Jan 2018 03:02:33 +0000 (11:02 +0800)]
net: gianfar_ptp: move set_fipers() to spinlock protecting area
set_fipers() calling should be protected by spinlock in
case that any interrupt breaks related registers setting
and the function we expect. This patch is to move set_fipers()
to spinlock protecting area in ptp_gianfar_adjtime().
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 10 Jan 2018 19:53:23 +0000 (14:53 -0500)]
Merge branch 'sctp-Some-sockopt-optlen-fixes'
Marcelo Ricardo Leitner says:
====================
sctp: Some sockopt optlen fixes
Hangbin Liu reported that some SCTP sockopt are allowing the user to get
the kernel to allocate really large buffers by not having a ceiling on
optlen.
This patchset address this issue (in patch 2), replace an GFP_ATOMIC
that isn't needed and avoid calculating the option size multiple times
in some setsockopt.
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Some sockopt handling functions were calculating the length of the
buffer to be written to userspace and then calculating it again when
actually writing the buffer, which could lead to some write not using
an up-to-date length.
This patch updates such places to just make use of the len variable.
Also, replace some sizeof(type) to sizeof(var).
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu reported that some sockopt calls could cause the kernel to log
a warning on memory allocation failure if the user supplied a large optlen
value. That is because some of them called memdup_user() without a ceiling
on optlen, allowing it to try to allocate really large buffers.
This patch adds a ceiling by limiting optlen to the maximum allowed that
would still make sense for these sockopt.
Reported-by: Hangbin Liu <haliu@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Prevent out-of-bounds speculation in BPF maps by masking the
index after bounds checks in order to fix spectre v1, and
add an option BPF_JIT_ALWAYS_ON into Kconfig that allows for
removing the BPF interpreter from the kernel in favor of
JIT-only mode to make spectre v2 harder, from Alexei.
2) Remove false sharing of map refcount with max_entries which
was used in spectre v1, from Daniel.
3) Add a missing NULL psock check in sockmap in order to fix
a race, from John.
4) Fix test_align BPF selftest case since a recent change in
verifier rejects the bit-wise arithmetic on pointers
earlier but test_align update was missing, from Alexei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
gpio: Add missing open drain/source handling to gpiod_set_value_cansleep()
Since commit f11a04464ae57e8d ("i2c: gpio: Enable working over slow
can_sleep GPIOs"), probing the i2c RTC connected to an i2c-gpio bus on
r8a7740/armadillo fails with:
rtc-s35390a 0-0030: error resetting chip
rtc-s35390a: probe of 0-0030 failed with error -5
More debug code reveals:
i2c i2c-0: master_xfer[0] R, addr=0x30, len=1
i2c i2c-0: NAK from device addr 0x30 msg #0
s35390a_get_reg: ret = -6
Commit 02e479808b5d62f8 ("gpio: Alter semantics of *raw* operations to
actually be raw") moved open drain/source handling from
gpiod_set_raw_value_commit() to gpiod_set_value(), but forgot to take
into account that gpiod_set_value_cansleep() also needs this handling.
The i2c protocol mandates that i2c signals are open drain, hence i2c
communication fails.
Fix this by adding the missing handling to gpiod_set_value_cansleep(),
using a new common helper gpiod_set_value_nocheck().
Fixes: 02e479808b5d62f8 ("gpio: Alter semantics of *raw* operations to actually be raw") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
[removed underscore syntax, added kerneldoc] Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Steffen Klassert [Wed, 10 Jan 2018 11:14:28 +0000 (12:14 +0100)]
xfrm: Fix a race in the xdst pcpu cache.
We need to run xfrm_resolve_and_create_bundle() with
bottom halves off. Otherwise we may reuse an already
released dst_enty when the xfrm lookup functions are
called from process context.
Linus Torvalds [Tue, 9 Jan 2018 23:45:06 +0000 (15:45 -0800)]
Merge tag 'riscv-for-linus-4.15-rc8_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
Pull RISC-V updates from Palmer Dabbelt:
"This contains what I hope are the last RISC-V changes to go into 4.15.
I know it's a bit last minute, but I think they're all fairly small
changes:
- SR_* constants have been renamed to match the latest ISA
specification.
- Some CONFIG_MMU #ifdef cruft has been removed. We've never
supported !CONFIG_MMU.
- __NR_riscv_flush_icache is now visible to userspace. We were hoping
to avoid making this public in order to force userspace to call the
vDSO entry, but it looks like QEMU's user-mode emulation doesn't
want to emulate a vDSO. In order to allow glibc to fall back to a
system call when the vDSO entry doesn't exist we're just
- Our defconfig is no long empty. This is another one that just
slipped through the cracks. The defconfig isn't perfect, but it's
at least close to what users will want for the first RISC-V
development board. Getting closer is kind of splitting hairs here:
none of the RISC-V specific drivers are in yet, so it's not like
things will boot out of the box.
The only one that's strictly necessary is the __NR_riscv_flush_icache
change, as I want that to be part of the public API starting from our
first kernel so nobody has to worry about it. The others are nice to
haves, but they seem sane for 4.15 to me"
* tag 'riscv-for-linus-4.15-rc8_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux:
riscv: rename SR_* constants to match the spec
riscv: remove CONFIG_MMU ifdefs
RISC-V: Make __NR_riscv_flush_icache visible to userspace
RISC-V: Add a basic defconfig
Linus Torvalds [Tue, 9 Jan 2018 23:43:13 +0000 (15:43 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"Another round of MIPS fixes for 4.15.
- Maciej Rozycki found another series of FP issues which requires a
seven part series to restructure and fix.
- James fixes a warning about .set mt which gas doesn't like when
building for R1 processors"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
MIPS: Factor out NT_PRFPREG regset access helpers
MIPS: CPS: Fix r1 .set mt assembler warning
The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."
To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64
The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden
v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)
v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
It will be sent when the trees are merged back to net-next
Considered doing:
int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Linus Torvalds [Tue, 9 Jan 2018 19:20:55 +0000 (11:20 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A set of fixes that should go into this release. This contains:
- An NVMe pull request from Christoph, with a few critical fixes for
NVMe.
- A block drain queue fix from Ming.
- The concurrent lo_open/release fix for loop"
* 'for-linus' of git://git.kernel.dk/linux-block:
loop: fix concurrent lo_open/lo_release
block: drain queue before waiting for q_usage_counter becoming zero
nvme-fcloop: avoid possible uninitialized variable warning
nvme-mpath: fix last path removal during traffic
nvme-rdma: fix concurrent reset and reconnect
nvme: fix sector units when going between formats
nvme-pci: move use_sgl initialization to nvme_init_iod()
Daniel Borkmann [Tue, 9 Jan 2018 12:17:44 +0000 (13:17 +0100)]
bpf: avoid false sharing of map refcount with max_entries
In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds
speculation") also change the layout of struct bpf_map such that
false sharing of fast-path members like max_entries is avoided
when the maps reference counter is altered. Therefore enforce
them to be placed into separate cachelines.
/* size: 128, cachelines: 2, members: 17 */
/* sum members: 121, holes: 1, sum holes: 7 */
};
Now all entries in the first cacheline are read only throughout
the life time of the map, set up once during map creation. Overall
struct size and number of cachelines doesn't change from the
reordering. struct bpf_map is usually first member and embedded
in map structs in specific map implementations, so also avoid those
members to sit at the end where it could potentially share the
cacheline with first map values e.g. in the array since remote
CPUs could trigger map updates just as well for those (easily
dirtying members like max_entries intentionally as well) while
having subsequent values in cache.
Quoting from Google's Project Zero blog [1]:
Additionally, at least on the Intel machine on which this was
tested, bouncing modified cache lines between cores is slow,
apparently because the MESI protocol is used for cache coherence
[8]. Changing the reference counter of an eBPF array on one
physical CPU core causes the cache line containing the reference
counter to be bounced over to that CPU core, making reads of the
reference counter on all other CPU cores slow until the changed
reference counter has been written back to memory. Because the
length and the reference counter of an eBPF array are stored in
the same cache line, this also means that changing the reference
counter on one physical CPU core causes reads of the eBPF array's
length to be slow on other physical CPU cores (intentional false
sharing).
While this doesn't 'control' the out-of-bounds speculation through
masking the index as in commit b2157399cc98, triggering a manipulation
of the map's reference counter is really trivial, so lets not allow
to easily affect max_entries from it.
Splitting to separate cachelines also generally makes sense from
a performance perspective anyway in that fast-path won't have a
cache miss if the map gets pinned, reused in other progs, etc out
of control path, thus also avoids unintentional false sharing.
Wei Wang [Mon, 8 Jan 2018 18:34:00 +0000 (10:34 -0800)]
ipv6: remove null_entry before adding default route
In the current code, when creating a new fib6 table, tb6_root.leaf gets
initialized to net->ipv6.ip6_null_entry.
If a default route is being added with rt->rt6i_metric = 0xffffffff,
fib6_add() will add this route after net->ipv6.ip6_null_entry. As
null_entry is shared, it could cause problem.
In order to fix it, set fn->leaf to NULL before calling
fib6_add_rt2node() when trying to add the first default route.
And reset fn->leaf to null_entry when adding fails or when deleting the
last default route.
syzkaller reported the following issue which is fixed by this commit:
Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 6 Jan 2018 18:53:27 +0000 (21:53 +0300)]
SolutionEngine771x: add Ether TSU resource
After the Ether platform data is fixed, the driver probe() method would
still fail since the 'struct sh_eth_cpu_data' corresponding to SH771x
indicates the presence of TSU but the memory resource for it is absent.
Add the missing TSU resource to both Ether devices and fix the harmless
off-by-one error in the main memory resources, while at it...
Fixes: 4986b996882d ("net: sh_eth: remove the SH_TSU_ADDR") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 6 Jan 2018 18:53:26 +0000 (21:53 +0300)]
SolutionEngine771x: fix Ether platform data
The 'sh_eth' driver's probe() method would fail on the SolutionEngine7710
board and crash on SolutionEngine7712 board as the platform code is
hopelessly behind the driver's platform data -- it passes the PHY address
instead of 'struct sh_eth_plat_data *'; pass the latter to the driver in
order to fix the bug...
Fixes: 71557a37adb5 ("[netdrvr] sh_eth: Add SH7619 support") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolai Stange [Mon, 8 Jan 2018 14:54:44 +0000 (15:54 +0100)]
net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()
Commit 8f659a03a0ba ("net: ipv4: fix for a race condition in
raw_sendmsg") fixed the issue of possibly inconsistent ->hdrincl handling
due to concurrent updates by reading this bit-field member into a local
variable and using the thus stabilized value in subsequent tests.
However, aforementioned commit also adds the (correct) comment that
/* hdrincl should be READ_ONCE(inet->hdrincl)
* but READ_ONCE() doesn't work with bit fields
*/
because as it stands, the compiler is free to shortcut or even eliminate
the local variable at its will.
Note that I have not seen anything like this happening in reality and thus,
the concern is a theoretical one.
However, in order to be on the safe side, emulate a READ_ONCE() on the
bit-field by doing it on the local 'hdrincl' variable itself:
int hdrincl = inet->hdrincl;
hdrincl = READ_ONCE(hdrincl);
This breaks the chain in the sense that the compiler is not allowed
to replace subsequent reads from hdrincl with reloads from inet->hdrincl.
Fixes: 8f659a03a0ba ("net: ipv4: fix for a race condition in raw_sendmsg") Signed-off-by: Nicolai Stange <nstange@suse.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xiongfeng Wang [Mon, 8 Jan 2018 11:43:00 +0000 (19:43 +0800)]
net: caif: use strlcpy() instead of strncpy()
gcc-8 reports
net/caif/caif_dev.c: In function 'caif_enroll_dev':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]
net/caif/cfctrl.c: In function 'cfctrl_linkup_request':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]
net/caif/cfcnfg.c: In function 'caif_connect_client':
./include/linux/string.h:245:9: warning: '__builtin_strncpy' output may
be truncated copying 15 bytes from a string of length 15
[-Wstringop-truncation]
The compiler require that the input param 'len' of strncpy() should be
greater than the length of the src string, so that '\0' is copied as
well. We can just use strlcpy() to avoid this warning.
Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilya Dryomov [Thu, 21 Dec 2017 14:35:11 +0000 (15:35 +0100)]
rbd: set max_segments to USHRT_MAX
Commit d3834fefcfe5 ("rbd: bump queue_max_segments") bumped
max_segments (unsigned short) to max_hw_sectors (unsigned int).
max_hw_sectors is set to the number of 512-byte sectors in an object
and overflows unsigned short for 32M (largest possible) objects, making
the block layer resort to handing us single segment (i.e. single page
or even smaller) bios in that case.
esp: Fix GRO when the headers not fully in the linear part of the skb.
The GRO layer does not necessarily pull the complete headers
into the linear part of the skb, a part may remain on the
first page fragment. This can lead to a crash if we try to
pull the headers, so make sure we have them on the linear
part before pulling.
1) Frag and UDP handling fixes in i40e driver, from Amritha Nambiar and
Alexander Duyck.
2) Undo unintentional UAPI change in netfilter conntrack, from Florian
Westphal.
3) Revert a change to how error codes are returned from
dev_get_valid_name(), it broke some apps.
4) Cannot cache routes for ipv6 tunnels in the tunnel is ipv4/ipv6
dual-stack. From Eli Cooper.
5) Fix missed PMTU updates in geneve, from Xin Long.
6) Cure double free in macvlan, from Gao Feng.
7) Fix heap out-of-bounds write in rds_message_alloc_sgs(), from
Mohamed Ghannam.
8) FEC bug fixes from FUgang Duan (mis-accounting of dev_id, missed
deferral of probe when the regulator is not ready yet).
9) Missing DMA mapping error checks in 3c59x, from Neil Horman.
10) Turn off Broadcom tags for some b53 switches, from Florian Fainelli.
11) Fix OOPS when get_target_net() is passed an SKB whose NETLINK_CB()
isn't initialized. From Andrei Vagin.
12) Fix crashes in fib6_add(), from Wei Wang.
13) PMTU bug fixes in SCTP from Marcelo Ricardo Leitner.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
sh_eth: fix TXALCR1 offsets
mdio-sun4i: Fix a memory leak
phylink: mark expected switch fall-throughs in phylink_mii_ioctl
sctp: fix the handling of ICMP Frag Needed for too small MTUs
sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
xen-netfront: enable device after manual module load
bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
sh_eth: fix SH7757 GEther initialization
net: fec: free/restore resource in related probe error pathes
uapi/if_ether.h: prevent redefinition of struct ethhdr
ipv6: fix general protection fault in fib6_add()
RDS: null pointer dereference in rds_atomic_free_op
sh_eth: fix TSU resource handling
net: stmmac: enable EEE in MII, GMII or RGMII only
rtnetlink: give a user socket to get_target_net()
MAINTAINERS: Update my email address.
can: ems_usb: improve error reporting for error warning and error passive
can: flex_can: Correct the checking for frame length in flexcan_start_xmit()
can: gs_usb: fix return value of the "set_bittiming" callback
...
Linus Torvalds [Tue, 9 Jan 2018 00:17:31 +0000 (16:17 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
- One line fix to mlx4 error flow (same as mlx5 fix in last pull
request, just in the mlx4 driver)
- Fix a race condition in the IPoIB driver. This patch is larger than
just a one line fix, but resolves a race condition in a fairly
straight forward manner
- Fix a locking issue in the RDMA netlink code. This patch is also
larger than I would like for a late -rc. It has, however, had a week
to bake in the rdma tree prior to this pull request
- One line fix to fix granting remote machine access to memory that
they don't need and shouldn't have
- One line fix to correct the fact that our sgid/dgid pair is swapped
from what you would expect when receiving an incoming connection
request
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/srpt: Fix ACL lookup during login
IB/srpt: Disable RDMA access by the initiator
RDMA/netlink: Fix locking around __ib_get_device_by_index
IB/ipoib: Fix race condition in neigh creation
IB/mlx4: Fix mlx4_ib_alloc_mr error flow
Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.
To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.
Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.
If prog_array map is created by unpriv user replace
bpf_tail_call(ctx, map, index);
with
if (index >= max_entries) {
index &= map->index_mask;
bpf_tail_call(ctx, map, index);
}
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.
Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.
That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.
v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Sergei Shtylyov [Sat, 6 Jan 2018 21:26:47 +0000 (00:26 +0300)]
sh_eth: fix TXALCR1 offsets
The TXALCR1 offsets are incorrect in the register offset tables, most
probably due to copy&paste error. Luckily, the driver never uses this
register. :-)
Fixes: 4a55530f38e4 ("net: sh_eth: modify the definitions of register") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If the probing of the regulator is deferred, the memory allocated by
'mdiobus_alloc_size()' will be leaking.
It should be freed before the next call to 'sun4i_mdio_probe()' which will
reallocate it.
Fixes: 4bdcb1dd9feb ("net: Add MDIO bus driver for the Allwinner EMAC") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
phylink: mark expected switch fall-throughs in phylink_mii_ioctl
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 1463447 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jan 2018 19:19:13 +0000 (14:19 -0500)]
Merge branch 'SCTP-PMTU-discovery-fixes'
Marcelo Ricardo Leitner says:
====================
SCTP PMTU discovery fixes
This patchset fixes 2 issues with PMTU discovery that can lead to flood
of retransmissions.
The first patch fixes the issue for when PMTUD is disabled by the
application, while the second fixes it for when its enabled.
Please consider these to stable.
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: fix the handling of ICMP Frag Needed for too small MTUs
syzbot reported a hang involving SCTP, on which it kept flooding dmesg
with the message:
[ 246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
low, using default minimum of 512
That happened because whenever SCTP hits an ICMP Frag Needed, it tries
to adjust to the new MTU and triggers an immediate retransmission. But
it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
allowed (512) would not cause the PMTU to change, and issued the
retransmission anyway (thus leading to another ICMP Frag Needed, and so
on).
As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
are higher than that, sctp_transport_update_pmtu() is changed to
re-fetch the PMTU that got set after our request, and with that, detect
if there was an actual change or not.
The fix, thus, skips the immediate retransmission if the received ICMP
resulted in no change, in the hope that SCTP will select another path.
Note: The value being used for the minimum MTU (512,
SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
SCTP_MIN_PMTU), but such change belongs to another patch.
Changes from v1:
- do not disable PMTU discovery, in the light of commit 06ad391919b2 ("[SCTP] Don't disable PMTU discovery when mtu is small")
and as suggested by Xin Long.
- changed the way to break the rtx loop by detecting if the icmp
resulted in a change or not
Changes from v2:
none
See-also: https://lkml.org/lkml/2017/12/22/811 Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
Currently, if PMTU discovery is disabled on a given transport, but the
configured value is higher than the actual PMTU, it is likely that we
will get some icmp Frag Needed. The issue is, if PMTU discovery is
disabled, we won't update the information and will issue a
retransmission immediately, which may very well trigger another ICMP,
and another retransmission, leading to a loop.
The fix is to simply not trigger immediate retransmissions if PMTU
discovery is disabled on the given transport.
Changes from v2:
- updated stale comment, noticed by Xin Long
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eduardo Otubo [Fri, 5 Jan 2018 08:42:16 +0000 (09:42 +0100)]
xen-netfront: enable device after manual module load
When loading the module after unloading it, the network interface would
not be enabled and thus wouldn't have a backend counterpart and unable
to be used by the guest.
The guest would face errors like:
[root@guest ~]# ethtool -i eth0
Cannot get driver information: No such device
[root@guest ~]# ifconfig eth0
eth0: error fetching interface information: Device not found
This patch initializes the state of the netfront device whenever it is
loaded manually, this state would communicate the netback to create its
device and establish the connection between them.
Signed-off-by: Eduardo Otubo <otubo@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jan 2018 19:13:45 +0000 (14:13 -0500)]
Merge branch 'bnxt_en_fixes'
Michael Chan says:
====================
bnxt_en: 2 small bug fixes.
The first one fixes the TC Flower flow parameter passed to firmware. The
2nd one fixes the VF index range checking for iproute2 SRIOV related commands.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Duvvuru [Thu, 4 Jan 2018 23:46:55 +0000 (18:46 -0500)]
bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
In bnxt_vf_ndo_prep (which is called by bnxt_get_vf_config ndo), there is a
check for "Invalid VF id". Currently, the check is done against max_vfs.
However, the user doesn't always create max_vfs. So, the check should be
against the created number of VFs. The number of bnxt_vf_info structures
that are allocated in bnxt_alloc_vf_resources routine is the "number of
requested VFs". So, if an "invalid VF id" falls between the requested
number of VFs and the max_vfs, the driver will be dereferencing an invalid
pointer.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Venkat Devvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Challa [Thu, 4 Jan 2018 23:46:54 +0000 (18:46 -0500)]
bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
flow_type in HWRM_FLOW_ALLOC is not being populated correctly due to
incorrect passing of pointer and size of l3_mask argument of is_wildcard().
Fixed this.
Fixes: db1d36a27324 ("bnxt_en: add TC flower offload flow_alloc/free FW cmds") Signed-off-by: Sunil Challa <sunilkumar.challa@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>