net/core: remove explicit do_softirq() from busy_poll_stop()
Since commit 217f69743681 ("net: busy-poll: allow preemption in
sk_busy_loop()") there is an explicit do_softirq() invocation after
local_bh_enable() has been invoked.
I don't understand why we need this because local_bh_enable() will
invoke do_softirq() once the softirq counter reached zero and we have
softirq-related work pending.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Serhey Popovych [Fri, 16 Jun 2017 12:44:47 +0000 (15:44 +0300)]
fib_rules: Resolve goto rules target on delete
We should avoid marking goto rules unresolved when their
target is actually reachable after rule deletion.
Consolder following sample scenario:
# ip -4 ru sh
0: from all lookup local
32000: from all goto 32100
32100: from all lookup main
32100: from all lookup default
32766: from all lookup main
32767: from all lookup default
# ip -4 ru del pref 32100 table main
# ip -4 ru sh
0: from all lookup local
32000: from all goto 32100 [unresolved]
32100: from all lookup default
32766: from all lookup main
32767: from all lookup default
After removal of first rule with preference 32100 we
mark all goto rules as unreachable, even when rule with
same preference as removed one still present.
Check if next rule with same preference is available
and make all rules with goto action pointing to it.
Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 17 Jun 2017 08:10:27 +0000 (16:10 +0800)]
sctp: ensure ep is not destroyed before doing the dump
Now before dumping a sock in sctp_diag, it only holds the sock while
the ep may be already destroyed. It can cause a use-after-free panic
when accessing ep->asocs.
This patch is to set sctp_sk(sk)->ep NULL in sctp_endpoint_destroy,
and check if this ep is already destroyed before dumping this ep.
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdrver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lin Yun Sheng [Fri, 16 Jun 2017 09:24:51 +0000 (17:24 +0800)]
net/hns:bugfix of ethtool -t phy self_test
This patch fixes the phy loopback self_test failed issue. when
Marvell Phy Module is loaded, it will powerdown fiber when doing
phy loopback self test, which cause phy loopback self_test fail.
Signed-off-by: Lin Yun Sheng <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Fri, 16 Jun 2017 07:00:02 +0000 (15:00 +0800)]
net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev
The register_vlan_device would invoke free_netdev directly, when
register_vlan_dev failed. It would trigger the BUG_ON in free_netdev
if the dev was already registered. In this case, the netdev would be
freed in netdev_run_todo later.
So add one condition check now. Only when dev is not registered, then
free it directly.
The following is the part coredump when netdev_upper_dev_link failed
in register_vlan_dev. I removed the lines which are too long.
Raju Rangoju [Mon, 19 Jun 2017 14:16:00 +0000 (19:46 +0530)]
cxgb4: notify uP to route ctrlq compl to rdma rspq
During the module initialisation there is a possible race
(basically race between uld and lld) where neither the uld
nor lld notifies the uP about where to route the ctrl queue
completions. LLD skips notifying uP as the rdma queues were
not created by then (will leave it to ULD to notify the uP).
As the ULD comes up, it also skips notifying the uP as the
flag FULL_INIT_DONE is not set yet (ULD assumes that the
interface is not up yet).
Consequently, this race between uld and lld leaves uP
unnotified about where to send the ctrl queue completions
to, leading to iwarp RI_RES WR failure.
Here is the race:
CPU 0 CPU1
- allocates nic rx queus
- t4_sge_alloc_ctrl_txq()
(if rdma rsp queues exists,
tell uP to route ctrl queue
compl to rdma rspq)
- acquires the mutex_lock
- allocates rdma response queues
- if FULL_INIT_DONE set,
tell uP to route ctrl queue compl
to rdma rspq
- relinquishes mutex_lock
- acquires the mutex_lock
- enable_rx()
- set FULL_INIT_DONE
- relinquishes mutex_lock
This patch fixes the above issue.
Fixes: e7519f9926f1('cxgb4: avoid enabling napi twice to the same queue') Signed-off-by: Raju Rangoju <rajur@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> CC: Stable <stable@vger.kernel.org> # 4.9+ Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 19 Jun 2017 04:03:51 +0000 (00:03 -0400)]
Merge tag 'mac80211-for-davem-2017-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Here's just the fix for that ancient bug:
* remove wext calling ndo_do_ioctl, since nobody needs
that now and it makes the type change easier
* use struct iwreq instead of struct ifreq almost everywhere
in wireless extensions code
* copy only struct iwreq from userspace in dev_ioctl for the
wireless extensions, since it's smaller than struct ifreq
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Haishuang Yan [Sat, 17 Jun 2017 03:38:05 +0000 (11:38 +0800)]
ip6_tunnel: Correct tos value in collect_md mode
Same as ip_gre, geneve and vxlan, use key->tos as traffic class value.
CC: Peter Dawson <petedaws@gmail.com> Fixes: 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on
encapsulated packets”) Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Peter Dawson <peter.a.dawson@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Fri, 16 Jun 2017 17:46:37 +0000 (10:46 -0700)]
decnet: always not take dst->__refcnt when inserting dst into hash table
In the existing dn_route.c code, dn_route_output_slow() takes
dst->__refcnt before calling dn_insert_route() while dn_route_input_slow()
does not take dst->__refcnt before calling dn_insert_route().
This makes the whole routing code very buggy.
In dn_dst_check_expire(), dnrt_free() is called when rt expires. This
makes the routes inserted by dn_route_output_slow() not able to be
freed as the refcnt is not released.
In dn_dst_gc(), dnrt_drop() is called to release rt which could
potentially cause the dst->__refcnt to be dropped to -1.
In dn_run_flush(), dst_free() is called to release all the dst. Again,
it makes the dst inserted by dn_route_output_slow() not able to be
released and also, it does not wait on the rcu and could potentially
cause crash in the path where other users still refer to this dst.
This patch makes sure both input and output path do not take
dst->__refcnt before calling dn_insert_route() and also makes sure
dnrt_free()/dst_free() is called when removing dst from the hash table.
The only difference between those 2 calls is that dnrt_free() waits on
the rcu while dst_free() does not.
Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Haishuang Yan [Thu, 15 Jun 2017 02:29:29 +0000 (10:29 +0800)]
ip_tunnel: fix potential issue in ip_tunnel_rcv
When ip_tunnel_rcv fails, the tun_dst won't be freed, so call
dst_release to free it in error code path.
Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.") Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Tested-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series contains some fixes for the mlx5 core and netdev driver.
Please pull and let me know if there's any problem.
For -stable:
("net/mlx5: Wait for FW readiness before initializing command interface") kernels >= 4.4
("net/mlx5e: Fix timestamping capabilities reporting") kernels >= 4.5
("net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it") kernels >= 4.9
("net/mlx5e: Fix min inline value for VF rep SQs") kernels >= 4.11
The "net/mlx5e: Fix min inline .." (a oneliner patch) doesn't cleanly apply
to 4.11, it hits a contextual conflict and can be easily resolved by:
+ mlx5_query_min_inline(mdev, &priv->params.tx_min_inline_mode);
to the end of mlx5e_build_rep_netdev_priv. Note the 2nd parameter of
mlx5_query_min_inline is slightly different from the original one.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Thu, 15 Jun 2017 17:08:32 +0000 (20:08 +0300)]
net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it
The error flow of mlx5e_create_netdev calls the cleanup call
of the given profile without checking if it exists, fix that.
Currently the VF reps don't register that callback and we crash
if getting into error -- can be reproduced by the user doing ctrl^C
while attempting to change the sriov mode from legacy to switchdev.
Chris Mi [Tue, 16 May 2017 11:07:11 +0000 (07:07 -0400)]
net/mlx5e: Fix min inline value for VF rep SQs
The offending commit only changed the code path for PF/VF, but it
didn't take care of VF representors. As a result, since
params->tx_min_inline_mode for VF representors is kzalloced to 0
(MLX5_INLINE_MODE_NONE), all VF reps SQs were set to that mode.
This actually works on CX5 by default but broke CX4. Fix that by
adding a call to query the min inline mode from the VF rep build up code.
Fixes: a6f402e49901 ("net/mlx5e: Tx, no inline copy on ConnectX-5") Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eli Cohen [Thu, 8 Jun 2017 16:33:16 +0000 (11:33 -0500)]
net/mlx5: Wait for FW readiness before initializing command interface
Before attempting to initialize the command interface we must wait till
the fw_initializing bit is clear.
If we fail to meet this condition the hardware will drop our
configuration, specifically the descriptors page address. This scenario
can happen when the firmware is still executing an FLR flow and did not
finish yet so the driver needs to wait for that to finish.
Fixes: e3297246c2c8 ('net/mlx5_core: Wait for FW readiness on startup') Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Doc: net: dsa: b53: update location of referenced dsa.txt
The referenced file dsa.txt is located at
Documentation/devicetree/bindings/net/dsa/dsa.txt
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 15 Jun 2017 09:49:08 +0000 (17:49 +0800)]
sctp: return next obj by passing pos + 1 into sctp_transport_get_idx
In sctp_for_each_transport, pos is used to save how many objs it has
dumped. Now it gets the last obj by sctp_transport_get_idx, then gets
the next obj by sctp_transport_get_next.
The issue is that in the meanwhile if some objs in transport hashtable
are removed and the objs nums are less than pos, sctp_transport_get_idx
would return NULL and hti.walker.tbl is NULL as well. At this moment
it should stop hti, instead of continue getting the next obj. Or it
would cause a NULL pointer dereference in sctp_transport_get_next.
This patch is to pass pos + 1 into sctp_transport_get_idx to get the
next obj directly, even if pos > objs nums, it would return NULL and
stop hti.
Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Wed, 14 Jun 2017 23:12:24 +0000 (00:12 +0100)]
rxrpc: Fix several cases where a padded len isn't checked in ticket decode
This fixes CVE-2017-7482.
When a kerberos 5 ticket is being decoded so that it can be loaded into an
rxrpc-type key, there are several places in which the length of a
variable-length field is checked to make sure that it's not going to
overrun the available data - but the data is padded to the nearest
four-byte boundary and the code doesn't check for this extra. This could
lead to the size-remaining variable wrapping and the data pointer going
over the end of the buffer.
Fix this by making the various variable-length data checks use the padded
length.
Reported-by: 石磊 <shilei-c@360.cn> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.c.dionne@auristor.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 15 Jun 2017 08:33:58 +0000 (16:33 +0800)]
ipv6: fix calling in6_ifa_hold incorrectly for dad work
Now when starting the dad work in addrconf_mod_dad_work, if the dad work
is idle and queued, it needs to hold ifa.
The problem is there's one gap in [1], during which if the pending dad work
is removed elsewhere. It will miss to hold ifa, but the dad word is still
idea and queue.
if (!delayed_work_pending(&ifp->dad_work))
in6_ifa_hold(ifp);
<--------------[1]
mod_delayed_work(addrconf_wq, &ifp->dad_work, delay);
An use-after-free issue can be caused by this.
Chen Wei found this issue when WARN_ON(!hlist_unhashed(&ifp->addr_lst)) in
net6_ifa_finish_destroy was hit because of it.
As Hannes' suggestion, this patch is to fix it by holding ifa first in
addrconf_mod_dad_work, then calling mod_delayed_work and putting ifa if
the dad_work is already in queue.
Note that this patch did not choose to fix it with:
if (!mod_delayed_work(delay))
in6_ifa_hold(ifp);
As with it, when delay == 0, dad_work would be scheduled immediately, all
addrconf_mod_dad_work(0) callings had to be moved under ifp->lock.
Reported-by: Wei Chen <weichen@redhat.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) The netlink attribute passed in to dev_set_alias() is not
necessarily NULL terminated, don't use strlcpy() on it. From
Alexander Potapenko.
2) Fix implementation of atomics in arm64 bpf JIT, from Daniel
Borkmann.
3) Correct the release of netdevs and driver private data in certain
circumstances.
4) Sanitize netlink message length properly in decnet, from Mateusz
Jurczyk.
5) Don't leak kernel data in rtnl_fill_vfinfo() netlink blobs. From
Yuval Mintz.
6) Hash secret is never initialized in ipv6 ILA translation code, from
Arnd Bergmann. I guess those clang warnings about unused inline
functions are useful for something!
7) Fix endian selection in bpf_endian.h, from Daniel Borkmann.
8) Sanitize sockaddr length before dereferncing any fields in AF_UNIX
and CAIF. From Mateusz Jurczyk.
9) Fix timestamping for GMAC3 chips in stmmac driver, from Mario
Molitor.
10) Do not leak netdev on dev_alloc_name() errors in mac80211, from
Johannes Berg.
11) Fix locking in sctp_for_each_endpoint(), from Xin Long.
12) Fix wrong memset size on 32-bit in snmp6, from Christian Perle.
13) Fix use after free in ip_mc_clear_src(), from WANG Cong.
14) Fix regressions caused by ICMP rate limiting changes in 4.11, from
Jesper Dangaard Brouer.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
i40e: Fix a sleep-in-atomic bug
net: don't global ICMP rate limit packets originating from loopback
net/act_pedit: fix an error code
net: update undefined ->ndo_change_mtu() comment
net_sched: move tcf_lock down after gen_replace_estimator()
caif: Add sockaddr length check before accessing sa_family in connect handler
qed: fix dump of context data
qmi_wwan: new Telewell and Sierra device IDs
net: phy: Fix MDIO_THUNDER dependencies
netconsole: Remove duplicate "netconsole: " logging prefix
igmp: acquire pmc lock for ip_mc_clear_src()
r8152: give the device version
net: rps: fix uninitialized symbol warning
mac80211: don't send SMPS action frame in AP mode when not needed
mac80211/wpa: use constant time memory comparison for MACs
mac80211: set bss_info data before configuring the channel
mac80211: remove 5/10 MHz rate code from station MLME
mac80211: Fix incorrect condition when checking rx timestamp
mac80211: don't look at the PM bit of BAR frames
i40e: fix handling of HW ATR eviction
...
Linus Torvalds [Thu, 15 Jun 2017 08:51:19 +0000 (17:51 +0900)]
Merge tag 'acpi-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert an ACPICA commit from the 4.11 cycle that causes problems
to happen on some systems and add a protection against possible kernel
crashes due to table reference counter imbalance.
Specifics:
- Revert a 4.11 ACPICA change that made assumptions which are not
satisfied on some systems and caused the enumeration of resources
to fail on them (Rafael Wysocki).
- Add a mechanism to prevent tables from being unmapped prematurely
due to reference counter overflows (Lv Zheng)"
* tag 'acpi-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Tables: Mechanism to handle late stage acpi_get_table() imbalance
Revert "ACPICA: Disassembler: Enhance resource descriptor detection"
Linus Torvalds [Thu, 15 Jun 2017 08:47:46 +0000 (17:47 +0900)]
Merge tag 'pm-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These revert a recent cpufreq schedutil governor change that turned
out to be problematic and fix a few minor issues in cpufreq, cpuidle
and the Exynos devfreq drivers.
Specifics:
- Revert a recent cpufreq schedutil governor change that caused some
systems to behave undesirably (Rafael Wysocki).
- Fix a cpufreq conservative governor issue introduced during the
3.10 cycle that prevents it from working as expected in some
situations (Tomasz Wilczyński).
- Fix an error code path in the generic cpuidle driver for DT-based
systems (Christophe Jaillet).
- Fix three minor issues in devfreq drivers for Exynos (Arvind Yadav,
Krzysztof Kozlowski)"
* tag 'pm-4.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: dt: Add missing 'of_node_put()'
cpufreq: conservative: Allow down_threshold to take values from 1 to 10
Revert "cpufreq: schedutil: Reduce frequencies slower"
PM / devfreq: exynos-ppmu: Staticize event list
PM / devfreq: exynos-ppmu: Handle return value of clk_prepare_enable
PM / devfreq: exynos-nocp: Handle return value of clk_prepare_enable
Linus Torvalds [Thu, 15 Jun 2017 08:44:41 +0000 (17:44 +0900)]
Merge branch 'for-4.12/driver-matching-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fix from Jiri Kosina:
- ifdef-based bandaid for a long-standing issue with HID driver
matching, avoiding regressions in cases where specific driver is not
enabled in kernel .config, from Jiri Kosina
* 'for-4.12/driver-matching-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: let generic driver yield control iff specific driver has been enabled
Linus Torvalds [Thu, 15 Jun 2017 08:37:40 +0000 (17:37 +0900)]
Merge tag 'media/v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- some build dependency issues at CEC core with randconfigs
- fix an off by one error at vb2
- a race fix at cec core
- driver fixes at tc358743, sir_ir and rainshadow-cec
* tag 'media/v4.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] media/cec.h: use IS_REACHABLE instead of IS_ENABLED
[media] cec: race fix: don't return -ENONET in cec_receive()
[media] sir_ir: infinite loop in interrupt handler
[media] cec-notifier.h: handle unreachable CONFIG_CEC_CORE
[media] cec: improve MEDIA_CEC_RC dependencies
[media] vb2: Fix an off by one error in 'vb2_plane_vaddr'
[media] rainshadow-cec: Fix missing spin_lock_init()
[media] tc358743: fix register i2c_rd/wr function fix
Jia-Ju Bai [Wed, 14 Jun 2017 23:35:31 +0000 (16:35 -0700)]
i40e: Fix a sleep-in-atomic bug
The driver may sleep under a spin lock, and the function call path is:
i40e_ndo_set_vf_port_vlan (acquire the lock by spin_lock_bh)
i40e_vsi_remove_pvid
i40e_vlan_stripping_disable
i40e_aq_update_vsi_params
i40e_asq_send_command
mutex_lock --> may sleep
To fixed it, the spin lock is released before "i40e_vsi_remove_pvid", and
the lock is acquired again after this function.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: don't global ICMP rate limit packets originating from loopback
Florian Weimer seems to have a glibc test-case which requires that
loopback interfaces does not get ICMP ratelimited. This was broken by
commit c0303efeab73 ("net: reduce cycles spend on ICMP replies that
gets rate limited").
An ICMP response will usually be routed back-out the same incoming
interface. Thus, take advantage of this and skip global ICMP
ratelimit when the incoming device is loopback. In the unlikely event
that the outgoing it not loopback, due to strange routing policy
rules, ICMP rate limiting still works via peer ratelimiting via
icmpv4_xrlim_allow(). Thus, we should still comply with RFC1812
(section 4.3.2.8 "Rate Limiting").
This seems to fix the reproducer given by Florian. While still
avoiding to perform expensive and unneeded outgoing route lookup for
rate limited packets (in the non-loopback case).
Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited") Reported-by: Florian Weimer <fweimer@redhat.com> Reported-by: "H.J. Lu" <hjl.tools@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 14 Jun 2017 10:29:31 +0000 (13:29 +0300)]
net/act_pedit: fix an error code
I'm reviewing static checker warnings where we do ERR_PTR(0), which is
the same as NULL. I'm pretty sure we intended to return ERR_PTR(-EINVAL)
here. Sometimes these bugs lead to a NULL dereference but I don't
immediately see that problem here.
Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Amir Vadai <amir@vadai.me> Signed-off-by: David S. Miller <davem@davemloft.net>
Magnus Damm [Wed, 14 Jun 2017 07:15:24 +0000 (16:15 +0900)]
net: update undefined ->ndo_change_mtu() comment
Update ->ndo_change_mtu() callback comment to remove text
about returning error in case of undefined callback. This
change makes the comment match the existing code behavior.
Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Tue, 13 Jun 2017 20:36:24 +0000 (13:36 -0700)]
net_sched: move tcf_lock down after gen_replace_estimator()
Laura reported a sleep-in-atomic kernel warning inside
tcf_act_police_init() which calls gen_replace_estimator() with
spinlock protection.
It is not necessary in this case, we already have RTNL lock here
so it is enough to protect concurrent writers. For the reader,
i.e. tcf_act_police(), it needs to make decision based on this
rate estimator, in the worst case we drop more/less packets than
necessary while changing the rate in parallel, it is still acceptable.
Reported-by: Laura Abbott <labbott@redhat.com> Reported-by: Nick Huber <nicholashuber@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Wed, 14 Jun 2017 07:28:11 +0000 (09:28 +0200)]
dev_ioctl: copy only the smaller struct iwreq for wext
Unfortunately, struct iwreq isn't a proper subset of struct ifreq,
but is still handled by the same code path. Robert reported that
then applications may (randomly) fault if the struct iwreq they
pass happens to land within 8 bytes of the end of a mapping (the
struct is only 32 bytes, vs. struct ifreq's 40 bytes).
To fix this, pull out the code handling wireless extension ioctls
and copy only the smaller structure in this case.
This bug goes back a long time, I tracked that it was introduced
into mainline in 2.1.15, over 20 years ago!
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=195869
Reported-by: Robert O'Callahan <robert@ocallahan.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Wed, 14 Jun 2017 07:21:58 +0000 (09:21 +0200)]
wireless: wext: use struct iwreq earlier in the call chain
To make it clear that we never use struct ifreq, cast from it
directly in the wext entrypoint and use struct iwreq from there
on. The next patch will remove the cast again and pass the
correct struct from the beginning.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Wed, 14 Jun 2017 07:17:38 +0000 (09:17 +0200)]
wireless: wext: remove ndo_do_ioctl fallback
There are no longer any drivers (in the tree proper, I didn't
check all the staging drivers) that take WEXT ioctls through
this API, the only remaining ones that even have ndo_do_ioctl
are using it only for private ioctls.
Therefore, we can remove this call.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Mateusz Jurczyk [Tue, 13 Jun 2017 18:06:12 +0000 (20:06 +0200)]
caif: Add sockaddr length check before accessing sa_family in connect handler
Verify that the caller-provided sockaddr structure is large enough to
contain the sa_family field, before accessing it in the connect()
handler of the AF_CAIF socket. Since the syscall doesn't enforce a minimum
size of the corresponding memory region, very short sockaddrs (zero or one
byte long) result in operating on uninitialized memory while referencing
sa_family.
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 13 Jun 2017 17:34:13 +0000 (13:34 -0400)]
Merge tag 'mac80211-for-davem-2017-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Some fixes:
* Avi fixes some fallout from my mac80211 RX flags changes
* Emmanuel fixes an issue with adhering to the spec, and
an oversight in the SMPS management code
* Jason's patch makes mac80211 use constant-time memory
comparisons for message authentication, to avoid having
potentially observable timing differences
* my fix makes mac80211 set the basic rates bitmap before
the channel so the next update to the driver has more
consistent data - this required another rework patch to
remove some useless 5/10 MHz code that can never be hit
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tayar, Tomer [Tue, 13 Jun 2017 09:15:59 +0000 (12:15 +0300)]
qed: fix dump of context data
Currently when dumping a context data only word number '1' is read for the
entire context.
Fixes: c965db444629 ("qed: Add support for debug data collection") Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Tue, 13 Jun 2017 17:10:18 +0000 (19:10 +0200)]
qmi_wwan: new Telewell and Sierra device IDs
A new Sierra Wireless EM7305 device ID used in a Toshiba laptop,
and two Longcheer device IDs entries used by Telewell TW-3G HSPA+
branded modems.
Reported-by: Petr Kloc <petr_kloc@yahoo.com> Reported-by: Teemu Likonen <tlikonen@iki.fi> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 13 Jun 2017 00:18:51 +0000 (17:18 -0700)]
net: phy: Fix MDIO_THUNDER dependencies
After commit 90eff9096c01 ("net: phy: Allow splitting MDIO
bus/device support from PHYs") we could create a configuration where
MDIO_DEVICE=y and PHYLIB=m which leads to the following undefined
references:
drivers/built-in.o: In function `thunder_mdiobus_pci_remove':
>> mdio-thunder.c:(.text+0x2a212f): undefined reference to
>> `mdiobus_unregister'
>> mdio-thunder.c:(.text+0x2a2138): undefined reference to
>> `mdiobus_free'
drivers/built-in.o: In function `thunder_mdiobus_pci_probe':
mdio-thunder.c:(.text+0x2a22e7): undefined reference to
`devm_mdiobus_alloc_size'
mdio-thunder.c:(.text+0x2a236f): undefined reference to
`of_mdiobus_register'
Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: 90eff9096c01 ("net: phy: Allow splitting MDIO bus/device support from PHYs") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This happens because we don't hold pmc->lock in ip_mc_clear_src()
and a parallel mr_ifc_timer timer could jump in and access them.
The RCU lock is there but it is merely for pmc itself, this
spinlock could actually ensure we don't access them in parallel.
Thanks to Eric and Long for discussion on this bug.
Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ashwanth Goli [Tue, 13 Jun 2017 11:24:55 +0000 (16:54 +0530)]
net: rps: fix uninitialized symbol warning
This patch fixes uninitialized symbol warning that
got introduced by the following commit 773fc8f6e8d6 ("net: rps: send out pending IPI's on CPU hotplug")
Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Kosina [Fri, 9 Jun 2017 11:15:37 +0000 (13:15 +0200)]
HID: let generic driver yield control iff specific driver has been enabled
There are many situations where generic HID driver provides some basic level
of support for certain device, but later this support (usually by implementing
vendor-specific extensions of HID protocol) is extended and the support moved
over to a separate (usually per-vendor) specific driver.
This might bring a rather unpleasant suprise for users, as all of a sudden
there is a new config option they have to enable in order to get any support
for their device whatsoever, although previous kernel versions provided basic
support through the generic driver. Which is rightfully seen as a regression.
Fix this by including the entry for a particular device in
hid_have_special_driver[] iff the specific config option has been specified,
and let generic driver handle the device otherwise.
Also make the behavior of hid_scan_report() (where the same decision is being
taken on a per-report level) consistent.
While at it, reshuffle the hid_have_special_driver[] a bit to restore the
alphabetical ordering (first order by config option, and within those
sections order by VID).
This is considered a short-term solution, before generic way of giving
precedence to special drivers and falling back to generic driver is
figured out.
While at it, fixup a missing entry for GFRM driver; thanks to Hans de Geode for
spotting this (and for discovering a few issues in the conversion).
mac80211: don't send SMPS action frame in AP mode when not needed
mac80211 allows to modify the SMPS state of an AP both,
when it is started, and after it has been started. Such a
change will trigger an action frame to all the peers that
are currently connected, and will be remembered so that
new peers will get notified as soon as they connect (since
the SMPS setting in the beacon may not be the right one).
This means that we need to remember the SMPS state
currently requested as well as the SMPS state that was
configured initially (and advertised in the beacon).
The former is bss->req_smps and the latter is
sdata->smps_mode.
Initially, the AP interface could only be started with
SMPS_OFF, which means that sdata->smps_mode was SMPS_OFF
always. Later, a nl80211 API was added to be able to start
an AP with a different AP mode. That code forgot to update
bss->req_smps and because of that, if the AP interface was
started with SMPS_DYNAMIC, we had:
sdata->smps_mode = SMPS_DYNAMIC
bss->req_smps = SMPS_OFF
That configuration made mac80211 think it needs to fire off
an action frame to any new station connecting to the AP in
order to let it know that the actual SMPS configuration is
SMPS_OFF.
Fix that by properly setting bss->req_smps in
ieee80211_start_ap.
Fixes: f69931748730 ("mac80211: set smps_mode according to ap params") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
mac80211/wpa: use constant time memory comparison for MACs
Otherwise, we enable all sorts of forgeries via timing attack.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: linux-wireless@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Sat, 10 Jun 2017 10:52:43 +0000 (13:52 +0300)]
mac80211: set bss_info data before configuring the channel
When mac80211 changes the channel, it also calls into the driver's
bss_info_changed() callback, e.g. with BSS_CHANGED_IDLE. The driver
may, like iwlwifi does, access more data from bss_info in that case
and iwlwifi accesses the basic_rates bitmap, but if changing from a
band with more (basic) rates to one with fewer, an out-of-bounds
access of the rate array may result.
While we can't avoid having invalid data at some point in time, we
can avoid having it while we call the driver - so set up all the
data before configuring the channel, and then apply it afterwards.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=195677
Reported-by: Johannes Hirte <johannes.hirte@datenkhaos.de> Tested-by: Johannes Hirte <johannes.hirte@datenkhaos.de> Debugged-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Sat, 10 Jun 2017 10:52:44 +0000 (13:52 +0300)]
mac80211: remove 5/10 MHz rate code from station MLME
There's no need for the station MLME code to handle bitrates for 5
or 10 MHz channels when it can't ever create such a configuration.
Remove the unnecessary code.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Avraham Stern [Mon, 12 Jun 2017 07:44:58 +0000 (10:44 +0300)]
mac80211: Fix incorrect condition when checking rx timestamp
If the driver reports the rx timestamp at PLCP start, mac80211 can
only handle legacy encoding, but the code checks that the encoding
is not legacy. Fix this.
Fixes: da6a4352e7c8 ("mac80211: separate encoding/bandwidth from flags") Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Linus Torvalds [Tue, 13 Jun 2017 06:09:10 +0000 (15:09 +0900)]
Merge tag 'xtensa-20170612' of git://github.com/jcmvbkbc/linux-xtensa
Pull Xtensa fixes from Max Filippov:
- don't use linux IRQ #0 in legacy irq domains: fixes timer interrupt
assignment when it's hardware IRQ # is 0 and the kernel is built w/o
device tree support
- reduce reservation size for double exception vector literals from 48
to 20 bytes: fixes build on cores with small user exception vector
- cleanups: use kmalloc_array instead of kmalloc in simdisk_init and
seq_puts instead of seq_printf in c_show.
* tag 'xtensa-20170612' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: don't use linux IRQ #0
xtensa: reduce double exception literal reservation
xtensa: ISS: Use kmalloc_array() in simdisk_init()
xtensa: Use seq_puts() in c_show()
Linus Torvalds [Tue, 13 Jun 2017 06:07:11 +0000 (15:07 +0900)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
- A fix for KVM to avoid kernel oopses in case of host protection
faults due to runtime instrumentation
- A fix for the AP bus to avoid dead devices after unbind / bind
- A fix for a compile warning merged from the vfio_ccw tree
- Updated default configurations
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: update defconfig
s390/zcrypt: Fix blocking queue device after unbind/bind.
s390/vfio_ccw: make some symbols static
s390/kvm: do not rely on the ILC on kvm host protection fauls
Jacob Keller [Mon, 12 Jun 2017 22:38:36 +0000 (15:38 -0700)]
i40e: fix handling of HW ATR eviction
A recent commit to refactor the driver and remove the hw_disabled_flags
field accidentally introduced two regressions. First, we overwrote
pf->flags which removed various key flags including the MSI-X settings.
Additionally, it was intended that we have now two flags,
HW_ATR_EVICT_CAPABLE and HW_ATR_EVICT_ENABLED, but this was not done,
and we accidentally were mis-using HW_ATR_EVICT_CAPABLE everywhere.
This patch adds the missing piece, HW_ATR_EVICT_ENABLED, and safely
updates pf->flags instead of overwriting it.
Without this patch we will have many problems including disabling MSI-X
support, and we'll attempt to use HW ATR eviction on devices which do
not support it.
Fixes: 47994c119a36 ("i40e: remove hw_disabled_flags in favor of using separate flag bits", 2017-04-19) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When HSR interface is setup using ip link command, an annoying warning
appears with the trace as below:-
[ 203.019828] hsr_get_node: Non-HSR frame
[ 203.019833] Modules linked in:
[ 203.019848] CPU: 0 PID: 158 Comm: sd-resolve Tainted: G W 4.12.0-rc3-00052-g9fa6bf70 #2
[ 203.019853] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 203.019869] [<c0110280>] (unwind_backtrace) from [<c010c2f4>] (show_stack+0x10/0x14)
[ 203.019880] [<c010c2f4>] (show_stack) from [<c04b9f64>] (dump_stack+0xac/0xe0)
[ 203.019894] [<c04b9f64>] (dump_stack) from [<c01374e8>] (__warn+0xd8/0x104)
[ 203.019907] [<c01374e8>] (__warn) from [<c0137548>] (warn_slowpath_fmt+0x34/0x44)
root@am57xx-evm:~# [ 203.019921] [<c0137548>] (warn_slowpath_fmt) from [<c081126c>] (hsr_get_node+0x148/0x170)
[ 203.019932] [<c081126c>] (hsr_get_node) from [<c0814240>] (hsr_forward_skb+0x110/0x7c0)
[ 203.019942] [<c0814240>] (hsr_forward_skb) from [<c0811d64>] (hsr_dev_xmit+0x2c/0x34)
[ 203.019954] [<c0811d64>] (hsr_dev_xmit) from [<c06c0828>] (dev_hard_start_xmit+0xc4/0x3bc)
[ 203.019963] [<c06c0828>] (dev_hard_start_xmit) from [<c06c13d8>] (__dev_queue_xmit+0x7c4/0x98c)
[ 203.019974] [<c06c13d8>] (__dev_queue_xmit) from [<c0782f54>] (ip6_finish_output2+0x330/0xc1c)
[ 203.019983] [<c0782f54>] (ip6_finish_output2) from [<c0788f0c>] (ip6_output+0x58/0x454)
[ 203.019994] [<c0788f0c>] (ip6_output) from [<c07b16cc>] (mld_sendpack+0x420/0x744)
As this is an expected path to hsr_get_node() with frame coming from
the master interface, add a check to ensure packet is not from the
master port and then warn.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Perle [Mon, 12 Jun 2017 08:06:57 +0000 (10:06 +0200)]
proc: snmp6: Use correct type in memset
Reading /proc/net/snmp6 yields bogus values on 32 bit kernels.
Use "u64" instead of "unsigned long" in sizeof().
Fixes: 4a4857b1c81e ("proc: Reduce cache miss in snmp6_seq_show") Signed-off-by: Christian Perle <christian.perle@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq
Pull devfreq fixes from MyungJoo Ham.
* 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq:
PM / devfreq: exynos-ppmu: Staticize event list
PM / devfreq: exynos-ppmu: Handle return value of clk_prepare_enable
PM / devfreq: exynos-nocp: Handle return value of clk_prepare_enable
'of_node_put()' should be called on pointer returned by
'of_parse_phandle()' when done. In this function this is done in all path
except this 'continue', so add it.
Fixes: 97735da074fd (drivers: cpuidle: Add status property to ARM idle states) Signed-off-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq: conservative: Allow down_threshold to take values from 1 to 10
Commit 27ed3cd2ebf4 (cpufreq: conservative: Fix the logic in frequency
decrease checking) removed the 10 point substraction when comparing the
load against down_threshold but did not remove the related limit for the
down_threshold value. As a result, down_threshold lower than 11 is not
allowed even though values from 1 to 10 do work correctly too. The
comment ("cannot be lower than 11 otherwise freq will not fall") also
does not apply after removing the substraction.
For this reason, allow down_threshold to take any value from 1 to 99
and fix the related comment.
Fixes: 27ed3cd2ebf4 (cpufreq: conservative: Fix the logic in frequency decrease checking) Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Revert commit 39b64aa1c007 (cpufreq: schedutil: Reduce frequencies
slower) that introduced unintentional changes in behavior leading
to adverse effects on some systems.
Reported-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Lv Zheng [Wed, 7 Jun 2017 04:54:58 +0000 (12:54 +0800)]
ACPICA: Tables: Mechanism to handle late stage acpi_get_table() imbalance
Considering this case:
1. A program opens a sysfs table file 65535 times, it can increase
validation_count and first increment cause the table to be mapped:
validation_count = 65535
2. AML execution causes "Load" to be executed on the same
table, this time it cannot increase validation_count, so
validation_count remains:
validation_count = 65535
3. The program closes sysfs table file 65535 times, it can decrease
validation_count and the last decrement cause the table to be
unmapped:
validation_count = 0
4. AML code still accessing the loaded table, kernel crash can be
observed.
To prevent that from happening, add a validation_count threashold.
When it is reached, the validation_count can no longer be
incremented/decremented to invalidate the table descriptor (means
preventing table unmappings)
Note that code added in acpi_tb_put_table() is actually a no-op but
changes the warning message into a "warn once" one. Lv Zheng.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
[ rjw: Changelog, comments ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Sun, 11 Jun 2017 23:17:29 +0000 (16:17 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull key subsystem fixes from James Morris:
"Here are a bunch of fixes for Linux keyrings, including:
- Fix up the refcount handling now that key structs use the
refcount_t type and the refcount_t ops don't allow a 0->1
transition.
- Fix a potential NULL deref after error in x509_cert_parse().
- Don't put data for the crypto algorithms to use on the stack.
- Fix the handling of a null payload being passed to add_key().
- Fix incorrect cleanup an uninitialised key_preparsed_payload in
key_update().
- Explicit sanitisation of potentially secure data before freeing.
- Fixes for the Diffie-Helman code"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
KEYS: fix refcount_inc() on zero
KEYS: Convert KEYCTL_DH_COMPUTE to use the crypto KPP API
crypto : asymmetric_keys : verify_pefile:zero memory content before freeing
KEYS: DH: add __user annotations to keyctl_kdf_params
KEYS: DH: ensure the KDF counter is properly aligned
KEYS: DH: don't feed uninitialized "otherinfo" into KDF
KEYS: DH: forbid using digest_null as the KDF hash
KEYS: sanitize key structs before freeing
KEYS: trusted: sanitize all key material
KEYS: encrypted: sanitize all key material
KEYS: user_defined: sanitize key payloads
KEYS: sanitize add_key() and keyctl() key payloads
KEYS: fix freeing uninitialized memory in key_update()
KEYS: fix dereferencing NULL payload with nonzero length
KEYS: encrypted: use constant-time HMAC comparison
KEYS: encrypted: fix race causing incorrect HMAC calculations
KEYS: encrypted: fix buffer overread in valid_master_desc()
KEYS: encrypted: avoid encrypting/decrypting stack buffers
KEYS: put keyring if install_session_keyring_to_cred() fails
KEYS: Delete an error message for a failed memory allocation in get_derived_key()
...
Linus Torvalds [Sun, 11 Jun 2017 22:51:56 +0000 (15:51 -0700)]
compiler, clang: properly override 'inline' for clang
Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused
static inline functions") just caused more warnings due to re-defining
the 'inline' macro.
So undef it before re-defining it, and also add the 'notrace' attribute
like the gcc version that this is overriding does.
Donald Sharp [Sat, 10 Jun 2017 20:30:17 +0000 (16:30 -0400)]
net: ipmr: Fix some mroute forwarding issues in vrf's
This patch fixes two issues:
1) When forwarding on *,G mroutes that are in a vrf, the
kernel was dropping information about the actual incoming
interface when calling ip_mr_forward from ip_mr_input.
This caused ip_mr_forward to send the multicast packet
back out the incoming interface. Fix this by
modifying ip_mr_forward to be handed the correctly
resolved dev.
2) When a unresolved cache entry is created we store
the incoming skb on the unresolved cache entry and
upon mroute resolution from the user space daemon,
we attempt to forward the packet. Again we were
not resolving to the correct incoming device for
a vrf scenario, before calling ip_mr_forward.
Fix this by resolving to the correct interface
and calling ip_mr_forward with the result.
Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast") Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series contains some fixes for the mlx5 core and netdev driver.
Please pull and let me know if there's any problem.
For -stable:
("net/mlx5e: Added BW check for DIM decision mechanism") kernels >= 4.9
("net/mlx5e: Fix wrong indications in DIM due to counter wraparound") kernels >= 4.9
("net/mlx5: Remove several module events out of ethtool stats") kernels >= 4.10
("net/mlx5: Enable 4K UAR only when page size is bigger than 4K") kernels >= 4.11
*all patches apply with no issue on their -stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Sun, 11 Jun 2017 12:42:50 +0000 (15:42 +0300)]
net: ena: bug fix in lost tx packets detection mechanism
check_for_missing_tx_completions() is called from a timer
task and looking for lost tx packets.
The old implementation accumulate all the lost tx packets
and did not check if those packets were retrieved on a later stage.
This cause to a situation where the driver reset
the device for no reason.
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Sun, 11 Jun 2017 12:42:49 +0000 (15:42 +0300)]
net: ena: disable admin msix while working in polling mode
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Sun, 11 Jun 2017 12:42:48 +0000 (15:42 +0300)]
net: ena: fix theoretical Rx hang on low memory systems
For the rare case where the device runs out of free rx buffer
descriptors (in case of pressure on kernel memory),
and the napi handler continuously fail to refill new Rx descriptors
until device rx queue totally runs out of all free rx buffers
to post incoming packet, leading to a deadlock:
* The device won't send interrupts since all the new
Rx packets will be dropped.
* The napi handler won't try to allocate new Rx descriptors
since allocation is part of NAPI that's not being invoked any more
The fix involves detecting this scenario and rescheduling NAPI
(to refill buffers) by the keepalive/watchdog task.
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Sun, 11 Jun 2017 12:42:47 +0000 (15:42 +0300)]
net: ena: add missing unmap bars on device removal
This patch also change the mapping functions to devm_ functions
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Sun, 11 Jun 2017 12:42:46 +0000 (15:42 +0300)]
net: ena: fix race condition between submit and completion admin command
Bug:
"Completion context is occupied" error printout will be noticed in
dmesg.
This error will cause the admin command to fail, which will lead to
an ena_probe() failure or a watchdog reset (depends on which admin
command failed).
Root cause:
__ena_com_submit_admin_cmd() is the function that submits new entries to
the admin queue.
The function have a check that makes sure the queue is not full and the
function does not override any outstanding command.
It uses head and tail indexes for this check.
The head is increased by ena_com_handle_admin_completion() which runs
from interrupt context, and the tail index is increased by the submit
function (the function is running under ->q_lock, so there is no risk
of multithread increment).
Each command is associated with a completion context. This context
allocated before call to __ena_com_submit_admin_cmd() and freed by
ena_com_wait_and_process_admin_cq_interrupts(), right after the command
was completed.
This can lead to a state where the head was increased, the check passed,
but the completion context is still in use.
Solution:
Use the atomic variable ->outstanding_cmds instead of using the head and
the tail indexes.
This variable is safe for use since it is bumped in get_comp_ctx() in
__ena_com_submit_admin_cmd() and is freed by comp_ctxt_release()
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Sun, 11 Jun 2017 12:42:45 +0000 (15:42 +0300)]
net: ena: add missing return when ena_com_get_io_handlers() fails
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal [Sun, 11 Jun 2017 12:42:44 +0000 (15:42 +0300)]
net: ena: fix bug that might cause hang after consecutive open/close interface.
Fixing a bug that the driver does not unmask the IO interrupts
in ndo_open():
occasionally, the MSI-X interrupt (for one or more IO queues)
can be masked when ndo_close() was called.
If that is followed by ndo open(),
then the MSI-X will be still masked so no interrupt
will be received by the driver.
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The current flow to detect admin completion is:
while (command_not_completed) {
if (timeout)
error
check_for_completion()
sleep()
}
So in case the sleep took more than the timeout
(in case the thread/workqueue was not scheduled due to higher priority
task or prolonged VMexit), the driver can detect a stall even if
the completion is present.
The fix changes the order of this function to first check for
completion and only after that check if the timeout expired.
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 11 Jun 2017 19:02:01 +0000 (12:02 -0700)]
Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull randomness fixes from Ted Ts'o:
"Improve performance by using a lockless update mechanism suggested by
Linus, and make sure we refresh per-CPU entropy returned get_random_*
as soon as the CRNG is initialized"
* tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: invalidate batched entropy after crng init
random: use lockless method of accessing and updating f->reg_idx
Linus Torvalds [Sun, 11 Jun 2017 18:57:47 +0000 (11:57 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix various bug fixes in ext4 caused by races and memory allocation
failures"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix fdatasync(2) after extent manipulation operations
ext4: fix data corruption for mmap writes
ext4: fix data corruption with EXT4_GET_BLOCKS_ZERO
ext4: fix quota charging for shared xattr blocks
ext4: remove redundant check for encrypted file on dio write path
ext4: remove unused d_name argument from ext4_search_dir() et al.
ext4: fix off-by-one error when writing back pages before dio read
ext4: fix off-by-one on max nr_pages in ext4_find_unwritten_pgoff()
ext4: keep existing extra fields when inode expands
ext4: handle the rest of ext4_mb_load_buddy() ENOMEM errors
ext4: fix off-by-in in loop termination in ext4_find_unwritten_pgoff()
ext4: fix SEEK_HOLE
jbd2: preserve original nofs flag during journal restart
ext4: clear lockdep subtype for quota files on quota off
Linus Torvalds [Sun, 11 Jun 2017 18:34:27 +0000 (11:34 -0700)]
Merge tag 'gpio-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"A few overdue GPIO patches for the v4.12 kernel.
- Fix debounce logic on the Aspeed platform.
- Fix the "virtual gpio" things on the Intel Crystal Cove.
- Fix the blink counter selection on the MVEBU platform"
* tag 'gpio-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: mvebu: fix gpio bank registration when pwm is used
gpio: mvebu: fix blink counter register selection
MAINTAINERS: remove self from GPIO maintainers
gpio: crystalcove: Do not write regular gpio registers for virtual GPIOs
gpio: aspeed: Don't attempt to debounce if disabled
Linus Torvalds [Sun, 11 Jun 2017 18:29:15 +0000 (11:29 -0700)]
Merge tag 'char-misc-4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small driver fixes for 4.12-rc5. Nothing major here,
just some small bugfixes found by people testing, and a MAINTAINERS
file update for the genwqe driver.
All have been in linux-next with no reported issues"
[ The cxl driver fix came in through the powerpc tree earlier ]
* tag 'char-misc-4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
cxl: Avoid double free_irq() for psl,slice interrupts
mei: make sysfs modalias format similar as uevent modalias
drivers: char: mem: Fix wraparound check to allow mappings up to the end
MAINTAINERS: Change maintainer of genwqe driver
goldfish_pipe: use GFP_ATOMIC under spin lock
firmware: vpd: do not leak kobjects
firmware: vpd: avoid potential use-after-free when destroying section
firmware: vpd: do not leave freed section attributes to the list
Linus Torvalds [Sun, 11 Jun 2017 18:25:51 +0000 (11:25 -0700)]
Merge tag 'staging-4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO fixes from Greg KH:
"These are mostly all IIO driver fixes, resolving a number of tiny
issues. There's also a ccree and lustre fix in here as well, both fix
problems found in those codebases.
All have been in linux-next with no reported issues"
* tag 'staging-4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: ccree: fix buffer copy
staging/lustre/lov: remove set_fs() call from lov_getstripe()
staging: ccree: add CRYPTO dependency
iio: adc: sun4i-gpadc-iio: fix parent device being used in devm function
iio: light: ltr501 Fix interchanged als/ps register field
iio: adc: bcm_iproc_adc: swap primary and secondary isr handler's
iio: trigger: fix NULL pointer dereference in iio_trigger_write_current()
iio: adc: max9611: Fix attribute measure unit
iio: adc: ti_am335x_adc: allocating too much in probe
iio: adc: sun4i-gpadc-iio: Fix module autoload when OF devices are registered
iio: adc: sun4i-gpadc-iio: Fix module autoload when PLATFORM devices are registered
iio: proximity: as3935: fix iio_trigger_poll issue
iio: proximity: as3935: fix AS3935_INT mask
iio: adc: Max9611: checking for ERR_PTR instead of NULL in probe
iio: proximity: as3935: recalibrate RCO after resume
Linus Torvalds [Sun, 11 Jun 2017 18:21:08 +0000 (11:21 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of user visible fixes (excepting one format string
change).
Four of the qla2xxx fixes only affect the firmware dump path, but it's
still important to the enterprise. The rest are various NULL pointer
crash conditions or outright driver hangs"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: cxgb4i: libcxgbi: in error case RST tcp conn
scsi: scsi_debug: Avoid PI being disabled when TPGS is enabled
scsi: qla2xxx: Fix extraneous ref on sp's after adapter break
scsi: lpfc: prevent potential null pointer dereference
scsi: lpfc: Avoid NULL pointer dereference in lpfc_els_abort()
scsi: lpfc: nvmet_fc: fix format string
scsi: qla2xxx: Fix crash due to NULL pointer dereference of ctx
scsi: qla2xxx: Fix mailbox pointer error in fwdump capture
scsi: qla2xxx: Set bit 15 for DIAG_ECHO_TEST MBC
scsi: qla2xxx: Modify T262 FW dump template to specify same start/end to debug customer issues
scsi: qla2xxx: Fix crash due to mismatch mumber of Q-pair creation for Multi queue
scsi: qla2xxx: Fix NULL pointer access due to redundant fc_host_port_name call
scsi: qla2xxx: Fix recursive loop during target mode configuration for ISP25XX leaving system unresponsive
scsi: bnx2fc: fix race condition in bnx2fc_get_host_stats()
scsi: qla2xxx: don't disable a not previously enabled PCI device
Linus Torvalds [Sun, 11 Jun 2017 18:15:09 +0000 (11:15 -0700)]
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Dan Williams:
"We expanded the device-dax fs type in 4.12 to be a generic provider of
a struct dax_device with an embedded inode. However, Sasha found some
basic negative testing was not run to verify that this fs cleanly
handles being mounted directly.
Note that the fresh rebase was done to remove an unnecessary Cc:
<stable> tag, but this commit otherwise had a build success
notification from the 0day robot."
Tal Gilboa [Mon, 29 May 2017 14:02:55 +0000 (17:02 +0300)]
net/mlx5e: Fix wrong indications in DIM due to counter wraparound
DIM (Dynamically-tuned Interrupt Moderation) is a mechanism designed for
changing the channel interrupt moderation values in order to reduce CPU
overhead for all traffic types.
Each iteration of the algorithm, DIM calculates the difference in
throughput, packet rate and interrupt rate from last iteration in order
to make a decision. DIM relies on counters for each metric. When these
counters get to their type's max value they wraparound. In this case
the delta between 'end' and 'start' samples is negative and when
translated to unsigned integers - very high. This results in a false
indication to the algorithm and might result in a wrong decision.
The fix calculates the 'distance' between 'end' and 'start' samples in a
cyclic way around the relevant type's max value. It can also be viewed as
an absolute value around the type's max value instead of around 0.
Testing show higher stability in DIM profile selection and no wraparound
issues.
Fixes: cb3c7fd4f839 ("net/mlx5e: Support adaptive RX coalescing") Signed-off-by: Tal Gilboa <talgi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tal Gilboa [Mon, 15 May 2017 11:13:16 +0000 (14:13 +0300)]
net/mlx5e: Added BW check for DIM decision mechanism
DIM (Dynamically-tuned Interrupt Moderation) is a mechanism designed for
changing the channel interrupt moderation values in order to reduce CPU
overhead for all traffic types.
Until now only interrupt and packet rate were sampled.
We found a scenario on which we get a false indication since a change in
DIM caused more aggregation and reduced packet rate while increasing BW.
We now regard a change as succesfull iff:
current_BW > (prev_BW + threshold) or
current_BW ~= prev_BW and current_PR > (prev_PR + threshold) or
current_BW ~= prev_BW and current_PR ~= prev_PR and
current_IR < (prev_IR - threshold)
Where BW = Bandwidth, PR = Packet rate and IR = Interrupt rate
Huy Nguyen [Mon, 8 May 2017 16:46:50 +0000 (11:46 -0500)]
net/mlx5: Remove several module events out of ethtool stats
Remove the following module event counters out of ethtool stats. The
reason for removing these event counters is that these events do not
occur without techinician's intervention.
module_pwr_budget_exd
module_long_range
module_no_eeprom
module_enforce_part
module_unknown_id
module_unknown_status
module_plug
Fixes: bedb7c909c19 ("net/mlx5e: Add port module event counters to ethtool stats") Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed by: Gal Pressman <galp@mellanox.com>