Currently mlx5 driver creates xdp redirect hw queues unconditionally on
netdevice open, This is great until someone starts redirecting XDP traffic
via ndo_xdp_xmit on mlx5 device and changes the device configuration at
the same time, this might cause crashes, since the other device's napi
is not aware of the mlx5 state change (resources un-availability).
To fix this we must synchronize with other devices napi's on the system.
Added a new flag under mlx5e_priv to determine XDP TX resources are
available, set/clear it up when necessary and use synchronize_rcu()
when the flag is turned off, so other napi's are in-sync with it, before
we actually cleanup the hw resources.
The flag is tested prior to committing to transmit on mlx5e_xdp_xmit, and
it is sufficient to determine if it safe to transmit or not. The other
two internal flags (MLX5E_STATE_OPENED and MLX5E_SQ_STATE_ENABLED) become
unnecessary. Thus, they are removed from data path.
Fixes: 58b99ee3e3eb ("net/mlx5e: Add support for XDP_REDIRECT in device-out side") Reported-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Huy Nguyen [Thu, 7 Feb 2019 15:22:56 +0000 (09:22 -0600)]
net/mlx5: No command allowed when command interface is not ready
When EEH is injected and PCI bus stalls, mlx5's pci error detect
function is called to deactivate the command interface and tear down
the device. The issue is that there can be a thread that already
passed MLX5_DEVICE_STATE_INTERNAL_ERROR check, it will send the command
and stuck in the wait_func.
Solution:
Add function mlx5_cmd_flush to disable command interface and clear all
the pending commands. When device state is set to
MLX5_DEVICE_STATE_INTERNAL_ERROR, call mlx5_cmd_flush to ensure all
pending threads waiting for firmware commands completion are terminated.
Fixes: c1d4d2e92ad6 ("net/mlx5: Avoid calling sleeping function by the health poll thread") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Maria Pasechnik [Sun, 3 Feb 2019 15:55:09 +0000 (17:55 +0200)]
net/mlx5e: Fix NULL pointer derefernce in set channels error flow
New channels are applied to the priv channels only after they
are successfully opened. Then, the indirection table should be built
according to the new number of channels.
Currently, such build is preformed independently of whether the
channels opening is successful, and is not reverted on failure.
The bug is caused due to removal of rss params from channels struct
and moving it to priv struct. That change cause to independency between
channels and rss params.
This causes a crash on a later point, when accessing rqn of a non
existing channel.
This patch fixes it by moving the indirection table build right before
switching the priv channels to new channels struct, after the new set of
channels was successfully opened.
Fixes: bbeb53b8b2c9 ("net/mlx5e: Move RSS params to a dedicated struct") Signed-off-by: Maria Pasechnik <mariap@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
as while we retrieve 'opt_inst' from team->option_inst_list, it could
be added to the local 'opt_inst_list' for multiple times. The
__team_option_inst_tmp_find() doesn't work, as the setter
team_mode_option_set() still calls team->ops.exit() which uses
->tmp_list too in __team_options_change_check().
Simplify the list operations by moving the 'opt_inst_list' and
team_nl_send_event_options_get() into the nla_for_each_nested() loop so
that it can be guranteed that we won't insert a same list entry for
multiple times. Therefore, __team_option_inst_tmp_find() can be removed
too.
Fixes: 4fb0534fb7bb ("team: avoid adding twice the same option to the event list") Fixes: 2fcdb2c9e659 ("team: allow to send multiple set events in one message") Reported-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com Reported-by: syzbot+68ee510075cf64260cc4@syzkaller.appspotmail.com Cc: Jiri Pirko <jiri@resnulli.us> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 11 Feb 2019 21:06:16 +0000 (13:06 -0800)]
net_sched: fix two more memory leaks in cls_tcindex
struct tcindex_filter_result contains two parts:
struct tcf_exts and struct tcf_result.
For the local variable 'cr', its exts part is never used but
initialized without being released properly on success path. So
just completely remove the exts part to fix this leak.
For the local variable 'new_filter_result', it is never properly
released if not used by 'r' on success path.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 11 Feb 2019 21:06:15 +0000 (13:06 -0800)]
net_sched: fix a memory leak in cls_tcindex
When tcindex_destroy() destroys all the filter results in
the perfect hash table, it invokes the walker to delete
each of them. However, results with class==0 are skipped
in either tcindex_walk() or tcindex_delete(), which causes
a memory leak reported by kmemleak.
This patch fixes it by skipping the walker and directly
deleting these filter results so we don't miss any filter
result.
As a result of this change, we have to initialize exts->net
properly in tcindex_alloc_perfect_hash(). For net-next, we
need to consider whether we should initialize ->net in
tcf_exts_init() instead, before that just directly test
CONFIG_NET_CLS_ACT=y.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 11 Feb 2019 21:06:14 +0000 (13:06 -0800)]
net_sched: fix a race condition in tcindex_destroy()
tcindex_destroy() invokes tcindex_destroy_element() via
a walker to delete each filter result in its perfect hash
table, and tcindex_destroy_element() calls tcindex_delete()
which schedules tcf RCU works to do the final deletion work.
Unfortunately this races with the RCU callback
__tcindex_destroy(), which could lead to use-after-free as
reported by Adrian.
Fix this by migrating this RCU callback to tcf RCU work too,
as that workqueue is ordered, we will not have use-after-free.
Note, we don't need to hold netns refcnt because we don't call
tcf_exts_destroy() here.
Fixes: 27ce4f05e2ab ("net_sched: use tcf_queue_work() in tcindex filter") Reported-by: Adrian <bugs@abtelecom.ro> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 Feb 2019 19:05:07 +0000 (14:05 -0500)]
Merge branch 'ena-races'
Arthur Kiyanovski says:
====================
net: ena: race condition bug fix and version update
This patchset includes a fix to a race condition that can cause
kernel panic, as well as a driver version update because of this
fix.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: ena: fix race between link up and device initalization
Fix race condition between ena_update_on_link_change() and
ena_restore_device().
This race can occur if link notification arrives while the driver
is performing a reset sequence. In this case link can be set up,
enabling the device, before it is fully restored. If packets are
sent at this time, the driver might access uninitialized data
structures, causing kernel crash.
Move the clearing of ENA_FLAG_ONGOING_RESET and netif_carrier_on()
after ena_up() to ensure the device is ready when link is set up.
Fixes: d18e4f683445 ("net: ena: fix race condition between device reset and link up setup") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kal Conley [Sun, 10 Feb 2019 08:57:11 +0000 (09:57 +0100)]
net/packet: fix 4gb buffer limit due to overflow check
When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow. Check it for overflow without limiting the total buffer
size to UINT_MAX.
This change fixes support for packet ring buffers >= UINT_MAX.
Fixes: 8f8d28e4d6d8 ("net/packet: fix overflow in check for tp_frame_nr") Signed-off-by: Kal Conley <kal.conley@dectris.com> Signed-off-by: David S. Miller <davem@davemloft.net>
inet_diag: fix reporting cgroup classid and fallback to priority
Field idiag_ext in struct inet_diag_req_v2 used as bitmap of requested
extensions has only 8 bits. Thus extensions starting from DCTCPINFO
cannot be requested directly. Some of them included into response
unconditionally or hook into some of lower 8 bits.
Extension INET_DIAG_CLASS_ID has not way to request from the beginning.
This patch bundle it with INET_DIAG_TCLASS (ipv6 tos), fixes space
reservation, and documents behavior for other extensions.
Also this patch adds fallback to reporting socket priority. This filed
is more widely used for traffic classification because ipv4 sockets
automatically maps TOS to priority and default qdisc pfifo_fast knows
about that. But priority could be changed via setsockopt SO_PRIORITY so
INET_DIAG_TOS isn't enough for predicting class.
Also cgroup2 obsoletes net_cls classid (it always zero), but we cannot
reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).
So, after this patch INET_DIAG_CLASS_ID will report socket priority
for most common setup when net_cls isn't set and/or cgroup2 in use.
Fixes: 0888e372c37f ("net: inet: diag: expose sockets cgroup classid") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <a@unstable.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Mon, 11 Feb 2019 11:32:20 +0000 (19:32 +0800)]
ipv6: propagate genlmsg_reply return code
genlmsg_reply can fail, so propagate its return code
Fixes: 915d7e5e593 ("ipv6: sr: add code base for control plane support of SR-IPv6") Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Mon, 11 Feb 2019 16:04:17 +0000 (18:04 +0200)]
net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames
When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, it is not guaranteed. For example, switches might
choose to make other use of these octets.
This repeatedly causes kernel hardware checksum fault.
Prior to the cited commit below, skb checksum was forced to be
CHECKSUM_NONE when padding is detected. After it, we need to keep
skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
verify and parse IP headers, it does not worth the effort as the packets
are so small that CHECKSUM_COMPLETE has no significant advantage.
Future work: when reporting checksum complete is not an option for
IP non-TCP/UDP packets, we can actually fallback to report checksum
unnecessary, by looking at cqe IPOK bit.
Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Mon, 11 Feb 2019 15:04:24 +0000 (15:04 +0000)]
net: phylink: avoid resolving link state too early
During testing on Armada 388 platforms, it was found with a certain
module configuration that it was possible to trigger a kernel oops
during the module load process, caused by the phylink resolver being
triggered for a currently disabled interface.
This problem was introduced by changing the way the SFP registration
works, which now can result in the sfp link down notification being
called during phylink_create().
Fixes: b5bfc21af5cb ("net: sfp: do not probe SFP module before we're attached") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Matteo Croce [Mon, 11 Feb 2019 14:32:36 +0000 (15:32 +0100)]
geneve: change NET_UDP_TUNNEL dependency to select
Due to the depends on NET_UDP_TUNNEL, at the moment it is impossible to
compile GENEVE if no other protocol depending on NET_UDP_TUNNEL is
selected.
Fix this changing the depends to a select, and drop NET_IP_TUNNEL from the
select list, as it already depends on NET_UDP_TUNNEL.
Signed-off-by: Matteo Croce <mcroce@redhat.com> Reviewed-and-tested-by: Andrea Claudi <aclaudi@redhat.com> Tested-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bert Kenward [Tue, 12 Feb 2019 13:10:00 +0000 (13:10 +0000)]
sfc: initialise found bitmap in efx_ef10_mtd_probe
The bitmap of found partitions in efx_ef10_mtd_probe was not
initialised, causing partitions to be suppressed based off whatever
value was in the bitmap at the start.
Fixes: 3366463513f5 ("sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe") Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 Feb 2019 16:43:19 +0000 (11:43 -0500)]
Merge tag 'mac80211-for-davem-2019-02-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just a few fixes:
* aggregation session teardown with internal TXQs was
continuing to send some frames marked as aggregation,
fix from Ilan
* IBSS join was missed during firmware restart, should
such a thing happen
* speculative execution based on the return value of
cfg80211_classify8021d() - which is controlled by the
sender of the packet - could be problematic in some
code using it, prevent it
* a few peer measurement fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tuong Lien [Mon, 11 Feb 2019 06:29:43 +0000 (13:29 +0700)]
tipc: fix link session and re-establish issues
When a link endpoint is re-created (e.g. after a node reboot or
interface reset), the link session number is varied by random, the peer
endpoint will be synced with this new session number before the link is
re-established.
However, there is a shortcoming in this mechanism that can lead to the
link never re-established or faced with a failure then. It happens when
the peer endpoint is ready in ESTABLISHING state, the 'peer_session' as
well as the 'in_session' flag have been set, but suddenly this link
endpoint leaves. When it comes back with a random session number, there
are two situations possible:
1/ If the random session number is larger than (or equal to) the
previous one, the peer endpoint will be updated with this new session
upon receipt of a RESET_MSG from this endpoint, and the link can be re-
established as normal. Otherwise, all the RESET_MSGs from this endpoint
will be rejected by the peer. In turn, when this link endpoint receives
one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and start
to send STATE_MSGs, but again these messages will be dropped by the
peer due to wrong session.
The peer link endpoint can still become ESTABLISHED after receiving a
traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link
will be forced down sooner or later!
Even in case the random session number is larger than the previous one,
it can be that the ACTIVATE_MSG from the peer arrives first, and this
link endpoint moves quickly to ESTABLISHED without sending out any
RESET_MSG yet. Consequently, the peer link will not be updated with the
new session number, and the same link failure scenario as above will
happen.
2/ Another situation can be that, the peer link endpoint was reset due
to any reasons in the meantime, its link state was set to RESET from
ESTABLISHING but still in session, i.e. the 'in_session' flag is not
reset...
Now, if the random session number from this endpoint is less than the
previous one, all the RESET_MSGs from this endpoint will be rejected by
the peer. In the other direction, when this link endpoint receives a
RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
ACTIVATE_MSGs, but all these messages will be rejected by the peer too.
As a result, the link cannot be re-established but gets stuck with this
link endpoint in state ESTABLISHING and the peer in RESET!
Solution:
===========
This link endpoint should not go directly to ESTABLISHED when getting
ACTIVATE_MSG from the peer which may belong to the old session if the
link was re-created. To ensure the session to be correct before the
link is re-established, the peer endpoint in ESTABLISHING state will
send back the last session number in ACTIVATE_MSG for a verification at
this endpoint. Then, if needed, a new and more appropriate session
number will be regenerated to force a re-synch first.
In addition, when a link in ESTABLISHING state is reset, its state will
move to RESET according to the link FSM, along with resetting the
'in_session' flag (and the other data) as a normal link reset, it will
also be deleted if requested.
The solution is backward compatible.
Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Zhiqiang Liu [Mon, 11 Feb 2019 02:57:46 +0000 (10:57 +0800)]
net: fix IPv6 prefix route residue
Follow those steps:
# ip addr add 2001:123::1/32 dev eth0
# ip addr add 2001:123:456::2/64 dev eth0
# ip addr del 2001:123::1/32 dev eth0
# ip addr del 2001:123:456::2/64 dev eth0
and then prefix route of 2001:123::1/32 will still exist.
This is because ipv6_prefix_equal in check_cleanup_prefix_route
func does not check whether two IPv6 addresses have the same
prefix length. If the prefix of one address starts with another
shorter address prefix, even though their prefix lengths are
different, the return value of ipv6_prefix_equal is true.
Here I add a check of whether two addresses have the same prefix
to decide whether their prefixes are equal.
Fixes: 5b84efecb7d9 ("ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE") Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hoang Le [Mon, 11 Feb 2019 02:18:28 +0000 (09:18 +0700)]
tipc: fix skb may be leaky in tipc_link_input
When we free skb at tipc_data_input, we return a 'false' boolean.
Then, skb passed to subcalling tipc_link_input in tipc_link_rcv,
<snip>
1303 int tipc_link_rcv:
...
1354 if (!tipc_data_input(l, skb, l->inputq))
1355 rc |= tipc_link_input(l, skb, l->inputq);
</snip>
Fix it by simple changing to a 'true' boolean when skb is being free-ed.
Then, tipc_link_rcv will bypassed to subcalling tipc_link_input as above
condition.
Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <maloy@donjonn.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Andrew Hendry <andrew.hendry@gmail.com> Cc: linux-x25@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 7 Feb 2019 20:27:38 +0000 (12:27 -0800)]
vxlan: test dev->flags & IFF_UP before calling netif_rx()
netif_rx() must be called under a strict contract.
At device dismantle phase, core networking clears IFF_UP
and flush_all_backlogs() is called after rcu grace period
to make sure no incoming packet might be in a cpu backlog
and still referencing the device.
Most drivers call netif_rx() from their interrupt handler,
and since the interrupts are disabled at device dismantle,
netif_rx() does not have to check dev->flags & IFF_UP
Virtual drivers do not have this guarantee, and must
therefore make the check themselves.
Otherwise we risk use-after-free and/or crashes.
Note this patch also fixes a small issue that came
with commit ce6502a8f957 ("vxlan: fix a use after free
in vxlan_encap_bypass"), since the dev->stats.rx_dropped
change was done on the wrong device.
Fixes: d342894c5d2f ("vxlan: virtual extensible lan") Fixes: ce6502a8f957 ("vxlan: fix a use after free in vxlan_encap_bypass") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Petr Machata <petrm@mellanox.com> Cc: Ido Schimmel <idosch@mellanox.com> Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following patchset contains Netfilter fixes for net:
1) Out-of-bound access to packet data from the snmp nat helper,
from Jann Horn.
2) ICMP(v6) error packets are set as related traffic by conntrack,
update protocol number before calling nf_nat_ipv4_manip_pkt()
to use ICMP(v6) rather than the original protocol number,
from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sander Eikelenboom bisected a NAT related regression down
to the l4proto->manip_pkt indirection removal.
I forgot that ICMP(v6) errors (e.g. PKTTOOBIG) can be set as related
to the existing conntrack entry.
Therefore, when passing the skb to nf_nat_ipv4/6_manip_pkt(), that
ended up calling the wrong l4 manip function, as tuple->dst.protonum
is the original flows l4 protocol (TCP, UDP, etc).
Set the dst protocol field to ICMP(v6), we already have a private copy
of the tuple due to the inversion of src/dst.
Jann Horn [Wed, 6 Feb 2019 21:56:15 +0000 (22:56 +0100)]
netfilter: nf_nat_snmp_basic: add missing length checks in ASN.1 cbs
The generic ASN.1 decoder infrastructure doesn't guarantee that callbacks
will get as much data as they expect; callbacks have to check the `datalen`
parameter before looking at `data`. Make sure that snmp_version() and
snmp_helper() don't read/write beyond the end of the packet data.
(Also move the assignment to `pdata` down below the check to make it clear
that it isn't necessarily a pointer we can use before the `datalen` check.)
Fixes: cc2d58634e0f ("netfilter: nf_nat_snmp_basic: use asn1 decoder library") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Ilan Peer [Wed, 6 Feb 2019 11:17:21 +0000 (13:17 +0200)]
mac80211: Fix Tx aggregation session tear down with ITXQs
When mac80211 requests the low level driver to stop an ongoing
Tx aggregation, the low level driver is expected to call
ieee80211_stop_tx_ba_cb_irqsafe() to indicate that it is ready
to stop the session. The callback in turn schedules a worker
to complete the session tear down, which in turn also handles
the relevant state for the intermediate Tx queue.
However, as this flow in asynchronous, the intermediate queue
should be stopped and not continue servicing frames, as in
such a case frames that are dequeued would be marked as part
of an aggregation, although the aggregation is already been
stopped.
Fix this by stopping the intermediate Tx queue, before
calling the low level driver to stop the Tx aggregation.
Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Wed, 6 Feb 2019 11:17:14 +0000 (13:17 +0200)]
cfg80211: prevent speculation on cfg80211_classify8021d() return
It's possible that the caller of cfg80211_classify8021d() uses the
value to index an array, like mac80211 in ieee80211_downgrade_queue().
Prevent speculation on the return value.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Wed, 6 Feb 2019 11:17:07 +0000 (13:17 +0200)]
cfg80211: pmsr: record netlink port ID
Without recording the netlink port ID, we cannot return the
results or complete messages to userspace, nor will we be
able to abort if the socket is closed, so clearly we need
to fill the value.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Wed, 6 Feb 2019 11:17:12 +0000 (13:17 +0200)]
mac80211: call drv_ibss_join() on restart
If a driver does any significant activity in its ibss_join method,
then it will very well expect that to be called during restart,
before any stations are added. Do that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
====================
r8169: revert two commits due to a regression
Sander reported a regression (kernel panic, see[1]), therefore let's
revert these commits. Removal of the barriers doesn't seem to
contribute to the issue, the patch just overlaps with the problematic
one and only reverting both patches was tested.
Ursula Braun [Thu, 7 Feb 2019 13:52:54 +0000 (14:52 +0100)]
net/smc: fix byte_order for rx_curs_confirmed
The recent change in the rx_curs_confirmed assignment disregards
byte order, which causes problems on little endian architectures.
This patch fixes it.
Fixes: b8649efad879 ("net/smc: fix sender_free computation") (net-tree) Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 7 Feb 2019 13:13:18 +0000 (14:13 +0100)]
vsock: cope with memory allocation failure at socket creation time
In the unlikely event that the kmalloc call in vmci_transport_socket_init()
fails, we end-up calling vmci_transport_destruct() with a NULL vmci_trans()
and oopsing.
This change addresses the above explicitly checking for zero vmci_trans()
at destruction time.
Reported-by: Xiumei Mu <xmu@redhat.com> Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ipv4: use a dedicated counter for icmp_v4 redirect packets
According to the algorithm described in the comment block at the
beginning of ip_rt_send_redirect, the host should try to send
'ip_rt_redirect_number' ICMP redirect packets with an exponential
backoff and then stop sending them at all assuming that the destination
ignores redirects.
If the device has previously sent some ICMP error packets that are
rate-limited (e.g TTL expired) and continues to receive traffic,
the redirect packets will never be transmitted. This happens since
peer->rate_tokens will be typically greater than 'ip_rt_redirect_number'
and so it will never be reset even if the redirect silence timeout
(ip_rt_redirect_silence) has elapsed without receiving any packet
requiring redirects.
Fix it by using a dedicated counter for the number of ICMP redirect
packets that has been sent by the host
I have not been able to identify a given commit that introduced the
issue since ip_rt_send_redirect implements the same rate-limiting
algorithm from commit 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Wed, 6 Feb 2019 10:52:30 +0000 (10:52 +0000)]
net: sfp: do not probe SFP module before we're attached
When we probe a SFP module, we expect to be able to call the upstream
device's module_insert() function so that the upstream link can be
configured. However, when the upstream device is delayed, we currently
may end up probing the module before the upstream device is available,
and lose the module_insert() call.
Avoid this by holding off probing the module until the SFP bus is
properly connected to both the SFP socket driver and the upstream
driver.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"This pull request is dedicated to the upcoming snowpocalypse parts 2
and 3 in the Pacific Northwest:
1) Drop profiles are broken because some drivers use dev_kfree_skb*
instead of dev_consume_skb*, from Yang Wei.
2) Fix IWLWIFI kconfig deps, from Luca Coelho.
3) Fix percpu maps updating in bpftool, from Paolo Abeni.
4) Missing station release in batman-adv, from Felix Fietkau.
5) Fix some networking compat ioctl bugs, from Johannes Berg.
6) ucc_geth must reset the BQL queue state when stopping the device,
from Mathias Thore.
7) Several XDP bug fixes in virtio_net from Toshiaki Makita.
8) TSO packets must be sent always on queue 0 in stmmac, from Jose
Abreu.
9) Fix socket refcounting bug in RDS, from Eric Dumazet.
10) Handle sparse cpu allocations in bpf selftests, from Martynas
Pumputis.
11) Make sure mgmt frames have enough tailroom in mac80211, from Felix
Feitkau.
12) Use safe list walking in sctp_sendmsg() asoc list traversal, from
Greg Kroah-Hartman.
13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL
ccid, from Eric Dumazet.
14) Need to reload WoL password into bcmsysport device after deep
sleeps, from Florian Fainelli.
15) Remove filter from mask before freeing in cls_flower, from Petr
Machata.
16) Missing release and use after free in error paths of s390 qeth
code, from Julian Wiedmann.
17) Fix lockdep false positive in dsa code, from Marc Zyngier.
18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn.
19) Fix EQ firmware assert in qed driver, from Manish Chopra.
20) Don't default Caivum PTP to Y in kconfig, from Bjorn Helgaas"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net: dsa: b53: Fix for failure when irq is not defined in dt
sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
geneve: should not call rt6_lookup() when ipv6 was disabled
net: Don't default Cavium PTP driver to 'y'
net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net/mlx5e: Don't overwrite pedit action when multiple pedit used
net/mlx5e: Update hw flows when encap source mac changed
qed*: Advance drivers version to 8.37.0.20
qed: Change verbosity for coalescing message.
qede: Fix system crash on configuring channels.
qed: Consider TX tcs while deriving the max num_queues for PF.
...
Linus Torvalds [Fri, 8 Feb 2019 18:56:31 +0000 (10:56 -0800)]
Merge tag 'char-misc-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are some small char and misc driver fixes for 5.0-rc6.
Nothing huge here, some more binderfs fixups found as people use it,
and there is a "large" selftest added to validate the binderfs code,
which makes up the majority of this pull request.
There's also some small mei and mic fixes to resolve some reported
issues.
All of these have been in linux-next for over a week with no reported
issues"
* tag 'char-misc-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
mic: vop: Fix crash on remove
mic: vop: Fix use-after-free on remove
binderfs: remove separate device_initcall()
fpga: stratix10-soc: fix wrong of_node_put() in init function
mic: vop: Fix broken virtqueues
mei: free read cb on ctrl_wr list flush
samples: mei: use /dev/mei0 instead of /dev/mei
mei: me: add ice lake point device id.
binderfs: respect limit on binder control creation
binder: fix CONFIG_ANDROID_BINDER_DEVICES
selftests: add binderfs selftests
Linus Torvalds [Fri, 8 Feb 2019 18:53:44 +0000 (10:53 -0800)]
Merge tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are some driver core fixes for 5.0-rc6.
Well, not so much "driver core" as "debugfs". There's a lot of
outstanding debugfs cleanup patches coming in through different
subsystem trees, and in that process the debugfs core was found that
it really should return errors when something bad happens, to prevent
random files from showing up in the root of debugfs afterward. So
debugfs was fixed up to handle this properly, and then two fixes for
the relay and blk-mq code was needed as it was making invalid
assumptions about debugfs return values.
There's also a cacheinfo fix in here that resolves a tiny issue.
All of these have been in linux-next for over a week with no reported
problems"
* tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
blk-mq: protect debugfs_create_files() from failures
relay: check return of create_buf_file() properly
debugfs: debugfs_lookup() should return NULL if not found
debugfs: return error values, not NULL
debugfs: fix debugfs_rename parameter checking
cacheinfo: Keep the old value if of_property_read_u32 fails
Linus Torvalds [Fri, 8 Feb 2019 18:49:55 +0000 (10:49 -0800)]
Merge tag 'tty-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial fixes for 5.0-rc6.
Nothing huge, just a few small fixes for reported issues. The speakup
fix is in here as it is a tty operation issue.
All of these have been in linux-next for a while with no reported
problems"
* tag 'tty-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: fix race between flush_to_ldisc and tty_open
staging: speakup: fix tty-operation NULL derefs
serial: sh-sci: Do not free irqs that have already been freed
serial: 8250_pci: Make PCI class test non fatal
tty: serial: 8250_mtk: Fix potential NULL pointer dereference
Linus Torvalds [Fri, 8 Feb 2019 18:46:14 +0000 (10:46 -0800)]
Merge tag 'xfs-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Here are a handful of XFS fixes to fix a data corruption problem, a
crasher bug, and a deadlock.
Summary:
- Fix cache coherency problem with writeback mappings
- Fix buffer deadlock when shutting fs down
- Fix a null pointer dereference when running online repair"
* tag 'xfs-5.0-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: set buffer ops when repair probes for btree type
xfs: end sync buffer I/O properly on shutdown error
xfs: eof trim writeback mapping as soon as it is cached
Linus Torvalds [Fri, 8 Feb 2019 18:43:15 +0000 (10:43 -0800)]
Merge tag 'drm-fixes-2019-02-08' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Missed fixes last week as had nothing until amdgpu showed up on
Saturday. Other stuff has since rolled in along with some more amdgpu
fixes, so we have two weeks of those, and some i915, vmwgfx, sun4i,
rockchip and omap fixes.
amdgpu/radeon:
- fix crash on passthrough for SI
- fencing fix for shared buffers
- APU hwmon fix
- API powerplay fix
- eDP freesync fix
- PASID mgr locking fix
- KFD warning fix
- DC/powerplay fix
- raven revision ids fix
- vega20 doorbell fix
* tag 'drm-fixes-2019-02-08' of git://anongit.freedesktop.org/drm/drm: (28 commits)
drm/omap: dsi: Hack-fix DSI bus flags
drm/omap: dsi: Fix OF platform depopulate
drm/omap: dsi: Fix crash in DSI debug dumps
drm/i915: Try to sanitize bogus DPLL state left over by broken SNB BIOSen
drm/amd/display: Attach VRR properties for eDP connectors
drm/amdkfd: Fix if preprocessor statement above kfd_fill_iolink_info_for_cpu
drm/amdgpu: use spin_lock_irqsave to protect vm_manager.pasid_idr
drm/i915: always return something on DDI clock selection
drm/i915: Fix skl srckey mask bits
drm/vmwgfx: Improve on IOMMU detection
drm/vmwgfx: Fix setting of dma masks
drm/vmwgfx: Also check for crtc status while checking for DU active
drm/vmwgfx: Fix an uninitialized fence handle value
drm/vmwgfx: Return error code from vmw_execbuf_copy_fence_user
drm/sun4i: tcon: Prepare and enable TCON channel 0 clock at init
drm/amdgpu: fix the incorrect external id for raven series
drm/amdgpu: Implement doorbell self-ring for NBIO 7.4
drm/amd/display: Fix fclk idle state
drm/amdgpu: Transfer fences to dmabuf importer
drm/amd/powerplay: Fix missing break in switch
...
net: dsa: b53: Fix for failure when irq is not defined in dt
Fixes the issues with non BCM58XX chips in the b53 driver
failing, when the irq is not specified in the device tree.
Removed the check for BCM58XX in b53_srab_prepare_irq(),
so the 'port->irq' will be set to '-EXIO' if the irq is not
specified in the device tree.
Fixes: 16994374a6fc ("net: dsa: b53: Make SRAB driver manage port interrupts") Fixes: b2ddc48a81b5 ("net: dsa: b53: Do not fail when IRQ are not initialized") Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Airlie [Fri, 8 Feb 2019 00:32:47 +0000 (10:32 +1000)]
Merge tag 'drm-misc-fixes-2019-02-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.0-rc6:
- Fixes to omap/dsi encoder.
- Clock fix for sun4i.
- Licensing header fix for rockchip.
- Fix division by zero in the mode when trying to set a mode on
i915 with GVT-g enabled.
Linus Torvalds [Thu, 7 Feb 2019 22:53:26 +0000 (15:53 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Three security fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221)
KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222)
kvm: fix kvm_ioctl_create_device() reference counting (CVE-2019-6974)
Linus Torvalds [Thu, 7 Feb 2019 22:44:45 +0000 (15:44 -0700)]
Merge tag 'nfsd-5.0-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
"Two small nfsd bugfixes for 5.0, for an RDMA bug and a file clone bug"
* tag 'nfsd-5.0-1' of git://linux-nfs.org/~bfields/linux:
svcrdma: Remove max_sge check at connect time
nfsd: Fix error return values for nfsd4_clone_file_range()
Linus Torvalds [Thu, 7 Feb 2019 22:42:43 +0000 (15:42 -0700)]
Merge tag 'for-5.0/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Both of these fixes address issues in changes merged for 5.0-rc4:
- Fix DM core's missing memory barrier before waitqueue_active()
calls.
- Fix DM core's clone_bio() to work when cloning a subset of a bio
with an integrity payload; bio_integrity_trim() wasn't getting
called due to bio_trim()'s early return"
* tag 'for-5.0/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: don't use bio_trim() afterall
dm: add memory barrier before waitqueue_active
David S. Miller [Thu, 7 Feb 2019 18:48:42 +0000 (10:48 -0800)]
Merge branch 'ipv6-fixes'
Hangbin Liu says:
====================
fix two kernel panics when disabled IPv6 on boot up
When disabled IPv6 on boot up, since there is no ipv6 route tables, we should
not call rt6_lookup. Fix them by checking if we have inet6_dev pointer on
netdevice.
v2: Fix idev reference leak, declarations and code mixing as Stefano,
Eric pointed. Since we only want to check if idev exists and not
reference it, use __in6_dev_get() insteand of in6_dev_get().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Thu, 7 Feb 2019 10:36:11 +0000 (18:36 +0800)]
sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
If we disabled IPv6 from the kernel command line (ipv6.disable=1), we should
not call ip6_err_gen_icmpv6_unreach(). This:
ip link add sit1 type sit local 192.0.2.1 remote 192.0.2.2 ttl 1
ip link set sit1 up
ip addr add 198.51.100.1/24 dev sit1
ping 198.51.100.2
if IPv6 is disabled at boot time, will crash the kernel.
v2: there's no need to use in6_dev_get(), use __in6_dev_get() instead,
as we only need to check that idev exists and we are under
rcu_read_lock() (from netif_receive_skb_internal()).
Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: ca15a078bd90 ("sit: generate icmpv6 error when receiving icmpv4 error") Cc: Oussama Ghorbel <ghorbel@pivasoftware.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Thu, 7 Feb 2019 10:36:10 +0000 (18:36 +0800)]
geneve: should not call rt6_lookup() when ipv6 was disabled
When we add a new GENEVE device with IPv6 remote, checking only for
IS_ENABLED(CONFIG_IPV6) is not enough as we may disable IPv6 in the
kernel command line (ipv6.disable=1), and calling rt6_lookup() would
cause a NULL pointer dereference.
v2:
- don't mix declarations and code (reported by Stefano Brivio, Eric Dumazet)
- there's no need to use in6_dev_get() as we only need to check that
idev exists (reported by David Ahern). This is under RTNL, so we can
simply use __in6_dev_get() instead (Stefano, Eric).
Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: c40e89fd358e9 ("geneve: configure MTU based on a lower device") Cc: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
There are multiple code paths where an hrtimer may have been started to
emulate an L1 VMX preemption timer that can result in a call to free_nested
without an intervening L2 exit where the hrtimer is normally
cancelled. Unconditionally cancel in free_nested to cover all cases.
Embargoed until Feb 7th 2019.
Signed-off-by: Peter Shier <pshier@google.com> Reported-by: Jim Mattson <jmattson@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reported-by: Felix Wilhelm <fwilhelm@google.com> Cc: stable@kernel.org
Message-Id: <20181011184646.154065-1-pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Emulation of certain instructions (VMXON, VMCLEAR, VMPTRLD, VMWRITE with
memory operand, INVEPT, INVVPID) can incorrectly inject a page fault
when passed an operand that points to an MMIO address. The page fault
will use uninitialized kernel stack memory as the CR2 and error code.
The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
exit to userspace; however, it is not an easy fix, so for now just
ensure that the error code and CR2 are zero.
Embargoed until Feb 7th 2019.
Reported-by: Felix Wilhelm <fwilhelm@google.com> Cc: stable@kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
1. creates a device that holds a reference to the VM object (with a borrowed
reference, the VM's refcount has not been bumped yet)
2. initializes the device
3. transfers the reference to the device to the caller's file descriptor table
4. calls kvm_get_kvm() to turn the borrowed reference to the VM into a real
reference
The ownership transfer in step 3 must not happen before the reference to the VM
becomes a proper, non-borrowed reference, which only happens in step 4.
After step 3, an attacker can close the file descriptor and drop the borrowed
reference, which can cause the refcount of the kvm object to drop to zero.
This means that we need to grab a reference for the device before
anon_inode_getfd(), otherwise the VM can disappear from under us.
Fixes: 852b6d57dc7f ("kvm: add device control API") Cc: stable@kernel.org Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Thu, 7 Feb 2019 08:33:56 +0000 (08:33 +0000)]
Merge tag 'sound-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of a few small fixes.
The most significant one is the fix for the possible race at loading
HD-audio drivers. This has been present for long time and surfaced
only in a rare occasion, but finally spotted out"
* tag 'sound-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/ca0132 - Fix build error without CONFIG_PCI
ALSA: compress: Fix stop handling on compressed capture streams
ALSA: usb-audio: Add support for new T+A USB DAC
ALSA: hda - Serialize codec registrations
ALSA: hda/realtek - Use a common helper for hp pin reference
ALSA: hda/realtek - Fix lose hp_pins for disable auto mute
ALSA: hda/realtek - Headset microphone support for System76 darp5
Linus Torvalds [Thu, 7 Feb 2019 08:05:28 +0000 (08:05 +0000)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"A small fix for a uapi header, and a fix for VDPA for non-x86 guests"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio: drop internal struct from UAPI
virtio: support VIRTIO_F_ORDER_PLATFORM
Linus Torvalds [Thu, 7 Feb 2019 07:59:01 +0000 (07:59 +0000)]
Merge tag 'trace-v5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"This has two fixes for uprobe code.
- Cut and paste fix to have uprobe printks say "uprobe" and not
"kprobe"
- Add terminating '\0' byte when copying function arguments"
* tag 'trace-v5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/uprobes: Fix output for multiple string arguments
tracing: uprobes: Fix typo in pr_fmt string
Linus Torvalds [Thu, 7 Feb 2019 07:52:08 +0000 (07:52 +0000)]
Merge tag 'fuse-fixes-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"A fix for a CUSE regression introduced in v4.20, as well as fixes for
a couple of old bugs"
* tag 'fuse-fixes-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: decrement NR_WRITEBACK_TEMP on the right page
fuse: call pipe_buf_release() under pipe lock
cuse: fix ioctl
fuse: handle zero sized retrieve correctly
Linus Torvalds [Thu, 7 Feb 2019 07:47:08 +0000 (07:47 +0000)]
Merge tag 'pinctrl-v5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
- Mediatek Kconfig fix
- Sunxi regulator, IRQ banks and pin base fixup
- Intel Cherryview Strago DMI workaround
- Potential regmap problem on mcp23s08
* tag 'pinctrl-v5.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sunxi: Correct number of IRQ banks on H6 main pin controller
pinctrl: mcp23s08: spi: Fix regmap allocation for mcp23s18
pinctrl: cherryview: fix Strago DMI workaround
pinctrl: sunxi: Consider pin_base when calculating regulator array index
pinctrl: sunxi: Fix and simplify pin bank regulator handling
pinctrl: mediatek: fix Kconfig build errors for moore core
Bjorn Helgaas [Tue, 5 Feb 2019 20:47:21 +0000 (14:47 -0600)]
net: Don't default Cavium PTP driver to 'y'
8c56df372bc1 ("net: add support for Cavium PTP coprocessor") added the
Cavium PTP coprocessor driver and enabled it by default. Remove the
"default y" because the driver only applies to Cavium ThunderX processors.
Fixes: 8c56df372bc1 ("net: add support for Cavium PTP coprocessor") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Wei [Tue, 5 Feb 2019 16:01:04 +0000 (00:01 +0800)]
net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
dev_consume_skb_irq() should be called in dfx_xmt_done() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn> Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Tonghao Zhang [Mon, 28 Jan 2019 23:28:06 +0000 (15:28 -0800)]
net/mlx5e: Don't overwrite pedit action when multiple pedit used
In some case, we may use multiple pedit actions to modify packets.
The command shown as below: the last pedit action is effective.
$ tc filter add dev netdev_rep parent ffff: protocol ip prio 1 \
flower skip_sw ip_proto icmp dst_ip 3.3.3.3 \
action pedit ex munge ip dst set 192.168.1.100 pipe \
action pedit ex munge eth src set 00:00:00:00:00:01 pipe \
action pedit ex munge eth dst set 00:00:00:00:00:02 pipe \
action csum ip pipe \
action tunnel_key set src_ip 1.1.1.100 dst_ip 1.1.1.200 dst_port 4789 id 100 \
action mirred egress redirect dev vxlan0
To fix it, we add max_mod_hdr_actions to mlx5e_tc_flow_parse_attr struction,
max_mod_hdr_actions will store the max pedit action number we support and
num_mod_hdr_actions indicates how many pedit action we used, and store all
pedit action to mod_hdr_actions.
Fixes: d79b6df6b10a ("net/mlx5e: Add parsing of TC pedit actions to HW format") Cc: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tonghao Zhang [Mon, 28 Jan 2019 23:28:05 +0000 (15:28 -0800)]
net/mlx5e: Update hw flows when encap source mac changed
When we offload tc filters to hardware, hardware flows can
be updated when mac of encap destination ip is changed.
But we ignore one case, that the mac of local encap ip can
be changed too, so we should also update them.
To fix it, add route_dev in mlx5e_encap_entry struct to save
the local encap netdevice, and when mac changed, kernel will
flush all the neighbour on the netdevice and send NETEVENT_NEIGH_UPDATE
event. The mlx5 driver will delete the flows and add them when neighbour
available again.
Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow") Cc: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Wed, 6 Feb 2019 22:43:47 +0000 (14:43 -0800)]
qed*: Advance drivers version to 8.37.0.20
Version update for qed/qede modules.
Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rahul Verma [Wed, 6 Feb 2019 22:43:46 +0000 (14:43 -0800)]
qed: Change verbosity for coalescing message.
Fix unnecessary logging of message in an expected
default case where coalescing value read (via ethtool -c)
migh not be valid unless they are configured explicitly
in the hardware using ethtool -C.
Signed-off-by: Rahul Verma <rverma@marvell.com> Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Under heavy traffic load, when changing number of channels via
ethtool (ethtool -L) which will cause interface to be reloaded,
it was observed that some packets gets transmitted on old TX
channel/queue id which doesn't really exist after the channel
configuration leads to system crash.
Add a safeguard in the driver by validating queue id through
ndo_select_queue() which is called before the ndo_start_xmit().
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Consider TX tcs while deriving the max num_queues for PF.
Max supported queues is derived incorrectly in the case of multi-CoS.
Need to consider TCs while calculating num_queues for PF.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Assign UFP TC value to vlan priority in UFP mode.
In the case of Unified Fabric Port (UFP) mode, switch provides
the traffic class (TC) value to be used for the traffic.
Configure hardware to use this TC value for vlan priority.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Wed, 6 Feb 2019 22:43:42 +0000 (14:43 -0800)]
qed: Fix EQ full firmware assert.
When slowpath messages are sent with high rate, the resulting
events can lead to a FW assert in case they are not handled fast
enough (Event Queue Full assert). Attempt to send queued slowpath
messages only after the newly evacuated entries in the EQ ring
are indicated to FW.
Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Airlie [Thu, 7 Feb 2019 00:36:43 +0000 (10:36 +1000)]
Merge branch 'vmwgfx-fixes-5.0-2' of git://people.freedesktop.org/~thomash/linux into drm-fixes
A patch set from Christoph for vmwgfx dma mode detection breakage with the
new dma code restructuring in 5.0
A couple of fixes also CC'd stable
Finally an improved IOMMU detection that automatically enables dma mapping
also with other vIOMMUS than the intel one if present and enabled.
Currently trying to start a VM in that case would fail catastrophically.
Mike Snitzer [Tue, 5 Feb 2019 22:07:58 +0000 (17:07 -0500)]
dm: don't use bio_trim() afterall
bio_trim() has an early return, which makes it _not_ idempotent, if the
offset is 0 and the bio's bi_size already matches the requested size.
Prior to DM, all users of bio_trim() were fine with this. But DM has
exposed the fact that bio_trim()'s early return is incompatible with a
cloned bio whose integrity payload must be trimmed via
bio_integrity_trim().
Fix this by reverting DM back to doing the equivalent of bio_trim() but
in an idempotent manner (so bio_integrity_trim is always performed).
Follow-on work is needed to assess what benefit bio_trim()'s early
return is providing to its existing callers.
Reported-by: Milan Broz <gmazyland@gmail.com> Fixes: 57c36519e4b94 ("dm: fix clone_bio() to trigger blk_recount_segments()") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Mikulas Patocka [Tue, 5 Feb 2019 10:09:00 +0000 (05:09 -0500)]
dm: add memory barrier before waitqueue_active
Block core changes to switch bio-based IO accounting to be percpu had a
side-effect of altering DM core to now rely on calling waitqueue_active
(in both bio-based and request-based) to check if another task is in
dm_wait_for_completion().
A memory barrier is needed before calling waitqueue_active(). DM core
doesn't piggyback on a preceding memory barrier so it must explicitly
use its own.
For more details on why using waitqueue_active() without a preceding
barrier is unsafe, please see the comment before the waitqueue_active()
definition in include/linux/wait.h.
Add the missing memory barrier by switching to using wq_has_sleeper().
Fixes: 6f75723190d8 ("dm: remove the pending IO accounting") Fixes: c4576aed8d85 ("dm: fix request-based dm's use of dm_wait_for_completion") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Dan Carpenter [Wed, 6 Feb 2019 15:35:15 +0000 (18:35 +0300)]
net: dsa: Fix NULL checking in dsa_slave_set_eee()
This function can't succeed if dp->pl is NULL. It will Oops inside the
call to return phylink_ethtool_get_eee(dp->pl, e);
Fixes: 1be52e97ed3e ("dsa: slave: eee: Allow ports to use phylink") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chuck Lever [Fri, 25 Jan 2019 21:54:54 +0000 (16:54 -0500)]
svcrdma: Remove max_sge check at connect time
Two and a half years ago, the client was changed to use gathered
Send for larger inline messages, in commit 655fec6987b ("xprtrdma:
Use gathered Send for large inline messages"). Several fixes were
required because there are a few in-kernel device drivers whose
max_sge is 3, and these were broken by the change.
Apparently my memory is going, because some time later, I submitted
commit 25fd86eca11c ("svcrdma: Don't overrun the SGE array in
svc_rdma_send_ctxt"), and after that, commit f3c1fd0ee294 ("svcrdma:
Reduce max_send_sges"). These too incorrectly assumed in-kernel
device drivers would have more than a few Send SGEs available.
The fix for the server side is not the same. This is because the
fundamental problem on the server is that, whether or not the client
has provisioned a chunk for the RPC reply, the server must squeeze
even the most complex RPC replies into a single RDMA Send. Failing
in the send path because of Send SGE exhaustion should never be an
option.
Therefore, instead of failing when the send path runs out of SGEs,
switch to using a bounce buffer mechanism to handle RPC replies that
are too complex for the device to send directly. That allows us to
remove the max_sge check to enable drivers with small max_sge to
work again.
Reported-by: Don Dutile <ddutile@redhat.com> Fixes: 25fd86eca11c ("svcrdma: Don't overrun the SGE array in ...") Cc: stable@vger.kernel.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Trond Myklebust [Mon, 21 Jan 2019 20:58:38 +0000 (15:58 -0500)]
nfsd: Fix error return values for nfsd4_clone_file_range()
If the parameter 'count' is non-zero, nfsd4_clone_file_range() will
currently clobber all errors returned by vfs_clone_file_range() and
replace them with EINVAL.
Fixes: 42ec3d4c0218 ("vfs: make remap_file_range functions take and...") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Russell King [Wed, 6 Feb 2019 10:54:54 +0000 (10:54 +0000)]
MAINTAINERS: add maintainer for SFF/SFP/SFP+ support
Add maintainer entry for SFF/SFP/SFP+ support.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 4 Feb 2019 16:36:06 +0000 (08:36 -0800)]
rxrpc: bad unlock balance in rxrpc_recvmsg
When either "goto wait_interrupted;" or "goto wait_error;"
paths are taken, socket lock has already been released.
This patch fixes following syzbot splat :
WARNING: bad unlock balance detected!
5.0.0-rc4+ #59 Not tainted
-------------------------------------
syz-executor223/8256 is trying to release lock (sk_lock-AF_RXRPC) at:
[<ffffffff86651353>] rxrpc_recvmsg+0x6d3/0x3099 net/rxrpc/recvmsg.c:598
but there are no more locks to release!
other info that might help us debug this:
1 lock held by syz-executor223/8256:
#0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline]
#0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: release_sock+0x20/0x1c0 net/core/sock.c:2798
Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Howells <dhowells@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tomi Valkeinen [Fri, 11 Jan 2019 03:50:35 +0000 (05:50 +0200)]
drm/omap: dsi: Hack-fix DSI bus flags
Since commit b4935e3a3cfa ("drm/omap: Store bus flags in the
omap_dss_device structure") video mode flags are managed by the omapdss
(and later omapdrm) core based on bus flags stored in omap_dss_device.
This works fine for all devices whose video modes are set by the omapdss
and omapdrm core, but breaks DSI operation as the DSI still uses legacy
code paths and sets the DISPC timings manually.
To fix the problem properly we should move the DSI encoder to the new
encoder model. This will however require a considerable amount of work.
Restore DSI operation by adding back video mode flags handling in the
DSI encoder driver as a hack in the meantime.
Tomi Valkeinen [Fri, 11 Jan 2019 03:50:34 +0000 (05:50 +0200)]
drm/omap: dsi: Fix OF platform depopulate
Commit edb715dffdee ("drm/omap: dss: dsi: Move initialization code from
bind to probe") moved the of_platform_populate() call from dsi_bind() to
dsi_probe(), but failed to move the corresponding
of_platform_depopulate() from dsi_unbind() to dsi_remove(). This results
in OF child devices being potentially removed multiple times. Fix it by
placing the of_platform_depopulate() call where it belongs.