Qiwu Chen [Fri, 8 May 2020 01:36:20 +0000 (18:36 -0700)]
mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
Since commit a9e7c39fa9fd9 ("mm/vmscan.c: remove 7th argument of
isolate_lru_pages()"), the explanation of 'mode' argument has been
unnecessary. Let's remove it.
Signed-off-by: Qiwu Chen <chenqiwu@xiaomi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200501090346.2894-1-chenqiwu@xiaomi.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roman Penyaev [Fri, 8 May 2020 01:36:16 +0000 (18:36 -0700)]
epoll: atomically remove wait entry on wake up
This patch does two things:
- fixes a lost wakeup introduced by commit 339ddb53d373 ("fs/epoll:
remove unnecessary wakeups of nested epoll")
- improves performance for events delivery.
The description of the problem is the following: if N (>1) threads are
waiting on ep->wq for new events and M (>1) events come, it is quite
likely that >1 wakeups hit the same wait queue entry, because there is
quite a big window between __add_wait_queue_exclusive() and the
following __remove_wait_queue() calls in ep_poll() function.
This can lead to lost wakeups, because thread, which was woken up, can
handle not all the events in ->rdllist. (in better words the problem is
described here: https://lkml.org/lkml/2019/10/7/905)
The idea of the current patch is to use init_wait() instead of
init_waitqueue_entry().
Internally init_wait() sets autoremove_wake_function as a callback,
which removes the wait entry atomically (under the wq locks) from the
list, thus the next coming wakeup hits the next wait entry in the wait
queue, thus preventing lost wakeups.
Problem is very well reproduced by the epoll60 test case [1].
Wait entry removal on wakeup has also performance benefits, because
there is no need to take a ep->lock and remove wait entry from the queue
after the successful wakeup. Here is the timing output of the epoll60
test case:
With explicit wakeup from ep_scan_ready_list() (the state of the
code prior 339ddb53d373):
real 0m6.970s
user 0m49.786s
sys 0m0.113s
After this patch:
real 0m5.220s
user 0m36.879s
sys 0m0.019s
The other testcase is the stress-epoll [2], where one thread consumes
all the events and other threads produce many events:
With explicit wakeup from ep_scan_ready_list() (the state of the
code prior 339ddb53d373):
(number of "events/ms" represents event bandwidth, thus higher is
better; number of "run-time ms" represents overall time spent
doing the benchmark, thus lower is better)
Roman Penyaev [Fri, 8 May 2020 01:36:13 +0000 (18:36 -0700)]
kselftests: introduce new epoll60 testcase for catching lost wakeups
This test case catches lost wake up introduced by commit 339ddb53d373
("fs/epoll: remove unnecessary wakeups of nested epoll")
The test is simple: we have 10 threads and 10 event fds. Each thread
can harvest only 1 event. 1 producer fires all 10 events at once and
waits that all 10 events will be observed by 10 threads.
In case of lost wakeup epoll_wait() will timeout and 0 will be returned.
Test case catches two sort of problems: forgotten wakeup on event, which
hits the ->ovflist list, this problem was fixed by:
5a2513239750 ("eventpoll: fix missing wakeup for ovflist in ep_poll_callback")
the other problem is when several sequential events hit the same waiting
thread, thus other waiters get no wakeups. Problem is fixed in the
following patch.
Signed-off-by: Roman Penyaev <rpenyaev@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Khazhismel Kumykov <khazhy@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Heiher <r@hev.cc> Cc: Jason Baron <jbaron@akamai.com> Link: http://lkml.kernel.org/r/20200430130326.1368509-1-rpenyaev@suse.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Filipe Manana [Fri, 8 May 2020 01:36:10 +0000 (18:36 -0700)]
percpu: make pcpu_alloc() aware of current gfp context
Since 5.7-rc1, on btrfs we have a percpu counter initialization for
which we always pass a GFP_KERNEL gfp_t argument (this happens since
commit 2992df73268f78 ("btrfs: Implement DREW lock")).
That is safe in some contextes but not on others where allowing fs
reclaim could lead to a deadlock because we are either holding some
btrfs lock needed for a transaction commit or holding a btrfs
transaction handle open. Because of that we surround the call to the
function that initializes the percpu counter with a NOFS context using
memalloc_nofs_save() (this is done at btrfs_init_fs_root()).
However it turns out that this is not enough to prevent a possible
deadlock because percpu_alloc() determines if it is in an atomic context
by looking exclusively at the gfp flags passed to it (GFP_KERNEL in this
case) and it is not aware that a NOFS context is set.
Because percpu_alloc() thinks it is in a non atomic context it locks the
pcpu_alloc_mutex. This can result in a btrfs deadlock when
pcpu_balance_workfn() is running, has acquired that mutex and is waiting
for reclaim, while the btrfs task that called percpu_counter_init() (and
therefore percpu_alloc()) is holding either the btrfs commit_root
semaphore or a transaction handle (done fs/btrfs/backref.c:
iterate_extent_inodes()), which prevents reclaim from finishing as an
attempt to commit the current btrfs transaction will deadlock.
Lockdep reports this issue with the following trace:
======================================================
WARNING: possible circular locking dependency detected
5.6.0-rc7-btrfs-next-77 #1 Not tainted
------------------------------------------------------
kswapd0/91 is trying to acquire lock: ffff8938a3b3fdc8 (&delayed_node->mutex){+.+.}, at: __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs]
but task is already holding lock: ffffffffb4f0dbc0 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
This could be fixed by making btrfs pass GFP_NOFS instead of GFP_KERNEL
to percpu_counter_init() in contextes where it is not reclaim safe,
however that type of approach is discouraged since
memalloc_[nofs|noio]_save() were introduced. Therefore this change
makes pcpu_alloc() look up into an existing nofs/noio context before
deciding whether it is in an atomic context or not.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Link: http://lkml.kernel.org/r/20200430164356.15543-1-fdmanana@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Waiman Long [Fri, 8 May 2020 01:36:06 +0000 (18:36 -0700)]
mm/slub: fix incorrect interpretation of s->offset
In a couple of places in the slub memory allocator, the code uses
"s->offset" as a check to see if the free pointer is put right after the
object. That check is no longer true with commit 3202fa62fb43 ("slub:
relocate freelist pointer to middle of object").
As a result, echoing "1" into the validate sysfs file, e.g. of dentry,
may cause a bunch of "Freepointer corrupt" error reports like the
following to appear with the system in panic afterwards.
=============================================================================
BUG dentry(666:pmcd.service) (Tainted: G B): Freepointer corrupt
-----------------------------------------------------------------------------
To fix it, use the check "s->offset == s->inuse" in the new helper
function freeptr_outside_object() instead. Also add another helper
function get_info_end() to return the end of info block (inuse + free
pointer if not overlapping with object).
Fixes: 3202fa62fb43 ("slub: relocate freelist pointer to middle of object") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Rafael Aquini <aquini@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Vitaly Nikolenko <vnik@duasynt.com> Cc: Silvio Cesare <silvio.cesare@gmail.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Markus Elfring <Markus.Elfring@web.de> Cc: Changbin Du <changbin.du@gmail.com> Link: http://lkml.kernel.org/r/20200429135328.26976-1-longman@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current implementations of the rb_first() and rb_last() gdb
functions have a variable that references itself in its instanciation,
which causes the function to throw an error if a specific condition on
the argument is met. The original author rather intended to reference
the argument and made a typo. Referring the argument instead makes the
function work as intended.
Signed-off-by: Aymeric Agon-Rambosson <aymeric.agon@yandex.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Douglas Anderson <dianders@chromium.org> Cc: Nikolay Borisov <n.borisov.lkml@gmail.com> Cc: Jackie Liu <liuyun01@kylinos.cn> Cc: Jason Wessel <jason.wessel@windriver.com> Link: http://lkml.kernel.org/r/20200427051029.354840-1-aymeric.agon@yandex.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eventpoll: fix missing wakeup for ovflist in ep_poll_callback
In the event that we add to ovflist, before commit 339ddb53d373
("fs/epoll: remove unnecessary wakeups of nested epoll") we would be
woken up by ep_scan_ready_list, and did no wakeup in ep_poll_callback.
With that wakeup removed, if we add to ovflist here, we may never wake
up. Rather than adding back the ep_scan_ready_list wakeup - which was
resulting in unnecessary wakeups, trigger a wake-up in ep_poll_callback.
We noticed that one of our workloads was missing wakeups starting with 339ddb53d373 and upon manual inspection, this wakeup seemed missing to me.
With this patch added, we no longer see missing wakeups. I haven't yet
tried to make a small reproducer, but the existing kselftests in
filesystem/epoll passed for me with this patch.
[khazhy@google.com: use if/elif instead of goto + cleanup suggested by Roman] Link: http://lkml.kernel.org/r/20200424190039.192373-1-khazhy@google.com Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Roman Penyaev <rpenyaev@suse.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Roman Penyaev <rpenyaev@suse.de> Cc: Heiher <r@hev.cc> Cc: Jason Baron <jbaron@akamai.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200424025057.118641-1-khazhy@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
When trying to lock read-only pages, sev_pin_memory() fails because
FOLL_WRITE is used as the flag for get_user_pages_fast().
Commit 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a
write 'bool'") updated the get_user_pages_fast() call sites to use
flags, but incorrectly updated the call in sev_pin_memory(). As the
original coding of this call was correct, revert the change made by that
commit.
Fixes: 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'") Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Mike Marshall <hubcap@omnibond.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the trapping instruction contains a ':', for a memory access through
segment registers for example, the sed substitution will insert the '*'
marker in the middle of the instruction instead of the line address:
I started to think I had forgotten some quirk of the assembly syntax
before noticing that it was actually coming from the script. Fix it to
add the address marker at the right place for these instructions:
mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
Without CONFIG_PREEMPT, it can happen that we get soft lockups detected,
e.g., while booting up.
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
RIP: __pageblock_pfn_to_page+0x134/0x1c0
Call Trace:
set_zone_contiguous+0x56/0x70
page_alloc_init_late+0x166/0x176
kernel_init_freeable+0xfa/0x255
kernel_init+0xa/0x106
ret_from_fork+0x35/0x40
The issue becomes visible when having a lot of memory (e.g., 4TB)
assigned to a single NUMA node - a system that can easily be created
using QEMU. Inside VMs on a hypervisor with quite some memory
overcommit, this is fairly easy to trigger.
Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: Baoquan He <bhe@redhat.com> Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Alexander Duyck <alexander.duyck@gmail.com> Cc: Baoquan He <bhe@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200416073417.5003-1-david@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yafang Shao [Fri, 8 May 2020 01:35:43 +0000 (18:35 -0700)]
mm, memcg: fix error return value of mem_cgroup_css_alloc()
When I run my memcg testcase which creates lots of memcgs, I found
there're unexpected out of memory logs while there're still enough
available free memory. The error log is
The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs,
an -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right
errno should be -ENOSPC "No space left on device", which is an
appropriate errno for userspace's failed mkdir.
As the errno really misled me, we should make it right. After this
patch, the error log will be
mkdir: cannot create directory 'foo.65533': No space left on device
[akpm@linux-foundation.org: s/EBUSY/ENOSPC/, per Michal]
[akpm@linux-foundation.org: s/EBUSY/ENOSPC/, per Michal] Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Suggested-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Link: http://lkml.kernel.org/r/20200407063621.GA18914@dhcp22.suse.cz Link: http://lkml.kernel.org/r/1586192163-20099-1-git-send-email-laoar.shao@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 8 May 2020 01:35:39 +0000 (18:35 -0700)]
ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
Commit cc731525f26a ("signal: Remove kernel interal si_code magic")
changed the value of SI_FROMUSER(SI_MESGQ), this means that mq_notify() no
longer works if the sender doesn't have rights to send a signal.
Change __do_notify() to use do_send_sig_info() instead of kill_pid_info()
to avoid check_kill_permission().
This needs the additional notify.sigev_signo != 0 check, shouldn't we
change do_mq_notify() to deny sigev_signo == 0 ?
1) Fix reference count leaks in various parts of batman-adv, from Xiyu
Yang.
2) Update NAT checksum even when it is zero, from Guillaume Nault.
3) sk_psock reference count leak in tls code, also from Xiyu Yang.
4) Sanity check TCA_FQ_CODEL_DROP_BATCH_SIZE netlink attribute in
fq_codel, from Eric Dumazet.
5) Fix panic in choke_reset(), also from Eric Dumazet.
6) Fix VLAN accel handling in bnxt_fix_features(), from Michael Chan.
7) Disallow out of range quantum values in sch_sfq, from Eric Dumazet.
8) Fix crash in x25_disconnect(), from Yue Haibing.
9) Don't pass pointer to local variable back to the caller in
nf_osf_hdr_ctx_init(), from Arnd Bergmann.
10) Wireguard should use the ECN decap helper functions, from Toke
Høiland-Jørgensen.
11) Fix command entry leak in mlx5 driver, from Moshe Shemesh.
12) Fix uninitialized variable access in mptcp's
subflow_syn_recv_sock(), from Paolo Abeni.
13) Fix unnecessary out-of-order ingress frame ordering in macsec, from
Scott Dial.
14) IPv6 needs to use a global serial number for dst validation just
like ipv4, from David Ahern.
15) Fix up PTP_1588_CLOCK deps, from Clay McClure.
16) Missing NLM_F_MULTI flag in gtp driver netlink messages, from
Yoshiyuki Kurauchi.
17) Fix a regression in that dsa user port errors should not be fatal,
from Florian Fainelli.
18) Fix iomap leak in enetc driver, from Dejin Zheng.
19) Fix use after free in lec_arp_clear_vccs(), from Cong Wang.
20) Initialize protocol value earlier in neigh code paths when
generating events, from Roman Mashak.
21) netdev_update_features() must be called with RTNL mutex in macsec
driver, from Antoine Tenart.
22) Validate untrusted GSO packets even more strictly, from Willem de
Bruijn.
23) Wireguard decrypt worker needs a cond_resched(), from Jason
Donenfeld.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE
MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order
wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing
wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning
wireguard: send/receive: cond_resched() when processing worker ringbuffers
wireguard: socket: remove errant restriction on looping to self
wireguard: selftests: use normal kernel stack size on ppc64
net: ethernet: ti: am65-cpsw-nuss: fix irqs type
ionic: Use debugfs_create_bool() to export bool
net: dsa: Do not leave DSA master with NULL netdev_ops
net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred
net: stricter validation of untrusted gso packets
seg6: fix SRH processing to comply with RFC8754
net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms
net: dsa: ocelot: the MAC table on Felix is twice as large
net: dsa: sja1105: the PTP_CLK extts input reacts on both edges
selftests: net: tcp_mmap: fix SO_RCVLOWAT setting
net: hsr: fix incorrect type usage for protocol variable
net: macsec: fix rtnl locking issue
net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()
...
net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE
This patch adds FLOW_ACTION_HW_STATS_DONT_CARE which tells the driver
that the frontend does not need counters, this hw stats type request
never fails. The FLOW_ACTION_HW_STATS_DISABLED type explicitly requests
the driver to disable the stats, however, if the driver cannot disable
counters, it bails out.
TCA_ACT_HW_STATS_* maintains the 1:1 mapping with FLOW_ACTION_HW_STATS_*
except by disabled which is mapped to FLOW_ACTION_HW_STATS_DISABLED
(this is 0 in tc). Add tc_act_hw_stats() to perform the mapping between
TCA_ACT_HW_STATS_* and FLOW_ACTION_HW_STATS_*.
Fixes: 319a1d19471e ("flow_offload: check for basic action hw stats type") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Lukas Bulwahn [Wed, 6 May 2020 20:29:06 +0000 (22:29 +0200)]
MAINTAINERS: put DYNAMIC INTERRUPT MODERATION in proper order
Commit 9b038086f06b ("docs: networking: convert DIM to RST") added a new
file entry to DYNAMIC INTERRUPT MODERATION to the end, and not following
alphabetical order.
So, ./scripts/checkpatch.pl -f MAINTAINERS complains:
WARNING: Misordered MAINTAINERS entry - list file patterns in alphabetic
order
#5966: FILE: MAINTAINERS:5966:
+F: lib/dim/
+F: Documentation/networking/net_dim.rst
Reorder the file entries to keep MAINTAINERS nicely ordered.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 7 May 2020 03:03:48 +0000 (20:03 -0700)]
Merge branch 'wireguard-fixes'
Jason A. Donenfeld says:
====================
wireguard fixes for 5.7-rc5
With Ubuntu and Debian having backported this into their kernels, we're
finally seeing testing from places we hadn't seen prior, which is nice.
With that comes more fixes:
1) The CI for PPC64 was running with extremely small stacks for 64-bit,
causing spurious crashes in surprising places.
2) There's was an old leftover routing loop restriction, which no longer
makes sense given the queueing architecture, and was causing problems
for people who really did want nested routing.
3) Not yielding our kthread on CONFIG_PREEMPT_VOLUNTARY systems caused
RCU stalls and other issues, reported by Wang Jian, with the fix
suggested by Sultan Alsawaf.
4) Clang spewed warnings in a selftest for CONFIG_IPV6=n, reported by
Arnd Bergmann.
5) A complicated if statement was simplified to an assignment while also
making the likely/unlikely hinting more correct and simple, and
increasing readability, suggested by Sultan.
Patches (2) and (3) have Fixes: lines and are probably good candidates
for stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing
It's very unlikely that send will become true. It's nearly always false
between 0 and 120 seconds of a session, and in most cases becomes true
only between 120 and 121 seconds before becoming false again. So,
unlikely(send) is clearly the right option here.
What happened before was that we had this complex boolean expression
with multiple likely and unlikely clauses nested. Since this is
evaluated left-to-right anyway, the whole thing got converted to
unlikely. So, we can clean this up to better represent what's going on.
The generated code is the same.
Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning
Without setting these to NULL, clang complains in certain
configurations that have CONFIG_IPV6=n:
In file included from drivers/net/wireguard/ratelimiter.c:223:
drivers/net/wireguard/selftest/ratelimiter.c:173:34: error: variable 'skb6' is uninitialized when used here [-Werror,-Wuninitialized]
ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count);
^~~~
drivers/net/wireguard/selftest/ratelimiter.c:123:29: note: initialize the variable 'skb6' to silence this warning
struct sk_buff *skb4, *skb6;
^
= NULL
drivers/net/wireguard/selftest/ratelimiter.c:173:40: error: variable 'hdr6' is uninitialized when used here [-Werror,-Wuninitialized]
ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count);
^~~~
drivers/net/wireguard/selftest/ratelimiter.c:125:22: note: initialize the variable 'hdr6' to silence this warning
struct ipv6hdr *hdr6;
^
We silence this warning by setting the variables to NULL as the warning
suggests.
Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
wireguard: send/receive: cond_resched() when processing worker ringbuffers
Users with pathological hardware reported CPU stalls on CONFIG_
PREEMPT_VOLUNTARY=y, because the ringbuffers would stay full, meaning
these workers would never terminate. That turned out not to be okay on
systems without forced preemption, which Sultan observed. This commit
adds a cond_resched() to the bottom of each loop iteration, so that
these workers don't hog the core. Note that we don't need this on the
napi poll worker, since that terminates after its budget is expended.
Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com> Reported-by: Wang Jian <larkwang@gmail.com> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
wireguard: socket: remove errant restriction on looping to self
It's already possible to create two different interfaces and loop
packets between them. This has always been possible with tunnels in the
kernel, and isn't specific to wireguard. Therefore, the networking stack
already needs to deal with that. At the very least, the packet winds up
exceeding the MTU and is discarded at that point. So, since this is
already something that happens, there's no need to forbid the not very
exceptional case of routing a packet back to the same interface; this
loop is no different than others, and we shouldn't special case it, but
rather rely on generic handling of loops in general. This also makes it
easier to do interesting things with wireguard such as onion routing.
At the same time, we add a selftest for this, ensuring that both onion
routing works and infinite routing loops do not crash the kernel. We
also add a test case for wireguard interfaces nesting packets and
sending traffic between each other, as well as the loop in this case
too. We make sure to send some throughput-heavy traffic for this use
case, to stress out any possible recursion issues with the locks around
workqueues.
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
wireguard: selftests: use normal kernel stack size on ppc64
While at some point it might have made sense to be running these tests
on ppc64 with 4k stacks, the kernel hasn't actually used 4k stacks on
64-bit powerpc in a long time, and more interesting things that we test
don't really work when we deviate from the default (16k). So, we stop
pushing our luck in this commit, and return to the default instead of
the minimum.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The K3 INTA driver, which is source TX/RX IRQs for CPSW NUSS, defines IRQs
triggering type as EDGE by default, but triggering type for CPSW NUSS TX/RX
IRQs has to be LEVEL as the EDGE triggering type may cause unnecessary IRQs
triggering and NAPI scheduling for empty queues. It was discovered with
RT-kernel.
Fix it by explicitly specifying CPSW NUSS TX/RX IRQ type as
IRQF_TRIGGER_HIGH.
Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently bool ionic_cq.done_color is exported using
debugfs_create_u8(), which requires a cast, preventing further compiler
checks.
Fix this by switching to debugfs_create_bool(), and dropping the cast.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: Do not leave DSA master with NULL netdev_ops
When ndo_get_phys_port_name() for the CPU port was added we introduced
an early check for when the DSA master network device in
dsa_master_ndo_setup() already implements ndo_get_phys_port_name(). When
we perform the teardown operation in dsa_master_ndo_teardown() we would
not be checking that cpu_dp->orig_ndo_ops was successfully allocated and
non-NULL initialized.
With network device drivers such as virtio_net, this leads to a NPD as
soon as the DSA switch hanging off of it gets torn down because we are
now assigning the virtio_net device's netdev_ops a NULL pointer.
Fixes: da7b9e9b00d4 ("net: dsa: Add ndo_get_phys_port_name() for CPU port") Reported-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 4 May 2020 19:58:56 +0000 (22:58 +0300)]
net: dsa: remove duplicate assignment in dsa_slave_add_cls_matchall_mirred
This was caused by a poor merge conflict resolution on my side. The
"act = &cls->rule->action.entries[0];" assignment was already present in
the code prior to the patch mentioned below.
Fixes: e13c2075280e ("net: dsa: refactor matchall mirred action to separate function") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Syzkaller again found a path to a kernel crash through bad gso input:
a packet with transport header extending beyond skb_headlen(skb).
Tighten validation at kernel entry:
- Verify that the transport header lies within the linear section.
To avoid pulling linux/tcp.h, verify just sizeof tcphdr.
tcp_gso_segment will call pskb_may_pull (th->doff * 4) before use.
- Match the gso_type against the ip_proto found by the flow dissector.
Fixes: bfd5f4a3d605 ("packet: Add GSO/csum offload support.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The Segment Routing Header (SRH) which defines the SRv6 dataplane is defined
in RFC8754.
RFC8754 (section 4.1) defines the SR source node behavior which encapsulates
packets into an outer IPv6 header and SRH. The SR source node encodes the
full list of Segments that defines the packet path in the SRH. Then, the
first segment from list of Segments is copied into the Destination address
of the outer IPv6 header and the packet is sent to the first hop in its path
towards the destination.
If the Segment list has only one segment, the SR source node can omit the SRH
as he only segment is added in the destination address.
RFC8754 (section 4.1.1) defines the Reduced SRH, when a source does not
require the entire SID list to be preserved in the SRH. A reduced SRH does
not contain the first segment of the related SR Policy (the first segment is
the one already in the DA of the IPv6 header), and the Last Entry field is
set to n-2, where n is the number of elements in the SR Policy.
RFC8754 (section 4.3.1.1) defines the SRH processing and the logic to
validate the SRH (S09, S10, S11) which works for both reduced and
non-reduced behaviors.
This patch updates seg6_validate_srh() to validate the SRH as per RFC8754.
Signed-off-by: Ahmed Abdelsalam <ahabdels@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
FDB fixes for Felix and Ocelot switches
This series fixes the following problems:
- Dynamically learnt addresses never expiring (neither for Ocelot nor
for Felix)
- Half of the FDB not visible in 'bridge fdb show' (for Felix only)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sun, 3 May 2020 22:20:26 +0000 (01:20 +0300)]
net: dsa: ocelot: the MAC table on Felix is twice as large
When running 'bridge fdb dump' on Felix, sometimes learnt and static MAC
addresses would appear, sometimes they wouldn't.
Turns out, the MAC table has 4096 entries on VSC7514 (Ocelot) and 8192
entries on VSC9959 (Felix), so the existing code from the Ocelot common
library only dumped half of Felix's MAC table. They are both organized
as a 4-way set-associative TCAM, so we just need a single variable
indicating the correct number of rows.
Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 6 May 2020 23:40:14 +0000 (16:40 -0700)]
Merge tag 'tag-chrome-platform-fixes-for-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform fix from Benson Leung:
"Fix a resource allocation issue in cros_ec_sensorhub.c"
* tag 'tag-chrome-platform-fixes-for-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_sensorhub: Allocate sensorhub resource before claiming sensors
Vladimir Oltean [Wed, 6 May 2020 17:48:13 +0000 (20:48 +0300)]
net: dsa: sja1105: the PTP_CLK extts input reacts on both edges
It looks like the sja1105 external timestamping input is not as generic
as we thought. When fed a signal with 50% duty cycle, it will timestamp
both the rising and the falling edge. When fed a short pulse signal,
only the timestamp of the falling edge will be seen in the PTPSYNCTS
register, because that of the rising edge had been overwritten. So the
moral is: don't feed it short pulse inputs.
Luckily this is not a complete deal breaker, as we can still work with
1 Hz square waves. But the problem is that the extts polling period was
not dimensioned enough for this input signal. If we leave the period at
half a second, we risk losing timestamps due to jitter in the measuring
process. So we need to increase it to 4 times per second.
Also, the very least we can do to inform the user is to deny any other
flags combination than with PTP_RISING_EDGE and PTP_FALLING_EDGE both
set.
Fixes: 747e5eb31d59 ("net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hsr: fix incorrect type usage for protocol variable
Fix following sparse checker warning:-
net/hsr/hsr_slave.c:38:18: warning: incorrect type in assignment (different base types)
net/hsr/hsr_slave.c:38:18: expected unsigned short [unsigned] [usertype] protocol
net/hsr/hsr_slave.c:38:18: got restricted __be16 [usertype] h_proto
net/hsr/hsr_slave.c:39:25: warning: restricted __be16 degrades to integer
net/hsr/hsr_slave.c:39:57: warning: restricted __be16 degrades to integer
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Wed, 6 May 2020 13:58:30 +0000 (15:58 +0200)]
net: macsec: fix rtnl locking issue
netdev_update_features() must be called with the rtnl lock taken. Not
doing so triggers a warning, as ASSERT_RTNL() is used in
__netdev_update_features(), the first function called by
netdev_update_features(). Fix this.
Fixes: c850240b6c41 ("net: macsec: report real_dev features when HW offloading is enabled") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 6 May 2020 10:16:56 +0000 (13:16 +0300)]
net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del()
The "info->fs.location" is a u32 that comes from the user via the
ethtool_set_rxnfc() function. We need to check for invalid values to
prevent a buffer overflow.
I copy and pasted this check from the mvpp2_ethtool_cls_rule_ins()
function.
Fixes: 90b509b39ac9 ("net: mvpp2: cls: Add Classification offload support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 6 May 2020 10:16:22 +0000 (13:16 +0300)]
net: mvpp2: prevent buffer overflow in mvpp22_rss_ctx()
The "rss_context" variable comes from the user via ethtool_get_rxfh().
It can be any u32 value except zero. Eventually it gets passed to
mvpp22_rss_ctx() and if it is over MVPP22_N_RSS_TABLES (8) then it
results in an array overflow.
Fixes: 895586d5dc32 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We added fields in tcp_zerocopy_receive structure,
so make sure to clear all fields to not pass garbage to the kernel.
We were lucky because recent additions added 'out' parameters,
still we need to clean our reference implementation, before folks
copy/paste it.
Fixes: c8856c051454 ("tcp-zerocopy: Return inq along with tcp receive zerocopy.") Fixes: 33946518d493 ("tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 5 May 2020 23:29:03 +0000 (16:29 -0700)]
Merge tag 'platform-drivers-x86-v5.7-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Andy Shevchenko:
- Avoid loading asus-nb-wmi module on selected laptop models
- Fix S0ix debug support for Jasper Lake PMC
- Few fixes which have been reported by Hulk bot and others
* tag 'platform-drivers-x86-v5.7-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: thinkpad_acpi: Remove always false 'value < 0' statement
platform/x86: intel_pmc_core: avoid unused-function warnings
platform/x86: asus-nb-wmi: Do not load on Asus T100TA and T200TA
platform/x86: intel_pmc_core: Change Jasper Lake S0ix debug reg map back to ICL
platform/x86/intel-uncore-freq: make uncore_root_kobj static
platform/x86: wmi: Make two functions static
platform/x86: surface3_power: Fix a NULL vs IS_ERR() check in probe
Roman Mashak [Sat, 2 May 2020 01:34:18 +0000 (21:34 -0400)]
neigh: send protocol value in neighbor create notification
When a new neighbor entry has been added, event is generated but it does not
include protocol, because its value is assigned after the event notification
routine has run, so move protocol assignment code earlier.
Fixes: df9b0e30d44c ("neighbor: Add protocol attribute") Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Roman Mashak <mrv@mojatatu.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dejin Zheng [Tue, 5 May 2020 02:03:29 +0000 (10:03 +0800)]
net: broadcom: fix a mistake about ioremap resource
Commit d7a5502b0bb8b ("net: broadcom: convert to
devm_platform_ioremap_resource_byname()") will broke this driver.
idm_base and nicpm_base were optional, after this change, they are
mandatory. it will probe fails with -22 when the dtb doesn't have them
defined. so revert part of this commit and make idm_base and nicpm_base
as optional.
Fixes: d7a5502b0bb8bde ("net: broadcom: convert to devm_platform_ioremap_resource_byname()") Reported-by: Jonathan Richardson <jonathan.richardson@broadcom.com> Cc: Scott Branden <scott.branden@broadcom.com> Cc: Ray Jui <ray.jui@broadcom.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since 'value' is declared as unsigned long, the following statement is
always false.
value < 0
So let's remove it.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
When both CONFIG_DEBUG_FS and CONFIG_PM_SLEEP are disabled, the
functions that got moved out of the #ifdef section now cause
a warning:
drivers/platform/x86/intel_pmc_core.c:654:13: error: 'pmc_core_lpm_display' defined but not used [-Werror=unused-function]
654 | static void pmc_core_lpm_display(struct pmc_dev *pmcdev, struct device *dev,
| ^~~~~~~~~~~~~~~~~~~~
drivers/platform/x86/intel_pmc_core.c:617:13: error: 'pmc_core_slps0_display' defined but not used [-Werror=unused-function]
617 | static void pmc_core_slps0_display(struct pmc_dev *pmcdev, struct device *dev,
| ^~~~~~~~~~~~~~~~~~~~~~
Rather than add even more #ifdefs here, remove them entirely and
let the compiler work it out, it can actually get rid of all the
debugfs calls without problems as long as the struct member is
there.
The two PM functions just need a __maybe_unused annotations to avoid
another warning instead of the #ifdef.
Fixes: aae43c2bcdc1 ("platform/x86: intel_pmc_core: Relocate pmc_core_*_display() to outside of CONFIG_DEBUG_FS") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Hans de Goede [Wed, 22 Apr 2020 22:05:59 +0000 (00:05 +0200)]
platform/x86: asus-nb-wmi: Do not load on Asus T100TA and T200TA
asus-nb-wmi does not add any extra functionality on these Asus
Transformer books. They have detachable keyboards, so the hotkeys are
send through a HID device (and handled by the hid-asus driver) and also
the rfkill functionality is not used on these devices.
Besides not adding any extra functionality, initializing the WMI interface
on these devices actually has a negative side-effect. For some reason
the \_SB.ATKD.INIT() function which asus_wmi_platform_init() calls drives
GPO2 (INT33FC:02) pin 8, which is connected to the front facing webcam LED,
high and there is no (WMI or other) interface to drive this low again
causing the LED to be permanently on, even during suspend.
This commit adds a blacklist of DMI system_ids on which not to load the
asus-nb-wmi and adds these Transformer books to this list. This fixes
the webcam LED being permanently on under Linux.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
platform/x86: intel_pmc_core: Change Jasper Lake S0ix debug reg map back to ICL
Jasper Lake uses Icelake PCH IPs and the S0ix debug interfaces are same as
Icelake. It uses SLP_S0_DBG register latch/read interface from Icelake
generation. It doesn't use Tiger Lake LPM debug registers. Change the
Jasper Lake S0ix debug interface to use the ICL reg map.
Fixes: 16292bed9c56 ("platform/x86: intel_pmc_core: Add Atom based Jasper Lake (JSL) platform support") Signed-off-by: Archana Patni <archana.patni@intel.com> Acked-by: David E. Box <david.e.box@intel.com> Tested-by: Divagar Mohandass <divagar.mohandass@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Linus Torvalds [Tue, 5 May 2020 01:55:20 +0000 (18:55 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- Wacom driver functional and regression fixes from Jason Gerecke
- race condition fix in usbhid, found by syzbot and fixed by Alan Stern
- a few device-specific quirks and ID additions
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: quirks: Add HID_QUIRK_NO_INIT_REPORTS quirk for Dell K12A keyboard-dock
HID: mcp2221: add gpiolib dependency
HID: i2c-hid: reset Synaptics SYNA2393 on resume
HID: wacom: Report 2nd-gen Intuos Pro S center button status over BT
HID: usbhid: Fix race between usbhid_close() and usbhid_stop()
Revert "HID: wacom: generic: read the number of expected touches on a per collection basis"
HID: alps: ALPS_1657 is too specific; use U1_UNICORN_LEGACY instead
HID: alps: Add AUI1657 device ID
HID: logitech: Add support for Logitech G11 extra keys
HID: multitouch: add eGalaxTouch P80H84 support
HID: wacom: Read HID_DG_CONTACTMAX directly for non-generic devices
Qiushi Wu [Sat, 2 May 2020 22:42:59 +0000 (17:42 -0500)]
nfp: abm: fix a memory leak bug
In function nfp_abm_vnic_set_mac, pointer nsp is allocated by nfp_nsp_open.
But when nfp_nsp_has_hwinfo_lookup fail, the pointer is not released,
which can lead to a memory leak bug. Fix this issue by adding
nfp_nsp_close(nsp) in the error path.
Fixes: f6e71efdf9fb1 ("nfp: abm: look up MAC addresses via management FW") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Fri, 1 May 2020 18:11:08 +0000 (11:11 -0700)]
atm: fix a UAF in lec_arp_clear_vccs()
Gengming reported a UAF in lec_arp_clear_vccs(),
where we add a vcc socket to an entry in a per-device
list but free the socket without removing it from the
list when vcc->dev is NULL.
We need to call lec_vcc_close() to search and remove
those entries contain the vcc being destroyed. This can
be done by calling vcc->push(vcc, NULL) unconditionally
in vcc_destroy_socket().
Another issue discovered by Gengming's reproducer is
the vcc->dev may point to the static device lecatm_dev,
for which we don't need to register/unregister device,
so we can just check for vcc->dev->ops->owner.
Reported-by: Gengming Liu <l.dmxcsnsbh@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Fri, 1 May 2020 14:10:16 +0000 (15:10 +0100)]
net: stmmac: gmac5+: fix potential integer overflow on 32 bit multiply
The multiplication of cfg->ctr[1] by 1000000000 is performed using a
32 bit multiplication (since cfg->ctr[1] is a u32) and this can lead
to a potential overflow. Fix this by making the constant a ULL to
ensure a 64 bit multiply occurs.
Fixes: 504723af0d85 ("net: stmmac: Add basic EST support for GMAC5+")
Addresses-Coverity: ("Unintentional integer overflow") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Fri, 1 May 2020 03:53:49 +0000 (20:53 -0700)]
net_sched: fix tcm_parent in tc filter dump
When we tell kernel to dump filters from root (ffff:ffff),
those filters on ingress (ffff:0000) are matched, but their
true parents must be dumped as they are. However, kernel
dumps just whatever we tell it, that is either ffff:ffff
or ffff:0000:
$ nl-cls-list --dev=dummy0 --parent=root
cls basic dev dummy0 id none parent root prio 49152 protocol ip match-all
cls basic dev dummy0 id :1 parent root prio 49152 protocol ip match-all
$ nl-cls-list --dev=dummy0 --parent=ffff:
cls basic dev dummy0 id none parent ffff: prio 49152 protocol ip match-all
cls basic dev dummy0 id :1 parent ffff: prio 49152 protocol ip match-all
This is confusing and misleading, more importantly this is
a regression since 4.15, so the old behavior must be restored.
And, when tc filters are installed on a tc class, the parent
should be the classid, rather than the qdisc handle. Commit edf6711c9840 ("net: sched: remove classid and q fields from tcf_proto")
removed the classid we save for filters, we can just restore
this classid in tcf_block.
Steps to reproduce this:
ip li set dev dummy0 up
tc qd add dev dummy0 ingress
tc filter add dev dummy0 parent ffff: protocol arp basic action pass
tc filter show dev dummy0 root
Before this patch:
filter protocol arp pref 49152 basic
filter protocol arp pref 49152 basic handle 0x1
action order 1: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
After this patch:
filter parent ffff: protocol arp pref 49152 basic
filter parent ffff: protocol arp pref 49152 basic handle 0x1
action order 1: gact action pass
random type none pass val 0
index 1 ref 1 bind 1
Fixes: a10fa20101ae ("net: sched: propagate q and parent from caller down to tcf_fill_node") Fixes: edf6711c9840 ("net: sched: remove classid and q fields from tcf_proto") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
gcc-10 warns about functions that return a pointer to a stack
variable. In chcr_write_cpl_set_tcb_ulp(), this does not actually
happen, but it's too hard to see for the compiler:
drivers/crypto/chelsio/chcr_ktls.c: In function 'chcr_write_cpl_set_tcb_ulp.constprop':
drivers/crypto/chelsio/chcr_ktls.c:760:9: error: function may return address of local variable [-Werror=return-local-addr]
760 | return pos;
| ^~~
drivers/crypto/chelsio/chcr_ktls.c:712:5: note: declared here
712 | u8 buf[48] = {0};
| ^~~
Split the middle part of the function out into a helper to make
it easier to understand by both humans and compilers, which avoids
the warning.
Fixes: 5a4b9fe7fece ("cxgb4/chcr: complete record tx handling") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Mon, 4 May 2020 17:39:42 +0000 (19:39 +0200)]
s390/qeth: fix cancelling of TX timer on dev_close()
With the introduction of TX coalescing, .ndo_start_xmit now potentially
starts the TX completion timer. So only kill the timer _after_ TX has
been disabled.
Fixes: ee1e52d1e4bb ("s390/qeth: add TX IRQ coalescing support for IQD devices") Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tag 'gcc-plugins-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
gcc-common.h: Update for GCC 10
gcc-plugins/stackleak: Avoid assignment for unused macro argument
Linus Torvalds [Mon, 4 May 2020 18:10:24 +0000 (11:10 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"A couple of bug fixes"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: vsock: kick send_pkt worker once device is started
virtio-blk: handle block_device_operations callbacks after hot unplug
Dejin Zheng [Mon, 4 May 2020 12:01:27 +0000 (20:01 +0800)]
net: enetc: fix an issue about leak system resources
the related system resources were not released when enetc_hw_alloc()
return error in the enetc_pci_mdio_probe(), add iounmap() for error
handling label "err_hw_alloc" to fix it.
Fixes: 6517798dd3432a ("enetc: Make MDIO accessors more generic and export to include/linux/fsl") Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 4 May 2020 17:47:41 +0000 (10:47 -0700)]
Merge tag 'flexible-array-member-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull flex-array reverts from Gustavo Silva:
"This reverts flexible array changes in include/uapi/
These structures can get embedded in other structures in user-space
and cause all sorts of warnings and problems[1]. So, we better don't
take any chances and keep the zero-length arrays in place for now"
Tariq Toukan [Mon, 4 May 2020 08:36:02 +0000 (11:36 +0300)]
net/mlx4_core: Fix use of ENOSPC around mlx4_counter_alloc()
When ENOSPC is set the idx is still valid and gets set to the global
MLX4_SINK_COUNTER_INDEX. However gcc's static analysis cannot tell that
ENOSPC is impossible from mlx4_cmd_imm() and gives this warning:
drivers/net/ethernet/mellanox/mlx4/main.c:2552:28: warning: 'idx' may be
used uninitialized in this function [-Wmaybe-uninitialized]
2552 | priv->def_counter[port] = idx;
Also, when ENOSPC is returned mlx4_allocate_default_counters should not
fail.
Fixes: 6de5f7f6a1fa ("net/mlx4_core: Allocate default counter per port") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Aya Levin [Mon, 4 May 2020 08:27:46 +0000 (11:27 +0300)]
devlink: Fix reporter's recovery condition
Devlink health core conditions the reporter's recovery with the
expiration of the grace period. This is not relevant for the first
recovery. Explicitly demand that the grace period will only apply to
recoveries other than the first.
Fixes: c8e1da0bf923 ("devlink: Add health report functionality") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maxim Petrov [Mon, 4 May 2020 06:26:43 +0000 (09:26 +0300)]
stmmac: fix pointer check after utilization in stmmac_interrupt
The paranoidal pointer check in IRQ handler looks very strange - it
really protects us only against bogus drivers which request IRQ line
with null pointer dev_id. However, the code fragment is incorrect
because the dev pointer is used before the actual check which leads
to undefined behavior. Remove the check to avoid confusing people
with incorrect code.
Signed-off-by: Maxim Petrov <mmrmaximuzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tuong Lien [Mon, 4 May 2020 04:15:54 +0000 (11:15 +0700)]
tipc: fix partial topology connection closure
When an application connects to the TIPC topology server and subscribes
to some services, a new connection is created along with some objects -
'tipc_subscription' to store related data correspondingly...
However, there is one omission in the connection handling that when the
connection or application is orderly shutdown (e.g. via SIGQUIT, etc.),
the connection is not closed in kernel, the 'tipc_subscription' objects
are not freed too.
This results in:
- The maximum number of subscriptions (65535) will be reached soon, new
subscriptions will be rejected;
- TIPC module cannot be removed (unless the objects are somehow forced
to release first);
The commit fixes the issue by closing the connection if the 'recvmsg()'
returns '0' i.e. when the peer is shutdown gracefully. It also includes
the other unexpected cases.
Acked-by: Jon Maloy <jmaloy@redhat.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to 1d27732f411d ("net: dsa: setup and teardown ports"), we would
not treat failures to set-up an user port as fatal, but after this
commit we would, which is a regression for some systems where interfaces
may be declared in the Device Tree, but the underlying hardware may not
be present (pluggable daughter cards for instance).
Fixes: 1d27732f411d ("net: dsa: setup and teardown ports") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
These structures can get embedded in other structures in user-space
and cause all sorts of warnings and problems. So, we better don't take
any chances and keep the zero-length arrays in place for now.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Linus Torvalds [Mon, 4 May 2020 16:16:37 +0000 (09:16 -0700)]
gcc-10 warnings: fix low-hanging fruit
Due to a bug-report that was compiler-dependent, I updated one of my
machines to gcc-10. That shows a lot of new warnings. Happily they
seem to be mostly the valid kind, but it's going to cause a round of
churn for getting rid of them..
This is the really low-hanging fruit of removing a couple of zero-sized
arrays in some core code. We have had a round of these patches before,
and we'll have many more coming, and there is nothing special about
these except that they were particularly trivial, and triggered more
warnings than most.
Hans de Goede [Sat, 2 May 2020 18:18:42 +0000 (20:18 +0200)]
HID: quirks: Add HID_QUIRK_NO_INIT_REPORTS quirk for Dell K12A keyboard-dock
Add a HID_QUIRK_NO_INIT_REPORTS quirk for the Dell K12A keyboard-dock,
which can be used with various Dell Venue 11 models.
Without this quirk the keyboard/touchpad combo works fine when connected
at boot, but when hotplugged 9 out of 10 times it will not work properly.
Adding the quirk fixes this.
Cc: Mario Limonciello <mario.limonciello@dell.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Dejin Zheng [Sun, 3 May 2020 12:32:26 +0000 (20:32 +0800)]
net: macb: fix an issue about leak related system resources
A call of the function macb_init() can fail in the function
fu540_c000_init. The related system resources were not released
then. use devm_platform_ioremap_resource() to replace ioremap()
to fix it.
Fixes: c218ad559020ff9 ("macb: Add support for SiFive FU540-C000") Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Yash Shah <yash.shah@sifive.com> Suggested-by: Nicolas Ferre <nicolas.ferre@microchip.com> Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 3 May 2020 03:09:25 +0000 (20:09 -0700)]
net_sched: sch_skbprio: add message validation to skbprio_change()
Do not assume the attribute has the right size.
Fixes: aea5f654e6b7 ("net/sched: add skbprio scheduler") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 3 May 2020 18:30:08 +0000 (11:30 -0700)]
Merge tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull more btrfs fixes from David Sterba:
"A few more stability fixes, minor build warning fixes and git url
fixup:
- fix partial loss of prealloc extent past i_size after fsync
- fix potential deadlock due to wrong transaction handle passing via
journal_info
- fix gcc 4.8 struct intialization warning
- update git URL in MAINTAINERS entry"
* tag 'for-5.7-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
MAINTAINERS: btrfs: fix git repo URL
btrfs: fix gcc-4.8 build warning for struct initializer
btrfs: transaction: Avoid deadlock due to bad initialization timing of fs_info::journal_info
btrfs: fix partial loss of prealloc extent past i_size after fsync
Linus Torvalds [Sun, 3 May 2020 18:04:57 +0000 (11:04 -0700)]
Merge tag 'iommu-fixes-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
- Fix a memory leak when dev_iommu gets freed and a sub-pointer does
not
- Build dependency fixes for Mediatek, spapr_tce, and Intel IOMMU
driver
- Export iommu_group_get_for_dev() only for GPLed modules
- Fix AMD IOMMU interrupt remapping when x2apic is enabled
- Fix error path in the QCOM IOMMU driver probe function
* tag 'iommu-fixes-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/qcom: Fix local_base status check
iommu: Properly export iommu_group_get_for_dev()
iommu/vt-d: Use right Kconfig option name
iommu/amd: Fix legacy interrupt remapping for x2APIC-enabled system
iommu: spapr_tce: Disable compile testing to fix build on book3s_32 config
iommu/mediatek: Fix MTK_IOMMU dependencies
iommu: Fix the memory leak in dev_iommu_free()
Linus Torvalds [Sat, 2 May 2020 20:45:30 +0000 (13:45 -0700)]
Merge tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
- prevent the intel_pstate driver from printing excessive diagnostic
messages in some cases (Chris Wilson)
- make the hibernation restore kernel freeze kernel threads as well as
user space tasks (Dexuan Cui)
- fix the ACPI device PM disagnostic messages to include the correct
power state name (Kai-Heng Feng).
* tag 'pm-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: ACPI: Output correct message on target power state
PM: hibernate: Freeze kernel threads in software_resume()
cpufreq: intel_pstate: Only mention the BIOS disabling turbo mode once
Linus Torvalds [Sat, 2 May 2020 18:31:12 +0000 (11:31 -0700)]
Merge tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fix from Darrick Wong:
"Hoist the check for an unrepresentable FIBMAP return value into
ioctl_fibmap.
The internal kernel function can handle 64-bit values (and is needed
to fix a regression on ext4 + jbd2). It is only the userspace ioctl
that is so old that it cannot deal"
* tag 'iomap-5.7-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fibmap: Warn and return an error in case of block > INT_MAX
Linus Torvalds [Sat, 2 May 2020 18:24:01 +0000 (11:24 -0700)]
Merge tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- fix handling of backchannel binding in BIND_CONN_TO_SESSION
Bugfixes:
- Fix a credential use-after-free issue in pnfs_roc()
- Fix potential posix_acl refcnt leak in nfs3_set_acl
- defer slow parts of rpc_free_client() to a workqueue
- Fix an Oopsable race in __nfs_list_for_each_server()
- Fix trace point use-after-free race
- Regression: the RDMA client no longer responds to server disconnect
requests
- Fix return values of xdr_stream_encode_item_{present, absent}
- _pnfs_return_layout() must always wait for layoutreturn completion
Cleanups:
- Remove unreachable error conditions"
* tag 'nfs-for-5.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix a race in __nfs_list_for_each_server()
NFSv4.1: fix handling of backchannel binding in BIND_CONN_TO_SESSION
SUNRPC: defer slow parts of rpc_free_client() to a workqueue.
NFSv4: Remove unreachable error condition due to rpc_run_task()
SUNRPC: Remove unreachable error condition
xprtrdma: Fix use of xdr_stream_encode_item_{present, absent}
xprtrdma: Fix trace point use-after-free race
xprtrdma: Restore wake-up-all to rpcrdma_cm_event_handler()
nfs: Fix potential posix_acl refcnt leak in nfs3_set_acl
NFS/pnfs: Fix a credential use-after-free issue in pnfs_roc()
NFS/pnfs: Ensure that _pnfs_return_layout() waits for layoutreturn completion
Linus Torvalds [Sat, 2 May 2020 18:16:14 +0000 (11:16 -0700)]
Merge tag 'dmaengine-fix-5.7-rc4' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"Core:
- Documentation typo fixes
- fix the channel indexes
- dmatest: fixes for process hang and iterations
Drivers:
- hisilicon: build error fix without PCI_MSI
- ti-k3: deadlock fix
- uniphier-xdmac: fix for reg region
- pch: fix data race
- tegra: fix clock state"
* tag 'dmaengine-fix-5.7-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: dmatest: Fix process hang when reading 'wait' parameter
dmaengine: dmatest: Fix iteration non-stop logic
dmaengine: tegra-apb: Ensure that clock is enabled during of DMA synchronization
dmaengine: fix channel index enumeration
dmaengine: mmp_tdma: Reset channel error on release
dmaengine: mmp_tdma: Do not ignore slave config validation errors
dmaengine: pch_dma.c: Avoid data race between probe and irq handler
dt-bindings: dma: uniphier-xdmac: switch to single reg region
include/linux/dmaengine: Typos fixes in API documentation
dmaengine: xilinx_dma: Add missing check for empty list
dmaengine: ti: k3-psil: fix deadlock on error path
dmaengine: hisilicon: Fix build error without PCI_MSI
Jia He [Fri, 1 May 2020 04:38:40 +0000 (12:38 +0800)]
vhost: vsock: kick send_pkt worker once device is started
Ning Bo reported an abnormal 2-second gap when booting Kata container [1].
The unconditional timeout was caused by VSOCK_DEFAULT_CONNECT_TIMEOUT of
connecting from the client side. The vhost vsock client tries to connect
an initializing virtio vsock server.
The abnormal flow looks like:
host-userspace vhost vsock guest vsock
============== =========== ============
connect() --------> vhost_transport_send_pkt_work() initializing
| vq->private_data==NULL
| will not be queued
V
schedule_timeout(2s)
vhost_vsock_start() <--------- device ready
set vq->private_data
wait for 2s and failed
connect() again vq->private_data!=NULL recv connecting pkt
Details:
1. Host userspace sends a connect pkt, at that time, guest vsock is under
initializing, hence the vhost_vsock_start has not been called. So
vq->private_data==NULL, and the pkt is not been queued to send to guest
2. Then it sleeps for 2s
3. After guest vsock finishes initializing, vq->private_data is set
4. When host userspace wakes up after 2s, send connecting pkt again,
everything is fine.
As suggested by Stefano Garzarella, this fixes it by additional kicking the
send_pkt worker in vhost_vsock_start once the virtio device is started. This
makes the pending pkt sent again.
After this patch, kata-runtime (with vsock enabled) boot time is reduced
from 3s to 1s on a ThunderX2 arm64 server.
Reported-by: Ning Bo <n.b@live.com> Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jia He <justin.he@arm.com> Link: https://lore.kernel.org/r/20200501043840.186557-1-justin.he@arm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Stefan Hajnoczi [Thu, 30 Apr 2020 14:04:42 +0000 (15:04 +0100)]
virtio-blk: handle block_device_operations callbacks after hot unplug
A userspace process holding a file descriptor to a virtio_blk device can
still invoke block_device_operations after hot unplug. This leads to a
use-after-free accessing vblk->vdev in virtblk_getgeo() when
ioctl(HDIO_GETGEO) is invoked:
A related problem is that virtblk_remove() leaks the vd_index_ida index
when something still holds a reference to vblk->disk during hot unplug.
This causes virtio-blk device names to be lost (vda, vdb, etc).
Fix these issues by protecting vblk->vdev with a mutex and reference
counting vblk so the vd_index_ida index can be removed in all cases.
Fixes: 48e4043d4529 ("virtio: add virtio disk geometry feature") Reported-by: Lance Digby <ldigby@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200430140442.171016-1-stefanha@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Linus Torvalds [Sat, 2 May 2020 00:19:15 +0000 (17:19 -0700)]
Merge tag 'vfio-v5.7-rc4' of git://github.com/awilliam/linux-vfio
Pull VFIO fixes from Alex Williamson:
- copy_*_user validity check for new vfio_dma_rw interface (Yan Zhao)
- Fix a potential math overflow (Yan Zhao)
- Use follow_pfn() for calculating PFNMAPs (Sean Christopherson)
* tag 'vfio-v5.7-rc4' of git://github.com/awilliam/linux-vfio:
vfio/type1: Fix VA->PA translation for PFNMAP VMAs in vaddr_get_pfn()
vfio: avoid possible overflow in vfio_iommu_type1_pin_pages
vfio: checking of validity of user vaddr in vfio_dma_rw
Linus Torvalds [Sat, 2 May 2020 00:03:06 +0000 (17:03 -0700)]
Merge tag 'io_uring-5.7-2020-05-01' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- Fix for statx not grabbing the file table, making AT_EMPTY_PATH fail
- Cover a few cases where async poll can handle retry, eliminating the
need for an async thread
- fallback request busy/free fix (Bijan)
- syzbot reported SQPOLL thread exit fix for non-preempt (Xiaoguang)
- Fix extra put of req for sync_file_range (Pavel)
- Always punt splice async. We'll improve this for 5.8, but wanted to
eliminate the inode mutex lock from the non-blocking path for 5.7
(Pavel)
* tag 'io_uring-5.7-2020-05-01' of git://git.kernel.dk/linux-block:
io_uring: punt splice async because of inode mutex
io_uring: check non-sync defer_list carefully
io_uring: fix extra put in sync_file_range()
io_uring: use cond_resched() in io_ring_ctx_wait_and_kill()
io_uring: use proper references for fallback_req locking
io_uring: only force async punt if poll based retry can't handle it
io_uring: enable poll retry for any file with ->read_iter / ->write_iter
io_uring: statx must grab the file table for valid fd
drop_monitor: work around gcc-10 stringop-overflow warning
The current gcc-10 snapshot produces a false-positive warning:
net/core/drop_monitor.c: In function 'trace_drop_common.constprop':
cc1: error: writing 8 bytes into a region of size 0 [-Werror=stringop-overflow=]
In file included from net/core/drop_monitor.c:23:
include/uapi/linux/net_dropmon.h:36:8: note: at offset 0 to object 'entries' with size 4 declared here
36 | __u32 entries;
| ^~~~~~~
I reported this in the gcc bugzilla, but in case it does not get
fixed in the release, work around it by using a temporary variable.
Fixes: 9a8afc8d3962 ("Network Drop Monitor: Adding drop monitor implementation & Netlink protocol") Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94881 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In drivers/net/gtp.c, gtp_genl_dump_pdp() should set NLM_F_MULTI
flag since it returns multipart message.
This patch adds a new arg "flags" in gtp_genl_fill_info() so that
flags can be set by the callers.
Signed-off-by: Yoshiyuki Kurauchi <ahochauwaaaaa@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 29 Apr 2020 20:59:50 +0000 (13:59 -0700)]
ice: cleanup language in ice.rst for fw.app
The documentation for the ice driver around "fw.app" has a spelling
mistake in variation. Additionally, the language of "shall have a unique
name" sounds like a requirement. Reword this to read more like
a description or property.
Reported-by: Benjamin Fisher <benjamin.l.fisher@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Jakub Kicinski <kubakici@wp.pl> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: Make PTP-specific drivers depend on PTP_1588_CLOCK
Commit d1cbfd771ce8 ("ptp_clock: Allow for it to be optional") changed
all PTP-capable Ethernet drivers from `select PTP_1588_CLOCK` to `imply
PTP_1588_CLOCK`, "in order to break the hard dependency between the PTP
clock subsystem and ethernet drivers capable of being clock providers."
As a result it is possible to build PTP-capable Ethernet drivers without
the PTP subsystem by deselecting PTP_1588_CLOCK. Drivers are required to
handle the missing dependency gracefully.
Some PTP-capable Ethernet drivers (e.g., TI_CPSW) factor their PTP code
out into separate drivers (e.g., TI_CPTS_MOD). The above commit also
changed these PTP-specific drivers to `imply PTP_1588_CLOCK`, making it
possible to build them without the PTP subsystem. But as Grygorii
Strashko noted in [1]:
On Wed, Apr 22, 2020 at 02:16:11PM +0300, Grygorii Strashko wrote:
> Another question is that CPTS completely nonfunctional in this case and
> it was never expected that somebody will even try to use/run such
> configuration (except for random build purposes).
In my view, enabling a PTP-specific driver without the PTP subsystem is
a configuration error made possible by the above commit. Kconfig should
not allow users to create a configuration with missing dependencies that
results in "completely nonfunctional" drivers.
I audited all network drivers that call ptp_clock_register() but merely
`imply PTP_1588_CLOCK` and found five PTP-specific drivers that are
likely nonfunctional without PTP_1588_CLOCK:
Note how these symbols all reference PTP or timestamping in their name;
this is a clue that they depend on PTP_1588_CLOCK.
Change them from `imply PTP_1588_CLOCK` [2] to `depends on PTP_1588_CLOCK`.
I'm not using `select PTP_1588_CLOCK` here because PTP_1588_CLOCK has
its own dependencies, which `select` would not transitively apply.
Additionally, remove the `select NET_PTP_CLASSIFY` from CPTS_TI_MOD;
PTP_1588_CLOCK already selects that.
[2]: NET_DSA_SJA1105_PTP had never declared any type of dependency on
PTP_1588_CLOCK (`imply` or otherwise); adding a `depends on PTP_1588_CLOCK`
here seems appropriate.
Cc: Arnd Bergmann <arnd@arndb.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Nicolas Pitre <nico@fluxnic.net> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: d1cbfd771ce8 ("ptp_clock: Allow for it to be optional") Signed-off-by: Clay McClure <clay@daemons.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 29 Apr 2020 02:01:58 +0000 (19:01 -0700)]
devlink: fix return value after hitting end in region read
Commit d5b90e99e1d5 ("devlink: report 0 after hitting end in region read")
fixed region dump, but region read still returns a spurious error:
$ devlink region read netdevsim/netdevsim1/dummy snapshot 0 addr 0 len 128 0000000000000000 a6 f4 c4 1c 21 35 95 a6 9d 34 c3 5b 87 5b 35 79 0000000000000010 f3 a0 d7 ee 4f 2f 82 7f c6 dd c4 f6 a5 c3 1b ae 0000000000000020 a4 fd c8 62 07 59 48 03 70 3b c7 09 86 88 7f 68 0000000000000030 6f 45 5d 6d 7d 0e 16 38 a9 d0 7a 4b 1e 1e 2e a6 0000000000000040 e6 1d ae 06 d6 18 00 85 ca 62 e8 7e 11 7e f6 0f 0000000000000050 79 7e f7 0f f3 94 68 bd e6 40 22 85 b6 be 6f b1 0000000000000060 af db ef 5e 34 f0 98 4b 62 9a e3 1b 8b 93 fc 17
devlink answers: Invalid argument 0000000000000070 61 e8 11 11 66 10 a5 f7 b1 ea 8d 40 60 53 ed 12
This is a minimal fix, I'll follow up with a restructuring
so we don't have two checks for the same condition.
Fixes: fdd41ec21e15 ("devlink: Return right error code in case of errors for region read") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
netvsc_start_xmit is used as a callback function for the ndo_start_xmit
function pointer. ndo_start_xmit's return type is netdev_tx_t but
netvsc_start_xmit's return type is int.
This causes a failure with Control Flow Integrity (CFI), which requires
function pointer prototypes and callback function definitions to match
exactly. When CFI is in enforcing, the kernel panics. When booting a
CFI kernel with WSL 2, the VM is immediately terminated because of this.
====================
WoL fixes for DP83822 and DP83tc811
The WoL feature for each device was enabled during boot or when the PHY was
brought up which may be undesired. These patches disable the WoL in the
config_init. The disabling and enabling of the WoL is now done though the
set_wol call.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Murphy [Tue, 28 Apr 2020 16:03:54 +0000 (11:03 -0500)]
net: phy: DP83TC811: Fix WoL in config init to be disabled
The WoL feature should be disabled when config_init is called and the
feature should turned on or off when set_wol is called.
In addition updated the calls to modify the registers to use the set_bit
and clear_bit function calls.
Fixes: 6d749428788b ("net: phy: DP83TC811: Introduce support for the
DP83TC811 phy") Signed-off-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Murphy [Tue, 28 Apr 2020 16:03:53 +0000 (11:03 -0500)]
net: phy: DP83822: Fix WoL in config init to be disabled
The WoL feature should be disabled when config_init is called and the
feature should turned on or off when set_wol is called.
In addition updated the calls to modify the registers to use the set_bit
and clear_bit function calls.
Fixes: 3b427751a9d0 ("net: phy: DP83822 initial driver submission") Signed-off-by: Dan Murphy <dmurphy@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 1 May 2020 14:53:08 +0000 (08:53 -0600)]
ipv6: Use global sernum for dst validation with nexthop objects
Nik reported a bug with pcpu dst cache when nexthop objects are
used illustrated by the following:
$ ip netns add foo
$ ip -netns foo li set lo up
$ ip -netns foo addr add 2001:db8:11::1/128 dev lo
$ ip netns exec foo sysctl net.ipv6.conf.all.forwarding=1
$ ip li add veth1 type veth peer name veth2
$ ip li set veth1 up
$ ip addr add 2001:db8:10::1/64 dev veth1
$ ip li set dev veth2 netns foo
$ ip -netns foo li set veth2 up
$ ip -netns foo addr add 2001:db8:10::2/64 dev veth2
$ ip -6 nexthop add id 100 via 2001:db8:10::2 dev veth1
$ ip -6 route add 2001:db8:11::1/128 nhid 100
Create a pcpu entry on cpu 0:
$ taskset -a -c 0 ip -6 route get 2001:db8:11::1
Re-add the route entry:
$ ip -6 ro del 2001:db8:11::1
$ ip -6 route add 2001:db8:11::1/128 nhid 100
Route get on cpu 0 returns the stale pcpu:
$ taskset -a -c 0 ip -6 route get 2001:db8:11::1
RTNETLINK answers: Network is unreachable
While cpu 1 works:
$ taskset -a -c 1 ip -6 route get 2001:db8:11::1
2001:db8:11::1 from :: via 2001:db8:10::2 dev veth1 src 2001:db8:10::1 metric 1024 pref medium
Conversion of FIB entries to work with external nexthop objects
missed an important difference between IPv4 and IPv6 - how dst
entries are invalidated when the FIB changes. IPv4 has a per-network
namespace generation id (rt_genid) that is bumped on changes to the FIB.
Checking if a dst_entry is still valid means comparing rt_genid in the
rtable to the current value of rt_genid for the namespace.
IPv6 also has a per network namespace counter, fib6_sernum, but the
count is saved per fib6_node. With the per-node counter only dst_entries
based on fib entries under the node are invalidated when changes are
made to the routes - limiting the scope of invalidations. IPv6 uses a
reference in the rt6_info, 'from', to track the corresponding fib entry
used to create the dst_entry. When validating a dst_entry, the 'from'
is used to backtrack to the fib6_node and check the sernum of it to the
cookie passed to the dst_check operation.
With the inline format (nexthop definition inline with the fib6_info),
dst_entries cached in the fib6_nh have a 1:1 correlation between fib
entries, nexthop data and dst_entries. With external nexthops, IPv6
looks more like IPv4 which means multiple fib entries across disparate
fib6_nodes can all reference the same fib6_nh. That means validation
of dst_entries based on external nexthops needs to use the IPv4 format
- the per-network namespace counter.
Add sernum to rt6_info and set it when creating a pcpu dst entry. Update
rt6_get_cookie to return sernum if it is set and update dst_check for
IPv6 to look for sernum set and based the check on it if so. Finally,
rt6_get_pcpu_route needs to validate the cached entry before returning
a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and
__mkroute_output for IPv4).
This problem only affects routes using the new, external nexthops.
Thanks to the kbuild test robot for catching the IS_ENABLED needed
around rt_genid_ipv6 before I sent this out.
Fixes: 5b98324ebe29 ("ipv6: Allow routes to use nexthop objects") Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@kernel.org> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 1 May 2020 18:13:36 +0000 (11:13 -0700)]
Merge tag 'block-5.7-2020-05-01' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A few fixes for this release:
- NVMe pull request from Christoph, with a single fix for a double
free in the namespace error handling.
- Kill the bd_openers check in blk_drop_partitions(), fixing a
regression in this merge window (Christoph)"
* tag 'block-5.7-2020-05-01' of git://git.kernel.dk/linux-block:
block: remove the bd_openers checks in blk_drop_partitions
nvme: prevent double free in nvme_alloc_ns() error handling
Linus Torvalds [Fri, 1 May 2020 18:10:09 +0000 (11:10 -0700)]
Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Three driver bugfixes, and two reverts because the original patches
revealed underlying problems which the Tegra guys are now working on"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: aspeed: Avoid i2c interrupt status clear race condition.
i2c: amd-mp2-pci: Fix Oops in amd_mp2_pci_init() error handling
Revert "i2c: tegra: Better handle case where CPU0 is busy for a long time"
Revert "i2c: tegra: Synchronize DMA before termination"
i2c: iproc: generate stop event for slave writes
Linus Torvalds [Fri, 1 May 2020 18:05:28 +0000 (11:05 -0700)]
Merge tag 'sound-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a collection of small fixes around this time:
- One more try for fixing PCM OSS regression
- HD-audio: a new quirk for Lenovo, the improved driver blacklisting,
a lock fix in the minor error path, and a fix for the possible race
at monitor notifiaction
- USB-audio: a quirk ID fix, a fix for POD HD500 workaround"
* tag 'sound-5.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Correct a typo of NuPrime DAC-10 USB ID
ALSA: opti9xx: shut up gcc-10 range warning
ALSA: hda/hdmi: fix without unlocked before return
ALSA: hda/hdmi: fix race in monitor detection during probe
ALSA: hda/realtek - Two front mics on a Lenovo ThinkCenter
ALSA: line6: Fix POD HD500 audio playback
ALSA: pcm: oss: Place the plugin buffer overflow checks correctly (for 5.7)
ALSA: pcm: oss: Place the plugin buffer overflow checks correctly
ALSA: hda: Match both PCI ID and SSID for driver blacklist
Linus Torvalds [Fri, 1 May 2020 18:01:51 +0000 (11:01 -0700)]
Merge tag 'drm-fixes-2020-05-01' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular scheduled fixes for graphics. Nothing to extreme bunch of
amdgpu fixes, i915 and qxl fixes, along with some misc ones.
All seems to be progressing normally.
core:
- EDID off by one DTD fix
- DP mst write return code fix
dma-buf:
- fix SET_NAME ioctl uapi
- doc fixes
amdgpu:
- Fix a green screen on resume issue
- PM fixes for SR-IOV SDMA fix for navi
- Renoir display fixes
- Cursor and pageflip stuttering fixes
- Misc additional display fixes
- (uapi) Add additional DCC tiling flags for navi1x
qxl:
- use after gree fix
- fix lost kunmap
- release leak fix
virtio:
- context destruction fix"
* tag 'drm-fixes-2020-05-01' of git://anongit.freedesktop.org/drm/drm: (26 commits)
dma-buf: fix documentation build warnings
drm/qxl: qxl_release use after free
drm/qxl: lost qxl_bo_kunmap_atomic_page in qxl_image_init_helper()
drm/i915: Use proper fault mask in interrupt postinstall too
drm/amd/display: Use cursor locking to prevent flip delays
drm/amd/display: Update downspread percent to match spreadsheet for DCN2.1
drm/amd/display: Defer cursor update around VUPDATE for all ASIC
drm/amd/display: fix rn soc bb update
drm/amd/display: check if REFCLK_CNTL register is present
drm/amdgpu: bump version for invalidate L2 before SDMA IBs
drm/amdgpu: invalidate L2 before SDMA IBs (v2)
drm/amdgpu: add tiling flags from Mesa
drm/amd/powerplay: avoid using pm_en before it is initialized revised
Revert "drm/amd/powerplay: avoid using pm_en before it is initialized"
drm/qxl: qxl_release leak in qxl_hw_surface_alloc()
drm/qxl: qxl_release leak in qxl_draw_dirty_fb()
drm/virtio: only destroy created contexts
drm/dp_mst: Fix drm_dp_send_dpcd_write() return code
drm/i915/gt: Check cacheline is valid before acquiring
drm/i915/gem: Hold obj->vma.lock over for_each_ggtt_vma()
...
Linus Torvalds [Fri, 1 May 2020 18:00:07 +0000 (11:00 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four minor fixes: three in drivers and one in the core.
The core one allows an additional state change that fixes a regression
introduced by an update to the aacraid driver in the previous merge
window"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target/iblock: fix WRITE SAME zeroing
scsi: qla2xxx: check UNLOADING before posting async work
scsi: qla2xxx: set UNLOADING before waiting for session deletion
scsi: core: Allow the state change from SDEV_QUIESCE to SDEV_BLOCK