Jeff Layton [Mon, 1 Mar 2021 13:01:54 +0000 (08:01 -0500)]
ceph: don't use d_add in ceph_handle_snapdir
It's possible ceph_get_snapdir could end up finding a (disconnected)
inode that already exists in the cache. Change the prototype for
ceph_handle_snapdir to return a dentry pointer and have it use
d_splice_alias so we don't end up with an aliased dentry in the cache.
Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Mon, 1 Mar 2021 12:38:01 +0000 (07:38 -0500)]
ceph: don't clobber i_snap_caps on non-I_NEW inode
We want the snapdir to mirror the non-snapped directory's attributes for
most things, but i_snap_caps represents the caps granted on the snapshot
directory by the MDS itself. A misbehaving MDS could issue different
caps for the snapdir and we lose them here.
Only reset i_snap_caps when the inode is I_NEW. Also, move the setting
of i_op and i_fop inside the if block since they should never change
anyway.
Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple
of warnings by explicitly adding a break and a goto statements instead
of just letting the code fall through to the next case.
URL: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Fri, 5 Jun 2020 14:43:21 +0000 (10:43 -0400)]
ceph: convert ceph_write_begin to netfs_write_begin
Convert ceph_write_begin to use the netfs_write_begin helper. Most of
the ops we need for it are already in place from the readpage conversion
but we do add a new check_write_begin op since ceph needs to be able to
vet whether there is an incompatible writeback already in flight before
reading in the page.
With this, we can also remove the old ceph_do_readpage helper.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Mon, 1 Jun 2020 14:10:21 +0000 (10:10 -0400)]
ceph: convert ceph_readpage to netfs_readpage
Have the ceph KConfig select NETFS_SUPPORT. Add a new netfs ops
structure and the operations for it. Convert ceph_readpage to use
the new netfs_readpage helper.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Thu, 21 Jan 2021 21:27:14 +0000 (16:27 -0500)]
ceph: rework PageFsCache handling
With the new fscache API, the PageFsCache bit now indicates that the
page is being written to the cache and shouldn't be modified or released
until it's finished.
Change releasepage and invalidatepage to wait on that bit before
returning.
Also define FSCACHE_USE_NEW_IO_API so that we opt into the new fscache
API.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Jeff Layton [Thu, 21 Jan 2021 17:32:05 +0000 (12:32 -0500)]
ceph: rip out old fscache readpage handling
With the new netfs read helper functions, we won't need a lot of this
infrastructure as it handles the pagecache pages itself. Rip out the
read handling for now, and much of the old infrastructure that deals in
individual pages.
The cookie handling is mostly unchanged, however.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
David Howells [Sun, 25 Apr 2021 21:02:38 +0000 (22:02 +0100)]
iov_iter: Four fixes for ITER_XARRAY
Fix four things[1] in the patch that adds ITER_XARRAY[2]:
(1) Remove the address_space struct predeclaration. This is a holdover
from when it was ITER_MAPPING.
(2) Fix _copy_mc_to_iter() so that the xarray segment updates count and
iov_offset in the iterator before returning.
(3) Fix iov_iter_alignment() to not loop in the xarray case. Because the
middle pages are all whole pages, only the end pages need be
considered - and this can be reduced to just looking at the start
position in the xarray and the iteration size.
(4) Fix iov_iter_advance() to limit the size of the advance to no more
than the remaining iteration size.
Merge tag 'perf_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf fixes from Borislav Petkov:
- Fix Broadwell Xeon's stepping in the PEBS isolation table of CPUs
- Fix a panic when initializing perf uncore machinery on Haswell and
Broadwell servers
* tag 'perf_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/kvm: Fix Broadwell Xeon stepping in isolation_ucodes[]
perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3
Merge tag 'locking_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Borislav Petkov:
"Fix ordering in the queued writer lock's slowpath"
* tag 'locking_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/qrwlock: Fix ordering in queued_write_lock_slowpath()
Merge tag 'x86_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Borislav Petkov:
"Fix an out-of-bounds memory access when setting up a crash kernel with
kexec"
* tag 'x86_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access
The games with 'rm' are on (two separate instances) of a local variable,
and make no difference.
Quoting Aditya Pakki:
"I was the author of the patch and it was the cause of the giant UMN
revert.
The patch is garbage and I was unaware of the steps involved in
retracting it. I *believed* the maintainers would pull it, given it
was already under Greg's list. The patch does not introduce any bugs
but is pointless and is stupid. I accept my incompetence and for not
requesting a revert earlier."
Link: https://lwn.net/Articles/854319/ Requested-by: Aditya Pakki <pakki001@umn.edu> Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'pinctrl-v5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Late pin control fixes, would have been in the main pull request
normally but hey I got lucky and we got another week to polish up
v5.12 so here we go.
One driver fix and one making the core debugfs work:
- Fix the number of pins in the community of the Intel Lewisburg SoC
- Show pin numbers for controllers with base = 0 in the new debugfs
feature"
* tag 'pinctrl-v5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: core: Show pin numbers for the controllers with base = 0
pinctrl: lewisburg: Update number of pins in community
Subsystems affected by this patch series: coda, overlayfs, and
mm (pagecache and memcg)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
tools/cgroup/slabinfo.py: updated to work on current kernel
mm/filemap: fix mapping_seek_hole_data on THP & 32-bit
mm/filemap: fix find_lock_entries hang on 32-bit THP
ovl: fix reference counting in ovl_mmap error path
coda: fix reference counting in coda_file_mmap error path
tools/cgroup/slabinfo.py: updated to work on current kernel
slabinfo.py script does not work with actual kernel version.
First, it was unable to recognise SLUB susbsytem, and when I specified
it manually it failed again with
AttributeError: 'struct page' has no member 'obj_cgroups'
.. and then again with
File "tools/cgroup/memcg_slabinfo.py", line 221, in main
memcg.kmem_caches.address_of_(),
AttributeError: 'struct mem_cgroup' has no member 'kmem_caches'
Link: https://lkml.kernel.org/r/cec1a75e-43b4-3d64-2084-d9f98fda037f@virtuozzo.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Tested-by: Roman Gushchin <guro@fb.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/filemap: fix mapping_seek_hole_data on THP & 32-bit
No problem on 64-bit, or without huge pages, but xfstests generic/285
and other SEEK_HOLE/SEEK_DATA tests have regressed on huge tmpfs, and on
32-bit architectures, with the new mapping_seek_hole_data(). Several
different bugs turned out to need fixing.
u64 cast to stop losing bits when converting unsigned long to loff_t
(and let's use shifts throughout, rather than mixed with * and /).
Use round_up() when advancing pos, to stop assuming that pos was already
THP-aligned when advancing it by THP-size. (This use of round_up()
assumes that any THP has THP-aligned index: true at present and true
going forward, but could be recoded to avoid the assumption.)
Use xas_set() when iterating away from a THP, so that xa_index stays in
synch with start, instead of drifting away to return bogus offset.
Check start against end to avoid wrapping 32-bit xa_index to 0 (and to
handle these additional cases, seek_data or not, it's easier to break
the loop than goto: so rearrange exit from the function).
[hughd@google.com: remove unneeded u64 casts, per Matthew] Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2104221347240.1170@eggly.anvils Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2104211737410.3299@eggly.anvils Fixes: 41139aa4c3a3 ("mm/filemap: add mapping_seek_hole_data") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <dchinner@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/filemap: fix find_lock_entries hang on 32-bit THP
No problem on 64-bit, or without huge pages, but xfstests generic/308
hung uninterruptibly on 32-bit huge tmpfs.
Since commit 0cc3b0ec23ce ("Clarify (and fix) in 4.13 MAX_LFS_FILESIZE
macros"), MAX_LFS_FILESIZE is only a PAGE_SIZE away from wrapping 32-bit
xa_index to 0, so the new find_lock_entries() has to be extra careful
when handling a THP.
Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2104211735430.3299@eggly.anvils Fixes: 5c211ba29deb ("mm: add and use find_lock_entries") Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wanpeng Li [Fri, 23 Apr 2021 08:23:20 +0000 (16:23 +0800)]
KVM: x86/xen: Take srcu lock when accessing kvm_memslots()
kvm_memslots() will be called by kvm_write_guest_offset_cached() so we should
take the srcu lock. Let's pull the srcu lock operation from kvm_steal_time_set_preempted()
again to fix xen part.
Fixes: 30b5c851af7 ("KVM: x86/xen: Add support for vCPU runstate information") Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1619166200-9215-1-git-send-email-wanpengli@tencent.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge tag 'arm-fixes-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"These should be the final fixes for v5.12.
There is one fix for SD card detection on one Allwinner board, and a
few fixes for the Tegra platform that I had already queued up for
v5.13 due to a communication problem. This addresses MMC device
ordering on multiple machines, audio support on Jetson AGX Xavier and
suspend/resume on Jetson TX2"
* tag 'arm-fixes-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: allwinner: Revert SD card CD GPIO for Pine64-LTS
arm64: tegra: Move clocks from RT5658 endpoint to device node
arm64: tegra: Fix mmc0 alias for Jetson Xavier NX
arm64: tegra: Set fw_devlink=on for Jetson TX2
arm64: tegra: Add unit-address for ACONNECT on Tegra186
Zhen Lei [Thu, 15 Apr 2021 09:27:44 +0000 (17:27 +0800)]
perf map: Fix error return code in maps__clone()
Although 'err' has been initialized to -ENOMEM, but it will be reassigned
by the "err = unwind__prepare_access(...)" statement in the for loop. So
that, the value of 'err' is unknown when map__clone() failed.
Fixes: 6c502584438bda63 ("perf unwind: Call unwind__prepare_access for forked thread") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: zhen lei <thunder.leizhen@huawei.com> Link: http://lore.kernel.org/lkml/20210415092744.3793-1-thunder.leizhen@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Thomas Richter [Wed, 21 Apr 2021 12:04:00 +0000 (14:04 +0200)]
perf ftrace: Fix access to pid in array when setting a pid filter
Command 'perf ftrace -v -- ls' fails in s390 (at least 5.12.0rc6).
The root cause is a missing pointer dereference which causes an
array element address to be used as PID.
Fix this by extracting the PID.
Output before:
# ./perf ftrace -v -- ls
function_graph tracer is used
write '-263732416' to tracing/set_ftrace_pid failed: Invalid argument
failed to set ftrace pid
#
Output after:
./perf ftrace -v -- ls
function_graph tracer is used
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
4) | rcu_read_lock_sched_held() {
4) 0.552 us | rcu_lockdep_current_cpu_online();
4) 6.124 us | }
Reported-by: Alexander Schmidt <alexschm@de.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/20210421120400.2126433-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In the function auxtrace_parse_snapshot_options(), the callback pointer
"itr->parse_snapshot_options" can be NULL if it has not been set during
the AUX record initialization. This can cause tool crashing if the
callback pointer "itr->parse_snapshot_options" is dereferenced without
performing NULL check.
Add a NULL check for the pointer "itr->parse_snapshot_options" before
invoke the callback.
Fixes: d20031bb63dd6dde ("perf tools: Add AUX area tracing Snapshot Mode") Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Link: http://lore.kernel.org/lkml/20210420151554.2031768-1-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
i915:
- GVT's BDW regression fix for cmd parser
- Fix modesetting in case of unexpected AUX timeouts"
* tag 'drm-fixes-2021-04-23' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: fix GCR_GENERAL_CNTL offset for dimgrey_cavefish
amd/display: allow non-linear multi-planar formats
drm/amd/display: Update modifier list for gfx10_3
drm/amdgpu: reserve fence slot to update page table
drm/i915: Fix modesetting in case of unexpected AUX timeouts
drm/i915/gvt: Fix BDW command parser regression
This contains a couple of device tree fixes for the v5.12 release cycle.
These are needed for proper audio support on Jetson AGX Xavier, to boot
the Jetson Xavier NX from an SD card and to be able to suspend/resume
the Jetson TX2.
* tegra/dt64:
arm64: tegra: Move clocks from RT5658 endpoint to device node
arm64: tegra: Fix mmc0 alias for Jetson Xavier NX
arm64: tegra: Set fw_devlink=on for Jetson TX2
arm64: tegra: Add unit-address for ACONNECT on Tegra186
David Howells [Mon, 22 Feb 2021 11:39:47 +0000 (11:39 +0000)]
fscache, cachefiles: Add alternate API to use kiocb for read/write to cache
Add an alternate API by which the cache can be accessed through a kiocb,
doing async DIO, rather than using the current API that tells the cache
where all the pages are.
The new API is intended to be used in conjunction with the netfs helper
library. A filesystem must pick one or the other and not mix them.
Filesystems wanting to use the new API must #define FSCACHE_USE_NEW_IO_API
before #including the header. This prevents them from continuing to use
the old API at the same time as there are incompatibilities in how the
PG_fscache page bit is used.
Changes:
v6:
- Provide a routine to shape a write so that the start and length can be
aligned for DIO[3].
v4:
- Use the vfs_iocb_iter_read/write() helpers[1]
- Move initial definition of fscache_begin_read_operation() here.
- Remove a commented-out line[2]
- Combine ki->term_func calls in cachefiles_read_complete()[2].
- Remove explicit NULL initialiser[2].
- Remove extern on func decl[2].
- Put in param names on func decl[2].
- Remove redundant else[2].
- Fill out the kdoc comment for fscache_begin_read_operation().
- Rename fs/fscache/page2.c to io.c to match later patches.
David Howells [Thu, 6 Feb 2020 14:22:24 +0000 (14:22 +0000)]
netfs: Define an interface to talk to a cache
Add an interface to the netfs helper library for reading data from the
cache instead of downloading it from the server and support for writing
data just downloaded or cleared to the cache.
The API passes an iov_iter to the cache read/write routines to indicate the
data/buffer to be used. This is done using the ITER_XARRAY type to provide
direct access to the netfs inode's pagecache.
When the netfs's ->begin_cache_operation() method is called, this must fill
in the cache_resources in the netfs_read_request struct, including the
netfs_cache_ops used by the helper lib to talk to the cache. The helper
lib does not directly access the cache.
Changes:
v6:
- Call trace_netfs_read() after beginning the cache op so that the cookie
debug ID can be logged[3].
- Don't record the error from writing to the cache. We don't want to pass
it back to the netfs[4].
- Fix copy-to-cache subreq amalgamation to not round up as it goes along
otherwise it overcalculates the length of the write[5].
v5:
- Use end_page_fscache() rather than unlock_page_fscache()[2].
v4:
- Added flag to netfs_subreq_terminated() to indicate that the caller may
have been running async and stuff that might sleep needs punting to a
workqueue (can't use in_softirq()[1]).
- Add missing inc of netfs_n_rh_read stat.
- Move initial definition of fscache_begin_read_operation() elsewhere.
- Need to call op->begin_cache_operation() from netfs_write_begin().
David Howells [Tue, 22 Sep 2020 10:06:07 +0000 (11:06 +0100)]
netfs: Add write_begin helper
Add a helper to do the pre-reading work for the netfs write_begin address
space op.
Changes
v6:
- Fixed a missing rreq put in netfs_write_begin()[3].
- Use DEFINE_READAHEAD()[4].
v5:
- Made the wait for PG_fscache in netfs_write_begin() killable[2].
v4:
- Added flag to netfs_subreq_terminated() to indicate that the caller may
have been running async and stuff that might sleep needs punting to a
workqueue (can't use in_softirq()[1]).
David Howells [Tue, 3 Nov 2020 11:32:41 +0000 (11:32 +0000)]
netfs: Gather stats
Gather statistics from the netfs interface that can be exported through a
seqfile. This is intended to be called by a later patch when viewing
/proc/fs/fscache/stats.
David Howells [Wed, 13 May 2020 16:41:20 +0000 (17:41 +0100)]
netfs: Provide readahead and readpage netfs helpers
Add a pair of helper functions:
(*) netfs_readahead()
(*) netfs_readpage()
to do the work of handling a readahead or a readpage, where the page(s)
that form part of the request may be split between the local cache, the
server or just require clearing, and may be single pages and transparent
huge pages. This is all handled within the helper.
Note that while both will read from the cache if there is data present,
only netfs_readahead() will expand the request beyond what it was asked to
do, and only netfs_readahead() will write back to the cache.
netfs_readpage(), on the other hand, is synchronous and only fetches the
page (which might be a THP) it is asked for.
The netfs gives the helper parameters from the VM, the cache cookie it
wants to use (or NULL) and a table of operations (only one of which is
mandatory):
(*) expand_readahead() [optional]
Called to allow the netfs to request an expansion of a readahead
request to meet its own alignment requirements. This is done by
changing rreq->start and rreq->len.
(*) clamp_length() [optional]
Called to allow the netfs to cut down a subrequest to meet its own
boundary requirements. If it does this, the helper will generate
additional subrequests until the full request is satisfied.
(*) is_still_valid() [optional]
Called to find out if the data just read from the cache has been
invalidated and must be reread from the server.
(*) issue_op() [required]
Called to ask the netfs to issue a read to the server. The subrequest
describes the read. The read request holds information about the file
being accessed.
The netfs can cache information in rreq->netfs_priv.
Upon completion, the netfs should set the error, transferred and can
also set FSCACHE_SREQ_CLEAR_TAIL and then call
fscache_subreq_terminated().
(*) done() [optional]
Called after the pages have been unlocked. The read request is still
pinning the file and mapping and may still be pinning pages with
PG_fscache. rreq->error indicates any error that has been
accumulated.
(*) cleanup() [optional]
Called when the helper is disposing of a finished read request. This
allows the netfs to clear rreq->netfs_priv.
Netfs support is enabled with CONFIG_NETFS_SUPPORT=y. It will be built
even if CONFIG_FSCACHE=n and in this case much of it should be optimised
away, allowing the filesystem to use it even when caching is disabled.
Changes:
v5:
- Comment why netfs_readahead() is putting pages[2].
- Use page_file_mapping() rather than page->mapping[2].
- Use page_index() rather than page->index[2].
- Use set_page_fscache()[3] rather then SetPageFsCache() as this takes an
appropriate ref too[4].
v4:
- Folded in a kerneldoc comment fix.
- Folded in a fix for the error handling in the case that ENOMEM occurs.
- Added flag to netfs_subreq_terminated() to indicate that the caller may
have been running async and stuff that might sleep needs punting to a
workqueue (can't use in_softirq()[1]).
Add set/end/wait_on_page_fscache() as aliases of
set/end/wait_page_private_2(). These allow a page to marked with
PG_fscache, the flag to be removed and waiters woken and waiting for the
flag to be cleared. A ref on the page is also taken and dropped.
[Linus suggested putting the fscache-themed functions into the
caching-specific headers rather than pagemap.h[1]]
Changes:
v5:
- Mirror the changes to the core routines[2].
David Howells [Mon, 15 Feb 2021 13:23:33 +0000 (13:23 +0000)]
netfs, mm: Move PG_fscache helper funcs to linux/netfs.h
Move the PG_fscache related helper funcs (such as SetPageFsCache()) to
linux/netfs.h rather than linux/fscache.h as the intention is to move to a
model where they're used by the network filesystem and the helper library,
but not by fscache/cachefiles itself.
David Howells [Thu, 10 Sep 2020 13:03:27 +0000 (14:03 +0100)]
mm: Implement readahead_control pageset expansion
Provide a function, readahead_expand(), that expands the set of pages
specified by a readahead_control object to encompass a revised area with a
proposed size and length.
The proposed area must include all of the old area and may be expanded yet
more by this function so that the edges align on (transparent huge) page
boundaries as allocated.
The expansion will be cut short if a page already exists in either of the
areas being expanded into. Note that any expansion made in such a case is
not rolled back.
This will be used by fscache so that reads can be expanded to cache granule
boundaries, thereby allowing whole granules to be stored in the cache, but
there are other potential users also.
Changes:
v6:
- Fold in a patch from Matthew Wilcox to tell the ondemand readahead
algorithm about the expansion so that the next readahead starts at the
right place[2].
v4:
- Moved the declaration of readahead_expand() to a better place[1].
mm/readahead: Handle ractl nr_pages being modified
Filesystems are not currently permitted to modify the number of pages
in the ractl. An upcoming patch to add readahead_expand() changes that
rule, so remove the check and resync the loop counter after every call
to the filesystem.
For readahead_expand(), we need to modify the file ra_state, so pass it
down by adding it to the ractl. We have to do this because it's not always
the same as f_ra in the struct file that is already being passed.
David Howells [Mon, 10 Feb 2020 10:00:21 +0000 (10:00 +0000)]
iov_iter: Add ITER_XARRAY
Add an iterator, ITER_XARRAY, that walks through a set of pages attached to
an xarray, starting at a given page and offset and walking for the
specified amount of bytes. The iterator supports transparent huge pages.
The iterate_xarray() macro calls the helper function with rcu_access()
helped. I think that this is only a problem for iov_iter_for_each_range()
- and that returns an error for ITER_XARRAY (also, this function does not
appear to be called).
The caller must guarantee that the pages are all present and they must be
locked using PG_locked, PG_writeback or PG_fscache to prevent them from
going away or being migrated whilst they're being accessed.
This is useful for copying data from socket buffers to inodes in network
filesystems and for transferring data between those inodes and the cache
using direct I/O.
Whilst it is true that ITER_BVEC could be used instead, that would require
a bio_vec array to be allocated to refer to all the pages - which should be
redundant if inode->i_pages also points to all these pages.
Note that older versions of this patch implemented an ITER_MAPPING instead,
which was almost the same.
Changes:
v7:
- Rename iter_xarray_copy_pages() to iter_xarray_populate_pages()[1].
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Very late in the cycle but both risky if left unfixed and more or less
obvious.."
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Set err = -ENOMEM in case dma_map_sg_attrs fails
vhost-vdpa: protect concurrent access to vhost device iotlb
vhost-vdpa: protect concurrent access to vhost device iotlb
Protect vhost device iotlb by vhost_dev->mutex. Otherwise,
it might cause corruption of the list and interval tree in
struct vhost_iotlb if userspace sends the VHOST_IOTLB_MSG_V2
message concurrently.
Fixes: 4c8cf318("vhost: introduce vDPA-based backend") Cc: stable@vger.kernel.org Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210412095512.178-1-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Merge tag 'sunxi-fixes-for-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes
One fix for the MMC card detect on the Pine H64 board
* tag 'sunxi-fixes-for-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: dts: allwinner: Revert SD card CD GPIO for Pine64-LTS
Merge tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/tpmdd
Pull tpm fix from James Bottomley:
"This is an urgent regression fix for a tpm patch set that went in this
merge window. It looks like a rebase before the original pull request
lost a tpm_try_get_ops() so we have a lock imbalance in our code which
is causing oopses. The original patch was correct on the mailing list.
I'm sending this in agreement with Mimi (as joint maintainers of
trusted keys) because Jarkko is off communing with the Reindeer or
whatever it is Finns do when on holiday"
* tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/tpmdd:
KEYS: trusted: Fix TPM reservation for seal/unseal
Jim Mattson [Thu, 22 Apr 2021 00:18:34 +0000 (17:18 -0700)]
perf/x86/kvm: Fix Broadwell Xeon stepping in isolation_ucodes[]
The only stepping of Broadwell Xeon parts is stepping 1. Fix the
relevant isolation_ucodes[] entry, which previously enumerated
stepping 2.
Although the original commit was characterized as an optimization, it
is also a workaround for a correctness issue.
If a PMI arrives between kvm's call to perf_guest_get_msrs() and the
subsequent VM-entry, a stale value for the IA32_PEBS_ENABLE MSR may be
restored at the next VM-exit. This is because, unbeknownst to kvm, PMI
throttling may clear bits in the IA32_PEBS_ENABLE MSR. CPUs with "PEBS
isolation" don't suffer from this issue, because perf_guest_get_msrs()
doesn't report the IA32_PEBS_ENABLE value.
Fixes: 9b545c04abd4f ("perf/x86/kvm: Avoid unnecessary work in guest filtering") Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Peter Shier <pshier@google.com> Acked-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/20210422001834.1748319-1-jmattson@google.com
Andre Przywara [Wed, 14 Apr 2021 10:47:40 +0000 (11:47 +0100)]
arm64: dts: allwinner: Revert SD card CD GPIO for Pine64-LTS
Commit 941432d00768 ("arm64: dts: allwinner: Drop non-removable from
SoPine/LTS SD card") enabled the card detect GPIO for the SOPine module,
along the way with the Pine64-LTS, which share the same base .dtsi.
This was based on the observation that the Pine64-LTS has as "push-push"
SD card socket, and that the schematic mentions the card detect GPIO.
After having received two reports about failing SD card access with that
patch, some more research and polls on that subject revealed that there
are at least two different versions of the Pine64-LTS out there:
- On some boards (including mine) the card detect pin is "stuck" at
high, regardless of an microSD card being inserted or not.
- On other boards the card-detect is working, but is active-high, by
virtue of an explicit inverter circuit, as shown in the schematic.
To cover all versions of the board out there, and don't take any chances,
let's revert the introduction of the active-low CD GPIO, but let's use
the broken-cd property for the Pine64-LTS this time. That should avoid
regressions and should work for everyone, even allowing SD card changes
now.
The SOPine card detect has proven to be working, so let's keep that
GPIO in place.
Fixes: 941432d00768 ("arm64: dts: allwinner: Drop non-removable from SoPine/LTS SD card") Reported-by: Michael Weiser <michael.weiser@gmx.de> Reported-by: Daniel Kulesz <kuleszdl@posteo.org> Suggested-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Tested-by: Michael Weiser <michael.weiser@gmx.de> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://lore.kernel.org/r/20210414104740.31497-1-andre.przywara@arm.com
Andy Shevchenko [Thu, 15 Apr 2021 13:03:56 +0000 (16:03 +0300)]
pinctrl: core: Show pin numbers for the controllers with base = 0
The commit f1b206cf7c57 ("pinctrl: core: print gpio in pins debugfs file")
enabled GPIO pin number and label in debugfs for pin controller. However,
it limited that feature to the chips where base is positive number. This,
in particular, excluded chips where base is 0 for the historical or backward
compatibility reasons. Refactor the code to include the latter as well.
But somehow got rebased so that the tpm_try_get_ops() in
tpm2_seal_trusted() got lost. This causes an imbalanced put of the
TPM ops and causes oopses on TIS based hardware.
This fix puts back the lost tpm_try_get_ops()
Fixes: 8c657a0590de ("KEYS: trusted: Reserve TPM for seal and unseal operations") Reported-by: Mimi Zohar <zohar@linux.ibm.com> Acked-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Merge tag 'mmc-v5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fix from Ulf Hansson:
"Replace WARN_ONCE with dev_warn_once for non-optimal sg-alignment in
the meson-gx host driver"
* tag 'mmc-v5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: meson-gx: replace WARN_ONCE with dev_warn_once about scatterlist size alignment in block mode
block: return -EBUSY when there are open partitions in blkdev_reread_part
The switch to go through blkdev_get_by_dev means we now ignore the
return value from bdev_disk_changed in __blkdev_get. Add a manual
check to restore the old semantics.
Fixes: 4601b4b130de ("block: reopen the device in blkdev_reread_part") Reported-by: Karel Zak <kzak@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210421160502.447418-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
Accept non-linear buffers which use a multi-planar format, as long
as they don't use DCC.
Tested on GFX9 with NV12.
Signed-off-by: Simon Ser <contact@emersion.fr> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Harry Wentland <hwentlan@amd.com> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
[Why]
Current list supports modifiers that have DCC_MAX_COMPRESSED_BLOCK
set to AMD_FMT_MOD_DCC_BLOCK_128B, while AMD_FMT_MOD_DCC_BLOCK_64B
is used instead by userspace.
[How]
Replace AMD_FMT_MOD_DCC_BLOCK_128B with AMD_FMT_MOD_DCC_BLOCK_64B
for modifiers with DCC supported.
Fixes: faa37f54ce0462 ("drm/amd/display: Expose modifiers") Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Philip Yang [Thu, 1 Apr 2021 04:22:23 +0000 (00:22 -0400)]
drm/amdgpu: reserve fence slot to update page table
Forgot to reserve a fence slot to use sdma to update page table, cause
below kernel BUG backtrace to handle vm retry fault while application is
exiting.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 5.11.x
Tony Lindgren [Sat, 17 Apr 2021 08:38:39 +0000 (11:38 +0300)]
gpio: omap: Save and restore sysconfig
As we are using cpu_pm to save and restore context, we must also save and
restore the GPIO sysconfig register. This is needed because we are not
calling PM runtime functions at all with cpu_pm.
We need to save the sysconfig on idle as it's value can get reconfigured by
PM runtime and can be different from the init time value. Device specific
flags like "ti,no-idle-on-init" can affect the init value.
Fixes: b764a5863fd8 ("gpio: omap: Remove custom PM calls and use cpu_pm instead") Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: Adam Ford <aford173@gmail.com> Cc: Andreas Kemnade <andreas@kemnade.info> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Acked-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Kan Liang [Thu, 15 Apr 2021 21:22:43 +0000 (14:22 -0700)]
perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3
There may be a kernel panic on the Haswell server and the Broadwell
server, if the snbep_pci2phy_map_init() return error.
The uncore_extra_pci_dev[HSWEP_PCI_PCU_3] is used in the cpu_init() to
detect the existence of the SBOX, which is a MSR type of PMON unit.
The uncore_extra_pci_dev is allocated in the uncore_pci_init(). If the
snbep_pci2phy_map_init() returns error, perf doesn't initialize the
PCI type of the PMON units, so the uncore_extra_pci_dev will not be
allocated. But perf may continue initializing the MSR type of PMON
units. A null dereference kernel panic will be triggered.
The sockets in a Haswell server or a Broadwell server are identical.
Only need to detect the existence of the SBOX once.
Current perf probes all available PCU devices and stores them into the
uncore_extra_pci_dev. It's unnecessary.
Use the pci_get_device() to replace the uncore_extra_pci_dev. Only
detect the existence of the SBOX on the first available PCU device once.
Factor out hswep_has_limit_sbox(), since the Haswell server and the
Broadwell server uses the same way to detect the existence of the SBOX.
Add some macros to replace the magic number.
Fixes: 5306c31c5733 ("perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes") Reported-by: Steve Wahl <steve.wahl@hpe.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lkml.kernel.org/r/1618521764-100923-1-git-send-email-kan.liang@linux.intel.com
Merge tag 'trace-v5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix tp_printk command line and trace events
Masami added a wrapper to be able to unhash trace event pointers as
they are only read by root anyway, and they can also be extracted by
the raw trace data buffers. But this wrapper utilized the iterator to
have a temporary buffer to manipulate the text with.
tp_printk is a kernel command line option that will send the trace
output of a trace event to the console on boot up (useful when the
system crashes before finishing the boot). But the code used the same
wrapper that Masami added, and its iterator did not have a buffer, and
this caused the system to crash.
Have the wrapper just print the trace event normally if the iterator
has no temporary buffer"
* tag 'trace-v5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix checking event hash pointer logic when tp_printk is enabled
cap_setfcap is required to create file capabilities.
Since commit 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"),
a process running as uid 0 but without cap_setfcap is able to work
around this as follows: unshare a new user namespace which maps parent
uid 0 into the child namespace.
While this task will not have new capabilities against the parent
namespace, there is a loophole due to the way namespaced file
capabilities are represented as xattrs. File capabilities valid in
userns 1 are distinguished from file capabilities valid in userns 2 by
the kuid which underlies uid 0. Therefore the restricted root process
can unshare a new self-mapping namespace, add a namespaced file
capability onto a file, then use that file capability in the parent
namespace.
To prevent that, do not allow mapping parent uid 0 if the process which
opened the uid_map file does not have CAP_SETFCAP, which is the
capability for setting file capabilities.
As a further wrinkle: a task can unshare its user namespace, then open
its uid_map file itself, and map (only) its own uid. In this case we do
not have the credential from before unshare, which was potentially more
restricted. So, when creating a user namespace, we record whether the
creator had CAP_SETFCAP. Then we can use that during map_write().
3. Root user without CAP_SETFCAP cannot unshare -Ur:
root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
unable to set CAP_SETFCAP effective capability: Operation not permitted
root@caps:/home/ubuntu# unshare -Ur
unshare: write failed /proc/self/uid_map: Operation not permitted
Note: an alternative solution would be to allow uid 0 mappings by
processes without CAP_SETFCAP, but to prevent such a namespace from
writing any file capabilities. This approach can be seen at [1].
Background history: commit 95ebabde382 ("capabilities: Don't allow
writing ambiguous v3 file capabilities") tried to fix the issue by
preventing v3 fscaps to be written to disk when the root uid would map
to the same uid in nested user namespaces. This led to regressions for
various workloads. For example, see [2]. Ultimately this is a valid
use-case we have to support meaning we had to revert this change in 3b0c2d3eaa83 ("Revert 95ebabde382c ("capabilities: Don't allow writing
ambiguous v3 file capabilities")").
Zhen Lei [Thu, 15 Apr 2021 08:34:16 +0000 (16:34 +0800)]
perf data: Fix error return code in perf_data__create_dir()
Although 'ret' has been initialized to -1, but it will be reassigned by
the "ret = open(...)" statement in the for loop. So that, the value of
'ret' is unknown when asprintf() failed.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210415083417.3740-1-thunder.leizhen@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit in Fixes: added support for kexec-ing a kernel on panic using a
new system call. As part of it, it does prepare a memory map for the new
kernel.
However, while doing so, it wrongly accesses memory it has not
allocated: it accesses the first element of the cmem->ranges[] array in
memmap_exclude_ranges() but it has not allocated the memory for it in
crash_setup_memmap_entries(). As KASAN reports:
BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0
Write of size 8 at addr ffffc90000426008 by task kexec/1187
(gdb) list *crash_setup_memmap_entries+0x17e
0xffffffff8107cafe is in crash_setup_memmap_entries (arch/x86/kernel/crash.c:322).
317 unsigned long long mend)
318 {
319 unsigned long start, end;
320
321 cmem->ranges[0].start = mstart;
322 cmem->ranges[0].end = mend;
323 cmem->nr_ranges = 1;
324
325 /* Exclude elf header region */
326 start = image->arch.elf_load_addr;
(gdb)
Make sure the ranges array becomes a single element allocated.
[ bp: Write a proper commit message. ]
Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call") Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Young <dyoung@redhat.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/725fa3dc1da2737f0f6188a1a9701bead257ea9d.camel@gmx.de
tracing: Fix checking event hash pointer logic when tp_printk is enabled
Pointers in events that are printed are unhashed if the flags allow it,
and the logic to do so is called before processing the event output from
the raw ring buffer. In most cases, this is done when a user reads one of
the trace files.
But if tp_printk is added on the kernel command line, this logic is done
for trace events when they are triggered, and their output goes out via
printk. The unhash logic (and even the validation of the output) did not
support the tp_printk output, and would crash.
Nathan Chancellor points out that it should not have been merged into
mainline by itself. It was a fix for "gcov: use kvmalloc()", which is
still in -mm/-next. Merging it alone has broken the build.
Imre Deak [Mon, 12 Apr 2021 23:24:12 +0000 (02:24 +0300)]
drm/i915: Fix modesetting in case of unexpected AUX timeouts
In case AUX failures happen unexpectedly during a modeset, the driver
should still complete the modeset. In particular the driver should
perform the link training sequence steps even in case of an AUX failure,
as this sequence also includes port initialization steps. Not doing that
can leave the port/pipe in a broken state and lead for instance to a
flip done timeout.
Fix this by continuing with link training (in a no-LTTPR mode) if the
DPRX DPCD readout failed for some reason at the beginning of link
training. After a successful connector detection we already have the
DPCD read out and cached, so the failed repeated read for it should not
cause a problem. Note that a partial AUX read could in theory partly
overwrite the cached DPCD (and return error) but this overwrite should
not happen if the returned values are corrupted (due to a timeout or
some other IO error).
Kudos to Ville to root cause the problem.
Fixes: 7dffbdedb96a ("drm/i915: Disable LTTPR support when the DPCD rev < 1.4")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3308 Cc: stable@vger.kernel.org # 5.11 Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210412232413.2755054-1-imre.deak@intel.com
(cherry picked from commit e42e7e585984b85b0fb9dd1fefc85ee4800ca629) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[adjusted Fixes: tag]
preempt/dynamic: Fix typo in macro conditional statement
Commit 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched()
static call") tried to provide irqentry_exit_cond_resched() static call
in irqentry_exit, but has a typo in macro conditional statement.
Fixes: 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched() static call") Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20210410073523.5493-1-zhouzhouyi@gmail.com
Neil Armstrong [Fri, 16 Apr 2021 09:43:47 +0000 (11:43 +0200)]
mmc: meson-gx: replace WARN_ONCE with dev_warn_once about scatterlist size alignment in block mode
Since commit e085b51c74cc ("mmc: meson-gx: check for scatterlist size alignment in block mode"),
support for SDIO SD_IO_RW_EXTENDED transferts are properly filtered but some driver
like brcmfmac still gives a block sg buffer size not aligned with SDIO block,
triggerring a WARN_ONCE() with scary stacktrace even if the transfer works fine
but with possible degraded performances.
Simply replace with dev_warn_once() to inform user this should be fixed to avoid
degraded performance.
This should be ultimately fixed in brcmfmac, but since it's only a performance issue
the warning should be removed.
Fixes: e085b51c74cc ("mmc: meson-gx: check for scatterlist size alignment in block mode") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://lore.kernel.org/r/20210416094347.2015896-1-narmstrong@baylibre.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Merge tag 'arm-fixes-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"Another smaller set of fixes for three of the Arm platforms:
TI OMAP:
Fix swapped mmc device order also for omap3 that got changed with
the recent PROBE_PREFER_ASYNCHRONOUS changes. While eventually the
aliases should be board specific, all the mmc device instances are
all there in the SoC, and we do probe them by default so that PM
runtime can idle the devices if left enabled from the bootloader.
Qualcomm Snapdragon:
This bypasses the recently introduced interconnect handling in
the GENI (serial engine) driver when running off ACPI, as this
causes the GENI probe to fail and the Lenovo Yoga C630 to boot
without keyboard and touchpad.
Allwinner:
One 32kHz clock fix for the beelink gs1, a CD polarity fix for the
SoPine, some MAINTAINERS maintainance, and a clk / reset switch to
our headers"
* tag 'arm-fixes-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: allwinner: h6: beelink-gs1: Remove ext. 32 kHz osc reference
MAINTAINERS: Match on allwinner keyword
MAINTAINERS: Add our new mailing-list
arm64: dts: allwinner: Fix SD card CD GPIO for SOPine systems
arm64: dts: allwinner: h6: Switch to macros for RSB clock/reset indices
ARM: OMAP2+: Fix uninitialized sr_inst
ARM: dts: Fix swapped mmc order for omap3
ARM: OMAP2+: Fix warning for omap_init_time_of()
soc: qcom: geni: shield geni_icc_get() for ACPI boot
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- Halve maximum number of CPUs if DEBUG_KMAP_LOCAL is enabled
- Fix conversion for_each_membock() to for_each_mem_range()
- Fix footbridge PCI mapping
- Avoid uprobes hooking on thumb instructions
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9071/1: uprobes: Don't hook on thumb instructions
ARM: footbridge: fix PCI interrupt mapping
ARM: 9069/1: NOMMU: Fix conversion for_each_membock() to for_each_mem_range()
ARM: 9063/1: mm: reduce maximum number of CPUs if DEBUG_KMAP_LOCAL is enabled
Fredrik Strupe [Mon, 5 Apr 2021 20:52:05 +0000 (21:52 +0100)]
ARM: 9071/1: uprobes: Don't hook on thumb instructions
Since uprobes is not supported for thumb, check that the thumb bit is
not set when matching the uprobes instruction hooks.
The Arm UDF instructions used for uprobes triggering
(UPROBE_SWBP_ARM_INSN and UPROBE_SS_ARM_INSN) coincidentally share the
same encoding as a pair of unallocated 32-bit thumb instructions (not
UDF) when the condition code is 0b1111 (0xf). This in effect makes it
possible to trigger the uprobes functionality from thumb, and at that
using two unallocated instructions which are not permanently undefined.
Signed-off-by: Fredrik Strupe <fredrik@strupe.net> Cc: stable@vger.kernel.org Fixes: c7edc9e326d5 ("ARM: add uprobes support") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two fixes: the libsas fix is for a problem that occurs when trying to
change the cache type of an ATA device and the libiscsi one is a
regression fix from this merge window"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: libsas: Reset num_scatter if libata marks qc as NODATA
scsi: iscsi: Fix iSCSI cls conn state
Merge tag 'drm-fixes-2021-04-18' of git://anongit.freedesktop.org/drm/drm
Pull vmwgfx fixes from Dave Airlie:
"This contains two regression fixes for vmwgfx, one due to a refactor
which meant locks were being used before initialisation, and the other
in fixing up some warnings from the core when destroying pinned
buffers.
vmwgfx:
- fixed unpinning before destruction
- lockdep init reordering"
* tag 'drm-fixes-2021-04-18' of git://anongit.freedesktop.org/drm/drm:
drm/vmwgfx: Make sure bo's are unpinned before putting them back
drm/vmwgfx: Fix the lockdep breakage
drm/vmwgfx: Make sure we unpin no longer needed buffers
Dave Airlie [Sat, 17 Apr 2021 23:26:54 +0000 (09:26 +1000)]
Merge tag 'vmwgfx-fixes-2021-04-14' of gitlab.freedesktop.org:zack/vmwgfx into drm-fixes
vmwgfx fixes for regressions in 5.12
Here's a set of 3 patches fixing ugly regressions
in the vmwgfx driver. We broke lock initialization
code and ended up using spinlocks before initialization
breaking lockdep.
Also there was a bit of a fallout from drm changes
which made the core validate that unreferenced buffers
have been unpinned. vmwgfx pinning code predates a lot
of the core drm and wasn't written to account for those
semantics. Fortunately changes required to fix it
are not too intrusive.
The changes have been validated by our internal ci.
readdir: make sure to verify directory entry for legacy interfaces too
This does the directory entry name verification for the legacy
"fillonedir" (and compat) interface that goes all the way back to the
dark ages before we had a proper dirent, and the readdir() system call
returned just a single entry at a time.
Nobody should use this interface unless you still have binaries from
1991, but let's do it right.
This came up during discussions about unsafe_copy_to_user() and proper
checking of all the inputs to it, as the networking layer is looking to
use it in a few new places. So let's make sure the _old_ users do it
all right and proper, before we add new ones.
See also commit 8a23eb804ca4 ("Make filldir[64]() verify the directory
entry filename is valid") which did the proper modern interfaces that
people actually use. It had a note:
Note that I didn't bother adding the checks to any legacy interfaces
that nobody uses.
which this now corrects. Note that we really don't care about POSIX and
the presense of '/' in a directory entry, but verify_dirent_name() also
ends up doing the proper name length verification which is what the
input checking discussion was about.
[ Another option would be to remove the support for this particular very
old interface: any binaries that use it are likely a.out binaries, and
they will no longer run anyway since we removed a.out binftm support
in commit eac616557050 ("x86: Deprecate a.out support").
But I'm not sure which came first: getdents() or ELF support, so let's
pretend somebody might still have a working binary that uses the
legacy readdir() case.. ]
Merge tag 'net-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes for 5.12-rc8, including fixes from netfilter, and
bpf. BPF verifier changes stand out, otherwise things have slowed
down.
Current release - regressions:
- gro: ensure frag0 meets IP header alignment
- Revert "net: stmmac: re-init rx buffers when mac resume back"
- ethernet: macb: fix the restore of cmp registers
Previous releases - regressions:
- ixgbe: Fix NULL pointer dereference in ethtool loopback test
- ixgbe: fix unbalanced device enable/disable in suspend/resume
- phy: marvell: fix detection of PHY on Topaz switches
- make tcp_allowed_congestion_control readonly in non-init netns
- xen-netback: Check for hotplug-status existence before watching
Previous releases - always broken:
- bpf: mitigate a speculative oob read of up to map value size by
tightening the masking window
- sctp: fix race condition in sctp_destroy_sock
- sit, ip6_tunnel: Unregister catch-all devices
- netfilter: nftables: clone set element expression template
- net: geneve: check skb is large enough for IPv4/IPv6 header
- netlink: don't call ->netlink_bind with table lock held"
* tag 'net-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
netlink: don't call ->netlink_bind with table lock held
MAINTAINERS: update my email
bpf: Update selftests to reflect new error states
bpf: Tighten speculative pointer arithmetic mask
bpf: Move sanitize_val_alu out of op switch
bpf: Refactor and streamline bounds check into helper
bpf: Improve verifier error messages for users
bpf: Rework ptr_limit into alu_limit and add common error path
bpf: Ensure off_reg has no mixed signed bounds for all types
bpf: Move off_reg into sanitize_ptr_alu
bpf: Use correct permission flag for mixed signed bounds arithmetic
ch_ktls: do not send snd_una update to TCB in middle
ch_ktls: tcb close causes tls connection failure
ch_ktls: fix device connection close
ch_ktls: Fix kernel panic
i40e: fix the panic when running bpf in xdpdrv mode
net/mlx5e: fix ingress_ifindex check in mlx5e_flower_parse_meta
net/mlx5e: Fix setting of RS FEC mode
net/mlx5: Fix setting of devlink traps in switchdev mode
Revert "net: stmmac: re-init rx buffers when mac resume back"
...
Merge tag 'libnvdimm-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"The largest change is for a regression that landed during -rc1 for
block-device read-only handling. Vaibhav found a new use for the
ability (originally introduced by virtio_pmem) to call back to the
platform to flush data, but also found an original bug in that
implementation. Lastly, Arnd cleans up some compile warnings in dax.
This has all appeared in -next with no reported issues.
Summary:
- Fix a regression of read-only handling in the pmem driver
- Fix a compile warning
- Fix support for platform cache flush commands on powerpc/papr"
* tag 'libnvdimm-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm/region: Fix nvdimm_has_flush() to handle ND_REGION_ASYNC
libnvdimm: Notify disk drivers to revalidate region read-only
dax: avoid -Wempty-body warnings
Merge tag 'cxl-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL memory class fixes from Dan Williams:
"A collection of fixes for the CXL memory class driver introduced in
this release cycle.
The driver was primarily developed on a work-in-progress QEMU
emulation of the interface and we have since found a couple places
where it hid spec compliance bugs in the driver, or had a spec
implementation bug itself.
The biggest change here is replacing a percpu_ref with an rwsem to
cleanup a couple bugs in the error unwind path during ioctl device
init. Lastly there were some minor cleanups to not export the
power-management sysfs-ABI for the ioctl device, use the proper sysfs
helper for emitting values, and prevent subtle bugs as new
administration commands are added to the supported list.
The bulk of it has appeared in -next save for the top commit which was
found today and validated on a fixed-up QEMU model.
Summary:
- Fix support for CXL memory devices with registers offset from the
BAR base.
- Fix the reporting of device capacity.
- Fix the driver commands list definition to be disconnected from the
UAPI command list.
- Replace percpu_ref with rwsem to fix initialization error path.
- Fix leaks in the driver initialization error path.
- Drop the power/ directory from CXL device sysfs.
- Use the recommended sysfs helper for attribute 'show'
implementations"
* tag 'cxl-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/mem: Fix memory device capacity probing
cxl/mem: Fix register block offset calculation
cxl/mem: Force array size of mem_commands[] to CXL_MEM_COMMAND_ID_MAX
cxl/mem: Disable cxl device power management
cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
cxl/mem: Use sysfs_emit() for attribute show routines
Ali Saidi [Thu, 15 Apr 2021 17:27:11 +0000 (17:27 +0000)]
locking/qrwlock: Fix ordering in queued_write_lock_slowpath()
While this code is executed with the wait_lock held, a reader can
acquire the lock without holding wait_lock. The writer side loops
checking the value with the atomic_cond_read_acquire(), but only truly
acquires the lock when the compare-and-exchange is completed
successfully which isn’t ordered. This exposes the window between the
acquire and the cmpxchg to an A-B-A problem which allows reads
following the lock acquisition to observe values speculatively before
the write lock is truly acquired.
We've seen a problem in epoll where the reader does a xchg while
holding the read lock, but the writer can see a value change out from
under it.
A core can order the read of the ovflist ahead of the
atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire
semantics addresses this issue at which point the atomic_cond_read can
be switched to use relaxed semantics.
Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock") Signed-off-by: Ali Saidi <alisaidi@amazon.com>
[peterz: use try_cmpxchg()] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steve Capper <steve.capper@arm.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Tested-by: Steve Capper <steve.capper@arm.com>
Dan Williams [Sat, 17 Apr 2021 00:43:30 +0000 (17:43 -0700)]
cxl/mem: Fix memory device capacity probing
The CXL Identify Memory Device output payload emits capacity in 256MB
units. The driver is treating the capacity field as bytes. This was
missed because QEMU reports bytes when it should report bytes / 256MB.
netlink: don't call ->netlink_bind with table lock held
When I added support to allow generic netlink multicast groups to be
restricted to subscribers with CAP_NET_ADMIN I was unaware that a
genl_bind implementation already existed in the past.
It was reverted due to ABBA deadlock:
1. ->netlink_bind gets called with the table lock held.
2. genetlink bind callback is invoked, it grabs the genl lock.
But when a new genl subsystem is (un)registered, these two locks are
taken in reverse order.
One solution would be to revert again and add a comment in genl
referring 1e82a62fec613, "genetlink: remove genl_bind").
This would need a second change in mptcp to not expose the raw token
value anymore, e.g. by hashing the token with a secret key so userspace
can still associate subflow events with the correct mptcp connection.
However, Paolo Abeni reminded me to double-check why the netlink table is
locked in the first place.
I can't find one. netlink_bind() is already called without this lock
when userspace joins a group via NETLINK_ADD_MEMBERSHIP setsockopt.
Same holds for the netlink_unbind operation.
Digging through the history, commit f773608026ee1
("netlink: access nlk groups safely in netlink bind and getname")
expanded the lock scope.
commit 3a20773beeeeade ("net: netlink: cap max groups which will be considered in netlink_bind()")
... removed the nlk->ngroups access that the lock scope
extension was all about.
Reduce the lock scope again and always call ->netlink_bind without
the table lock.
The Fixes tag should be vs. the patch mentioned in the link below,
but that one got squash-merged into the patch that came earlier in the
series.
Fixes: 4d54cc32112d8d ("mptcp: avoid lock_fast usage in accept path") Link: https://lore.kernel.org/mptcp/20210213000001.379332-8-mathew.j.martineau@linux.intel.com/T/#u Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Sean Tranchetti <stranche@codeaurora.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix various kernel-doc warnings in lib/ due to missing or erroneous
function names.
Add kernel-doc for some function parameters that was missing. Use
kernel-doc "Return:" notation in earlycpio.c.
Quietens the following warnings:
lib/earlycpio.c:61: warning: expecting prototype for cpio_data find_cpio_data(). Prototype was for find_cpio_data() instead
lib/lru_cache.c:640: warning: expecting prototype for lc_dump(). Prototype was for lc_seq_dump_details() instead
lru_cache.c:90: warning: Function parameter or member 'cache' not described in 'lc_create'
lib/parman.c:368: warning: expecting prototype for parman_item_del(). Prototype was for parman_item_remove() instead
parman.c:309: warning: Excess function parameter 'prority' description in 'parman_prio_init'
lib/radix-tree.c:703: warning: expecting prototype for __radix_tree_insert(). Prototype was for radix_tree_insert() instead
radix-tree.c:180: warning: Excess function parameter 'addr' description in 'radix_tree_find_next_bit'
radix-tree.c:180: warning: Excess function parameter 'size' description in 'radix_tree_find_next_bit'
radix-tree.c:931: warning: Function parameter or member 'iter' not described in 'radix_tree_iter_replace'
Link: https://lkml.kernel.org/r/20210411221756.15461-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Jiri Pirko <jiri@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>