Chuck Lever [Wed, 27 Jul 2022 18:40:03 +0000 (14:40 -0400)]
NFSD: Fix strncpy() fortify warning
In function ‘strncpy’,
inlined from ‘nfsd4_ssc_setup_dul’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1392:3,
inlined from ‘nfsd4_interssc_connect’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1489:11:
/home/cel/src/linux/manet/include/linux/fortify-string.h:52:33: warning: ‘__builtin_strncpy’ specified bound 63 equals destination size [-Wstringop-truncation]
52 | #define __underlying_strncpy __builtin_strncpy
| ^
/home/cel/src/linux/manet/include/linux/fortify-string.h:89:16: note: in expansion of macro ‘__underlying_strncpy’
89 | return __underlying_strncpy(p, q, size);
| ^~~~~~~~~~~~~~~~~~~~
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 22 Jul 2022 20:09:04 +0000 (16:09 -0400)]
NFSD: Optimize nfsd4_encode_readv()
write_bytes_to_xdr_buf() is pretty expensive to use for inserting
an XDR data item that is always 1 XDR_UNIT at an address that is
always XDR word-aligned.
Since both the readv and splice read paths encode EOF and maxcount
values, move both to a common code path.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 22 Jul 2022 20:08:45 +0000 (16:08 -0400)]
NFSD: Optimize nfsd4_encode_fattr()
write_bytes_to_xdr_buf() is a generic way to place a variable-length
data item in an already-reserved spot in the encoding buffer.
However, it is costly. In nfsd4_encode_fattr(), it is unnecessary
because the data item is fixed in size and the buffer destination
address is always word-aligned.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 22 Jul 2022 20:08:38 +0000 (16:08 -0400)]
NFSD: Optimize nfsd4_encode_operation()
write_bytes_to_xdr_buf() is a generic way to place a variable-length
data item in an already-reserved spot in the encoding buffer.
However, it is costly, and here, it is unnecessary because the
data item is fixed in size, the buffer destination address is
always word-aligned, and the destination location is already in
@p.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Jeff Layton [Wed, 20 Jul 2022 12:39:23 +0000 (08:39 -0400)]
nfsd: silence extraneous printk on nfsd.ko insertion
This printk pops every time nfsd.ko gets plugged in. Most kmods don't do
that and this one is not very informative. Olaf's email address seems to
be defunct at this point anyway. Just drop it.
Cc: Olaf Kirch <okir@suse.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Dai Ngo [Fri, 15 Jul 2022 23:54:53 +0000 (16:54 -0700)]
NFSD: limit the number of v4 clients to 1024 per 1GB of system memory
Currently there is no limit on how many v4 clients are supported
by the system. This can be a problem in systems with small memory
configuration to function properly when a very large number of
clients exist that creates memory shortage conditions.
This patch enforces a limit of 1024 NFSv4 clients, including courtesy
clients, per 1GB of system memory. When the number of the clients
reaches the limit, requests that create new clients are returned
with NFS4ERR_DELAY and the laundromat is kicked start to trim old
clients. Due to the overhead of the upcall to remove the client
record, the maximun number of clients the laundromat removes on
each run is limited to 128. This is done to ensure the laundromat
can still process the other tasks in a timely manner.
Since there is now a limit of the number of clients, the 24-hr
idle time limit of courtesy client is no longer needed and was
removed.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 8 Jul 2022 18:27:09 +0000 (14:27 -0400)]
NFSD: Ensure nf_inode is never dereferenced
The documenting comment for struct nf_file states:
/*
* A representation of a file that has been opened by knfsd. These are hashed
* in the hashtable by inode pointer value. Note that this object doesn't
* hold a reference to the inode by itself, so the nf_inode pointer should
* never be dereferenced, only used for comparison.
*/
Replace the two existing dereferences to make the comment always
true.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 8 Jul 2022 18:27:02 +0000 (14:27 -0400)]
NFSD: NFSv4 CLOSE should release an nfsd_file immediately
The last close of a file should enable other accessors to open and
use that file immediately. Leaving the file open in the filecache
prevents other users from accessing that file until the filecache
garbage-collects the file -- sometimes that takes several seconds.
Reported-by: Wang Yugui <wangyugui@e16-tech.com> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?387 Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 8 Jul 2022 18:26:30 +0000 (14:26 -0400)]
NFSD: Convert the filecache to use rhashtable
Enable the filecache hash table to start small, then grow with the
workload. Smaller server deployments benefit because there should
be lower memory utilization. Larger server deployments should see
improved scaling with the number of open files.
Suggested-by: Jeff Layton <jlayton@kernel.org> Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 8 Jul 2022 18:26:16 +0000 (14:26 -0400)]
NFSD: Replace the "init once" mechanism
In a moment, the nfsd_file_hashtbl global will be replaced with an
rhashtable. Replace the one or two spots that need to check if the
hash table is available. We can easily reuse the SHUTDOWN flag for
this purpose.
Document that this mechanism relies on callers to hold the
nfsd_mutex to prevent init, shutdown, and purging to run
concurrently.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 8 Jul 2022 18:25:57 +0000 (14:25 -0400)]
NFSD: Refactor __nfsd_file_close_inode()
The code that computes the hashval is the same in both callers.
To prevent them from going stale, reframe the documenting comments
to remove descriptions of the underlying hash table structure, which
is about to be replaced.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 8 Jul 2022 18:25:37 +0000 (14:25 -0400)]
NFSD: No longer record nf_hashval in the trace log
I'm about to replace nfsd_file_hashtbl with an rhashtable. The
individual hash values will no longer be visible or relevant, so
remove them from the tracepoints.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 8 Jul 2022 18:25:30 +0000 (14:25 -0400)]
NFSD: Never call nfsd_file_gc() in foreground paths
The checks in nfsd_file_acquire() and nfsd_file_put() that directly
invoke filecache garbage collection are intended to keep cache
occupancy between a low- and high-watermark. The reason to limit the
capacity of the filecache is to keep filecache lookups reasonably
fast.
However, invoking garbage collection at those points has some
undesirable negative impacts. Files that are held open by NFSv4
clients often push the occupancy of the filecache over these
watermarks. At that point:
- Every call to nfsd_file_acquire() and nfsd_file_put() results in
an LRU walk. This has the same effect on lookup latency as long
chains in the hash table.
- Garbage collection will then run on every nfsd thread, causing a
lot of unnecessary lock contention.
- Limiting cache capacity pushes out files used only by NFSv3
clients, which are the type of files the filecache is supposed to
help.
To address those negative impacts, remove the direct calls to the
garbage collector. Subsequent patches will address maintaining
lookup efficiency as cache capacity increases.
Suggested-by: Wang Yugui <wangyugui@e16-tech.com> Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 8 Jul 2022 18:25:24 +0000 (14:25 -0400)]
NFSD: Fix the filecache LRU shrinker
Without LRU item rotation, the shrinker visits only a few items on
the end of the LRU list, and those would always be long-term OPEN
files for NFSv4 workloads. That makes the filecache shrinker
completely ineffective.
Adopt the same strategy as the inode LRU by using LRU_ROTATE.
Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 8 Jul 2022 18:25:17 +0000 (14:25 -0400)]
NFSD: Leave open files out of the filecache LRU
There have been reports of problems when running fstests generic/531
against Linux NFS servers with NFSv4. The NFS server that hosts the
test's SCRATCH_DEV suffers from CPU soft lock-ups during the test.
Analysis shows that:
fs/nfsd/filecache.c
482 ret = list_lru_walk(&nfsd_file_lru,
483 nfsd_file_lru_cb,
484 &head, LONG_MAX);
causes nfsd_file_gc() to walk the entire length of the filecache LRU
list every time it is called (which is quite frequently). The walk
holds a spinlock the entire time that prevents other nfsd threads
from accessing the filecache.
What's more, for NFSv4 workloads, none of the items that are visited
during this walk may be evicted, since they are all files that are
held OPEN by NFS clients.
Address this by ensuring that open files are not kept on the LRU
list.
Reported-by: Frank van der Linden <fllinden@amazon.com> Reported-by: Wang Yugui <wangyugui@e16-tech.com> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386 Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 8 Jul 2022 18:24:58 +0000 (14:24 -0400)]
NFSD: Hook up the filecache stat file
There has always been the capability of exporting filecache metrics
via /proc, but it was never hooked up. Let's surface these metrics
to enable better observability of the filecache.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Fri, 8 Jul 2022 18:23:59 +0000 (14:23 -0400)]
NFSD: Report count of calls to nfsd_file_acquire()
Count the number of successful acquisitions that did not create a
file (ie, acquisitions that do not result in a compulsory cache
miss). This count can be compared directly with the reported hit
count to compute a hit ratio.
Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Tue, 21 Jun 2022 14:06:23 +0000 (10:06 -0400)]
NFSD: Instrument fh_verify()
Capture file handles and how they map to local inodes. In particular,
NFSv4 PUTFH uses fh_verify() so we can now observe which file handles
are the target of OPEN, LOOKUP, RENAME, and so on.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
NLM: Defend against file_lock changes after vfs_test_lock()
Instead of trusting that struct file_lock returns completely unchanged
after vfs_test_lock() when there's no conflicting lock, stash away our
nlm_lockowner reference so we can properly release it for all cases.
This defends against another file_lock implementation overwriting fl_owner
when the return type is F_UNLCK.
Reported-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Tested-by: Roberto Bergantinos Corpas <rbergant@redhat.com> Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Chuck Lever [Tue, 19 Jul 2022 13:18:35 +0000 (09:18 -0400)]
SUNRPC: Fix xdr_encode_bool()
I discovered that xdr_encode_bool() was returning the same address
that was passed in the @p parameter. The documenting comment states
that the intent is to return the address of the next buffer
location, just like the other "xdr_encode_*" helpers.
The result was the encoded results of NFSv3 PATHCONF operations were
not formed correctly.
Fixes: ded04a587f6c ("NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
Jeff Layton [Fri, 29 Jul 2022 21:01:07 +0000 (17:01 -0400)]
nfsd: eliminate the NFSD_FILE_BREAK_* flags
We had a report from the spring Bake-a-thon of data corruption in some
nfstest_interop tests. Looking at the traces showed the NFS server
allowing a v3 WRITE to proceed while a read delegation was still
outstanding.
Currently, we only set NFSD_FILE_BREAK_* flags if
NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc.
NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for
COMMIT ops, where we need a writeable filehandle but don't need to
break read leases.
It doesn't make any sense to consult that flag when allocating a file
since the file may be used on subsequent calls where we do want to break
the lease (and the usage of it here seems to be reverse from what it
should be anyway).
Also, after calling nfsd_open_break_lease, we don't want to clear the
BREAK_* bits. A lease could end up being set on it later (more than
once) and we need to be able to break those leases as well.
This means that the NFSD_FILE_BREAK_* flags now just mirror
NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just
drop those flags and unconditionally call nfsd_open_break_lease every
time.
Reported-by: Olga Kornieskaia <kolga@netapp.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360 Fixes: 65294c1f2c5e (nfsd: add a new struct file caching facility to nfsd) Cc: <stable@vger.kernel.org> # 5.4.x : bb283ca18d1e NFSD: Clean up the show_nf_flags() macro Cc: <stable@vger.kernel.org> # 5.4.x Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Merge tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix SIGSEGV when processing syscall args in perf.data files in 'perf
trace'
- Sync kvm, msr-index and cpufeatures headers with the kernel sources
- Fix 'convert perf time to TSC' 'perf test':
- No need to open events twice
- Fix finding correct event on hybrid systems
* tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf trace: Fix SIGSEGV when processing syscall args
perf tests: Fix Convert perf time to TSC test for hybrid
perf tests: Stop Convert perf time to TSC test opening events twice
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools headers cpufeatures: Sync with the kernel sources
tools headers UAPI: Sync linux/kvm.h with the kernel sources
Matthew Auld [Tue, 12 Jul 2022 17:40:50 +0000 (18:40 +0100)]
drm/i915/ttm: fix 32b build
Since segment_pages is no longer a compile time constant, it looks the
DIV_ROUND_UP(node->size, segment_pages) breaks the 32b build. Simplest
is just to use the ULL variant, but really we should need not need more
than u32 for the page alignment (also we are limited by that due to the
sg->length type), so also make it all u32.
Merge tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:
- A single data race fix on the perf event cleanup path to avoid
endless loops due to insufficient locking
* tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix data race between perf_event_set_output() and perf_mmap_close()
Merge tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Improve the check whether the kernel supports WP mappings so that it
can accomodate a XenPV guest due to how the latter is setting up the
PAT machinery
- Now that the retbleed nightmare is public, here's the first round of
fallout fixes:
* Fix a build failure on 32-bit due to missing include
* Remove an untraining point in espfix64 return path
* other small cleanups
* tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bugs: Remove apostrophe typo
um: Add missing apply_returns()
x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt
x86/bugs: Mark retbleed_strings static
x86/pat: Fix x86_has_pat_wp()
x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit
Merge tag 'input-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- fix Goodix driver to properly behave on the Aya Neo Next
- some more sanity checks in usbtouchscreen driver
- a tweak in wm97xx driver in preparation for remove() to return void
- a clarification in input core regarding units of measurement for
resolution on touch events.
* tag 'input-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: document the units for resolution of size axes
Input: goodix - call acpi_device_fix_up_power() in some cases
Input: wm97xx - make .remove() obviously always return 0
Input: usbtouchscreen - add driver_info sanity check
perf trace: Fix SIGSEGV when processing syscall args
On powerpc, 'perf trace' is crashing with a SIGSEGV when trying to
process a perf.data file created with 'perf trace record -p':
#0 0x00000001225b8988 in syscall_arg__scnprintf_augmented_string <snip> at builtin-trace.c:1492
#1 syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1492
#2 syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1486
#3 0x00000001225bdd9c in syscall_arg_fmt__scnprintf_val <snip> at builtin-trace.c:1973
#4 syscall__scnprintf_args <snip> at builtin-trace.c:2041
#5 0x00000001225bff04 in trace__sys_enter <snip> at builtin-trace.c:2319
That points to the below code in tools/perf/builtin-trace.c:
/*
* If this is raw_syscalls.sys_enter, then it always comes with the 6 possible
* arguments, even if the syscall being handled, say "openat", uses only 4 arguments
* this breaks syscall__augmented_args() check for augmented args, as we calculate
* syscall->args_size using each syscalls:sys_enter_NAME tracefs format file,
* so when handling, say the openat syscall, we end up getting 6 args for the
* raw_syscalls:sys_enter event, when we expected just 4, we end up mistakenly
* thinking that the extra 2 u64 args are the augmented filename, so just check
* here and avoid using augmented syscalls when the evsel is the raw_syscalls one.
*/
if (evsel != trace->syscalls.events.sys_enter)
augmented_args = syscall__augmented_args(sc, sample, &augmented_args_size, trace->raw_augmented_syscalls_args_size);
As the comment points out, we should not be trying to augment the args
for raw_syscalls. However, when processing a perf.data file, we are not
initializing those properly. Fix the same.
Reported-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20220707090900.572584-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 13 Jul 2022 12:34:59 +0000 (15:34 +0300)]
perf tests: Fix Convert perf time to TSC test for hybrid
The test does not always correctly determine the number of events for
hybrids, nor allow for more than 1 evsel when parsing.
Fix by iterating the events actually created and getting the correct
evsel for the events processed.
Fixes: d9da6f70eb235110 ("perf tests: Support 'Convert perf time to TSC' test for hybrid") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20220713123459.24145-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 13 Jul 2022 12:34:58 +0000 (15:34 +0300)]
perf tests: Stop Convert perf time to TSC test opening events twice
Do not call evlist__open() twice.
Fixes: 5bb017d4b97a0f13 ("perf test: Fix error message for test case 71 on s390, where it is not supported") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20220713123459.24145-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
$
Just silences this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/YtQTm9wsB3hxQWvy@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org Link: https://lore.kernel.org/lkml/YtQM40VmiLTkPND2@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
tools headers UAPI: Sync linux/kvm.h with the kernel sources
To pick the changes in:
1b870fa5573e260b ("kvm: stats: tell userspace which values are boolean")
That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the the 'perf trace' ioctl syscall argument
beautifiers.
This is also by now used by tools/testing/selftests/kvm/, a simple test
build succeeded.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lore.kernel.org/lkml/YtQLDvQrBhJNl3n5@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'for-5.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs reverts from David Sterba:
"Due to a recent report [1] we need to revert the radix tree to xarray
conversion patches.
There's a problem with sleeping under spinlock, when xa_insert could
allocate memory under pressure. We use GFP_NOFS so this is a real
problem that we unfortunately did not discover during review.
I'm sorry to do such change at rc6 time but the revert is IMO the
safer option, there are patches to use mutex instead of the spin locks
but that would need more testing. The revert branch has been tested on
a few setups, all seem ok.
The conversion to xarray will be revisited in the future"
Link: https://lore.kernel.org/linux-btrfs/cover.1657097693.git.fdmanana@suse.com/
* tag 'for-5.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Revert "btrfs: turn delayed_nodes_tree into an XArray"
Revert "btrfs: turn name_cache radix tree into XArray in send_ctx"
Revert "btrfs: turn fs_info member buffer_radix into XArray"
Revert "btrfs: turn fs_roots_radix in btrfs_fs_info into an XArray"
Merge tag 'tty-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty and serial driver fixes from Greg KH:
"Here are some TTY and Serial driver fixes for 5.19-rc7. They resolve a
number of reported problems including:
- longtime bug in pty_write() that has been reported in the past.
- 8250 driver fixes
- new serial device ids
- vt overlapping data copy bugfix
- other tiny serial driver bugfixes
All of these have been in linux-next for a while with no reported
problems"
* tag 'tty-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: use new tty_insert_flip_string_and_push_buffer() in pty_write()
tty: extract tty_flip_buffer_commit() from tty_flip_buffer_push()
serial: 8250: dw: Fix the macro RZN1_UART_xDMACR_8_WORD_BURST
vt: fix memory overlapping when deleting chars in the buffer
serial: mvebu-uart: correctly report configured baudrate value
serial: 8250: Fix PM usage_count for console handover
serial: 8250: fix return error code in serial8250_request_std_resource()
serial: stm32: Clear prev values before setting RTS delays
tty: Add N_CAN327 line discipline ID for ELM327 based CAN driver
serial: 8250: Fix __stop_tx() & DMA Tx restart races
serial: pl011: UPSTAT_AUTORTS requires .throttle/unthrottle
tty: serial: samsung_tty: set dma burst_size to 1
serial: 8250: dw: enable using pdata with ACPI
Merge tag 's390-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Fix building of out-of-tree kernel modules without a pre-built kernel
in case CONFIG_EXPOLINE_EXTERN=y.
- Fix a reference counting error that could prevent unloading of zcrypt
modules.
* tag 's390-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ap: fix error handling in __verify_queue_reservations()
s390/nospec: remove unneeded header includes
s390/nospec: build expoline.o for modules_prepare target
Merge tag 'pm-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki
"Fix recent regression in the cpufreq mediatek driver related to
incorrect handling of regulator_get_optional() return value
(AngeloGioacchino Del Regno)"
* tag 'pm-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: mediatek: Handle sram regulator probe deferral
Merge tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix more fallout from recent changes of the ACPI CPPC handling on AMD
platforms (Mario Limonciello)"
* tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Fix enabling CPPC on AMD systems with shared memory
random: cap jitter samples per bit to factor of HZ
Currently the jitter mechanism will require two timer ticks per
iteration, and it requires N iterations per bit. This N is determined
with a small measurement, and if it's too big, it won't waste time with
jitter entropy because it'd take too long or not have sufficient entropy
anyway.
With the current max N of 32, there are large timeouts on systems with a
small CONFIG_HZ. Rather than set that maximum to 32, instead choose a
factor of CONFIG_HZ. In this case, 1/30 seems to yield sane values for
different configurations of CONFIG_HZ.
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Fixes: 78c768e619fb ("random: vary jitter iterations based on cycle counter speed") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'riscv-for-linus-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to avoid printing a warning when modules do not exercise any
errata-dependent behavior and the SiFive errata are enabled.
- A fix to the Microchip PFSOC to attach the L2 cache to the CPU nodes.
* tag 'riscv-for-linus-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: don't warn for sifive erratas in modules
riscv: dts: microchip: hook up the mpfs' l2cache
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"RISC-V:
- Fix missing PAGE_PFN_MASK
- Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
x86:
- Fix for nested virtualization when TSC scaling is active
- Estimate the size of fastcc subroutines conservatively, avoiding
disastrous underestimation when return thunks are enabled
- Avoid possible use of uninitialized fields of 'struct
kvm_lapic_irq'
Generic:
- Mark as such the boolean values available from the statistics file
descriptors
- Clarify statistics documentation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: emulate: do not adjust size of fastop and setcc subroutines
KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()
Documentation: kvm: clarify histogram units
kvm: stats: tell userspace which values are boolean
x86/kvm: fix FASTOP_SIZE when return thunks are enabled
KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1
RISC-V: KVM: Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests()
riscv: Fix missing PAGE_PFN_MASK
Merge tag 'ceph-for-5.19-rc7' of https://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
"A folio locking fixup that Xiubo and David cooperated on, marked for
stable. Most of it is in netfs but I picked it up into ceph tree on
agreement with David"
* tag 'ceph-for-5.19-rc7' of https://github.com/ceph/ceph-client:
netfs: do not unlock and put the folio twice
Merge tag 'spi-fix-v5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few driver specific fixes, none especially remarkable, plus a
MAINTAINERS file update due to the previous maintainer for the NXP
FSPI driver having left the company"
* tag 'spi-fix-v5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: cadence-quadspi: Remove spi_master_put() in probe failure path
MAINTAINERS: change the NXP FSPI driver maintainer.
spi: amd: Limit max transfer and message size
spi: aspeed: Fix division by zero
spi: aspeed: Add dev_dbg() to dump the spi-mem direct mapping descriptor
Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.
Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.
Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.
Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.
Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.
Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.
Revert the xarray conversion, there's a problem with potential
sleep-inside-spinlock [1] when calling xa_insert that triggers GFP_NOFS
allocation. The radix tree used the preloading mechanism to avoid
sleeping but this is not available in xarray.
Conversion from spin lock to mutex is possible but at time of rc6 is
riskier than a clean revert.
Merge tag 'platform-drivers-x86-v5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Highlights:
- Fix brightness key events getting reported twice on some Dells.
Regression caused by recent Panasonic hotkey fixes
- Fix poweroff no longer working on some devices regression caused
by recent poweroff handler rework
- Mark new (in 5.19) Intel IFS driver as broken, because of some
issues surrounding the userspace (sysfs) API which need to be
cleared up
- Some hardware-id / quirk additions"
* tag 'platform-drivers-x86-v5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
ACPI: video: Fix acpi_video_handles_brightness_key_presses()
platform/x86: intel_atomisp2_led: Also turn off the always-on camera LED on the Asus T100TAF
platform/x86/intel/ifs: Mark as BROKEN
platform/x86: asus-wmi: Add key mappings
efi: Fix efi_power_off() not being run before acpi_power_off() when necessary
platform/x86: x86-android-tablets: Fix Lenovo Yoga Tablet 2 830/1050 poweroff again
platform/x86: gigabyte-wmi: add support for B660I AORUS PRO DDR4
platform/x86/amd/pmc: Add new platform support
platform/x86/amd/pmc: Add new acpi id for PMC controller
Merge tag 'drm-fixes-2022-07-15' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"This is the regular fixes pull for this week. This has a bunch of
amdgpu fixes, major one reverts the buddy allocator until it can be
tested more, otherwise just small ones, then i915 has a bunch of
fixes.
The outstanding firmware regressions reported by phoronix will
hopefully be dealt with ASAP.
amdgpu:
- revert buddy allocator support for now
- DP MST blank screen fix for specific platforms
- MEC firmware check fix for GC 10.3.7
- Deep color fix for DCE
- Fix possible divide by 0
- Coverage blend mode fix
- Fix cursor only commit timestamps
i915:
- Selftest fix
- TTM fix sg_table construction
- Error return fixes
- Fix a performance regression related to waitboost
- Fix GT resets"
* tag 'drm-fixes-2022-07-15' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Ensure valid event timestamp for cursor-only commits
drm/amd/display: correct check of coverage blend mode
drm/amd/pm: Prevent divide by zero
drm/amd/display: Only use depth 36 bpp linebuffers on DCN display engines.
drm/amdkfd: correct the MEC atomic support firmware checking for GC 10.3.7
drm/amd/display: Ignore First MST Sideband Message Return Error
drm/i915/selftests: fix subtraction overflow bug
drm/i915/gem: Look for waitboosting across the whole object prior to individual waits
drm/i915/gt: Serialize TLB invalidates with GT resets
drm/i915/gt: Serialize GRDOM access between multiple engine resets
drm/i915/ttm: fix sg_table construction
drm/i915/selftests: fix a couple IS_ERR() vs NULL tests
drm/i915: Fix vm use-after-free in vma destruction
drm/i915/guc: ADL-N should use the same GuC FW as ADL-S
drm/i915: fix a possible refcount leak in intel_dp_add_mst_connector()
drm/i915/gvt: IS_ERR() vs NULL bug in intel_gvt_update_reg_whitelist()
Revert "drm/amdgpu: add drm buddy support to amdgpu"
Merge tag 'sysctl-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pyll sysctl fix from Luis Chamberlain:
"Only one fix for sysctl"
* tag 'sysctl-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE
Paolo Bonzini [Fri, 15 Jul 2022 11:34:55 +0000 (07:34 -0400)]
KVM: emulate: do not adjust size of fastop and setcc subroutines
Instead of doing complicated calculations to find the size of the subroutines
(which are even more complicated because they need to be stringified into
an asm statement), just hardcode to 16.
It is less dense for a few combinations of IBT/SLS/retbleed, but it has
the advantage of being really simple.
Cc: stable@vger.kernel.org # 5.15.x: 84e7051c0bc1: x86/kvm: fix FASTOP_SIZE when return thunks are enabled Cc: stable@vger.kernel.org Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>