Linus Torvalds [Wed, 11 Nov 2020 22:15:06 +0000 (14:15 -0800)]
Merge branch 'stable/for-linus-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fixes from Konrad Rzeszutek Wilk:
"Two tiny fixes for issues that make drivers under Xen unhappy under
certain conditions"
* 'stable/for-linus-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_single
swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"
Linus Torvalds [Tue, 10 Nov 2020 18:33:55 +0000 (10:33 -0800)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull core dump fix from Al Viro:
"Fix for multithreaded coredump playing fast and loose with getting
registers of secondary threads; if a secondary gets caught in the
middle of exit(2), the conditition it will be stopped in for dumper to
examine might be unusual enough for things to go wrong.
Quite a few architectures are fine with that, but some are not."
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
don't dump the threads that had been already exiting when zapped.
Linus Torvalds [Tue, 10 Nov 2020 18:07:15 +0000 (10:07 -0800)]
Merge tag 'for-5.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A handful of minor fixes and updates:
- handle missing device replace item on mount (syzbot report)
- fix space reservation calculation when finishing relocation
- fix memory leak on error path in ref-verify (debugging feature)
- fix potential overflow during defrag on 32bit arches
- minor code update to silence smatch warning
- minor error message updates"
* tag 'for-5.10-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: ref-verify: fix memory leak in btrfs_ref_tree_mod
btrfs: dev-replace: fail mount if we don't have replace item with target device
btrfs: scrub: update message regarding read-only status
btrfs: clean up NULL checks in qgroup_unreserve_range()
btrfs: fix min reserved size calculation in merge_reloc_root
btrfs: print the block rsv type when we fail our reservation
btrfs: fix potential overflow in cluster_pages_for_defrag on 32bit arch
Linus Torvalds [Tue, 10 Nov 2020 18:05:37 +0000 (10:05 -0800)]
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fscrypt fix from Eric Biggers:
"Fix a regression where a new WARN_ON() was reachable when using
FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32 on ext4, causing xfstest
generic/602 to sometimes fail on ext4"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: remove reachable WARN in fscrypt_setup_iv_ino_lblk_32_key()
Linus Torvalds [Tue, 10 Nov 2020 18:02:31 +0000 (10:02 -0800)]
Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown:
"Update update to version 20.09.30, one kernel side fix"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: update version number
powercap: restrict energy meter to root access
tools/power turbostat: harden against cpu hotplug
tools/power turbostat: adjust for temperature offset
tools/power turbostat: Build with _FILE_OFFSET_BITS=64
tools/power turbostat: Support AMD Family 19h
tools/power turbostat: Remove empty columns for Jacobsville
tools/power turbostat: Add a new GFXAMHz column that exposes gt_act_freq_mhz.
tools/power x86_energy_perf_policy: Input/output error in a VM
tools/power turbostat: Skip pc8, pc9, pc10 columns, if they are disabled
tools/power turbostat: Support additional CPU model numbers
tools/power turbostat: Fix output formatting for ACPI CST enumeration
tools/power turbostat: Replace HTTP links with HTTPS ones: TURBOSTAT UTILITY
tools/power turbostat: Use sched_getcpu() instead of hardcoded cpu 0
tools/power turbostat: Enable accumulate RAPL display
tools/power turbostat: Introduce functions to accumulate RAPL consumption
tools/power turbostat: Make the energy variable to be 64 bit
tools/power turbostat: Always print idle in the system configuration header
tools/power turbostat: Print /dev/cpu_dma_latency
Len Brown [Tue, 10 Nov 2020 21:00:00 +0000 (13:00 -0800)]
powercap: restrict energy meter to root access
Remove non-privileged user access to power data contained in
/sys/class/powercap/intel-rapl*/*/energy_uj
Non-privileged users currently have read access to power data and can
use this data to form a security attack. Some privileged
drivers/applications need read access to this data, but don't expose it
to non-privileged users.
For example, thermald uses this data to ensure that power management
works correctly. Thus removing non-privileged access is preferred over
completely disabling this power reporting capability with
CONFIG_INTEL_RAPL=n.
Fixes: 95677a9a3847 ("PowerCap: Fix mode for energy counter") Signed-off-by: Len Brown <len.brown@intel.com> Cc: stable@vger.kernel.org
Linus Torvalds [Mon, 9 Nov 2020 21:58:10 +0000 (13:58 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- fix compilation error when PMD and PUD are folded
- fix regression in reads-as-zero behaviour of ID_AA64ZFR0_EL1
- add aarch64 get-reg-list test
x86:
- fix semantic conflict between two series merged for 5.10
- fix (and test) enforcement of paravirtual cpuid features
selftests:
- various cleanups to memory management selftests
- new selftests testcase for performance of dirty logging"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (30 commits)
KVM: selftests: allow two iterations of dirty_log_perf_test
KVM: selftests: Introduce the dirty log perf test
KVM: selftests: Make the number of vcpus global
KVM: selftests: Make the per vcpu memory size global
KVM: selftests: Drop pointless vm_create wrapper
KVM: selftests: Add wrfract to common guest code
KVM: selftests: Simplify demand_paging_test with timespec_diff_now
KVM: selftests: Remove address rounding in guest code
KVM: selftests: Factor code out of demand_paging_test
KVM: selftests: Use a single binary for dirty/clear log test
KVM: selftests: Always clear dirty bitmap after iteration
KVM: selftests: Add blessed SVE registers to get-reg-list
KVM: selftests: Add aarch64 get-reg-list test
selftests: kvm: test enforcement of paravirtual cpuid features
selftests: kvm: Add exception handling to selftests
selftests: kvm: Clear uc so UCALL_NONE is being properly reported
selftests: kvm: Fix the segment descriptor layout to match the actual layout
KVM: x86: handle MSR_IA32_DEBUGCTLMSR with report_ignored_msrs
kvm: x86: request masterclock update any time guest uses different msr
kvm: x86: ensure pv_cpuid.features is initialized when enabling cap
...
Linus Torvalds [Mon, 9 Nov 2020 20:43:12 +0000 (12:43 -0800)]
Merge tag 'nfsd-5.10-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
"This is mainly server-to-server copy and fallout from Chuck's 5.10 rpc
refactoring"
* tag 'nfsd-5.10-1' of git://linux-nfs.org/~bfields/linux:
net/sunrpc: fix useless comparison in proc_do_xprt()
net/sunrpc: return 0 on attempt to write to "transports"
NFSD: fix missing refcount in nfsd4_copy by nfsd4_do_async_copy
NFSD: Fix use-after-free warning when doing inter-server copy
NFSD: MKNOD should return NFSERR_BADTYPE instead of NFSERR_INVAL
SUNRPC: Fix general protection fault in trace_rpc_xdr_overflow()
NFSD: NFSv3 PATHCONF Reply is improperly formed
Linus Torvalds [Mon, 9 Nov 2020 20:36:58 +0000 (12:36 -0800)]
Merge tag 'ext4_for_linus_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes and cleanups from Ted Ts'o:
"More fixes and cleanups for the new fast_commit features, but also a
few other miscellaneous bug fixes and a cleanup for the MAINTAINERS
file"
* tag 'ext4_for_linus_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (28 commits)
jbd2: fix up sparse warnings in checkpoint code
ext4: fix sparse warnings in fast_commit code
ext4: cleanup fast commit mount options
jbd2: don't start fast commit on aborted journal
ext4: make s_mount_flags modifications atomic
ext4: issue fsdev cache flush before starting fast commit
ext4: disable fast commit with data journalling
ext4: fix inode dirty check in case of fast commits
ext4: remove unnecessary fast commit calls from ext4_file_mmap
ext4: mark buf dirty before submitting fast commit buffer
ext4: fix code documentatioon
ext4: dedpulicate the code to wait on inode that's being committed
jbd2: don't read journal->j_commit_sequence without taking a lock
jbd2: don't touch buffer state until it is filled
jbd2: add todo for a fast commit performance optimization
jbd2: don't pass tid to jbd2_fc_end_commit_fallback()
jbd2: don't use state lock during commit path
jbd2: drop jbd2_fc_init documentation
ext4: clean up the JBD2 API that initializes fast commits
jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufs
...
Linus Torvalds [Mon, 9 Nov 2020 20:23:01 +0000 (12:23 -0800)]
Merge tag 'erofs-for-5.10-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"A week ago, Vladimir reported an issue that the kernel log would
become polluted if the page allocation debug option is enabled. I also
found this when I cleaned up magical page->mapping and originally
planned to submit these all for 5.11 but it seems the impact can be
noticed so submit the fix in advance.
In addition, nl6720 also reported that atime is empty although EROFS
has the only one on-disk timestamp as a practical consideration for
now but it's better to derive it as what we did for the other
timestamps.
Summary:
- fix setting up pcluster improperly for temporary pages
- derive atime instead of leaving it empty"
* tag 'erofs-for-5.10-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix setting up pcluster for temporary pages
erofs: derive atime instead of leaving it empty
Paolo Bonzini [Mon, 9 Nov 2020 14:45:17 +0000 (09:45 -0500)]
KVM: selftests: allow two iterations of dirty_log_perf_test
Even though one iteration is not enough for the dirty log performance
test (due to the cost of building page tables, zeroing memory etc.)
two is okay and it is the default. Without this patch,
"./dirty_log_perf_test" without any further arguments fails.
Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Dan Carpenter [Fri, 6 Nov 2020 20:50:39 +0000 (15:50 -0500)]
net/sunrpc: fix useless comparison in proc_do_xprt()
In the original code, the "if (*lenp < 0)" check didn't work because
"*lenp" is unsigned. Fortunately, the memory_read_from_buffer() call
will never fail in this context so it doesn't affect runtime.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Linus Torvalds [Sun, 8 Nov 2020 19:30:25 +0000 (11:30 -0800)]
Merge tag 'driver-core-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core documentation fixes from Greg KH:
"Some small Documentation fixes that were fallout from the larger
documentation update we did in 5.10-rc2.
Nothing major here at all, but all of these have been in linux-next
and resolve build warnings when building the documentation files"
* tag 'driver-core-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Documentation: remove mic/index from misc-devices/index.rst
scripts: get_api.pl: Add sub-titles to ABI output
scripts: get_abi.pl: Don't let ABI files to create subtitles
docs: leds: index.rst: add a missing file
docs: ABI: sysfs-class-net: fix a typo
docs: ABI: sysfs-driver-dma-ioatdma: what starts with /sys
Linus Torvalds [Sun, 8 Nov 2020 19:28:08 +0000 (11:28 -0800)]
Merge tag 'tty-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are a small number of small tty and serial fixes for some
reported problems for the tty core, vt code, and some serial drivers.
They include fixes for:
- a buggy and obsolete vt font ioctl removal
- 8250_mtk serial baudrate runtime warnings
- imx serial earlycon build configuration fix
- txx9 serial driver error path cleanup issues
- tty core fix in release_tty that can be triggered by trying to bind
an invalid serial port name to a speakup console device
Almost all of these have been in linux-next without any problems, the
only one that hasn't, just deletes code :)"
* tag 'tty-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vt: Disable KD_FONT_OP_COPY
tty: fix crash in release_tty if tty->port is not set
serial: txx9: add missing platform_driver_unregister() on error in serial_txx9_init
tty: serial: imx: enable earlycon by default if IMX_SERIAL_CONSOLE is enabled
serial: 8250_mtk: Fix uart_get_baud_rate warning
Daniel Vetter [Sun, 8 Nov 2020 15:38:06 +0000 (16:38 +0100)]
vt: Disable KD_FONT_OP_COPY
It's buggy:
On Fri, Nov 06, 2020 at 10:30:08PM +0800, Minh Yuan wrote:
> We recently discovered a slab-out-of-bounds read in fbcon in the latest
> kernel ( v5.10-rc2 for now ). The root cause of this vulnerability is that
> "fbcon_do_set_font" did not handle "vc->vc_font.data" and
> "vc->vc_font.height" correctly, and the patch
> <https://lkml.org/lkml/2020/9/27/223> for VT_RESIZEX can't handle this
> issue.
>
> Specifically, we use KD_FONT_OP_SET to set a small font.data for tty6, and
> use KD_FONT_OP_SET again to set a large font.height for tty1. After that,
> we use KD_FONT_OP_COPY to assign tty6's vc_font.data to tty1's vc_font.data
> in "fbcon_do_set_font", while tty1 retains the original larger
> height. Obviously, this will cause an out-of-bounds read, because we can
> access a smaller vc_font.data with a larger vc_font.height.
Further there was only one user ever.
- Android's loadfont, busybox and console-tools only ever use OP_GET
and OP_SET
- fbset documentation only mentions the kernel cmdline font: option,
not anything else.
- systemd used OP_COPY before release 232 published in Nov 2016
Now unfortunately the crucial report seems to have gone down with
gmane, and the commit message doesn't say much. But the pull request
hints at OP_COPY being broken
https://github.com/systemd/systemd/pull/3651
So in other words, this never worked, and the only project which
foolishly every tried to use it, realized that rather quickly too.
Instead of trying to fix security issues here on dead code by adding
missing checks, fix the entire thing by removing the functionality.
Note that systemd code using the OP_COPY function ignored the return
value, so it doesn't matter what we're doing here really - just in
case a lone server somewhere happens to be extremely unlucky and
running an affected old version of systemd. The relevant code from
font_copy_to_all_vcs() in systemd was:
/* copy font from active VT, where the font was uploaded to */
cfo.op = KD_FONT_OP_COPY;
cfo.height = vcs.v_active-1; /* tty1 == index 0 */
(void) ioctl(vcfd, KDFONTOP, &cfo);
Note this just disables the ioctl, garbage collecting the now unused
callbacks is left for -next.
v2: Tetsuo found the old mail, which allowed me to find it on another
archive. Add the link too.
Linus Torvalds [Sun, 8 Nov 2020 18:23:07 +0000 (10:23 -0800)]
Merge tag 'xfs-5.10-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
- Fix an uninitialized struct problem
- Fix an iomap problem zeroing unwritten EOF blocks
- Fix some clumsy error handling when writeback fails on filesystems
with blocksize < pagesize
- Fix a retry loop not resetting loop variables properly
- Fix scrub flagging rtinherit inodes on a non-rt fs, since the kernel
actually does permit that combination
- Fix excessive page cache flushing when unsharing part of a file
* tag 'xfs-5.10-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: only flush the unshared range in xfs_reflink_unshare
xfs: fix scrub flagging rtinherit even if there is no rt device
xfs: fix missing CoW blocks writeback conversion retry
iomap: clean up writeback state logic on writepage error
iomap: support partial page discard on writeback block mapping failure
xfs: flush new eof page on truncate to avoid post-eof corruption
xfs: set xefi_discard when creating a deferred agfl free log intent item
Linus Torvalds [Sun, 8 Nov 2020 18:11:31 +0000 (10:11 -0800)]
Merge branch 'hch' (patches from Christoph)
Merge procfs splice read fixes from Christoph Hellwig:
"Greg reported a problem due to the fact that Android tests use procfs
files to test splice, which stopped working with the changes for
set_fs() removal.
This series adds read_iter support for seq_file, and uses those for
various proc files using seq_file to restore splice read support"
[ Side note: Christoph initially had a scripted "move everything over"
patch, which looks fine, but I personally would prefer us to actively
discourage splice() on random files. So this does just the minimal
basic core set of proc file op conversions.
For completeness, and in case people care, that script was
sed -i -e 's/\.proc_read\(\s*=\s*\)seq_read/\.proc_read_iter\1seq_read_iter/g'
but I'll wait and see if somebody has a strong argument for using
splice on random small /proc files before I'd run it on the whole
kernel. - Linus ]
* emailed patches from Christoph Hellwig <hch@lst.de>:
proc "seq files": switch to ->read_iter
proc "single files": switch to ->read_iter
proc/stat: switch to ->read_iter
proc/cpuinfo: switch to ->read_iter
proc: wire up generic_file_splice_read for iter ops
seq_file: add seq_read_iter
Linus Torvalds [Sun, 8 Nov 2020 18:09:36 +0000 (10:09 -0800)]
Merge tag 'x86-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes:
- Use SYM_FUNC_START_WEAK in the mem* ASM functions instead of a
combination of .weak and SYM_FUNC_START_LOCAL which makes LLVMs
integrated assembler upset
- Correct the mitigation selection logic which prevented the related
prctl to work correctly
- Make the UV5 hubless system work correctly by fixing up the
malformed table entries and adding the missing ones"
* tag 'x86-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/uv: Recognize UV5 hubless system identifier
x86/platform/uv: Remove spaces from OEM IDs
x86/platform/uv: Fix missing OEM_TABLE_ID
x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
x86/lib: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S
Linus Torvalds [Sun, 8 Nov 2020 18:05:10 +0000 (10:05 -0800)]
Merge tag 'perf-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
"A single fix for the perf core plugging a memory leak in the address
filter parser"
* tag 'perf-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix a memory leak in perf_event_parse_addr_filter()
Linus Torvalds [Sun, 8 Nov 2020 17:56:37 +0000 (09:56 -0800)]
Merge tag 'locking-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex fix from Thomas Gleixner:
"A single fix for the futex code where an intermediate state in the
underlying RT mutex was not handled correctly and triggering a BUG()
instead of treating it as another variant of retry condition"
* tag 'locking-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Handle transient "ownerless" rtmutex state correctly
Linus Torvalds [Sun, 8 Nov 2020 17:52:57 +0000 (09:52 -0800)]
Merge tag 'irq-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for interrupt chip drivers:
- Fix the fallout of the IPI as interrupt conversion in Kconfig and
the BCM2836 interrupt chip driver
- Fixes for interrupt affinity setting and the handling of
hierarchical irq domains in the SiFive PLIC driver
- Make the unmapped event handling in the TI SCI driver work
correctly
- A few minor fixes and cleanups in various chip drivers and Kconfig"
* tag 'irq-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
dt-bindings: irqchip: ti, sci-inta: Fix diagram indentation for unmapped events
irqchip/ti-sci-inta: Add support for unmapped event handling
dt-bindings: irqchip: ti, sci-inta: Update for unmapped event handling
irqchip/renesas-intc-irqpin: Merge irlm_bit and needs_irlm
irqchip/sifive-plic: Fix chip_data access within a hierarchy
irqchip/sifive-plic: Fix broken irq_set_affinity() callback
irqchip/stm32-exti: Add all LP timer exti direct events support
irqchip/bcm2836: Fix missing __init annotation
irqchip/mips: Drop selection of IRQ_DOMAIN_HIERARCHY
irqchip/mst: Make mst_intc_of_init static
irqchip/mst: MST_IRQ should depend on ARCH_MEDIATEK or ARCH_MSTARV7
genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
Linus Torvalds [Sun, 8 Nov 2020 17:51:28 +0000 (09:51 -0800)]
Merge tag 'core-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull entry code fix from Thomas Gleixner:
"A single fix for the generic entry code to correct the wrong
assumption that the lockdep interrupt state needs not to be
established before calling the RCU check"
* tag 'core-urgent-2020-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Fix the incorrect ordering of lockdep and RCU check
Linus Torvalds [Sun, 8 Nov 2020 17:37:20 +0000 (09:37 -0800)]
Merge tag 'powerpc-5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- fix miscompilation with GCC 4.9 by using asm_goto_volatile for put_user()
- fix for an RCU splat at boot caused by a recent lockdep change
- fix for a possible deadlock in our EEH debugfs code
- several fixes for handling of _PAGE_ACCESSED on 32-bit platforms
- build fix when CONFIG_NUMA=n
Thanks to Andreas Schwab, Christophe Leroy, Oliver O'Halloran, Qian Cai,
and Scott Cheloha.
* tag 'powerpc-5.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/numa: Fix build when CONFIG_NUMA=n
powerpc/8xx: Manage _PAGE_ACCESSED through APG bits in L1 entry
powerpc/8xx: Always fault when _PAGE_ACCESSED is not set
powerpc/40x: Always fault when _PAGE_ACCESSED is not set
powerpc/603: Always fault when _PAGE_ACCESSED is not set
powerpc: Use asm_goto_volatile for put_user()
powerpc/smp: Call rcu_cpu_starting() earlier
powerpc/eeh_cache: Fix a possible debugfs deadlock
Ben Gardon [Tue, 27 Oct 2020 23:37:33 +0000 (16:37 -0700)]
KVM: selftests: Introduce the dirty log perf test
The dirty log perf test will time verious dirty logging operations
(enabling dirty logging, dirtying memory, getting the dirty log,
clearing the dirty log, and disabling dirty logging) in order to
quantify dirty logging performance. This test can be used to inform
future performance improvements to KVM's dirty logging infrastructure.
This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201027233733.1484855-6-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Andrew Jones [Wed, 4 Nov 2020 21:23:53 +0000 (22:23 +0100)]
KVM: selftests: Make the number of vcpus global
We also check the input number of vcpus against the maximum supported.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201104212357.171559-8-drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Andrew Jones [Wed, 4 Nov 2020 21:23:52 +0000 (22:23 +0100)]
KVM: selftests: Make the per vcpu memory size global
Rename vcpu_memory_bytes to something with "percpu" in it
in order to be less ambiguous. Also make it global to
simplify things.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201104212357.171559-7-drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Andrew Jones [Wed, 4 Nov 2020 21:23:48 +0000 (22:23 +0100)]
KVM: selftests: Drop pointless vm_create wrapper
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201104212357.171559-3-drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ben Gardon [Tue, 27 Oct 2020 23:37:32 +0000 (16:37 -0700)]
KVM: selftests: Add wrfract to common guest code
Wrfract will be used by the dirty logging perf test introduced later in
this series to dirty memory sparsely.
This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201027233733.1484855-5-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ben Gardon [Tue, 27 Oct 2020 23:37:31 +0000 (16:37 -0700)]
KVM: selftests: Simplify demand_paging_test with timespec_diff_now
Add a helper function to get the current time and return the time since
a given start time. Use that function to simplify the timekeeping in the
demand paging test.
This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201027233733.1484855-4-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ben Gardon [Tue, 27 Oct 2020 23:37:30 +0000 (16:37 -0700)]
KVM: selftests: Remove address rounding in guest code
Rounding the address the guest writes to a host page boundary
will only have an effect if the host page size is larger than the guest
page size, but in that case the guest write would still go to the same
host page. There's no reason to round the address down, so remove the
rounding to simplify the demand paging test.
This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201027233733.1484855-3-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Ben Gardon [Tue, 27 Oct 2020 23:37:29 +0000 (16:37 -0700)]
KVM: selftests: Factor code out of demand_paging_test
Much of the code in demand_paging_test can be reused by other, similar
multi-vCPU-memory-touching-perfromance-tests. Factor that common code
out for reuse.
No functional change expected.
This series was tested by running the following invocations on an Intel
Skylake machine:
dirty_log_perf_test -b 20m -i 100 -v 64
dirty_log_perf_test -b 20g -i 5 -v 4
dirty_log_perf_test -b 4g -i 5 -v 32
demand_paging_test -b 20m -v 64
demand_paging_test -b 20g -v 4
demand_paging_test -b 4g -v 32
All behaved as expected.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201027233733.1484855-2-bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Xu [Thu, 1 Oct 2020 01:22:33 +0000 (21:22 -0400)]
KVM: selftests: Use a single binary for dirty/clear log test
Remove the clear_dirty_log test, instead merge it into the existing
dirty_log_test. It should be cleaner to use this single binary to do
both tests, also it's a preparation for the upcoming dirty ring test.
The default behavior will run all the modes in sequence.
Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012233.6013-1-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Xu [Thu, 1 Oct 2020 01:22:28 +0000 (21:22 -0400)]
KVM: selftests: Always clear dirty bitmap after iteration
We used not to clear the dirty bitmap before because KVM_GET_DIRTY_LOG
would overwrite it the next time it copies the dirty log onto it.
In the upcoming dirty ring tests we'll start to fetch dirty pages from
a ring buffer, so no one is going to clear the dirty bitmap for us.
Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012228.5916-1-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Andrew Jones [Thu, 29 Oct 2020 20:17:03 +0000 (21:17 +0100)]
KVM: selftests: Add blessed SVE registers to get-reg-list
Add support for the SVE registers to get-reg-list and create a
new test, get-reg-list-sve, which tests them when running on a
machine with SVE support.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201029201703.102716-5-drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Andrew Jones [Thu, 29 Oct 2020 20:17:01 +0000 (21:17 +0100)]
KVM: selftests: Add aarch64 get-reg-list test
Check for KVM_GET_REG_LIST regressions. The blessed list was
created by running on v4.15 with the --core-reg-fixup option.
The following script was also used in order to annotate system
registers with their names when possible. When new system
registers are added the names can just be added manually using
the same grep.
while read reg; do
if [[ ! $reg =~ ARM64_SYS_REG ]]; then
printf "\t$reg\n"
continue
fi
encoding=$(echo "$reg" | sed "s/ARM64_SYS_REG(//;s/),//")
if ! name=$(grep "$encoding" ../../../../arch/arm64/include/asm/sysreg.h); then
printf "\t$reg\n"
continue
fi
name=$(echo "$name" | sed "s/.*SYS_//;s/[\t ]*sys_reg($encoding)$//")
printf "\t$reg\t/* $name */\n"
done < <(aarch64/get-reg-list --core-reg-fixup --list)
Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20201029201703.102716-3-drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Oliver Upton [Tue, 27 Oct 2020 23:10:44 +0000 (16:10 -0700)]
selftests: kvm: test enforcement of paravirtual cpuid features
Add a set of tests that ensure the guest cannot access paravirtual msrs
and hypercalls that have been disabled in the KVM_CPUID_FEATURES leaf.
Expect a #GP in the case of msr accesses and -KVM_ENOSYS from
hypercalls.
Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Aaron Lewis <aaronlewis@google.com>
Message-Id: <20201027231044.655110-7-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Aaron Lewis [Mon, 12 Oct 2020 19:47:15 +0000 (12:47 -0700)]
selftests: kvm: Add exception handling to selftests
Add the infrastructure needed to enable exception handling in selftests.
This allows any of the exception and interrupt vectors to be overridden
in the guest.
Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Alexander Graf <graf@amazon.com>
Message-Id: <20201012194716.3950330-4-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Aaron Lewis [Mon, 12 Oct 2020 19:47:14 +0000 (12:47 -0700)]
selftests: kvm: Clear uc so UCALL_NONE is being properly reported
Ensure the out value 'uc' in get_ucall() is properly reporting
UCALL_NONE if the call fails. The return value will be correctly
reported, however, the out parameter 'uc' will not be. Clear the struct
to ensure the correct value is being reported in the out parameter.
Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Alexander Graf <graf@amazon.com>
Message-Id: <20201012194716.3950330-3-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Aaron Lewis [Mon, 12 Oct 2020 19:47:13 +0000 (12:47 -0700)]
selftests: kvm: Fix the segment descriptor layout to match the actual layout
Fix the layout of 'struct desc64' to match the layout described in the
SDM Vol 3, Chapter 3 "Protected-Mode Memory Management", section 3.4.5
"Segment Descriptors", Figure 3-8 "Segment Descriptor". The test added
later in this series relies on this and crashes if this layout is not
correct.
Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Alexander Graf <graf@amazon.com>
Message-Id: <20201012194716.3950330-2-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pankaj Gupta [Thu, 5 Nov 2020 15:39:32 +0000 (16:39 +0100)]
KVM: x86: handle MSR_IA32_DEBUGCTLMSR with report_ignored_msrs
Windows2016 guest tries to enable LBR by setting the corresponding bits
in MSR_IA32_DEBUGCTLMSR. KVM does not emulate MSR_IA32_DEBUGCTLMSR and
spams the host kernel logs with error messages like:
Oliver Upton [Tue, 27 Oct 2020 23:10:43 +0000 (16:10 -0700)]
kvm: x86: request masterclock update any time guest uses different msr
Commit 5b9bb0ebbcdc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME)
emulation in helper fn", 2020-10-21) subtly changed the behavior of guest
writes to MSR_KVM_SYSTEM_TIME(_NEW). Restore the previous behavior; update
the masterclock any time the guest uses a different msr than before.
Fixes: 5b9bb0ebbcdc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21) Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-6-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Oliver Upton [Tue, 27 Oct 2020 23:10:42 +0000 (16:10 -0700)]
kvm: x86: ensure pv_cpuid.features is initialized when enabling cap
Make the paravirtual cpuid enforcement mechanism idempotent to ioctl()
ordering by updating pv_cpuid.features whenever userspace requests the
capability. Extract this update out of kvm_update_cpuid_runtime() into a
new helper function and move its other call site into
kvm_vcpu_after_set_cpuid() where it more likely belongs.
Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-5-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Oliver Upton [Tue, 27 Oct 2020 23:10:41 +0000 (16:10 -0700)]
kvm: x86: reads of restricted pv msrs should also result in #GP
commit 66570e966dd9 ("kvm: x86: only provide PV features if enabled in
guest's CPUID") only protects against disallowed guest writes to KVM
paravirtual msrs, leaving msr reads unchecked. Fix this by enforcing
KVM_CPUID_FEATURES for msr reads as well.
Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID") Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-4-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Maxim Levitsky [Sun, 1 Nov 2020 11:55:23 +0000 (13:55 +0200)]
KVM: x86: use positive error values for msr emulation that causes #GP
Recent introduction of the userspace msr filtering added code that uses
negative error codes for cases that result in either #GP delivery to
the guest, or handled by the userspace msr filtering.
This breaks an assumption that a negative error code returned from the
msr emulation code is a semi-fatal error which should be returned
to userspace via KVM_RUN ioctl and usually kill the guest.
Fix this by reusing the already existing KVM_MSR_RET_INVALID error code,
and by adding a new KVM_MSR_RET_FILTERED error code for the
userspace filtered msrs.
Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr emulation to userspace") Reported-by: Qian Cai <cai@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201101115523.115780-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Li RongQing [Sun, 27 Sep 2020 08:44:57 +0000 (16:44 +0800)]
KVM: x86/mmu: fix counting of rmap entries in pte_list_add
Fix an off-by-one style bug in pte_list_add() where it failed to
account the last full set of SPTEs, i.e. when desc->sptes is full
and desc->more is NULL.
Merge the two "PTE_LIST_EXT-1" checks as part of the fix to avoid
an extra comparison.
Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <1601196297-24104-1-git-send-email-lirongqing@baidu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Sat, 7 Nov 2020 21:56:07 +0000 (13:56 -0800)]
Merge tag 'block-5.10-2020-11-07' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request from Christoph:
- revert a nvme_queue size optimization (Keith Bush)
- fabrics timeout races fixes (Chao Leng and Sagi Grimberg)"
- null_blk zone locking fix (Damien)
* tag 'block-5.10-2020-11-07' of git://git.kernel.dk/linux-block:
null_blk: Fix scheduling in atomic with zoned mode
nvme-tcp: avoid repeated request completion
nvme-rdma: avoid repeated request completion
nvme-tcp: avoid race between time out and tear down
nvme-rdma: avoid race between time out and tear down
nvme: introduce nvme_sync_io_queues
Revert "nvme-pci: remove last_sq_tail"
Linus Torvalds [Sat, 7 Nov 2020 21:49:24 +0000 (13:49 -0800)]
Merge tag 'io_uring-5.10-2020-11-07' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A set of fixes for io_uring:
- SQPOLL cancelation fixes
- Two fixes for the io_identity COW
- Cancelation overflow fix (Pavel)
- Drain request cancelation fix (Pavel)
- Link timeout race fix (Pavel)"
* tag 'io_uring-5.10-2020-11-07' of git://git.kernel.dk/linux-block:
io_uring: fix link lookup racing with link timeout
io_uring: use correct pointer for io_uring_show_cred()
io_uring: don't forget to task-cancel drained reqs
io_uring: fix overflowed cancel w/ linked ->files
io_uring: drop req/tctx io_identity separately
io_uring: ensure consistent view of original task ->mm from SQPOLL
io_uring: properly handle SQPOLL request cancelations
io-wq: cancel request if it's asking for files and we don't have them
Mike Galbraith [Wed, 4 Nov 2020 15:12:44 +0000 (16:12 +0100)]
futex: Handle transient "ownerless" rtmutex state correctly
Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
This is one possible chain of events leading to this:
Task Prio Operation
T1 120 lock(F)
T2 120 lock(F) -> blocks (top waiter)
T3 50 (RT) lock(F) -> boosts T1 and blocks (new top waiter)
XX timeout/ -> wakes T2
signal
T1 50 unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
T2 120 cleanup -> try_to_take_mutex() fails because T3 is the top waiter
and the lower priority T2 cannot steal the lock.
-> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()
The comment states that this is invalid and rt_mutex_real_owner() must
return a non NULL owner when the trylock failed, but in case of a queued
and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
state. The higher priority waiter has simply not yet managed to take over
the rtmutex.
The BUG_ON() is therefore wrong and this is just another retry condition in
fixup_pi_state_owner().
Drop the locks, so that T3 can make progress, and then try the fixup again.
Gratian provided a great analysis, traces and a reproducer. The analysis is
to the point, but it confused the hell out of that tglx dude who had to
page in all the futex horrors again. Condensed version is above.
Linus Torvalds [Sat, 7 Nov 2020 19:24:03 +0000 (11:24 -0800)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Driver bugfixes for I2C.
Most of them are for the new mlxbf driver which got more exposure
after rc1. The sh_mobile patch should already have reached you during
the merge window, but I accidently dropped it. However, since it fixes
a problem with rebooting, it is still fine for rc3"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: designware: slave should do WRITE_REQUESTED before WRITE_RECEIVED
i2c: designware: call i2c_dw_read_clear_intrbits_slave() once
i2c: mlxbf: I2C_MLXBF should depend on MELLANOX_PLATFORM
i2c: mlxbf: Update author and maintainer email info
i2c: mlxbf: Update reference clock frequency
i2c: mlxbf: Remove unecessary wrapper functions
i2c: mlxbf: Fix resrticted cast warning of sparse
i2c: mlxbf: Add CONFIG_ACPI to guard ACPI function call
i2c: sh_mobile: implement atomic transfers
i2c: mediatek: move dma reset before i2c reset
Linus Torvalds [Sat, 7 Nov 2020 19:16:37 +0000 (11:16 -0800)]
Merge tag 'riscv-for-linus-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- SPDX comment style fix
- ignore memory that is unusable
- avoid setting a kernel text offset for the !MMU kernels, where
skipping the first page of memory is both unnecessary and costly
- avoid passing the flag bits in satp to pfn_to_virt()
- fix __put_kernel_nofault, where we had the arguments to
__put_user_nocheck reversed
- workaround for a bug in the FU540 to avoid triggering PMP issues
during early boot
- change to how we pull symbols out of the vDSO. The old mechanism was
removed from binutils-2.35 (and has been backported to Debian's 2.34)
* tag 'riscv-for-linus-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: Fix the VDSO symbol generaton for binutils-2.35+
RISC-V: Use non-PGD mappings for early DTB access
riscv: uaccess: fix __put_kernel_nofault()
riscv: fix pfn_to_virt err in do_page_fault().
riscv: Set text_offset correctly for M-Mode
RISC-V: Remove any memblock representing unusable memory area
risc-v: kernel: ftrace: Fixes improper SPDX comment style
Mike Travis [Thu, 5 Nov 2020 22:27:40 +0000 (16:27 -0600)]
x86/platform/uv: Remove spaces from OEM IDs
Testing shows that trailing spaces caused problems with the OEM_ID and
the OEM_TABLE_ID. One being that the OEM_ID would not string compare
correctly. Another the OEM_ID and OEM_TABLE_ID would be concatenated
in the printout. Remove any trailing spaces.
Mike Travis [Thu, 5 Nov 2020 22:27:39 +0000 (16:27 -0600)]
x86/platform/uv: Fix missing OEM_TABLE_ID
Testing shows a problem in that the OEM_TABLE_ID was missing for
hubless systems. This is used to determine the APIC type (legacy or
extended). Add the OEM_TABLE_ID to the early hubless processing.
Theodore Ts'o [Sat, 7 Nov 2020 05:00:49 +0000 (00:00 -0500)]
jbd2: fix up sparse warnings in checkpoint code
Add missing __acquires() and __releases() annotations. Also, in an
"this should never happen" WARN_ON check, if it *does* actually
happen, we need to release j_state_lock since this function is always
supposed to release that lock. Otherwise, things will quickly grind
to a halt after the WARN_ON trips.
Fixes: 96f1e0974575 ("jbd2: avoid long hold times of j_state_lock...") Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Drop no_fc mount option that disable fast commit even if it was
enabled at mkfs time. Move fc_debug_force mount option under ifdef
EXT4_DEBUG to annotate that this is strictly for debugging and testing
purposes and should not be used in production.
Fast commit file system states are recorded in
sbi->s_mount_flags. Fast commit expects these bit manipulations to be
atomic. This patch adds helpers to make those modifications atomic.
ext4: fix inode dirty check in case of fast commits
In case of fast commits, determine if the inode is dirty by checking
if the inode is on fast commit list. This also helps us get rid of
ext4_inode_info.i_fc_committed_subtid field.
Reported-by: Andrea Righi <andrea.righi@canonical.com> Tested-by: Andrea Righi <andrea.righi@canonical.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-18-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fast commit buffers should be filled in before toucing their
state. Remove code that sets buffer state as dirty before the buffer
is passed to the file system.
jbd2: add todo for a fast commit performance optimization
Fast commit performance can be optimized if commit thread doesn't wait
for ongoing fast commits to complete until the transaction enters
T_FLUSH state. Document this optimization.
Variables journal->j_fc_off, journal->j_fc_wbuf are accessed during
commit path. Since today we allow only one process to perform a fast
commit, there is no need take state lock before accessing these
variables. This patch removes these locks and adds comments to
describe this.
ext4: clean up the JBD2 API that initializes fast commits
This patch removes jbd2_fc_init() API and its related functions to
simplify enabling fast commits. With this change, the number of fast
commit blocks to use is solely determined by the JBD2 layer. So, we
move the default value for minimum number of fast commit blocks from
ext4/fast_commit.h to include/linux/jbd2.h. However, whether or not to
use fast commits is determined by the file system. The file system
just sets the fast commit feature using
jbd2_journal_set_features(). JBD2 layer then determines how many
blocks to use for fast commits (based on the value found in the JBD2
superblock).
Note that the JBD2 feature flag of fast commits is just an indication
that there are fast commit blocks present on disk. It doesn't tell
JBD2 layer about the intent of the file system of whether to it wants
to use fast commit or not. That's why, we blindly clear the fast
commit flag in journal_reset() after the recovery is done.
jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufs
The on-disk superblock field sb->s_maxlen represents the total size of
the journal including the fast commit area and is no more the max
number of blocks available for a transaction. The maximum number of
blocks available to a transaction is reduced by the number of fast
commit blocks. So, this patch renames j_maxlen to j_total_len to
better represent its intent. Also, it adds a function to calculate max
number of bufs available for a transaction.
Firstly, pass handle to all ext4_fc_track_* functions and use
transaction id found in handle->h_transaction->h_tid for tracking fast
commit updates. Secondly, don't pass inode to
ext4_fc_track_link/create/unlink functions. inode can be found inside
these functions as d_inode(dentry). However, rename path is an
exeception. That's because in that case, we need inode that's not same
as d_inode(dentry). To handle that, add a couple of low-level wrapper
functions that take inode and dentry as arguments.
ext4_fc_track_range() should only be called when blocks are added or
removed from an inode. So, the only places from where we need to call
this function are ext4_map_blocks(), punch hole, collapse / zero
range, truncate. Remove all the other redundant calls to ths function.
ext4: mark fc ineligible if inode gets evictied due to mem pressure
If inode gets evicted due to memory pressure, we have to remove it
from the fast commit list. However, that inode may have uncommitted
changes that fast commits will lose. So, just fall back to full
commits in this case. Also, rename the fast commit ineligiblity reason
from "EXT4_FC_REASON_MEM" to "EXT4_FC_REASON_MEM_NOMEM" for better
expression.
Fast commit feature has flags in the file system as well in JBD2. The
meaning of fast commit feature flags can get confusing. Update docs
and code to add more documentation about it.
Joseph Qi [Tue, 3 Nov 2020 02:29:02 +0000 (10:29 +0800)]
ext4: unlock xattr_sem properly in ext4_inline_data_truncate()
It takes xattr_sem to check inline data again but without unlock it
in case not have. So unlock it before return.
Fixes: aef1c8513c1f ("ext4: let ext4_truncate handle inline data correctly") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/1604370542-124630-1-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
Dan Carpenter [Fri, 30 Oct 2020 11:46:20 +0000 (14:46 +0300)]
ext4: silence an uninitialized variable warning
Smatch complains that "i" can be uninitialized if we don't enter the
loop. I don't know if it's possible but we may as well silence this
warning.
[ Initialize i to sb->s_blocksize instead of 0. The only way the for
loop could be skipped entirely is the in-memory data structures, in
particular the bh->b_data for the on-disk superblock has gotten
corrupted enough that calculated value of group is >= to
ext4_get_groups_count(sb). In that case, we want to exit
immediately without allocating a block. -- TYT ]
Fixes: 8016e29f4362 ("ext4: fast commit recovery path") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20201030114620.GB3251003@mwanda Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
Kaixu Xia [Thu, 29 Oct 2020 15:46:36 +0000 (23:46 +0800)]
ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA
The macro MOPT_Q is used to indicates the mount option is related to
quota stuff and is defined to be MOPT_NOSUPPORT when CONFIG_QUOTA is
disabled. Normally the quota options are handled explicitly, so it
didn't matter that the MOPT_STRING flag was missing, even though the
usrjquota and grpjquota mount options take a string argument. It's
important that's present in the !CONFIG_QUOTA case, since without
MOPT_STRING, the mount option matcher will match usrjquota= followed
by an integer, and will otherwise skip the table entry, and so "mount
option not supported" error message is never reported.
[ Fixed up the commit description to better explain why the fix
works. --TYT ]
Linus Torvalds [Fri, 6 Nov 2020 23:46:39 +0000 (15:46 -0800)]
Merge tag 'ceph-for-5.10-rc3' of git://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
"A fix for a potential stall on umount caused by the MDS dropping our
REQUEST_CLOSE message. The code that handled this case was
inadvertently disabled in 5.9, this patch removes it entirely and
fixes the problem in a way that is consistent with ceph-fuse"
* tag 'ceph-for-5.10-rc3' of git://github.com/ceph/ceph-client:
ceph: check session state after bumping session->s_seq
Linus Torvalds [Fri, 6 Nov 2020 23:42:42 +0000 (15:42 -0800)]
Merge tag 'linux-kselftest-fixes-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"Fixes to the ftrace test and several fixes from Tommi Rantala for
various other tests"
* tag 'linux-kselftest-fixes-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: binderfs: use SKIP instead of XFAIL
selftests: clone3: use SKIP instead of XFAIL
selftests: core: use SKIP instead of XFAIL in close_range_test.c
selftests: proc: fix warning: _GNU_SOURCE redefined
selftests: pidfd: drop needless linux/kcmp.h inclusion in pidfd_setns_test.c
selftests: pidfd: add CONFIG_CHECKPOINT_RESTORE=y to config
selftests: pidfd: skip test on kcmp() ENOSYS
selftests: pidfd: use ksft_test_result_skip() when skipping test
selftests/harness: prettify SKIP message whitespace again
selftests: pidfd: fix compilation errors due to wait.h
selftests: filter kselftest headers from command in lib.mk
selftests/ftrace: check for do_sys_openat2 in user-memory test
selftests/ftrace: Use $FUNCTION_FORK to reference kernel fork function
Linus Torvalds [Fri, 6 Nov 2020 23:24:12 +0000 (15:24 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three driver fixes. Two (alua and hpsa) are in hard to trigger
attach/detach situations but the mp3sas one involves a polled to
interrupt switch over that could trigger in any high IOPS situation"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mpt3sas: Fix timeouts observed while reenabling IRQ
scsi: scsi_dh_alua: Avoid crash during alua_bus_detach()
scsi: hpsa: Fix memory leak in hpsa_init_one()
Linus Torvalds [Fri, 6 Nov 2020 21:08:25 +0000 (13:08 -0800)]
Merge branch 'mtd/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal.
* 'mtd/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: stm32_fmc2: fix broken ECC
mtd: spi-nor: Fix address width on flash chips > 16MB
mtd: spi-nor: Don't copy self-pointing struct around
mtd: rawnand: ifc: Move the ECC engine initialization to the right place
mtd: rawnand: mxc: Move the ECC engine initialization to the right place
Linus Torvalds [Fri, 6 Nov 2020 21:05:21 +0000 (13:05 -0800)]
Merge tag 'spi-fix-v5.10-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"This is an additional fix on top of 5e31ba0c0543 ('spi: bcm2835: fix
gpio cs level inversion') - when sending my prior pull request I had
misremembred the status of that patch, apologies for the noise here"
* tag 'spi-fix-v5.10-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: bcm2835: remove use of uninitialized gpio flags variable
Linus Torvalds [Fri, 6 Nov 2020 20:58:11 +0000 (12:58 -0800)]
Merge tag 'sound-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Quite a bunch of small fixes that have been gathered since the last
pull, including changes like below:
- HD-audio runtime PM fixes and refactoring
- HD-audio and USB-audio quirks
- SOF warning fix
- Various ASoC device-specific fixes for Intel, Qualcomm, etc"
* tag 'sound-5.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (26 commits)
ALSA: usb-audio: Add implicit feedback quirk for Qu-16
ASoC: mchp-spdiftx: Do not set Validity bit(s)
ALSA: usb-audio: Add implicit feedback quirk for MODX
ALSA: usb-audio: add usb vendor id as DSD-capable for Khadas devices
ALSA: hda/realtek - Enable headphone for ASUS TM420
ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link()
ASoC: qcom: lpass-cpu: Fix clock disable failure
ASoC: qcom: lpass-sc7180: Fix MI2S bitwidth field bit positions
ASoC: codecs: wcd9335: Set digital gain range correctly
ASoC: codecs: wcd934x: Set digital gain range correctly
ALSA: hda: Reinstate runtime_allow() for all hda controllers
ALSA: hda: Separate runtime and system suspend
ALSA: hda: Refactor codec PM to use direct-complete optimization
ALSA: hda/realtek - Fixed HP headset Mic can't be detected
ALSA: usb-audio: Add implicit feedback quirk for Zoom UAC-2
ALSA: make snd_kcontrol_new name a normal string
ALSA: fix kernel-doc markups
ASoC: SOF: loader: handle all SOF_IPC_EXT types
ASoC: cs42l51: manage mclk shutdown delay
ASoC: qcom: sdm845: set driver name correctly
...
Dan Carpenter [Fri, 6 Nov 2020 20:39:50 +0000 (15:39 -0500)]
net/sunrpc: return 0 on attempt to write to "transports"
You can't write to this file because the permissions are 0444. But
it sort of looked like you could do a write and it would result in
a read. Then it looked like proc_sys_call_handler() just ignored
it. Which is confusing. It's more clear if the "write" just
returns zero.
Also, the "lenp" pointer is never NULL so that check can be removed.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Linus Torvalds [Fri, 6 Nov 2020 20:54:00 +0000 (12:54 -0800)]
Merge tag 'drm-fixes-2020-11-06-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"It's Friday here so that means another installment of drm fixes to
distract you from the counting process.
Changes all over the place, the amdgpu changes contain support for a
new GPU that is close to current one already in the tree (Green
Sardine) so it shouldn't have much side effects.
Otherwise imx has a few cleanup patches and fixes, amdgpu and i915
have around the usual smattering of fixes, fonts got constified, and
vc4/panfrost has some minor fixes. All in all a fairly regular rc3.
We have an outstanding nouveau regression, but the author is looking
into the fix, so should be here next week.
I now return you to counting.
fonts:
- constify font structures.
MAINTAINERS:
- Fix path for amdgpu power management
amdgpu:
- Add support for more navi1x SKUs
- Fix for suspend on CI dGPUs
- VCN DPG fix for Picasso
- Sienna Cichlid fixes
- Polaris DPM fix
- Add support for Green Sardine
amdkfd:
- Fix an allocation failure check
i915:
- Fix set domain's cache coherency
- Fixes around breadcrumbs
- Fix encoder lookup during PSR atomic
- Hold onto an explicit ref to i915_vma_work.pinned
- gvt: HWSP reset handling fix
- gvt: flush workaround
- gvt: vGPU context pin/unpin
- gvt: mmio cmd access fix for bxt/apl
imx:
- drop unused functions and callbacks
- reuse imx_drm_encoder_parse_of
- spinlock rework
- memory leak fix
- minor cleanups
vc4:
- resource cleanup fix
panfrost:
- madvise/shrinker fix"
* tag 'drm-fixes-2020-11-06-1' of git://anongit.freedesktop.org/drm/drm: (55 commits)
drm/amdgpu/display: remove DRM_AMD_DC_GREEN_SARDINE
drm/amd/display: Add green_sardine support to DM
drm/amd/display: Add green_sardine support to DC
drm/amdgpu: enable vcn support for green_sardine (v2)
drm/amdgpu: enable green_sardine_asd.bin loading (v2)
drm/amdgpu/sdma: add sdma engine support for green_sardine (v2)
drm/amdgpu: add gfx support for green_sardine (v2)
drm/amdgpu: add soc15 common ip block support for green_sardine (v3)
drm/amdgpu: add green_sardine support for gpu_info and ip block setting (v2)
drm/amdgpu: add Green_Sardine APU flag
drm/amdgpu: resolved ASD loading issue on sienna
amdkfd: Check kvmalloc return before memcpy
drm/amdgpu: update golden setting for sienna_cichlid
amd/amdgpu: Disable VCN DPG mode for Picasso
drm/amdgpu/swsmu: remove duplicate call to smu_set_default_dpm_table
drm/i915: Hold onto an explicit ref to i915_vma_work.pinned
drm/i915/gt: Flush xcs before tgl breadcrumbs
drm/i915/gt: Expose more parameters for emitting writes into the ring
drm/i915: Fix encoder lookup during PSR atomic check
drm/i915/gt: Use the local HWSP offset during submission
...
Linus Torvalds [Fri, 6 Nov 2020 20:51:29 +0000 (12:51 -0800)]
Merge tag 'tpmdd-next-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
"Two critical tpm driver bug fixes"
* tag 'tpmdd-next-v5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: efi: Don't create binary_bios_measurements file for an empty log
tpm_tis: Disable interrupts on ThinkPad T490s
Linus Torvalds [Fri, 6 Nov 2020 20:48:19 +0000 (12:48 -0800)]
Merge tag 'iommu-fixes-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Fix a NULL-ptr dereference in the Intel VT-d driver
- Two fixes for Intel SVM support
- Increase IRQ remapping table size in the AMD IOMMU driver. The old
number of 128 turned out to be too low for some recent devices.
- Fix a mask check in generic IOMMU code
* tag 'iommu-fixes-v5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Fix a check in iommu_check_bind_data()
iommu/vt-d: Fix a bug for PDP check in prq_event_thread
iommu/vt-d: Fix sid not set issue in intel_svm_bind_gpasid()
iommu/vt-d: Fix kernel NULL pointer dereference in find_domain()
iommu/amd: Increase interrupt remapping table limit to 512 entries