Linus Torvalds [Sat, 9 Dec 2023 20:49:22 +0000 (12:49 -0800)]
Merge tag 'tty-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull serial driver fixes from Greg KH:
"Here are some small serial driver fixes for 6.7-rc4 to resolve some
reported issues. Included in here are:
- pl011 dma support fix
- sc16is7xx driver fix
- ma35d1 console index fix
- 8250 driver fixes for small issues
All of these have been in linux-next with no reported issues"
* tag 'tty-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250_dw: Add ACPI ID for Granite Rapids-D UART
serial: ma35d1: Validate console index before assignment
ARM: PL011: Fix DMA support
serial: sc16is7xx: address RX timeout interrupt errata
serial: 8250: 8250_omap: Clear UART_HAS_RHR_IT_DIS bit
serial: 8250_omap: Add earlycon support for the AM654 UART controller
serial: 8250: 8250_omap: Do not start RX DMA on THRI interrupt
Linus Torvalds [Sat, 9 Dec 2023 20:44:10 +0000 (12:44 -0800)]
Merge tag 'char-misc-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver fixes from Greg KH:
"Here are some small fixes for 6.7-rc5 for a variety of small driver
subsystems. Included in here are:
- debugfs revert for reported issue
- greybus revert for reported issue
- greybus fixup for endian build warning
- coresight driver fixes
- nvmem driver fixes
- devcoredump fix
- parport new device id
- ndtest build fix
All of these have ben in linux-next with no reported issues"
* tag 'char-misc-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
nvmem: Do not expect fixed layouts to grab a layout driver
parport: Add support for Brainboxes IX/UC/PX parallel cards
Revert "greybus: gb-beagleplay: Ensure le for values in transport"
greybus: gb-beagleplay: Ensure le for values in transport
greybus: BeaglePlay driver needs CRC_CCITT
Revert "debugfs: annotate debugfs handlers vs. removal with lockdep"
devcoredump: Send uevent once devcd is ready
ndtest: fix typo class_regster -> class_register
misc: mei: client.c: fix problem of return '-EOVERFLOW' in mei_cl_write
misc: mei: client.c: return negative error code in mei_cl_write
mei: pxp: fix mei_pxp_send_message return value
coresight: ultrasoc-smb: Fix uninitialized before use buf_hw_base
coresight: ultrasoc-smb: Config SMB buffer before register sink
coresight: ultrasoc-smb: Fix sleep while close preempt in enable_smb
Documentation: coresight: fix `make refcheckdocs` warning
hwtracing: hisi_ptt: Don't try to attach a task
hwtracing: hisi_ptt: Handle the interrupt in hardirq context
hwtracing: hisi_ptt: Add dummy callback pmu::read()
coresight: Fix crash when Perf and sysfs modes are used concurrently
coresight: etm4x: Remove bogous __exit annotation for some functions
Linus Torvalds [Sat, 9 Dec 2023 20:25:56 +0000 (12:25 -0800)]
Merge tag 'loongarch-fixes-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Preserve syscall nr across execve(), slightly clean up drdtime(), fix
the Clang built zboot kernel, fix a stack unwinder bug and several bpf
jit bugs"
* tag 'loongarch-fixes-6.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: BPF: Fix unconditional bswap instructions
LoongArch: BPF: Fix sign-extension mov instructions
LoongArch: BPF: Don't sign extend function return value
LoongArch: BPF: Don't sign extend memory load operand
LoongArch: Preserve syscall nr across execve()
LoongArch: Set unwind stack type to unknown rather than set error flag
LoongArch: Slightly clean up drdtime()
LoongArch: Apply dynamic relocations for LLD
Linus Torvalds [Sat, 9 Dec 2023 20:22:20 +0000 (12:22 -0800)]
Merge tag 'mips-fixes_6.7_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- Fixes for broken Loongson firmware
- Fix lockdep splat
- Fix FPU states when creating kernel threads
* tag 'mips-fixes_6.7_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: kernel: Clear FPU states when setting up kernel threads
MIPS: Loongson64: Handle more memory types passed from firmware
MIPS: Loongson64: Enable DMA noncoherent support
MIPS: Loongson64: Reserve vgabios memory on boot
mips/smp: Call rcutree_report_cpu_starting() earlier
Linus Torvalds [Sat, 9 Dec 2023 20:16:50 +0000 (12:16 -0800)]
Merge tag 'perf-tools-fixes-for-v6.7-2-2023-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools fixes from Namhyung Kim:
"A random set of small bug fixes including:
- Fix segfault on AmpereOne due to missing default metricgroup name
- Fix segfault on `perf list --json` due to NULL pointer"
* tag 'perf-tools-fixes-for-v6.7-2-2023-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools:
perf list: Fix JSON segfault by setting the used skip_duplicate_pmus callback
perf vendor events arm64: AmpereOne: Add missing DefaultMetricgroupName fields
perf metrics: Avoid segv if default metricgroup isn't set
Linus Torvalds [Sat, 9 Dec 2023 20:10:56 +0000 (12:10 -0800)]
Merge tag '6.7-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"Six smb3 client fixes:
- Fixes for copy_file_range and clone (cache invalidation and file
size), also addresses an xfstest failure
- Fix to return proper error if REMAP_FILE_DEDUP set (also fixes
xfstest generic/304)
- Fix potential null pointer reference with DFS
- Multichannel fix addressing (reverting an earlier patch) some of
the problems with enabling/disabling channels dynamically
Still working on a followon multichannel fix to address another issue
found in reconnect testing that will send next week"
* tag '6.7-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: reconnect worker should take reference on server struct unconditionally
Revert "cifs: reconnect work should have reference on server struct"
cifs: Fix non-availability of dedup breaking generic/304
smb: client: fix potential NULL deref in parse_dfs_referrals()
cifs: Fix flushing, invalidation and file size with FICLONE
cifs: Fix flushing, invalidation and file size with copy_file_range()
We can see that "bswap32: Takes an unsigned 32-bit number in either big-
or little-endian format and returns the equivalent number with the same
bit width but opposite endianness" in BPF Instruction Set Specification,
so it should clear the upper 32 bits in "case 32:" for both BPF_ALU and
BPF_ALU64.
We can see that "Short form of movsx, dst_reg = (s8,s16,s32)src_reg" in
include/linux/filter.h, additionally, for BPF_ALU64 the value of the
destination register is unchanged whereas for BPF_ALU the upper 32 bits
of the destination register are zeroed, so it should clear the upper 32
bits for BPF_ALU.
The pointer returned is later signed-extended to 0xfffffffffc00b480 at
`BPF_JMP | BPF_EXIT`. During BPF prog run, this triggers unhandled page
fault and a kernel panic.
Drop the unnecessary signed-extension on return values like other
architectures do.
The expression `task->cgroups->dfl_cgrp` involves two memory load.
Since the field offset fits in imm12 or imm14, we use ldd or ldptrd
instructions. But both instructions have the side effect that it will
signed-extended the imm operand. Finally, we got the wrong addresses
and panics is inevitable.
Use a generic ldxd instruction to avoid this kind of issues.
ELF_PLAT_INIT() reset regs[11] to 0, so in syscall_exit_to_user_mode()
we later get a wrong syscall nr. This breaks tools like execsnoop since
it relies on execve() tracepoints.
Skip pt_regs::regs[11] reset in ELF_PLAT_INIT() to fix the issue.
Jinyang He [Sat, 9 Dec 2023 07:49:15 +0000 (15:49 +0800)]
LoongArch: Set unwind stack type to unknown rather than set error flag
During unwinding, unwind_done() is used as an end condition. Normally it
unwind to the user stack and then set the stack type to unknown, which
is a normal exit. When something unexpected happens in unwind process
and we cannot unwind anymore, we should set the error flag, and also set
the stack type to unknown to indicate that the unwind process can not
continue. The error flag emphasizes that the unwind process produce an
unexpected error. There is no unexpected things when we unwind the PT_REGS
in the top of IRQ stack and find out that is an user mode PT_REGS. Thus,
we should not set error flag and just set stack type to unknown.
WANG Rui [Sat, 9 Dec 2023 07:49:15 +0000 (15:49 +0800)]
LoongArch: Apply dynamic relocations for LLD
For the following assembly code:
.text
.global func
func:
nop
.data
var:
.dword func
When linked with `-pie`, GNU LD populates the `var` variable with the
pre-relocated value of `func`. However, LLVM LLD does not exhibit the
same behavior. This issue also arises with the `kernel_entry` in arch/
loongarch/kernel/head.S:
The correct kernel entry from the MS-DOS header is crucial for jumping
to vmlinux from zboot. This necessity is why the compressed relocatable
kernel compiled by Clang encounters difficulties in booting.
To address this problem, it is proposed to apply dynamic relocations to
place with `--apply-dynamic-relocs`.
Linus Torvalds [Fri, 8 Dec 2023 20:36:45 +0000 (12:36 -0800)]
Merge tag 'block-6.7-2023-12-08' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Nothing major in here, just miscellanous fixes for MD and NVMe:
- NVMe pull request via Keith:
- Proper nvme ctrl state setting (Keith)
- Passthrough command optimization (Keith)
- Spectre fix (Nitesh)
- Kconfig clarifications (Shin'ichiro)
- Frozen state deadlock fix (Bitao)
- Power setting quirk (Georg)
- MD pull requests via Song:
- 6.7 regresisons with recovery/sync (Yu)
- Reshape fix (David)"
* tag 'block-6.7-2023-12-08' of git://git.kernel.dk/linux:
md: split MD_RECOVERY_NEEDED out of mddev_resume
nvme-pci: Add sleep quirk for Kingston drives
md: fix stopping sync thread
md: don't leave 'MD_RECOVERY_FROZEN' in error path of md_set_readonly()
md: fix missing flush of sync_work
nvme: fix deadlock between reset and scan
nvme: prevent potential spectre v1 gadget
nvme: improve NVME_HOST_AUTH and NVME_TARGET_AUTH config descriptions
nvme-ioctl: move capable() admin check to the end
nvme: ensure reset state check ordering
nvme: introduce helper function to get ctrl state
md/raid6: use valid sector values to determine if an I/O should wait on the reshape
Linus Torvalds [Fri, 8 Dec 2023 20:32:38 +0000 (12:32 -0800)]
Merge tag 'io_uring-6.7-2023-12-08' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"Two minor fixes for issues introduced in this release cycle, and two
fixes for issues or potential issues that are heading to stable.
One of these ends up disabling passing io_uring file descriptors via
SCM_RIGHTS. There really shouldn't be an overlap between that kind of
historic use case and modern usage of io_uring, which is why this was
deemed appropriate"
* tag 'io_uring-6.7-2023-12-08' of git://git.kernel.dk/linux:
io_uring/af_unix: disable sending io_uring over sockets
io_uring/kbuf: check for buffer list readiness after NULL check
io_uring/kbuf: Fix an NULL vs IS_ERR() bug in io_alloc_pbuf_ring()
io_uring: fix mutex_unlock with unreferenced ctx
Linus Torvalds [Fri, 8 Dec 2023 20:27:11 +0000 (12:27 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Primarily rtrs and irdma fixes:
- Fix uninitialized value in ib_get_eth_speed()
- Fix hns refusing to work if userspace doesn't select the correct
congestion control algorithm
- Several irdma fixes - unreliable Send Queue Drain, use after free,
64k page size bugs, device removal races
- Several rtrs bug fixes - crashes, memory leaks, use after free, bad
credit accounting, bogus WARN_ON
- Typos and a MAINTAINER update"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/irdma: Avoid free the non-cqp_request scratch
RDMA/irdma: Fix support for 64k pages
RDMA/irdma: Ensure iWarp QP queue memory is OS paged aligned
RDMA/core: Fix umem iterator when PAGE_SIZE is greater then HCA pgsz
RDMA/irdma: Fix UAF in irdma_sc_ccq_get_cqe_info()
RDMA/bnxt_re: Correct module description string
RDMA/rtrs-clt: Remove the warnings for req in_use check
RDMA/rtrs-clt: Fix the max_send_wr setting
RDMA/rtrs-srv: Destroy path files after making sure no IOs in-flight
RDMA/rtrs-srv: Free srv_mr iu only when always_invalidate is true
RDMA/rtrs-srv: Check return values while processing info request
RDMA/rtrs-clt: Start hb after path_up
RDMA/rtrs-srv: Do not unconditionally enable irq
MAINTAINERS: Add Chengchang Tang as Hisilicon RoCE maintainer
RDMA/irdma: Add wait for suspend on SQD
RDMA/irdma: Do not modify to SQD on error
RDMA/hns: Fix unnecessary err return when using invalid congest control algorithm
RDMA/core: Fix uninit-value access in ib_get_eth_speed()
Linus Torvalds [Fri, 8 Dec 2023 19:57:55 +0000 (11:57 -0800)]
Merge tag 'pm-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix cpufreq reference counting in the DTPM (dynamic thermal and power
management) power capping framework (Lukasz Luba)"
* tag 'pm-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
powercap: DTPM: Fix missing cpufreq_cpu_put() calls
Linus Torvalds [Fri, 8 Dec 2023 19:54:07 +0000 (11:54 -0800)]
Merge tag 'acpi-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a possible crash on an attempt to free unallocated memory in the
error path of acpi_evaluate_reference() that has been introduced by
one of the recent changes (Rafael Wysocki)"
* tag 'acpi-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: utils: Fix error path in acpi_evaluate_reference()
Linus Torvalds [Fri, 8 Dec 2023 19:29:45 +0000 (11:29 -0800)]
Merge tag 'sound-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This is a typical bump in the middle of its way; we've gathered lots
of fixes (mostly for ASoC) at this time:
- PCM array out-of-bound access fix
- Correction of SOC PCM merge error
- Lots of ASoC SOF Intel updates
- A few ASoC AMD quirks
- More proper timer handling in PCM test module
- HD-audio and USB-audio quirks as usual
- Other device-specific fixes for various ASoC codecs"
* tag 'sound-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (39 commits)
ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7
ALSA: pcmtest: stop timer before buffer is released
ALSA: hda/realtek: Add Framework laptop 16 to quirks
ALSA: hda/realtek: add new Framework laptop to quirks
ALSA: pcm: fix out-of-bounds in snd_pcm_state_names
ASoC: qcom: sc8280xp: Limit speaker digital volumes
ASoC: ops: add correct range check for limiting volume
ALSA: hda/realtek: Enable headset on Lenovo M90 Gen5
ALSA: hda/realtek: fix speakers on XPS 9530 (2023)
ALSA: usb-audio: Add Pioneer DJM-450 mixer controls
ASoC: wm_adsp: fix memleak in wm_adsp_buffer_populate
ASoC: da7219: Support low DC impedance headset
ASoC: amd: acp: Add support for a new Huawei Matebook laptop
ALSA: hda/realtek: Apply quirk for ASUS UM3504DA
ASoC: SOF: ipc4-topology: Correct data structures for the GAIN module
ASoC: SOF: ipc4-topology: Correct data structures for the SRC module
ASoC: hdac_hda: Conditionally register dais for HDMI and Analog
ASoC: codecs: lpass-tx-macro: set active_decimator correct default value
ASoC: amd: yc: Fix non-functional mic on ASUS E1504FA
ASoC: amd: yc: Add DMI entry to support System76 Pangolin 13
...
Linus Torvalds [Fri, 8 Dec 2023 19:17:44 +0000 (11:17 -0800)]
Merge tag 'drm-fixes-2023-12-08' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular weekly fixes, mostly amdgpu and i915 as usual. A couple of
nouveau, panfrost, one core and one bridge Kconfig.
Seems about normal for rc5.
atomic-helpers:
- invoke end_fb_access while owning plane state
i915:
- fix a missing dep for a previous fix
- Relax BXT/GLK DSI transcoder hblank limits
- Fix DP MST .mode_valid_ctx() return values
- Reject DP MST modes that require bigjoiner (as it's not yet
supported on DP MST)
- Fix _intel_dsb_commit() variable type to allow negative values
nouveau:
- document some bits of gsp rm
- flush vmm more on tu102 to avoid hangs
* tag 'drm-fixes-2023-12-08' of git://anongit.freedesktop.org/drm/drm: (27 commits)
drm/exynos: fix a wrong error checking
drm/exynos: fix a potential error pointer dereference
drm/amdgpu: fix buffer funcs setting order on suspend
drm/amdgpu: Avoid querying DRM MGCG status
drm/amdgpu: Update HDP 4.4.2 clock gating flags
drm/amdgpu: Add NULL checks for function pointers
drm/amdgpu: Restrict extended wait to PSP v13.0.6
drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml
drm/amdgpu: optimize the printing order of error data
drm/amdgpu: Update fw version for boot time error query
drm/amd/pm: support new mca smu error code decoding
drm/amd/swsmu: update smu v14_0_0 driver if version and metrics table
drm/amd/display: Fix array-index-out-of-bounds in dml2
drm/amd/display: Add monitor patch for specific eDP
drm/amd/display: Use channel_width = 2 for vram table 3.0
drm/amdgpu: disable MCBP by default
drm/atomic-helpers: Invoke end_fb_access while owning plane state
drm/i915: correct the input parameter on _intel_dsb_commit()
drm/i915/mst: Reject modes that require the bigjoiner
drm/i915/mst: Fix .mode_valid_ctx() return values
...
Armin Wolf [Thu, 7 Dec 2023 21:07:23 +0000 (22:07 +0100)]
hwmon: (corsair-psu) Fix probe when built-in
It seems that when the driver is built-in, the HID bus is
initialized after the driver is loaded, which whould cause
module_hid_driver() to fail.
Fix this by registering the driver after the HID bus using
late_initcall() in accordance with other hwmon HID drivers.
Signed-off-by: Armin Wolf <W_Armin@gmx.de> Link: https://lore.kernel.org/r/20231207210723.222552-1-W_Armin@gmx.de
[groeck: Dropped "compile tested" comment; the patch has been tested
but the tester did not provide a Tested-by: tag] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Linus Torvalds [Fri, 8 Dec 2023 17:03:54 +0000 (09:03 -0800)]
Merge tag 'riscv-for-linus-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A pair of fixes to the new module load-time relocation code
- A fix for hwprobe overflowing on rv32
- A fix for to correctly decode C.SWSP and C.SDSP, which manifests in
misaligned access handling
- A fix for a boot-time shadow call stack initialization ordering issue
- A fix for Andes' errata probing, which was calling
riscv_noncoherent_supported() too late in the boot process and
triggering an oops
* tag 'riscv-for-linus-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: errata: andes: Probe for IOCP only once in boot stage
riscv: Fix SMP when shadow call stacks are enabled
dt-bindings: perf: riscv,pmu: drop unneeded quotes
riscv: fix misaligned access handling of C.SWSP and C.SDSP
RISC-V: hwprobe: Always use u64 for extension bits
Support rv32 ULEB128 test
riscv: Correct type casting in module loading
riscv: Safely remove entries from relocation list
Linus Torvalds [Fri, 8 Dec 2023 16:58:39 +0000 (08:58 -0800)]
Merge tag 'soc-fixes-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"Most of the changes are devicetree fixes for NXP, Mediatek, Rockchips
Arm machines as well as Microchip RISC-V, and most of these address
build-time warnings for spec violations and other minor issues. One of
the Mediatek warnings was enabled by default and prevented a clean
build.
The ones that address serious runtime issues are all on the i.MX
platform:
- a boot time panic on imx8qm
- USB hanging under load on imx8
- regressions on the imx93 ethernet phy
Code fixes include a minor error handling for the i.MX PMU driver, and
a number of firmware driver fixes:
- OP-TEE fix for supplicant based device enumeration, and a new sysfs
attribute to needed to fix a race against userspace
- Arm SCMI fix for possible truncation/overflow in the frequency
computations
- Multiple FF-A fixes for the newly added notification support"
* tag 'soc-fixes-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (55 commits)
MAINTAINERS: change the S32G2 maintainer's email address.
arm64: dts: rockchip: Fix eMMC Data Strobe PD on rk3588
ARM: dts: imx28-xea: Pass the 'model' property
ARM: dts: imx7: Declare timers compatible with fsl,imx6dl-gpt
MAINTAINERS: reinstate freescale ARM64 DT directory in i.MX entry
arm64: dts: imx8-apalis: set wifi regulator to always-on
ARM: imx: Check return value of devm_kasprintf in imx_mmdc_perf_init
arm64: dts: imx8ulp: update gpio node name to align with register address
arm64: dts: imx93: update gpio node name to align with register address
arm64: dts: imx93: correct mediamix power
arm64: dts: imx8qm: Add imx8qm's own pm to avoid panic during startup
arm64: dts: freescale: imx8-ss-dma: Fix #pwm-cells
arm64: dts: freescale: imx8-ss-lsio: Fix #pwm-cells
dt-bindings: pwm: imx-pwm: Unify #pwm-cells for all compatibles
ARM: dts: imx6ul-pico: Describe the Ethernet PHY clock
arm64: dts: imx8mp: imx8mq: Add parkmode-disable-ss-quirk on DWC3
arm64: dts: rockchip: Fix PCI node addresses on rk3399-gru
arm64: dts: rockchip: drop interrupt-names property from rk3588s dfi
firmware: arm_scmi: Fix possible frequency truncation when using level indexing mode
firmware: arm_scmi: Fix frequency truncation by promoting multiplier type
...
Linus Torvalds [Fri, 8 Dec 2023 16:44:43 +0000 (08:44 -0800)]
Merge tag 'trace-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Snapshot buffer issues:
1. When instances started allowing latency tracers, it uses a
snapshot buffer (another buffer that is not written to but swapped
with the main buffer that is). The snapshot buffer needs to be the
same size as the main buffer. But when the snapshot buffers were
added to instances, the code to make the snapshot equal to the
main buffer still was only doing it for the main buffer and not
the instances.
2. Need to stop the current tracer when resizing the buffers.
Otherwise there can be a race if the tracer decides to make a
snapshot between resizing the main buffer and the snapshot buffer.
3. When a tracer is "stopped" in disables both the main buffer and
the snapshot buffer. This needs to be done for instances and not
only the main buffer, now that instances also have a snapshot
buffer.
- Buffered event for filtering issues:
When filtering is enabled, because events can be dropped often, it is
quicker to copy the event into a temp buffer and write that into the
main buffer if it is not filtered or just drop the event if it is,
than to write the event into the ring buffer and then try to discard
it. This temp buffer is allocated and needs special synchronization
to do so. But there were some issues with that:
1. When disabling the filter and freeing the buffer, a call to all
CPUs is required to stop each per_cpu usage. But the code called
smp_call_function_many() which does not include the current CPU.
If the task is migrated to another CPU when it enables the CPUs
via smp_call_function_many(), it will not enable the one it is
currently on and this causes issues later on. Use
on_each_cpu_mask() instead, which includes the current CPU.
2.When the allocation of the buffered event fails, it can give a
warning. But the buffered event is just an optimization (it's
still OK to write to the ring buffer and free it). Do not WARN in
this case.
3.The freeing of the buffer event requires synchronization. First a
counter is decremented to zero so that no new uses of it will
happen. Then it sets the buffered event to NULL, and finally it
frees the buffered event. There's a synchronize_rcu() between the
counter decrement and the setting the variable to NULL, but only a
smp_wmb() between that and the freeing of the buffer. It is
theoretically possible that a user missed seeing the decrement,
but will use the buffer after it is free. Another
synchronize_rcu() is needed in place of that smp_wmb().
- ring buffer timestamps on 32 bit machines
The ring buffer timestamp on 32 bit machines has to break the 64 bit
number into multiple values as cmpxchg is required on it, and a 64
bit cmpxchg on 32 bit architectures is very slow. The code use to
just use two 32 bit values and make it a 60 bit timestamp where the
other 4 bits were used as counters for synchronization. It later came
known that the timestamp on 32 bit still need all 64 bits in some
cases. So 3 words were created to handle the 64 bits. But issues
arised with this:
1. The synchronization logic still only compared the counter with
the first two, but not with the third number, so the
synchronization could fail unknowingly.
2. A check on discard of an event could race if an event happened
between the discard and updating one of the counters. The counter
needs to be updated (forcing an absolute timestamp and not to use
a delta) before the actual discard happens.
* tag 'trace-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Test last update in 32bit version of __rb_time_read()
ring-buffer: Force absolute timestamp on discard of event
tracing: Fix a possible race when disabling buffered events
tracing: Fix a warning when allocating buffered events fails
tracing: Fix incomplete locking when disabling buffered events
tracing: Disable snapshot buffer when stopping instance tracers
tracing: Stop current tracer when resizing buffer
tracing: Always update snapshot buffer size
Linus Torvalds [Fri, 8 Dec 2023 16:36:23 +0000 (08:36 -0800)]
Merge tag 'mm-hotfixes-stable-2023-12-07-18-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"31 hotfixes. Ten of these address pre-6.6 issues and are marked
cc:stable. The remainder address post-6.6 issues or aren't considered
serious enough to justify backporting"
* tag 'mm-hotfixes-stable-2023-12-07-18-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()
nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()
mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI
scripts/gdb: fix lx-device-list-bus and lx-device-list-class
MAINTAINERS: drop Antti Palosaari
highmem: fix a memory copy problem in memcpy_from_folio
nilfs2: fix missing error check for sb_set_blocksize call
kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
units: add missing header
drivers/base/cpu: crash data showing should depends on KEXEC_CORE
mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions
scripts/gdb/tasks: fix lx-ps command error
mm/Kconfig: make userfaultfd a menuconfig
selftests/mm: prevent duplicate runs caused by TEST_GEN_PROGS
mm/damon/core: copy nr_accesses when splitting region
lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly
checkstack: fix printed address
mm/memory_hotplug: fix error handling in add_memory_resource()
mm/memory_hotplug: add missing mem_hotplug_lock
.mailmap: add a new address mapping for Chester Lin
...
Arnd Bergmann [Fri, 8 Dec 2023 07:36:17 +0000 (08:36 +0100)]
Merge tag 'v6.7-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes
Devicetree fixes for the 6.7-cycle.
All over the place this time. From adapting the size of the vdec nodes
on rk3328 and rk3399, fixing some wrong pinctrl settings on rk3128 and
the Turing RK1 board, emmc-settings fixes on rk3588 and interrupt-name
mishaps, down to some dt-cleanups.
Also this adds the missing rockchip,rk3588-pmugrf compatible to the soc
grf binding, that I somehow messed up during the pull requests for the
-rc1 . At least with it included the dt-checker is happier.
* tag 'v6.7-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: Fix eMMC Data Strobe PD on rk3588
arm64: dts: rockchip: Fix PCI node addresses on rk3399-gru
arm64: dts: rockchip: drop interrupt-names property from rk3588s dfi
arm64: dts: rockchip: Fix Turing RK1 interrupt pinctrls
ARM: dts: rockchip: Fix sdmmc_pwren's pinmux setting for RK3128
arm64: dts: rockchip: minor whitespace cleanup around '='
ARM: dts: rockchip: minor whitespace cleanup around '='
dt-bindings: soc: rockchip: grf: add rockchip,rk3588-pmugrf
arm64: dts: rockchip: fix rk356x pcie msg interrupt name
arm64: dts: rockchip: Expand reg size of vdec node for RK3399
arm64: dts: rockchip: Expand reg size of vdec node for RK3328
Dave Airlie [Fri, 8 Dec 2023 03:55:29 +0000 (13:55 +1000)]
Merge tag 'exynos-drm-next-for-v6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes
Two fixups
- Fix a potential error pointer dereference by checking the return value
of exynos_drm_crtc_get_by_type() function before accessing to crtc
object.
- Fix a wrong error checking in exynos_drm_dma.c modules, which was reported
by Dan[1]
- netfilter:
- three fixes for crashes on bad admin commands
- xt_owner: fix race accessing sk->sk_socket, TOCTOU null-deref
- nf_tables: fix 'exist' matching on bigendian arches
- leds: netdev: fix RTNL handling to prevent potential deadlock
- eth: tg3: prevent races in error/reset handling
- eth: r8169: fix rtl8125b PAUSE storm when suspended
- eth: r8152: improve reset and surprise removal handling
- eth: hns: fix race between changing features and sending
- eth: nfp: fix sleep in atomic for bonding offload"
* tag 'net-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning
net/smc: fix missing byte order conversion in CLC handshake
net: dsa: microchip: provide a list of valid protocols for xmit handler
drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group
psample: Require 'CAP_NET_ADMIN' when joining "packets" group
bpf: sockmap, updating the sg structure should also update curr
net: tls, update curr on splice as well
nfp: flower: fix for take a mutex lock in soft irq context and rcu lock
net: dsa: mv88e6xxx: Restore USXGMII support for 6393X
tcp: do not accept ACK of bytes we never sent
selftests/bpf: Add test for early update in prog_array_map_poke_run
bpf: Fix prog_array_map_poke_run map poke update
netfilter: xt_owner: Fix for unsafe access of sk->sk_socket
netfilter: nf_tables: validate family when identifying table via handle
netfilter: nf_tables: bail out on mismatching dynset and set expressions
netfilter: nf_tables: fix 'exist' matching on bigendian arches
netfilter: nft_set_pipapo: skip inactive elements during set walk
netfilter: bpf: fix bad registration on nf_defrag
leds: trigger: netdev: fix RTNL handling to prevent potential deadlock
octeontx2-af: Update Tx link register range
...
Dave Airlie [Fri, 8 Dec 2023 01:00:58 +0000 (11:00 +1000)]
Merge tag 'drm-intel-fixes-2023-12-07' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v6.7-rc5:
- d21a3962d304 ("drm/i915: Call intel_pre_plane_updates() also for pipes
getting enabled") in the previous fixes pull depends on a change that
wasn't included. Pick it up.
- Relax BXT/GLK DSI transcoder hblank limits
- Fix DP MST .mode_valid_ctx() return values
- Reject DP MST modes that require bigjoiner (as it's not yet supported on DP MST)
- Fix _intel_dsb_commit() variable type to allow negative values
Linus Torvalds [Thu, 7 Dec 2023 20:42:40 +0000 (12:42 -0800)]
Merge tag 'cgroup-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"Just one fix.
Commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
changed how freezing state is recorded which made cgroup_freezing()
disagree with the actual state of the task while thawing triggering a
warning. Fix it by updating cgroup_freezing()"
* tag 'cgroup-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup_freezer: cgroup_freezing: Check if not frozen
ACPI: utils: Fix error path in acpi_evaluate_reference()
If a pointer to an uninitialized struct acpi_handle_list is passed to
acpi_evaluate_reference() and it decides to bail out early, either
because acpi_evaluate_object() fails, or because it produces invalid
data, the handles pointer from the struct acpi_handle_list will be
passed to kfree() and if it is not NULL, the kernel will crash on an
attempt to free unallocated memory.
Address this by moving the "end" label in acpi_evaluate_reference() to
the end of the function, which is sufficient, because no cleanup is
needed in that case.
Fixes: 2e57d10a6591 ("ACPI: utils: Dynamically determine acpi_handle_list size") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Woody Suwalski <terraluna977@gmail.com>
Linus Torvalds [Thu, 7 Dec 2023 20:36:32 +0000 (12:36 -0800)]
Merge tag 'wq-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"Just one patch to fix a bug which can crash the kernel if the
housekeeping and wq_unbound_cpu cpumask configuration combination
leaves the latter empty"
* tag 'wq-for-6.7-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Make sure that wq_unbound_cpumask is never empty
Linus Torvalds [Thu, 7 Dec 2023 20:30:54 +0000 (12:30 -0800)]
Merge tag 'regmap-fix-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"An incremental fix for the fix introduced during the merge window for
caching of the selector for windowed register ranges. We were
incorrectly leaking an error code in the case where the last selector
accessed was for some reason not cached"
* tag 'regmap-fix-v6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: fix bogus error on regcache_sync success
Linus Torvalds [Thu, 7 Dec 2023 20:10:55 +0000 (12:10 -0800)]
Merge tag 'platform-drivers-x86-v6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
- Fix i8042 filter resource handling, input, and suspend issues in
asus-wmi
- Skip zero instance WMI blocks to avoid issues with some laptops
- Differentiate dev/production keys in mlxbf-bootctl
- Correct surface serdev related return value to avoid leaking errno
into userspace
- Error checking fixes
* tag 'platform-drivers-x86-v6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/mellanox: Check devm_hwmon_device_register_with_groups() return value
platform/mellanox: Add null pointer checks for devm_kasprintf()
mlxbf-bootctl: correctly identify secure boot with development keys
platform/x86: wmi: Skip blocks with zero instances
platform/surface: aggregator: fix recv_buf() return value
platform/x86: asus-wmi: disable USB0 hub on ROG Ally before suspend
platform/x86: asus-wmi: Filter Volume key presses if also reported via atkbd
platform/x86: asus-wmi: Change q500a_i8042_filter() into a generic i8042-filter
platform/x86: asus-wmi: Move i8042 filter install to shared asus-wmi code
Linus Torvalds [Thu, 7 Dec 2023 19:56:34 +0000 (11:56 -0800)]
Merge tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 int80 fixes from Dave Hansen:
"Avoid VMM misuse of 'int 0x80' handling in TDX and SEV guests.
It also has the very nice side effect of getting rid of a bunch of
assembly entry code"
* tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tdx: Allow 32-bit emulation by default
x86/entry: Do not allow external 0x80 interrupts
x86/entry: Convert INT 0x80 emulation to IDTENTRY
x86/coco: Disable 32-bit emulation by default on TDX and SEV
Yu Kuai [Thu, 7 Dec 2023 02:07:24 +0000 (10:07 +0800)]
md: split MD_RECOVERY_NEEDED out of mddev_resume
New mddev_resume() calls are added to synchronize IO with array
reconfiguration, however, this introduces a performance regression while
adding it in md_start_sync():
1) someone sets MD_RECOVERY_NEEDED first;
2) daemon thread grabs reconfig_mutex, then clears MD_RECOVERY_NEEDED and
queues a new sync work;
3) daemon thread releases reconfig_mutex;
4) in md_start_sync
a) check that there are spares that can be added/removed, then suspend
the array;
b) remove_and_add_spares may not be called, or called without really
add/remove spares;
c) resume the array, then set MD_RECOVERY_NEEDED again!
Loop between 2 - 4, then mddev_suspend() will be called quite often, for
consequence, normal IO will be quite slow.
Fix this problem by don't set MD_RECOVERY_NEEDED again in md_start_sync(),
hence the loop will be broken.
Fixes: bc08041b32ab ("md: suspend array in md_start_sync() if array need reconfiguration") Suggested-by: Song Liu <song@kernel.org> Reported-by: Janpieter Sollie <janpieter.sollie@edpnet.be> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218200 Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231207020724.2797445-1-yukuai1@huaweicloud.com
vsock/virtio: fix "comparison of distinct pointer types lacks a cast" warning
After backporting commit 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY
flag support") in CentOS Stream 9, CI reported the following error:
In file included from ./include/linux/kernel.h:17,
from ./include/linux/list.h:9,
from ./include/linux/preempt.h:11,
from ./include/linux/spinlock.h:56,
from net/vmw_vsock/virtio_transport_common.c:9:
net/vmw_vsock/virtio_transport_common.c: In function ‘virtio_transport_can_zcopy‘:
./include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast [-Werror]
20 | (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
| ^~
./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck‘
26 | (__typecheck(x, y) && __no_side_effects(x, y))
| ^~~~~~~~~~~
./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp‘
36 | __builtin_choose_expr(__safe_cmp(x, y), \
| ^~~~~~~~~~
./include/linux/minmax.h:45:25: note: in expansion of macro ‘__careful_cmp‘
45 | #define min(x, y) __careful_cmp(x, y, <)
| ^~~~~~~~~~~~~
net/vmw_vsock/virtio_transport_common.c:63:37: note: in expansion of macro ‘min‘
63 | int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS);
We could solve it by using min_t(), but this operation seems entirely
unnecessary, because we also pass MAX_SKB_FRAGS to iov_iter_npages(),
which performs almost the same check, returning at most MAX_SKB_FRAGS
elements. So, let's eliminate this unnecessary comparison.
Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") Cc: avkrasnov@salutedevices.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Link: https://lore.kernel.org/r/20231206164143.281107-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wen Gu [Wed, 6 Dec 2023 17:02:37 +0000 (01:02 +0800)]
net/smc: fix missing byte order conversion in CLC handshake
The byte order conversions of ISM GID and DMB token are missing in
process of CLC accept and confirm. So fix it.
Fixes: 3d9725a6a133 ("net/smc: common routine for CLC accept and confirm") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/1701882157-87956-1-git-send-email-guwen@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Nyekjaer [Wed, 6 Dec 2023 07:16:54 +0000 (08:16 +0100)]
net: dsa: microchip: provide a list of valid protocols for xmit handler
Provide a list of valid protocols for which the driver will provide
it's deferred xmit handler.
When using DSA_TAG_PROTO_KSZ8795 protocol, it does not provide a
"connect" method, therefor ksz_connect() is not allocating ksz_tagger_data.
This avoids the following null pointer dereference:
ksz_connect_tag_protocol from dsa_register_switch+0x9ac/0xee0
dsa_register_switch from ksz_switch_register+0x65c/0x828
ksz_switch_register from ksz_spi_probe+0x11c/0x168
ksz_spi_probe from spi_probe+0x84/0xa8
spi_probe from really_probe+0xc8/0x2d8
Fixes: ab32f56a4100 ("net: dsa: microchip: ptp: add packet transmission timestamping") Signed-off-by: Sean Nyekjaer <sean@geanix.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20231206071655.1626479-1-sean@geanix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Restrict two generic netlink multicast groups - in the "psample" and
"NET_DM" families - to be root-only with the appropriate capabilities.
See individual patches for more details.
====================
Ido Schimmel [Wed, 6 Dec 2023 21:31:02 +0000 (23:31 +0200)]
drop_monitor: Require 'CAP_SYS_ADMIN' when joining "events" group
The "NET_DM" generic netlink family notifies drop locations over the
"events" multicast group. This is problematic since by default generic
netlink allows non-root users to listen to these notifications.
Fix by adding a new field to the generic netlink multicast group
structure that when set prevents non-root users or root without the
'CAP_SYS_ADMIN' capability (in the user namespace owning the network
namespace) from joining the group. Set this field for the "events"
group. Use 'CAP_SYS_ADMIN' rather than 'CAP_NET_ADMIN' because of the
nature of the information that is shared over this group.
Note that the capability check in this case will always be performed
against the initial user namespace since the family is not netns aware
and only operates in the initial network namespace.
A new field is added to the structure rather than using the "flags"
field because the existing field uses uAPI flags and it is inappropriate
to add a new uAPI flag for an internal kernel check. In net-next we can
rework the "flags" field to use internal flags and fold the new field
into it. But for now, in order to reduce the amount of changes, add a
new field.
Since the information can only be consumed by root, mark the control
plane operations that start and stop the tracing as root-only using the
'GENL_ADMIN_PERM' flag.
Ido Schimmel [Wed, 6 Dec 2023 21:31:01 +0000 (23:31 +0200)]
psample: Require 'CAP_NET_ADMIN' when joining "packets" group
The "psample" generic netlink family notifies sampled packets over the
"packets" multicast group. This is problematic since by default generic
netlink allows non-root users to listen to these notifications.
Fix by marking the group with the 'GENL_UNS_ADMIN_PERM' flag. This will
prevent non-root users or root without the 'CAP_NET_ADMIN' capability
(in the user namespace owning the network namespace) from joining the
group.
Fixes: 6ae0a6286171 ("net: Introduce psample, a new genetlink channel for packet sampling") Reported-by: "The UK's National Cyber Security Centre (NCSC)" <security@ncsc.gov.uk> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231206213102.1824398-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
32-bit emulation was disabled on TDX to prevent a possible attack by
a VMM injecting an interrupt on vector 0x80.
Now that int80_emulation() has a check for external interrupts the
limitation can be lifted.
To distinguish software interrupts from external ones, int80_emulation()
checks the APIC ISR bit relevant to the 0x80 vector. For
software interrupts, this bit will be 0.
On TDX, the VAPIC state (including ISR) is protected and cannot be
manipulated by the VMM. The ISR bit is set by the microcode flow during
the handling of posted interrupts.
[ dhansen: more changelog tweaks ]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@vger.kernel.org> # v6.0+
Thomas Gleixner [Mon, 4 Dec 2023 08:31:40 +0000 (11:31 +0300)]
x86/entry: Do not allow external 0x80 interrupts
The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The
kernel expects to receive a software interrupt as a result of the INT
0x80 instruction. However, an external interrupt on the same vector
also triggers the same codepath.
An external interrupt on vector 0x80 will currently be interpreted as a
32-bit system call, and assuming that it was a user context.
Panic on external interrupts on the vector.
To distinguish software interrupts from external ones, the kernel checks
the APIC ISR bit relevant to the 0x80 vector. For software interrupts,
this bit will be 0.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@vger.kernel.org> # v6.0+
x86/coco: Disable 32-bit emulation by default on TDX and SEV
The INT 0x80 instruction is used for 32-bit x86 Linux syscalls. The
kernel expects to receive a software interrupt as a result of the INT
0x80 instruction. However, an external interrupt on the same vector
triggers the same handler.
The kernel interprets an external interrupt on vector 0x80 as a 32-bit
system call that came from userspace.
A VMM can inject external interrupts on any arbitrary vector at any
time. This remains true even for TDX and SEV guests where the VMM is
untrusted.
Put together, this allows an untrusted VMM to trigger int80 syscall
handling at any given point. The content of the guest register file at
that moment defines what syscall is triggered and its arguments. It
opens the guest OS to manipulation from the VMM side.
Disable 32-bit emulation by default for TDX and SEV. User can override
it with the ia32_emulation=y command line option.
Jakub Kicinski [Thu, 7 Dec 2023 17:43:29 +0000 (09:43 -0800)]
Merge tag 'nf-23-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Incorrect nf_defrag registration for bpf link infra, from D. Wythe.
2) Skip inactive elements in pipapo set backend walk to avoid double
deactivation, from Florian Westphal.
3) Fix NFT_*_F_PRESENT check with big endian arch, also from Florian.
4) Bail out if number of expressions in NFTA_DYNSET_EXPRESSIONS mismatch
stateful expressions in set declaration.
5) Honor family in table lookup by handle. Broken since 4.16.
6) Use sk_callback_lock to protect access to sk->sk_socket in xt_owner.
sock_orphan() might zap this pointer, from Phil Sutter.
All of these fixes address broken stuff for several releases.
* tag 'nf-23-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: xt_owner: Fix for unsafe access of sk->sk_socket
netfilter: nf_tables: validate family when identifying table via handle
netfilter: nf_tables: bail out on mismatching dynset and set expressions
netfilter: nf_tables: fix 'exist' matching on bigendian arches
netfilter: nft_set_pipapo: skip inactive elements during set walk
netfilter: bpf: fix bad registration on nf_defrag
====================
Pavel Begunkov [Wed, 6 Dec 2023 13:26:47 +0000 (13:26 +0000)]
io_uring/af_unix: disable sending io_uring over sockets
File reference cycles have caused lots of problems for io_uring
in the past, and it still doesn't work exactly right and races with
unix_stream_read_generic(). The safest fix would be to completely
disallow sending io_uring files via sockets via SCM_RIGHT, so there
are no possible cycles invloving registered files and thus rendering
SCM accounting on the io_uring side unnecessary.
Jakub Kicinski [Thu, 7 Dec 2023 17:32:23 +0000 (09:32 -0800)]
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-12-06
We've added 4 non-merge commits during the last 6 day(s) which contain
a total of 7 files changed, 185 insertions(+), 55 deletions(-).
The main changes are:
1) Fix race found by syzkaller on prog_array_map_poke_run when
a BPF program's kallsym symbols were still missing, from Jiri Olsa.
2) Fix BPF verifier's branch offset comparison for BPF_JMP32 | BPF_JA,
from Yonghong Song.
3) Fix xsk's poll handling to only set mask on bound xsk sockets,
from Yewon Choi.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Add test for early update in prog_array_map_poke_run
bpf: Fix prog_array_map_poke_run map poke update
xsk: Skip polling event check for unbound socket
bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4
====================
* tag 'nvme-6.7-2023-12-7' of git://git.infradead.org/nvme:
nvme-pci: Add sleep quirk for Kingston drives
nvme: fix deadlock between reset and scan
nvme: prevent potential spectre v1 gadget
nvme: improve NVME_HOST_AUTH and NVME_TARGET_AUTH config descriptions
nvme-ioctl: move capable() admin check to the end
nvme: ensure reset state check ordering
nvme: introduce helper function to get ctrl state
Georg Gottleuber [Wed, 20 Sep 2023 08:52:10 +0000 (10:52 +0200)]
nvme-pci: Add sleep quirk for Kingston drives
Some Kingston NV1 and A2000 are wasting a lot of power on specific TUXEDO
platforms in s2idle sleep if 'Simple Suspend' is used.
This patch applies a new quirk 'Force No Simple Suspend' to achieve a
low power sleep without 'Simple Suspend'.
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com> Cc: <stable@vger.kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
Arnd Bergmann [Thu, 7 Dec 2023 16:36:27 +0000 (17:36 +0100)]
Merge tag 'imx-fixes-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 6.7:
- A MAINTAINERS update to reinstate freescale ARM64 DT directory in i.MX
entry.
- A series from Alexander Stein to fix #pwm-cells for imx8-ss.
- A series from Haibo Chen to fix GPIO node name for i.MX93 and
i.MX8ULP.
- Add parkmode-disable-ss-quirk for DWC3 on i.MX8MP and i.MX8MQ to fix
an issue that the controller may hang when processing transactions
under heavy USB traffic from multiple endpoints.
- Fix mediamix block power on/off for i.MX93 by correcting the power
domain clock to be 'nic_media'.
- A couple of Ethernet PHY clock regression fixes for imx6ul-pico and
imx6q-skov board.
- Fix edma3 power domain for i.MX8QM to fix a panic during startup
process.
* tag 'imx-fixes-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx28-xea: Pass the 'model' property
ARM: dts: imx7: Declare timers compatible with fsl,imx6dl-gpt
MAINTAINERS: reinstate freescale ARM64 DT directory in i.MX entry
arm64: dts: imx8-apalis: set wifi regulator to always-on
ARM: imx: Check return value of devm_kasprintf in imx_mmdc_perf_init
arm64: dts: imx8ulp: update gpio node name to align with register address
arm64: dts: imx93: update gpio node name to align with register address
arm64: dts: imx93: correct mediamix power
arm64: dts: imx8qm: Add imx8qm's own pm to avoid panic during startup
arm64: dts: freescale: imx8-ss-dma: Fix #pwm-cells
arm64: dts: freescale: imx8-ss-lsio: Fix #pwm-cells
dt-bindings: pwm: imx-pwm: Unify #pwm-cells for all compatibles
ARM: dts: imx6ul-pico: Describe the Ethernet PHY clock
arm64: dts: imx8mp: imx8mq: Add parkmode-disable-ss-quirk on DWC3
ARM: dts: imx6q: skov: fix ethernet clock regression
arm64: dt: imx93: tqma9352-mba93xxla: Fix LPUART2 pad config
Hui Zhou [Tue, 5 Dec 2023 09:26:25 +0000 (11:26 +0200)]
nfp: flower: fix for take a mutex lock in soft irq context and rcu lock
The neighbour event callback call the function nfp_tun_write_neigh,
this function will take a mutex lock and it is in soft irq context,
change the work queue to process the neighbour event.
Move the nfp_tun_write_neigh function out of range rcu_read_lock/unlock()
in function nfp_tunnel_request_route_v4 and nfp_tunnel_request_route_v6.
Fixes: abc210952af7 ("nfp: flower: tunnel neigh support bond offload") CC: stable@vger.kernel.org # 6.2+ Signed-off-by: Hui Zhou <hui.zhou@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Orlov [Wed, 6 Dec 2023 22:32:11 +0000 (22:32 +0000)]
ALSA: pcmtest: stop timer before buffer is released
Stop timer in the 'trigger' and 'sync_stop' callbacks since we want
the timer to be stopped before the DMA buffer is released. Otherwise,
it could trigger a kernel panic in some circumstances, for instance
when the DMA buffer is already released but the timer callback is
still running.
Linus Torvalds [Thu, 7 Dec 2023 07:22:48 +0000 (23:22 -0800)]
Merge tag 'parisc-for-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
"A single line patch for parisc which fixes the build in tinyconfig
configurations:
- Fix asm operand number out of range build error in bug table"
* tag 'parisc-for-6.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix asm operand number out of range build error in bug table
net: dsa: mv88e6xxx: Restore USXGMII support for 6393X
In 4a56212774ac, USXGMII support was added for 6393X, but this was
lost in the PCS conversion (the blamed commit), most likely because
these efforts where more or less done in parallel.
Restore this feature by porting Michal's patch to fit the new
implementation.
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Michal Smulski <michal.smulski@ooma.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Fixes: e5b732a275f5 ("net: dsa: mv88e6xxx: convert 88e639x to phylink_pcs") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Link: https://lore.kernel.org/r/20231205221359.3926018-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 5 Dec 2023 16:18:41 +0000 (16:18 +0000)]
tcp: do not accept ACK of bytes we never sent
This patch is based on a detailed report and ideas from Yepeng Pan
and Christian Rossow.
ACK seq validation is currently following RFC 5961 5.2 guidelines:
The ACK value is considered acceptable only if
it is in the range of ((SND.UNA - MAX.SND.WND) <= SEG.ACK <=
SND.NXT). All incoming segments whose ACK value doesn't satisfy the
above condition MUST be discarded and an ACK sent back. It needs to
be noted that RFC 793 on page 72 (fifth check) says: "If the ACK is a
duplicate (SEG.ACK < SND.UNA), it can be ignored. If the ACK
acknowledges something not yet sent (SEG.ACK > SND.NXT) then send an
ACK, drop the segment, and return". The "ignored" above implies that
the processing of the incoming data segment continues, which means
the ACK value is treated as acceptable. This mitigation makes the
ACK check more stringent since any ACK < SND.UNA wouldn't be
accepted, instead only ACKs that are in the range ((SND.UNA -
MAX.SND.WND) <= SEG.ACK <= SND.NXT) get through.
This can be refined for new (and possibly spoofed) flows,
by not accepting ACK for bytes that were never sent.
This greatly improves TCP security at a little cost.
I added a Fixes: tag to make sure this patch will reach stable trees,
even if the 'blamed' patch was adhering to the RFC.
tp->bytes_acked was added in linux-4.2
Following packetdrill test (courtesy of Yepeng Pan) shows
the issue at hand:
// when window scale is set to 14 the window size can be extended to
// 65535 * (2^14) = 1073725440. Linux would accept an ACK packet
// with ack number in (Server_ISN+1-1073725440. Server_ISN+1)
// ,though this ack number acknowledges some data never
// sent by the server.
// For the established connection, we send an ACK packet,
// the ack packet uses ack number 1 - 1073725300 + 2^32,
// where 2^32 is used to wrap around.
// Note: we used 1073725300 instead of 1073725440 to avoid possible
// edge cases.
// 1 - 1073725300 + 2^32 = 3221241997
// Oops, old kernels happily accept this packet.
+0 < . 1:1001(1000) ack 3221241997 win 65535
// After the kernel fix the following will be replaced by a challenge ACK,
// and prior malicious frame would be dropped.
+0 > . 1:1(0) ack 1001
Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Yepeng Pan <yepeng.pan@cispa.de> Reported-by: Christian Rossow <rossow@cispa.de> Acked-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20231205161841.2702925-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Inki Dae [Wed, 1 Nov 2023 09:36:51 +0000 (18:36 +0900)]
drm/exynos: fix a wrong error checking
Fix a wrong error checking in exynos_drm_dma.c module.
In the exynos_drm_register_dma function, both arm_iommu_create_mapping()
and iommu_get_domain_for_dev() functions are expected to return NULL as
an error.
However, the error checking is performed using the statement
if(IS_ERR(mapping)), which doesn't provide a suitable error value.
So check if 'mapping' is NULL, and if it is, return -ENODEV.
Xiang Yang [Sat, 12 Aug 2023 06:27:48 +0000 (14:27 +0800)]
drm/exynos: fix a potential error pointer dereference
Smatch reports the warning below:
drivers/gpu/drm/exynos/exynos_hdmi.c:1864 hdmi_bind()
error: 'crtc' dereferencing possible ERR_PTR()
The return value of exynos_drm_crtc_get_by_type maybe ERR_PTR(-ENODEV),
which can not be used directly. Fix this by checking the return value
before using it.
Signed-off-by: Xiang Yang <xiangyang3@huawei.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
Miquel Raynal [Fri, 24 Nov 2023 19:38:14 +0000 (20:38 +0100)]
nvmem: Do not expect fixed layouts to grab a layout driver
Two series lived in parallel for some time, which led to this situation:
- The nvmem-layout container is used for dynamic layouts
- We now expect fixed layouts to also use the nvmem-layout container but
this does not require any additional driver, the support is built-in the
nvmem core.
Ensure we don't refuse to probe for wrong reasons.
Andi Shyti [Mon, 4 Dec 2023 16:38:03 +0000 (17:38 +0100)]
serial: ma35d1: Validate console index before assignment
The console is immediately assigned to the ma35d1 port without
checking its index. This oversight can lead to out-of-bounds
errors when the index falls outside the valid '0' to
MA35_UART_NR range. Such scenario trigges ran error like the
following:
UBSAN: array-index-out-of-bounds in drivers/tty/serial/ma35d1_serial.c:555:51
index -1 is out of range for type 'uart_ma35d1_port [17]
Check the index before using it and bail out with a warning.
The commit 89ff3dfac604 ("usb: gadget: f_hid: fix f_hidg lifetime vs
cdev") has introduced a bug that leads to hid device corruption after
the replug operation.
Reverse device managed memory allocation for the report descriptor
to fix the issue.
Tested:
This change was tested on the AMD EthanolX CRB server with the BMC
based on the OpenBMC distribution. The BMC provides KVM functionality
via the USB gadget device:
- before: KVM page refresh results in a broken USB device,
- after: KVM page refresh works without any issues.
Jiexun Wang [Thu, 21 Sep 2023 12:27:51 +0000 (20:27 +0800)]
mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()
I conducted real-time testing and observed that
madvise_cold_or_pageout_pte_range() causes significant latency under
memory pressure, which can be effectively reduced by adding cond_resched()
within the loop.
I tested on the LicheePi 4A board using Cylictest for latency testing and
Ftrace for latency tracing. The board uses TH1520 processor and has a
memory size of 8GB. The kernel version is 6.5.0 with the PREEMPT_RT patch
applied.
After the modification, the cause of maximum latency is no longer
madvise_cold_or_pageout_pte_range(), so this modification can reduce the
latency caused by madvise_cold_or_pageout_pte_range().
Currently the madvise_cold_or_pageout_pte_range() function exhibits
significant latency under memory pressure, which can be effectively
reduced by adding cond_resched() within the loop.
When the batch_count reaches SWAP_CLUSTER_MAX, we reschedule
the task to ensure fairness and avoid long lock holding times.
Ryusuke Konishi [Tue, 5 Dec 2023 08:59:47 +0000 (17:59 +0900)]
nilfs2: prevent WARNING in nilfs_sufile_set_segment_usage()
If nilfs2 reads a disk image with corrupted segment usage metadata, and
its segment usage information is marked as an error for the segment at the
write location, nilfs_sufile_set_segment_usage() can trigger WARN_ONs
during log writing.
Segments newly allocated for writing with nilfs_sufile_alloc() will not
have this error flag set, but this unexpected situation will occur if the
segment indexed by either nilfs->ns_segnum or nilfs->ns_nextnum (active
segment) was marked in error.
Fix this issue by inserting a sanity check to treat it as a file system
corruption.
Since error returns are not allowed during the execution phase where
nilfs_sufile_set_segment_usage() is used, this inserts the sanity check
into nilfs_sufile_mark_dirty() which pre-reads the buffer containing the
segment usage record to be updated and sets it up in a dirty state for
writing.
In addition, nilfs_sufile_set_segment_usage() is also called when
canceling log writing and undoing segment usage update, so in order to
avoid issuing the same kernel warning in that case, in case of
cancellation, avoid checking the error flag in
nilfs_sufile_set_segment_usage().
Sidhartha Kumar [Mon, 4 Dec 2023 18:32:34 +0000 (10:32 -0800)]
mm/hugetlb: have CONFIG_HUGETLB_PAGE select CONFIG_XARRAY_MULTI
After commit a08c7193e4f1 "mm/filemap: remove hugetlb special casing in
filemap.c", hugetlb pages are stored in the page cache in base page sized
indexes. This leads to multi index stores in the xarray which is only
supporting through CONFIG_XARRAY_MULTI. The other page cache user of
multi index stores ,THP, selects XARRAY_MULTI. Have CONFIG_HUGETLB_PAGE
follow this behavior as well to avoid the BUG() with a CONFIG_HUGETLB_PAGE
&& !CONFIG_XARRAY_MULTI config.
Link: https://lkml.kernel.org/r/20231204183234.348697-1-sidhartha.kumar@oracle.com Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c") Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reported-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Florian Fainelli [Thu, 30 Nov 2023 04:33:16 +0000 (20:33 -0800)]
scripts/gdb: fix lx-device-list-bus and lx-device-list-class
After the conversion to bus_to_subsys() and class_to_subsys(), the gdb
scripts listing the system buses and classes respectively was broken, fix
those by returning the subsys_priv pointer and have the various caller
de-reference either the 'bus' or 'class' structure members accordingly.
Link: https://lkml.kernel.org/r/20231130043317.174188-1-florian.fainelli@broadcom.com Fixes: 7b884b7f24b4 ("driver core: class.c: convert to only use class_to_subsys") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Tested-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bagas Sanjaya [Thu, 30 Nov 2023 08:38:48 +0000 (15:38 +0700)]
MAINTAINERS: drop Antti Palosaari
He is currently inactive (last message from him is two years ago [1]).
His media tree [2] is also dormant (latest activity is 6 years ago), yet
his site is still online [3].
Drop him from MAINTAINERS and add CREDITS entry for him. We thank him
for maintaining various DVB drivers.
Su Hui [Thu, 30 Nov 2023 03:40:18 +0000 (11:40 +0800)]
highmem: fix a memory copy problem in memcpy_from_folio
Clang static checker complains that value stored to 'from' is never read.
And memcpy_from_folio() only copy the last chunk memory from folio to
destination. Use 'to += chunk' to replace 'from += chunk' to fix this
typo problem.
Link: https://lkml.kernel.org/r/20231130034017.1210429-1-suhui@nfschina.com Fixes: b23d03ef7af5 ("highmem: add memcpy_to_folio() and memcpy_from_folio()") Signed-off-by: Su Hui <suhui@nfschina.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jiaqi Yan <jiaqiyan@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Tom Rix <trix@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ryusuke Konishi [Wed, 29 Nov 2023 14:15:47 +0000 (23:15 +0900)]
nilfs2: fix missing error check for sb_set_blocksize call
When mounting a filesystem image with a block size larger than the page
size, nilfs2 repeatedly outputs long error messages with stack traces to
the kernel log, such as the following:
This overloads the system logger. And to make matters worse, it sometimes
crashes the kernel with a memory access violation.
This is because the return value of the sb_set_blocksize() call, which
should be checked for errors, is not checked.
The latter issue is due to out-of-buffer memory being accessed based on a
large block size that caused sb_set_blocksize() to fail for buffers read
with the initial minimum block size that remained unupdated in the
super_block structure.
Since nilfs2 mkfs tool does not accept block sizes larger than the system
page size, this has been overlooked. However, it is possible to create
this situation by intentionally modifying the tool or by passing a
filesystem image created on a system with a large page size to a system
with a smaller page size and mounting it.
Fix this issue by inserting the expected error handling for the call to
sb_set_blocksize().
Baoquan He [Tue, 28 Nov 2023 05:44:57 +0000 (13:44 +0800)]
kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
Ignat Korchagin complained that a potential config regression was
introduced by commit 89cde455915f ("kexec: consolidate kexec and crash
options into kernel/Kconfig.kexec"). Before the commit, CONFIG_CRASH_DUMP
has no dependency on CONFIG_KEXEC. After the commit, CRASH_DUMP selects
KEXEC. That enforces system to have CONFIG_KEXEC=y as long as
CONFIG_CRASH_DUMP=Y which people may not want.
In Ignat's case, he sets CONFIG_CRASH_DUMP=y, CONFIG_KEXEC_FILE=y and
CONFIG_KEXEC=n because kexec_load interface could have security issue if
kernel/initrd has no chance to be signed and verified.
CRASH_DUMP has select of KEXEC because Eric, author of above commit, met a
LKP report of build failure when posting patch of earlier version. Please
see below link to get detail of the LKP report:
In fact, that LKP report is triggered because arm's <asm/kexec.h> is
wrapped in CONFIG_KEXEC ifdeffery scope. That is wrong. CONFIG_KEXEC
controls the enabling/disabling of kexec_load interface, but not kexec
feature. Removing the wrongly added CONFIG_KEXEC ifdeffery scope in
<asm/kexec.h> of arm allows us to drop the select KEXEC for CRASH_DUMP.
Meanwhile, change arch/arm/kernel/Makefile to let machine_kexec.o
relocate_kernel.o depend on KEXEC_CORE.
Link: https://lkml.kernel.org/r/20231128054457.659452-1-bhe@redhat.com Fixes: 89cde455915f ("kexec: consolidate kexec and crash options into kernel/Kconfig.kexec") Signed-off-by: Baoquan He <bhe@redhat.com> Reported-by: Ignat Korchagin <ignat@cloudflare.com> Tested-by: Ignat Korchagin <ignat@cloudflare.com> [compile-time only] Tested-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com> Tested-by: Eric DeVolder <eric_devolder@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Baoquan He [Tue, 28 Nov 2023 05:52:48 +0000 (13:52 +0800)]
drivers/base/cpu: crash data showing should depends on KEXEC_CORE
After commit 88a6f8994421 ("crash: memory and CPU hotplug sysfs
attributes"), on x86_64, if only below kernel configs related to kdump are
set, compiling error are triggered.
------------------------------------------------------
drivers/base/cpu.c: In function `crash_hotplug_show':
drivers/base/cpu.c:309:40: error: implicit declaration of function `crash_hotplug_cpu_support'; did you mean `crash_hotplug_show'? [-Werror=implicit-function-declaration]
309 | return sysfs_emit(buf, "%d\n", crash_hotplug_cpu_support());
| ^~~~~~~~~~~~~~~~~~~~~~~~~
| crash_hotplug_show
cc1: some warnings being treated as errors
------------------------------------------------------
CONFIG_KEXEC is used to enable kexec_load interface, the
crash_notes/crash_notes_size/crash_hotplug showing depends on
CONFIG_KEXEC is incorrect. It should depend on KEXEC_CORE instead.
Fix it now.
Link: https://lkml.kernel.org/r/20231128055248.659808-1-bhe@redhat.com Fixes: 88a6f8994421 ("crash: memory and CPU hotplug sysfs attributes") Signed-off-by: Baoquan He <bhe@redhat.com> Tested-by: Ignat Korchagin <ignat@cloudflare.com> [compile-time only] Tested-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Eric DeVolder <eric_devolder@yahoo.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Fri, 24 Nov 2023 21:38:40 +0000 (21:38 +0000)]
mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions
If a scheme is set to not applied to any monitoring target region for any
reasons including the target access pattern, quota, filters, or
watermarks, writing 'update_schemes_tried_regions' to 'state' DAMON sysfs
file can indefinitely hang. Fix the case by implementing a timeout for
the operation. The time limit is two apply intervals of each scheme.
Link: https://lkml.kernel.org/r/20231124213840.39157-1-sj@kernel.org Fixes: 4d4e41b68299 ("mm/damon/sysfs-schemes: do not update tried regions more than one DAMON snapshot") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Kuan-Ying Lee [Mon, 27 Nov 2023 07:04:01 +0000 (15:04 +0800)]
scripts/gdb/tasks: fix lx-ps command error
Since commit 8e1f385104ac ("kill task_struct->thread_group") remove
the thread_group, we will encounter below issue.
(gdb) lx-ps
TASK PID COMM
0xffff800086503340 0 swapper/0
Python Exception <class 'gdb.error'>: There is no member named thread_group.
Error occurred in Python: There is no member named thread_group.
We use signal->thread_head to iterate all threads instead.
[Kuan-Ying.Lee@mediatek.com: v2] Link: https://lkml.kernel.org/r/20231129065142.13375-2-Kuan-Ying.Lee@mediatek.com Link: https://lkml.kernel.org/r/20231127070404.4192-2-Kuan-Ying.Lee@mediatek.com Fixes: 8e1f385104ac ("kill task_struct->thread_group") Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Florian Fainelli <florian.fainelli@broadcom.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Qun-Wei Lin <qun-wei.lin@mediatek.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Peter Xu [Thu, 23 Nov 2023 22:42:04 +0000 (17:42 -0500)]
mm/Kconfig: make userfaultfd a menuconfig
PTE_MARKER_UFFD_WP is a subconfig for userfaultfd. To make it clear,
switch to use menuconfig for userfaultfd.
Link: https://lkml.kernel.org/r/20231123224204.1060152-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Nico Pache [Mon, 20 Nov 2023 22:29:08 +0000 (15:29 -0700)]
selftests/mm: prevent duplicate runs caused by TEST_GEN_PROGS
Commit 05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
fixed the inconsistency caused by tests being defined as TEST_GEN_PROGS.
This issue was leading to tests not being executed via run_vmtests.sh and
furthermore some tests running twice due to the kselftests wrapper also
executing them.
Fix the definition of two tests (soft-dirty and pagemap_ioctl) that are
still incorrectly defined.
Link: https://lkml.kernel.org/r/20231120222908.28559-1-npache@redhat.com Signed-off-by: Nico Pache <npache@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Joel Savitz <jsavitz@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
SeongJae Park [Sun, 19 Nov 2023 17:15:28 +0000 (17:15 +0000)]
mm/damon/core: copy nr_accesses when splitting region
Regions split function ('damon_split_region_at()') is called at the
beginning of an aggregation interval, and when DAMOS applying the actions
and charging quota. Because 'nr_accesses' fields of all regions are reset
at the beginning of each aggregation interval, and DAMOS was applying the
action at the end of each aggregation interval, there was no need to copy
the 'nr_accesses' field to the split-out region.
However, commit 42f994b71404 ("mm/damon/core: implement scheme-specific
apply interval") made DAMOS applies action on its own timing interval.
Hence, 'nr_accesses' should also copied to split-out regions, but the
commit didn't. Fix it by copying it.
Link: https://lkml.kernel.org/r/20231119171529.66863-1-sj@kernel.org Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ming Lei [Mon, 20 Nov 2023 08:35:59 +0000 (16:35 +0800)]
lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly
group_cpus_evenly() could be part of storage driver's error handler, such
as nvme driver, when may happen during CPU hotplug, in which storage queue
has to drain its pending IOs because all CPUs associated with the queue
are offline and the queue is becoming inactive. And handling IO needs
error handler to provide forward progress.
Then deadlock is caused:
1) inside CPU hotplug handler, CPU hotplug lock is held, and blk-mq's
handler is waiting for inflight IO
2) error handler is waiting for CPU hotplug lock
3) inflight IO can't be completed in blk-mq's CPU hotplug handler
because error handling can't provide forward progress.
Solve the deadlock by not holding CPU hotplug lock in group_cpus_evenly(),
in which two stage spreads are taken: 1) the 1st stage is over all present
CPUs; 2) the end stage is over all other CPUs.
Turns out the two stage spread just needs consistent 'cpu_present_mask',
and remove the CPU hotplug lock by storing it into one local cache. This
way doesn't change correctness, because all CPUs are still covered.
Link: https://lkml.kernel.org/r/20231120083559.285174-1-ming.lei@redhat.com Signed-off-by: Ming Lei <ming.lei@redhat.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: Guangwu Zhang <guazhang@redhat.com> Tested-by: Guangwu Zhang <guazhang@redhat.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Heiko Carstens [Mon, 20 Nov 2023 18:37:17 +0000 (19:37 +0100)]
checkstack: fix printed address
All addresses printed by checkstack have an extra incorrect 0 appended at
the end.
This was introduced with commit 677f1410e058 ("scripts/checkstack.pl: don't
display $dre as different entity"): since then the address is taken from
the line which contains the function name, instead of the line which
contains stack consumption. E.g. on s390:
So the used regex which matches spaces and hexadecimal numbers to extract
an address now matches a different substring. Subsequently replacing spaces
with 0 appends a zero at the and, instead of replacing leading spaces.
Fix this by using the proper regex, and simplify the code a bit.
Sumanth Korikkar [Mon, 20 Nov 2023 14:53:53 +0000 (15:53 +0100)]
mm/memory_hotplug: fix error handling in add_memory_resource()
In add_memory_resource(), creation of memory block devices occurs after
successful call to arch_add_memory(). However, creation of memory block
devices could fail. In that case, arch_remove_memory() is called to
perform necessary cleanup.
Currently with or without altmap support, arch_remove_memory() is always
passed with altmap set to NULL during error handling. This leads to
freeing of struct pages using free_pages(), eventhough the allocation
might have been performed with altmap support via
altmap_alloc_block_buf().
Fix the error handling by passing altmap in arch_remove_memory(). This
ensures the following:
* When altmap is disabled, deallocation of the struct pages array occurs
via free_pages().
* When altmap is enabled, deallocation occurs via vmem_altmap_free().
Link: https://lkml.kernel.org/r/20231120145354.308999-3-sumanthk@linux.ibm.com Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range") Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: kernel test robot <lkp@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: <stable@vger.kernel.org> [5.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sumanth Korikkar [Mon, 20 Nov 2023 14:53:52 +0000 (15:53 +0100)]
mm/memory_hotplug: add missing mem_hotplug_lock
From Documentation/core-api/memory-hotplug.rst:
When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock
in write mode to serialise memory hotplug (e.g. access to global/zone
variables).
mhp_(de)init_memmap_on_memory() functions can change zone stats and
struct page content, but they are currently called w/o the
mem_hotplug_lock.
When memory block is being offlined and when kmemleak goes through each
populated zone, the following theoretical race conditions could occur:
CPU 0: | CPU 1:
memory_offline() |
-> offline_pages() |
-> mem_hotplug_begin() |
... |
-> mem_hotplug_done() |
| kmemleak_scan()
| -> get_online_mems()
| ...
-> mhp_deinit_memmap_on_memory() |
[not protected by mem_hotplug_begin/done()]|
Marks memory section as offline, | Retrieves zone_start_pfn
poisons vmemmap struct pages and updates | and struct page members.
the zone related data |
| ...
| -> put_online_mems()
Fix this by ensuring mem_hotplug_lock is taken before performing
mhp_init_memmap_on_memory(). Also ensure that
mhp_deinit_memmap_on_memory() holds the lock.
online/offline_pages() are currently only called from
memory_block_online/offline(), so it is safe to move the locking there.
Link: https://lkml.kernel.org/r/20231120145354.308999-2-sumanthk@linux.ibm.com Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range") Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: kernel test robot <lkp@intel.com> Cc: <stable@vger.kernel.org> [5.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>