Linus Torvalds [Sat, 13 Aug 2022 01:39:43 +0000 (18:39 -0700)]
Merge tag 'riscv-for-linus-5.20-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
"There's still a handful of new features in here, but there are a lot
of fixes/cleanups as well:
- Support for the Zicbom extension for explicit cache-block
management, along with the necessary bits to make the non-standard
cache management ops on the Allwinner D1 function
- Support for the Zihintpause extension, which codifies a go-slow
instruction used for cpu_relax()
- Support for the Sstc extension for supervisor-mode timer/counter
management
- Many device tree fixes and cleanups, including a large set for the
Canaan device trees
- A handful of fixes and cleanups for the PMU driver"
* tag 'riscv-for-linus-5.20-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (43 commits)
dt-bindings: gpio: sifive: add gpio-line-names
wireguard: selftests: set CONFIG_NONPORTABLE on riscv32
RISC-V: KVM: Support sstc extension
RISC-V: Improve SBI definitions
RISC-V: Move counter info definition to sbi header file
RISC-V: Fix SBI PMU calls for RV32
RISC-V: Update user page mapping only once during start
RISC-V: Fix counter restart during overflow for RV32
RISC-V: Prefer sstc extension if available
RISC-V: Enable sstc extension parsing from DT
RISC-V: Add SSTC extension CSR details
riscv:uprobe fix SR_SPIE set/clear handling
dt-bindings: riscv: fix SiFive l2-cache's cache-sets
riscv: ensure cpu_ops_sbi is declared
RISC-V: cpu_ops_spinwait.c should include head.h
RISC-V: Declare cpu_ops_spinwait in <asm/cpu_ops.h>
riscv: dts: starfive: correct number of external interrupts
riscv: dts: sifive unmatched: Add PWM controlled LEDs
riscv/purgatory: Omit use of bin2c
riscv/purgatory: hard-code obj-y in Makefile
...
* tag 'devicetree-fixes-for-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: chrome: google,cros-ec-typec: restrict allowed properties
dt-bindings: Drop Dan Murphy and Ricardo Rivera-Matos
dt-bindings: Drop Robert Jones
dt-bindings: Drop Beniamin Bia and Stefan Popa
dt-bindings: iio: Drop Bogdan Pricop
dt-bindings: iio: Drop Joachim Eastwood
dt-bindings: mailbox: arm,mhu: Make secure interrupt optional
dt-bindings: pinctrl: qcom,ipq6018: Fix example 'gpio-ranges' size
dt-bindings: Drop DT_MK_SCHEMA_FLAGS conditional selecting schema files
dt-bindings: mfd: convert to yaml Qualcomm SPMI PMIC
dt-bindings: mmc: sdhci-msm: Fix 'operating-points-v2 was unexpected' issue
dt-bindings: display: simple-framebuffer: Drop Bartlomiej Zolnierkiewicz
Drivers:
- use simple i2c probe where possible
- sun6i: add R329 support
- zynqmp: add calibration support
- vr41xx: remove unused driver"
* tag 'rtc-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (31 commits)
rtc: spear: set range max
rtc: rtc-cmos: Do not check ACPI_FADT_LOW_POWER_S0
rtc: zynqmp: initialize fract_tick
rtc: Add NCT3018Y real time clock driver
dt-bindings: rtc: nuvoton: add NCT3018Y Real Time Clock
dt-bindings: rtc: nxp,pcf85063: Convert to DT schema
dt-bindings: rtc: microcrystal,rv3032: Add missing type to 'trickle-voltage-millivolt'
rtc: rx8025: fix 12/24 hour mode detection on RX-8035
rtc: cros-ec: Only warn once in .remove() about notifier_chain problems
rtc: vr41xx: remove driver
rtc: mpfs: remove 'pending' variable from mpfs_rtc_wakeup_irq_handler()
rtc: rv8803: fix missing unlock on error in rv8803_set_time()
rtc: zynqmp: Add calibration set and get support
rtc: zynqmp: Updated calibration value
dt-bindings: rtc: zynqmp: Add clock information
rtc: sun6i: add support for R329 RTC
rtc: Directly use ida_alloc()/free()
rtc: Introduce ti-k3-rtc
dt-bindings: rtc: Add TI K3 RTC description
dt-bindings: rtc: qcom-pm8xxx-rtc: Update the maintainers section
...
Describe exactly what properties are allowed in Google Chrome OS EC Type
C port, so the schema can properly validate the DTS. Existing DTS
defines always connectors with unit addresses, not a sole "connector"
child.
dt-bindings: Drop Dan Murphy and Ricardo Rivera-Matos
Emails to Dan Murphy and Ricardo Rivera-Matos bounce ("550 Invalid
recipient"). Andrew Davis agreed to take over the bindings.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Andrew Davis <afd@ti.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220809162752.10186-6-krzysztof.kozlowski@linaro.org
Linus Torvalds [Fri, 12 Aug 2022 16:55:32 +0000 (09:55 -0700)]
Merge tag 'sound-fix-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Includes a few usual updates for HD- and USB-audio and a trivial
cleanup patch"
* tag 'sound-fix-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda: Fix crash due to jack poll in suspend
ALSA: hda/cirrus - support for iMac 12,1 model
ALSA: usb-audio: make read-only array marker static const
ALSA: hda/realtek: Add a quirk for HP OMEN 15 (8786) mute LED
ALSA: usb-audio: More comprehensive mixer map for ASUS ROG Zenith II
ALSA: scarlett2: Add Focusrite Clarett+ 8Pre support
ALSA: hda/conexant: Add quirk for LENOVO 20149 Notebook model
ALSA: ice1712: remove redundant assignment to new
ALSA: hda/realtek: Add quirk for another Asus K42JZ model
Linus Torvalds [Fri, 12 Aug 2022 16:50:34 +0000 (09:50 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
- A huge patchset supporting vq resize using the new vq reset
capability
- Features, fixes, and cleanups all over the place
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (88 commits)
vdpa/mlx5: Fix possible uninitialized return value
vdpa_sim_blk: add support for discard and write-zeroes
vdpa_sim_blk: add support for VIRTIO_BLK_T_FLUSH
vdpa_sim_blk: make vdpasim_blk_check_range usable by other requests
vdpa_sim_blk: check if sector is 0 for commands other than read or write
vdpa_sim: Implement suspend vdpa op
vhost-vdpa: uAPI to suspend the device
vhost-vdpa: introduce SUSPEND backend feature bit
vdpa: Add suspend operation
virtio-blk: Avoid use-after-free on suspend/resume
virtio_vdpa: support the arg sizes of find_vqs()
vhost-vdpa: Call ida_simple_remove() when failed
vDPA: fix 'cast to restricted le16' warnings in vdpa.c
vDPA: !FEATURES_OK should not block querying device config space
vDPA/ifcvf: support userspace to query features and MQ of a management device
vDPA/ifcvf: get_config_size should return a value no greater than dev implementation
vhost scsi: Allow user to control num virtqueues
vhost-scsi: Fix max number of virtqueues
vdpa/mlx5: Support different address spaces for control and data
vdpa/mlx5: Implement susupend virtqueue callback
...
Linus Torvalds [Fri, 12 Aug 2022 16:44:23 +0000 (09:44 -0700)]
Merge tag 'loongarch-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Optimise getcpu() with vDSO
- PCI enablement on top of pci & irqchip changes
- Stack unwinder and stack trace support
- Some bug fixes and build error fixes
- Update the default config file
* tag 'loongarch-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
docs/zh_CN/LoongArch: Add I14 description
docs/LoongArch: Add I14 description
LoongArch: Update Loongson-3 default config file
LoongArch: Add USER_STACKTRACE support
LoongArch: Add STACKTRACE support
LoongArch: Add prologue unwinder support
LoongArch: Add guess unwinder support
LoongArch: Add vDSO syscall __vdso_getcpu()
LoongArch: Add PCI controller support
LoongArch: Parse MADT to get multi-processor information
LoongArch: Jump to the link address before enable PG
LoongArch: Requires __force attributes for any casts
LoongArch: Fix unsigned comparison with less than zero
LoongArch: Adjust arch/loongarch/Kconfig
LoongArch: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
Yury Norov [Fri, 12 Aug 2022 05:55:05 +0000 (22:55 -0700)]
lib: remove lib/nodemask.c
Commit 36d4b36b6959 ("lib/nodemask: inline next_node_in() and
node_random()") removed the lib/nodemask.c file, but the remove didn't
happen when the patch was applied.
Atul Khare [Wed, 3 Aug 2022 15:55:40 +0000 (16:55 +0100)]
dt-bindings: gpio: sifive: add gpio-line-names
Fix device tree schema validation messages like 'gpio-line-names'
does not match any of the regexes: 'pinctrl-[0-9]+' From schema: ...
sifive,gpio.yaml'.
The bindings were missing the gpio-line-names element, which was
causing the dt-schema checker to trip-up.
wireguard: selftests: set CONFIG_NONPORTABLE on riscv32
When the CONFIG_PORTABLE/CONFIG_NONPORTABLE switches were added, various
configs were updated, but the wireguard config was forgotten about. This
leads to unbootable test kernels, causing CI fails. Add
CONFIG_NONPORTABLE=y to the wireguard test suite configuration for
riscv32.
Fixes: 44c1e84a38a0 ("RISC-V: Add CONFIG_{NON,}PORTABLE") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220809145757.83673-1-Jason@zx2c4.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Sstc extension allows the guest to program the vstimecmp CSR directly
instead of making an SBI call to the hypervisor to program the next
event. The timer interrupt is also directly injected to the guest by
the hardware in this case. To maintain backward compatibility, the
hypervisors also update the vstimecmp in an SBI set_time call if
the hardware supports it. Thus, the older kernels in guest also
take advantage of the sstc extension.
Qing Zhang [Sat, 6 Aug 2022 08:10:04 +0000 (16:10 +0800)]
LoongArch: Add STACKTRACE support
1. Use common arch_stack_walk() infrastructure to avoid duplicated code
and avoid taking care of the stack storage and filtering.
2. Add sched_ra (means sched return address) and sched_cfa (means sched
call frame address) to thread_info, and store them in switch_to().
3. Add __get_wchan() implementation.
Now we can print the process stack and wait channel by cat /proc/*/stack
and /proc/*/wchan.
Qing Zhang [Sat, 6 Aug 2022 08:10:03 +0000 (16:10 +0800)]
LoongArch: Add prologue unwinder support
It unwind the stack frame based on prologue code analyze.
CONFIG_KALLSYMS is needed, at least the address and length
of each function.
Three stages when we do unwind,
1) unwind_start(), the prapare of unwinding, fill unwind_state.
2) unwind_done(), judge whether the unwind process is finished or not.
3) unwind_next_frame(), unwind the next frame.
Dividing unwinder helps to add new unwinders in the future, e.g.:
unwinder_frame, unwinder_orc, .etc.
Qing Zhang [Sat, 6 Aug 2022 08:10:02 +0000 (16:10 +0800)]
LoongArch: Add guess unwinder support
Name "guess unwinder" comes from x86, it scans the stack and reports
every kernel text address it finds.
Unwinders can be used by dump_stack() and other stacktrace functions.
Three stages when we do unwind,
1) unwind_start(), the prapare of unwinding, fill unwind_state.
2) unwind_done(), judge whether the unwind process is finished or not.
3) unwind_next_frame(), unwind the next frame.
Add get_stack_info() to get stack info. At present we have irq stack and
task stack. The next_sp is the key info between two types of stacks.
Dividing unwinder helps to add new unwinders in the future.
Huacai Chen [Thu, 11 Aug 2022 12:52:12 +0000 (20:52 +0800)]
LoongArch: Jump to the link address before enable PG
The kernel entry points of both boot CPU (i.e., kernel_entry) and non-
boot CPUs (i.e., smpboot_entry) may be physical address from BootLoader
(in DA mode or identity-mapping PG mode). So we should jump to the link
address before PG enabled (because DA is disabled at the same time) and
just after DMW configured.
Specifically: With some older firmwares, non-boot CPUs started with PG
enabled, but this need firmware cooperation in the form of a temporary
page table, which is deemed unnecessary. OTOH, latest firmware versions
configure the non-boot CPUs to start in DA mode, so kernel-side changes
are needed.
Reviewed-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Yang Li [Sat, 6 Aug 2022 07:19:33 +0000 (15:19 +0800)]
LoongArch: Fix unsigned comparison with less than zero
The return value from the call to get_timer_irq() is int, which can be
a negative error code. However, the return value is being assigned to an
unsigned int variable 'irq', so making 'irq' an int.
Eliminate the following coccicheck warning:
./arch/loongarch/kernel/time.c:146:5-8: WARNING: Unsigned expression compared with zero: irq < 0
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Sat, 6 Aug 2022 07:19:32 +0000 (15:19 +0800)]
LoongArch: Adjust arch/loongarch/Kconfig
1, ACPI, EFI and SMP are mandatories for LoongArch, select them
unconditionally to avoid various build errors for 'make randconfig'.
2, Move the MMU_GATHER_MERGE_VMAS selection to the correct place.
Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
LoongArch: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
When CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS is selected,
cpu_max_bits_warn() generates a runtime warning similar as below while
we show /proc/cpuinfo. Fix this by using nr_cpu_ids (the runtime limit)
instead of NR_CPUS to iterate CPUs.
Linus Torvalds [Fri, 12 Aug 2022 02:46:48 +0000 (19:46 -0700)]
Merge tag 'for-6.0/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- A few fixes for the DM verity and bufio changes in this merge window
- A smatch warning fix for DM writecache locking in writecache_map
* tag 'for-6.0/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm bufio: fix some cases where the code sleeps with spinlock held
dm writecache: fix smatch warning about invalid return from writecache_map
dm verity: fix verity_parse_opt_args parsing
dm verity: fix DM_VERITY_OPTS_MAX value yet again
dm bufio: simplify DM_BUFIO_CLIENT_NO_SLEEP locking
Linus Torvalds [Fri, 12 Aug 2022 02:12:15 +0000 (19:12 -0700)]
Merge tag 'drm-next-2022-08-12-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Not much to squeeze into rc1, just two small fixes, one for core gem
and one for shmem-helpers:
gem:
- Annotate WW context in error paths
shmem-helper:
- Add missing vunmap in error paths"
* tag 'drm-next-2022-08-12-1' of git://anongit.freedesktop.org/drm/drm:
drm/gem: Properly annotate WW context on drm_gem_lock_reservations() error
drm/shmem-helper: Add missing vunmap on error
RISC-V: Move counter info definition to sbi header file
Counter info encoding format is defined by the SBI specificaiton.
KVM implementation of SBI PMU extension will also leverage this definition.
Move the definition to common sbi header file from the sbi pmu driver.
Some of the SBI PMU calls does not pass 64bit arguments
correctly and not under RV32 compile time flags. Currently,
this doesn't create any incorrect results as RV64 ignores
any value in the additional register and qemu doesn't support
raw events.
Fix those SBI calls in order to set correct values for RV32.
RISC-V: Update user page mapping only once during start
Currently, riscv_pmu_event_set_period updates the userpage mapping.
However, the caller of riscv_pmu_event_set_period should update
the userpage mapping because the counter can not be updated/started
from set_period function in counter overflow path.
Invoke the perf_event_update_userpage at the caller so that it
doesn't get invoked twice during counter start path.
Palmer Dabbelt [Thu, 11 Aug 2022 21:41:52 +0000 (14:41 -0700)]
RISC-V: Add Sstc extension support
This series implements Sstc extension support which was ratified
recently. Before the Sstc extension, an SBI call is necessary to
generate timer interrupts as only M-mode have access to the timecompare
registers. Thus, there is significant latency to generate timer
interrupts at kernel. For virtualized enviornments, its even worse as
the KVM handles the SBI call and uses a software timer to emulate the
timecomapre register.
Sstc extension solves both these problems by defining a
stimecmp/vstimecmp at supervisor (host/guest) level. It allows kernel to
program a timer and recieve interrupt without supervisor execution
enviornment (M-mode/HS mode) intervention.
* palmer/riscv-sstc:
RISC-V: Prefer sstc extension if available
RISC-V: Enable sstc extension parsing from DT
RISC-V: Add SSTC extension CSR details
RISC-V ISA has sstc extension which allows updating the next clock event
via a CSR (stimecmp) instead of an SBI call. This should happen dynamically
if sstc extension is available. Otherwise, it will fallback to SBI call
to maintain backward compatibility.
Yipeng Zou [Thu, 21 Jul 2022 06:58:20 +0000 (14:58 +0800)]
riscv:uprobe fix SR_SPIE set/clear handling
In riscv the process of uprobe going to clear spie before exec
the origin insn,and set spie after that.But When access the page
which origin insn has been placed a page fault may happen and
irq was disabled in arch_uprobe_pre_xol function,It cause a WARN
as follows.
There is no need to clear/set spie in arch_uprobe_pre/post/abort_xol.
We can just remove it.
Fix device tree schema validation error messages for the SiFive
Unmatched: ' cache-sets:0:0: 1024 was expected'.
The existing bindings allow for just 1024 cache-sets but the fu740 on
Unmatched the has 2048 cache-sets. The ISA itself permits any arbitrary
power of two, however this is not supported by dt-schema. The RTL for
the IP, to which the number of cache-sets is a tunable parameter, has
been released publicly so speculatively adding a small number of
"reasonable" values seems unwise also.
Instead, as the binding only supports two distinct controllers: add 2048
and explicitly lock it to the fu740's l2 cache while limiting 1024 to
the l2 cache on the fu540.
Fixes: af951c3a113b ("dt-bindings: riscv: Update l2 cache DT documentation to add support for SiFive FU740") Reported-by: Atul Khare <atulkhare@rivosinc.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220803185359.942928-1-mail@conchuod.ie Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Sparse complains that cpu_ops_sbi is used undeclared:
arch/riscv/kernel/cpu_ops_sbi.c:17:29: warning: symbol 'cpu_ops_sbi' was not declared. Should it be static?
Fix the warning by adding cpu_ops_sbi to cpu_ops_sbi.h & including that
where used.
Linus Torvalds [Thu, 11 Aug 2022 20:45:37 +0000 (13:45 -0700)]
Merge tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf, can and netfilter.
A little larger than usual but it's all fixes, no late features. It's
large partially because of timing, and partially because of follow ups
to stuff that got merged a week or so before the merge window and
wasn't as widely tested. Maybe the Bluetooth fixes are a little
alarming so we'll address that, but the rest seems okay and not scary.
Notably we're including a fix for the netfilter Kconfig [1], your WiFi
warning [2] and a bluetooth fix which should unblock syzbot [3].
Current release - regressions:
- Bluetooth:
- don't try to cancel uninitialized works [3]
- L2CAP: fix use-after-free caused by l2cap_chan_put
- tls: rx: fix device offload after recent rework
- devlink: fix UAF on failed reload and leftover locks in mlxsw
Current release - new code bugs:
- netfilter:
- flowtable: fix incorrect Kconfig dependencies [1]
- nf_tables: fix crash when nf_trace is enabled
- bpf:
- use proper target btf when exporting attach_btf_obj_id
- arm64: fixes for bpf trampoline support
- Bluetooth:
- ISO: unlock on error path in iso_sock_setsockopt()
- ISO: fix info leak in iso_sock_getsockopt()
- ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP
- ISO: fix memory corruption on iso_pinfo.base
- ISO: fix not using the correct QoS
- hci_conn: fix updating ISO QoS PHY
- phy: dp83867: fix get nvmem cell fail
Previous releases - regressions:
- wifi: cfg80211: fix validating BSS pointers in
__cfg80211_connect_result [2]
- atm: bring back zatm uAPI after ATM had been removed
- properly fix old bug making bonding ARP monitor mode not being able
to work with software devices with lockless Tx
- tap: fix null-deref on skb->dev in dev_parse_header_protocol
- revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some
devices and breaks others
- netfilter:
- nf_tables: many fixes rejecting cross-object linking which may
lead to UAFs
- nf_tables: fix null deref due to zeroed list head
- nf_tables: validate variable length element extension
- bgmac: fix a BUG triggered by wrong bytes_compl
- bcmgenet: indicate MAC is in charge of PHY PM
Previous releases - always broken:
- bpf:
- fix bad pointer deref in bpf_sys_bpf() injected via test infra
- disallow non-builtin bpf programs calling the prog_run command
- don't reinit map value in prealloc_lru_pop
- fix UAFs during the read of map iterator fd
- fix invalidity check for values in sk local storage map
- reject sleepable program for non-resched map iterator
- mptcp:
- move subflow cleanup in mptcp_destroy_common()
- do not queue data on closed subflows
- virtio_net: fix memory leak inside XDP_TX with mergeable
- vsock: fix memory leak when multiple threads try to connect()
- rework sk_user_data sharing to prevent psock leaks
- geneve: fix TOS inheriting for ipv4
- tunnels & drivers: do not use RT_TOS for IPv6 flowlabel
- phy: c45 baset1: do not skip aneg configuration if clock role is
not specified
- rose: avoid overflow when /proc displays timer information
- x25: fix call timeouts in blocking connects
- can: mcp251x: fix race condition on receive interrupt
- can: j1939:
- replace user-reachable WARN_ON_ONCE() with netdev_warn_once()
- fix memory leak of skbs in j1939_session_destroy()
Misc:
- docs: bpf: clarify that many things are not uAPI
- seg6: initialize induction variable to first valid array index (to
silence clang vs objtool warning)
* tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (117 commits)
net: atm: bring back zatm uAPI
dpaa2-eth: trace the allocated address instead of page struct
net: add missing kdoc for struct genl_multicast_group::flags
nfp: fix use-after-free in area_cache_get()
MAINTAINERS: use my korg address for mt7601u
mlxsw: minimal: Fix deadlock in ports creation
bonding: fix reference count leak in balance-alb mode
net: usb: qmi_wwan: Add support for Cinterion MV32
bpf: Shut up kern_sys_bpf warning.
net/tls: Use RCU API to access tls_ctx->netdev
tls: rx: device: don't try to copy too much on detach
tls: rx: device: bound the frag walk
net_sched: cls_route: remove from list when handle is 0
selftests: forwarding: Fix failing tests with old libnet
net: refactor bpf_sk_reuseport_detach()
net: fix refcount bug in sk_psock_get (2)
selftests/bpf: Ensure sleepable program is rejected by hash map iter
selftests/bpf: Add write tests for sk local storage map iterator
selftests/bpf: Add tests for reading a dangling map iter fd
bpf: Only allow sleepable program for resched-able iterator
...
Linus Torvalds [Thu, 11 Aug 2022 20:26:09 +0000 (13:26 -0700)]
Merge tag 'acpi-5.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"These fix up direct references to the fwnode field in struct device
and extend ACPI device properties support.
Specifics:
- Replace direct references to the fwnode field in struct device with
dev_fwnode() and device_match_fwnode() (Andy Shevchenko)
- Make the ACPI code handling device properties support properties
with buffer values (Sakari Ailus)"
* tag 'acpi-5.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: property: Fix error handling in acpi_init_properties()
ACPI: VIOT: Do not dereference fwnode in struct device
ACPI: property: Read buffer properties as integers
ACPI: property: Add support for parsing buffer property UUID
ACPI: property: Unify integer value reading functions
ACPI: property: Switch node property referencing from ifs to a switch
ACPI: property: Move property ref argument parsing into a new function
ACPI: property: Use acpi_object_type consistently in property ref parsing
ACPI: property: Tie data nodes to acpi handles
ACPI: property: Return type of acpi_add_nondev_subnodes() should be bool
Ben Dooks [Wed, 13 Jul 2022 21:53:06 +0000 (22:53 +0100)]
RISC-V: cpu_ops_spinwait.c should include head.h
Running sparse shows cpu_ops_spinwait.c is missing two definitions
found in head.h, so include it to stop the following warnings:
arch/riscv/kernel/cpu_ops_spinwait.c:15:6: warning: symbol '__cpu_spinwait_stack_pointer' was not declared. Should it be static?
arch/riscv/kernel/cpu_ops_spinwait.c:16:6: warning: symbol '__cpu_spinwait_task_pointer' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@sifive.com> Link: https://lore.kernel.org/r/20220713215306.94675-1-ben.dooks@sifive.com Fixes: c78f94f35cf6 ("RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Linus Torvalds [Thu, 11 Aug 2022 20:11:49 +0000 (13:11 -0700)]
Merge tag 'iomap-6.0-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull more iomap updates from Darrick Wong:
"In the past 10 days or so I've not heard any ZOMG STOP style
complaints about removing ->writepage support from gfs2 or zonefs, so
here's the pull request removing them (and the underlying fs iomap
support) from the kernel:
- Remove iomap_writepage and all callers, since the mm apparently
never called the zonefs or gfs2 writepage functions"
* tag 'iomap-6.0-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: remove iomap_writepage
zonefs: remove ->writepage
gfs2: remove ->writepage
gfs2: stop using generic_writepages in gfs2_ail1_start_one
Ben Dooks [Thu, 14 Jul 2022 07:18:11 +0000 (08:18 +0100)]
RISC-V: Declare cpu_ops_spinwait in <asm/cpu_ops.h>
The cpu_ops_spinwait is used in a couple of places in arch/riscv
and is causing a sparse warning due to no declaration. Add this
to <asm/cpu_ops.h> with the others to fix the following:
arch/riscv/kernel/cpu_ops_spinwait.c:16:29: warning: symbol 'cpu_ops_spinwait' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@sifive.com> Link: https://lore.kernel.org/r/20220714071811.187491-1-ben.dooks@sifive.com
[Palmer: Drop the extern from cpu_ops.c] Fixes: 2ffc48fc7071 ("RISC-V: Move spinwait booting method to its own config") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Linus Torvalds [Thu, 11 Aug 2022 19:41:07 +0000 (12:41 -0700)]
Merge tag 'ceph-for-5.20-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"We have a good pile of various fixes and cleanups from Xiubo, Jeff,
Luis and others, almost exclusively in the filesystem.
Several patches touch files outside of our normal purview to set the
stage for bringing in Jeff's long awaited ceph+fscrypt series in the
near future. All of them have appropriate acks and sat in linux-next
for a while"
* tag 'ceph-for-5.20-rc1' of https://github.com/ceph/ceph-client: (27 commits)
libceph: clean up ceph_osdc_start_request prototype
libceph: fix ceph_pagelist_reserve() comment typo
ceph: remove useless check for the folio
ceph: don't truncate file in atomic_open
ceph: make f_bsize always equal to f_frsize
ceph: flush the dirty caps immediatelly when quota is approaching
libceph: print fsid and epoch with osd id
libceph: check pointer before assigned to "c->rules[]"
ceph: don't get the inline data for new creating files
ceph: update the auth cap when the async create req is forwarded
ceph: make change_auth_cap_ses a global symbol
ceph: fix incorrect old_size length in ceph_mds_request_args
ceph: switch back to testing for NULL folio->private in ceph_dirty_folio
ceph: call netfs_subreq_terminated with was_async == false
ceph: convert to generic_file_llseek
ceph: fix the incorrect comment for the ceph_mds_caps struct
ceph: don't leak snap_rwsem in handle_cap_grant
ceph: prevent a client from exceeding the MDS maximum xattr size
ceph: choose auth MDS for getxattr with the Xs caps
ceph: add session already open notify support
...
Linus Torvalds [Thu, 11 Aug 2022 19:10:08 +0000 (12:10 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more kvm updates from Paolo Bonzini:
- Xen timer fixes
- Documentation formatting fixes
- Make rseq selftest compatible with glibc-2.35
- Fix handling of illegal LEA reg, reg
- Cleanup creation of debugfs entries
- Fix steal time cache handling bug
- Fixes for MMIO caching
- Optimize computation of number of LBRs
- Fix uninitialized field in guest_maxphyaddr < host_maxphyaddr path
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits)
KVM: x86/MMU: properly format KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability table
Documentation: KVM: extend KVM_CAP_VM_DISABLE_NX_HUGE_PAGES heading underline
KVM: VMX: Adjust number of LBR records for PERF_CAPABILITIES at refresh
KVM: VMX: Use proper type-safe functions for vCPU => LBRs helpers
KVM: x86: Refresh PMU after writes to MSR_IA32_PERF_CAPABILITIES
KVM: selftests: Test all possible "invalid" PERF_CAPABILITIES.LBR_FMT vals
KVM: selftests: Use getcpu() instead of sched_getcpu() in rseq_test
KVM: selftests: Make rseq compatible with glibc-2.35
KVM: Actually create debugfs in kvm_create_vm()
KVM: Pass the name of the VM fd to kvm_create_vm_debugfs()
KVM: Get an fd before creating the VM
KVM: Shove vcpu stats_id init into kvm_vcpu_init()
KVM: Shove vm stats_id init into kvm_create_vm()
KVM: x86/mmu: Add sanity check that MMIO SPTE mask doesn't overlap gen
KVM: x86/mmu: rename trace function name for asynchronous page fault
KVM: x86/xen: Stop Xen timer before changing IRQ
KVM: x86/xen: Initialize Xen timer only once
KVM: SVM: Disable SEV-ES support if MMIO caching is disable
KVM: x86/mmu: Fully re-evaluate MMIO caching when SPTE masks change
KVM: x86: Tag kvm_mmu_x86_module_init() with __init
...
Mark Kettenis [Thu, 7 Jul 2022 18:55:28 +0000 (20:55 +0200)]
riscv: dts: starfive: correct number of external interrupts
The PLIC integrated on the Vic_U7_Core integrated on the StarFive
JH7100 SoC actually supports 133 external interrupts. 127 of these
are exposed to the outside world; the remainder are used by other
devices that are part of the core-complex such as the L2 cache
controller. But all 133 interrupts are external interrupts as far
as the PLIC is concerned. Fix the property so that the driver can
manage these additional interrupts, which is important since the
interrupts for the L2 cache controller are enabled by default.
This adds the two PWM controlled LEDs to the HiFive Unmatched device
tree. D12 is just a regular green diode, but D2 is an RGB diode with 3
PWM inputs controlling the three different colours.
Signed-off-by: Emil Renner Berthing <emil.renner.berthing@canonical.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Pavel Machek <pavel@ucw.cz> Tested-by: Ron Economos <re@w6rz.net> Link: https://lore.kernel.org/r/20220705210143.315151-5-emil.renner.berthing@canonical.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jakub Kicinski [Wed, 10 Aug 2022 16:45:47 +0000 (09:45 -0700)]
net: atm: bring back zatm uAPI
Jiri reports that linux-atm does not build without this header.
Bring it back. It's completely dead code but we can't break
the build for user space :(
Chen Lin [Thu, 11 Aug 2022 15:16:51 +0000 (23:16 +0800)]
dpaa2-eth: trace the allocated address instead of page struct
We should trace the allocated address instead of page struct.
Fixes: 27c874867c4e ("dpaa2-eth: Use a single page per Rx buffer") Signed-off-by: Chen Lin <chen.lin5@zte.com.cn> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20220811151651.3327-1-chen45464546@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merge changes adding support for device properties with buffer values
to the ACPI device properties handling code.
* acpi-properties:
ACPI: property: Fix error handling in acpi_init_properties()
ACPI: property: Read buffer properties as integers
ACPI: property: Add support for parsing buffer property UUID
ACPI: property: Unify integer value reading functions
ACPI: property: Switch node property referencing from ifs to a switch
ACPI: property: Move property ref argument parsing into a new function
ACPI: property: Use acpi_object_type consistently in property ref parsing
ACPI: property: Tie data nodes to acpi handles
ACPI: property: Return type of acpi_add_nondev_subnodes() should be bool
Masahiro Yamada [Sat, 25 Jun 2022 22:34:37 +0000 (07:34 +0900)]
riscv/purgatory: Omit use of bin2c
The .incbin assembler directive is much faster than bin2c + $(CC).
Do similar refactoring as in commit 4c0f032d4963 ("s390/purgatory:
Omit use of bin2c").
Please note the .quad directive matches to size_t in C (both 8 byte)
because the purgatory is compiled only for the 64-bit kernel.
(KEXEC_FILE depends on 64BIT).
Linus Torvalds [Thu, 11 Aug 2022 16:23:08 +0000 (09:23 -0700)]
Merge tag 'input-for-v5.20-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- changes to input core to properly queue synthetic events (such as
autorepeat) and to release multitouch contacts when an input device
is inhibited or suspended
- reworked quirk handling in i8042 driver that consolidates multiple
DMI tables into one and adds several quirks for TUXEDO line of
laptops
- update to mt6779 keypad to better reflect organization of the
hardware
- changes to mtk-pmic-keys driver preparing it to handle more variants
- facelift of adp5588-keys driver
- improvements to iqs7222 driver
- adjustments to various DT binding documents for input devices
- other assorted driver fixes.
* tag 'input-for-v5.20-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (54 commits)
Input: adc-joystick - fix ordering in adc_joystick_probe()
dt-bindings: input: ariel-pwrbutton: use spi-peripheral-props.yaml
Input: deactivate MT slots when inhibiting or suspending devices
Input: properly queue synthetic events
dt-bindings: input: iqs7222: Use central 'linux,code' definition
Input: i8042 - add dritek quirk for Acer Aspire One AO532
dt-bindings: input: gpio-keys: accept also interrupt-extended
dt-bindings: input: gpio-keys: reference input.yaml and document properties
dt-bindings: input: gpio-keys: enforce node names to match all properties
dt-bindings: input: Convert adc-keys to DT schema
dt-bindings: input: Centralize 'linux,input-type' definition
dt-bindings: input: Use common 'linux,keycodes' definition
dt-bindings: input: Centralize 'linux,code' definition
dt-bindings: input: Increase maximum keycode value to 0x2ff
Input: mt6779-keypad - implement row/column selection
Input: mt6779-keypad - match hardware matrix organization
Input: i8042 - add additional TUXEDO devices to i8042 quirk tables
Input: goodix - switch use of acpi_gpio_get_*_resource() APIs
Input: i8042 - add TUXEDO devices to i8042 quirk tables
Input: i8042 - add debug output for quirks
...
Jialiang Wang [Wed, 10 Aug 2022 07:30:57 +0000 (15:30 +0800)]
nfp: fix use-after-free in area_cache_get()
area_cache_get() is used to distribute cache->area and set cache->id,
and if cache->id is not 0 and cache->area->kref refcount is 0, it will
release the cache->area by nfp_cpp_area_release(). area_cache_get()
set cache->id before cpp->op->area_init() and nfp_cpp_area_acquire().
But if area_init() or nfp_cpp_area_acquire() fails, the cache->id is
is already set but the refcount is not increased as expected. At this
time, calling the nfp_cpp_area_release() will cause use-after-free.
To avoid the use-after-free, set cache->id after area_init() and
nfp_cpp_area_acquire() complete successfully.
Note: This vulnerability is triggerable by providing emulated device
equipped with specified configuration.
BUG: KASAN: use-after-free in nfp6000_area_init (drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:760)
Write of size 4 at addr ffff888005b7f4a0 by task swapper/0/1
Xianting Tian [Thu, 11 Aug 2022 07:41:48 +0000 (15:41 +0800)]
RISC-V: Add modules to virtual kernel memory layout dump
Modules always live before the kernel, MODULES_END is fixed but
MODULES_VADDR isn't fixed, it depends on the kernel size.
Let's add it to virtual kernel memory layout dump.
As MODULES is only defined for CONFIG_64BIT, so we dump it when
CONFIG_64BIT=y.
Xianting Tian [Thu, 11 Aug 2022 07:41:47 +0000 (15:41 +0800)]
RISC-V: Fixup schedule out issue in machine_crash_shutdown()
Current task of executing crash kexec will be schedule out when panic is
triggered by RCU Stall, as it needs to wait rcu completion. It lead to
inability to enter the crash system.
The implementation of machine_crash_shutdown() is non-standard for RISC-V
according to other Arch's implementation(eg, x86, arm64), we need to send
IPI to stop secondary harts.
Xianting Tian [Thu, 11 Aug 2022 07:41:46 +0000 (15:41 +0800)]
RISC-V: Fixup get incorrect user mode PC for kernel mode regs
When use 'echo c > /proc/sysrq-trigger' to trigger kdump, riscv_crash_save_regs()
will be called to save regs for vmcore, we found "epc" value 00ffffffa5537400
is not a valid kernel virtual address, but is a user virtual address. Other
regs(eg, ra, sp, gp...) are correct kernel virtual address.
Actually 0x00ffffffb0dd9400 is the user mode PC of 'PID: 113 Comm: sh', which
is saved in the task's stack.
Xianting Tian [Thu, 11 Aug 2022 07:41:45 +0000 (15:41 +0800)]
RISC-V: kexec: Fixup use of smp_processor_id() in preemptible context
Use __smp_processor_id() to avoid check the preemption context when
CONFIG_DEBUG_PREEMPT enabled, as we will enter crash kernel and no
return.
Without the patch,
[ 103.781044] sysrq: Trigger a crash
[ 103.784625] Kernel panic - not syncing: sysrq triggered crash
[ 103.837634] CPU1: off
[ 103.889668] CPU2: off
[ 103.933479] CPU3: off
[ 103.939424] Starting crashdump kernel...
[ 103.943442] BUG: using smp_processor_id() in preemptible [00000000] code: sh/346
[ 103.950884] caller is debug_smp_processor_id+0x1c/0x26
[ 103.956051] CPU: 0 PID: 346 Comm: sh Kdump: loaded Not tainted 5.10.113-00002-gce03f03bf4ec-dirty #149
[ 103.965355] Call Trace:
[ 103.967805] [<ffffffe00020372a>] walk_stackframe+0x0/0xa2
[ 103.973206] [<ffffffe000bcf1f4>] show_stack+0x32/0x3e
[ 103.978258] [<ffffffe000bd382a>] dump_stack_lvl+0x72/0x8e
[ 103.983655] [<ffffffe000bd385a>] dump_stack+0x14/0x1c
[ 103.988705] [<ffffffe000bdc8fe>] check_preemption_disabled+0x9e/0xaa
[ 103.995057] [<ffffffe000bdc926>] debug_smp_processor_id+0x1c/0x26
[ 104.001150] [<ffffffe000206c64>] machine_kexec+0x22/0xd0
[ 104.006463] [<ffffffe000291a7e>] __crash_kexec+0x6a/0xa4
[ 104.011774] [<ffffffe000bcf3fa>] panic+0xfc/0x2b0
[ 104.016480] [<ffffffe000656ca4>] sysrq_reset_seq_param_set+0x0/0x70
[ 104.022745] [<ffffffe000657310>] __handle_sysrq+0x8c/0x154
[ 104.028229] [<ffffffe0006577e8>] write_sysrq_trigger+0x5a/0x6a
[ 104.034061] [<ffffffe0003d90e0>] proc_reg_write+0x58/0xd4
[ 104.039459] [<ffffffe00036cff4>] vfs_write+0x7e/0x254
[ 104.044509] [<ffffffe00036d2f6>] ksys_write+0x58/0xbe
[ 104.049558] [<ffffffe00036d36a>] sys_write+0xe/0x16
[ 104.054434] [<ffffffe000201b9a>] ret_from_syscall+0x0/0x2
[ 104.067863] Will call new kernel at ecc00000 from hart id 0
[ 104.074939] FDT image at fc5ee000
[ 104.079523] Bye...
With the patch we can got clear output,
[ 67.740553] sysrq: Trigger a crash
[ 67.744166] Kernel panic - not syncing: sysrq triggered crash
[ 67.809123] CPU1: off
[ 67.865210] CPU2: off
[ 67.909075] CPU3: off
[ 67.919123] Starting crashdump kernel...
[ 67.924900] Will call new kernel at ecc00000 from hart id 0
[ 67.932045] FDT image at fc5ee000
[ 67.935560] Bye...
Fixes: 0e105f1d0037 ("riscv: use hart id instead of cpu id on machine_kexec") Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Link: https://lore.kernel.org/r/20220811074150.3020189-2-xianting.tian@linux.alibaba.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jay Vosburgh [Thu, 11 Aug 2022 05:06:53 +0000 (22:06 -0700)]
bonding: fix reference count leak in balance-alb mode
Commit d5410ac7b0ba ("net:bonding:support balance-alb interface
with vlan to bridge") introduced a reference count leak by not releasing
the reference acquired by ip_dev_find(). Remedy this by insuring the
reference is released.
Fixes: d5410ac7b0ba ("net:bonding:support balance-alb interface with vlan to bridge") Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/26758.1660194413@famine Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The clang -Wformat warning is terminally broken, and the clang people
can't seem to get their act together.
This test program causes a warning with clang:
#include <stdio.h>
int main(int argc, char **argv)
{
printf("%hhu\n", 'a');
}
resulting in
t.c:5:19: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]
printf("%hhu\n", 'a');
~~~~ ^~~
%d
and apparently clang people consider that a feature, because they don't
want to face the reality of how either C character constants, C
arithmetic, and C varargs functions work.
The rest of the world just shakes their head at that kind of
incompetence, and turns off -Wformat for clang again.
And no, the "you should use a pointless cast to shut this up" is not a
valid answer. That warning should not exist in the first place, or at
least be optinal with some "-Wformat-me-harder" kind of option.
[ Admittedly, there's also very little reason to *ever* use '%hh[ud]' in
C, but what little reason there is is entirely about 'I want to see
only the low 8 bits of the argument'. So I would suggest nobody ever
use that format in the first place, but if they do, the clang
behavious is simply always wrong. Because '%hhu' takes an 'int'. It's
that simple. ]
Mikulas Patocka [Wed, 10 Aug 2022 20:16:34 +0000 (16:16 -0400)]
dm bufio: fix some cases where the code sleeps with spinlock held
Commit b32d45824aa7 ("dm bufio: Add DM_BUFIO_CLIENT_NO_SLEEP flag")
added a "NO_SLEEP" mode, it replaces a mutex with a spinlock, and it
is only usable when the device is in read-only mode (because the write
path may be sleeping while holding the dm_bufio_client lock).
However, there are still two points where the code could sleep even in
read-only mode. One is in __get_unclaimed_buffer -> __make_buffer_clean.
The other is in __try_evict_buffer -> __make_buffer_clean. These functions
will call __make_buffer_clean which sleeps if the buffer is being read.
Fix these cases so that if c->no_sleep is set __make_buffer_clean
will not be called and the buffer will be skipped instead.
Slark Xiao [Wed, 10 Aug 2022 01:45:21 +0000 (09:45 +0800)]
net: usb: qmi_wwan: Add support for Cinterion MV32
There are 2 models for MV32 serials. MV32-W-A is designed
based on Qualcomm SDX62 chip, and MV32-W-B is designed based
on Qualcomm SDX65 chip. So we use 2 different PID to separate it.
Eli Cohen [Thu, 11 Aug 2022 13:40:10 +0000 (16:40 +0300)]
vdpa/mlx5: Fix possible uninitialized return value
Initialize err local variable to return -EAGAIN if the asid cannot be
found thus avoiding returning uninitialized value.
Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for control and data") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220811134010.952291-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vdpa_sim_blk: add support for discard and write-zeroes
Expose VIRTIO_BLK_F_DISCARD and VIRTIO_BLK_F_WRITE_ZEROES features
to the drivers and handle VIRTIO_BLK_T_DISCARD and
VIRTIO_BLK_T_WRITE_ZEROES requests checking ranges and flags.
The simulator behaves like a ramdisk, so for VIRTIO_BLK_F_DISCARD
does nothing, while for VIRTIO_BLK_T_WRITE_ZEROES sets to 0 the
specified region.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220811083632.77525-5-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The simulator behaves like a ramdisk, so we don't have to do
anything when a VIRTIO_BLK_T_FLUSH request is received, but it
could be useful to test driver behavior.
Let's expose the VIRTIO_BLK_F_FLUSH feature to inform the driver
that we support the flush command.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220811083632.77525-4-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vdpa_sim_blk: make vdpasim_blk_check_range usable by other requests
Next patches will add handling of other requests, where will be
useful to reuse vdpasim_blk_check_range().
So let's make it more generic by adding the `max_sectors` parameter,
since different requests allow different numbers of maximum sectors.
Let's also print the messages directly in vdpasim_blk_check_range()
to avoid duplicate prints.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220811083632.77525-3-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vdpa_sim_blk: check if sector is 0 for commands other than read or write
VIRTIO spec states: "The sector number indicates the offset
(multiplied by 512) where the read or write is to occur. This field is
unused and set to 0 for commands other than read or write."
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220811083632.77525-2-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Eugenio Pérez [Wed, 10 Aug 2022 17:15:12 +0000 (19:15 +0200)]
vdpa_sim: Implement suspend vdpa op
Implement suspend operation for vdpa_sim devices, so vhost-vdpa will
offer that backend feature and userspace can effectively suspend the
device.
This is a must before get virtqueue indexes (base) for live migration,
since the device could modify them after userland gets them. There are
individual ways to perform that action for some devices
(VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
way to perform it for any vhost device (and, in particular, vhost-vdpa).
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20220810171512.2343333-5-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Eugenio Pérez [Wed, 10 Aug 2022 17:15:11 +0000 (19:15 +0200)]
vhost-vdpa: uAPI to suspend the device
The ioctl adds support for suspending the device from userspace.
This is a must before getting virtqueue indexes (base) for live migration,
since the device could modify them after userland gets them. There are
individual ways to perform that action for some devices
(VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
way to perform it for any vhost device (and, in particular, vhost-vdpa).
After a successful return of the ioctl call the device must not process
more virtqueue descriptors. The device can answer to read or writes of
config fields as if it were not suspended. In particular, writing to
"queue_enable" with a value of 1 will not make the device start
processing buffers of the virtqueue.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20220810171512.2343333-4-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Eugenio Pérez [Wed, 10 Aug 2022 17:15:10 +0000 (19:15 +0200)]
vhost-vdpa: introduce SUSPEND backend feature bit
Userland knows if it can suspend the device or not by checking this feature
bit.
It's only offered if the vdpa driver backend implements the suspend()
operation callback, and to offer it or userland to ack it if the backend
does not offer that callback is an error.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20220810171512.2343333-3-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Shigeru Yoshida [Wed, 10 Aug 2022 16:09:48 +0000 (01:09 +0900)]
virtio-blk: Avoid use-after-free on suspend/resume
hctx->user_data is set to vq in virtblk_init_hctx(). However, vq is
freed on suspend and reallocated on resume. So, hctx->user_data is
invalid after resume, and it will cause use-after-free accessing which
will result in the kernel crash something like below:
vDPA: !FEATURES_OK should not block querying device config space
Users may want to query the config space of a vDPA device,
to choose a appropriate one for a certain guest. This means the
users need to read the config space before FEATURES_OK, and
the existence of config space contents does not depend on
FEATURES_OK.
The spec says:
The device MUST allow reading of any device-specific configuration
field before FEATURES_OK is set by the driver. This includes
fields which are conditional on feature bits, as long as those
feature bits are offered by the device.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20220722115309.82746-5-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vDPA/ifcvf: support userspace to query features and MQ of a management device
Adapting to current netlink interfaces, this commit allows userspace
to query feature bits and MQ capability of a management device.
Currently both the vDPA device and the management device are the VF itself,
thus this ifcvf should initialize the virtio capabilities in probe() before
setting up the struct vdpa_mgmt_dev.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20220722115309.82746-3-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vDPA/ifcvf: get_config_size should return a value no greater than dev implementation
Drivers must not access a BAR outside the capability length,
and for a virtio device, ifcvf driver should not report any non-standard
capability contents to the upper layers.
Function ifcvf_get_config_size() is introduced here to return a safe value
of the device config capability size.
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Message-Id: <20220722115309.82746-2-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Mike Christie [Fri, 8 Jul 2022 03:05:25 +0000 (22:05 -0500)]
vhost scsi: Allow user to control num virtqueues
We are currently hard coded to always create 128 IO virtqueues, so this
adds a modparam to control it. For large systems where we are ok with
using memory for virtqueues it allows us to add up to 1024. This limit
was just selected becuase that's qemu's limit.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20220708030525.5065-3-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
Mike Christie [Fri, 8 Jul 2022 03:05:24 +0000 (22:05 -0500)]
vhost-scsi: Fix max number of virtqueues
Qemu takes it's num_queues limit then adds the fixed queues (control and
event) to the total it will request from the kernel. So when a user
requests 128 (or qemu does it's num_queues calculation based on vCPUS
and other system limits), we hit errors due to userspace trying to setup
130 queues when vhost-scsi has a hard coded limit of 128.
This has vhost-scsi adjust it's max so we can do a total of 130 virtqueues
(128 IO and 2 fixed). For the case where the user has 128 vCPUs the guest
OS can then nicely map each IO virtqueue to a vCPU and not have the odd case
where 2 vCPUs share a virtqueue.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20220708030525.5065-2-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
Eli Cohen [Thu, 14 Jul 2022 11:39:27 +0000 (14:39 +0300)]
vdpa/mlx5: Support different address spaces for control and data
Partition virtqueues to two different address spaces: one for control
virtqueue which is implemented in software, and one for data virtqueues.
Based-on: <20220526124338.36247-1-eperezma@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com>
Message-Id: <20220714113927.85729-3-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>