Thomas Lamprecht [Thu, 20 Jun 2024 08:54:47 +0000 (10:54 +0200)]
fix #5448: support SCSI contollers with bad VDP page length encoding again
The reporter has an Adaptec 5805 controller (using the aacraid
driver), which reports a byteswapped page length for VPD page 0. It
reports "02 00" as page length instead of "00 02".
This stopped working with kernel 6.8.4 due to commit b5fc07a5fb56
("scsi: core: Consult supported VPD page list prior to fetching page")
To address that issue limit the page search scope to the size of our
VPD buffer to guard against devices returning a larger page count than
requested.
Reported-by: Peter Schneider <pschneider1968@googlemail.com> Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Mon, 10 Jun 2024 11:34:38 +0000 (13:34 +0200)]
update fix for managing block flush queue list
The patch from commit e5731f4 ("backport fix for managing block flush
queue list") caused some fallout when used with LVM on root, as that
uses some rather odd (but previously working fine) PREFLUSH
| POSTFLUSH format that was now causing the list to be used without
being initialized, resulting in freezes.
Fiona Ebner [Thu, 16 May 2024 09:06:30 +0000 (11:06 +0200)]
backport fix for NFS memory leak
Reported in the community forum [0] and easy to reproduce by doing
e.g.
> while true; do mount -t nfs 192.168.20.148:/rpool/data /mnt/test; done
from another node for a share that does not exist or for which the
client has no permissions.
The original fix disabled the xsaves feature for zen1/2. The issue has
since been fixed in the cpus microcode and this patch keeps the feature enabled
if the microcode version is recent enough to contain the fix.
rather exotic driver with frequent security issues over the past months, see
- CVE-2023-6546
- https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=67c37756898a
- https://lore.kernel.org/all/DB9PR10MB5881D2170678C169FB42A423E0082@DB9PR10MB5881.EURPRD10.PROD.OUTLOOK.COM/
add apparmor patch to fix recvmsg returning EINVAL
With apparmor 4, when recvmsg() calls are checked by the apparmor LSM
they will always return EINVAL.
This causes very weird issues when apparmor profiles are in use, and a
lot of networking issues in containers (which are always using
apparmor).
When coming from sys_recvmsg, msg->msg_namelen is explicitly set to
zero early on. (see ____sys_recvmsg in net/socket.c)
We still end up in 'map_addr' where the assumption is that addr !=
NULL means addrlen has a valid size.
This is likely not a final fix, it was suggested by jjohansen on irc
to get things going until this is resolved properly.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Thomas Lamprecht [Mon, 11 Mar 2024 13:19:45 +0000 (14:19 +0100)]
Revert "cherry-pick scheduler fix to avoid temporary VM freezes on NUMA hosts"
This reverts commit 29cb6fcbb78e0d2b0b585783031402cc8d4ca148, user
feedback was showing any positive impact of this patch, and upstream
still hasn't a fix for older stable releases (but for 6.8), so for now
rather revert this and wait for either a better (well, actual) fix or
updating to 6.8 or newer.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Friedrich Weber [Wed, 17 Jan 2024 14:45:21 +0000 (15:45 +0100)]
cherry-pick scheduler fix to avoid temporary VM freezes on NUMA hosts
Users have been reporting [1] that VMs occasionally become
unresponsive with high CPU usage for some time (varying between ~1 and
more than 60 seconds). After that time, the guests come back and
continue running. Windows VMs seem most affected (not responding to
pings during the hang, RDP sessions time out), but we also got reports
about Linux VMs (reporting soft lockups). The issue was not present on
host kernel 5.15 and was first reported with kernel 6.2. Users
reported that the issue becomes easier to trigger the more memory is
assigned to the guests. Setting mitigations=off was reported to
alleviate (but not eliminate) the issue. For most users the issue
seems to disappear after (also) disabling KSM [2], but some users
reported freezes even with KSM disabled [3].
It turned out the reports concerned NUMA hosts only, and that the
freezes correlated with runs of the NUMA balancer [4]. Users reported
that disabling the NUMA balancer resolves the issue (even with KSM
enabled).
We put together a Linux VM reproducer, ran a git-bisect on the kernel
to find the commit introducing the issue and asked upstream for help
[5]. As it turned out, an upstream bugreport was recently opened [6]
and a preliminary fix to the KVM TDP MMU was proposed [7]. With that
patch [7] on top of kernel 6.7, the reproducer does not trigger
freezes anymore. As of now, the patch (or its v2 [8]) is not yet
merged in the mainline kernel, and backporting it may be difficult due
to dependencies on other KVM changes [9].
However, the bugreport [6] also prompted an upstream developer to
propose a patch to the kernel scheduler logic that decides whether a
contended spinlock/rwlock should be dropped [10]. Without the patch,
PREEMPT_DYNAMIC kernels (such as ours) would always drop contended
locks. With the patch, the kernel only drops contended locks if the
kernel is currently set to preempt=full. As noted in the commit
message [10], this can (counter-intuitively) improve KVM performance.
Our kernel defaults to preempt=voluntary (according to
/sys/kernel/debug/sched/preempt), so with the patch it does not drop
contended locks anymore, and the reproducer does not trigger freezes
anymore. Hence, backport [10] to our kernel.
the signed template together with the binary package(s) containing the unsigned
files form the input to our secure boot signing service.
the signed template consists of
- files.json (specifying which files are signed how and by which key)
- packaging template used to build the signed package(s)
the signing service
- extracts and checks the signed-template binary package
- extracts the unsigned package(s)
- signs the needed files
- packs up the signatures + the template contained in the signed-template
package into the signed source package
the signed source package can then be built in the regular fashion (in case of
the kernel packages, it will copy the kernel image, modules and some helper
files from the unsigned package, attach the signature created by the signing
service, and re-pack the result as signed-kernel package).
Thomas Lamprecht [Thu, 16 Nov 2023 12:25:01 +0000 (13:25 +0100)]
d/rules: temporarily disable UBSAN bound checks again
it's really not just ZFS and AMDGPU modules, but way more and
generating scary looking messages for these "issues" is just noise
that drown real issues. Disable this for now, maybe in another few
years.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Sun, 12 Nov 2023 15:33:18 +0000 (16:33 +0100)]
d/rules: disable CONFIG_WQ_CPU_INTENSIVE_REPORT for now
it's mostly noise for users, and quiet some interpret this as real
problem and report it to us.
Ideally we'd either educate them, or take time ourself, to report this
upstream and see if the situation can be improved overall, but
currently that's not feasible. We should check this out a few releases
down, if the lower hanging fruits got fixed and noise got lower we
could enable it again to catch the more rare cases.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Sun, 12 Nov 2023 15:19:15 +0000 (16:19 +0100)]
update ZFS to get better work-around for UBSAN bounds-checking
We have a slightly better fix where only a few targeted ZFS module
parts are added to the UBSAN ignore-list, so the rest of the kernel
still gets exposure.
revert "memfd: improve userspace warnings for missing exec-related flags"
This is generating far too much noise in the logs, so keep it at once
per boot until we (and other user space tools) adapted to the kernel
wanting user space to chose memfd execution behavior very explicitly.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>