the signed template together with the binary package(s) containing the unsigned
files form the input to our secure boot signing service.
the signed template consists of
- files.json (specifying which files are signed how and by which key)
- packaging template used to build the signed package(s)
the signing service
- extracts and checks the signed-template binary package
- extracts the unsigned package(s)
- signs the needed files
- packs up the signatures + the template contained in the signed-template
package into the signed source package
the signed source package can then be built in the regular fashion (in case of
the kernel packages, it will copy the kernel image, modules and some helper
files from the unsigned package, attach the signature created by the signing
service, and re-pack the result as signed-kernel package).
Thomas Lamprecht [Thu, 16 Nov 2023 12:25:01 +0000 (13:25 +0100)]
d/rules: temporarily disable UBSAN bound checks again
it's really not just ZFS and AMDGPU modules, but way more and
generating scary looking messages for these "issues" is just noise
that drown real issues. Disable this for now, maybe in another few
years.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Sun, 12 Nov 2023 15:33:18 +0000 (16:33 +0100)]
d/rules: disable CONFIG_WQ_CPU_INTENSIVE_REPORT for now
it's mostly noise for users, and quiet some interpret this as real
problem and report it to us.
Ideally we'd either educate them, or take time ourself, to report this
upstream and see if the situation can be improved overall, but
currently that's not feasible. We should check this out a few releases
down, if the lower hanging fruits got fixed and noise got lower we
could enable it again to catch the more rare cases.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Sun, 12 Nov 2023 15:19:15 +0000 (16:19 +0100)]
update ZFS to get better work-around for UBSAN bounds-checking
We have a slightly better fix where only a few targeted ZFS module
parts are added to the UBSAN ignore-list, so the rest of the kernel
still gets exposure.
revert "memfd: improve userspace warnings for missing exec-related flags"
This is generating far too much noise in the logs, so keep it at once
per boot until we (and other user space tools) adapted to the kernel
wanting user space to chose memfd execution behavior very explicitly.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht [Sat, 21 Oct 2023 13:16:25 +0000 (15:16 +0200)]
backport constraining guest-supported xfeatures only at KVM_GET_XSAVE{2}
This improves compatibility for guests w.r.t. live-migration, or live
snapshot rollback, to hosts with less (FPU) xfeatures supported, as
long as the set of features that was actually exposed to the guest is
still supported.
This improves on the ad856280ddea ("x86/kvm/fpu: Limit guest
user_xfeatures to supported bits of XCR0") bug fix.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Stefan Sterz [Thu, 19 Oct 2023 14:59:12 +0000 (16:59 +0200)]
backport exposing FLUSHBYASID when running nested VMs on AMD CPUs
this exposes the FLUSHBYASID CPU flag to nested VMs when running on an
AMD CPU. also reverts a made up check that would advertise
FLUSHBYASID as not supported. this enable certain modern hypervisors
such as VMWare ESXi 7 and Workstation 17 to run nested VMs properly
again.
Thomas Lamprecht [Tue, 10 Oct 2023 12:56:04 +0000 (14:56 +0200)]
update ZFS for backport of Intel AMX errata fix
From the upstream commit [0] that this update pulls in:
> Intel SPR erratum SPR4 says that if you trip into a vmexit while
> doing FPU save/restore, your AMX register state might misbehave...
> and by misbehave, I mean save all zeroes incorrectly, leading to
> explosions if you restore it.
>
> Since we're not using AMX for anything, the simple way to avoid
> this is to just not save/restore those when we do anything, since
> we're killing preemption of any sort across our save/restores.
>
> If we ever decide to use AMX, it's not clear that we have any
> way to mitigate this, on Linux...but I am not an expert.
The latest amd64-microcode package in sid [0] (which probably will
eventually make it to bookworm-security) has a change that requires
the added patch to work properly.
The changelog-entry refers to stable k.o branches only - but a quick
look through the linux-firmware.git log identifies:
`f2eb058afc57348cde66852272d6bf11da1eef8f` as relevant commit, which
refers (as NOTE in the patch) to: a32b0f0db3f3 ("x86/microcode/AMD: Load late on both threads too")
which applies cleanly (although I cherry-picked the patch from the
6.1.y stable branch to have the original commit in the commit
message).
quickly tested compiling and booting the result in a VM (however w/o
a fitting CPU (Epyc Genoa or Bergamo) it should cause a change)
reported in our Enterprise Support as potential culprit for one
thread from 128 being reported as offline in `lscpu`
Thomas Lamprecht [Mon, 18 Sep 2023 08:32:05 +0000 (10:32 +0200)]
backport thunderbolt-net fixes
A user of ours reported an issue with p2p thunderbolt-net w.r.t. IPv6
and failure to reestablish the connection after a reboot of a peer
node, in the forum [0] and the relayed it upstream, so lets
cherry-pick those two patches to our 6.2. Especially the IPv6 one
seems straight forward, and the other one makes it actually spec
conform and should only improve things.
[0]: https://forum.proxmox.com/threads/133104/
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
The mailing list thread [0] (found by Friedrich, many thanks!) leading
up to this patch sounds very familiar to issues users reported in the
community forum [1] and enterprise support channel, where a VM would
be stuck for no discernable reason with all vCPU threads spinning.
Fiona Ebner [Fri, 25 Aug 2023 09:26:50 +0000 (11:26 +0200)]
cherry-pick fix to surpress faulty segfault logging
While there is no actual issue, users are still nervous about the
faulty logging [0]. It might take a while until the fix comes in via
upstream, so just pick it up manually.
Stoiko Ivanov [Fri, 18 Aug 2023 11:35:08 +0000 (13:35 +0200)]
d/rules: disable CONFIG_GDS_FORCE_MITIGATION
when not having installed an intel-microcode version containing the
mitigation, this options disables AVX instructions, which breaks quite
a lot of software (e.g. firefox, electron apps)
Reported-by: Stefan Hanreich <s.hanreich@proxmox.com> Tested-by: Stefan Hanreich <s.hanreich@proxmox.com> Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Fiona Ebner [Mon, 14 Aug 2023 12:59:17 +0000 (14:59 +0200)]
add patch for igc tx timeout issue
There were several reports about issues related to igc and tx timeout
and while the issue couldn't be reproduced locally, the hope is that
this fix Friedrich found will resolve the issue for the users. The
kernel versions in the reports would match with when 9b275176270e
("igc: Add ndo_tx_timeout support"), i.e. the one fixed by this
commit, landed.
at build time, an ephemeral key pair will be generated and all built modules
will be signed with it. the private key is discarded, and the public key
embedded in the kernel image for signature validation at module load time.
this change means that every kernel release must be considered an ABI change
from now on, else the signatures of on-disk modules and the signing key
embedded in the running kernel image might not match.
long overdue, and avoids the issue of the meta packages version going down
after being folded in from the pve-kernel-meta repository.
the ABI needs to be bumped for every published kernel package now that modules
are signed, else the booted kernel image containing the public part of the
ephemeral signing key, and the on-disk (potentially upgraded in-place) signed
module files can disagree, and module loading would fail.
not changed (yet): git repository name, pve-firmware