From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Wed, 23 Aug 2023 18:01:04 -0700
Subject: [PATCH] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that
 hangs vCPUs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq in
kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
how KVM tracks the sequence counter snapshot.

Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
when checking to see if a page fault is stale, as the sequence count is
stored as an "unsigned long" everywhere else in KVM. This fixes a bug
where KVM will effectively hang vCPUs due to always thinking page faults
are stale, which results in KVM refusing to "fix" faults.

mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
KVM is handling page faults to detect if userspace mappings relevant to
the guest were invalidated between snapshotting the counter and acquiring
mmu_lock, i.e. to ensure that the userspace mapping KVM is using to
resolve the page fault is fresh. If KVM sees that the counter has
changed, KVM simply resumes the guest without fixing the fault.

What _should_ happen is that the source of the mmu_notifier invalidations
eventually goes away, mmu_invalidate_seq becomes stable, and KVM can once
again fix guest page fault(s).

But for a long-lived VM and/or a VM that the host just doesn't particularly
like, it's possible for a VM to be on the receiving end of 2 billion (with
a B) mmu_notifier invalidations. When that happens, bit 31 will be set in
mmu_invalidate_seq. This causes the value to be turned into a 32-bit
negative value when implicitly cast to an "int" by is_page_fault_stale(),
and then sign-extended into a 64-bit unsigned value when the signed "int"
is implicitly cast back to an "unsigned long" on the call to
mmu_invalidate_retry_hva().

As a result of the casting and sign-extension, given a sequence counter of
e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing

	if (0x8002dc25 != 0xffffffff8002dc25)

and signals that the page fault is stale and needs to be retried even
though the sequence counter is stable, and KVM effectively hangs any vCPU
that takes a page fault (EPT violation or #NPF when TDP is enabled).

Reported-by: Brian Rak <brak@vultr.com>
Reported-by: Amaan Cheval <amaan.cheval@gmail.com>
Reported-by: Eric Wheeler <kvm@lists.ewheeler.net>
Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry-picked from commit 82d811ff566594de3676f35808e8a9e19c5c864c in stable v6.1.51)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
 arch/x86/kvm/mmu/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3220c1285984..c42ba5cde7a4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4261,7 +4261,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
  * root was invalidated by a memslot update or a relevant mmu_notifier fired.
  */
 static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
-				struct kvm_page_fault *fault, int mmu_seq)
+				struct kvm_page_fault *fault,
+				unsigned long mmu_seq)
 {
 	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
 