In the unlikely case pKVM fails to allocate its carveout, the error path
dereferences a NULL pointer when it looks up the SVE state through the
uninitialized nVHE per-cpu base.
[ 1.575420] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 1.576010] pc : teardown_hyp_mode+0xe4/0x180
[ 1.576920] lr : teardown_hyp_mode+0xd0/0x180
[ 1.577308] sp : ffff8000826fb9d0
[ 1.577600] x29: ffff8000826fb9d0 x28: 0000000000000000 x27: ffff80008209b000
[ 1.578383] x26: ffff800081dde000 x25: ffff8000820493c0 x24: ffff80008209eb00
[ 1.579180] x23: 0000000000000040 x22: 0000000000000001 x21: 0000000000000000
[ 1.579881] x20: 0000000000000002 x19: ffff800081d540b8 x18: 0000000000000000
[ 1.580544] x17: ffff800081205230 x16: 0000000000000152 x15: 00000000fffffff8
[ 1.581183] x14: 0000000000000008 x13: fff00000ff7f6880 x12: 000000000000003e
[ 1.581813] x11: 0000000000000002 x10: 00000000000000ff x9 : 0000000000000000
[ 1.582503] x8 : 0000000000000000 x7 : 7f7f7f7f7f7f7f7f x6 : 43485e525851ff30
[ 1.583140] x5 : fff00000ff6e9030 x4 : fff00000ff6e8f80 x3 : 0000000000000000
[ 1.583780] x2 : 0000000000000000 x1 : 0000000000000002 x0 : 0000000000000000
[ 1.584526] Call trace:
[ 1.584945] teardown_hyp_mode+0xe4/0x180 (P)
[ 1.585578] init_hyp_mode+0x920/0x994
[ 1.586005] kvm_arm_init+0xb4/0x25c
[ 1.586387] do_one_initcall+0xe0/0x258
[ 1.586819] do_initcall_level+0xa0/0xd4
[ 1.587224] do_initcalls+0x54/0x94
[ 1.587606] do_basic_setup+0x1c/0x28
[ 1.587998] kernel_init_freeable+0xc8/0x130
[ 1.588409] kernel_init+0x20/0x1a4
[ 1.588768] ret_from_fork+0x10/0x20
[ 1.589568] Code: f875db48 8b1c0109 f100011f 9a8903e8 (f9463100)
[ 1.590332] ---[ end trace 0000000000000000 ]---
It seems that in other cases as well it can try to free NULL pointers,
so add the proper NULL checks in teardown_hyp_mode(). We can skip the
rest of the loop early if any of the pointers is NULL, as the order of
freeing matches the order of initialization.

I initially observed this on 6.12, but I could also reproduce it on
master.
Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
arch/arm64/kvm/arm.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 38a91bb5d4c7..5bb36c3b06b5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2344,15 +2344,22 @@ static void __init teardown_hyp_mode(void)
 	int cpu;
 
 	free_hyp_pgds();
+	/* Order matches the order of initialization init_hyp_mode() */
 	for_each_possible_cpu(cpu) {
+		if (!per_cpu(kvm_arm_hyp_stack_base, cpu))
+			continue;
 		free_pages(per_cpu(kvm_arm_hyp_stack_base, cpu), NVHE_STACK_SHIFT - PAGE_SHIFT);
+
+		if (!kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu])
+			continue;
 		free_pages(kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu], nvhe_percpu_order());
 
 		if (free_sve) {
 			struct cpu_sve_state *sve_state;
 
 			sve_state = per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->sve_state;
-			free_pages((unsigned long) sve_state, pkvm_host_sve_state_order());
+			if (sve_state)
+				free_pages((unsigned long) sve_state, pkvm_host_sve_state_order());
 		}
 	}
 }
--
2.50.0.714.g196bf9f422-goog
On Wednesday 25 Jun 2025 at 11:33:01 (+0000), Mostafa Saleh wrote:
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 38a91bb5d4c7..5bb36c3b06b5 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2344,15 +2344,22 @@ static void __init teardown_hyp_mode(void)
>  	int cpu;
> 
>  	free_hyp_pgds();
> +	/* Order matches the order of initialization init_hyp_mode() */
>  	for_each_possible_cpu(cpu) {
> +		if (!per_cpu(kvm_arm_hyp_stack_base, cpu))
> +			continue;
>  		free_pages(per_cpu(kvm_arm_hyp_stack_base, cpu), NVHE_STACK_SHIFT - PAGE_SHIFT);
> +
> +		if (!kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu])
> +			continue;
>  		free_pages(kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu], nvhe_percpu_order());
> 
>  		if (free_sve) {
>  			struct cpu_sve_state *sve_state;
> 
>  			sve_state = per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->sve_state;
> -			free_pages((unsigned long) sve_state, pkvm_host_sve_state_order());
> +			if (sve_state)
> +				free_pages((unsigned long) sve_state, pkvm_host_sve_state_order());

I'm a bit confused by these extra checks -- free_pages() should be safe
to call on NULL?

IIUC correctly, the actual issue is that per_cpu_ptr_nvhe_sym() will
dereference the per-cpu pages to find the sve state, which is entirely
bogus if the per-cpu pages have not been allocated. Now, looking at
this, it looks like the bigger problem is that we literally free the
per-cpu pages right before the sve state...

Should we at least change the freeing order here?

Thanks,
Quentin
On Wed, Jun 25, 2025 at 12:22:45PM +0000, Quentin Perret wrote:
> On Wednesday 25 Jun 2025 at 11:33:01 (+0000), Mostafa Saleh wrote:
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 38a91bb5d4c7..5bb36c3b06b5 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -2344,15 +2344,22 @@ static void __init teardown_hyp_mode(void)
> >  	int cpu;
> > 
> >  	free_hyp_pgds();
> > +	/* Order matches the order of initialization init_hyp_mode() */
> >  	for_each_possible_cpu(cpu) {
> > +		if (!per_cpu(kvm_arm_hyp_stack_base, cpu))
> > +			continue;
> >  		free_pages(per_cpu(kvm_arm_hyp_stack_base, cpu), NVHE_STACK_SHIFT - PAGE_SHIFT);
> > +
> > +		if (!kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu])
> > +			continue;
> >  		free_pages(kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu], nvhe_percpu_order());
> > 
> >  		if (free_sve) {
> >  			struct cpu_sve_state *sve_state;
> > 
> >  			sve_state = per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->sve_state;
> > -			free_pages((unsigned long) sve_state, pkvm_host_sve_state_order());
> > +			if (sve_state)
> > +				free_pages((unsigned long) sve_state, pkvm_host_sve_state_order());
> 
> I'm a bit confused by these extra checks -- free_pages() should be safe
> to call on NULL?

Yes, I'm surprised that in my testing I didn't see an issue with freeing
NULL. I thought it might be config related, but I should have looked
more closely.

> IIUC correctly, the actual issue is that per_cpu_ptr_nvhe_sym() will
> dereference the per-cpu pages to find the sve state, which is entirely
> bogus if the per-cpu pages have not been allocated. Now, looking at
> this, it looks like the bigger problem is that we literally free the
> per-cpu pages right before the sve state...
> 
> Should we at least change the freeing order here?

Makes sense, I will fix that in v2.

Thanks,
Mostafa