Making opt_allow_unsafe read only after init requires changes to the logic in
init_amd(), otherwise the following #PF happens on CPU hotplug:

----[ Xen-4.20.0-1-d x86_64 debug=y Tainted: H ]----
CPU: 1
RIP: e008:[<ffff82d040291081>] arch/x86/cpu/amd.c#init_amd+0x37f/0x993
[...]
Xen call trace:
[<ffff82d040291081>] R arch/x86/cpu/amd.c#init_amd+0x37f/0x993
[<ffff82d040291fbe>] F identify_cpu+0x2d4/0x4db
[<ffff82d04032eeaa>] F start_secondary+0x22e/0x3cf
[<ffff82d040203327>] F __high_start+0x87/0xa0

Pagetable walk from ffff82d0404011ea:
L4[0x105] = 000000006fc2e063 ffffffffffffffff
L3[0x141] = 000000006fc2b063 ffffffffffffffff
L2[0x002] = 000000807c7ca063 ffffffffffffffff
L1[0x001] = 800000006f801121 ffffffffffffffff

****************************************
Panic on CPU 1:
FATAL PAGE FAULT
[error_code=0003]
Faulting linear address: ffff82d0404011ea
****************************************

For the time being revert opt_allow_unsafe to be __read_mostly.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Fixes: bfcb0abb191f ('types: replace remaining uses of s8')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/cpu/amd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 903be14af4b9..7da04230393a 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -46,7 +46,7 @@ static unsigned int __initdata opt_cpuid_mask_thermal_ecx = ~0u;
 integer_param("cpuid_mask_thermal_ecx", opt_cpuid_mask_thermal_ecx);
 
 /* 1 = allow, 0 = don't allow guest creation, -1 = don't allow boot */
-int8_t __ro_after_init opt_allow_unsafe;
+int8_t __read_mostly opt_allow_unsafe;
 boolean_param("allow_unsafe", opt_allow_unsafe);
 
 /* Signal whether the ACPI C1E quirk is required. */
--
2.46.0

On 02.09.2024 17:00, Roger Pau Monne wrote:
> Making opt_allow_unsafe read only after init requires changes to the logic in
> init_amd(), otherwise the following #PF happens on CPU hotplug:
>
> ----[ Xen-4.20.0-1-d x86_64 debug=y Tainted: H ]----
> CPU: 1
> RIP: e008:[<ffff82d040291081>] arch/x86/cpu/amd.c#init_amd+0x37f/0x993
> [...]
> Xen call trace:
> [<ffff82d040291081>] R arch/x86/cpu/amd.c#init_amd+0x37f/0x993
> [<ffff82d040291fbe>] F identify_cpu+0x2d4/0x4db
> [<ffff82d04032eeaa>] F start_secondary+0x22e/0x3cf
> [<ffff82d040203327>] F __high_start+0x87/0xa0
>
> Pagetable walk from ffff82d0404011ea:
> L4[0x105] = 000000006fc2e063 ffffffffffffffff
> L3[0x141] = 000000006fc2b063 ffffffffffffffff
> L2[0x002] = 000000807c7ca063 ffffffffffffffff
> L1[0x001] = 800000006f801121 ffffffffffffffff
>
> ****************************************
> Panic on CPU 1:
> FATAL PAGE FAULT
> [error_code=0003]
> Faulting linear address: ffff82d0404011ea
> ****************************************

Hmm, I specifically looked at that code, but I can see how I screwed up.

> For the time being revert opt_allow_unsafe to be __read_mostly.

There's exactly one write that an AP can hit. Is it really worth moving
back, rather than just doing

    if (opt_allow_unsafe <= 0 && !cpu_has_amd_erratum(c, AMD_ERRATUM_121))
        opt_allow_unsafe = 1;
    else if ...

?

Jan
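
The guarded write suggested above can be illustrated with a small standalone model. This is only a sketch and not the Xen code: `check_allow_unsafe()`, `simulate_bringup()` and the `nr_writes` counter are hypothetical names introduced here, the erratum check is reduced to a boolean parameter, and the elided `else if ...` branches are left out. It shows that once the first CPU has stored 1, later CPUs only read the variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the Xen variable; plain writable memory in this model. */
static int8_t opt_allow_unsafe;

/* Hypothetical counter: number of stores made to opt_allow_unsafe. */
static unsigned int nr_writes;

/*
 * Model of the guarded form: store only when the value is not already
 * positive. has_erratum_121 stands in for
 * cpu_has_amd_erratum(c, AMD_ERRATUM_121).
 */
static void check_allow_unsafe(bool has_erratum_121)
{
    if (opt_allow_unsafe <= 0 && !has_erratum_121) {
        opt_allow_unsafe = 1;
        nr_writes++;
    }
    /* The remaining "else if ..." branches are elided in the thread. */
}

/* Bring up nr_cpus CPUs, none affected by the erratum; return store count. */
static unsigned int simulate_bringup(unsigned int nr_cpus)
{
    unsigned int i;

    opt_allow_unsafe = 0;
    nr_writes = 0;

    for (i = 0; i < nr_cpus; i++)
        check_allow_unsafe(false);

    return nr_writes;
}
```

With three unaffected CPUs, `simulate_bringup(3)` performs a single store, made by the first CPU. The model does not cover a later CPU without the erratum coming up while the value is still <= 0; in that case the guarded form would still store.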

On Mon, Sep 02, 2024 at 05:16:05PM +0200, Jan Beulich wrote:
> On 02.09.2024 17:00, Roger Pau Monne wrote:
> > Making opt_allow_unsafe read only after init requires changes to the logic in
> > init_amd(), otherwise the following #PF happens on CPU hotplug:
> >
> > ----[ Xen-4.20.0-1-d x86_64 debug=y Tainted: H ]----
> > CPU: 1
> > RIP: e008:[<ffff82d040291081>] arch/x86/cpu/amd.c#init_amd+0x37f/0x993
> > [...]
> > Xen call trace:
> > [<ffff82d040291081>] R arch/x86/cpu/amd.c#init_amd+0x37f/0x993
> > [<ffff82d040291fbe>] F identify_cpu+0x2d4/0x4db
> > [<ffff82d04032eeaa>] F start_secondary+0x22e/0x3cf
> > [<ffff82d040203327>] F __high_start+0x87/0xa0
> >
> > Pagetable walk from ffff82d0404011ea:
> > L4[0x105] = 000000006fc2e063 ffffffffffffffff
> > L3[0x141] = 000000006fc2b063 ffffffffffffffff
> > L2[0x002] = 000000807c7ca063 ffffffffffffffff
> > L1[0x001] = 800000006f801121 ffffffffffffffff
> >
> > ****************************************
> > Panic on CPU 1:
> > FATAL PAGE FAULT
> > [error_code=0003]
> > Faulting linear address: ffff82d0404011ea
> > ****************************************
>
> Hmm, I specifically looked at that code, but I can see how I screwed up.
>
> > For the time being revert opt_allow_unsafe to be __read_mostly.
>
> There's exactly one write that an AP can hit. Is it really worth moving
> back, rather than just doing
>
>     if (opt_allow_unsafe <= 0 && !cpu_has_amd_erratum(c, AMD_ERRATUM_121))
>         opt_allow_unsafe = 1;
>     else if ...
>
> ?

I would rather move this logic so it's only the BSP that can set
opt_allow_unsafe, and the APs check they match the configuration set
by the BSP. I think the resulting logic would be cleaner, but I
didn't want to do such a change as part of this fix.

Thanks, Roger.
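
One possible shape for the BSP-only approach described above is sketched below as a standalone model. It is not taken from any Xen patch: `bsp_setup()`, `ap_check()`, `bsp_has_erratum` and the `cpu_check` result type are hypothetical names, and returning a status on mismatch is an assumption, since the thread does not say what an AP should do in that case. The point illustrated is only that the AP path never stores to `opt_allow_unsafe`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of the AP-side consistency check. */
enum cpu_check { CPU_OK, CPU_MISMATCH };

/* Stand-in for the Xen variable; written by the BSP only in this model. */
static int8_t opt_allow_unsafe;

/* Hypothetical record of what the BSP observed at boot. */
static bool bsp_has_erratum;

/* BSP path: the only place where opt_allow_unsafe is written. */
static void bsp_setup(bool has_erratum_121)
{
    bsp_has_erratum = has_erratum_121;

    if (!has_erratum_121)
        opt_allow_unsafe = 1;
}

/*
 * AP path: compare this CPU against the BSP's recorded state and report a
 * mismatch. No store to opt_allow_unsafe happens here, so the variable
 * could live in read-only memory once boot has finished.
 */
static enum cpu_check ap_check(bool has_erratum_121)
{
    return has_erratum_121 == bsp_has_erratum ? CPU_OK : CPU_MISMATCH;
}
```

In this model the BSP decides once, and a hotplugged CPU that differs from the BSP is reported rather than silently changing the global setting.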

On 02.09.2024 17:30, Roger Pau Monné wrote:
> On Mon, Sep 02, 2024 at 05:16:05PM +0200, Jan Beulich wrote:
>> On 02.09.2024 17:00, Roger Pau Monne wrote:
>>> Making opt_allow_unsafe read only after init requires changes to the logic in
>>> init_amd(), otherwise the following #PF happens on CPU hotplug:
>>>
>>> ----[ Xen-4.20.0-1-d x86_64 debug=y Tainted: H ]----
>>> CPU: 1
>>> RIP: e008:[<ffff82d040291081>] arch/x86/cpu/amd.c#init_amd+0x37f/0x993
>>> [...]
>>> Xen call trace:
>>> [<ffff82d040291081>] R arch/x86/cpu/amd.c#init_amd+0x37f/0x993
>>> [<ffff82d040291fbe>] F identify_cpu+0x2d4/0x4db
>>> [<ffff82d04032eeaa>] F start_secondary+0x22e/0x3cf
>>> [<ffff82d040203327>] F __high_start+0x87/0xa0
>>>
>>> Pagetable walk from ffff82d0404011ea:
>>> L4[0x105] = 000000006fc2e063 ffffffffffffffff
>>> L3[0x141] = 000000006fc2b063 ffffffffffffffff
>>> L2[0x002] = 000000807c7ca063 ffffffffffffffff
>>> L1[0x001] = 800000006f801121 ffffffffffffffff
>>>
>>> ****************************************
>>> Panic on CPU 1:
>>> FATAL PAGE FAULT
>>> [error_code=0003]
>>> Faulting linear address: ffff82d0404011ea
>>> ****************************************
>>
>> Hmm, I specifically looked at that code, but I can see how I screwed up.
>>
>>> For the time being revert opt_allow_unsafe to be __read_mostly.
>>
>> There's exactly one write that an AP can hit. Is it really worth moving
>> back, rather than just doing
>>
>>     if (opt_allow_unsafe <= 0 && !cpu_has_amd_erratum(c, AMD_ERRATUM_121))
>>         opt_allow_unsafe = 1;
>>     else if ...
>>
>> ?
>
> I would rather move this logic so it's only the BSP that can set
> opt_allow_unsafe, and the APs check they match the configuration set
> by the BSP. I think the resulting logic would be cleaner, but I
> didn't want to do such a change as part of this fix.

Well, okay then:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

And I guess I'll put it in right away.

Jan