[v1] x86/sev: Update ghcb_version only once

[PATCH] x86/sev: Update ghcb_version only once

Posted by Ashwin Dayanand Kamat 2 years, 3 months ago

A kernel crash caused by a page fault was observed while running the
CPU hotplug LTP test cases on SEV-ES enabled systems. The crash
occurred during hotplug, after a CPU had been offlined and the process
migrated to a different CPU. When the CPU is brought back online,
setup_ghcb() is called again and tries to update ghcb_version in
sev_es_negotiate_protocol(). However, ghcb_version is meant to be
read-only after boot, where it is initialised once, so this second
write results in a page fault.
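The failure mode can be reproduced in user space as a rough analogy. This sketch assumes Linux/POSIX mmap(), mprotect() and sigaction(). A hypothetical page stands in for the read-only-after-boot variable. It is writable during "boot", then sealed read-only, and a second write faults. The helper names (ro_after_init_page, seal_page, try_write_u16) are illustrative only and are not kernel APIs.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and sigsetjmp() on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* Jump back out of a faulting store instead of killing the process. */
static void fault_handler(int sig)
{
	(void)sig;
	siglongjmp(fault_env, 1);
}

/* Hypothetical stand-in for a read-only-after-boot variable: one
 * anonymous page that starts out writable. */
static volatile uint16_t *ro_after_init_page(void)
{
	void *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE),
		       PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : (volatile uint16_t *)p;
}

/* End of "boot": make the page read-only (loosely analogous to the
 * kernel's mark_rodata_ro()). Returns 0 on success. */
static int seal_page(volatile uint16_t *p)
{
	return mprotect((void *)(uintptr_t)p, (size_t)sysconf(_SC_PAGESIZE),
			PROT_READ);
}

/* Try to store @value through @p. Returns 1 if the store faulted,
 * 0 if it succeeded. */
static int try_write_u16(volatile uint16_t *p, uint16_t value)
{
	struct sigaction sa, old_segv, old_bus;
	volatile int faulted = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = fault_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old_segv);
	sigaction(SIGBUS, &sa, &old_bus);

	if (sigsetjmp(fault_env, 1) == 0)
		*p = value; /* faults once the page has been sealed */
	else
		faulted = 1;

	sigaction(SIGSEGV, &old_segv, NULL);
	sigaction(SIGBUS, &old_bus, NULL);
	return faulted;
}
```

Calling try_write_u16() before seal_page() succeeds. Calling it again afterwards reports a fault. This mirrors the boot-time negotiation succeeding and the re-run on CPU online oopsing.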

From the logs:
[  256.447466] BUG: unable to handle page fault for address: ffffffffba556e70
[  256.447476] #PF: supervisor write access in kernel mode
[  256.447478] #PF: error_code(0x0003) - permissions violation
[  256.447479] PGD 8000667c0f067 P4D 8000667c0f067 PUD 8000667c10063 PMD 80080006674001e1
[  256.447483] Oops: 0003 [#1] PREEMPT SMP NOPTI
[  256.447487] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.1.45-8.ph5 #1-photon
.
.
.
.
.
[  256.447511] CR2: ffffffffba556e70 CR3: 0008000667c0a004 CR4: 0000000000770ee0
[  256.447514] PKRU: 55555554
[  256.447515] Call Trace:
[  256.447516]  <TASK>
[  256.447519]  ? __die_body.cold+0x1a/0x1f
[  256.447526]  ? __die+0x2a/0x35
[  256.447528]  ? page_fault_oops+0x10c/0x270
[  256.447531]  ? setup_ghcb+0x71/0x100
[  256.447533]  ? __x86_return_thunk+0x5/0x6
[  256.447537]  ? search_exception_tables+0x60/0x70
[  256.447541]  ? __x86_return_thunk+0x5/0x6
[  256.447543]  ? fixup_exception+0x27/0x320
[  256.447546]  ? kernelmode_fixup_or_oops+0xa2/0x120
[  256.447549]  ? __bad_area_nosemaphore+0x16a/0x1b0
[  256.447551]  ? kernel_exc_vmm_communication+0x60/0xb0
[  256.447556]  ? bad_area_nosemaphore+0x16/0x20
[  256.447558]  ? do_kern_addr_fault+0x7a/0x90
[  256.447560]  ? exc_page_fault+0xbd/0x160
[  256.447563]  ? asm_exc_page_fault+0x27/0x30
[  256.447570]  ? setup_ghcb+0x71/0x100
[  256.447572]  ? setup_ghcb+0xe/0x100
[  256.447574]  cpu_init_exception_handling+0x1b9/0x1f0
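The two "#PF:" lines follow directly from the x86 page-fault error code. Here is a minimal sketch of decoding it. It assumes the architectural bit layout, which uses the same bits the kernel names X86_PF_PROT, X86_PF_WRITE, X86_PF_USER and so on. The PF_* macros and decode_pf_error() are illustrative names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Low bits of the x86 #PF error code (architectural layout). */
#define PF_PROT  (1ul << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE (1ul << 1) /* 0: read access,      1: write access */
#define PF_USER  (1ul << 2) /* 0: supervisor mode,  1: user mode */
#define PF_RSVD  (1ul << 3) /* reserved bit set in a paging-structure entry */
#define PF_INSTR (1ul << 4) /* fault was an instruction fetch */

struct pf_info {
	bool protection_violation;
	bool write;
	bool user;
	bool reserved_bit;
	bool instruction_fetch;
};

/* Split a raw error code into its individual flags. */
static struct pf_info decode_pf_error(unsigned long code)
{
	struct pf_info info = {
		.protection_violation = (code & PF_PROT) != 0,
		.write                = (code & PF_WRITE) != 0,
		.user                 = (code & PF_USER) != 0,
		.reserved_bit         = (code & PF_RSVD) != 0,
		.instruction_fetch    = (code & PF_INSTR) != 0,
	};

	return info;
}
```

For the logged value 0x0003, only PROT and WRITE are set. That is a supervisor-mode write to a page that is present but not writable, which is what a store to a variable made read-only after boot looks like.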

Fix this by returning early from sev_es_negotiate_protocol() when
ghcb_version has already been set during boot, so the variable is
never written again.
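Below is a simplified user-space model of the guard added by this patch. The names (ghcb_version_sim, fake_sev_info(), negotiate_protocol_sim()) and the protocol range values are hypothetical. The version-range check only loosely follows sev_es_negotiate_protocol(). The real function talks to the hypervisor through the GHCB MSR and VMGEXIT rather than a helper call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guest-supported protocol range. */
#define GUEST_PROTO_MIN 1
#define GUEST_PROTO_MAX 2

static uint16_t ghcb_version_sim;   /* 0 means "not negotiated yet" */
static unsigned int version_writes; /* stores to ghcb_version_sim */
static unsigned int hv_exchanges;   /* simulated GHCB MSR round trips */

/* Stand-in for the SEV_INFO request/response: the "hypervisor"
 * advertises the protocol range it supports. */
static void fake_sev_info(uint16_t *hv_min, uint16_t *hv_max)
{
	hv_exchanges++;
	*hv_min = 1;
	*hv_max = 2;
}

static bool negotiate_protocol_sim(void)
{
	uint16_t hv_min, hv_max;

	/* The v1 fix: once a version has been negotiated, return early so
	 * the variable is never written again (e.g. on CPU online). */
	if (ghcb_version_sim)
		return true;

	fake_sev_info(&hv_min, &hv_max);

	/* No overlap between the two ranges: negotiation fails. */
	if (hv_max < GUEST_PROTO_MIN || hv_min > GUEST_PROTO_MAX)
		return false;

	/* Use the highest version both sides support. */
	ghcb_version_sim = hv_max < GUEST_PROTO_MAX ? hv_max : GUEST_PROTO_MAX;
	version_writes++;
	return true;
}
```

In this model, the early return skips more than the store. It also skips the hypervisor exchange on every later call.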

Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Signed-off-by: Ashwin Dayanand Kamat <kashwindayan@vmware.com>
Co-developed-by: Bo Gan <ganb@vmware.com>
---
 arch/x86/kernel/sev-shared.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index ccb0915e84e1..a447908f2b4d 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -144,6 +144,9 @@ static bool sev_es_negotiate_protocol(void)
 {
 	u64 val;
 
+	if (ghcb_version)
+		return true;
+
 	/* Do the GHCB protocol version negotiation */
 	sev_es_wr_ghcb_msr(GHCB_MSR_SEV_INFO_REQ);
 	VMGEXIT();
-- 
2.39.0

Re: [PATCH] x86/sev: Update ghcb_version only once

Posted by Tom Lendacky 2 years, 2 months ago

On 11/6/23 00:32, Ashwin Dayanand Kamat wrote:
> A kernel crash caused by a page fault was observed while running the
> CPU hotplug LTP test cases on SEV-ES enabled systems. The crash
> occurred during hotplug, after a CPU had been offlined and the process
> migrated to a different CPU. When the CPU is brought back online,
> setup_ghcb() is called again and tries to update ghcb_version in
> sev_es_negotiate_protocol(). However, ghcb_version is meant to be
> read-only after boot, where it is initialised once, so this second
> write results in a page fault.
> 
> Fix this by returning early from sev_es_negotiate_protocol() when
> ghcb_version has already been set during boot, so the variable is
> never written again.

The call to sev_es_negotiate_protocol() could be moved down to after the
initial_vc_handler if-check in setup_ghcb(). That would then put the call
to sev_es_negotiate_protocol() only in the BSP boot phase (and it only
needs to be done once). Does doing that prevent the #PF for you?
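Here is a rough model of the reordering suggested above. All names are hypothetical (initial_vc_handler_sim, negotiate_sim(), setup_ghcb_sim()). The model assumes, as described above, that the runtime #VC handler is already installed whenever setup_ghcb() runs after the BSP boot phase. Under that assumption, the early return keeps those later calls away from the negotiation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Which #VC handler is currently installed (hypothetical model). */
enum vc_handler_sim { VC_BOOT_HANDLER, VC_RUNTIME_HANDLER };

static enum vc_handler_sim initial_vc_handler_sim = VC_BOOT_HANDLER;
static unsigned int negotiations;   /* calls that would write ghcb_version */
static unsigned int per_cpu_setups; /* per-CPU work on later calls */

static bool negotiate_sim(void)
{
	negotiations++;
	return true;
}

static void setup_ghcb_sim(void)
{
	/*
	 * Runtime handler already active: a later call, for example a CPU
	 * being brought online (including after hotplug). Do only the
	 * per-CPU work and return before any negotiation happens.
	 */
	if (initial_vc_handler_sim == VC_RUNTIME_HANDLER) {
		per_cpu_setups++;
		return;
	}

	/* BSP early boot only: negotiate the protocol exactly once. */
	if (!negotiate_sim())
		abort(); /* stands in for terminating the guest */
}
```

Compared with the guard in the original patch, this approach leaves sev_es_negotiate_protocol() unchanged. It restricts where the function is called from instead.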

> 
> Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
> Signed-off-by: Ashwin Dayanand Kamat <kashwindayan@vmware.com>
> Co-developed-by: Bo Gan <ganb@vmware.com>

This tag needs to be moved above your Signed-off-by: and it needs a 
Signed-off-by: for the co-developer.
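Documentation/process/submitting-patches.rst sets two rules here. Each Co-developed-by: must be immediately followed by that co-author's Signed-off-by:. The last Signed-off-by: must be that of the submitter. Applied to this patch's trailers, the block would look roughly like this:

```
Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Co-developed-by: Bo Gan <ganb@vmware.com>
Signed-off-by: Bo Gan <ganb@vmware.com>
Signed-off-by: Ashwin Dayanand Kamat <kashwindayan@vmware.com>
```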

Thanks,
Tom

Re: [PATCH] x86/sev: Update ghcb_version only once

Posted by Ashwin Kamat 2 years, 2 months ago


> On 28-Nov-2023, at 3:40 AM, Tom Lendacky <thomas.lendacky@amd.com> wrote:
> 
> On 11/6/23 00:32, Ashwin Dayanand Kamat wrote:
>> A kernel crash caused by a page fault was observed while running the
>> CPU hotplug LTP test cases on SEV-ES enabled systems. The crash
>> occurred during hotplug, after a CPU had been offlined and the process
>> migrated to a different CPU. When the CPU is brought back online,
>> setup_ghcb() is called again and tries to update ghcb_version in
>> sev_es_negotiate_protocol(). However, ghcb_version is meant to be
>> read-only after boot, where it is initialised once, so this second
>> write results in a page fault.
>> 
>> Fix this by returning early from sev_es_negotiate_protocol() when
>> ghcb_version has already been set during boot, so the variable is
>> never written again.
> 
> The call to sev_es_negotiate_protocol() could be moved down to after the
> initial_vc_handler if-check in setup_ghcb(). That would then put the call
> to sev_es_negotiate_protocol() only in the BSP boot phase (and it only
> needs to be done once). Does doing that prevent the #PF for you?
> 
I verified the change, and it prevents the #PF. I have updated the patch accordingly in v2.

>> 
>> Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
>> Signed-off-by: Ashwin Dayanand Kamat <kashwindayan@vmware.com>
>> Co-developed-by: Bo Gan <ganb@vmware.com>
> 
> This tag needs to be moved above your Signed-off-by: and it needs a
> Signed-off-by: for the co-developer.
> 
Fixed in v2.

> Thanks,
> Tom