KVM: x86: Mega-CET

[PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by Sean Christopherson 2 weeks, 5 days ago

Synchronize XSS from the GHCB to KVM's internal tracking if the guest
marks XSS as valid on a #VMGEXIT.  Like XCR0, KVM needs an up-to-date copy
of XSS in order to compute the required XSTATE size when emulating
CPUID.0xD.0x1 for the guest.

Treat the incoming XSS change as an emulated write, i.e. validatate the
guest-provided value, to avoid letting the guest load garbage into KVM's
tracking.  Simply ignore bad values, as either the guest managed to get an
unsupported value into hardware, or the guest is misbehaving and providing
pure garbage.  In either case, KVM can't fix the broken guest.

Note, emulating the change as an MSR write also takes care of side effects,
e.g. marking dynamic CPUID bits as dirty.

Suggested-by: John Allen <john.allen@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/sev.c | 3 +++
 arch/x86/kvm/svm/svm.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0cd77a87dd84..0cd32df7b9b6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
 	if (kvm_ghcb_xcr0_is_valid(svm))
 		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
 
+	if (kvm_ghcb_xss_is_valid(svm))
+		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
+
 	/* Copy the GHCB exit information into the VMCB fields */
 	exit_code = kvm_ghcb_get_sw_exit_code(ghcb);
 	control->exit_code = lower_32_bits(exit_code);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a42e95883b45..10d764878bcc 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -942,5 +942,6 @@ DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_1)
 DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_2)
 DEFINE_KVM_GHCB_ACCESSORS(sw_scratch)
 DEFINE_KVM_GHCB_ACCESSORS(xcr0)
+DEFINE_KVM_GHCB_ACCESSORS(xss)
 
 #endif
-- 
2.51.0.384.g4c02a37b29-goog

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by John Allen 2 weeks, 1 day ago

On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> Synchronize XSS from the GHCB to KVM's internal tracking if the guest
> marks XSS as valid on a #VMGEXIT.  Like XCR0, KVM needs an up-to-date copy
> of XSS in order to compute the required XSTATE size when emulating
> CPUID.0xD.0x1 for the guest.
> 
> Treat the incoming XSS change as an emulated write, i.e. validatate the
> guest-provided value, to avoid letting the guest load garbage into KVM's
> tracking.  Simply ignore bad values, as either the guest managed to get an
> unsupported value into hardware, or the guest is misbehaving and providing
> pure garbage.  In either case, KVM can't fix the broken guest.
> 
> Note, emulating the change as an MSR write also takes care of side effects,
> e.g. marking dynamic CPUID bits as dirty.
> 
> Suggested-by: John Allen <john.allen@amd.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/sev.c | 3 +++
>  arch/x86/kvm/svm/svm.h | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 0cd77a87dd84..0cd32df7b9b6 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
>  	if (kvm_ghcb_xcr0_is_valid(svm))
>  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
>  
> +	if (kvm_ghcb_xss_is_valid(svm))
> +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> +

It looks like this is the change that caused the selftest regression
with sev-es. It's not yet clear to me what the problem is though.

Thanks,
John

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by Sean Christopherson 2 weeks, 1 day ago

On Tue, Sep 16, 2025, John Allen wrote:
> On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > Synchronize XSS from the GHCB to KVM's internal tracking if the guest
> > marks XSS as valid on a #VMGEXIT.  Like XCR0, KVM needs an up-to-date copy
> > of XSS in order to compute the required XSTATE size when emulating
> > CPUID.0xD.0x1 for the guest.
> > 
> > Treat the incoming XSS change as an emulated write, i.e. validatate the
> > guest-provided value, to avoid letting the guest load garbage into KVM's
> > tracking.  Simply ignore bad values, as either the guest managed to get an
> > unsupported value into hardware, or the guest is misbehaving and providing
> > pure garbage.  In either case, KVM can't fix the broken guest.
> > 
> > Note, emulating the change as an MSR write also takes care of side effects,
> > e.g. marking dynamic CPUID bits as dirty.
> > 
> > Suggested-by: John Allen <john.allen@amd.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/svm/sev.c | 3 +++
> >  arch/x86/kvm/svm/svm.h | 1 +
> >  2 files changed, 4 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 0cd77a87dd84..0cd32df7b9b6 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> >  	if (kvm_ghcb_xcr0_is_valid(svm))
> >  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> >  
> > +	if (kvm_ghcb_xss_is_valid(svm))
> > +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > +
> 
> It looks like this is the change that caused the selftest regression
> with sev-es. It's not yet clear to me what the problem is though.

Do you see any WARNs in the guest kernel log?

The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
etc.  But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
only used by init_xstate_size(), and I would expect the guest kernel's sanity
checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.

Another possibility is that unconditionally setting cpuid_dynamic_bits_dirty
was masking a pre-existing (or just different) bug, and that "fixing" that flaw
by eliding cpuid_dynamic_bits_dirty when "vcpu->arch.ia32_xss == data" exposed
the bug.

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by John Allen 2 weeks, 1 day ago

On Tue, Sep 16, 2025 at 12:53:58PM -0700, Sean Christopherson wrote:
> On Tue, Sep 16, 2025, John Allen wrote:
> > On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > > Synchronize XSS from the GHCB to KVM's internal tracking if the guest
> > > marks XSS as valid on a #VMGEXIT.  Like XCR0, KVM needs an up-to-date copy
> > > of XSS in order to compute the required XSTATE size when emulating
> > > CPUID.0xD.0x1 for the guest.
> > > 
> > > Treat the incoming XSS change as an emulated write, i.e. validatate the
> > > guest-provided value, to avoid letting the guest load garbage into KVM's
> > > tracking.  Simply ignore bad values, as either the guest managed to get an
> > > unsupported value into hardware, or the guest is misbehaving and providing
> > > pure garbage.  In either case, KVM can't fix the broken guest.
> > > 
> > > Note, emulating the change as an MSR write also takes care of side effects,
> > > e.g. marking dynamic CPUID bits as dirty.
> > > 
> > > Suggested-by: John Allen <john.allen@amd.com>
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/x86/kvm/svm/sev.c | 3 +++
> > >  arch/x86/kvm/svm/svm.h | 1 +
> > >  2 files changed, 4 insertions(+)
> > > 
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index 0cd77a87dd84..0cd32df7b9b6 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > >  	if (kvm_ghcb_xcr0_is_valid(svm))
> > >  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> > >  
> > > +	if (kvm_ghcb_xss_is_valid(svm))
> > > +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > > +
> > 
> > It looks like this is the change that caused the selftest regression
> > with sev-es. It's not yet clear to me what the problem is though.
> 
> Do you see any WARNs in the guest kernel log?
> 
> The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
> to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
> etc.  But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
> only used by init_xstate_size(), and I would expect the guest kernel's sanity
> checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.

Yes, actually that looks to be the case:

[    0.463504] ------------[ cut here ]------------
[    0.464443] XSAVE consistency problem: size 880 != kernel_size 840
[    0.465445] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:638 paranoid_xstate_size_valid+0x101/0x140
[    0.466443] Modules linked in:
[    0.467445] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.17.0-rc3-shstk-v15+ #6 PREEMPT(voluntary)
[    0.468443] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022
[    0.469444] RIP: 0010:paranoid_xstate_size_valid+0x101/0x140
[    0.470443] Code: 89 44 24 04 e8 00 fa ff ff 8b 44 24 04 eb c2 89 da 89 c6 48 c7 c7 80 f4 bc 9e 89 44 24 04 c6 05 9d a3 a4 ff 01 e8 3f fa fb fd <0f> 0b 8b 44 24 04 eb ce 80 3d 8a a3 a4 ff 00 74 09 e8 c9 f9 ff ff
[    0.471443] RSP: 0000:ffffffff9ee03e80 EFLAGS: 00010286
[    0.472443] RAX: 0000000000000000 RBX: 0000000000000348 RCX: c0000000fffeffff
[    0.473443] RDX: 0000000000000000 RSI: 00000000fffeffff RDI: ffffffff9fd83c00
[    0.474443] RBP: 000000000000000c R08: 0000000000000000 R09: 0000000000000003
[    0.475443] R10: ffffffff9ee03d20 R11: ffff8c04fff8ffe8 R12: 0000000000000001
[    0.476443] R13: ffffffffffffffff R14: 0000000000000001 R15: 000000007c135000
[    0.477443] FS:  0000000000000000(0000) GS:ffff8c051c118000(0000) knlGS:0000000000000000
[    0.478443] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.479443] CR2: ffff8c03f4c01000 CR3: 0008000f73822001 CR4: 0000000000f70ef0
[    0.480445] PKRU: 55555554
[    0.480967] Call Trace:
[    0.481446]  <TASK>
[    0.481856]  init_xstate_size+0xa8/0x160
[    0.482444]  fpu__init_system_xstate+0x1c4/0x500
[    0.483444]  fpu__init_system+0x93/0xc0
[    0.484443]  arch_cpu_finalize_init+0xd2/0x160
[    0.485290]  start_kernel+0x330/0x470
[    0.485444]  x86_64_start_reservations+0x14/0x30
[    0.486443]  x86_64_start_kernel+0xd0/0xe0
[    0.487443]  common_startup_64+0x13e/0x141
[    0.488444]  </TASK>
[    0.488879] ---[ end trace 0000000000000000 ]--

> 
> Another possibility is that unconditionally setting cpuid_dynamic_bits_dirty
> was masking a pre-existing (or just different) bug, and that "fixing" that flaw
> by eliding cpuid_dynamic_bits_dirty when "vcpu->arch.ia32_xss == data" exposed
> the bug.

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by Sean Christopherson 2 weeks, 1 day ago

On Tue, Sep 16, 2025, John Allen wrote:
> On Tue, Sep 16, 2025 at 12:53:58PM -0700, Sean Christopherson wrote:
> > On Tue, Sep 16, 2025, John Allen wrote:
> > > On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > index 0cd77a87dd84..0cd32df7b9b6 100644
> > > > --- a/arch/x86/kvm/svm/sev.c
> > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > > >  	if (kvm_ghcb_xcr0_is_valid(svm))
> > > >  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> > > >  
> > > > +	if (kvm_ghcb_xss_is_valid(svm))
> > > > +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > > > +
> > > 
> > > It looks like this is the change that caused the selftest regression
> > > with sev-es. It's not yet clear to me what the problem is though.
> > 
> > Do you see any WARNs in the guest kernel log?
> > 
> > The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
> > to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
> > etc.  But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
> > only used by init_xstate_size(), and I would expect the guest kernel's sanity
> > checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.
> 
> Yes, actually that looks to be the case:
> 
> [    0.463504] ------------[ cut here ]------------
> [    0.464443] XSAVE consistency problem: size 880 != kernel_size 840
> [    0.465445] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:638 paranoid_xstate_size_valid+0x101/0x140

Can you run with the below printk tracing in the host (and optionally tracing in
the guest for its updates)?  Compile tested only.

There should be very few XSS updates, so this _shouldn't_ spam/crash your host :-)

---
 arch/x86/kvm/svm/sev.c |  6 ++++--
 arch/x86/kvm/x86.c     | 15 ++++++++++++---
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0cd32df7b9b6..8ac87d623767 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3306,8 +3306,10 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
 	if (kvm_ghcb_xcr0_is_valid(svm))
 		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
 
-	if (kvm_ghcb_xss_is_valid(svm))
-		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
+	if (kvm_ghcb_xss_is_valid(svm)) {
+		if (__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
+			pr_warn("Dropped XSS update, val = %llx\n", data);
+	}
 
 	/* Copy the GHCB exit information into the VMCB fields */
 	exit_code = kvm_ghcb_get_sw_exit_code(ghcb);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c78acab2ff3f..a846ed69ce2c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4118,13 +4118,22 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		}
 		break;
 	case MSR_IA32_XSS:
-		if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+		if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES)) {
+			pr_warn("Guest CPUID doesn't have XSAVES\n");
 			return KVM_MSR_RET_UNSUPPORTED;
+		}
 
-		if (data & ~vcpu->arch.guest_supported_xss)
+		if (data & ~vcpu->arch.guest_supported_xss) {
+			pr_warn("Invalid XSS: supported = %llx, val = %llx\n",
+				vcpu->arch.guest_supported_xss, data);
 			return 1;
-		if (vcpu->arch.ia32_xss == data)
+		}
+		if (vcpu->arch.ia32_xss == data) {
+			pr_warn("XSS already set to val = %llx, eliding updates\n", data);
 			break;
+		}
+
+		pr_warn("XSS updated to val = %llx, marking CPUID dirty\n", data);
 		vcpu->arch.ia32_xss = data;
 		vcpu->arch.cpuid_dynamic_bits_dirty = true;
 		break;

base-commit: 14298d819d5a6b7180a4089e7d2121ca3551dc6c
--

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by John Allen 2 weeks, 1 day ago

On Tue, Sep 16, 2025 at 02:38:52PM -0700, Sean Christopherson wrote:
> On Tue, Sep 16, 2025, John Allen wrote:
> > On Tue, Sep 16, 2025 at 12:53:58PM -0700, Sean Christopherson wrote:
> > > On Tue, Sep 16, 2025, John Allen wrote:
> > > > On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > > index 0cd77a87dd84..0cd32df7b9b6 100644
> > > > > --- a/arch/x86/kvm/svm/sev.c
> > > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > > > >  	if (kvm_ghcb_xcr0_is_valid(svm))
> > > > >  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> > > > >  
> > > > > +	if (kvm_ghcb_xss_is_valid(svm))
> > > > > +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > > > > +
> > > > 
> > > > It looks like this is the change that caused the selftest regression
> > > > with sev-es. It's not yet clear to me what the problem is though.
> > > 
> > > Do you see any WARNs in the guest kernel log?
> > > 
> > > The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
> > > to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
> > > etc.  But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
> > > only used by init_xstate_size(), and I would expect the guest kernel's sanity
> > > checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.
> > 
> > Yes, actually that looks to be the case:
> > 
> > [    0.463504] ------------[ cut here ]------------
> > [    0.464443] XSAVE consistency problem: size 880 != kernel_size 840
> > [    0.465445] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:638 paranoid_xstate_size_valid+0x101/0x140
> 
> Can you run with the below printk tracing in the host (and optionally tracing in
> the guest for its updates)?  Compile tested only.

Interesting, I see "Guest CPUID doesn't have XSAVES" times the number of
cpus followed by "XSS already set to val = 0, eliding updates" times the
number of cpus. This is with host tracing only. I can try with guest
tracing too in the morning.

Thanks,
John

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by John Allen 1 week, 6 days ago

On Tue, Sep 16, 2025 at 05:55:33PM -0500, John Allen wrote:
> On Tue, Sep 16, 2025 at 02:38:52PM -0700, Sean Christopherson wrote:
> > On Tue, Sep 16, 2025, John Allen wrote:
> > > On Tue, Sep 16, 2025 at 12:53:58PM -0700, Sean Christopherson wrote:
> > > > On Tue, Sep 16, 2025, John Allen wrote:
> > > > > On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > > > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > > > index 0cd77a87dd84..0cd32df7b9b6 100644
> > > > > > --- a/arch/x86/kvm/svm/sev.c
> > > > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > > > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > > > > >  	if (kvm_ghcb_xcr0_is_valid(svm))
> > > > > >  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> > > > > >  
> > > > > > +	if (kvm_ghcb_xss_is_valid(svm))
> > > > > > +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > > > > > +
> > > > > 
> > > > > It looks like this is the change that caused the selftest regression
> > > > > with sev-es. It's not yet clear to me what the problem is though.
> > > > 
> > > > Do you see any WARNs in the guest kernel log?
> > > > 
> > > > The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
> > > > to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
> > > > etc.  But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
> > > > only used by init_xstate_size(), and I would expect the guest kernel's sanity
> > > > checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.
> > > 
> > > Yes, actually that looks to be the case:
> > > 
> > > [    0.463504] ------------[ cut here ]------------
> > > [    0.464443] XSAVE consistency problem: size 880 != kernel_size 840
> > > [    0.465445] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:638 paranoid_xstate_size_valid+0x101/0x140
> > 
> > Can you run with the below printk tracing in the host (and optionally tracing in
> > the guest for its updates)?  Compile tested only.
> 
> Interesting, I see "Guest CPUID doesn't have XSAVES" times the number of
> cpus followed by "XSS already set to val = 0, eliding updates" times the
> number of cpus. This is with host tracing only. I can try with guest
> tracing too in the morning.

Ok, I think I see the problem. The cases above where we were seeing the
added print statements from kvm_set_msr_common were not situations where
we were going through the __kvm_emulate_msr_write via
sev_es_sync_from_ghcb. When we call __kvm_emulate_msr_write from this
context, we never end up getting to kvm_set_msr_common because we hit
the following statement at the top of svm_set_msr:

if (sev_es_prevent_msr_access(vcpu, msr))
	return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;

So I'm not sure if this would force using the original method of
directly setting arch.ia32_xss or if there's some additional handling
here that we need in this scenario to allow the msr access.

Thanks,
John

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by Sean Christopherson 1 week, 6 days ago

On Thu, Sep 18, 2025, John Allen wrote:
> On Tue, Sep 16, 2025 at 05:55:33PM -0500, John Allen wrote:
> > On Tue, Sep 16, 2025 at 02:38:52PM -0700, Sean Christopherson wrote:
> > > On Tue, Sep 16, 2025, John Allen wrote:
> > > > On Tue, Sep 16, 2025 at 12:53:58PM -0700, Sean Christopherson wrote:
> > > > > On Tue, Sep 16, 2025, John Allen wrote:
> > > > > > On Fri, Sep 12, 2025 at 04:23:07PM -0700, Sean Christopherson wrote:
> > > > > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > > > > index 0cd77a87dd84..0cd32df7b9b6 100644
> > > > > > > --- a/arch/x86/kvm/svm/sev.c
> > > > > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > > > > @@ -3306,6 +3306,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> > > > > > >  	if (kvm_ghcb_xcr0_is_valid(svm))
> > > > > > >  		__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(ghcb));
> > > > > > >  
> > > > > > > +	if (kvm_ghcb_xss_is_valid(svm))
> > > > > > > +		__kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(ghcb));
> > > > > > > +
> > > > > > 
> > > > > > It looks like this is the change that caused the selftest regression
> > > > > > with sev-es. It's not yet clear to me what the problem is though.
> > > > > 
> > > > > Do you see any WARNs in the guest kernel log?
> > > > > 
> > > > > The most obvious potential bug is that KVM is missing a CPUID update, e.g. due
> > > > > to dropping an XSS write, consuming stale data, not setting cpuid_dynamic_bits_dirty,
> > > > > etc.  But AFAICT, CPUID.0xD.1.EBX (only thing that consumes the current XSS) is
> > > > > only used by init_xstate_size(), and I would expect the guest kernel's sanity
> > > > > checks in paranoid_xstate_size_valid() to yell if KVM botches CPUID emulation.
> > > > 
> > > > Yes, actually that looks to be the case:
> > > > 
> > > > [    0.463504] ------------[ cut here ]------------
> > > > [    0.464443] XSAVE consistency problem: size 880 != kernel_size 840
> > > > [    0.465445] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:638 paranoid_xstate_size_valid+0x101/0x140
> > > 
> > > Can you run with the below printk tracing in the host (and optionally tracing in
> > > the guest for its updates)?  Compile tested only.
> > 
> > Interesting, I see "Guest CPUID doesn't have XSAVES" times the number of
> > cpus followed by "XSS already set to val = 0, eliding updates" times the
> > number of cpus. This is with host tracing only. I can try with guest
> > tracing too in the morning.
> 
> Ok, I think I see the problem. The cases above where we were seeing the
> added print statements from kvm_set_msr_common were not situations where
> we were going through the __kvm_emulate_msr_write via
> sev_es_sync_from_ghcb. When we call __kvm_emulate_msr_write from this
> context, we never end up getting to kvm_set_msr_common because we hit
> the following statement at the top of svm_set_msr:
> 
> if (sev_es_prevent_msr_access(vcpu, msr))
> 	return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;

Gah, I was looking for something like that but couldn't find it, obviously.

> So I'm not sure if this would force using the original method of
> directly setting arch.ia32_xss or if there's some additional handling
> here that we need in this scenario to allow the msr access.

Does this fix things?  If so, I'll slot in a patch to extract setting XSS to
the helper, and then this patch can use that API.  I like the symmetry between
__kvm_set_xcr() and __kvm_set_xss(), and I especially like not doing a generic
end-around on svm_set_msr() by calling kvm_set_msr_common() directly.

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 945f7da60107..ace9f321d2c9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2213,6 +2213,7 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
 int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
 int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
+int __kvm_set_xss(struct kvm_vcpu *vcpu, u64 xss);
 
 int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 94d9acc94c9a..462aebc54135 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3355,7 +3355,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
                __kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(svm));
 
        if (kvm_ghcb_xss_is_valid(svm))
-               __kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(svm));
+               __kvm_set_xss(vcpu, kvm_ghcb_get_xss(svm));
 
        /* Copy the GHCB exit information into the VMCB fields */
        exit_code = kvm_ghcb_get_sw_exit_code(svm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5bbc187ab428..9b81e92a8de5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1313,6 +1313,22 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_xsetbv);
 
+int __kvm_set_xss(struct kvm_vcpu *vcpu, u64 xss)
+{
+       if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+               return KVM_MSR_RET_UNSUPPORTED;
+
+       if (xss & ~vcpu->arch.guest_supported_xss)
+               return 1;
+       if (vcpu->arch.ia32_xss == xss)
+               return 0;
+
+       vcpu->arch.ia32_xss = xss;
+       vcpu->arch.cpuid_dynamic_bits_dirty = true;
+       return 0;
+}
+EXPORT_SYMBOL_GPL(__kvm_set_xss);
+
 static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
        return __kvm_is_valid_cr4(vcpu, cr4) &&
@@ -4119,16 +4135,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
                }
                break;
        case MSR_IA32_XSS:
-               if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
-                       return KVM_MSR_RET_UNSUPPORTED;
-
-               if (data & ~vcpu->arch.guest_supported_xss)
-                       return 1;
-               if (vcpu->arch.ia32_xss == data)
-                       break;
-               vcpu->arch.ia32_xss = data;
-               vcpu->arch.cpuid_dynamic_bits_dirty = true;
-               break;
+               return __kvm_set_xss(vcpu, data);
        case MSR_SMI_COUNT:
                if (!msr_info->host_initiated)
                        return 1;

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by Sean Christopherson 1 week, 6 days ago

On Thu, Sep 18, 2025, Sean Christopherson wrote:
> On Thu, Sep 18, 2025, John Allen wrote:
> > On Tue, Sep 16, 2025 at 05:55:33PM -0500, John Allen wrote:
> > > Interesting, I see "Guest CPUID doesn't have XSAVES" times the number of
> > > cpus followed by "XSS already set to val = 0, eliding updates" times the
> > > number of cpus. This is with host tracing only. I can try with guest
> > > tracing too in the morning.
> > 
> > Ok, I think I see the problem. The cases above where we were seeing the
> > added print statements from kvm_set_msr_common were not situations where
> > we were going through the __kvm_emulate_msr_write via
> > sev_es_sync_from_ghcb. When we call __kvm_emulate_msr_write from this
> > context, we never end up getting to kvm_set_msr_common because we hit
> > the following statement at the top of svm_set_msr:
> > 
> > if (sev_es_prevent_msr_access(vcpu, msr))
> > 	return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
> 
> Gah, I was looking for something like that but couldn't find it, obviously.
> 
> > So I'm not sure if this would force using the original method of
> > directly setting arch.ia32_xss or if there's some additional handling
> > here that we need in this scenario to allow the msr access.
> 
> Does this fix things?  If so, I'll slot in a patch to extract setting XSS to
> the helper, and then this patch can use that API.  I like the symmetry between
> __kvm_set_xcr() and __kvm_set_xss(), and I especially like not doing a generic
> end-around on svm_set_msr() by calling kvm_set_msr_common() directly.

Scratch that, KVM supports intra-host (and inter-host?) migration of SEV-ES
guests and so needs to allow the host to save/restore XSS, otherwise a guest
that *knows* its XSS hasn't change could get stale/bad CPUID emulation if the
guest doesn't provide XSS in the GHCB on every exit.

So while seemingly hacky, I'm pretty sure the right solution is actually:

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cabe1950b160..d48bf20c865b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2721,8 +2721,8 @@ static int svm_get_feature_msr(u32 msr, u64 *data)
 static bool sev_es_prevent_msr_access(struct kvm_vcpu *vcpu,
                                      struct msr_data *msr_info)
 {
-       return sev_es_guest(vcpu->kvm) &&
-              vcpu->arch.guest_state_protected &&
+       return sev_es_guest(vcpu->kvm) && vcpu->arch.guest_state_protected &&
+              msr_info->index != MSR_IA32_XSS &&
               !msr_write_intercepted(vcpu, msr_info->index);
 }
 
Side topic, checking msr_write_intercepted() is likely wrong.  It's a bad
heuristic for "managed in the VMSA".  MSRs that _KVM_ loads into hardware and
context switches should still be accessible.  I haven't looked to see if this is
a problem in practice.

> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 945f7da60107..ace9f321d2c9 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2213,6 +2213,7 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
>  void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
>  int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
>  int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
> +int __kvm_set_xss(struct kvm_vcpu *vcpu, u64 xss);
>  
>  int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
>  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 94d9acc94c9a..462aebc54135 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3355,7 +3355,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
>                 __kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(svm));
>  
>         if (kvm_ghcb_xss_is_valid(svm))
> -               __kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(svm));
> +               __kvm_set_xss(vcpu, kvm_ghcb_get_xss(svm));
>  
>         /* Copy the GHCB exit information into the VMCB fields */
>         exit_code = kvm_ghcb_get_sw_exit_code(svm);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5bbc187ab428..9b81e92a8de5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1313,6 +1313,22 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_xsetbv);
>  
> +int __kvm_set_xss(struct kvm_vcpu *vcpu, u64 xss)
> +{
> +       if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> +               return KVM_MSR_RET_UNSUPPORTED;
> +
> +       if (xss & ~vcpu->arch.guest_supported_xss)
> +               return 1;
> +       if (vcpu->arch.ia32_xss == xss)
> +               return 0;
> +
> +       vcpu->arch.ia32_xss = xss;
> +       vcpu->arch.cpuid_dynamic_bits_dirty = true;
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(__kvm_set_xss);
> +
>  static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  {
>         return __kvm_is_valid_cr4(vcpu, cr4) &&
> @@ -4119,16 +4135,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                 }
>                 break;
>         case MSR_IA32_XSS:
> -               if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> -                       return KVM_MSR_RET_UNSUPPORTED;
> -
> -               if (data & ~vcpu->arch.guest_supported_xss)
> -                       return 1;
> -               if (vcpu->arch.ia32_xss == data)
> -                       break;
> -               vcpu->arch.ia32_xss = data;
> -               vcpu->arch.cpuid_dynamic_bits_dirty = true;
> -               break;
> +               return __kvm_set_xss(vcpu, data);
>         case MSR_SMI_COUNT:
>                 if (!msr_info->host_initiated)
>                         return 1;

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by John Allen 1 week, 6 days ago

On Thu, Sep 18, 2025 at 01:44:13PM -0700, Sean Christopherson wrote:
> On Thu, Sep 18, 2025, Sean Christopherson wrote:
> > On Thu, Sep 18, 2025, John Allen wrote:
> > > On Tue, Sep 16, 2025 at 05:55:33PM -0500, John Allen wrote:
> > > > Interesting, I see "Guest CPUID doesn't have XSAVES" times the number of
> > > > cpus followed by "XSS already set to val = 0, eliding updates" times the
> > > > number of cpus. This is with host tracing only. I can try with guest
> > > > tracing too in the morning.
> > > 
> > > Ok, I think I see the problem. The cases above where we were seeing the
> > > added print statements from kvm_set_msr_common were not situations where
> > > we were going through the __kvm_emulate_msr_write via
> > > sev_es_sync_from_ghcb. When we call __kvm_emulate_msr_write from this
> > > context, we never end up getting to kvm_set_msr_common because we hit
> > > the following statement at the top of svm_set_msr:
> > > 
> > > if (sev_es_prevent_msr_access(vcpu, msr))
> > > 	return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
> > 
> > Gah, I was looking for something like that but couldn't find it, obviously.
> > 
> > > So I'm not sure if this would force using the original method of
> > > directly setting arch.ia32_xss or if there's some additional handling
> > > here that we need in this scenario to allow the msr access.
> > 
> > Does this fix things?  If so, I'll slot in a patch to extract setting XSS to
> > the helper, and then this patch can use that API.  I like the symmetry between
> > __kvm_set_xcr() and __kvm_set_xss(), and I especially like not doing a generic
> > end-around on svm_set_msr() by calling kvm_set_msr_common() directly.
> 
> Scratch that, KVM supports intra-host (and inter-host?) migration of SEV-ES
> guests and so needs to allow the host to save/restore XSS, otherwise a guest
> that *knows* its XSS hasn't change could get stale/bad CPUID emulation if the
> guest doesn't provide XSS in the GHCB on every exit.
> 
> So while seemingly hacky, I'm pretty sure the right solution is actually:
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index cabe1950b160..d48bf20c865b 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2721,8 +2721,8 @@ static int svm_get_feature_msr(u32 msr, u64 *data)
>  static bool sev_es_prevent_msr_access(struct kvm_vcpu *vcpu,
>                                       struct msr_data *msr_info)
>  {
> -       return sev_es_guest(vcpu->kvm) &&
> -              vcpu->arch.guest_state_protected &&
> +       return sev_es_guest(vcpu->kvm) && vcpu->arch.guest_state_protected &&
> +              msr_info->index != MSR_IA32_XSS &&
>                !msr_write_intercepted(vcpu, msr_info->index);
>  }

Yes, it looks like this fixes the regression. Thanks!

The 32bit selftest still doesn't work properly with sev-es, but that was
a problem with the previous version too. I suspect there's some
incompatibility between sev-es and the test, but I haven't been able to
get a good answer on why that might be.

Thanks,
John

>  
> Side topic, checking msr_write_intercepted() is likely wrong.  It's a bad
> heuristic for "managed in the VMSA".  MSRs that _KVM_ loads into hardware and
> context switches should still be accessible.  I haven't looked to see if this is
> a problem in practice.
> 
> > 
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 945f7da60107..ace9f321d2c9 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2213,6 +2213,7 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
> >  void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
> >  int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
> >  int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
> > +int __kvm_set_xss(struct kvm_vcpu *vcpu, u64 xss);
> >  
> >  int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> >  int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 94d9acc94c9a..462aebc54135 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -3355,7 +3355,7 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> >                 __kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(svm));
> >  
> >         if (kvm_ghcb_xss_is_valid(svm))
> > -               __kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(svm));
> > +               __kvm_set_xss(vcpu, kvm_ghcb_get_xss(svm));
> >  
> >         /* Copy the GHCB exit information into the VMCB fields */
> >         exit_code = kvm_ghcb_get_sw_exit_code(svm);
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 5bbc187ab428..9b81e92a8de5 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1313,6 +1313,22 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
> >  }
> >  EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_emulate_xsetbv);
> >  
> > +int __kvm_set_xss(struct kvm_vcpu *vcpu, u64 xss)
> > +{
> > +       if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> > +               return KVM_MSR_RET_UNSUPPORTED;
> > +
> > +       if (xss & ~vcpu->arch.guest_supported_xss)
> > +               return 1;
> > +       if (vcpu->arch.ia32_xss == xss)
> > +               return 0;
> > +
> > +       vcpu->arch.ia32_xss = xss;
> > +       vcpu->arch.cpuid_dynamic_bits_dirty = true;
> > +       return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(__kvm_set_xss);
> > +
> >  static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> >  {
> >         return __kvm_is_valid_cr4(vcpu, cr4) &&
> > @@ -4119,16 +4135,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> >                 }
> >                 break;
> >         case MSR_IA32_XSS:
> > -               if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
> > -                       return KVM_MSR_RET_UNSUPPORTED;
> > -
> > -               if (data & ~vcpu->arch.guest_supported_xss)
> > -                       return 1;
> > -               if (vcpu->arch.ia32_xss == data)
> > -                       break;
> > -               vcpu->arch.ia32_xss = data;
> > -               vcpu->arch.cpuid_dynamic_bits_dirty = true;
> > -               break;
> > +               return __kvm_set_xss(vcpu, data);
> >         case MSR_SMI_COUNT:
> >                 if (!msr_info->host_initiated)
> >                         return 1;

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by Edgecombe, Rick P 1 week, 6 days ago

On Thu, 2025-09-18 at 16:23 -0500, John Allen wrote:
> The 32bit selftest still doesn't work properly with sev-es, but that was
> a problem with the previous version too. I suspect there's some
> incompatibility between sev-es and the test, but I haven't been able to
> get a good answer on why that might be.

You are talking about test_32bit() in test_shadow_stack.c?

That test relies on a specific CET arch behavior. If you try to transition to a
32 bit compatibility mode segment with an SSP with high bits set (outside the 32
bit address space), a #GP will be triggered by the HW. The test verifies that
this happens and the kernel handles it appropriately. Could it be platform/mode
difference and not KVM issue?

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by John Allen 1 week, 6 days ago

On Thu, Sep 18, 2025 at 09:42:21PM +0000, Edgecombe, Rick P wrote:
> On Thu, 2025-09-18 at 16:23 -0500, John Allen wrote:
> > The 32bit selftest still doesn't work properly with sev-es, but that was
> > a problem with the previous version too. I suspect there's some
> > incompatibility between sev-es and the test, but I haven't been able to
> > get a good answer on why that might be.
> 
> You are talking about test_32bit() in test_shadow_stack.c?

Yes, that's right.

> 
> That test relies on a specific CET arch behavior. If you try to transition to a
> 32 bit compatibility mode segment with an SSP with high bits set (outside the 32
> bit address space), a #GP will be triggered by the HW. The test verifies that
> this happens and the kernel handles it appropriately. Could it be platform/mode
> difference and not KVM issue?

I'm fairly certain that this is an issue with any sev-es guest. The
unexpected seg fault happens when we isolate the sigaction32 call used
in the test regardless of shadow stack support. So I wonder if it's
something similar to the case that the test is checking for. Maybe
something to do with the C bit.

Thanks,
John

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by Tom Lendacky 1 week, 6 days ago

On 9/18/25 17:18, John Allen wrote:
> On Thu, Sep 18, 2025 at 09:42:21PM +0000, Edgecombe, Rick P wrote:
>> On Thu, 2025-09-18 at 16:23 -0500, John Allen wrote:
>>> The 32bit selftest still doesn't work properly with sev-es, but that was
>>> a problem with the previous version too. I suspect there's some
>>> incompatibility between sev-es and the test, but I haven't been able to
>>> get a good answer on why that might be.
>>
>> You are talking about test_32bit() in test_shadow_stack.c?
> 
> Yes, that's right.
> 
>>
>> That test relies on a specific CET arch behavior. If you try to transition to a
>> 32 bit compatibility mode segment with an SSP with high bits set (outside the 32
>> bit address space), a #GP will be triggered by the HW. The test verifies that
>> this happens and the kernel handles it appropriately. Could it be platform/mode
>> difference and not KVM issue?
> 
> I'm fairly certain that this is an issue with any sev-es guest. The
> unexpected seg fault happens when we isolate the sigaction32 call used
> in the test regardless of shadow stack support. So I wonder if it's
> something similar to the case that the test is checking for. Maybe
> something to do with the C bit.

Likely something to do with the encryption bit since, if set, will
generate an invalid address in 32-bit, right?

For SEV-ES, we transition to 64-bit very quickly because of the use of the
encryption bit, which is why, for example, we don't support SEV-ES /
SEV-SNP in the OvmfIa32X64.dsc package.

Thanks,
Tom

> 
> Thanks,
> John

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by Edgecombe, Rick P 1 week, 6 days ago

On Fri, 2025-09-19 at 08:40 -0500, Tom Lendacky wrote:
> Likely something to do with the encryption bit since, if set, will
> generate an invalid address in 32-bit, right?

But the SSP is a virtual address and c-bit is a physical thing. 

> 
> For SEV-ES, we transition to 64-bit very quickly because of the use of the
> encryption bit, which is why, for example, we don't support SEV-ES /
> SEV-SNP in the OvmfIa32X64.dsc package.

This sounds like it's about the lack of ability to set the c-bit in the page
table, rather than having the C-bit set in a virtual address. In compatibility
mode you are not using 32 bit page tables, so the C-bit should be available like
normal I think. Not an expert in 32 bit/compatibility mode though.

More background on this test/behavior: During the tail end of the shadow stack
enabling, there was a concern raised that we didn't un-support 32 bit shadow
stack cleanly enough. We blocked it from being allowed in 32 bit apps, but
nothing stopped an app from enabling it in 64 bit an then switching to 32 bit
mode without the kernel getting a chance to block it. The simplest, get-it-done
type solution was to just not allocate shadow stacks in the space where they
could be usable in 32 bit mode and let the HW catch it.

But the whole point is just to not allow 32 bit mode CET. Sounds like SEV-ES
covers this another way - don't support 32 bit at all. I wonder if we should
just patch the test to skip the 32 bit test on coco VMs?

PS, we don't support CET on TDX currently even though it doesn't require
everything in this series, but I just remembered (forehead slap) that on the way
upstream the extra CET-TDX exclusion got pulled out. After this series, it would
be allowed in TDX guests as well. So we need to do the same testing in TDX. Let
me see how the test goes in TDX and get back to you.

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by Edgecombe, Rick P 1 week, 5 days ago

+Kiryl, a CET selftest that does int80 fails on SEV-ES.

On Fri, 2025-09-19 at 10:29 -0700, Rick Edgecombe wrote:
> PS, we don't support CET on TDX currently even though it doesn't require
> everything in this series, but I just remembered (forehead slap) that on the way
> upstream the extra CET-TDX exclusion got pulled out. After this series, it would
> be allowed in TDX guests as well. So we need to do the same testing in TDX. Let
> me see how the test goes in TDX and get back to you.

The test passes on a TDX guest:

[INFO]	new_ssp = 7f8c8d7ffff8, *new_ssp = 7f8c8d800001
[INFO]	changing ssp from 7f8c8e1ffff0 to 7f8c8d7ffff8
[INFO]	ssp is now 7f8c8d800000
[OK]	Shadow stack pivot
[OK]	Shadow stack faults
[INFO]	Corrupting shadow stack
[INFO]	Generated shadow stack violation successfully
[OK]	Shadow stack violation test
[INFO]	Gup read -> shstk access success
[INFO]	Gup write -> shstk access success
[INFO]	Violation from normal write
[INFO]	Gup read -> write access success
[INFO]	Violation from normal write
[INFO]	Gup write -> write access success
[INFO]	Cow gup write -> write access success
[OK]	Shadow gup test
[INFO]	Violation from shstk access
[OK]	mprotect() test
[OK]	Userfaultfd test
[OK]	Guard gap test, other mapping's gaps
[OK]	Guard gap test, placement mapping's gaps
[OK]	Ptrace test
[OK]	32 bit test
[OK]	Uretprobe test


I guess int 80 was re-enabled for TDX, after being disabled for both coco
families. See commits starting back from f4116bfc4462 ("x86/tdx: Allow 32-bit
emulation by default"). Not sure why it was done that way. If there is some way
to re-enable int80 for SEV-ES too, we can leave the test as is. But if you
decide to disable the 32 bit test to resolve this, please leave it working for
TDX.

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by Kiryl Shutsemau 1 week, 3 days ago

On Fri, Sep 19, 2025 at 08:58:45PM +0000, Edgecombe, Rick P wrote:
> +Kiryl, a CET selftest that does int80 fails on SEV-ES.
> 
> On Fri, 2025-09-19 at 10:29 -0700, Rick Edgecombe wrote:
> > PS, we don't support CET on TDX currently even though it doesn't require
> > everything in this series, but I just remembered (forehead slap) that on the way
> > upstream the extra CET-TDX exclusion got pulled out. After this series, it would
> > be allowed in TDX guests as well. So we need to do the same testing in TDX. Let
> > me see how the test goes in TDX and get back to you.
> 
> The test passes on a TDX guest:
> 
> [INFO]	new_ssp = 7f8c8d7ffff8, *new_ssp = 7f8c8d800001
> [INFO]	changing ssp from 7f8c8e1ffff0 to 7f8c8d7ffff8
> [INFO]	ssp is now 7f8c8d800000
> [OK]	Shadow stack pivot
> [OK]	Shadow stack faults
> [INFO]	Corrupting shadow stack
> [INFO]	Generated shadow stack violation successfully
> [OK]	Shadow stack violation test
> [INFO]	Gup read -> shstk access success
> [INFO]	Gup write -> shstk access success
> [INFO]	Violation from normal write
> [INFO]	Gup read -> write access success
> [INFO]	Violation from normal write
> [INFO]	Gup write -> write access success
> [INFO]	Cow gup write -> write access success
> [OK]	Shadow gup test
> [INFO]	Violation from shstk access
> [OK]	mprotect() test
> [OK]	Userfaultfd test
> [OK]	Guard gap test, other mapping's gaps
> [OK]	Guard gap test, placement mapping's gaps
> [OK]	Ptrace test
> [OK]	32 bit test
> [OK]	Uretprobe test
> 
> 
> I guess int 80 was re-enabled for TDX, after being disabled for both coco
> families. See commits starting back from f4116bfc4462 ("x86/tdx: Allow 32-bit
> emulation by default"). Not sure why it was done that way. If there is some way
> to re-enable int80 for SEV-ES too, we can leave the test as is. But if you
> decide to disable the 32 bit test to resolve this, please leave it working for
> TDX.

In TDX case, VAPIC state is protected VMM. It covers ISR, so guest can
safely check ISR to detect if the exception is external or internal.

IIUC, VAPIC state is controlled by VMM in SEV case and ISR is not
reliable.

I am not sure if Secure AVIC[1] changes the situation for AMD.

Neeraj?

[1] https://lore.kernel.org/all/20250811094444.203161-1-Neeraj.Upadhyay@amd.com/

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by Upadhyay, Neeraj 1 week, 3 days ago

> 
> In TDX case, VAPIC state is protected VMM. It covers ISR, so guest can
> safely check ISR to detect if the exception is external or internal.
> 
> IIUC, VAPIC state is controlled by VMM in SEV case and ISR is not
> reliable.
> 
> I am not sure if Secure AVIC[1] changes the situation for AMD.
> 
> Neeraj?
> 

For Secure AVIC enabled guests, guest's vAPIC ISR state is not visible 
to (and not controlled by) host or VMM.


- Neeraj

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by Kiryl Shutsemau 1 week, 3 days ago

On Mon, Sep 22, 2025 at 03:03:59PM +0530, Upadhyay, Neeraj wrote:
> 
> > 
> > In TDX case, VAPIC state is protected VMM. It covers ISR, so guest can
> > safely check ISR to detect if the exception is external or internal.
> > 
> > IIUC, VAPIC state is controlled by VMM in SEV case and ISR is not
> > reliable.
> > 
> > I am not sure if Secure AVIC[1] changes the situation for AMD.
> > 
> > Neeraj?
> > 
> 
> For Secure AVIC enabled guests, guest's vAPIC ISR state is not visible to
> (and not controlled by) host or VMM.

In this case, I think you should make ia32_disable() in sme_early_init()
conditional on !Secure AVIC.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

Re: [PATCH v15 29/41] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid

Posted by John Allen 1 week, 6 days ago

On Fri, Sep 19, 2025 at 08:40:15AM -0500, Tom Lendacky wrote:
> On 9/18/25 17:18, John Allen wrote:
> > On Thu, Sep 18, 2025 at 09:42:21PM +0000, Edgecombe, Rick P wrote:
> >> On Thu, 2025-09-18 at 16:23 -0500, John Allen wrote:
> >>> The 32bit selftest still doesn't work properly with sev-es, but that was
> >>> a problem with the previous version too. I suspect there's some
> >>> incompatibility between sev-es and the test, but I haven't been able to
> >>> get a good answer on why that might be.
> >>
> >> You are talking about test_32bit() in test_shadow_stack.c?
> > 
> > Yes, that's right.
> > 
> >>
> >> That test relies on a specific CET arch behavior. If you try to transition to a
> >> 32 bit compatibility mode segment with an SSP with high bits set (outside the 32
> >> bit address space), a #GP will be triggered by the HW. The test verifies that
> >> this happens and the kernel handles it appropriately. Could it be platform/mode
> >> difference and not KVM issue?
> > 
> > I'm fairly certain that this is an issue with any sev-es guest. The
> > unexpected seg fault happens when we isolate the sigaction32 call used
> > in the test regardless of shadow stack support. So I wonder if it's
> > something similar to the case that the test is checking for. Maybe
> > something to do with the C bit.
> 
> Likely something to do with the encryption bit since, if set, will
> generate an invalid address in 32-bit, right?
> 
> For SEV-ES, we transition to 64-bit very quickly because of the use of the
> encryption bit, which is why, for example, we don't support SEV-ES /
> SEV-SNP in the OvmfIa32X64.dsc package.

Ok, I knew this sounded familiar. This came up in a discussion a while
back. The reason this doesn't work is "int 0x80" is blocked in
SEV/SEV-ES guests. See:
b82a8dbd3d2f ("x86/coco: Disable 32-bit emulation by default on TDX and SEV")

So I don't think this should be a blocker for this series, but it is
something we'll want to address in the selftest. However, I'm not sure
how we can check if we're running from an SEV or SEV-ES guest from
userspace. Maybe we could attempt the int 0x80 and catch the seg fault
in which case we assume that we're running under SEV or SEV-ES or some
other situation where int 0x80 isn't supported? Seems hacky and like it
could mask other failures.

Thanks,
John