From: Isaku Yamahata <isaku.yamahata@intel.com>
Always block INIT and SIPI events for the TDX guest because the TDX module
doesn't provide API for VMM to inject INIT IPI or SIPI.
TDX defines its own vCPU creation and initialization sequence including
multiple seamcalls. Also, it's only allowed during TD build time.
Given that TDX guest is para-virtualized to boot BSP/APs, normally there
shouldn't be any INIT/SIPI event for TDX guest. If any, three options to
handle them:
1. Always block INIT/SIPI request.
2. (Silently) ignore INIT/SIPI request during delivery.
3. Return error to guest TDs somehow.
Choose option 1 for simplicity. Since INIT and SIPI are always blocked,
INIT handling and the OP vcpu_deliver_sipi_vector() won't be called, no
need to add new interface or helper function for INIT/SIPI delivery.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX interrupts breakout:
- Renamed from "KVM: TDX: Silently ignore INIT/SIPI" to
"KVM: TDX: Always block INIT/SIPI".
- Remove KVM_BUG_ON() in tdx_vcpu_reset(). (Rick)
- Drop tdx_vcpu_reset() and move the comment to vt_vcpu_reset().
- Remove unnecessary interface and helpers to delivery INIT/SIPI
because INIT/SIPI events are always blocked for TDX. (Binbin)
- Update changelog.
---
arch/x86/kvm/lapic.c | 2 +-
arch/x86/kvm/vmx/main.c | 19 ++++++++++++++++++-
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 474e0a7c1069..f93c382344ee 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -3365,7 +3365,7 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
kvm_vcpu_reset(vcpu, true);
- if (kvm_vcpu_is_bsp(apic->vcpu))
+ if (kvm_vcpu_is_bsp(vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
else
vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 8ec96646faec..7f933f821188 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -115,6 +115,11 @@ static void vt_vcpu_free(struct kvm_vcpu *vcpu)
static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
{
+ /*
+ * TDX has its own sequence to do init during TD build time (by
+ * KVM_TDX_INIT_VCPU) and it doesn't support INIT event during TD
+ * runtime.
+ */
if (is_td_vcpu(vcpu))
return;
@@ -211,6 +216,18 @@ static void vt_enable_smi_window(struct kvm_vcpu *vcpu)
}
#endif
+static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
+{
+ /*
+ * INIT and SIPI are always blocked for TDX, i.e., INIT handling and
+ * the OP vcpu_deliver_sipi_vector() won't be called.
+ */
+ if (is_td_vcpu(vcpu))
+ return true;
+
+ return vmx_apic_init_signal_blocked(vcpu);
+}
+
static void vt_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
{
struct pi_desc *pi = vcpu_to_pi_desc(vcpu);
@@ -565,7 +582,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
#endif
.check_emulate_instruction = vmx_check_emulate_instruction,
- .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
+ .apic_init_signal_blocked = vt_apic_init_signal_blocked,
.migrate_timers = vmx_migrate_timers,
.msr_filter_changed = vmx_msr_filter_changed,
--
2.46.0
On Mon, 2024-12-09 at 09:07 +0800, Binbin Wu wrote:
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 474e0a7c1069..f93c382344ee 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -3365,7 +3365,7 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>
> if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
> kvm_vcpu_reset(vcpu, true);
> - if (kvm_vcpu_is_bsp(apic->vcpu))
> + if (kvm_vcpu_is_bsp(vcpu))
> vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> else
> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
This part seems not related. If it needs to be done, should it be done in a
separate patch?
On 12/9/2024 9:07 AM, Binbin Wu wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
>
> Always block INIT and SIPI events for the TDX guest because the TDX module
> doesn't provide API for VMM to inject INIT IPI or SIPI.
>
> TDX defines its own vCPU creation and initialization sequence including
> multiple seamcalls. Also, it's only allowed during TD build time.
>
> Given that TDX guest is para-virtualized to boot BSP/APs, normally there
> shouldn't be any INIT/SIPI event for TDX guest. If any, three options to
> handle them:
> 1. Always block INIT/SIPI request.
> 2. (Silently) ignore INIT/SIPI request during delivery.
> 3. Return error to guest TDs somehow.
>
> Choose option 1 for simplicity. Since INIT and SIPI are always blocked,
> INIT handling and the OP vcpu_deliver_sipi_vector() won't be called, no
> need to add new interface or helper function for INIT/SIPI delivery.
>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> TDX interrupts breakout:
> - Renamed from "KVM: TDX: Silently ignore INIT/SIPI" to
> "KVM: TDX: Always block INIT/SIPI".
> - Remove KVM_BUG_ON() in tdx_vcpu_reset(). (Rick)
> - Drop tdx_vcpu_reset() and move the comment to vt_vcpu_reset().
> - Remove unnecessary interface and helpers to delivery INIT/SIPI
> because INIT/SIPI events are always blocked for TDX. (Binbin)
> - Update changelog.
> ---
> arch/x86/kvm/lapic.c | 2 +-
> arch/x86/kvm/vmx/main.c | 19 ++++++++++++++++++-
> 2 files changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 474e0a7c1069..f93c382344ee 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -3365,7 +3365,7 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>
> if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
> kvm_vcpu_reset(vcpu, true);
> - if (kvm_vcpu_is_bsp(apic->vcpu))
> + if (kvm_vcpu_is_bsp(vcpu))
> vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> else
> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index 8ec96646faec..7f933f821188 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -115,6 +115,11 @@ static void vt_vcpu_free(struct kvm_vcpu *vcpu)
>
> static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> {
> + /*
> + * TDX has its own sequence to do init during TD build time (by
> + * KVM_TDX_INIT_VCPU) and it doesn't support INIT event during TD
> + * runtime.
> + */
The first half is confusing. It seems to mix up init(ialization) with
INIT event.
And this callback is about *reset*, which can be due to INIT event or
not. That's why it has a second parameter of init_event. The comment
needs to clarify why reset is not needed for both cases.
I think we can just say TDX doesn't support vcpu reset no matter due to
INIT event or not.
> if (is_td_vcpu(vcpu))
> return;
>
> @@ -211,6 +216,18 @@ static void vt_enable_smi_window(struct kvm_vcpu *vcpu)
> }
> #endif
>
> +static bool vt_apic_init_signal_blocked(struct kvm_vcpu *vcpu)
> +{
> + /*
> + * INIT and SIPI are always blocked for TDX, i.e., INIT handling and
> + * the OP vcpu_deliver_sipi_vector() won't be called.
> + */
> + if (is_td_vcpu(vcpu))
> + return true;
> +
> + return vmx_apic_init_signal_blocked(vcpu);
> +}
> +
> static void vt_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
> {
> struct pi_desc *pi = vcpu_to_pi_desc(vcpu);
> @@ -565,7 +582,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> #endif
>
> .check_emulate_instruction = vmx_check_emulate_instruction,
> - .apic_init_signal_blocked = vmx_apic_init_signal_blocked,
> + .apic_init_signal_blocked = vt_apic_init_signal_blocked,
> .migrate_timers = vmx_migrate_timers,
>
> .msr_filter_changed = vmx_msr_filter_changed,
On 1/8/2025 3:21 PM, Xiaoyao Li wrote:
> On 12/9/2024 9:07 AM, Binbin Wu wrote:
>> From: Isaku Yamahata <isaku.yamahata@intel.com>
>>
>> Always block INIT and SIPI events for the TDX guest because the TDX module
>> doesn't provide API for VMM to inject INIT IPI or SIPI.
>>
>> TDX defines its own vCPU creation and initialization sequence including
>> multiple seamcalls. Also, it's only allowed during TD build time.
>>
>> Given that TDX guest is para-virtualized to boot BSP/APs, normally there
>> shouldn't be any INIT/SIPI event for TDX guest. If any, three options to
>> handle them:
>> 1. Always block INIT/SIPI request.
>> 2. (Silently) ignore INIT/SIPI request during delivery.
>> 3. Return error to guest TDs somehow.
>>
>> Choose option 1 for simplicity. Since INIT and SIPI are always blocked,
>> INIT handling and the OP vcpu_deliver_sipi_vector() won't be called, no
>> need to add new interface or helper function for INIT/SIPI delivery.
>>
>> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
>> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
>> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
>> ---
>> TDX interrupts breakout:
>> - Renamed from "KVM: TDX: Silently ignore INIT/SIPI" to
>> "KVM: TDX: Always block INIT/SIPI".
>> - Remove KVM_BUG_ON() in tdx_vcpu_reset(). (Rick)
>> - Drop tdx_vcpu_reset() and move the comment to vt_vcpu_reset().
>> - Remove unnecessary interface and helpers to delivery INIT/SIPI
>> because INIT/SIPI events are always blocked for TDX. (Binbin)
>> - Update changelog.
>> ---
>> arch/x86/kvm/lapic.c | 2 +-
>> arch/x86/kvm/vmx/main.c | 19 ++++++++++++++++++-
>> 2 files changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 474e0a7c1069..f93c382344ee 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -3365,7 +3365,7 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>> if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
>> kvm_vcpu_reset(vcpu, true);
>> - if (kvm_vcpu_is_bsp(apic->vcpu))
>> + if (kvm_vcpu_is_bsp(vcpu))
>> vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>> else
>> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
>> index 8ec96646faec..7f933f821188 100644
>> --- a/arch/x86/kvm/vmx/main.c
>> +++ b/arch/x86/kvm/vmx/main.c
>> @@ -115,6 +115,11 @@ static void vt_vcpu_free(struct kvm_vcpu *vcpu)
>> static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>> {
>> + /*
>> + * TDX has its own sequence to do init during TD build time (by
>> + * KVM_TDX_INIT_VCPU) and it doesn't support INIT event during TD
>> + * runtime.
>> + */
>
> The first half is confusing. It seems to mix up init(ialization) with INIT event.
>
> And this callback is about *reset*, which can be due to INIT event or not. That's why it has a second parameter of init_event. The comment needs to clarify why reset is not needed for both cases.
>
> I think we can just say TDX doesn't support vcpu reset no matter due to INIT event or not.
>
>
Will update it.
Thanks!
On Wed, Jan 08, 2025, Binbin Wu wrote:
> On 1/8/2025 3:21 PM, Xiaoyao Li wrote:
> > On 12/9/2024 9:07 AM, Binbin Wu wrote:
...
> > > ---
> > > arch/x86/kvm/lapic.c | 2 +-
> > > arch/x86/kvm/vmx/main.c | 19 ++++++++++++++++++-
> > > 2 files changed, 19 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 474e0a7c1069..f93c382344ee 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -3365,7 +3365,7 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
> > > if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
> > > kvm_vcpu_reset(vcpu, true);
> > > - if (kvm_vcpu_is_bsp(apic->vcpu))
> > > + if (kvm_vcpu_is_bsp(vcpu))
> > > vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> > > else
> > > vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> > > diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> > > index 8ec96646faec..7f933f821188 100644
> > > --- a/arch/x86/kvm/vmx/main.c
> > > +++ b/arch/x86/kvm/vmx/main.c
> > > @@ -115,6 +115,11 @@ static void vt_vcpu_free(struct kvm_vcpu *vcpu)
> > > static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > > {
> > > + /*
> > > + * TDX has its own sequence to do init during TD build time (by
> > > + * KVM_TDX_INIT_VCPU) and it doesn't support INIT event during TD
> > > + * runtime.
> > > + */
> >
> > The first half is confusing. It seems to mix up init(ialization) with INIT
> > event.
> >
> > And this callback is about *reset*, which can be due to INIT event or not.
> > That's why it has a second parameter of init_event. The comment needs to
> > clarify why reset is not needed for both cases.
> >
> > I think we can just say TDX doesn't support vcpu reset no matter due to
> > INIT event or not.
That's not entirely accurate either though. TDX does support KVM's version of
RESET, because KVM's RESET is "power-on", i.e. vCPU creation. Emulation of
runtime RESET is userspace's responsibility.
The real reason why KVM doesn't do anything during KVM's RESET is that what
little setup KVM does/can do needs to be defered until after guest CPUID is
configured.
KVM should also WARN if a TDX vCPU gets INIT, no?
Side topic, the comment about x2APIC in tdx_vcpu_init() is too specific, e.g.
calling out that x2APIC support is enumerated in CPUID.0x1.ECX isn't necessary,
and stating that userspace must use KVM_SET_CPUID2 is flat out wrong. Very
technically, KVM_SET_CPUID is also a valid option, it's just not used in practice
because it doesn't support setting non-zero indices (but in theory it could be
used to enable x2APIC).
E.g. something like this?
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d2e78e6675b9..e36fba94fa14 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -115,13 +115,10 @@ static void vt_vcpu_free(struct kvm_vcpu *vcpu)
static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
{
- /*
- * TDX has its own sequence to do init during TD build time (by
- * KVM_TDX_INIT_VCPU) and it doesn't support INIT event during TD
- * runtime.
- */
- if (is_td_vcpu(vcpu))
+ if (is_td_vcpu(vcpu)) {
+ tdx_vcpu_reset(vcpu, init_event);
return;
+ }
vmx_vcpu_reset(vcpu, init_event);
}
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 9e490fccf073..a587f59167a7 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2806,9 +2806,8 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
return -EINVAL;
/*
- * As TDX requires X2APIC, set local apic mode to X2APIC. User space
- * VMM, e.g. qemu, is required to set CPUID[0x1].ecx.X2APIC=1 by
- * KVM_SET_CPUID2. Otherwise kvm_apic_set_base() will fail.
+ * TDX requires x2APIC, userspace is responsible for configuring guest
+ * CPUID accordingly.
*/
apic_base = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC |
(kvm_vcpu_is_reset_bsp(vcpu) ? MSR_IA32_APICBASE_BSP : 0);
@@ -2827,6 +2826,19 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
return 0;
}
+void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
+{
+ /*
+ * Yell on INIT, as TDX doesn't support INIT, i.e. KVM should drop all
+ * INIT events.
+ *
+ * Defer initializing vCPU for RESET state until KVM_TDX_INIT_VCPU, as
+ * userspace needs to define the vCPU model before KVM can initialize
+ * vCPU state, e.g. to enable x2APIC.
+ */
+ WARN_ON_ONCE(init_event);
+}
+
struct tdx_gmem_post_populate_arg {
struct kvm_vcpu *vcpu;
__u32 flags;
On 1/8/2025 10:40 PM, Sean Christopherson wrote:
> On Wed, Jan 08, 2025, Binbin Wu wrote:
>> On 1/8/2025 3:21 PM, Xiaoyao Li wrote:
>>> On 12/9/2024 9:07 AM, Binbin Wu wrote:
> ...
>
>>>> ---
>>>> arch/x86/kvm/lapic.c | 2 +-
>>>> arch/x86/kvm/vmx/main.c | 19 ++++++++++++++++++-
>>>> 2 files changed, 19 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>> index 474e0a7c1069..f93c382344ee 100644
>>>> --- a/arch/x86/kvm/lapic.c
>>>> +++ b/arch/x86/kvm/lapic.c
>>>> @@ -3365,7 +3365,7 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>>>> if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
>>>> kvm_vcpu_reset(vcpu, true);
>>>> - if (kvm_vcpu_is_bsp(apic->vcpu))
>>>> + if (kvm_vcpu_is_bsp(vcpu))
>>>> vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>>>> else
>>>> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>>>> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
>>>> index 8ec96646faec..7f933f821188 100644
>>>> --- a/arch/x86/kvm/vmx/main.c
>>>> +++ b/arch/x86/kvm/vmx/main.c
>>>> @@ -115,6 +115,11 @@ static void vt_vcpu_free(struct kvm_vcpu *vcpu)
>>>> static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>>> {
>>>> + /*
>>>> + * TDX has its own sequence to do init during TD build time (by
>>>> + * KVM_TDX_INIT_VCPU) and it doesn't support INIT event during TD
>>>> + * runtime.
>>>> + */
>>> The first half is confusing. It seems to mix up init(ialization) with INIT
>>> event.
>>>
>>> And this callback is about *reset*, which can be due to INIT event or not.
>>> That's why it has a second parameter of init_event. The comment needs to
>>> clarify why reset is not needed for both cases.
>>>
>>> I think we can just say TDX doesn't support vcpu reset no matter due to
>>> INIT event or not.
> That's not entirely accurate either though. TDX does support KVM's version of
> RESET, because KVM's RESET is "power-on", i.e. vCPU creation. Emulation of
> runtime RESET is userspace's responsibility.
>
> The real reason why KVM doesn't do anything during KVM's RESET is that what
> little setup KVM does/can do needs to be defered until after guest CPUID is
> configured.
>
> KVM should also WARN if a TDX vCPU gets INIT, no?
There was a KVM_BUG_ON() if a TDX vCPU gets INIT in v19, and later it was
removed during the cleanup about removing WARN_ON_ONCE() and KVM_BUG_ON().
Since INIT/SIPI are always blocked for TDX guests, a delivery of INIT
event is a KVM bug and a WARN_ON_ONCE() is appropriate for this case.
>
> Side topic, the comment about x2APIC in tdx_vcpu_init() is too specific, e.g.
> calling out that x2APIC support is enumerated in CPUID.0x1.ECX isn't necessary,
> and stating that userspace must use KVM_SET_CPUID2 is flat out wrong. Very
> technically, KVM_SET_CPUID is also a valid option, it's just not used in practice
> because it doesn't support setting non-zero indices (but in theory it could be
> used to enable x2APIC).
Indeed, it is too specific.
>
> E.g. something like this?
LGTM.
Thanks for the suggestion!
>
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index d2e78e6675b9..e36fba94fa14 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -115,13 +115,10 @@ static void vt_vcpu_free(struct kvm_vcpu *vcpu)
>
> static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> {
> - /*
> - * TDX has its own sequence to do init during TD build time (by
> - * KVM_TDX_INIT_VCPU) and it doesn't support INIT event during TD
> - * runtime.
> - */
> - if (is_td_vcpu(vcpu))
> + if (is_td_vcpu(vcpu)) {
> + tdx_vcpu_reset(vcpu, init_event);
> return;
> + }
>
> vmx_vcpu_reset(vcpu, init_event);
> }
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 9e490fccf073..a587f59167a7 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -2806,9 +2806,8 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> return -EINVAL;
>
> /*
> - * As TDX requires X2APIC, set local apic mode to X2APIC. User space
> - * VMM, e.g. qemu, is required to set CPUID[0x1].ecx.X2APIC=1 by
> - * KVM_SET_CPUID2. Otherwise kvm_apic_set_base() will fail.
> + * TDX requires x2APIC, userspace is responsible for configuring guest
> + * CPUID accordingly.
> */
> apic_base = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC |
> (kvm_vcpu_is_reset_bsp(vcpu) ? MSR_IA32_APICBASE_BSP : 0);
> @@ -2827,6 +2826,19 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> return 0;
> }
>
> +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> +{
> + /*
> + * Yell on INIT, as TDX doesn't support INIT, i.e. KVM should drop all
> + * INIT events.
> + *
> + * Defer initializing vCPU for RESET state until KVM_TDX_INIT_VCPU, as
> + * userspace needs to define the vCPU model before KVM can initialize
> + * vCPU state, e.g. to enable x2APIC.
> + */
> + WARN_ON_ONCE(init_event);
> +}
> +
> struct tdx_gmem_post_populate_arg {
> struct kvm_vcpu *vcpu;
> __u32 flags;
>
On Thu, 2025-01-09 at 10:26 +0800, Binbin Wu wrote: > > > > I think we can just say TDX doesn't support vcpu reset no matter due to > > > > INIT event or not. > > That's not entirely accurate either though. TDX does support KVM's version of > > RESET, because KVM's RESET is "power-on", i.e. vCPU creation. Emulation of > > runtime RESET is userspace's responsibility. > > > > The real reason why KVM doesn't do anything during KVM's RESET is that what > > little setup KVM does/can do needs to be defered until after guest CPUID is > > configured. > > > > KVM should also WARN if a TDX vCPU gets INIT, no? > > There was a KVM_BUG_ON() if a TDX vCPU gets INIT in v19, and later it was > removed during the cleanup about removing WARN_ON_ONCE() and KVM_BUG_ON(). > > Since INIT/SIPI are always blocked for TDX guests, a delivery of INIT > event is a KVM bug and a WARN_ON_ONCE() is appropriate for this case. Can TDX guest issue INIT via IPI? Perhaps KVM_BUG_ON() is safer?
On 1/9/2025 10:46 AM, Huang, Kai wrote: > On Thu, 2025-01-09 at 10:26 +0800, Binbin Wu wrote: >>>>> I think we can just say TDX doesn't support vcpu reset no matter due to >>>>> INIT event or not. >>> That's not entirely accurate either though. TDX does support KVM's version of >>> RESET, because KVM's RESET is "power-on", i.e. vCPU creation. Emulation of >>> runtime RESET is userspace's responsibility. >>> >>> The real reason why KVM doesn't do anything during KVM's RESET is that what >>> little setup KVM does/can do needs to be defered until after guest CPUID is >>> configured. >>> >>> KVM should also WARN if a TDX vCPU gets INIT, no? >> There was a KVM_BUG_ON() if a TDX vCPU gets INIT in v19, and later it was >> removed during the cleanup about removing WARN_ON_ONCE() and KVM_BUG_ON(). >> >> Since INIT/SIPI are always blocked for TDX guests, a delivery of INIT >> event is a KVM bug and a WARN_ON_ONCE() is appropriate for this case. > Can TDX guest issue INIT via IPI? Perhaps KVM_BUG_ON() is safer? TDX guests are not expected to issue INIT, but it could in theory. It seems no serous impact if guest does it, not sure it needs to kill the VM or not. Also, in this patch, for TDX kvm_apic_init_sipi_allowed() is always returning false, so vt_vcpu_reset() will not be called with init=true. Adding a WARN_ON_ONCE() is the guard for the KVM's logic itself, not the guard for guest behavior.
On Thu, 2025-01-09 at 11:20 +0800, Binbin Wu wrote: > > > On 1/9/2025 10:46 AM, Huang, Kai wrote: > > On Thu, 2025-01-09 at 10:26 +0800, Binbin Wu wrote: > > > > > > I think we can just say TDX doesn't support vcpu reset no matter due to > > > > > > INIT event or not. > > > > That's not entirely accurate either though. TDX does support KVM's version of > > > > RESET, because KVM's RESET is "power-on", i.e. vCPU creation. Emulation of > > > > runtime RESET is userspace's responsibility. > > > > > > > > The real reason why KVM doesn't do anything during KVM's RESET is that what > > > > little setup KVM does/can do needs to be defered until after guest CPUID is > > > > configured. > > > > > > > > KVM should also WARN if a TDX vCPU gets INIT, no? > > > There was a KVM_BUG_ON() if a TDX vCPU gets INIT in v19, and later it was > > > removed during the cleanup about removing WARN_ON_ONCE() and KVM_BUG_ON(). > > > > > > Since INIT/SIPI are always blocked for TDX guests, a delivery of INIT > > > event is a KVM bug and a WARN_ON_ONCE() is appropriate for this case. > > Can TDX guest issue INIT via IPI? Perhaps KVM_BUG_ON() is safer? > TDX guests are not expected to issue INIT, but it could in theory. > It seems no serous impact if guest does it, not sure it needs to kill the > VM or not. > > Also, in this patch, for TDX kvm_apic_init_sipi_allowed() is always > returning false, so vt_vcpu_reset() will not be called with init=true. > Adding a WARN_ON_ONCE() is the guard for the KVM's logic itself, > not the guard for guest behavior. > Yeah agreed. I replied too early before looking at the patch.
On 1/8/2025 10:40 PM, Sean Christopherson wrote:
> On Wed, Jan 08, 2025, Binbin Wu wrote:
>> On 1/8/2025 3:21 PM, Xiaoyao Li wrote:
>>> On 12/9/2024 9:07 AM, Binbin Wu wrote:
>
> ...
>
>>>> ---
>>>> arch/x86/kvm/lapic.c | 2 +-
>>>> arch/x86/kvm/vmx/main.c | 19 ++++++++++++++++++-
>>>> 2 files changed, 19 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>>> index 474e0a7c1069..f93c382344ee 100644
>>>> --- a/arch/x86/kvm/lapic.c
>>>> +++ b/arch/x86/kvm/lapic.c
>>>> @@ -3365,7 +3365,7 @@ int kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>>>> if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
>>>> kvm_vcpu_reset(vcpu, true);
>>>> - if (kvm_vcpu_is_bsp(apic->vcpu))
>>>> + if (kvm_vcpu_is_bsp(vcpu))
>>>> vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>>>> else
>>>> vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>>>> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
>>>> index 8ec96646faec..7f933f821188 100644
>>>> --- a/arch/x86/kvm/vmx/main.c
>>>> +++ b/arch/x86/kvm/vmx/main.c
>>>> @@ -115,6 +115,11 @@ static void vt_vcpu_free(struct kvm_vcpu *vcpu)
>>>> static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>>> {
>>>> + /*
>>>> + * TDX has its own sequence to do init during TD build time (by
>>>> + * KVM_TDX_INIT_VCPU) and it doesn't support INIT event during TD
>>>> + * runtime.
>>>> + */
>>>
>>> The first half is confusing. It seems to mix up init(ialization) with INIT
>>> event.
>>>
>>> And this callback is about *reset*, which can be due to INIT event or not.
>>> That's why it has a second parameter of init_event. The comment needs to
>>> clarify why reset is not needed for both cases.
>>>
>>> I think we can just say TDX doesn't support vcpu reset no matter due to
>>> INIT event or not.
>
> That's not entirely accurate either though. TDX does support KVM's version of
> RESET, because KVM's RESET is "power-on", i.e. vCPU creation. Emulation of
> runtime RESET is userspace's responsibility.
>
> The real reason why KVM doesn't do anything during KVM's RESET is that what
> little setup KVM does/can do needs to be defered until after guest CPUID is
> configured.
It's much clear now.
> KVM should also WARN if a TDX vCPU gets INIT, no?
Agreed.
> Side topic, the comment about x2APIC in tdx_vcpu_init() is too specific, e.g.
> calling out that x2APIC support is enumerated in CPUID.0x1.ECX isn't necessary,
> and stating that userspace must use KVM_SET_CPUID2 is flat out wrong. Very
> technically, KVM_SET_CPUID is also a valid option, it's just not used in practice
> because it doesn't support setting non-zero indices (but in theory it could be
> used to enable x2APIC).
>
> E.g. something like this?
LGTM.
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index d2e78e6675b9..e36fba94fa14 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -115,13 +115,10 @@ static void vt_vcpu_free(struct kvm_vcpu *vcpu)
>
> static void vt_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> {
> - /*
> - * TDX has its own sequence to do init during TD build time (by
> - * KVM_TDX_INIT_VCPU) and it doesn't support INIT event during TD
> - * runtime.
> - */
> - if (is_td_vcpu(vcpu))
> + if (is_td_vcpu(vcpu)) {
> + tdx_vcpu_reset(vcpu, init_event);
> return;
> + }
>
> vmx_vcpu_reset(vcpu, init_event);
> }
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 9e490fccf073..a587f59167a7 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -2806,9 +2806,8 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> return -EINVAL;
>
> /*
> - * As TDX requires X2APIC, set local apic mode to X2APIC. User space
> - * VMM, e.g. qemu, is required to set CPUID[0x1].ecx.X2APIC=1 by
> - * KVM_SET_CPUID2. Otherwise kvm_apic_set_base() will fail.
> + * TDX requires x2APIC, userspace is responsible for configuring guest
> + * CPUID accordingly.
> */
> apic_base = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC |
> (kvm_vcpu_is_reset_bsp(vcpu) ? MSR_IA32_APICBASE_BSP : 0);
> @@ -2827,6 +2826,19 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> return 0;
> }
>
> +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> +{
> + /*
> + * Yell on INIT, as TDX doesn't support INIT, i.e. KVM should drop all
> + * INIT events.
> + *
> + * Defer initializing vCPU for RESET state until KVM_TDX_INIT_VCPU, as
> + * userspace needs to define the vCPU model before KVM can initialize
> + * vCPU state, e.g. to enable x2APIC.
> + */
> + WARN_ON_ONCE(init_event);
> +}
> +
> struct tdx_gmem_post_populate_arg {
> struct kvm_vcpu *vcpu;
> __u32 flags;
>
© 2016 - 2025 Red Hat, Inc.