[v2] KVM: x86: CPUID overhaul, fixes, and caching

[PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software

Posted by Sean Christopherson 1 year, 9 months ago

Now that kvm_cpu_cap_init() is a macro with its own scope, add EMUL_F() to
OR-in features that KVM emulates in software, i.e. that don't depend on
the feature being available in hardware.  The contained scope
of kvm_cpu_cap_init() allows using a local variable to track the set of
emulated leaves, which in addition to avoiding confusing and/or
unnecessary variables, helps prevent misuse of EMUL_F().

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1064e4d68718..33e3e77de1b7 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -94,6 +94,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
 	F(name);						\
 })
 
+/*
+ * Emulated Feature - For features that KVM emulates in software irrespective
+ * of host CPU/kernel support.
+ */
+#define EMUL_F(name)						\
+({								\
+	kvm_cpu_cap_emulated |= F(name);			\
+	F(name);						\
+})
+
 /*
  * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
  * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
@@ -649,6 +659,7 @@ do {									\
 do {									\
 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
 	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
+	u32 kvm_cpu_cap_emulated = 0;					\
 									\
 	if (leaf < NCAPINTS)						\
 		kvm_cpu_caps[leaf] &= (mask);				\
@@ -656,6 +667,7 @@ do {									\
 		kvm_cpu_caps[leaf] = (mask);				\
 									\
 	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);			\
+	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
 } while (0)
 
 /*
@@ -684,12 +696,10 @@ void kvm_set_cpu_caps(void)
 		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
 		F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
 		F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
-		F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
+		F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
 		0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
 		F(F16C) | F(RDRAND)
 	);
-	/* KVM emulates x2apic in software irrespective of host support. */
-	kvm_cpu_cap_set(X86_FEATURE_X2APIC);
 
 	kvm_cpu_cap_init(CPUID_1_EDX,
 		F(FPU) | F(VME) | F(DE) | F(PSE) |
@@ -703,13 +713,13 @@ void kvm_set_cpu_caps(void)
 	);
 
 	kvm_cpu_cap_init(CPUID_7_0_EBX,
-		F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) |
-		F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) |
-		F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ | F(AVX512F) |
-		F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) | F(AVX512IFMA) |
-		F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ | F(AVX512PF) |
-		F(AVX512ER) | F(AVX512CD) | F(SHA_NI) | F(AVX512BW) |
-		F(AVX512VL));
+		F(FSGSBASE) | EMUL_F(TSC_ADJUST) | F(SGX) | F(BMI1) | F(HLE) |
+		F(AVX2) | F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) |
+		F(INVPCID) | F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ |
+		F(AVX512F) | F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) |
+		F(AVX512IFMA) | F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ |
+		F(AVX512PF) | F(AVX512ER) | F(AVX512CD) | F(SHA_NI) |
+		F(AVX512BW) | F(AVX512VL));
 
 	kvm_cpu_cap_init(CPUID_7_ECX,
 		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
@@ -728,16 +738,12 @@ void kvm_set_cpu_caps(void)
 
 	kvm_cpu_cap_init(CPUID_7_EDX,
 		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
-		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
+		F(SPEC_CTRL_SSBD) | EMUL_F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
 		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
 		F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
 		F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D)
 	);
 
-	/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
-	kvm_cpu_cap_set(X86_FEATURE_TSC_ADJUST);
-	kvm_cpu_cap_set(X86_FEATURE_ARCH_CAPABILITIES);
-
 	if (boot_cpu_has(X86_FEATURE_IBPB) && boot_cpu_has(X86_FEATURE_IBRS))
 		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL);
 	if (boot_cpu_has(X86_FEATURE_STIBP))
-- 
2.45.0.215.g3402c0e53f-goog

Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software

Posted by Maxim Levitsky 1 year, 7 months ago

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Now that kvm_cpu_cap_init() is a macro with its own scope, add EMUL_F() to
> OR-in features that KVM emulates in software, i.e. that don't depend on
> the feature being available in hardware.  The contained scope
> of kvm_cpu_cap_init() allows using a local variable to track the set of
> emulated leaves, which in addition to avoiding confusing and/or
> unnecessary variables, helps prevent misuse of EMUL_F().
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 36 +++++++++++++++++++++---------------
>  1 file changed, 21 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 1064e4d68718..33e3e77de1b7 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -94,6 +94,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>  	F(name);						\
>  })
>  
> +/*
> + * Emulated Feature - For features that KVM emulates in software irrespective
> + * of host CPU/kernel support.
> + */
> +#define EMUL_F(name)						\
> +({								\
> +	kvm_cpu_cap_emulated |= F(name);			\
> +	F(name);						\
> +})

To me it feels more and more that this patch series doesn't go into the right direction.

How about we just abandon the whole concept of masks and instead just have a list of statements

Pretty much the opposite of the patch series I confess:


#define CAP_PASSTHOUGH		0x01
#define CAP_EMULATED		0x02
#define CAP_AMD_ALIASED		0x04 // for AMD aliased features
#define CAP_SCATTERED		0x08
#define CAP_X86_64		0x10 // supported only on 64 bit hypervisors
...


/* CPUID_1_ECX*/

				/* TMA is not passed though because: xyz*/
kvm_cpu_cap_init(TMA,           0);

kvm_cpu_cap_init(SSSE3,         CAP_PASSTHOUGH);
				/* CNXT_ID is not passed though because: xyz*/
kvm_cpu_cap_init(CNXT_ID,       0);
kvm_cpu_cap_init(RESERVED,      0);
kvm_cpu_cap_init(FMA,           CAP_PASSTHOUGH);
...
				/* KVM always emulates TSC_ADJUST */
kvm_cpu_cap_init(TSC_ADJUST,    CAP_EMULATED | CAP_SCATTERED);

...

/* CPUID_D_1_EAX*/
				/* XFD is disabled on 32 bit systems because: xyz*/
kvm_cpu_cap_init(XFD, 		CAP_PASSTHOUGH | CAP_X86_64)


'kvm_cpu_cap_init' can be a macro if needed to have the compile checks.

There are several advantages to this:

- more readability, plus if needed each statement can be amended with a comment.
- No weird hacks in 'F*' macros, which additionally eventually evaluate into a bit, which is confusing.
  In fact no need to even have them at all.
- No need to verify that bitmask belongs to a feature word.
- Merge friendly - each capability has its own line.

Disadvantages:

- Longer list - IMHO not a problem, since it is very easy to read / search
  and can have as much comments as needed.
  For example this is how the kernel lists the CPUID features and this list IMHO
  is very manageable.

- Slower - kvm_set_cpu_caps is called exactly once per KVM module load, thus
  performance is the last thing I would care about in this function.


Another note about this patch: It is somewhat confusing that EMUL_F just forces a feature in kvm caps,
regardless of CPU support, because KVM also has KVM_GET_EMULATED_CPUID and it has a different meaning.

Users can easily confuse the EMUL_F for something that sets a feature bit in the KVM_GET_EMULATED_CPUID.


Best regards,
	Maxim Levitsky



> +
>  /*
>   * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
>   * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
> @@ -649,6 +659,7 @@ do {									\
>  do {									\
>  	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
>  	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
> +	u32 kvm_cpu_cap_emulated = 0;					\
>  									\
>  	if (leaf < NCAPINTS)						\
>  		kvm_cpu_caps[leaf] &= (mask);				\
> @@ -656,6 +667,7 @@ do {									\
>  		kvm_cpu_caps[leaf] = (mask);				\
>  									\
>  	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);			\
> +	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
>  } while (0)
>  
>  /*
> @@ -684,12 +696,10 @@ void kvm_set_cpu_caps(void)
>  		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
>  		F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
>  		F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
> -		F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
> +		F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
>  		0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
>  		F(F16C) | F(RDRAND)
>  	);
> -	/* KVM emulates x2apic in software irrespective of host support. */
> -	kvm_cpu_cap_set(X86_FEATURE_X2APIC);
>  
>  	kvm_cpu_cap_init(CPUID_1_EDX,
>  		F(FPU) | F(VME) | F(DE) | F(PSE) |
> @@ -703,13 +713,13 @@ void kvm_set_cpu_caps(void)
>  	);
>  
>  	kvm_cpu_cap_init(CPUID_7_0_EBX,
> -		F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) |
> -		F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) |
> -		F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ | F(AVX512F) |
> -		F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) | F(AVX512IFMA) |
> -		F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ | F(AVX512PF) |
> -		F(AVX512ER) | F(AVX512CD) | F(SHA_NI) | F(AVX512BW) |
> -		F(AVX512VL));
> +		F(FSGSBASE) | EMUL_F(TSC_ADJUST) | F(SGX) | F(BMI1) | F(HLE) |
> +		F(AVX2) | F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) |
> +		F(INVPCID) | F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ |
> +		F(AVX512F) | F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) |
> +		F(AVX512IFMA) | F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ |
> +		F(AVX512PF) | F(AVX512ER) | F(AVX512CD) | F(SHA_NI) |
> +		F(AVX512BW) | F(AVX512VL));
>  
>  	kvm_cpu_cap_init(CPUID_7_ECX,
>  		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
> @@ -728,16 +738,12 @@ void kvm_set_cpu_caps(void)
>  
>  	kvm_cpu_cap_init(CPUID_7_EDX,
>  		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
> -		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
> +		F(SPEC_CTRL_SSBD) | EMUL_F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
>  		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
>  		F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
>  		F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D)
>  	);
>  
> -	/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
> -	kvm_cpu_cap_set(X86_FEATURE_TSC_ADJUST);
> -	kvm_cpu_cap_set(X86_FEATURE_ARCH_CAPABILITIES);
> -
>  	if (boot_cpu_has(X86_FEATURE_IBPB) && boot_cpu_has(X86_FEATURE_IBRS))
>  		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL);
>  	if (boot_cpu_has(X86_FEATURE_STIBP))

Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software

Posted by Sean Christopherson 1 year, 7 months ago

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > Now that kvm_cpu_cap_init() is a macro with its own scope, add EMUL_F() to
> > OR-in features that KVM emulates in software, i.e. that don't depend on
> > the feature being available in hardware.  The contained scope
> > of kvm_cpu_cap_init() allows using a local variable to track the set of
> > emulated leaves, which in addition to avoiding confusing and/or
> > unnecessary variables, helps prevent misuse of EMUL_F().
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 36 +++++++++++++++++++++---------------
> >  1 file changed, 21 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 1064e4d68718..33e3e77de1b7 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -94,6 +94,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
> >  	F(name);						\
> >  })
> >  
> > +/*
> > + * Emulated Feature - For features that KVM emulates in software irrespective
> > + * of host CPU/kernel support.
> > + */
> > +#define EMUL_F(name)						\
> > +({								\
> > +	kvm_cpu_cap_emulated |= F(name);			\
> > +	F(name);						\
> > +})
> 
> To me it feels more and more that this patch series doesn't go into the right
> direction.
> 
> How about we just abandon the whole concept of masks and instead just have a
> list of statements
> 
> Pretty much the opposite of the patch series I confess:

FWIW, I think it's actually largely the same code under the hood.  The code for
each concept/flavor ends up being very similar, it's mostly just handling the
bitwise-OR in the callers vs. in the helpers.

> #define CAP_PASSTHOUGH		0x01
> #define CAP_EMULATED		0x02
> #define CAP_AMD_ALIASED		0x04 // for AMD aliased features
> #define CAP_SCATTERED		0x08
> #define CAP_X86_64		0x10 // supported only on 64 bit hypervisors
> ...
> 
> 
> /* CPUID_1_ECX*/
> 
> 				/* TMA is not passed though because: xyz*/
> kvm_cpu_cap_init(TMA,           0);
> 
> kvm_cpu_cap_init(SSSE3,         CAP_PASSTHOUGH);
> 				/* CNXT_ID is not passed though because: xyz*/
> kvm_cpu_cap_init(CNXT_ID,       0);
> kvm_cpu_cap_init(RESERVED,      0);
> kvm_cpu_cap_init(FMA,           CAP_PASSTHOUGH);
> ...
> 				/* KVM always emulates TSC_ADJUST */
> kvm_cpu_cap_init(TSC_ADJUST,    CAP_EMULATED | CAP_SCATTERED);
> 
> ...
> 
> /* CPUID_D_1_EAX*/
> 				/* XFD is disabled on 32 bit systems because: xyz*/
> kvm_cpu_cap_init(XFD, 		CAP_PASSTHOUGH | CAP_X86_64)
> 
> 
> 'kvm_cpu_cap_init' can be a macro if needed to have the compile checks.
> 
> There are several advantages to this:
> 
> - more readability, plus if needed each statement can be amended with a comment.
> - No weird hacks in 'F*' macros, which additionally eventually evaluate into a bit,
>   which is confusing.
>   In fact no need to even have them at all.
> - No need to verify that bitmask belongs to a feature word.

Yes, but the downside is that there is no enforcement of features in a word being
bundled together.

> - Merge friendly - each capability has its own line.

That's almost entirely convention though.  Other than inertia, nothing is stopping
us from doing:

	kvm_cpu_cap_init(CPUID_12_EAX,
		SF(SGX1) |
		SF(SGX2) |
		SF(SGX_EDECCSSA)
	);

I don't see a clean way of avoiding the addition of " |" on the last existing
line, but in practice I highly doubt that will ever be a source of meaningful pain.

Same goes for the point about adding comments.  We could do that with either
approach, we just don't do so today.

> Disadvantages:
> 
> - Longer list - IMHO not a problem, since it is very easy to read / search
>   and can have as much comments as needed.
>   For example this is how the kernel lists the CPUID features and this list IMHO
>   is very manageable.

There's one big difference: KVM would need to have a line for every feature that
KVM _doesn't_ support.  For densely populated words, that's not a huge issue,
but it's problematic for sparsely populated words, e.g. CPUID_12_EAX would have
29 reserved/unsupport entries, which IMO ends up being a big net negative for
code readability and ongoing maintenance.

We could avoid that cost (and the danger of a missed bit) by collecting the set
of features that have been initialized for each word, and then masking off the
uninitialized/unsupported at the end.  But then we're back to the bitwise-OR and
mask logic.

And while I agree that having the F*() macros set state _and_ evaulate to a bit
is imperfect, it does have its advantages.  E.g. to avoid evaluating to a value,
we could have F() modify a local variable that is scoped to kvm_cpu_cap_init(),
a las kvm_cpu_cap_emulated.  But then we'd need explicit code and/or comments
to call out that VMM_F() and the like intentionally don't set kvm_cpu_cap_supported,
whereas evualating to a value is a relatively self-documenting "0;".

> - Slower - kvm_set_cpu_caps is called exactly once per KVM module load, thus
>   performance is the last thing I would care about in this function.
> 
> Another note about this patch: It is somewhat confusing that EMUL_F just
> forces a feature in kvm caps, regardless of CPU support, because KVM also has
> KVM_GET_EMULATED_CPUID and it has a different meaning.

Yeah, but IMO that's a problem with KVM_GET_EMULATED_CPUID being poorly defined.

> Users can easily confuse the EMUL_F for something that sets a feature bit in
> the KVM_GET_EMULATED_CPUID.

I'll see if I can find a good spot for a comment to try and convenient

Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software

Posted by Maxim Levitsky 1 year, 6 months ago

On Mon, 2024-07-08 at 15:30 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > Now that kvm_cpu_cap_init() is a macro with its own scope, add EMUL_F() to
> > > OR-in features that KVM emulates in software, i.e. that don't depend on
> > > the feature being available in hardware.  The contained scope
> > > of kvm_cpu_cap_init() allows using a local variable to track the set of
> > > emulated leaves, which in addition to avoiding confusing and/or
> > > unnecessary variables, helps prevent misuse of EMUL_F().
> > > 
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/x86/kvm/cpuid.c | 36 +++++++++++++++++++++---------------
> > >  1 file changed, 21 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index 1064e4d68718..33e3e77de1b7 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -94,6 +94,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
> > >  	F(name);						\
> > >  })
> > >  
> > > +/*
> > > + * Emulated Feature - For features that KVM emulates in software irrespective
> > > + * of host CPU/kernel support.
> > > + */
> > > +#define EMUL_F(name)						\
> > > +({								\
> > > +	kvm_cpu_cap_emulated |= F(name);			\
> > > +	F(name);						\
> > > +})
> > 
> > To me it feels more and more that this patch series doesn't go into the right
> > direction.
> > 
> > How about we just abandon the whole concept of masks and instead just have a
> > list of statements
> > 
> > Pretty much the opposite of the patch series I confess:
> 
> FWIW, I think it's actually largely the same code under the hood.  The code for
> each concept/flavor ends up being very similar, it's mostly just handling the
> bitwise-OR in the callers vs. in the helpers.
> 
> > #define CAP_PASSTHOUGH		0x01
> > #define CAP_EMULATED		0x02
> > #define CAP_AMD_ALIASED		0x04 // for AMD aliased features
> > #define CAP_SCATTERED		0x08
> > #define CAP_X86_64		0x10 // supported only on 64 bit hypervisors
> > ...
> > 
> > 
> > /* CPUID_1_ECX*/
> > 
> > 				/* TMA is not passed though because: xyz*/
> > kvm_cpu_cap_init(TMA,           0);
> > 
> > kvm_cpu_cap_init(SSSE3,         CAP_PASSTHOUGH);
> > 				/* CNXT_ID is not passed though because: xyz*/
> > kvm_cpu_cap_init(CNXT_ID,       0);
> > kvm_cpu_cap_init(RESERVED,      0);
> > kvm_cpu_cap_init(FMA,           CAP_PASSTHOUGH);
> > ...
> > 				/* KVM always emulates TSC_ADJUST */
> > kvm_cpu_cap_init(TSC_ADJUST,    CAP_EMULATED | CAP_SCATTERED);
> > 
> > ...
> > 
> > /* CPUID_D_1_EAX*/
> > 				/* XFD is disabled on 32 bit systems because: xyz*/
> > kvm_cpu_cap_init(XFD, 		CAP_PASSTHOUGH | CAP_X86_64)
> > 
> > 
> > 'kvm_cpu_cap_init' can be a macro if needed to have the compile checks.
> > 
> > There are several advantages to this:
> > 
> > - more readability, plus if needed each statement can be amended with a comment.
> > - No weird hacks in 'F*' macros, which additionally eventually evaluate into a bit,
> >   which is confusing.
> >   In fact no need to even have them at all.
> > - No need to verify that bitmask belongs to a feature word.
> 
> Yes, but the downside is that there is no enforcement of features in a word being
> bundled together.

As I explained earlier, this is not an issue in principle, even if the caps are not
grouped together, the code will still work just fine.


kvm_cpu_cap_init_begin(CPUID_1_ECX);
                                /* TMA is not passed though because: xyz*/
kvm_cpu_cap_init(TMA,           0);
kvm_cpu_cap_init(SSSE3,         CAP_PASSTHOUGH);
                                /* CNXT_ID is not passed though because: xyz*/
kvm_cpu_cap_init(CNXT_ID,       0);
kvm_cpu_cap_init(RESERVED,      0);
kvm_cpu_cap_init(FMA,           CAP_PASSTHOUGH);
...
                                /* KVM always emulates TSC_ADJUST */
kvm_cpu_cap_init(TSC_ADJUST,    CAP_EMULATED | CAP_SCATTERED);

kvm_cpu_cap_init_end(CPUID_1_ECX);

...

...

And kvm_cpu_cap_init_begin, can set some cap_in_progress variable.



> 
> > - Merge friendly - each capability has its own line.
> 
> That's almost entirely convention though.  Other than inertia, nothing is stopping
> us from doing:
> 
> 	kvm_cpu_cap_init(CPUID_12_EAX,
> 		SF(SGX1) |
> 		SF(SGX2) |
> 		SF(SGX_EDECCSSA)

That trivial change is already an improvement, although it still leaves the problem
of thinking that this is one bit 'or', which was reasonable before this patch series,
because it was indeed one big 'or' but now there is lots of things going on behind
the scenes and that violates the principle of the least surprise.

My suggestion fixes this, because when the user sees a series of function calls,
and nobody will assume anything about these functions calls in contrast with series
of 'ors'. It's just how I look at it.

> 	);
> 
> I don't see a clean way of avoiding the addition of " |" on the last existing
> line, but in practice I highly doubt that will ever be a source of meaningful pain.
> 
> Same goes for the point about adding comments.  We could do that with either
> approach, we just don't do so today.

Yes, from the syntax POV there is indeed no problem, and I do agree that putting
each feature on its own line, together with comments for the features that need it
is a win-win improvement over what we have after this patch series.

> 
> > Disadvantages:
> > 
> > - Longer list - IMHO not a problem, since it is very easy to read / search
> >   and can have as much comments as needed.
> >   For example this is how the kernel lists the CPUID features and this list IMHO
> >   is very manageable.
> 
> There's one big difference: KVM would need to have a line for every feature that
> KVM _doesn't_ support.

Could you elaborate on why?
If we zero the whole leaf and then set specific bits there, one bit per kvm_cpu_cap_init.



>   For densely populated words, that's not a huge issue,
> but it's problematic for sparsely populated words, e.g. CPUID_12_EAX would have
> 29 reserved/unsupport entries, which IMO ends up being a big net negative for
> code readability and ongoing maintenance.
> 
> We could avoid that cost (and the danger of a missed bit) by collecting the set
> of features that have been initialized for each word, and then masking off the
> uninitialized/unsupported at the end.  But then we're back to the bitwise-OR and
> mask logic.
> 
> And while I agree that having the F*() macros set state _and_ evaulate to a bit
> is imperfect, it does have its advantages.  E.g. to avoid evaluating to a value,
> we could have F() modify a local variable that is scoped to kvm_cpu_cap_init(),
> a las kvm_cpu_cap_emulated.  But then we'd need explicit code and/or comments
> to call out that VMM_F() and the like intentionally don't set kvm_cpu_cap_supported,
> whereas evualating to a value is a relatively self-documenting "0;".
> 
> > - Slower - kvm_set_cpu_caps is called exactly once per KVM module load, thus
> >   performance is the last thing I would care about in this function.
> > 
> > Another note about this patch: It is somewhat confusing that EMUL_F just
> > forces a feature in kvm caps, regardless of CPU support, because KVM also has
> > KVM_GET_EMULATED_CPUID and it has a different meaning.
> 
> Yeah, but IMO that's a problem with KVM_GET_EMULATED_CPUID being poorly defined.
> 
> > Users can easily confuse the EMUL_F for something that sets a feature bit in
> > the KVM_GET_EMULATED_CPUID.
> 
> I'll see if I can find a good spot for a comment to try and convenient


Best regards,
	Maxim Levitsky
>

Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software

Posted by Sean Christopherson 1 year, 6 months ago

On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> On Mon, 2024-07-08 at 15:30 -0700, Sean Christopherson wrote:
> > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > There are several advantages to this:
> > > 
> > > - more readability, plus if needed each statement can be amended with a comment.
> > > - No weird hacks in 'F*' macros, which additionally eventually evaluate into a bit,
> > >   which is confusing.
> > >   In fact no need to even have them at all.
> > > - No need to verify that bitmask belongs to a feature word.
> > 
> > Yes, but the downside is that there is no enforcement of features in a word being
> > bundled together.
> 
> As I explained earlier, this is not an issue in principle, even if the caps are not
> grouped together, the code will still work just fine.

I agree that functionally it'll all be fine, but I also want the code to bunch
things together for readers.  We can force that with functions, though it means
passing in more state to kvm_cpu_cap_init_{begin,end}().

> kvm_cpu_cap_init_begin(CPUID_1_ECX);
>                                 /* TMA is not passed though because: xyz*/
> kvm_cpu_cap_init(TMA,           0);
> kvm_cpu_cap_init(SSSE3,         CAP_PASSTHOUGH);
>                                 /* CNXT_ID is not passed though because: xyz*/
> kvm_cpu_cap_init(CNXT_ID,       0);
> kvm_cpu_cap_init(RESERVED,      0);
> kvm_cpu_cap_init(FMA,           CAP_PASSTHOUGH);
> ...
>                                 /* KVM always emulates TSC_ADJUST */
> kvm_cpu_cap_init(TSC_ADJUST,    CAP_EMULATED | CAP_SCATTERED);
> 
> kvm_cpu_cap_init_end(CPUID_1_ECX);
> 
> ...
> 
> ...
> 
> And kvm_cpu_cap_init_begin, can set some cap_in_progress variable.

Ya, but then compile-time asserts become run-time asserts.

> > > - Merge friendly - each capability has its own line.
> > 
> > That's almost entirely convention though.  Other than inertia, nothing is stopping
> > us from doing:
> > 
> > 	kvm_cpu_cap_init(CPUID_12_EAX,
> > 		SF(SGX1) |
> > 		SF(SGX2) |
> > 		SF(SGX_EDECCSSA)
> 
> That trivial change is already an improvement, although it still leaves the problem
> of thinking that this is one bit 'or', which was reasonable before this patch series,
> because it was indeed one big 'or' but now there is lots of things going on behind
> the scenes and that violates the principle of the least surprise.
> 
> My suggestion fixes this, because when the user sees a series of function calls,
> and nobody will assume anything about these functions calls in contrast with series
> of 'ors'. It's just how I look at it.

If it's the macro styling that's misleading, we could do what we did for the
static_call() wrappers and make them look like functions.  E.g.

	kvm_cpu_cap_init(CPUID_12_EAX,
		scattered_f(SGX1) |
		scattered_f(SGX2) |
		scattered_f(SGX_EDECCSSA)
	);

though that probably doesn't help much and is misleading in its own right.  Does
it help if the names are more verbose? 
 
> > 	);
> > 
> > I don't see a clean way of avoiding the addition of " |" on the last existing
> > line, but in practice I highly doubt that will ever be a source of meaningful pain.
> > 
> > Same goes for the point about adding comments.  We could do that with either
> > approach, we just don't do so today.
> 
> Yes, from the syntax POV there is indeed no problem, and I do agree that putting
> each feature on its own line, together with comments for the features that need it
> is a win-win improvement over what we have after this patch series.
> 
> > 
> > > Disadvantages:
> > > 
> > > - Longer list - IMHO not a problem, since it is very easy to read / search
> > >   and can have as much comments as needed.
> > >   For example this is how the kernel lists the CPUID features and this list IMHO
> > >   is very manageable.
> > 
> > There's one big difference: KVM would need to have a line for every feature that
> > KVM _doesn't_ support.
> 
> Could you elaborate on why?
> If we zero the whole leaf and then set specific bits there, one bit per kvm_cpu_cap_init.

Ah, if we move the the handling of boot_cpu_data[*] into the helpers, then yes,
there's no need to explicitly initialize features that aren't supported by KVM.

That said, I still don't like using functions instead of macros, mainly because
a number of compile-assertions become run-time assertions.  To provide equivalent
functionality, we also would need to pass in extra state to begin/end() (as
mentioned earlier).  Getting compile-time assertions on usage, e.g. via
guest_cpu_cap_has(), would also be trickier, though still doable, I think.
Lastly, it adds an extra step (calling _end()) to each flow, i.e. adds one more
thing for developers to mess up.  But that's a very minor concern and definitely
not a sticking point.

I agree that the macro shenanigans are aggressively clever, but for me, the
benefits of compile-time asserts make it worth dealing with the cleverness.

[*] https://lore.kernel.org/all/ZqKlDC11gItH1uj9@google.com

Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software

Posted by mlevitsk@redhat.com 1 year, 6 months ago

У пт, 2024-07-26 у 17:06 -0700, Sean Christopherson пише:
> > On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> > > > On Mon, 2024-07-08 at 15:30 -0700, Sean Christopherson wrote:
> > > > > > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > > > > > > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > > > > > > There are several advantages to this:
> > > > > > > > 
> > > > > > > > - more readability, plus if needed each statement can be amended with a comment.
> > > > > > > > - No weird hacks in 'F*' macros, which additionally eventually evaluate into a bit,
> > > > > > > >   which is confusing.
> > > > > > > >   In fact no need to even have them at all.
> > > > > > > > - No need to verify that bitmask belongs to a feature word.
> > > > > > 
> > > > > > Yes, but the downside is that there is no enforcement of features in a word being
> > > > > > bundled together.
> > > > 
> > > > As I explained earlier, this is not an issue in principle, even if the caps are not
> > > > grouped together, the code will still work just fine.
> > 
> > I agree that functionally it'll all be fine, but I also want the code to bunch
> > things together for readers.  We can force that with functions, though it means
> > passing in more state to kvm_cpu_cap_init_{begin,end}().
> > 
> > > > kvm_cpu_cap_init_begin(CPUID_1_ECX);
> > > >                                 /* TMA is not passed though because: xyz*/
> > > > kvm_cpu_cap_init(TMA,           0);
> > > > kvm_cpu_cap_init(SSSE3,         CAP_PASSTHOUGH);
> > > >                                 /* CNXT_ID is not passed though because: xyz*/
> > > > kvm_cpu_cap_init(CNXT_ID,       0);
> > > > kvm_cpu_cap_init(RESERVED,      0);
> > > > kvm_cpu_cap_init(FMA,           CAP_PASSTHOUGH);
> > > > ...
> > > >                                 /* KVM always emulates TSC_ADJUST */
> > > > kvm_cpu_cap_init(TSC_ADJUST,    CAP_EMULATED | CAP_SCATTERED);
> > > > 
> > > > kvm_cpu_cap_init_end(CPUID_1_ECX);
> > > > 
> > > > ...
> > > > 
> > > > ...
> > > > 
> > > > And kvm_cpu_cap_init_begin, can set some cap_in_progress variable.
> > 
> > Ya, but then compile-time asserts become run-time asserts.

Not really, it all can be done with macros, in exactly the same way IMHO,
we do have BUILD_BUG_ON after all.

I am not against using macros, I am only against collecting a bitmask
while applying various side effects, and then passing the bitmask to
the kvm_cpu_cap_init.

> > 
> > > > > > > > - Merge friendly - each capability has its own line.
> > > > > > 
> > > > > > That's almost entirely convention though.  Other than inertia, nothing is stopping
> > > > > > us from doing:
> > > > > > 
> > > > > >         kvm_cpu_cap_init(CPUID_12_EAX,
> > > > > >                 SF(SGX1) |
> > > > > >                 SF(SGX2) |
> > > > > >                 SF(SGX_EDECCSSA)
> > > > 
> > > > That trivial change is already an improvement, although it still leaves the problem
> > > > of thinking that this is one bit 'or', which was reasonable before this patch series,
> > > > because it was indeed one big 'or' but now there is lots of things going on behind
> > > > the scenes and that violates the principle of the least surprise.
> > > > 
> > > > My suggestion fixes this, because when the user sees a series of function calls,
> > > > and nobody will assume anything about these functions calls in contrast with series
> > > > of 'ors'. It's just how I look at it.
> > 
> > If it's the macro styling that's misleading, we could do what we did for the
> > static_call() wrappers and make them look like functions.  E.g.
> > 
> >         kvm_cpu_cap_init(CPUID_12_EAX,
> >                 scattered_f(SGX1) |
> >                 scattered_f(SGX2) |
> >                 scattered_f(SGX_EDECCSSA)
> >         );



> > 
> > though that probably doesn't help much and is misleading in its own right.  Does
> > it help if the names are more verbose? 

Verbose names are a good thing, I already mentioned this.

> >  
> > > > > >         );
> > > > > > 
> > > > > > I don't see a clean way of avoiding the addition of " |" on the last existing
> > > > > > line, but in practice I highly doubt that will ever be a source of meaningful pain.
> > > > > > 
> > > > > > Same goes for the point about adding comments.  We could do that with either
> > > > > > approach, we just don't do so today.
> > > > 
> > > > Yes, from the syntax POV there is indeed no problem, and I do agree that putting
> > > > each feature on its own line, together with comments for the features that need it
> > > > is a win-win improvement over what we have after this patch series.
> > > > 
> > > > > > 
> > > > > > > > Disadvantages:
> > > > > > > > 
> > > > > > > > - Longer list - IMHO not a problem, since it is very easy to read / search
> > > > > > > >   and can have as much comments as needed.
> > > > > > > >   For example this is how the kernel lists the CPUID features and this list IMHO
> > > > > > > >   is very manageable.
> > > > > > 
> > > > > > There's one big difference: KVM would need to have a line for every feature that
> > > > > > KVM _doesn't_ support.
> > > > 
> > > > Could you elaborate on why?
> > > > If we zero the whole leaf and then set specific bits there, one bit per kvm_cpu_cap_init.
> > 
> > Ah, if we move the the handling of boot_cpu_data[*] into the helpers, then yes,
> > there's no need to explicitly initialize features that aren't supported by KVM.
> > 
> > That said, I still don't like using functions instead of macros, mainly because
> > a number of compile-assertions become run-time assertions.

I'm almost sure that we can do everything with compile time asserts with series of functions.




> >   To provide equivalent
> > functionality, we also would need to pass in extra state to begin/end() (as
> > mentioned earlier).

Besides the number of leaf currently initialized, I don't see which other extra state we need.

In fact I can prove that this is possible:

Roughly like this:

#define kvm_cpu_cap_init_begin(leaf)							\
do {											\
 const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf; 				\
 u32 kvm_cpu_cap_emulated = 0; 								\
 u32 kvm_cpu_cap_synthesized = 0; 							\
	u32 kvm_cpu_cap_regular = 0;


#define feature_scattered(name) 							\
 BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES); 					\
 KVM_VALIDATE_CPU_CAP_USAGE(name); 							\
											\
	if (boot_cpu_has(X86_FEATURE_##name) 						\
		kvm_cpu_cap_regular |= feature_bit(name);


#define kvm_cpu_cap_init_end() 								\
	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);			\
											\
	if (kvm_cpu_cap_init_in_progress < NCAPINTS) 					\
 		kvm_cpu_caps[kvm_cpu_cap_init_in_progress] &= kvm_cpu_cap_regular; 	\
 	else 										\
 		kvm_cpu_caps[kvm_cpu_cap_init_in_progress] = kvm_cpu_cap_regular; 	\
 											\
 	kvm_cpu_caps[kvm_cpu_cap_init_in_progress] &= (raw_cpuid_get(cpuid) | 		\
 	kvm_cpu_cap_synthesized); 							\
 	kvm_cpu_caps[kvm_cpu_cap_init_in_progress] |= kvm_cpu_cap_emulated; 		\
} while(0);


And now we have:

kvm_cpu_cap_init_begin(CPUID_12_EAX);
 feature_scattered(SGX1);
 feature_scattered(SGX2);
 feature_scattered(SGX_EDECCSSA);
kvm_cpu_cap_init_end();

In my book this looks much less misleading than the current version - I didn't put
much effort into naming variables though, the kvm_cpu_cap_regular name can be better IMHO.


Best regards,
	Maxim Levitsky


> >   Getting compile-time assertions on usage, e.g. via
> > guest_cpu_cap_has(), would also be trickier, though still doable, I think.
> > Lastly, it adds an extra step (calling _end()) to each flow, i.e. adds one more
> > thing for developers to mess up.  But that's a very minor concern and definitely
> > not a sticking point.
> > 
> > I agree that the macro shenanigans are aggressively clever, but for me, the
> > benefits of compile-time asserts make it worth dealing with the cleverness.
> > 
> > [*] https://lore.kernel.org/all/ZqKlDC11gItH1uj9@google.com
> >

Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software

Posted by Sean Christopherson 1 year, 6 months ago

On Mon, Aug 05, 2024, mlevitsk@redhat.com wrote:
> У пт, 2024-07-26 у 17:06 -0700, Sean Christopherson пише:
> > > > > And kvm_cpu_cap_init_begin, can set some cap_in_progress variable.
> > > 
> > > Ya, but then compile-time asserts become run-time asserts.
> 
> Not really, it all can be done with macros, in exactly the same way IMHO,
> we do have BUILD_BUG_ON after all.
> 
> I am not against using macros, I am only against collecting a bitmask
> while applying various side effects, and then passing the bitmask to
> the kvm_cpu_cap_init.

Gah, I wasn't grokking that, obviously.  Sorry for not catching on earlier.

> > > To provide equivalent functionality, we also would need to pass in extra
> > > state to begin/end() (as mentioned earlier).
> 
> Besides the number of leaf currently initialized, I don't see which other
> extra state we need.
> 
> In fact I can prove that this is possible:
> 
> Roughly like this:
> 
> #define kvm_cpu_cap_init_begin(leaf)							\
> do {											\
>  const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf; 				\
>  u32 kvm_cpu_cap_emulated = 0; 								\
>  u32 kvm_cpu_cap_synthesized = 0; 							\
> 	u32 kvm_cpu_cap_regular = 0;

Maybe "virtualized" instead of "regular"?

> #define feature_scattered(name) 							\
>  BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES); 					\
>  KVM_VALIDATE_CPU_CAP_USAGE(name); 							\
> 											\
> 	if (boot_cpu_has(X86_FEATURE_##name) 						\
> 		kvm_cpu_cap_regular |= feature_bit(name);
> 
> 
> #define kvm_cpu_cap_init_end() 								\
> 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);			\
> 											\
> 	if (kvm_cpu_cap_init_in_progress < NCAPINTS) 					\
>  		kvm_cpu_caps[kvm_cpu_cap_init_in_progress] &= kvm_cpu_cap_regular; 	\
>  	else 										\
>  		kvm_cpu_caps[kvm_cpu_cap_init_in_progress] = kvm_cpu_cap_regular; 	\
>  											\
>  	kvm_cpu_caps[kvm_cpu_cap_init_in_progress] &= (raw_cpuid_get(cpuid) | 		\
>  	kvm_cpu_cap_synthesized); 							\
>  	kvm_cpu_caps[kvm_cpu_cap_init_in_progress] |= kvm_cpu_cap_emulated; 		\
> } while(0);
> 
> 
> And now we have:
> 
> kvm_cpu_cap_init_begin(CPUID_12_EAX);
>  feature_scattered(SGX1);
>  feature_scattered(SGX2);
>  feature_scattered(SGX_EDECCSSA);
> kvm_cpu_cap_init_end();

I don't love the syntax (mainly the need for a begin()+end()), but I'm a-ok
getting rid of the @mask param/input.

What about making kvm_cpu_cap_init() a variadic macro, with the relevant features
"unpacked" in the context of the macro?  That would avoid the need for a trailing
macro, and would provide a clear indication of when/where the set of features is
"initialized".

The biggest downside I see is that the last entry can't have a trailing comma,
i.e. adding a new feature would require updating the previous feature too.

#define kvm_cpu_cap_init(leaf, init_features...)			\
do {									\
	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
	u32 kvm_cpu_cap_virtualized= 0;					\
	u32 kvm_cpu_cap_emulated = 0;					\
	u32 kvm_cpu_cap_synthesized = 0;				\
									\
	init_features;							\
									\
	kvm_cpu_caps[leaf] = kvm_cpu_cap_virtualized;			\
	kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |			\
			       kvm_cpu_cap_synthesized);		\
	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
} while (0)

	kvm_cpu_cap_init(CPUID_1_ECX,
		VIRTUALIZED_F(XMM3),
		VIRTUALIZED_F(PCLMULQDQ),
		VIRTUALIZED_F(SSSE3),
		VIRTUALIZED_F(FMA),
		VIRTUALIZED_F(CX16),
		VIRTUALIZED_F(PDCM),
		VIRTUALIZED_F(PCID),
		VIRTUALIZED_F(XMM4_1),
		VIRTUALIZED_F(XMM4_2),
		EMULATED_F(X2APIC),
		VIRTUALIZED_F(MOVBE),
		VIRTUALIZED_F(POPCNT),
		EMULATED_F(TSC_DEADLINE_TIMER),
		VIRTUALIZED_F(AES),
		VIRTUALIZED_F(XSAVE),
		// DYNAMIC_F(OSXSAVE),
		VIRTUALIZED_F(AVX),
		VIRTUALIZED_F(F16C),
		VIRTUALIZED_F(RDRAND),
		EMULATED_F(HYPERVISOR)
	);


Alternatively, we could force a trailing comma by omitting the semicolon after
init_features, but that looks weird for the the macro itself, and arguably a bit
weird for the users too.

Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software

Posted by Maxim Levitsky 1 year, 5 months ago

On Mon, 2024-08-05 at 12:59 -0700, Sean Christopherson wrote:
> On Mon, Aug 05, 2024, mlevitsk@redhat.com wrote:
> > У пт, 2024-07-26 у 17:06 -0700, Sean Christopherson пише:
> > > > > > And kvm_cpu_cap_init_begin, can set some cap_in_progress variable.
> > > > 
> > > > Ya, but then compile-time asserts become run-time asserts.
> > 
> > Not really, it all can be done with macros, in exactly the same way IMHO,
> > we do have BUILD_BUG_ON after all.
> > 
> > I am not against using macros, I am only against collecting a bitmask
> > while applying various side effects, and then passing the bitmask to
> > the kvm_cpu_cap_init.
> 
> Gah, I wasn't grokking that, obviously.  Sorry for not catching on earlier.
> 
> > > > To provide equivalent functionality, we also would need to pass in extra
> > > > state to begin/end() (as mentioned earlier).
> > 
> > Besides the number of leaf currently initialized, I don't see which other
> > extra state we need.
> > 
> > In fact I can prove that this is possible:
> > 
> > Roughly like this:
> > 
> > #define kvm_cpu_cap_init_begin(leaf)							\
> > do {											\
> >  const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf; 				\
> >  u32 kvm_cpu_cap_emulated = 0; 								\
> >  u32 kvm_cpu_cap_synthesized = 0; 							\
> > 	u32 kvm_cpu_cap_regular = 0;
> 
> Maybe "virtualized" instead of "regular"?
> 
> > #define feature_scattered(name) 							\
> >  BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES); 					\
> >  KVM_VALIDATE_CPU_CAP_USAGE(name); 							\
> > 											\
> > 	if (boot_cpu_has(X86_FEATURE_##name) 						\
> > 		kvm_cpu_cap_regular |= feature_bit(name);
> > 
> > 
> > #define kvm_cpu_cap_init_end() 								\
> > 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);			\
> > 											\
> > 	if (kvm_cpu_cap_init_in_progress < NCAPINTS) 					\
> >  		kvm_cpu_caps[kvm_cpu_cap_init_in_progress] &= kvm_cpu_cap_regular; 	\
> >  	else 										\
> >  		kvm_cpu_caps[kvm_cpu_cap_init_in_progress] = kvm_cpu_cap_regular; 	\
> >  											\
> >  	kvm_cpu_caps[kvm_cpu_cap_init_in_progress] &= (raw_cpuid_get(cpuid) | 		\
> >  	kvm_cpu_cap_synthesized); 							\
> >  	kvm_cpu_caps[kvm_cpu_cap_init_in_progress] |= kvm_cpu_cap_emulated; 		\
> > } while(0);
> > 
> > 
> > And now we have:
> > 
> > kvm_cpu_cap_init_begin(CPUID_12_EAX);
> >  feature_scattered(SGX1);
> >  feature_scattered(SGX2);
> >  feature_scattered(SGX_EDECCSSA);
> > kvm_cpu_cap_init_end();
> 
> I don't love the syntax (mainly the need for a begin()+end()), but I'm a-ok
> getting rid of the @mask param/input.
> 
> What about making kvm_cpu_cap_init() a variadic macro, with the relevant features
> "unpacked" in the context of the macro?  That would avoid the need for a trailing
> macro, and would provide a clear indication of when/where the set of features is
> "initialized".
> 
> The biggest downside I see is that the last entry can't have a trailing comma,
> i.e. adding a new feature would require updating the previous feature too.
> 
> #define kvm_cpu_cap_init(leaf, init_features...)			\
> do {									\
> 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
> 	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
> 	u32 kvm_cpu_cap_virtualized= 0;					\
> 	u32 kvm_cpu_cap_emulated = 0;					\
> 	u32 kvm_cpu_cap_synthesized = 0;				\
> 									\
> 	init_features;							\
> 									\
> 	kvm_cpu_caps[leaf] = kvm_cpu_cap_virtualized;			\
> 	kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |			\
> 			       kvm_cpu_cap_synthesized);		\
> 	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
> } while (0)
> 
> 	kvm_cpu_cap_init(CPUID_1_ECX,
> 		VIRTUALIZED_F(XMM3),
> 		VIRTUALIZED_F(PCLMULQDQ),
> 		VIRTUALIZED_F(SSSE3),
> 		VIRTUALIZED_F(FMA),
> 		VIRTUALIZED_F(CX16),
> 		VIRTUALIZED_F(PDCM),
> 		VIRTUALIZED_F(PCID),
> 		VIRTUALIZED_F(XMM4_1),
> 		VIRTUALIZED_F(XMM4_2),
> 		EMULATED_F(X2APIC),
> 		VIRTUALIZED_F(MOVBE),
> 		VIRTUALIZED_F(POPCNT),
> 		EMULATED_F(TSC_DEADLINE_TIMER),
> 		VIRTUALIZED_F(AES),
> 		VIRTUALIZED_F(XSAVE),
> 		// DYNAMIC_F(OSXSAVE),
> 		VIRTUALIZED_F(AVX),
> 		VIRTUALIZED_F(F16C),
> 		VIRTUALIZED_F(RDRAND),
> 		EMULATED_F(HYPERVISOR)
> 	);

Hi,

This is no doubt better than using '|'.

I still strongly prefer my version, because I don't really like the fact that _F 
macros have side effects, and yet passed as parameters to the kvm_cpu_cap_init function/macro.

Basically an unwritten rule, which I consider very important and because of which
I raised my concerns over this patch series is that if a function has side effects,
it should not be used as a parameter to another function, instead, it should be 
called explicitly on its own.

If you strongly prefer the variadic macro over my begin/end API, I can live with
that though, it is still better than '|'ing a mask with functions that have side
effects.

Best regards,
	Maxim Levitsky


> 
> 
> Alternatively, we could force a trailing comma by omitting the semicolon after
> init_features, but that looks weird for the the macro itself, and arguably a bit
> weird for the users too.
>

Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software

Posted by Sean Christopherson 1 year, 5 months ago

On Tue, Sep 10, 2024, Maxim Levitsky wrote:
> On Mon, 2024-08-05 at 12:59 -0700, Sean Christopherson wrote:
> > > And now we have:
> > > 
> > > kvm_cpu_cap_init_begin(CPUID_12_EAX);
> > >  feature_scattered(SGX1);
> > >  feature_scattered(SGX2);
> > >  feature_scattered(SGX_EDECCSSA);
> > > kvm_cpu_cap_init_end();
> > 
> > I don't love the syntax (mainly the need for a begin()+end()), but I'm a-ok
> > getting rid of the @mask param/input.
> > 
> > What about making kvm_cpu_cap_init() a variadic macro, with the relevant features
> > "unpacked" in the context of the macro?  That would avoid the need for a trailing
> > macro, and would provide a clear indication of when/where the set of features is
> > "initialized".
> > 
> > The biggest downside I see is that the last entry can't have a trailing comma,
> > i.e. adding a new feature would require updating the previous feature too.
> > 
> > #define kvm_cpu_cap_init(leaf, init_features...)			\
> > do {									\
> > 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
> > 	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
> > 	u32 kvm_cpu_cap_virtualized= 0;					\
> > 	u32 kvm_cpu_cap_emulated = 0;					\
> > 	u32 kvm_cpu_cap_synthesized = 0;				\
> > 									\
> > 	init_features;							\
> > 									\
> > 	kvm_cpu_caps[leaf] = kvm_cpu_cap_virtualized;			\
> > 	kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |			\
> > 			       kvm_cpu_cap_synthesized);		\
> > 	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
> > } while (0)
> > 
> > 	kvm_cpu_cap_init(CPUID_1_ECX,
> > 		VIRTUALIZED_F(XMM3),
> > 		VIRTUALIZED_F(PCLMULQDQ),
> > 		VIRTUALIZED_F(SSSE3),
> > 		VIRTUALIZED_F(FMA),
> > 		VIRTUALIZED_F(CX16),
> > 		VIRTUALIZED_F(PDCM),
> > 		VIRTUALIZED_F(PCID),
> > 		VIRTUALIZED_F(XMM4_1),
> > 		VIRTUALIZED_F(XMM4_2),
> > 		EMULATED_F(X2APIC),
> > 		VIRTUALIZED_F(MOVBE),
> > 		VIRTUALIZED_F(POPCNT),
> > 		EMULATED_F(TSC_DEADLINE_TIMER),
> > 		VIRTUALIZED_F(AES),
> > 		VIRTUALIZED_F(XSAVE),
> > 		// DYNAMIC_F(OSXSAVE),
> > 		VIRTUALIZED_F(AVX),
> > 		VIRTUALIZED_F(F16C),
> > 		VIRTUALIZED_F(RDRAND),
> > 		EMULATED_F(HYPERVISOR)
> > 	);
> 
> Hi,
> 
> This is no doubt better than using '|'.
> 
> I still strongly prefer my version, because I don't really like the fact that
> _F macros have side effects, and yet passed as parameters to the
> kvm_cpu_cap_init function/macro.
> 
> Basically an unwritten rule, which I consider very important and because of which
> I raised my concerns over this patch series is that if a function has side effects,
> it should not be used as a parameter to another function, instead, it should be 
> called explicitly on its own.

Splitting hairs to some degree, but the above suggestion is distinctly different
than passing the _result_ of a function call as a parameter to another function.
The actual "call" happens within the body of kvm_cpu_cap_init().  

This is effectively the same as passing a function pointer to a helper, and that
function pointer implementation having side effects, which is quite common in the
kernel and KVM, e.g. msr_access_t, rmap_handler_t, tdp_handler_t, gfn_handler_t,
on_lock_fn_t, etc.

I 100% agree that it's unusual and subtle to essentially have a variable number
of function pointers, but I don't see it as being an inherently bad pattern,
especially since it is practically impossible to misuse _because_ the macro
unpacks the "calls" at compile time.

IMO, the part that is most gross is the macros operating on local variables, but
that behavior exists in all ideas we've discussed, probably because I'm pretty
sure it's unavoidable unless we do something even worse (way, waaaaay worse).

E.g. we could add 32 versions of kvm_cpu_cap_init() that invoke pairs of parameters
and pass in the variables

  fn1(f1, virtualized, emulated, synthesized)
  fn2(f2, virtualized, emulated, synthesized)
  fn3(f3, virtualized, emulated, synthesized)
  ...
  fnN(fN, virtualized, emulated, synthesized)

and

  kvm_cpu_cap_init19(CPUID_1_ECX,
	F, XMM3,
	F, PCLMULQDQ,
	F, SSE3,
	...
	EMULATED_F, HYPERVISOR
  );

But that's beyond horrific :-)

> If you strongly prefer the variadic macro over my begin/end API, I can live with
> that though, it is still better than '|'ing a mask with functions that have side
> effects.

Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software

Posted by Maxim Levitsky 1 year, 2 months ago

On Wed, 2024-09-11 at 09:03 -0700, Sean Christopherson wrote:
> On Tue, Sep 10, 2024, Maxim Levitsky wrote:
> > On Mon, 2024-08-05 at 12:59 -0700, Sean Christopherson wrote:
> > > > And now we have:
> > > > 
> > > > kvm_cpu_cap_init_begin(CPUID_12_EAX);
> > > >  feature_scattered(SGX1);
> > > >  feature_scattered(SGX2);
> > > >  feature_scattered(SGX_EDECCSSA);
> > > > kvm_cpu_cap_init_end();
> > > 
> > > I don't love the syntax (mainly the need for a begin()+end()), but I'm a-ok
> > > getting rid of the @mask param/input.
> > > 
> > > What about making kvm_cpu_cap_init() a variadic macro, with the relevant features
> > > "unpacked" in the context of the macro?  That would avoid the need for a trailing
> > > macro, and would provide a clear indication of when/where the set of features is
> > > "initialized".
> > > 
> > > The biggest downside I see is that the last entry can't have a trailing comma,
> > > i.e. adding a new feature would require updating the previous feature too.
> > > 
> > > #define kvm_cpu_cap_init(leaf, init_features...)			\
> > > do {									\
> > > 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
> > > 	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
> > > 	u32 kvm_cpu_cap_virtualized= 0;					\
> > > 	u32 kvm_cpu_cap_emulated = 0;					\
> > > 	u32 kvm_cpu_cap_synthesized = 0;				\
> > > 									\
> > > 	init_features;							\
> > > 									\
> > > 	kvm_cpu_caps[leaf] = kvm_cpu_cap_virtualized;			\
> > > 	kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |			\
> > > 			       kvm_cpu_cap_synthesized);		\
> > > 	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
> > > } while (0)
> > > 
> > > 	kvm_cpu_cap_init(CPUID_1_ECX,
> > > 		VIRTUALIZED_F(XMM3),
> > > 		VIRTUALIZED_F(PCLMULQDQ),
> > > 		VIRTUALIZED_F(SSSE3),
> > > 		VIRTUALIZED_F(FMA),
> > > 		VIRTUALIZED_F(CX16),
> > > 		VIRTUALIZED_F(PDCM),
> > > 		VIRTUALIZED_F(PCID),
> > > 		VIRTUALIZED_F(XMM4_1),
> > > 		VIRTUALIZED_F(XMM4_2),
> > > 		EMULATED_F(X2APIC),
> > > 		VIRTUALIZED_F(MOVBE),
> > > 		VIRTUALIZED_F(POPCNT),
> > > 		EMULATED_F(TSC_DEADLINE_TIMER),
> > > 		VIRTUALIZED_F(AES),
> > > 		VIRTUALIZED_F(XSAVE),
> > > 		// DYNAMIC_F(OSXSAVE),
> > > 		VIRTUALIZED_F(AVX),
> > > 		VIRTUALIZED_F(F16C),
> > > 		VIRTUALIZED_F(RDRAND),
> > > 		EMULATED_F(HYPERVISOR)
> > > 	);
> > 
> > Hi,
> > 
> > This is no doubt better than using '|'.
> > 
> > I still strongly prefer my version, because I don't really like the fact that
> > _F macros have side effects, and yet passed as parameters to the
> > kvm_cpu_cap_init function/macro.
> > 
> > Basically an unwritten rule, which I consider very important and because of which
> > I raised my concerns over this patch series is that if a function has side effects,
> > it should not be used as a parameter to another function, instead, it should be 
> > called explicitly on its own.
> 
> Splitting hairs to some degree, but the above suggestion is distinctly different
> than passing the _result_ of a function call as a parameter to another function.
> The actual "call" happens within the body of kvm_cpu_cap_init().

You are technically right but you use a wrong point of view: You know the implementation,
and I pretend that I don't know it, and try to look at this from the point of view
of someone who just looks a the code for the first time, e.g. to fix some bugs.

Someone who doesn't know anything about this, won't know if these are macros, cleverly
passed to another variadric macro (which is itself a feature that is not often used)

I just state the fact: a function or what looks like a function, result of which
is evaluated in expression or passed to another function (within a single statement)
should not have side effects. 
Only top level function/procedure calls allowed to have side effects - 
otherwise this is just confusing.

Let me explain this again with code:

When I see for the first time this:

result = foo(x) | bar(x);

I strongly expect both foo and bar to be pure functions with no side effects.

Or if I see this for the first time:

err = somefunc(foo(x), bar(x));

I also expect that foo and bar are pure functions,
but 'somefunc' might not be because it only returns an error code,
and it is a top level statement.

And I don't care if this is implemented with functions or macros, because it
looks the same.

This is just how my common sense works.

I won't argue though more about this, I don't want to bikeshed this and block this patch series.
If you insist, let it be, but please at least use the variadic macro.


> 
> This is effectively the same as passing a function pointer to a helper, and that
> function pointer implementation having side effects, which is quite common in the
> kernel and KVM, e.g. msr_access_t, rmap_handler_t, tdp_handler_t, gfn_handler_t,
> on_lock_fn_t, etc.
> 
> I 100% agree that it's unusual and subtle to essentially have a variable number
> of function pointers, but I don't see it as being an inherently bad pattern,
> especially since it is practically impossible to misuse _because_ the macro
> unpacks the "calls" at compile time.
> 
> IMO, the part that is most gross is the macros operating on local variables, but
> that behavior exists in all ideas we've discussed, probably because I'm pretty
> sure it's unavoidable unless we do something even worse (way, waaaaay worse).
> 
> E.g. we could add 32 versions of kvm_cpu_cap_init() that invoke pairs of parameters
> and pass in the variables
> 
>   fn1(f1, virtualized, emulated, synthesized)
>   fn2(f2, virtualized, emulated, synthesized)
>   fn3(f3, virtualized, emulated, synthesized)
>   ...
>   fnN(fN, virtualized, emulated, synthesized)
> 
> and
> 
>   kvm_cpu_cap_init19(CPUID_1_ECX,
> 	F, XMM3,
> 	F, PCLMULQDQ,
> 	F, SSE3,
> 	...
> 	EMULATED_F, HYPERVISOR
>   );

I don't think that this change is worth it, but this is still better in some sense,
because at least the user won't be able to make any assumptions about the above,
and instead will have to read the code and figure out what was done here.
It won't be easy though.

> 
> But that's beyond horrific :-)
> 
> > If you strongly prefer the variadic macro over my begin/end API, I can live with
> > that though, it is still better than '|'ing a mask with functions that have side
> > effects.



Best regards,
	Maxim Levitsky