[PATCH v2 13/23] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode

Sean Christopherson posted 23 patches 3 years, 7 months ago
There is a newer version of this series
[PATCH v2 13/23] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode
Posted by Sean Christopherson 3 years, 7 months ago
Disable the optimized APIC logical map if multiple vCPUs are aliased to
the same logical ID.  Architecturally, all CPUs whose logical ID matches
the MDA are supposed to receive the interrupt; overwriting existing map
entries can result in missed IPIs.

Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/lapic.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6b2f538b8fd0..75748c380ceb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -303,12 +303,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
 		if (!mask)
 			continue;
 
-		if (!is_power_of_2(mask)) {
+		ldr = ffs(mask) - 1;
+		if (!is_power_of_2(mask) || cluster[ldr]) {
 			new->mode = KVM_APIC_MODE_XAPIC_FLAT |
 				    KVM_APIC_MODE_XAPIC_CLUSTER;
 			continue;
 		}
-		cluster[ffs(mask) - 1] = apic;
+		cluster[ldr] = apic;
 	}
 out:
 	old = rcu_dereference_protected(kvm->arch.apic_map,
-- 
2.37.2.789.g6183377224-goog
Re: [PATCH v2 13/23] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode
Posted by Suthikulpanit, Suravee 3 years, 6 months ago
Hi Sean

On 9/2/2022 7:22 PM, Sean Christopherson wrote:
> Disable the optimized APIC logical map if multiple vCPUs are aliased to
> the same logical ID.  Architecturally, all CPUs whose logical ID matches
> the MDA are supposed to receive the interrupt; overwriting existing map
> entries can result in missed IPIs.
> 
> Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>   arch/x86/kvm/lapic.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 6b2f538b8fd0..75748c380ceb 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -303,12 +303,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>   		if (!mask)
>   			continue;
>   
> -		if (!is_power_of_2(mask)) {
> +		ldr = ffs(mask) - 1;
> +		if (!is_power_of_2(mask) || cluster[ldr]) {

Should this be checking if the cluster[ldr] is pointing to the same 
struct apic instead? For example:

		if (!is_power_of_2(mask) || cluster[ldr] != apic)

 From my observation, the kvm_recalculate_apic_map() can be called many 
times, and the cluster[ldr] could have already been assigned from the 
previous invocation. So, as long as it is the same, it should be okay.

Best Regards,
Suravee

>   			new->mode = KVM_APIC_MODE_XAPIC_FLAT |
>   				    KVM_APIC_MODE_XAPIC_CLUSTER;
>   			continue;
>   		}
> -		cluster[ffs(mask) - 1] = apic;
> +		cluster[ldr] = apic;
>   	}
>   out:
>   	old = rcu_dereference_protected(kvm->arch.apic_map,
Re: [PATCH v2 13/23] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode
Posted by Sean Christopherson 3 years, 6 months ago
On Tue, Sep 13, 2022, Suthikulpanit, Suravee wrote:
> Hi Sean
> 
> On 9/2/2022 7:22 PM, Sean Christopherson wrote:
> > Disable the optimized APIC logical map if multiple vCPUs are aliased to
> > the same logical ID.  Architecturally, all CPUs whose logical ID matches
> > the MDA are supposed to receive the interrupt; overwriting existing map
> > entries can result in missed IPIs.
> > 
> > Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >   arch/x86/kvm/lapic.c | 5 +++--
> >   1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 6b2f538b8fd0..75748c380ceb 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -303,12 +303,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> >   		if (!mask)
> >   			continue;
> > -		if (!is_power_of_2(mask)) {
> > +		ldr = ffs(mask) - 1;
> > +		if (!is_power_of_2(mask) || cluster[ldr]) {
> 
> Should this be checking if the cluster[ldr] is pointing to the same struct
> apic instead? For example:
> 
> 		if (!is_power_of_2(mask) || cluster[ldr] != apic)
> 
> From my observation, the kvm_recalculate_apic_map() can be called many
> times, and the cluster[ldr] could have already been assigned from the
> previous invocation. So, as long as it is the same, it should be okay.

No, because cluster[ldr] can never match "apic".  kvm_recalculate_apic_map()
creates and populates a _new_ kvm_apic_map every time, it doesn't do an in-place
update of the current map.

The loop containing this code is:

	kvm_for_each_vcpu(i, vcpu, kvm) {
		struct kvm_lapic *apic = vcpu->arch.apic;

		...
	}

so it's impossible for cluster[ldr] to hold the current "apic", because this is
the first and only iteration that processes the current "apic".
Re: [PATCH v2 13/23] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode
Posted by Suthikulpanit, Suravee 3 years, 6 months ago
Sean,

On 9/14/2022 2:42 AM, Sean Christopherson wrote:
> On Tue, Sep 13, 2022, Suthikulpanit, Suravee wrote:
>> Hi Sean
>>
>> On 9/2/2022 7:22 PM, Sean Christopherson wrote:
>>> Disable the optimized APIC logical map if multiple vCPUs are aliased to
>>> the same logical ID.  Architecturally, all CPUs whose logical ID matches
>>> the MDA are supposed to receive the interrupt; overwriting existing map
>>> entries can result in missed IPIs.
>>>
>>> Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>>> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
>>> ---
>>>    arch/x86/kvm/lapic.c | 5 +++--
>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index 6b2f538b8fd0..75748c380ceb 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -303,12 +303,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>>>    		if (!mask)
>>>    			continue;
>>> -		if (!is_power_of_2(mask)) {
>>> +		ldr = ffs(mask) - 1;
>>> +		if (!is_power_of_2(mask) || cluster[ldr]) {
>>
>> Should this be checking if the cluster[ldr] is pointing to the same struct
>> apic instead? For example:
>>
>> 		if (!is_power_of_2(mask) || cluster[ldr] != apic)
>>
>>  From my observation, the kvm_recalculate_apic_map() can be called many
>> times, and the cluster[ldr] could have already been assigned from the
>> previous invocation. So, as long as it is the same, it should be okay.
> 
> No, because cluster[ldr] can never match "apic".  kvm_recalculate_apic_map()
> creates and populates a _new_ kvm_apic_map every time, it doesn't do an in-place
> update of the current map.

Yes, the _new_ is getting created and initialized every time we call 
kvm_recalculate_apic_map(), and then passed into 
kvm_apic_map_get_logical_dest() along with the reference of cluster and 
mask to get populated based on the provided ldr. Please note that the 
new->phys_map[x2apic_id] is already assigned before the calling of 
kvm_apic_map_get_logical_dest(). Therefore, this check:

	if (!is_power_of_2(mask) || cluster[ldr]) {
                                     ^here
will always fail, we are setting the new->logical_mode = 
KVM_APIC_MODE_MAP_DISABLED, which causing APICv/AVIC to always be inhibited.

I do not think this logic is correct. I also add debug printf to check, 
and I never see cluster[ldr] is NULL. Here is example of the debug 
printf just before the check above.

printk("DEBUG: vcpu_id=%u, logical_mode=%#x, mask=%#x, ldr=%#x, 
cluster[ldr]=%#llx, apic=%#llx\n", vcpu->vcpu_id, new->logical_mode, 
mask, ldr, (unsigned long long) cluster[ldr], (unsigned long long) apic);

DEBUG: vcpu_id=0, logical_mode=0x3, mask=0x1, ldr=0x0, 
cluster[ldr]=0xffff8f54fe7e8000, apic=0xffff8f54fe7e8000
DEBUG: vcpu_id=1, logical_mode=0x3, mask=0x2, ldr=0x1, 
cluster[ldr]=0xffff8f5475c28000, apic=0xffff8f5475c28000
DEBUG: vcpu_id=2, logical_mode=0x3, mask=0x4, ldr=0x2, 
cluster[ldr]=0xffff8f54ac488000, apic=0xffff8f54ac488000
DEBUG: vcpu_id=3, logical_mode=0x3, mask=0x8, ldr=0x3, 
cluster[ldr]=0xffff8f550dc34200, apic=0xffff8f550dc34200
DEBUG: vcpu_id=4, logical_mode=0x3, mask=0x10, ldr=0x4, 
cluster[ldr]=0xffff8f550dd08000, apic=0xffff8f550dd08000
.....
DEBUG: vcpu_id=15, logical_mode=0x3, mask=0x8000, ldr=0xf, 
cluster[ldr]=0xffff8f54ac488200, apic=0xffff8f54ac488200
DEBUG: vcpu_id=16, logical_mode=0x3, mask=0x1, ldr=0x0, 
cluster[ldr]=0xffff8f5531ff8000, apic=0xffff8f5531ff8000
DEBUG: vcpu_id=17, logical_mode=0x3, mask=0x2, ldr=0x1, 
cluster[ldr]=0xffff8f54e87c8200, apic=0xffff8f54e87c8200
DEBUG: vcpu_id=18, logical_mode=0x3, mask=0x4, ldr=0x2, 
cluster[ldr]=0xffff8f551c870200, apic=0xffff8f551c870200

Please, lemme know if I am missing something.

Best Regards,
Suravee

> The loop containing this code is:
> 
> 	kvm_for_each_vcpu(i, vcpu, kvm) {
> 		struct kvm_lapic *apic = vcpu->arch.apic;
> 
> 		...
> 	}
> 
> so it's impossible for cluster[ldr] to hold the current "apic", because this is
> the first and only iteration that processes the current "apic".
Re: [PATCH v2 13/23] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode
Posted by Sean Christopherson 3 years, 6 months ago
On Wed, Sep 14, 2022, Suthikulpanit, Suravee wrote:
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 6b2f538b8fd0..75748c380ceb 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -303,12 +303,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> > > >    		if (!mask)
> > > >    			continue;
> > > > -		if (!is_power_of_2(mask)) {
> > > > +		ldr = ffs(mask) - 1;
> > > > +		if (!is_power_of_2(mask) || cluster[ldr]) {
> > > 
> > > Should this be checking if the cluster[ldr] is pointing to the same struct
> > > apic instead? For example:
> > > 
> > > 		if (!is_power_of_2(mask) || cluster[ldr] != apic)
> > > 
> > >  From my observation, the kvm_recalculate_apic_map() can be called many
> > > times, and the cluster[ldr] could have already been assigned from the
> > > previous invocation. So, as long as it is the same, it should be okay.
> > 
> > No, because cluster[ldr] can never match "apic".  kvm_recalculate_apic_map()
> > creates and populates a _new_ kvm_apic_map every time, it doesn't do an in-place
> > update of the current map.
> 
> Yes, the _new_ is getting created and initialized every time we call
> kvm_recalculate_apic_map(), and then passed into
> kvm_apic_map_get_logical_dest() along with the reference of cluster and mask
> to get populated based on the provided ldr. Please note that the
> new->phys_map[x2apic_id] is already assigned before the calling of

Ooh, this is what I was missing.  LDR is read-only for x2APIC, and so KVM simply
uses the phys_map and computes the phys_map indices by reversing the x2APIC=>LDR
calculation.

So it's so much not that _can_ "apic" can match "cluster[ldr]", it's actually that
"apic" _must_ match "cluster[ldr]" for this flow.  Overwriting the cluster entry
is all kinds of pointless.  It's either unnecessary (no bugs) or it breaks things
(bugs in either LDR calculation or logical dest math).

Rather than add an exception to the cluster[] check, which is very confusing for
xAPIC, the whole flow can be skipped for x2APIC, with a sanity check that the LDR
does indeed align with the x2APIC ID.

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 76a19bf1eb55..e9d7c647e8a7 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -347,6 +347,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
                }
                new->logical_mode = logical_mode;
 
+               /* I'll add a comment here. */
+               if (apic_x2apic_mode(apic)) {
+                       WARN_ON_ONCE(ldr != kvm_apic_calc_x2apic_ldr(x2apic_id));
+                       continue;
+               }
+
                if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
                                                                &cluster, &mask))) {
                        new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;

Alternatively, the x2APIC handling could be done after kvm_apic_map_get_logical_dest(),
e.g. so that KVM can sanity check mask+cluster[ldr], but that's annoying to implement
and IMO it's overkill since the we can just as easily verify the math via tests on top
of the LDR sanity check.

I'll do a better job of verifying that APICv + x2APIC yields the correct inhibits.
I verified x2APIC functional correctness, and that APICv + xAPIC yielded the correct
inhibits, but I obviously didn't verify APICv + x2APIC yielded the correct inhibits.

Thanks much!