[Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible

Roger Pau Monne posted 1 patch 4 years, 4 months ago
Patches applied successfully
git fetch https://github.com/patchew-project/xen tags/patchew/20191224124453.47183-1-roger.pau@citrix.com
[Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible
Posted by Roger Pau Monne 4 years, 4 months ago
If the flush mask matches the mask of online CPUs, use the APIC ALLBUT
destination shorthand in order to send an IPI to all CPUs on the
system except the current one. The shorthand can only be safely used
when no CPU hotplug or unplug operations are taking place, when there
are no offline CPUs (or any offline CPUs have been onlined and
parked), and finally when all CPUs in the system have been accounted
for (ie: the number of CPUs doesn't exceed NR_CPUS and APIC IDs are
below MAX_APICS).

This is especially beneficial when using the PV shim, since the
shorthand avoids performing an APIC register write (or multiple ones
if using xAPIC mode) for each destination in the flush mask.

The lock time on a 32 vCPU guest using the shim without the shorthand
is:

Global lock flush_lock: addr=ffff82d0804b21c0, lockval=f602f602, not locked
  lock:228455938(79406065573135), block:205908580(556416605761539)

Average lock time: 347577ns

While the same guest using the shorthand:

Global lock flush_lock: addr=ffff82d0804b41c0, lockval=d9c4d9bc, cpu=12
  lock:1890775(416719148054), block:1663958(2500161282949)

Average lock time: 220395ns

Approximately a 1/3 improvement in the lock time.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/acpi/boot.c  |  1 +
 xen/arch/x86/mpparse.c    |  5 +++++
 xen/arch/x86/smp.c        | 41 ++++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/smp.h |  2 ++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/acpi/boot.c b/xen/arch/x86/acpi/boot.c
index 15542a9bdf..88e1a89ff0 100644
--- a/xen/arch/x86/acpi/boot.c
+++ b/xen/arch/x86/acpi/boot.c
@@ -103,6 +103,7 @@ acpi_parse_x2apic(struct acpi_subtable_header *header, const unsigned long end)
 			       processor->lapic_flags & ACPI_MADT_ENABLED
 			       ? KERN_WARNING "WARNING: " : KERN_INFO,
 			       processor->local_apic_id, processor->uid);
+		cpu_overflow = true;
 		/*
 		 * Must not return an error here, to prevent
 		 * acpi_table_parse_entries() from terminating early.
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index f057d9162f..8d7739fbf4 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -66,6 +66,9 @@ static unsigned int __initdata disabled_cpus;
 /* Bitmask of physically existing CPUs */
 physid_mask_t phys_cpu_present_map;
 
+/* Record whether CPUs haven't been added due to overflows. */
+bool __read_mostly cpu_overflow;
+
 void __init set_nr_cpu_ids(unsigned int max_cpus)
 {
 	unsigned int tot_cpus = num_processors + disabled_cpus;
@@ -160,6 +163,7 @@ static int MP_processor_info_x(struct mpc_config_processor *m,
 		printk_once(XENLOG_WARNING
 			    "WARNING: NR_CPUS limit of %u reached - ignoring further processors\n",
 			    nr_cpu_ids);
+		cpu_overflow = true;
 		return -ENOSPC;
 	}
 
@@ -167,6 +171,7 @@ static int MP_processor_info_x(struct mpc_config_processor *m,
 	    && genapic.name == apic_default.name) {
 		printk_once(XENLOG_WARNING
 			    "WARNING: CPUs limit of 8 reached - ignoring futher processors\n");
+		cpu_overflow = true;
 		return -ENOSPC;
 	}
 
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 6fb39a0a24..427c33db9d 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -8,6 +8,7 @@
  *	later.
  */
 
+#include <xen/cpu.h>
 #include <xen/irq.h>
 #include <xen/sched.h>
 #include <xen/delay.h>
@@ -123,6 +124,11 @@ void send_IPI_self_legacy(uint8_t vector)
     __default_send_IPI_shortcut(APIC_DEST_SELF, vector, APIC_DEST_PHYSICAL);
 }
 
+static void send_IPI_allbutself(unsigned int vector)
+{
+    __default_send_IPI_shortcut(APIC_DEST_ALLBUT, vector, APIC_DEST_PHYSICAL);
+}
+
 void send_IPI_mask_flat(const cpumask_t *cpumask, int vector)
 {
     unsigned long mask = cpumask_bits(cpumask)[0];
@@ -227,14 +233,47 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
     if ( (flags & ~FLUSH_ORDER_MASK) &&
          !cpumask_subset(mask, cpumask_of(cpu)) )
     {
+        bool cpus_locked = false;
+
         spin_lock(&flush_lock);
         cpumask_and(&flush_cpumask, mask, &cpu_online_map);
         cpumask_clear_cpu(cpu, &flush_cpumask);
         flush_va      = va;
         flush_flags   = flags;
-        send_IPI_mask(&flush_cpumask, INVALIDATE_TLB_VECTOR);
+
+        /*
+         * Prevent any CPU hot{un}plug while sending the IPIs if we are to use
+         * a shorthand, also refuse to use a shorthand if not all CPUs are
+         * online or have been parked.
+         */
+        if ( system_state > SYS_STATE_smp_boot && !cpu_overflow &&
+             (cpus_locked = get_cpu_maps()) &&
+             (park_offline_cpus ||
+              cpumask_equal(&cpu_online_map, &cpu_present_map)) )
+        {
+            cpumask_copy(this_cpu(scratch_cpumask), &cpu_online_map);
+            cpumask_clear_cpu(cpu, this_cpu(scratch_cpumask));
+        }
+        else
+        {
+            if ( cpus_locked )
+            {
+                put_cpu_maps();
+                cpus_locked = false;
+            }
+            cpumask_clear(this_cpu(scratch_cpumask));
+        }
+
+        if ( cpumask_equal(&flush_cpumask, this_cpu(scratch_cpumask)) )
+            send_IPI_allbutself(INVALIDATE_TLB_VECTOR);
+        else
+            send_IPI_mask(&flush_cpumask, INVALIDATE_TLB_VECTOR);
+
         while ( !cpumask_empty(&flush_cpumask) )
             cpu_relax();
+
+        if ( cpus_locked )
+            put_cpu_maps();
         spin_unlock(&flush_lock);
     }
 }
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index dbeed2fd41..3df4185744 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -84,6 +84,8 @@ extern cpumask_t **socket_cpumask;
 #define get_cpu_current(cpu) \
     (get_cpu_info_from_stack((unsigned long)stack_base[cpu])->current_vcpu)
 
+extern bool cpu_overflow;
+
 #endif /* !__ASSEMBLY__ */
 
 #endif
-- 
2.24.1
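Stripped of Xen's cpumask helpers, the gate the flush_area_mask() hunk above adds boils down to the following sketch (a hypothetical standalone model; the names and boolean parameters are illustrative, not the actual Xen interfaces):

```c
#include <stdbool.h>

/* Model of the condition added in flush_area_mask(): the ALLBUT
 * shorthand is only used when AP bringup has finished, no CPU was
 * dropped for exceeding NR_CPUS/MAX_APICS, the CPU maps trylock was
 * obtained (so no hotplug is in flight), all present CPUs are online
 * or parked, and the flush mask covers every online CPU except the
 * sender. Otherwise the code falls back to send_IPI_mask(). */
bool use_allbut_shorthand(bool smp_boot_done, bool cpu_overflow,
                          bool got_cpu_maps, bool online_or_parked,
                          bool mask_is_all_but_self)
{
    return smp_boot_done && !cpu_overflow && got_cpu_maps &&
           online_or_parked && mask_is_all_but_self;
}
```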


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible
Posted by Andrew Cooper 4 years, 3 months ago
On 24/12/2019 12:44, Roger Pau Monne wrote:
> If the flush mask matches the mask of online CPUs, use the APIC ALLBUT
> destination shorthand in order to send an IPI to all CPUs on the
> system except the current one. The shorthand can only be safely used
> when no CPU hotplug or unplug operations are taking place, when there
> are no offline CPUs (or any offline CPUs have been onlined and
> parked), and finally when all CPUs in the system have been accounted
> for (ie: the number of CPUs doesn't exceed NR_CPUS and APIC IDs are
> below MAX_APICS).
>
> This is especially beneficial when using the PV shim, since the
> shorthand avoids performing an APIC register write (or multiple ones
> if using xAPIC mode) for each destination in the flush mask.
>
> The lock time on a 32 vCPU guest using the shim without the shorthand
> is:
>
> Global lock flush_lock: addr=ffff82d0804b21c0, lockval=f602f602, not locked
>   lock:228455938(79406065573135), block:205908580(556416605761539)
>
> Average lock time: 347577ns
>
> While the same guest using the shorthand:
>
> Global lock flush_lock: addr=ffff82d0804b41c0, lockval=d9c4d9bc, cpu=12
>   lock:1890775(416719148054), block:1663958(2500161282949)
>
> Average lock time: 220395ns
>
> Approximately a 1/3 improvement in the lock time.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

While these are good stats, I'm somewhat hesitant about hacking this in
like this.  For one, it is a substantial amount of ad-hoc logic in
flush_area_mask().

Shorthand safety really should be part of the apic driver, not part of
the TLB logic.

~Andrew

Re: [Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible
Posted by Roger Pau Monné 4 years, 3 months ago
On Fri, Dec 27, 2019 at 03:03:57PM +0000, Andrew Cooper wrote:
> On 24/12/2019 12:44, Roger Pau Monne wrote:
> > If the flush mask matches the mask of online CPUs, use the APIC ALLBUT
> > destination shorthand in order to send an IPI to all CPUs on the
> > system except the current one. The shorthand can only be safely used
> > when no CPU hotplug or unplug operations are taking place, when there
> > are no offline CPUs (or any offline CPUs have been onlined and
> > parked), and finally when all CPUs in the system have been accounted
> > for (ie: the number of CPUs doesn't exceed NR_CPUS and APIC IDs are
> > below MAX_APICS).
> >
> > This is especially beneficial when using the PV shim, since the
> > shorthand avoids performing an APIC register write (or multiple ones
> > if using xAPIC mode) for each destination in the flush mask.
> >
> > The lock time on a 32 vCPU guest using the shim without the shorthand
> > is:
> >
> > Global lock flush_lock: addr=ffff82d0804b21c0, lockval=f602f602, not locked
> >   lock:228455938(79406065573135), block:205908580(556416605761539)
> >
> > Average lock time: 347577ns
> >
> > While the same guest using the shorthand:
> >
> > Global lock flush_lock: addr=ffff82d0804b41c0, lockval=d9c4d9bc, cpu=12
> >   lock:1890775(416719148054), block:1663958(2500161282949)
> >
> > Average lock time: 220395ns
> >
> > Approximately a 1/3 improvement in the lock time.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> While these are good stats, I'm somewhat hesitant about hacking this in
> like this.  For one, it is a substantial amount of ad-hoc logic in
> flush_area_mask()
> 
> Shorthand safety really should be part of the apic driver, not part of
> the TLB logic.

Yes, I've coded it this way because there are already similar APIC
hooks (ie: send_IPI_self). I can merge the shorthand functionality
with send_IPI_mask if that's preferred.

Thanks, Roger.

Re: [Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible
Posted by Jan Beulich 4 years, 3 months ago
On 24.12.2019 13:44, Roger Pau Monne wrote:
> @@ -227,14 +233,47 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
>      if ( (flags & ~FLUSH_ORDER_MASK) &&
>           !cpumask_subset(mask, cpumask_of(cpu)) )
>      {
> +        bool cpus_locked = false;
> +
>          spin_lock(&flush_lock);
>          cpumask_and(&flush_cpumask, mask, &cpu_online_map);
>          cpumask_clear_cpu(cpu, &flush_cpumask);
>          flush_va      = va;
>          flush_flags   = flags;
> -        send_IPI_mask(&flush_cpumask, INVALIDATE_TLB_VECTOR);
> +
> +        /*
> +         * Prevent any CPU hot{un}plug while sending the IPIs if we are to use
> +         * a shorthand, also refuse to use a shorthand if not all CPUs are
> +         * online or have been parked.
> +         */
> +        if ( system_state > SYS_STATE_smp_boot && !cpu_overflow &&
> +             (cpus_locked = get_cpu_maps()) &&
> +             (park_offline_cpus ||

Why is it relevant whether we park offline CPUs, or whether we've
even brought up all of the ones a system has? An IPI, in particular
a broadcast one, shouldn't have any issue getting delivered if some
of the nominal recipients don't listen, should it? (The use of
cpu_online_map that was already there above is a sign - but not a
proof, as it may itself be buggy - that the set of online CPUs
fluctuating behind this function's back ought to not be a problem.)

Further a question on lock nesting: Since the commit message
doesn't say anything in this regard, did you check there are no
TLB flush invocations with the get_cpu_maps() lock held? Even if
you did and even if there are none, I think the function should
then get a comment attached to the effect of this lock order
inversion risk. (For example, it isn't obvious to me that no user
of stop_machine() would ever want to do any kind of TLB flushing.)

Overall I wonder whether your goal couldn't be achieved without
the extra locking and without the special conditions.

Jan

Re: [Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible
Posted by Andrew Cooper 4 years, 3 months ago
On 03/01/2020 12:08, Jan Beulich wrote:
> On 24.12.2019 13:44, Roger Pau Monne wrote:
>> @@ -227,14 +233,47 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
>>      if ( (flags & ~FLUSH_ORDER_MASK) &&
>>           !cpumask_subset(mask, cpumask_of(cpu)) )
>>      {
>> +        bool cpus_locked = false;
>> +
>>          spin_lock(&flush_lock);
>>          cpumask_and(&flush_cpumask, mask, &cpu_online_map);
>>          cpumask_clear_cpu(cpu, &flush_cpumask);
>>          flush_va      = va;
>>          flush_flags   = flags;
>> -        send_IPI_mask(&flush_cpumask, INVALIDATE_TLB_VECTOR);
>> +
>> +        /*
>> +         * Prevent any CPU hot{un}plug while sending the IPIs if we are to use
>> +         * a shorthand, also refuse to use a shorthand if not all CPUs are
>> +         * online or have been parked.
>> +         */
>> +        if ( system_state > SYS_STATE_smp_boot && !cpu_overflow &&
>> +             (cpus_locked = get_cpu_maps()) &&
>> +             (park_offline_cpus ||
> Why is it relevant whether we park offline CPUs, or whether we've
> even brought up all of the ones a system has? An IPI, in particular
> a broadcast one, shouldn't have any issue getting delivered if some
> of the nominal recipients don't listen, should it?

TGLX had a hard time making shorthands work correctly on Linux.
ISTR target CPUs in wait-for-SIPI may cause the source side of the IPI
to hang indefinitely, or yield an APIC Send error.

Linux maintains a bitmap of "CPUs which have booted once and we know can
safely handle IPIs", and doesn't permit any broadcast shorthands until
this matches the cpus_available mask.
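The Linux policy Andrew describes could be modeled roughly as follows (an illustrative sketch; the names are hypothetical and this is not the actual Linux implementation):

```c
#include <stdbool.h>

#define MAX_CPUS 64

/* CPUs known to have booted at least once, and therefore known to be
 * able to safely handle IPIs. */
static bool booted_once[MAX_CPUS];
/* CPUs physically present in the system. */
static bool cpu_present[MAX_CPUS];

/* Broadcast shorthands are only permitted once every present CPU has
 * booted, so no target can still be in wait-for-SIPI state when a
 * broadcast IPI arrives. */
static bool shorthand_safe(void)
{
    for (unsigned int i = 0; i < MAX_CPUS; i++)
        if (cpu_present[i] && !booted_once[i])
            return false;
    return true;
}
```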

~Andrew

Re: [Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible
Posted by Roger Pau Monné 4 years, 3 months ago
On Fri, Jan 03, 2020 at 01:08:20PM +0100, Jan Beulich wrote:
> On 24.12.2019 13:44, Roger Pau Monne wrote:
> > @@ -227,14 +233,47 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
> >      if ( (flags & ~FLUSH_ORDER_MASK) &&
> >           !cpumask_subset(mask, cpumask_of(cpu)) )
> >      {
> > +        bool cpus_locked = false;
> > +
> >          spin_lock(&flush_lock);
> >          cpumask_and(&flush_cpumask, mask, &cpu_online_map);
> >          cpumask_clear_cpu(cpu, &flush_cpumask);
> >          flush_va      = va;
> >          flush_flags   = flags;
> > -        send_IPI_mask(&flush_cpumask, INVALIDATE_TLB_VECTOR);
> > +
> > +        /*
> > +         * Prevent any CPU hot{un}plug while sending the IPIs if we are to use
> > +         * a shorthand, also refuse to use a shorthand if not all CPUs are
> > +         * online or have been parked.
> > +         */
> > +        if ( system_state > SYS_STATE_smp_boot && !cpu_overflow &&
> > +             (cpus_locked = get_cpu_maps()) &&
> > +             (park_offline_cpus ||
> 
> Why is it relevant whether we park offline CPUs, or whether we've
> even brought up all of the ones a system has? An IPI, in particular
> a broadcast one, shouldn't have any issue getting delivered if some
> of the nominal recipients don't listen, should it? (The use of
> cpu_online_map that was already there above is a sign - but not a
> proof, as it may itself be buggy - that the set of online CPUs
> fluctuating behind this function's back ought to not be a problem.)

I've tried it myself, and if not all CPUs are onlined when the
shorthand is used the box just reboots. This matches the description
of the Linux shorthand implementation at:

https://lwn.net/Articles/793065/

> Further a question on lock nesting: Since the commit message
> doesn't say anything in this regard, did you check there are no
> TLB flush invocations with the get_cpu_maps() lock held?

The CPU maps lock is a recursive one, so it should be fine to attempt
a TLB flush with the lock already held.

> Even if
> you did and even if there are none, I think the function should
> then get a comment attached to the effect of this lock order
> inversion risk. (For example, it isn't obvious to me that no user
> of stop_machine() would ever want to do any kind of TLB flushing.)
> 
> Overall I wonder whether your goal couldn't be achieved without
> the extra locking and without the special conditions.

Hm, this so far has worked fine on all the boxes that I've tried.
I'm happy to change it to a simpler approach, but I think the
conditions and locking are required for this to work correctly.

Thanks, Roger.

Re: [Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible
Posted by Jan Beulich 4 years, 3 months ago
On 03.01.2020 13:34, Roger Pau Monné wrote:
> On Fri, Jan 03, 2020 at 01:08:20PM +0100, Jan Beulich wrote:
>> On 24.12.2019 13:44, Roger Pau Monne wrote:
>> Further a question on lock nesting: Since the commit message
>> doesn't say anything in this regard, did you check there are no
>> TLB flush invocations with the get_cpu_maps() lock held?
> 
> The CPU maps lock is a recursive one, so it should be fine to attempt
> a TLB flush with the lock already held.

When already held by the same CPU - sure. It being a recursive
one (which I paid attention to when writing my earlier reply)
doesn't make it (together with any other one) immune against
ABBA deadlocks, though.

>> Even if
>> you did and even if there are none, I think the function should
>> then get a comment attached to the effect of this lock order
>> inversion risk. (For example, it isn't obvious to me that no user
>> of stop_machine() would ever want to do any kind of TLB flushing.)
>>
>> Overall I wonder whether your goal couldn't be achieved without
>> the extra locking and without the special conditions.
> 
> Hm, this so far has worked fine on all the boxes that I've tried.
> I'm happy to change it to a simpler approach, but I think the
> conditions and locking are required for this to work correctly.

Which might then indicate said pre-existing use of cpu_online_map
to be a (latent?) problem.

Jan

Re: [Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible
Posted by Roger Pau Monné 4 years, 3 months ago
On Fri, Jan 03, 2020 at 01:55:51PM +0100, Jan Beulich wrote:
> On 03.01.2020 13:34, Roger Pau Monné wrote:
> > On Fri, Jan 03, 2020 at 01:08:20PM +0100, Jan Beulich wrote:
> >> On 24.12.2019 13:44, Roger Pau Monne wrote:
> >> Further a question on lock nesting: Since the commit message
> >> doesn't say anything in this regard, did you check there are no
> >> TLB flush invocations with the get_cpu_maps() lock held?
> > 
> > The CPU maps lock is a recursive one, so it should be fine to attempt
> > a TLB flush with the lock already held.
> 
> When already held by the same CPU - sure. It being a recursive
> one (which I paid attention to when writing my earlier reply)
> doesn't make it (together with any other one) immune against
> ABBA deadlocks, though.

There's no possibility of a deadlock here because get_cpu_maps does a
trylock, so if another cpu is holding the lock the flush will just
fallback to not using the shorthand.
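The trylock behaviour being relied on here can be sketched with a minimal model (illustrative names, not the actual get_cpu_maps() implementation):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag cpu_maps_lock = ATOMIC_FLAG_INIT;

/* Model of get_cpu_maps(): a trylock either takes the lock or fails
 * immediately, so the caller never blocks waiting for another CPU and
 * cannot take part in an ABBA cycle through this lock. */
static bool try_get_cpu_maps(void)
{
    return !atomic_flag_test_and_set(&cpu_maps_lock);
}

static void put_cpu_maps_model(void)
{
    atomic_flag_clear(&cpu_maps_lock);
}

/* A failed trylock means hotplug (or another flush) holds the lock;
 * the flush then falls back to per-CPU IPIs instead of waiting. */
```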

> >> Even if
> >> you did and even if there are none, I think the function should
> >> then get a comment attached to the effect of this lock order
> >> inversion risk. (For example, it isn't obvious to me that no user
> >> of stop_machine() would ever want to do any kind of TLB flushing.)
> >>
> >> Overall I wonder whether your goal couldn't be achieved without
> >> the extra locking and without the special conditions.
> > 
> > Hm, this so far has worked fine on all the boxes that I've tried.
> > I'm happy to change it to a simpler approach, but I think the
> > conditions and locking are required for this to work correctly.
> 
> Which might then indicate said pre-existing use of cpu_online_map
> to be a (latent?) problem.

Hm, maybe it could be a problem when offlining a CPU. I assume this is
not an issue so far because there are no global TLB flushes with a
mask containing a CPU that is being offlined.

Regarding the patch itself, do you think the shorthand logic should be
added to send_IPI_mask, or would you rather only use the shorthand for
the TLB flushes?

Thanks, Roger.

Re: [Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible
Posted by Jan Beulich 4 years, 3 months ago
On 08.01.2020 14:30, Roger Pau Monné  wrote:
> On Fri, Jan 03, 2020 at 01:55:51PM +0100, Jan Beulich wrote:
>> On 03.01.2020 13:34, Roger Pau Monné wrote:
>>> On Fri, Jan 03, 2020 at 01:08:20PM +0100, Jan Beulich wrote:
>>>> On 24.12.2019 13:44, Roger Pau Monne wrote:
>>>> Further a question on lock nesting: Since the commit message
>>>> doesn't say anything in this regard, did you check there are no
>>>> TLB flush invocations with the get_cpu_maps() lock held?
>>>
>>> The CPU maps lock is a recursive one, so it should be fine to attempt
>>> a TLB flush with the lock already held.
>>
>> When already held by the same CPU - sure. It being a recursive
>> one (which I paid attention to when writing my earlier reply)
>> doesn't make it (together with any other one) immune against
>> ABBA deadlocks, though.
> 
> There's no possibility of a deadlock here because get_cpu_maps does a
> trylock, so if another cpu is holding the lock the flush will just
> fallback to not using the shorthand.

Well, with the _exact_ arrangements (flush_lock used only in one
place, and cpu_add_remove_lock only used in ways similar to how it
is used now), there's no such risk, I agree. But there's nothing
at all making sure this doesn't change. Hence, as said, at the very
least this needs reasoning about in the description (or a code
comment).

>>>> Even if
>>>> you did and even if there are none, I think the function should
>>>> then get a comment attached to the effect of this lock order
>>>> inversion risk. (For example, it isn't obvious to me that no user
>>>> of stop_machine() would ever want to do any kind of TLB flushing.)
>>>>
>>>> Overall I wonder whether your goal couldn't be achieved without
>>>> the extra locking and without the special conditions.
>>>
>>> Hm, this so far has worked fine on all the boxes that I've tried.
>>> I'm happy to change it to a simpler approach, but I think the
>>> conditions and locking are required for this to work correctly.
>>
>> Which might then indicate said pre-existing use of cpu_online_map
>> to be a (latent?) problem.
> 
> Hm, maybe it could be a problem when offlining a CPU. I assume this is
> not an issue so far because there are no global TLB flushes with a
> mask contaning a CPU that is being offlined.
> 
> Regarding the patch itself, do you think the shorthand logic should be
> added to send_IPI_mask, or would you rather only use the shorthand for
> the TLB flushes?

If it could be generalize to all IPI sending, this would of course
seem preferable.

Jan

Re: [Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible
Posted by Roger Pau Monné 4 years, 3 months ago
On Wed, Jan 08, 2020 at 02:54:57PM +0100, Jan Beulich wrote:
> On 08.01.2020 14:30, Roger Pau Monné  wrote:
> > On Fri, Jan 03, 2020 at 01:55:51PM +0100, Jan Beulich wrote:
> >> On 03.01.2020 13:34, Roger Pau Monné wrote:
> >>> On Fri, Jan 03, 2020 at 01:08:20PM +0100, Jan Beulich wrote:
> >>>> On 24.12.2019 13:44, Roger Pau Monne wrote:
> >>>> Further a question on lock nesting: Since the commit message
> >>>> doesn't say anything in this regard, did you check there are no
> >>>> TLB flush invocations with the get_cpu_maps() lock held?
> >>>
> >>> The CPU maps lock is a recursive one, so it should be fine to attempt
> >>> a TLB flush with the lock already held.
> >>
> >> When already held by the same CPU - sure. It being a recursive
> >> one (which I paid attention to when writing my earlier reply)
> >> doesn't make it (together with any other one) immune against
> >> ABBA deadlocks, though.
> > 
> > There's no possibility of a deadlock here because get_cpu_maps does a
> > trylock, so if another cpu is holding the lock the flush will just
> > fallback to not using the shorthand.
> 
> Well, with the _exact_ arrangements (flush_lock used only in one
> place, and cpu_add_remove_lock only used in ways similar to how it
> is used now), there's no such risk, I agree. But there's nothing
> at all making sure this doesn't change. Hence, as said, at the very
> least this needs reasoning about in the description (or a code
> comment).

I'm afraid you will have to bear with me, but I'm still not sure how
the addition of get_cpu_maps in flush_area_mask can lead to deadlocks.
As said above get_cpu_maps does a trylock, which means that it will
never deadlock, and that's the only way to lock cpu_add_remove_lock.

Thanks, Roger.

Re: [Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible
Posted by Jan Beulich 4 years, 3 months ago
On 08.01.2020 19:14, Roger Pau Monné wrote:
> On Wed, Jan 08, 2020 at 02:54:57PM +0100, Jan Beulich wrote:
>> On 08.01.2020 14:30, Roger Pau Monné  wrote:
>>> On Fri, Jan 03, 2020 at 01:55:51PM +0100, Jan Beulich wrote:
>>>> On 03.01.2020 13:34, Roger Pau Monné wrote:
>>>>> On Fri, Jan 03, 2020 at 01:08:20PM +0100, Jan Beulich wrote:
>>>>>> On 24.12.2019 13:44, Roger Pau Monne wrote:
>>>>>> Further a question on lock nesting: Since the commit message
>>>>>> doesn't say anything in this regard, did you check there are no
>>>>>> TLB flush invocations with the get_cpu_maps() lock held?
>>>>>
>>>>> The CPU maps lock is a recursive one, so it should be fine to attempt
>>>>> a TLB flush with the lock already held.
>>>>
>>>> When already held by the same CPU - sure. It being a recursive
>>>> one (which I paid attention to when writing my earlier reply)
>>>> doesn't make it (together with any other one) immune against
>>>> ABBA deadlocks, though.
>>>
>>> There's no possibility of a deadlock here because get_cpu_maps does a
>>> trylock, so if another cpu is holding the lock the flush will just
>>> fallback to not using the shorthand.
>>
>> Well, with the _exact_ arrangements (flush_lock used only in one
>> place, and cpu_add_remove_lock only used in ways similar to how it
>> is used now), there's no such risk, I agree. But there's nothing
>> at all making sure this doesn't change. Hence, as said, at the very
>> least this needs reasoning about in the description (or a code
>> comment).
> 
> I'm afraid you will have to bear with me, but I'm still not sure how
> the addition of get_cpu_maps in flush_area_mask can lead to deadlocks.
> As said above get_cpu_maps does a trylock, which means that it will
> never deadlock, and that's the only way to lock cpu_add_remove_lock.

That's why I said "cpu_add_remove_lock only used in ways similar to
how it is used now" - any non-trylock addition would break the
assumptions you make afaict. And you can't really expect people
adding another (slightly different) use of an existing lock to be
aware they now need to check whether this introduces deadlock
scenarios on unrelated pre-existing code paths. Hence my request to
at least discuss this in the description (raising awareness, and
hence allowing reviewers to judge whether further precautionary
measures should be taken).

Jan

Re: [Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible
Posted by Roger Pau Monné 4 years, 3 months ago
On Thu, Jan 09, 2020 at 10:43:01AM +0100, Jan Beulich wrote:
> On 08.01.2020 19:14, Roger Pau Monné wrote:
> > On Wed, Jan 08, 2020 at 02:54:57PM +0100, Jan Beulich wrote:
> >> On 08.01.2020 14:30, Roger Pau Monné  wrote:
> >>> On Fri, Jan 03, 2020 at 01:55:51PM +0100, Jan Beulich wrote:
> >>>> On 03.01.2020 13:34, Roger Pau Monné wrote:
> >>>>> On Fri, Jan 03, 2020 at 01:08:20PM +0100, Jan Beulich wrote:
> >>>>>> On 24.12.2019 13:44, Roger Pau Monne wrote:
> >>>>>> Further a question on lock nesting: Since the commit message
> >>>>>> doesn't say anything in this regard, did you check there are no
> >>>>>> TLB flush invocations with the get_cpu_maps() lock held?
> >>>>>
> >>>>> The CPU maps lock is a recursive one, so it should be fine to attempt
> >>>>> a TLB flush with the lock already held.
> >>>>
> >>>> When already held by the same CPU - sure. It being a recursive
> >>>> one (which I paid attention to when writing my earlier reply)
> >>>> doesn't make it (together with any other one) immune against
> >>>> ABBA deadlocks, though.
> >>>
> >>> There's no possibility of a deadlock here because get_cpu_maps does a
> >>> trylock, so if another cpu is holding the lock the flush will just
> >>> fallback to not using the shorthand.
> >>
> >> Well, with the _exact_ arrangements (flush_lock used only in one
> >> place, and cpu_add_remove_lock only used in ways similar to how it
> >> is used now), there's no such risk, I agree. But there's nothing
> >> at all making sure this doesn't change. Hence, as said, at the very
> >> least this needs reasoning about in the description (or a code
> >> comment).
> > 
> > I'm afraid you will have to bear with me, but I'm still not sure how
> > the addition of get_cpu_maps in flush_area_mask can lead to deadlocks.
> > As said above get_cpu_maps does a trylock, which means that it will
> > never deadlock, and that's the only way to lock cpu_add_remove_lock.
> 
> That's why I said "cpu_add_remove_lock only used in ways similar to
> how it is used now" - any non-trylock addition would break the
> assumptions you make afaict. And you can't really expect people
> adding another (slightly different) use of an existing lock to be
> aware they now need to check whether this introduces deadlock
> scenarios on unrelated pre-existing code paths. Hence my request to
> at least discuss this in the description (raising awareness, and
> hence allowing reviewers to judge whether further precautionary
> measures should be taken).

Thanks for the clarification, I did indeed misunderstand your
complaint.

Regarding the generalization of the shorthand and its integration into
send_IPI_mask, I've found an issue related to locking. send_IPI_mask
is called with interrupts both enabled and disabled, and so
get_cpu_maps panics because of the lock checking.

However, I don't think that such a panic is accurate: the logic in
check_lock specifically relates to the spinning done when acquiring a
lock, and I would say the call to check_lock in
_spin_trylock{_recursive} is not strictly needed, as the scenario
described in check_lock doesn't apply when using trylock.

So my question is whether you would be OK with removing the calls to
check_lock in the trylock variants, or whether you would rather keep
the shorthand usage limited to flush_area_mask.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86/flush: use APIC ALLBUT destination shorthand when possible
Posted by Roger Pau Monné 4 years, 3 months ago
On Thu, Jan 09, 2020 at 12:24:05PM +0100, Roger Pau Monné wrote:
> On Thu, Jan 09, 2020 at 10:43:01AM +0100, Jan Beulich wrote:
> > On 08.01.2020 19:14, Roger Pau Monné wrote:
> > > On Wed, Jan 08, 2020 at 02:54:57PM +0100, Jan Beulich wrote:
> > >> On 08.01.2020 14:30, Roger Pau Monné  wrote:
> > >>> On Fri, Jan 03, 2020 at 01:55:51PM +0100, Jan Beulich wrote:
> > >>>> On 03.01.2020 13:34, Roger Pau Monné wrote:
> > >>>>> On Fri, Jan 03, 2020 at 01:08:20PM +0100, Jan Beulich wrote:
> > >>>>>> On 24.12.2019 13:44, Roger Pau Monne wrote:
> > >>>>>> Further a question on lock nesting: Since the commit message
> > >>>>>> doesn't say anything in this regard, did you check there are no
> > >>>>>> TLB flush invocations with the get_cpu_maps() lock held?
> > >>>>>
> > >>>>> The CPU maps lock is a recursive one, so it should be fine to attempt
> > >>>>> a TLB flush with the lock already held.
> > >>>>
> > >>>> When already held by the same CPU - sure. It being a recursive
> > >>>> one (which I paid attention to when writing my earlier reply)
> > >>>> doesn't make it (together with any other one) immune against
> > >>>> ABBA deadlocks, though.
> > >>>
> > >>> There's no possibility of a deadlock here because get_cpu_maps does a
> > >>> trylock, so if another cpu is holding the lock the flush will just
> > >>> fall back to not using the shorthand.
> > >>
> > >> Well, with the _exact_ arrangements (flush_lock used only in one
> > >> place, and cpu_add_remove_lock only used in ways similar to how it
> > >> is used now), there's no such risk, I agree. But there's nothing
> > >> at all making sure this doesn't change. Hence, as said, at the very
> > >> least this needs reasoning about in the description (or a code
> > >> comment).
> > > 
> > > I'm afraid you will have to bear with me, but I'm still not sure how
> > > the addition of get_cpu_maps in flush_area_mask can lead to deadlocks.
> > > As said above get_cpu_maps does a trylock, which means that it will
> > > never deadlock, and that's the only way to lock cpu_add_remove_lock.
> > 
> > That's why I said "cpu_add_remove_lock only used in ways similar to
> > how it is used now" - any non-trylock addition would break the
> > assumptions you make afaict. And you can't really expect people
> > adding another (slightly different) use of an existing lock to be
> > aware they now need to check whether this introduces deadlock
> > scenarios on unrelated pre-existing code paths. Hence my request to
> > at least discuss this in the description (raising awareness, and
> > hence allowing reviewers to judge whether further precautionary
> > measures should be taken).
> 
> Thanks for the clarification, I did indeed misunderstand your
> complaint.
> 
> Regarding the generalization of the shorthand and its integration into
> send_IPI_mask, I've found an issue related to locking. send_IPI_mask
> is called with interrupts both enabled and disabled, and so
> get_cpu_maps panics because of the lock checking.
> 
> However, I don't think that such a panic is accurate: the logic in
> check_lock specifically relates to the spinning done when acquiring a
> lock, and I would say the call to check_lock in
> _spin_trylock{_recursive} is not strictly needed, as the scenario
> described in check_lock doesn't apply when using trylock.

Never mind, this is not actually true. You can still trigger the
deadlock if you mix trylock with regular locking, so I guess I will
keep the shorthand usage limited to flush_area_mask unless I can make
all the callers of send_IPI_mask use the same IRQ status.

Roger.
