[PATCH] x86: avoid wrong use of all-but-self IPI shorthand

Jan Beulich posted 1 patch 2 years, 4 months ago
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/935688c8-c971-53c7-4c27-f34a841168ec@suse.com
[PATCH] x86: avoid wrong use of all-but-self IPI shorthand
Posted by Jan Beulich 2 years, 4 months ago
With "nosmp" I did observe a flood of "APIC error on CPU0: 04(04), Send
accept error" log messages on an AMD system. And rightly so - nothing
excludes the use of the shorthand in send_IPI_mask() in this case. Set
"unaccounted_cpus" to "true" also when command line restrictions are the
cause.

Note that PV-shim mode is unaffected by this change, first and foremost
because "nosmp" and "maxcpus=" are ignored in this case.

Fixes: 5500d265a2a8 ("x86/smp: use APIC ALLBUT destination shorthand when possible")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
While in "nosmp" mode it's probably benign that we switch to the bigsmp
APIC driver simply because there are more than 8 physical CPUs, I
suppose that's inefficient when "maxcpus=" with a value between 2 and 8
(inclusive) is in use. Question is whether that's worthwhile to find a
solution for.

--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -84,9 +84,14 @@ void __init set_nr_cpu_ids(unsigned int
 	if (!park_offline_cpus)
 		tot_cpus = max_cpus;
 	nr_cpu_ids = min(tot_cpus, NR_CPUS + 0u);
-	if (park_offline_cpus && nr_cpu_ids < num_processors)
-		printk(XENLOG_WARNING "SMP: Cannot bring up %u further CPUs\n",
-		       num_processors - nr_cpu_ids);
+	if (nr_cpu_ids < num_processors)
+	{
+		unaccounted_cpus = true;
+		if (park_offline_cpus)
+			printk(XENLOG_WARNING
+			       "SMP: Cannot bring up %u further CPUs\n",
+			       num_processors - nr_cpu_ids);
+	}
 
 #ifndef nr_cpumask_bits
 	nr_cpumask_bits = ROUNDUP(nr_cpu_ids, BITS_PER_LONG);
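
For readers following the thread, the effect of the hunk above can be sketched as a small standalone model. This is not Xen code: the `model_` names, the `struct cpu_setup` type, and the `MODEL_NR_CPUS` value are hypothetical, `tot_cpus` is passed in as a parameter because its computation lies outside the hunk, and the shorthand check is reduced to the single flag the patch touches (the real send_IPI_mask() presumably tests further conditions).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build-time CPU limit; stands in for NR_CPUS. */
#define MODEL_NR_CPUS 256u

/* Hypothetical container for the two values the hunk computes. */
struct cpu_setup {
    unsigned int nr_cpu_ids;
    bool unaccounted_cpus;
};

/*
 * Model of the patched hunk only. max_cpus is the command line limit
 * ("nosmp" behaving like "maxcpus=1"), num_processors the number of
 * CPUs found, tot_cpus whatever the surrounding code computed.
 */
static inline struct cpu_setup
model_set_nr_cpu_ids(unsigned int max_cpus, unsigned int num_processors,
                     unsigned int tot_cpus, bool park_offline_cpus)
{
    struct cpu_setup s = { 0, false };

    assert(max_cpus > 0);

    if (!park_offline_cpus)
        tot_cpus = max_cpus;
    s.nr_cpu_ids = tot_cpus < MODEL_NR_CPUS ? tot_cpus : MODEL_NR_CPUS;

    /*
     * After the patch the flag is set whenever some CPUs are left out,
     * whether or not offline CPUs get parked; only the warning remains
     * conditional on park_offline_cpus.
     */
    if (s.nr_cpu_ids < num_processors)
        s.unaccounted_cpus = true;

    return s;
}

/* Simplified gate: the all-but-self shorthand reaches every CPU in the
 * system, so it is only safe when none are unaccounted for. */
static inline bool model_may_use_allbut(const struct cpu_setup *s)
{
    return !s->unaccounted_cpus;
}
```

With 16 CPUs found and a limit of 1, the model sets `unaccounted_cpus` and rejects the shorthand, which corresponds to the "nosmp" case described in the commit message. With no restriction (limit 16, 16 CPUs found) the flag stays clear.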


Re: [PATCH] x86: avoid wrong use of all-but-self IPI shorthand
Posted by Andrew Cooper 2 years, 4 months ago
On 08/12/2021 11:47, Jan Beulich wrote:
> With "nosmp" I did observe a flood of "APIC error on CPU0: 04(04), Send
> accept error" log messages on an AMD system. And rightly so - nothing
> excludes the use of the shorthand in send_IPI_mask() in this case. Set
> "unaccounted_cpus" to "true" also when command line restrictions are the
> cause.
>
> Note that PV-shim mode is unaffected by this change, first and foremost
> because "nosmp" and "maxcpus=" are ignored in this case.
>
> Fixes: 5500d265a2a8 ("x86/smp: use APIC ALLBUT destination shorthand when possible")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

> ---
> While in "nosmp" mode it's probably benign that we switch to the bigsmp
> APIC driver simply because there are more than 8 physical CPUs, I
> suppose that's inefficient when "maxcpus=" with a value between 2 and 8
> (inclusive) is in use. Question is whether that's worthwhile to find a
> solution for.

Honestly, the concept of "nosmp" needs deleting.  We inherited it from
Linux and it wasn't terribly appropriate even back then.

Nowadays, even if we happen to boot with 1 cpu, there are normal things
we talk to (the IOMMUs most obviously) which are smp-like.


None of these command line restricted settings can be used in
production, because neither Intel nor AMD supports them, and both
require us to boot all logical processors.  Everything playing in this
area is a maintenance burden only.

~Andrew

Re: [PATCH] x86: avoid wrong use of all-but-self IPI shorthand
Posted by Jan Beulich 2 years, 4 months ago
On 08.12.2021 15:16, Andrew Cooper wrote:
> On 08/12/2021 11:47, Jan Beulich wrote:
>> With "nosmp" I did observe a flood of "APIC error on CPU0: 04(04), Send
>> accept error" log messages on an AMD system. And rightly so - nothing
>> excludes the use of the shorthand in send_IPI_mask() in this case. Set
>> "unaccounted_cpus" to "true" also when command line restrictions are the
>> cause.
>>
>> Note that PV-shim mode is unaffected by this change, first and foremost
>> because "nosmp" and "maxcpus=" are ignored in this case.
>>
>> Fixes: 5500d265a2a8 ("x86/smp: use APIC ALLBUT destination shorthand when possible")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

Thanks.

>> ---
>> While in "nosmp" mode it's probably benign that we switch to the bigsmp
>> APIC driver simply because there are more than 8 physical CPUs, I
>> suppose that's inefficient when "maxcpus=" with a value between 2 and 8
>> (inclusive) is in use. Question is whether that's worthwhile to find a
>> solution for.
> 
> Honestly, the concept of "nosmp" needs deleting.  We inherited it from
> Linux and it wasn't terribly appropriate even back then.
> 
> Nowadays, even if we happen to boot with 1 cpu, there are normal things
> we talk to (the IOMMUs most obviously) which are smp-like.
> 
> 
> None of these command line restricted settings can be used in
> production, because neither Intel nor AMD supports them, and both
> require us to boot all logical processors.  Everything playing in this
> area is a maintenance burden only.

But you realize that "nosmp" (nowadays at least) is merely a shorthand for
"maxcpus=1"? I don't think you mean to suggest deleting that option too?
What we did remove long ago, matching what you say, was CONFIG_SMP.

One aspect of my consideration, which I realize only now, is that we'd
then have a way to test "flat" mode even on larger systems. This may be
relevant with there being fewer and fewer systems with no more than 8
CPUs (threads), and hence that mode probably already hasn't been tested
much.

Jan