[PATCH v2 2/2] ppc/spapr: Initialize max_cpus limit to SPAPR_NR_IPIS.

Harsh Prateek Bora posted 2 patches 1 year ago
Maintainers: Nicholas Piggin <npiggin@gmail.com>, Daniel Henrique Barboza <danielhb413@gmail.com>, "Cédric Le Goater" <clg@kaod.org>, David Gibson <david@gibson.dropbear.id.au>, Harsh Prateek Bora <harshpb@linux.ibm.com>
Posted by Harsh Prateek Bora 1 year ago
Initialize the machine-specific max_cpus limit to the number of CPU IPIs
available. A maxcpus value above 4096 and up to 8192 fails with an "IRQ
not free" error due to the XIVE/XICS limitation, and a value beyond 8192
hits an assert in tcg_region_init or spapr_xive_claim_irq.

Logs:

Without patch fix:

[root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
qemu-system-ppc64: IRQ 4096 is not free
[root@host build]#

On LPAR:
[root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
**
ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
(region_size >= 2 * page_size)
Bail out! ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
(region_size >= 2 * page_size)
Aborted (core dumped)
[root@host build]#

On x86:
[root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
qemu-system-ppc64: ../hw/intc/spapr_xive.c:596: spapr_xive_claim_irq:
Assertion `lisn < xive->nr_irqs' failed.
Aborted (core dumped)
[root@host build]#

With patch fix:
[root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
qemu-system-ppc64: Invalid SMP CPUs 4097. The max CPUs supported by
machine 'pseries-8.2' is 4096
[root@host build]#

Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
---
 hw/ppc/spapr.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index df09aa9d6a..0de11a4458 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4647,13 +4647,10 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     mc->block_default_type = IF_SCSI;
 
     /*
-     * Setting max_cpus to INT32_MAX. Both KVM and TCG max_cpus values
-     * should be limited by the host capability instead of hardcoded.
-     * max_cpus for KVM guests will be checked in kvm_init(), and TCG
-     * guests are welcome to have as many CPUs as the host are capable
-     * of emulate.
+     * While KVM determines max cpus in kvm_init() using kvm_max_vcpus(),
+     * in TCG the limit is restricted by the range of CPU IPIs available.
      */
-    mc->max_cpus = INT32_MAX;
+    mc->max_cpus = SPAPR_NR_IPIS;
 
     mc->no_parallel = 1;
     mc->default_boot_order = "";
-- 
2.39.3
Re: [PATCH v2 2/2] ppc/spapr: Initialize max_cpus limit to SPAPR_NR_IPIS.
Posted by Philippe Mathieu-Daudé 1 year ago
Hi Harsh,

On 22/11/23 10:28, Harsh Prateek Bora wrote:
> Initialize the machine specific max_cpus limit as per the maximum range
> of CPU IPIs available. Keeping between 4096 to 8192 will throw IRQ not
> free error due to XIVE/XICS limitation and keeping beyond 8192 will hit
> assert in tcg_region_init or spapr_xive_claim_irq.
> 
> Logs:
> 
> Without patch fix:
> 
> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
> qemu-system-ppc64: IRQ 4096 is not free
> [root@host build]#
> 
> On LPAR:
> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
> **
> ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
> (region_size >= 2 * page_size)
> Bail out! ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
> (region_size >= 2 * page_size)
> Aborted (core dumped)
> [root@host build]#
> 
> On x86:
> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
> qemu-system-ppc64: ../hw/intc/spapr_xive.c:596: spapr_xive_claim_irq:
> Assertion `lisn < xive->nr_irqs' failed.
> Aborted (core dumped)
> [root@host build]#
> 
> With patch fix:
> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
> qemu-system-ppc64: Invalid SMP CPUs 4097. The max CPUs supported by
> machine 'pseries-8.2' is 4096
> [root@host build]#
> 
> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
> ---
>   hw/ppc/spapr.c | 9 +++------
>   1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index df09aa9d6a..0de11a4458 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -4647,13 +4647,10 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>       mc->block_default_type = IF_SCSI;
>   
>       /*
> -     * Setting max_cpus to INT32_MAX. Both KVM and TCG max_cpus values
> -     * should be limited by the host capability instead of hardcoded.
> -     * max_cpus for KVM guests will be checked in kvm_init(), and TCG
> -     * guests are welcome to have as many CPUs as the host are capable
> -     * of emulate.
> +     * While KVM determines max cpus in kvm_init() using kvm_max_vcpus(),
> +     * In TCG the limit is restricted by the range of CPU IPIs available.
>        */
> -    mc->max_cpus = INT32_MAX;
> +    mc->max_cpus = SPAPR_NR_IPIS;

Is SPAPR_NR_IPIS also the upper limit for KVM?

>       mc->no_parallel = 1;
>       mc->default_boot_order = "";
Re: [PATCH v2 2/2] ppc/spapr: Initialize max_cpus limit to SPAPR_NR_IPIS.
Posted by Harsh Prateek Bora 1 year ago
Hi Philippe,

On 11/22/23 16:46, Philippe Mathieu-Daudé wrote:
> Hi Harsh,
> 
> On 22/11/23 10:28, Harsh Prateek Bora wrote:
>> Initialize the machine specific max_cpus limit as per the maximum range
>> of CPU IPIs available. Keeping between 4096 to 8192 will throw IRQ not
>> free error due to XIVE/XICS limitation and keeping beyond 8192 will hit
>> assert in tcg_region_init or spapr_xive_claim_irq.
>>
>> Logs:
>>
>> Without patch fix:
>>
>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
>> qemu-system-ppc64: IRQ 4096 is not free
>> [root@host build]#
>>
>> On LPAR:
>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
>> **
>> ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
>> (region_size >= 2 * page_size)
>> Bail out! ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
>> (region_size >= 2 * page_size)
>> Aborted (core dumped)
>> [root@host build]#
>>
>> On x86:
>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
>> qemu-system-ppc64: ../hw/intc/spapr_xive.c:596: spapr_xive_claim_irq:
>> Assertion `lisn < xive->nr_irqs' failed.
>> Aborted (core dumped)
>> [root@host build]#
>>
>> With patch fix:
>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
>> qemu-system-ppc64: Invalid SMP CPUs 4097. The max CPUs supported by
>> machine 'pseries-8.2' is 4096
>> [root@host build]#
>>
>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>> ---
>>   hw/ppc/spapr.c | 9 +++------
>>   1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index df09aa9d6a..0de11a4458 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -4647,13 +4647,10 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>       mc->block_default_type = IF_SCSI;
>>       /*
>> -     * Setting max_cpus to INT32_MAX. Both KVM and TCG max_cpus values
>> -     * should be limited by the host capability instead of hardcoded.
>> -     * max_cpus for KVM guests will be checked in kvm_init(), and TCG
>> -     * guests are welcome to have as many CPUs as the host are capable
>> -     * of emulate.
>> +     * While KVM determines max cpus in kvm_init() using kvm_max_vcpus(),
>> +     * In TCG the limit is restricted by the range of CPU IPIs available.
>>        */
>> -    mc->max_cpus = INT32_MAX;
>> +    mc->max_cpus = SPAPR_NR_IPIS;
> 
> Is SPAPR_NR_IPIS also the upper limit for KVM?

In KVM mode, the limit is restricted to what is supported by KVM which 
is checked using kvm_ioctl via wrappers in kvm_init and appears to be 
evaluating to 2048. So, having a larger default works for both cases.
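
To make the two-level check concrete, here is a minimal sketch in C. It is a model, not QEMU code: the MODEL_* names and model_maxcpus_ok() are hypothetical, the machine limit of 4096 comes from the "With patch fix" log, and 2048 is the value KVM appeared to report in my testing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; only the numeric values come from this thread. */
#define MODEL_MACHINE_MAX_CPUS 4096u  /* machine limit after this patch */
#define MODEL_KVM_MAX_VCPUS    2048u  /* what KVM appeared to report */

/*
 * Return true if `maxcpus` passes both checks. The machine limit applies
 * to every accelerator; the KVM limit applies only when use_kvm is set.
 */
static inline bool model_maxcpus_ok(unsigned maxcpus, bool use_kvm,
                                    unsigned kvm_limit)
{
    assert(maxcpus > 0);  /* a machine needs at least one CPU */

    if (maxcpus > MODEL_MACHINE_MAX_CPUS) {
        return false;     /* rejected by the machine class limit */
    }
    if (use_kvm && maxcpus > kvm_limit) {
        return false;     /* rejected by the accelerator limit */
    }
    return true;
}
```

In this model the lower of the two limits always wins, so a machine default above the KVM value changes nothing for KVM guests, while TCG guests are bounded by the machine limit alone.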

regards,
Harsh

> 
>>       mc->no_parallel = 1;
>>       mc->default_boot_order = "";
> 

Re: [PATCH v2 2/2] ppc/spapr: Initialize max_cpus limit to SPAPR_NR_IPIS.
Posted by Cédric Le Goater 1 year ago
On 11/23/23 06:03, Harsh Prateek Bora wrote:
> Hi Philippe,
> 
> On 11/22/23 16:46, Philippe Mathieu-Daudé wrote:
>> Hi Harsh,
>>
>> On 22/11/23 10:28, Harsh Prateek Bora wrote:
>>> Initialize the machine specific max_cpus limit as per the maximum range
>>> of CPU IPIs available. Keeping between 4096 to 8192 will throw IRQ not
>>> free error due to XIVE/XICS limitation and keeping beyond 8192 will hit
>>> assert in tcg_region_init or spapr_xive_claim_irq.
>>>
>>> Logs:
>>>
>>> Without patch fix:
>>>
>>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
>>> qemu-system-ppc64: IRQ 4096 is not free
>>> [root@host build]#
>>>
>>> On LPAR:
>>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
>>> **
>>> ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
>>> (region_size >= 2 * page_size)
>>> Bail out! ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
>>> (region_size >= 2 * page_size)
>>> Aborted (core dumped)
>>> [root@host build]#
>>>
>>> On x86:
>>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
>>> qemu-system-ppc64: ../hw/intc/spapr_xive.c:596: spapr_xive_claim_irq:
>>> Assertion `lisn < xive->nr_irqs' failed.
>>> Aborted (core dumped)
>>> [root@host build]#
>>>
>>> With patch fix:
>>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
>>> qemu-system-ppc64: Invalid SMP CPUs 4097. The max CPUs supported by
>>> machine 'pseries-8.2' is 4096
>>> [root@host build]#
>>>
>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>>> ---
>>>   hw/ppc/spapr.c | 9 +++------
>>>   1 file changed, 3 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index df09aa9d6a..0de11a4458 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -4647,13 +4647,10 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>       mc->block_default_type = IF_SCSI;
>>>       /*
>>> -     * Setting max_cpus to INT32_MAX. Both KVM and TCG max_cpus values
>>> -     * should be limited by the host capability instead of hardcoded.
>>> -     * max_cpus for KVM guests will be checked in kvm_init(), and TCG
>>> -     * guests are welcome to have as many CPUs as the host are capable
>>> -     * of emulate.
>>> +     * While KVM determines max cpus in kvm_init() using kvm_max_vcpus(),
>>> +     * In TCG the limit is restricted by the range of CPU IPIs available.
>>>        */
>>> -    mc->max_cpus = INT32_MAX;
>>> +    mc->max_cpus = SPAPR_NR_IPIS;
>>
>> Is SPAPR_NR_IPIS also the upper limit for KVM?
> 
> In KVM mode, the limit is restricted to what is supported by KVM which is checked using kvm_ioctl via wrappers in kvm_init and appears to be evaluating to 2048. So, having a larger default works for both cases.

QEMU sets the number of CPUs with KVM ioctls:

	KVM_DEV_XICS_NR_SERVERS
	KVM_DEV_XIVE_NR_SERVERS

This is important for the host since the interrupt controller is then
configured with these values through FW.

The default value is indeed 2K but this is large and wastes a lot of
HW resources, page mappings, etc.

Thanks,

C.




Re: [PATCH v2 2/2] ppc/spapr: Initialize max_cpus limit to SPAPR_NR_IPIS.
Posted by Philippe Mathieu-Daudé 1 year ago
On 23/11/23 09:47, Cédric Le Goater wrote:
> On 11/23/23 06:03, Harsh Prateek Bora wrote:
>> Hi Philippe,
>>
>> On 11/22/23 16:46, Philippe Mathieu-Daudé wrote:
>>> Hi Harsh,
>>>
>>> On 22/11/23 10:28, Harsh Prateek Bora wrote:
>>>> Initialize the machine specific max_cpus limit as per the maximum range
>>>> of CPU IPIs available. Keeping between 4096 to 8192 will throw IRQ not
>>>> free error due to XIVE/XICS limitation and keeping beyond 8192 will hit
>>>> assert in tcg_region_init or spapr_xive_claim_irq.
>>>>
>>>> Logs:
>>>>
>>>> Without patch fix:
>>>>
>>>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
>>>> qemu-system-ppc64: IRQ 4096 is not free
>>>> [root@host build]#
>>>>
>>>> On LPAR:
>>>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
>>>> **
>>>> ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
>>>> (region_size >= 2 * page_size)
>>>> Bail out! ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
>>>> (region_size >= 2 * page_size)
>>>> Aborted (core dumped)
>>>> [root@host build]#
>>>>
>>>> On x86:
>>>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
>>>> qemu-system-ppc64: ../hw/intc/spapr_xive.c:596: spapr_xive_claim_irq:
>>>> Assertion `lisn < xive->nr_irqs' failed.
>>>> Aborted (core dumped)
>>>> [root@host build]#
>>>>
>>>> With patch fix:
>>>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
>>>> qemu-system-ppc64: Invalid SMP CPUs 4097. The max CPUs supported by
>>>> machine 'pseries-8.2' is 4096
>>>> [root@host build]#
>>>>
>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>>>> ---
>>>>   hw/ppc/spapr.c | 9 +++------
>>>>   1 file changed, 3 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>> index df09aa9d6a..0de11a4458 100644
>>>> --- a/hw/ppc/spapr.c
>>>> +++ b/hw/ppc/spapr.c
>>>> @@ -4647,13 +4647,10 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>       mc->block_default_type = IF_SCSI;
>>>>       /*
>>>> -     * Setting max_cpus to INT32_MAX. Both KVM and TCG max_cpus values
>>>> -     * should be limited by the host capability instead of hardcoded.
>>>> -     * max_cpus for KVM guests will be checked in kvm_init(), and TCG
>>>> -     * guests are welcome to have as many CPUs as the host are capable
>>>> -     * of emulate.
>>>> +     * While KVM determines max cpus in kvm_init() using kvm_max_vcpus(),
>>>> +     * In TCG the limit is restricted by the range of CPU IPIs available.
>>>>        */
>>>> -    mc->max_cpus = INT32_MAX;
>>>> +    mc->max_cpus = SPAPR_NR_IPIS;
>>>
>>> Is SPAPR_NR_IPIS also the upper limit for KVM?
>>
>> In KVM mode, the limit is restricted to what is supported by KVM which 
>> is checked using kvm_ioctl via wrappers in kvm_init and appears to be 
>> evaluating to 2048. So, having a larger default works for both cases.
> 
> QEMU sets the number of CPUs with KVM ioctls:
> 
>      KVM_DEV_XICS_NR_SERVERS
>      KVM_DEV_XIVE_NR_SERVERS
> 
> This is important for the host since the interrupt controller is then
> configured with these values through FW.
> 
> The default value is indeed 2K but this is large and wastes a lot of
> HW resources, page mappings, etc.

I was wondering if one day KVM raises its limit to 5k, then the
machine will clamp to 4k, and someone will have to debug that.
Not a big deal ;)


Re: [PATCH v2 2/2] ppc/spapr: Initialize max_cpus limit to SPAPR_NR_IPIS.
Posted by Cédric Le Goater 1 year ago
On 11/23/23 11:26, Philippe Mathieu-Daudé wrote:
> On 23/11/23 09:47, Cédric Le Goater wrote:
>> On 11/23/23 06:03, Harsh Prateek Bora wrote:
>>> Hi Philippe,
>>>
>>> On 11/22/23 16:46, Philippe Mathieu-Daudé wrote:
>>>> Hi Harsh,
>>>>
>>>> On 22/11/23 10:28, Harsh Prateek Bora wrote:
>>>>> Initialize the machine specific max_cpus limit as per the maximum range
>>>>> of CPU IPIs available. Keeping between 4096 to 8192 will throw IRQ not
>>>>> free error due to XIVE/XICS limitation and keeping beyond 8192 will hit
>>>>> assert in tcg_region_init or spapr_xive_claim_irq.
>>>>>
>>>>> Logs:
>>>>>
>>>>> Without patch fix:
>>>>>
>>>>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
>>>>> qemu-system-ppc64: IRQ 4096 is not free
>>>>> [root@host build]#
>>>>>
>>>>> On LPAR:
>>>>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
>>>>> **
>>>>> ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
>>>>> (region_size >= 2 * page_size)
>>>>> Bail out! ERROR:../tcg/region.c:774:tcg_region_init: assertion failed:
>>>>> (region_size >= 2 * page_size)
>>>>> Aborted (core dumped)
>>>>> [root@host build]#
>>>>>
>>>>> On x86:
>>>>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=8193
>>>>> qemu-system-ppc64: ../hw/intc/spapr_xive.c:596: spapr_xive_claim_irq:
>>>>> Assertion `lisn < xive->nr_irqs' failed.
>>>>> Aborted (core dumped)
>>>>> [root@host build]#
>>>>>
>>>>> With patch fix:
>>>>> [root@host build]# qemu-system-ppc64 -accel tcg -smp 10,maxcpus=4097
>>>>> qemu-system-ppc64: Invalid SMP CPUs 4097. The max CPUs supported by
>>>>> machine 'pseries-8.2' is 4096
>>>>> [root@host build]#
>>>>>
>>>>> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>>>>> ---
>>>>>   hw/ppc/spapr.c | 9 +++------
>>>>>   1 file changed, 3 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>> index df09aa9d6a..0de11a4458 100644
>>>>> --- a/hw/ppc/spapr.c
>>>>> +++ b/hw/ppc/spapr.c
>>>>> @@ -4647,13 +4647,10 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>>>>       mc->block_default_type = IF_SCSI;
>>>>>       /*
>>>>> -     * Setting max_cpus to INT32_MAX. Both KVM and TCG max_cpus values
>>>>> -     * should be limited by the host capability instead of hardcoded.
>>>>> -     * max_cpus for KVM guests will be checked in kvm_init(), and TCG
>>>>> -     * guests are welcome to have as many CPUs as the host are capable
>>>>> -     * of emulate.
>>>>> +     * While KVM determines max cpus in kvm_init() using kvm_max_vcpus(),
>>>>> +     * In TCG the limit is restricted by the range of CPU IPIs available.
>>>>>        */
>>>>> -    mc->max_cpus = INT32_MAX;
>>>>> +    mc->max_cpus = SPAPR_NR_IPIS;
>>>>
>>>> Is SPAPR_NR_IPIS also the upper limit for KVM?
>>>
>>> In KVM mode, the limit is restricted to what is supported by KVM which is checked using kvm_ioctl via wrappers in kvm_init and appears to be evaluating to 2048. So, having a larger default works for both cases.
>>
>> QEMU sets the number of CPUs with KVM ioctls:
>>
>>      KVM_DEV_XICS_NR_SERVERS
>>      KVM_DEV_XIVE_NR_SERVERS
>>
>> This is important for the host since the interrupt controller is then
>> configured with these values through FW.
>>
>> The default value is indeed 2K but this is large and wastes a lot of
>> HW resources, page mappings, etc.
> 
> I was wondering if one day KVM raises its limit to 5k, then the
> machine will clamp to 4k, and someone will have to debug that.
> Not a big deal ;)

Changing the number of CPUs will require some work in Linux first.
I think we (as ppc) tried to push it to 8K but there was some
pushback upstream. Anyhow, there are also DT issues, memory layout, etc.
It won't happen without being noticed I am sure :)

Anyhow, if we need more IPIs to support more CPUs, the IRQ number
space will need an extra range after the device range to preserve
compatibility.
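
A rough model of that number space, as a sketch only: the Model* names are hypothetical, the 4096 IPI count comes from the patch log, and the size of the device range is an assumption chosen so that the space ends at 8192. It is not the QEMU implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout; only the 4096 IPI count is taken from the thread. */
#define MODEL_NR_IPIS     4096u  /* IPI range: IRQ 0..4095, one per CPU */
#define MODEL_NR_DEV_IRQS 4096u  /* assumed size of the device range */
#define MODEL_NR_IRQS     (MODEL_NR_IPIS + MODEL_NR_DEV_IRQS)

typedef struct {
    bool claimed[MODEL_NR_IRQS];
} ModelIrqMap;

/* Claim one IRQ number; fails if out of range or already taken. */
static inline bool model_claim_irq(ModelIrqMap *map, unsigned irq)
{
    if (irq >= MODEL_NR_IRQS || map->claimed[irq]) {
        return false;
    }
    map->claimed[irq] = true;
    return true;
}

/*
 * Claim one IPI per CPU, using IRQ number == CPU index.
 * Returns how many IPIs were claimed before the first failure.
 */
static inline unsigned model_claim_ipis(ModelIrqMap *map, unsigned nr_cpus)
{
    unsigned cpu;

    for (cpu = 0; cpu < nr_cpus; cpu++) {
        if (!model_claim_irq(map, cpu)) {
            break;
        }
    }
    return cpu;
}
```

With a device interrupt already sitting at the first slot after the IPI range, the IPI for CPU index 4096 collides with it, which mirrors the "IRQ 4096 is not free" message. Growing the IPI range in place would shift the device numbers, hence the extra range placed after the device range.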

Thanks,

C.