Hi Igor,

Thanks for taking the time to review the series. Please find my replies inline.

> From: qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org
> <qemu-devel-bounces+salil.mehta=huawei.com@nongnu.org> On Behalf Of Igor Mammedov
> Sent: Friday, October 18, 2024 3:46 PM
> To: Salil Mehta <salil.mehta@huawei.com>
>
> On Mon, 14 Oct 2024 20:22:01 +0100
> Salil Mehta <salil.mehta@huawei.com> wrote:
>
> > Certain CPU architecture specifications [1][2][3] prohibit changes to
> > the CPUs *presence* after the kernel has booted. This is because many
> > system initializations depend on the exact CPU count at boot time and
> > do not expect it to change afterward. For example, components like
> > interrupt controllers that are closely coupled with CPUs, or various
> > per-CPU features, may not support configuration changes once the
> > kernel has been initialized.
> >
> > This requirement poses a challenge for virtualization features like
> > vCPU hotplug. To address this, changes to the ACPI AML are necessary
> > to update the `_STA.PRES` (presence) and `_STA.ENA` (enabled) bits
> > accordingly during guest initialization, as well as when vCPUs are
> > hot-plugged or hot-unplugged. The presence of unplugged vCPUs may
> > need to be deliberately *simulated* at the ACPI level to maintain a
> > *persistent* view of vCPUs for the guest kernel.
>
> the series is peppered with *simulated* idea, which after looking at code I
> read as 'fake'.

Got it! If "simulated" doesn't convey the meaning well, we can certainly switch back to "fake", or something else. No issues at all.

> While it's obvious to author why things need to be faked at
> this time, it will be forgotten later on. And cause a lot swearing from
> whoever will have to deal with this code down the road.

Let's improve the design, then. However, the boot time cannot be negotiated. We cannot have 500 vCPUs spawned at boot time when we only need 10 vCPUs to run. That's a non-starter!
(Please have a look at the KVMForum 2023 slides for the measurements we did for 128 vCPUs.) Spawning all possible vCPUs also creates overhead during migration and prolongs the migration time.

The alternative you suggested was the first approach we experimented with, back in 2020. We then gradually moved to removing the threads while keeping the QOM vCPU objects, and later even discarded the vCPU objects corresponding to the `disabled` possible vCPUs. That change was made precisely at your request, in June of last year.

> Salil, I'm sorry that review comes out as mostly negative but for me having
> to repeat 'simulated' some many times, hints that the there is something
> wrong with design and that we should re-evaluate the approach.

A lot of effort and time has gone into this project, involving many companies. So let's proceed; but it is not fair to the many stakeholders who have spent so much of their time over the past four years to simply hear that "there is something wrong with the design." It would be very helpful if you could provide more details about your concerns, so that we can work on improving the `existing` design and help you understand our perspective. Discussion around specifics is required, please!

> ps:
> see comments on 1/4 for suggestions

Thanks for that. I've already replied with my perspective in the context of that patch. Please have a look.

Many thanks!

Best regards,
Salil.