The first cpu unplug wasn't ever supported and corresponding
monitor/qmp commands refuse to unplug it. However guest is able
to issue eject request either using following command:
# echo 1 >/sys/devices/system/cpu/cpu0/firmware_node/eject
or directly writing to cpu hotplug registers, which makes
qemu crash with SIGSEGV following back trace:
kvm_flush_coalesced_mmio_buffer ()
while (ring->first != ring->last)
...
qemu_flush_coalesced_mmio_buffer
prepare_mmio_access
flatview_read_continue
flatview_read
address_space_read_full
address_space_rw
kvm_cpu_exec(cpu!0)
qemu_kvm_cpu_thread_fn
the reason for which is that ring == KVMState::coalesced_mmio_ring
happens to be a part of 1st CPU that was uplugged by guest.
Fix it by forbidding 1st cpu unplug from guest side and in addition
remove CPU0._EJ0 ACPI method to make clear that unplug of the first
CPU is not supported.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
CCing spapr and s390x folks in case targets need to prevent 1st CPU unplug as well
CC: mst@redhat.com
CC: pbonzini@redhat.com
CC: david@gibson.dropbear.id.au
CC: groug@kaod.org
CC: david@redhat.com
CC: cohuck@redhat.com
---
hw/acpi/cpu.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 5ae595e..4bb8371 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -117,7 +117,7 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
DeviceState *dev = NULL;
HotplugHandler *hotplug_ctrl = NULL;
- if (!cdev->cpu) {
+ if (!cdev->cpu || cdev->cpu == first_cpu) {
trace_cpuhp_acpi_ejecting_invalid_cpu(cpu_st->selector);
break;
}
@@ -541,9 +541,11 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
aml_buffer(madt_buf->len, (uint8_t *)madt_buf->data)));
g_array_free(madt_buf, true);
- method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
- aml_append(method, aml_call1(CPU_EJECT_METHOD, uid));
- aml_append(dev, method);
+ if (CPU(arch_ids->cpus[i].cpu) != first_cpu) {
+ method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
+ aml_append(method, aml_call1(CPU_EJECT_METHOD, uid));
+ aml_append(dev, method);
+ }
method = aml_method("_OST", 3, AML_SERIALIZED);
aml_append(method,
--
2.7.4
On 27.08.2018 13:07, Igor Mammedov wrote:
> The first cpu unplug wasn't ever supported and corresponding
> monitor/qmp commands refuse to unplug it. However guest is able
> to issue eject request either using following command:
> # echo 1 >/sys/devices/system/cpu/cpu0/firmware_node/eject
> or directly writing to cpu hotplug registers, which makes
> qemu crash with SIGSEGV following back trace:
>
> kvm_flush_coalesced_mmio_buffer ()
> while (ring->first != ring->last)
> ...
> qemu_flush_coalesced_mmio_buffer
> prepare_mmio_access
> flatview_read_continue
> flatview_read
> address_space_read_full
> address_space_rw
> kvm_cpu_exec(cpu!0)
> qemu_kvm_cpu_thread_fn
>
> the reason for which is that ring == KVMState::coalesced_mmio_ring
> happens to be a part of 1st CPU that was uplugged by guest.
>
> Fix it by forbidding 1st cpu unplug from guest side and in addition
> remove CPU0._EJ0 ACPI method to make clear that unplug of the first
> CPU is not supported.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> CCing spapr and s390x folks in case targets need to prevent 1st CPU unplug as well
No unplug on s390x, so we're fine.
>
> CC: mst@redhat.com
> CC: pbonzini@redhat.com
> CC: david@gibson.dropbear.id.au
> CC: groug@kaod.org
> CC: david@redhat.com
> CC: cohuck@redhat.com
> ---
> hw/acpi/cpu.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index 5ae595e..4bb8371 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -117,7 +117,7 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
> DeviceState *dev = NULL;
> HotplugHandler *hotplug_ctrl = NULL;
>
> - if (!cdev->cpu) {
> + if (!cdev->cpu || cdev->cpu == first_cpu) {
> trace_cpuhp_acpi_ejecting_invalid_cpu(cpu_st->selector);
> break;
> }
> @@ -541,9 +541,11 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> aml_buffer(madt_buf->len, (uint8_t *)madt_buf->data)));
> g_array_free(madt_buf, true);
>
> - method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
> - aml_append(method, aml_call1(CPU_EJECT_METHOD, uid));
> - aml_append(dev, method);
> + if (CPU(arch_ids->cpus[i].cpu) != first_cpu) {
> + method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
> + aml_append(method, aml_call1(CPU_EJECT_METHOD, uid));
> + aml_append(dev, method);
> + }
>
> method = aml_method("_OST", 3, AML_SERIALIZED);
> aml_append(method,
>
--
Thanks,
David / dhildenb
On Mon, 27 Aug 2018 13:07:10 +0200
Igor Mammedov <imammedo@redhat.com> wrote:
> The first cpu unplug wasn't ever supported and corresponding
> monitor/qmp commands refuse to unplug it. However guest is able
> to issue eject request either using following command:
> # echo 1 >/sys/devices/system/cpu/cpu0/firmware_node/eject
I can't reproduce the issue with a pc guest and current master...
All I seem to get is an error in dmesg:
[ 97.435446] processor cpu0: Offline failed.
> or directly writing to cpu hotplug registers, which makes
> qemu crash with SIGSEGV following back trace:
>
> kvm_flush_coalesced_mmio_buffer ()
> while (ring->first != ring->last)
> ...
> qemu_flush_coalesced_mmio_buffer
> prepare_mmio_access
> flatview_read_continue
> flatview_read
> address_space_read_full
> address_space_rw
> kvm_cpu_exec(cpu!0)
> qemu_kvm_cpu_thread_fn
>
> the reason for which is that ring == KVMState::coalesced_mmio_ring
> happens to be a part of 1st CPU that was uplugged by guest.
>
> Fix it by forbidding 1st cpu unplug from guest side and in addition
> remove CPU0._EJ0 ACPI method to make clear that unplug of the first
> CPU is not supported.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
> CCing spapr and s390x folks in case targets need to prevent 1st CPU unplug as well
>
A spapr guest can _release_ the first cpu by doing something like:
# echo -n "/cpus/PowerPC,POWER8@0" > /sys/devices/system/cpu/release
But AFAIK, this doesn't unplug the cpu from a QEMU standpoint.
> CC: mst@redhat.com
> CC: pbonzini@redhat.com
> CC: david@gibson.dropbear.id.au
> CC: groug@kaod.org
> CC: david@redhat.com
> CC: cohuck@redhat.com
> ---
> hw/acpi/cpu.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> index 5ae595e..4bb8371 100644
> --- a/hw/acpi/cpu.c
> +++ b/hw/acpi/cpu.c
> @@ -117,7 +117,7 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
> DeviceState *dev = NULL;
> HotplugHandler *hotplug_ctrl = NULL;
>
> - if (!cdev->cpu) {
> + if (!cdev->cpu || cdev->cpu == first_cpu) {
> trace_cpuhp_acpi_ejecting_invalid_cpu(cpu_st->selector);
> break;
> }
> @@ -541,9 +541,11 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> aml_buffer(madt_buf->len, (uint8_t *)madt_buf->data)));
> g_array_free(madt_buf, true);
>
> - method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
> - aml_append(method, aml_call1(CPU_EJECT_METHOD, uid));
> - aml_append(dev, method);
> + if (CPU(arch_ids->cpus[i].cpu) != first_cpu) {
> + method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
> + aml_append(method, aml_call1(CPU_EJECT_METHOD, uid));
> + aml_append(dev, method);
> + }
>
> method = aml_method("_OST", 3, AML_SERIALIZED);
> aml_append(method,
On Mon, 27 Aug 2018 16:02:39 +0200
Greg Kurz <groug@kaod.org> wrote:
> On Mon, 27 Aug 2018 13:07:10 +0200
> Igor Mammedov <imammedo@redhat.com> wrote:
>
> > The first cpu unplug wasn't ever supported and corresponding
> > monitor/qmp commands refuse to unplug it. However guest is able
> > to issue eject request either using following command:
> > # echo 1 >/sys/devices/system/cpu/cpu0/firmware_node/eject
>
> I can't reproduce the issue with a pc guest and current master...
>
> All I seem to get is an error in dmesg:
>
> [ 97.435446] processor cpu0: Offline failed.
To be able to unplug CPU0 one need to set CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
(I've tested it with a RHLE7 kernel)
> > or directly writing to cpu hotplug registers, which makes
> > qemu crash with SIGSEGV following back trace:
> >
> > kvm_flush_coalesced_mmio_buffer ()
> > while (ring->first != ring->last)
> > ...
> > qemu_flush_coalesced_mmio_buffer
> > prepare_mmio_access
> > flatview_read_continue
> > flatview_read
> > address_space_read_full
> > address_space_rw
> > kvm_cpu_exec(cpu!0)
> > qemu_kvm_cpu_thread_fn
> >
> > the reason for which is that ring == KVMState::coalesced_mmio_ring
> > happens to be a part of 1st CPU that was uplugged by guest.
> >
> > Fix it by forbidding 1st cpu unplug from guest side and in addition
> > remove CPU0._EJ0 ACPI method to make clear that unplug of the first
> > CPU is not supported.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > ---
> > CCing spapr and s390x folks in case targets need to prevent 1st CPU unplug as well
> >
>
> A spapr guest can _release_ the first cpu by doing something like:
>
> # echo -n "/cpus/PowerPC,POWER8@0" > /sys/devices/system/cpu/release
>
> But AFAIK, this doesn't unplug the cpu from a QEMU standpoint.
Good,
so only x86 fix should be fixed.
> > CC: mst@redhat.com
> > CC: pbonzini@redhat.com
> > CC: david@gibson.dropbear.id.au
> > CC: groug@kaod.org
> > CC: david@redhat.com
> > CC: cohuck@redhat.com
> > ---
> > hw/acpi/cpu.c | 10 ++++++----
> > 1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
> > index 5ae595e..4bb8371 100644
> > --- a/hw/acpi/cpu.c
> > +++ b/hw/acpi/cpu.c
> > @@ -117,7 +117,7 @@ static void cpu_hotplug_wr(void *opaque, hwaddr addr, uint64_t data,
> > DeviceState *dev = NULL;
> > HotplugHandler *hotplug_ctrl = NULL;
> >
> > - if (!cdev->cpu) {
> > + if (!cdev->cpu || cdev->cpu == first_cpu) {
> > trace_cpuhp_acpi_ejecting_invalid_cpu(cpu_st->selector);
> > break;
> > }
> > @@ -541,9 +541,11 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
> > aml_buffer(madt_buf->len, (uint8_t *)madt_buf->data)));
> > g_array_free(madt_buf, true);
> >
> > - method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
> > - aml_append(method, aml_call1(CPU_EJECT_METHOD, uid));
> > - aml_append(dev, method);
> > + if (CPU(arch_ids->cpus[i].cpu) != first_cpu) {
> > + method = aml_method("_EJ0", 1, AML_NOTSERIALIZED);
> > + aml_append(method, aml_call1(CPU_EJECT_METHOD, uid));
> > + aml_append(dev, method);
> > + }
> >
> > method = aml_method("_OST", 3, AML_SERIALIZED);
> > aml_append(method,
>
On Mon, Aug 27, 2018 at 04:02:39PM +0200, Greg Kurz wrote: > On Mon, 27 Aug 2018 13:07:10 +0200 > Igor Mammedov <imammedo@redhat.com> wrote: > > > The first cpu unplug wasn't ever supported and corresponding > > monitor/qmp commands refuse to unplug it. However guest is able > > to issue eject request either using following command: > > # echo 1 >/sys/devices/system/cpu/cpu0/firmware_node/eject > > I can't reproduce the issue with a pc guest and current master... > > All I seem to get is an error in dmesg: > > [ 97.435446] processor cpu0: Offline failed. > > > or directly writing to cpu hotplug registers, which makes > > qemu crash with SIGSEGV following back trace: > > > > kvm_flush_coalesced_mmio_buffer () > > while (ring->first != ring->last) > > ... > > qemu_flush_coalesced_mmio_buffer > > prepare_mmio_access > > flatview_read_continue > > flatview_read > > address_space_read_full > > address_space_rw > > kvm_cpu_exec(cpu!0) > > qemu_kvm_cpu_thread_fn > > > > the reason for which is that ring == KVMState::coalesced_mmio_ring > > happens to be a part of 1st CPU that was uplugged by guest. > > > > Fix it by forbidding 1st cpu unplug from guest side and in addition > > remove CPU0._EJ0 ACPI method to make clear that unplug of the first > > CPU is not supported. > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > --- > > CCing spapr and s390x folks in case targets need to prevent 1st CPU unplug as well > > > > A spapr guest can _release_ the first cpu by doing something like: > > # echo -n "/cpus/PowerPC,POWER8@0" > /sys/devices/system/cpu/release > > But AFAIK, this doesn't unplug the cpu from a QEMU standpoint. Unplugging CPU 0 with device_del should be ok too. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Tue, 28 Aug 2018 10:52:37 +1000 David Gibson <david@gibson.dropbear.id.au> wrote: > On Mon, Aug 27, 2018 at 04:02:39PM +0200, Greg Kurz wrote: > > On Mon, 27 Aug 2018 13:07:10 +0200 > > Igor Mammedov <imammedo@redhat.com> wrote: > > > > > The first cpu unplug wasn't ever supported and corresponding > > > monitor/qmp commands refuse to unplug it. However guest is able > > > to issue eject request either using following command: > > > # echo 1 >/sys/devices/system/cpu/cpu0/firmware_node/eject > > > > I can't reproduce the issue with a pc guest and current master... > > > > All I seem to get is an error in dmesg: > > > > [ 97.435446] processor cpu0: Offline failed. > > > > > or directly writing to cpu hotplug registers, which makes > > > qemu crash with SIGSEGV following back trace: > > > > > > kvm_flush_coalesced_mmio_buffer () > > > while (ring->first != ring->last) > > > ... > > > qemu_flush_coalesced_mmio_buffer > > > prepare_mmio_access > > > flatview_read_continue > > > flatview_read > > > address_space_read_full > > > address_space_rw > > > kvm_cpu_exec(cpu!0) > > > qemu_kvm_cpu_thread_fn > > > > > > the reason for which is that ring == KVMState::coalesced_mmio_ring > > > happens to be a part of 1st CPU that was uplugged by guest. > > > > > > Fix it by forbidding 1st cpu unplug from guest side and in addition > > > remove CPU0._EJ0 ACPI method to make clear that unplug of the first > > > CPU is not supported. > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > > --- > > > CCing spapr and s390x folks in case targets need to prevent 1st CPU unplug as well > > > > > > > A spapr guest can _release_ the first cpu by doing something like: > > > > # echo -n "/cpus/PowerPC,POWER8@0" > /sys/devices/system/cpu/release > > > > But AFAIK, this doesn't unplug the cpu from a QEMU standpoint. > > Unplugging CPU 0 with device_del should be ok too. Do you mean real unplugging (cpu0 object freed) or just remove request?
On Tue, Aug 28, 2018 at 03:18:48PM +0200, Igor Mammedov wrote: > On Tue, 28 Aug 2018 10:52:37 +1000 > David Gibson <david@gibson.dropbear.id.au> wrote: > > > On Mon, Aug 27, 2018 at 04:02:39PM +0200, Greg Kurz wrote: > > > On Mon, 27 Aug 2018 13:07:10 +0200 > > > Igor Mammedov <imammedo@redhat.com> wrote: > > > > > > > The first cpu unplug wasn't ever supported and corresponding > > > > monitor/qmp commands refuse to unplug it. However guest is able > > > > to issue eject request either using following command: > > > > # echo 1 >/sys/devices/system/cpu/cpu0/firmware_node/eject > > > > > > I can't reproduce the issue with a pc guest and current master... > > > > > > All I seem to get is an error in dmesg: > > > > > > [ 97.435446] processor cpu0: Offline failed. > > > > > > > or directly writing to cpu hotplug registers, which makes > > > > qemu crash with SIGSEGV following back trace: > > > > > > > > kvm_flush_coalesced_mmio_buffer () > > > > while (ring->first != ring->last) > > > > ... > > > > qemu_flush_coalesced_mmio_buffer > > > > prepare_mmio_access > > > > flatview_read_continue > > > > flatview_read > > > > address_space_read_full > > > > address_space_rw > > > > kvm_cpu_exec(cpu!0) > > > > qemu_kvm_cpu_thread_fn > > > > > > > > the reason for which is that ring == KVMState::coalesced_mmio_ring > > > > happens to be a part of 1st CPU that was uplugged by guest. > > > > > > > > Fix it by forbidding 1st cpu unplug from guest side and in addition > > > > remove CPU0._EJ0 ACPI method to make clear that unplug of the first > > > > CPU is not supported. > > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > > > --- > > > > CCing spapr and s390x folks in case targets need to prevent 1st CPU unplug as well > > > > > > > > > > A spapr guest can _release_ the first cpu by doing something like: > > > > > > # echo -n "/cpus/PowerPC,POWER8@0" > /sys/devices/system/cpu/release > > > > > > But AFAIK, this doesn't unplug the cpu from a QEMU standpoint. > > > > Unplugging CPU 0 with device_del should be ok too. > Do you mean real unplugging (cpu0 object freed) or just remove request? Real unplugging should be possible. I'm not sure how thorougly it's been tested, though. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Wed, 29 Aug 2018 12:54:40 +1000 David Gibson <david@gibson.dropbear.id.au> wrote: > On Tue, Aug 28, 2018 at 03:18:48PM +0200, Igor Mammedov wrote: > > On Tue, 28 Aug 2018 10:52:37 +1000 > > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > On Mon, Aug 27, 2018 at 04:02:39PM +0200, Greg Kurz wrote: > > > > On Mon, 27 Aug 2018 13:07:10 +0200 > > > > Igor Mammedov <imammedo@redhat.com> wrote: > > > > > > > > > The first cpu unplug wasn't ever supported and corresponding > > > > > monitor/qmp commands refuse to unplug it. However guest is able > > > > > to issue eject request either using following command: > > > > > # echo 1 >/sys/devices/system/cpu/cpu0/firmware_node/eject > > > > > > > > I can't reproduce the issue with a pc guest and current master... > > > > > > > > All I seem to get is an error in dmesg: > > > > > > > > [ 97.435446] processor cpu0: Offline failed. > > > > > > > > > or directly writing to cpu hotplug registers, which makes > > > > > qemu crash with SIGSEGV following back trace: > > > > > > > > > > kvm_flush_coalesced_mmio_buffer () > > > > > while (ring->first != ring->last) > > > > > ... > > > > > qemu_flush_coalesced_mmio_buffer > > > > > prepare_mmio_access > > > > > flatview_read_continue > > > > > flatview_read > > > > > address_space_read_full > > > > > address_space_rw > > > > > kvm_cpu_exec(cpu!0) > > > > > qemu_kvm_cpu_thread_fn > > > > > > > > > > the reason for which is that ring == KVMState::coalesced_mmio_ring > > > > > happens to be a part of 1st CPU that was uplugged by guest. > > > > > > > > > > Fix it by forbidding 1st cpu unplug from guest side and in addition > > > > > remove CPU0._EJ0 ACPI method to make clear that unplug of the first > > > > > CPU is not supported. > > > > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > > > > --- > > > > > CCing spapr and s390x folks in case targets need to prevent 1st CPU unplug as well > > > > > > > > > > > > > A spapr guest can _release_ the first cpu by doing something like: > > > > > > > > # echo -n "/cpus/PowerPC,POWER8@0" > /sys/devices/system/cpu/release > > > > > > > > But AFAIK, this doesn't unplug the cpu from a QEMU standpoint. > > > > > > Unplugging CPU 0 with device_del should be ok too. > > Do you mean real unplugging (cpu0 object freed) or just remove request? > > Real unplugging should be possible. I'm not sure how thorougly it's > been tested, though. Well, common kvm code in qemu seems to be in disagreement with it as backtrace in this patch shows also usage of first_cpu macro won't survive such unplug.
On Wed, Aug 29, 2018 at 10:43:11AM +0200, Igor Mammedov wrote: > On Wed, 29 Aug 2018 12:54:40 +1000 > David Gibson <david@gibson.dropbear.id.au> wrote: > > > On Tue, Aug 28, 2018 at 03:18:48PM +0200, Igor Mammedov wrote: > > > On Tue, 28 Aug 2018 10:52:37 +1000 > > > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > On Mon, Aug 27, 2018 at 04:02:39PM +0200, Greg Kurz wrote: > > > > > On Mon, 27 Aug 2018 13:07:10 +0200 > > > > > Igor Mammedov <imammedo@redhat.com> wrote: > > > > > > > > > > > The first cpu unplug wasn't ever supported and corresponding > > > > > > monitor/qmp commands refuse to unplug it. However guest is able > > > > > > to issue eject request either using following command: > > > > > > # echo 1 >/sys/devices/system/cpu/cpu0/firmware_node/eject > > > > > > > > > > I can't reproduce the issue with a pc guest and current master... > > > > > > > > > > All I seem to get is an error in dmesg: > > > > > > > > > > [ 97.435446] processor cpu0: Offline failed. > > > > > > > > > > > or directly writing to cpu hotplug registers, which makes > > > > > > qemu crash with SIGSEGV following back trace: > > > > > > > > > > > > kvm_flush_coalesced_mmio_buffer () > > > > > > while (ring->first != ring->last) > > > > > > ... > > > > > > qemu_flush_coalesced_mmio_buffer > > > > > > prepare_mmio_access > > > > > > flatview_read_continue > > > > > > flatview_read > > > > > > address_space_read_full > > > > > > address_space_rw > > > > > > kvm_cpu_exec(cpu!0) > > > > > > qemu_kvm_cpu_thread_fn > > > > > > > > > > > > the reason for which is that ring == KVMState::coalesced_mmio_ring > > > > > > happens to be a part of 1st CPU that was uplugged by guest. > > > > > > > > > > > > Fix it by forbidding 1st cpu unplug from guest side and in addition > > > > > > remove CPU0._EJ0 ACPI method to make clear that unplug of the first > > > > > > CPU is not supported. > > > > > > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > > > > > --- > > > > > > CCing spapr and s390x folks in case targets need to prevent 1st CPU unplug as well > > > > > > > > > > > > > > > > A spapr guest can _release_ the first cpu by doing something like: > > > > > > > > > > # echo -n "/cpus/PowerPC,POWER8@0" > /sys/devices/system/cpu/release > > > > > > > > > > But AFAIK, this doesn't unplug the cpu from a QEMU standpoint. > > > > > > > > Unplugging CPU 0 with device_del should be ok too. > > > Do you mean real unplugging (cpu0 object freed) or just remove request? > > > > Real unplugging should be possible. I'm not sure how thorougly it's > > been tested, though. > Well, common kvm code in qemu seems to be in disagreement with it > as backtrace in this patch shows also usage of first_cpu macro > won't survive such unplug. Paolo - any take on this? Do we need to make cpu 0 special like this? -- MST
On Wed, 29 Aug 2018 09:15:53 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Wed, Aug 29, 2018 at 10:43:11AM +0200, Igor Mammedov wrote: > > On Wed, 29 Aug 2018 12:54:40 +1000 > > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > On Tue, Aug 28, 2018 at 03:18:48PM +0200, Igor Mammedov wrote: > > > > On Tue, 28 Aug 2018 10:52:37 +1000 > > > > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > > > On Mon, Aug 27, 2018 at 04:02:39PM +0200, Greg Kurz wrote: > > > > > > On Mon, 27 Aug 2018 13:07:10 +0200 > > > > > > Igor Mammedov <imammedo@redhat.com> wrote: > > > > > > > > > > > > > The first cpu unplug wasn't ever supported and corresponding > > > > > > > monitor/qmp commands refuse to unplug it. However guest is able > > > > > > > to issue eject request either using following command: > > > > > > > # echo 1 >/sys/devices/system/cpu/cpu0/firmware_node/eject > > > > > > > > > > > > I can't reproduce the issue with a pc guest and current master... > > > > > > > > > > > > All I seem to get is an error in dmesg: > > > > > > > > > > > > [ 97.435446] processor cpu0: Offline failed. > > > > > > > > > > > > > or directly writing to cpu hotplug registers, which makes > > > > > > > qemu crash with SIGSEGV following back trace: > > > > > > > > > > > > > > kvm_flush_coalesced_mmio_buffer () > > > > > > > while (ring->first != ring->last) > > > > > > > ... > > > > > > > qemu_flush_coalesced_mmio_buffer > > > > > > > prepare_mmio_access > > > > > > > flatview_read_continue > > > > > > > flatview_read > > > > > > > address_space_read_full > > > > > > > address_space_rw > > > > > > > kvm_cpu_exec(cpu!0) > > > > > > > qemu_kvm_cpu_thread_fn > > > > > > > > > > > > > > the reason for which is that ring == KVMState::coalesced_mmio_ring > > > > > > > happens to be a part of 1st CPU that was uplugged by guest. > > > > > > > > > > > > > > Fix it by forbidding 1st cpu unplug from guest side and in addition > > > > > > > remove CPU0._EJ0 ACPI method to make clear that unplug of the first > > > > > > > CPU is not supported. > > > > > > > > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > > > > > > --- > > > > > > > CCing spapr and s390x folks in case targets need to prevent 1st CPU unplug as well > > > > > > > > > > > > > > > > > > > A spapr guest can _release_ the first cpu by doing something like: > > > > > > > > > > > > # echo -n "/cpus/PowerPC,POWER8@0" > /sys/devices/system/cpu/release > > > > > > > > > > > > But AFAIK, this doesn't unplug the cpu from a QEMU standpoint. > > > > > > > > > > Unplugging CPU 0 with device_del should be ok too. > > > > Do you mean real unplugging (cpu0 object freed) or just remove request? > > > > > > Real unplugging should be possible. I'm not sure how thorougly it's > > > been tested, though. > > Well, common kvm code in qemu seems to be in disagreement with it > > as backtrace in this patch shows also usage of first_cpu macro > > won't survive such unplug. > > Paolo - any take on this? Do we need to make cpu 0 special like this? It probably would take a bunch of refactoring to get rid of first_cpu&co dependencies and besides of experimenting with cpu0 unplug in guest kernel there isn't any other value in it, so it probably not worth the effort. On top of that, for pc/q35 machine we would need to select boot cpu in some other way (right now it's hardwired to first_cpu). It seems that seabios might work if cpu0 isn't present, don't know about OVMF.
On 08/29/18 15:51, Igor Mammedov wrote: > On Wed, 29 Aug 2018 09:15:53 -0400 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> On Wed, Aug 29, 2018 at 10:43:11AM +0200, Igor Mammedov wrote: >>> On Wed, 29 Aug 2018 12:54:40 +1000 >>> David Gibson <david@gibson.dropbear.id.au> wrote: >>> >>>> On Tue, Aug 28, 2018 at 03:18:48PM +0200, Igor Mammedov wrote: >>>>> On Tue, 28 Aug 2018 10:52:37 +1000 >>>>> David Gibson <david@gibson.dropbear.id.au> wrote: >>>>> >>>>>> On Mon, Aug 27, 2018 at 04:02:39PM +0200, Greg Kurz wrote: >>>>>>> On Mon, 27 Aug 2018 13:07:10 +0200 >>>>>>> Igor Mammedov <imammedo@redhat.com> wrote: >>>>>>> >>>>>>>> The first cpu unplug wasn't ever supported and corresponding >>>>>>>> monitor/qmp commands refuse to unplug it. However guest is able >>>>>>>> to issue eject request either using following command: >>>>>>>> # echo 1 >/sys/devices/system/cpu/cpu0/firmware_node/eject >>>>>>> >>>>>>> I can't reproduce the issue with a pc guest and current master... >>>>>>> >>>>>>> All I seem to get is an error in dmesg: >>>>>>> >>>>>>> [ 97.435446] processor cpu0: Offline failed. >>>>>>> >>>>>>>> or directly writing to cpu hotplug registers, which makes >>>>>>>> qemu crash with SIGSEGV following back trace: >>>>>>>> >>>>>>>> kvm_flush_coalesced_mmio_buffer () >>>>>>>> while (ring->first != ring->last) >>>>>>>> ... >>>>>>>> qemu_flush_coalesced_mmio_buffer >>>>>>>> prepare_mmio_access >>>>>>>> flatview_read_continue >>>>>>>> flatview_read >>>>>>>> address_space_read_full >>>>>>>> address_space_rw >>>>>>>> kvm_cpu_exec(cpu!0) >>>>>>>> qemu_kvm_cpu_thread_fn >>>>>>>> >>>>>>>> the reason for which is that ring == KVMState::coalesced_mmio_ring >>>>>>>> happens to be a part of 1st CPU that was uplugged by guest. >>>>>>>> >>>>>>>> Fix it by forbidding 1st cpu unplug from guest side and in addition >>>>>>>> remove CPU0._EJ0 ACPI method to make clear that unplug of the first >>>>>>>> CPU is not supported. >>>>>>>> >>>>>>>> Signed-off-by: Igor Mammedov <imammedo@redhat.com> >>>>>>>> --- >>>>>>>> CCing spapr and s390x folks in case targets need to prevent 1st CPU unplug as well >>>>>>>> >>>>>>> >>>>>>> A spapr guest can _release_ the first cpu by doing something like: >>>>>>> >>>>>>> # echo -n "/cpus/PowerPC,POWER8@0" > /sys/devices/system/cpu/release >>>>>>> >>>>>>> But AFAIK, this doesn't unplug the cpu from a QEMU standpoint. >>>>>> >>>>>> Unplugging CPU 0 with device_del should be ok too. >>>>> Do you mean real unplugging (cpu0 object freed) or just remove request? >>>> >>>> Real unplugging should be possible. I'm not sure how thorougly it's >>>> been tested, though. >>> Well, common kvm code in qemu seems to be in disagreement with it >>> as backtrace in this patch shows also usage of first_cpu macro >>> won't survive such unplug. >> >> Paolo - any take on this? Do we need to make cpu 0 special like this? > It probably would take a bunch of refactoring to get rid of first_cpu&co > dependencies and besides of experimenting with cpu0 unplug in guest kernel > there isn't any other value in it, so it probably not worth the effort. > > On top of that, for pc/q35 machine we would need to select boot cpu > in some other way (right now it's hardwired to first_cpu). > > It seems that seabios might work if cpu0 isn't present, don't know about OVMF. > Sorry, I have no idea. I'm not aware of any OVMF testing with partially populated VCPU topologies. Thanks Laszlo
On Wed, Aug 29, 2018 at 03:51:38PM +0200, Igor Mammedov wrote:
> On Wed, 29 Aug 2018 09:15:53 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Wed, Aug 29, 2018 at 10:43:11AM +0200, Igor Mammedov wrote:
> > > On Wed, 29 Aug 2018 12:54:40 +1000
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >
> > > > On Tue, Aug 28, 2018 at 03:18:48PM +0200, Igor Mammedov wrote:
> > > > > On Tue, 28 Aug 2018 10:52:37 +1000
> > > > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >
> > > > > > On Mon, Aug 27, 2018 at 04:02:39PM +0200, Greg Kurz wrote:
> > > > > > > On Mon, 27 Aug 2018 13:07:10 +0200
> > > > > > > Igor Mammedov <imammedo@redhat.com> wrote:
> > > > > > >
> > > > > > > > The first cpu unplug wasn't ever supported and corresponding
> > > > > > > > monitor/qmp commands refuse to unplug it. However guest is able
> > > > > > > > to issue eject request either using following command:
> > > > > > > > # echo 1 >/sys/devices/system/cpu/cpu0/firmware_node/eject
> > > > > > >
> > > > > > > I can't reproduce the issue with a pc guest and current master...
> > > > > > >
> > > > > > > All I seem to get is an error in dmesg:
> > > > > > >
> > > > > > > [ 97.435446] processor cpu0: Offline failed.
> > > > > > >
> > > > > > > > or directly writing to cpu hotplug registers, which makes
> > > > > > > > qemu crash with SIGSEGV following back trace:
> > > > > > > >
> > > > > > > > kvm_flush_coalesced_mmio_buffer ()
> > > > > > > > while (ring->first != ring->last)
> > > > > > > > ...
> > > > > > > > qemu_flush_coalesced_mmio_buffer
> > > > > > > > prepare_mmio_access
> > > > > > > > flatview_read_continue
> > > > > > > > flatview_read
> > > > > > > > address_space_read_full
> > > > > > > > address_space_rw
> > > > > > > > kvm_cpu_exec(cpu!0)
> > > > > > > > qemu_kvm_cpu_thread_fn
> > > > > > > >
> > > > > > > > the reason for which is that ring == KVMState::coalesced_mmio_ring
> > > > > > > > happens to be a part of 1st CPU that was uplugged by guest.
> > > > > > > >
> > > > > > > > Fix it by forbidding 1st cpu unplug from guest side and in addition
> > > > > > > > remove CPU0._EJ0 ACPI method to make clear that unplug of the first
> > > > > > > > CPU is not supported.
> > > > > > > >
> > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > > > > > > ---
> > > > > > > > CCing spapr and s390x folks in case targets need to prevent 1st CPU unplug as well
> > > > > > > >
> > > > > > >
> > > > > > > A spapr guest can _release_ the first cpu by doing something like:
> > > > > > >
> > > > > > > # echo -n "/cpus/PowerPC,POWER8@0" > /sys/devices/system/cpu/release
> > > > > > >
> > > > > > > But AFAIK, this doesn't unplug the cpu from a QEMU standpoint.
> > > > > >
> > > > > > Unplugging CPU 0 with device_del should be ok too.
> > > > > Do you mean real unplugging (cpu0 object freed) or just remove request?
> > > >
> > > > Real unplugging should be possible. I'm not sure how thorougly it's
> > > > been tested, though.
> > > Well, common kvm code in qemu seems to be in disagreement with it
> > > as backtrace in this patch shows also usage of first_cpu macro
> > > won't survive such unplug.
> >
> > Paolo - any take on this? Do we need to make cpu 0 special like this?
> It probably would take a bunch of refactoring to get rid of first_cpu&co
> dependencies and besides of experimenting with cpu0 unplug in guest kernel
> there isn't any other value in it, so it probably not worth the effort.
>
> On top of that, for pc/q35 machine we would need to select boot cpu
> in some other way (right now it's hardwired to first_cpu).
>
> It seems that seabios might work if cpu0 isn't present, don't know about OVMF.
OK, I queued this, but can you please add some code comments
so we remember why it is like this if someone changes the code?
Even better maybe to add an API along the lines of:
cpu_is_hotpluggable
return cpu != first_cpu
--
MST
On Fri, 7 Sep 2018 16:52:17 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Wed, Aug 29, 2018 at 03:51:38PM +0200, Igor Mammedov wrote: > > On Wed, 29 Aug 2018 09:15:53 -0400 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Wed, Aug 29, 2018 at 10:43:11AM +0200, Igor Mammedov wrote: > > > > On Wed, 29 Aug 2018 12:54:40 +1000 > > > > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > > > On Tue, Aug 28, 2018 at 03:18:48PM +0200, Igor Mammedov wrote: > > > > > > On Tue, 28 Aug 2018 10:52:37 +1000 > > > > > > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > > > > > > > > On Mon, Aug 27, 2018 at 04:02:39PM +0200, Greg Kurz wrote: > > > > > > > > On Mon, 27 Aug 2018 13:07:10 +0200 > > > > > > > > Igor Mammedov <imammedo@redhat.com> wrote: > > > > > > > > > > > > > > > > > The first cpu unplug wasn't ever supported and corresponding > > > > > > > > > monitor/qmp commands refuse to unplug it. However guest is able > > > > > > > > > to issue eject request either using following command: > > > > > > > > > # echo 1 >/sys/devices/system/cpu/cpu0/firmware_node/eject > > > > > > > > > > > > > > > > I can't reproduce the issue with a pc guest and current master... > > > > > > > > > > > > > > > > All I seem to get is an error in dmesg: > > > > > > > > > > > > > > > > [ 97.435446] processor cpu0: Offline failed. > > > > > > > > > > > > > > > > > or directly writing to cpu hotplug registers, which makes > > > > > > > > > qemu crash with SIGSEGV following back trace: > > > > > > > > > > > > > > > > > > kvm_flush_coalesced_mmio_buffer () > > > > > > > > > while (ring->first != ring->last) > > > > > > > > > ... > > > > > > > > > qemu_flush_coalesced_mmio_buffer > > > > > > > > > prepare_mmio_access > > > > > > > > > flatview_read_continue > > > > > > > > > flatview_read > > > > > > > > > address_space_read_full > > > > > > > > > address_space_rw > > > > > > > > > kvm_cpu_exec(cpu!0) > > > > > > > > > qemu_kvm_cpu_thread_fn > > > > > > > > > > > > > > > > > > the reason for which is that ring == KVMState::coalesced_mmio_ring > > > > > > > > > happens to be a part of 1st CPU that was uplugged by guest. > > > > > > > > > > > > > > > > > > Fix it by forbidding 1st cpu unplug from guest side and in addition > > > > > > > > > remove CPU0._EJ0 ACPI method to make clear that unplug of the first > > > > > > > > > CPU is not supported. > > > > > > > > > > > > > > > > > > Signed-off-by: Igor Mammedov <imammedo@redhat.com> > > > > > > > > > --- > > > > > > > > > CCing spapr and s390x folks in case targets need to prevent 1st CPU unplug as well > > > > > > > > > > > > > > > > > > > > > > > > > A spapr guest can _release_ the first cpu by doing something like: > > > > > > > > > > > > > > > > # echo -n "/cpus/PowerPC,POWER8@0" > /sys/devices/system/cpu/release > > > > > > > > > > > > > > > > But AFAIK, this doesn't unplug the cpu from a QEMU standpoint. > > > > > > > > > > > > > > Unplugging CPU 0 with device_del should be ok too. > > > > > > Do you mean real unplugging (cpu0 object freed) or just remove request? > > > > > > > > > > Real unplugging should be possible. I'm not sure how thorougly it's > > > > > been tested, though. > > > > Well, common kvm code in qemu seems to be in disagreement with it > > > > as backtrace in this patch shows also usage of first_cpu macro > > > > won't survive such unplug. > > > > > > Paolo - any take on this? Do we need to make cpu 0 special like this? > > It probably would take a bunch of refactoring to get rid of first_cpu&co > > dependencies and besides of experimenting with cpu0 unplug in guest kernel > > there isn't any other value in it, so it probably not worth the effort. > > > > On top of that, for pc/q35 machine we would need to select boot cpu > > in some other way (right now it's hardwired to first_cpu). > > > > It seems that seabios might work if cpu0 isn't present, don't know about OVMF. > > OK, I queued this, but can you please add some code comments > so we remember why it is like this if someone changes the code? > Even better maybe to add an API along the lines of: > cpu_is_hotpluggable > return cpu != first_cpu I like an API idea, I'll try to look into how to use use it in a generic way.
© 2016 - 2025 Red Hat, Inc.