[PATCH v2 0/4] apic: Fix migration breakage of >255 vcpus

Peter Xu posted 4 patches 4 years, 6 months ago
Failed in applying to current master (apply log)
hw/arm/stellaris.c           |  2 +-
hw/core/qdev.c               |  3 ++-
hw/display/ads7846.c         |  2 +-
hw/i2c/core.c                |  2 +-
hw/input/stellaris_input.c   |  3 ++-
hw/intc/apic_common.c        |  7 +++++--
hw/misc/max111x.c            |  2 +-
hw/net/eepro100.c            |  2 +-
hw/pci/pci.c                 |  2 +-
hw/ppc/spapr.c               |  2 +-
hw/timer/arm_timer.c         |  2 +-
hw/tpm/tpm_emulator.c        |  3 ++-
include/migration/register.h |  2 +-
include/migration/vmstate.h  |  6 ++++--
migration/savevm.c           | 40 +++++++++++++++++++++++++-----------
stubs/vmstate.c              |  2 +-
16 files changed, 53 insertions(+), 29 deletions(-)
[PATCH v2 0/4] apic: Fix migration breakage of >255 vcpus
Posted by Peter Xu 4 years, 6 months ago
v2:
- use uint32_t rather than int64_t [Juan]
- one more patch (patch 4) to check dup SaveStateEntry [Dave]
- one more patch to define a macro (patch 1) to simplify patch 2

Please review, thanks.

Peter Xu (4):
  migration: Define VMSTATE_INSTANCE_ID_ANY
  migration: Change SaveStateEntry.instance_id into uint32_t
  apic: Use 32bit APIC ID for migration instance ID
  migration: Check in savevm_state_handler_insert for dups

 hw/arm/stellaris.c           |  2 +-
 hw/core/qdev.c               |  3 ++-
 hw/display/ads7846.c         |  2 +-
 hw/i2c/core.c                |  2 +-
 hw/input/stellaris_input.c   |  3 ++-
 hw/intc/apic_common.c        |  7 +++++--
 hw/misc/max111x.c            |  2 +-
 hw/net/eepro100.c            |  2 +-
 hw/pci/pci.c                 |  2 +-
 hw/ppc/spapr.c               |  2 +-
 hw/timer/arm_timer.c         |  2 +-
 hw/tpm/tpm_emulator.c        |  3 ++-
 include/migration/register.h |  2 +-
 include/migration/vmstate.h  |  6 ++++--
 migration/savevm.c           | 40 +++++++++++++++++++++++++-----------
 stubs/vmstate.c              |  2 +-
 16 files changed, 53 insertions(+), 29 deletions(-)

-- 
2.21.0


Re: [PATCH v2 0/4] apic: Fix migration breakage of >255 vcpus
Posted by Eduardo Habkost 4 years, 6 months ago
On Wed, Oct 16, 2019 at 10:29:29AM +0800, Peter Xu wrote:
> v2:
> - use uint32_t rather than int64_t [Juan]
> - one more patch (patch 4) to check dup SaveStateEntry [Dave]
> - one more patch to define a macro (patch 1) to simplify patch 2
> 
> Please review, thanks.

I wonder how hard it is to write a simple test case to reproduce
the original bug.  We can extend tests/migration-test.c or
tests/acceptance/migration.py.  If using -device with explicit
apic-id, we probably don't even need to create >255 VCPUs.

> 
> Peter Xu (4):
>   migration: Define VMSTATE_INSTANCE_ID_ANY
>   migration: Change SaveStateEntry.instance_id into uint32_t
>   apic: Use 32bit APIC ID for migration instance ID
>   migration: Check in savevm_state_handler_insert for dups
> 
>  hw/arm/stellaris.c           |  2 +-
>  hw/core/qdev.c               |  3 ++-
>  hw/display/ads7846.c         |  2 +-
>  hw/i2c/core.c                |  2 +-
>  hw/input/stellaris_input.c   |  3 ++-
>  hw/intc/apic_common.c        |  7 +++++--
>  hw/misc/max111x.c            |  2 +-
>  hw/net/eepro100.c            |  2 +-
>  hw/pci/pci.c                 |  2 +-
>  hw/ppc/spapr.c               |  2 +-
>  hw/timer/arm_timer.c         |  2 +-
>  hw/tpm/tpm_emulator.c        |  3 ++-
>  include/migration/register.h |  2 +-
>  include/migration/vmstate.h  |  6 ++++--
>  migration/savevm.c           | 40 +++++++++++++++++++++++++-----------
>  stubs/vmstate.c              |  2 +-
>  16 files changed, 53 insertions(+), 29 deletions(-)
> 
> -- 
> 2.21.0
> 

-- 
Eduardo

Re: [PATCH v2 0/4] apic: Fix migration breakage of >255 vcpus
Posted by Peter Xu 4 years, 6 months ago
On Wed, Oct 16, 2019 at 11:40:01AM -0300, Eduardo Habkost wrote:
> On Wed, Oct 16, 2019 at 10:29:29AM +0800, Peter Xu wrote:
> > v2:
> > - use uint32_t rather than int64_t [Juan]
> > - one more patch (patch 4) to check dup SaveStateEntry [Dave]
> > - one more patch to define a macro (patch 1) to simplify patch 2
> > 
> > Please review, thanks.
> 
> I wonder how hard it is to write a simple test case to reproduce
> the original bug.  We can extend tests/migration-test.c or
> tests/acceptance/migration.py.  If using -device with explicit
> apic-id, we probably don't even need to create >255 VCPUs.

I can give it a shot next week. :)

-- 
Peter Xu

Re: [PATCH v2 0/4] apic: Fix migration breakage of >255 vcpus
Posted by Peter Xu 4 years, 6 months ago
On Sat, Oct 19, 2019 at 11:41:53AM +0800, Peter Xu wrote:
> On Wed, Oct 16, 2019 at 11:40:01AM -0300, Eduardo Habkost wrote:
> > On Wed, Oct 16, 2019 at 10:29:29AM +0800, Peter Xu wrote:
> > > v2:
> > > - use uint32_t rather than int64_t [Juan]
> > > - one more patch (patch 4) to check dup SaveStateEntry [Dave]
> > > - one more patch to define a macro (patch 1) to simplify patch 2
> > > 
> > > Please review, thanks.
> > 
> > I wonder how hard it is to write a simple test case to reproduce
> > the original bug.  We can extend tests/migration-test.c or
> > tests/acceptance/migration.py.  If using -device with explicit
> > apic-id, we probably don't even need to create >255 VCPUs.
> 
> I can give it a shot next week. :)

When trying this, I probably noticed a block layer issue: q35 seems to
have problem on booting from a very small block device (like 512B,
which is the image size that currently used for migration-test.c).
For example, this cmdline can boot successfully into the test image:

$qemu -M pc -m 200m -accel kvm -nographic \
      -drive file=$image,id=drive0,index=0,format=raw \
      -device ide-hd,drive=drive0

While this cannot:

$qemu -M q35 -m 200m -accel kvm -nographic \
      -drive file=$image,id=drive0,index=0,format=raw \
      -device ide-hd,drive=drive0

With error (BIOS debug messages on):

Booting from Hard Disk..invalid basic_access:143:
   a=00000201  b=00000000  c=00000001  d=00000080 ds=0000 es=07c0 ss=d980
  si=00000000 di=00000000 bp=00000000 sp=0000fd8e cs=f000 ip=cb81  f=0202
invalid basic_access:144:
   a=00000201  b=00000000  c=00000001  d=00000080 ds=0000 es=07c0 ss=d980
  si=00000000 di=00000000 bp=00000000 sp=0000fd8e cs=f000 ip=cb81  f=0202
.
Boot failed: could not read the boot disenter handle_18:
  NULL
k

This corresponds to this SeaBIOS check error:

static void noinline
basic_access(struct bregs *regs, struct drive_s *drive_fl, u16 command)
{
    ...
    // sanity check on cyl heads, sec
    if (cylinder >= nlc || head >= nlh || sector > nls) {
        warn_invalid(regs);
        disk_ret(regs, DISK_RET_EPARAM);
        return;
    }
    ...
}

And... below cmdline will work even for q35 (as suggested by Fam when
we talked offline):

$qemu -M q35 -m 200m -accel kvm -nographic \
      -drive file=$image,id=drive0,index=0,format=raw \
      -device ide-hd,drive=drive0,secs=1,cyls=1,heads=1

I think for migration test we can workaround like above, but I'm also
curious whether this is a real bug somewhere because I don't see a
reason for q35 to refuse to boot on a one-sector image.

Thanks,

-- 
Peter Xu

Re: [PATCH v2 0/4] apic: Fix migration breakage of >255 vcpus
Posted by Kevin Wolf 4 years, 6 months ago
Am 23.10.2019 um 09:57 hat Peter Xu geschrieben:
> On Sat, Oct 19, 2019 at 11:41:53AM +0800, Peter Xu wrote:
> > On Wed, Oct 16, 2019 at 11:40:01AM -0300, Eduardo Habkost wrote:
> > > On Wed, Oct 16, 2019 at 10:29:29AM +0800, Peter Xu wrote:
> > > > v2:
> > > > - use uint32_t rather than int64_t [Juan]
> > > > - one more patch (patch 4) to check dup SaveStateEntry [Dave]
> > > > - one more patch to define a macro (patch 1) to simplify patch 2
> > > > 
> > > > Please review, thanks.
> > > 
> > > I wonder how hard it is to write a simple test case to reproduce
> > > the original bug.  We can extend tests/migration-test.c or
> > > tests/acceptance/migration.py.  If using -device with explicit
> > > apic-id, we probably don't even need to create >255 VCPUs.
> > 
> > I can give it a shot next week. :)
> 
> When trying this, I probably noticed a block layer issue: q35 seems to
> have problem on booting from a very small block device (like 512B,
> which is the image size that currently used for migration-test.c).
> For example, this cmdline can boot successfully into the test image:
> 
> $qemu -M pc -m 200m -accel kvm -nographic \
>       -drive file=$image,id=drive0,index=0,format=raw \
>       -device ide-hd,drive=drive0
> 
> While this cannot:
> 
> $qemu -M q35 -m 200m -accel kvm -nographic \
>       -drive file=$image,id=drive0,index=0,format=raw \
>       -device ide-hd,drive=drive0

The important difference here is legacy IDE (which works) vs. AHCI
(which doesn't work). If you add a -device ahci to the -M pc case, it
starts failing, too.

Not sure why AHCI fails, but I'll just CC John who is the lucky
maintainer of this device. :-)

Kevin

> With error (BIOS debug messages on):
> 
> Booting from Hard Disk..invalid basic_access:143:
>    a=00000201  b=00000000  c=00000001  d=00000080 ds=0000 es=07c0 ss=d980
>   si=00000000 di=00000000 bp=00000000 sp=0000fd8e cs=f000 ip=cb81  f=0202
> invalid basic_access:144:
>    a=00000201  b=00000000  c=00000001  d=00000080 ds=0000 es=07c0 ss=d980
>   si=00000000 di=00000000 bp=00000000 sp=0000fd8e cs=f000 ip=cb81  f=0202
> .
> Boot failed: could not read the boot disenter handle_18:
>   NULL
> k
> 
> This corresponds to this SeaBIOS check error:
> 
> static void noinline
> basic_access(struct bregs *regs, struct drive_s *drive_fl, u16 command)
> {
>     ...
>     // sanity check on cyl heads, sec
>     if (cylinder >= nlc || head >= nlh || sector > nls) {
>         warn_invalid(regs);
>         disk_ret(regs, DISK_RET_EPARAM);
>         return;
>     }
>     ...
> }
> 
> And... below cmdline will work even for q35 (as suggested by Fam when
> we talked offline):
> 
> $qemu -M q35 -m 200m -accel kvm -nographic \
>       -drive file=$image,id=drive0,index=0,format=raw \
>       -device ide-hd,drive=drive0,secs=1,cyls=1,heads=1
> 
> I think for migration test we can workaround like above, but I'm also
> curious whether this is a real bug somewhere because I don't see a
> reason for q35 to refuse to boot on a one-sector image.
> 
> Thanks,
> 
> -- 
> Peter Xu
> 


Re: [PATCH v2 0/4] apic: Fix migration breakage of >255 vcpus
Posted by John Snow 4 years, 6 months ago

On 10/23/19 4:17 AM, Kevin Wolf wrote:
> The important difference here is legacy IDE (which works) vs. AHCI
> (which doesn't work). If you add a -device ahci to the -M pc case, it
> starts failing, too.
> 
> Not sure why AHCI fails, but I'll just CC John who is the lucky
> maintainer of this device. :-)

Hm... It looks like SeaBIOS is identifying the drive correctly and
perfectly well, but we're failing at boot_disk(u8 bootdrv, int
checksig), about here:

call16_int(0x13, &br);

if (br.flags & F_CF) {
    printf("Boot failed: could not read the boot disk\n\n");
    return;
}

Looking at AHCI tracing (From the QEMU side), it looks like we set up
the drive correctly, and then never touch the port ever again -- I don't
see an attempted read on QEMU's end.

I'll need to look through SeaBIOS source for hints, I'm not sure right
yet. If anyone is more familiar with the SeaBIOS boot code, maybe they
can give a pointer faster than I'll figure it out myself.

--js


Re: [PATCH v2 0/4] apic: Fix migration breakage of >255 vcpus
Posted by Peter Xu 4 years, 6 months ago
On Thu, Oct 24, 2019 at 01:49:11PM -0400, John Snow wrote:
> 
> 
> On 10/23/19 4:17 AM, Kevin Wolf wrote:
> > The important difference here is legacy IDE (which works) vs. AHCI
> > (which doesn't work). If you add a -device ahci to the -M pc case, it
> > starts failing, too.
> > 
> > Not sure why AHCI fails, but I'll just CC John who is the lucky
> > maintainer of this device. :-)
> 
> Hm... It looks like SeaBIOS is identifying the drive correctly and
> perfectly well, but we're failing at boot_disk(u8 bootdrv, int
> checksig), about here:
> 
> call16_int(0x13, &br);
> 
> if (br.flags & F_CF) {
>     printf("Boot failed: could not read the boot disk\n\n");
>     return;
> }
> 
> Looking at AHCI tracing (From the QEMU side), it looks like we set up
> the drive correctly, and then never touch the port ever again -- I don't
> see an attempted read on QEMU's end.
> 
> I'll need to look through SeaBIOS source for hints, I'm not sure right
> yet. If anyone is more familiar with the SeaBIOS boot code, maybe they
> can give a pointer faster than I'll figure it out myself.

Hi, John,

I don't know seabios well, but I did have a pointer in my previous
email on where it faulted.  It seems to me that the issue is that
SeaBIOS may have got incorrect secs/cyls/heads information (and
explicitly setting secs=1,cyls=1,heads=1 on the block device fixes the
issue).

Thanks,

-- 
Peter Xu

Re: [PATCH v2 0/4] apic: Fix migration breakage of >255 vcpus
Posted by Peter Xu 4 years, 6 months ago
On Sat, Oct 19, 2019 at 11:41:53AM +0800, Peter Xu wrote:
> On Wed, Oct 16, 2019 at 11:40:01AM -0300, Eduardo Habkost wrote:
> > On Wed, Oct 16, 2019 at 10:29:29AM +0800, Peter Xu wrote:
> > > v2:
> > > - use uint32_t rather than int64_t [Juan]
> > > - one more patch (patch 4) to check dup SaveStateEntry [Dave]
> > > - one more patch to define a macro (patch 1) to simplify patch 2
> > > 
> > > Please review, thanks.
> > 
> > I wonder how hard it is to write a simple test case to reproduce
> > the original bug.  We can extend tests/migration-test.c or
> > tests/acceptance/migration.py.  If using -device with explicit
> > apic-id, we probably don't even need to create >255 VCPUs.
> 
> I can give it a shot next week. :)

When I was playing with it, I noticed that it's not that easy at least
for the migration-test.  We need to do these:

- add one specific CPU with apic-id>255, this is easy by using
  "-device qemu64-x86_64-cpu,..."

- enable x2apic in the guest code, read apic-id on the special vcpu to
  make sure it's correct even after migration, but before that...

- I failed to find a way to use apic-id>255 as the BSP of system but I
  can only create APs with apic-id>255, so we need to add initial MP
  support for the migration guest code, then run that apic-id check
  code on the new AP

- I also probably found that q35 bug on bootstraping the 512B disk, so
  we probably need to workaround that too until fixed

Unless someone has better idea on this, I'll simply stop here because
I'm afraid it does not worth the effort so far... (or until we have
some other requirement to enrich the migration qtest framework)

-- 
Peter Xu