[Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type

Sergio Lopez posted 4 patches 4 years, 10 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20190702121106.28374-1-slp@redhat.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Eduardo Habkost <ehabkost@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Richard Henderson <rth@twiddle.net>, Paolo Bonzini <pbonzini@redhat.com>
There is a newer version of this series
[Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Sergio Lopez 4 years, 10 months ago
Microvm is a machine type inspired by both NEMU and Firecracker, and
constructed after the machine model implemented by the latter.

Its main purpose is to provide users with a KVM-only machine type with fast
boot times, a minimal attack surface (measured as the number of IO ports
and MMIO regions exposed to the guest) and a small footprint (especially
when combined with the ongoing QEMU modularization effort).

Normally, other than the device support provided by KVM itself,
microvm only supports virtio-mmio devices. Microvm also includes a
legacy mode, which adds an ISA bus with a 16550A serial port, useful
for being able to see the early boot kernel messages.

Microvm only supports booting PVH-enabled Linux ELF images. Booting
other PVH-enabled kernels may be possible, but due to the lack of ACPI
and firmware, we're relying on the command line for specifying the
location of the virtio-mmio transports. If there's interest in
using this machine type with other kernels, we'll try to find some
kind of middle-ground solution.
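
As a rough illustration of what passing the transport locations on the
command line can look like: the Linux virtio-mmio driver accepts
virtio_mmio.device=<size>@<baseaddr>:<irq> parameters when built with
CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES. The Python sketch below generates such
parameters for equally sized transports like the 512-byte ones in the memory
map of this cover letter. The helper name and the starting IRQ are
assumptions made for illustration; they are not taken from the patches.

```python
# Sketch only: build virtio_mmio.device= kernel parameters for a run of
# equally sized virtio-mmio transports. The IRQ numbering is an assumption.

def virtio_mmio_params(base=0xD0000000, size=0x200, count=8, first_irq=5):
    """Return one 'virtio_mmio.device=<size>@<baseaddr>:<irq>' string per transport."""
    params = []
    for i in range(count):
        addr = base + i * size          # transports are laid out back to back
        irq = first_irq + i             # hypothetical: consecutive IRQs
        params.append(f"virtio_mmio.device={size}@{addr:#x}:{irq}")
    return params
```

Joining the returned strings with spaces gives text that could be appended
to the kernel command line, assuming the IRQs match what the machine
actually wires up.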

This is the list of the exposed IO ports and MMIO regions when running
in non-legacy mode:

address-space: memory
    00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
    00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
    00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
    00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
    00000000d0000800-00000000d00009ff (prio 0, i/o): virtio-mmio
    00000000d0000a00-00000000d0000bff (prio 0, i/o): virtio-mmio
    00000000d0000c00-00000000d0000dff (prio 0, i/o): virtio-mmio
    00000000d0000e00-00000000d0000fff (prio 0, i/o): virtio-mmio
    00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
    0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
    0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
    000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
    00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
    00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
    00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
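
Since the attack surface is measured here as the number of exposed IO ports
and MMIO regions, a listing like the one above can be tallied mechanically.
The Python sketch below assumes exactly the line layout shown above; the
function name is hypothetical and the generic "io" container is skipped
because it is not a device.

```python
import re

# Matches lines such as:
#   00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
REGION_RE = re.compile(
    r"^\s*([0-9a-f]{16})-([0-9a-f]{16}) \(prio \d+, [^)]*\): (\S+)"
)

def summarize(listing, skip=("io",)):
    """Return {name: (region_count, total_bytes_or_ports)} per region name."""
    totals = {}
    for line in listing.splitlines():
        m = REGION_RE.match(line)
        if not m:
            continue  # headers such as "address-space: memory"
        start, end, name = int(m.group(1), 16), int(m.group(2), 16), m.group(3)
        if name in skip:
            continue  # container region, not an individual device
        count, size = totals.get(name, (0, 0))
        totals[name] = (count + 1, size + (end - start + 1))
    return totals
```

Fed the full listing above, this should report eight virtio-mmio regions of
512 bytes each, which makes it easy to compare against the same listing
taken from another machine type.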

A QEMU instance with the microvm machine type can be invoked this way:

 - Normal mode:

qemu-system-x86_64 -M microvm -m 512m -smp 2 \
 -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
 -nodefaults -no-user-config \
 -chardev pty,id=virtiocon0,server \
 -device virtio-serial-device \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0

 - Legacy mode:

qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
 -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
 -nodefaults -no-user-config \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0 \
 -serial stdio
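
The two invocations differ only in the machine option and in how the console
is attached. A minimal Python sketch that assembles either argument list is
shown below; it uses only the options that appear above, and the function
name and its parameters are hypothetical.

```python
def microvm_args(kernel, cmdline, legacy=False, disk=None):
    """Build a qemu-system-x86_64 argument list for the microvm machine type."""
    machine = "microvm,legacy" if legacy else "microvm"
    args = ["qemu-system-x86_64", "-M", machine, "-m", "512m", "-smp", "2",
            "-kernel", kernel, "-append", cmdline,
            "-nodefaults", "-no-user-config"]
    if legacy:
        # Legacy mode: ISA serial port, so the console is ttyS0.
        args += ["-serial", "stdio"]
    else:
        # Normal mode: virtio console over virtio-mmio, so the console is hvc0.
        args += ["-chardev", "pty,id=virtiocon0,server",
                 "-device", "virtio-serial-device",
                 "-device", "virtconsole,chardev=virtiocon0"]
    if disk is not None:
        args += ["-drive", f"id=test,file={disk},format=raw,if=none",
                 "-device", "virtio-blk-device,drive=test"]
    return args
```

The resulting list can be passed to subprocess.run(); the kernel command
line still has to name the matching console (hvc0 or ttyS0).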


Changelog:
v3:
  - Add initrd support (thanks Stefano).

v2:
  - Drop "[PATCH 1/4] hw/i386: Factorize CPU routine".
  - Simplify machine definition (thanks Eduardo).
  - Remove use of unneeded NUMA-related callbacks (thanks Eduardo).
  - Add a patch to factorize PVH-related functions.
  - Replace use of Linux's Zero Page with PVH (thanks Maran and Paolo).


Sergio Lopez (4):
  hw/virtio: Factorize virtio-mmio headers
  hw/i386: Add an Intel MPTable generator
  hw/i386: Factorize PVH related functions
  hw/i386: Introduce the microvm machine type

 default-configs/i386-softmmu.mak            |   1 +
 hw/i386/Kconfig                             |   4 +
 hw/i386/Makefile.objs                       |   2 +
 hw/i386/microvm.c                           | 550 ++++++++++++++++++++
 hw/i386/mptable.c                           | 156 ++++++
 hw/i386/pc.c                                | 120 +----
 hw/i386/pvh.c                               | 113 ++++
 hw/i386/pvh.h                               |  10 +
 hw/virtio/virtio-mmio.c                     |  35 +-
 hw/virtio/virtio-mmio.h                     |  60 +++
 include/hw/i386/microvm.h                   |  82 +++
 include/hw/i386/mptable.h                   |  36 ++
 include/standard-headers/linux/mpspec_def.h | 182 +++++++
 13 files changed, 1209 insertions(+), 142 deletions(-)
 create mode 100644 hw/i386/microvm.c
 create mode 100644 hw/i386/mptable.c
 create mode 100644 hw/i386/pvh.c
 create mode 100644 hw/i386/pvh.h
 create mode 100644 hw/virtio/virtio-mmio.h
 create mode 100644 include/hw/i386/microvm.h
 create mode 100644 include/hw/i386/mptable.h
 create mode 100644 include/standard-headers/linux/mpspec_def.h

--
2.21.0

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Jing Liu 4 years, 8 months ago
Hi Sergio,

The idea is interesting, and I tried to launch a guest following your
guide, but it seems to fail for me. I tried both legacy and normal modes,
but when vncviewer connected, it told me:
The vm has no graphic display device.
The whole VNC screen is just black.

kernel config:
CONFIG_KVM_MMIO=y
CONFIG_VIRTIO_MMIO=y

I don't know whether a specific kernel version, patch or config
is needed, or whether I missed anything.
Could you kindly give me some tips?

Thanks very much.
Jing




Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Sergio Lopez 4 years, 8 months ago
Jing Liu <jing2.liu@linux.intel.com> writes:

> Hi Sergio,
>
> The idea is interesting and I tried to launch a guest by your
> guide but seems failed to me. I tried both legacy and normal modes,
> but the vncviewer connected and told me that:
> The vm has no graphic display device.
> All the screen in vnc is just black.

The microvm machine type doesn't support any graphics device, so you
need to rely on the serial console.

> kernel config:
> CONFIG_KVM_MMIO=y
> CONFIG_VIRTIO_MMIO=y
>
> I don't know if any specified kernel version/patch/config
> is needed or anything I missed.
> Could you kindly give some tips?

I'm testing it with upstream vanilla Linux. In addition to MMIO, you
need to add support for PVH (the next version of this patchset, v4, will
support booting from FW, so it'll be possible to use non-PVH ELF kernels
and bzImages too).

I've just uploaded a working kernel config here:

https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9

As for the QEMU command line, something like this should do the trick:

./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm,legacy -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial stdio

If this works, you can move to non-legacy mode with a virtio-console:

./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0

If it's still working, you can try adding some devices too:

./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1 root=/dev/vda" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -netdev user,id=testnet -device virtio-net-device,netdev=testnet -drive id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none -device virtio-blk-device,drive=test

Sergio.


Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Jing Liu 4 years, 8 months ago
Hi Sergio,

On 8/29/2019 11:46 PM, Sergio Lopez wrote:
> 
> Jing Liu <jing2.liu@linux.intel.com> writes:
> 
>> Hi Sergio,
>>
>> The idea is interesting and I tried to launch a guest by your
>> guide but seems failed to me. I tried both legacy and normal modes,
>> but the vncviewer connected and told me that:
>> The vm has no graphic display device.
>> All the screen in vnc is just black.
> 
> The microvm machine type doesn't support any graphics device, so you
> need to rely on the serial console.
Got it.

> 
>> kernel config:
>> CONFIG_KVM_MMIO=y
>> CONFIG_VIRTIO_MMIO=y
>>
>> I don't know if any specified kernel version/patch/config
>> is needed or anything I missed.
>> Could you kindly give some tips?
> 
> I'm testing it with upstream vanilla Linux. In addition to MMIO, you
> need to add support for PVH (the next version of this patchset, v4, will
> support booting from FW, so it'll be possible to use non-PVH ELF kernels
> and bzImages too).
> 
> I've just uploaded a working kernel config here:
> 
> https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9
> 
Thanks very much and this config is helpful to me.

> As for the QEMU command line, something like this should do the trick:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm,legacy -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial stdio
> 
> If this works, you can move to non-legacy mode with a virtio-console:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0
> 
I tried the above two ways and it works now. Thanks!

> If it's still working, you can try adding some devices too:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1 root=/dev/vda" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -netdev user,id=testnet -device virtio-net-device,netdev=testnet -drive id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none -device virtio-blk-device,drive=test
> 
But I'm wondering why the image I used cannot be found. With
root=/dev/vda3, the same image works well with a normal QEMU machine
and guest config, but it doesn't work here. The details are:

-append "console=hvc0 reboot=k panic=1 root=/dev/vda3 rw rootfstype=ext4" \

[    0.022784] Key type encrypted registered
[    0.022988] VFS: Cannot open root device "vda3" or unknown-block(254,3): error -6
[    0.023041] Please append a correct "root=" boot option; here are the available partitions:
[    0.023089] fe00         8946688 vda
[    0.023090]  driver: virtio_blk
[    0.023143] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,3)
[    0.023201] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc3 #23


BTW, I also tried root=/dev/vda and it didn't work either. The dmesg
output is a little different:

[    0.028050] Key type encrypted registered
[    0.028484] List of all partitions:
[    0.028529] fe00         8946688 vda
[    0.028529]  driver: virtio_blk
[    0.028615] No filesystem could mount root, tried:
[    0.028616]  ext4
[    0.028670]
[    0.028712] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,0)
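
For reading these messages: the partition list prints each device number as
hex major/minor digits (fe00 is major 0xfe = 254, minor 0), while
unknown-block(254,3) is the decimal major/minor the kernel was asked to
mount. The Python sketch below, with hypothetical helper names, compares the
two; it only handles the short four-digit form shown in these logs.

```python
import re

def decode_listed_dev(hexdev):
    """Split a short 'MMmm' hex device number, e.g. 'fe00', into (major, minor)."""
    if len(hexdev) != 4:
        raise ValueError("expected 4 hex digits")
    return int(hexdev[:2], 16), int(hexdev[2:], 16)

def unknown_block(msg):
    """Extract (major, minor) from an 'unknown-block(M,m)' message, or None."""
    m = re.search(r"unknown-block\((\d+),(\d+)\)", msg)
    return (int(m.group(1)), int(m.group(2))) if m else None

def is_listed(msg, listed):
    """True if the device the kernel tried to mount is in the partition list."""
    return unknown_block(msg) in {decode_listed_dev(h) for h in listed}
```

For the first log, (254,3) is not among the listed devices, which suggests
the kernel saw the whole disk but no partitions on it. For the second log,
(254,0) is listed, so the disk was found but no filesystem could be mounted
from it directly.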

I tried another ext4 image, but it still doesn't work.
Is there any limitation on the block image? Could I get a copy of your
image for a simple test?

Thanks in advance,
Jing


Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Sergio Lopez 4 years, 8 months ago
Jing Liu <jing2.liu@linux.intel.com> writes:

> Hi Sergio,
>
> On 8/29/2019 11:46 PM, Sergio Lopez wrote:
>>
>> Jing Liu <jing2.liu@linux.intel.com> writes:
>>
>>> Hi Sergio,
>>>
>>> The idea is interesting and I tried to launch a guest by your
>>> guide but seems failed to me. I tried both legacy and normal modes,
>>> but the vncviewer connected and told me that:
>>> The vm has no graphic display device.
>>> All the screen in vnc is just black.
>>
>> The microvm machine type doesn't support any graphics device, so you
>> need to rely on the serial console.
> Got it.
>
>>
>>> kernel config:
>>> CONFIG_KVM_MMIO=y
>>> CONFIG_VIRTIO_MMIO=y
>>>
>>> I don't know if any specified kernel version/patch/config
>>> is needed or anything I missed.
>>> Could you kindly give some tips?
>>
>> I'm testing it with upstream vanilla Linux. In addition to MMIO, you
>> need to add support for PVH (the next version of this patchset, v4, will
>> support booting from FW, so it'll be possible to use non-PVH ELF kernels
>> and bzImages too).
>>
>> I've just uploaded a working kernel config here:
>>
>> https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9
>>
> Thanks very much and this config is helpful to me.
>
>> As for the QEMU command line, something like this should do the trick:
>>
>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm,legacy -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial stdio
>>
>> If this works, you can move to non-legacy mode with a virtio-console:
>>
>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0
>>
> I tried the above two ways and it works now. Thanks!
>
>> If it's still working, you can try adding some devices too:
>>
>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1 root=/dev/vda" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -netdev user,id=testnet -device virtio-net-device,netdev=testnet -drive id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none -device virtio-blk-device,drive=test
>>
> But I'm wondering why the image I used can not be found.
> root=/dev/vda3 and the same image worked well on normal qemu/guest-
> config bootup, but didn't work here. The details are,
>
> -append "console=hvc0 reboot=k panic=1 root=/dev/vda3 rw rootfstype=ext4" \
>
> [    0.022784] Key type encrypted registered
> [    0.022988] VFS: Cannot open root device "vda3" or unknown-block(254,3): error -6
> [    0.023041] Please append a correct "root=" boot option; here are the available partitions:
> [    0.023089] fe00         8946688 vda
> [    0.023090]  driver: virtio_blk
> [    0.023143] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,3)
> [    0.023201] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc3 #23
>
>
> BTW, root=/dev/vda is also tried and didn't work. The dmesg is a
> little different:
>
> [    0.028050] Key type encrypted registered
> [    0.028484] List of all partitions:
> [    0.028529] fe00         8946688 vda
> [    0.028529]  driver: virtio_blk
> [    0.028615] No filesystem could mount root, tried:
> [    0.028616]  ext4
> [    0.028670]
> [    0.028712] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,0)
>
> I tried another ext4 img but still doesn't work.
> Is there any limitation of blk image? Could I copy your image for simple
> test?

The kernel config I posted lacks support for DOS partitions. Adding
CONFIG_MSDOS_PARTITION=y should allow you to boot from /dev/vda3.
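
A quick way to catch this kind of omission is to scan the kernel .config for
the options a given setup relies on. The Python sketch below is only an
illustration: CONFIG_VIRTIO_MMIO and CONFIG_MSDOS_PARTITION are named in
this thread, while the other entries are my assumptions about what a
PVH-booted guest with a virtio-blk root disk typically needs. Only built-in
(=y) options are counted, which is a simplification.

```python
REQUIRED = (
    "CONFIG_VIRTIO_MMIO",                  # named in this thread
    "CONFIG_MSDOS_PARTITION",              # named in this thread
    "CONFIG_PVH",                          # assumption: PVH entry point
    "CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES",  # assumption: transports on cmdline
    "CONFIG_VIRTIO_BLK",                   # assumption: needed for /dev/vda
)

def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]
```

Calling missing_options(open(".config").read()) on a config without DOS
partition support would list CONFIG_MSDOS_PARTITION among the results.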

Anyway, in case you also want to try booting from /dev/vda (without
partitions), this is the recipe I use to quickly create a minimal rootfs
image:

# wget http://dl-cdn.alpinelinux.org/alpine/v3.10/releases/x86_64/alpine-minirootfs-3.10.2-x86_64.tar.gz
# qemu-img create -f raw alpine-rootfs-x86_64.raw 1G
# sudo losetup /dev/loop0 alpine-rootfs-x86_64.raw
# sudo mkfs.ext4 /dev/loop0
# sudo mount /dev/loop0 /mnt
# sudo tar xpf alpine-minirootfs-3.10.2-x86_64.tar.gz -C /mnt
# sudo umount /mnt
# sudo losetup -d /dev/loop0

The rootfs will be missing openrc, so you'll need to add "init=/bin/sh"
to the command line.

Sergio.


Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Jing Liu 4 years, 8 months ago

On 8/30/2019 10:27 PM, Sergio Lopez wrote:
> 
> Jing Liu <jing2.liu@linux.intel.com> writes:
> 
>> Hi Sergio,
>>
>> On 8/29/2019 11:46 PM, Sergio Lopez wrote:
>>>
>>> Jing Liu <jing2.liu@linux.intel.com> writes:
>>>
>>>> Hi Sergio,
>>>>
>>>> The idea is interesting and I tried to launch a guest by your
>>>> guide but seems failed to me. I tried both legacy and normal modes,
>>>> but the vncviewer connected and told me that:
>>>> The vm has no graphic display device.
>>>> All the screen in vnc is just black.
>>>
>>> The microvm machine type doesn't support any graphics device, so you
>>> need to rely on the serial console.
>> Got it.
>>
>>>
>>>> kernel config:
>>>> CONFIG_KVM_MMIO=y
>>>> CONFIG_VIRTIO_MMIO=y
>>>>
>>>> I don't know if any specified kernel version/patch/config
>>>> is needed or anything I missed.
>>>> Could you kindly give some tips?
>>>
>>> I'm testing it with upstream vanilla Linux. In addition to MMIO, you
>>> need to add support for PVH (the next version of this patchset, v4, will
>>> support booting from FW, so it'll be possible to use non-PVH ELF kernels
>>> and bzImages too).
>>>
>>> I've just uploaded a working kernel config here:
>>>
>>> https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9
>>>
>> Thanks very much and this config is helpful to me.
>>
>>> As for the QEMU command line, something like this should do the trick:
>>>
>>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm,legacy -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial stdio
>>>
>>> If this works, you can move to non-legacy mode with a virtio-console:
>>>
>>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0
>>>
>> I tried the above two ways and it works now. Thanks!
>>
>>> If it's still working, you can try adding some devices too:
>>>
>>> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1 root=/dev/vda" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -netdev user,id=testnet -device virtio-net-device,netdev=testnet -drive id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none -device virtio-blk-device,drive=test
>>>
>> But I'm wondering why the image I used can not be found.
>> root=/dev/vda3 and the same image worked well on normal qemu/guest-
>> config bootup, but didn't work here. The details are,
>>
>> -append "console=hvc0 reboot=k panic=1 root=/dev/vda3 rw rootfstype=ext4" \
>>
>> [    0.022784] Key type encrypted registered
>> [    0.022988] VFS: Cannot open root device "vda3" or unknown-block(254,3): error -6
>> [    0.023041] Please append a correct "root=" boot option; here are the available partitions:
>> [    0.023089] fe00         8946688 vda
>> [    0.023090]  driver: virtio_blk
>> [    0.023143] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,3)
>> [    0.023201] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc3 #23
>>
>>
>> BTW, root=/dev/vda is also tried and didn't work. The dmesg is a
>> little different:
>>
>> [    0.028050] Key type encrypted registered
>> [    0.028484] List of all partitions:
>> [    0.028529] fe00         8946688 vda
>> [    0.028529]  driver: virtio_blk
>> [    0.028615] No filesystem could mount root, tried:
>> [    0.028616]  ext4
>> [    0.028670]
>> [    0.028712] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,0)
>>
>> I tried another ext4 img but still doesn't work.
>> Is there any limitation of blk image? Could I copy your image for simple
>> test?
> 
> The kernel config I posted lacks support for DOS partitions. Adding
> CONFIG_MSDOS_PARTITION=y should allow you to boot from /dev/vda3.
> 
> Anyway, in case you also want to try booting from /dev/vda (without
> partitions), this is the recipe I use to quickly create a minimal rootfs
> image:
> 
> # wget http://dl-cdn.alpinelinux.org/alpine/v3.10/releases/x86_64/alpine-minirootfs-3.10.2-x86_64.tar.gz
> # qemu-img create -f raw alpine-rootfs-x86_64.raw 1G
> # sudo losetup /dev/loop0 alpine-rootfs-x86_64.raw
> # sudo mkfs.ext4 /dev/loop0
> # sudo mount /dev/loop0 /mnt
> # sudo tar xpf alpine-minirootfs-3.10.2-x86_64.tar.gz -C /mnt
> # sudo umount /mnt
> # sudo losetup -d /dev/loop0
> 
> The rootfs will be missing openrc, so you'll need to add "init=/bin/sh"
> to the command line.
> 

Thank you Sergio. I'll try that.

Jing

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Stefan Hajnoczi 4 years, 10 months ago
On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> constructed after the machine model implemented by the latter.
> 
> Its main purpose is to provide users with a KVM-only machine type with fast
> boot times, a minimal attack surface (measured as the number of IO ports
> and MMIO regions exposed to the guest) and a small footprint (especially
> when combined with the ongoing QEMU modularization effort).
> 
> Normally, other than the device support provided by KVM itself,
> microvm only supports virtio-mmio devices. Microvm also includes a
> legacy mode, which adds an ISA bus with a 16550A serial port, useful
> for being able to see the early boot kernel messages.
> 
> Microvm only supports booting PVH-enabled Linux ELF images. Booting
> other PVH-enabled kernels may be possible, but due to the lack of ACPI
> and firmware, we're relying on the command line for specifying the
> location of the virtio-mmio transports. If there's interest in
> using this machine type with other kernels, we'll try to find some
> kind of middle-ground solution.
> 
> This is the list of the exposed IO ports and MMIO regions when running
> in non-legacy mode:
> 
> address-space: memory
>     00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
>     00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
>     00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
>     00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
>     00000000d0000800-00000000d00009ff (prio 0, i/o): virtio-mmio
>     00000000d0000a00-00000000d0000bff (prio 0, i/o): virtio-mmio
>     00000000d0000c00-00000000d0000dff (prio 0, i/o): virtio-mmio
>     00000000d0000e00-00000000d0000fff (prio 0, i/o): virtio-mmio
>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
> 
> address-space: I/O
>   0000000000000000-000000000000ffff (prio 0, i/o): io
>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>     0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
> 
> A QEMU instance with the microvm machine type can be invoked this way:
> 
>  - Normal mode:
> 
> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>  -nodefaults -no-user-config \
>  -chardev pty,id=virtiocon0,server \
>  -device virtio-serial-device \
>  -device virtconsole,chardev=virtiocon0 \
>  -drive id=test,file=test.img,format=raw,if=none \
>  -device virtio-blk-device,drive=test \
>  -netdev tap,id=tap0,script=no,downscript=no \
>  -device virtio-net-device,netdev=tap0
> 
>  - Legacy mode:
> 
> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>  -nodefaults -no-user-config \
>  -drive id=test,file=test.img,format=raw,if=none \
>  -device virtio-blk-device,drive=test \
>  -netdev tap,id=tap0,script=no,downscript=no \
>  -device virtio-net-device,netdev=tap0 \
>  -serial stdio

Please post metrics that compare this against a minimal Q35.

With qboot it was later found that SeaBIOS can achieve comparable boot
times, so it wasn't worth maintaining qboot.

Data is needed to show that microvm is really a significant improvement
over a minimal Q35.
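
One possible way to collect such numbers, sketched under assumptions: start
the guest with its console on stdout and time how long it takes for a marker
string to appear. The Python helper below is hypothetical, the marker and
the commands are placeholders to be supplied by whoever runs it, and it is
not the methodology used by anyone in this thread.

```python
import subprocess
import time

def time_to_marker(argv, marker, timeout=60.0):
    """Run argv; return seconds until a stdout line contains marker, else None.

    The timeout is only checked when a new line arrives, so a completely
    silent process is not interrupted by it.
    """
    start = time.monotonic()
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, text=True)
    try:
        for line in proc.stdout:
            if marker in line:
                return time.monotonic() - start
            if time.monotonic() - start > timeout:
                break
        return None
    finally:
        proc.kill()          # stop the guest once we have our answer
        proc.wait()
        proc.stdout.close()
```

Running it several times for a microvm command line and for a minimal Q35
command line, with the same kernel and the same marker, would give
comparable wall-clock figures that include QEMU startup.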

Stefan

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Sergio Lopez 4 years, 9 months ago
Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> constructed after the machine model implemented by the latter.
>> 
>> Its main purpose is to provide users with a KVM-only machine type with fast
>> boot times, a minimal attack surface (measured as the number of IO ports
>> and MMIO regions exposed to the guest) and a small footprint (especially
>> when combined with the ongoing QEMU modularization effort).
>> 
>> Normally, other than the device support provided by KVM itself,
>> microvm only supports virtio-mmio devices. Microvm also includes a
>> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>> for being able to see the early boot kernel messages.
>> 
>> Microvm only supports booting PVH-enabled Linux ELF images. Booting
>> other PVH-enabled kernels may be possible, but due to the lack of ACPI
>> and firmware, we're relying on the command line for specifying the
>> location of the virtio-mmio transports. If there's interest in
>> using this machine type with other kernels, we'll try to find some
>> kind of middle-ground solution.
>> 
>> This is the list of the exposed IO ports and MMIO regions when running
>> in non-legacy mode:
>> 
>> address-space: memory
>>     00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
>>     00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
>>     00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
>>     00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
>>     00000000d0000800-00000000d00009ff (prio 0, i/o): virtio-mmio
>>     00000000d0000a00-00000000d0000bff (prio 0, i/o): virtio-mmio
>>     00000000d0000c00-00000000d0000dff (prio 0, i/o): virtio-mmio
>>     00000000d0000e00-00000000d0000fff (prio 0, i/o): virtio-mmio
>>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
>> 
>> address-space: I/O
>>   0000000000000000-000000000000ffff (prio 0, i/o): io
>>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>>     0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
>>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
>> 
>> A QEMU instance with the microvm machine type can be invoked this way:
>> 
>>  - Normal mode:
>> 
>> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>>  -nodefaults -no-user-config \
>>  -chardev pty,id=virtiocon0,server \
>>  -device virtio-serial-device \
>>  -device virtconsole,chardev=virtiocon0 \
>>  -drive id=test,file=test.img,format=raw,if=none \
>>  -device virtio-blk-device,drive=test \
>>  -netdev tap,id=tap0,script=no,downscript=no \
>>  -device virtio-net-device,netdev=tap0
>> 
>>  - Legacy mode:
>> 
>> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>>  -nodefaults -no-user-config \
>>  -drive id=test,file=test.img,format=raw,if=none \
>>  -device virtio-blk-device,drive=test \
>>  -netdev tap,id=tap0,script=no,downscript=no \
>>  -device virtio-net-device,netdev=tap0 \
>>  -serial stdio
>
> Please post metrics that compare this against a minimal Q35.
>
> With qboot it was later found that SeaBIOS can achieve comparable boot
> times, so it wasn't worth maintaining qboot.
>
> Data is needed to show that microvm is really a significant improvement
> over a minimal Q35.

I've just run some numbers using Stefano Garzarella's qemu-boot-time
scripts [1] on a server with 2x Intel Xeon Silver 4114 2.20GHz, using the
upstream QEMU (474f3938d79ab36b9231c9ad3b5a9314c2aeacde) built with
minimal features [2]. The VM boots a minimal kernel [3] without initrd,
using a kata container image as root via virtio-blk (though this isn't
really relevant, as we're just taking measurements until the kernel is
about to exec init).

To make the comparison as fair as possible, I've used a minimal Q35
machine with as few devices as possible. Disabling HPET and PIT at
the same time caused the kernel to get stuck on boot, so I ran two
iterations, one with HPET only (pit=off) and the other with PIT only
(-no-hpet):


 -----------------
 | Q35 with HPET |
 -----------------

Command line:

./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm \
 -M q35,smbus=off,nvdimm=off,pit=off,vmport=off,sata=off,usb=off,graphics=off \
 -kernel /root/src/images/vmlinux-5.2 \
 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" \
 -smp 1 -nodefaults -no-user-config \
 -chardev pty,id=virtiocon0,server \
 -device virtio-serial \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none \
 -device virtio-blk,drive=test

Average boot times after 10 consecutive runs (in ms, with the delta
from the previous event in parentheses):

 qemu_init_end: 77.637936
 linux_start_kernel: 117.082526 (+39.44459)
 linux_start_user: 364.629972 (+247.547446)

Memory tree:

 address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
    0000000000000000-ffffffffffffffff (prio -1, i/o): pci
      00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
      00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
      00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci
        00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common
        00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr
        00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device
        00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify
      00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci
        00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common
        00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr
        00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device
        00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify
      00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix
        00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table
        00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba
      00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix
        00000000febff000-00000000febff01f (prio 0, i/o): msix-table
        00000000febff800-00000000febff807 (prio 0, i/o): msix-pba
      00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
    00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
    00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled]
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled]
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled]
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled]
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled]
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled]
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled]
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled]
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled]
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled]
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled]
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled]
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled]
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled]
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled]
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled]
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled]
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled]
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled]
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled]
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled]
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled]
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled]
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled]
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled]
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled]
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled]
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled]
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled]
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled]
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled]
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled]
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled]
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled]
    0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled]
    00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio
    00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
    00000000fed00000-00000000fed003ff (prio 0, i/o): hpet
    00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio
    00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled]
    00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

 address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
    0000000000000000-0000000000000007 (prio 0, i/o): dma-chan
    0000000000000008-000000000000000f (prio 0, i/o): dma-cont
    0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
    0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
    0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
    0000000000000070-0000000000000071 (prio 0, i/o): rtc
      0000000000000070-0000000000000070 (prio 0, i/o): rtc-index
    000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
    0000000000000080-0000000000000080 (prio 0, i/o): ioport80
    0000000000000081-0000000000000083 (prio 0, i/o): dma-page
    0000000000000087-0000000000000087 (prio 0, i/o): dma-page
    0000000000000089-000000000000008b (prio 0, i/o): dma-page
    000000000000008f-000000000000008f (prio 0, i/o): dma-page
    0000000000000092-0000000000000092 (prio 0, i/o): port92
    00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
    00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io
    00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan
    00000000000000d0-00000000000000df (prio 0, i/o): dma-cont
    00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0
    00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
    00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
    0000000000000510-0000000000000511 (prio 0, i/o): fwcfg
    0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma
    0000000000000600-000000000000067f (prio 0, i/o): ich9-pm
      0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
      0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
      0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
      0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
      0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
      0000000000000660-000000000000067f (prio 0, i/o): sm-tco
    0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug
    0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx
    0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control
    0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data
    000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
    000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci


 ----------------
 | Q35 with PIT |
 ----------------

Command line:

./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm \
 -M q35,smbus=off,nvdimm=off,pit=on,vmport=off,sata=off,usb=off,graphics=off \
 -no-hpet \
 -kernel /root/src/images/vmlinux-5.2 \
 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" \
 -smp 1 -nodefaults -no-user-config \
 -chardev pty,id=virtiocon0,server \
 -device virtio-serial \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none \
 -device virtio-blk,drive=test

Average boot times after 10 consecutive runs (in ms, with the delta
from the previous event in parentheses):

 qemu_init_end: 77.467852
 linux_start_kernel: 116.688472 (+39.22062)
 linux_start_user: 363.033365 (+246.344893)

Memory tree:

address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
    0000000000000000-ffffffffffffffff (prio -1, i/o): pci
      00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
      00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
      00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci
        00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common
        00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr
        00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device
        00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify
      00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci
        00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common
        00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr
        00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device
        00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify
      00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix
        00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table
        00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba
      00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix
        00000000febff000-00000000febff01f (prio 0, i/o): msix-table
        00000000febff800-00000000febff807 (prio 0, i/o): msix-pba
      00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
    00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
    00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled]
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled]
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff
    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled]
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled]
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled]
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff
    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled]
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled]
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled]
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff
    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled]
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled]
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled]
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff
    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled]
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled]
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled]
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff
    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled]
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled]
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled]
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff
    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled]
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled]
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled]
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff
    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled]
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled]
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled]
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff
    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled]
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled]
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled]
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff
    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled]
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled]
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled]
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff
    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled]
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled]
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled]
    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled]
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff
    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled]
    0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled]
    00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio
    00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
    00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio
    00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled]
    00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
    0000000000000000-0000000000000007 (prio 0, i/o): dma-chan
    0000000000000008-000000000000000f (prio 0, i/o): dma-cont
    0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
    0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
    0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
    0000000000000061-0000000000000061 (prio 0, i/o): pcspk
    0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
    0000000000000070-0000000000000071 (prio 0, i/o): rtc
      0000000000000070-0000000000000070 (prio 0, i/o): rtc-index
    000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
    0000000000000080-0000000000000080 (prio 0, i/o): ioport80
    0000000000000081-0000000000000083 (prio 0, i/o): dma-page
    0000000000000087-0000000000000087 (prio 0, i/o): dma-page
    0000000000000089-000000000000008b (prio 0, i/o): dma-page
    000000000000008f-000000000000008f (prio 0, i/o): dma-page
    0000000000000092-0000000000000092 (prio 0, i/o): port92
    00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
    00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io
    00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan
    00000000000000d0-00000000000000df (prio 0, i/o): dma-cont
    00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0
    00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
    00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
    0000000000000510-0000000000000511 (prio 0, i/o): fwcfg
    0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma
    0000000000000600-000000000000067f (prio 0, i/o): ich9-pm
      0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
      0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
      0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
      0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
      0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
      0000000000000660-000000000000067f (prio 0, i/o): sm-tco
    0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug
    0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx
    0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control
    0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data
    000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
    000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci


 -----------
 | microvm |
 -----------

Command line:

./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm \
 -M microvm \
 -kernel /root/src/images/vmlinux-5.2 \
 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" \
 -smp 1 -nodefaults -no-user-config \
 -chardev pty,id=virtiocon0,server \
 -device virtio-serial-device \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none \
 -device virtio-blk-device,drive=test

Average boot times after 10 consecutive runs (in ms, with the delta
from the previous event in parentheses):

 qemu_init_end: 64.043264
 linux_start_kernel: 65.481782 (+1.438518)
 linux_start_user: 114.938353 (+49.456571)

Memory tree:

 address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @microvm.ram 0000000000000000-000000001fffffff
    00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
    00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
    00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
    00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
    00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
    00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

 address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
    000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
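
Since microvm has neither ACPI nor firmware, the guest kernel has to be
told on its command line where the virtio-mmio transports in the tree
above live. A rough sketch of how such entries could be generated,
assuming the Linux virtio_mmio.device=<size>@<baseaddr>:<irq> parameter
format. The helper name is made up for illustration, and the IRQ
numbers are purely hypothetical (they are not taken from this series);
only the base address and the 0x200 stride come from the dump:

```python
def virtio_mmio_params(base, count, size=0x200, first_irq=5):
    """Build one virtio_mmio.device= entry per transport.

    base and size follow the memory tree above (0xd0000000, 0x200 each).
    first_irq is a placeholder assumption, not a value from the series.
    """
    return [
        f"virtio_mmio.device={size}@{base + i * size:#x}:{first_irq + i}"
        for i in range(count)
    ]


if __name__ == "__main__":
    # Four transports, matching the four virtio-mmio regions in the dump.
    print(" ".join(virtio_mmio_params(0xD0000000, 4)))
```

The resulting strings would be appended to the kernel command line
passed with -append.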


 --------------
 | Conclusion |
 --------------

The average boot time of microvm is less than a third of Q35's (115ms
vs. 363ms), and it is shorter in every phase: QEMU initialization (64ms
vs. 77ms), firmware overhead (1.4ms vs. 39ms) and kernel start-to-user
(49ms vs. 246ms).
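
For anyone who wants to recompute the per-phase figures, here is a
minimal sketch. The values are the averages reported above (in ms); the
dictionary layout, the phase labels and the function names are just my
own choices for illustration:

```python
# Cumulative averages (ms) copied from the three runs reported above.
RUNS = {
    "q35-hpet": {"qemu_init_end": 77.637936,
                 "linux_start_kernel": 117.082526,
                 "linux_start_user": 364.629972},
    "q35-pit": {"qemu_init_end": 77.467852,
                "linux_start_kernel": 116.688472,
                "linux_start_user": 363.033365},
    "microvm": {"qemu_init_end": 64.043264,
                "linux_start_kernel": 65.481782,
                "linux_start_user": 114.938353},
}


def phases(run):
    """Split the cumulative timestamps into per-phase durations (ms)."""
    return {
        "qemu_init": run["qemu_init_end"],
        "firmware": run["linux_start_kernel"] - run["qemu_init_end"],
        "kernel": run["linux_start_user"] - run["linux_start_kernel"],
    }


def speedup(baseline, candidate):
    """Per-phase ratio baseline/candidate (greater than 1 means faster)."""
    base, cand = phases(RUNS[baseline]), phases(RUNS[candidate])
    return {name: base[name] / cand[name] for name in base}


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(name, {k: round(v, 3) for k, v in phases(run).items()})
    print("microvm vs q35-pit:",
          {k: round(v, 2) for k, v in speedup("q35-pit", "microvm").items()})
```

Against the PIT-only Q35 run this gives roughly 1.2x for QEMU
initialization, 27x for the firmware phase and 5x for kernel
start-to-user.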

Microvm's memory tree is also visibly simpler, significantly reducing
the surface exposed to the guest.
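
To put a number on "simpler", one option is to count the enabled
regions in each dump. The sketch below is only a rough proxy under my
own assumptions: it skips the root container of each address space and
anything marked [disabled], and counts every other line (including RAM
aliases). The regular expression and function name are mine; the
sample is the microvm dump from above:

```python
import re

# Matches one region line of a memory tree dump, e.g.
# "00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio"
REGION = re.compile(
    r"^\s*[0-9a-f]{16}-[0-9a-f]{16} \(prio -?\d+, [^)]+\): (.+)$")

# microvm dump, copied from the memory tree above.
MICROVM = """\
address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @microvm.ram 0000000000000000-000000001fffffff
    00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
    00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
    00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
    00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
    00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
    00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
    000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
"""


def count_regions(dump):
    """Count enabled, non-root regions per address space."""
    counts = {}
    space = None
    skip_root = False
    for line in dump.splitlines():
        stripped = line.strip()
        if stripped.startswith("address-space:"):
            space = stripped.split(":", 1)[1].strip()
            counts[space] = 0
            skip_root = True  # the next region line is the root container
            continue
        match = REGION.match(line)
        if match is None or space is None:
            continue
        if skip_root:
            skip_root = False
            continue
        if match.group(1).endswith("[disabled]"):
            continue
        counts[space] += 1
    return counts


if __name__ == "__main__":
    print(count_regions(MICROVM))
```

The same function can be fed the two Q35 dumps for a side-by-side
comparison.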

While we can certainly work on making Q35 smaller, I definitely think
it's better (and way safer!) to have a specialized machine type for a
specific use case than a minimal Q35 whose behavior diverges
significantly from a conventional Q35.

Sergio.

[1] https://github.com/stefano-garzarella/qemu-boot-time
[2] https://paste.fedoraproject.org/paste/YZ9Ok-dJtQrc0xxctFm-nw
[3] https://paste.fedoraproject.org/paste/sck0jfioAJdMq51HH6wkmA
Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Stefan Hajnoczi 4 years, 9 months ago
On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> 
> Stefan Hajnoczi <stefanha@gmail.com> writes:
> 
> > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> >> [...]
> >
> > Please post metrics that compare this against a minimal Q35.
> >
> > With qboot it was later found that SeaBIOS can achieve comparable boot
> > times, so it wasn't worth maintaining qboot.
> >
> > Data is needed to show that microvm is really a significant improvement
> > over a minimal Q35.
> 
> I've just run some numbers using Stefano Garzarella's qemu-boot-time
> scripts [1] on a server with 2xIntel Xeon Silver 4114 2.20GHz, using the
> upstream QEMU (474f3938d79ab36b9231c9ad3b5a9314c2aeacde) built with
> minimal features [2]. The VM boots a minimal kernel [3] without initrd,
> using a kata container image as root via virtio-blk (though this isn't
> really relevant, as we're just taking measurements until the kernel is
> about to exec init).
> 
> To try to make the comparison as fair as possible, I've used a minimal
> q35 machine with as few devices as possible. Disabling HPET and PIT at
> the same time caused the kernel to get stuck on boot, so I ran two
> iterations, one without HPET and the other without PIT:
> 
> 
>  -----------------
>  | Q35 with HPET |
>  -----------------
> 
> Command line:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M q35,smbus=off,nvdimm=off,pit=off,vmport=off,sata=off,usb=off,graphics=off -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk,drive=test
> 
> Average boot times after 10 consecutive runs:
> 
>  qemu_init_end: 77.637936
>  linux_start_kernel: 117.082526 (+39.44459)
>  linux_start_user: 364.629972 (+247.547446)
> 
> Memory tree:
> 
>  address-space: memory
>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>     0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
>     0000000000000000-ffffffffffffffff (prio -1, i/o): pci
>       00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
>       00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
>       00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci
>         00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common
>         00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr
>         00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device
>         00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify
>       00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci
>         00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common
>         00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr
>         00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device
>         00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify
>       00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix
>         00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table
>         00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba
>       00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix
>         00000000febff000-00000000febff01f (prio 0, i/o): msix-table
>         00000000febff800-00000000febff807 (prio 0, i/o): msix-pba
>       00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
>     00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
>     00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled]
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled]
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled]
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled]
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled]
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled]
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled]
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled]
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled]
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled]
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled]
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled]
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled]
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled]
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled]
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled]
>     0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled]
>     00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio
>     00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
>     00000000fed00000-00000000fed003ff (prio 0, i/o): hpet
>     00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio
>     00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled]
>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
> 
>  address-space: I/O
>   0000000000000000-000000000000ffff (prio 0, i/o): io
>     0000000000000000-0000000000000007 (prio 0, i/o): dma-chan
>     0000000000000008-000000000000000f (prio 0, i/o): dma-cont
>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>     0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
>     0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
>     0000000000000070-0000000000000071 (prio 0, i/o): rtc
>       0000000000000070-0000000000000070 (prio 0, i/o): rtc-index
>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>     0000000000000080-0000000000000080 (prio 0, i/o): ioport80
>     0000000000000081-0000000000000083 (prio 0, i/o): dma-page
>     0000000000000087-0000000000000087 (prio 0, i/o): dma-page
>     0000000000000089-000000000000008b (prio 0, i/o): dma-page
>     000000000000008f-000000000000008f (prio 0, i/o): dma-page
>     0000000000000092-0000000000000092 (prio 0, i/o): port92
>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>     00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io
>     00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan
>     00000000000000d0-00000000000000df (prio 0, i/o): dma-cont
>     00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0
>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
>     0000000000000510-0000000000000511 (prio 0, i/o): fwcfg
>     0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma
>     0000000000000600-000000000000067f (prio 0, i/o): ich9-pm
>       0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
>       0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
>       0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
>       0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
>       0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
>       0000000000000660-000000000000067f (prio 0, i/o): sm-tco
>     0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug
>     0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx
>     0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control
>     0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data
>     000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
>     000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci
> 
> 
>  ----------------
>  | Q35 with PIT |
>  ----------------
> 
> Command line:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M q35,smbus=off,nvdimm=off,pit=on,vmport=off,sata=off,usb=off,graphics=off -no-hpet -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk,drive=test
> 
> Average boot times after 10 consecutive runs:
> 
>  qemu_init_end: 77.467852
>  linux_start_kernel: 116.688472 (+39.22062)
>  linux_start_user: 363.033365 (+246.344893)
> 
> Memory tree:
> 
> address-space: memory
>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>     0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
>     0000000000000000-ffffffffffffffff (prio -1, i/o): pci
>       00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
>       00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
>       00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci
>         00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common
>         00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr
>         00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device
>         00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify
>       00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci
>         00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common
>         00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr
>         00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device
>         00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify
>       00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix
>         00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table
>         00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba
>       00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix
>         00000000febff000-00000000febff01f (prio 0, i/o): msix-table
>         00000000febff800-00000000febff807 (prio 0, i/o): msix-pba
>       00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
>     00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
>     00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff
>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled]
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff
>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled]
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff
>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled]
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled]
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled]
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff
>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled]
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff
>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled]
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff
>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled]
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff
>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled]
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled]
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled]
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff
>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled]
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff
>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled]
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff
>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled]
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled]
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff
>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled]
>     0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled]
>     00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio
>     00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
>     00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio
>     00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled]
>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
> 
> address-space: I/O
>   0000000000000000-000000000000ffff (prio 0, i/o): io
>     0000000000000000-0000000000000007 (prio 0, i/o): dma-chan
>     0000000000000008-000000000000000f (prio 0, i/o): dma-cont
>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>     0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
>     0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
>     0000000000000061-0000000000000061 (prio 0, i/o): pcspk
>     0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
>     0000000000000070-0000000000000071 (prio 0, i/o): rtc
>       0000000000000070-0000000000000070 (prio 0, i/o): rtc-index
>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>     0000000000000080-0000000000000080 (prio 0, i/o): ioport80
>     0000000000000081-0000000000000083 (prio 0, i/o): dma-page
>     0000000000000087-0000000000000087 (prio 0, i/o): dma-page
>     0000000000000089-000000000000008b (prio 0, i/o): dma-page
>     000000000000008f-000000000000008f (prio 0, i/o): dma-page
>     0000000000000092-0000000000000092 (prio 0, i/o): port92
>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>     00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io
>     00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan
>     00000000000000d0-00000000000000df (prio 0, i/o): dma-cont
>     00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0
>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
>     0000000000000510-0000000000000511 (prio 0, i/o): fwcfg
>     0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma
>     0000000000000600-000000000000067f (prio 0, i/o): ich9-pm
>       0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
>       0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
>       0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
>       0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
>       0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
>       0000000000000660-000000000000067f (prio 0, i/o): sm-tco
>     0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug
>     0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx
>     0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control
>     0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data
>     000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
>     000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci
> 
> 
>  -----------
>  | microvm |
>  -----------
> 
> Command line:
> 
> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M microvm -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk-device,drive=test
> 
> Average boot times after 10 consecutive runs:
> 
>  qemu_init_end: 64.043264
>  linux_start_kernel: 65.481782 (+1.438518)
>  linux_start_user: 114.938353 (+49.456571)
> 
> Memory tree:
> 
>  address-space: memory
>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>     0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @microvm.ram 0000000000000000-000000001fffffff
>     00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
>     00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
>     00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
>     00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
>     00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
> 
>  address-space: I/O
>   0000000000000000-000000000000ffff (prio 0, i/o): io
>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
> 
> 
>  --------------
>  | Conclusion |
>  --------------
> 
> The average boot time of microvm is a third of Q35's (115ms vs. 363ms),
> and is smaller on all sections (QEMU initialization, firmware overhead
> and kernel start-to-user).
> 
> Microvm's memory tree is also visibly simpler, significantly reducing
> the exposed surface to the guest.
> 
> While we can certainly work on making Q35 smaller, I definitely think
> it's better (and way safer!) having a specialized machine type for a
> specific use case, than a minimal Q35 whose behavior significantly
> diverges from a conventional Q35.

Interesting, so not a 10x difference!  This might be amenable to
optimization.
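
As a rough check of that ratio, here is a minimal sketch in Python. The
numbers are the linux_start_user averages (in ms) copied from the quoted
measurements above; the dictionary and function names are only
illustrative and are not part of the qemu-boot-time scripts.

```python
# Average time to linux_start_user, in milliseconds, copied from the
# measurements quoted earlier in this thread (10 runs each).
boot_ms = {
    "q35_hpet": 364.629972,
    "q35_pit": 363.033365,
    "microvm": 114.938353,
}


def speedup(baseline: str, candidate: str) -> float:
    """Return how many times faster `candidate` boots than `baseline`."""
    return boot_ms[baseline] / boot_ms[candidate]


if __name__ == "__main__":
    for name in ("q35_hpet", "q35_pit"):
        print(f"{name} vs microvm: {speedup(name, 'microvm'):.2f}x")
```

Both Q35 variants come out at a bit over 3x, which matches the "a third"
figure in the conclusion above.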

My concern with microvm is that it's so limited that few users will be
able to benefit from the reduced attack surface and faster startup time.
I think it's worth investigating slimming down Q35 further first.

In terms of startup time, the first step would be profiling Q35 kernel
startup to find out what's taking so long (firmware initialization, PCI
probing, etc.).
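
Before any real profiling, the posted averages already give a coarse
per-phase split. The sketch below is only an illustration: the phase
labels follow the three sections named in the quoted conclusion (QEMU
initialization, firmware overhead, kernel start-to-user), the values are
the qemu_init_end time and the "(+...)" deltas from the "Q35 with PIT"
and microvm runs, and the helper name is hypothetical.

```python
# Per-phase averages in milliseconds, taken from the quoted results:
# qemu_init_end, then the "(+...)" deltas for linux_start_kernel and
# linux_start_user.
phases = {
    "q35_pit": {"qemu_init": 77.467852, "firmware": 39.22062, "kernel": 246.344893},
    "microvm": {"qemu_init": 64.043264, "firmware": 1.438518, "kernel": 49.456571},
}


def phase_gaps(slow: str, fast: str) -> dict:
    """Return, per phase, how many ms `slow` spends beyond `fast`."""
    return {p: phases[slow][p] - phases[fast][p] for p in phases[slow]}


if __name__ == "__main__":
    gaps = phase_gaps("q35_pit", "microvm")
    total = sum(gaps.values())
    # Print the phases from largest to smallest contribution to the gap.
    for phase, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
        print(f"{phase:10s} +{gap:8.3f} ms ({100 * gap / total:.1f}% of the gap)")
```

With these numbers most of the gap sits between linux_start_kernel and
linux_start_user rather than in the firmware, so that is probably where
profiling should start.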

> Sergio.
> 
> [1] https://github.com/stefano-garzarella/qemu-boot-time
> [2] https://paste.fedoraproject.org/paste/YZ9Ok-dJtQrc0xxctFm-nw
> [3] https://paste.fedoraproject.org/paste/sck0jfioAJdMq51HH6wkmA


Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Sergio Lopez 4 years, 9 months ago
Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
>> 
>> Stefan Hajnoczi <stefanha@gmail.com> writes:
>> 
>> > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
>> >> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> >> constructed after the machine model implemented by the latter.
>> >> 
>> >> Its main purpose is providing users a KVM-only machine type with fast
>> >> boot times, minimal attack surface (measured as the number of IO ports
>> >> and MMIO regions exposed to the Guest) and small footprint (especially
>> >> when combined with the ongoing QEMU modularization effort).
>> >> 
>> >> Normally, other than the device support provided by KVM itself,
>> >> microvm only supports virtio-mmio devices. Microvm also includes a
>> >> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>> >> for being able to see the early boot kernel messages.
>> >> 
>> >> Microvm only supports booting PVH-enabled Linux ELF images. Booting
>> >> other PVH-enabled kernels may be possible, but due to the lack of ACPI
>> >> and firmware, we're relying on the command line for specifying the
>> >> location of the virtio-mmio transports. If there's an interest in
>> >> using this machine type with other kernels, we'll try to find some
>> >> kind of middle ground solution.
>> >> 
>> >> This is the list of the exposed IO ports and MMIO regions when running
>> >> in non-legacy mode:
>> >> 
>> >> address-space: memory
>> >>     00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000800-00000000d00009ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000a00-00000000d0000bff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000c00-00000000d0000dff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000e00-00000000d0000fff (prio 0, i/o): virtio-mmio
>> >>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
>> >> 
>> >> address-space: I/O
>> >>   0000000000000000-000000000000ffff (prio 0, i/o): io
>> >>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>> >>     0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
>> >>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>> >>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>> >>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>> >>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
>> >> 
>> >> A QEMU instance with the microvm machine type can be invoked this way:
>> >> 
>> >>  - Normal mode:
>> >> 
>> >> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>> >>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>> >>  -nodefaults -no-user-config \
>> >>  -chardev pty,id=virtiocon0,server \
>> >>  -device virtio-serial-device \
>> >>  -device virtconsole,chardev=virtiocon0 \
>> >>  -drive id=test,file=test.img,format=raw,if=none \
>> >>  -device virtio-blk-device,drive=test \
>> >>  -netdev tap,id=tap0,script=no,downscript=no \
>> >>  -device virtio-net-device,netdev=tap0
>> >> 
>> >>  - Legacy mode:
>> >> 
>> >> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>> >>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>> >>  -nodefaults -no-user-config \
>> >>  -drive id=test,file=test.img,format=raw,if=none \
>> >>  -device virtio-blk-device,drive=test \
>> >>  -netdev tap,id=tap0,script=no,downscript=no \
>> >>  -device virtio-net-device,netdev=tap0 \
>> >>  -serial stdio
>> >
>> > Please post metrics that compare this against a minimal Q35.
>> >
>> > With qboot it was later found that SeaBIOS can achieve comparable boot
>> > times, so it wasn't worth maintaining qboot.
>> >
>> > Data is needed to show that microvm is really a significant improvement
>> > over a minimal Q35.
>> 
>> I've just run some numbers using Stefano Garzarella's qemu-boot-time
>> scripts [1] on a server with 2xIntel Xeon Silver 4114 2.20GHz, using the
>> upstream QEMU (474f3938d79ab36b9231c9ad3b5a9314c2aeacde) built with
>> minimal features [2]. The VM boots a minimal kernel [3] without initrd,
>> using a kata container image as root via virtio-blk (though this isn't
>> really relevant, as we're just taking measurements until the kernel is
>> about to exec init).
>> 
>> To try to make the comparison as fair as possible, I've used a minimal
>> q35 machine with as few devices as possible. Disabling HPET and PIT at
>> the same time caused the kernel to get stuck on boot, so I ran two
>> iterations, one without HPET and the other without PIT:
>> 
>> 
>>  -----------------
>>  | Q35 with HPET |
>>  -----------------
>> 
>> Command line:
>> 
>> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M q35,smbus=off,nvdimm=off,pit=off,vmport=off,sata=off,usb=off,graphics=off -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk,drive=test
>> 
>> Average boot times after 10 consecutive runs:
>> 
>>  qemu_init_end: 77.637936
>>  linux_start_kernel: 117.082526 (+39.44459)
>>  linux_start_user: 364.629972 (+247.547446)
>> 
>> Memory tree:
>> 
>>  address-space: memory
>>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>>     0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
>>     0000000000000000-ffffffffffffffff (prio -1, i/o): pci
>>       00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
>>       00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
>>       00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci
>>         00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common
>>         00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr
>>         00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device
>>         00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify
>>       00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci
>>         00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common
>>         00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr
>>         00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device
>>         00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify
>>       00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix
>>         00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table
>>         00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba
>>       00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix
>>         00000000febff000-00000000febff01f (prio 0, i/o): msix-table
>>         00000000febff800-00000000febff807 (prio 0, i/o): msix-pba
>>       00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
>>     00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
>>     00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled]
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled]
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled]
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled]
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled]
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled]
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled]
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled]
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled]
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled]
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled]
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled]
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled]
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled]
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled]
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled]
>>     0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled]
>>     00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio
>>     00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
>>     00000000fed00000-00000000fed003ff (prio 0, i/o): hpet
>>     00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio
>>     00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled]
>>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
>> 
>>  address-space: I/O
>>   0000000000000000-000000000000ffff (prio 0, i/o): io
>>     0000000000000000-0000000000000007 (prio 0, i/o): dma-chan
>>     0000000000000008-000000000000000f (prio 0, i/o): dma-cont
>>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>>     0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
>>     0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
>>     0000000000000070-0000000000000071 (prio 0, i/o): rtc
>>       0000000000000070-0000000000000070 (prio 0, i/o): rtc-index
>>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>>     0000000000000080-0000000000000080 (prio 0, i/o): ioport80
>>     0000000000000081-0000000000000083 (prio 0, i/o): dma-page
>>     0000000000000087-0000000000000087 (prio 0, i/o): dma-page
>>     0000000000000089-000000000000008b (prio 0, i/o): dma-page
>>     000000000000008f-000000000000008f (prio 0, i/o): dma-page
>>     0000000000000092-0000000000000092 (prio 0, i/o): port92
>>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>>     00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io
>>     00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan
>>     00000000000000d0-00000000000000df (prio 0, i/o): dma-cont
>>     00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0
>>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
>>     0000000000000510-0000000000000511 (prio 0, i/o): fwcfg
>>     0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma
>>     0000000000000600-000000000000067f (prio 0, i/o): ich9-pm
>>       0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
>>       0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
>>       0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
>>       0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
>>       0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
>>       0000000000000660-000000000000067f (prio 0, i/o): sm-tco
>>     0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug
>>     0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx
>>     0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control
>>     0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data
>>     000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
>>     000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci
>> 
>> 
>>  ----------------
>>  | Q35 with PIT |
>>  ----------------
>> 
>> Command line:
>> 
>> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M q35,smbus=off,nvdimm=off,pit=on,vmport=off,sata=off,usb=off,graphics=off -no-hpet -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk,drive=test
>> 
>> Average boot times after 10 consecutive runs:
>> 
>>  qemu_init_end: 77.467852
>>  linux_start_kernel: 116.688472 (+39.22062)
>>  linux_start_user: 363.033365 (+246.344893)
>> 
>> Memory tree:
>> 
>>  address-space: memory
>>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>>     0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
>>     0000000000000000-ffffffffffffffff (prio -1, i/o): pci
>>       00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
>>       00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
>>       00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci
>>         00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common
>>         00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr
>>         00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device
>>         00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify
>>       00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci
>>         00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common
>>         00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr
>>         00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device
>>         00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify
>>       00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix
>>         00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table
>>         00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba
>>       00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix
>>         00000000febff000-00000000febff01f (prio 0, i/o): msix-table
>>         00000000febff800-00000000febff807 (prio 0, i/o): msix-pba
>>       00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
>>     00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
>>     00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled]
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff
>>     00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled]
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled]
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff
>>     00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled]
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled]
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff
>>     00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled]
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled]
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled]
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff
>>     00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled]
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled]
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff
>>     00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled]
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled]
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff
>>     00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled]
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled]
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff
>>     00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled]
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled]
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled]
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff
>>     00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled]
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled]
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff
>>     00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled]
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled]
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff
>>     00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled]
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled]
>>     00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled]
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled]
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled]
>>     00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled]
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled]
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled]
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff
>>     00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled]
>>     0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled]
>>     00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio
>>     00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
>>     00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio
>>     00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled]
>>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
>> 
>>  address-space: I/O
>>   0000000000000000-000000000000ffff (prio 0, i/o): io
>>     0000000000000000-0000000000000007 (prio 0, i/o): dma-chan
>>     0000000000000008-000000000000000f (prio 0, i/o): dma-cont
>>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>>     0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
>>     0000000000000060-0000000000000060 (prio 0, i/o): i8042-data
>>     0000000000000061-0000000000000061 (prio 0, i/o): pcspk
>>     0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd
>>     0000000000000070-0000000000000071 (prio 0, i/o): rtc
>>       0000000000000070-0000000000000070 (prio 0, i/o): rtc-index
>>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>>     0000000000000080-0000000000000080 (prio 0, i/o): ioport80
>>     0000000000000081-0000000000000083 (prio 0, i/o): dma-page
>>     0000000000000087-0000000000000087 (prio 0, i/o): dma-page
>>     0000000000000089-000000000000008b (prio 0, i/o): dma-page
>>     000000000000008f-000000000000008f (prio 0, i/o): dma-page
>>     0000000000000092-0000000000000092 (prio 0, i/o): port92
>>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>>     00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io
>>     00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan
>>     00000000000000d0-00000000000000df (prio 0, i/o): dma-cont
>>     00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0
>>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
>>     0000000000000510-0000000000000511 (prio 0, i/o): fwcfg
>>     0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma
>>     0000000000000600-000000000000067f (prio 0, i/o): ich9-pm
>>       0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
>>       0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
>>       0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
>>       0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
>>       0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
>>       0000000000000660-000000000000067f (prio 0, i/o): sm-tco
>>     0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug
>>     0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx
>>     0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control
>>     0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data
>>     000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
>>     000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci
>> 
>> 
>>  -----------
>>  | microvm |
>>  -----------
>> 
>> Command line:
>> 
>> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M microvm -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk-device,drive=test
>> 
>> Average boot times after 10 consecutive runs:
>> 
>>  qemu_init_end: 64.043264
>>  linux_start_kernel: 65.481782 (+1.438518)
>>  linux_start_user: 114.938353 (+49.456571)
>> 
>> Memory tree:
>> 
>>  address-space: memory
>>   0000000000000000-ffffffffffffffff (prio 0, i/o): system
>>     0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @microvm.ram 0000000000000000-000000001fffffff
>>     00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
>>     00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
>>     00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
>>     00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
>>     00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
>>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
>> 
>>  address-space: I/O
>>   0000000000000000-000000000000ffff (prio 0, i/o): io
>>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>> 
>> 
>>  --------------
>>  | Conclusion |
>>  --------------
>> 
>> The average boot time of microvm is a third of Q35's (115ms vs. 363ms),
>> and is smaller on all sections (QEMU initialization, firmware overhead
>> and kernel start-to-user).
>> 
>> Microvm's memory tree is also visibly simpler, significantly reducing
>> the exposed surface to the guest.
>> 
>> While we can certainly work on making Q35 smaller, I definitely think
>> it's better (and way safer!) having a specialized machine type for a
>> specific use case, than a minimal Q35 whose behavior significantly
>> diverges from a conventional Q35.
>
> Interesting, so not a 10x difference!  This might be amenable to
> optimization.
>
> My concern with microvm is that it's so limited that few users will be
> able to benefit from the reduced attack surface and faster startup time.
> I think it's worth investigating slimming down Q35 further first.
>
> In terms of startup time the first step would be profiling Q35 kernel
> startup to find out what's taking so long (firmware initialization, PCI
> probing, etc)?

Some findings:

 1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host") saves a
    whopping 120ms by avoiding the APIC timer calibration at
    arch/x86/kernel/apic/apic.c:calibrate_APIC_clock

Average boot time with "-cpu host"
 qemu_init_end: 76.408950
 linux_start_kernel: 116.166142 (+39.757192)
 linux_start_user: 242.954347 (+126.788205)

Average boot time with default "cpu"
 qemu_init_end: 77.467852
 linux_start_kernel: 116.688472 (+39.22062)
 linux_start_user: 363.033365 (+246.344893)

 2. The other 130ms are a direct result of PCI and ACPI presence (tested
    with a kernel without support for those elements). I'll publish some
    detailed numbers next week.
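
As a quick sanity check on the figures above, the deltas can be recomputed
from the reported averages. This is only a sketch in Python: the dictionary
and function names are made up for illustration, the values are assumed to
be milliseconds (the conclusion quoted earlier reads them that way), and
the microvm row is copied from the earlier measurement in this thread.

```python
# Averages reported in this thread; assumed to be milliseconds since QEMU start.
Q35_DEFAULT_CPU = {
    "qemu_init_end": 77.467852,
    "linux_start_kernel": 116.688472,
    "linux_start_user": 363.033365,
}
Q35_CPU_HOST = {
    "qemu_init_end": 76.408950,
    "linux_start_kernel": 116.166142,
    "linux_start_user": 242.954347,
}
MICROVM = {
    "qemu_init_end": 64.043264,
    "linux_start_kernel": 65.481782,
    "linux_start_user": 114.938353,
}


def phase_durations(trace):
    """Split cumulative timestamps into per-phase durations."""
    return {
        "qemu_init": trace["qemu_init_end"],
        "firmware": trace["linux_start_kernel"] - trace["qemu_init_end"],
        "kernel": trace["linux_start_user"] - trace["linux_start_kernel"],
    }


def total_saving(before, after):
    """Difference in time-to-userspace between two configurations."""
    return before["linux_start_user"] - after["linux_start_user"]


# Saving from "-cpu host" on Q35, and what still separates it from microvm.
print(round(total_saving(Q35_DEFAULT_CPU, Q35_CPU_HOST), 3))
print(round(total_saving(Q35_CPU_HOST, MICROVM), 3))
```

The first figure is the saving attributed to TSC_DEADLINE in item 1; the
second is the remaining gap that item 2 attributes to PCI and ACPI.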

Sergio.
Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Stefan Hajnoczi 4 years, 9 months ago
On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote:
> Stefan Hajnoczi <stefanha@gmail.com> writes:
> > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> >>
> >> Stefan Hajnoczi <stefanha@gmail.com> writes:
> >>
> >> > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> >>  --------------
> >>  | Conclusion |
> >>  --------------
> >>
> >> The average boot time of microvm is a third of Q35's (115ms vs. 363ms),
> >> and is smaller on all sections (QEMU initialization, firmware overhead
> >> and kernel start-to-user).
> >>
> >> Microvm's memory tree is also visibly simpler, significantly reducing
> >> the exposed surface to the guest.
> >>
> >> While we can certainly work on making Q35 smaller, I definitely think
> >> it's better (and way safer!) having a specialized machine type for a
> >> specific use case, than a minimal Q35 whose behavior significantly
> >> diverges from a conventional Q35.
> >
> > Interesting, so not a 10x difference!  This might be amenable to
> > optimization.
> >
> > My concern with microvm is that it's so limited that few users will be
> > able to benefit from the reduced attack surface and faster startup time.
> > I think it's worth investigating slimming down Q35 further first.
> >
> > In terms of startup time the first step would be profiling Q35 kernel
> > startup to find out what's taking so long (firmware initialization, PCI
> > probing, etc)?
>
> Some findings:
>
>  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host") saves a
>     whopping 120ms by avoiding the APIC timer calibration at
>     arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
>
> Average boot time with "-cpu host"
>  qemu_init_end: 76.408950
>  linux_start_kernel: 116.166142 (+39.757192)
>  linux_start_user: 242.954347 (+126.788205)
>
> Average boot time with default "cpu"
>  qemu_init_end: 77.467852
>  linux_start_kernel: 116.688472 (+39.22062)
>  linux_start_user: 363.033365 (+246.344893)

\o/

>  2. The other 130ms are a direct result of PCI and ACPI presence (tested
>     with a kernel without support for those elements). I'll publish some
>     detailed numbers next week.

Here are the Kata Containers kernel parameters:

var kernelParams = []Param{
        {"tsc", "reliable"},
        {"no_timer_check", ""},
        {"rcupdate.rcu_expedited", "1"},
        {"i8042.direct", "1"},
        {"i8042.dumbkbd", "1"},
        {"i8042.nopnp", "1"},
        {"i8042.noaux", "1"},
        {"noreplace-smp", ""},
        {"reboot", "k"},
        {"console", "hvc0"},
        {"console", "hvc1"},
        {"iommu", "off"},
        {"cryptomgr.notests", ""},
        {"net.ifnames", "0"},
        {"pci", "lastbus=0"},
}

pci=lastbus=0 looks interesting and so do some of the others :).
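
For anyone who wants to paste these into an -append string, here is a
rough Python sketch that flattens the list into a kernel command line.
How Kata itself serializes the list is not shown in this thread; joining
each pair as key=value (or a bare key when the value is empty) is an
assumption, chosen because it matches the pci=lastbus=0 spelling used
elsewhere in the thread. KERNEL_PARAMS and to_cmdline are illustrative
names only.

```python
# Mirror of the Go kernelParams list above, as (key, value) pairs.
KERNEL_PARAMS = [
    ("tsc", "reliable"),
    ("no_timer_check", ""),
    ("rcupdate.rcu_expedited", "1"),
    ("i8042.direct", "1"),
    ("i8042.dumbkbd", "1"),
    ("i8042.nopnp", "1"),
    ("i8042.noaux", "1"),
    ("noreplace-smp", ""),
    ("reboot", "k"),
    ("console", "hvc0"),
    ("console", "hvc1"),
    ("iommu", "off"),
    ("cryptomgr.notests", ""),
    ("net.ifnames", "0"),
    ("pci", "lastbus=0"),
]


def to_cmdline(params):
    """Render pairs as 'key=value', or just 'key' when the value is empty."""
    return " ".join(f"{key}={value}" if value else key for key, value in params)


print(to_cmdline(KERNEL_PARAMS))
```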

Stefan

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Montes, Julio 4 years, 9 months ago
On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote:
> > Stefan Hajnoczi <stefanha@gmail.com> writes:
> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> > > > Stefan Hajnoczi <stefanha@gmail.com> writes:
> > > > 
> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> > > >  --------------
> > > >  | Conclusion |
> > > >  --------------
> > > > 
> > > > The average boot time of microvm is a third of Q35's (115ms vs.
> > > > 363ms),
> > > > and is smaller on all sections (QEMU initialization, firmware
> > > > overhead
> > > > and kernel start-to-user).
> > > > 
> > > > Microvm's memory tree is also visibly simpler, significantly
> > > > reducing
> > > > the exposed surface to the guest.
> > > > 
> > > > While we can certainly work on making Q35 smaller, I definitely
> > > > think
> > > > it's better (and way safer!) having a specialized machine type
> > > > for a
> > > > specific use case, than a minimal Q35 whose behavior
> > > > significantly
> > > > diverges from a conventional Q35.
> > > 
> > > Interesting, so not a 10x difference!  This might be amenable to
> > > optimization.
> > > 
> > > My concern with microvm is that it's so limited that few users
> > > will be
> > > able to benefit from the reduced attack surface and faster
> > > startup time.
> > > I think it's worth investigating slimming down Q35 further first.
> > > 
> > > In terms of startup time the first step would be profiling Q35
> > > kernel
> > > startup to find out what's taking so long (firmware
> > > initialization, PCI
> > > probing, etc)?
> > 
> > Some findings:
> > 
> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
> > saves a
> >     whooping 120ms by avoiding the APIC timer calibration at
> >     arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> > 
> > Average boot time with "-cpu host"
> >  qemu_init_end: 76.408950
> >  linux_start_kernel: 116.166142 (+39.757192)
> >  linux_start_user: 242.954347 (+126.788205)
> > 
> > Average boot time with default "cpu"
> >  qemu_init_end: 77.467852
> >  linux_start_kernel: 116.688472 (+39.22062)
> >  linux_start_user: 363.033365 (+246.344893)
> 
> \o/
> 
> >  2. The other 130ms are a direct result of PCI and ACPI presence
> > (tested
> >     with a kernel without support for those elements). I'll publish
> > some
> >     detailed numbers next week.
> 
> Here are the Kata Containers kernel parameters:
> 
> var kernelParams = []Param{
>         {"tsc", "reliable"},
>         {"no_timer_check", ""},
>         {"rcupdate.rcu_expedited", "1"},
>         {"i8042.direct", "1"},
>         {"i8042.dumbkbd", "1"},
>         {"i8042.nopnp", "1"},
>         {"i8042.noaux", "1"},
>         {"noreplace-smp", ""},
>         {"reboot", "k"},
>         {"console", "hvc0"},
>         {"console", "hvc1"},
>         {"iommu", "off"},
>         {"cryptomgr.notests", ""},
>         {"net.ifnames", "0"},
>         {"pci", "lastbus=0"},
> }
> 
> pci lastbus=0 looks interesting and so do some of the others :).
> 

Yeah, pci=lastbus=0 is very helpful for reducing the boot time on q35:
the kernel won't scan the other 255 buses :)
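
For readers wondering how the {"pci", "lastbus=0"} pair in the list above ends up as pci=lastbus=0: presumably each pair is joined as key=value, with a bare key when the value is empty. A minimal sketch of that joining, assuming exactly this rule (struct param and build_cmdline are hypothetical names for illustration, not Kata's actual code):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the key/value pairs quoted in this thread. */
struct param {
	const char *key;
	const char *value;
};

/*
 * Join the pairs into a kernel command line: "key=value" when the value
 * is non-empty, bare "key" otherwise, separated by single spaces.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int build_cmdline(const struct param *p, size_t n, char *buf, size_t len)
{
	size_t used = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (size_t i = 0; i < n; i++) {
		int w = snprintf(buf + used, len - used, "%s%s%s%s",
				 i ? " " : "", p[i].key,
				 p[i].value[0] ? "=" : "", p[i].value);

		/* snprintf reports the untruncated length; treat truncation as failure. */
		if (w < 0 || (size_t)w >= len - used)
			return -1;
		used += (size_t)w;
	}
	return (int)used;
}
```

With the pairs {"no_timer_check", ""}, {"iommu", "off"} and {"pci", "lastbus=0"}, this yields the string "no_timer_check iommu=off pci=lastbus=0".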

> Stefan
> 
Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Sergio Lopez 4 years, 9 months ago
Montes, Julio <julio.montes@intel.com> writes:

> On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
>> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote:
>> > Stefan Hajnoczi <stefanha@gmail.com> writes:
>> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
>> > > > Stefan Hajnoczi <stefanha@gmail.com> writes:
>> > > > 
>> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
>> > > >  --------------
>> > > >  | Conclusion |
>> > > >  --------------
>> > > > 
>> > > > The average boot time of microvm is a third of Q35's (115ms vs.
>> > > > 363ms),
>> > > > and is smaller on all sections (QEMU initialization, firmware
>> > > > overhead
>> > > > and kernel start-to-user).
>> > > > 
>> > > > Microvm's memory tree is also visibly simpler, significantly
>> > > > reducing
>> > > > the exposed surface to the guest.
>> > > > 
>> > > > While we can certainly work on making Q35 smaller, I definitely
>> > > > think
>> > > > it's better (and way safer!) having a specialized machine type
>> > > > for a
>> > > > specific use case, than a minimal Q35 whose behavior
>> > > > significantly
>> > > > diverges from a conventional Q35.
>> > > 
>> > > Interesting, so not a 10x difference!  This might be amenable to
>> > > optimization.
>> > > 
>> > > My concern with microvm is that it's so limited that few users
>> > > will be
>> > > able to benefit from the reduced attack surface and faster
>> > > startup time.
>> > > I think it's worth investigating slimming down Q35 further first.
>> > > 
>> > > In terms of startup time the first step would be profiling Q35
>> > > kernel
>> > > startup to find out what's taking so long (firmware
>> > > initialization, PCI
>> > > probing, etc)?
>> > 
>> > Some findings:
>> > 
>> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
>> > saves a
>> >     whooping 120ms by avoiding the APIC timer calibration at
>> >     arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
>> > 
>> > Average boot time with "-cpu host"
>> >  qemu_init_end: 76.408950
>> >  linux_start_kernel: 116.166142 (+39.757192)
>> >  linux_start_user: 242.954347 (+126.788205)
>> > 
>> > Average boot time with default "cpu"
>> >  qemu_init_end: 77.467852
>> >  linux_start_kernel: 116.688472 (+39.22062)
>> >  linux_start_user: 363.033365 (+246.344893)
>> 
>> \o/
>> 
>> >  2. The other 130ms are a direct result of PCI and ACPI presence
>> > (tested
>> >     with a kernel without support for those elements). I'll publish
>> > some
>> >     detailed numbers next week.
>> 
>> Here are the Kata Containers kernel parameters:
>> 
>> var kernelParams = []Param{
>>         {"tsc", "reliable"},
>>         {"no_timer_check", ""},
>>         {"rcupdate.rcu_expedited", "1"},
>>         {"i8042.direct", "1"},
>>         {"i8042.dumbkbd", "1"},
>>         {"i8042.nopnp", "1"},
>>         {"i8042.noaux", "1"},
>>         {"noreplace-smp", ""},
>>         {"reboot", "k"},
>>         {"console", "hvc0"},
>>         {"console", "hvc1"},
>>         {"iommu", "off"},
>>         {"cryptomgr.notests", ""},
>>         {"net.ifnames", "0"},
>>         {"pci", "lastbus=0"},
>> }
>> 
>> pci lastbus=0 looks interesting and so do some of the others :).
>> 
>
> yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
> kernel won't scan the 255.. buses :)

I can confirm that adding pci=lastbus=0 makes a significant
improvement. In fact, it is the only option from Kata's kernel parameter
list that has an impact, probably because the kernel is already quite
minimalistic.

Average boot time with "-cpu host" and "pci=lastbus=0"
 qemu_init_end: 73.711569
 linux_start_kernel: 113.414311 (+39.702742)
 linux_start_user: 190.949939 (+77.535628)

That's still well behind microvm (~191ms vs. ~115ms, so microvm boots in
~40% less time), and the gap quickly widens when adding more PCI devices
(each one adds 10-15ms), but it's certainly an improvement over the
original numbers.

On the other hand, there isn't much we can do here from QEMU's
perspective, as this is basically Guest OS tuning.

Sergio.
Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Stefan Hajnoczi 4 years, 9 months ago
On Tue, Jul 23, 2019 at 9:43 AM Sergio Lopez <slp@redhat.com> wrote:
> Montes, Julio <julio.montes@intel.com> writes:
>
> > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
> >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote:
> >> > Stefan Hajnoczi <stefanha@gmail.com> writes:
> >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> >> > > > Stefan Hajnoczi <stefanha@gmail.com> writes:
> >> > > >
> >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> >> > > >  --------------
> >> > > >  | Conclusion |
> >> > > >  --------------
> >> > > >
> >> > > > The average boot time of microvm is a third of Q35's (115ms vs.
> >> > > > 363ms),
> >> > > > and is smaller on all sections (QEMU initialization, firmware
> >> > > > overhead
> >> > > > and kernel start-to-user).
> >> > > >
> >> > > > Microvm's memory tree is also visibly simpler, significantly
> >> > > > reducing
> >> > > > the exposed surface to the guest.
> >> > > >
> >> > > > While we can certainly work on making Q35 smaller, I definitely
> >> > > > think
> >> > > > it's better (and way safer!) having a specialized machine type
> >> > > > for a
> >> > > > specific use case, than a minimal Q35 whose behavior
> >> > > > significantly
> >> > > > diverges from a conventional Q35.
> >> > >
> >> > > Interesting, so not a 10x difference!  This might be amenable to
> >> > > optimization.
> >> > >
> >> > > My concern with microvm is that it's so limited that few users
> >> > > will be
> >> > > able to benefit from the reduced attack surface and faster
> >> > > startup time.
> >> > > I think it's worth investigating slimming down Q35 further first.
> >> > >
> >> > > In terms of startup time the first step would be profiling Q35
> >> > > kernel
> >> > > startup to find out what's taking so long (firmware
> >> > > initialization, PCI
> >> > > probing, etc)?
> >> >
> >> > Some findings:
> >> >
> >> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
> >> > saves a
> >> >     whooping 120ms by avoiding the APIC timer calibration at
> >> >     arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> >> >
> >> > Average boot time with "-cpu host"
> >> >  qemu_init_end: 76.408950
> >> >  linux_start_kernel: 116.166142 (+39.757192)
> >> >  linux_start_user: 242.954347 (+126.788205)
> >> >
> >> > Average boot time with default "cpu"
> >> >  qemu_init_end: 77.467852
> >> >  linux_start_kernel: 116.688472 (+39.22062)
> >> >  linux_start_user: 363.033365 (+246.344893)
> >>
> >> \o/
> >>
> >> >  2. The other 130ms are a direct result of PCI and ACPI presence
> >> > (tested
> >> >     with a kernel without support for those elements). I'll publish
> >> > some
> >> >     detailed numbers next week.
> >>
> >> Here are the Kata Containers kernel parameters:
> >>
> >> var kernelParams = []Param{
> >>         {"tsc", "reliable"},
> >>         {"no_timer_check", ""},
> >>         {"rcupdate.rcu_expedited", "1"},
> >>         {"i8042.direct", "1"},
> >>         {"i8042.dumbkbd", "1"},
> >>         {"i8042.nopnp", "1"},
> >>         {"i8042.noaux", "1"},
> >>         {"noreplace-smp", ""},
> >>         {"reboot", "k"},
> >>         {"console", "hvc0"},
> >>         {"console", "hvc1"},
> >>         {"iommu", "off"},
> >>         {"cryptomgr.notests", ""},
> >>         {"net.ifnames", "0"},
> >>         {"pci", "lastbus=0"},
> >> }
> >>
> >> pci lastbus=0 looks interesting and so do some of the others :).
> >>
> >
> > yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
> > kernel won't scan the 255.. buses :)
>
> I can confirm that adding pci=lastbus=0 makes a significant
> improvement. In fact, is the only option from Kata's kernel parameter
> list that has an impact, probably because the kernel is already quite
> minimalistic.
>
> Average boot time with "-cpu host" and "pci=lastbus=0"
>  qemu_init_end: 73.711569
>  linux_start_kernel: 113.414311 (+39.702742)
>  linux_start_user: 190.949939 (+77.535628)
>
> That's still ~40% slower than microvm, and the breach quickly widens
> when adding more PCI devices (each one adds 10-15ms), but it's certainly
> an improvement over the original numbers.
>
> On the other hand, there isn't much we can do here from QEMU's
> perspective, as this is basically Guest OS tuning.

fw_cfg could expose this information so guest kernels know when to
stop enumerating the PCI bus.  This would make all PCI guests with new
kernels boot ~50 ms faster, regardless of machine type.

The difference between microvm and tuned Q35 is 76 ms now.

microvm:
qemu_init_end: 64.043264
linux_start_kernel: 65.481782 (+1.438518)
linux_start_user: 114.938353 (+49.456571)

Q35 with -cpu host and pci=lastbus=0:
qemu_init_end: 73.711569
linux_start_kernel: 113.414311 (+39.702742)
linux_start_user: 190.949939 (+77.535628)

Q35 spends ~39 ms between qemu_init_end and linux_start_kernel, versus
~1.4 ms for microvm.  SeaBIOS is loading the PVH Option ROM.
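
To make the per-phase split explicit, here is a small sketch that breaks the 76 ms gap down using the two sets of averages above. The struct and function names are made up for illustration; only the numbers come from this thread.

```c
#include <stdio.h>

/* Cumulative boot timestamps in milliseconds, as in the listings above. */
struct boot_times {
	double qemu_init_end;
	double linux_start_kernel;
	double linux_start_user;
};

/* Averages copied from this thread. */
static const struct boot_times microvm = { 64.043264, 65.481782, 114.938353 };
static const struct boot_times q35_tuned = { 73.711569, 113.414311, 190.949939 };

/* Firmware phase: from the end of QEMU init to kernel entry. */
static double firmware_ms(const struct boot_times *t)
{
	return t->linux_start_kernel - t->qemu_init_end;
}

/* Kernel phase: from kernel entry to the start of user space. */
static double kernel_ms(const struct boot_times *t)
{
	return t->linux_start_user - t->linux_start_kernel;
}

/* Print how much longer b takes than a, per phase and in total. */
static void print_gap(const struct boot_times *a, const struct boot_times *b)
{
	printf("qemu init: %+.1f ms\n", b->qemu_init_end - a->qemu_init_end);
	printf("firmware:  %+.1f ms\n", firmware_ms(b) - firmware_ms(a));
	printf("kernel:    %+.1f ms\n", kernel_ms(b) - kernel_ms(a));
	printf("total:     %+.1f ms\n", b->linux_start_user - a->linux_start_user);
}
```

Calling print_gap(&microvm, &q35_tuned) splits the gap into roughly 10 ms of QEMU initialization, 38 ms of firmware and 28 ms of kernel time.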

Stefano: any recommendations for profiling or tuning SeaBIOS?

Stefan

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Paolo Bonzini 4 years, 9 months ago
On 23/07/19 11:47, Stefan Hajnoczi wrote:
> fw_cfg could expose this information so guest kernels know when to
> stop enumerating the PCI bus.  This would make all PCI guests with new
> kernels boot ~50 ms faster, regardless of machine type.

The number of buses is determined by the firmware, not by QEMU, so
fw_cfg would not be the right interface.  In fact (as I have just
learnt) lastbus is an x86-specific option that overrides the last bus
returned by SeaBIOS's handle_1ab101.

So the next step could be to figure out what is the lastbus returned by
handle_1ab101 and possibly why it isn't zero.

Paolo

> The difference between microvm and tuned Q35 is 76 ms now.
> 
> microvm:
> qemu_init_end: 64.043264
> linux_start_kernel: 65.481782 (+1.438518)
> linux_start_user: 114.938353 (+49.456571)
> 
> Q35 with -cpu host and pci=lastbus=0:
> qemu_init_end: 73.711569
> linux_start_kernel: 113.414311 (+39.702742)
> linux_start_user: 190.949939 (+77.535628)
> 
> There is a ~39 ms difference before linux_start_kernel.  SeaBIOS is
> loading the PVH Option ROM.
> 
> Stefano: any recommendations for profiling or tuning SeaBIOS?


Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Paolo Bonzini 4 years, 9 months ago
On 23/07/19 12:01, Paolo Bonzini wrote:
> The number of buses is determined by the firmware, not by QEMU, so
> fw_cfg would not be the right interface.  In fact (as I have just
> learnt) lastbus is an x86-specific option that overrides the last bus
> returned by SeaBIOS's handle_1ab101.
> 
> So the next step could be to figure out what is the lastbus returned by
> handle_1ab101 and possibly why it isn't zero.

An update:

- for 64-bit, PCIBIOS (and thus handle_1ab101) is not called.  PCIBIOS is
only used by 32-bit kernels.  As a side effect, PCI expander bridges do not
work on 32-bit kernels with ACPI disabled, because they are located beyond
pcibios_last_bus (with ACPI enabled, the DSDT exposes them).

- for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy scanning" is done.

- for -M q35, pcibios_last_bus in Linux is set based on the size of the 
MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs 
for buses above 0.

Here is a patch that only scans devfn==0, which should mostly remove the need
for pci=lastbus=0.  (Testing is welcome).
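
To put numbers on the scan cost, the sketch below counts the worst-case vendor-ID reads (every bus empty) for each stepping strategy. It relies on the usual devfn layout, a 5-bit device number above a 3-bit function number, so stepping devfn by 8 visits function 0 of each device. The helper names are invented for this illustration and are not kernel functions.

```c
#include <stdio.h>

/* devfn packs a 5-bit device (slot) number and a 3-bit function number. */
static unsigned devfn_slot(unsigned devfn)
{
	return (devfn >> 3) & 0x1f;
}

static unsigned devfn_func(unsigned devfn)
{
	return devfn & 0x07;
}

/*
 * Vendor-ID reads needed to conclude that one bus is empty when devfn is
 * stepped by `stride` (8 = one read per device, 1 = one read per function).
 * A stride of 0 would never terminate, so it is rejected.
 */
static unsigned probes_per_empty_bus(unsigned stride)
{
	unsigned n = 0;

	if (stride == 0)
		return 0;
	for (unsigned devfn = 0; devfn < 256; devfn += stride)
		n++;
	return n;
}

/* Worst case across `buses` empty buses, given the per-bus probe count. */
static unsigned long total_probes(unsigned buses, unsigned per_bus)
{
	return (unsigned long)buses * per_bus;
}
```

Under these assumptions the default stride of 8 costs 32 * 255 = 8160 reads for buses 1 to 255, while probing only devfn 0 costs 255 reads, one per bus.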

Actually, KVM could probably avoid the scanning altogether.  The only "hidden" root
buses we expect are from PCI expander bridges and if you found an MMCONFIG area
through the ACPI MCFG table, you can also use the DSDT to find PCI expander bridges.
However, I am being conservative.

A possible alternative could be a mechanism whereby the vmlinuz real mode entry
point, or the 32-bit PVH entry point, fetches lastbus and passes it to the
kernel via the vmlinuz or PVH boot information structs.  However, I don't think
that's very useful, and there is some risk of breaking real hardware too.

Paolo

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index 73bb404f4d2a..17012aa60d22 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -61,6 +61,7 @@ enum pci_bf_sort_state {
 extern struct pci_ops pci_root_ops;
 
 void pcibios_scan_specific_bus(int busn);
+void pcibios_scan_bus_by_device(int busn);
 
 /* pci-irq.c */
 
@@ -216,8 +217,10 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
 # endif
 # define x86_default_pci_init_irq	pcibios_irq_init
 # define x86_default_pci_fixup_irqs	pcibios_fixup_irqs
+# define x86_default_pci_scan_bus	pcibios_scan_bus_by_device
 #else
 # define x86_default_pci_init		NULL
 # define x86_default_pci_init_irq	NULL
 # define x86_default_pci_fixup_irqs	NULL
+# define x86_default_pci_scan_bus	NULL
 #endif
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index b85a7c54c6a1..4c3a0a17a600 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -251,6 +251,7 @@ struct x86_hyper_runtime {
  * @save_sched_clock_state:	save state for sched_clock() on suspend
  * @restore_sched_clock_state:	restore state for sched_clock() on resume
  * @apic_post_init:		adjust apic if needed
+ * @pci_scan_bus:		scan a PCI bus
  * @legacy:			legacy features
  * @set_legacy_features:	override legacy features. Use of this callback
  * 				is highly discouraged. You should only need
@@ -273,6 +274,7 @@ struct x86_platform_ops {
 	void (*save_sched_clock_state)(void);
 	void (*restore_sched_clock_state)(void);
 	void (*apic_post_init)(void);
+	void (*pci_scan_bus)(int busn);
 	struct x86_legacy_features legacy;
 	void (*set_legacy_features)(void);
 	struct x86_hyper_runtime hyper;
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index 6857b4577f17..b248d7036dd3 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -11,12 +11,14 @@
 #include <linux/acpi_pmtmr.h>
 #include <linux/kernel.h>
 #include <linux/reboot.h>
+#include <linux/pci.h>
 #include <asm/apic.h>
 #include <asm/cpu.h>
 #include <asm/hypervisor.h>
 #include <asm/i8259.h>
 #include <asm/irqdomain.h>
 #include <asm/pci_x86.h>
+#include <asm/pci.h>
 #include <asm/reboot.h>
 #include <asm/setup.h>
 #include <asm/jailhouse_para.h>
@@ -136,6 +138,22 @@ static int __init jailhouse_pci_arch_init(void)
 	return 0;
 }
 
+static void jailhouse_pci_scan_bus_by_function(int busn)
+{
+	int devfn;
+	u32 l;
+
+	for (devfn = 0; devfn < 256; devfn++) {
+		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
+		    l != 0x0000 && l != 0xffff) {
+			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
+			pr_info("PCI: Discovered peer bus %02x\n", busn);
+			pcibios_scan_root(busn);
+			return;
+		}
+	}
+}
+
 static void __init jailhouse_init_platform(void)
 {
 	u64 pa_data = boot_params.hdr.setup_data;
@@ -153,6 +171,7 @@ static void __init jailhouse_init_platform(void)
 	x86_platform.legacy.rtc		= 0;
 	x86_platform.legacy.warm_reset	= 0;
 	x86_platform.legacy.i8042	= X86_LEGACY_I8042_PLATFORM_ABSENT;
+	x86_platform.pci_scan_bus	= jailhouse_pci_scan_bus_by_function;
 
 	legacy_pic			= &null_legacy_pic;
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 82caf01b63dd..59f7204ed8f3 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -24,6 +24,7 @@
 #include <linux/debugfs.h>
 #include <linux/nmi.h>
 #include <linux/swait.h>
+#include <linux/pci.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -33,6 +34,7 @@
 #include <asm/apicdef.h>
 #include <asm/hypervisor.h>
 #include <asm/tlb.h>
+#include <asm/pci.h>
 
 static int kvmapf = 1;
 
@@ -621,10 +623,31 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,
 	native_flush_tlb_others(flushmask, info);
 }
 
+#ifdef CONFIG_PCI
+static void kvm_pci_scan_bus(int busn)
+{
+	u32 l;
+
+	/*
+	 * Assume that there are no "hidden" buses, i.e. all PCI root buses
+	 * have a host bridge at device 0, function 0.
+	 */
+	if (!raw_pci_read(0, busn, 0, PCI_VENDOR_ID, 2, &l) &&
+	    l != 0x0000 && l != 0xffff) {
+		pr_info("PCI: Discovered peer bus %02x\n", busn);
+		pcibios_scan_root(busn);
+	}
+}
+#endif
+
 static void __init kvm_guest_init(void)
 {
 	int i;
 
+#ifdef CONFIG_PCI
+	x86_platform.pci_scan_bus = kvm_pci_scan_bus;
+#endif
+
 	if (!kvm_para_available())
 		return;
 
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 50a2b492fdd6..19e1cc2cb6e0 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -118,6 +118,7 @@ struct x86_platform_ops x86_platform __ro_after_init = {
 	.get_nmi_reason			= default_get_nmi_reason,
 	.save_sched_clock_state 	= tsc_save_sched_clock_state,
 	.restore_sched_clock_state 	= tsc_restore_sched_clock_state,
+	.pci_scan_bus			= x86_default_pci_scan_bus,
 	.hyper.pin_vcpu			= x86_op_int_noop,
 };
 
diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 467311b1eeea..6214dbce26d3 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -36,14 +36,19 @@ int __init pci_legacy_init(void)
 
 void pcibios_scan_specific_bus(int busn)
 {
-	int stride = jailhouse_paravirt() ? 1 : 8;
-	int devfn;
-	u32 l;
-
 	if (pci_find_bus(0, busn))
 		return;
 
-	for (devfn = 0; devfn < 256; devfn += stride) {
+	x86_platform.pci_scan_bus(busn);
+}
+EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
+
+void pcibios_scan_bus_by_device(int busn)
+{
+	int devfn;
+	u32 l;
+
+	for (devfn = 0; devfn < 256; devfn += 8) {
 		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
 		    l != 0x0000 && l != 0xffff) {
 			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
@@ -53,7 +58,6 @@ void pcibios_scan_specific_bus(int busn)
 		}
 	}
 }
-EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
 
 static int __init pci_subsys_init(void)
 {

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Sergio Lopez 4 years, 9 months ago
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 23/07/19 12:01, Paolo Bonzini wrote:
>> The number of buses is determined by the firmware, not by QEMU, so
>> fw_cfg would not be the right interface.  In fact (as I have just
>> learnt) lastbus is an x86-specific option that overrides the last bus
>> returned by SeaBIOS's handle_1ab101.
>> 
>> So the next step could be to figure out what is the lastbus returned by
>> handle_1ab101 and possibly why it isn't zero.
>
> Some update:
>
> - for 64-bit, PCIBIOS (and thus handle_1ab101) is not called.  PCIBIOS is
> only used by 32-bit kernels.  As a side effect, PCI expander bridges do not
> work on 32-bit kernels with ACPI disabled, because they are located beyond
> pcibios_last_bus (with ACPI enabled, the DSDT exposes them).
>
> - for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy scanning" is done.
>
> - for -M q35, pcibios_last_bus in Linux is set based on the size of the 
> MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs 
> for buses above 0.
>
> Here is a patch that only scans devfn==0, which should mostly remove the need
> for pci=lastbus=0.  (Testing is welcome).

I just gave it a try. These are the results (avg on 10 consecutive runs):

 - Unpatched kernel:

Avg
 qemu_init_end: 75.207386
 linux_start_kernel: 115.056767 (+39.849381)
 linux_start_user: 241.020113 (+125.963346)

 - Unpatched kernel with pci=lastbus=0:

Avg
 qemu_init_end: 75.468282
 linux_start_kernel: 115.189322 (+39.72104)
 linux_start_user: 192.404823 (+77.215501)

 - Patched kernel (without pci=lastbus=0):

Avg
 qemu_init_end: 75.605627
 linux_start_kernel: 115.656557 (+40.05093)
 linux_start_user: 192.857655 (+77.201098)

Looks fine to me. There must be an extra cost in the patched kernel
vs. using pci=lastbus=0, but it's so low that it's hard to see in the
average numbers.

> Actually, KVM could probably avoid the scanning altogether.  The only "hidden" root
> buses we expect are from PCI expander bridges and if you found an MMCONFIG area
> through the ACPI MCFG table, you can also use the DSDT to find PCI expander bridges.
> However, I am being conservative.
>
> A possible alternative could be a mechanism whereby the vmlinuz real mode entry
> point, or the 32-bit PVH entry point, fetch lastbus and they pass it to the
> kernel via the vmlinuz or PVH boot information structs.  However, I don't think
> that's very useful, and there is some risk of breaking real hardware too.
>
> Paolo
>

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Wed, Jul 24, 2019 at 01:14:35PM +0200, Paolo Bonzini wrote:
> On 23/07/19 12:01, Paolo Bonzini wrote:
> > The number of buses is determined by the firmware, not by QEMU, so
> > fw_cfg would not be the right interface.  In fact (as I have just
> > learnt) lastbus is an x86-specific option that overrides the last bus
> > returned by SeaBIOS's handle_1ab101.
> > 
> > So the next step could be to figure out what is the lastbus returned by
> > handle_1ab101 and possibly why it isn't zero.
> 
> Some update:
> 
> - for 64-bit, PCIBIOS (and thus handle_1ab101) is not called.  PCIBIOS is
> only used by 32-bit kernels.  As a side effect, PCI expander bridges do not
> work on 32-bit kernels with ACPI disabled, because they are located beyond
> pcibios_last_bus (with ACPI enabled, the DSDT exposes them).
> 
> - for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy scanning" is done.
> 
> - for -M q35, pcibios_last_bus in Linux is set based on the size of the 
> MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs 
> for buses above 0.
> 
> Here is a patch that only scans devfn==0, which should mostly remove the need
> for pci=lastbus=0.  (Testing is welcome).
> 
> Actually, KVM could probably avoid the scanning altogether.  The only "hidden" root
> buses we expect are from PCI expander bridges and if you found an MMCONFIG area
> through the ACPI MCFG table, you can also use the DSDT to find PCI expander bridges.
> However, I am being conservative.
> 
> A possible alternative could be a mechanism whereby the vmlinuz real mode entry
> point, or the 32-bit PVH entry point, fetch lastbus and they pass it to the
> kernel via the vmlinuz or PVH boot information structs.  However, I don't think
> that's very useful, and there is some risk of breaking real hardware too.
> 
> Paolo
> 
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index 73bb404f4d2a..17012aa60d22 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -61,6 +61,7 @@ enum pci_bf_sort_state {
>  extern struct pci_ops pci_root_ops;
>  
>  void pcibios_scan_specific_bus(int busn);
> +void pcibios_scan_bus_by_device(int busn);
>  
>  /* pci-irq.c */
>  
> @@ -216,8 +217,10 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
>  # endif
>  # define x86_default_pci_init_irq	pcibios_irq_init
>  # define x86_default_pci_fixup_irqs	pcibios_fixup_irqs
> +# define x86_default_pci_scan_bus	pcibios_scan_bus_by_device
>  #else
>  # define x86_default_pci_init		NULL
>  # define x86_default_pci_init_irq	NULL
>  # define x86_default_pci_fixup_irqs	NULL
> +# define x86_default_pci_scan_bus      NULL
>  #endif
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index b85a7c54c6a1..4c3a0a17a600 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -251,6 +251,7 @@ struct x86_hyper_runtime {
>   * @save_sched_clock_state:	save state for sched_clock() on suspend
>   * @restore_sched_clock_state:	restore state for sched_clock() on resume
>   * @apic_post_init:		adjust apic if needed
> + * @pci_scan_bus:		scan a PCI bus
>   * @legacy:			legacy features
>   * @set_legacy_features:	override legacy features. Use of this callback
>   * 				is highly discouraged. You should only need
> @@ -273,6 +274,7 @@ struct x86_platform_ops {
>  	void (*save_sched_clock_state)(void);
>  	void (*restore_sched_clock_state)(void);
>  	void (*apic_post_init)(void);
> +	void (*pci_scan_bus)(int busn);
>  	struct x86_legacy_features legacy;
>  	void (*set_legacy_features)(void);
>  	struct x86_hyper_runtime hyper;
> diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
> index 6857b4577f17..b248d7036dd3 100644
> --- a/arch/x86/kernel/jailhouse.c
> +++ b/arch/x86/kernel/jailhouse.c
> @@ -11,12 +11,14 @@
>  #include <linux/acpi_pmtmr.h>
>  #include <linux/kernel.h>
>  #include <linux/reboot.h>
> +#include <linux/pci.h>
>  #include <asm/apic.h>
>  #include <asm/cpu.h>
>  #include <asm/hypervisor.h>
>  #include <asm/i8259.h>
>  #include <asm/irqdomain.h>
>  #include <asm/pci_x86.h>
> +#include <asm/pci.h>
>  #include <asm/reboot.h>
>  #include <asm/setup.h>
>  #include <asm/jailhouse_para.h>
> @@ -136,6 +138,22 @@ static int __init jailhouse_pci_arch_init(void)
>  	return 0;
>  }
>  
> +static void jailhouse_pci_scan_bus_by_function(int busn)
> +{
> +        int devfn;
> +        u32 l;
> +
> +        for (devfn = 0; devfn < 256; devfn++) {
> +                if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
> +                    l != 0x0000 && l != 0xffff) {
> +                        DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
> +                        pr_info("PCI: Discovered peer bus %02x\n", busn);
> +                        pcibios_scan_root(busn);
> +                        return;
> +                }
> +        }
> +}
> +
>  static void __init jailhouse_init_platform(void)
>  {
>  	u64 pa_data = boot_params.hdr.setup_data;
> @@ -153,6 +171,7 @@ static void __init jailhouse_init_platform(void)
>  	x86_platform.legacy.rtc		= 0;
>  	x86_platform.legacy.warm_reset	= 0;
>  	x86_platform.legacy.i8042	= X86_LEGACY_I8042_PLATFORM_ABSENT;
> +	x86_platform.pci_scan_bus	= jailhouse_pci_scan_bus_by_function;
>  
>  	legacy_pic			= &null_legacy_pic;
>  
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 82caf01b63dd..59f7204ed8f3 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -24,6 +24,7 @@
>  #include <linux/debugfs.h>
>  #include <linux/nmi.h>
>  #include <linux/swait.h>
> +#include <linux/pci.h>
>  #include <asm/timer.h>
>  #include <asm/cpu.h>
>  #include <asm/traps.h>
> @@ -33,6 +34,7 @@
>  #include <asm/apicdef.h>
>  #include <asm/hypervisor.h>
>  #include <asm/tlb.h>
> +#include <asm/pci.h>
>  
>  static int kvmapf = 1;
>  
> @@ -621,10 +623,31 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,
>  	native_flush_tlb_others(flushmask, info);
>  }
>  
> +#ifdef CONFIG_PCI
> +static void kvm_pci_scan_bus(int busn)
> +{
> +        u32 l;
> +
> +	/*
> +	 * Assume that there are no "hidden" buses, i.e. all PCI root buses
> +	 * have a host bridge at device 0, function 0.
> +	 */
> +	if (!raw_pci_read(0, busn, 0, PCI_VENDOR_ID, 2, &l) &&
> +	    l != 0x0000 && l != 0xffff) {
> +		pr_info("PCI: Discovered peer bus %02x\n", busn);
> +		pcibios_scan_root(busn);
> +        }
> +}
> +#endif
> +
>  static void __init kvm_guest_init(void)
>  {
>  	int i;
>  
> +#ifdef CONFIG_PCI
> +	x86_platform.pci_scan_bus = kvm_pci_scan_bus;
> +#endif
> +
>  	if (!kvm_para_available())
>  		return;
>  

Shouldn't this happen after kvm_para_available?
In fact, let's add a CPU ID flag for this, so it's
easy to tell the guest whether to scan extra buses.
What do you say?

> diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> index 50a2b492fdd6..19e1cc2cb6e0 100644
> --- a/arch/x86/kernel/x86_init.c
> +++ b/arch/x86/kernel/x86_init.c
> @@ -118,6 +118,7 @@ struct x86_platform_ops x86_platform __ro_after_init = {
>  	.get_nmi_reason			= default_get_nmi_reason,
>  	.save_sched_clock_state 	= tsc_save_sched_clock_state,
>  	.restore_sched_clock_state 	= tsc_restore_sched_clock_state,
> +	.pci_scan_bus			= x86_default_pci_scan_bus,
>  	.hyper.pin_vcpu			= x86_op_int_noop,
>  };
>  
> diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
> index 467311b1eeea..6214dbce26d3 100644
> --- a/arch/x86/pci/legacy.c
> +++ b/arch/x86/pci/legacy.c
> @@ -36,14 +36,19 @@ int __init pci_legacy_init(void)
>  
>  void pcibios_scan_specific_bus(int busn)
>  {
> -	int stride = jailhouse_paravirt() ? 1 : 8;
> -	int devfn;
> -	u32 l;
> -
>  	if (pci_find_bus(0, busn))
>  		return;
>  
> -	for (devfn = 0; devfn < 256; devfn += stride) {
> +	x86_platform.pci_scan_bus(busn);
> +}
> +EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
> +
> +void pcibios_scan_bus_by_device(int busn)
> +{
> +	int devfn;
> +	u32 l;
> +
> +	for (devfn = 0; devfn < 256; devfn += 8) {
>  		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
>  		    l != 0x0000 && l != 0xffff) {
>  			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
> @@ -53,7 +58,6 @@ void pcibios_scan_specific_bus(int busn)
>  		}
>  	}
>  }
> -EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
>  
>  static int __init pci_subsys_init(void)
>  {

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Paolo Bonzini 4 years, 9 months ago
On 25/07/19 12:03, Michael S. Tsirkin wrote:
>> +#ifdef CONFIG_PCI
>> +	x86_platform.pci_scan_bus = kvm_pci_scan_bus;
>> +#endif
>> +
>>  	if (!kvm_para_available())
>>  		return;
>>  
> Shouldn't this happen after kvm_para_available?

Actually kvm_para_available() is not needed anymore, since this only
runs after kvm_detect() has returned true.

> In fact, let's add a CPU ID flag for this, so it's
> easy to tell guest whether to scan extra buses.
> What do you say?

I think it would make it much harder to deploy this, since it relies on
having new userspace and new machine types.  This patch is basically a
reflection of the status quo, which is that there are generally no
"hidden" buses on commonly-used KVM userspaces, and even in the weird
configurations that have them there is always something at devfn=0.

(On real hardware, the only such hidden buses are e.g. 0x7f/0xff, which
have a bunch of QPI and MCH-related devices.  This is not something
you'd have in a virtual machine.)
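
To make the cost difference concrete, here is a minimal userspace sketch
that counts the config-space reads each probing strategy issues.  It is
only an illustration: fake_vendor_read() is a hypothetical stand-in for
raw_pci_read(), and it assumes that every bus above 0 is empty.

```c
#include <stdint.h>

/* Number of simulated config-space reads issued so far. */
static unsigned long config_reads;

/*
 * Hypothetical stand-in for raw_pci_read(): returns 0 on success and
 * stores the vendor ID in *val.  Assumption: only bus 0 has a host
 * bridge at devfn 0; everything else reads back as 0xffff.
 */
static int fake_vendor_read(int busn, int devfn, uint32_t *val)
{
	config_reads++;
	*val = (busn == 0 && devfn == 0) ? 0x8086 : 0xffff;
	return 0;
}

/*
 * Probe one bus.  stride 8 visits function 0 of each device, stride 1
 * visits every function, and limit 1 checks devfn 0 only.
 */
static int bus_has_device(int busn, int limit, int stride)
{
	uint32_t l;
	int devfn;

	for (devfn = 0; devfn < limit; devfn += stride) {
		if (!fake_vendor_read(busn, devfn, &l) &&
		    l != 0x0000 && l != 0xffff)
			return 1;
	}
	return 0;
}

/* Count the reads needed to probe buses 1..lastbus. */
static unsigned long count_reads(int lastbus, int limit, int stride)
{
	int busn;

	config_reads = 0;
	for (busn = 1; busn <= lastbus; busn++)
		bus_has_device(busn, limit, stride);
	return config_reads;
}
```

With lastbus = 255 and no devices above bus 0, count_reads(255, 256, 8)
gives 8160 reads (the 32*255 figure mentioned earlier in this thread),
count_reads(255, 256, 1) gives 65280, and count_reads(255, 1, 1) gives
255.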

Paolo

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Wed, Jul 24, 2019 at 01:14:35PM +0200, Paolo Bonzini wrote:
> On 23/07/19 12:01, Paolo Bonzini wrote:
> > The number of buses is determined by the firmware, not by QEMU, so
> > fw_cfg would not be the right interface.  In fact (as I have just
> > learnt) lastbus is an x86-specific option that overrides the last bus
> > returned by SeaBIOS's handle_1ab101.
> > 
> > So the next step could be to figure out what is the lastbus returned by
> > handle_1ab101 and possibly why it isn't zero.
> 
> Some update:
> 
> - for 64-bit, PCIBIOS (and thus handle_1ab101) is not called.  PCIBIOS is
> only used by 32-bit kernels.  As a side effect, PCI expander bridges do not
> work on 32-bit kernels with ACPI disabled, because they are located beyond
> pcibios_last_bus (with ACPI enabled, the DSDT exposes them).
> 
> - for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy scanning" is done.
> 
> - for -M q35, pcibios_last_bus in Linux is set based on the size of the 
> MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs 
> for buses above 0.
> 
> Here is a patch that only scans devfn==0, which should mostly remove the need
> for pci=lastbus=0.  (Testing is welcome).

Actually, I think I have a better idea.
At the moment we just get an exit on these reads and return all-ones.
Yes, in theory there could be a UR bit set in a bunch of
registers but in practice no one cares about these,
and I don't think we implement them.
So how about mapping a single page, read-only, and filling it
with all-ones?

We'll still run the scanning code within Linux, but the reads will no
longer cause exits, so it will be essentially free.

What do you think?
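
As a rough userspace illustration of the idea (not QEMU or KVM code), the
sketch below aliases one 0xff-filled, file-backed page across a whole
range with MAP_FIXED.  The helper name map_all_ones() and the use of
tmpfile() as backing store are assumptions made for the example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map npages of address space, read-only, so that every page aliases a
 * single file-backed page filled with 0xff.  Returns MAP_FAILED on error.
 */
static void *map_all_ones(size_t npages)
{
	size_t psz = (size_t)sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();
	unsigned char *page, *base = MAP_FAILED;
	size_t i;
	int fd;

	if (!f)
		return MAP_FAILED;
	fd = fileno(f);
	if (ftruncate(fd, (off_t)psz) < 0)
		goto out;

	/* Fill the single backing page with all-ones. */
	page = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (page == MAP_FAILED)
		goto out;
	memset(page, 0xff, psz);
	munmap(page, psz);

	/* Reserve the range, then point each page at file offset 0. */
	base = mmap(NULL, npages * psz, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		goto out;
	for (i = 0; i < npages; i++) {
		if (mmap(base + i * psz, psz, PROT_READ,
			 MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
			munmap(base, npages * psz);
			base = MAP_FAILED;
			break;
		}
	}
out:
	fclose(f);	/* established mappings stay valid after close */
	return base;
}
```

Every page in the range then reads back as 0xff while only one page of
backing memory is used.  Since all the mappings use the same file
offset, they presumably cannot be merged, so this sketch pays one
mapping per page.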



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Paolo Bonzini 4 years, 9 months ago
On 25/07/19 16:46, Michael S. Tsirkin wrote:
> Actually, I think I have a better idea.
> At the moment we just get an exit on these reads and return all-ones.
> Yes, in theory there could be a UR bit set in a bunch of
> registers but in practice no one cares about these,
> and I don't think we implement them.
> So how about mapping a single page, read-only, and filling it
> with all-ones?

Yes, that's nice indeed. :)  But it does have some cost, in terms of
either number of VMAs or QEMU RSS since the MMCONFIG area is large.

What breaks if we return all zeroes?  Zero is not a valid vendor ID.

Paolo

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
> On 25/07/19 16:46, Michael S. Tsirkin wrote:
> > Actually, I think I have a better idea.
> > At the moment we just get an exit on these reads and return all-ones.
> > Yes, in theory there could be a UR bit set in a bunch of
> > registers but in practice no one cares about these,
> > and I don't think we implement them.
> > So how about mapping a single page, read-only, and filling it
> > with all-ones?
> 
> Yes, that's nice indeed. :)  But it does have some cost, in terms of
> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
> 
> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
> 
> Paolo

It isn't but that's not what baremetal does. So there's some risk
there ...

Why is all zeroes better? We still need to map it, right?

-- 
MST

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
> On 25/07/19 16:46, Michael S. Tsirkin wrote:
> > Actually, I think I have a better idea.
> > At the moment we just get an exit on these reads and return all-ones.
> > Yes, in theory there could be a UR bit set in a bunch of
> > registers but in practice no one cares about these,
> > and I don't think we implement them.
> > So how about mapping a single page, read-only, and filling it
> > with all-ones?
> 
> Yes, that's nice indeed. :)  But it does have some cost, in terms of
> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
> 
> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
> 
> Paolo

I think I know what you are thinking of doing:
map /dev/zero so we get a single VMA but all mapped to
a single zero pte?

We could start with that, at least as an experiment.
Further:

- we can limit the amount of fragmentation and simply
  unmap everything if we exceed a specific limit:
  with more than X devices it's no longer a lightweight
  VM anyway :)

- we can implement /dev/ones. in fact, we can implement
  /dev/byteXX for each possible value, the cost will
  be only 1M on a 4k page system.
  it might come in handy for e.g. free page hinting:
  at the moment if guest memory is poisoned
  we can not unmap it, with this trick we can
  map it to /dev/byteXX.

Note that the KVM memory array is still fragmented.
Again, we can fall back on disabling the optimization
if there are too many devices.
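
The /dev/zero experiment could be sketched in userspace roughly as
follows.  This is only an illustration under assumptions: the helper
names are made up, and it relies on my understanding that a private
mapping of /dev/zero behaves like anonymous memory on Linux, so read
faults should be served from the shared zero page.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of /dev/zero read-only with a single mmap() call. */
static void *map_all_zeroes(size_t len)
{
	void *p;
	int fd = open("/dev/zero", O_RDONLY);

	if (fd < 0)
		return MAP_FAILED;
	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);
	return p;
}

/*
 * Walk function 0 of every device on buses 0..nbuses-1 the way a guest
 * scan would, and count slots whose vendor ID looks valid.  Offsets use
 * the ECAM layout: bus << 20 | dev << 15.
 */
static unsigned count_present(const uint8_t *win, unsigned nbuses)
{
	unsigned bus, dev, found = 0;

	for (bus = 0; bus < nbuses; bus++) {
		for (dev = 0; dev < 32; dev++) {
			size_t off = ((size_t)bus << 20) | ((size_t)dev << 15);
			uint16_t vendor;

			memcpy(&vendor, win + off, sizeof(vendor));
			if (vendor != 0x0000 && vendor != 0xffff)
				found++;
		}
	}
	return found;
}
```

A scan over such a window finds nothing, because 0x0000 is rejected as a
vendor ID just like 0xffff.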


-- 
MST

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Paolo Bonzini 4 years, 9 months ago
On 25/07/19 22:30, Michael S. Tsirkin wrote:
> On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
>> On 25/07/19 16:46, Michael S. Tsirkin wrote:
>>> Actually, I think I have a better idea.
>>> At the moment we just get an exit on these reads and return all-ones.
>>> Yes, in theory there could be a UR bit set in a bunch of
>>> registers but in practice no one cares about these,
>>> and I don't think we implement them.
>>> So how about mapping a single page, read-only, and filling it
>>> with all-ones?
>>
>> Yes, that's nice indeed. :)  But it does have some cost, in terms of
>> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
>>
>> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
>>
>> Paolo
> 
> I think I know what you are thinking of doing:
> map /dev/zero so we get a single VMA but all mapped to
> a single zero pte?

Yes, exactly.  You absolutely need to share the page because the guest
could easily touch 32*256 pages just to scan function 0 on every bus and
device, even if the VM has just 4 or 5 devices and all of them on the
root complex.  And that causes fragmentation so you have to map bigger
areas.
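
The 32*256 figure follows from the MMCONFIG (ECAM) layout, where each
function has its own 4 KiB page.  A small sketch of the arithmetic, with
helper names invented for the example:

```c
#include <stddef.h>
#include <stdint.h>

#define ECAM_FN_SIZE 4096u	/* 4 KiB of config space per function */

/*
 * Offset of (bus, dev, fn) inside an MMCONFIG window, following the
 * PCIe ECAM layout: bus in bits 27:20, device in bits 19:15, function
 * in bits 14:12.
 */
static uint32_t ecam_offset(unsigned bus, unsigned dev, unsigned fn)
{
	return (bus << 20) | (dev << 15) | (fn << 12);
}

/*
 * Number of distinct 4 KiB pages touched when reading the vendor ID of
 * function 0 of every device on buses 0..nbuses-1.
 */
static size_t pages_touched_fn0(unsigned nbuses)
{
	size_t pages = 0;
	uint32_t last = UINT32_MAX;
	unsigned bus, dev;

	for (bus = 0; bus < nbuses; bus++) {
		for (dev = 0; dev < 32; dev++) {
			uint32_t page = ecam_offset(bus, dev, 0) / ECAM_FN_SIZE;

			if (page != last) {
				pages++;
				last = page;
			}
		}
	}
	return pages;
}
```

pages_touched_fn0(256) returns 8192, i.e. 32 MiB of 4 KiB pages if each
one were backed separately, out of a 256 MiB window for 256 buses.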

> - we can implement /dev/ones. in fact, we can implement
>   /dev/byteXX for each possible value, the cost will
>   be only 1M on a 4k page system.
>   it might come in handy for e.g. free page hinting:
>   at the moment if guest memory is poisoned
>   we can not unmap it, with this trick we can
>   map it to /dev/byteXX.

I also thought of /dev/ones, not sure how it would be accepted. :)  Also
you cannot map lazily on page fault, otherwise you get a vmexit and it's
slow again.  So /dev/ones needs to be written to use a huge page, possibly.

Paolo

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Fri, Jul 26, 2019 at 09:57:51AM +0200, Paolo Bonzini wrote:
> On 25/07/19 22:30, Michael S. Tsirkin wrote:
> > On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
> >> On 25/07/19 16:46, Michael S. Tsirkin wrote:
> >>> Actually, I think I have a better idea.
> >>> At the moment we just get an exit on these reads and return all-ones.
> >>> Yes, in theory there could be a UR bit set in a bunch of
> >>> registers but in practice no one cares about these,
> >>> and I don't think we implement them.
> >>> So how about mapping a single page, read-only, and filling it
> >>> with all-ones?
> >>
> >> Yes, that's nice indeed. :)  But it does have some cost, in terms of
> >> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
> >>
> >> What breaks if we return all zeroes?  Zero is not a valid vendor ID.
> >>
> >> Paolo
> > 
> > I think I know what you are thinking of doing:
> > map /dev/zero so we get a single VMA but all mapped to
> > a single zero pte?
> 
> Yes, exactly.  You absolutely need to share the page because the guest
> could easily touch 32*256 pages just to scan function 0 on every bus and
> device, even if the VM has just 4 or 5 devices and all of them on the
> root complex.  And that causes fragmentation so you have to map bigger
> areas.
> 
> > - we can implement /dev/ones. in fact, we can implement
> >   /dev/byteXX for each possible value, the cost will
> >   be only 1M on a 4k page system.
> >   it might come in handy for e.g. free page hinting:
> >   at the moment if guest memory is poisoned
> >   we can not unmap it, with this trick we can
> >   map it to /dev/byteXX.
> 
> I also thought of /dev/ones, not sure how it would be accepted. :)  Also
> you cannot map lazily on page fault, otherwise you get a vmexit and it's
> slow again.  So /dev/ones needs to be written to use a huge page, possibly.
> 
> Paolo

It's not easy to do that - each device gets 4K within MCFG.

So what we need then is a KVM option to create an address range - or
maybe even a group of address ranges - and aggressively map all pages in
the group to the same guest page on a fault of one page in the group.

-- 
MST

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Stefano Garzarella 4 years, 9 months ago
On Tue, Jul 23, 2019 at 10:47:39AM +0100, Stefan Hajnoczi wrote:
> On Tue, Jul 23, 2019 at 9:43 AM Sergio Lopez <slp@redhat.com> wrote:
> > Montes, Julio <julio.montes@intel.com> writes:
> >
> > > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
> > >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote:
> > >> > Stefan Hajnoczi <stefanha@gmail.com> writes:
> > >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> > >> > > > Stefan Hajnoczi <stefanha@gmail.com> writes:
> > >> > > >
> > >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> > >> > > >  --------------
> > >> > > >  | Conclusion |
> > >> > > >  --------------
> > >> > > >
> > >> > > > The average boot time of microvm is a third of Q35's (115ms vs.
> > >> > > > 363ms),
> > >> > > > and is smaller on all sections (QEMU initialization, firmware
> > >> > > > overhead
> > >> > > > and kernel start-to-user).
> > >> > > >
> > >> > > > Microvm's memory tree is also visibly simpler, significantly
> > >> > > > reducing
> > >> > > > the exposed surface to the guest.
> > >> > > >
> > >> > > > While we can certainly work on making Q35 smaller, I definitely
> > >> > > > think
> > >> > > > it's better (and way safer!) having a specialized machine type
> > >> > > > for a
> > >> > > > specific use case, than a minimal Q35 whose behavior
> > >> > > > significantly
> > >> > > > diverges from a conventional Q35.
> > >> > >
> > >> > > Interesting, so not a 10x difference!  This might be amenable to
> > >> > > optimization.
> > >> > >
> > >> > > My concern with microvm is that it's so limited that few users
> > >> > > will be
> > >> > > able to benefit from the reduced attack surface and faster
> > >> > > startup time.
> > >> > > I think it's worth investigating slimming down Q35 further first.
> > >> > >
> > >> > > In terms of startup time the first step would be profiling Q35
> > >> > > kernel
> > >> > > startup to find out what's taking so long (firmware
> > >> > > initialization, PCI
> > >> > > probing, etc)?
> > >> >
> > >> > Some findings:
> > >> >
> > >> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
> > >> > saves a
> > >> >     whopping 120ms by avoiding the APIC timer calibration at
> > >> >     arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> > >> >
> > >> > Average boot time with "-cpu host"
> > >> >  qemu_init_end: 76.408950
> > >> >  linux_start_kernel: 116.166142 (+39.757192)
> > >> >  linux_start_user: 242.954347 (+126.788205)
> > >> >
> > >> > Average boot time with default "cpu"
> > >> >  qemu_init_end: 77.467852
> > >> >  linux_start_kernel: 116.688472 (+39.22062)
> > >> >  linux_start_user: 363.033365 (+246.344893)
> > >>
> > >> \o/
> > >>
> > >> >  2. The other 130ms are a direct result of PCI and ACPI presence
> > >> > (tested
> > >> >     with a kernel without support for those elements). I'll publish
> > >> > some
> > >> >     detailed numbers next week.
> > >>
> > >> Here are the Kata Containers kernel parameters:
> > >>
> > >> var kernelParams = []Param{
> > >>         {"tsc", "reliable"},
> > >>         {"no_timer_check", ""},
> > >>         {"rcupdate.rcu_expedited", "1"},
> > >>         {"i8042.direct", "1"},
> > >>         {"i8042.dumbkbd", "1"},
> > >>         {"i8042.nopnp", "1"},
> > >>         {"i8042.noaux", "1"},
> > >>         {"noreplace-smp", ""},
> > >>         {"reboot", "k"},
> > >>         {"console", "hvc0"},
> > >>         {"console", "hvc1"},
> > >>         {"iommu", "off"},
> > >>         {"cryptomgr.notests", ""},
> > >>         {"net.ifnames", "0"},
> > >>         {"pci", "lastbus=0"},
> > >> }
> > >>
> > >> pci lastbus=0 looks interesting and so do some of the others :).
> > >>
> > >
> > > yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
> > > kernel won't scan the 255.. buses :)
> >
> > I can confirm that adding pci=lastbus=0 makes a significant
> > improvement. In fact, it is the only option from Kata's kernel parameter
> > list that has an impact, probably because the kernel is already quite
> > minimalistic.
> >
> > Average boot time with "-cpu host" and "pci=lastbus=0"
> >  qemu_init_end: 73.711569
> >  linux_start_kernel: 113.414311 (+39.702742)
> >  linux_start_user: 190.949939 (+77.535628)
> >
> > That's still ~40% slower than microvm, and the gap quickly widens
> > when adding more PCI devices (each one adds 10-15ms), but it's certainly
> > an improvement over the original numbers.
> >
> > On the other hand, there isn't much we can do here from QEMU's
> > perspective, as this is basically Guest OS tuning.
> 
> fw_cfg could expose this information so guest kernels know when to
> stop enumerating the PCI bus.  This would make all PCI guests with new
> kernels boot ~50 ms faster, regardless of machine type.
> 
> The difference between microvm and tuned Q35 is 76 ms now.
> 
> microvm:
> qemu_init_end: 64.043264
> linux_start_kernel: 65.481782 (+1.438518)
> linux_start_user: 114.938353 (+49.456571)
> 
> Q35 with -cpu host and pci=lastbus=0:
> qemu_init_end: 73.711569
> linux_start_kernel: 113.414311 (+39.702742)
> linux_start_user: 190.949939 (+77.535628)
> 
> There is a ~39 ms difference before linux_start_kernel.  SeaBIOS is
> loading the PVH Option ROM.
> 
> Stefano: any recommendations for profiling or tuning SeaBIOS?

As I said on IRC, the SeaBIOS image in QEMU is version 1.12.1, which doesn't
include this patch (available in upstream SeaBIOS) that saves ~10ms:

    commit 75b42835134553c96f113e5014072c0caf99d092
    Author: Stefano Garzarella <sgarzare@redhat.com>
    Date:   Sun Dec 2 14:10:13 2018 +0100

        qemu: avoid debug prints if debugcon is not enabled

        In order to speed up the boot phase, we can check the QEMU
        debugcon device, and disable the writes if it is not recognized.

        This patch allow us to save around 10 msec (time measured
        between SeaBIOS entry point and "linuxboot" entry point)
        when CONFIG_DEBUG_LEVEL=1 and debugcon is not enabled.

        Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
        Signed-off-by: Kevin O'Connor <kevin@koconnor.net>

As you said, we should update SeaBIOS for the next QEMU release.
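
The idea behind that commit can be sketched as below.  This is not the
SeaBIOS code: the port-I/O hooks and function names are hypothetical so
that the example runs in userspace, and the port number (0x402) and
readback value (0xe9) are from my recollection of the QEMU debugcon
device, so treat them as assumptions.

```c
#include <stdint.h>

#define DEBUGCON_PORT     0x402	/* assumed default debug port */
#define DEBUGCON_READBACK 0xe9	/* assumed value debugcon returns on read */

/* Hypothetical port-I/O hooks so the sketch runs outside firmware. */
struct portio {
	uint8_t (*inb)(uint16_t port);
	void (*outb)(uint8_t val, uint16_t port);
};

static int debugcon_enabled;
static unsigned long port_writes;

/* Probe once at init; disable output if the device is not recognized. */
static void debug_preinit(const struct portio *io)
{
	debugcon_enabled = io->inb(DEBUGCON_PORT) == DEBUGCON_READBACK;
}

/* Emit one character only when the device was detected. */
static void debug_putc(const struct portio *io, char c)
{
	if (debugcon_enabled)
		io->outb((uint8_t)c, DEBUGCON_PORT);
}

/* Fake devices for demonstration. */
static uint8_t inb_present(uint16_t port)
{
	(void)port;
	return DEBUGCON_READBACK;
}

static uint8_t inb_absent(uint16_t port)
{
	(void)port;
	return 0xff;
}

static void outb_count(uint8_t val, uint16_t port)
{
	(void)val;
	(void)port;
	port_writes++;	/* each real write would trap to the hypervisor */
}
```

With inb_absent() the probe fails and debug_putc() performs no port
writes at all, which is where the saving comes from when debugcon is not
configured.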

For profiling, I have some patches that I used to put trace points in
the SeaBIOS code. I'll put them in this repository ASAP:
    https://github.com/stefano-garzarella/qemu-boot-time

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Stefano Garzarella 4 years, 9 months ago

On Tue, Jul 23, 2019 at 1:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote:
>
> On Tue, Jul 23, 2019 at 10:47:39AM +0100, Stefan Hajnoczi wrote:
> > On Tue, Jul 23, 2019 at 9:43 AM Sergio Lopez <slp@redhat.com> wrote:
> > > Montes, Julio <julio.montes@intel.com> writes:
> > >
> > > > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote:
> > > >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote:
> > > >> > Stefan Hajnoczi <stefanha@gmail.com> writes:
> > > >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
> > > >> > > > Stefan Hajnoczi <stefanha@gmail.com> writes:
> > > >> > > >
> > > >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> > > >> > > >  --------------
> > > >> > > >  | Conclusion |
> > > >> > > >  --------------
> > > >> > > >
> > > >> > > > The average boot time of microvm is a third of Q35's (115ms vs.
> > > >> > > > 363ms),
> > > >> > > > and is smaller on all sections (QEMU initialization, firmware
> > > >> > > > overhead
> > > >> > > > and kernel start-to-user).
> > > >> > > >
> > > >> > > > Microvm's memory tree is also visibly simpler, significantly
> > > >> > > > reducing
> > > >> > > > the exposed surface to the guest.
> > > >> > > >
> > > >> > > > While we can certainly work on making Q35 smaller, I definitely
> > > >> > > > think
> > > >> > > > it's better (and way safer!) having a specialized machine type
> > > >> > > > for a
> > > >> > > > specific use case, than a minimal Q35 whose behavior
> > > >> > > > significantly
> > > >> > > > diverges from a conventional Q35.
> > > >> > >
> > > >> > > Interesting, so not a 10x difference!  This might be amenable to
> > > >> > > optimization.
> > > >> > >
> > > >> > > My concern with microvm is that it's so limited that few users
> > > >> > > will be
> > > >> > > able to benefit from the reduced attack surface and faster
> > > >> > > startup time.
> > > >> > > I think it's worth investigating slimming down Q35 further first.
> > > >> > >
> > > >> > > In terms of startup time the first step would be profiling Q35
> > > >> > > kernel
> > > >> > > startup to find out what's taking so long (firmware
> > > >> > > initialization, PCI
> > > >> > > probing, etc)?
> > > >> >
> > > >> > Some findings:
> > > >> >
> > > >> >  1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host")
> > > >> > saves a
> > > >> >     whopping 120ms by avoiding the APIC timer calibration at
> > > >> >     arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> > > >> >
> > > >> > Average boot time with "-cpu host"
> > > >> >  qemu_init_end: 76.408950
> > > >> >  linux_start_kernel: 116.166142 (+39.757192)
> > > >> >  linux_start_user: 242.954347 (+126.788205)
> > > >> >
> > > >> > Average boot time with default "cpu"
> > > >> >  qemu_init_end: 77.467852
> > > >> >  linux_start_kernel: 116.688472 (+39.22062)
> > > >> >  linux_start_user: 363.033365 (+246.344893)
> > > >>
> > > >> \o/
> > > >>
> > > >> >  2. The other 130ms are a direct result of PCI and ACPI presence (tested
> > > >> >     with a kernel without support for those elements). I'll publish some
> > > >> >     detailed numbers next week.
> > > >>
> > > >> Here are the Kata Containers kernel parameters:
> > > >>
> > > >> var kernelParams = []Param{
> > > >>         {"tsc", "reliable"},
> > > >>         {"no_timer_check", ""},
> > > >>         {"rcupdate.rcu_expedited", "1"},
> > > >>         {"i8042.direct", "1"},
> > > >>         {"i8042.dumbkbd", "1"},
> > > >>         {"i8042.nopnp", "1"},
> > > >>         {"i8042.noaux", "1"},
> > > >>         {"noreplace-smp", ""},
> > > >>         {"reboot", "k"},
> > > >>         {"console", "hvc0"},
> > > >>         {"console", "hvc1"},
> > > >>         {"iommu", "off"},
> > > >>         {"cryptomgr.notests", ""},
> > > >>         {"net.ifnames", "0"},
> > > >>         {"pci", "lastbus=0"},
> > > >> }
> > > >>
> > > >> pci=lastbus=0 looks interesting and so do some of the others :).
> > > >>
> > > >
> > > > yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
> > > > kernel won't scan the 255.. buses :)
> > >
> > > I can confirm that adding pci=lastbus=0 makes a significant
> > > improvement. In fact, it is the only option from Kata's kernel parameter
> > > list that has an impact, probably because the kernel is already quite
> > > minimalistic.
> > >
> > > Average boot time with "-cpu host" and "pci=lastbus=0"
> > >  qemu_init_end: 73.711569
> > >  linux_start_kernel: 113.414311 (+39.702742)
> > >  linux_start_user: 190.949939 (+77.535628)
> > >
> > > That's still ~40% slower than microvm, and the gap quickly widens
> > > when adding more PCI devices (each one adds 10-15ms), but it's certainly
> > > an improvement over the original numbers.
> > >
> > > On the other hand, there isn't much we can do here from QEMU's
> > > perspective, as this is basically Guest OS tuning.
> >
> > fw_cfg could expose this information so guest kernels know when to
> > stop enumerating the PCI bus.  This would make all PCI guests with new
> > kernels boot ~50 ms faster, regardless of machine type.
> >
> > The difference between microvm and tuned Q35 is 76 ms now.
> >
> > microvm:
> > qemu_init_end: 64.043264
> > linux_start_kernel: 65.481782 (+1.438518)
> > linux_start_user: 114.938353 (+49.456571)
> >
> > Q35 with -cpu host and pci=lastbus=0:
> > qemu_init_end: 73.711569
> > linux_start_kernel: 113.414311 (+39.702742)
> > linux_start_user: 190.949939 (+77.535628)
> >
> > There is a ~39 ms difference before linux_start_kernel.  SeaBIOS is
> > loading the PVH Option ROM.
> >
> > Stefano: any recommendations for profiling or tuning SeaBIOS?
>
> As I said on IRC, the SeaBIOS image shipped with QEMU is version 1.12.1,
> which doesn't include this patch (available in upstream SeaBIOS) that
> saves ~10ms:
>
>     commit 75b42835134553c96f113e5014072c0caf99d092
>     Author: Stefano Garzarella <sgarzare@redhat.com>
>     Date:   Sun Dec 2 14:10:13 2018 +0100
>
>         qemu: avoid debug prints if debugcon is not enabled
>
>         In order to speed up the boot phase, we can check the QEMU
>         debugcon device, and disable the writes if it is not recognized.
>
>         This patch allow us to save around 10 msec (time measured
>         between SeaBIOS entry point and "linuxboot" entry point)
>         when CONFIG_DEBUG_LEVEL=1 and debugcon is not enabled.
>
>         Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
>         Signed-off-by: Kevin O'Connor <kevin@koconnor.net>
>
> As you said, we should update SeaBIOS for the next QEMU release.
>
> For profiling, I have some patches that I used to put trace points in
> the SeaBIOS code. I'll put them in this repository ASAP:
>     https://github.com/stefano-garzarella/qemu-boot-time

I pushed the QEMU (optionrom) and SeaBIOS patches to:
https://github.com/stefano-garzarella/qemu-boot-time
They can be useful for profiling.

Cheers,
Stefano
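
The per-stage figures quoted in this thread can be cross-checked with a few
lines of arithmetic. The following is only a sketch: the averages are copied
from the measurements above, while the dictionary keys, the stage labels
("qemu_init", "firmware", "kernel") and the helper names stage_durations and
gap_ms are made up for illustration and are not part of QEMU or of the
qemu-boot-time repository.

```python
# Average trace timestamps in milliseconds, copied from the thread:
# (qemu_init_end, linux_start_kernel, linux_start_user).
# The configuration names used as keys are illustrative labels only.
BOOT_TIMES_MS = {
    "microvm": (64.043264, 65.481782, 114.938353),
    "q35 default cpu": (77.467852, 116.688472, 363.033365),
    "q35 -cpu host": (76.408950, 116.166142, 242.954347),
    "q35 -cpu host pci=lastbus=0": (73.711569, 113.414311, 190.949939),
}


def stage_durations(trace):
    """Split cumulative timestamps into per-stage durations (ms)."""
    init_end, start_kernel, start_user = trace
    return {
        "qemu_init": init_end,                 # process start -> qemu_init_end
        "firmware": start_kernel - init_end,   # qemu_init_end -> linux_start_kernel
        "kernel": start_user - start_kernel,   # linux_start_kernel -> linux_start_user
    }


def gap_ms(slower, faster):
    """Difference in total time to linux_start_user between two configurations."""
    return BOOT_TIMES_MS[slower][2] - BOOT_TIMES_MS[faster][2]


if __name__ == "__main__":
    for name, trace in BOOT_TIMES_MS.items():
        stages = stage_durations(trace)
        summary = ", ".join(f"{k}={v:.1f}ms" for k, v in stages.items())
        print(f"{name}: total={trace[2]:.1f}ms ({summary})")

    tuned = "q35 -cpu host pci=lastbus=0"
    gap = gap_ms(tuned, "microvm")
    print(f"tuned Q35 vs microvm: {gap:.1f}ms")
    # Express the same gap relative to each side.
    print(f"  relative to microvm:   {gap / BOOT_TIMES_MS['microvm'][2]:.0%}")
    print(f"  relative to tuned Q35: {gap / BOOT_TIMES_MS[tuned][2]:.0%}")
```

With these inputs the gap between tuned Q35 and microvm comes out at about
76 ms, which is roughly 66% of microvm's total and roughly 40% of tuned
Q35's total, so the percentage depends on which configuration is taken as
the baseline.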

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by no-reply@patchew.org 4 years, 10 months ago
Patchew URL: https://patchew.org/QEMU/20190702121106.28374-1-slp@redhat.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

PASS 2 fdc-test /x86_64/fdc/no_media_on_start
PASS 3 fdc-test /x86_64/fdc/read_without_media
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/check-qlit -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="check-qlit" 
==7808==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 fdc-test /x86_64/fdc/media_change
PASS 5 fdc-test /x86_64/fdc/sense_interrupt
PASS 6 fdc-test /x86_64/fdc/relative_seek
---
PASS 32 test-opts-visitor /visitor/opts/range/beyond
PASS 33 test-opts-visitor /visitor/opts/dict/unvisited
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-coroutine -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-coroutine" 
==7851==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==7851==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc0ad0a000; bottom 0x7fa44def8000; size: 0x0057bce12000 (376831025152)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-coroutine /basic/no-dangling-access
---
PASS 11 test-aio /aio/event/wait
PASS 12 test-aio /aio/event/flush
PASS 13 test-aio /aio/event/wait/no-flush-cb
==7866==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 14 test-aio /aio/timer/schedule
PASS 15 test-aio /aio/coroutine/queue-chaining
PASS 16 test-aio /aio-gsource/flush
---
PASS 28 test-aio /aio-gsource/timer/schedule
PASS 13 fdc-test /x86_64/fdc/fuzz-registers
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-aio-multithread -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-aio-multithread" 
==7873==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-aio-multithread /aio/multi/lifecycle
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/ide-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="ide-test" 
==7890==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 test-aio-multithread /aio/multi/schedule
PASS 1 ide-test /x86_64/ide/identify
PASS 3 test-aio-multithread /aio/multi/mutex/contended
==7901==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 ide-test /x86_64/ide/flush
==7912==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 ide-test /x86_64/ide/bmdma/simple_rw
==7918==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 test-aio-multithread /aio/multi/mutex/handoff
PASS 4 ide-test /x86_64/ide/bmdma/trim
==7929==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 test-aio-multithread /aio/multi/mutex/mcs
PASS 5 ide-test /x86_64/ide/bmdma/short_prdt
==7940==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 test-aio-multithread /aio/multi/mutex/pthread
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-throttle -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-throttle" 
PASS 6 ide-test /x86_64/ide/bmdma/one_sector_short_prdt
==7948==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-throttle /throttle/leak_bucket
PASS 2 test-throttle /throttle/compute_wait
PASS 3 test-throttle /throttle/init
---
PASS 14 test-throttle /throttle/config/max
PASS 15 test-throttle /throttle/config/iops_size
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-thread-pool -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-thread-pool" 
==7951==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==7955==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-thread-pool /thread-pool/submit
PASS 2 test-thread-pool /thread-pool/submit-aio
PASS 3 test-thread-pool /thread-pool/submit-co
PASS 4 test-thread-pool /thread-pool/submit-many
PASS 7 ide-test /x86_64/ide/bmdma/long_prdt
==8027==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8027==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd45d06000; bottom 0x7f83e57fe000; size: 0x007960508000 (521306931200)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 8 ide-test /x86_64/ide/bmdma/no_busmaster
PASS 5 test-thread-pool /thread-pool/cancel
PASS 9 ide-test /x86_64/ide/flush/nodev
==8038==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 ide-test /x86_64/ide/flush/empty_drive
PASS 6 test-thread-pool /thread-pool/cancel-async
==8043==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-hbitmap -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-hbitmap" 
PASS 1 test-hbitmap /hbitmap/granularity
PASS 2 test-hbitmap /hbitmap/size/0
---
PASS 4 test-hbitmap /hbitmap/iter/empty
PASS 11 ide-test /x86_64/ide/flush/retry_pci
PASS 5 test-hbitmap /hbitmap/iter/partial
==8054==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 test-hbitmap /hbitmap/iter/granularity
PASS 7 test-hbitmap /hbitmap/iter/iter_and_reset
PASS 8 test-hbitmap /hbitmap/get/all
---
PASS 14 test-hbitmap /hbitmap/set/twice
PASS 15 test-hbitmap /hbitmap/set/overlap
PASS 16 test-hbitmap /hbitmap/reset/empty
==8060==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 17 test-hbitmap /hbitmap/reset/general
PASS 13 ide-test /x86_64/ide/cdrom/pio
PASS 18 test-hbitmap /hbitmap/reset/all
---
PASS 28 test-hbitmap /hbitmap/truncate/shrink/medium
PASS 29 test-hbitmap /hbitmap/truncate/shrink/large
PASS 30 test-hbitmap /hbitmap/meta/zero
==8066==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 14 ide-test /x86_64/ide/cdrom/pio_large
==8072==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 15 ide-test /x86_64/ide/cdrom/dma
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/ahci-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="ahci-test" 
==8086==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 31 test-hbitmap /hbitmap/meta/one
PASS 32 test-hbitmap /hbitmap/meta/byte
PASS 33 test-hbitmap /hbitmap/meta/word
PASS 1 ahci-test /x86_64/ahci/sanity
==8092==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 ahci-test /x86_64/ahci/pci_spec
PASS 34 test-hbitmap /hbitmap/meta/sector
PASS 35 test-hbitmap /hbitmap/serialize/align
==8098==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 ahci-test /x86_64/ahci/pci_enable
==8104==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 36 test-hbitmap /hbitmap/serialize/basic
PASS 37 test-hbitmap /hbitmap/serialize/part
PASS 38 test-hbitmap /hbitmap/serialize/zeroes
---
PASS 4 ahci-test /x86_64/ahci/hba_spec
PASS 43 test-hbitmap /hbitmap/next_dirty_area/next_dirty_area_4
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bdrv-drain -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bdrv-drain" 
==8113==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-bdrv-drain /bdrv-drain/nested
PASS 2 test-bdrv-drain /bdrv-drain/multiparent
PASS 3 test-bdrv-drain /bdrv-drain/set_aio_context
---
PASS 20 test-bdrv-drain /bdrv-drain/iothread/drain_subtree
PASS 21 test-bdrv-drain /bdrv-drain/blockjob/drain_all
PASS 22 test-bdrv-drain /bdrv-drain/blockjob/drain
==8110==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 23 test-bdrv-drain /bdrv-drain/blockjob/drain_subtree
PASS 24 test-bdrv-drain /bdrv-drain/blockjob/error/drain_all
PASS 25 test-bdrv-drain /bdrv-drain/blockjob/error/drain
---
PASS 39 test-bdrv-drain /bdrv-drain/attach/drain
PASS 5 ahci-test /x86_64/ahci/hba_enable
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bdrv-graph-mod -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bdrv-graph-mod" 
==8159==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-bdrv-graph-mod /bdrv-graph-mod/update-perm-tree
PASS 2 test-bdrv-graph-mod /bdrv-graph-mod/should-update-child
==8157==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-blockjob -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-blockjob" 
==8168==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-blockjob /blockjob/ids
PASS 2 test-blockjob /blockjob/cancel/created
PASS 3 test-blockjob /blockjob/cancel/running
---
PASS 8 test-blockjob /blockjob/cancel/concluded
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-blockjob-txn -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-blockjob-txn" 
PASS 6 ahci-test /x86_64/ahci/identify
==8174==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-blockjob-txn /single/success
PASS 2 test-blockjob-txn /single/failure
PASS 3 test-blockjob-txn /single/cancel
---
PASS 6 test-blockjob-txn /pair/cancel
PASS 7 test-blockjob-txn /pair/fail-cancel-race
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-block-backend -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-block-backend" 
==8176==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8181==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-block-backend /block-backend/drain_aio_error
PASS 2 test-block-backend /block-backend/drain_all_aio_error
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-block-iothread -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-block-iothread" 
PASS 7 ahci-test /x86_64/ahci/max
==8190==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-block-iothread /sync-op/pread
PASS 2 test-block-iothread /sync-op/pwrite
PASS 3 test-block-iothread /sync-op/load_vmstate
---
PASS 15 test-block-iothread /propagate/diamond
PASS 16 test-block-iothread /propagate/mirror
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-image-locking -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-image-locking" 
==8192==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8212==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-image-locking /image-locking/basic
PASS 2 test-image-locking /image-locking/set-perm-abort
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-x86-cpuid -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-x86-cpuid" 
---
PASS 4 test-xbzrle /xbzrle/encode_decode_1_byte
PASS 5 test-xbzrle /xbzrle/encode_decode_overflow
PASS 8 ahci-test /x86_64/ahci/reset
==8228==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8228==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc98ab7000; bottom 0x7f6a659fe000; size: 0x0092330b9000 (627921620992)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 6 test-xbzrle /xbzrle/encode_decode
---
PASS 133 test-cutils /cutils/strtosz/erange
PASS 134 test-cutils /cutils/strtosz/metric
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-shift128 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-shift128" 
==8240==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-shift128 /host-utils/test_lshift
PASS 2 test-shift128 /host-utils/test_rshift
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-mul64 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-mul64" 
==8240==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd869e8000; bottom 0x7f71117fe000; size: 0x008c751ea000 (603260362752)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-mul64 /host-utils/mulu64
---
PASS 9 test-int128 /int128/int128_gt
PASS 10 test-int128 /int128/int128_rshift
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/rcutorture -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="rcutorture" 
==8262==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8262==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fffd5dde000; bottom 0x7f7850bfe000; size: 0x0087851e0000 (582053920768)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 rcutorture /rcu/torture/1reader
PASS 11 ahci-test /x86_64/ahci/io/pio/lba28/simple/high
==8295==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 rcutorture /rcu/torture/10readers
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-rcu-list -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-list" 
==8295==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe6e62e000; bottom 0x7f1b1fbfe000; size: 0x00e34ea30000 (976276881408)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 12 ahci-test /x86_64/ahci/io/pio/lba28/double/zero
==8308==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-rcu-list /rcu/qlist/single-threaded
==8308==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc54a9f000; bottom 0x7f5c1bdfe000; size: 0x00a038ca1000 (688147533824)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 2 test-rcu-list /rcu/qlist/short-few
PASS 13 ahci-test /x86_64/ahci/io/pio/lba28/double/low
==8341==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8341==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc4b8b5000; bottom 0x7f782c7fe000; size: 0x00841f0b7000 (567456526336)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 14 ahci-test /x86_64/ahci/io/pio/lba28/double/high
==8347==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8347==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffeb2bc8000; bottom 0x7fd572124000; size: 0x002940aa4000 (177178558464)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 3 test-rcu-list /rcu/qlist/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-rcu-simpleq -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-simpleq" 
PASS 15 ahci-test /x86_64/ahci/io/pio/lba28/long/zero
PASS 1 test-rcu-simpleq /rcu/qsimpleq/single-threaded
==8360==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8360==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc5ebf2000; bottom 0x7f8d6cdfe000; size: 0x006ef1df4000 (476504342528)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 2 test-rcu-simpleq /rcu/qsimpleq/short-few
PASS 16 ahci-test /x86_64/ahci/io/pio/lba28/long/low
==8393==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8393==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc1e90d000; bottom 0x7fef47124000; size: 0x000cd77e9000 (55155003392)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 17 ahci-test /x86_64/ahci/io/pio/lba28/long/high
PASS 3 test-rcu-simpleq /rcu/qsimpleq/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-rcu-tailq -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-tailq" 
==8399==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 18 ahci-test /x86_64/ahci/io/pio/lba28/short/zero
PASS 1 test-rcu-tailq /rcu/qtailq/single-threaded
==8412==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 test-rcu-tailq /rcu/qtailq/short-few
PASS 19 ahci-test /x86_64/ahci/io/pio/lba28/short/low
==8445==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 20 ahci-test /x86_64/ahci/io/pio/lba28/short/high
PASS 3 test-rcu-tailq /rcu/qtailq/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qdist -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qdist" 
==8451==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-qdist /qdist/none
PASS 2 test-qdist /qdist/pr
PASS 3 test-qdist /qdist/single/empty
---
PASS 7 test-qdist /qdist/binning/expand
PASS 8 test-qdist /qdist/binning/shrink
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qht -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qht" 
==8451==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffcd1fb1000; bottom 0x7f8bae7fe000; size: 0x0071237b3000 (485926580224)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 21 ahci-test /x86_64/ahci/io/pio/lba48/simple/zero
==8466==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8466==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd0b06f000; bottom 0x7fd8d85fe000; size: 0x002432a71000 (155468632064)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 22 ahci-test /x86_64/ahci/io/pio/lba48/simple/low
==8472==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8472==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe2c664000; bottom 0x7f11299fe000; size: 0x00ed02c66000 (1017953804288)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 23 ahci-test /x86_64/ahci/io/pio/lba48/simple/high
==8478==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8478==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffdb1ded000; bottom 0x7f37fd1fe000; size: 0x00c5b4bef000 (849140969472)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 24 ahci-test /x86_64/ahci/io/pio/lba48/double/zero
==8484==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8484==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc4f4ff000; bottom 0x7ff9595fe000; size: 0x0002f5f01000 (12716085248)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 25 ahci-test /x86_64/ahci/io/pio/lba48/double/low
==8490==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8490==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffdb07bb000; bottom 0x7ffbc8dfe000; size: 0x0001e79bd000 (8180715520)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 26 ahci-test /x86_64/ahci/io/pio/lba48/double/high
==8496==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8496==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff207e2000; bottom 0x7fb6ffdfe000; size: 0x0048209e4000 (309784887296)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 27 ahci-test /x86_64/ahci/io/pio/lba48/long/zero
==8502==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8502==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc7d92d000; bottom 0x7f0b65b7c000; size: 0x00f117db1000 (1035487350784)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 28 ahci-test /x86_64/ahci/io/pio/lba48/long/low
==8508==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8508==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe6de73000; bottom 0x7fc79a9fe000; size: 0x0036d3475000 (235472900096)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 29 ahci-test /x86_64/ahci/io/pio/lba48/long/high
==8514==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 30 ahci-test /x86_64/ahci/io/pio/lba48/short/zero
==8520==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-qht /qht/mode/default
PASS 31 ahci-test /x86_64/ahci/io/pio/lba48/short/low
PASS 2 test-qht /qht/mode/resize
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qht-par -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qht-par" 
==8526==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 32 ahci-test /x86_64/ahci/io/pio/lba48/short/high
PASS 1 test-qht-par /qht/parallel/2threads-0%updates-1s
==8542==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 33 ahci-test /x86_64/ahci/io/dma/lba28/fragmented
PASS 2 test-qht-par /qht/parallel/2threads-20%updates-1s
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bitops -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bitops" 
==8555==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-bitops /bitops/sextract32
PASS 2 test-bitops /bitops/sextract64
PASS 3 test-bitops /bitops/half_shuffle32
---
PASS 1 check-qom-interface /qom/interface/direct_impl
PASS 2 check-qom-interface /qom/interface/intermediate_impl
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/check-qom-proplist -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="check-qom-proplist" 
==8580==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 check-qom-proplist /qom/proplist/createlist
PASS 2 check-qom-proplist /qom/proplist/createv
PASS 3 check-qom-proplist /qom/proplist/createcmdline
---
PASS 4 test-write-threshold /write-threshold/not-trigger
PASS 5 test-write-threshold /write-threshold/trigger
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-hash -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-hash" 
==8607==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-hash /crypto/hash/iov
PASS 2 test-crypto-hash /crypto/hash/alloc
PASS 3 test-crypto-hash /crypto/hash/prealloc
---
PASS 15 test-crypto-secret /crypto/secret/crypt/missingiv
PASS 16 test-crypto-secret /crypto/secret/crypt/badiv
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-tlscredsx509 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-tlscredsx509" 
==8630==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 37 ahci-test /x86_64/ahci/io/dma/lba28/simple/high
PASS 1 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/perfectserver
PASS 2 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/perfectclient
PASS 3 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca1
==8645==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca2
PASS 38 ahci-test /x86_64/ahci/io/dma/lba28/double/zero
PASS 5 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca3
PASS 6 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca1
PASS 7 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca2
PASS 8 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca3
==8651==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver1
PASS 39 ahci-test /x86_64/ahci/io/dma/lba28/double/low
==8657==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 40 ahci-test /x86_64/ahci/io/dma/lba28/double/high
PASS 10 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver2
==8663==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver3
PASS 41 ahci-test /x86_64/ahci/io/dma/lba28/long/zero
PASS 12 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver4
==8669==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 42 ahci-test /x86_64/ahci/io/dma/lba28/long/low
PASS 13 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver5
PASS 14 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver6
---
PASS 32 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/inactive1
PASS 33 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/inactive2
PASS 34 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/inactive3
==8675==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 35 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/chain1
PASS 36 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/chain2
PASS 37 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/missingca
---
PASS 39 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/missingclient
PASS 43 ahci-test /x86_64/ahci/io/dma/lba28/long/high
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-tlssession -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-tlssession" 
==8682==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-tlssession /qcrypto/tlssession/psk
PASS 44 ahci-test /x86_64/ahci/io/dma/lba28/short/zero
PASS 2 test-crypto-tlssession /qcrypto/tlssession/basicca
==8692==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 test-crypto-tlssession /qcrypto/tlssession/differentca
PASS 45 ahci-test /x86_64/ahci/io/dma/lba28/short/low
==8698==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 test-crypto-tlssession /qcrypto/tlssession/altname1
PASS 46 ahci-test /x86_64/ahci/io/dma/lba28/short/high
==8704==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 test-crypto-tlssession /qcrypto/tlssession/altname2
PASS 47 ahci-test /x86_64/ahci/io/dma/lba48/simple/zero
PASS 6 test-crypto-tlssession /qcrypto/tlssession/altname3
PASS 7 test-crypto-tlssession /qcrypto/tlssession/altname4
==8710==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 test-crypto-tlssession /qcrypto/tlssession/altname5
PASS 48 ahci-test /x86_64/ahci/io/dma/lba48/simple/low
PASS 9 test-crypto-tlssession /qcrypto/tlssession/altname6
==8716==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 49 ahci-test /x86_64/ahci/io/dma/lba48/simple/high
PASS 10 test-crypto-tlssession /qcrypto/tlssession/wildcard1
==8722==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-crypto-tlssession /qcrypto/tlssession/wildcard2
PASS 12 test-crypto-tlssession /qcrypto/tlssession/wildcard3
PASS 50 ahci-test /x86_64/ahci/io/dma/lba48/double/zero
==8729==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 51 ahci-test /x86_64/ahci/io/dma/lba48/double/low
==8735==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 13 test-crypto-tlssession /qcrypto/tlssession/wildcard4
PASS 52 ahci-test /x86_64/ahci/io/dma/lba48/double/high
==8741==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 14 test-crypto-tlssession /qcrypto/tlssession/wildcard5
PASS 15 test-crypto-tlssession /qcrypto/tlssession/wildcard6
PASS 16 test-crypto-tlssession /qcrypto/tlssession/cachain
PASS 53 ahci-test /x86_64/ahci/io/dma/lba48/long/zero
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-qga -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qga" 
==8748==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-qga /qga/sync-delimited
PASS 2 test-qga /qga/sync
PASS 3 test-qga /qga/ping
---
PASS 16 test-qga /qga/invalid-args
PASS 17 test-qga /qga/fsfreeze-status
PASS 54 ahci-test /x86_64/ahci/io/dma/lba48/long/low
==8760==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 18 test-qga /qga/blacklist
PASS 19 test-qga /qga/config
PASS 20 test-qga /qga/guest-exec
PASS 21 test-qga /qga/guest-exec-invalid
PASS 55 ahci-test /x86_64/ahci/io/dma/lba48/long/high
==8773==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 22 test-qga /qga/guest-get-osinfo
PASS 23 test-qga /qga/guest-get-host-name
PASS 24 test-qga /qga/guest-get-timezone
---
PASS 56 ahci-test /x86_64/ahci/io/dma/lba48/short/zero
PASS 1 test-util-filemonitor /util/filemonitor
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-util-sockets -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-util-sockets" 
==8790==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-util-sockets /util/socket/is-socket/bad
PASS 2 test-util-sockets /util/socket/is-socket/good
PASS 3 test-util-sockets /socket/fd-pass/name/good
---
PASS 4 test-authz-listfile /auth/list/explicit/deny
PASS 5 test-authz-listfile /auth/list/explicit/allow
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-io-task -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-io-task" 
==8818==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-io-task /crypto/task/complete
PASS 2 test-io-task /crypto/task/datafree
PASS 3 test-io-task /crypto/task/failure
---
PASS 5 test-io-channel-file /io/channel/pipe/async
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-io-channel-tls -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-io-channel-tls" 
PASS 58 ahci-test /x86_64/ahci/io/dma/lba48/short/high
==8885==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-io-channel-tls /qio/channel/tls/basic
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-io-channel-command -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-io-channel-command" 
PASS 1 test-io-channel-command /io/channel/command/fifo/sync
---
PASS 17 test-crypto-pbkdf /crypto/pbkdf/nonrfc/sha384/iter1200
PASS 18 test-crypto-pbkdf /crypto/pbkdf/nonrfc/ripemd160/iter1200
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-crypto-ivgen -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-ivgen" 
==8906==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-ivgen /crypto/ivgen/plain/1
PASS 2 test-crypto-ivgen /crypto/ivgen/plain/1f2e3d4c
PASS 3 test-crypto-ivgen /crypto/ivgen/plain/1f2e3d4c5b6a7988
---
PASS 1 test-logging /logging/parse_range
PASS 2 test-logging /logging/parse_path
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-replication -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-replication" 
==8947==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8945==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-replication /replication/primary/read
PASS 2 test-replication /replication/primary/write
PASS 61 ahci-test /x86_64/ahci/flush/simple
==8956==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 test-replication /replication/primary/start
PASS 4 test-replication /replication/primary/stop
PASS 5 test-replication /replication/primary/do_checkpoint
PASS 6 test-replication /replication/primary/get_error_all
PASS 62 ahci-test /x86_64/ahci/flush/retry
==8962==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 test-replication /replication/secondary/read
==8967==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 test-replication /replication/secondary/write
PASS 63 ahci-test /x86_64/ahci/flush/migrate
==8976==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8981==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8947==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc17022000; bottom 0x7fa4f2cfc000; size: 0x005724326000 (374269435904)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 9 test-replication /replication/secondary/start
PASS 64 ahci-test /x86_64/ahci/migrate/sanity
==9008==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9013==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 test-replication /replication/secondary/stop
PASS 65 ahci-test /x86_64/ahci/migrate/dma/simple
==9022==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9027==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-replication /replication/secondary/do_checkpoint
PASS 12 test-replication /replication/secondary/get_error_all
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  tests/test-bufferiszero -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bufferiszero" 
PASS 66 ahci-test /x86_64/ahci/migrate/dma/halted
==9040==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9045==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 67 ahci-test /x86_64/ahci/migrate/ncq/simple
==9054==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9059==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 68 ahci-test /x86_64/ahci/migrate/ncq/halted
==9068==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 69 ahci-test /x86_64/ahci/cdrom/eject
==9073==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 70 ahci-test /x86_64/ahci/cdrom/dma/single
==9079==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 71 ahci-test /x86_64/ahci/cdrom/dma/multi
==9085==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 72 ahci-test /x86_64/ahci/cdrom/pio/single
==9091==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9091==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffdd7f93000; bottom 0x7f75251fe000; size: 0x0088b2d95000 (587116138496)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 73 ahci-test /x86_64/ahci/cdrom/pio/multi
==9097==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 74 ahci-test /x86_64/ahci/cdrom/pio/bcl
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/hd-geo-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="hd-geo-test" 
PASS 1 hd-geo-test /x86_64/hd-geo/ide/none
==9111==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 hd-geo-test /x86_64/hd-geo/ide/drive/cd_0
==9117==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/blank
==9123==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/lba
==9129==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/chs
==9135==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 hd-geo-test /x86_64/hd-geo/ide/device/mbr/blank
==9141==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 hd-geo-test /x86_64/hd-geo/ide/device/mbr/lba
==9147==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 hd-geo-test /x86_64/hd-geo/ide/device/mbr/chs
==9153==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 hd-geo-test /x86_64/hd-geo/ide/device/user/chs
==9158==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 hd-geo-test /x86_64/hd-geo/ide/device/user/chst
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/boot-order-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="boot-order-test" 
PASS 1 test-bufferiszero /cutils/bufferiszero
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9243==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 bios-tables-test /x86_64/acpi/piix4
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9249==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 bios-tables-test /x86_64/acpi/q35
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9255==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 bios-tables-test /x86_64/acpi/piix4/bridge
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9261==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 bios-tables-test /x86_64/acpi/piix4/ipmi
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9267==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 bios-tables-test /x86_64/acpi/piix4/cpuhp
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9274==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 bios-tables-test /x86_64/acpi/piix4/memhp
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9280==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 bios-tables-test /x86_64/acpi/piix4/numamem
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9286==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 bios-tables-test /x86_64/acpi/piix4/dimmpxm
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9295==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 bios-tables-test /x86_64/acpi/q35/bridge
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9301==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 bios-tables-test /x86_64/acpi/q35/mmio64
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9307==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 bios-tables-test /x86_64/acpi/q35/ipmi
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9313==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 12 bios-tables-test /x86_64/acpi/q35/cpuhp
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9320==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 13 bios-tables-test /x86_64/acpi/q35/memhp
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9326==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 14 bios-tables-test /x86_64/acpi/q35/numamem
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9332==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 15 bios-tables-test /x86_64/acpi/q35/dimmpxm
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/boot-serial-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="boot-serial-test" 
PASS 1 boot-serial-test /x86_64/boot-serial/isapc
---
PASS 1 i440fx-test /x86_64/i440fx/defaults
PASS 2 i440fx-test /x86_64/i440fx/pam
PASS 3 i440fx-test /x86_64/i440fx/firmware/bios
==9416==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 i440fx-test /x86_64/i440fx/firmware/pflash
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/fw_cfg-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="fw_cfg-test" 
PASS 1 fw_cfg-test /x86_64/fw_cfg/signature
---
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/drive_del-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="drive_del-test" 
PASS 1 drive_del-test /x86_64/drive_del/without-dev
PASS 2 drive_del-test /x86_64/drive_del/after_failed_device_add
==9504==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 drive_del-test /x86_64/blockdev/drive_del_device_del
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/wdt_ib700-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="wdt_ib700-test" 
PASS 1 wdt_ib700-test /x86_64/wdt_ib700/pause
---
PASS 1 usb-hcd-uhci-test /x86_64/uhci/pci/init
PASS 2 usb-hcd-uhci-test /x86_64/uhci/pci/port1
PASS 3 usb-hcd-uhci-test /x86_64/uhci/pci/hotplug
==9699==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 usb-hcd-uhci-test /x86_64/uhci/pci/hotplug/usb-storage
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/usb-hcd-xhci-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="usb-hcd-xhci-test" 
PASS 1 usb-hcd-xhci-test /x86_64/xhci/pci/init
PASS 2 usb-hcd-xhci-test /x86_64/xhci/pci/hotplug
==9708==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 usb-hcd-xhci-test /x86_64/xhci/pci/hotplug/usb-uas
PASS 4 usb-hcd-xhci-test /x86_64/xhci/pci/hotplug/usb-ccid
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/cpu-plug-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="cpu-plug-test" 
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9814==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 vmgenid-test /x86_64/vmgenid/vmgenid/set-guid
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9820==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 vmgenid-test /x86_64/vmgenid/vmgenid/set-guid-auto
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9826==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 vmgenid-test /x86_64/vmgenid/vmgenid/query-monitor
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/tpm-crb-swtpm-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="tpm-crb-swtpm-test" 
SKIP 1 tpm-crb-swtpm-test /x86_64/tpm/crb-swtpm/test # SKIP swtpm not in PATH or missing --tpm2 support
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9931==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9936==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 migration-test /x86_64/migration/fd_proto
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9944==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9949==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 migration-test /x86_64/migration/postcopy/unix
PASS 5 migration-test /x86_64/migration/postcopy/recovery
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9979==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9984==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 migration-test /x86_64/migration/precopy/unix
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9993==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9998==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 migration-test /x86_64/migration/precopy/tcp
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==10007==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==10012==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 migration-test /x86_64/migration/xbzrle/unix
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/test-x86-cpuid-compat -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-x86-cpuid-compat" 
PASS 1 test-x86-cpuid-compat /x86/cpuid/parsing-plus-minus
---
PASS 6 numa-test /x86_64/numa/pc/dynamic/cpu
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/qmp-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="qmp-test" 
PASS 1 qmp-test /x86_64/qmp/protocol
==10341==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 qmp-test /x86_64/qmp/oob
PASS 3 qmp-test /x86_64/qmp/preconfig
PASS 4 qmp-test /x86_64/qmp/missing-any-arg
---
PASS 5 device-introspect-test /x86_64/device/introspect/abstract-interfaces

=================================================================
==10589==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x561de4fecb2e in calloc (/tmp/qemu-test/build/x86_64-softmmu/qemu-system-x86_64+0x19fdb2e)
---

SUMMARY: AddressSanitizer: 64 byte(s) leaked in 2 allocation(s).
/tmp/qemu-test/src/tests/libqtest.c:137: kill_qemu() tried to terminate QEMU process but encountered exit status 1
ERROR - too few tests run (expected 6, got 5)
make: *** [/tmp/qemu-test/src/tests/Makefile.include:894: check-qtest-x86_64] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):


The full log is available at
http://patchew.org/logs/20190702121106.28374-1-slp@redhat.com/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Peter Maydell 4 years, 10 months ago
On Tue, 2 Jul 2019 at 13:14, Sergio Lopez <slp@redhat.com> wrote:
>
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> constructed after the machine model implemented by the latter.
>
> Its main purpose is providing users a KVM-only machine type with fast
> boot times, minimal attack surface (measured as the number of IO ports
> and MMIO regions exposed to the guest) and small footprint (especially
> when combined with the ongoing QEMU modularization effort).
>
> Normally, other than the device support provided by KVM itself,
> microvm only supports virtio-mmio devices. Microvm also includes a
> legacy mode, which adds an ISA bus with a 16550A serial port, useful
> for being able to see the early boot kernel messages.

Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
a bit deprecated and tends not to support all the features that
virtio-pci does. It was introduced mostly as a stopgap while we
didn't have pci support in the aarch64 virt machine, and remains
for legacy "we don't like to break existing working setups" rather
than as a recommended config for new systems.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Sergio Lopez 4 years, 10 months ago
Peter Maydell <peter.maydell@linaro.org> writes:

> On Tue, 2 Jul 2019 at 13:14, Sergio Lopez <slp@redhat.com> wrote:
>>
>> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> constructed after the machine model implemented by the latter.
>>
>> Its main purpose is providing users a KVM-only machine type with fast
>> boot times, minimal attack surface (measured as the number of IO ports
>> and MMIO regions exposed to the guest) and small footprint (especially
>> when combined with the ongoing QEMU modularization effort).
>>
>> Normally, other than the device support provided by KVM itself,
>> microvm only supports virtio-mmio devices. Microvm also includes a
>> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>> for being able to see the early boot kernel messages.
>
> Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
> a bit deprecated and tends not to support all the features that
> virtio-pci does. It was introduced mostly as a stopgap while we
> didn't have pci support in the aarch64 virt machine, and remains
> for legacy "we don't like to break existing working setups" rather
> than as a recommended config for new systems.

Using virtio-pci implies keeping PCI and ACPI support, defeating a
significant part of microvm's purpose.

What are the issues with the current state of virtio-mmio? Is there a
way I can help to improve the situation?

Sergio.


Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Peter Maydell 4 years, 10 months ago
On Tue, 2 Jul 2019 at 18:34, Sergio Lopez <slp@redhat.com> wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:
> > Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
> > a bit deprecated and tends not to support all the features that
> > virtio-pci does. It was introduced mostly as a stopgap while we
> > didn't have pci support in the aarch64 virt machine, and remains
> > for legacy "we don't like to break existing working setups" rather
> > than as a recommended config for new systems.
>
> Using virtio-pci implies keeping PCI and ACPI support, defeating a
> significant part of microvm's purpose.
>
> What are the issues with the current state of virtio-mmio? Is there a
> way I can help to improve the situation?

Off the top of my head:
 * limitations on numbers of devices
 * no hotplug support
 * unlike PCI, it's not probeable, so you have to tell the
   guest where all the transports are using device tree or
   some similar mechanism
 * you need one IRQ line per transport, which restricts how
   many you can have
 * it's only virtio-0.9, it doesn't support any of the new
   virtio-1.0 functionality
 * it is broadly not really maintained in QEMU (and I think
   not really in the kernel either? not sure), because we'd
   rather not have to maintain two mechanisms for doing virtio
   when virtio-pci is clearly better than virtio-mmio

thanks
-- PMM
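
To illustrate the "not probeable" point in the list above: on Linux, one
such "similar mechanism" is the
virtio_mmio.device=<size>@<baseaddr>:<irq> kernel command-line parameter
(available when the guest kernel is built with
CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES). The following is a minimal sketch of
generating one entry per transport. The base address, region size and IRQ
numbering are made-up values for illustration only, not what microvm
uses.

```python
def virtio_mmio_cmdline(num_devices, base=0xD0000000, size=0x200, first_irq=5):
    """Build one `virtio_mmio.device=` argument per virtio-mmio transport.

    base, size and first_irq are hypothetical defaults chosen for the
    example; a real machine type defines its own layout.
    """
    args = []
    for i in range(num_devices):
        addr = base + i * size   # transports laid out back to back
        irq = first_irq + i      # one dedicated IRQ line per transport
        args.append(f"virtio_mmio.device={size:#x}@{addr:#x}:{irq}")
    return " ".join(args)


print(virtio_mmio_cmdline(2))
```

Each transport consumes its own IRQ line in this layout, which is the
per-transport IRQ restriction mentioned in the list.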

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Sergio Lopez 4 years, 10 months ago
On Tue, Jul 02, 2019 at 07:04:15PM +0100, Peter Maydell wrote:
> On Tue, 2 Jul 2019 at 18:34, Sergio Lopez <slp@redhat.com> wrote:
> > Peter Maydell <peter.maydell@linaro.org> writes:
> > > Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
> > > a bit deprecated and tends not to support all the features that
> > > virtio-pci does. It was introduced mostly as a stopgap while we
> > > didn't have pci support in the aarch64 virt machine, and remains
> > > for legacy "we don't like to break existing working setups" rather
> > > than as a recommended config for new systems.
> >
> > Using virtio-pci implies keeping PCI and ACPI support, defeating a
> > significant part of microvm's purpose.
> >
> > What are the issues with the current state of virtio-mmio? Is there a
> > way I can help to improve the situation?
> 
> Off the top of my head:
>  * limitations on numbers of devices
>  * no hotplug support
>  * unlike PCI, it's not probeable, so you have to tell the
>    guest where all the transports are using device tree or
>    some similar mechanism
>  * you need one IRQ line per transport, which restricts how
>    many you can have
>  * it's only virtio-0.9, it doesn't support any of the new
>    virtio-1.0 functionality
>  * it is broadly not really maintained in QEMU (and I think
>    not really in the kernel either? not sure), because we'd
>    rather not have to maintain two mechanisms for doing virtio
>    when virtio-pci is clearly better than virtio-mmio

Some of these are design issues, but others can be improved with a bit
of work.

As for the maintenance burden, I volunteer myself to help with that, so
it won't have an impact on other developers and/or projects.

Sergio.


Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Wed, Jul 03, 2019 at 12:04:00AM +0200, Sergio Lopez wrote:
> On Tue, Jul 02, 2019 at 07:04:15PM +0100, Peter Maydell wrote:
> > On Tue, 2 Jul 2019 at 18:34, Sergio Lopez <slp@redhat.com> wrote:
> > > Peter Maydell <peter.maydell@linaro.org> writes:
> > > > Could we use virtio-pci instead of virtio-mmio? virtio-mmio is
> > > > a bit deprecated and tends not to support all the features that
> > > > virtio-pci does. It was introduced mostly as a stopgap while we
> > > > didn't have pci support in the aarch64 virt machine, and remains
> > > > for legacy "we don't like to break existing working setups" rather
> > > > than as a recommended config for new systems.
> > >
> > > Using virtio-pci implies keeping PCI and ACPI support, defeating a
> > > significant part of microvm's purpose.
> > >
> > > What are the issues with the current state of virtio-mmio? Is there a
> > > way I can help to improve the situation?
> > 
> > Off the top of my head:
> >  * limitations on numbers of devices
> >  * no hotplug support
> >  * unlike PCI, it's not probeable, so you have to tell the
> >    guest where all the transports are using device tree or
> >    some similar mechanism
> >  * you need one IRQ line per transport, which restricts how
> >    many you can have
> >  * it's only virtio-0.9, it doesn't support any of the new
> >    virtio-1.0 functionality
> >  * it is broadly not really maintained in QEMU (and I think
> >    not really in the kernel either? not sure), because we'd
> >    rather not have to maintain two mechanisms for doing virtio
> >    when virtio-pci is clearly better than virtio-mmio
> 
> Some of these are design issues, but others can be improved with a bit
> of work.
> 
> As for the maintenance burden, I volunteer myself to help with that, so
> it won't have an impact on other developers and/or projects.
> 
> Sergio.

OK so please start with adding virtio 1 support. Guest bits
have been ready for years now.

-- 
MST

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Peter Maydell 4 years, 9 months ago
On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
> OK so please start with adding virtio 1 support. Guest bits
> have been ready for years now.

I'd still rather we just used pci virtio. If pci isn't
fast enough at startup, do something to make it faster...

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Thu, Jul 25, 2019 at 11:05:05AM +0100, Peter Maydell wrote:
> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
> > OK so please start with adding virtio 1 support. Guest bits
> > have been ready for years now.
> 
> I'd still rather we just used pci virtio. If pci isn't
> fast enough at startup, do something to make it faster...
> 
> thanks
> -- PMM

Oh that's putting microvm aside - if we have a maintainer for
virtio mmio that's great because it does need a maintainer,
and virtio 1 would be the thing to fix before adding features ;)

-- 
MST

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Sergio Lopez 4 years, 9 months ago
Michael S. Tsirkin <mst@redhat.com> writes:

> On Thu, Jul 25, 2019 at 11:05:05AM +0100, Peter Maydell wrote:
>> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > OK so please start with adding virtio 1 support. Guest bits
>> > have been ready for years now.
>> 
>> I'd still rather we just used pci virtio. If pci isn't
>> fast enough at startup, do something to make it faster...
>> 
>> thanks
>> -- PMM
>
> Oh that's putting microvm aside - if we have a maintainer for
> virtio mmio that's great because it does need a maintainer,
> and virtio 1 would be the thing to fix before adding features ;)

There seems to be a general consensus that virtio-mmio needs some care,
and looking at the specs, implementing virtio-mmio v2/virtio v1
shouldn't be too time consuming, so I'm going to give it a try.

Cheers,
Sergio.

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Sergio Lopez 4 years, 9 months ago
Peter Maydell <peter.maydell@linaro.org> writes:

> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
>> OK so please start with adding virtio 1 support. Guest bits
>> have been ready for years now.
>
> I'd still rather we just used pci virtio. If pci isn't
> fast enough at startup, do something to make it faster...

Actually, removing PCI (and ACPI), is one of the main ways microvm has
to reduce not only boot time, but also the exposed surface and the
general footprint.

I think we need to discuss and settle whether using virtio-mmio (even if
maintained and upgraded to virtio 1) for a new machine type is
acceptable or not. Because if it isn't, we should probably just ditch
the whole microvm idea and move to something else.

Sergio.


Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Paolo Bonzini 4 years, 9 months ago
On 25/07/19 12:42, Sergio Lopez wrote:
> 
> Peter Maydell <peter.maydell@linaro.org> writes:
> 
>> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
>>> OK so please start with adding virtio 1 support. Guest bits
>>> have been ready for years now.
>>
>> I'd still rather we just used pci virtio. If pci isn't
>> fast enough at startup, do something to make it faster...
> 
> Actually, removing PCI (and ACPI), is one of the main ways microvm has
> to reduce not only boot time, but also the exposed surface and the
> general footprint.
> 
> I think we need to discuss and settle whether using virtio-mmio (even if
> maintained and upgraded to virtio 1) for a new machine type is
> acceptable or not. Because if it isn't, we should probably just ditch
> the whole microvm idea and move to something else.

I agree.  IMNSHO the reduced attack surface from removing PCI is
(mostly) security theater, however the boot time numbers that Sergio
showed for microvm are quite extreme and I don't think there is any hope
of getting even close with a PCI-based virtual machine.

So I'd even go a step further: if using virtio-mmio for a new machine
type is not acceptable, we should admit that boot time optimization in
QEMU is basically as good as it can get---low-hanging fruit has been
picked with PVH and mmap is the logical next step, but all that's left
is optimizing the guest or something else.

I must say that -M microvm took a while to grow on me, but I think it's
a great example of how the infrastructure provided by QEMU provides
useful features for free, even for the simplest emulated hardware.  For
example, in v3 microvm could only boot from PVH kernels, but the next
firmware-enabled version reuses more of the PC code and thus supports
all of vmlinuz, multiboot and PVH.

Again: Sergio has been very receptive to feedback and has provided
numbers to back the design choices, and we should reciprocate or at
least be very clear on the constraints.

Paolo

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Stefan Hajnoczi 4 years, 9 months ago
On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 25/07/19 12:42, Sergio Lopez wrote:
> > Peter Maydell <peter.maydell@linaro.org> writes:
> >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
> >>> OK so please start with adding virtio 1 support. Guest bits
> >>> have been ready for years now.
> >>
> >> I'd still rather we just used pci virtio. If pci isn't
> >> fast enough at startup, do something to make it faster...
> >
> > Actually, removing PCI (and ACPI), is one of the main ways microvm has
> > to reduce not only boot time, but also the exposed surface and the
> > general footprint.
> >
> > I think we need to discuss and settle whether using virtio-mmio (even if
> > maintained and upgraded to virtio 1) for a new machine type is
> > acceptable or not. Because if it isn't, we should probably just ditch
> > the whole microvm idea and move to something else.
>
> I agree.  IMNSHO the reduced attack surface from removing PCI is
> (mostly) security theater, however the boot time numbers that Sergio
> showed for microvm are quite extreme and I don't think there is any hope
> of getting even close with a PCI-based virtual machine.
>
> So I'd even go a step further: if using virtio-mmio for a new machine
> type is not acceptable, we should admit that boot time optimization in
> QEMU is basically as good as it can get---low-hanging fruit has been
> picked with PVH and mmap is the logical next step, but all that's left
> is optimizing the guest or something else.

I haven't seen enough analysis to declare boot time optimization done.
QEMU startup can be profiled and improved.

The numbers show that removing PCI and ACPI makes things faster but
this doesn't justify removing them.  Understanding of why they are
slow is what justifies removing them.  Otherwise it could just be a
misconfiguration, inefficient implementation, etc and we've seen there
is low-hanging fruit.

How much time is spent doing PCI initialization?  Is the vmexit
pattern for PCI initialization as good as the hardware interface
allows?

Without an analysis of why things are slow it's not possible to come to
an informed decision.

Stefan

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Thu, Jul 25, 2019 at 01:01:29PM +0100, Stefan Hajnoczi wrote:
> On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > On 25/07/19 12:42, Sergio Lopez wrote:
> > > Peter Maydell <peter.maydell@linaro.org> writes:
> > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >>> OK so please start with adding virtio 1 support. Guest bits
> > >>> have been ready for years now.
> > >>
> > >> I'd still rather we just used pci virtio. If pci isn't
> > >> fast enough at startup, do something to make it faster...
> > >
> > > Actually, removing PCI (and ACPI), is one of the main ways microvm has
> > > to reduce not only boot time, but also the exposed surface and the
> > > general footprint.
> > >
> > > I think we need to discuss and settle whether using virtio-mmio (even if
> > > maintained and upgraded to virtio 1) for a new machine type is
> > > acceptable or not. Because if it isn't, we should probably just ditch
> > > the whole microvm idea and move to something else.
> >
> > I agree.  IMNSHO the reduced attack surface from removing PCI is
> > (mostly) security theater, however the boot time numbers that Sergio
> > showed for microvm are quite extreme and I don't think there is any hope
> > of getting even close with a PCI-based virtual machine.
> >
> > So I'd even go a step further: if using virtio-mmio for a new machine
> > type is not acceptable, we should admit that boot time optimization in
> > QEMU is basically as good as it can get---low-hanging fruit has been
> > picked with PVH and mmap is the logical next step, but all that's left
> > is optimizing the guest or something else.
> 
> I haven't seen enough analysis to declare boot time optimization done.
> QEMU startup can be profiled and improved.

Right, and that will always stay the case. OTOH imho microvm is
non-intrusive enough, and small enough, that we'd just put it upstream
after addressing low-level comments.
This will allow more contributions from people interested in boot time.
With no cross-version migration support, or maybe migration
disabled completely, maintenance burden should not be too high.
Not everyone wants to hack on pci/acpi specifically.


> The numbers show that removing PCI and ACPI makes things faster but
> this doesn't justify removing them.  Understanding of why they are
> slow is what justifies removing them.  Otherwise it could just be a
> misconfiguration, inefficient implementation, etc and we've seen there
> is low-hanging fruit.
> 
> How much time is spent doing PCI initialization?  Is the vmexit
> pattern for PCI initialization as good as the hardware interface
> allows?

I know that in the BIOS we have wanted to use memory-mapped PCI config
accesses for a very long time now. This makes each vmexit slower but
cuts the number of exits by half. Only affects SeaBIOS though.




> Without an analysis of why things are slow it's not possible to come to
> an informed decision.
> 
> Stefan

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Stefan Hajnoczi 4 years, 9 months ago
On Thu, Jul 25, 2019 at 1:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Jul 25, 2019 at 01:01:29PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > On 25/07/19 12:42, Sergio Lopez wrote:
> > > > Peter Maydell <peter.maydell@linaro.org> writes:
> > > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >>> OK so please start with adding virtio 1 support. Guest bits
> > > >>> have been ready for years now.
> > > >>
> > > >> I'd still rather we just used pci virtio. If pci isn't
> > > >> fast enough at startup, do something to make it faster...
> > > >
> > > > Actually, removing PCI (and ACPI), is one of the main ways microvm has
> > > > to reduce not only boot time, but also the exposed surface and the
> > > > general footprint.
> > > >
> > > > I think we need to discuss and settle whether using virtio-mmio (even if
> > > > maintained and upgraded to virtio 1) for a new machine type is
> > > > acceptable or not. Because if it isn't, we should probably just ditch
> > > > the whole microvm idea and move to something else.
> > >
> > > I agree.  IMNSHO the reduced attack surface from removing PCI is
> > > (mostly) security theater, however the boot time numbers that Sergio
> > > showed for microvm are quite extreme and I don't think there is any hope
> > > of getting even close with a PCI-based virtual machine.
> > >
> > > So I'd even go a step further: if using virtio-mmio for a new machine
> > > type is not acceptable, we should admit that boot time optimization in
> > > QEMU is basically as good as it can get---low-hanging fruit has been
> > > picked with PVH and mmap is the logical next step, but all that's left
> > > is optimizing the guest or something else.
> >
> > I haven't seen enough analysis to declare boot time optimization done.
> > QEMU startup can be profiled and improved.
>
> Right, and that will always stay the case.

The microvm design has a premise and it can be answered definitively
through performance analysis.

If I had to explain to someone why PCI or ACPI significantly slows
things down, I couldn't honestly do so.  I say significantly because
PCI init definitely requires more vmexits but can it be a small
number?  For ACPI I have no idea why it would consume significant
amounts of time.

Until we have this knowledge, the premise of microvm is unproven and
merging it would be premature because maybe we can get into the same
ballpark by optimizing existing code.

I'm sorry for being a pain.  I actually think the analysis will
support microvm, but it still needs to be done in order to justify it.

Stefan

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Thu, Jul 25, 2019 at 02:26:12PM +0100, Stefan Hajnoczi wrote:
> On Thu, Jul 25, 2019 at 1:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Thu, Jul 25, 2019 at 01:01:29PM +0100, Stefan Hajnoczi wrote:
> > > On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > On 25/07/19 12:42, Sergio Lopez wrote:
> > > > > Peter Maydell <peter.maydell@linaro.org> writes:
> > > > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >>> OK so please start with adding virtio 1 support. Guest bits
> > > > >>> have been ready for years now.
> > > > >>
> > > > >> I'd still rather we just used pci virtio. If pci isn't
> > > > >> fast enough at startup, do something to make it faster...
> > > > >
> > > > > Actually, removing PCI (and ACPI), is one of the main ways microvm has
> > > > > to reduce not only boot time, but also the exposed surface and the
> > > > > general footprint.
> > > > >
> > > > > I think we need to discuss and settle whether using virtio-mmio (even if
> > > > > maintained and upgraded to virtio 1) for a new machine type is
> > > > > acceptable or not. Because if it isn't, we should probably just ditch
> > > > > the whole microvm idea and move to something else.
> > > >
> > > > I agree.  IMNSHO the reduced attack surface from removing PCI is
> > > > (mostly) security theater, however the boot time numbers that Sergio
> > > > showed for microvm are quite extreme and I don't think there is any hope
> > > > of getting even close with a PCI-based virtual machine.
> > > >
> > > > So I'd even go a step further: if using virtio-mmio for a new machine
> > > > type is not acceptable, we should admit that boot time optimization in
> > > > QEMU is basically as good as it can get---low-hanging fruit has been
> > > > picked with PVH and mmap is the logical next step, but all that's left
> > > > is optimizing the guest or something else.
> > >
> > > I haven't seen enough analysis to declare boot time optimization done.
> > > QEMU startup can be profiled and improved.
> >
> > Right, and that will always stay the case.
> 
> The microvm design has a premise and it can be answered definitively
> through performance analysis.
> 
> If I had to explain to someone why PCI or ACPI significantly slows
> things down, I couldn't honestly do so.

well with pci each device describes itself. you read
this description dword by dword normally. typical
description is 20-50 words.

if both bios and linux do this, that's twice the amount.

bios also uses two vmexits for each access.

there's also the resource allocation game.

I would say up to 200 exits per device is reasonable.
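
As a rough check of that estimate, here is a small sketch of the
arithmetic. It assumes the "words" above are config-space dwords, that
the BIOS needs two exits per access as stated, and that the kernel needs
one exit per access (for example with memory-mapped config space); the
last point is an assumption, not something measured in this thread.

```python
def config_read_exits(dwords, bios_exits_per_access=2, kernel_exits_per_access=1):
    """Estimate vmexits spent reading one device's self-description.

    Both the firmware and the guest kernel read the same description,
    so the two costs are added together.
    """
    bios = dwords * bios_exits_per_access
    kernel = dwords * kernel_exits_per_access
    return bios + kernel


low = config_read_exits(20)    # short description
high = config_read_exits(50)   # long description
print(low, high)
```

Under these assumptions the reads alone cost between 60 and 150 exits
per device, which leaves some room for resource allocation within the
200-exit figure.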


>  I say significantly because
> PCI init definitely requires more vmexits but can it be a small
> number?

each bus is scanned for devices. 32 accesses, 256 bus numbers
(that's the lastbus thing). Paolo posted a hack just
for the root bus but whenever we have a bridge the problem
will just re-surface.

pcie is actually link based so downstream buses do not
need to be scanned outside device 0 unless we see
a multifunction bit set. I don't think linux
implements this optimization atm.
But still the case for internal buses.
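
For scale, a sketch of the brute-force scan cost described above,
counting one probe per device slot. The function name is made up for the
example, and real firmware or kernels may stop early or skip buses.

```python
DEVICE_SLOTS_PER_BUS = 32   # device numbers 0..31 on each bus
BUS_NUMBERS = 256           # bus numbers 0..255


def scan_probes(buses_scanned, slots_per_bus=DEVICE_SLOTS_PER_BUS):
    """Number of device slots probed when every slot on each bus is checked."""
    return buses_scanned * slots_per_bus


print(scan_probes(1))            # root bus only
print(scan_probes(BUS_NUMBERS))  # every possible bus number
```

Scanning only the root bus costs 32 probes, while scanning every bus
number costs 8192, which is why limiting the scan matters and why the
problem returns once a bridge adds more buses.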


> For ACPI I have no idea why it would consume significant
> amounts of time.


me neither. I suspect it's not vmexit related at all.  Is the ACPI driver in
Linux just slow?  It's not been designed to be on any data path...
I'd love to know. I don't feel it's fair to ask someone
interested in writing new performant code to necessarily optimize
old non-performant code.

> Until we have this knowledge, the premise of microvm is unproven and
> merging it would be premature because maybe we can get into the same
> ballpark by optimizing existing code.

maybe but who is working on this right now?

If it's possible to make PC faster but not enough people
know how to do it, and enough people know how to make microvm
faster, then it does not matter what's possible in theory.


> 
> I'm sorry for being a pain.  I actually think the analysis will
> support microvm, but it still needs to be done in order to justify it.
> 
> Stefan

At some level it would be great to have someone do detailed performance
profiling. But it is a lot of work, which also needs to be justified
given there's working code, and it's not bad code at that.

Yes speeding up PC would be nice but if everyone's gut feeling is it
won't get us what microvm is trying to achieve, why spend cycles making
sure?

-- 
MST

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Paolo Bonzini 4 years, 9 months ago
On 25/07/19 15:26, Stefan Hajnoczi wrote:
> The microvm design has a premise and it can be answered definitively
> through performance analysis.
> 
> If I had to explain to someone why PCI or ACPI significantly slows
> things down, I couldn't honestly do so.  I say significantly because
> PCI init definitely requires more vmexits but can it be a small
> number?  For ACPI I have no idea why it would consume significant
> amounts of time.

My guess is that it's just a lot of code that has to run. :(

> Until we have this knowledge, the premise of microvm is unproven and
> merging it would be premature because maybe we can get into the same
> ballpark by optimizing existing code.
> 
> I'm sorry for being a pain.  I actually think the analysis will
> support microvm, but it still needs to be done in order to justify it.

No, you're not a pain, you're explaining your reasoning and that helps.

To me *maintainability is the biggest consideration* when introducing a
new feature.  "We can do just as well with q35" is a good reason to
deprecate and delete microvm, but not a good reason to reject it now as
long as microvm is good enough in terms of maintainability.  Keeping it
out of tree only makes it harder to do this kind of experiment.  virtio
1 seems to be the biggest remaining blocker and I think it'd be a good
thing to have even for the ARM virt machine type.

FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
and ~25 ms in the kernel.  I must say that's pretty good, but it's still
30% of the whole boot time and reducing it is the hardest part.  If
having microvm in tree can help reducing it, good.  Yes, it will get
users, but most likely they will have to support pc or q35 as a fallback
so we could still delete microvm at any time with the due deprecation
period if it turns out to be a failed experiment.
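
The figures above can be put together as follows. The three components
are the approximate numbers quoted in this message; the implied total
boot time is derived from the 30% figure and is not a number stated in
the thread.

```python
# Approximate "PCI tax" per component, in milliseconds, as quoted above.
pci_tax_ms = {"qemu": 10, "firmware": 10, "kernel": 25}

total_tax_ms = sum(pci_tax_ms.values())

# If the tax is about 30% of the whole boot, the whole boot is tax / 0.30.
implied_boot_ms = total_tax_ms / 0.30

print(total_tax_ms, round(implied_boot_ms))
```

So the tax adds up to about 45 ms, which at 30% corresponds to a total
boot time in the region of 150 ms.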

Whether to use qboot or SeaBIOS for microvm is another story, but it's
an implementation detail as long as the ROM size doesn't change and/or
we don't do versioned machine types.  So we can switch from one to the
other at any time; we can also include qboot directly in QEMU's tree,
without going through a submodule, which also reduces the infrastructure
needed (mirrors, etc.) and makes it easier to delete it.

Paolo

(*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
ends up measured as PCI in SeaBIOS, due to different init order, so the
real firmware cost of PAM and PCI initialization should be 5ms for qboot
and 10ms for SeaBIOS.
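
A minimal sketch of the "first to last write to 0xcf8" measurement,
assuming a trace of (timestamp_ms, port) pairs for port writes. The
trace format and the sample values are hypothetical and only illustrate
the method.

```python
def pci_config_window_ms(events, port=0xCF8):
    """Time between the first and last write to the given I/O port.

    events is an iterable of (timestamp_ms, port) tuples; returns 0.0
    if the port never appears in the trace.
    """
    times = [t for t, p in events if p == port]
    if not times:
        return 0.0
    return max(times) - min(times)


# Hypothetical trace: only the 0xcf8 entries bound the PCI window.
sample = [(100.0, 0x70), (102.0, 0xCF8), (110.5, 0xCF8),
          (117.0, 0xCF8), (120.0, 0x3F8)]
print(pci_config_window_ms(sample))
```

Anything the firmware does between those two writes is counted as PCI
time, which is how unrelated init work can end up inside the window when
the init order differs.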

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Thu, Jul 25, 2019 at 03:43:12PM +0200, Paolo Bonzini wrote:
> On 25/07/19 15:26, Stefan Hajnoczi wrote:
> > The microvm design has a premise and it can be answered definitively
> > through performance analysis.
> > 
> > If I had to explain to someone why PCI or ACPI significantly slows
> > things down, I couldn't honestly do so.  I say significantly because
> > PCI init definitely requires more vmexits but can it be a small
> > number?  For ACPI I have no idea why it would consume significant
> > amounts of time.
> 
> My guess is that it's just a lot of code that has to run. :(
> 
> > Until we have this knowledge, the premise of microvm is unproven and
> > merging it would be premature because maybe we can get into the same
> > ballpark by optimizing existing code.
> > 
> > I'm sorry for being a pain.  I actually think the analysis will
> > support microvm, but it still needs to be done in order to justify it.
> 
> No, you're not a pain, you're explaining your reasoning and that helps.
> 
> To me *maintainability is the biggest consideration* when introducing a
> new feature.  "We can do just as well with q35" is a good reason to
> deprecate and delete microvm, but not a good reason to reject it now as
> long as microvm is good enough in terms of maintainability.  Keeping it
> out of tree only makes it harder to do this kind of experiment.  virtio
> 1 seems to be the biggest remaining blocker and I think it'd be a good
> thing to have even for the ARM virt machine type.

Yep. E.g. virtio-iommu guys wanted that too.

> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> and ~25 ms in the kernel.

How did you measure the qemu time btw?

>  I must say that's pretty good, but it's still
> 30% of the whole boot time and reducing it is the hardest part.  If
> having microvm in tree can help reducing it, good.  Yes, it will get
> users, but most likely they will have to support pc or q35 as a fallback
> so we could still delete microvm at any time with the due deprecation
> period if it turns out to be a failed experiment.
> 
> Whether to use qboot or SeaBIOS for microvm is another story, but it's
> an implementation detail as long as the ROM size doesn't change and/or
> we don't do versioned machine types.  So we can switch from one to the
> other at any time; we can also include qboot directly in QEMU's tree,
> without going through a submodule, which also reduces the infrastructure
> needed (mirrors, etc.) and makes it easier to delete it.
> 
> Paolo
> 
> (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
> last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
> ends up measured as PCI in SeaBIOS, due to different init order, so the
> real firmware cost of PAM and PCI initialization should be 5ms for qboot
> and 10ms for SeaBIOS.

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Paolo Bonzini 4 years, 9 months ago
On 25/07/19 15:54, Michael S. Tsirkin wrote:
>> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
>> and ~25 ms in the kernel.
> How did you measure the qemu time btw?
> 

It's QEMU startup, but not QEMU altogether.  For example the time spent
in memory.c when a BAR is programmed is not part of those 10 ms.

So I just computed q35 qemu startup - microvm qemu startup, it's 65 vs
55 ms.

Paolo

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Thu, Jul 25, 2019 at 04:13:13PM +0200, Paolo Bonzini wrote:
> On 25/07/19 15:54, Michael S. Tsirkin wrote:
> >> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> >> and ~25 ms in the kernel.
> > How did you measure the qemu time btw?
> > 
> 
> It's QEMU startup, but not QEMU altogether.  For example the time spent
> in memory.c when a BAR is programmed is not part of those 10 ms.
> 
> So I just computed q35 qemu startup - microvm qemu startup, it's 65 vs
> 55 ms.
> 
> Paolo

Oh so it could be eventfd or whatever, just as well.

I actually wonder whether we spend much time within
synchronize_* calls. eventfd triggers this a lot of times.

How about ioeventfd=off? Does this speed up things?



-- 
MST

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Peter Maydell 4 years, 9 months ago
On Thu, 25 Jul 2019 at 14:43, Paolo Bonzini <pbonzini@redhat.com> wrote:
> To me *maintainability is the biggest consideration* when introducing a
> new feature.  "We can do just as well with q35" is a good reason to
> deprecate and delete microvm, but not a good reason to reject it now as
> long as microvm is good enough in terms of maintainability.

I think maintainability matters, but also important is "are
we going in the right direction in the first place?".
virtio-mmio is (variously deliberately and accidentally)
quite a long way behind virtio-pci, and certain kinds of things
(hotplug, extensibility beyond a certain number of endpoints)
are not going to be possible (either ever, or without a lot
of extra design and implementation work to reimplement stuff
we have already today with PCI). Are we sure we're not going
to end up with a stream of "oh, now we need to implement X for
virtio-mmio (that virtio-pci already has)", "users want Y now
(that virtio-pci already has)", etc?

The other thing is that once we've introduced something we're
stuck with whatever it does, because we don't like breaking
backwards compatibility. So I think getting the virtio-legacy
vs virtio-1 story sorted out before we land microvm is
important, at least to the point where we know we haven't
backed ourselves into a corner or required a lot of extra
effort on transitional-device support that we could have
avoided.

Which isn't to say that I'm against the microvm approach;
just that I'd like us to consider and make a decision on
these issues before landing it, rather than just saying
"the patches in themselves look good, let's merge it".

thanks
-- PMM

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Paolo Bonzini 4 years, 9 months ago
On 25/07/19 16:04, Peter Maydell wrote:
> On Thu, 25 Jul 2019 at 14:43, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> To me *maintainability is the biggest consideration* when introducing a
>> new feature.  "We can do just as well with q35" is a good reason to
>> deprecate and delete microvm, but not a good reason to reject it now as
>> long as microvm is good enough in terms of maintainability.
> 
> I think maintainability matters, but also important is "are
> we going in the right direction in the first place?".
> virtio-mmio is (variously deliberately and accidentally)
> quite a long way behind virtio-pci, and certain kinds of things
> (hotplug, extensibility beyond a certain number of endpoints)
> are not going to be possible (either ever, or without a lot
> of extra design and implementation work to reimplement stuff
> we have already today with PCI). Are we sure we're not going
> to end up with a stream of "oh, now we need to implement X for
> virtio-mmio (that virtio-pci already has)", "users want Y now
> (that virtio-pci already has)", etc?

I think this is part of maintainability in a wider sense.  For every
missing feature there should be a good reason why it's not needed.  And
if there is already code to do that in QEMU, then there should be an
excellent reason why it's not being used.  (This was the essence of the
firmware debate).

So for microvm you could do without hotplug because the idea is that you
just tear down the VM and restart it.  Lack of MSI is actually what
worries me the most, but we could say that microvm clients generally
have little multiprocessing so it's not common to have multiple network
flows at the same time and so you don't need multiqueue.

For microvm in particular there are two reasons why we can take some
shortcuts (but with care):

- we won't support versioned machine types for microvm.  microvm guests
die every time you upgrade QEMU, by design.  So this is not another QED,
which implemented more features than qcow2 but did so at the wrong place
in the stack.  In fact it's exactly the opposite (it implements fewer
features, so that the implementation of e.g. q35 or PCI is untouched and
does not need one-off boot time optimization hacks)

- we know that Amazon is using something very similar to microvm in
production, with virtio-mmio, so the feature set is at least usable for
something.

> The other thing is that once we've introduced something we're
> stuck with whatever it does, because we don't like breaking
> backwards compatibility. So I think getting the virtio-legacy
> vs virtio-1 story sorted out before we land microvm is
> important, at least to the point where we know we haven't
> backed ourselves into a corner or required a lot of extra
> effort on transitional-device support that we could have
> avoided.

Even though we won't support versioned machine types, I think there is
agreement that virtio 0.9 is a bad idea and should be fixed.

Paolo

> Which isn't to say that I'm against the microvm approach;
> just that I'd like us to consider and make a decision on
> these issues before landing it, rather than just saying
> "the patches in themselves look good, let's merge it".
> 
> thanks
> -- PMM
> 


Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Thu, Jul 25, 2019 at 04:26:42PM +0200, Paolo Bonzini wrote:
> On 25/07/19 16:04, Peter Maydell wrote:
> > On Thu, 25 Jul 2019 at 14:43, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >> To me *maintainability is the biggest consideration* when introducing a
> >> new feature.  "We can do just as well with q35" is a good reason to
> >> deprecate and delete microvm, but not a good reason to reject it now as
> >> long as microvm is good enough in terms of maintainability.
> > 
> > I think maintainability matters, but also important is "are
> > we going in the right direction in the first place?".
> > virtio-mmio is (variously deliberately and accidentally)
> > quite a long way behind virtio-pci, and certain kinds of things
> > (hotplug, extensibility beyond a certain number of endpoints)
> > are not going to be possible (either ever, or without a lot
> > of extra design and implementation work to reimplement stuff
> > we have already today with PCI). Are we sure we're not going
> > to end up with a stream of "oh, now we need to implement X for
> > virtio-mmio (that virtio-pci already has)", "users want Y now
> > (that virtio-pci already has)", etc?
> 
> I think this is part of maintainability in a wider sense.  For every
> missing feature there should be a good reason why it's not needed.  And
> if there is already code to do that in QEMU, then there should be an
> excellent reason why it's not being used.  (This was the essence of the
> firmware debate).
> 
> So for microvm you could do without hotplug because the idea is that you
> just tear down the VM and restart it.  Lack of MSI is actually what
> worries me the most, but we could say that microvm clients generally
> have little multiprocessing so it's not common to have multiple network
> flows at the same time and so you don't need multiqueue.

Me too, and in fact someone just posted
	virtio-mmio: support multiple interrupt vectors


> For microvm in particular there are two reasons why we can take some
> shortcuts (but with care):
> 
> - we won't support versioned machine types for microvm.  microvm guests
> die every time you upgrade QEMU, by design.  So this is not another QED,
> which implemented more features than qcow2 but did so at the wrong place
> in the stack.  In fact it's exactly the opposite (it implements fewer
> features, so that the implementation of e.g. q35 or PCI is untouched and
> does not need one-off boot time optimization hacks)
> 
> - we know that Amazon is using something very similar to microvm in
> production, with virtio-mmio, so the feature set is at least usable for
> something.
> 
> > The other thing is that once we've introduced something we're
> > stuck with whatever it does, because we don't like breaking
> > backwards compatibility. So I think getting the virtio-legacy
> > vs virtio-1 story sorted out before we land microvm is
> > important, at least to the point where we know we haven't
> > backed ourselves into a corner or required a lot of extra
> > effort on transitional-device support that we could have
> > avoided.
> 
> Even though we won't support versioned machine types, I think there is
> agreement that virtio 0.9 is a bad idea and should be fixed.
> 
> Paolo

Right, for the simple reason that mmio does not support transitional
devices, only transitional drivers.  So if we commit to supporting old
guests, we won't be able to back out of that.

> > Which isn't to say that I'm against the microvm approach;
> > just that I'd like us to consider and make a decision on
> > these issues before landing it, rather than just saying
> > "the patches in themselves look good, let's merge it".
> > 
> > thanks
> > -- PMM
> > 

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Sergio Lopez 4 years, 9 months ago
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 25/07/19 15:26, Stefan Hajnoczi wrote:
>> The microvm design has a premise and it can be answered definitively
>> through performance analysis.
>> 
>> If I had to explain to someone why PCI or ACPI significantly slows
>> things down, I couldn't honestly do so.  I say significantly because
>> PCI init definitely requires more vmexits but can it be a small
>> number?  For ACPI I have no idea why it would consume significant
>> amounts of time.
>
> My guess is that it's just a lot of code that has to run. :(

I don't think I've shared any numbers about ACPI.

I don't have details about where exactly the time is spent, but
compiling a guest kernel without ACPI decreases the average boot time by
~12ms, and the kernel's unstripped ELF binary size goes down by a
whopping ~300KiB.

On the other hand, removing ACPI from QEMU decreases its initialization
time by ~5ms, and the binary size is ~183KiB smaller.

IMHO, those are pretty relevant savings on both fronts.

>> Until we have this knowledge, the premise of microvm is unproven and
>> merging it would be premature because maybe we can get into the same
>> ballpark by optimizing existing code.
>> 
>> I'm sorry for being a pain.  I actually think the analysis will
>> support microvm, but it still needs to be done in order to justify it.
>
> No, you're not a pain, you're explaining your reasoning and that helps.
>
> To me *maintainability is the biggest consideration* when introducing a
> new feature.  "We can do just as well with q35" is a good reason to
> deprecate and delete microvm, but not a good reason to reject it now as
> long as microvm is good enough in terms of maintainability.  Keeping it
> out of tree only makes it harder to do this kind of experiment.  virtio
> 1 seems to be the biggest remaining blocker and I think it'd be a good
> thing to have even for the ARM virt machine type.
>
> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> and ~25 ms in the kernel.  I must say that's pretty good, but it's still
> 30% of the whole boot time and reducing it is the hardest part.  If
> having microvm in tree can help reducing it, good.  Yes, it will get
> users, but most likely they will have to support pc or q35 as a fallback
> so we could still delete microvm at any time with the due deprecation
> period if it turns out to be a failed experiment.
>
> Whether to use qboot or SeaBIOS for microvm is another story, but it's
> an implementation detail as long as the ROM size doesn't change and/or
> we don't do versioned machine types.  So we can switch from one to the
> other at any time; we can also include qboot directly in QEMU's tree,
> without going through a submodule, which also reduces the infrastructure
> needed (mirrors, etc.) and makes it easier to delete it.
>
> Paolo
>
> (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
> last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
> end up measured as PCI in SeaBIOS, due to different init order, so the
> real firmware cost of PAM and PCI initialization should be 5ms for qboot
> and 10ms for SeaBIOS.

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Thu, Jul 25, 2019 at 04:42:42PM +0200, Sergio Lopez wrote:
> 
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > On 25/07/19 15:26, Stefan Hajnoczi wrote:
> >> The microvm design has a premise and it can be answered definitively
> >> through performance analysis.
> >> 
> >> If I had to explain to someone why PCI or ACPI significantly slows
> >> things down, I couldn't honestly do so.  I say significantly because
> >> PCI init definitely requires more vmexits but can it be a small
> >> number?  For ACPI I have no idea why it would consume significant
> >> amounts of time.
> >
> > My guess is that it's just a lot of code that has to run. :(
> 
> I think I haven't shared any numbers about ACPI.
> 
> I don't have details about where exactly the time is spent, but
> compiling a guest kernel without ACPI decreases the average boot time by
> ~12ms, and the kernel's unstripped ELF binary size goes down by a
> whopping ~300KiB.

At least the binary size is hardly surprising.

I'm guessing you built in lots of drivers.

It would be educational to try to enable ACPI core but disable all
optional features.


> On the other hand, removing ACPI from QEMU decreases its initialization
> time by ~5ms, and the binary size is ~183KiB smaller.

Yes - ACPI generation uses a ton of allocations and data copies.

Need to play with pre-allocation strategies. Maybe something
as simple as:

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index f3fdfefcd5..24becc069e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2629,8 +2629,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
     acpi_get_pci_holes(&pci_hole, &pci_hole64);
     acpi_get_slic_oem(&slic_oem);
 
+#define DEFAULT_ARRAY_SIZE 16
-    table_offsets = g_array_new(false, true /* clear */,
-                                        sizeof(uint32_t));
+    table_offsets = g_array_sized_new(false, true /* clear */,
+                                      sizeof(uint32_t),
+                                      DEFAULT_ARRAY_SIZE);
     ACPI_BUILD_DPRINTF("init ACPI tables\n");
 
     bios_linker_loader_alloc(tables->linker,

will already help a bit.
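
To make the pre-allocation idea concrete, here is a minimal standalone
sketch. It does not use GLib: OffsetArray and count_reallocs() are
hypothetical stand-ins written only for illustration, and the doubling
growth policy is an assumption, not a description of GArray internals.
It just counts how often the backing buffer has to grow with and without
an up-front reservation. In GLib itself, the constructor that accepts a
reserved size is g_array_sized_new().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a growable array; not GLib's implementation. */
typedef struct {
    uint32_t *data;
    size_t len;
    size_t cap;
    unsigned reallocs;   /* how many times the buffer had to grow */
} OffsetArray;

static OffsetArray offset_array_new(size_t reserved)
{
    OffsetArray a = { NULL, 0, reserved, 0 };

    if (reserved) {
        /* Reserve (and clear) the storage once, up front. */
        a.data = calloc(reserved, sizeof(uint32_t));
        if (!a.data) {
            abort();
        }
    }
    return a;
}

static void offset_array_append(OffsetArray *a, uint32_t v)
{
    if (a->len == a->cap) {
        /* Assumed growth policy: double the capacity (1 if empty). */
        size_t new_cap = a->cap ? a->cap * 2 : 1;
        uint32_t *p = realloc(a->data, new_cap * sizeof(uint32_t));

        if (!p) {
            abort();
        }
        a->data = p;
        a->cap = new_cap;
        a->reallocs++;
    }
    assert(a->len < a->cap);
    a->data[a->len++] = v;
}

/* Append `appends` values and report how many times the buffer grew. */
unsigned count_reallocs(size_t reserved, size_t appends)
{
    OffsetArray a = offset_array_new(reserved);
    unsigned n;

    for (size_t i = 0; i < appends; i++) {
        offset_array_append(&a, (uint32_t)i);
    }
    n = a.reallocs;
    free(a.data);
    return n;
}
```

Under these assumptions, appending 16 offsets to an unreserved array
grows the buffer five times (1, 2, 4, 8, 16), while reserving 16 slots
avoids every one of those reallocations. Whether that is measurable in
acpi_build() is something only profiling can tell.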

> 
> IMHO, those are pretty relevant savings on both fronts.
> 
> >> Until we have this knowledge, the premise of microvm is unproven and
> >> merging it would be premature because maybe we can get into the same
> >> ballpark by optimizing existing code.
> >> 
> >> I'm sorry for being a pain.  I actually think the analysis will
> >> support microvm, but it still needs to be done in order to justify it.
> >
> > No, you're not a pain, you're explaining your reasoning and that helps.
> >
> > To me *maintainability is the biggest consideration* when introducing a
> > new feature.  "We can do just as well with q35" is a good reason to
> > deprecate and delete microvm, but not a good reason to reject it now as
> > long as microvm is good enough in terms of maintainability.  Keeping it
> > out of tree only makes it harder to do this kind of experiment.  virtio
> > 1 seems to be the biggest remaining blocker and I think it'd be a good
> > thing to have even for the ARM virt machine type.
> >
> > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> > and ~25 ms in the kernel.  I must say that's pretty good, but it's still
> > 30% of the whole boot time and reducing it is the hardest part.  If
> > having microvm in tree can help reducing it, good.  Yes, it will get
> > users, but most likely they will have to support pc or q35 as a fallback
> > so we could still delete microvm at any time with the due deprecation
> > period if it turns out to be a failed experiment.
> >
> > Whether to use qboot or SeaBIOS for microvm is another story, but it's
> > an implementation detail as long as the ROM size doesn't change and/or
> > we don't do versioned machine types.  So we can switch from one to the
> > other at any time; we can also include qboot directly in QEMU's tree,
> > without going through a submodule, which also reduces the infrastructure
> > needed (mirrors, etc.) and makes it easier to delete it.
> >
> > Paolo
> >
> > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
> > last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
> > end up measured as PCI in SeaBIOS, due to different init order, so the
> > real firmware cost of PAM and PCI initialization should be 5ms for qboot
> > and 10ms for SeaBIOS.
> 



Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Thu, Jul 25, 2019 at 10:58:22AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 25, 2019 at 04:42:42PM +0200, Sergio Lopez wrote:
> > 
> > Paolo Bonzini <pbonzini@redhat.com> writes:
> > 
> > > On 25/07/19 15:26, Stefan Hajnoczi wrote:
> > >> The microvm design has a premise and it can be answered definitively
> > >> through performance analysis.
> > >> 
> > >> If I had to explain to someone why PCI or ACPI significantly slows
> > >> things down, I couldn't honestly do so.  I say significantly because
> > >> PCI init definitely requires more vmexits but can it be a small
> > >> number?  For ACPI I have no idea why it would consume significant
> > >> amounts of time.
> > >
> > > My guess is that it's just a lot of code that has to run. :(
> > 
> > I think I haven't shared any numbers about ACPI.
> > 
> > I don't have details about where exactly the time is spent, but
> > compiling a guest kernel without ACPI decreases the average boot time by
> > ~12ms, and the kernel's unstripped ELF binary size goes down by a
> > whopping ~300KiB.
> 
> At least the binary size is hardly surprising.
> 
> I'm guessing you built in lots of drivers.
> 
> It would be educational to try to enable ACPI core but disable all
> optional features.

Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.


> 
> > On the other hand, removing ACPI from QEMU decreases its initialization
> > time by ~5ms, and the binary size is ~183KiB smaller.
> 
> Yes - ACPI generation uses a ton of allocations and data copies.
> 
> Need to play with pre-allocation strategies. Maybe something
> as simple as:
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index f3fdfefcd5..24becc069e 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2629,8 +2629,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>      acpi_get_pci_holes(&pci_hole, &pci_hole64);
>      acpi_get_slic_oem(&slic_oem);
>  
> +#define DEFAULT_ARRAY_SIZE 16
> -    table_offsets = g_array_new(false, true /* clear */,
> -                                        sizeof(uint32_t));
> +    table_offsets = g_array_sized_new(false, true /* clear */,
> +                                      sizeof(uint32_t),
> +                                      DEFAULT_ARRAY_SIZE);
>      ACPI_BUILD_DPRINTF("init ACPI tables\n");
>  
>      bios_linker_loader_alloc(tables->linker,
> 
> will already help a bit.
> 
> > 
> > IMHO, those are pretty relevant savings on both fronts.
> > 
> > >> Until we have this knowledge, the premise of microvm is unproven and
> > >> merging it would be premature because maybe we can get into the same
> > >> ballpark by optimizing existing code.
> > >> 
> > >> I'm sorry for being a pain.  I actually think the analysis will
> > >> support microvm, but it still needs to be done in order to justify it.
> > >
> > > No, you're not a pain, you're explaining your reasoning and that helps.
> > >
> > > To me *maintainability is the biggest consideration* when introducing a
> > > new feature.  "We can do just as well with q35" is a good reason to
> > > deprecate and delete microvm, but not a good reason to reject it now as
> > > long as microvm is good enough in terms of maintainability.  Keeping it
> > > out of tree only makes it harder to do this kind of experiment.  virtio
> > > 1 seems to be the biggest remaining blocker and I think it'd be a good
> > > thing to have even for the ARM virt machine type.
> > >
> > > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
> > > and ~25 ms in the kernel.  I must say that's pretty good, but it's still
> > > 30% of the whole boot time and reducing it is the hardest part.  If
> > > having microvm in tree can help reducing it, good.  Yes, it will get
> > > users, but most likely they will have to support pc or q35 as a fallback
> > > so we could still delete microvm at any time with the due deprecation
> > > period if it turns out to be a failed experiment.
> > >
> > > Whether to use qboot or SeaBIOS for microvm is another story, but it's
> > > an implementation detail as long as the ROM size doesn't change and/or
> > > we don't do versioned machine types.  So we can switch from one to the
> > > other at any time; we can also include qboot directly in QEMU's tree,
> > > without going through a submodule, which also reduces the infrastructure
> > > needed (mirrors, etc.) and makes it easier to delete it.
> > >
> > > Paolo
> > >
> > > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
> > > last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
> > > end up measured as PCI in SeaBIOS, due to different init order, so the
> > > real firmware cost of PAM and PCI initialization should be 5ms for qboot
> > > and 10ms for SeaBIOS.
> > 
> 
> 

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Paolo Bonzini 4 years, 9 months ago
On 25/07/19 17:01, Michael S. Tsirkin wrote:
>> It would be educational to try to enable ACPI core but disable all
>> optional features.

A lot of them are select'ed so it's not easy.

> Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.

That's what the NEMU guys experimented with.  It's not supported by our
DSDT since it uses ACPI GPE, and the reduction in code size is small
(about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?).

Paolo

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Michael S. Tsirkin 4 years, 9 months ago
On Thu, Jul 25, 2019 at 05:39:39PM +0200, Paolo Bonzini wrote:
> On 25/07/19 17:01, Michael S. Tsirkin wrote:
> >> It would be educational to try to enable ACPI core but disable all
> >> optional features.
> 
> A lot of them are select'ed so it's not easy.
> 
> > Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.
> 
> That's what the NEMU guys experimented with.  It's not supported by our
> DSDT since it uses ACPI GPE,

Well there are two GPE blocks in FADT. We could just switch to
these if necessary I think.

> and the reduction in code size is small
> (about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?).
> 
> Paolo

Well ACPI is 150k loc I think, right?

linux]$ wc -l `find drivers/acpi/ -name '*.c' `|tail -1
 145926 total

So 100k wouldn't be too shabby.

-- 
MST

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Igor Mammedov 4 years, 9 months ago
On Thu, 25 Jul 2019 13:38:48 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Jul 25, 2019 at 05:39:39PM +0200, Paolo Bonzini wrote:
> > On 25/07/19 17:01, Michael S. Tsirkin wrote:  
> > >> It would be educational to try to enable ACPI core but disable all
> > >> optional features.  
> > 
> > A lot of them are select'ed so it's not easy.
> >   
> > > Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.  
> > 
> > That's what the NEMU guys experimented with.  It's not supported by our
> > DSDT since it uses ACPI GPE,  
> 
> Well there are two GPE blocks in FADT. We could just switch to
> these if necessary I think.

If it's a simplistic VM, we could build a dedicated DSDT (or a whole set of
tables) for it and use the reduced profile like the arm-virt machine does
(just a newer version of FADT with the needed flags set). That would probably
cut the ACPI cost on the QEMU side.

> > and the reduction in code size is small
> > (about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?).
> > 
> > Paolo  
> 
> Well ACPI is 150k loc I think, right?
> 
> linux]$ wc -l `find drivers/acpi/ -name '*.c' `|tail -1
>  145926 total
> 
> So 100k wouldn't be too shabby.
> 


Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by Sergio Lopez 4 years, 9 months ago
Michael S. Tsirkin <mst@redhat.com> writes:

> On Thu, Jul 25, 2019 at 10:58:22AM -0400, Michael S. Tsirkin wrote:
>> On Thu, Jul 25, 2019 at 04:42:42PM +0200, Sergio Lopez wrote:
>> > 
>> > Paolo Bonzini <pbonzini@redhat.com> writes:
>> > 
>> > > On 25/07/19 15:26, Stefan Hajnoczi wrote:
>> > >> The microvm design has a premise and it can be answered definitively
>> > >> through performance analysis.
>> > >> 
>> > >> If I had to explain to someone why PCI or ACPI significantly slows
>> > >> things down, I couldn't honestly do so.  I say significantly because
>> > >> PCI init definitely requires more vmexits but can it be a small
>> > >> number?  For ACPI I have no idea why it would consume significant
>> > >> amounts of time.
>> > >
>> > > My guess is that it's just a lot of code that has to run. :(
>> > 
>> > I think I haven't shared any numbers about ACPI.
>> > 
>> > I don't have details about where exactly the time is spent, but
>> > compiling a guest kernel without ACPI decreases the average boot time by
>> > ~12ms, and the kernel's unstripped ELF binary size goes down by a
>> > whopping ~300KiB.
>> 
>> At least the binary size is hardly surprising.
>> 
>> I'm guessing you built in lots of drivers.
>> 
>> It would be educational to try to enable ACPI core but disable all
>> optional features.

I just tried disabling everything that menuconfig allowed me to. Saves
~27KiB and doesn't improve boot time.

> Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational.

I also tried enabling this one in my original config. It saves ~11.5KiB,
and has no impact on boot time either.

>> 
>> > On the other hand, removing ACPI from QEMU decreases its initialization
>> > time by ~5ms, and the binary size is ~183KiB smaller.
>> 
>> Yes - ACPI generation uses a ton of allocations and data copies.
>> 
>> Need to play with pre-allocation strategies. Maybe something
>> as simple as:
>> 
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index f3fdfefcd5..24becc069e 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2629,8 +2629,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>>      acpi_get_pci_holes(&pci_hole, &pci_hole64);
>>      acpi_get_slic_oem(&slic_oem);
>>  
>> +#define DEFAULT_ARRAY_SIZE 16
>> -    table_offsets = g_array_new(false, true /* clear */,
>> -                                        sizeof(uint32_t));
>> +    table_offsets = g_array_sized_new(false, true /* clear */,
>> +                                      sizeof(uint32_t),
>> +                                      DEFAULT_ARRAY_SIZE);
>>      ACPI_BUILD_DPRINTF("init ACPI tables\n");
>>  
>>      bios_linker_loader_alloc(tables->linker,
>> 
>> will already help a bit.
>> 
>> > 
>> > IMHO, those are pretty relevant savings on both fronts.
>> > 
>> > >> Until we have this knowledge, the premise of microvm is unproven and
>> > >> merging it would be premature because maybe we can get into the same
>> > >> ballpark by optimizing existing code.
>> > >> 
>> > >> I'm sorry for being a pain.  I actually think the analysis will
>> > >> support microvm, but it still needs to be done in order to justify it.
>> > >
>> > > No, you're not a pain, you're explaining your reasoning and that helps.
>> > >
>> > > To me *maintainability is the biggest consideration* when introducing a
>> > > new feature.  "We can do just as well with q35" is a good reason to
>> > > deprecate and delete microvm, but not a good reason to reject it now as
>> > > long as microvm is good enough in terms of maintainability.  Keeping it
>> > > out of tree only makes it harder to do this kind of experiment.  virtio
>> > > 1 seems to be the biggest remaining blocker and I think it'd be a good
>> > > thing to have even for the ARM virt machine type.
>> > >
>> > > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
>> > > and ~25 ms in the kernel.  I must say that's pretty good, but it's still
>> > > 30% of the whole boot time and reducing it is the hardest part.  If
>> > > having microvm in tree can help reducing it, good.  Yes, it will get
>> > > users, but most likely they will have to support pc or q35 as a fallback
>> > > so we could still delete microvm at any time with the due deprecation
>> > > period if it turns out to be a failed experiment.
>> > >
>> > > Whether to use qboot or SeaBIOS for microvm is another story, but it's
>> > > an implementation detail as long as the ROM size doesn't change and/or
>> > > we don't do versioned machine types.  So we can switch from one to the
>> > > other at any time; we can also include qboot directly in QEMU's tree,
>> > > without going through a submodule, which also reduces the infrastructure
>> > > needed (mirrors, etc.) and makes it easier to delete it.
>> > >
>> > > Paolo
>> > >
>> > > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
>> > > last write to 0xcf8.  I suspect part of qboot's 10ms boot time actually
>> > > end up measured as PCI in SeaBIOS, due to different init order, so the
>> > > real firmware cost of PAM and PCI initialization should be 5ms for qboot
>> > > and 10ms for SeaBIOS.
>> > 
>> 
>> 

Re: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Posted by no-reply@patchew.org 4 years, 10 months ago
Patchew URL: https://patchew.org/QEMU/20190702121106.28374-1-slp@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Message-id: 20190702121106.28374-1-slp@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]      patchew/20190702113414.6896-1-armbru@redhat.com -> patchew/20190702113414.6896-1-armbru@redhat.com
Switched to a new branch 'test'
8ebe540 hw/i386: Introduce the microvm machine type
ac71c2a hw/i386: Factorize PVH related functions
faeccbd hw/i386: Add an Intel MPTable generator
7540b93 hw/virtio: Factorize virtio-mmio headers

=== OUTPUT BEGIN ===
1/4 Checking commit 7540b9358a0f (hw/virtio: Factorize virtio-mmio headers)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#66: 
new file mode 100644

total: 0 errors, 1 warnings, 105 lines checked

Patch 1/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
2/4 Checking commit faeccbd2c589 (hw/i386: Add an Intel MPTable generator)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#16: 
new file mode 100644

total: 0 errors, 1 warnings, 374 lines checked

Patch 2/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/4 Checking commit ac71c2af3972 (hw/i386: Factorize PVH related functions)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#186: 
new file mode 100644

ERROR: do not initialise statics to 0 or NULL
#210: FILE: hw/i386/pvh.c:20:
+static size_t pvh_start_addr = 0;

total: 1 errors, 1 warnings, 281 lines checked

Patch 3/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

4/4 Checking commit 8ebe540c4430 (hw/i386: Introduce the microvm machine type)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#67: 
new file mode 100644

ERROR: Error messages should not contain newlines
#291: FILE: hw/i386/microvm.c:220:
+            error_report("qemu: error reading initrd %s: %s\n",

ERROR: Error messages should not contain newlines
#299: FILE: hw/i386/microvm.c:228:
+                         "(max: %"PRIu32", need %"PRId64")\n",

total: 2 errors, 1 warnings, 653 lines checked

Patch 4/4 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1
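
For reference, the two kinds of ERROR reported above usually have small
fixes. The following is a standalone sketch, not code from the series:
format_report() is a hypothetical stand-in for a reporting helper that
appends the trailing newline itself (which is the reason callers are
asked not to put "\n" in the message), and pvh_get_start_addr() is an
assumed accessor added only so the variable is used.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Objects with static storage duration are zero-initialised by the C
 * standard, so no explicit "= 0" is needed here.
 */
static size_t pvh_start_addr;

/* Assumed accessor, for illustration only. */
size_t pvh_get_start_addr(void)
{
    return pvh_start_addr;
}

/*
 * Hypothetical stand-in for an error-reporting helper: formats the
 * message into `out` and appends exactly one newline itself.
 * Returns the number of characters written, or -1 if it does not fit.
 */
int format_report(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(out, size, fmt, ap);
    va_end(ap);

    /* Need room for the message, the newline and the terminating NUL. */
    if (n < 0 || (size_t)n + 1 >= size) {
        return -1;
    }
    out[n] = '\n';
    out[n + 1] = '\0';
    return n + 1;
}
```

With a helper of this shape, a caller passes
"error reading initrd %s: %s" without the trailing "\n" and the output
still ends in a single newline.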


The full log is available at
http://patchew.org/logs/20190702121106.28374-1-slp@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com