Microvm is a machine type inspired by both NEMU and Firecracker, and
constructed after the machine model implemented by the latter.

Its main purpose is providing users with a KVM-only machine type with fast
boot times, a minimal attack surface (measured as the number of IO ports
and MMIO regions exposed to the guest) and a small footprint (especially
when combined with the ongoing QEMU modularization effort).

Normally, other than the device support provided by KVM itself,
microvm only supports virtio-mmio devices. Microvm also includes a
legacy mode, which adds an ISA bus with a 16550A serial port, useful
for seeing the early boot kernel messages.

Microvm only supports booting PVH-enabled Linux ELF images. Booting
other PVH-enabled kernels may be possible, but due to the lack of ACPI
and firmware, we're relying on the command line for specifying the
location of the virtio-mmio transports. If there's interest in
using this machine type with other kernels, we'll try to find some
kind of middle-ground solution.
This is the list of the exposed IO ports and MMIO regions when running
in non-legacy mode:

address-space: memory
  00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
  00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
  00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
  00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
  00000000d0000800-00000000d00009ff (prio 0, i/o): virtio-mmio
  00000000d0000a00-00000000d0000bff (prio 0, i/o): virtio-mmio
  00000000d0000c00-00000000d0000dff (prio 0, i/o): virtio-mmio
  00000000d0000e00-00000000d0000fff (prio 0, i/o): virtio-mmio
  00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
    0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
    0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
    000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
    00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
    00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
    00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr

A QEMU instance with the microvm machine type can be invoked this way:

 - Normal mode:

qemu-system-x86_64 -M microvm -m 512m -smp 2 \
 -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
 -nodefaults -no-user-config \
 -chardev pty,id=virtiocon0,server \
 -device virtio-serial-device \
 -device virtconsole,chardev=virtiocon0 \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0

 - Legacy mode:

qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
 -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
 -nodefaults -no-user-config \
 -drive id=test,file=test.img,format=raw,if=none \
 -device virtio-blk-device,drive=test \
 -netdev tap,id=tap0,script=no,downscript=no \
 -device virtio-net-device,netdev=tap0 \
 -serial stdio

Changelog:
v3:
  - Add initrd support (thanks Stefano).
v2:
  - Drop "[PATCH 1/4] hw/i386: Factorize CPU routine".
  - Simplify machine definition (thanks Eduardo).
  - Remove use of unneeded NUMA-related callbacks (thanks Eduardo).
  - Add a patch to factorize PVH-related functions.
  - Replace use of Linux's Zero Page with PVH (thanks Maran and Paolo).

Sergio Lopez (4):
  hw/virtio: Factorize virtio-mmio headers
  hw/i386: Add an Intel MPTable generator
  hw/i386: Factorize PVH related functions
  hw/i386: Introduce the microvm machine type

 default-configs/i386-softmmu.mak            |   1 +
 hw/i386/Kconfig                             |   4 +
 hw/i386/Makefile.objs                       |   2 +
 hw/i386/microvm.c                           | 550 ++++++++++++++++++++
 hw/i386/mptable.c                           | 156 ++++++
 hw/i386/pc.c                                | 120 +----
 hw/i386/pvh.c                               | 113 ++++
 hw/i386/pvh.h                               |  10 +
 hw/virtio/virtio-mmio.c                     |  35 +-
 hw/virtio/virtio-mmio.h                     |  60 +++
 include/hw/i386/microvm.h                   |  82 +++
 include/hw/i386/mptable.h                   |  36 ++
 include/standard-headers/linux/mpspec_def.h | 182 +++++++
 13 files changed, 1209 insertions(+), 142 deletions(-)
 create mode 100644 hw/i386/microvm.c
 create mode 100644 hw/i386/mptable.c
 create mode 100644 hw/i386/pvh.c
 create mode 100644 hw/i386/pvh.h
 create mode 100644 hw/virtio/virtio-mmio.h
 create mode 100644 include/hw/i386/microvm.h
 create mode 100644 include/hw/i386/mptable.h
 create mode 100644 include/standard-headers/linux/mpspec_def.h

--
2.21.0
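As a rough illustration of what "specifying the location of the virtio-mmio transports" on the kernel command line can look like: Linux accepts parameters of the form virtio_mmio.device=<size>@<baseaddr>:<irq> when built with CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES. The sketch below prints one such parameter per 512-byte slot starting at 0xd0000000, matching the region listing in the cover letter. The IRQ numbering (FIRST_IRQ=5, one IRQ per slot) and the helper name virtio_mmio_params are assumptions made for illustration; they are not taken from this series.

```shell
#!/bin/sh
# Sketch only: print virtio_mmio.device= kernel parameters for N slots.
# BASE and SIZE follow the region listing in the cover letter.
# FIRST_IRQ is a hypothetical starting IRQ, not stated in the patches.
BASE=$((0xd0000000))
SIZE=512
FIRST_IRQ=5

virtio_mmio_params() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        # Each transport occupies SIZE bytes directly after the previous one.
        addr=$((BASE + i * SIZE))
        printf 'virtio_mmio.device=%d@0x%x:%d\n' \
            "$SIZE" "$addr" "$((FIRST_IRQ + i))"
        i=$((i + 1))
    done
}

# Print the parameters for the eight slots shown in the listing.
virtio_mmio_params 8
```

Whether these parameters are appended by QEMU or by the user is not spelled out in the cover letter, so treat the output only as a picture of the syntax.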
Hi Sergio,

The idea is interesting and I tried to launch a guest following your
guide, but it seems to fail for me. I tried both legacy and normal modes,
but when vncviewer connected it told me:
The vm has no graphic display device.
The whole screen in VNC is just black.

kernel config:
CONFIG_KVM_MMIO=y
CONFIG_VIRTIO_MMIO=y

I don't know if any specific kernel version/patch/config
is needed or if I missed anything.
Could you kindly give some tips?

Thanks very much.
Jing

> A QEMU instance with the microvm machine type can be invoked this way:
>
> - Normal mode:
>
> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
> -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
> -nodefaults -no-user-config \
> -chardev pty,id=virtiocon0,server \
> -device virtio-serial-device \
> -device virtconsole,chardev=virtiocon0 \
> -drive id=test,file=test.img,format=raw,if=none \
> -device virtio-blk-device,drive=test \
> -netdev tap,id=tap0,script=no,downscript=no \
> -device virtio-net-device,netdev=tap0
>
> - Legacy mode:
>
> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
> -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
> -nodefaults -no-user-config \
> -drive id=test,file=test.img,format=raw,if=none \
> -device virtio-blk-device,drive=test \
> -netdev tap,id=tap0,script=no,downscript=no \
> -device virtio-net-device,netdev=tap0 \
> -serial stdio
Jing Liu <jing2.liu@linux.intel.com> writes:

> Hi Sergio,
>
> The idea is interesting and I tried to launch a guest following your
> guide, but it seems to fail for me. I tried both legacy and normal modes,
> but when vncviewer connected it told me:
> The vm has no graphic display device.
> The whole screen in VNC is just black.

The microvm machine type doesn't support any graphics device, so you
need to rely on the serial console.

> kernel config:
> CONFIG_KVM_MMIO=y
> CONFIG_VIRTIO_MMIO=y
>
> I don't know if any specific kernel version/patch/config
> is needed or if I missed anything.
> Could you kindly give some tips?

I'm testing it with upstream vanilla Linux. In addition to MMIO, you
need to add support for PVH (the next version of this patchset, v4, will
support booting from FW, so it'll be possible to use non-PVH ELF kernels
and bzImages too).

I've just uploaded a working kernel config here:

https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9

As for the QEMU command line, something like this should do the trick:

./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm,legacy -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial stdio

If this works, you can move to non-legacy mode with a virtio-console:

./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0

If it's still working, you can try adding some devices too:

./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1 root=/dev/vda" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -netdev user,id=testnet -device virtio-net-device,netdev=testnet -drive id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none -device virtio-blk-device,drive=test

Sergio.
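The three-stage bring-up described in this reply (legacy serial first, then virtio-console, then extra devices) can be kept in one small helper so each stage is easy to repeat. This is only a sketch: it prints the command for a stage rather than running it, the flags are copied from the commands in this message (v3 of the series, so they may not match other QEMU versions), and the function name microvm_cmd and the QEMU, KERNEL and ROOTFS variables are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: print the QEMU command for one bring-up stage.
# Paths are placeholders; override them through the environment.
QEMU=${QEMU:-./x86_64-softmmu/qemu-system-x86_64}
KERNEL=${KERNEL:-vmlinux}
ROOTFS=${ROOTFS:-alpine-rootfs-x86_64.raw}

# Option groups copied from the commands in this thread.
COMMON="-smp 1 -m 1g -enable-kvm -nodefaults -no-user-config -nographic"
VCON="-serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0"
DEVS="-netdev user,id=testnet -device virtio-net-device,netdev=testnet -drive id=test,file=$ROOTFS,format=raw,if=none -device virtio-blk-device,drive=test"

microvm_cmd() {
    case "$1" in
        legacy)
            # Stage 1: ISA serial console, useful for early boot messages.
            printf '%s -M microvm,legacy -kernel %s %s -serial stdio -append "%s"\n' \
                "$QEMU" "$KERNEL" "$COMMON" \
                "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" ;;
        console)
            # Stage 2: non-legacy mode with a virtio-console.
            printf '%s -M microvm -kernel %s %s %s -append "%s"\n' \
                "$QEMU" "$KERNEL" "$COMMON" "$VCON" \
                "console=hvc0 reboot=k panic=1" ;;
        devices)
            # Stage 3: add virtio-net and virtio-blk, boot from /dev/vda.
            printf '%s -M microvm -kernel %s %s %s %s -append "%s"\n' \
                "$QEMU" "$KERNEL" "$COMMON" "$VCON" "$DEVS" \
                "console=hvc0 reboot=k panic=1 root=/dev/vda" ;;
        *)
            echo "usage: microvm_cmd legacy|console|devices" >&2
            return 1 ;;
    esac
}

microvm_cmd legacy
```

Printing instead of executing keeps the helper harmless on hosts without KVM; the printed line can be pasted into a shell once the previous stage boots.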
Hi Sergio,

On 8/29/2019 11:46 PM, Sergio Lopez wrote:
> The microvm machine type doesn't support any graphics device, so you
> need to rely on the serial console.

Got it.

> I'm testing it with upstream vanilla Linux. In addition to MMIO, you
> need to add support for PVH (the next version of this patchset, v4, will
> support booting from FW, so it'll be possible to use non-PVH ELF kernels
> and bzImages too).
>
> I've just uploaded a working kernel config here:
>
> https://gist.github.com/slp/1060ba3aaf708584572ad4109f28c8f9

Thanks very much, this config is helpful to me.

> As for the QEMU command line, something like this should do the trick:
>
> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm,legacy -kernel vmlinux -append "earlyprintk=ttyS0 console=ttyS0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial stdio
>
> If this works, you can move to non-legacy mode with a virtio-console:
>
> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0

I tried the above two ways and it works now. Thanks!

> If it's still working, you can try adding some devices too:
>
> ./x86_64-softmmu/qemu-system-x86_64 -smp 1 -m 1g -enable-kvm -M microvm -kernel vmlinux -append "console=hvc0 reboot=k panic=1 root=/dev/vda" -nodefaults -no-user-config -nographic -serial pty -chardev stdio,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -netdev user,id=testnet -device virtio-net-device,netdev=testnet -drive id=test,file=alpine-rootfs-x86_64.raw,format=raw,if=none -device virtio-blk-device,drive=test

But I'm wondering why the image I used cannot be found.
With root=/dev/vda3, the same image worked well on a normal qemu/guest-config
bootup, but didn't work here. The details are:

-append "console=hvc0 reboot=k panic=1 root=/dev/vda3 rw rootfstype=ext4" \

[ 0.022784] Key type encrypted registered
[ 0.022988] VFS: Cannot open root device "vda3" or unknown-block(254,3): error -6
[ 0.023041] Please append a correct "root=" boot option; here are the available partitions:
[ 0.023089] fe00 8946688 vda
[ 0.023090] driver: virtio_blk
[ 0.023143] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,3)
[ 0.023201] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc3 #23

BTW, I also tried root=/dev/vda and it didn't work either. The dmesg is a
little different:

[ 0.028050] Key type encrypted registered
[ 0.028484] List of all partitions:
[ 0.028529] fe00 8946688 vda
[ 0.028529] driver: virtio_blk
[ 0.028615] No filesystem could mount root, tried:
[ 0.028616] ext4
[ 0.028670]
[ 0.028712] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,0)

I tried another ext4 image but it still doesn't work.
Is there any limitation on the blk image? Could I copy your image for a simple
test?

Thanks in advance,
Jing
Jing Liu <jing2.liu@linux.intel.com> writes:

> But I'm wondering why the image I used cannot be found.
> With root=/dev/vda3, the same image worked well on a normal qemu/guest-config
> bootup, but didn't work here. The details are:
>
> -append "console=hvc0 reboot=k panic=1 root=/dev/vda3 rw rootfstype=ext4" \
>
> [ 0.022784] Key type encrypted registered
> [ 0.022988] VFS: Cannot open root device "vda3" or unknown-block(254,3): error -6
> [ 0.023041] Please append a correct "root=" boot option; here are the available partitions:
> [ 0.023089] fe00 8946688 vda
> [ 0.023090] driver: virtio_blk
> [ 0.023143] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,3)
> [ 0.023201] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc3 #23
>
> BTW, I also tried root=/dev/vda and it didn't work either. The dmesg is a
> little different:
>
> [ 0.028050] Key type encrypted registered
> [ 0.028484] List of all partitions:
> [ 0.028529] fe00 8946688 vda
> [ 0.028529] driver: virtio_blk
> [ 0.028615] No filesystem could mount root, tried:
> [ 0.028616] ext4
> [ 0.028670]
> [ 0.028712] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(254,0)
>
> I tried another ext4 image but it still doesn't work.
> Is there any limitation on the blk image? Could I copy your image for a simple
> test?

The kernel config I posted lacks support for DOS partitions. Adding
CONFIG_MSDOS_PARTITION=y should allow you to boot from /dev/vda3.

Anyway, in case you also want to try booting from /dev/vda (without
partitions), this is the recipe I use to quickly create a minimal rootfs
image:

# wget http://dl-cdn.alpinelinux.org/alpine/v3.10/releases/x86_64/alpine-minirootfs-3.10.2-x86_64.tar.gz
# qemu-img create -f raw alpine-rootfs-x86_64.raw 1G
# sudo losetup /dev/loop0 alpine-rootfs-x86_64.raw
# sudo mkfs.ext4 /dev/loop0
# sudo mount /dev/loop0 /mnt
# sudo tar xpf alpine-minirootfs-3.10.2-x86_64.tar.gz -C /mnt
# sudo umount /mnt
# sudo losetup -d /dev/loop0

The rootfs will be missing openrc, so you'll need to add "init=/bin/sh"
to the command line.

Sergio.
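Since both failures in this exchange came down to kernel configuration, a quick check of the guest .config before booting can save a round trip. The sketch below lists which options are not built in (=y). The option list is an assumption assembled from this discussion (PVH boot, virtio-mmio with command-line devices, virtio-blk, ext4 root, DOS partitions); it is not an authoritative list of what microvm requires, and the function name missing_opts is hypothetical.

```shell
#!/bin/sh
# Sketch only: list options from REQUIRED_OPTS that are not set to "y"
# in a kernel config file. The list is an assumption based on this thread.
REQUIRED_OPTS="CONFIG_PVH CONFIG_VIRTIO_MMIO CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES CONFIG_VIRTIO_BLK CONFIG_EXT4_FS CONFIG_MSDOS_PARTITION"

missing_opts() {
    config_file=$1
    for opt in $REQUIRED_OPTS; do
        # Built-in only: without an initrd, a module cannot help mount root.
        grep -q "^${opt}=y\$" "$config_file" || echo "$opt"
    done
}

# Check the config in the current directory, if there is one.
if [ -f .config ]; then
    missing_opts .config
fi
```

An empty result means every option in the list is built in; any name printed is a candidate for the kind of "Cannot open root device" panic shown above.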
On 8/30/2019 10:27 PM, Sergio Lopez wrote:
> The kernel config I posted lacks support for DOS partitions. Adding
> CONFIG_MSDOS_PARTITION=y should allow you to boot from /dev/vda3.
>
> Anyway, in case you also want to try booting from /dev/vda (without
> partitions), this is the recipe I use to quickly create a minimal rootfs
> image:
>
> # wget http://dl-cdn.alpinelinux.org/alpine/v3.10/releases/x86_64/alpine-minirootfs-3.10.2-x86_64.tar.gz
> # qemu-img create -f raw alpine-rootfs-x86_64.raw 1G
> # sudo losetup /dev/loop0 alpine-rootfs-x86_64.raw
> # sudo mkfs.ext4 /dev/loop0
> # sudo mount /dev/loop0 /mnt
> # sudo tar xpf alpine-minirootfs-3.10.2-x86_64.tar.gz -C /mnt
> # sudo umount /mnt
> # sudo losetup -d /dev/loop0
>
> The rootfs will be missing openrc, so you'll need to add "init=/bin/sh"
> to the command line.

Thank you Sergio. I'll try that.

Jing
On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> Microvm is a machine type inspired by both NEMU and Firecracker, and
> constructed after the machine model implemented by the latter.
>
> Its main purpose is providing users with a KVM-only machine type with fast
> boot times, a minimal attack surface (measured as the number of IO ports
> and MMIO regions exposed to the guest) and a small footprint (especially
> when combined with the ongoing QEMU modularization effort).

Please post metrics that compare this against a minimal Q35.

With qboot it was later found that SeaBIOS can achieve comparable boot
times, so it wasn't worth maintaining qboot.

Data is needed to show that microvm is really a significant improvement
over a minimal Q35.

Stefan
Stefan Hajnoczi <stefanha@gmail.com> writes:

> Please post metrics that compare this against a minimal Q35.
>
> With qboot it was later found that SeaBIOS can achieve comparable boot
> times, so it wasn't worth maintaining qboot.
>
> Data is needed to show that microvm is really a significant improvement
> over a minimal Q35.

I've just run some numbers using Stefano Garzarella's qemu-boot-time
scripts [1] on a server with 2xIntel Xeon Silver 4114 2.20GHz, using the
upstream QEMU (474f3938d79ab36b9231c9ad3b5a9314c2aeacde) built with
minimal features [2]. The VM boots a minimal kernel [3] without initrd,
using a kata container image as root via virtio-blk (though this isn't
really relevant, as we're just taking measurements until the kernel is
about to exec init).

To try to make the comparison as fair as possible, I've used a minimal
q35 machine with as few devices as possible. Disabling HPET and PIT at
the same time caused the kernel to get stuck on boot, so I ran two
iterations, one without HPET and the other without PIT:

 -----------------
 | Q35 with HPET |
 -----------------

Command line:

./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M q35,smbus=off,nvdimm=off,pit=off,vmport=off,sata=off,usb=off,graphics=off -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk,drive=test

Average boot times after 10 consecutive runs:

 qemu_init_end: 77.637936
 linux_start_kernel: 117.082526 (+39.44459)
 linux_start_user: 364.629972 (+247.547446)

Memory tree:

address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
    0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
0000000000000000-ffffffffffffffff (prio -1, i/o): pci 00000000000c0000-00000000000dffff (prio 1, rom): pc.rom 00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff 00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci 00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common 00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr 00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device 00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify 00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci 00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common 00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr 00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device 00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify 00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix 00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table 00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba 00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix 00000000febff000-00000000febff01f (prio 0, i/o): msix-table 00000000febff800-00000000febff807 (prio 0, i/o): msix-pba 00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios 00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff 00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled] 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled] 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled] 
00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled] 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled] 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled] 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled] 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled] 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled] 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled] 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled] 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled] 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled] 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled] 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled] 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled] 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 
00000000000d4000-00000000000d7fff [disabled] 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled] 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled] 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled] 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled] 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled] 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled] 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled] 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled] 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled] 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled] 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled] 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled] 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff 00000000000e4000-00000000000e7fff (prio 1, i/o): 
alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled] 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled] 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled] 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled] 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled] 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled] 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled] 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled] 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled] 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled] 0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled] 00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio 00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic 00000000fed00000-00000000fed003ff (prio 0, i/o): hpet 00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio 00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled] 00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi address-space: I/O 0000000000000000-000000000000ffff (prio 0, i/o): io 
0000000000000000-0000000000000007 (prio 0, i/o): dma-chan 0000000000000008-000000000000000f (prio 0, i/o): dma-cont 0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic 0000000000000060-0000000000000060 (prio 0, i/o): i8042-data 0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd 0000000000000070-0000000000000071 (prio 0, i/o): rtc 0000000000000070-0000000000000070 (prio 0, i/o): rtc-index 000000000000007e-000000000000007f (prio 0, i/o): kvmvapic 0000000000000080-0000000000000080 (prio 0, i/o): ioport80 0000000000000081-0000000000000083 (prio 0, i/o): dma-page 0000000000000087-0000000000000087 (prio 0, i/o): dma-page 0000000000000089-000000000000008b (prio 0, i/o): dma-page 000000000000008f-000000000000008f (prio 0, i/o): dma-page 0000000000000092-0000000000000092 (prio 0, i/o): port92 00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic 00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io 00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan 00000000000000d0-00000000000000df (prio 0, i/o): dma-cont 00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0 00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr 00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr 0000000000000510-0000000000000511 (prio 0, i/o): fwcfg 0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma 0000000000000600-000000000000067f (prio 0, i/o): ich9-pm 0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt 0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt 0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr 0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0 0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi 0000000000000660-000000000000067f (prio 0, i/o): sm-tco 0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug 0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx 0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control 0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data 
  000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
  000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci


----------------
| Q35 with PIT |
----------------

Command line:

./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm \
  -M q35,smbus=off,nvdimm=off,pit=on,vmport=off,sata=off,usb=off,graphics=off \
  -no-hpet \
  -kernel /root/src/images/vmlinux-5.2 \
  -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" \
  -smp 1 -nodefaults -no-user-config \
  -chardev pty,id=virtiocon0,server \
  -device virtio-serial \
  -device virtconsole,chardev=virtiocon0 \
  -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none \
  -device virtio-blk,drive=test

Average boot times after 10 consecutive runs:

  qemu_init_end: 77.467852
  linux_start_kernel: 116.688472 (+39.22062)
  linux_start_user: 363.033365 (+246.344893)

Memory tree:

address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
  0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff
  0000000000000000-ffffffffffffffff (prio -1, i/o): pci
  00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
  00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
  00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci
  00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common
  00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr
  00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device
  00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify
  00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci
  00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common
  00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr
  00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device
  00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify
  00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix
  00000000febfe000-00000000febfe01f (prio 0,
i/o): msix-table 00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba 00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix 00000000febff000-00000000febff01f (prio 0, i/o): msix-table 00000000febff800-00000000febff807 (prio 0, i/o): msix-pba 00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios 00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff 00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled] 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled] 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled] 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled] 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled] 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled] 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled] 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled] 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled] 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff 
[disabled] 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled] 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled] 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled] 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled] 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled] 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled] 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled] 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled] 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled] 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled] 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled] 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled] 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled] 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 
00000000000dc000-00000000000dffff 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled] 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled] 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled] 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled] 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled] 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled] 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled] 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled] 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled] 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled] 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled] 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled] 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled] 00000000000f0000-00000000000fffff (prio 1, i/o): 
alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled] 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled] 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled] 0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled] 00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio 00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic 00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio 00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled] 00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi address-space: I/O 0000000000000000-000000000000ffff (prio 0, i/o): io 0000000000000000-0000000000000007 (prio 0, i/o): dma-chan 0000000000000008-000000000000000f (prio 0, i/o): dma-cont 0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic 0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit 0000000000000060-0000000000000060 (prio 0, i/o): i8042-data 0000000000000061-0000000000000061 (prio 0, i/o): pcspk 0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd 0000000000000070-0000000000000071 (prio 0, i/o): rtc 0000000000000070-0000000000000070 (prio 0, i/o): rtc-index 000000000000007e-000000000000007f (prio 0, i/o): kvmvapic 0000000000000080-0000000000000080 (prio 0, i/o): ioport80 0000000000000081-0000000000000083 (prio 0, i/o): dma-page 0000000000000087-0000000000000087 (prio 0, i/o): dma-page 0000000000000089-000000000000008b (prio 0, i/o): dma-page 000000000000008f-000000000000008f (prio 0, i/o): dma-page 0000000000000092-0000000000000092 (prio 0, i/o): port92 00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic 00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io 00000000000000c0-00000000000000cf (prio 0, 
i/o): dma-chan
  00000000000000d0-00000000000000df (prio 0, i/o): dma-cont
  00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0
  00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
  00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
  0000000000000510-0000000000000511 (prio 0, i/o): fwcfg
  0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma
  0000000000000600-000000000000067f (prio 0, i/o): ich9-pm
  0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt
  0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt
  0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr
  0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0
  0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi
  0000000000000660-000000000000067f (prio 0, i/o): sm-tco
  0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug
  0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx
  0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control
  0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data
  000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci
  000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci


-----------
| microvm |
-----------

Command line:

./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M microvm \
  -kernel /root/src/images/vmlinux-5.2 \
  -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" \
  -smp 1 -nodefaults -no-user-config \
  -chardev pty,id=virtiocon0,server \
  -device virtio-serial-device \
  -device virtconsole,chardev=virtiocon0 \
  -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none \
  -device virtio-blk-device,drive=test

Average boot times after 10 consecutive runs:

  qemu_init_end: 64.043264
  linux_start_kernel: 65.481782 (+1.438518)
  linux_start_user: 114.938353 (+49.456571)

Memory tree:

address-space: memory
  0000000000000000-ffffffffffffffff (prio 0, i/o): system
  0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @microvm.ram 0000000000000000-000000001fffffff
  00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
  00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
  00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
  00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
  00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
  00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi

address-space: I/O
  0000000000000000-000000000000ffff (prio 0, i/o): io
  000000000000007e-000000000000007f (prio 0, i/o): kvmvapic


--------------
| Conclusion |
--------------

The average boot time of microvm is about a third of Q35's (115ms vs.
363ms), and it is shorter in every phase (QEMU initialization, firmware
overhead and kernel start-to-user).

Microvm's memory tree is also visibly simpler, significantly reducing
the surface exposed to the guest.

While we can certainly work on making Q35 smaller, I definitely think
it's better (and way safer!) to have a specialized machine type for a
specific use case than a minimal Q35 whose behavior significantly
diverges from a conventional Q35.

Sergio.

[1] https://github.com/stefano-garzarella/qemu-boot-time
[2] https://paste.fedoraproject.org/paste/YZ9Ok-dJtQrc0xxctFm-nw
[3] https://paste.fedoraproject.org/paste/sck0jfioAJdMq51HH6wkmA
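
For anyone who wants to double-check the per-phase arithmetic behind the comparison, here is a small Python sketch. It is not part of the qemu-boot-time scripts; the names RUNS, phases and speedup are made up for illustration, and it assumes the three timestamps are cumulative milliseconds, as the "115ms vs. 363ms" figures suggest. The numbers are copied from the averages posted above.

```python
# Cumulative timestamps copied from the averages posted in this mail.
# Assumption: the unit is milliseconds, and each value is measured from
# the moment QEMU is launched.
RUNS = {
    "q35-hpet": (77.637936, 117.082526, 364.629972),
    "q35-pit": (77.467852, 116.688472, 363.033365),
    "microvm": (64.043264, 65.481782, 114.938353),
}


def phases(qemu_init_end, linux_start_kernel, linux_start_user):
    """Split the cumulative timestamps into per-phase durations."""
    return {
        "qemu_init": qemu_init_end,
        "firmware": linux_start_kernel - qemu_init_end,
        "kernel": linux_start_user - linux_start_kernel,
        "total": linux_start_user,
    }


def speedup(baseline, candidate):
    """Return baseline/candidate duration ratios for every phase."""
    base = phases(*RUNS[baseline])
    cand = phases(*RUNS[candidate])
    return {name: base[name] / cand[name] for name in base}


if __name__ == "__main__":
    for machine, stamps in RUNS.items():
        durations = phases(*stamps)
        summary = ", ".join(f"{k}={v:.3f}" for k, v in durations.items())
        print(f"{machine}: {summary}")
    for baseline in ("q35-hpet", "q35-pit"):
        ratios = speedup(baseline, "microvm")
        summary = ", ".join(f"{k}={v:.2f}x" for k, v in ratios.items())
        print(f"{baseline} vs microvm: {summary}")
```

The per-phase split makes it easier to see where the time goes: the firmware phase accounts for the largest relative difference, while the kernel start-to-user phase accounts for the largest absolute one.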
On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
>
> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
> > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
> >> Microvm is a machine type inspired by both NEMU and Firecracker, and
> >> constructed after the machine model implemented by the latter.
> >>
> >> It's main purpose is providing users a KVM-only machine type with fast
> >> boot times, minimal attack surface (measured as the number of IO ports
> >> and MMIO regions exposed to the Guest) and small footprint (specially
> >> when combined with the ongoing QEMU modularization effort).
> >>
> >> Normally, other than the device support provided by KVM itself,
> >> microvm only supports virtio-mmio devices. Microvm also includes a
> >> legacy mode, which adds an ISA bus with a 16550A serial port, useful
> >> for being able to see the early boot kernel messages.
> >>
> >> Microvm only supports booting PVH-enabled Linux ELF images. Booting
> >> other PVH-enabled kernels may be possible, but due to the lack of ACPI
> >> and firmware, we're relying on the command line for specifying the
> >> location of the virtio-mmio transports. If there's an interest on
> >> using this machine type with other kernels, we'll try to find some
> >> kind of middle ground solution.
dma-page > 0000000000000092-0000000000000092 (prio 0, i/o): port92 > 00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic > 00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io > 00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan > 00000000000000d0-00000000000000df (prio 0, i/o): dma-cont > 00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0 > 00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr > 00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr > 0000000000000510-0000000000000511 (prio 0, i/o): fwcfg > 0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma > 0000000000000600-000000000000067f (prio 0, i/o): ich9-pm > 0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt > 0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt > 0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr > 0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0 > 0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi > 0000000000000660-000000000000067f (prio 0, i/o): sm-tco > 0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug > 0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx > 0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control > 0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data > 000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci > 000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci > > > ---------------- > | Q35 with PIT | > ---------------- > > Command line: > > ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M q35,smbus=off,nvdimm=off,pit=on,vmport=off,sata=off,usb=off,graphics=off -no-hpet -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk,drive=test > > Average boot times after 10 
consecutive runs: > > qemu_init_end: 77.467852 > linux_start_kernel: 116.688472 (+39.22062) > linux_start_user: 363.033365 (+246.344893) > > Memory tree: > > address-space: memory > 0000000000000000-ffffffffffffffff (prio 0, i/o): system > 0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff > 0000000000000000-ffffffffffffffff (prio -1, i/o): pci > 00000000000c0000-00000000000dffff (prio 1, rom): pc.rom > 00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff > 00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci > 00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common > 00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr > 00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device > 00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify > 00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci > 00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common > 00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr > 00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device > 00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify > 00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix > 00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table > 00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba > 00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix > 00000000febff000-00000000febff01f (prio 0, i/o): msix-table > 00000000febff800-00000000febff807 (prio 0, i/o): msix-pba > 00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios > 00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff > 00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff > 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 
00000000000c0000-00000000000c3fff [disabled] > 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled] > 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff > 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled] > 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled] > 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled] > 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff > 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled] > 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled] > 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled] > 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff > 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled] > 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled] > 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled] > 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff > 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled] > 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled] > 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled] > 
00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff > 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled] > 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled] > 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled] > 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff > 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled] > 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled] > 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled] > 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff > 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled] > 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled] > 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled] > 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff > 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled] > 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled] > 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled] > 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff > 00000000000e0000-00000000000e3fff (prio 1, i/o): alias 
pam-pci @pci 00000000000e0000-00000000000e3fff [disabled] > 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled] > 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled] > 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff > 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled] > 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff > 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled] > 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled] > 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled] > 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff > 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled] > 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled] > 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled] > 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled] > 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled] > 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff > 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled] > 0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled] > 
00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio > 00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic > 00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio > 00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled] > 00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi > > address-space: I/O > 0000000000000000-000000000000ffff (prio 0, i/o): io > 0000000000000000-0000000000000007 (prio 0, i/o): dma-chan > 0000000000000008-000000000000000f (prio 0, i/o): dma-cont > 0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic > 0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit > 0000000000000060-0000000000000060 (prio 0, i/o): i8042-data > 0000000000000061-0000000000000061 (prio 0, i/o): pcspk > 0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd > 0000000000000070-0000000000000071 (prio 0, i/o): rtc > 0000000000000070-0000000000000070 (prio 0, i/o): rtc-index > 000000000000007e-000000000000007f (prio 0, i/o): kvmvapic > 0000000000000080-0000000000000080 (prio 0, i/o): ioport80 > 0000000000000081-0000000000000083 (prio 0, i/o): dma-page > 0000000000000087-0000000000000087 (prio 0, i/o): dma-page > 0000000000000089-000000000000008b (prio 0, i/o): dma-page > 000000000000008f-000000000000008f (prio 0, i/o): dma-page > 0000000000000092-0000000000000092 (prio 0, i/o): port92 > 00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic > 00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io > 00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan > 00000000000000d0-00000000000000df (prio 0, i/o): dma-cont > 00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0 > 00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr > 00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr > 0000000000000510-0000000000000511 (prio 0, i/o): fwcfg > 0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma > 0000000000000600-000000000000067f (prio 0, i/o): 
ich9-pm > 0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt > 0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt > 0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr > 0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0 > 0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi > 0000000000000660-000000000000067f (prio 0, i/o): sm-tco > 0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug > 0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx > 0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control > 0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data > 000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci > 000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci > > > ----------- > | microvm | > ----------- > > Command line: > > ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M microvm -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk-device,drive=test > > Average boot times after 10 consecutive runs: > > qemu_init_end: 64.043264 > linux_start_kernel: 65.481782 (+1.438518) > linux_start_user: 114.938353 (+49.456571) > > Memory tree: > > address-space: memory > 0000000000000000-ffffffffffffffff (prio 0, i/o): system > 0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @microvm.ram 0000000000000000-000000001fffffff > 00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio > 00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio > 00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio > 00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio > 00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic > 00000000fee00000-00000000feefffff (prio 4096, i/o): 
kvm-apic-msi
>
> address-space: I/O
>   0000000000000000-000000000000ffff (prio 0, i/o): io
>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>
>
>  --------------
>  | Conclusion |
>  --------------
>
> The average boot time of microvm is a third of Q35's (115ms vs. 363ms),
> and is smaller on all sections (QEMU initialization, firmware overhead
> and kernel start-to-user).
>
> Microvm's memory tree is also visibly simpler, significantly reducing
> the exposed surface to the guest.
>
> While we can certainly work on making Q35 smaller, I definitely think
> it's better (and way safer!) having a specialized machine type for a
> specific use case, than a minimal Q35 whose behavior significantly
> diverges from a conventional Q35.

Interesting, so not a 10x difference! This might be amenable to
optimization.

My concern with microvm is that it's so limited that few users will be
able to benefit from the reduced attack surface and faster startup time.
I think it's worth investigating slimming down Q35 further first.

In terms of startup time the first step would be profiling Q35 kernel
startup to find out what's taking so long (firmware initialization, PCI
probing, etc)?

> Sergio.
>
> [1] https://github.com/stefano-garzarella/qemu-boot-time
> [2] https://paste.fedoraproject.org/paste/YZ9Ok-dJtQrc0xxctFm-nw
> [3] https://paste.fedoraproject.org/paste/sck0jfioAJdMq51HH6wkmA
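
For readers who want to check the "a third" versus "10x" figures, the
averages posted in this thread can be broken down per phase with a few
lines of Python. This is only a sketch: the numbers are copied from the
messages above and taken to be milliseconds (as the conclusion implies),
while the run labels and the helper names phase_durations() and speedup()
are made up for this illustration and are not part of the qemu-boot-time
scripts.

```python
# Cumulative boot-time averages (assumed to be milliseconds), copied from
# the measurements posted in this thread. The labels are illustrative only.
MILESTONES = ("qemu_init_end", "linux_start_kernel", "linux_start_user")

RUNS = {
    "q35-hpet": (77.637936, 117.082526, 364.629972),
    "q35-pit": (77.467852, 116.688472, 363.033365),
    "microvm": (64.043264, 65.481782, 114.938353),
}


def phase_durations(times):
    """Turn cumulative timestamps into the time spent in each phase."""
    durations = []
    previous = 0.0
    for t in times:
        durations.append(t - previous)
        previous = t
    return durations


def speedup(baseline, candidate):
    """Ratio of total time-to-userspace between two runs."""
    return RUNS[baseline][-1] / RUNS[candidate][-1]


if __name__ == "__main__":
    for name, times in RUNS.items():
        parts = ", ".join(
            f"{milestone}=+{duration:.2f}"
            for milestone, duration in zip(MILESTONES, phase_durations(times))
        )
        print(f"{name:9s} {parts}")
    print(f"q35-pit / microvm: {speedup('q35-pit', 'microvm'):.2f}x")
```

The overall ratio comes out at roughly 3.2x, which matches "a third" in
the conclusion. The per-phase view shows that the gap is not uniform: the
step between qemu_init_end and linux_start_kernel shrinks far more than
the other two.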
Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote:
>>
>> Stefan Hajnoczi <stefanha@gmail.com> writes:
>>
>> > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote:
>> >> Microvm is a machine type inspired by both NEMU and Firecracker, and
>> >> constructed after the machine model implemented by the latter.
>> >>
>> >> Its main purpose is providing users a KVM-only machine type with fast
>> >> boot times, minimal attack surface (measured as the number of IO ports
>> >> and MMIO regions exposed to the Guest) and small footprint (especially
>> >> when combined with the ongoing QEMU modularization effort).
>> >>
>> >> Normally, other than the device support provided by KVM itself,
>> >> microvm only supports virtio-mmio devices. Microvm also includes a
>> >> legacy mode, which adds an ISA bus with a 16550A serial port, useful
>> >> for being able to see the early boot kernel messages.
>> >>
>> >> Microvm only supports booting PVH-enabled Linux ELF images. Booting
>> >> other PVH-enabled kernels may be possible, but due to the lack of ACPI
>> >> and firmware, we're relying on the command line for specifying the
>> >> location of the virtio-mmio transports. If there's an interest in
>> >> using this machine type with other kernels, we'll try to find some
>> >> kind of middle ground solution.
>> >>
>> >> This is the list of the exposed IO ports and MMIO regions when running
>> >> in non-legacy mode:
>> >>
>> >> address-space: memory
>> >>     00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000800-00000000d00009ff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000a00-00000000d0000bff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000c00-00000000d0000dff (prio 0, i/o): virtio-mmio
>> >>     00000000d0000e00-00000000d0000fff (prio 0, i/o): virtio-mmio
>> >>     00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
>> >>
>> >> address-space: I/O
>> >>   0000000000000000-000000000000ffff (prio 0, i/o): io
>> >>     0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic
>> >>     0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit
>> >>     000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>> >>     00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic
>> >>     00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr
>> >>     00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr
>> >>
>> >> A QEMU instance with the microvm machine type can be invoked this way:
>> >>
>> >>  - Normal mode:
>> >>
>> >> qemu-system-x86_64 -M microvm -m 512m -smp 2 \
>> >>  -kernel vmlinux -append "console=hvc0 root=/dev/vda" \
>> >>  -nodefaults -no-user-config \
>> >>  -chardev pty,id=virtiocon0,server \
>> >>  -device virtio-serial-device \
>> >>  -device virtconsole,chardev=virtiocon0 \
>> >>  -drive id=test,file=test.img,format=raw,if=none \
>> >>  -device virtio-blk-device,drive=test \
>> >>  -netdev tap,id=tap0,script=no,downscript=no \
>> >>  -device virtio-net-device,netdev=tap0
>> >>
>> >>  - Legacy mode:
>> >>
>> >> qemu-system-x86_64 -M microvm,legacy -m 512m -smp 2 \
>> >>  -kernel vmlinux -append "console=ttyS0 root=/dev/vda" \
>> >>  -nodefaults -no-user-config \
>> >>  -drive id=test,file=test.img,format=raw,if=none \
>> >>  -device virtio-blk-device,drive=test \
>> >>  -netdev tap,id=tap0,script=no,downscript=no \
>> >>  -device virtio-net-device,netdev=tap0 \
>> >>  -serial stdio
>> >
>> > Please post metrics that compare this against a minimal Q35.
>> >
>> > With qboot it was later found that SeaBIOS can achieve comparable boot
>> > times, so it wasn't worth maintaining qboot.
>> >
>> > Data is needed to show that microvm is really a significant improvement
>> > over a minimal Q35.
>>
>> I've just run some numbers using Stefano Garzarella's qemu-boot-time
>> scripts [1] on a server with 2xIntel Xeon Silver 4114 2.20GHz, using the
>> upstream QEMU (474f3938d79ab36b9231c9ad3b5a9314c2aeacde) built with
>> minimal features [2]. The VM boots a minimal kernel [3] without initrd,
>> using a kata container image as root via virtio-blk (though this isn't
>> really relevant, as we're just taking measurements until the kernel is
>> about to exec init).
>>
>> To try to make the comparison as fair as possible, I've used a minimal
>> q35 machine with as few devices as possible.
Disabling HPET and PIT at >> the same time caused the kernel to get stuck on boot, so I ran two >> iterations, one without HPET and the other without PIT: >> >> >> ----------------- >> | Q35 with HPET | >> ----------------- >> >> Command line: >> >> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M q35,smbus=off,nvdimm=off,pit=off,vmport=off,sata=off,usb=off,graphics=off -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk,drive=test >> >> Average boot times after 10 consecutive runs: >> >> qemu_init_end: 77.637936 >> linux_start_kernel: 117.082526 (+39.44459) >> linux_start_user: 364.629972 (+247.547446) >> >> Memory tree: >> >> address-space: memory >> 0000000000000000-ffffffffffffffff (prio 0, i/o): system >> 0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff >> 0000000000000000-ffffffffffffffff (prio -1, i/o): pci >> 00000000000c0000-00000000000dffff (prio 1, rom): pc.rom >> 00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff >> 00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci >> 00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common >> 00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr >> 00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device >> 00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify >> 00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci >> 00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common >> 00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr >> 00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device >> 00000000febfb000-00000000febfbfff (prio 0, i/o): 
virtio-pci-notify >> 00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix >> 00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table >> 00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba >> 00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix >> 00000000febff000-00000000febff01f (prio 0, i/o): msix-table >> 00000000febff800-00000000febff807 (prio 0, i/o): msix-pba >> 00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios >> 00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff >> 00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff >> 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled] >> 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled] >> 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff >> 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled] >> 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled] >> 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled] >> 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff >> 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled] >> 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled] >> 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled] >> 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff >> 
00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled] >> 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled] >> 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled] >> 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff >> 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled] >> 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled] >> 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled] >> 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff >> 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled] >> 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled] >> 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled] >> 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff >> 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled] >> 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled] >> 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled] >> 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff >> 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled] >> 00000000000dc000-00000000000dffff 
(prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled] >> 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled] >> 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff >> 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled] >> 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled] >> 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled] >> 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff >> 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled] >> 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled] >> 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled] >> 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff >> 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled] >> 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff >> 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled] >> 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled] >> 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled] >> 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff >> 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 
00000000000ec000-00000000000effff [disabled] >> 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled] >> 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled] >> 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled] >> 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled] >> 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff >> 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled] >> 0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled] >> 00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio >> 00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic >> 00000000fed00000-00000000fed003ff (prio 0, i/o): hpet >> 00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio >> 00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled] >> 00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi >> >> address-space: I/O >> 0000000000000000-000000000000ffff (prio 0, i/o): io >> 0000000000000000-0000000000000007 (prio 0, i/o): dma-chan >> 0000000000000008-000000000000000f (prio 0, i/o): dma-cont >> 0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic >> 0000000000000060-0000000000000060 (prio 0, i/o): i8042-data >> 0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd >> 0000000000000070-0000000000000071 (prio 0, i/o): rtc >> 0000000000000070-0000000000000070 (prio 0, i/o): rtc-index >> 000000000000007e-000000000000007f (prio 0, i/o): kvmvapic >> 0000000000000080-0000000000000080 (prio 0, i/o): ioport80 >> 0000000000000081-0000000000000083 (prio 0, i/o): dma-page >> 0000000000000087-0000000000000087 (prio 
0, i/o): dma-page >> 0000000000000089-000000000000008b (prio 0, i/o): dma-page >> 000000000000008f-000000000000008f (prio 0, i/o): dma-page >> 0000000000000092-0000000000000092 (prio 0, i/o): port92 >> 00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic >> 00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io >> 00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan >> 00000000000000d0-00000000000000df (prio 0, i/o): dma-cont >> 00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0 >> 00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr >> 00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr >> 0000000000000510-0000000000000511 (prio 0, i/o): fwcfg >> 0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma >> 0000000000000600-000000000000067f (prio 0, i/o): ich9-pm >> 0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt >> 0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt >> 0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr >> 0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0 >> 0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi >> 0000000000000660-000000000000067f (prio 0, i/o): sm-tco >> 0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug >> 0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx >> 0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control >> 0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data >> 000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci >> 000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci >> >> >> ---------------- >> | Q35 with PIT | >> ---------------- >> >> Command line: >> >> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M q35,smbus=off,nvdimm=off,pit=on,vmport=off,sata=off,usb=off,graphics=off -no-hpet -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial -device 
virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk,drive=test >> >> Average boot times after 10 consecutive runs: >> >> qemu_init_end: 77.467852 >> linux_start_kernel: 116.688472 (+39.22062) >> linux_start_user: 363.033365 (+246.344893) >> >> Memory tree: >> >> address-space: memory >> 0000000000000000-ffffffffffffffff (prio 0, i/o): system >> 0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @pc.ram 0000000000000000-000000001fffffff >> 0000000000000000-ffffffffffffffff (prio -1, i/o): pci >> 00000000000c0000-00000000000dffff (prio 1, rom): pc.rom >> 00000000000e0000-00000000000fffff (prio 1, i/o): alias isa-bios @pc.bios 0000000000020000-000000000003ffff >> 00000000febf4000-00000000febf7fff (prio 1, i/o): virtio-pci >> 00000000febf4000-00000000febf4fff (prio 0, i/o): virtio-pci-common >> 00000000febf5000-00000000febf5fff (prio 0, i/o): virtio-pci-isr >> 00000000febf6000-00000000febf6fff (prio 0, i/o): virtio-pci-device >> 00000000febf7000-00000000febf7fff (prio 0, i/o): virtio-pci-notify >> 00000000febf8000-00000000febfbfff (prio 1, i/o): virtio-pci >> 00000000febf8000-00000000febf8fff (prio 0, i/o): virtio-pci-common >> 00000000febf9000-00000000febf9fff (prio 0, i/o): virtio-pci-isr >> 00000000febfa000-00000000febfafff (prio 0, i/o): virtio-pci-device >> 00000000febfb000-00000000febfbfff (prio 0, i/o): virtio-pci-notify >> 00000000febfe000-00000000febfefff (prio 1, i/o): virtio-serial-pci-msix >> 00000000febfe000-00000000febfe01f (prio 0, i/o): msix-table >> 00000000febfe800-00000000febfe807 (prio 0, i/o): msix-pba >> 00000000febff000-00000000febfffff (prio 1, i/o): virtio-blk-pci-msix >> 00000000febff000-00000000febff01f (prio 0, i/o): msix-table >> 00000000febff800-00000000febff807 (prio 0, i/o): msix-pba >> 00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios >> 00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff 
>> 00000000000c0000-00000000000c2fff (prio 1000, i/o): alias kvmvapic-rom @pc.ram 00000000000c0000-00000000000c2fff >> 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c0000-00000000000c3fff [disabled] >> 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c0000-00000000000c3fff [disabled] >> 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c0000-00000000000c3fff >> 00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff [disabled] >> 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c4000-00000000000c7fff [disabled] >> 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c4000-00000000000c7fff [disabled] >> 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c4000-00000000000c7fff >> 00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff [disabled] >> 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000c8000-00000000000cbfff [disabled] >> 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000c8000-00000000000cbfff [disabled] >> 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000c8000-00000000000cbfff >> 00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff [disabled] >> 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000cc000-00000000000cffff [disabled] >> 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000cc000-00000000000cffff [disabled] >> 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000cc000-00000000000cffff >> 00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff [disabled] >> 
00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d0000-00000000000d3fff [disabled] >> 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d0000-00000000000d3fff [disabled] >> 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d0000-00000000000d3fff >> 00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff [disabled] >> 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d4000-00000000000d7fff [disabled] >> 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d4000-00000000000d7fff [disabled] >> 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d4000-00000000000d7fff >> 00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff [disabled] >> 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000d8000-00000000000dbfff [disabled] >> 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000d8000-00000000000dbfff [disabled] >> 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000d8000-00000000000dbfff >> 00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff [disabled] >> 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000dc000-00000000000dffff [disabled] >> 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000dc000-00000000000dffff [disabled] >> 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000dc000-00000000000dffff >> 00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff [disabled] >> 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e0000-00000000000e3fff [disabled] >> 
00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e0000-00000000000e3fff [disabled] >> 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e0000-00000000000e3fff >> 00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff [disabled] >> 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e4000-00000000000e7fff [disabled] >> 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e4000-00000000000e7fff [disabled] >> 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e4000-00000000000e7fff >> 00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff [disabled] >> 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-ram @pc.ram 00000000000e8000-00000000000ebfff >> 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pc.ram 00000000000e8000-00000000000ebfff [disabled] >> 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-rom @pc.ram 00000000000e8000-00000000000ebfff [disabled] >> 00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff [disabled] >> 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-ram @pc.ram 00000000000ec000-00000000000effff >> 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pc.ram 00000000000ec000-00000000000effff [disabled] >> 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-rom @pc.ram 00000000000ec000-00000000000effff [disabled] >> 00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff [disabled] >> 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-ram @pc.ram 00000000000f0000-00000000000fffff [disabled] >> 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pc.ram 00000000000f0000-00000000000fffff [disabled] >> 
00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-rom @pc.ram 00000000000f0000-00000000000fffff >> 00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff [disabled] >> 0000000020000000-0000000020000000 (prio 1, i/o): tseg-blackhole [disabled] >> 00000000b0000000-00000000bfffffff (prio 0, i/o): pcie-mmcfg-mmio >> 00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic >> 00000000fed1c000-00000000fed1ffff (prio 1, i/o): lpc-rcrb-mmio >> 00000000feda0000-00000000fedbffff (prio 1, i/o): alias smram-open-high @pc.ram 00000000000a0000-00000000000bffff [disabled] >> 00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi >> >> address-space: I/O >> 0000000000000000-000000000000ffff (prio 0, i/o): io >> 0000000000000000-0000000000000007 (prio 0, i/o): dma-chan >> 0000000000000008-000000000000000f (prio 0, i/o): dma-cont >> 0000000000000020-0000000000000021 (prio 0, i/o): kvm-pic >> 0000000000000040-0000000000000043 (prio 0, i/o): kvm-pit >> 0000000000000060-0000000000000060 (prio 0, i/o): i8042-data >> 0000000000000061-0000000000000061 (prio 0, i/o): pcspk >> 0000000000000064-0000000000000064 (prio 0, i/o): i8042-cmd >> 0000000000000070-0000000000000071 (prio 0, i/o): rtc >> 0000000000000070-0000000000000070 (prio 0, i/o): rtc-index >> 000000000000007e-000000000000007f (prio 0, i/o): kvmvapic >> 0000000000000080-0000000000000080 (prio 0, i/o): ioport80 >> 0000000000000081-0000000000000083 (prio 0, i/o): dma-page >> 0000000000000087-0000000000000087 (prio 0, i/o): dma-page >> 0000000000000089-000000000000008b (prio 0, i/o): dma-page >> 000000000000008f-000000000000008f (prio 0, i/o): dma-page >> 0000000000000092-0000000000000092 (prio 0, i/o): port92 >> 00000000000000a0-00000000000000a1 (prio 0, i/o): kvm-pic >> 00000000000000b2-00000000000000b3 (prio 0, i/o): apm-io >> 00000000000000c0-00000000000000cf (prio 0, i/o): dma-chan >> 00000000000000d0-00000000000000df (prio 0, i/o): dma-cont >> 
00000000000000f0-00000000000000f0 (prio 0, i/o): ioportF0 >> 00000000000004d0-00000000000004d0 (prio 0, i/o): kvm-elcr >> 00000000000004d1-00000000000004d1 (prio 0, i/o): kvm-elcr >> 0000000000000510-0000000000000511 (prio 0, i/o): fwcfg >> 0000000000000514-000000000000051b (prio 0, i/o): fwcfg.dma >> 0000000000000600-000000000000067f (prio 0, i/o): ich9-pm >> 0000000000000600-0000000000000603 (prio 0, i/o): acpi-evt >> 0000000000000604-0000000000000605 (prio 0, i/o): acpi-cnt >> 0000000000000608-000000000000060b (prio 0, i/o): acpi-tmr >> 0000000000000620-000000000000062f (prio 0, i/o): acpi-gpe0 >> 0000000000000630-0000000000000637 (prio 0, i/o): acpi-smi >> 0000000000000660-000000000000067f (prio 0, i/o): sm-tco >> 0000000000000cd8-0000000000000ce3 (prio 0, i/o): acpi-mem-hotplug >> 0000000000000cf8-0000000000000cfb (prio 0, i/o): pci-conf-idx >> 0000000000000cf9-0000000000000cf9 (prio 1, i/o): lpc-reset-control >> 0000000000000cfc-0000000000000cff (prio 0, i/o): pci-conf-data >> 000000000000c000-000000000000c07f (prio 1, i/o): virtio-pci >> 000000000000c080-000000000000c0bf (prio 1, i/o): virtio-pci >> >> >> ----------- >> | microvm | >> ----------- >> >> Command line: >> >> ./x86_64-softmmu/qemu-system-x86_64 -m 512m -enable-kvm -M microvm -kernel /root/src/images/vmlinux-5.2 -append "console=hvc0 reboot=k panic=1 root=/dev/vda quiet" -smp 1 -nodefaults -no-user-config -chardev pty,id=virtiocon0,server -device virtio-serial-device -device virtconsole,chardev=virtiocon0 -drive id=test,file=/root/src/images/hello-rootfs.ext4,format=raw,if=none -device virtio-blk-device,drive=test >> >> Average boot times after 10 consecutive runs: >> >> qemu_init_end: 64.043264 >> linux_start_kernel: 65.481782 (+1.438518) >> linux_start_user: 114.938353 (+49.456571) >> >> Memory tree: >> >> address-space: memory >> 0000000000000000-ffffffffffffffff (prio 0, i/o): system >> 0000000000000000-000000001fffffff (prio 0, i/o): alias ram-below-4g @microvm.ram 
0000000000000000-000000001fffffff
>> 00000000d0000000-00000000d00001ff (prio 0, i/o): virtio-mmio
>> 00000000d0000200-00000000d00003ff (prio 0, i/o): virtio-mmio
>> 00000000d0000400-00000000d00005ff (prio 0, i/o): virtio-mmio
>> 00000000d0000600-00000000d00007ff (prio 0, i/o): virtio-mmio
>> 00000000fec00000-00000000fec00fff (prio 0, i/o): kvm-ioapic
>> 00000000fee00000-00000000feefffff (prio 4096, i/o): kvm-apic-msi
>>
>> address-space: I/O
>> 0000000000000000-000000000000ffff (prio 0, i/o): io
>> 000000000000007e-000000000000007f (prio 0, i/o): kvmvapic
>>
>>
>> --------------
>> | Conclusion |
>> --------------
>>
>> The average boot time of microvm is a third of Q35's (115ms vs. 363ms),
>> and is smaller on all sections (QEMU initialization, firmware overhead
>> and kernel start-to-user).
>>
>> Microvm's memory tree is also visibly simpler, significantly reducing
>> the exposed surface to the guest.
>>
>> While we can certainly work on making Q35 smaller, I definitely think
>> it's better (and way safer!) having a specialized machine type for a
>> specific use case, than a minimal Q35 whose behavior significantly
>> diverges from a conventional Q35.
>
> Interesting, so not a 10x difference! This might be amenable to
> optimization.
>
> My concern with microvm is that it's so limited that few users will be
> able to benefit from the reduced attack surface and faster startup time.
> I think it's worth investigating slimming down Q35 further first.
>
> In terms of startup time the first step would be profiling Q35 kernel
> startup to find out what's taking so long (firmware initialization, PCI
> probing, etc)?

Some findings:

1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host") saves a
   whopping 120ms by avoiding the APIC timer calibration at
   arch/x86/kernel/apic/apic.c:calibrate_APIC_clock

   Average boot time with "-cpu host"
    qemu_init_end: 76.408950
    linux_start_kernel: 116.166142 (+39.757192)
    linux_start_user: 242.954347 (+126.788205)

   Average boot time with default "cpu"
    qemu_init_end: 77.467852
    linux_start_kernel: 116.688472 (+39.22062)
    linux_start_user: 363.033365 (+246.344893)

2. The other 130ms are a direct result of PCI and ACPI presence (tested
   with a kernel without support for those elements). I'll publish some
   detailed numbers next week.

Sergio.
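As a worked check of the figures in this message, the cumulative timestamps can be split into per-phase durations and compared. This is only a sketch: the phase names (`qemu_init`, `firmware`, `kernel_to_user`) and the helper `phase_durations` are illustrative labels, not part of QEMU or of the measurement tooling used in the thread.

```python
# Cumulative boot milestones in milliseconds, copied from the averages above.
Q35_DEFAULT_CPU = {
    "qemu_init_end": 77.467852,
    "linux_start_kernel": 116.688472,
    "linux_start_user": 363.033365,
}
Q35_CPU_HOST = {
    "qemu_init_end": 76.408950,
    "linux_start_kernel": 116.166142,
    "linux_start_user": 242.954347,
}


def phase_durations(run):
    """Turn cumulative timestamps into per-phase durations (ms)."""
    return {
        "qemu_init": run["qemu_init_end"],
        "firmware": run["linux_start_kernel"] - run["qemu_init_end"],
        "kernel_to_user": run["linux_start_user"] - run["linux_start_kernel"],
    }


default_phases = phase_durations(Q35_DEFAULT_CPU)
host_phases = phase_durations(Q35_CPU_HOST)

# Time saved per phase by switching to "-cpu host".
saved_per_phase = {
    name: default_phases[name] - host_phases[name] for name in default_phases
}
total_saved = (
    Q35_DEFAULT_CPU["linux_start_user"] - Q35_CPU_HOST["linux_start_user"]
)

for name, saved in saved_per_phase.items():
    print(f"{name:>15}: {saved:8.3f} ms saved")
print(f"{'total':>15}: {total_saved:8.3f} ms saved")
```

Almost all of the saving lands in the kernel start-to-user phase, which is consistent with the timer calibration explanation given above.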
On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote: > Stefan Hajnoczi <stefanha@gmail.com> writes: > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote: > >> > >> Stefan Hajnoczi <stefanha@gmail.com> writes: > >> > >> > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote: > >> -------------- > >> | Conclusion | > >> -------------- > >> > >> The average boot time of microvm is a third of Q35's (115ms vs. 363ms), > >> and is smaller on all sections (QEMU initialization, firmware overhead > >> and kernel start-to-user). > >> > >> Microvm's memory tree is also visibly simpler, significantly reducing > >> the exposed surface to the guest. > >> > >> While we can certainly work on making Q35 smaller, I definitely think > >> it's better (and way safer!) having a specialized machine type for a > >> specific use case, than a minimal Q35 whose behavior significantly > >> diverges from a conventional Q35. > > > > Interesting, so not a 10x difference! This might be amenable to > > optimization. > > > > My concern with microvm is that it's so limited that few users will be > > able to benefit from the reduced attack surface and faster startup time. > > I think it's worth investigating slimming down Q35 further first. > > > > In terms of startup time the first step would be profiling Q35 kernel > > startup to find out what's taking so long (firmware initialization, PCI > > probing, etc)? > > Some findings: > > 1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host") saves a > whooping 120ms by avoiding the APIC timer calibration at > arch/x86/kernel/apic/apic.c:calibrate_APIC_clock > > Average boot time with "-cpu host" > qemu_init_end: 76.408950 > linux_start_kernel: 116.166142 (+39.757192) > linux_start_user: 242.954347 (+126.788205) > > Average boot time with default "cpu" > qemu_init_end: 77.467852 > linux_start_kernel: 116.688472 (+39.22062) > linux_start_user: 363.033365 (+246.344893) \o/ > 2. 
The other 130ms are a direct result of PCI and ACPI presence (tested
> with a kernel without support for those elements). I'll publish some
> detailed numbers next week.

Here are the Kata Containers kernel parameters:

var kernelParams = []Param{
	{"tsc", "reliable"},
	{"no_timer_check", ""},
	{"rcupdate.rcu_expedited", "1"},
	{"i8042.direct", "1"},
	{"i8042.dumbkbd", "1"},
	{"i8042.nopnp", "1"},
	{"i8042.noaux", "1"},
	{"noreplace-smp", ""},
	{"reboot", "k"},
	{"console", "hvc0"},
	{"console", "hvc1"},
	{"iommu", "off"},
	{"cryptomgr.notests", ""},
	{"net.ifnames", "0"},
	{"pci", "lastbus=0"},
}

pci lastbus=0 looks interesting and so do some of the others :).

Stefan
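For readers who want to try these options with `-append`, the key/value list can be flattened into a kernel command line. The sketch below assumes that an empty value means a bare flag and that everything else is joined as `key=value`; how Kata Containers itself serializes the list is not shown in this thread, and `render_cmdline` is a hypothetical helper. The resulting `pci=lastbus=0` does match the kernel's documented `pci=lastbus=N` syntax.

```python
# The (key, value) pairs from the list above, in the same order.
KERNEL_PARAMS = [
    ("tsc", "reliable"),
    ("no_timer_check", ""),
    ("rcupdate.rcu_expedited", "1"),
    ("i8042.direct", "1"),
    ("i8042.dumbkbd", "1"),
    ("i8042.nopnp", "1"),
    ("i8042.noaux", "1"),
    ("noreplace-smp", ""),
    ("reboot", "k"),
    ("console", "hvc0"),
    ("console", "hvc1"),
    ("iommu", "off"),
    ("cryptomgr.notests", ""),
    ("net.ifnames", "0"),
    ("pci", "lastbus=0"),
]


def render_cmdline(params):
    """Join pairs as 'key=value'; an empty value yields a bare flag (assumed)."""
    return " ".join(key if value == "" else f"{key}={value}" for key, value in params)


cmdline = render_cmdline(KERNEL_PARAMS)
print(cmdline)
```

The printed string can be pasted after the existing `console=... root=...` options in the `-append` argument of the command lines used for the measurements.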
On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote: > On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote: > > Stefan Hajnoczi <stefanha@gmail.com> writes: > > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote: > > > > Stefan Hajnoczi <stefanha@gmail.com> writes: > > > > > > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote: > > > > -------------- > > > > | Conclusion | > > > > -------------- > > > > > > > > The average boot time of microvm is a third of Q35's (115ms vs. > > > > 363ms), > > > > and is smaller on all sections (QEMU initialization, firmware > > > > overhead > > > > and kernel start-to-user). > > > > > > > > Microvm's memory tree is also visibly simpler, significantly > > > > reducing > > > > the exposed surface to the guest. > > > > > > > > While we can certainly work on making Q35 smaller, I definitely > > > > think > > > > it's better (and way safer!) having a specialized machine type > > > > for a > > > > specific use case, than a minimal Q35 whose behavior > > > > significantly > > > > diverges from a conventional Q35. > > > > > > Interesting, so not a 10x difference! This might be amenable to > > > optimization. > > > > > > My concern with microvm is that it's so limited that few users > > > will be > > > able to benefit from the reduced attack surface and faster > > > startup time. > > > I think it's worth investigating slimming down Q35 further first. > > > > > > In terms of startup time the first step would be profiling Q35 > > > kernel > > > startup to find out what's taking so long (firmware > > > initialization, PCI > > > probing, etc)? > > > > Some findings: > > > > 1. Exposing the TSC_DEADLINE CPU flag (i.e. 
using "-cpu host")
> > saves a
> > whooping 120ms by avoiding the APIC timer calibration at
> > arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
> >
> > Average boot time with "-cpu host"
> > qemu_init_end: 76.408950
> > linux_start_kernel: 116.166142 (+39.757192)
> > linux_start_user: 242.954347 (+126.788205)
> >
> > Average boot time with default "cpu"
> > qemu_init_end: 77.467852
> > linux_start_kernel: 116.688472 (+39.22062)
> > linux_start_user: 363.033365 (+246.344893)
>
> \o/
>
> > 2. The other 130ms are a direct result of PCI and ACPI presence
> > (tested
> > with a kernel without support for those elements). I'll publish
> > some
> > detailed numbers next week.
>
> Here are the Kata Containers kernel parameters:
>
> var kernelParams = []Param{
> {"tsc", "reliable"},
> {"no_timer_check", ""},
> {"rcupdate.rcu_expedited", "1"},
> {"i8042.direct", "1"},
> {"i8042.dumbkbd", "1"},
> {"i8042.nopnp", "1"},
> {"i8042.noaux", "1"},
> {"noreplace-smp", ""},
> {"reboot", "k"},
> {"console", "hvc0"},
> {"console", "hvc1"},
> {"iommu", "off"},
> {"cryptomgr.notests", ""},
> {"net.ifnames", "0"},
> {"pci", "lastbus=0"},
> }
>
> pci lastbus=0 looks interesting and so do some of the others :).
>

yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
kernel won't scan the 255.. buses :)

> Stefan
>
Montes, Julio <julio.montes@intel.com> writes: > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote: >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote: >> > Stefan Hajnoczi <stefanha@gmail.com> writes: >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote: >> > > > Stefan Hajnoczi <stefanha@gmail.com> writes: >> > > > >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote: >> > > > -------------- >> > > > | Conclusion | >> > > > -------------- >> > > > >> > > > The average boot time of microvm is a third of Q35's (115ms vs. >> > > > 363ms), >> > > > and is smaller on all sections (QEMU initialization, firmware >> > > > overhead >> > > > and kernel start-to-user). >> > > > >> > > > Microvm's memory tree is also visibly simpler, significantly >> > > > reducing >> > > > the exposed surface to the guest. >> > > > >> > > > While we can certainly work on making Q35 smaller, I definitely >> > > > think >> > > > it's better (and way safer!) having a specialized machine type >> > > > for a >> > > > specific use case, than a minimal Q35 whose behavior >> > > > significantly >> > > > diverges from a conventional Q35. >> > > >> > > Interesting, so not a 10x difference! This might be amenable to >> > > optimization. >> > > >> > > My concern with microvm is that it's so limited that few users >> > > will be >> > > able to benefit from the reduced attack surface and faster >> > > startup time. >> > > I think it's worth investigating slimming down Q35 further first. >> > > >> > > In terms of startup time the first step would be profiling Q35 >> > > kernel >> > > startup to find out what's taking so long (firmware >> > > initialization, PCI >> > > probing, etc)? >> > >> > Some findings: >> > >> > 1. Exposing the TSC_DEADLINE CPU flag (i.e. 
using "-cpu host")
>> > saves a
>> > whooping 120ms by avoiding the APIC timer calibration at
>> > arch/x86/kernel/apic/apic.c:calibrate_APIC_clock
>> >
>> > Average boot time with "-cpu host"
>> > qemu_init_end: 76.408950
>> > linux_start_kernel: 116.166142 (+39.757192)
>> > linux_start_user: 242.954347 (+126.788205)
>> >
>> > Average boot time with default "cpu"
>> > qemu_init_end: 77.467852
>> > linux_start_kernel: 116.688472 (+39.22062)
>> > linux_start_user: 363.033365 (+246.344893)
>>
>> \o/
>>
>> > 2. The other 130ms are a direct result of PCI and ACPI presence
>> > (tested
>> > with a kernel without support for those elements). I'll publish
>> > some
>> > detailed numbers next week.
>>
>> Here are the Kata Containers kernel parameters:
>>
>> var kernelParams = []Param{
>> {"tsc", "reliable"},
>> {"no_timer_check", ""},
>> {"rcupdate.rcu_expedited", "1"},
>> {"i8042.direct", "1"},
>> {"i8042.dumbkbd", "1"},
>> {"i8042.nopnp", "1"},
>> {"i8042.noaux", "1"},
>> {"noreplace-smp", ""},
>> {"reboot", "k"},
>> {"console", "hvc0"},
>> {"console", "hvc1"},
>> {"iommu", "off"},
>> {"cryptomgr.notests", ""},
>> {"net.ifnames", "0"},
>> {"pci", "lastbus=0"},
>> }
>>
>> pci lastbus=0 looks interesting and so do some of the others :).
>>
>
> yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35,
> kernel won't scan the 255.. buses :)

I can confirm that adding pci=lastbus=0 makes a significant
improvement. In fact, it is the only option from Kata's kernel parameter
list that has an impact, probably because the kernel is already quite
minimalistic.

Average boot time with "-cpu host" and "pci=lastbus=0"
 qemu_init_end: 73.711569
 linux_start_kernel: 113.414311 (+39.702742)
 linux_start_user: 190.949939 (+77.535628)

That's still ~40% slower than microvm, and the gap quickly widens
when adding more PCI devices (each one adds 10-15ms), but it's certainly
an improvement over the original numbers.

On the other hand, there isn't much we can do here from QEMU's
perspective, as this is basically Guest OS tuning.

Sergio.
On Tue, Jul 23, 2019 at 9:43 AM Sergio Lopez <slp@redhat.com> wrote: > Montes, Julio <julio.montes@intel.com> writes: > > > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote: > >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote: > >> > Stefan Hajnoczi <stefanha@gmail.com> writes: > >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote: > >> > > > Stefan Hajnoczi <stefanha@gmail.com> writes: > >> > > > > >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote: > >> > > > -------------- > >> > > > | Conclusion | > >> > > > -------------- > >> > > > > >> > > > The average boot time of microvm is a third of Q35's (115ms vs. > >> > > > 363ms), > >> > > > and is smaller on all sections (QEMU initialization, firmware > >> > > > overhead > >> > > > and kernel start-to-user). > >> > > > > >> > > > Microvm's memory tree is also visibly simpler, significantly > >> > > > reducing > >> > > > the exposed surface to the guest. > >> > > > > >> > > > While we can certainly work on making Q35 smaller, I definitely > >> > > > think > >> > > > it's better (and way safer!) having a specialized machine type > >> > > > for a > >> > > > specific use case, than a minimal Q35 whose behavior > >> > > > significantly > >> > > > diverges from a conventional Q35. > >> > > > >> > > Interesting, so not a 10x difference! This might be amenable to > >> > > optimization. > >> > > > >> > > My concern with microvm is that it's so limited that few users > >> > > will be > >> > > able to benefit from the reduced attack surface and faster > >> > > startup time. > >> > > I think it's worth investigating slimming down Q35 further first. > >> > > > >> > > In terms of startup time the first step would be profiling Q35 > >> > > kernel > >> > > startup to find out what's taking so long (firmware > >> > > initialization, PCI > >> > > probing, etc)? > >> > > >> > Some findings: > >> > > >> > 1. Exposing the TSC_DEADLINE CPU flag (i.e. 
using "-cpu host") > >> > saves a > >> > whooping 120ms by avoiding the APIC timer calibration at > >> > arch/x86/kernel/apic/apic.c:calibrate_APIC_clock > >> > > >> > Average boot time with "-cpu host" > >> > qemu_init_end: 76.408950 > >> > linux_start_kernel: 116.166142 (+39.757192) > >> > linux_start_user: 242.954347 (+126.788205) > >> > > >> > Average boot time with default "cpu" > >> > qemu_init_end: 77.467852 > >> > linux_start_kernel: 116.688472 (+39.22062) > >> > linux_start_user: 363.033365 (+246.344893) > >> > >> \o/ > >> > >> > 2. The other 130ms are a direct result of PCI and ACPI presence > >> > (tested > >> > with a kernel without support for those elements). I'll publish > >> > some > >> > detailed numbers next week. > >> > >> Here are the Kata Containers kernel parameters: > >> > >> var kernelParams = []Param{ > >> {"tsc", "reliable"}, > >> {"no_timer_check", ""}, > >> {"rcupdate.rcu_expedited", "1"}, > >> {"i8042.direct", "1"}, > >> {"i8042.dumbkbd", "1"}, > >> {"i8042.nopnp", "1"}, > >> {"i8042.noaux", "1"}, > >> {"noreplace-smp", ""}, > >> {"reboot", "k"}, > >> {"console", "hvc0"}, > >> {"console", "hvc1"}, > >> {"iommu", "off"}, > >> {"cryptomgr.notests", ""}, > >> {"net.ifnames", "0"}, > >> {"pci", "lastbus=0"}, > >> } > >> > >> pci lastbus=0 looks interesting and so do some of the others :). > >> > > > > yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35, > > kernel won't scan the 255.. buses :) > > I can confirm that adding pci=lastbus=0 makes a significant > improvement. In fact, is the only option from Kata's kernel parameter > list that has an impact, probably because the kernel is already quite > minimalistic. 
>
> Average boot time with "-cpu host" and "pci=lastbus=0"
> qemu_init_end: 73.711569
> linux_start_kernel: 113.414311 (+39.702742)
> linux_start_user: 190.949939 (+77.535628)
>
> That's still ~40% slower than microvm, and the breach quickly widens
> when adding more PCI devices (each one adds 10-15ms), but it's certainly
> an improvement over the original numbers.
>
> On the other hand, there isn't much we can do here from QEMU's
> perspective, as this is basically Guest OS tuning.

fw_cfg could expose this information so guest kernels know when to
stop enumerating the PCI bus. This would make all PCI guests with new
kernels boot ~50 ms faster, regardless of machine type.

The difference between microvm and tuned Q35 is 76 ms now.

microvm:
 qemu_init_end: 64.043264
 linux_start_kernel: 65.481782 (+1.438518)
 linux_start_user: 114.938353 (+49.456571)

Q35 with -cpu host and pci=lastbus=0:
 qemu_init_end: 73.711569
 linux_start_kernel: 113.414311 (+39.702742)
 linux_start_user: 190.949939 (+77.535628)

There is a ~39 ms difference before linux_start_kernel. SeaBIOS is
loading the PVH Option ROM.

Stefano: any recommendations for profiling or tuning SeaBIOS?

Stefan
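To see where the 76 ms goes, the two sets of figures quoted in this message can be broken down by phase. The sketch below is purely arithmetic on those numbers; the phase labels and the `split_phases` helper are illustrative names, not QEMU terminology.

```python
# Cumulative milestones in milliseconds, copied from the two runs above.
MICROVM = (64.043264, 65.481782, 114.938353)
TUNED_Q35 = (73.711569, 113.414311, 190.949939)

PHASES = ("qemu_init", "firmware", "kernel_to_user")


def split_phases(milestones):
    """Convert (init_end, start_kernel, start_user) into per-phase durations."""
    init_end, start_kernel, start_user = milestones
    return dict(zip(PHASES, (init_end, start_kernel - init_end, start_user - start_kernel)))


microvm_phases = split_phases(MICROVM)
q35_phases = split_phases(TUNED_Q35)

# Extra time tuned Q35 spends in each phase compared with microvm.
gap = {name: q35_phases[name] - microvm_phases[name] for name in PHASES}
total_gap = sum(gap.values())

for name in PHASES:
    share = 100.0 * gap[name] / total_gap
    print(f"{name:>15}: +{gap[name]:7.3f} ms ({share:4.1f}% of the gap)")
print(f"{'total':>15}: +{total_gap:7.3f} ms")
```

On these figures the firmware phase accounts for roughly half of the remaining gap, which is why the SeaBIOS question above is the natural next step.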
On 23/07/19 11:47, Stefan Hajnoczi wrote:
> fw_cfg could expose this information so guest kernels know when to
> stop enumerating the PCI bus. This would make all PCI guests with new
> kernels boot ~50 ms faster, regardless of machine type.

The number of buses is determined by the firmware, not by QEMU, so
fw_cfg would not be the right interface. In fact (as I have just
learnt) lastbus is an x86-specific option that overrides the last bus
returned by SeaBIOS's handle_1ab101.

So the next step could be to figure out what is the lastbus returned by
handle_1ab101 and possibly why it isn't zero.

Paolo

> The difference between microvm and tuned Q35 is 76 ms now.
>
> microvm:
> qemu_init_end: 64.043264
> linux_start_kernel: 65.481782 (+1.438518)
> linux_start_user: 114.938353 (+49.456571)
>
> Q35 with -cpu host and pci=lasbus=0:
> qemu_init_end: 73.711569
> linux_start_kernel: 113.414311 (+39.702742)
> linux_start_user: 190.949939 (+77.535628)
>
> There is a ~39 ms difference before linux_start_kernel. SeaBIOS is
> loading the PVH Option ROM.
>
> Stefano: any recommendations for profiling or tuning SeaBIOS?
On 23/07/19 12:01, Paolo Bonzini wrote:
> The number of buses is determined by the firmware, not by QEMU, so
> fw_cfg would not be the right interface. In fact (as I have just
> learnt) lastbus is an x86-specific option that overrides the last bus
> returned by SeaBIOS's handle_1ab101.
>
> So the next step could be to figure out what is the lastbus returned by
> handle_1ab101 and possibly why it isn't zero.

Some updates:

- for 64-bit, PCIBIOS (and thus handle_1ab101) is not called. PCIBIOS is
  only used by 32-bit kernels. As a side effect, PCI expander bridges do
  not work on 32-bit kernels with ACPI disabled, because they are located
  beyond pcibios_last_bus (with ACPI enabled, the DSDT exposes them).

- for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy
  scanning" is done.

- for -M q35, pcibios_last_bus in Linux is set based on the size of the
  MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs
  for buses above 0.

Here is a patch that only scans devfn==0, which should mostly remove the
need for pci=lastbus=0. (Testing is welcome).

Actually, KVM could probably avoid the scanning altogether. The only
"hidden" root buses we expect are from PCI expander bridges and if you
found an MMCONFIG area through the ACPI MCFG table, you can also use the
DSDT to find PCI expander bridges. However, I am being conservative.

A possible alternative could be a mechanism whereby the vmlinuz real
mode entry point, or the 32-bit PVH entry point, fetch lastbus and they
pass it to the kernel via the vmlinuz or PVH boot information structs.
However, I don't think that's very useful, and there is some risk of
breaking real hardware too.
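To make the cost difference concrete, here is a small model of the two probing strategies described above: one vendor-ID read per device slot on every bus above 0, versus a single read of devfn 0 per bus. It is a simplified model, not the kernel's code. `read_vendor`, `empty_machine` and `scan` are hypothetical stand-ins, and the machine modelled has nothing behind buses 1..255.

```python
ABSENT_VENDOR_IDS = (0x0000, 0xFFFF)  # values meaning "no device here"
LAST_BUS = 255                        # highest bus number probed on Q35


def make_devfn(device, function):
    """Pack a PCI device (0-31) and function (0-7) number into a devfn byte."""
    return (device << 3) | function


def empty_machine(bus, devfn):
    """Hypothetical config-space reader: nothing is present on any bus."""
    return 0xFFFF


def scan(read_vendor, devfns):
    """Probe buses 1..LAST_BUS; return (buses with a device, reads issued)."""
    found, reads = [], 0
    for bus in range(1, LAST_BUS + 1):
        for devfn in devfns:
            reads += 1
            if read_vendor(bus, devfn) not in ABSENT_VENDOR_IDS:
                found.append(bus)
                break  # one hit is enough to treat the bus as a peer root
    return found, reads


every_slot = [make_devfn(device, 0) for device in range(32)]
only_devfn0 = [make_devfn(0, 0)]

_, reads_every_slot = scan(empty_machine, every_slot)
_, reads_devfn0 = scan(empty_machine, only_devfn0)

print(f"probing every slot: {reads_every_slot} config reads")
print(f"probing devfn 0 only: {reads_devfn0} config reads")
```

In a guest each of those reads is a trapped config-space access, so cutting the count by a factor of 32 is where the expected saving would come from.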
Paolo

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index 73bb404f4d2a..17012aa60d22 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -61,6 +61,7 @@ enum pci_bf_sort_state {
 extern struct pci_ops pci_root_ops;
 
 void pcibios_scan_specific_bus(int busn);
+void pcibios_scan_bus_by_device(int busn);
 
 /* pci-irq.c */
 
@@ -216,8 +217,10 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
 # endif
 # define x86_default_pci_init_irq	pcibios_irq_init
 # define x86_default_pci_fixup_irqs	pcibios_fixup_irqs
+# define x86_default_pci_scan_bus	pcibios_scan_bus_by_device
 #else
 # define x86_default_pci_init		NULL
 # define x86_default_pci_init_irq	NULL
 # define x86_default_pci_fixup_irqs	NULL
+# define x86_default_pci_scan_bus	NULL
 #endif
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index b85a7c54c6a1..4c3a0a17a600 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -251,6 +251,7 @@ struct x86_hyper_runtime {
  * @save_sched_clock_state:	save state for sched_clock() on suspend
  * @restore_sched_clock_state:	restore state for sched_clock() on resume
  * @apic_post_init:		adjust apic if needed
+ * @pci_scan_bus:		scan a PCI bus
  * @legacy:			legacy features
  * @set_legacy_features:	override legacy features. Use of this callback
  *				is highly discouraged. You should only need
@@ -273,6 +274,7 @@ struct x86_platform_ops {
 	void (*save_sched_clock_state)(void);
 	void (*restore_sched_clock_state)(void);
 	void (*apic_post_init)(void);
+	void (*pci_scan_bus)(int busn);
 	struct x86_legacy_features legacy;
 	void (*set_legacy_features)(void);
 	struct x86_hyper_runtime hyper;
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index 6857b4577f17..b248d7036dd3 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -11,12 +11,14 @@
 #include <linux/acpi_pmtmr.h>
 #include <linux/kernel.h>
 #include <linux/reboot.h>
+#include <linux/pci.h>
 #include <asm/apic.h>
 #include <asm/cpu.h>
 #include <asm/hypervisor.h>
 #include <asm/i8259.h>
 #include <asm/irqdomain.h>
 #include <asm/pci_x86.h>
+#include <asm/pci.h>
 #include <asm/reboot.h>
 #include <asm/setup.h>
 #include <asm/jailhouse_para.h>
@@ -136,6 +138,22 @@ static int __init jailhouse_pci_arch_init(void)
 	return 0;
 }
 
+static void jailhouse_pci_scan_bus_by_function(int busn)
+{
+	int devfn;
+	u32 l;
+
+	for (devfn = 0; devfn < 256; devfn++) {
+		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
+		    l != 0x0000 && l != 0xffff) {
+			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
+			pr_info("PCI: Discovered peer bus %02x\n", busn);
+			pcibios_scan_root(busn);
+			return;
+		}
+	}
+}
+
 static void __init jailhouse_init_platform(void)
 {
 	u64 pa_data = boot_params.hdr.setup_data;
@@ -153,6 +171,7 @@ static void __init jailhouse_init_platform(void)
 	x86_platform.legacy.rtc = 0;
 	x86_platform.legacy.warm_reset = 0;
 	x86_platform.legacy.i8042 = X86_LEGACY_I8042_PLATFORM_ABSENT;
+	x86_platform.pci_scan_bus = jailhouse_pci_scan_bus_by_function;
 
 	legacy_pic = &null_legacy_pic;
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 82caf01b63dd..59f7204ed8f3 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -24,6 +24,7 @@
 #include <linux/debugfs.h>
 #include <linux/nmi.h>
 #include <linux/swait.h>
+#include <linux/pci.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -33,6 +34,7 @@
 #include <asm/apicdef.h>
 #include <asm/hypervisor.h>
 #include <asm/tlb.h>
+#include <asm/pci.h>
 
 static int kvmapf = 1;
 
@@ -621,10 +623,31 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,
 	native_flush_tlb_others(flushmask, info);
 }
 
+#ifdef CONFIG_PCI
+static void kvm_pci_scan_bus(int busn)
+{
+	u32 l;
+
+	/*
+	 * Assume that there are no "hidden" buses, i.e. all PCI root buses
+	 * have a host bridge at device 0, function 0.
+	 */
+	if (!raw_pci_read(0, busn, 0, PCI_VENDOR_ID, 2, &l) &&
+	    l != 0x0000 && l != 0xffff) {
+		pr_info("PCI: Discovered peer bus %02x\n", busn);
+		pcibios_scan_root(busn);
+	}
+}
+#endif
+
 static void __init kvm_guest_init(void)
 {
 	int i;
 
+#ifdef CONFIG_PCI
+	x86_platform.pci_scan_bus = kvm_pci_scan_bus;
+#endif
+
 	if (!kvm_para_available())
 		return;
 
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 50a2b492fdd6..19e1cc2cb6e0 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -118,6 +118,7 @@ struct x86_platform_ops x86_platform __ro_after_init = {
 	.get_nmi_reason = default_get_nmi_reason,
 	.save_sched_clock_state = tsc_save_sched_clock_state,
 	.restore_sched_clock_state = tsc_restore_sched_clock_state,
+	.pci_scan_bus = x86_default_pci_scan_bus,
 	.hyper.pin_vcpu = x86_op_int_noop,
 };
 
diff --git a/arch/x86/pci/legacy.c b/arch/x86/pci/legacy.c
index 467311b1eeea..6214dbce26d3 100644
--- a/arch/x86/pci/legacy.c
+++ b/arch/x86/pci/legacy.c
@@ -36,14 +36,19 @@ int __init pci_legacy_init(void)
 
 void pcibios_scan_specific_bus(int busn)
 {
-	int stride = jailhouse_paravirt() ? 1 : 8;
-	int devfn;
-	u32 l;
-
 	if (pci_find_bus(0, busn))
 		return;
 
-	for (devfn = 0; devfn < 256; devfn += stride) {
+	x86_platform.pci_scan_bus(busn);
+}
+EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
+
+void pcibios_scan_bus_by_device(int busn)
+{
+	int devfn;
+	u32 l;
+
+	for (devfn = 0; devfn < 256; devfn += 8) {
 		if (!raw_pci_read(0, busn, devfn, PCI_VENDOR_ID, 2, &l) &&
 		    l != 0x0000 && l != 0xffff) {
 			DBG("Found device at %02x:%02x [%04x]\n", busn, devfn, l);
@@ -53,7 +58,6 @@ void pcibios_scan_specific_bus(int busn)
 		}
 	}
 }
-EXPORT_SYMBOL_GPL(pcibios_scan_specific_bus);
 
 static int __init pci_subsys_init(void)
 {
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 23/07/19 12:01, Paolo Bonzini wrote:
>> The number of buses is determined by the firmware, not by QEMU, so
>> fw_cfg would not be the right interface. In fact (as I have just
>> learnt) lastbus is an x86-specific option that overrides the last bus
>> returned by SeaBIOS's handle_1ab101.
>>
>> So the next step could be to figure out what is the lastbus returned by
>> handle_1ab101 and possibly why it isn't zero.
>
> Some update:
>
> - for 64-bit, PCIBIOS (and thus handle_1ab101) is not called. PCIBIOS is
> only used by 32-bit kernels. As a side effect, PCI expander bridges do not
> work on 32-bit kernels with ACPI disabled, because they are located beyond
> pcibios_last_bus (with ACPI enabled, the DSDT exposes them).
>
> - for -M pc, pcibios_last_bus in Linux remains -1 and no "legacy scanning" is done.
>
> - for -M q35, pcibios_last_bus in Linux is set based on the size of the
> MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs
> for buses above 0.
>
> Here is a patch that only scans devfn==0, which should mostly remove the need
> for pci=lastbus=0. (Testing is welcome).

I just gave it a try. These are the results (avg on 10 consecutive runs):

 - Unpatched kernel:

Avg
 qemu_init_end: 75.207386
 linux_start_kernel: 115.056767 (+39.849381)
 linux_start_user: 241.020113 (+125.963346)

 - Unpatched kernel with pci=lastbus=0:

Avg
 qemu_init_end: 75.468282
 linux_start_kernel: 115.189322 (+39.72104)
 linux_start_user: 192.404823 (+77.215501)

 - Patched kernel (without pci=lastbus=0):

Avg
 qemu_init_end: 75.605627
 linux_start_kernel: 115.656557 (+40.05093)
 linux_start_user: 192.857655 (+77.201098)

Looks fine to me. There must be an extra cost in the patched kernel
vs. using pci=lastbus=0, but it's so low that it's hard to catch in the
average numbers.
On Wed, Jul 24, 2019 at 01:14:35PM +0200, Paolo Bonzini wrote:
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 82caf01b63dd..59f7204ed8f3 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -621,10 +623,31 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,
>  	native_flush_tlb_others(flushmask, info);
>  }
>
> +#ifdef CONFIG_PCI
> +static void kvm_pci_scan_bus(int busn)
> +{
> +	u32 l;
> +
> +	/*
> +	 * Assume that there are no "hidden" buses, i.e. all PCI root buses
> +	 * have a host bridge at device 0, function 0.
> +	 */
> +	if (!raw_pci_read(0, busn, 0, PCI_VENDOR_ID, 2, &l) &&
> +	    l != 0x0000 && l != 0xffff) {
> +		pr_info("PCI: Discovered peer bus %02x\n", busn);
> +		pcibios_scan_root(busn);
> +	}
> +}
> +#endif
> +
>  static void __init kvm_guest_init(void)
>  {
>  	int i;
>
> +#ifdef CONFIG_PCI
> +	x86_platform.pci_scan_bus = kvm_pci_scan_bus;
> +#endif
> +
>  	if (!kvm_para_available())
>  		return;
>

Shouldn't this happen after kvm_para_available()?

In fact, let's add a CPU ID flag for this, so it's
easy to tell the guest whether to scan extra buses.
What do you say?
On 25/07/19 12:03, Michael S. Tsirkin wrote:
>> +#ifdef CONFIG_PCI
>> +	x86_platform.pci_scan_bus = kvm_pci_scan_bus;
>> +#endif
>> +
>> 	if (!kvm_para_available())
>> 		return;
>>
> Shouldn't this happen after kvm_para_available?

Actually kvm_para_available() is not needed anymore, since this only
runs after kvm_detect() has returned true.

> In fact, let's add a CPU ID flag for this, so it's
> easy to tell guest whether to scan extra buses.
> What do you say?

I think it would make it much harder to deploy this, since it relies on
having new userspace and new machine types. This patch is basically a
reflection of the status quo, which is that there are generally no
"hidden" buses on commonly-used KVM userspaces, and even in the weird
configurations that have them there is always something at devfn=0.

(On real hardware, the only such hidden buses are e.g. 0x7f/0xff, which
have a bunch of QPI and MCH-related devices. This is not something you'd
have in a virtual machine).

Paolo
On Wed, Jul 24, 2019 at 01:14:35PM +0200, Paolo Bonzini wrote:
> - for -M q35, pcibios_last_bus in Linux is set based on the size of the
> MMCONFIG aperture and Linux ends up scanning all 32*255 (bus,dev) pairs
> for buses above 0.
>
> Here is a patch that only scans devfn==0, which should mostly remove the need
> for pci=lastbus=0. (Testing is welcome).

Actually, I think I have a better idea.
At the moment we just get an exit on these reads and return all-ones.
Yes, in theory there could be a UR bit set in a bunch of
registers but in practice no one cares about these,
and I don't think we implement them.
So how about mapping a single page, read-only, and filling it
with all-ones?

We'll still run the code within Linux but it will be free.

What do you think?
On 25/07/19 16:46, Michael S. Tsirkin wrote:
> Actually, I think I have a better idea.
> At the moment we just get an exit on these reads and return all-ones.
> Yes, in theory there could be a UR bit set in a bunch of
> registers but in practice no one cares about these,
> and I don't think we implement them.
> So how about mapping a single page, read-only, and filling it
> with all-ones?

Yes, that's nice indeed. :) But it does have some cost, in terms of
either number of VMAs or QEMU RSS since the MMCONFIG area is large.

What breaks if we return all zeroes? Zero is not a valid vendor ID.

Paolo
On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote:
> On 25/07/19 16:46, Michael S. Tsirkin wrote:
> > Actually, I think I have a better idea.
> > At the moment we just get an exit on these reads and return all-ones.
> > Yes, in theory there could be a UR bit set in a bunch of
> > registers but in practice no one cares about these,
> > and I don't think we implement them.
> > So how about mapping a single page, read-only, and filling it
> > with all-ones?
>
> Yes, that's nice indeed. :) But it does have some cost, in terms of
> either number of VMAs or QEMU RSS since the MMCONFIG area is large.
>
> What breaks if we return all zeroes? Zero is not a valid vendor ID.
>
> Paolo

It isn't but that's not what baremetal does. So there's some risk
there ...

Why is all zeroes better? We still need to map it, right?

--
MST
On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote: > On 25/07/19 16:46, Michael S. Tsirkin wrote: > > Actually, I think I have a better idea. > > At the moment we just get an exit on these reads and return all-ones. > > Yes, in theory there could be a UR bit set in a bunch of > > registers but in practice no one cares about these, > > and I don't think we implement them. > > So how about mapping a single page, read-only, and filling it > > with all-ones? > > Yes, that's nice indeed. :) But it does have some cost, in terms of > either number of VMAs or QEMU RSS since the MMCONFIG area is large. > > What breaks if we return all zeroes? Zero is not a valid vendor ID. > > Paolo I think I know what you are thinking of doing: map /dev/zero so we get a single VMA but all mapped to a single zero pte? We could start with that, at least as an experiment. Further: - we can limit the amount of fragmentation and simply unmap everything if we exceed a specific limit: with more than X devices it's no longer a lightweight VM anyway :) - we can implement /dev/ones. in fact, we can implement /dev/byteXX for each possible value, the cost will be only 1M on a 4k page system. it might come in handy for e.g. free page hinting: at the moment if guest memory is poisoned we can not unmap it, with this trick we can map it to /dev/byteXX. Note that the kvm memory array is still fragmented. Again, we can fallback on disabling the optimization if there are too many devices. -- MST
On 25/07/19 22:30, Michael S. Tsirkin wrote: > On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote: >> On 25/07/19 16:46, Michael S. Tsirkin wrote: >>> Actually, I think I have a better idea. >>> At the moment we just get an exit on these reads and return all-ones. >>> Yes, in theory there could be a UR bit set in a bunch of >>> registers but in practice no one cares about these, >>> and I don't think we implement them. >>> So how about mapping a single page, read-only, and filling it >>> with all-ones? >> >> Yes, that's nice indeed. :) But it does have some cost, in terms of >> either number of VMAs or QEMU RSS since the MMCONFIG area is large. >> >> What breaks if we return all zeroes? Zero is not a valid vendor ID. >> >> Paolo > > I think I know what you are thinking of doing: > map /dev/zero so we get a single VMA but all mapped to > a single zero pte? Yes, exactly. You absolutely need to share the page because the guest could easily touch 32*256 pages just to scan function 0 on every bus and device, even if the VM has just 4 or 5 devices and all of them on the root complex. And that causes fragmentation so you have to map bigger areas. > - we can implement /dev/ones. in fact, we can implement > /dev/byteXX for each possible value, the cost will > be only 1M on a 4k page system. > it might come in handy for e.g. free page hinting: > at the moment if guest memory is poisoned > we can not unmap it, with this trick we can > map it to /dev/byteXX. I also thought of /dev/ones, not sure how it would be accepted. :) Also you cannot map lazily on page fault, otherwise you get a vmexit and it's slow again. So /dev/ones needs to be written to use a huge page, possibly. Paolo
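The page counts mentioned above can be checked with a short calculation. This is a sketch under stated assumptions: it uses the standard PCIe ECAM layout (bus in bits 20-27, device in bits 15-19, function in bits 12-14, one 4 KiB page per function), a window covering all 256 buses, and 4 KiB host pages. The names are illustrative only.

```python
PAGE_SIZE = 4096  # 4 KiB pages, as assumed in the thread
BUSES, DEVICES, FUNCTIONS = 256, 32, 8


def ecam_offset(bus, device, function):
    """Offset of one function's 4 KiB config page inside the MMCONFIG window."""
    return (bus << 20) | (device << 15) | (function << 12)


# Distinct pages touched when a guest probes function 0 of every device
# on every bus (the "32*256 pages" case).
scan_pages = {
    ecam_offset(bus, dev, 0) // PAGE_SIZE
    for bus in range(BUSES)
    for dev in range(DEVICES)
}

# Size of a full MMCONFIG window under the assumptions above.
mmconfig_bytes = BUSES * DEVICES * FUNCTIONS * PAGE_SIZE

# One pre-filled page per possible byte value (the /dev/byteXX idea).
byte_pages_bytes = 256 * PAGE_SIZE

print(len(scan_pages))             # 8192
print(mmconfig_bytes // 2**20)     # 256 (MiB)
print(byte_pages_bytes // 2**20)   # 1 (MiB)
```

So a function-0 scan touches 8192 separate pages spread across the window, which is why sharing a single backing page matters, while 256 pre-filled pages would cost 1 MiB in total.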
On Fri, Jul 26, 2019 at 09:57:51AM +0200, Paolo Bonzini wrote: > On 25/07/19 22:30, Michael S. Tsirkin wrote: > > On Thu, Jul 25, 2019 at 05:35:01PM +0200, Paolo Bonzini wrote: > >> On 25/07/19 16:46, Michael S. Tsirkin wrote: > >>> Actually, I think I have a better idea. > >>> At the moment we just get an exit on these reads and return all-ones. > >>> Yes, in theory there could be a UR bit set in a bunch of > >>> registers but in practice no one cares about these, > >>> and I don't think we implement them. > >>> So how about mapping a single page, read-only, and filling it > >>> with all-ones? > >> > >> Yes, that's nice indeed. :) But it does have some cost, in terms of > >> either number of VMAs or QEMU RSS since the MMCONFIG area is large. > >> > >> What breaks if we return all zeroes? Zero is not a valid vendor ID. > >> > >> Paolo > > > > I think I know what you are thinking of doing: > > map /dev/zero so we get a single VMA but all mapped to > > a single zero pte? > > Yes, exactly. You absolutely need to share the page because the guest > could easily touch 32*256 pages just to scan function 0 on every bus and > device, even if the VM has just 4 or 5 devices and all of them on the > root complex. And that causes fragmentation so you have to map bigger > areas. > > > - we can implement /dev/ones. in fact, we can implement > > /dev/byteXX for each possible value, the cost will > > be only 1M on a 4k page system. > > it might come in handy for e.g. free page hinting: > > at the moment if guest memory is poisoned > > we can not unmap it, with this trick we can > > map it to /dev/byteXX. > > I also thought of /dev/ones, not sure how it would be accepted. :) Also > you cannot map lazily on page fault, otherwise you get a vmexit and it's > slow again. So /dev/ones needs to be written to use a huge page, possibly. > > Paolo It's not easy to do that - each device gets 4K within MCFG. 
So what we need then is a KVM option to create an address range, or maybe even a group of address ranges, and to aggressively map all pages in a group to the same guest page when any one page in the group faults. -- MST
On Tue, Jul 23, 2019 at 10:47:39AM +0100, Stefan Hajnoczi wrote: > On Tue, Jul 23, 2019 at 9:43 AM Sergio Lopez <slp@redhat.com> wrote: > > Montes, Julio <julio.montes@intel.com> writes: > > > > > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote: > > >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote: > > >> > Stefan Hajnoczi <stefanha@gmail.com> writes: > > >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote: > > >> > > > Stefan Hajnoczi <stefanha@gmail.com> writes: > > >> > > > > > >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote: > > >> > > > -------------- > > >> > > > | Conclusion | > > >> > > > -------------- > > >> > > > > > >> > > > The average boot time of microvm is a third of Q35's (115ms vs. > > >> > > > 363ms), > > >> > > > and is smaller on all sections (QEMU initialization, firmware > > >> > > > overhead > > >> > > > and kernel start-to-user). > > >> > > > > > >> > > > Microvm's memory tree is also visibly simpler, significantly > > >> > > > reducing > > >> > > > the exposed surface to the guest. > > >> > > > > > >> > > > While we can certainly work on making Q35 smaller, I definitely > > >> > > > think > > >> > > > it's better (and way safer!) having a specialized machine type > > >> > > > for a > > >> > > > specific use case, than a minimal Q35 whose behavior > > >> > > > significantly > > >> > > > diverges from a conventional Q35. > > >> > > > > >> > > Interesting, so not a 10x difference! This might be amenable to > > >> > > optimization. > > >> > > > > >> > > My concern with microvm is that it's so limited that few users > > >> > > will be > > >> > > able to benefit from the reduced attack surface and faster > > >> > > startup time. > > >> > > I think it's worth investigating slimming down Q35 further first. 
> > >> > > > > >> > > In terms of startup time the first step would be profiling Q35 > > >> > > kernel > > >> > > startup to find out what's taking so long (firmware > > >> > > initialization, PCI > > >> > > probing, etc)? > > >> > > > >> > Some findings: > > >> > > > >> > 1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host") > > >> > saves a > > >> > whooping 120ms by avoiding the APIC timer calibration at > > >> > arch/x86/kernel/apic/apic.c:calibrate_APIC_clock > > >> > > > >> > Average boot time with "-cpu host" > > >> > qemu_init_end: 76.408950 > > >> > linux_start_kernel: 116.166142 (+39.757192) > > >> > linux_start_user: 242.954347 (+126.788205) > > >> > > > >> > Average boot time with default "cpu" > > >> > qemu_init_end: 77.467852 > > >> > linux_start_kernel: 116.688472 (+39.22062) > > >> > linux_start_user: 363.033365 (+246.344893) > > >> > > >> \o/ > > >> > > >> > 2. The other 130ms are a direct result of PCI and ACPI presence > > >> > (tested > > >> > with a kernel without support for those elements). I'll publish > > >> > some > > >> > detailed numbers next week. > > >> > > >> Here are the Kata Containers kernel parameters: > > >> > > >> var kernelParams = []Param{ > > >> {"tsc", "reliable"}, > > >> {"no_timer_check", ""}, > > >> {"rcupdate.rcu_expedited", "1"}, > > >> {"i8042.direct", "1"}, > > >> {"i8042.dumbkbd", "1"}, > > >> {"i8042.nopnp", "1"}, > > >> {"i8042.noaux", "1"}, > > >> {"noreplace-smp", ""}, > > >> {"reboot", "k"}, > > >> {"console", "hvc0"}, > > >> {"console", "hvc1"}, > > >> {"iommu", "off"}, > > >> {"cryptomgr.notests", ""}, > > >> {"net.ifnames", "0"}, > > >> {"pci", "lastbus=0"}, > > >> } > > >> > > >> pci lastbus=0 looks interesting and so do some of the others :). > > >> > > > > > > yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35, > > > kernel won't scan the 255.. buses :) > > > > I can confirm that adding pci=lastbus=0 makes a significant > > improvement. 
In fact, it is the only option from Kata's kernel parameter > > list that has an impact, probably because the kernel is already quite > > minimalistic. > > > > Average boot time with "-cpu host" and "pci=lastbus=0" > > qemu_init_end: 73.711569 > > linux_start_kernel: 113.414311 (+39.702742) > > linux_start_user: 190.949939 (+77.535628) > > > > That's still ~40% slower than microvm, and the gap quickly widens > > when adding more PCI devices (each one adds 10-15ms), but it's certainly > > an improvement over the original numbers. > > > > On the other hand, there isn't much we can do here from QEMU's > > perspective, as this is basically Guest OS tuning. > > fw_cfg could expose this information so guest kernels know when to > stop enumerating the PCI bus. This would make all PCI guests with new > kernels boot ~50 ms faster, regardless of machine type. > > The difference between microvm and tuned Q35 is 76 ms now. > > microvm: > qemu_init_end: 64.043264 > linux_start_kernel: 65.481782 (+1.438518) > linux_start_user: 114.938353 (+49.456571) > > Q35 with -cpu host and pci=lastbus=0: > qemu_init_end: 73.711569 > linux_start_kernel: 113.414311 (+39.702742) > linux_start_user: 190.949939 (+77.535628) > > There is a ~39 ms difference before linux_start_kernel. SeaBIOS is > loading the PVH Option ROM. > > Stefano: any recommendations for profiling or tuning SeaBIOS? As I said on IRC, the SeaBIOS image in QEMU is version 1.12.1 and it doesn't include this patch (available in upstream SeaBIOS) that saves ~10ms: commit 75b42835134553c96f113e5014072c0caf99d092 Author: Stefano Garzarella <sgarzare@redhat.com> Date: Sun Dec 2 14:10:13 2018 +0100 qemu: avoid debug prints if debugcon is not enabled In order to speed up the boot phase, we can check the QEMU debugcon device, and disable the writes if it is not recognized.
This patch allow us to save around 10 msec (time measured between SeaBIOS entry point and "linuxboot" entry point) when CONFIG_DEBUG_LEVEL=1 and debugcon is not enabled. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin O'Connor <kevin@koconnor.net> As you said, we should update SeaBIOS for the next QEMU release. For profiling, I have some patches that I used to put trace points in the SeaBIOS code. I'll put them in this repository ASAP: https://github.com/stefano-garzarella/qemu-boot-time
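To make the comparison above easier to follow, this small sketch splits the two quoted timelines into phases and shows where the gap comes from. The timestamps are copied from the thread. The phase names, the `phases` helper and the assumption that the values are milliseconds are illustrative, not part of the original tooling.

```python
# Timestamps quoted in the thread (assumed to be milliseconds).
microvm = {
    "qemu_init_end": 64.043264,
    "linux_start_kernel": 65.481782,
    "linux_start_user": 114.938353,
}
q35_tuned = {  # Q35 with -cpu host and pci=lastbus=0
    "qemu_init_end": 73.711569,
    "linux_start_kernel": 113.414311,
    "linux_start_user": 190.949939,
}


def phases(t):
    """Split a boot timeline into per-phase durations."""
    return {
        "qemu_init": t["qemu_init_end"],
        "firmware": t["linux_start_kernel"] - t["qemu_init_end"],
        "kernel_to_user": t["linux_start_user"] - t["linux_start_kernel"],
    }


m, q = phases(microvm), phases(q35_tuned)
gap = {name: q[name] - m[name] for name in m}
total_gap = q35_tuned["linux_start_user"] - microvm["linux_start_user"]

for name, delta in gap.items():
    print(f"{name:>15}: +{delta:.3f}")
print(f"{'total':>15}: +{total_gap:.3f}")
```

With these numbers the total gap is about 76, and roughly half of it falls in the firmware phase between `qemu_init_end` and `linux_start_kernel`, which is consistent with the focus on SeaBIOS in this part of the thread.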
On Tue, Jul 23, 2019 at 1:30 PM Stefano Garzarella <sgarzare@redhat.com> wrote: > > On Tue, Jul 23, 2019 at 10:47:39AM +0100, Stefan Hajnoczi wrote: > > On Tue, Jul 23, 2019 at 9:43 AM Sergio Lopez <slp@redhat.com> wrote: > > > Montes, Julio <julio.montes@intel.com> writes: > > > > > > > On Fri, 2019-07-19 at 16:09 +0100, Stefan Hajnoczi wrote: > > > >> On Fri, Jul 19, 2019 at 2:48 PM Sergio Lopez <slp@redhat.com> wrote: > > > >> > Stefan Hajnoczi <stefanha@gmail.com> writes: > > > >> > > On Thu, Jul 18, 2019 at 05:21:46PM +0200, Sergio Lopez wrote: > > > >> > > > Stefan Hajnoczi <stefanha@gmail.com> writes: > > > >> > > > > > > >> > > > > On Tue, Jul 02, 2019 at 02:11:02PM +0200, Sergio Lopez wrote: > > > >> > > > -------------- > > > >> > > > | Conclusion | > > > >> > > > -------------- > > > >> > > > > > > >> > > > The average boot time of microvm is a third of Q35's (115ms vs. > > > >> > > > 363ms), > > > >> > > > and is smaller on all sections (QEMU initialization, firmware > > > >> > > > overhead > > > >> > > > and kernel start-to-user). > > > >> > > > > > > >> > > > Microvm's memory tree is also visibly simpler, significantly > > > >> > > > reducing > > > >> > > > the exposed surface to the guest. > > > >> > > > > > > >> > > > While we can certainly work on making Q35 smaller, I definitely > > > >> > > > think > > > >> > > > it's better (and way safer!) having a specialized machine type > > > >> > > > for a > > > >> > > > specific use case, than a minimal Q35 whose behavior > > > >> > > > significantly > > > >> > > > diverges from a conventional Q35. > > > >> > > > > > >> > > Interesting, so not a 10x difference! This might be amenable to > > > >> > > optimization. > > > >> > > > > > >> > > My concern with microvm is that it's so limited that few users > > > >> > > will be > > > >> > > able to benefit from the reduced attack surface and faster > > > >> > > startup time. > > > >> > > I think it's worth investigating slimming down Q35 further first. 
> > > >> > > > > > >> > > In terms of startup time the first step would be profiling Q35 > > > >> > > kernel > > > >> > > startup to find out what's taking so long (firmware > > > >> > > initialization, PCI > > > >> > > probing, etc)? > > > >> > > > > >> > Some findings: > > > >> > > > > >> > 1. Exposing the TSC_DEADLINE CPU flag (i.e. using "-cpu host") > > > >> > saves a > > > >> > whooping 120ms by avoiding the APIC timer calibration at > > > >> > arch/x86/kernel/apic/apic.c:calibrate_APIC_clock > > > >> > > > > >> > Average boot time with "-cpu host" > > > >> > qemu_init_end: 76.408950 > > > >> > linux_start_kernel: 116.166142 (+39.757192) > > > >> > linux_start_user: 242.954347 (+126.788205) > > > >> > > > > >> > Average boot time with default "cpu" > > > >> > qemu_init_end: 77.467852 > > > >> > linux_start_kernel: 116.688472 (+39.22062) > > > >> > linux_start_user: 363.033365 (+246.344893) > > > >> > > > >> \o/ > > > >> > > > >> > 2. The other 130ms are a direct result of PCI and ACPI presence > > > >> > (tested > > > >> > with a kernel without support for those elements). I'll publish > > > >> > some > > > >> > detailed numbers next week. > > > >> > > > >> Here are the Kata Containers kernel parameters: > > > >> > > > >> var kernelParams = []Param{ > > > >> {"tsc", "reliable"}, > > > >> {"no_timer_check", ""}, > > > >> {"rcupdate.rcu_expedited", "1"}, > > > >> {"i8042.direct", "1"}, > > > >> {"i8042.dumbkbd", "1"}, > > > >> {"i8042.nopnp", "1"}, > > > >> {"i8042.noaux", "1"}, > > > >> {"noreplace-smp", ""}, > > > >> {"reboot", "k"}, > > > >> {"console", "hvc0"}, > > > >> {"console", "hvc1"}, > > > >> {"iommu", "off"}, > > > >> {"cryptomgr.notests", ""}, > > > >> {"net.ifnames", "0"}, > > > >> {"pci", "lastbus=0"}, > > > >> } > > > >> > > > >> pci lastbus=0 looks interesting and so do some of the others :). > > > >> > > > > > > > > yeah, pci=lastbus=0 is very helpful to reduce the boot time in q35, > > > > kernel won't scan the 255.. 
buses :) > > > > > > I can confirm that adding pci=lastbus=0 makes a significant > > > improvement. In fact, it is the only option from Kata's kernel parameter > > > list that has an impact, probably because the kernel is already quite > > > minimalistic. > > > > > > Average boot time with "-cpu host" and "pci=lastbus=0" > > > qemu_init_end: 73.711569 > > > linux_start_kernel: 113.414311 (+39.702742) > > > linux_start_user: 190.949939 (+77.535628) > > > > > > That's still ~40% slower than microvm, and the gap quickly widens > > > when adding more PCI devices (each one adds 10-15ms), but it's certainly > > > an improvement over the original numbers. > > > > > > On the other hand, there isn't much we can do here from QEMU's > > > perspective, as this is basically Guest OS tuning. > > > > fw_cfg could expose this information so guest kernels know when to > > stop enumerating the PCI bus. This would make all PCI guests with new > > kernels boot ~50 ms faster, regardless of machine type. > > > > The difference between microvm and tuned Q35 is 76 ms now. > > > > microvm: > > qemu_init_end: 64.043264 > > linux_start_kernel: 65.481782 (+1.438518) > > linux_start_user: 114.938353 (+49.456571) > > > > Q35 with -cpu host and pci=lastbus=0: > > qemu_init_end: 73.711569 > > linux_start_kernel: 113.414311 (+39.702742) > > linux_start_user: 190.949939 (+77.535628) > > > > There is a ~39 ms difference before linux_start_kernel. SeaBIOS is > > loading the PVH Option ROM. > > > > Stefano: any recommendations for profiling or tuning SeaBIOS?
> > As I said on IRC, the SeaBIOS image in QEMU is the 1.12.1 and it doesn't > include this patch (available in the upstream SeaBIOS) that saves ~10ms: > > commit 75b42835134553c96f113e5014072c0caf99d092 > Author: Stefano Garzarella <sgarzare@redhat.com> > Date: Sun Dec 2 14:10:13 2018 +0100 > > qemu: avoid debug prints if debugcon is not enabled > > In order to speed up the boot phase, we can check the QEMU > debugcon device, and disable the writes if it is not recognized. > > This patch allow us to save around 10 msec (time measured > between SeaBIOS entry point and "linuxboot" entry point) > when CONFIG_DEBUG_LEVEL=1 and debugcon is not enabled. > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> > Signed-off-by: Kevin O'Connor <kevin@koconnor.net> > > As you said, we should update SeaBIOS for the next QEMU release. > > For profiling, I have some patches that I used to put trace points in > the SeaBIOS code. I'll put them in this repository ASAP: > https://github.com/stefano-garzarella/qemu-boot-time I pushed QEMU (optionrom) and SeaBIOS patches in: https://github.com/stefano-garzarella/qemu-boot-time They can be useful for profiling. Cheers, Stefano
Patchew URL: https://patchew.org/QEMU/20190702121106.28374-1-slp@redhat.com/ Hi, This series failed the asan build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #!/bin/bash make docker-image-fedora V=1 NETWORK=1 time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1 === TEST SCRIPT END === PASS 2 fdc-test /x86_64/fdc/no_media_on_start PASS 3 fdc-test /x86_64/fdc/read_without_media MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/check-qlit -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="check-qlit" ==7808==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 4 fdc-test /x86_64/fdc/media_change PASS 5 fdc-test /x86_64/fdc/sense_interrupt PASS 6 fdc-test /x86_64/fdc/relative_seek --- PASS 32 test-opts-visitor /visitor/opts/range/beyond PASS 33 test-opts-visitor /visitor/opts/dict/unvisited MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-coroutine -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-coroutine" ==7851==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! ==7851==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc0ad0a000; bottom 0x7fa44def8000; size: 0x0057bce12000 (376831025152) False positive error reports may follow For details see https://github.com/google/sanitizers/issues/189 PASS 1 test-coroutine /basic/no-dangling-access --- PASS 11 test-aio /aio/event/wait PASS 12 test-aio /aio/event/flush PASS 13 test-aio /aio/event/wait/no-flush-cb ==7866==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! 
PASS 14 test-aio /aio/timer/schedule PASS 15 test-aio /aio/coroutine/queue-chaining PASS 16 test-aio /aio-gsource/flush --- PASS 28 test-aio /aio-gsource/timer/schedule PASS 13 fdc-test /x86_64/fdc/fuzz-registers MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-aio-multithread -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-aio-multithread" ==7873==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 1 test-aio-multithread /aio/multi/lifecycle MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/ide-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="ide-test" ==7890==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 2 test-aio-multithread /aio/multi/schedule PASS 1 ide-test /x86_64/ide/identify PASS 3 test-aio-multithread /aio/multi/mutex/contended ==7901==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 2 ide-test /x86_64/ide/flush ==7912==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 3 ide-test /x86_64/ide/bmdma/simple_rw ==7918==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 4 test-aio-multithread /aio/multi/mutex/handoff PASS 4 ide-test /x86_64/ide/bmdma/trim ==7929==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 5 test-aio-multithread /aio/multi/mutex/mcs PASS 5 ide-test /x86_64/ide/bmdma/short_prdt ==7940==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! 
PASS 6 test-aio-multithread /aio/multi/mutex/pthread MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-throttle -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-throttle" PASS 6 ide-test /x86_64/ide/bmdma/one_sector_short_prdt ==7948==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 1 test-throttle /throttle/leak_bucket PASS 2 test-throttle /throttle/compute_wait PASS 3 test-throttle /throttle/init --- PASS 14 test-throttle /throttle/config/max PASS 15 test-throttle /throttle/config/iops_size MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-thread-pool -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-thread-pool" ==7951==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! ==7955==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 1 test-thread-pool /thread-pool/submit PASS 2 test-thread-pool /thread-pool/submit-aio PASS 3 test-thread-pool /thread-pool/submit-co PASS 4 test-thread-pool /thread-pool/submit-many PASS 7 ide-test /x86_64/ide/bmdma/long_prdt ==8027==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! ==8027==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd45d06000; bottom 0x7f83e57fe000; size: 0x007960508000 (521306931200) False positive error reports may follow For details see https://github.com/google/sanitizers/issues/189 PASS 8 ide-test /x86_64/ide/bmdma/no_busmaster PASS 5 test-thread-pool /thread-pool/cancel PASS 9 ide-test /x86_64/ide/flush/nodev ==8038==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! 
PASS 10 ide-test /x86_64/ide/flush/empty_drive PASS 6 test-thread-pool /thread-pool/cancel-async ==8043==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-hbitmap -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-hbitmap" PASS 1 test-hbitmap /hbitmap/granularity PASS 2 test-hbitmap /hbitmap/size/0 --- PASS 4 test-hbitmap /hbitmap/iter/empty PASS 11 ide-test /x86_64/ide/flush/retry_pci PASS 5 test-hbitmap /hbitmap/iter/partial ==8054==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 6 test-hbitmap /hbitmap/iter/granularity PASS 7 test-hbitmap /hbitmap/iter/iter_and_reset PASS 8 test-hbitmap /hbitmap/get/all --- PASS 14 test-hbitmap /hbitmap/set/twice PASS 15 test-hbitmap /hbitmap/set/overlap PASS 16 test-hbitmap /hbitmap/reset/empty ==8060==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 17 test-hbitmap /hbitmap/reset/general PASS 13 ide-test /x86_64/ide/cdrom/pio PASS 18 test-hbitmap /hbitmap/reset/all --- PASS 28 test-hbitmap /hbitmap/truncate/shrink/medium PASS 29 test-hbitmap /hbitmap/truncate/shrink/large PASS 30 test-hbitmap /hbitmap/meta/zero ==8066==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 14 ide-test /x86_64/ide/cdrom/pio_large ==8072==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! 
PASS 15 ide-test /x86_64/ide/cdrom/dma MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/ahci-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="ahci-test" ==8086==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 31 test-hbitmap /hbitmap/meta/one PASS 32 test-hbitmap /hbitmap/meta/byte PASS 33 test-hbitmap /hbitmap/meta/word PASS 1 ahci-test /x86_64/ahci/sanity ==8092==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 2 ahci-test /x86_64/ahci/pci_spec PASS 34 test-hbitmap /hbitmap/meta/sector PASS 35 test-hbitmap /hbitmap/serialize/align ==8098==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 3 ahci-test /x86_64/ahci/pci_enable ==8104==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 36 test-hbitmap /hbitmap/serialize/basic PASS 37 test-hbitmap /hbitmap/serialize/part PASS 38 test-hbitmap /hbitmap/serialize/zeroes --- PASS 4 ahci-test /x86_64/ahci/hba_spec PASS 43 test-hbitmap /hbitmap/next_dirty_area/next_dirty_area_4 MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-bdrv-drain -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bdrv-drain" ==8113==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! 
PASS 1 test-bdrv-drain /bdrv-drain/nested PASS 2 test-bdrv-drain /bdrv-drain/multiparent PASS 3 test-bdrv-drain /bdrv-drain/set_aio_context --- PASS 20 test-bdrv-drain /bdrv-drain/iothread/drain_subtree PASS 21 test-bdrv-drain /bdrv-drain/blockjob/drain_all PASS 22 test-bdrv-drain /bdrv-drain/blockjob/drain ==8110==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 23 test-bdrv-drain /bdrv-drain/blockjob/drain_subtree PASS 24 test-bdrv-drain /bdrv-drain/blockjob/error/drain_all PASS 25 test-bdrv-drain /bdrv-drain/blockjob/error/drain --- PASS 39 test-bdrv-drain /bdrv-drain/attach/drain PASS 5 ahci-test /x86_64/ahci/hba_enable MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-bdrv-graph-mod -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bdrv-graph-mod" ==8159==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 1 test-bdrv-graph-mod /bdrv-graph-mod/update-perm-tree PASS 2 test-bdrv-graph-mod /bdrv-graph-mod/should-update-child ==8157==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-blockjob -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-blockjob" ==8168==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! 
PASS 1 test-blockjob /blockjob/ids PASS 2 test-blockjob /blockjob/cancel/created PASS 3 test-blockjob /blockjob/cancel/running --- PASS 8 test-blockjob /blockjob/cancel/concluded MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-blockjob-txn -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-blockjob-txn" PASS 6 ahci-test /x86_64/ahci/identify ==8174==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 1 test-blockjob-txn /single/success PASS 2 test-blockjob-txn /single/failure PASS 3 test-blockjob-txn /single/cancel --- PASS 6 test-blockjob-txn /pair/cancel PASS 7 test-blockjob-txn /pair/fail-cancel-race MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-block-backend -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-block-backend" ==8176==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! ==8181==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 1 test-block-backend /block-backend/drain_aio_error PASS 2 test-block-backend /block-backend/drain_all_aio_error MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-block-iothread -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-block-iothread" PASS 7 ahci-test /x86_64/ahci/max ==8190==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! 
PASS 1 test-block-iothread /sync-op/pread PASS 2 test-block-iothread /sync-op/pwrite PASS 3 test-block-iothread /sync-op/load_vmstate --- PASS 15 test-block-iothread /propagate/diamond PASS 16 test-block-iothread /propagate/mirror MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-image-locking -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-image-locking" ==8192==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! ==8212==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 1 test-image-locking /image-locking/basic PASS 2 test-image-locking /image-locking/set-perm-abort MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-x86-cpuid -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-x86-cpuid" --- PASS 4 test-xbzrle /xbzrle/encode_decode_1_byte PASS 5 test-xbzrle /xbzrle/encode_decode_overflow PASS 8 ahci-test /x86_64/ahci/reset ==8228==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! ==8228==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc98ab7000; bottom 0x7f6a659fe000; size: 0x0092330b9000 (627921620992) False positive error reports may follow For details see https://github.com/google/sanitizers/issues/189 PASS 6 test-xbzrle /xbzrle/encode_decode --- PASS 133 test-cutils /cutils/strtosz/erange PASS 134 test-cutils /cutils/strtosz/metric MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-shift128 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-shift128" ==8240==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! 
PASS 1 test-shift128 /host-utils/test_lshift
PASS 2 test-shift128 /host-utils/test_rshift
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-mul64 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-mul64"
==8240==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd869e8000; bottom 0x7f71117fe000; size: 0x008c751ea000 (603260362752)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 test-mul64 /host-utils/mulu64
---
PASS 9 test-int128 /int128/int128_gt
PASS 10 test-int128 /int128/int128_rshift
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/rcutorture -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="rcutorture"
==8262==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8262==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fffd5dde000; bottom 0x7f7850bfe000; size: 0x0087851e0000 (582053920768)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 1 rcutorture /rcu/torture/1reader
PASS 11 ahci-test /x86_64/ahci/io/pio/lba28/simple/high
==8295==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 rcutorture /rcu/torture/10readers
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-rcu-list -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-list"
==8295==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe6e62e000; bottom 0x7f1b1fbfe000; size: 0x00e34ea30000 (976276881408)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 12 ahci-test /x86_64/ahci/io/pio/lba28/double/zero
==8308==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-rcu-list /rcu/qlist/single-threaded
==8308==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc54a9f000; bottom 0x7f5c1bdfe000; size: 0x00a038ca1000 (688147533824)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 2 test-rcu-list /rcu/qlist/short-few
PASS 13 ahci-test /x86_64/ahci/io/pio/lba28/double/low
==8341==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8341==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc4b8b5000; bottom 0x7f782c7fe000; size: 0x00841f0b7000 (567456526336)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 14 ahci-test /x86_64/ahci/io/pio/lba28/double/high
==8347==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8347==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffeb2bc8000; bottom 0x7fd572124000; size: 0x002940aa4000 (177178558464)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 3 test-rcu-list /rcu/qlist/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-rcu-simpleq -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-simpleq"
PASS 15 ahci-test /x86_64/ahci/io/pio/lba28/long/zero
PASS 1 test-rcu-simpleq /rcu/qsimpleq/single-threaded
==8360==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8360==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc5ebf2000; bottom 0x7f8d6cdfe000; size: 0x006ef1df4000 (476504342528)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 2 test-rcu-simpleq /rcu/qsimpleq/short-few
PASS 16 ahci-test /x86_64/ahci/io/pio/lba28/long/low
==8393==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8393==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc1e90d000; bottom 0x7fef47124000; size: 0x000cd77e9000 (55155003392)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 17 ahci-test /x86_64/ahci/io/pio/lba28/long/high
PASS 3 test-rcu-simpleq /rcu/qsimpleq/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-rcu-tailq -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-rcu-tailq"
==8399==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 18 ahci-test /x86_64/ahci/io/pio/lba28/short/zero
PASS 1 test-rcu-tailq /rcu/qtailq/single-threaded
==8412==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 test-rcu-tailq /rcu/qtailq/short-few
PASS 19 ahci-test /x86_64/ahci/io/pio/lba28/short/low
==8445==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 20 ahci-test /x86_64/ahci/io/pio/lba28/short/high
PASS 3 test-rcu-tailq /rcu/qtailq/long-many
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-qdist -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qdist"
==8451==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-qdist /qdist/none
PASS 2 test-qdist /qdist/pr
PASS 3 test-qdist /qdist/single/empty
---
PASS 7 test-qdist /qdist/binning/expand
PASS 8 test-qdist /qdist/binning/shrink
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-qht -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qht"
==8451==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffcd1fb1000; bottom 0x7f8bae7fe000; size: 0x0071237b3000 (485926580224)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 21 ahci-test /x86_64/ahci/io/pio/lba48/simple/zero
==8466==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8466==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffd0b06f000; bottom 0x7fd8d85fe000; size: 0x002432a71000 (155468632064)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 22 ahci-test /x86_64/ahci/io/pio/lba48/simple/low
==8472==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8472==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe2c664000; bottom 0x7f11299fe000; size: 0x00ed02c66000 (1017953804288)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 23 ahci-test /x86_64/ahci/io/pio/lba48/simple/high
==8478==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8478==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffdb1ded000; bottom 0x7f37fd1fe000; size: 0x00c5b4bef000 (849140969472)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 24 ahci-test /x86_64/ahci/io/pio/lba48/double/zero
==8484==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8484==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc4f4ff000; bottom 0x7ff9595fe000; size: 0x0002f5f01000 (12716085248)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 25 ahci-test /x86_64/ahci/io/pio/lba48/double/low
==8490==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8490==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffdb07bb000; bottom 0x7ffbc8dfe000; size: 0x0001e79bd000 (8180715520)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 26 ahci-test /x86_64/ahci/io/pio/lba48/double/high
==8496==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8496==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7fff207e2000; bottom 0x7fb6ffdfe000; size: 0x0048209e4000 (309784887296)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 27 ahci-test /x86_64/ahci/io/pio/lba48/long/zero
==8502==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8502==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc7d92d000; bottom 0x7f0b65b7c000; size: 0x00f117db1000 (1035487350784)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 28 ahci-test /x86_64/ahci/io/pio/lba48/long/low
==8508==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8508==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffe6de73000; bottom 0x7fc79a9fe000; size: 0x0036d3475000 (235472900096)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 29 ahci-test /x86_64/ahci/io/pio/lba48/long/high
==8514==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 30 ahci-test /x86_64/ahci/io/pio/lba48/short/zero
==8520==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-qht /qht/mode/default
PASS 31 ahci-test /x86_64/ahci/io/pio/lba48/short/low
PASS 2 test-qht /qht/mode/resize
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-qht-par -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qht-par"
==8526==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 32 ahci-test /x86_64/ahci/io/pio/lba48/short/high
PASS 1 test-qht-par /qht/parallel/2threads-0%updates-1s
==8542==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 33 ahci-test /x86_64/ahci/io/dma/lba28/fragmented
PASS 2 test-qht-par /qht/parallel/2threads-20%updates-1s
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-bitops -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bitops"
==8555==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-bitops /bitops/sextract32
PASS 2 test-bitops /bitops/sextract64
PASS 3 test-bitops /bitops/half_shuffle32
---
PASS 1 check-qom-interface /qom/interface/direct_impl
PASS 2 check-qom-interface /qom/interface/intermediate_impl
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/check-qom-proplist -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="check-qom-proplist"
==8580==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 check-qom-proplist /qom/proplist/createlist
PASS 2 check-qom-proplist /qom/proplist/createv
PASS 3 check-qom-proplist /qom/proplist/createcmdline
---
PASS 4 test-write-threshold /write-threshold/not-trigger
PASS 5 test-write-threshold /write-threshold/trigger
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-crypto-hash -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-hash"
==8607==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-hash /crypto/hash/iov
PASS 2 test-crypto-hash /crypto/hash/alloc
PASS 3 test-crypto-hash /crypto/hash/prealloc
---
PASS 15 test-crypto-secret /crypto/secret/crypt/missingiv
PASS 16 test-crypto-secret /crypto/secret/crypt/badiv
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-crypto-tlscredsx509 -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-tlscredsx509"
==8630==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 37 ahci-test /x86_64/ahci/io/dma/lba28/simple/high
PASS 1 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/perfectserver
PASS 2 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/perfectclient
PASS 3 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca1
==8645==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca2
PASS 38 ahci-test /x86_64/ahci/io/dma/lba28/double/zero
PASS 5 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodca3
PASS 6 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca1
PASS 7 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca2
PASS 8 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/badca3
==8651==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver1
PASS 39 ahci-test /x86_64/ahci/io/dma/lba28/double/low
==8657==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 40 ahci-test /x86_64/ahci/io/dma/lba28/double/high
PASS 10 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver2
==8663==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver3
PASS 41 ahci-test /x86_64/ahci/io/dma/lba28/long/zero
PASS 12 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver4
==8669==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 42 ahci-test /x86_64/ahci/io/dma/lba28/long/low
PASS 13 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver5
PASS 14 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/goodserver6
---
PASS 32 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/inactive1
PASS 33 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/inactive2
PASS 34 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/inactive3
==8675==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 35 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/chain1
PASS 36 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/chain2
PASS 37 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/missingca
---
PASS 39 test-crypto-tlscredsx509 /qcrypto/tlscredsx509/missingclient
PASS 43 ahci-test /x86_64/ahci/io/dma/lba28/long/high
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-crypto-tlssession -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-tlssession"
==8682==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-tlssession /qcrypto/tlssession/psk
PASS 44 ahci-test /x86_64/ahci/io/dma/lba28/short/zero
PASS 2 test-crypto-tlssession /qcrypto/tlssession/basicca
==8692==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 test-crypto-tlssession /qcrypto/tlssession/differentca
PASS 45 ahci-test /x86_64/ahci/io/dma/lba28/short/low
==8698==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 test-crypto-tlssession /qcrypto/tlssession/altname1
PASS 46 ahci-test /x86_64/ahci/io/dma/lba28/short/high
==8704==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 test-crypto-tlssession /qcrypto/tlssession/altname2
PASS 47 ahci-test /x86_64/ahci/io/dma/lba48/simple/zero
PASS 6 test-crypto-tlssession /qcrypto/tlssession/altname3
PASS 7 test-crypto-tlssession /qcrypto/tlssession/altname4
==8710==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 test-crypto-tlssession /qcrypto/tlssession/altname5
PASS 48 ahci-test /x86_64/ahci/io/dma/lba48/simple/low
PASS 9 test-crypto-tlssession /qcrypto/tlssession/altname6
==8716==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 49 ahci-test /x86_64/ahci/io/dma/lba48/simple/high
PASS 10 test-crypto-tlssession /qcrypto/tlssession/wildcard1
==8722==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-crypto-tlssession /qcrypto/tlssession/wildcard2
PASS 12 test-crypto-tlssession /qcrypto/tlssession/wildcard3
PASS 50 ahci-test /x86_64/ahci/io/dma/lba48/double/zero
==8729==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 51 ahci-test /x86_64/ahci/io/dma/lba48/double/low
==8735==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 13 test-crypto-tlssession /qcrypto/tlssession/wildcard4
PASS 52 ahci-test /x86_64/ahci/io/dma/lba48/double/high
==8741==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 14 test-crypto-tlssession /qcrypto/tlssession/wildcard5
PASS 15 test-crypto-tlssession /qcrypto/tlssession/wildcard6
PASS 16 test-crypto-tlssession /qcrypto/tlssession/cachain
PASS 53 ahci-test /x86_64/ahci/io/dma/lba48/long/zero
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-qga -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-qga"
==8748==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-qga /qga/sync-delimited
PASS 2 test-qga /qga/sync
PASS 3 test-qga /qga/ping
---
PASS 16 test-qga /qga/invalid-args
PASS 17 test-qga /qga/fsfreeze-status
PASS 54 ahci-test /x86_64/ahci/io/dma/lba48/long/low
==8760==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 18 test-qga /qga/blacklist
PASS 19 test-qga /qga/config
PASS 20 test-qga /qga/guest-exec
PASS 21 test-qga /qga/guest-exec-invalid
PASS 55 ahci-test /x86_64/ahci/io/dma/lba48/long/high
==8773==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 22 test-qga /qga/guest-get-osinfo
PASS 23 test-qga /qga/guest-get-host-name
PASS 24 test-qga /qga/guest-get-timezone
---
PASS 56 ahci-test /x86_64/ahci/io/dma/lba48/short/zero
PASS 1 test-util-filemonitor /util/filemonitor
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-util-sockets -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-util-sockets"
==8790==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-util-sockets /util/socket/is-socket/bad
PASS 2 test-util-sockets /util/socket/is-socket/good
PASS 3 test-util-sockets /socket/fd-pass/name/good
---
PASS 4 test-authz-listfile /auth/list/explicit/deny
PASS 5 test-authz-listfile /auth/list/explicit/allow
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-io-task -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-io-task"
==8818==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-io-task /crypto/task/complete
PASS 2 test-io-task /crypto/task/datafree
PASS 3 test-io-task /crypto/task/failure
---
PASS 5 test-io-channel-file /io/channel/pipe/async
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-io-channel-tls -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-io-channel-tls"
PASS 58 ahci-test /x86_64/ahci/io/dma/lba48/short/high
==8885==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-io-channel-tls /qio/channel/tls/basic
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-io-channel-command -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-io-channel-command"
PASS 1 test-io-channel-command /io/channel/command/fifo/sync
---
PASS 17 test-crypto-pbkdf /crypto/pbkdf/nonrfc/sha384/iter1200
PASS 18 test-crypto-pbkdf /crypto/pbkdf/nonrfc/ripemd160/iter1200
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-crypto-ivgen -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-crypto-ivgen"
==8906==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-crypto-ivgen /crypto/ivgen/plain/1
PASS 2 test-crypto-ivgen /crypto/ivgen/plain/1f2e3d4c
PASS 3 test-crypto-ivgen /crypto/ivgen/plain/1f2e3d4c5b6a7988
---
PASS 1 test-logging /logging/parse_range
PASS 2 test-logging /logging/parse_path
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-replication -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-replication"
==8947==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8945==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 test-replication /replication/primary/read
PASS 2 test-replication /replication/primary/write
PASS 61 ahci-test /x86_64/ahci/flush/simple
==8956==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 test-replication /replication/primary/start
PASS 4 test-replication /replication/primary/stop
PASS 5 test-replication /replication/primary/do_checkpoint
PASS 6 test-replication /replication/primary/get_error_all
PASS 62 ahci-test /x86_64/ahci/flush/retry
==8962==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 test-replication /replication/secondary/read
==8967==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 test-replication /replication/secondary/write
PASS 63 ahci-test /x86_64/ahci/flush/migrate
==8976==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8981==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==8947==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffc17022000; bottom 0x7fa4f2cfc000; size: 0x005724326000 (374269435904)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 9 test-replication /replication/secondary/start
PASS 64 ahci-test /x86_64/ahci/migrate/sanity
==9008==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9013==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 test-replication /replication/secondary/stop
PASS 65 ahci-test /x86_64/ahci/migrate/dma/simple
==9022==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9027==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 test-replication /replication/secondary/do_checkpoint
PASS 12 test-replication /replication/secondary/get_error_all
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} tests/test-bufferiszero -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-bufferiszero"
PASS 66 ahci-test /x86_64/ahci/migrate/dma/halted
==9040==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9045==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 67 ahci-test /x86_64/ahci/migrate/ncq/simple
==9054==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9059==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 68 ahci-test /x86_64/ahci/migrate/ncq/halted
==9068==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 69 ahci-test /x86_64/ahci/cdrom/eject
==9073==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 70 ahci-test /x86_64/ahci/cdrom/dma/single
==9079==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 71 ahci-test /x86_64/ahci/cdrom/dma/multi
==9085==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 72 ahci-test /x86_64/ahci/cdrom/pio/single
==9091==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
==9091==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 0x7ffdd7f93000; bottom 0x7f75251fe000; size: 0x0088b2d95000 (587116138496)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 73 ahci-test /x86_64/ahci/cdrom/pio/multi
==9097==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 74 ahci-test /x86_64/ahci/cdrom/pio/bcl
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/hd-geo-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="hd-geo-test"
PASS 1 hd-geo-test /x86_64/hd-geo/ide/none
==9111==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 hd-geo-test /x86_64/hd-geo/ide/drive/cd_0
==9117==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/blank
==9123==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/lba
==9129==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 hd-geo-test /x86_64/hd-geo/ide/drive/mbr/chs
==9135==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 hd-geo-test /x86_64/hd-geo/ide/device/mbr/blank
==9141==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 hd-geo-test /x86_64/hd-geo/ide/device/mbr/lba
==9147==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 hd-geo-test /x86_64/hd-geo/ide/device/mbr/chs
==9153==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 hd-geo-test /x86_64/hd-geo/ide/device/user/chs
==9158==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 hd-geo-test /x86_64/hd-geo/ide/device/user/chst
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/boot-order-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="boot-order-test"
PASS 1 test-bufferiszero /cutils/bufferiszero
---
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9243==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 1 bios-tables-test /x86_64/acpi/piix4
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9249==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 2 bios-tables-test /x86_64/acpi/q35
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9255==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 3 bios-tables-test /x86_64/acpi/piix4/bridge
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9261==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 4 bios-tables-test /x86_64/acpi/piix4/ipmi
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9267==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 5 bios-tables-test /x86_64/acpi/piix4/cpuhp
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9274==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 6 bios-tables-test /x86_64/acpi/piix4/memhp
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9280==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 7 bios-tables-test /x86_64/acpi/piix4/numamem
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9286==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 8 bios-tables-test /x86_64/acpi/piix4/dimmpxm
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9295==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 9 bios-tables-test /x86_64/acpi/q35/bridge
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9301==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 10 bios-tables-test /x86_64/acpi/q35/mmio64
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9307==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 11 bios-tables-test /x86_64/acpi/q35/ipmi
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9313==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 12 bios-tables-test /x86_64/acpi/q35/cpuhp
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator
==9320==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases!
PASS 13 bios-tables-test /x86_64/acpi/q35/memhp Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==9326==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 14 bios-tables-test /x86_64/acpi/q35/numamem Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==9332==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 15 bios-tables-test /x86_64/acpi/q35/dimmpxm MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/boot-serial-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="boot-serial-test" PASS 1 boot-serial-test /x86_64/boot-serial/isapc --- PASS 1 i440fx-test /x86_64/i440fx/defaults PASS 2 i440fx-test /x86_64/i440fx/pam PASS 3 i440fx-test /x86_64/i440fx/firmware/bios ==9416==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! 
PASS 4 i440fx-test /x86_64/i440fx/firmware/pflash MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/fw_cfg-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="fw_cfg-test" PASS 1 fw_cfg-test /x86_64/fw_cfg/signature --- MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/drive_del-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="drive_del-test" PASS 1 drive_del-test /x86_64/drive_del/without-dev PASS 2 drive_del-test /x86_64/drive_del/after_failed_device_add ==9504==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 3 drive_del-test /x86_64/blockdev/drive_del_device_del MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/wdt_ib700-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="wdt_ib700-test" PASS 1 wdt_ib700-test /x86_64/wdt_ib700/pause --- PASS 1 usb-hcd-uhci-test /x86_64/uhci/pci/init PASS 2 usb-hcd-uhci-test /x86_64/uhci/pci/port1 PASS 3 usb-hcd-uhci-test /x86_64/uhci/pci/hotplug ==9699==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 4 usb-hcd-uhci-test /x86_64/uhci/pci/hotplug/usb-storage MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/usb-hcd-xhci-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="usb-hcd-xhci-test" PASS 1 usb-hcd-xhci-test /x86_64/xhci/pci/init PASS 2 usb-hcd-xhci-test /x86_64/xhci/pci/hotplug ==9708==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! 
PASS 3 usb-hcd-xhci-test /x86_64/xhci/pci/hotplug/usb-uas PASS 4 usb-hcd-xhci-test /x86_64/xhci/pci/hotplug/usb-ccid MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/cpu-plug-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="cpu-plug-test" --- Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==9814==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 1 vmgenid-test /x86_64/vmgenid/vmgenid/set-guid Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==9820==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 2 vmgenid-test /x86_64/vmgenid/vmgenid/set-guid-auto Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==9826==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! 
PASS 3 vmgenid-test /x86_64/vmgenid/vmgenid/query-monitor MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/tpm-crb-swtpm-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="tpm-crb-swtpm-test" SKIP 1 tpm-crb-swtpm-test /x86_64/tpm/crb-swtpm/test # SKIP swtpm not in PATH or missing --tpm2 support --- Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==9931==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==9936==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 3 migration-test /x86_64/migration/fd_proto Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==9944==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==9949==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! 
PASS 4 migration-test /x86_64/migration/postcopy/unix PASS 5 migration-test /x86_64/migration/postcopy/recovery Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==9979==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==9984==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 6 migration-test /x86_64/migration/precopy/unix Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==9993==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==9998==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 7 migration-test /x86_64/migration/precopy/tcp Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==10007==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! 
Could not access KVM kernel module: No such file or directory qemu-system-x86_64: failed to initialize KVM: No such file or directory qemu-system-x86_64: Back to tcg accelerator ==10012==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 8 migration-test /x86_64/migration/xbzrle/unix MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/test-x86-cpuid-compat -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="test-x86-cpuid-compat" PASS 1 test-x86-cpuid-compat /x86/cpuid/parsing-plus-minus --- PASS 6 numa-test /x86_64/numa/pc/dynamic/cpu MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img tests/qmp-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl --test-name="qmp-test" PASS 1 qmp-test /x86_64/qmp/protocol ==10341==WARNING: ASan doesn't fully support makecontext/swapcontext functions and may produce false positives in some cases! PASS 2 qmp-test /x86_64/qmp/oob PASS 3 qmp-test /x86_64/qmp/preconfig PASS 4 qmp-test /x86_64/qmp/missing-any-arg --- PASS 5 device-introspect-test /x86_64/device/introspect/abstract-interfaces ================================================================= ==10589==ERROR: LeakSanitizer: detected memory leaks Direct leak of 32 byte(s) in 1 object(s) allocated from: #0 0x561de4fecb2e in calloc (/tmp/qemu-test/build/x86_64-softmmu/qemu-system-x86_64+0x19fdb2e) --- SUMMARY: AddressSanitizer: 64 byte(s) leaked in 2 allocation(s). /tmp/qemu-test/src/tests/libqtest.c:137: kill_qemu() tried to terminate QEMU process but encountered exit status 1 ERROR - too few tests run (expected 6, got 5) make: *** [/tmp/qemu-test/src/tests/Makefile.include:894: check-qtest-x86_64] Error 1 make: *** Waiting for unfinished jobs.... 
Traceback (most recent call last): The full log is available at http://patchew.org/logs/20190702121106.28374-1-slp@redhat.com/testing.asan/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
On Tue, 2 Jul 2019 at 13:14, Sergio Lopez <slp@redhat.com> wrote: > > Microvm is a machine type inspired by both NEMU and Firecracker, and > constructed after the machine model implemented by the latter. > > It's main purpose is providing users a KVM-only machine type with fast > boot times, minimal attack surface (measured as the number of IO ports > and MMIO regions exposed to the Guest) and small footprint (specially > when combined with the ongoing QEMU modularization effort). > > Normally, other than the device support provided by KVM itself, > microvm only supports virtio-mmio devices. Microvm also includes a > legacy mode, which adds an ISA bus with a 16550A serial port, useful > for being able to see the early boot kernel messages. Could we use virtio-pci instead of virtio-mmio? virtio-mmio is a bit deprecated and tends not to support all the features that virtio-pci does. It was introduced mostly as a stopgap while we didn't have pci support in the aarch64 virt machine, and remains for legacy "we don't like to break existing working setups" rather than as a recommended config for new systems. thanks -- PMM
Peter Maydell <peter.maydell@linaro.org> writes: > On Tue, 2 Jul 2019 at 13:14, Sergio Lopez <slp@redhat.com> wrote: >> >> Microvm is a machine type inspired by both NEMU and Firecracker, and >> constructed after the machine model implemented by the latter. >> >> It's main purpose is providing users a KVM-only machine type with fast >> boot times, minimal attack surface (measured as the number of IO ports >> and MMIO regions exposed to the Guest) and small footprint (specially >> when combined with the ongoing QEMU modularization effort). >> >> Normally, other than the device support provided by KVM itself, >> microvm only supports virtio-mmio devices. Microvm also includes a >> legacy mode, which adds an ISA bus with a 16550A serial port, useful >> for being able to see the early boot kernel messages. > > Could we use virtio-pci instead of virtio-mmio? virtio-mmio is > a bit deprecated and tends not to support all the features that > virtio-pci does. It was introduced mostly as a stopgap while we > didn't have pci support in the aarch64 virt machine, and remains > for legacy "we don't like to break existing working setups" rather > than as a recommended config for new systems. Using virtio-pci implies keeping PCI and ACPI support, defeating a significant part of microvm's purpose. What are the issues with the current state of virtio-mmio? Is there a way I can help to improve the situation? Sergio.
On Tue, 2 Jul 2019 at 18:34, Sergio Lopez <slp@redhat.com> wrote: > Peter Maydell <peter.maydell@linaro.org> writes: > > Could we use virtio-pci instead of virtio-mmio? virtio-mmio is > > a bit deprecated and tends not to support all the features that > > virtio-pci does. It was introduced mostly as a stopgap while we > > didn't have pci support in the aarch64 virt machine, and remains > > for legacy "we don't like to break existing working setups" rather > > than as a recommended config for new systems. > > Using virtio-pci implies keeping PCI and ACPI support, defeating a > significant part of microvm's purpose. > > What are the issues with the current state of virtio-mmio? Is there a > way I can help to improve the situation? Off the top of my head: * limitations on numbers of devices * no hotplug support * unlike PCI, it's not probeable, so you have to tell the guest where all the transports are using device tree or some similar mechanism * you need one IRQ line per transport, which restricts how many you can have * it's only virtio-0.9, it doesn't support any of the new virtio-1.0 functionality * it is broadly not really maintained in QEMU (and I think not really in the kernel either? not sure), because we'd rather not have to maintain two mechanisms for doing virtio when virtio-pci is clearly better than virtio-mmio thanks -- PMM
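The "not probeable" point in the list above is why the cover letter relies on the kernel command line to locate the transports. As a rough sketch, the Linux `virtio_mmio.device=<size>@<baseaddr>:<irq>` parameter can be generated for the eight regions shown in the cover letter's memory map. The base address, region size and count come from that map; `FIRST_IRQ` and the helper name are assumptions made for illustration and are not taken from the microvm patches.

```python
# Sketch: build Linux "virtio_mmio.device=" parameters for non-probeable
# virtio-mmio transports. Base, size and count mirror the memory map in the
# cover letter; FIRST_IRQ is a hypothetical value, not microvm's actual one.

VIRTIO_MMIO_BASE = 0xD0000000   # first transport in the cover letter's map
VIRTIO_MMIO_SIZE = 0x200        # each region spans 512 bytes
NUM_TRANSPORTS = 8              # eight regions are listed
FIRST_IRQ = 5                   # assumption, for illustration only


def virtio_mmio_cmdline(base, size, count, first_irq):
    """Return one <size>@<baseaddr>:<irq> descriptor per transport."""
    params = []
    for i in range(count):
        addr = base + i * size
        irq = first_irq + i     # one IRQ line per transport
        params.append(f"virtio_mmio.device={size}@{addr:#x}:{irq}")
    return params


cmdline = virtio_mmio_cmdline(VIRTIO_MMIO_BASE, VIRTIO_MMIO_SIZE,
                              NUM_TRANSPORTS, FIRST_IRQ)
print(" ".join(cmdline))
```

The one-IRQ-per-transport loop also illustrates the scaling limit mentioned in the list: every extra transport consumes another interrupt line. The guest kernel needs to be built with CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES for these parameters to be honoured.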
On Tue, Jul 02, 2019 at 07:04:15PM +0100, Peter Maydell wrote: > On Tue, 2 Jul 2019 at 18:34, Sergio Lopez <slp@redhat.com> wrote: > > Peter Maydell <peter.maydell@linaro.org> writes: > > > Could we use virtio-pci instead of virtio-mmio? virtio-mmio is > > > a bit deprecated and tends not to support all the features that > > > virtio-pci does. It was introduced mostly as a stopgap while we > > > didn't have pci support in the aarch64 virt machine, and remains > > > for legacy "we don't like to break existing working setups" rather > > > than as a recommended config for new systems. > > > > Using virtio-pci implies keeping PCI and ACPI support, defeating a > > significant part of microvm's purpose. > > > > What are the issues with the current state of virtio-mmio? Is there a > > way I can help to improve the situation? > > Off the top of my head: > * limitations on numbers of devices > * no hotplug support > * unlike PCI, it's not probeable, so you have to tell the > guest where all the transports are using device tree or > some similar mechanism > * you need one IRQ line per transport, which restricts how > many you can have > * it's only virtio-0.9, it doesn't support any of the new > virtio-1.0 functionality > * it is broadly not really maintained in QEMU (and I think > not really in the kernel either? not sure), because we'd > rather not have to maintain two mechanisms for doing virtio > when virtio-pci is clearly better than virtio-mmio Some of these are design issues, but others can be improved with a bit of work. As for the maintenance burden, I volunteer myself to help with that, so it won't have an impact on other developers and/or projects. Sergio.
On Wed, Jul 03, 2019 at 12:04:00AM +0200, Sergio Lopez wrote: > On Tue, Jul 02, 2019 at 07:04:15PM +0100, Peter Maydell wrote: > > On Tue, 2 Jul 2019 at 18:34, Sergio Lopez <slp@redhat.com> wrote: > > > Peter Maydell <peter.maydell@linaro.org> writes: > > > > Could we use virtio-pci instead of virtio-mmio? virtio-mmio is > > > > a bit deprecated and tends not to support all the features that > > > > virtio-pci does. It was introduced mostly as a stopgap while we > > > > didn't have pci support in the aarch64 virt machine, and remains > > > > for legacy "we don't like to break existing working setups" rather > > > > than as a recommended config for new systems. > > > > > > Using virtio-pci implies keeping PCI and ACPI support, defeating a > > > significant part of microvm's purpose. > > > > > > What are the issues with the current state of virtio-mmio? Is there a > > > way I can help to improve the situation? > > > > Off the top of my head: > > * limitations on numbers of devices > > * no hotplug support > > * unlike PCI, it's not probeable, so you have to tell the > > guest where all the transports are using device tree or > > some similar mechanism > > * you need one IRQ line per transport, which restricts how > > many you can have > > * it's only virtio-0.9, it doesn't support any of the new > > virtio-1.0 functionality > > * it is broadly not really maintained in QEMU (and I think > > not really in the kernel either? not sure), because we'd > > rather not have to maintain two mechanisms for doing virtio > > when virtio-pci is clearly better than virtio-mmio > > Some of these are design issues, but others can be improved with a bit > of work. > > As for the maintenance burden, I volunteer myself to help with that, so > it won't have an impact on other developers and/or projects. > > Sergio. OK so please start with adding virtio 1 support. Guest bits have been ready for years now. -- MST
On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote: > OK so please start with adding virtio 1 support. Guest bits > have been ready for years now. I'd still rather we just used pci virtio. If pci isn't fast enough at startup, do something to make it faster... thanks -- PMM
On Thu, Jul 25, 2019 at 11:05:05AM +0100, Peter Maydell wrote: > On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote: > > OK so please start with adding virtio 1 support. Guest bits > > have been ready for years now. > > I'd still rather we just used pci virtio. If pci isn't > fast enough at startup, do something to make it faster... > > thanks > -- PMM Oh that's putting microvm aside - if we have a maintainer for virtio mmio that's great because it does need a maintainer, and virtio 1 would be the thing to fix before adding features ;) -- MST
Michael S. Tsirkin <mst@redhat.com> writes: > On Thu, Jul 25, 2019 at 11:05:05AM +0100, Peter Maydell wrote: >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote: >> > OK so please start with adding virtio 1 support. Guest bits >> > have been ready for years now. >> >> I'd still rather we just used pci virtio. If pci isn't >> fast enough at startup, do something to make it faster... >> >> thanks >> -- PMM > > Oh that's putting microvm aside - if we have a maintainer for > virtio mmio that's great because it does need a maintainer, > and virtio 1 would be the thing to fix before adding features ;) There seems to be a general consensus that virtio-mmio needs some care, and looking at the specs, implementing virtio-mmio v2/virtio v1 shouldn't be too time consuming, so I'm going to give it a try. Cheers, Sergio.
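For readers unfamiliar with the distinction between "virtio-mmio v2" and the legacy interface: a driver tells them apart through the Version register at offset 0x004 of the transport, right after the MagicValue register. The sketch below decodes the first four 32-bit registers from a byte buffer. The register layout follows the virtio specification; the function name and the sample register dump are hypothetical.

```python
import struct

VIRTIO_MMIO_MAGIC = 0x74726976  # little-endian ASCII "virt"


def describe_transport(header: bytes) -> str:
    """Decode MagicValue, Version, DeviceID and VendorID (offsets 0x0-0xc)."""
    magic, version, device_id, _vendor_id = struct.unpack_from("<4I", header, 0)
    if magic != VIRTIO_MMIO_MAGIC:
        return "not a virtio-mmio transport"
    if device_id == 0:
        # DeviceID 0 marks a placeholder transport with no device behind it.
        return "transport present, no device attached"
    kind = {1: "legacy (pre-1.0)", 2: "virtio 1.0"}.get(version, "unknown version")
    return f"device id {device_id}, {kind}"


# Hypothetical register dump: version 2 transport, device id 2 (block device).
sample = struct.pack("<4I", VIRTIO_MMIO_MAGIC, 2, 2, 0x554D4551)
print(describe_transport(sample))
```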
Peter Maydell <peter.maydell@linaro.org> writes: > On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote: >> OK so please start with adding virtio 1 support. Guest bits >> have been ready for years now. > > I'd still rather we just used pci virtio. If pci isn't > fast enough at startup, do something to make it faster... Actually, removing PCI (and ACPI) is one of the main ways microvm has to reduce not only boot time, but also the exposed surface and the general footprint. I think we need to discuss and settle whether using virtio-mmio (even if maintained and upgraded to virtio 1) for a new machine type is acceptable or not. Because if it isn't, we should probably just ditch the whole microvm idea and move to something else. Sergio.
On 25/07/19 12:42, Sergio Lopez wrote: > > Peter Maydell <peter.maydell@linaro.org> writes: > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote: >>> OK so please start with adding virtio 1 support. Guest bits >>> have been ready for years now. >> >> I'd still rather we just used pci virtio. If pci isn't >> fast enough at startup, do something to make it faster... > > Actually, removing PCI (and ACPI), is one of the main ways microvm has > to reduce not only boot time, but also the exposed surface and the > general footprint. > > I think we need to discuss and settle whether using virtio-mmio (even if > maintained and upgraded to virtio 1) for a new machine type is > acceptable or not. Because if it isn't, we should probably just ditch > the whole microvm idea and move to something else. I agree. IMNSHO the reduced attack surface from removing PCI is (mostly) security theater, however the boot time numbers that Sergio showed for microvm are quite extreme and I don't think there is any hope of getting even close with a PCI-based virtual machine. So I'd even go a step further: if using virtio-mmio for a new machine type is not acceptable, we should admit that boot time optimization in QEMU is basically as good as it can get---low-hanging fruit has been picked with PVH and mmap is the logical next step, but all that's left is optimizing the guest or something else. I must say that -M microvm took a while to grow on me, but I think it's a great example of how the infrastructure provided by QEMU provides useful features for free, even for the simplest emulated hardware. For example, in v3 microvm could only boot from PVH kernels, but the next firmware-enabled version reuses more of the PC code and thus supports all of vmlinuz, multiboot and PVH. Again: Sergio has been very receptive to feedback and has provided numbers to back the design choices, and we should reciprocate or at least be very clear on the constraints. Paolo
On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > On 25/07/19 12:42, Sergio Lopez wrote: > > Peter Maydell <peter.maydell@linaro.org> writes: > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote: > >>> OK so please start with adding virtio 1 support. Guest bits > >>> have been ready for years now. > >> > >> I'd still rather we just used pci virtio. If pci isn't > >> fast enough at startup, do something to make it faster... > > > > Actually, removing PCI (and ACPI), is one of the main ways microvm has > > to reduce not only boot time, but also the exposed surface and the > > general footprint. > > > > I think we need to discuss and settle whether using virtio-mmio (even if > > maintained and upgraded to virtio 1) for a new machine type is > > acceptable or not. Because if it isn't, we should probably just ditch > > the whole microvm idea and move to something else. > > I agree. IMNSHO the reduced attack surface from removing PCI is > (mostly) security theater, however the boot time numbers that Sergio > showed for microvm are quite extreme and I don't think there is any hope > of getting even close with a PCI-based virtual machine. > > So I'd even go a step further: if using virtio-mmio for a new machine > type is not acceptable, we should admit that boot time optimization in > QEMU is basically as good as it can get---low-hanging fruit has been > picked with PVH and mmap is the logical next step, but all that's left > is optimizing the guest or something else. I haven't seen enough analysis to declare boot time optimization done. QEMU startup can be profiled and improved. The numbers show that removing PCI and ACPI makes things faster but this doesn't justify removing them. Understanding of why they are slow is what justifies removing them. Otherwise it could just be a misconfiguration, inefficient implementation, etc and we've seen there is low-hanging fruit. How much time is spent doing PCI initialization? 
Is the vmexit pattern for PCI initialization as good as the hardware interface allows? Without an analysis of why things are slow it's not possible to come to an informed decision. Stefan
On Thu, Jul 25, 2019 at 01:01:29PM +0100, Stefan Hajnoczi wrote: > On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 25/07/19 12:42, Sergio Lopez wrote: > > > Peter Maydell <peter.maydell@linaro.org> writes: > > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote: > > >>> OK so please start with adding virtio 1 support. Guest bits > > >>> have been ready for years now. > > >> > > >> I'd still rather we just used pci virtio. If pci isn't > > >> fast enough at startup, do something to make it faster... > > > > > > Actually, removing PCI (and ACPI), is one of the main ways microvm has > > > to reduce not only boot time, but also the exposed surface and the > > > general footprint. > > > > > > I think we need to discuss and settle whether using virtio-mmio (even if > > > maintained and upgraded to virtio 1) for a new machine type is > > > acceptable or not. Because if it isn't, we should probably just ditch > > > the whole microvm idea and move to something else. > > > > I agree. IMNSHO the reduced attack surface from removing PCI is > > (mostly) security theater, however the boot time numbers that Sergio > > showed for microvm are quite extreme and I don't think there is any hope > > of getting even close with a PCI-based virtual machine. > > > > So I'd even go a step further: if using virtio-mmio for a new machine > > type is not acceptable, we should admit that boot time optimization in > > QEMU is basically as good as it can get---low-hanging fruit has been > > picked with PVH and mmap is the logical next step, but all that's left > > is optimizing the guest or something else. > > I haven't seen enough analysis to declare boot time optimization done. > QEMU startup can be profiled and improved. Right, and that will always stay the case. OTOH imho microvm is non-intrusive enough, and small enough, that we'd just put it upstream after addressing low-level comments. 
This will allow more contributions from people interested in boot time. With no cross-version migration support, or maybe migration disabled completely, the maintenance burden should not be too high. Not everyone wants to hack on pci/acpi specifically. > The numbers show that removing PCI and ACPI makes things faster but > this doesn't justify removing them. Understanding of why they are > slow is what justifies removing them. Otherwise it could just be a > misconfiguration, inefficient implementation, etc and we've seen there > is low-hanging fruit. > > How much time is spent doing PCI initialization? Is the vmexit > pattern for PCI initialization as good as the hardware interface > allows? I know that in the BIOS we have wanted to use memory-mapped PCI config accesses for a very long time now. This makes each vmexit slower but cuts the number of exits in half. It only affects SeaBIOS though. > Without an analysis of why things are slow it's not possible to come to > an informed decision. > > Stefan
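The "cuts the number of exits by half" remark in the reply above follows from how the two PCI configuration mechanisms are addressed. With the legacy port I/O mechanism, each register read is a write of the address to port 0xCF8 followed by a read of the data from port 0xCFC; with memory-mapped (ECAM) access it is a single load. The sketch below shows both address encodings and a simple exit count. It assumes every port or MMIO access traps exactly once, which is a simplification, and the function names are illustrative.

```python
def cf8_address(bus, dev, fn, reg):
    """Value written to port 0xCF8 before reading data from port 0xCFC."""
    return 0x80000000 | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFC)


def ecam_offset(bus, dev, fn, reg):
    """Byte offset into the memory-mapped config window (4 KiB per function)."""
    return (bus << 20) | (dev << 15) | (fn << 12) | (reg & 0xFFF)


def exits_for_reads(n_reads, memory_mapped):
    """Trapped accesses for n_reads config reads, assuming one exit per access."""
    return n_reads * (1 if memory_mapped else 2)


# Reading register 0x10 (BAR0) of device 1, function 0 on bus 0:
print(hex(cf8_address(0, 1, 0, 0x10)), hex(ecam_offset(0, 1, 0, 0x10)))
print(exits_for_reads(16, memory_mapped=False), exits_for_reads(16, memory_mapped=True))
```

Each memory-mapped exit may still cost more than a port I/O exit, as the message notes, so halving the count does not necessarily halve the time.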
On Thu, Jul 25, 2019 at 1:10 PM Michael S. Tsirkin <mst@redhat.com> wrote: > On Thu, Jul 25, 2019 at 01:01:29PM +0100, Stefan Hajnoczi wrote: > > On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > > > On 25/07/19 12:42, Sergio Lopez wrote: > > > > Peter Maydell <peter.maydell@linaro.org> writes: > > > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote: > > > >>> OK so please start with adding virtio 1 support. Guest bits > > > >>> have been ready for years now. > > > >> > > > >> I'd still rather we just used pci virtio. If pci isn't > > > >> fast enough at startup, do something to make it faster... > > > > > > > > Actually, removing PCI (and ACPI), is one of the main ways microvm has > > > > to reduce not only boot time, but also the exposed surface and the > > > > general footprint. > > > > > > > > I think we need to discuss and settle whether using virtio-mmio (even if > > > > maintained and upgraded to virtio 1) for a new machine type is > > > > acceptable or not. Because if it isn't, we should probably just ditch > > > > the whole microvm idea and move to something else. > > > > > > I agree. IMNSHO the reduced attack surface from removing PCI is > > > (mostly) security theater, however the boot time numbers that Sergio > > > showed for microvm are quite extreme and I don't think there is any hope > > > of getting even close with a PCI-based virtual machine. > > > > > > So I'd even go a step further: if using virtio-mmio for a new machine > > > type is not acceptable, we should admit that boot time optimization in > > > QEMU is basically as good as it can get---low-hanging fruit has been > > > picked with PVH and mmap is the logical next step, but all that's left > > > is optimizing the guest or something else. > > > > I haven't seen enough analysis to declare boot time optimization done. > > QEMU startup can be profiled and improved. > > Right, and that will always stay the case. 
The microvm design has a premise and it can be answered definitively through performance analysis. If I had to explain to someone why PCI or ACPI significantly slows things down, I couldn't honestly do so. I say significantly because PCI init definitely requires more vmexits but can it be a small number? For ACPI I have no idea why it would consume significant amounts of time. Until we have this knowledge, the premise of microvm is unproven and merging it would be premature because maybe we can get into the same ballpark by optimizing existing code. I'm sorry for being a pain. I actually think the analysis will support microvm, but it still needs to be done in order to justify it. Stefan
On Thu, Jul 25, 2019 at 02:26:12PM +0100, Stefan Hajnoczi wrote: > On Thu, Jul 25, 2019 at 1:10 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Jul 25, 2019 at 01:01:29PM +0100, Stefan Hajnoczi wrote: > > > On Thu, Jul 25, 2019 at 12:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote: > > > > On 25/07/19 12:42, Sergio Lopez wrote: > > > > > Peter Maydell <peter.maydell@linaro.org> writes: > > > > >> On Thu, 25 Jul 2019 at 10:59, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > >>> OK so please start with adding virtio 1 support. Guest bits > > > > >>> have been ready for years now. > > > > >> > > > > >> I'd still rather we just used pci virtio. If pci isn't > > > > >> fast enough at startup, do something to make it faster... > > > > > > > > > > Actually, removing PCI (and ACPI), is one of the main ways microvm has > > > > > to reduce not only boot time, but also the exposed surface and the > > > > > general footprint. > > > > > > > > > > I think we need to discuss and settle whether using virtio-mmio (even if > > > > > maintained and upgraded to virtio 1) for a new machine type is > > > > > acceptable or not. Because if it isn't, we should probably just ditch > > > > > the whole microvm idea and move to something else. > > > > > > > > I agree. IMNSHO the reduced attack surface from removing PCI is > > > > (mostly) security theater, however the boot time numbers that Sergio > > > > showed for microvm are quite extreme and I don't think there is any hope > > > > of getting even close with a PCI-based virtual machine. > > > > > > > > So I'd even go a step further: if using virtio-mmio for a new machine > > > > type is not acceptable, we should admit that boot time optimization in > > > > QEMU is basically as good as it can get---low-hanging fruit has been > > > > picked with PVH and mmap is the logical next step, but all that's left > > > > is optimizing the guest or something else. 
> > > > > > I haven't seen enough analysis to declare boot time optimization done. > > > QEMU startup can be profiled and improved. > > > > Right, and that will always stay the case. > > The microvm design has a premise and it can be answered definitively > through performance analysis. > > If I had to explain to someone why PCI or ACPI significantly slows > things down, I couldn't honestly do so. well with pci each device describes itself. you read this description dword by dword normally. typical description is 20-50 words. if both bios and linux do this, that's twice the amount. bios also uses two vmexits for each access. there's also the resource allocation game. I would say up to 200 exits per device is reasonable. > I say significantly because > PCI init definitely requires more vmexits but can it be a small > number? each bus is scanned for devices. 32 accesses, 256 bus numbers (that's the lastbus thing). Paolo posted a hack just for the root bus but whenever we have a bridge the problem will just re-surface. pcie is actually link based so downstream buses do not need to be scanned outside device 0 unless we see a multifunction bit set. I don't think linux implements this optimization atm. But the full scan is still needed for internal buses. > For ACPI I have no idea why it would consume significant > amounts of time. me neither. I suspect it's not vmexit related at all. Is the ACPI driver in linux just slow? It's not been designed to be on any data path... I'd love to know. I don't feel it's fair to ask someone interested in writing new performant code to necessarily optimize old non-performant code. > Until we have this knowledge, the premise of microvm is unproven and > merging it would be premature because maybe we can get into the same > ballpark by optimizing existing code. maybe but who is working on this right now? 
If it's possible to make PC faster but not enough people know how to do it, and enough people know how to make microvm faster, then it does not matter what's possible in theory. > > I'm sorry for being a pain. I actually think the analysis will > support microvm, but it still needs to be done in order to justify it. > > Stefan At some level it would be great to have someone do detailed performance profiling. But it is a lot of work, which also needs to be justified given there's working code, and it's not bad code at that. Yes speeding up PC would be nice but if everyone's gut feeling is it won't get us what microvm is trying to achieve, why spend cycles making sure? -- MST
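To put rough numbers on the estimate above, here is a minimal sketch of the arithmetic. It assumes the legacy 0xcf8/0xcfc configuration mechanism for the firmware (one port write to select the register plus one port access for the data, hence two exits per access) and, as a labeled assumption, a single exit per access for the kernel. The function names are made up for illustration and are not taken from QEMU, SeaBIOS or Linux.

```c
#include <assert.h>
#include <stdint.h>

/* Dword written to port 0xcf8 to select bus/device/function/register;
 * the data is then transferred through port 0xcfc. */
uint32_t pci_cfg_addr(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xfcu);
}

/* Exits for one agent reading `dwords` config dwords at a given cost. */
unsigned cfg_read_exits(unsigned dwords, unsigned exits_per_access)
{
    return dwords * exits_per_access;
}

/* Firmware at 2 exits per access, kernel at an assumed 1 exit per access.
 * Resource allocation (BAR sizing and programming) comes on top of this. */
unsigned per_device_exits(unsigned dwords)
{
    return cfg_read_exits(dwords, 2) + cfg_read_exits(dwords, 1);
}

/* Probe accesses for a brute-force scan: 32 device slots on every bus. */
unsigned scan_probes(unsigned buses)
{
    return 32 * buses;
}
```

With a 20-50 dword description this gives 60 to 150 exits per device before any resource allocation, which is in line with the "up to 200" figure, and a full scan of 256 buses costs 8192 probes even when nearly every slot is empty.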
On 25/07/19 15:26, Stefan Hajnoczi wrote: > The microvm design has a premise and it can be answered definitively > through performance analysis. > > If I had to explain to someone why PCI or ACPI significantly slows > things down, I couldn't honestly do so. I say significantly because > PCI init definitely requires more vmexits but can it be a small > number? For ACPI I have no idea why it would consume significant > amounts of time. My guess is that it's just a lot of code that has to run. :( > Until we have this knowledge, the premise of microvm is unproven and > merging it would be premature because maybe we can get into the same > ballpark by optimizing existing code. > > I'm sorry for being a pain. I actually think the analysis will > support microvm, but it still needs to be done in order to justify it. No, you're not a pain, you're explaining your reasoning and that helps. To me *maintainability is the biggest consideration* when introducing a new feature. "We can do just as well with q35" is a good reason to deprecate and delete microvm, but not a good reason to reject it now as long as microvm is good enough in terms of maintainability. Keeping it out of tree only makes it harder to do this kind of experiment. virtio 1 seems to be the biggest remaining blocker and I think it'd be a good thing to have even for the ARM virt machine type. FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*) and ~25 ms in the kernel. I must say that's pretty good, but it's still 30% of the whole boot time and reducing it is the hardest part. If having microvm in tree can help reducing it, good. Yes, it will get users, but most likely they will have to support pc or q35 as a fallback so we could still delete microvm at any time with the due deprecation period if it turns out to be a failed experiment. 
Whether to use qboot or SeaBIOS for microvm is another story, but it's an implementation detail as long as the ROM size doesn't change and/or we don't do versioned machine types. So we can switch from one to the other at any time; we can also include qboot directly in QEMU's tree, without going through a submodule, which also reduces the infrastructure needed (mirrors, etc.) and makes it easier to delete it. Paolo (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the last write to 0xcf8. I suspect part of qboot's 10ms boot time actually ends up measured as PCI in SeaBIOS, due to the different init order, so the real firmware cost of PAM and PCI initialization should be 5ms for qboot and 10ms for SeaBIOS.
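For reference, the "PCI tax" figures quoted in this message add up as in the small sketch below. The struct and function names are hypothetical, and the implied total boot time is only what follows from "30% of the whole boot time"; it is not a number measured or stated in this thread.

```c
#include <assert.h>

/* Per-component PCI cost in milliseconds, as quoted in the message. */
struct pci_tax_ms {
    unsigned qemu;
    unsigned firmware;
    unsigned kernel;
};

unsigned pci_tax_total(struct pci_tax_ms t)
{
    return t.qemu + t.firmware + t.kernel;
}

/* Whole boot time implied by the tax being `percent` of it. */
unsigned implied_boot_ms(unsigned tax_ms, unsigned percent)
{
    assert(percent > 0 && percent <= 100);
    return tax_ms * 100 / percent;
}
```

With ~10 ms in QEMU, ~10 ms in the firmware and ~25 ms in the kernel the total is ~45 ms, and if that is 30% of the boot then the whole boot is on the order of 150 ms.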
On Thu, Jul 25, 2019 at 03:43:12PM +0200, Paolo Bonzini wrote: > On 25/07/19 15:26, Stefan Hajnoczi wrote: > > The microvm design has a premise and it can be answered definitively > > through performance analysis. > > > > If I had to explain to someone why PCI or ACPI significantly slows > > things down, I couldn't honestly do so. I say significantly because > > PCI init definitely requires more vmexits but can it be a small > > number? For ACPI I have no idea why it would consume significant > > amounts of time. > > My guess is that it's just a lot of code that has to run. :( > > > Until we have this knowledge, the premise of microvm is unproven and > > merging it would be premature because maybe we can get into the same > > ballpark by optimizing existing code. > > > > I'm sorry for being a pain. I actually think the analysis will > > support microvm, but it still needs to be done in order to justify it. > > No, you're not a pain, you're explaining your reasoning and that helps. > > To me *maintainability is the biggest consideration* when introducing a > new feature. "We can do just as well with q35" is a good reason to > deprecate and delete microvm, but not a good reason to reject it now as > long as microvm is good enough in terms of maintainability. Keeping it > out of tree only makes it harder to do this kind of experiment. virtio > 1 seems to be the biggest remaining blocker and I think it'd be a good > thing to have even for the ARM virt machine type. Yep. E.g. virtio-iommu guys wanted that too. > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*) > and ~25 ms in the kernel. How did you measure the qemu time btw? > I must say that's pretty good, but it's still > 30% of the whole boot time and reducing it is the hardest part. If > having microvm in tree can help reducing it, good. 
Yes, it will get > users, but most likely they will have to support pc or q35 as a fallback > so we could still delete microvm at any time with the due deprecation > period if it turns out to be a failed experiment. > > Whether to use qboot or SeaBIOS for microvm is another story, but it's > an implementation detail as long as the ROM size doesn't change and/or > we don't do versioned machine types. So we can switch from one to the > other at any time; we can also include qboot directly in QEMU's tree, > without going through a submodule, which also reduces the infrastructure > needed (mirrors, etc.) and makes it easier to delete it. > > Paolo > > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the > last write to 0xcf8. I suspect part of qboot's 10ms boot time actually > end up measured as PCI in SeaBIOS, due to different init order, so the > real firmware cost of PAM and PCI initialization should be 5ms for qboot > and 10ms for SeaBIOS.
On 25/07/19 15:54, Michael S. Tsirkin wrote: >> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*) >> and ~25 ms in the kernel. > How did you measure the qemu time btw? > It's QEMU startup, but not QEMU altogether. For example the time spent in memory.c when a BAR is programmed is not part of those 10 ms. So I just computed q35 qemu startup - microvm qemu startup, it's 65 vs 55 ms. Paolo
On Thu, Jul 25, 2019 at 04:13:13PM +0200, Paolo Bonzini wrote: > On 25/07/19 15:54, Michael S. Tsirkin wrote: > >> FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*) > >> and ~25 ms in the kernel. > > How did you measure the qemu time btw? > > > > It's QEMU startup, but not QEMU altogether. For example the time spent > in memory.c when a BAR is programmed is not part of those 10 ms. > > So I just computed q35 qemu startup - microvm qemu startup, it's 65 vs > 55 ms. > > Paolo Oh so it could be eventfd or whatever, just as well. I actually wonder whether we spend much time within synchronize_* calls. eventfd triggers this a lot of times. How about ioeventfd=off? Does this speed up things? -- MST
On Thu, 25 Jul 2019 at 14:43, Paolo Bonzini <pbonzini@redhat.com> wrote: > To me *maintainability is the biggest consideration* when introducing a > new feature. "We can do just as well with q35" is a good reason to > deprecate and delete microvm, but not a good reason to reject it now as > long as microvm is good enough in terms of maintainability. I think maintainability matters, but also important is "are we going in the right direction in the first place?". virtio-mmio is (variously deliberately and accidentally) quite a long way behind virtio-pci, and certain kinds of things (hotplug, extensibility beyond a certain number of endpoints) are not going to be possible (either ever, or without a lot of extra design and implementation work to reimplement stuff we have already today with PCI). Are we sure we're not going to end up with a stream of "oh, now we need to implement X for virtio-mmio (that virtio-pci already has)", "users want Y now (that virtio-pci already has)", etc? The other thing is that once we've introduced something we're stuck with whatever it does, because we don't like breaking backwards compatibility. So I think getting the virtio-legacy vs virtio-1 story sorted out before we land microvm is important, at least to the point where we know we haven't backed ourselves into a corner or required a lot of extra effort on transitional-device support that we could have avoided. Which isn't to say that I'm against the microvm approach; just that I'd like us to consider and make a decision on these issues before landing it, rather than just saying "the patches in themselves look good, let's merge it". thanks -- PMM
On 25/07/19 16:04, Peter Maydell wrote: > On Thu, 25 Jul 2019 at 14:43, Paolo Bonzini <pbonzini@redhat.com> wrote: >> To me *maintainability is the biggest consideration* when introducing a >> new feature. "We can do just as well with q35" is a good reason to >> deprecate and delete microvm, but not a good reason to reject it now as >> long as microvm is good enough in terms of maintainability. > > I think maintainability matters, but also important is "are > we going in the right direction in the first place?". > virtio-mmio is (variously deliberately and accidentally) > quite a long way behind virtio-pci, and certain kinds of things > (hotplug, extensibility beyond a certain number of endpoints) > are not going to be possible (either ever, or without a lot > of extra design and implementation work to reimplement stuff > we have already today with PCI). Are we sure we're not going > to end up with a stream of "oh, now we need to implement X for > virtio-mmio (that virtio-pci already has)", "users want Y now > (that virtio-pci already has)", etc? I think this is part of maintainability in a wider sense. For every missing feature there should be a good reason why it's not needed. And if there is already code to do that in QEMU, then there should be an excellent reason why it's not being used. (This was the essence of the firmware debate). So for microvm you could do without hotplug because the idea is that you just tear down the VM and restart it. Lack of MSI is actually what worries me the most, but we could say that microvm clients generally have little multiprocessing so it's not common to have multiple network flows at the same time and so you don't need multiqueue. For microvm in particular there are two reasons why we can take some shortcuts (but with care): - we won't support versioned machine types for microvm. microvm guests die every time you upgrade QEMU, by design. 
So this is not another QED, which implemented more features than qcow2 but did so at the wrong place of the stack. In fact it's exactly the opposite (it implements less features, so that the implementation of e.g. q35 or PCI is untouched and does not need one-off boot time optimization hacks) - we know that Amazon is using something very similar to microvm in production, with virtio-mmio, so the feature set is at least usable for something. > The other thing is that once we've introduced something we're > stuck with whatever it does, because we don't like breaking > backwards compatibility. So I think getting the virtio-legacy > vs virtio-1 story sorted out before we land microvm is > important, at least to the point where we know we haven't > backed ourselves into a corner or required a lot of extra > effort on transitional-device support that we could have > avoided. Even though we won't support versioned machine types, I think there is agreement that virtio 0.9 is a bad idea and should be fixed. Paolo > Which isn't to say that I'm against the microvm approach; > just that I'd like us to consider and make a decision on > these issues before landing it, rather than just saying > "the patches in themselves look good, let's merge it". > > thanks > -- PMM >
On Thu, Jul 25, 2019 at 04:26:42PM +0200, Paolo Bonzini wrote: > On 25/07/19 16:04, Peter Maydell wrote: > > On Thu, 25 Jul 2019 at 14:43, Paolo Bonzini <pbonzini@redhat.com> wrote: > >> To me *maintainability is the biggest consideration* when introducing a > >> new feature. "We can do just as well with q35" is a good reason to > >> deprecate and delete microvm, but not a good reason to reject it now as > >> long as microvm is good enough in terms of maintainability. > > > > I think maintainability matters, but also important is "are > > we going in the right direction in the first place?". > > virtio-mmio is (variously deliberately and accidentally) > > quite a long way behind virtio-pci, and certain kinds of things > > (hotplug, extensibility beyond a certain number of endpoints) > > are not going to be possible (either ever, or without a lot > > of extra design and implementation work to reimplement stuff > > we have already today with PCI). Are we sure we're not going > > to end up with a stream of "oh, now we need to implement X for > > virtio-mmio (that virtio-pci already has)", "users want Y now > > (that virtio-pci already has)", etc? > > I think this is part of maintainability in a wider sense. For every > missing feature there should be a good reason why it's not needed. And > if there is already code to do that in QEMU, then there should be an > excellent reason why it's not being used. (This was the essence of the > firmware debate). > > So for microvm you could do without hotplug because the idea is that you > just tear down the VM and restart it. Lack of MSI is actually what > worries me the most, but we could say that microvm clients generally > have little multiprocessing so it's not common to have multiple network > flows at the same time and so you don't need multiqueue. 
Me too, and in fact someone just posted virtio-mmio: support multiple interrupt vectors > For microvm in particular there are two reasons why we can take some > shortcuts (but with care): > > - we won't support versioned machine types for microvm. microvm guests > die every time you upgrade QEMU, by design. So this is not another QED, > which implemented more features than qcow2 but did so at the wrong place > of the stack. In fact it's exactly the opposite (it implements less > features, so that the implementation of e.g. q35 or PCI is untouched and > does not need one-off boot time optimization hacks) > > - we know that Amazon is using something very similar to microvm in > production, with virtio-mmio, so the feature set is at least usable for > something. > > > The other thing is that once we've introduced something we're > > stuck with whatever it does, because we don't like breaking > > backwards compatibility. So I think getting the virtio-legacy > > vs virtio-1 story sorted out before we land microvm is > > important, at least to the point where we know we haven't > > backed ourselves into a corner or required a lot of extra > > effort on transitional-device support that we could have > > avoided. > > Even though we won't support versioned machine types, I think there is > agreement that virtio 0.9 is a bad idea and should be fixed. > > Paolo Right, for the simple reason that mmio does not support transitional devices, only transitional drivers. So if we commit to supporting old guests, we won't be able to back out of that. > > Which isn't to say that I'm against the microvm approach; > > just that I'd like us to consider and make a decision on > > these issues before landing it, rather than just saying > > "the patches in themselves look good, let's merge it". > > > > thanks > > -- PMM > >
Paolo Bonzini <pbonzini@redhat.com> writes: > On 25/07/19 15:26, Stefan Hajnoczi wrote: >> The microvm design has a premise and it can be answered definitively >> through performance analysis. >> >> If I had to explain to someone why PCI or ACPI significantly slows >> things down, I couldn't honestly do so. I say significantly because >> PCI init definitely requires more vmexits but can it be a small >> number? For ACPI I have no idea why it would consume significant >> amounts of time. > > My guess is that it's just a lot of code that has to run. :( I don't think I've shared any numbers about ACPI. I don't have details about where exactly the time is spent, but compiling a guest kernel without ACPI decreases the average boot time by ~12ms, and the kernel's unstripped ELF binary size goes down by a whopping ~300KiB. On the other hand, removing ACPI from QEMU decreases its initialization time by ~5ms, and the binary size is ~183KiB smaller. IMHO, those are pretty relevant savings on both fronts. >> Until we have this knowledge, the premise of microvm is unproven and >> merging it would be premature because maybe we can get into the same >> ballpark by optimizing existing code. >> >> I'm sorry for being a pain. I actually think the analysis will >> support microvm, but it still needs to be done in order to justify it. > > No, you're not a pain, you're explaining your reasoning and that helps. > > To me *maintainability is the biggest consideration* when introducing a > new feature. "We can do just as well with q35" is a good reason to > deprecate and delete microvm, but not a good reason to reject it now as > long as microvm is good enough in terms of maintainability. Keeping it > out of tree only makes it harder to do this kind of experiment. virtio > 1 seems to be the biggest remaining blocker and I think it'd be a good > thing to have even for the ARM virt machine type. 
> > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*) > and ~25 ms in the kernel. I must say that's pretty good, but it's still > 30% of the whole boot time and reducing it is the hardest part. If > having microvm in tree can help reducing it, good. Yes, it will get > users, but most likely they will have to support pc or q35 as a fallback > so we could still delete microvm at any time with the due deprecation > period if it turns out to be a failed experiment. > > Whether to use qboot or SeaBIOS for microvm is another story, but it's > an implementation detail as long as the ROM size doesn't change and/or > we don't do versioned machine types. So we can switch from one to the > other at any time; we can also include qboot directly in QEMU's tree, > without going through a submodule, which also reduces the infrastructure > needed (mirrors, etc.) and makes it easier to delete it. > > Paolo > > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the > last write to 0xcf8. I suspect part of qboot's 10ms boot time actually > end up measured as PCI in SeaBIOS, due to different init order, so the > real firmware cost of PAM and PCI initialization should be 5ms for qboot > and 10ms for SeaBIOS.
On Thu, Jul 25, 2019 at 04:42:42PM +0200, Sergio Lopez wrote: > > Paolo Bonzini <pbonzini@redhat.com> writes: > > > On 25/07/19 15:26, Stefan Hajnoczi wrote: > >> The microvm design has a premise and it can be answered definitively > >> through performance analysis. > >> > >> If I had to explain to someone why PCI or ACPI significantly slows > >> things down, I couldn't honestly do so. I say significantly because > >> PCI init definitely requires more vmexits but can it be a small > >> number? For ACPI I have no idea why it would consume significant > >> amounts of time. > > > > My guess is that it's just a lot of code that has to run. :( > > I think I haven't shared any numbers about ACPI. > > I don't have details about where exactly the time is spent, but > compiling a guest kernel without ACPI decreases the average boot time in > ~12ms, and the kernel's unstripped ELF binary size goes down in a > whooping ~300KiB. At least the binary size is hardly surprising. I'm guessing you built in lots of drivers. It would be educational to try to enable ACPI core but disable all optional features. > On the other hand, removing ACPI from QEMU decreases its initialization > time in ~5ms, and the binary size is ~183KiB smaller. Yes - ACPI generation uses a ton of allocations and data copies. Need to play with pre-allocation strategies. Maybe something as simple as: diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index f3fdfefcd5..24becc069e 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2629,8 +2629,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine) acpi_get_pci_holes(&pci_hole, &pci_hole64); acpi_get_slic_oem(&slic_oem); +#define DEFAULT_ARRAY_SIZE 16 - table_offsets = g_array_new(false, true /* clear */, - sizeof(uint32_t)); + table_offsets = g_array_sized_new(false, true /* clear */, + sizeof(uint32_t), + DEFAULT_ARRAY_SIZE); ACPI_BUILD_DPRINTF("init ACPI tables\n"); bios_linker_loader_alloc(tables->linker, will already help a bit. 
> > IMHO, those are pretty relevant savings on both fronts. > > >> Until we have this knowledge, the premise of microvm is unproven and > >> merging it would be premature because maybe we can get into the same > >> ballpark by optimizing existing code. > >> > >> I'm sorry for being a pain. I actually think the analysis will > >> support microvm, but it still needs to be done in order to justify it. > > > > No, you're not a pain, you're explaining your reasoning and that helps. > > > > To me *maintainability is the biggest consideration* when introducing a > > new feature. "We can do just as well with q35" is a good reason to > > deprecate and delete microvm, but not a good reason to reject it now as > > long as microvm is good enough in terms of maintainability. Keeping it > > out of tree only makes it harder to do this kind of experiment. virtio > > 1 seems to be the biggest remaining blocker and I think it'd be a good > > thing to have even for the ARM virt machine type. > > > > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*) > > and ~25 ms in the kernel. I must say that's pretty good, but it's still > > 30% of the whole boot time and reducing it is the hardest part. If > > having microvm in tree can help reducing it, good. Yes, it will get > > users, but most likely they will have to support pc or q35 as a fallback > > so we could still delete microvm at any time with the due deprecation > > period if it turns out to be a failed experiment. > > > > Whether to use qboot or SeaBIOS for microvm is another story, but it's > > an implementation detail as long as the ROM size doesn't change and/or > > we don't do versioned machine types. So we can switch from one to the > > other at any time; we can also include qboot directly in QEMU's tree, > > without going through a submodule, which also reduces the infrastructure > > needed (mirrors, etc.) and makes it easier to delete it. 
> > > > Paolo > > > > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the > > last write to 0xcf8. I suspect part of qboot's 10ms boot time actually > > end up measured as PCI in SeaBIOS, due to different init order, so the > > real firmware cost of PAM and PCI initialization should be 5ms for qboot > > and 10ms for SeaBIOS. >
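To illustrate why pre-allocation helps, here is a minimal, self-contained sketch. It is a toy growable array with a doubling policy, not GLib's allocator, and every name in it is hypothetical; the only point is to count how often the buffer has to grow with and without a reserved size. In GLib itself the call that takes a reserved size is g_array_sized_new(); g_array_new() has no such argument.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a growable array of table offsets. */
struct u32_array {
    uint32_t *data;
    size_t len;
    size_t cap;
    unsigned grows;   /* how many times the buffer had to be enlarged */
};

/* reserve == 0 mimics an array created with no initial capacity. */
struct u32_array u32_array_new(size_t reserve)
{
    struct u32_array a = { NULL, 0, reserve, 0 };
    if (reserve) {
        a.data = malloc(reserve * sizeof(*a.data));
        assert(a.data);
    }
    return a;
}

void u32_array_append(struct u32_array *a, uint32_t v)
{
    if (a->len == a->cap) {
        /* Out of room: double the capacity (or start at one element). */
        size_t cap = a->cap ? a->cap * 2 : 1;
        uint32_t *p = realloc(a->data, cap * sizeof(*p));
        assert(p);
        a->data = p;
        a->cap = cap;
        a->grows++;
    }
    a->data[a->len++] = v;
}

/* Append n values and report how many times the buffer grew. */
unsigned grows_for(size_t reserve, size_t n)
{
    struct u32_array a = u32_array_new(reserve);
    for (size_t i = 0; i < n; i++) {
        u32_array_append(&a, (uint32_t)i);
    }
    unsigned grows = a.grows;
    free(a.data);
    return grows;
}
```

In this toy model, appending 16 offsets to an array with no reserved space grows the buffer 5 times, while reserving 16 slots up front grows it 0 times. Whether that is measurable in acpi_build() would still need profiling.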
On Thu, Jul 25, 2019 at 10:58:22AM -0400, Michael S. Tsirkin wrote: > On Thu, Jul 25, 2019 at 04:42:42PM +0200, Sergio Lopez wrote: > > > > Paolo Bonzini <pbonzini@redhat.com> writes: > > > > > On 25/07/19 15:26, Stefan Hajnoczi wrote: > > >> The microvm design has a premise and it can be answered definitively > > >> through performance analysis. > > >> > > >> If I had to explain to someone why PCI or ACPI significantly slows > > >> things down, I couldn't honestly do so. I say significantly because > > >> PCI init definitely requires more vmexits but can it be a small > > >> number? For ACPI I have no idea why it would consume significant > > >> amounts of time. > > > > > > My guess is that it's just a lot of code that has to run. :( > > > > I think I haven't shared any numbers about ACPI. > > > > I don't have details about where exactly the time is spent, but > > compiling a guest kernel without ACPI decreases the average boot time in > > ~12ms, and the kernel's unstripped ELF binary size goes down in a > > whooping ~300KiB. > > At least the binary size is hardly surprising. > > I'm guessing you built in lots of drivers. > > It would be educational to try to enable ACPI core but disable all > optional features. Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational. > > > On the other hand, removing ACPI from QEMU decreases its initialization > > time in ~5ms, and the binary size is ~183KiB smaller. > > Yes - ACPI generation uses a ton of allocations and data copies. > > Need to play with pre-allocation strategies. 
Maybe something > as simple as: > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c > index f3fdfefcd5..24becc069e 100644 > --- a/hw/i386/acpi-build.c > +++ b/hw/i386/acpi-build.c > @@ -2629,8 +2629,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine) > acpi_get_pci_holes(&pci_hole, &pci_hole64); > acpi_get_slic_oem(&slic_oem); > > +#define DEFAULT_ARRAY_SIZE 16 > table_offsets = g_array_new(false, true /* clear */, > - sizeof(uint32_t)); > + sizeof(uint32_t), > + DEFAULT_ARRAY_SIZE); > ACPI_BUILD_DPRINTF("init ACPI tables\n"); > > bios_linker_loader_alloc(tables->linker, > > will already help a bit. > > > > > IMHO, those are pretty relevant savings on both fronts. > > > > >> Until we have this knowledge, the premise of microvm is unproven and > > >> merging it would be premature because maybe we can get into the same > > >> ballpark by optimizing existing code. > > >> > > >> I'm sorry for being a pain. I actually think the analysis will > > >> support microvm, but it still needs to be done in order to justify it. > > > > > > No, you're not a pain, you're explaining your reasoning and that helps. > > > > > > To me *maintainability is the biggest consideration* when introducing a > > > new feature. "We can do just as well with q35" is a good reason to > > > deprecate and delete microvm, but not a good reason to reject it now as > > > long as microvm is good enough in terms of maintainability. Keeping it > > > out of tree only makes it harder to do this kind of experiment. virtio > > > 1 seems to be the biggest remaining blocker and I think it'd be a good > > > thing to have even for the ARM virt machine type. > > > > > > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*) > > > and ~25 ms in the kernel. I must say that's pretty good, but it's still > > > 30% of the whole boot time and reducing it is the hardest part. If > > > having microvm in tree can help reducing it, good. 
Yes, it will get > > > users, but most likely they will have to support pc or q35 as a fallback > > > so we could still delete microvm at any time with the due deprecation > > > period if it turns out to be a failed experiment. > > > > > > Whether to use qboot or SeaBIOS for microvm is another story, but it's > > > an implementation detail as long as the ROM size doesn't change and/or > > > we don't do versioned machine types. So we can switch from one to the > > > other at any time; we can also include qboot directly in QEMU's tree, > > > without going through a submodule, which also reduces the infrastructure > > > needed (mirrors, etc.) and makes it easier to delete it. > > > > > > Paolo > > > > > > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the > > > last write to 0xcf8. I suspect part of qboot's 10ms boot time actually > > > end up measured as PCI in SeaBIOS, due to different init order, so the > > > real firmware cost of PAM and PCI initialization should be 5ms for qboot > > > and 10ms for SeaBIOS. > > > >
On 25/07/19 17:01, Michael S. Tsirkin wrote: >> It would be educational to try to enable ACPI core but disable all >> optional features. A lot of them are select'ed so it's not easy. > Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational. That's what the NEMU guys experimented with. It's not supported by our DSDT since it uses ACPI GPE, and the reduction in code size is small (about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?). Paolo
On Thu, Jul 25, 2019 at 05:39:39PM +0200, Paolo Bonzini wrote: > On 25/07/19 17:01, Michael S. Tsirkin wrote: > >> It would be educational to try to enable ACPI core but disable all > >> optional features. > > A lot of them are select'ed so it's not easy. > > > Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational. > > That's what the NEMU guys experimented with. It's not supported by our > DSDT since it uses ACPI GPE, Well there are two GPE blocks in FADT. We could just switch to these if necessary, I think. > and the reduction in code size is small > (about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?). > > Paolo Well ACPI is 150k loc I think, right? linux]$ wc -l `find drivers/acpi/ -name '*.c' `|tail -1 145926 total So 100k wouldn't be too shabby. -- MST
On Thu, 25 Jul 2019 13:38:48 -0400 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Thu, Jul 25, 2019 at 05:39:39PM +0200, Paolo Bonzini wrote: > > On 25/07/19 17:01, Michael S. Tsirkin wrote: > > >> It would be educational to try to enable ACPI core but disable all > > >> optional features. > > > > A lot of them are select'ed so it's not easy. > > > > > Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational. > > > > That's what the NEMU guys experimented with. It's not supported by our > > DSDT since it uses ACPI GPE, > > Well there are two GPE blocks in FADT. We could just switch to > these if necesary I think. If it's a simplistic VM, we could build a dedicated DSDT (or a whole set of tables) for it and use the reduced profile like the arm-virt machine does (just a newer version of FADT with the needed flags set). That would probably cut the ACPI cost on the QEMU side. > > and the reduction in code size is small > > (about 15000 lines of code in ACPICA, perhaps 100k if you're lucky?). > > > > Paolo > > Well ACPI is 150k loc I think, right? > > linux]$ wc -l `find drivers/acpi/ -name '*.c' `|tail -1 > 145926 total > > So 100k wouldn't be too shabby. >
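As a rough sketch of what "the needed flags" means here: in the ACPI specification the hardware-reduced profile is advertised through the HW_REDUCED_ACPI bit (bit 20) of the FADT Flags field, which exists from ACPI 5.0 onwards. The helper names below are hypothetical and this is not QEMU's table-building code; it only shows the flag test a guest would effectively perform.

```c
#include <stdbool.h>
#include <stdint.h>

/* FADT Flags, bit 20: platform implements the hardware-reduced profile. */
#define FADT_HW_REDUCED_ACPI (UINT32_C(1) << 20)

/* Return the flags word with the hardware-reduced bit set. */
uint32_t fadt_set_hw_reduced(uint32_t flags)
{
    return flags | FADT_HW_REDUCED_ACPI;
}

/* The bit is only defined for FADT major version 5 and later. */
bool fadt_is_hw_reduced(uint8_t fadt_major, uint32_t flags)
{
    return fadt_major >= 5 && (flags & FADT_HW_REDUCED_ACPI) != 0;
}
```

A table set built this way would also have to avoid the fixed-hardware features (such as the legacy GPE blocks mentioned above) that the reduced profile does not allow, which is why a dedicated DSDT is suggested rather than reusing the existing one.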
Michael S. Tsirkin <mst@redhat.com> writes: > On Thu, Jul 25, 2019 at 10:58:22AM -0400, Michael S. Tsirkin wrote: >> On Thu, Jul 25, 2019 at 04:42:42PM +0200, Sergio Lopez wrote: >> > >> > Paolo Bonzini <pbonzini@redhat.com> writes: >> > >> > > On 25/07/19 15:26, Stefan Hajnoczi wrote: >> > >> The microvm design has a premise and it can be answered definitively >> > >> through performance analysis. >> > >> >> > >> If I had to explain to someone why PCI or ACPI significantly slows >> > >> things down, I couldn't honestly do so. I say significantly because >> > >> PCI init definitely requires more vmexits but can it be a small >> > >> number? For ACPI I have no idea why it would consume significant >> > >> amounts of time. >> > > >> > > My guess is that it's just a lot of code that has to run. :( >> > >> > I think I haven't shared any numbers about ACPI. >> > >> > I don't have details about where exactly the time is spent, but >> > compiling a guest kernel without ACPI decreases the average boot time in >> > ~12ms, and the kernel's unstripped ELF binary size goes down in a >> > whooping ~300KiB. >> >> At least the binary size is hardly surprising. >> >> I'm guessing you built in lots of drivers. >> >> It would be educational to try to enable ACPI core but disable all >> optional features. I just tried disabling everything that menuconfig allowed me to. It saves ~27KiB and doesn't improve boot time. > Trying with ACPI_REDUCED_HARDWARE_ONLY would also be educational. I also tried enabling this one in my original config. It saves ~11.5KiB, and has no impact on boot time either. >> >> > On the other hand, removing ACPI from QEMU decreases its initialization >> > time in ~5ms, and the binary size is ~183KiB smaller. >> >> Yes - ACPI generation uses a ton of allocations and data copies. >> >> Need to play with pre-allocation strategies. 
>> Maybe something as simple as:
>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index f3fdfefcd5..24becc069e 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2629,8 +2629,10 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>>      acpi_get_pci_holes(&pci_hole, &pci_hole64);
>>      acpi_get_slic_oem(&slic_oem);
>>
>> +#define DEFAULT_ARRAY_SIZE 16
>>      table_offsets = g_array_new(false, true /* clear */,
>> -                                        sizeof(uint32_t));
>> +                                        sizeof(uint32_t),
>> +                                        DEFAULT_ARRAY_SIZE);
>>      ACPI_BUILD_DPRINTF("init ACPI tables\n");
>>
>>      bios_linker_loader_alloc(tables->linker,
>>
>> will already help a bit.
>>
>> >
>> > IMHO, those are pretty relevant savings on both fronts.
>> >
>> > >> Until we have this knowledge, the premise of microvm is unproven and
>> > >> merging it would be premature because maybe we can get into the same
>> > >> ballpark by optimizing existing code.
>> > >>
>> > >> I'm sorry for being a pain. I actually think the analysis will
>> > >> support microvm, but it still needs to be done in order to justify it.
>> > >
>> > > No, you're not a pain, you're explaining your reasoning and that helps.
>> > >
>> > > To me *maintainability is the biggest consideration* when introducing a
>> > > new feature. "We can do just as well with q35" is a good reason to
>> > > deprecate and delete microvm, but not a good reason to reject it now as
>> > > long as microvm is good enough in terms of maintainability. Keeping it
>> > > out of tree only makes it harder to do this kind of experiment. virtio
>> > > 1 seems to be the biggest remaining blocker and I think it'd be a good
>> > > thing to have even for the ARM virt machine type.
>> > >
>> > > FWIW the "PCI tax" seems to be ~10 ms in QEMU, ~10 ms in the firmware(*)
>> > > and ~25 ms in the kernel. I must say that's pretty good, but it's still
>> > > 30% of the whole boot time and reducing it is the hardest part. If
>> > > having microvm in tree can help reducing it, good.
>> > > Yes, it will get users, but most likely they will have to support pc
>> > > or q35 as a fallback, so we could still delete microvm at any time with
>> > > the due deprecation period if it turns out to be a failed experiment.
>> > >
>> > > Whether to use qboot or SeaBIOS for microvm is another story, but it's
>> > > an implementation detail as long as the ROM size doesn't change and/or
>> > > we don't do versioned machine types. So we can switch from one to the
>> > > other at any time; we can also include qboot directly in QEMU's tree,
>> > > without going through a submodule, which also reduces the infrastructure
>> > > needed (mirrors, etc.) and makes it easier to delete it.
>> > >
>> > > Paolo
>> > >
>> > > (*) I measured 15ms in SeaBIOS and 5ms in qboot from the first to the
>> > > last write to 0xcf8. I suspect part of qboot's 10ms boot time actually
>> > > ends up measured as PCI in SeaBIOS, due to different init order, so the
>> > > real firmware cost of PAM and PCI initialization should be 5ms for qboot
>> > > and 10ms for SeaBIOS.
>> >
>>
>>
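The pre-allocation idea in the diff quoted above can be illustrated without GLib. The following is a minimal plain-C sketch with a hypothetical `u32_array` type standing in for a growable array; it is not GLib's GArray and says nothing about how acpi-build.c actually behaves. It only counts how often the buffer has to grow with and without a reserved size. As a side note, GLib's `g_array_new()` takes three arguments, and the constructor that accepts a reserved size is `g_array_sized_new()`, so the quoted diff would presumably need that call to compile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a growable array; not GLib's GArray. */
struct u32_array {
    uint32_t *data;
    size_t len;
    size_t cap;
    unsigned grow_count;  /* number of realloc calls that grew the buffer */
};

/* Create an array with room for `reserved` elements (0 = no pre-allocation). */
static struct u32_array u32_array_new(size_t reserved)
{
    struct u32_array a = { NULL, 0, reserved, 0 };
    if (reserved > 0) {
        a.data = malloc(reserved * sizeof(*a.data));
        if (a.data == NULL) {
            abort();
        }
    }
    return a;
}

/* Append one value, doubling the capacity when the buffer is full. */
static void u32_array_append(struct u32_array *a, uint32_t v)
{
    assert(a != NULL);
    if (a->len == a->cap) {
        size_t new_cap = a->cap ? a->cap * 2 : 1;
        uint32_t *p = realloc(a->data, new_cap * sizeof(*p));
        if (p == NULL) {
            abort();
        }
        a->data = p;
        a->cap = new_cap;
        a->grow_count++;
    }
    a->data[a->len++] = v;
}

/* Release the buffer and reset the bookkeeping. */
static void u32_array_free(struct u32_array *a)
{
    free(a->data);
    a->data = NULL;
    a->len = a->cap = 0;
}

/* Append n values and report how many times the buffer had to grow. */
static unsigned count_grows(size_t reserved, size_t n)
{
    struct u32_array a = u32_array_new(reserved);
    for (size_t i = 0; i < n; i++) {
        u32_array_append(&a, (uint32_t)i);
    }
    unsigned grows = a.grow_count;
    u32_array_free(&a);
    return grows;
}
```

With this doubling policy, `count_grows(0, 16)` reports five growth steps while `count_grows(16, 16)` reports none. Whether that saving is measurable in QEMU's ACPI table generation would still have to be profiled.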
Patchew URL: https://patchew.org/QEMU/20190702121106.28374-1-slp@redhat.com/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [PATCH v3 0/4] Introduce the microvm machine type
Message-id: 20190702121106.28374-1-slp@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]      patchew/20190702113414.6896-1-armbru@redhat.com -> patchew/20190702113414.6896-1-armbru@redhat.com
Switched to a new branch 'test'
8ebe540 hw/i386: Introduce the microvm machine type
ac71c2a hw/i386: Factorize PVH related functions
faeccbd hw/i386: Add an Intel MPTable generator
7540b93 hw/virtio: Factorize virtio-mmio headers

=== OUTPUT BEGIN ===
1/4 Checking commit 7540b9358a0f (hw/virtio: Factorize virtio-mmio headers)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#66:
new file mode 100644

total: 0 errors, 1 warnings, 105 lines checked

Patch 1/4 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
2/4 Checking commit faeccbd2c589 (hw/i386: Add an Intel MPTable generator)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#16:
new file mode 100644

total: 0 errors, 1 warnings, 374 lines checked

Patch 2/4 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
3/4 Checking commit ac71c2af3972 (hw/i386: Factorize PVH related functions)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#186:
new file mode 100644

ERROR: do not initialise statics to 0 or NULL
#210: FILE: hw/i386/pvh.c:20:
+static size_t pvh_start_addr = 0;

total: 1 errors, 1 warnings, 281 lines checked

Patch 3/4 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
4/4 Checking commit 8ebe540c4430 (hw/i386: Introduce the microvm machine type)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#67:
new file mode 100644

ERROR: Error messages should not contain newlines
#291: FILE: hw/i386/microvm.c:220:
+            error_report("qemu: error reading initrd %s: %s\n",

ERROR: Error messages should not contain newlines
#299: FILE: hw/i386/microvm.c:228:
+                         "(max: %"PRIu32", need %"PRId64")\n",

total: 2 errors, 1 warnings, 653 lines checked

Patch 4/4 has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1

The full log is available at
http://patchew.org/logs/20190702121106.28374-1-slp@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
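For readers unfamiliar with the two ERROR classes reported above, here is a minimal sketch of what checkpatch is asking for. It relies on two facts: C zero-initializes objects with static storage duration, and QEMU's `error_report()` appends the trailing newline itself. The `sketch_error_report()` helper is a hypothetical stand-in that writes into a caller-supplied buffer so the example stays self-contained; it is not QEMU's function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Static storage is zero-initialized by the language, so "= 0" is
 * redundant and checkpatch rejects it. */
static size_t pvh_start_addr;

/* Hypothetical stand-in for an error reporter that appends the newline
 * itself, so callers pass messages without a trailing "\n". */
static int sketch_error_report(char *out, size_t out_len, const char *msg)
{
    assert(out != NULL && msg != NULL);
    assert(strchr(msg, '\n') == NULL);  /* callers must not embed newlines */
    return snprintf(out, out_len, "%s\n", msg);
}
```

Applied to the series, this presumably amounts to dropping `= 0` from the declaration in hw/i386/pvh.c and removing the `\n` from the two format strings in hw/i386/microvm.c.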