[PATCH 00/41] arm: Implement GICv4
Posted by Peter Maydell 2 years, 1 month ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20220408141550.1271295-1-peter.maydell@linaro.org

This patchset implements emulation of GICv4 in our TCG GIC and ITS
models, and makes the virt board use it where appropriate.

The GICv4 provides a single new feature: direct injection of virtual
interrupts from the ITS to a VM. In QEMU terms this means that if you
have an outer QEMU which is emulating a CPU with EL2, and the outer
guest passes through a PCI device (probably one emulated by the outer
QEMU) to an inner guest, interrupts from that device can go directly
to the inner guest, rather than having to go to the outer guest and
the outer guest then synthesizing virtual interrupts to the inner
guest. (If you aren't configuring the inner guest with a passthrough
PCI device then this new feature is of no interest.)

The basic structure of the patchset is as follows:

(1) There are a handful of preliminary patches fixing some minor
existing nits.

(2) The v4 ITS has some new in-guest-memory data structures and new
ITS commands that let the guest set them up. The next sequence of
patches implements all those commands. Where a command needs to
actually do something (eg "deliver a vLPI"), these patches call
functions in the redistributor which are left as unimplemented stubs
to be filled in by subsequent patches. This first chunk of patches
sticks to the data-structure handling and all the command argument
unpacking and error checking.
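
As a flavour of that pattern, here is a rough sketch of a command
handler (the CMD_CONTINUE success/error distinction is the series';
everything else, including the field layout, is illustrative, not
code from the series):

  #include <stdint.h>

  typedef enum { CMD_STALL, CMD_CONTINUE, CMD_CONTINUE_OK } ItsCmdResult;

  /* Redistributor hook, left as a stub at this stage of the series. */
  static void redist_vinvall_stub(uint32_t vpeid)
  {
      (void)vpeid;  /* filled in by a later patch */
  }

  /*
   * An ITS command is four 64-bit doublewords. At this stage a handler
   * only unpacks and validates; CMD_CONTINUE means "bad command, drop
   * it and keep processing the queue", CMD_CONTINUE_OK means success.
   */
  ItsCmdResult process_vinvall_sketch(const uint64_t cmdpkt[4],
                                      uint32_t num_vpeids)
  {
      uint32_t vpeid = (cmdpkt[1] >> 32) & 0xffff; /* illustrative field */

      if (vpeid >= num_vpeids) {
          return CMD_CONTINUE; /* in-guest error: ignore, don't stall */
      }
      redist_vinvall_stub(vpeid);
      return CMD_CONTINUE_OK;
  }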

(3) The redistributor has a new redistributor frame (ie the amount of
guest memory used by redistributor registers is larger) with two new
registers in it. We implement these initially as reads-as-written.
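
The two new registers are GICR_VPROPBASER and GICR_VPENDBASER. A
minimal sketch of the reads-as-written stage (the state layout and
function names here are illustrative, not QEMU's):

  #include <stdint.h>

  /* Illustrative per-CPU redistributor state. */
  typedef struct {
      uint64_t gicr_vpropbaser;   /* virtual LPI config table base */
      uint64_t gicr_vpendbaser;   /* virtual LPI pending table base */
  } VCpuState;

  /*
   * Reads-as-written: the registers simply remember what the guest
   * stored; the side effects (such as rescanning the virtual pending
   * table) are added by later patches in the series.
   */
  uint64_t vpendbaser_read(VCpuState *cs)
  {
      return cs->gicr_vpendbaser;
  }

  void vpendbaser_write(VCpuState *cs, uint64_t value)
  {
      cs->gicr_vpendbaser = value;
  }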

(4) The CPU interface needs relatively minor changes: as well as
looking at the list registers to determine the highest priority
pending virtual interrupt, we must also look at the highest priority
pending vLPI. We implement these changes, again leaving the interfaces
from this code into the redistributor as stubs for the moment.
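
In sketch form, the new comparison looks like this (types and names
are mine, not the series'):

  #include <stdint.h>

  /* Illustrative "highest priority pending interrupt" record. */
  typedef struct {
      int irq;        /* -1 if nothing is pending */
      uint8_t prio;   /* GIC priorities: lower value = more urgent */
  } PendingIrq;

  /*
   * The CPU interface's best virtual candidate is now the better of
   * the list-register-based HPPI and the redistributor's highest
   * priority pending vLPI.
   */
  PendingIrq pick_best(PendingIrq from_list_regs, PendingIrq hppvlpi)
  {
      if (hppvlpi.irq < 0) {
          return from_list_regs;
      }
      if (from_list_regs.irq < 0 || hppvlpi.prio < from_list_regs.prio) {
          return hppvlpi;
      }
      return from_list_regs;
  }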

(5) Now we can fill in all the stub code in the redistributor.  This
is almost all working with the pending and config tables for virtual
LPIs. (Side note: in real hardware some of this work is done by the
ITS rather than the redistributor, but in our implementation we split
between the two source files slightly differently. I've made the vLPI
handling follow the pattern of the existing LPI handling.)
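
For instance, updating a vLPI's bit in the virtual pending table comes
down to something like this sketch; the series' set_pending_table_bit()
works on guest memory, whereas this toy version uses a plain buffer:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* One bit per interrupt ID, as in the (v)LPI pending tables. */
  bool set_pending_table_bit(uint8_t *table, size_t table_len,
                             uint32_t intid, bool level)
  {
      size_t byte = intid / 8;
      uint8_t mask = 1u << (intid % 8);

      if (byte >= table_len) {
          return false;   /* intid out of range for this table */
      }
      if (level) {
          table[byte] |= mask;
      } else {
          table[byte] &= ~mask;
      }
      return true;
  }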

(6) Finally, we can update the ID registers which tell the guest about
the presence of v4 features, allow the GIC device to accept 4 as a
value for its QOM revision property, and make the virt board set that
when appropriate.

General notes:

Since the only useful thing in GICv4 is direct virtual interrupt
injection, it isn't expected that you would have a system with a GICv4
and a CPU without EL2. So I've made this an error, and the virt board
will only use GICv4 if the user also enables emulation of
virtualization.
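
In sketch form the rule is simply this (names are mine; the real
checks live in the GIC device and virt board code):

  #include <stdbool.h>

  /*
   * The GIC 'revision' property may now be 3 or 4, but a GICv4 is
   * only accepted when EL2 (virtualization) is also being emulated.
   */
  bool gic_config_valid(int revision, bool have_el2)
  {
      if (revision != 3 && revision != 4) {
          return false;   /* this device models GICv3 and GICv4 only */
      }
      return revision != 4 || have_el2;
  }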

Because the redistributor frame is twice the size in GICv4, the
number of redistributors we can fit into a given area of memory
is reduced. This means that when using GICv4 the maximum number
of CPUs supported on the virt board drops from 512 to 317. (No,
I'm not sure why this is 317 and not 256 :-))
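
For what it's worth, the redistributor region arithmetic does suggest
an answer; here's my back-of-envelope check (the region sizes are the
virt board's, but the sums are mine, so treat them as a guess):

  #include <stdio.h>

  int main(void)
  {
      /* virt board redistributor regions (hw/arm/virt.c memory map) */
      unsigned long low  = 0x00f60000;   /* region below 4GB */
      unsigned long high = 0x04000000;   /* 64MB high region */

      /* GICv4 doubles the per-CPU frame: 4 x 64KB, not 2 x 64KB */
      unsigned long frame = 4 * 64 * 1024;

      /* prints "capacity: 61 + 256 = 317 CPUs" */
      printf("capacity: %lu + %lu = %lu CPUs\n",
             low / frame, high / frame, low / frame + high / frame);
      return 0;
  }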

I have not particularly considered performance in this initial
implementation. In particular, we will do a complete re-scan of a
virtual LPI pending table every time the outer guest reschedules a
vCPU (and writes GICR_VPENDBASER). The spec provides scope for
optimisation here, by allowing part of the LPI table to have IMPDEF
contents, which we could in principle use to cache information like
the current highest priority pending vLPI. Given that emulating
nested guests with PCI passthrough is a fairly niche activity,
I propose that we not do this unless the three people doing that
complain about this all being too slow :-)
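
Concretely, that rescan walks the whole virtual pending table and
consults the config table (one byte per LPI: an enable bit plus a
priority) for each pending vLPI. A sketch, with guest memory replaced
by plain buffers:

  #include <stddef.h>
  #include <stdint.h>

  #define LPI_ENABLE 0x01   /* enable bit of an LPI config byte */

  /* Return the best enabled pending vLPI, or -1; prio via *prio_out. */
  int best_pending_vlpi(const uint8_t *pending, const uint8_t *config,
                        size_t num_intids, uint8_t *prio_out)
  {
      int best = -1;
      uint8_t best_prio = 0xff;   /* lower value = higher priority */

      for (size_t intid = 0; intid < num_intids; intid++) {
          if (!(pending[intid / 8] & (1u << (intid % 8)))) {
              continue;           /* not pending */
          }
          if (!(config[intid] & LPI_ENABLE)) {
              continue;           /* pending but disabled */
          }
          uint8_t prio = config[intid] & 0xfc;  /* priority bits */
          if (prio < best_prio) {
              best_prio = prio;
              best = (int)intid;
          }
      }
      *prio_out = best_prio;
      return best;
  }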

Tested with a Linux kernel passing through a virtio-blk device
to an inner Linux VM with KVM/QEMU. (NB that to get the outer
Linux kernel to actually use the new GICv4 functionality you
need to pass it "kvm-arm.vgic_v4_enable=1", as the kernel
will not use it by default.)

thanks
-- PMM

Peter Maydell (41):
  hw/intc/arm_gicv3_its: Add missing blank line
  hw/intc/arm_gicv3: Sanity-check num-cpu property
  hw/intc/arm_gicv3: Insist that redist region capacity matches CPU count
  hw/intc/arm_gicv3: Report correct PIDR0 values for ID registers
  target/arm/cpu.c: ignore VIRQ and VFIQ if no EL2
  hw/intc/arm_gicv3_its: Factor out "is intid a valid LPI ID?"
  hw/intc/arm_gicv3_its: Implement GITS_BASER2 for GICv4
  hw/intc/arm_gicv3_its: Implement VMAPI and VMAPTI
  hw/intc/arm_gicv3_its: Implement VMAPP
  hw/intc/arm_gicv3_its: Distinguish success and error cases of CMD_CONTINUE
  hw/intc/arm_gicv3_its: Factor out "find ITE given devid, eventid"
  hw/intc/arm_gicv3_its: Factor out CTE lookup sequence
  hw/intc/arm_gicv3_its: Split out process_its_cmd() physical interrupt code
  hw/intc/arm_gicv3_its: Handle virtual interrupts in process_its_cmd()
  hw/intc/arm_gicv3: Keep pointers to every connected ITS
  hw/intc/arm_gicv3_its: Implement VMOVP
  hw/intc/arm_gicv3_its: Implement VSYNC
  hw/intc/arm_gicv3_its: Implement INV command properly
  hw/intc/arm_gicv3_its: Implement INV for virtual interrupts
  hw/intc/arm_gicv3_its: Implement VMOVI
  hw/intc/arm_gicv3_its: Implement VINVALL
  hw/intc/arm_gicv3: Implement GICv4's new redistributor frame
  hw/intc/arm_gicv3: Implement new GICv4 redistributor registers
  hw/intc/arm_gicv3_cpuif: Split "update vIRQ/vFIQ" from
    gicv3_cpuif_virt_update()
  hw/intc/arm_gicv3_cpuif: Support vLPIs
  hw/intc/arm_gicv3_cpuif: Don't recalculate maintenance irq unnecessarily
  hw/intc/arm_gicv3_redist: Factor out "update hpplpi for one LPI" logic
  hw/intc/arm_gicv3_redist: Factor out "update hpplpi for all LPIs" logic
  hw/intc/arm_gicv3_redist: Recalculate hppvlpi on VPENDBASER writes
  hw/intc/arm_gicv3_redist: Factor out "update bit in pending table" code
  hw/intc/arm_gicv3_redist: Implement gicv3_redist_process_vlpi()
  hw/intc/arm_gicv3_redist: Implement gicv3_redist_vlpi_pending()
  hw/intc/arm_gicv3_redist: Use set_pending_table_bit() in mov handling
  hw/intc/arm_gicv3_redist: Implement gicv3_redist_mov_vlpi()
  hw/intc/arm_gicv3_redist: Implement gicv3_redist_vinvall()
  hw/intc/arm_gicv3_redist: Implement gicv3_redist_inv_vlpi()
  hw/intc/arm_gicv3: Update ID and feature registers for GICv4
  hw/intc/arm_gicv3: Allow 'revision' property to be set to 4
  hw/arm/virt: Use VIRT_GIC_VERSION_* enum values in create_gic()
  hw/arm/virt: Abstract out calculation of redistributor region capacity
  hw/arm/virt: Support TCG GICv4

 docs/system/arm/virt.rst               |   5 +-
 hw/intc/gicv3_internal.h               | 231 ++++++-
 include/hw/arm/virt.h                  |  19 +-
 include/hw/intc/arm_gicv3_common.h     |  13 +
 include/hw/intc/arm_gicv3_its_common.h |   1 +
 hw/arm/virt.c                          | 102 ++-
 hw/intc/arm_gicv3_common.c             |  54 +-
 hw/intc/arm_gicv3_cpuif.c              | 195 +++++-
 hw/intc/arm_gicv3_dist.c               |   7 +-
 hw/intc/arm_gicv3_its.c                | 876 ++++++++++++++++++++-----
 hw/intc/arm_gicv3_its_kvm.c            |   2 +
 hw/intc/arm_gicv3_kvm.c                |   5 +
 hw/intc/arm_gicv3_redist.c             | 480 +++++++++++---
 target/arm/cpu.c                       |  12 +-
 hw/intc/trace-events                   |  18 +-
 15 files changed, 1695 insertions(+), 325 deletions(-)

-- 
2.25.1
Re: [PATCH 00/41] arm: Implement GICv4
Posted by Peter Maydell 2 years, 1 month ago
On Fri, 8 Apr 2022 at 15:15, Peter Maydell <peter.maydell@linaro.org> wrote:
>
> This patchset implements emulation of GICv4 in our TCG GIC and ITS
> models, and makes the virt board use it where appropriate.

> Tested with a Linux kernel passing through a virtio-blk device
> to an inner Linux VM with KVM/QEMU. (NB that to get the outer
> Linux kernel to actually use the new GICv4 functionality you
> need to pass it "kvm-arm.vgic_v4_enable=1", as the kernel
> will not use it by default.)

I guess I might as well post my notes here about how I set up
that test environment. These are a bit too scrappy (and rather
specific about a niche thing) to be proper documentation, but
having them in the list archives might be helpful in future...

===nested-setup.txt===
How to set up an environment to test QEMU's emulation of virtualization,
with PCI passthrough of a virtio-blk-pci device to the L2 guest

(1) Set up a Debian aarch64 guest (the instructions in the old
blog post
https://translatedcode.wordpress.com/2017/07/24/installing-debian-on-qemus-64-bit-arm-virt-board/
still work; I used Debian bullseye for my testing).

(2) Copy the hda.qcow2 to hda-for-inner.qcow2; run the L1 guest
using the 'runme' script.

Caution: the virtio devices need to be in this order (hda.qcow2,
network, hda-for-inner.qcow2), because systemd in the guest names
the ethernet interface based on which PCI slot it goes in.

(3) In the L1 guest, first we need to fix up the hda-for-inner.qcow2
so that it has different UUIDs and partition UUIDs from hda.qcow2.
You'll need to make sure you have the blkid, gdisk, tune2fs, swaplabel
utilities installed in the guest.

 swapoff -a   # L1 guest might have swapped onto /dev/vdb3 by accident
 # print current partition IDs; you'll see that vda and vdb currently
 # share IDs for their partitions, and we must change those for vdb
 blkid
 # first change the PARTUUIDs with gdisk; this is the answer from
 # https://askubuntu.com/questions/1250224/how-to-change-partuuid
 gdisk /dev/vdb
 x   # change to experts menu
 c   # change partition ID
 1   # for partition 1
 R   # pick a random ID
 c   # ditto for partitions 2, 3
 2
 R
 c
 3
 R
 m   # back to main menu
 w   # write partition table
 q   # quit
 # change UUIDs; from
 # https://unix.stackexchange.com/questions/12858/how-to-change-filesystem-uuid-2-same-uuid
 tune2fs -U random /dev/vdb1
 tune2fs -U random /dev/vdb2
 swaplabel -U $(uuidgen) /dev/vdb3
 # Check the UUIDs and PARTUUIDs are now all changed:
 blkid
 # Now update the fstab in the L2 filesystem:
 mount /dev/vdb2 /mnt
 # Finally, edit /mnt/etc/fstab to set the UUID values for /, /boot and swap to
 # the new ones for /dev/vdb's partitions
 vi /mnt/etc/fstab # or editor of your choice
 umount /mnt
 # shutdown the L1 guest now, to ensure that all the changes to that
 # qcow2 file are committed
 shutdown -h now

(4) Copy necessary files into the L1 guest's filesystem;
you can run the L1 guest and run scp there to copy from your host machine,
or any other method you like. You'll need:
 - the vmlinuz (same one being used for L1)
 - the initrd
 - some scripts [runme-inner, runme-inner-nopassthru, reassign-vdb]
 - a copy of hda-for-inner.qcow2 (probably best to copy it to a temporary
   file while the L1 guest is not running, then copy that into the guest)
 - the qemu-system-aarch64 you want to use as the L2 QEMU
   (I cross-compiled this on my x86-64 host. The packaged Debian bullseye
   qemu-system-aarch64 will also work if you don't need to use a custom
   QEMU for L2.)

(5) Now you can run the L2 guest without using PCI passthrough like this:
 ./runme-inner-nopassthru ./qemu-system-aarch64

(6) And you can run the L2 guest with PCI passthrough like this:
 # you only need to run reassign-vdb once for any given run of the
 # L1 guest, to give the PCI device to vfio-pci rather than to the
 # L1 virtio driver. After that you can run the L2 QEMU multiple times.
 ./reassign-vdb
 ./runme-inner ./qemu-system-aarch64

Notes:

I have set up the various 'runme' scripts so that L1 has a mux of
stdio and the monitor, which means that you can kill it with ^A-x,
and ^C will be delivered to the L1 guest. The L2 guest has plain
'-serial stdio', which means that ^C will kill the L2 guest.

The 'runme' scripts expect their first argument to be the path to
the QEMU you want to run; any further arguments are extra arguments
to that QEMU. So you can do things like:

   # pass more arguments to QEMU, here disabling the ITS
   ./runme ~/qemu-system-aarch64 -machine its=off
   # run gdb, and run QEMU under gdb
   ./runme gdb --args ~/qemu-system-aarch64 -machine its=off

The 'runme' scripts should be in the same directory as the
kernel etc files they go with; but you don't need to be
in that directory to run them.
===endit===

===runme===
#!/bin/sh -e
TESTDIR="$(cd "$(dirname "$0")"; pwd)"
QEMU="$@"

# Run with GICv3 and the disk image with a nested copy in it
# (for testing EL2/GICv3-virt emulation)
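
# The ':=' lines below set defaults; override from the environment, eg:
#   KERNEL=/path/to/other-vmlinuz ./runme ./qemu-system-aarch64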

: ${KERNEL:=$TESTDIR/vmlinuz-5.10.0-9-arm64}
: ${INITRD:=$TESTDIR/initrd.img-5.10.0-9-arm64}
: ${DISK:=$TESTDIR/hda.qcow2}
: ${INNERDISK:=$TESTDIR/hda-for-inner.qcow2}

# Note that the virtio-net-pci must be the 2nd PCI device,
# because otherwise the network interface name it gets will
# not match /etc/network/interfaces.

# set up with -serial mon:stdio so we can ^C the inner QEMU

IOMMU_ADDON=',iommu_platform=on,disable-modern=off,disable-legacy=on'

${QEMU} \
  -cpu cortex-a57 \
  -machine type=virt \
  -machine gic-version=max \
  -machine virtualization=true \
  -machine iommu=smmuv3 \
  -m 1024M \
  -kernel "${KERNEL}" -initrd "${INITRD}" \
  -drive if=none,id=mydrive,file="${DISK}",format=qcow2 \
  -device virtio-blk-pci,drive=mydrive \
  -netdev user,id=mynet \
  -device virtio-net-pci,netdev=mynet \
  -drive if=none,id=innerdrive,file="${INNERDISK}",format=qcow2 \
  -device virtio-blk-pci,drive=innerdrive"$IOMMU_ADDON" \
  -append 'console=ttyAMA0,38400 keep_bootcon root=/dev/vda2 kvm-arm.vgic_v4_enable=1' \
  -chardev socket,id=monitor,host=127.0.0.1,port=4444,server=on,wait=off,telnet=on \
  -mon chardev=monitor,mode=readline \
  -display none -serial mon:stdio
===endit===

===reassign-vdb===
#!/bin/sh -e
# Script to detach the /dev/vdb PCI device from the virtio-blk driver
# and hand it to vfio-pci

PCIDEV=0000:00:03.0

echo -n "$PCIDEV" > /sys/bus/pci/drivers/virtio-pci/unbind
modprobe vfio-pci

echo vfio-pci > /sys/bus/pci/devices/"$PCIDEV"/driver_override

echo -n "$PCIDEV" > /sys/bus/pci/drivers/vfio-pci/bind
===endit===

===runme-inner===
#!/bin/sh -e
TESTDIR="$(cd "$(dirname "$0")"; pwd)"
QEMU="$@"

# run the inner guest, passing it the passthrough PCI device
: ${KERNEL:=$TESTDIR/vmlinuz-5.10.0-9-arm64}
: ${INITRD:=$TESTDIR/initrd.img-5.10.0-9-arm64}

# set up with -serial stdio so we can ^C the inner QEMU
# use -net none to work around the default virtio-net-pci
# network device wanting to load efi-virtio.rom, which the
# L1 guest's debian package puts somewhere other than where
# our locally compiled qemu-system-aarch64 wants to find it.

${QEMU} \
  -cpu cortex-a57 \
  -enable-kvm \
  -machine type=virt \
  -machine gic-version=3 \
  -m 256M \
  -kernel "${KERNEL}" -initrd "${INITRD}" \
  -append 'console=ttyAMA0,38400 keep_bootcon root=/dev/vda2' \
  -display none -serial stdio \
  -device vfio-pci,host=0000:00:03.0,id=pci0 \
  -net none
===endit===

===runme-inner-nopassthru===
#!/bin/sh -e
TESTDIR="$(cd "$(dirname "$0")"; pwd)"
QEMU="$@"

# run the inner guest, passing it a disk image
: ${KERNEL:=$TESTDIR/vmlinuz-5.10.0-9-arm64}
: ${INITRD:=$TESTDIR/initrd.img-5.10.0-9-arm64}
: ${DISK:=$TESTDIR/hda-for-inner.qcow2}

# set up with -serial stdio so we can ^C the inner QEMU
# use -net none to work around the default virtio-net-pci
# network device wanting to load efi-virtio.rom, which the
# L1 guest's debian package puts somewhere other than where
# our locally compiled qemu-system-aarch64 wants to find it.

${QEMU} \
  -cpu cortex-a57 \
  -enable-kvm \
  -machine type=virt \
  -machine gic-version=3 \
  -m 256M \
  -kernel "${KERNEL}" -initrd "${INITRD}" \
  -drive if=none,id=mydrive,file="${DISK}",format=qcow2 \
  -device virtio-blk-pci,drive=mydrive \
  -append 'console=ttyAMA0,38400 keep_bootcon root=/dev/vda2' \
  -display none -serial stdio \
  -net none
===endit===

-- PMM