[Qemu-devel] [RFC PATCH 00/30] Xen Q35 Bringup patches + support for PCIe Extended Capabilities for passed through devices

Alexey Gerasimenko posted 30 patches 6 years ago
Posted by Alexey Gerasimenko 6 years ago
This patch series introduces support for Q35 emulation for Xen HVM guests
(via QEMU). This feature is present in other virtualization products, and
Xen can greatly benefit from it as well.

The main goal of implementing Q35 emulation for Xen was to extend PCI/GPU
passthrough capabilities. The main advantage of Q35 emulation is the
availability of extra features for PCIe device passthrough. The most
important PCIe-specific passthrough feature Q35 provides is support for
PCIe config space ECAM (aka MMCONFIG), which is MMIO-based and allows
access to the extended PCIe config space (beyond the first 256 bytes).
Lots of PCIe devices and their drivers make use of PCIe Extended
Capabilities, which can be accessed only via ECAM, at offsets 0x100 and
above in PCI config space. Supporting ECAM is mandatory for PCIe
passthrough. Not only does this allow passed-through PCIe devices to
function properly, it also opens the road to extending Xen PCIe passthrough
features further -- e.g. providing support for AER. One possible direction
is support for PCIe Resizable BARs -- a feature which is likely to become
common for modern GPUs as video memory sizes increase.
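
To make the ECAM addressing concrete, here is a minimal Python sketch (not
part of this series) of how a config register offset maps to an MMIO
address, and how a PCIe Extended Capability header is decoded. The bit
layout follows the PCIe specification; the function names and the base
address used in the note below are illustrative assumptions only.

```python
ECAM_BUS_SHIFT = 20  # 1 MiB of config space per bus
ECAM_DEV_SHIFT = 15  # 32 KiB per device
ECAM_FN_SHIFT = 12   # 4 KiB per function


def ecam_address(base, bus, dev, fn, offset):
    """Return the MMIO address of a config register under ECAM."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= fn < 8):
        raise ValueError("invalid bus/device/function")
    if not 0 <= offset < 4096:
        raise ValueError("offset outside the 4 KiB config space")
    return base + ((bus << ECAM_BUS_SHIFT) | (dev << ECAM_DEV_SHIFT)
                   | (fn << ECAM_FN_SHIFT) | offset)


def decode_ext_cap_header(dword):
    """Split a 32-bit Extended Capability header into (id, version, next)."""
    cap_id = dword & 0xFFFF            # bits 15:0
    version = (dword >> 16) & 0xF      # bits 19:16
    next_off = (dword >> 20) & 0xFFC   # bits 31:20, dword-aligned
    return cap_id, version, next_off
```

With a hypothetical ECAM base of 0xE0000000, register 0x100 of device
0:1F.0 would sit at ecam_address(0xE0000000, 0, 0x1F, 0, 0x100), i.e.
0xE00F8100. Offsets 0x100 and above have no encoding in the legacy
0xCF8/0xCFC port mechanism, which is why ECAM is needed to reach them.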

Q35 emulation may also be useful for other purposes. In fact, emulating
a more recent chipset partially closes a huge gap between the set of
required platform features and the actual capabilities of the emulated
platform: a lot of the required functionality is missing in a real i440
chipset. One can look at the IGD passthru support patches from Intel, for
example: according to code comments, they had to create a dummy PCI-ISA
bridge at BDF 0:1F.0 to make the old i440 system look more modern, just
to make it compatible with the IGD driver. Using Q35 emulation with its own
emulated LPC bridge avoids workarounds like this. i440 on its own is
a fairly outdated system and doesn't really support a lot of things, such
as an MMIO hole above 4 GB (although it is actually emulated). Also, due to
the i440 chipset's age, the mere fact of its use may serve malicious
software as a reliable way to detect a virtualized environment, especially
since i440 emulation is shared among multiple virtualization products.

On top of this series I've also implemented a solution to the existing
Xen puzzle with HVM memory layout -- handling of VRAM, RMRRs and the MMIO
hole in general. This "puzzle" (memory layout inconsistency between
libxl/libxc, hvmloader and QEMU) is a fundamental problem which has
plagued Xen for years and, among a few other issues, prevents Xen from
becoming a decent GPU/PCIe passthrough platform (which it should be). This
solution also makes it possible to later resolve current PCI passthrough
incompatibility issues, e.g. with Populate-on-Demand. In fact, i440 support
has been added as well, but it's a bit hacky as it uses NB registers which
are not present in a real i440 (well, one more non-existent i440 feature
won't hurt, as there are plenty of them already). I'm planning to send RFC
patches for this solution once the current patches are reviewed and the
related code settles, so they can be rebased on top of it. Also, a good
description is required as the change is rather radical.

The good thing is that providing Q35 support for Xen at this stage neither
breaks any existing functionality nor affects the legacy i440 emulation
in any way - Q35 emulation is enabled only on demand, using a new
domain config option. Also, only existing interfaces are used: no new
hypercalls were introduced, no API changes, etc. In the future, though,
we'll have to change some hypercall/QMP/etc interfaces to remove
limitations and extend the Q35/PCIe passthru support further.

Current features and limitations:
- All basic functionality works normally - MP, networking, storage (AHCI),
  powering down VMs via ACPI soft off, etc
- Xen Platform Device and PV devices are supported -- PV drivers for vbd,
  vif, etc may be installed and used
- PCIe ECAM is fully supported, including allocating space for PCIEXBAR in
  the MMIO hole, ACPI MCFG generation, etc.
- Xen is limited to a maximum of 4 PIRQs in multiple places, while Q35
  supports 8 PIRQs / PCI router links. This was worked around by describing
  only 4 usable IRQ link entries in the ACPI tables and disabling
  PIRQE..PIRQH -- as if we were on a real system which has only some of the
  8 available PIRQs physically connected on the chipset. Extending the
  number of supported PCI links is trivial, but this step will change the
  save/migration stream format a bit... although it seems some room was
  actually left for this extension -- e.g. the field uint8_t route[4]
  followed by uint8_t pad0[4] in the hvm_hw_pci_link structure. Anyway,
  this is not a real problem as we normally deal with APIC mode (or MSIs)
  for IRQ delivery, while PIC mode with PCI routing is needed only for
  legacy compatibility.
- PCI hotplug is currently implemented via ACPI hotplug, in a way similar
  to i440. In future, this might be changed to native PCIe hotplug
  facilities (if there is a benefit).
- For PCIe passthrough to work on Windows 7 and above, a specific
  workaround was implemented, which allows PCIe device passthrough to be
  used normally on those guest OSes. In future, this should be replaced by
  a new emulated PCI architecture for Xen -- providing support for simple
  PCI hierarchies, nested MMIO spaces, etc. Basically, we need at least
  to provide support for PCI-PCI bridges (PCIe Root Ports in our case).
  Currently Xen is limited to bus 0 in many places, even in hypercall
  parameters. A detailed description of the issue can be found in the patch
  named "xen/pt: Xen PCIe passthrough support for Q35: bypass PCIe topology
  check".
- VM migration was not tested as the feature primarily targets PCIe
  passthrough, which isn't compatible with migration anyway.
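
As a rough illustration of the PIRQ point above, the following Python
sketch models the record layout mentioned there (uint8_t route[4] followed
by uint8_t pad0[4]) and shows that reusing the pad bytes for four more
routes would keep the record size at 8 bytes. The "extended" layout and the
helper names are hypothetical; nothing here is taken from the actual Xen
save/restore code.

```python
import struct

# Layout as described in the text: uint8_t route[4]; uint8_t pad0[4];
CURRENT_FMT = "<4B4x"   # 4 route bytes, then 4 zeroed pad bytes
# Hypothetical extension: pad bytes reused as routes for PIRQE..PIRQH
EXTENDED_FMT = "<8B"


def pack_links(routes):
    """Pack the 4 existing PIRQ routes into the current record layout."""
    if len(routes) != 4:
        raise ValueError("expected exactly 4 routes")
    return struct.pack(CURRENT_FMT, *routes)


def unpack_as_extended(record):
    """Read a record as 8 routes (assumed layout, for illustration)."""
    return list(struct.unpack(EXTENDED_FMT, record))


# Both layouts occupy the same number of bytes.
assert struct.calcsize(CURRENT_FMT) == struct.calcsize(EXTENDED_FMT) == 8
```

A record written with the current layout would then read back under the
assumed extended layout with zeros in the four extra slots. Whether the
real stream format can be extended this way is exactly the open question
raised above.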

How to use the Q35 feature:

A new domain config option was implemented: device_model_machine. It's
a string with the following possible values:
- "i440" -- i440 emulation (default)
- "q35"  -- emulate a Q35 machine. By default, the storage interface is
  AHCI.

Note that omitting the device_model_machine parameter selects the i440
system, so the default behavior doesn't change for old domain config
files.

So, in order to enable Q35 emulation one needs to specify the following
option in the domain config file:
device_model_machine="q35"
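
The option semantics described above (two accepted values, i440 when the
option is omitted) can be sketched as follows. This is only an illustration
of the intended behavior, written in Python for brevity; the function and
constant names are made up and do not correspond to the libxl code in this
series.

```python
VALID_MACHINES = ("i440", "q35")
DEFAULT_MACHINE = "i440"  # used when the option is absent


def resolve_machine(config):
    """Return the machine type selected by a parsed domain config dict."""
    value = config.get("device_model_machine", DEFAULT_MACHINE)
    if value not in VALID_MACHINES:
        raise ValueError("unknown device_model_machine: %r" % (value,))
    return value
```

An old config file without the option resolves to "i440", so existing
domains keep their current platform.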

It is recommended to install the guest OS from scratch to avoid issues due
to the emulated platform change.

One extra note - if you're going to backport this series to some older QEMU
version, make sure you have this patch for the AHCI DMA bug applied: [1].
Otherwise you will encounter random Q35 guest hangs with a "Bad RAM
offset" message logged in /var/log/xen. Recent QEMU versions already have
this patch committed.

Also, commit [2] must be applied (for xen-pt.c) -- it is currently
available in upstream QEMU, but not present in qemu-xen.

This is my first (somewhat) large contribution to Xen, so some mistakes
are to be expected. Most testing was done using a previous version of the
patches and Xen 4.8.x.

I plan to support and extend this series further; for now I'm looking
forward to comments, suggestions, testing results and bug reports.

[1]: https://lists.xen.org/archives/html/xen-devel/2017-07/msg01077.html
[2]: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg03572.html

Xen changes:
Alexey Gerasimenko (12):
  libacpi: new DSDT ACPI table for Q35
  Makefile: build and use new DSDT table for Q35
  hvmloader: add function to query an emulated machine type (i440/Q35)
  hvmloader: add ACPI enabling for Q35
  hvmloader: add Q35 DSDT table loading
  hvmloader: add basic Q35 support
  hvmloader: allocate MMCONFIG area in the MMIO hole + minor code
    refactoring
  libxl: Q35 support (new option device_model_machine)
  libxl: Xen Platform device support for Q35
  libacpi: build ACPI MCFG table if requested
  hvmloader: use libacpi to build MCFG table
  docs: provide description for device_model_machine option

 docs/man/xl.cfg.pod.5.in             |  27 ++
 tools/firmware/hvmloader/Makefile    |   2 +-
 tools/firmware/hvmloader/config.h    |   5 +
 tools/firmware/hvmloader/hvmloader.c |  11 +-
 tools/firmware/hvmloader/pci.c       | 289 ++++++++++++------
 tools/firmware/hvmloader/pci_regs.h  |   7 +
 tools/firmware/hvmloader/util.c      | 130 ++++++++-
 tools/firmware/hvmloader/util.h      |  10 +
 tools/libacpi/Makefile               |   9 +-
 tools/libacpi/acpi2_0.h              |  21 ++
 tools/libacpi/build.c                |  42 +++
 tools/libacpi/dsdt_q35.asl           | 551 +++++++++++++++++++++++++++++++++++
 tools/libacpi/libacpi.h              |   4 +
 tools/libxl/libxl_dm.c               |  20 +-
 tools/libxl/libxl_types.idl          |   7 +
 tools/xl/xl_parse.c                  |  14 +
 16 files changed, 1051 insertions(+), 98 deletions(-)
 create mode 100644 tools/libacpi/dsdt_q35.asl

QEMU changes:
Alexey Gerasimenko (18):
  pc/xen: Xen Q35 support: provide IRQ handling for PCI devices
  pc/q35: Apply PCI bus BSEL property for Xen PCI device hotplug
  q35/acpi/xen: Provide ACPI PCI hotplug interface for Xen on Q35
  q35/xen: Add Xen platform device support for Q35
  q35: Fix incorrect values for PCIEXBAR masks
  xen/pt: XenHostPCIDevice: provide functions for PCI Capabilities and
    PCIe Extended Capabilities enumeration
  xen/pt: avoid reading PCIe device type and cap version multiple times
  xen/pt: determine the legacy/PCIe mode for a passed through device
  xen/pt: Xen PCIe passthrough support for Q35: bypass PCIe topology
    check
  xen/pt: add support for PCIe Extended Capabilities and larger config
    space
  xen/pt: handle PCIe Extended Capabilities Next register
  xen/pt: allow to hide PCIe Extended Capabilities
  xen/pt: add Vendor-specific PCIe Extended Capability descriptor and
    sizing
  xen/pt: add fixed-size PCIe Extended Capabilities descriptors
  xen/pt: add AER PCIe Extended Capability descriptor and sizing
  xen/pt: add descriptors and size calculation for
    RCLD/ACS/PMUX/DPA/MCAST/TPH/DPC PCIe Extended Capabilities
  xen/pt: add Resizable BAR PCIe Extended Capability descriptor and
    sizing
  xen/pt: add VC/VC9/MFVC PCIe Extended Capabilities descriptors and
    sizing

 hw/acpi/ich9.c               |   24 +
 hw/acpi/pcihp.c              |    8 +-
 hw/core/machine.c            |   21 +
 hw/i386/pc_q35.c             |   27 +-
 hw/i386/xen/xen-hvm.c        |   32 +-
 hw/isa/lpc_ich9.c            |    4 +
 hw/pci-host/piix.c           |    2 +-
 hw/pci-host/q35.c            |   14 +-
 hw/xen/xen-host-pci-device.c |  110 ++++-
 hw/xen/xen-host-pci-device.h |    6 +-
 hw/xen/xen_pt.c              |   53 +-
 hw/xen/xen_pt.h              |   19 +-
 hw/xen/xen_pt_config_init.c  | 1109 +++++++++++++++++++++++++++++++++++++++---
 include/hw/acpi/ich9.h       |    2 +
 include/hw/acpi/pcihp.h      |    2 +
 include/hw/boards.h          |    1 +
 include/hw/i386/ich9.h       |    1 +
 include/hw/i386/pc.h         |    3 +
 include/hw/pci-host/q35.h    |    4 +-
 include/hw/xen/xen.h         |    5 +-
 qemu-options.hx              |    1 +
 stubs/xen-hvm.c              |    8 +-
 22 files changed, 1333 insertions(+), 123 deletions(-)

-- 
2.11.0


Re: [Qemu-devel] [RFC PATCH 00/30] Xen Q35 Bringup patches + support for PCIe Extended Capabilities for passed through devices
Posted by Daniel P. Berrangé 6 years ago
The subject line says to expect 30 patches, but you've only sent 18 to
the list here. I eventually figured out that the first 12 patches were
in Xen code and so not sent to qemu-devel.

In future, if you have changes that affect multiple completely separate
projects, send them as separate series, i.e. just send PATCH 00/18 to
qemu-devel so it doesn't look like a bunch of patches have gone missing.

On Tue, Mar 13, 2018 at 04:33:45AM +1000, Alexey Gerasimenko wrote:
> How to use the Q35 feature:
> 
> A new domain config option was implemented: device_model_machine. It's
> a string which has following possible values:
> - "i440" -- i440 emulation (default)
> - "q35"  -- emulate a Q35 machine. By default, the storage interface is
>   AHCI.

Presumably this maps to the QEMU -machine arg, so it feels desirable
to keep the same naming scheme, i.e. allow any of the versioned machine
names that QEMU uses: e.g. any of the "pc-q35-2.x" versioned types, or
'q35' as an alias for the latest, and the "pc-i440fx-2.x" versioned types,
or 'pc' as an alias for the latest, rather than 'i440', which needlessly
diverges from the QEMU machine type.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC PATCH 00/30] Xen Q35 Bringup patches + support for PCIe Extended Capabilities for passed through devices
Posted by Alexey G 6 years ago
On Tue, 13 Mar 2018 09:21:54 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

>The subject line says to expect 30 patches, but you've only sent 18 to
>the list here. I eventually figured out that the first 12 patches were
>in Xen code and so not sent to qemu-devel.
>
>For future if you have changes that affect multiple completely separate
>projects, send them as separate series. ie just send PATCH 00/18 to
>QEMU devel so it doesn't look like a bunch of patches have gone
>missing.

OK, will do for the next versions.

>> A new domain config option was implemented: device_model_machine.
>> It's a string which has following possible values:
>> - "i440" -- i440 emulation (default)
>> - "q35"  -- emulate a Q35 machine. By default, the storage interface
>> is AHCI.  
>
>Presumably this is mapping to the QEMU -machine arg, so it feels
>desirable to keep the same naming scheme. ie allow any of the
>versioned machine names that QEMU uses. eg any of "pc-q35-2.x"
>versioned types, or 'q35' as an alias for latest, and use
>"pc-i440fx-2.x" versioned types of 'pc' as an alias for latest, rather
>than 'i440' which is needlessly diverging from the QEMU machine type.

Yes, it is translated into the '-machine' argument.

A direct mapping between the Xen device_model_machine option and the QEMU
'-machine' argument won't be accepted by the Xen maintainers, I guess.

The main problem with this approach is the requirement to keep Xen/libxl
and QEMU versions matched. If, for example, device_model_machine says
something like "pc-q35-2.11" and we later downgrade QEMU to some older
version, we'll likely have a problem without having changed anything in
the domain config. So I guess the "use the latest available" approach for
machine selection (pc, q35, etc) is the only possible option. Perhaps
having a way to specify the exact QEMU machine name and version in
a separate domain config parameter (for advanced use) might be feasible.

Also, parameter names do not speak for themselves, I'm afraid. This way
we'll have, for example, device_model_machine="pc" vs
device_model_machine="q35"... a bit unclear I think. This may be
obvious for a QEMU user, but many Xen users aren't used to QEMU
machine types and might wonder why "q35" is not "pc" and
why "pc" is precisely an i440 system.

Another obstacle here is the xen_platform_device option, which indirectly
selects the QEMU machine type for i440 at the moment (pc/xenfv), but this
may be addressed by controlling the Xen platform device independently
via a separate machine property or '-device xen-platform', as
Eduardo Habkost suggested.

Re: [Qemu-devel] [RFC PATCH 00/30] Xen Q35 Bringup patches + support for PCIe Extended Capabilities for passed through devices
Posted by Daniel P. Berrangé 6 years ago
On Tue, Mar 13, 2018 at 09:37:55PM +1000, Alexey G wrote:
> On Tue, 13 Mar 2018 09:21:54 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> >The subject line says to expect 30 patches, but you've only sent 18 to
> >the list here. I eventually figured out that the first 12 patches were
> >in Xen code and so not sent to qemu-devel.
> >
> >For future if you have changes that affect multiple completely separate
> >projects, send them as separate series. ie just send PATCH 00/18 to
> >QEMU devel so it doesn't look like a bunch of patches have gone
> >missing.
> 
> OK, we'll do for next versions.
> 
> >> A new domain config option was implemented: device_model_machine.
> >> It's a string which has following possible values:
> >> - "i440" -- i440 emulation (default)
> >> - "q35"  -- emulate a Q35 machine. By default, the storage interface
> >> is AHCI.  
> >
> >Presumably this is mapping to the QEMU -machine arg, so it feels
> >desirable to keep the same naming scheme. ie allow any of the
> >versioned machine names that QEMU uses. eg any of "pc-q35-2.x"
> >versioned types, or 'q35' as an alias for latest, and use
> >"pc-i440fx-2.x" versioned types of 'pc' as an alias for latest, rather
> >than 'i440' which is needlessly diverging from the QEMU machine type.
> 
> Yes, it is translated into the '-machine' argument.
> 
> A direct mapping between the Xen device_model_machine option and QEMU
> '-machine' argument won't be accepted by Xen maintainers I guess.
> 
> The main problem with this approach is a requirement to have a match
> between Xen/libxl and QEMU versions. If, for example,
> device_model_machine tells something like "pc-q35-2.11" and later we
> downgrade QEMU to some older version we'll likely have a problem
> without changing anything in the domain config. So I guess the "use the
> latest available" approach for machine selection (pc, q35, etc) is the
> only possible option. Perhaps having the way to specify the exact QEMU
> machine name and version in a separate domain config parameter (for
> advanced use) might be feasible.

At least with plain QEMU or KVM, using the versioned machine type
names is important as that is what guarantees you a stable guest
machine ABI, independent of the QEMU version.  If your deployment has
a mixture of QEMU versions on different hosts, then you very much
want to pick a versioned machine type to ensure compatibility for
live migration. With libvirt we accept the short "pc" or "q35"
names on input, but expand them to the fully versioned name
when saving the config file, so no matter which QEMU version is
used each time the guest is launched, the ABI is always the same.
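
A minimal sketch of that expand-on-save idea, in Python and with a
hard-coded alias table: the version numbers below are purely illustrative
assumptions, and the names are made up. A real tool would ask the installed
QEMU binary which versioned machine type each alias currently resolves to.

```python
# Hypothetical alias table; the versioned names are illustrative only.
ALIASES = {
    "pc": "pc-i440fx-2.11",
    "q35": "pc-q35-2.11",
}


def expand_machine(name, aliases=ALIASES):
    """Expand a short alias to a versioned name; pass other names through."""
    return aliases.get(name, name)


def save_config(config, aliases=ALIASES):
    """Return a copy of the config with the machine name pinned."""
    pinned = dict(config)
    pinned["machine"] = expand_machine(config.get("machine", "pc"), aliases)
    return pinned
```

Once the pinned name is stored, a later QEMU upgrade that moves the alias
to a newer version does not change what the guest sees.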

> 
> Also, parameter names do not speak for themselves I'm afraid. This way
> we'll have, for example, device_model_machine="pc" vs
> device_model_machine="q35"... a bit unclear I think. This may be
> obvious for a QEMU user, but many Xen users didn't get used to QEMU
> machines and there might be some wondering why "q35" is not "pc" and
> why "pc" is an i440 system precisely.
> 
> Another obstacle here is xen_platform_device option which indirectly
> selects QEMU machine type for i440 at the moment (pc/xenfv), but this
> may be addressed by controlling the Xen platform device independently
> via a separate machine property or '-device xen-platform' like
> Eduardo Habkost suggested.

Regards,
Daniel