[PATCH v4] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute

Roger Pau Monne posted 1 patch 4 months, 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20250610162930.89055-1-roger.pau@citrix.com
CHANGELOG.md                            |  3 ++
docs/man/xl.cfg.5.pod.in                |  8 ++++
docs/misc/xenstore-paths.pandoc         |  5 +++
tools/firmware/hvmloader/config.h       |  2 +-
tools/firmware/hvmloader/pci.c          | 50 ++++++++++++++++++++++++-
tools/firmware/hvmloader/util.c         |  2 +-
tools/include/libxl.h                   |  9 +++++
tools/libs/light/libxl_create.c         |  1 +
tools/libs/light/libxl_dom.c            |  9 +++++
tools/libs/light/libxl_types.idl        |  1 +
tools/xl/xl_parse.c                     |  2 +
xen/include/public/hvm/hvm_xs_strings.h |  2 +
12 files changed, 90 insertions(+), 4 deletions(-)
[PATCH v4] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute
Posted by Roger Pau Monne 4 months, 3 weeks ago
The Xen PCI device (vendor ID 0x5853) exposed to x86 HVM guests doesn't
have the functionality of a traditional PCI device.  The exposed MMIO BAR
is used by some guests (including Linux) as a safe place to map foreign
memory, including the grant table itself.

Traditionally BARs from devices have the uncacheable (UC) cache attribute
from the MTRR, to ensure correct functionality of such devices.  hvmloader
mimics this behavior and sets the MTRR attributes of both the low and high
PCI MMIO windows (where BARs of PCI devices reside) as UC in MTRR.

This however causes performance issues for users of the Xen PCI device BAR,
as for the purposes of mapping remote memory there's no need to use the UC
attribute.  On Intel systems this is worked around by using iPAT, that
allows the hypervisor to force the effective cache attribute of a p2m entry
regardless of the guest PAT value.  AMD however doesn't have an equivalent
of iPAT, and guest PAT values are always considered.

Linux commit:

41925b105e34 xen: replace xen_remap() with memremap()

Attempted to mitigate this by forcing mappings of the grant-table to use
the write-back (WB) cache attribute.  However Linux memremap() takes MTRRs
into account to calculate which PAT type to use, and seeing the MTRR cache
attribute for the region being UC the PAT also ends up as UC, regardless of
the caller having requested WB.

As a workaround to allow current Linux to map the grant-table as WB using
memremap() introduce an xl.cfg option (xenpci_bar_uc=0) that can be used to
select whether the Xen PCI device BAR will have the UC attribute in MTRR.
Such workaround in hvmloader should also be paired with a fix for Linux so
it attempts to change the MTRR of the Xen PCI device BAR to WB by itself.

Overall, the long term solution would be to provide the guest with a safe
range in the guest physical address space where mappings to foreign pages
can be created.

Some vif throughput performance figures provided by Anthoine from a 8
vCPUs, 4GB of RAM HVM guest(s) running on AMD hardware:

Without this patch:
vm -> dom0: 1.1Gb/s
vm -> vm:   5.0Gb/s

With the patch:
vm -> dom0: 4.5Gb/s
vm -> vm:   7.0Gb/s

Reported-by: Anthoine Bourgeois <anthoine.bourgeois@vates.tech>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Oleksii Kurochko<oleksii.kurochko@gmail.com>
---
Changes since v3:
 - Add changelog.
 - Assign BARs normally for xenpci if special casing fails.

Changes since v2:
 - Add default value in xl.cfg.
 - List xenstore path in the pandoc file.
 - Adjust comment in hvmloader.
 - Fix commit message MIO -> MMIO.

Changes since v1:
 - Leave the xenpci BAR as UC by default.
 - Introduce an option to not set it as UC.
---
 CHANGELOG.md                            |  3 ++
 docs/man/xl.cfg.5.pod.in                |  8 ++++
 docs/misc/xenstore-paths.pandoc         |  5 +++
 tools/firmware/hvmloader/config.h       |  2 +-
 tools/firmware/hvmloader/pci.c          | 50 ++++++++++++++++++++++++-
 tools/firmware/hvmloader/util.c         |  2 +-
 tools/include/libxl.h                   |  9 +++++
 tools/libs/light/libxl_create.c         |  1 +
 tools/libs/light/libxl_dom.c            |  9 +++++
 tools/libs/light/libxl_types.idl        |  1 +
 tools/xl/xl_parse.c                     |  2 +
 xen/include/public/hvm/hvm_xs_strings.h |  2 +
 12 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1ee2f42e7405..23215a8cc1e6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -15,6 +15,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
  - On x86:
    - Restrict the cache flushing done as a result of guest physical memory map
      manipulations and memory type changes.
+   - Allow controlling the MTRR cache attribute of the Xen PCI device BAR
+     for HVM guests, to improve performance of guests using it to map the grant
+     table or foreign memory.
 
 ### Added
  - On x86:
diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index c388899306c2..ddbff6fffc16 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -2351,6 +2351,14 @@ Windows L<https://xenproject.org/windows-pv-drivers/>.
 Setting B<xen_platform_pci=0> with the default device_model "qemu-xen"
 requires at least QEMU 1.6.
 
+
+=item B<xenpci_bar_uc=BOOLEAN>
+
+B<x86 only:> Select whether the memory BAR of the Xen PCI device should have
+uncacheable (UC) cache attribute set in MTRR.
+
+Default is B<true>.
+
 =item B<viridian=[ "GROUP", "GROUP", ...]> or B<viridian=BOOLEAN>
 
 The groups of Microsoft Hyper-V (AKA viridian) compatible enlightenments
diff --git a/docs/misc/xenstore-paths.pandoc b/docs/misc/xenstore-paths.pandoc
index 01a340fafcbe..073bed91eec1 100644
--- a/docs/misc/xenstore-paths.pandoc
+++ b/docs/misc/xenstore-paths.pandoc
@@ -234,6 +234,11 @@ These xenstore values are used to override some of the default string
 values in the SMBIOS table constructed in hvmloader. See the SMBIOS
 table specification at http://www.dmtf.org/standards/smbios/ 
 
+#### ~/hvmloader/pci/xenpci-bar-uc = ("1"|"0") [HVM,INTERNAL]
+
+Select whether the Xen PCI device MMIO BAR will have the uncacheable cache
+attribute set in the MTRRs by hvmloader.
+
 #### ~/bios-strings/oem-* = STRING [HVM,INTERNAL]
 
 1 to 99 OEM strings can be set in xenstore using values of the form
diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index 6e1da137d779..c159db30eea9 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -58,7 +58,7 @@ extern uint32_t *cpu_to_apicid;
 #define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
 
 extern uint32_t pci_mem_start;
-extern const uint32_t pci_mem_end;
+extern uint32_t pci_mem_end;
 extern uint64_t pci_hi_mem_start, pci_hi_mem_end;
 
 extern bool acpi_enabled;
diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index cc67b18c0361..cfd39cc37cdc 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -30,7 +30,7 @@
 #include <xen/hvm/e820.h>
 
 uint32_t pci_mem_start = HVM_BELOW_4G_MMIO_START;
-const uint32_t pci_mem_end = RESERVED_MEMBASE;
+uint32_t pci_mem_end = RESERVED_MEMBASE;
 uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 
 /*
@@ -116,6 +116,8 @@ void pci_setup(void)
      * experience the memory relocation bug described below.
      */
     bool allow_memory_relocate = 1;
+    /* Select the MTRR cache attribute of the xenpci device BAR. */
+    bool xenpci_bar_uc = false;
 
     BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO !=
                  PCI_COMMAND_IO);
@@ -130,6 +132,12 @@ void pci_setup(void)
     printf("Relocating guest memory for lowmem MMIO space %s\n",
            allow_memory_relocate?"enabled":"disabled");
 
+    s = xenstore_read(HVM_XS_XENPCI_BAR_UC, NULL);
+    if ( s )
+        xenpci_bar_uc = strtoll(s, NULL, 0);
+    printf("XenPCI device BAR MTRR cache attribute set to %s\n",
+           xenpci_bar_uc ? "UC" : "WB");
+
     s = xenstore_read("platform/mmio_hole_size", NULL);
     if ( s )
         mmio_hole_size = strtoll(s, NULL, 0);
@@ -271,6 +279,44 @@ void pci_setup(void)
             if ( bar_sz == 0 )
                 continue;
 
+            if ( !xenpci_bar_uc &&
+                 ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
+                   PCI_BASE_ADDRESS_SPACE_MEMORY) &&
+                 vendor_id == 0x5853 &&
+                 (device_id == 0x0001 || device_id == 0x0002) )
+            {
+                if ( is_64bar )
+                {
+                     printf("xenpci dev %02x:%x unexpected MMIO 64bit BAR%u\n",
+                            devfn >> 3, devfn & 7, bar);
+                     goto skip_xenpci;
+                }
+
+                if ( bar_sz > pci_mem_end ||
+                     ((pci_mem_end - bar_sz) & ~(bar_sz - 1)) < pci_mem_start )
+                {
+                     printf("xenpci dev %02x:%x BAR%u size %llx overflows low PCI hole\n",
+                            devfn >> 3, devfn & 7, bar, bar_sz);
+                     goto skip_xenpci;
+                }
+
+                /* Put unconditionally at the end of the low PCI MMIO hole. */
+                pci_mem_end -= bar_sz;
+                pci_mem_end &= ~(bar_sz - 1);
+                bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
+                bar_data |= pci_mem_end;
+                pci_writel(devfn, bar_reg, bar_data);
+                pci_devfn_decode_type[devfn] |= PCI_COMMAND_MEMORY;
+
+                /* Prefix BAR address with a 0 to match format used below. */
+                printf("pci dev %02x:%x bar %02x size "PRIllx": 0%08x\n",
+                       devfn >> 3, devfn & 7, bar_reg,
+                       PRIllx_arg(bar_sz), bar_data);
+
+                continue;
+            }
+ skip_xenpci:
+
             for ( i = 0; i < nr_bars; i++ )
                 if ( bars[i].bar_sz < bar_sz )
                     break;
@@ -310,7 +356,7 @@ void pci_setup(void)
         }
 
         /* Enable bus master for this function later */
-        pci_devfn_decode_type[devfn] = PCI_COMMAND_MASTER;
+        pci_devfn_decode_type[devfn] |= PCI_COMMAND_MASTER;
     }
 
     if ( mmio_hole_size )
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 79c0e6bd4ad2..31b4411db7b4 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -867,7 +867,7 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
         config->table_flags |= ACPI_HAS_HPET;
 
     config->pci_start = pci_mem_start;
-    config->pci_len = pci_mem_end - pci_mem_start;
+    config->pci_len = RESERVED_MEMBASE - pci_mem_start;
     if ( pci_hi_mem_end > pci_hi_mem_start )
     {
         config->pci_hi_start = pci_hi_mem_start;
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index b7ad7735ca4c..7ce7678e6836 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -1503,6 +1503,15 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, const libxl_mac *src);
  */
 #define LIBXL_HAVE_CREATEINFO_XEND_SUSPEND_EVTCHN_COMPAT
 
+/*
+ * LIBXL_HAVE_XENPCI_BAR_UC
+ *
+ * libxl_domain_build_info contains a boolean 'u.hvm.xenpci_bar_uc' field to
+ * signal whether the XenPCI device BAR should have UC cache attribute set in
+ * MTRR.
+ */
+#define LIBXL_HAVE_XENPCI_BAR_UC
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);
diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index 8bc768b5156c..962fa820faec 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -313,6 +313,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
         libxl_defbool_setdefault(&b_info->u.hvm.usb,                false);
         libxl_defbool_setdefault(&b_info->u.hvm.vkb_device,         true);
         libxl_defbool_setdefault(&b_info->u.hvm.xen_platform_pci,   true);
+        libxl_defbool_setdefault(&b_info->u.hvm.xenpci_bar_uc,      true);
         libxl_defbool_setdefault(&b_info->u.hvm.pirq,               false);
 
         libxl_defbool_setdefault(&b_info->u.hvm.spice.enable, false);
diff --git a/tools/libs/light/libxl_dom.c b/tools/libs/light/libxl_dom.c
index 4d67b0d28294..60ec0354d19a 100644
--- a/tools/libs/light/libxl_dom.c
+++ b/tools/libs/light/libxl_dom.c
@@ -819,6 +819,15 @@ static int hvm_build_set_xs_values(libxl__gc *gc,
             goto err;
     }
 
+    if (info->type == LIBXL_DOMAIN_TYPE_HVM &&
+        libxl_defbool_val(info->u.hvm.xenpci_bar_uc)) {
+        path = GCSPRINTF("/local/domain/%d/"HVM_XS_XENPCI_BAR_UC, domid);
+        ret = libxl__xs_printf(gc, XBT_NULL, path, "%d",
+                               libxl_defbool_val(info->u.hvm.xenpci_bar_uc));
+        if (ret)
+            goto err;
+    }
+
     return 0;
 
 err:
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 198515383012..6054350b83c7 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -691,6 +691,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
                                        ("vkb_device",       libxl_defbool),
                                        ("soundhw",          string),
                                        ("xen_platform_pci", libxl_defbool),
+                                       ("xenpci_bar_uc",    libxl_defbool),
                                        ("usbdevice_list",   libxl_string_list),
                                        ("vendor_device",    libxl_vendor_device),
                                        # See libxl_ms_vm_genid_generate()
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 219e924779ff..4da3bb9e91ab 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2917,6 +2917,8 @@ skip_usbdev:
         xlu_cfg_replace_string (config, "soundhw", &b_info->u.hvm.soundhw, 0);
         xlu_cfg_get_defbool(config, "xen_platform_pci",
                             &b_info->u.hvm.xen_platform_pci, 0);
+        xlu_cfg_get_defbool(config, "xenpci_bar_uc",
+                            &b_info->u.hvm.xenpci_bar_uc, 0);
 
         if(b_info->u.hvm.vnc.listen
            && b_info->u.hvm.vnc.display
diff --git a/xen/include/public/hvm/hvm_xs_strings.h b/xen/include/public/hvm/hvm_xs_strings.h
index e1ed078628a0..ebb07b9fba56 100644
--- a/xen/include/public/hvm/hvm_xs_strings.h
+++ b/xen/include/public/hvm/hvm_xs_strings.h
@@ -14,6 +14,8 @@
 #define HVM_XS_BIOS                    "hvmloader/bios"
 #define HVM_XS_GENERATION_ID_ADDRESS   "hvmloader/generation-id-address"
 #define HVM_XS_ALLOW_MEMORY_RELOCATE   "hvmloader/allow-memory-relocate"
+/* Set xenpci device BAR as UC in MTRR */
+#define HVM_XS_XENPCI_BAR_UC           "hvmloader/pci/xenpci-bar-uc"
 
 /* The following values allow additional ACPI tables to be added to the
  * virtual ACPI BIOS that hvmloader constructs. The values specify the guest
-- 
2.49.0


Re: [PATCH v4] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute
Posted by Anthony PERARD 4 months, 3 weeks ago
On Tue, Jun 10, 2025 at 06:29:30PM +0200, Roger Pau Monne wrote:
> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> index c388899306c2..ddbff6fffc16 100644
> --- a/docs/man/xl.cfg.5.pod.in
> +++ b/docs/man/xl.cfg.5.pod.in
> @@ -2351,6 +2351,14 @@ Windows L<https://xenproject.org/windows-pv-drivers/>.
>  Setting B<xen_platform_pci=0> with the default device_model "qemu-xen"
>  requires at least QEMU 1.6.
>  
> +
> +=item B<xenpci_bar_uc=BOOLEAN>
> +
> +B<x86 only:> Select whether the memory BAR of the Xen PCI device should have
> +uncacheable (UC) cache attribute set in MTRR.

For information, here are different name used for this pci device:

- man xl.cfg:
    xen_platform_pci=<bool>
        Xen platform PCI device
- QEMU:
    -device xen-platform
    in comments: XEN platform pci device
    with pci device-id PCI_DEVICE_ID_XEN_PLATFORM
- EDK2 / OVMF:
    XenIoPci
        described virtual Xen PCI device
        But XenIo is a generic protocol in EDK2
    Before XenIo, the pci device would be linked to XenBus, and
    loaded with PCI_DEVICE_ID_XEN_PLATFORM
- Linux:
    Seems to be called "xen-platform-pci"

Overall, this PCI device is mostly referenced as the Xen Platform PCI
device. So "xenpci" or "Xen PCI device" is surprising to me, and I'm not
quite sure what it is.


> +
> +Default is B<true>.
> +
>  =item B<viridian=[ "GROUP", "GROUP", ...]> or B<viridian=BOOLEAN>
>  
>  The groups of Microsoft Hyper-V (AKA viridian) compatible enlightenments
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index cc67b18c0361..cfd39cc37cdc 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -116,6 +116,8 @@ void pci_setup(void)
>       * experience the memory relocation bug described below.
>       */
>      bool allow_memory_relocate = 1;
> +    /* Select the MTRR cache attribute of the xenpci device BAR. */
> +    bool xenpci_bar_uc = false;

This default value for `xenpci_bar_uc` mean that hvmloader changes
behavior compared to previous version, right? Shouldn't we instead have
hvmloader keep the same behavior unless the toolstack want to use the
new behavior? (Like it's done for `allow_memory_relocate`,
"platform/mmio_hole_size")

It would just mean that toolstack other than `xl` won't be surprised by
a change of behavior.

>      BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO !=
>                   PCI_COMMAND_IO);
> @@ -130,6 +132,12 @@ void pci_setup(void)
>      printf("Relocating guest memory for lowmem MMIO space %s\n",
>             allow_memory_relocate?"enabled":"disabled");
>  
> +    s = xenstore_read(HVM_XS_XENPCI_BAR_UC, NULL);
> +    if ( s )
> +        xenpci_bar_uc = strtoll(s, NULL, 0);
> +    printf("XenPCI device BAR MTRR cache attribute set to %s\n",
> +           xenpci_bar_uc ? "UC" : "WB");
> +
>      s = xenstore_read("platform/mmio_hole_size", NULL);
>      if ( s )
>          mmio_hole_size = strtoll(s, NULL, 0);
> @@ -271,6 +279,44 @@ void pci_setup(void)
>              if ( bar_sz == 0 )
>                  continue;
>  
> +            if ( !xenpci_bar_uc &&
> +                 ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> +                   PCI_BASE_ADDRESS_SPACE_MEMORY) &&
> +                 vendor_id == 0x5853 &&
> +                 (device_id == 0x0001 || device_id == 0x0002) )

We don't have defines for 0x5853 in the tree (and those device_id)?
Maybe introduce at least one for the vendor_id:

These two names are use by QEMU, OVMF, Linux, for example.

#define PCI_VENDOR_ID_XEN           0x5853
#define PCI_DEVICE_ID_XEN_PLATFORM  0x0001

There's even PCI_DEVICE_ID_XEN_PLATFORM_XS61 in Linux


> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index 79c0e6bd4ad2..31b4411db7b4 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -867,7 +867,7 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
>          config->table_flags |= ACPI_HAS_HPET;
>  
>      config->pci_start = pci_mem_start;
> -    config->pci_len = pci_mem_end - pci_mem_start;
> +    config->pci_len = RESERVED_MEMBASE - pci_mem_start;
>      if ( pci_hi_mem_end > pci_hi_mem_start )
>      {
>          config->pci_hi_start = pci_hi_mem_start;
> diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
> index 8bc768b5156c..962fa820faec 100644
> --- a/tools/libs/light/libxl_create.c
> +++ b/tools/libs/light/libxl_create.c
> @@ -313,6 +313,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
>          libxl_defbool_setdefault(&b_info->u.hvm.usb,                false);
>          libxl_defbool_setdefault(&b_info->u.hvm.vkb_device,         true);
>          libxl_defbool_setdefault(&b_info->u.hvm.xen_platform_pci,   true);
> +        libxl_defbool_setdefault(&b_info->u.hvm.xenpci_bar_uc,      true);
>          libxl_defbool_setdefault(&b_info->u.hvm.pirq,               false);
>  
>          libxl_defbool_setdefault(&b_info->u.hvm.spice.enable, false);
> diff --git a/tools/libs/light/libxl_dom.c b/tools/libs/light/libxl_dom.c
> index 4d67b0d28294..60ec0354d19a 100644
> --- a/tools/libs/light/libxl_dom.c
> +++ b/tools/libs/light/libxl_dom.c
> @@ -819,6 +819,15 @@ static int hvm_build_set_xs_values(libxl__gc *gc,
>              goto err;
>      }
>  
> +    if (info->type == LIBXL_DOMAIN_TYPE_HVM &&
> +        libxl_defbool_val(info->u.hvm.xenpci_bar_uc)) {

I think this condition is wrong. You should always write the value of
xenpci_bar_uc into xenstore, or only write it if a value have been
selected. But I guess we already lost the information here about whether
the value is the default or not, and it's probably not important, so I
think you should always write the value.

> +        path = GCSPRINTF("/local/domain/%d/"HVM_XS_XENPCI_BAR_UC, domid);
> +        ret = libxl__xs_printf(gc, XBT_NULL, path, "%d",
> +                               libxl_defbool_val(info->u.hvm.xenpci_bar_uc));
> +        if (ret)
> +            goto err;
> +    }
> +
>      return 0;
>  
>  err:

Thanks,

-- 
Anthony PERARD
Re: [PATCH v4] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute
Posted by Roger Pau Monné 4 months, 2 weeks ago
On Wed, Jun 11, 2025 at 07:26:06PM +0200, Anthony PERARD wrote:
> On Tue, Jun 10, 2025 at 06:29:30PM +0200, Roger Pau Monne wrote:
> > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> > index c388899306c2..ddbff6fffc16 100644
> > --- a/docs/man/xl.cfg.5.pod.in
> > +++ b/docs/man/xl.cfg.5.pod.in
> > @@ -2351,6 +2351,14 @@ Windows L<https://xenproject.org/windows-pv-drivers/>.
> >  Setting B<xen_platform_pci=0> with the default device_model "qemu-xen"
> >  requires at least QEMU 1.6.
> >  
> > +
> > +=item B<xenpci_bar_uc=BOOLEAN>
> > +
> > +B<x86 only:> Select whether the memory BAR of the Xen PCI device should have
> > +uncacheable (UC) cache attribute set in MTRR.
> 
> For information, here are different name used for this pci device:
> 
> - man xl.cfg:
>     xen_platform_pci=<bool>
>         Xen platform PCI device
> - QEMU:
>     -device xen-platform
>     in comments: XEN platform pci device
>     with pci device-id PCI_DEVICE_ID_XEN_PLATFORM
> - EDK2 / OVMF:
>     XenIoPci
>         described virtual Xen PCI device
>         But XenIo is a generic protocol in EDK2
>     Before XenIo, the pci device would be linked to XenBus, and
>     loaded with PCI_DEVICE_ID_XEN_PLATFORM
> - Linux:
>     Seems to be called "xen-platform-pci"
> 
> Overall, this PCI device is mostly referenced as the Xen Platform PCI
> device. So "xenpci" or "Xen PCI device" is surprising to me, and I'm not
> quite sure what it is.

I can do xen_platform_pci_bar_uc, but it seems a bit long.

> 
> > +
> > +Default is B<true>.
> > +
> >  =item B<viridian=[ "GROUP", "GROUP", ...]> or B<viridian=BOOLEAN>
> >  
> >  The groups of Microsoft Hyper-V (AKA viridian) compatible enlightenments
> > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> > index cc67b18c0361..cfd39cc37cdc 100644
> > --- a/tools/firmware/hvmloader/pci.c
> > +++ b/tools/firmware/hvmloader/pci.c
> > @@ -116,6 +116,8 @@ void pci_setup(void)
> >       * experience the memory relocation bug described below.
> >       */
> >      bool allow_memory_relocate = 1;
> > +    /* Select the MTRR cache attribute of the xenpci device BAR. */
> > +    bool xenpci_bar_uc = false;
> 
> This default value for `xenpci_bar_uc` mean that hvmloader changes
> behavior compared to previous version, right? Shouldn't we instead have
> hvmloader keep the same behavior unless the toolstack want to use the
> new behavior? (Like it's done for `allow_memory_relocate`,
> "platform/mmio_hole_size")
> 
> It would just mean that toolstack other than `xl` won't be surprised by
> a change of behavior.

My plan was that we didn't need changes to XAPI to implement this new
mode, but given the comment I will change to keep the previous
behavior in absence of a xenstore node.

> 
> >      BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO !=
> >                   PCI_COMMAND_IO);
> > @@ -130,6 +132,12 @@ void pci_setup(void)
> >      printf("Relocating guest memory for lowmem MMIO space %s\n",
> >             allow_memory_relocate?"enabled":"disabled");
> >  
> > +    s = xenstore_read(HVM_XS_XENPCI_BAR_UC, NULL);
> > +    if ( s )
> > +        xenpci_bar_uc = strtoll(s, NULL, 0);
> > +    printf("XenPCI device BAR MTRR cache attribute set to %s\n",
> > +           xenpci_bar_uc ? "UC" : "WB");
> > +
> >      s = xenstore_read("platform/mmio_hole_size", NULL);
> >      if ( s )
> >          mmio_hole_size = strtoll(s, NULL, 0);
> > @@ -271,6 +279,44 @@ void pci_setup(void)
> >              if ( bar_sz == 0 )
> >                  continue;
> >  
> > +            if ( !xenpci_bar_uc &&
> > +                 ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> > +                   PCI_BASE_ADDRESS_SPACE_MEMORY) &&
> > +                 vendor_id == 0x5853 &&
> > +                 (device_id == 0x0001 || device_id == 0x0002) )
> 
> We don't have defines for 0x5853 in the tree (and those device_id)?
> Maybe introduce at least one for the vendor_id:
> 
> These two names are use by QEMU, OVMF, Linux, for example.
> 
> #define PCI_VENDOR_ID_XEN           0x5853
> #define PCI_DEVICE_ID_XEN_PLATFORM  0x0001
> 
> There's even PCI_DEVICE_ID_XEN_PLATFORM_XS61 in Linux

You mean in the public headers?

For Device IDs we have ranges allocated to downstream vendors, I'm not
sure we want to keep track of whatever IDs they use for their devices.
OTOH, not tracking those IDs here means OSes are likely to miss
additions of new Xen platform PCI devices?

I could introduce public/pci.h to contain those IDs, but I would like
consensus on what should be there, otherwise this patch will get
stuck.

> 
> > diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> > index 79c0e6bd4ad2..31b4411db7b4 100644
> > --- a/tools/firmware/hvmloader/util.c
> > +++ b/tools/firmware/hvmloader/util.c
> > @@ -867,7 +867,7 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
> >          config->table_flags |= ACPI_HAS_HPET;
> >  
> >      config->pci_start = pci_mem_start;
> > -    config->pci_len = pci_mem_end - pci_mem_start;
> > +    config->pci_len = RESERVED_MEMBASE - pci_mem_start;
> >      if ( pci_hi_mem_end > pci_hi_mem_start )
> >      {
> >          config->pci_hi_start = pci_hi_mem_start;
> > diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
> > index 8bc768b5156c..962fa820faec 100644
> > --- a/tools/libs/light/libxl_create.c
> > +++ b/tools/libs/light/libxl_create.c
> > @@ -313,6 +313,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
> >          libxl_defbool_setdefault(&b_info->u.hvm.usb,                false);
> >          libxl_defbool_setdefault(&b_info->u.hvm.vkb_device,         true);
> >          libxl_defbool_setdefault(&b_info->u.hvm.xen_platform_pci,   true);
> > +        libxl_defbool_setdefault(&b_info->u.hvm.xenpci_bar_uc,      true);
> >          libxl_defbool_setdefault(&b_info->u.hvm.pirq,               false);
> >  
> >          libxl_defbool_setdefault(&b_info->u.hvm.spice.enable, false);
> > diff --git a/tools/libs/light/libxl_dom.c b/tools/libs/light/libxl_dom.c
> > index 4d67b0d28294..60ec0354d19a 100644
> > --- a/tools/libs/light/libxl_dom.c
> > +++ b/tools/libs/light/libxl_dom.c
> > @@ -819,6 +819,15 @@ static int hvm_build_set_xs_values(libxl__gc *gc,
> >              goto err;
> >      }
> >  
> > +    if (info->type == LIBXL_DOMAIN_TYPE_HVM &&
> > +        libxl_defbool_val(info->u.hvm.xenpci_bar_uc)) {
> 
> I think this condition is wrong. You should always write the value of
> xenpci_bar_uc into xenstore, or only write it if a value have been
> selected. But I guess we already lost the information here about whether
> the value is the default or not, and it's probably not important, so I
> think you should always write the value.

Indeed, whether the value is the default or the user-selected one is
lost by the time we get here.

I would adjust according to the above comments, but I would suggest we
leave out the addition of the Xen platform PCI device IDs to a
separate patch, as I fear it will block the patch otherwise.

Thanks, Roger.
Re: [PATCH v4] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute
Posted by Anthony PERARD 4 months, 2 weeks ago
On Thu, Jun 12, 2025 at 04:56:17PM +0200, Roger Pau Monné wrote:
> On Wed, Jun 11, 2025 at 07:26:06PM +0200, Anthony PERARD wrote:
> > On Tue, Jun 10, 2025 at 06:29:30PM +0200, Roger Pau Monne wrote:
> > > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> > > index c388899306c2..ddbff6fffc16 100644
> > > --- a/docs/man/xl.cfg.5.pod.in
> > > +++ b/docs/man/xl.cfg.5.pod.in
> > > @@ -2351,6 +2351,14 @@ Windows L<https://xenproject.org/windows-pv-drivers/>.
> > >  Setting B<xen_platform_pci=0> with the default device_model "qemu-xen"
> > >  requires at least QEMU 1.6.
> > >  
> > > +
> > > +=item B<xenpci_bar_uc=BOOLEAN>
> > > +
> > > +B<x86 only:> Select whether the memory BAR of the Xen PCI device should have
> > > +uncacheable (UC) cache attribute set in MTRR.
> > 
> > For information, here are different name used for this pci device:
> > 
> > - man xl.cfg:
> >     xen_platform_pci=<bool>
> >         Xen platform PCI device
> > - QEMU:
> >     -device xen-platform
> >     in comments: XEN platform pci device
> >     with pci device-id PCI_DEVICE_ID_XEN_PLATFORM
> > - EDK2 / OVMF:
> >     XenIoPci
> >         described virtual Xen PCI device
> >         But XenIo is a generic protocol in EDK2
> >     Before XenIo, the pci device would be linked to XenBus, and
> >     loaded with PCI_DEVICE_ID_XEN_PLATFORM
> > - Linux:
> >     Seems to be called "xen-platform-pci"
> > 
> > Overall, this PCI device is mostly referenced as the Xen Platform PCI
> > device. So "xenpci" or "Xen PCI device" is surprising to me, and I'm not
> > quite sure what it is.
> 
> I can do xen_platform_pci_bar_uc, but it seems a bit long.

I don't think it matter much how long it is, it is just a word that is
surly copy-past from the man page. What I think matter more is that it's
descriptive enough and match the existing option name for the same
device, which is "xen_platform_pci".

> > > +    /* Select the MTRR cache attribute of the xenpci device BAR. */
> > > +    bool xenpci_bar_uc = false;
> > 
> > This default value for `xenpci_bar_uc` mean that hvmloader changes
> > behavior compared to previous version, right? Shouldn't we instead have
> > hvmloader keep the same behavior unless the toolstack want to use the
> > new behavior? (Like it's done for `allow_memory_relocate`,
> > "platform/mmio_hole_size")
> > 
> > It would just mean that toolstack other than `xl` won't be surprised by
> > a change of behavior.
> 
> My plan was that we didn't need changes to XAPI to implement this new
> mode, but given the comment I will change to keep the previous
> behavior in absence of a xenstore node.

Why would guests created with XAPI get the new behavior, but guest
created with libxl have to stick with the old one?

I do like that there's an option for libxl to choose between the old and
new behavior, and allow to revert in case someone got an issue for a
particular guest, but otherwise, it is probably better to have the same
default for both XAPI and libxl.

> > > @@ -271,6 +279,44 @@ void pci_setup(void)
> > >              if ( bar_sz == 0 )
> > >                  continue;
> > >  
> > > +            if ( !xenpci_bar_uc &&
> > > +                 ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> > > +                   PCI_BASE_ADDRESS_SPACE_MEMORY) &&
> > > +                 vendor_id == 0x5853 &&
> > > +                 (device_id == 0x0001 || device_id == 0x0002) )
> > 
> > We don't have defines for 0x5853 in the tree (and those device_id)?
> > Maybe introduce at least one for the vendor_id:
> > 
> > These two names are use by QEMU, OVMF, Linux, for example.
> > 
> > #define PCI_VENDOR_ID_XEN           0x5853
> > #define PCI_DEVICE_ID_XEN_PLATFORM  0x0001
> > 
> > There's even PCI_DEVICE_ID_XEN_PLATFORM_XS61 in Linux
> 
> You mean in the public headers?
> 
> For Device IDs we have ranges allocated to downstream vendors, I'm not
> sure we want to keep track of whatever IDs they use for their devices.
> OTOH, not tracking those IDs here means OSes are likely to miss
> additions of new Xen platform PCI devices?
> 
> I could introduce public/pci.h to contain those IDs, but I would like
> consensus on what should be there, otherwise this patch will get
> stuck.

I guess, just start with adding the vendor_id define in this same file
(pci.c), that would be a good start, it would give a name to an
otherwise magic number.
Reading `vendor_id == PCI_VENDOR_ID_XEN` is better than reading
`vendor_id == 0x5853`.

If for some reason, we want to use the value in a different part of the
repo, we could introduce or edit a common header and move the define
there.

For the device ids, using a define is less of a need, we would at least
know we have a condition on Xen specific PCI device.

This patch is only about a single device, isn't speaking about ID of
other device a bit out of scope? And anyway, there's already a document
about those, that is "xen-pci-device-reservations.7.pod".

Thanks,

-- 
Anthony PERARD
Re: [PATCH v4] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute
Posted by Roger Pau Monné 4 months, 2 weeks ago
On Thu, Jun 12, 2025 at 06:13:48PM +0200, Anthony PERARD wrote:
> On Thu, Jun 12, 2025 at 04:56:17PM +0200, Roger Pau Monné wrote:
> > On Wed, Jun 11, 2025 at 07:26:06PM +0200, Anthony PERARD wrote:
> > > On Tue, Jun 10, 2025 at 06:29:30PM +0200, Roger Pau Monne wrote:
> > > > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> > > > index c388899306c2..ddbff6fffc16 100644
> > > > --- a/docs/man/xl.cfg.5.pod.in
> > > > +++ b/docs/man/xl.cfg.5.pod.in
> > > > @@ -2351,6 +2351,14 @@ Windows L<https://xenproject.org/windows-pv-drivers/>.
> > > >  Setting B<xen_platform_pci=0> with the default device_model "qemu-xen"
> > > >  requires at least QEMU 1.6.
> > > >  
> > > > +
> > > > +=item B<xenpci_bar_uc=BOOLEAN>
> > > > +
> > > > +B<x86 only:> Select whether the memory BAR of the Xen PCI device should have
> > > > +uncacheable (UC) cache attribute set in MTRR.
> > > 
> > > For information, here are different name used for this pci device:
> > > 
> > > - man xl.cfg:
> > >     xen_platform_pci=<bool>
> > >         Xen platform PCI device
> > > - QEMU:
> > >     -device xen-platform
> > >     in comments: XEN platform pci device
> > >     with pci device-id PCI_DEVICE_ID_XEN_PLATFORM
> > > - EDK2 / OVMF:
> > >     XenIoPci
> > >         described virtual Xen PCI device
> > >         But XenIo is a generic protocol in EDK2
> > >     Before XenIo, the pci device would be linked to XenBus, and
> > >     loaded with PCI_DEVICE_ID_XEN_PLATFORM
> > > - Linux:
> > >     Seems to be called "xen-platform-pci"
> > > 
> > > Overall, this PCI device is mostly referenced as the Xen Platform PCI
> > > device. So "xenpci" or "Xen PCI device" is surprising to me, and I'm not
> > > quite sure what it is.
> > 
> > I can do xen_platform_pci_bar_uc, but it seems a bit long.
> 
> I don't think it matter much how long it is, it is just a word that is
> surly copy-past from the man page. What I think matter more is that it's
> descriptive enough and match the existing option name for the same
> device, which is "xen_platform_pci".

I've already adjusted everything to xen_platform_pci_bar_uc, plus the
text to s/Xen PCI/Xen platform PCI/.

> > > > +    /* Select the MTRR cache attribute of the xenpci device BAR. */
> > > > +    bool xenpci_bar_uc = false;
> > > 
> > > This default value for `xenpci_bar_uc` mean that hvmloader changes
> > > behavior compared to previous version, right? Shouldn't we instead have
> > > hvmloader keep the same behavior unless the toolstack want to use the
> > > new behavior? (Like it's done for `allow_memory_relocate`,
> > > "platform/mmio_hole_size")
> > > 
> > > It would just mean that toolstack other than `xl` won't be surprised by
> > > a change of behavior.
> > 
> > My plan was that we didn't need changes to XAPI to implement this new
> > mode, but given the comment I will change to keep the previous
> > behavior in absence of a xenstore node.
> 
> Why would guests created with XAPI get the new behavior, but guest
> created with libxl have to stick with the old one?
> 
> I do like that there's an option for libxl to choose between the old and
> new behavior, and allow to revert in case someone got an issue for a
> particular guest, but otherwise, it is probably better to have the same
> default for both XAPI and libxl.
> 
> > > > @@ -271,6 +279,44 @@ void pci_setup(void)
> > > >              if ( bar_sz == 0 )
> > > >                  continue;
> > > >  
> > > > +            if ( !xenpci_bar_uc &&
> > > > +                 ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> > > > +                   PCI_BASE_ADDRESS_SPACE_MEMORY) &&
> > > > +                 vendor_id == 0x5853 &&
> > > > +                 (device_id == 0x0001 || device_id == 0x0002) )
> > > 
> > > We don't have defines for 0x5853 in the tree (and those device_id)?
> > > Maybe introduce at least one for the vendor_id:
> > > 
> > > These two names are use by QEMU, OVMF, Linux, for example.
> > > 
> > > #define PCI_VENDOR_ID_XEN           0x5853
> > > #define PCI_DEVICE_ID_XEN_PLATFORM  0x0001
> > > 
> > > There's even PCI_DEVICE_ID_XEN_PLATFORM_XS61 in Linux
> > 
> > You mean in the public headers?
> > 
> > For Device IDs we have ranges allocated to downstream vendors, I'm not
> > sure we want to keep track of whatever IDs they use for their devices.
> > OTOH, not tracking those IDs here means OSes are likely to miss
> > additions of new Xen platform PCI devices?
> > 
> > I could introduce public/pci.h to contain those IDs, but I would like
> > consensus on what should be there, otherwise this patch will get
> > stuck.
> 
> I guess, just start with adding the vendor_id define in this same file
> (pci.c), that would be a good start, it would give a name to an
> otherwise magic number.

OK, I think I've misunderstood from your previous reply that you
wanted me to introduce a public pci.h header to contain those values.

> Reading `vendor_id == PCI_VENDOR_ID_XEN` is better than reading
> `vendor_id == 0x5853`.
> 
> If for some reason, we want to use the value in a different part of the
> repo, we could introduce or edit a common header and move the define
> there.
> 
> For the device ids, using a define is less of a need, we would at least
> know we have a condition on Xen specific PCI device.
> 
> This patch is only about a single device, isn't speaking about ID of
> other device a bit out of scope? And anyway, there's already a document
> about those, that is "xen-pci-device-reservations.7.pod".

Yes, but if I had to add a new header I got the feeling I would get
questions about which device IDs should be listed there.

Thanks, Roger.

Re: [PATCH v4] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute
Posted by Tu Dinh 4 months, 2 weeks ago
On 12/06/2025 16:57, Roger Pau Monné wrote:
> On Wed, Jun 11, 2025 at 07:26:06PM +0200, Anthony PERARD wrote:
>> On Tue, Jun 10, 2025 at 06:29:30PM +0200, Roger Pau Monne wrote:
>>> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
>>> index c388899306c2..ddbff6fffc16 100644
>>> --- a/docs/man/xl.cfg.5.pod.in
>>> +++ b/docs/man/xl.cfg.5.pod.in
>>> @@ -2351,6 +2351,14 @@ Windows L<https://xenproject.org/windows-pv-drivers/>.
>>>   Setting B<xen_platform_pci=0> with the default device_model "qemu-xen"
>>>   requires at least QEMU 1.6.
>>>   
>>> +
>>> +=item B<xenpci_bar_uc=BOOLEAN>
>>> +
>>> +B<x86 only:> Select whether the memory BAR of the Xen PCI device should have
>>> +uncacheable (UC) cache attribute set in MTRR.
>>
>> For information, here are different name used for this pci device:
>>
>> - man xl.cfg:
>>      xen_platform_pci=<bool>
>>          Xen platform PCI device
>> - QEMU:
>>      -device xen-platform
>>      in comments: XEN platform pci device
>>      with pci device-id PCI_DEVICE_ID_XEN_PLATFORM
>> - EDK2 / OVMF:
>>      XenIoPci
>>          described virtual Xen PCI device
>>          But XenIo is a generic protocol in EDK2
>>      Before XenIo, the pci device would be linked to XenBus, and
>>      loaded with PCI_DEVICE_ID_XEN_PLATFORM
>> - Linux:
>>      Seems to be called "xen-platform-pci"
>>
>> Overall, this PCI device is mostly referenced as the Xen Platform PCI
>> device. So "xenpci" or "Xen PCI device" is surprising to me, and I'm not
>> quite sure what it is.
> 
> I can do xen_platform_pci_bar_uc, but it seems a bit long.
> 
>>
>>> +
>>> +Default is B<true>.
>>> +
>>>   =item B<viridian=[ "GROUP", "GROUP", ...]> or B<viridian=BOOLEAN>
>>>   
>>>   The groups of Microsoft Hyper-V (AKA viridian) compatible enlightenments
>>> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
>>> index cc67b18c0361..cfd39cc37cdc 100644
>>> --- a/tools/firmware/hvmloader/pci.c
>>> +++ b/tools/firmware/hvmloader/pci.c
>>> @@ -116,6 +116,8 @@ void pci_setup(void)
>>>        * experience the memory relocation bug described below.
>>>        */
>>>       bool allow_memory_relocate = 1;
>>> +    /* Select the MTRR cache attribute of the xenpci device BAR. */
>>> +    bool xenpci_bar_uc = false;
>>
>> This default value for `xenpci_bar_uc` mean that hvmloader changes
>> behavior compared to previous version, right? Shouldn't we instead have
>> hvmloader keep the same behavior unless the toolstack want to use the
>> new behavior? (Like it's done for `allow_memory_relocate`,
>> "platform/mmio_hole_size")
>>
>> It would just mean that toolstack other than `xl` won't be surprised by
>> a change of behavior.
> 
> My plan was that we didn't need changes to XAPI to implement this new
> mode, but given the comment I will change to keep the previous
> behavior in absence of a xenstore node.
> 
>>
>>>       BUILD_BUG_ON((typeof(*pci_devfn_decode_type))PCI_COMMAND_IO !=
>>>                    PCI_COMMAND_IO);
>>> @@ -130,6 +132,12 @@ void pci_setup(void)
>>>       printf("Relocating guest memory for lowmem MMIO space %s\n",
>>>              allow_memory_relocate?"enabled":"disabled");
>>>   
>>> +    s = xenstore_read(HVM_XS_XENPCI_BAR_UC, NULL);
>>> +    if ( s )
>>> +        xenpci_bar_uc = strtoll(s, NULL, 0);
>>> +    printf("XenPCI device BAR MTRR cache attribute set to %s\n",
>>> +           xenpci_bar_uc ? "UC" : "WB");
>>> +
>>>       s = xenstore_read("platform/mmio_hole_size", NULL);
>>>       if ( s )
>>>           mmio_hole_size = strtoll(s, NULL, 0);
>>> @@ -271,6 +279,44 @@ void pci_setup(void)
>>>               if ( bar_sz == 0 )
>>>                   continue;
>>>   
>>> +            if ( !xenpci_bar_uc &&
>>> +                 ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
>>> +                   PCI_BASE_ADDRESS_SPACE_MEMORY) &&
>>> +                 vendor_id == 0x5853 &&
>>> +                 (device_id == 0x0001 || device_id == 0x0002) )
>>
>> We don't have defines for 0x5853 in the tree (and those device_id)?
>> Maybe introduce at least one for the vendor_id:
>>
>> These two names are use by QEMU, OVMF, Linux, for example.
>>
>> #define PCI_VENDOR_ID_XEN           0x5853
>> #define PCI_DEVICE_ID_XEN_PLATFORM  0x0001
>>
>> There's even PCI_DEVICE_ID_XEN_PLATFORM_XS61 in Linux
> 
> You mean in the public headers?
> 
> For Device IDs we have ranges allocated to downstream vendors, I'm not
> sure we want to keep track of whatever IDs they use for their devices.
> OTOH, not tracking those IDs here means OSes are likely to miss
> additions of new Xen platform PCI devices?
> 

The devices starting from ID c000 are supposed to be xen_pvdevice, which 
are separate device types that work differently from Xen platform PCI 
devices. So I think you only need to check for 
PCI_DEVICE_ID_XEN_PLATFORM{,_XS61} unless another platform PCI ID gets 
defined some day.

> I could introduce public/pci.h to contain those IDs, but I would like
> consensus on what should be there, otherwise this patch will get
> stuck.
> 
>>
>>> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
>>> index 79c0e6bd4ad2..31b4411db7b4 100644
>>> --- a/tools/firmware/hvmloader/util.c
>>> +++ b/tools/firmware/hvmloader/util.c
>>> @@ -867,7 +867,7 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
>>>           config->table_flags |= ACPI_HAS_HPET;
>>>   
>>>       config->pci_start = pci_mem_start;
>>> -    config->pci_len = pci_mem_end - pci_mem_start;
>>> +    config->pci_len = RESERVED_MEMBASE - pci_mem_start;
>>>       if ( pci_hi_mem_end > pci_hi_mem_start )
>>>       {
>>>           config->pci_hi_start = pci_hi_mem_start;
>>> diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
>>> index 8bc768b5156c..962fa820faec 100644
>>> --- a/tools/libs/light/libxl_create.c
>>> +++ b/tools/libs/light/libxl_create.c
>>> @@ -313,6 +313,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
>>>           libxl_defbool_setdefault(&b_info->u.hvm.usb,                false);
>>>           libxl_defbool_setdefault(&b_info->u.hvm.vkb_device,         true);
>>>           libxl_defbool_setdefault(&b_info->u.hvm.xen_platform_pci,   true);
>>> +        libxl_defbool_setdefault(&b_info->u.hvm.xenpci_bar_uc,      true);
>>>           libxl_defbool_setdefault(&b_info->u.hvm.pirq,               false);
>>>   
>>>           libxl_defbool_setdefault(&b_info->u.hvm.spice.enable, false);
>>> diff --git a/tools/libs/light/libxl_dom.c b/tools/libs/light/libxl_dom.c
>>> index 4d67b0d28294..60ec0354d19a 100644
>>> --- a/tools/libs/light/libxl_dom.c
>>> +++ b/tools/libs/light/libxl_dom.c
>>> @@ -819,6 +819,15 @@ static int hvm_build_set_xs_values(libxl__gc *gc,
>>>               goto err;
>>>       }
>>>   
>>> +    if (info->type == LIBXL_DOMAIN_TYPE_HVM &&
>>> +        libxl_defbool_val(info->u.hvm.xenpci_bar_uc)) {
>>
>> I think this condition is wrong. You should always write the value of
>> xenpci_bar_uc into xenstore, or only write it if a value have been
>> selected. But I guess we already lost the information here about whether
>> the value is the default or not, and it's probably not important, so I
>> think you should always write the value.
> 
> Indeed, whether the value is the default or the user-selected one is
> lost by the time we get here.
> 
> I would adjust according to the above comments, but I would suggest we
> leave out the addition of the Xen platform PCI device IDs to a
> separate patch, as I fear it will block the patch otherwise.
> 
> Thanks, Roger.
> 

Best regards,


Ngoc Tu Dinh | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
Re: [PATCH v4] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute
Posted by Roger Pau Monné 4 months, 2 weeks ago
On Thu, Jun 12, 2025 at 03:12:03PM +0000, Tu Dinh wrote:
> On 12/06/2025 16:57, Roger Pau Monné wrote:
> > On Wed, Jun 11, 2025 at 07:26:06PM +0200, Anthony PERARD wrote:
> >> On Tue, Jun 10, 2025 at 06:29:30PM +0200, Roger Pau Monne wrote:
> >>> @@ -271,6 +279,44 @@ void pci_setup(void)
> >>>               if ( bar_sz == 0 )
> >>>                   continue;
> >>>
> >>> +            if ( !xenpci_bar_uc &&
> >>> +                 ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> >>> +                   PCI_BASE_ADDRESS_SPACE_MEMORY) &&
> >>> +                 vendor_id == 0x5853 &&
> >>> +                 (device_id == 0x0001 || device_id == 0x0002) )
> >>
> >> We don't have defines for 0x5853 in the tree (and those device_id)?
> >> Maybe introduce at least one for the vendor_id:
> >>
> >> These two names are use by QEMU, OVMF, Linux, for example.
> >>
> >> #define PCI_VENDOR_ID_XEN           0x5853
> >> #define PCI_DEVICE_ID_XEN_PLATFORM  0x0001
> >>
> >> There's even PCI_DEVICE_ID_XEN_PLATFORM_XS61 in Linux
> >
> > You mean in the public headers?
> >
> > For Device IDs we have ranges allocated to downstream vendors, I'm not
> > sure we want to keep track of whatever IDs they use for their devices.
> > OTOH, not tracking those IDs here means OSes are likely to miss
> > additions of new Xen platform PCI devices?
> >
> 
> The devices starting from ID c000 are supposed to be xen_pvdevice, which
> are separate device types that work differently from Xen platform PCI
> devices. So I think you only need to check for
> PCI_DEVICE_ID_XEN_PLATFORM{,_XS61} unless another platform PCI ID gets
> defined some day.

It's not the check here, but rather what would be defined in the newly
added header what I don't know.

Even if not used here, should the other PCI device IDs be listed
together with the platform PCI device ones?

Thanks, Roger.

Re: [PATCH v4] x86/hvmloader: select xenpci MMIO BAR UC or WB MTRR cache attribute
Posted by Jan Beulich 4 months, 3 weeks ago
On 10.06.2025 18:29, Roger Pau Monne wrote:
> The Xen PCI device (vendor ID 0x5853) exposed to x86 HVM guests doesn't
> have the functionality of a traditional PCI device.  The exposed MMIO BAR
> is used by some guests (including Linux) as a safe place to map foreign
> memory, including the grant table itself.
> 
> Traditionally BARs from devices have the uncacheable (UC) cache attribute
> from the MTRR, to ensure correct functionality of such devices.  hvmloader
> mimics this behavior and sets the MTRR attributes of both the low and high
> PCI MMIO windows (where BARs of PCI devices reside) as UC in MTRR.
> 
> This however causes performance issues for users of the Xen PCI device BAR,
> as for the purposes of mapping remote memory there's no need to use the UC
> attribute.  On Intel systems this is worked around by using iPAT, that
> allows the hypervisor to force the effective cache attribute of a p2m entry
> regardless of the guest PAT value.  AMD however doesn't have an equivalent
> of iPAT, and guest PAT values are always considered.
> 
> Linux commit:
> 
> 41925b105e34 xen: replace xen_remap() with memremap()
> 
> Attempted to mitigate this by forcing mappings of the grant-table to use
> the write-back (WB) cache attribute.  However Linux memremap() takes MTRRs
> into account to calculate which PAT type to use, and seeing the MTRR cache
> attribute for the region being UC the PAT also ends up as UC, regardless of
> the caller having requested WB.
> 
> As a workaround to allow current Linux to map the grant-table as WB using
> memremap() introduce an xl.cfg option (xenpci_bar_uc=0) that can be used to
> select whether the Xen PCI device BAR will have the UC attribute in MTRR.
> Such workaround in hvmloader should also be paired with a fix for Linux so
> it attempts to change the MTRR of the Xen PCI device BAR to WB by itself.
> 
> Overall, the long term solution would be to provide the guest with a safe
> range in the guest physical address space where mappings to foreign pages
> can be created.
> 
> Some vif throughput performance figures provided by Anthoine from a 8
> vCPUs, 4GB of RAM HVM guest(s) running on AMD hardware:
> 
> Without this patch:
> vm -> dom0: 1.1Gb/s
> vm -> vm:   5.0Gb/s
> 
> With the patch:
> vm -> dom0: 4.5Gb/s
> vm -> vm:   7.0Gb/s
> 
> Reported-by: Anthoine Bourgeois <anthoine.bourgeois@vates.tech>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Oleksii Kurochko<oleksii.kurochko@gmail.com>

Acked-by: Jan Beulich <jbeulich@suse.com> # hvmloader