[PATCH v13 10/14] vpci/header: emulate PCI_COMMAND register for guests

Stewart Hildebrand posted 14 patches 8 months, 2 weeks ago
There is a newer version of this series
[PATCH v13 10/14] vpci/header: emulate PCI_COMMAND register for guests
Posted by Stewart Hildebrand 8 months, 2 weeks ago
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
guest's (domU) view of this will want to be zero (for now), the host
having set it to 1 should be preserved, or else we'd effectively be
giving the domU control of the bit. Thus, PCI_COMMAND register needs
proper emulation in order to honor host's settings.

According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the
guest's (domU) view of the command register.

Here is the full list of command register bits with notes about
PCI/PCIe specification, and how Xen handles the bit. QEMU's behavior is
also documented here since that is our current reference implementation
for PCI passthrough.

PCI_COMMAND_IO (bit 0)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU sets this bit to 1 in
    hardware if an I/O BAR is exposed to the guest.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP for now since we
    don't yet support I/O BARs for domUs.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_MEMORY (bit 1)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU sets this bit to 1 in
    hardware if a Memory BAR is exposed to the guest.
  Xen domU/dom0: We handle writes to this bit by mapping/unmapping BAR
    regions.
  Xen domU: For devices assigned to DomUs, memory decoding will be
    disabled at the time of initialization.

PCI_COMMAND_MASTER (bit 2)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_SPECIAL (bit 3)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_INVALIDATE (bit 4)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_VGA_PALETTE (bit 5)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_PARITY (bit 6)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_WAIT (bit 7)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: hardwire to 0
  QEMU: res_mask
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_SERR (bit 8)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_FAST_BACK (bit 9)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_INTX_DISABLE (bit 10)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU checks if INTx was mapped
    for a device. If it is not, then guest can't control
    PCI_COMMAND_INTX_DISABLE bit.
  Xen domU: We prohibit a guest from enabling INTx if MSI(X) is enabled.
  Xen dom0: We allow dom0 to control this bit freely.

Bits 11-15
  PCIe 6.1: RsvdP
  PCI LB 3.0: Reserved
  QEMU: res_mask
  Xen domU/dom0: rsvdp_mask

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
RFC: There is an unaddressed question for Roger: should we update the
     guest view of the PCI_COMMAND_INTX_DISABLE bit in
     msi.c/msix.c:control_write()? See prior discussion at [1].

[1] https://lore.kernel.org/xen-devel/86b25777-788c-4b9a-8166-a6f8174bedc9@suse.com/

In v13:
- Update right away (don't defer) PCI_COMMAND_MEMORY bit in guest_cmd
  variable in cmd_write()
- Make comment single line in xen/drivers/vpci/msi.c:control_write()
- Rearrange memory decoding disabling snippet in init_header()

In v12:
- Rework patch using vpci_add_register_mask()
- Add bitmask #define in pci_regs.h according to PCIe 6.1 spec, except
  don't add the RO bits because they were RW in PCI LB 3.0 spec.
- Move and expand TODO comment about properly emulating bits
- Update guest_cmd in msi.c/msix.c:control_write()
- Simplify cmd_write(), thanks to rsvdp_mask
- Update commit description

In v11:
- Fix copy-paste mistake: vpci->msi should be vpci->msix
- Handle PCI_COMMAND_IO
- Fix condition for disabling INTx in the MSI-X code
- Show domU changes to only allowed bits
- Show PCI_COMMAND_MEMORY write only after P2M was altered
- Update comments in the code
In v10:
- Added cf_check attribute to guest_cmd_read
- Removed warning about non-zero cmd
- Updated comment MSI code regarding disabling INTX
- Used ternary operator in vpci_add_register() call
- Disable memory decoding for DomUs in init_bars()
In v9:
- Reworked guest_cmd_read
- Added handling for more bits
Since v6:
- fold guest's logic into cmd_write
- implement cmd_read, so we can report emulated INTx state to guests
- introduce header->guest_cmd to hold the emulated state of the
  PCI_COMMAND register for guests
Since v5:
- add additional check for MSI-X enabled while altering INTX bit
- make sure INTx disabled while guests enable MSI/MSI-X
Since v3:
- gate more code on CONFIG_HAS_MSI
- removed logic for the case when MSI/MSI-X not enabled
---
 xen/drivers/vpci/header.c  | 57 ++++++++++++++++++++++++++++++++++----
 xen/drivers/vpci/msi.c     |  7 +++++
 xen/drivers/vpci/msix.c    |  7 +++++
 xen/include/xen/pci_regs.h |  1 +
 xen/include/xen/vpci.h     |  3 ++
 5 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
index 47648c395132..5d13ea975cc2 100644
--- a/xen/drivers/vpci/header.c
+++ b/xen/drivers/vpci/header.c
@@ -524,9 +524,21 @@ static void cf_check cmd_write(
 {
     struct vpci_header *header = data;
 
+    if ( !is_hardware_domain(pdev->domain) )
+    {
+        const struct vpci *vpci = pdev->vpci;
+
+        if ( (vpci->msi && vpci->msi->enabled) ||
+             (vpci->msix && vpci->msix->enabled) )
+            cmd |= PCI_COMMAND_INTX_DISABLE;
+
+        header->guest_cmd = cmd;
+    }
+
     /*
      * Let Dom0 play with all the bits directly except for the memory
-     * decoding one.
+     * decoding one. Bits that are not allowed for DomU are already
+     * handled above and by the rsvdp_mask.
      */
     if ( header->bars_mapped != !!(cmd & PCI_COMMAND_MEMORY) )
         /*
@@ -540,6 +552,14 @@ static void cf_check cmd_write(
         pci_conf_write16(pdev->sbdf, reg, cmd);
 }
 
+static uint32_t cf_check guest_cmd_read(
+    const struct pci_dev *pdev, unsigned int reg, void *data)
+{
+    const struct vpci_header *header = data;
+
+    return header->guest_cmd;
+}
+
 static void cf_check bar_write(
     const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data)
 {
@@ -754,9 +774,23 @@ static int cf_check init_header(struct pci_dev *pdev)
         return -EOPNOTSUPP;
     }
 
-    /* Setup a handler for the command register. */
-    rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, PCI_COMMAND,
-                           2, header);
+    /*
+     * Setup a handler for the command register.
+     *
+     * TODO: If support for emulated bits is added, re-visit how to handle
+     * PCI_COMMAND_PARITY, PCI_COMMAND_SERR, and PCI_COMMAND_FAST_BACK.
+     */
+    rc = vpci_add_register_mask(pdev->vpci,
+                                is_hwdom ? vpci_hw_read16 : guest_cmd_read,
+                                cmd_write, PCI_COMMAND, 2, header, 0, 0,
+                                PCI_COMMAND_RSVDP_MASK |
+                                    (is_hwdom ? 0
+                                              : PCI_COMMAND_IO |
+                                                PCI_COMMAND_PARITY |
+                                                PCI_COMMAND_WAIT |
+                                                PCI_COMMAND_SERR |
+                                                PCI_COMMAND_FAST_BACK),
+                                0);
     if ( rc )
         return rc;
 
@@ -836,9 +870,20 @@ static int cf_check init_header(struct pci_dev *pdev)
     if ( pdev->ignore_bars )
         return 0;
 
-    /* Disable memory decoding before sizing. */
     cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
-    if ( cmd & PCI_COMMAND_MEMORY )
+
+    /*
+     * Clear PCI_COMMAND_MEMORY and PCI_COMMAND_IO for DomUs, so they will
+     * always start with memory decoding disabled and to ensure that we will not
+     * call modify_bars() at the end of this function.
+     */
+    if ( !is_hwdom )
+        cmd &= ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO);
+
+    header->guest_cmd = cmd;
+
+    /* Disable memory decoding before sizing. */
+    if ( !is_hwdom || (cmd & PCI_COMMAND_MEMORY) )
         pci_conf_write16(pdev->sbdf, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);
 
     for ( i = 0; i < num_bars; i++ )
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index dc71938e23f5..1e747ad9bba8 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -70,6 +70,13 @@ static void cf_check control_write(
 
         if ( vpci_msi_arch_enable(msi, pdev, vectors) )
             return;
+
+        /* Make sure domU doesn't enable INTx while enabling MSI. */
+        if ( !is_hardware_domain(pdev->domain) )
+        {
+            pci_intx(pdev, false);
+            pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
+        }
     }
     else
         vpci_msi_arch_disable(msi, pdev);
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index b6abab47efdd..d4f756ce287a 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -135,6 +135,13 @@ static void cf_check control_write(
         }
     }
 
+    /* Make sure domU doesn't enable INTx while enabling MSI-X. */
+    if ( new_enabled && !msix->enabled && !is_hardware_domain(pdev->domain) )
+    {
+        pci_intx(pdev, false);
+        pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
+    }
+
     msix->masked = new_masked;
     msix->enabled = new_enabled;
 
diff --git a/xen/include/xen/pci_regs.h b/xen/include/xen/pci_regs.h
index 9909b27425a5..b2f2e43e864d 100644
--- a/xen/include/xen/pci_regs.h
+++ b/xen/include/xen/pci_regs.h
@@ -48,6 +48,7 @@
 #define  PCI_COMMAND_SERR	0x100	/* Enable SERR */
 #define  PCI_COMMAND_FAST_BACK	0x200	/* Enable back-to-back writes */
 #define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
+#define  PCI_COMMAND_RSVDP_MASK	0xf800
 
 #define PCI_STATUS		0x06	/* 16 bits */
 #define  PCI_STATUS_IMM_READY	0x01	/* Immediate Readiness */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index e89c571890b2..77320a667e55 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -107,6 +107,9 @@ struct vpci {
         } bars[PCI_HEADER_NORMAL_NR_BARS + 1];
         /* At most 6 BARS + 1 expansion ROM BAR. */
 
+        /* Guest (domU only) view of the PCI_COMMAND register. */
+        uint16_t guest_cmd;
+
         /*
          * Store whether the ROM enable bit is set (doesn't imply ROM BAR
          * is mapped into guest p2m) if there's a ROM BAR on the device.
-- 
2.43.0
Re: [PATCH v13 10/14] vpci/header: emulate PCI_COMMAND register for guests
Posted by Jan Beulich 8 months ago
On 02.02.2024 22:33, Stewart Hildebrand wrote:
> @@ -836,9 +870,20 @@ static int cf_check init_header(struct pci_dev *pdev)
>      if ( pdev->ignore_bars )
>          return 0;
>  
> -    /* Disable memory decoding before sizing. */
>      cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
> -    if ( cmd & PCI_COMMAND_MEMORY )
> +
> +    /*
> +     * Clear PCI_COMMAND_MEMORY and PCI_COMMAND_IO for DomUs, so they will
> +     * always start with memory decoding disabled and to ensure that we will not
> +     * call modify_bars() at the end of this function.

To achieve this, fiddling with PCI_COMMAND_IO isn't necessary. Which isn't
to say its clearing should go away; quite the other way around: Why would
we leave e.g. PCI_COMMAND_MASTER enabled? In fact wasn't it in an earlier
version of the series that the guest view simply started out as zero? The
patch description still says so.

> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -70,6 +70,13 @@ static void cf_check control_write(
>  
>          if ( vpci_msi_arch_enable(msi, pdev, vectors) )
>              return;
> +
> +        /* Make sure domU doesn't enable INTx while enabling MSI. */
> +        if ( !is_hardware_domain(pdev->domain) )
> +        {
> +            pci_intx(pdev, false);
> +            pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
> +        }

While here we're inside "if ( new_enabled )", ...

> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -135,6 +135,13 @@ static void cf_check control_write(
>          }
>      }
>  
> +    /* Make sure domU doesn't enable INTx while enabling MSI-X. */
> +    if ( new_enabled && !msix->enabled && !is_hardware_domain(pdev->domain) )
> +    {
> +        pci_intx(pdev, false);
> +        pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
> +    }

.. here you further check that it's actually a 0->1 transition? Why
not alike for MSI?

Jan
Re: [PATCH v13 10/14] vpci/header: emulate PCI_COMMAND register for guests
Posted by Stewart Hildebrand 7 months ago
On 2/14/24 10:41, Jan Beulich wrote:
> On 02.02.2024 22:33, Stewart Hildebrand wrote:
>> @@ -836,9 +870,20 @@ static int cf_check init_header(struct pci_dev *pdev)
>>      if ( pdev->ignore_bars )
>>          return 0;
>>  
>> -    /* Disable memory decoding before sizing. */
>>      cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
>> -    if ( cmd & PCI_COMMAND_MEMORY )
>> +
>> +    /*
>> +     * Clear PCI_COMMAND_MEMORY and PCI_COMMAND_IO for DomUs, so they will
>> +     * always start with memory decoding disabled and to ensure that we will not
>> +     * call modify_bars() at the end of this function.
> 
> To achieve this, fiddling with PCI_COMMAND_IO isn't necessary. Which isn't
> to say its clearing should go away; quite the other way around: Why would
> we leave e.g. PCI_COMMAND_MASTER enabled? In fact wasn't it in an earlier
> version of the series that the guest view simply started out as zero? The
> patch description still says so.

Yep, clearing PCI_COMMAND_MASTER too for domUs makes sense to me, I'll
make this change in v14. I'll also try to improve the comment.

Roger suggested at [1] that we should reflect the state of the hardware
in the command register. I'll update the patch description accordingly.

Archaeology/notes/references follow, primarily for my own reference:

Note that the rsvdp_mask will be applied to the guest_cmd value before
being returned to the guest, so no need to apply masks here.

Clearing both PCI_COMMAND_MEMORY and PCI_COMMAND_IO for domUs was
suggested by Roger at [2] and [3]. It is currently problematic for
devices assigned to domUs to have memory decoding enabled at this stage
because we don't yet have a good/generic way to initialize
bar.guest_addr taking the domU memory layout into account.

Reminder that we want to leave the PCI_COMMAND_{MASTER,MEMORY,IO} bits
unchanged for devices assigned to dom0. A description of why can be
found in the commit message of:

53d9133638c3 ("pci: do not disable memory decoding for devices").

[1] https://lore.kernel.org/xen-devel/ZLqI65gmNj1XDBm4@MacBook-Air-de-Roger.local/
[2] https://lore.kernel.org/xen-devel/ZRquRcRz-K43WeMc@MacBookPdeRoger/
[3] https://lore.kernel.org/xen-devel/ZVy73iJ3E8nJHvgf@macbook.local/

>> --- a/xen/drivers/vpci/msi.c
>> +++ b/xen/drivers/vpci/msi.c
>> @@ -70,6 +70,13 @@ static void cf_check control_write(
>>  
>>          if ( vpci_msi_arch_enable(msi, pdev, vectors) )
>>              return;
>> +
>> +        /* Make sure domU doesn't enable INTx while enabling MSI. */
>> +        if ( !is_hardware_domain(pdev->domain) )
>> +        {
>> +            pci_intx(pdev, false);
>> +            pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
>> +        }
> 
> While here we're inside "if ( new_enabled )", ...
> 
>> --- a/xen/drivers/vpci/msix.c
>> +++ b/xen/drivers/vpci/msix.c
>> @@ -135,6 +135,13 @@ static void cf_check control_write(
>>          }
>>      }
>>  
>> +    /* Make sure domU doesn't enable INTx while enabling MSI-X. */
>> +    if ( new_enabled && !msix->enabled && !is_hardware_domain(pdev->domain) )
>> +    {
>> +        pci_intx(pdev, false);
>> +        pdev->vpci->header.guest_cmd |= PCI_COMMAND_INTX_DISABLE;
>> +    }
> 
> .. here you further check that it's actually a 0->1 transition? Why
> not alike for MSI?

Good catch, we should similarly check for a 0->1 transition for MSI.
I'll fix it.
Re: [PATCH v13 10/14] vpci/header: emulate PCI_COMMAND register for guests
Posted by Jan Beulich 7 months ago
On 18.03.2024 22:03, Stewart Hildebrand wrote:
> On 2/14/24 10:41, Jan Beulich wrote:
>> On 02.02.2024 22:33, Stewart Hildebrand wrote:
>>> @@ -836,9 +870,20 @@ static int cf_check init_header(struct pci_dev *pdev)
>>>      if ( pdev->ignore_bars )
>>>          return 0;
>>>  
>>> -    /* Disable memory decoding before sizing. */
>>>      cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
>>> -    if ( cmd & PCI_COMMAND_MEMORY )
>>> +
>>> +    /*
>>> +     * Clear PCI_COMMAND_MEMORY and PCI_COMMAND_IO for DomUs, so they will
>>> +     * always start with memory decoding disabled and to ensure that we will not
>>> +     * call modify_bars() at the end of this function.
>>
>> To achieve this, fiddling with PCI_COMMAND_IO isn't necessary. Which isn't
>> to say its clearing should go away; quite the other way around: Why would
>> we leave e.g. PCI_COMMAND_MASTER enabled? In fact wasn't it in an earlier
>> version of the series that the guest view simply started out as zero? The
>> patch description still says so.
> 
> Yep, clearing PCI_COMMAND_MASTER too for domUs makes sense to me, I'll
> make this change in v14. I'll also try to improve the comment.
> 
> Roger suggested at [1] that we should reflect the state of the hardware
> in the command register. I'll update the patch description accordingly.

Roger's request can be carried out in several ways. But first of all state
reflected should be that of whatever is consistent in the virtualized view.
Any bits the guest may control ought to start out as zero, imo (which may
mean bringing hardware in sync with the guest view, not necessarily the
other way around). For bits the guest cannot control what the guest ought
to see depends.

Jan