[PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities

Roger Pau Monne posted 1 patch 4 months, 1 week ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20250929084149.70560-1-roger.pau@citrix.com
There is a newer version of this series
xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 45 insertions(+), 5 deletions(-)
[PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Roger Pau Monne 4 months, 1 week ago
I've had the luck to come across a PCI card that exposes a MSI-X capability
where the BIR of the vector and PBA tables points at a BAR that has 0 size.

This doesn't play nice with the code in vpci_make_msix_hole(), as it would
still use the address of such empty BAR (0) and attempt to crave a hole in
the p2m.  This leads to errors like the one below being reported by Xen:

d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area

And the device left unable to enable memory decoding due to the failure
reported by vpci_make_msix_hole().

Introduce checking in init_msix() to ensure the BARs containing the MSI-X
tables are usable.  This requires checking that the BIR points to a
non-empty BAR, and the offset and size of the MSI-X tables can fit in the
target BAR.

This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
EPYC 9965 processors.  The broken device is:

22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)

There are multiple of those integrated controllers in the system, all
broken in the same way.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Stewart Hildebrand <stewart.hildebrand@amd.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>

While not strictly a bugfix, I consider this a worthy improvement so that
PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
capabilities.  Hence I think this change should be considered for inclusion
into 4.21.  There a risk of regressing on hardware that was already working
with PVH, but given enough testing that should be minimal.
---
 xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 45 insertions(+), 5 deletions(-)

diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 54a5070733aa..8458955d5bbb 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -675,6 +675,51 @@ static int cf_check init_msix(struct pci_dev *pdev)
     if ( !msix )
         return -ENOMEM;
 
+    msix->tables[VPCI_MSIX_TABLE] =
+        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
+    msix->tables[VPCI_MSIX_PBA] =
+        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
+
+    /* Check that the provided BAR is valid. */
+    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
+    {
+        const char *name = (i == VPCI_MSIX_TABLE) ? "vector" : "PBA";
+        const struct vpci_bar *bars = pdev->vpci->header.bars;
+        unsigned int bir = msix->tables[i] & PCI_MSIX_BIRMASK;
+        unsigned int type;
+        unsigned int offset = msix->tables[i] & ~PCI_MSIX_BIRMASK;
+        unsigned int size =
+            (i == VPCI_MSIX_TABLE) ? max_entries * PCI_MSIX_ENTRY_SIZE
+                                   : ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8);
+
+        if ( bir >= ARRAY_SIZE(pdev->vpci->header.bars) )
+        {
+            printk(XENLOG_ERR "%pp: MSI-X %s table with out of range BIR %u\n",
+                   &pdev->sbdf, name, bir);
+ invalid:
+            xfree(msix);
+            return -ENODEV;
+
+        }
+
+        type = bars[bir].type;
+        if ( type != VPCI_BAR_MEM32 && type != VPCI_BAR_MEM64_LO )
+        {
+            printk(XENLOG_ERR
+                   "%pp: MSI-X %s table at invalid BAR%u with type %u\n",
+                   &pdev->sbdf, name, bir, type);
+            goto invalid;
+        }
+
+        if ( (uint64_t)offset + size > bars[bir].size )
+        {
+            printk(XENLOG_ERR
+                   "%pp: MSI-X %s table offset %#x size %#x outside of BAR%u size %#lx\n",
+                   &pdev->sbdf, name, offset, size, bir, bars[bir].size);
+            goto invalid;
+        }
+    }
+
     rc = vpci_add_register(pdev->vpci, control_read, control_write,
                            msix_control_reg(msix_offset), 2, msix);
     if ( rc )
@@ -686,11 +731,6 @@ static int cf_check init_msix(struct pci_dev *pdev)
     msix->max_entries = max_entries;
     msix->pdev = pdev;
 
-    msix->tables[VPCI_MSIX_TABLE] =
-        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
-    msix->tables[VPCI_MSIX_PBA] =
-        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
-
     for ( i = 0; i < max_entries; i++)
     {
         msix->entries[i].masked = true;
-- 
2.51.0


Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Stewart Hildebrand 4 months, 1 week ago
On 9/29/25 04:41, Roger Pau Monne wrote:
> I've had the luck to come across a PCI card that exposes a MSI-X capability
> where the BIR of the vector and PBA tables points at a BAR that has 0 size.
> 
> This doesn't play nice with the code in vpci_make_msix_hole(), as it would
> still use the address of such empty BAR (0) and attempt to crave a hole in

s/crave/carve/

> the p2m.  This leads to errors like the one below being reported by Xen:
> 
> d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
> 
> And the device left unable to enable memory decoding due to the failure
> reported by vpci_make_msix_hole().
> 
> Introduce checking in init_msix() to ensure the BARs containing the MSI-X
> tables are usable.  This requires checking that the BIR points to a
> non-empty BAR, and the offset and size of the MSI-X tables can fit in the
> target BAR.
> 
> This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
> EPYC 9965 processors.  The broken device is:
> 
> 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
> 
> There are multiple of those integrated controllers in the system, all
> broken in the same way.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Stewart Hildebrand <stewart.hildebrand@amd.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> 
> While not strictly a bugfix, I consider this a worthy improvement so that
> PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
> capabilities.  Hence I think this change should be considered for inclusion
> into 4.21.  There a risk of regressing on hardware that was already working
> with PVH, but given enough testing that should be minimal.
> ---
>  xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 45 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index 54a5070733aa..8458955d5bbb 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -675,6 +675,51 @@ static int cf_check init_msix(struct pci_dev *pdev)
>      if ( !msix )
>          return -ENOMEM;
>  
> +    msix->tables[VPCI_MSIX_TABLE] =
> +        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
> +    msix->tables[VPCI_MSIX_PBA] =
> +        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
> +
> +    /* Check that the provided BAR is valid. */
> +    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
> +    {
> +        const char *name = (i == VPCI_MSIX_TABLE) ? "vector" : "PBA";
> +        const struct vpci_bar *bars = pdev->vpci->header.bars;
> +        unsigned int bir = msix->tables[i] & PCI_MSIX_BIRMASK;
> +        unsigned int type;
> +        unsigned int offset = msix->tables[i] & ~PCI_MSIX_BIRMASK;
> +        unsigned int size =
> +            (i == VPCI_MSIX_TABLE) ? max_entries * PCI_MSIX_ENTRY_SIZE
> +                                   : ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8);
> +
> +        if ( bir >= ARRAY_SIZE(pdev->vpci->header.bars) )

This assumes a type 0 header. For type 1 headers, bir values 2 and up are
also reserved.

> +        {
> +            printk(XENLOG_ERR "%pp: MSI-X %s table with out of range BIR %u\n",
> +                   &pdev->sbdf, name, bir);

Nit: placing the cleanup label at the end of the function and using 'rc' would
make it more amenable to future uses.

> + invalid:
> +            xfree(msix);
> +            return -ENODEV;
> +

Extraneous newline.

> +        }
> +
> +        type = bars[bir].type;
> +        if ( type != VPCI_BAR_MEM32 && type != VPCI_BAR_MEM64_LO )
> +        {
> +            printk(XENLOG_ERR
> +                   "%pp: MSI-X %s table at invalid BAR%u with type %u\n",
> +                   &pdev->sbdf, name, bir, type);
> +            goto invalid;
> +        }
> +
> +        if ( (uint64_t)offset + size > bars[bir].size )
> +        {
> +            printk(XENLOG_ERR
> +                   "%pp: MSI-X %s table offset %#x size %#x outside of BAR%u size %#lx\n",
> +                   &pdev->sbdf, name, offset, size, bir, bars[bir].size);
> +            goto invalid;
> +        }
> +    }
> +
>      rc = vpci_add_register(pdev->vpci, control_read, control_write,
>                             msix_control_reg(msix_offset), 2, msix);
>      if ( rc )
> @@ -686,11 +731,6 @@ static int cf_check init_msix(struct pci_dev *pdev)
>      msix->max_entries = max_entries;
>      msix->pdev = pdev;
>  
> -    msix->tables[VPCI_MSIX_TABLE] =
> -        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
> -    msix->tables[VPCI_MSIX_PBA] =
> -        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
> -
>      for ( i = 0; i < max_entries; i++)
>      {
>          msix->entries[i].masked = true;


Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Roger Pau Monné 4 months ago
On Fri, Oct 03, 2025 at 11:29:40PM -0400, Stewart Hildebrand wrote:
> On 9/29/25 04:41, Roger Pau Monne wrote:
> > I've had the luck to come across a PCI card that exposes a MSI-X capability
> > where the BIR of the vector and PBA tables points at a BAR that has 0 size.
> > 
> > This doesn't play nice with the code in vpci_make_msix_hole(), as it would
> > still use the address of such empty BAR (0) and attempt to crave a hole in
> 
> s/crave/carve/
> 
> > the p2m.  This leads to errors like the one below being reported by Xen:
> > 
> > d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
> > 
> > And the device left unable to enable memory decoding due to the failure
> > reported by vpci_make_msix_hole().
> > 
> > Introduce checking in init_msix() to ensure the BARs containing the MSI-X
> > tables are usable.  This requires checking that the BIR points to a
> > non-empty BAR, and the offset and size of the MSI-X tables can fit in the
> > target BAR.
> > 
> > This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
> > EPYC 9965 processors.  The broken device is:
> > 
> > 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
> > 
> > There are multiple of those integrated controllers in the system, all
> > broken in the same way.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Cc: Stewart Hildebrand <stewart.hildebrand@amd.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> > 
> > While not strictly a bugfix, I consider this a worthy improvement so that
> > PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
> > capabilities.  Hence I think this change should be considered for inclusion
> > into 4.21.  There a risk of regressing on hardware that was already working
> > with PVH, but given enough testing that should be minimal.
> > ---
> >  xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 45 insertions(+), 5 deletions(-)
> > 
> > diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> > index 54a5070733aa..8458955d5bbb 100644
> > --- a/xen/drivers/vpci/msix.c
> > +++ b/xen/drivers/vpci/msix.c
> > @@ -675,6 +675,51 @@ static int cf_check init_msix(struct pci_dev *pdev)
> >      if ( !msix )
> >          return -ENOMEM;
> >  
> > +    msix->tables[VPCI_MSIX_TABLE] =
> > +        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
> > +    msix->tables[VPCI_MSIX_PBA] =
> > +        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
> > +
> > +    /* Check that the provided BAR is valid. */
> > +    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
> > +    {
> > +        const char *name = (i == VPCI_MSIX_TABLE) ? "vector" : "PBA";
> > +        const struct vpci_bar *bars = pdev->vpci->header.bars;
> > +        unsigned int bir = msix->tables[i] & PCI_MSIX_BIRMASK;
> > +        unsigned int type;
> > +        unsigned int offset = msix->tables[i] & ~PCI_MSIX_BIRMASK;
> > +        unsigned int size =
> > +            (i == VPCI_MSIX_TABLE) ? max_entries * PCI_MSIX_ENTRY_SIZE
> > +                                   : ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8);
> > +
> > +        if ( bir >= ARRAY_SIZE(pdev->vpci->header.bars) )
> 
> This assumes a type 0 header. For type 1 headers, bir values 2 and up are
> also reserved.

Right, but those BARs will be set as VPCI_BAR_EMPTY for type 1 headers.
The check here is to avoid doing an out of bounds array access, the
check for validity of the pointed BAR is done below.

> 
> > +        {
> > +            printk(XENLOG_ERR "%pp: MSI-X %s table with out of range BIR %u\n",
> > +                   &pdev->sbdf, name, bir);
> 
> Nit: placing the cleanup label at the end of the function and using 'rc' would
> make it more amenable to future uses.

The issue with that is that we then end up with a structure like:

    return vpci_make_msix_hole();

 error:
    xfree();
    return rc;

Which I don't like much because of the double usage of return (it's a
taste issue TBH).

My motivation for using a goto is that they are conceptually the same
error path, but we provide different log messages to aid in debugging
the issue.  Otherwise all checks will be done in a single condition.

Let me know how strong you feel about the usage of a label here vs one
at the tail of the function.

> 
> > + invalid:
> > +            xfree(msix);
> > +            return -ENODEV;
> > +
> 
> Extraneous newline.

Thanks, Roger.

Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Jan Beulich 4 months ago
On 06.10.2025 10:20, Roger Pau Monné wrote:
> On Fri, Oct 03, 2025 at 11:29:40PM -0400, Stewart Hildebrand wrote:
>> On 9/29/25 04:41, Roger Pau Monne wrote:
>>> I've had the luck to come across a PCI card that exposes a MSI-X capability
>>> where the BIR of the vector and PBA tables points at a BAR that has 0 size.
>>>
>>> This doesn't play nice with the code in vpci_make_msix_hole(), as it would
>>> still use the address of such empty BAR (0) and attempt to crave a hole in
>>
>> s/crave/carve/
>>
>>> the p2m.  This leads to errors like the one below being reported by Xen:
>>>
>>> d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
>>>
>>> And the device left unable to enable memory decoding due to the failure
>>> reported by vpci_make_msix_hole().
>>>
>>> Introduce checking in init_msix() to ensure the BARs containing the MSI-X
>>> tables are usable.  This requires checking that the BIR points to a
>>> non-empty BAR, and the offset and size of the MSI-X tables can fit in the
>>> target BAR.
>>>
>>> This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
>>> EPYC 9965 processors.  The broken device is:
>>>
>>> 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
>>>
>>> There are multiple of those integrated controllers in the system, all
>>> broken in the same way.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> Cc: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>>
>>> While not strictly a bugfix, I consider this a worthy improvement so that
>>> PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
>>> capabilities.  Hence I think this change should be considered for inclusion
>>> into 4.21.  There a risk of regressing on hardware that was already working
>>> with PVH, but given enough testing that should be minimal.
>>> ---
>>>  xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
>>>  1 file changed, 45 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>>> index 54a5070733aa..8458955d5bbb 100644
>>> --- a/xen/drivers/vpci/msix.c
>>> +++ b/xen/drivers/vpci/msix.c
>>> @@ -675,6 +675,51 @@ static int cf_check init_msix(struct pci_dev *pdev)
>>>      if ( !msix )
>>>          return -ENOMEM;
>>>  
>>> +    msix->tables[VPCI_MSIX_TABLE] =
>>> +        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
>>> +    msix->tables[VPCI_MSIX_PBA] =
>>> +        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
>>> +
>>> +    /* Check that the provided BAR is valid. */
>>> +    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
>>> +    {
>>> +        const char *name = (i == VPCI_MSIX_TABLE) ? "vector" : "PBA";
>>> +        const struct vpci_bar *bars = pdev->vpci->header.bars;
>>> +        unsigned int bir = msix->tables[i] & PCI_MSIX_BIRMASK;
>>> +        unsigned int type;
>>> +        unsigned int offset = msix->tables[i] & ~PCI_MSIX_BIRMASK;
>>> +        unsigned int size =
>>> +            (i == VPCI_MSIX_TABLE) ? max_entries * PCI_MSIX_ENTRY_SIZE
>>> +                                   : ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8);
>>> +
>>> +        if ( bir >= ARRAY_SIZE(pdev->vpci->header.bars) )
>>
>> This assumes a type 0 header. For type 1 headers, bir values 2 and up are
>> also reserved.
> 
> Right, but those BARs will be set as VPCI_BAR_EMPTY for type 1 headers.
> The check here is to avoid doing an out of bounds array access, the
> check for validity of the pointed BAR is done below.
> 
>>
>>> +        {
>>> +            printk(XENLOG_ERR "%pp: MSI-X %s table with out of range BIR %u\n",
>>> +                   &pdev->sbdf, name, bir);
>>
>> Nit: placing the cleanup label at the end of the function and using 'rc' would
>> make it more amenable to future uses.
> 
> The issue with that is that we then end up with a structure like:
> 
>     return vpci_make_msix_hole();
> 
>  error:
>     xfree();
>     return rc;
> 
> Which I don't like much because of the double usage of return (it's a
> taste issue TBH).
> 
> My motivation for using a goto is that they are conceptually the same
> error path, but we provide different log messages to aid in debugging
> the issue.  Otherwise all checks will be done in a single condition.

I agree here, yet I'd like to point out that (iirc) Misra wants us to use
only forward goto-s (which imo is a mistake, but I don't expect they're
going to change their minds). So flipping where the label and goto are
may be desirable.

Jan

Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Stewart Hildebrand 4 months ago
On 10/7/25 10:56, Jan Beulich wrote:
> On 06.10.2025 10:20, Roger Pau Monné wrote:
>> On Fri, Oct 03, 2025 at 11:29:40PM -0400, Stewart Hildebrand wrote:
>>> On 9/29/25 04:41, Roger Pau Monne wrote:
>>>> I've had the luck to come across a PCI card that exposes a MSI-X capability
>>>> where the BIR of the vector and PBA tables points at a BAR that has 0 size.
>>>>
>>>> This doesn't play nice with the code in vpci_make_msix_hole(), as it would
>>>> still use the address of such empty BAR (0) and attempt to crave a hole in
>>>
>>> s/crave/carve/
>>>
>>>> the p2m.  This leads to errors like the one below being reported by Xen:
>>>>
>>>> d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
>>>>
>>>> And the device left unable to enable memory decoding due to the failure
>>>> reported by vpci_make_msix_hole().
>>>>
>>>> Introduce checking in init_msix() to ensure the BARs containing the MSI-X
>>>> tables are usable.  This requires checking that the BIR points to a
>>>> non-empty BAR, and the offset and size of the MSI-X tables can fit in the
>>>> target BAR.
>>>>
>>>> This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
>>>> EPYC 9965 processors.  The broken device is:
>>>>
>>>> 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
>>>>
>>>> There are multiple of those integrated controllers in the system, all
>>>> broken in the same way.
>>>>
>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> ---
>>>> Cc: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>> Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>>>
>>>> While not strictly a bugfix, I consider this a worthy improvement so that
>>>> PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
>>>> capabilities.  Hence I think this change should be considered for inclusion
>>>> into 4.21.  There a risk of regressing on hardware that was already working
>>>> with PVH, but given enough testing that should be minimal.
>>>> ---
>>>>  xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
>>>>  1 file changed, 45 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>>>> index 54a5070733aa..8458955d5bbb 100644
>>>> --- a/xen/drivers/vpci/msix.c
>>>> +++ b/xen/drivers/vpci/msix.c
>>>> @@ -675,6 +675,51 @@ static int cf_check init_msix(struct pci_dev *pdev)
>>>>      if ( !msix )
>>>>          return -ENOMEM;
>>>>  
>>>> +    msix->tables[VPCI_MSIX_TABLE] =
>>>> +        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
>>>> +    msix->tables[VPCI_MSIX_PBA] =
>>>> +        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
>>>> +
>>>> +    /* Check that the provided BAR is valid. */
>>>> +    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
>>>> +    {
>>>> +        const char *name = (i == VPCI_MSIX_TABLE) ? "vector" : "PBA";
>>>> +        const struct vpci_bar *bars = pdev->vpci->header.bars;
>>>> +        unsigned int bir = msix->tables[i] & PCI_MSIX_BIRMASK;
>>>> +        unsigned int type;
>>>> +        unsigned int offset = msix->tables[i] & ~PCI_MSIX_BIRMASK;
>>>> +        unsigned int size =
>>>> +            (i == VPCI_MSIX_TABLE) ? max_entries * PCI_MSIX_ENTRY_SIZE
>>>> +                                   : ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8);
>>>> +
>>>> +        if ( bir >= ARRAY_SIZE(pdev->vpci->header.bars) )
>>>
>>> This assumes a type 0 header. For type 1 headers, bir values 2 and up are
>>> also reserved.
>>
>> Right, but those BARs will be set as VPCI_BAR_EMPTY for type 1 headers.
>> The check here is to avoid doing an out of bounds array access, the
>> check for validity of the pointed BAR is done below.
>>
>>>
>>>> +        {
>>>> +            printk(XENLOG_ERR "%pp: MSI-X %s table with out of range BIR %u\n",
>>>> +                   &pdev->sbdf, name, bir);
>>>
>>> Nit: placing the cleanup label at the end of the function and using 'rc' would
>>> make it more amenable to future uses.
>>
>> The issue with that is that we then end up with a structure like:
>>
>>     return vpci_make_msix_hole();
>>
>>  error:
>>     xfree();
>>     return rc;
>>
>> Which I don't like much because of the double usage of return (it's a
>> taste issue TBH).
>>
>> My motivation for using a goto is that they are conceptually the same
>> error path, but we provide different log messages to aid in debugging
>> the issue.  Otherwise all checks will be done in a single condition.
> 
> I agree here, yet I'd like to point out that (iirc) Misra wants us to use
> only forward goto-s (which imo is a mistake, but I don't expect they're
> going to change their minds). So flipping where the label and goto are
> may be desirable.

Aren't we planning to deviate rule 15.2? See [1]

[1] https://lore.kernel.org/xen-devel/7e2993c68fda95d1eda6fd136750fcba@bugseng.com/

Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Stewart Hildebrand 4 months ago
On 10/6/25 04:20, Roger Pau Monné wrote:
> On Fri, Oct 03, 2025 at 11:29:40PM -0400, Stewart Hildebrand wrote:
>> On 9/29/25 04:41, Roger Pau Monne wrote:
>>> I've had the luck to come across a PCI card that exposes a MSI-X capability
>>> where the BIR of the vector and PBA tables points at a BAR that has 0 size.
>>>
>>> This doesn't play nice with the code in vpci_make_msix_hole(), as it would
>>> still use the address of such empty BAR (0) and attempt to crave a hole in
>>
>> s/crave/carve/
>>
>>> the p2m.  This leads to errors like the one below being reported by Xen:
>>>
>>> d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
>>>
>>> And the device left unable to enable memory decoding due to the failure
>>> reported by vpci_make_msix_hole().
>>>
>>> Introduce checking in init_msix() to ensure the BARs containing the MSI-X
>>> tables are usable.  This requires checking that the BIR points to a
>>> non-empty BAR, and the offset and size of the MSI-X tables can fit in the
>>> target BAR.
>>>
>>> This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
>>> EPYC 9965 processors.  The broken device is:
>>>
>>> 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
>>>
>>> There are multiple of those integrated controllers in the system, all
>>> broken in the same way.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> Cc: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>>
>>> While not strictly a bugfix, I consider this a worthy improvement so that
>>> PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
>>> capabilities.  Hence I think this change should be considered for inclusion
>>> into 4.21.  There a risk of regressing on hardware that was already working
>>> with PVH, but given enough testing that should be minimal.
>>> ---
>>>  xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
>>>  1 file changed, 45 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>>> index 54a5070733aa..8458955d5bbb 100644
>>> --- a/xen/drivers/vpci/msix.c
>>> +++ b/xen/drivers/vpci/msix.c
>>> @@ -675,6 +675,51 @@ static int cf_check init_msix(struct pci_dev *pdev)
>>>      if ( !msix )
>>>          return -ENOMEM;
>>>  
>>> +    msix->tables[VPCI_MSIX_TABLE] =
>>> +        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
>>> +    msix->tables[VPCI_MSIX_PBA] =
>>> +        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
>>> +
>>> +    /* Check that the provided BAR is valid. */
>>> +    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
>>> +    {
>>> +        const char *name = (i == VPCI_MSIX_TABLE) ? "vector" : "PBA";
>>> +        const struct vpci_bar *bars = pdev->vpci->header.bars;
>>> +        unsigned int bir = msix->tables[i] & PCI_MSIX_BIRMASK;
>>> +        unsigned int type;
>>> +        unsigned int offset = msix->tables[i] & ~PCI_MSIX_BIRMASK;
>>> +        unsigned int size =
>>> +            (i == VPCI_MSIX_TABLE) ? max_entries * PCI_MSIX_ENTRY_SIZE
>>> +                                   : ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8);
>>> +
>>> +        if ( bir >= ARRAY_SIZE(pdev->vpci->header.bars) )
>>
>> This assumes a type 0 header. For type 1 headers, bir values 2 and up are
>> also reserved.
> 
> Right, but those BARs will be set as VPCI_BAR_EMPTY for type 1 headers.
> The check here is to avoid doing an out of bounds array access, the
> check for validity of the pointed BAR is done below.

OK, makes sense.

>>
>>> +        {
>>> +            printk(XENLOG_ERR "%pp: MSI-X %s table with out of range BIR %u\n",
>>> +                   &pdev->sbdf, name, bir);
>>
>> Nit: placing the cleanup label at the end of the function and using 'rc' would
>> make it more amenable to future uses.
> 
> The issue with that is that we then end up with a structure like:
> 
>     return vpci_make_msix_hole();
> 
>  error:
>     xfree();
>     return rc;
> 
> Which I don't like much because of the double usage of return (it's a
> taste issue TBH).
> 
> My motivation for using a goto is that they are conceptually the same
> error path, but we provide different log messages to aid in debugging
> the issue.  Otherwise all checks will be done in a single condition.
> 
> Let me know how strong you feel about the usage of a label here vs one
> at the tail of the function.

Not very strongly, hence the Nit: prefix.

>>
>>> + invalid:
>>> +            xfree(msix);
>>> +            return -ENODEV;
>>> +
>>
>> Extraneous newline.
> 
> Thanks, Roger.

Overall the patch looks good to me. With the commit message typo fixed, and the
newline removed:

Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>

Lastly, I don't have a strong opinion on Alejandro's suggested prefix, so my R-b
stands with or without that.

Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Alejandro Vallejo 4 months, 1 week ago
On Mon Sep 29, 2025 at 10:41 AM CEST, Roger Pau Monne wrote:
> I've had the luck to come across a PCI card that exposes a MSI-X capability
> where the BIR of the vector and PBA tables points at a BAR that has 0 size.
>
> This doesn't play nice with the code in vpci_make_msix_hole(), as it would
> still use the address of such empty BAR (0) and attempt to crave a hole in
> the p2m.  This leads to errors like the one below being reported by Xen:
>
> d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
>
> And the device left unable to enable memory decoding due to the failure
> reported by vpci_make_msix_hole().
>
> Introduce checking in init_msix() to ensure the BARs containing the MSI-X
> tables are usable.  This requires checking that the BIR points to a
> non-empty BAR, and the offset and size of the MSI-X tables can fit in the
> target BAR.
>
> This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
> EPYC 9965 processors.  The broken device is:
>
> 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
>
> There are multiple of those integrated controllers in the system, all
> broken in the same way.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Stewart Hildebrand <stewart.hildebrand@amd.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>
> While not strictly a bugfix, I consider this a worthy improvement so that
> PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
> capabilities.  Hence I think this change should be considered for inclusion
> into 4.21.  There a risk of regressing on hardware that was already working
> with PVH, but given enough testing that should be minimal.
> ---
>  xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 45 insertions(+), 5 deletions(-)
>
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index 54a5070733aa..8458955d5bbb 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -675,6 +675,51 @@ static int cf_check init_msix(struct pci_dev *pdev)
>      if ( !msix )
>          return -ENOMEM;
>  
> +    msix->tables[VPCI_MSIX_TABLE] =
> +        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
> +    msix->tables[VPCI_MSIX_PBA] =
> +        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
> +
> +    /* Check that the provided BAR is valid. */
> +    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
> +    {
> +        const char *name = (i == VPCI_MSIX_TABLE) ? "vector" : "PBA";
> +        const struct vpci_bar *bars = pdev->vpci->header.bars;
> +        unsigned int bir = msix->tables[i] & PCI_MSIX_BIRMASK;
> +        unsigned int type;
> +        unsigned int offset = msix->tables[i] & ~PCI_MSIX_BIRMASK;
> +        unsigned int size =
> +            (i == VPCI_MSIX_TABLE) ? max_entries * PCI_MSIX_ENTRY_SIZE
> +                                   : ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8);
> +
> +        if ( bir >= ARRAY_SIZE(pdev->vpci->header.bars) )
> +        {
> +            printk(XENLOG_ERR "%pp: MSI-X %s table with out of range BIR %u\n",
> +                   &pdev->sbdf, name, bir);

Would it be worth adding something here such that a device vendor testing their
hardware under Xen can trivially grep for device bugs?

Something akin to "[Firmware bug]" on Linux, like "[Device bug]" or some such.

It would also let anyone not very knowledgeable about PCI know that a device
they own is being unreasonable. Same below in the other XENLOG_ERR messages.

> + invalid:
> +            xfree(msix);
> +            return -ENODEV;
> +
> +        }
> +
> +        type = bars[bir].type;
> +        if ( type != VPCI_BAR_MEM32 && type != VPCI_BAR_MEM64_LO )
> +        {
> +            printk(XENLOG_ERR
> +                   "%pp: MSI-X %s table at invalid BAR%u with type %u\n",
> +                   &pdev->sbdf, name, bir, type);
> +            goto invalid;
> +        }
> +
> +        if ( (uint64_t)offset + size > bars[bir].size )
> +        {
> +            printk(XENLOG_ERR
> +                   "%pp: MSI-X %s table offset %#x size %#x outside of BAR%u size %#lx\n",
> +                   &pdev->sbdf, name, offset, size, bir, bars[bir].size);
> +            goto invalid;
> +        }
> +    }
> +
>      rc = vpci_add_register(pdev->vpci, control_read, control_write,
>                             msix_control_reg(msix_offset), 2, msix);
>      if ( rc )
> @@ -686,11 +731,6 @@ static int cf_check init_msix(struct pci_dev *pdev)
>      msix->max_entries = max_entries;
>      msix->pdev = pdev;
>  
> -    msix->tables[VPCI_MSIX_TABLE] =
> -        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
> -    msix->tables[VPCI_MSIX_PBA] =
> -        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
> -
>      for ( i = 0; i < max_entries; i++)
>      {
>          msix->entries[i].masked = true;
Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Roger Pau Monné 4 months, 1 week ago
On Tue, Sep 30, 2025 at 11:15:01AM +0200, Alejandro Vallejo wrote:
> On Mon Sep 29, 2025 at 10:41 AM CEST, Roger Pau Monne wrote:
> > I've had the luck to come across a PCI card that exposes a MSI-X capability
> > where the BIR of the vector and PBA tables points at a BAR that has 0 size.
> >
> > This doesn't play nice with the code in vpci_make_msix_hole(), as it would
> > still use the address of such empty BAR (0) and attempt to crave a hole in
> > the p2m.  This leads to errors like the one below being reported by Xen:
> >
> > d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
> >
> > And the device left unable to enable memory decoding due to the failure
> > reported by vpci_make_msix_hole().
> >
> > Introduce checking in init_msix() to ensure the BARs containing the MSI-X
> > tables are usable.  This requires checking that the BIR points to a
> > non-empty BAR, and the offset and size of the MSI-X tables can fit in the
> > target BAR.
> >
> > This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
> > EPYC 9965 processors.  The broken device is:
> >
> > 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
> >
> > There are multiple of those integrated controllers in the system, all
> > broken in the same way.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Cc: Stewart Hildebrand <stewart.hildebrand@amd.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> >
> > While not strictly a bugfix, I consider this a worthy improvement so that
> > PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
> > capabilities.  Hence I think this change should be considered for inclusion
> > into 4.21.  There a risk of regressing on hardware that was already working
> > with PVH, but given enough testing that should be minimal.
> > ---
> >  xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 45 insertions(+), 5 deletions(-)
> >
> > diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> > index 54a5070733aa..8458955d5bbb 100644
> > --- a/xen/drivers/vpci/msix.c
> > +++ b/xen/drivers/vpci/msix.c
> > @@ -675,6 +675,51 @@ static int cf_check init_msix(struct pci_dev *pdev)
> >      if ( !msix )
> >          return -ENOMEM;
> >  
> > +    msix->tables[VPCI_MSIX_TABLE] =
> > +        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
> > +    msix->tables[VPCI_MSIX_PBA] =
> > +        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
> > +
> > +    /* Check that the provided BAR is valid. */
> > +    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
> > +    {
> > +        const char *name = (i == VPCI_MSIX_TABLE) ? "vector" : "PBA";
> > +        const struct vpci_bar *bars = pdev->vpci->header.bars;
> > +        unsigned int bir = msix->tables[i] & PCI_MSIX_BIRMASK;
> > +        unsigned int type;
> > +        unsigned int offset = msix->tables[i] & ~PCI_MSIX_BIRMASK;
> > +        unsigned int size =
> > +            (i == VPCI_MSIX_TABLE) ? max_entries * PCI_MSIX_ENTRY_SIZE
> > +                                   : ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8);
> > +
> > +        if ( bir >= ARRAY_SIZE(pdev->vpci->header.bars) )
> > +        {
> > +            printk(XENLOG_ERR "%pp: MSI-X %s table with out of range BIR %u\n",
> > +                   &pdev->sbdf, name, bir);
> 
> Would it be worth adding something here such that a device vendor testing their
> hardware under Xen can trivially grep for device bugs?
> 
> Something akin to "[Firmware bug]" on Linux, like "[Device bug]" or some such.
> 
> It would also let anyone not very knowledgeable about PCI know that a device
> they own is being unreasonable. Same below in the other XENLOG_ERR messages.

We could add indeed.  I don't think we haven't done so in the past.
If we go that route I would suggest that I add a:

#define DEVICE_BUG_PREFIX "[Device bug] "

in lib.h or similar, to make sure we use the same prefix uniformly.
TBH I think vendors care little about the output of Xen, as long as it
boots.

The downside of this is that it makes those messages longer, which
will require more time to print if using a slow UART.

Thanks, Roger.

Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Alejandro Vallejo 4 months ago
On Tue Sep 30, 2025 at 2:57 PM CEST, Roger Pau Monné wrote:
> On Tue, Sep 30, 2025 at 11:15:01AM +0200, Alejandro Vallejo wrote:
>> On Mon Sep 29, 2025 at 10:41 AM CEST, Roger Pau Monne wrote:
>> > I've had the luck to come across a PCI card that exposes a MSI-X capability
>> > where the BIR of the vector and PBA tables points at a BAR that has 0 size.
>> >
>> > This doesn't play nice with the code in vpci_make_msix_hole(), as it would
>> > still use the address of such empty BAR (0) and attempt to crave a hole in
>> > the p2m.  This leads to errors like the one below being reported by Xen:
>> >
>> > d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
>> >
>> > And the device left unable to enable memory decoding due to the failure
>> > reported by vpci_make_msix_hole().
>> >
>> > Introduce checking in init_msix() to ensure the BARs containing the MSI-X
>> > tables are usable.  This requires checking that the BIR points to a
>> > non-empty BAR, and the offset and size of the MSI-X tables can fit in the
>> > target BAR.
>> >
>> > This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
>> > EPYC 9965 processors.  The broken device is:
>> >
>> > 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
>> >
>> > There are multiple of those integrated controllers in the system, all
>> > broken in the same way.
>> >
>> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> > ---
>> > Cc: Stewart Hildebrand <stewart.hildebrand@amd.com>
>> > Cc: Jan Beulich <jbeulich@suse.com>
>> > Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>> >
>> > While not strictly a bugfix, I consider this a worthy improvement so that
>> > PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
>> > capabilities.  Hence I think this change should be considered for inclusion
>> > into 4.21.  There a risk of regressing on hardware that was already working
>> > with PVH, but given enough testing that should be minimal.
>> > ---
>> >  xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
>> >  1 file changed, 45 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>> > index 54a5070733aa..8458955d5bbb 100644
>> > --- a/xen/drivers/vpci/msix.c
>> > +++ b/xen/drivers/vpci/msix.c
>> > @@ -675,6 +675,51 @@ static int cf_check init_msix(struct pci_dev *pdev)
>> >      if ( !msix )
>> >          return -ENOMEM;
>> >  
>> > +    msix->tables[VPCI_MSIX_TABLE] =
>> > +        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
>> > +    msix->tables[VPCI_MSIX_PBA] =
>> > +        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
>> > +
>> > +    /* Check that the provided BAR is valid. */
>> > +    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
>> > +    {
>> > +        const char *name = (i == VPCI_MSIX_TABLE) ? "vector" : "PBA";
>> > +        const struct vpci_bar *bars = pdev->vpci->header.bars;
>> > +        unsigned int bir = msix->tables[i] & PCI_MSIX_BIRMASK;
>> > +        unsigned int type;
>> > +        unsigned int offset = msix->tables[i] & ~PCI_MSIX_BIRMASK;
>> > +        unsigned int size =
>> > +            (i == VPCI_MSIX_TABLE) ? max_entries * PCI_MSIX_ENTRY_SIZE
>> > +                                   : ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8);
>> > +
>> > +        if ( bir >= ARRAY_SIZE(pdev->vpci->header.bars) )
>> > +        {
>> > +            printk(XENLOG_ERR "%pp: MSI-X %s table with out of range BIR %u\n",
>> > +                   &pdev->sbdf, name, bir);
>> 
>> Would it be worth adding something here such that a device vendor testing their
>> hardware under Xen can trivially grep for device bugs?
>> 
>> Something akin to "[Firmware bug]" on Linux, like "[Device bug]" or some such.
>> 
>> It would also let anyone not very knowledgeable about PCI know that a device
>> they own is being unreasonable. Same below in the other XENLOG_ERR messages.
>
> We could add indeed.  I don't think we haven't done so in the past.
> If we go that route I would suggest that I add a:
>
> #define DEVICE_BUG_PREFIX "[Device bug] "
>
> in lib.h or similar, to make sure we use the same prefix uniformly.
> TBH

That works. As would DEV_BUG_PREFIX, XENLOG_DEV_BUG or even just DEV_BUG.

LGTM in any form or shape that makes it patently clear the admin is running
on buggy hardware.

> I think vendors care little about the output of Xen, as long as it boots.

They might care more once they realise they can "grep" lines of interest. That's
what happened on Linux, after all.

Also, it's not just for vendors. Users and developers alike might be interested
in using these message as a "taint" to know when they bought known-bad hardware.

Maybe to avoid buying it twice.

>
> The downside of this is that it makes those messages longer, which
> will require more time to print if using a slow UART.

Only the reporting of known bugs. Which isn't a high throughput operation.

Hopefully.

Cheers,
Alejandro
Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Andrew Cooper 4 months ago
On 06/10/2025 2:55 pm, Alejandro Vallejo wrote:
> On Tue Sep 30, 2025 at 2:57 PM CEST, Roger Pau Monné wrote:
>> On Tue, Sep 30, 2025 at 11:15:01AM +0200, Alejandro Vallejo wrote:
>>> On Mon Sep 29, 2025 at 10:41 AM CEST, Roger Pau Monne wrote:
>>>> I've had the luck to come across a PCI card that exposes a MSI-X capability
>>>> where the BIR of the vector and PBA tables points at a BAR that has 0 size.
>>>>
>>>> This doesn't play nice with the code in vpci_make_msix_hole(), as it would
>>>> still use the address of such empty BAR (0) and attempt to crave a hole in
>>>> the p2m.  This leads to errors like the one below being reported by Xen:
>>>>
>>>> d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
>>>>
>>>> And the device left unable to enable memory decoding due to the failure
>>>> reported by vpci_make_msix_hole().
>>>>
>>>> Introduce checking in init_msix() to ensure the BARs containing the MSI-X
>>>> tables are usable.  This requires checking that the BIR points to a
>>>> non-empty BAR, and the offset and size of the MSI-X tables can fit in the
>>>> target BAR.
>>>>
>>>> This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
>>>> EPYC 9965 processors.  The broken device is:
>>>>
>>>> 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
>>>>
>>>> There are multiple of those integrated controllers in the system, all
>>>> broken in the same way.
>>>>
>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> ---
>>>> Cc: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>> Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>>>
>>>> While not strictly a bugfix, I consider this a worthy improvement so that
>>>> PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
>>>> capabilities.  Hence I think this change should be considered for inclusion
>>>> into 4.21.  There a risk of regressing on hardware that was already working
>>>> with PVH, but given enough testing that should be minimal.
>>>> ---
>>>>  xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
>>>>  1 file changed, 45 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>>>> index 54a5070733aa..8458955d5bbb 100644
>>>> --- a/xen/drivers/vpci/msix.c
>>>> +++ b/xen/drivers/vpci/msix.c
>>>> @@ -675,6 +675,51 @@ static int cf_check init_msix(struct pci_dev *pdev)
>>>>      if ( !msix )
>>>>          return -ENOMEM;
>>>>  
>>>> +    msix->tables[VPCI_MSIX_TABLE] =
>>>> +        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
>>>> +    msix->tables[VPCI_MSIX_PBA] =
>>>> +        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
>>>> +
>>>> +    /* Check that the provided BAR is valid. */
>>>> +    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
>>>> +    {
>>>> +        const char *name = (i == VPCI_MSIX_TABLE) ? "vector" : "PBA";
>>>> +        const struct vpci_bar *bars = pdev->vpci->header.bars;
>>>> +        unsigned int bir = msix->tables[i] & PCI_MSIX_BIRMASK;
>>>> +        unsigned int type;
>>>> +        unsigned int offset = msix->tables[i] & ~PCI_MSIX_BIRMASK;
>>>> +        unsigned int size =
>>>> +            (i == VPCI_MSIX_TABLE) ? max_entries * PCI_MSIX_ENTRY_SIZE
>>>> +                                   : ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8);
>>>> +
>>>> +        if ( bir >= ARRAY_SIZE(pdev->vpci->header.bars) )
>>>> +        {
>>>> +            printk(XENLOG_ERR "%pp: MSI-X %s table with out of range BIR %u\n",
>>>> +                   &pdev->sbdf, name, bir);
>>> Would it be worth adding something here such that a device vendor testing their
>>> hardware under Xen can trivially grep for device bugs?
>>>
>>> Something akin to "[Firmware bug]" on Linux, like "[Device bug]" or some such.
>>>
>>> It would also let anyone not very knowledgeable about PCI know that a device
>>> they own is being unreasonable. Same below in the other XENLOG_ERR messages.
>> We could add indeed.  I don't think we haven't done so in the past.
>> If we go that route I would suggest that I add a:
>>
>> #define DEVICE_BUG_PREFIX "[Device bug] "
>>
>> in lib.h or similar, to make sure we use the same prefix uniformly.
>> TBH

We have several FIRMWARE BUG's in Xen already, and several more that
ought to move to this pattern.

Given that Linux has definitely been booted on this hardware, we should
match whichever prefix they use for messages about this.

What's unclear is whether AMD can even fix this with a firmware update. 
I would have expected that the PCIe hardblock would have prevented
making this mistake, but clearly not...

~Andrew

Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Roger Pau Monné 4 months ago
On Mon, Oct 06, 2025 at 03:29:11PM +0100, Andrew Cooper wrote:
> On 06/10/2025 2:55 pm, Alejandro Vallejo wrote:
> > On Tue Sep 30, 2025 at 2:57 PM CEST, Roger Pau Monné wrote:
> >> On Tue, Sep 30, 2025 at 11:15:01AM +0200, Alejandro Vallejo wrote:
> >>> On Mon Sep 29, 2025 at 10:41 AM CEST, Roger Pau Monne wrote:
> >>>> I've had the luck to come across a PCI card that exposes a MSI-X capability
> >>>> where the BIR of the vector and PBA tables points at a BAR that has 0 size.
> >>>>
> >>>> This doesn't play nice with the code in vpci_make_msix_hole(), as it would
> >>>> still use the address of such empty BAR (0) and attempt to crave a hole in
> >>>> the p2m.  This leads to errors like the one below being reported by Xen:
> >>>>
> >>>> d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
> >>>>
> >>>> And the device left unable to enable memory decoding due to the failure
> >>>> reported by vpci_make_msix_hole().
> >>>>
> >>>> Introduce checking in init_msix() to ensure the BARs containing the MSI-X
> >>>> tables are usable.  This requires checking that the BIR points to a
> >>>> non-empty BAR, and the offset and size of the MSI-X tables can fit in the
> >>>> target BAR.
> >>>>
> >>>> This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
> >>>> EPYC 9965 processors.  The broken device is:
> >>>>
> >>>> 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
> >>>>
> >>>> There are multiple of those integrated controllers in the system, all
> >>>> broken in the same way.
> >>>>
> >>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>>> ---
> >>>> Cc: Stewart Hildebrand <stewart.hildebrand@amd.com>
> >>>> Cc: Jan Beulich <jbeulich@suse.com>
> >>>> Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> >>>>
> >>>> While not strictly a bugfix, I consider this a worthy improvement so that
> >>>> PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
> >>>> capabilities.  Hence I think this change should be considered for inclusion
> >>>> into 4.21.  There a risk of regressing on hardware that was already working
> >>>> with PVH, but given enough testing that should be minimal.
> >>>> ---
> >>>>  xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
> >>>>  1 file changed, 45 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> >>>> index 54a5070733aa..8458955d5bbb 100644
> >>>> --- a/xen/drivers/vpci/msix.c
> >>>> +++ b/xen/drivers/vpci/msix.c
> >>>> @@ -675,6 +675,51 @@ static int cf_check init_msix(struct pci_dev *pdev)
> >>>>      if ( !msix )
> >>>>          return -ENOMEM;
> >>>>  
> >>>> +    msix->tables[VPCI_MSIX_TABLE] =
> >>>> +        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
> >>>> +    msix->tables[VPCI_MSIX_PBA] =
> >>>> +        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
> >>>> +
> >>>> +    /* Check that the provided BAR is valid. */
> >>>> +    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
> >>>> +    {
> >>>> +        const char *name = (i == VPCI_MSIX_TABLE) ? "vector" : "PBA";
> >>>> +        const struct vpci_bar *bars = pdev->vpci->header.bars;
> >>>> +        unsigned int bir = msix->tables[i] & PCI_MSIX_BIRMASK;
> >>>> +        unsigned int type;
> >>>> +        unsigned int offset = msix->tables[i] & ~PCI_MSIX_BIRMASK;
> >>>> +        unsigned int size =
> >>>> +            (i == VPCI_MSIX_TABLE) ? max_entries * PCI_MSIX_ENTRY_SIZE
> >>>> +                                   : ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8);
> >>>> +
> >>>> +        if ( bir >= ARRAY_SIZE(pdev->vpci->header.bars) )
> >>>> +        {
> >>>> +            printk(XENLOG_ERR "%pp: MSI-X %s table with out of range BIR %u\n",
> >>>> +                   &pdev->sbdf, name, bir);
> >>> Would it be worth adding something here such that a device vendor testing their
> >>> hardware under Xen can trivially grep for device bugs?
> >>>
> >>> Something akin to "[Firmware bug]" on Linux, like "[Device bug]" or some such.
> >>>
> >>> It would also let anyone not very knowledgeable about PCI know that a device
> >>> they own is being unreasonable. Same below in the other XENLOG_ERR messages.
> >> We could add indeed.  I don't think we haven't done so in the past.
> >> If we go that route I would suggest that I add a:
> >>
> >> #define DEVICE_BUG_PREFIX "[Device bug] "
> >>
> >> in lib.h or similar, to make sure we use the same prefix uniformly.
> >> TBH
> 
> We have several FIRMWARE BUG's in Xen already, and several more that
> ought to move to this pattern.
> 
> Given that Linux has definitely been booted on this hardware, we should
> match whichever prefix they use for messages about this.

I don't think Linux prints any message about this, it simply ignores
the capability.

We have another instance of having to support buggy devices in vPCI:
when a device places registers in the same 4K page as the MSI-X vector
or PBA tables.  In that case the offending device was an Intel
Wireless card.

I'm happy to use "[Device Bug]", will adjust the patch this afternoon.

> What's unclear is whether AMD can even fix this with a firmware update. 
> I would have expected that the PCIe hardblock would have prevented
> making this mistake, but clearly not...

I didn't want to point fingers :), I have no idea if it can be fixed
in firmware.

Thanks, Roger.

Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Alejandro Vallejo 4 months ago
On Tue Oct 7, 2025 at 9:16 AM CEST, Roger Pau Monné wrote:
> On Mon, Oct 06, 2025 at 03:29:11PM +0100, Andrew Cooper wrote:
>> On 06/10/2025 2:55 pm, Alejandro Vallejo wrote:
>> > On Tue Sep 30, 2025 at 2:57 PM CEST, Roger Pau Monné wrote:
>> >> On Tue, Sep 30, 2025 at 11:15:01AM +0200, Alejandro Vallejo wrote:
>> >>> On Mon Sep 29, 2025 at 10:41 AM CEST, Roger Pau Monne wrote:
>> >>>> I've had the luck to come across a PCI card that exposes a MSI-X capability
>> >>>> where the BIR of the vector and PBA tables points at a BAR that has 0 size.
>> >>>>
>> >>>> This doesn't play nice with the code in vpci_make_msix_hole(), as it would
>> >>>> still use the address of such empty BAR (0) and attempt to crave a hole in
>> >>>> the p2m.  This leads to errors like the one below being reported by Xen:
>> >>>>
>> >>>> d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
>> >>>>
>> >>>> And the device left unable to enable memory decoding due to the failure
>> >>>> reported by vpci_make_msix_hole().
>> >>>>
>> >>>> Introduce checking in init_msix() to ensure the BARs containing the MSI-X
>> >>>> tables are usable.  This requires checking that the BIR points to a
>> >>>> non-empty BAR, and the offset and size of the MSI-X tables can fit in the
>> >>>> target BAR.
>> >>>>
>> >>>> This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
>> >>>> EPYC 9965 processors.  The broken device is:
>> >>>>
>> >>>> 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
>> >>>>
>> >>>> There are multiple of those integrated controllers in the system, all
>> >>>> broken in the same way.
>> >>>>
>> >>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> >>>> ---
>> >>>> Cc: Stewart Hildebrand <stewart.hildebrand@amd.com>
>> >>>> Cc: Jan Beulich <jbeulich@suse.com>
>> >>>> Cc: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>> >>>>
>> >>>> While not strictly a bugfix, I consider this a worthy improvement so that
>> >>>> PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
>> >>>> capabilities.  Hence I think this change should be considered for inclusion
>> >>>> into 4.21.  There a risk of regressing on hardware that was already working
>> >>>> with PVH, but given enough testing that should be minimal.
>> >>>> ---
>> >>>>  xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
>> >>>>  1 file changed, 45 insertions(+), 5 deletions(-)
>> >>>>
>> >>>> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
>> >>>> index 54a5070733aa..8458955d5bbb 100644
>> >>>> --- a/xen/drivers/vpci/msix.c
>> >>>> +++ b/xen/drivers/vpci/msix.c
>> >>>> @@ -675,6 +675,51 @@ static int cf_check init_msix(struct pci_dev *pdev)
>> >>>>      if ( !msix )
>> >>>>          return -ENOMEM;
>> >>>>  
>> >>>> +    msix->tables[VPCI_MSIX_TABLE] =
>> >>>> +        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
>> >>>> +    msix->tables[VPCI_MSIX_PBA] =
>> >>>> +        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
>> >>>> +
>> >>>> +    /* Check that the provided BAR is valid. */
>> >>>> +    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
>> >>>> +    {
>> >>>> +        const char *name = (i == VPCI_MSIX_TABLE) ? "vector" : "PBA";
>> >>>> +        const struct vpci_bar *bars = pdev->vpci->header.bars;
>> >>>> +        unsigned int bir = msix->tables[i] & PCI_MSIX_BIRMASK;
>> >>>> +        unsigned int type;
>> >>>> +        unsigned int offset = msix->tables[i] & ~PCI_MSIX_BIRMASK;
>> >>>> +        unsigned int size =
>> >>>> +            (i == VPCI_MSIX_TABLE) ? max_entries * PCI_MSIX_ENTRY_SIZE
>> >>>> +                                   : ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8);
>> >>>> +
>> >>>> +        if ( bir >= ARRAY_SIZE(pdev->vpci->header.bars) )
>> >>>> +        {
>> >>>> +            printk(XENLOG_ERR "%pp: MSI-X %s table with out of range BIR %u\n",
>> >>>> +                   &pdev->sbdf, name, bir);
>> >>> Would it be worth adding something here such that a device vendor testing their
>> >>> hardware under Xen can trivially grep for device bugs?
>> >>>
>> >>> Something akin to "[Firmware bug]" on Linux, like "[Device bug]" or some such.
>> >>>
>> >>> It would also let anyone not very knowledgeable about PCI know that a device
>> >>> they own is being unreasonable. Same below in the other XENLOG_ERR messages.
>> >> We could add indeed.  I don't think we haven't done so in the past.
>> >> If we go that route I would suggest that I add a:
>> >>
>> >> #define DEVICE_BUG_PREFIX "[Device bug] "
>> >>
>> >> in lib.h or similar, to make sure we use the same prefix uniformly.
>> >> TBH
>> 
>> We have several FIRMWARE BUG's in Xen already, and several more that
>> ought to move to this pattern.
>> 
>> Given that Linux has definitely been booted on this hardware, we should
>> match whichever prefix they use for messages about this.
>
> I don't think Linux prints any message about this, it simply ignores
> the capability.
>
> We have another instance of having to support buggy devices in vPCI:
> when a device places registers in the same 4K page as the MSI-X vector
> or PBA tables.  In that case the offending device was an Intel
> Wireless card.
>
> I'm happy to use "[Device Bug]", will adjust the patch this afternoon.
>

In Linux's printk.h

	/*
	 * FW_BUG
	 * Add this to a message where you are sure the firmware is buggy or behaves
	 * really stupid or out of spec. Be aware that the responsible BIOS developer
	 * should be able to fix this issue or at least get a concrete idea of the
	 * problem by reading your message without the need of looking at the kernel
	 * code.
	 *
	 * Use it for definite and high priority BIOS bugs.
	 *
	 * FW_WARN
	 * Use it for not that clear (e.g. could the kernel messed up things already?)
	 * and medium priority BIOS bugs.
	 *
	 * FW_INFO
	 * Use this one if you want to tell the user or vendor about something
	 * suspicious, but generally harmless related to the firmware.
	 *
	 * Use it for information or very low priority BIOS bugs.
	 */
	#define FW_BUG		"[Firmware Bug]: "
	#define FW_WARN		"[Firmware Warn]: "
	#define FW_INFO		"[Firmware Info]: "

	/*
	 * HW_ERR
	 * Add this to a message for hardware errors, so that user can report
	 * it to hardware vendor instead of LKML or software vendor.
	 */
	#define HW_ERR		"[Hardware Error]: "

HW_ERR seems wired to MCEs and the like rather than everything covered by
quirks.c in Linux, but using [Hardware Error] might help grep scripts by
matching existing messages in Linux's dmesg.

Otherwise, "[Devive Bug]" works just as well.

Cheers,
Alejandro
Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Oleksii Kurochko 4 months, 1 week ago
On 9/29/25 10:41 AM, Roger Pau Monne wrote:
> I've had the luck to come across a PCI card that exposes a MSI-X capability
> where the BIR of the vector and PBA tables points at a BAR that has 0 size.
>
> This doesn't play nice with the code in vpci_make_msix_hole(), as it would
> still use the address of such empty BAR (0) and attempt to crave a hole in
> the p2m.  This leads to errors like the one below being reported by Xen:
>
> d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
>
> And the device left unable to enable memory decoding due to the failure
> reported by vpci_make_msix_hole().
>
> Introduce checking in init_msix() to ensure the BARs containing the MSI-X
> tables are usable.  This requires checking that the BIR points to a
> non-empty BAR, and the offset and size of the MSI-X tables can fit in the
> target BAR.
>
> This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
> EPYC 9965 processors.  The broken device is:
>
> 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
>
> There are multiple of those integrated controllers in the system, all
> broken in the same way.
>
> Signed-off-by: Roger Pau Monné<roger.pau@citrix.com>
> ---
> Cc: Stewart Hildebrand<stewart.hildebrand@amd.com>
> Cc: Jan Beulich<jbeulich@suse.com>
> Cc: Oleksii Kurochko<oleksii.kurochko@gmail.com>
>
> While not strictly a bugfix, I consider this a worthy improvement so that
> PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
> capabilities.

Based on your commit description it looks like a bug as without it, for example,
SATA controller can't be used, right?

>   Hence I think this change should be considered for inclusion
> into 4.21.  There a risk of regressing on hardware that was already working
> with PVH, but given enough testing that should be minimal.

We have some PVH tests in Xen’s GitLab CI, but I assume that isn’t enough?

~ Oleksii

> ---
>   xen/drivers/vpci/msix.c | 50 ++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 45 insertions(+), 5 deletions(-)
>
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index 54a5070733aa..8458955d5bbb 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -675,6 +675,51 @@ static int cf_check init_msix(struct pci_dev *pdev)
>       if ( !msix )
>           return -ENOMEM;
>   
> +    msix->tables[VPCI_MSIX_TABLE] =
> +        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
> +    msix->tables[VPCI_MSIX_PBA] =
> +        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
> +
> +    /* Check that the provided BAR is valid. */
> +    for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
> +    {
> +        const char *name = (i == VPCI_MSIX_TABLE) ? "vector" : "PBA";
> +        const struct vpci_bar *bars = pdev->vpci->header.bars;
> +        unsigned int bir = msix->tables[i] & PCI_MSIX_BIRMASK;
> +        unsigned int type;
> +        unsigned int offset = msix->tables[i] & ~PCI_MSIX_BIRMASK;
> +        unsigned int size =
> +            (i == VPCI_MSIX_TABLE) ? max_entries * PCI_MSIX_ENTRY_SIZE
> +                                   : ROUNDUP(DIV_ROUND_UP(max_entries, 8), 8);
> +
> +        if ( bir >= ARRAY_SIZE(pdev->vpci->header.bars) )
> +        {
> +            printk(XENLOG_ERR "%pp: MSI-X %s table with out of range BIR %u\n",
> +                   &pdev->sbdf, name, bir);
> + invalid:
> +            xfree(msix);
> +            return -ENODEV;
> +
> +        }
> +
> +        type = bars[bir].type;
> +        if ( type != VPCI_BAR_MEM32 && type != VPCI_BAR_MEM64_LO )
> +        {
> +            printk(XENLOG_ERR
> +                   "%pp: MSI-X %s table at invalid BAR%u with type %u\n",
> +                   &pdev->sbdf, name, bir, type);
> +            goto invalid;
> +        }
> +
> +        if ( (uint64_t)offset + size > bars[bir].size )
> +        {
> +            printk(XENLOG_ERR
> +                   "%pp: MSI-X %s table offset %#x size %#x outside of BAR%u size %#lx\n",
> +                   &pdev->sbdf, name, offset, size, bir, bars[bir].size);
> +            goto invalid;
> +        }
> +    }
> +
>       rc = vpci_add_register(pdev->vpci, control_read, control_write,
>                              msix_control_reg(msix_offset), 2, msix);
>       if ( rc )
> @@ -686,11 +731,6 @@ static int cf_check init_msix(struct pci_dev *pdev)
>       msix->max_entries = max_entries;
>       msix->pdev = pdev;
>   
> -    msix->tables[VPCI_MSIX_TABLE] =
> -        pci_conf_read32(pdev->sbdf, msix_table_offset_reg(msix_offset));
> -    msix->tables[VPCI_MSIX_PBA] =
> -        pci_conf_read32(pdev->sbdf, msix_pba_offset_reg(msix_offset));
> -
>       for ( i = 0; i < max_entries; i++)
>       {
>           msix->entries[i].masked = true;
Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Roger Pau Monné 4 months, 1 week ago
On Mon, Sep 29, 2025 at 05:59:00PM +0200, Oleksii Kurochko wrote:
> 
> On 9/29/25 10:41 AM, Roger Pau Monne wrote:
> > I've had the luck to come across a PCI card that exposes a MSI-X capability
> > where the BIR of the vector and PBA tables points at a BAR that has 0 size.
> > 
> > This doesn't play nice with the code in vpci_make_msix_hole(), as it would
> > still use the address of such empty BAR (0) and attempt to crave a hole in
> > the p2m.  This leads to errors like the one below being reported by Xen:
> > 
> > d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
> > 
> > And the device left unable to enable memory decoding due to the failure
> > reported by vpci_make_msix_hole().
> > 
> > Introduce checking in init_msix() to ensure the BARs containing the MSI-X
> > tables are usable.  This requires checking that the BIR points to a
> > non-empty BAR, and the offset and size of the MSI-X tables can fit in the
> > target BAR.
> > 
> > This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
> > EPYC 9965 processors.  The broken device is:
> > 
> > 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
> > 
> > There are multiple of those integrated controllers in the system, all
> > broken in the same way.
> > 
> > Signed-off-by: Roger Pau Monné<roger.pau@citrix.com>
> > ---
> > Cc: Stewart Hildebrand<stewart.hildebrand@amd.com>
> > Cc: Jan Beulich<jbeulich@suse.com>
> > Cc: Oleksii Kurochko<oleksii.kurochko@gmail.com>
> > 
> > While not strictly a bugfix, I consider this a worthy improvement so that
> > PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
> > capabilities.
> 
> Based on your commit description it looks like a bug as without it, for example,
> SATA controller can't be used, right?
> 
> >   Hence I think this change should be considered for inclusion
> > into 4.21.  There a risk of regressing on hardware that was already working
> > with PVH, but given enough testing that should be minimal.
> 
> We have some PVH tests in Xen’s GitLab CI, but I assume that isn’t enough?

It's a very specific controller, which we don't seem to have any
examples of in the lab.  The model is in the commit message.  Without
this fix the device doesn't work as expected when used in PVH dom0
mode.

Regards, Roger.

Re: [PATCH for-4.21] vpci/msix: improve handling of bogus MSI-X capabilities
Posted by Oleksii Kurochko 4 months, 1 week ago
On 9/30/25 2:46 PM, Roger Pau Monné wrote:
> On Mon, Sep 29, 2025 at 05:59:00PM +0200, Oleksii Kurochko wrote:
>> On 9/29/25 10:41 AM, Roger Pau Monne wrote:
>>> I've had the luck to come across a PCI card that exposes a MSI-X capability
>>> where the BIR of the vector and PBA tables points at a BAR that has 0 size.
>>>
>>> This doesn't play nice with the code in vpci_make_msix_hole(), as it would
>>> still use the address of such empty BAR (0) and attempt to crave a hole in
>>> the p2m.  This leads to errors like the one below being reported by Xen:
>>>
>>> d0v0 0000:22:00.0: existing mapping (mfn: 181c4300 type: 0) at 0 clobbers MSIX MMIO area
>>>
>>> And the device left unable to enable memory decoding due to the failure
>>> reported by vpci_make_msix_hole().
>>>
>>> Introduce checking in init_msix() to ensure the BARs containing the MSI-X
>>> tables are usable.  This requires checking that the BIR points to a
>>> non-empty BAR, and the offset and size of the MSI-X tables can fit in the
>>> target BAR.
>>>
>>> This fixes booting PVH dom0 on Supermicro AS -2126HS-TN severs with AMD
>>> EPYC 9965 processors.  The broken device is:
>>>
>>> 22:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 93)
>>>
>>> There are multiple of those integrated controllers in the system, all
>>> broken in the same way.
>>>
>>> Signed-off-by: Roger Pau Monné<roger.pau@citrix.com>
>>> ---
>>> Cc: Stewart Hildebrand<stewart.hildebrand@amd.com>
>>> Cc: Jan Beulich<jbeulich@suse.com>
>>> Cc: Oleksii Kurochko<oleksii.kurochko@gmail.com>
>>>
>>> While not strictly a bugfix, I consider this a worthy improvement so that
>>> PVH dom0 has a chance to boot on hardware that exposes such broken MSI-X
>>> capabilities.
>> Based on your commit description it looks like a bug as without it, for example,
>> SATA controller can't be used, right?
>>
>>>    Hence I think this change should be considered for inclusion
>>> into 4.21.  There a risk of regressing on hardware that was already working
>>> with PVH, but given enough testing that should be minimal.
>> We have some PVH tests in Xen’s GitLab CI, but I assume that isn’t enough?
> It's a very specific controller, which we don't seem to have any
> examples of in the lab.  The model is in the commit message.  Without
> this fix the device doesn't work as expected when used in PVH dom0
> mode.

Thanks for the explanation.

I think we should consider to have this in 4.21 as it is still a fix of bogus
behavior:
  Released-Acked-By: Oleksii Kurochko<oleksii.kurochko@gmail.com>

~ Oleksii