[PATCH RFC] xen/pci: detect when BARs overlap RAM

Roger Pau Monne posted 1 patch 2 years, 3 months ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20220126122648.52275-1-roger.pau@citrix.com
xen/arch/arm/mm.c             | 11 ++++++
xen/arch/x86/mm.c             | 20 +++++++++++
xen/drivers/passthrough/pci.c | 68 ++++++++++++++++++++++++++++++++++-
xen/include/xen/mm.h          |  2 ++
4 files changed, 100 insertions(+), 1 deletion(-)
[PATCH RFC] xen/pci: detect when BARs overlap RAM
Posted by Roger Pau Monne 2 years, 3 months ago
One of the boxes where I was attempting to boot Xen in PVH dom0 mode
has quirky firmware, as it will handover with a device with memory
decoding enabled and a BAR of size 4K at address 0. Such BAR overlaps
with a RAM range on the e820.

This interacts badly with the dom0 PVH build, as BARs will be setup on
the p2m before RAM, so if there's a BAR positioned over a RAM region
it will trigger a domain crash when the dom0 builder attempts to
populate that region with a regular RAM page.

It's in general a very bad idea to have a BAR overlapping with a RAM
region, so add some sanity checks for devices that are added with
memory decoding enabled in order to assure that BARs are not placed on
top of memory regions. If overlaps are detected just disable the
memory decoding bit for the device and expect the hardware domain to
properly position the BAR.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
RFC because:
 - Not sure the best way to implement is_iomem_range on Arm. BARs can
   be quite big, so iterating over every possible page is not ideal.
 - is_iomem_page cannot be used for this purpose on x86, because all
   the low 1MB will return true due to belonging to dom_io.
 - VF BARs are not checked. Should we also check them and disable VF
   if overlaps in a followup patch?
---
 xen/arch/arm/mm.c             | 11 ++++++
 xen/arch/x86/mm.c             | 20 +++++++++++
 xen/drivers/passthrough/pci.c | 68 ++++++++++++++++++++++++++++++++++-
 xen/include/xen/mm.h          |  2 ++
 4 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index eea926d823..fa4cee64c7 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1625,6 +1625,17 @@ bool is_iomem_page(mfn_t mfn)
     return !mfn_valid(mfn);
 }
 
+bool is_iomem_range(uint64_t start, uint64_t size)
+{
+    unsigned int i;
+
+    for ( i = 0; i < PFN_UP(size); i++ )
+        if ( !is_iomem_page(_mfn(PFN_DOWN(start) + i)) )
+            return false;
+
+    return true;
+}
+
 void clear_and_clean_page(struct page_info *page)
 {
     void *p = __map_domain_page(page);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 1397f83e41..03699b2227 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -479,6 +479,26 @@ unsigned int page_get_ram_type(mfn_t mfn)
     return type ?: RAM_TYPE_UNKNOWN;
 }
 
+bool is_iomem_range(uint64_t start, uint64_t size)
+{
+    unsigned int i;
+
+    for ( i = 0; i < e820.nr_map; i++ )
+    {
+        struct e820entry *entry = &e820.map[i];
+
+        if ( entry->type != E820_RAM && entry->type != E820_ACPI &&
+             entry->type != E820_NVS )
+            continue;
+
+        if ( start < entry->addr + entry->size &&
+             entry->addr < start + size )
+            return false;
+    }
+
+    return true;
+}
+
 unsigned long domain_get_maximum_gpfn(struct domain *d)
 {
     if ( is_hvm_domain(d) )
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 0d8ab2e716..032df71393 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -233,6 +233,7 @@ static void check_pdev(const struct pci_dev *pdev)
      PCI_STATUS_REC_TARGET_ABORT | PCI_STATUS_REC_MASTER_ABORT | \
      PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY)
     u16 val;
+    unsigned int nbars = 0, rom_pos = 0, i;
 
     if ( command_mask )
     {
@@ -251,6 +252,8 @@ static void check_pdev(const struct pci_dev *pdev)
     switch ( pci_conf_read8(pdev->sbdf, PCI_HEADER_TYPE) & 0x7f )
     {
     case PCI_HEADER_TYPE_BRIDGE:
+        nbars = PCI_HEADER_BRIDGE_NR_BARS;
+        rom_pos = PCI_ROM_ADDRESS1;
         if ( !bridge_ctl_mask )
             break;
         val = pci_conf_read16(pdev->sbdf, PCI_BRIDGE_CONTROL);
@@ -267,11 +270,74 @@ static void check_pdev(const struct pci_dev *pdev)
         }
         break;
 
+    case PCI_HEADER_TYPE_NORMAL:
+        nbars = PCI_HEADER_NORMAL_NR_BARS;
+        rom_pos = PCI_ROM_ADDRESS;
+        break;
+
     case PCI_HEADER_TYPE_CARDBUS:
         /* TODO */
         break;
     }
 #undef PCI_STATUS_CHECK
+
+    /* Check if BARs overlap with RAM regions. */
+    val = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
+    if ( !(val & PCI_COMMAND_MEMORY) || pdev->ignore_bars )
+        return;
+
+    pci_conf_write16(pdev->sbdf, PCI_COMMAND, val & ~PCI_COMMAND_MEMORY);
+    for ( i = 0; i < nbars; )
+    {
+        uint64_t addr, size;
+        unsigned int reg = PCI_BASE_ADDRESS_0 + i * 4;
+        int rc = 1;
+
+        if ( (pci_conf_read32(pdev->sbdf, reg) & PCI_BASE_ADDRESS_SPACE) !=
+             PCI_BASE_ADDRESS_SPACE_MEMORY )
+            goto next;
+
+        rc = pci_size_mem_bar(pdev->sbdf, reg, &addr, &size,
+                              (i == nbars - 1) ? PCI_BAR_LAST : 0);
+        if ( rc < 0 )
+            return;
+        if ( size && !is_iomem_range(addr, size) )
+        {
+            /*
+             * Return without enabling memory decoding if BAR overlaps with
+             * RAM region.
+             */
+            printk(XENLOG_WARNING
+                   "%pp: disabled: BAR%u [%" PRIx64 ", %" PRIx64
+                   ") overlaps with RAM\n",
+                   &pdev->sbdf, i, addr, addr + size);
+            return;
+        }
+
+ next:
+        ASSERT(rc > 0);
+        i += rc;
+    }
+
+    if ( pci_conf_read32(pdev->sbdf, rom_pos) & PCI_ROM_ADDRESS_ENABLE )
+    {
+        uint64_t addr, size;
+        int rc = pci_size_mem_bar(pdev->sbdf, rom_pos, &addr, &size,
+                                  PCI_BAR_ROM);
+
+        if ( rc < 0 )
+            return;
+        if ( size && !is_iomem_range(addr, size) )
+        {
+            printk(XENLOG_WARNING
+                   "%pp: disabled: ROM BAR [%" PRIx64 ", %" PRIx64
+                   ") overlaps with RAM\n",
+                   &pdev->sbdf, addr, addr + size);
+            return;
+        }
+    }
+
+    pci_conf_write16(pdev->sbdf, PCI_COMMAND, val);
 }
 
 static void apply_quirks(struct pci_dev *pdev)
@@ -399,8 +465,8 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
             break;
     }
 
-    check_pdev(pdev);
     apply_quirks(pdev);
+    check_pdev(pdev);
 
     return pdev;
 }
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 5db26ed477..764dcad5b3 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -554,6 +554,8 @@ int __must_check steal_page(struct domain *d, struct page_info *page,
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type);
 /* Returns the page type(s). */
 unsigned int page_get_ram_type(mfn_t mfn);
+/* Check if a range is in IO suitable memory. */
+bool is_iomem_range(uint64_t start, uint64_t size);
 
 /* Prepare/destroy a ring for a dom0 helper. Helper with talk
  * with Xen on behalf of this domain. */
-- 
2.34.1


Re: [PATCH RFC] xen/pci: detect when BARs overlap RAM
Posted by Andrew Cooper 2 years, 3 months ago
On 26/01/2022 12:26, Roger Pau Monne wrote:
> One of the boxes where I was attempting to boot Xen in PVH dom0 mode
> has quirky firmware, as it will handover with a device with memory
> decoding enabled and a BAR of size 4K at address 0. Such BAR overlaps
> with a RAM range on the e820.
>
> This interacts badly with the dom0 PVH build, as BARs will be setup on
> the p2m before RAM, so if there's a BAR positioned over a RAM region
> it will trigger a domain crash when the dom0 builder attempts to
> populate that region with a regular RAM page.
>
> It's in general a very bad idea to have a BAR overlapping with a RAM
> region, so add some sanity checks for devices that are added with
> memory decoding enabled in order to assure that BARs are not placed on
> top of memory regions. If overlaps are detected just disable the
> memory decoding bit for the device and expect the hardware domain to
> properly position the BAR.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

I'm not sure this is a sensible approach.

A bar at 0 is utterly unusable, because it is outside of the host bridge
window.

The absence of any coherent GPA map for guests (needs fixing for oh so
many reasons) is a primary contributor to the problem, because Xen
*should* know where the guest's low and high MMIO windows are before
attempting to map BARs into the guest.

The proper fix is to teach Xen about GPA maps, and reject BAR insertion
outside of the respective bridge windows.

An fix in the short term would be to disable the problematic BAR when
scanning the PCI bus to being with.

~Andrew

Re: [PATCH RFC] xen/pci: detect when BARs overlap RAM
Posted by Jan Beulich 2 years, 3 months ago
On 26.01.2022 15:45, Andrew Cooper wrote:
> On 26/01/2022 12:26, Roger Pau Monne wrote:
>> One of the boxes where I was attempting to boot Xen in PVH dom0 mode
>> has quirky firmware, as it will handover with a device with memory
>> decoding enabled and a BAR of size 4K at address 0. Such BAR overlaps
>> with a RAM range on the e820.
>>
>> This interacts badly with the dom0 PVH build, as BARs will be setup on
>> the p2m before RAM, so if there's a BAR positioned over a RAM region
>> it will trigger a domain crash when the dom0 builder attempts to
>> populate that region with a regular RAM page.
>>
>> It's in general a very bad idea to have a BAR overlapping with a RAM
>> region, so add some sanity checks for devices that are added with
>> memory decoding enabled in order to assure that BARs are not placed on
>> top of memory regions. If overlaps are detected just disable the
>> memory decoding bit for the device and expect the hardware domain to
>> properly position the BAR.
>>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> I'm not sure this is a sensible approach.
> 
> A bar at 0 is utterly unusable, because it is outside of the host bridge
> window.
> 
> The absence of any coherent GPA map for guests (needs fixing for oh so
> many reasons) is a primary contributor to the problem, because Xen
> *should* know where the guest's low and high MMIO windows are before
> attempting to map BARs into the guest.

But this is all about Dom0, ...

> The proper fix is to teach Xen about GPA maps, and reject BAR insertion
> outside of the respective bridge windows.

... and hence this isn't about "insertion", but about what we find
upon booting.

> An fix in the short term would be to disable the problematic BAR when
> scanning the PCI bus to being with.

You can't disable an individual BAR (except for the ROM one), you need
to disable all memory ones collectively (by disabling memory decode).
Which is what Roger does.

Jan


Re: [PATCH RFC] xen/pci: detect when BARs overlap RAM
Posted by Roger Pau Monné 2 years, 3 months ago
On Wed, Jan 26, 2022 at 04:27:04PM +0100, Jan Beulich wrote:
> On 26.01.2022 15:45, Andrew Cooper wrote:
> > On 26/01/2022 12:26, Roger Pau Monne wrote:
> >> One of the boxes where I was attempting to boot Xen in PVH dom0 mode
> >> has quirky firmware, as it will handover with a device with memory
> >> decoding enabled and a BAR of size 4K at address 0. Such BAR overlaps
> >> with a RAM range on the e820.
> >>
> >> This interacts badly with the dom0 PVH build, as BARs will be setup on
> >> the p2m before RAM, so if there's a BAR positioned over a RAM region
> >> it will trigger a domain crash when the dom0 builder attempts to
> >> populate that region with a regular RAM page.
> >>
> >> It's in general a very bad idea to have a BAR overlapping with a RAM
> >> region, so add some sanity checks for devices that are added with
> >> memory decoding enabled in order to assure that BARs are not placed on
> >> top of memory regions. If overlaps are detected just disable the
> >> memory decoding bit for the device and expect the hardware domain to
> >> properly position the BAR.
> >>
> >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > 
> > I'm not sure this is a sensible approach.
> > 
> > A bar at 0 is utterly unusable, because it is outside of the host bridge
> > window.
> > 
> > The absence of any coherent GPA map for guests (needs fixing for oh so
> > many reasons) is a primary contributor to the problem, because Xen
> > *should* know where the guest's low and high MMIO windows are before
> > attempting to map BARs into the guest.
> 
> But this is all about Dom0, ...
> 
> > The proper fix is to teach Xen about GPA maps, and reject BAR insertion
> > outside of the respective bridge windows.
> 
> ... and hence this isn't about "insertion", but about what we find
> upon booting.
> 
> > An fix in the short term would be to disable the problematic BAR when
> > scanning the PCI bus to being with.
> 
> You can't disable an individual BAR (except for the ROM one), you need
> to disable all memory ones collectively (by disabling memory decode).
> Which is what Roger does.

I've thought about just disabling the ROM BAR if that's the only
conflicting one, but it was easier given the context to just disable
memory decoding globally like it's done for plain BARs. Didn't seem
worth the extra code to special case the ROM BAR disabling.

Thanks, Roger.

Re: [PATCH RFC] xen/pci: detect when BARs overlap RAM
Posted by Jan Beulich 2 years, 3 months ago
On 26.01.2022 13:26, Roger Pau Monne wrote:
> RFC because:
>  - Not sure the best way to implement is_iomem_range on Arm. BARs can
>    be quite big, so iterating over every possible page is not ideal.
>  - is_iomem_page cannot be used for this purpose on x86, because all
>    the low 1MB will return true due to belonging to dom_io.

I don't see an issue there - if you were to us it, you'd end up with
the same scalability issue as you point out for Arm.

>  - VF BARs are not checked. Should we also check them and disable VF
>    if overlaps in a followup patch?

Not sure about "followup patch", but once we support SR-IOV for PVH,
that'll want checking, yes.

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -479,6 +479,26 @@ unsigned int page_get_ram_type(mfn_t mfn)
>      return type ?: RAM_TYPE_UNKNOWN;
>  }
>  
> +bool is_iomem_range(uint64_t start, uint64_t size)

Might be nice to have this sit next it is_iomem_page(). And wouldn't
at least "start" better be paddr_t?

> +{
> +    unsigned int i;
> +
> +    for ( i = 0; i < e820.nr_map; i++ )
> +    {
> +        struct e820entry *entry = &e820.map[i];

const?

> +        if ( entry->type != E820_RAM && entry->type != E820_ACPI &&
> +             entry->type != E820_NVS )
> +            continue;

I think UNUSABLE also needs checking for here. Even further, it might
be best to reject any range overlapping an E820 entry, since colliding
with a RESERVED one could mean an overlap with e.g. MMCFG space.

> @@ -267,11 +270,74 @@ static void check_pdev(const struct pci_dev *pdev)
>          }
>          break;
>  
> +    case PCI_HEADER_TYPE_NORMAL:
> +        nbars = PCI_HEADER_NORMAL_NR_BARS;
> +        rom_pos = PCI_ROM_ADDRESS;
> +        break;
> +
>      case PCI_HEADER_TYPE_CARDBUS:
>          /* TODO */
>          break;
>      }
>  #undef PCI_STATUS_CHECK
> +
> +    /* Check if BARs overlap with RAM regions. */
> +    val = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
> +    if ( !(val & PCI_COMMAND_MEMORY) || pdev->ignore_bars )
> +        return;
> +
> +    pci_conf_write16(pdev->sbdf, PCI_COMMAND, val & ~PCI_COMMAND_MEMORY);
> +    for ( i = 0; i < nbars; )
> +    {
> +        uint64_t addr, size;
> +        unsigned int reg = PCI_BASE_ADDRESS_0 + i * 4;
> +        int rc = 1;
> +
> +        if ( (pci_conf_read32(pdev->sbdf, reg) & PCI_BASE_ADDRESS_SPACE) !=
> +             PCI_BASE_ADDRESS_SPACE_MEMORY )
> +            goto next;
> +
> +        rc = pci_size_mem_bar(pdev->sbdf, reg, &addr, &size,
> +                              (i == nbars - 1) ? PCI_BAR_LAST : 0);
> +        if ( rc < 0 )
> +            return;

This isn't very nice, but probably the best you can do. Might be worth
a comment, though.

> +        if ( size && !is_iomem_range(addr, size) )
> +        {
> +            /*
> +             * Return without enabling memory decoding if BAR overlaps with
> +             * RAM region.
> +             */
> +            printk(XENLOG_WARNING
> +                   "%pp: disabled: BAR%u [%" PRIx64 ", %" PRIx64
> +                   ") overlaps with RAM\n",

Mentioning "RAM" in comment and log message is potentially misleading
when considering what other types get also checked (irrespective of my
earlier comment). (Ftaod I don't mind the title using "RAM".)

Also please don't wrap the format string (also again for ROM below).

> @@ -399,8 +465,8 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
>              break;
>      }
>  
> -    check_pdev(pdev);
>      apply_quirks(pdev);
> +    check_pdev(pdev);

I can't find the description say anything about this re-ordering. What's
the deal here?

Jan


Re: [PATCH RFC] xen/pci: detect when BARs overlap RAM
Posted by Roger Pau Monné 2 years, 3 months ago
On Wed, Jan 26, 2022 at 03:08:28PM +0100, Jan Beulich wrote:
> On 26.01.2022 13:26, Roger Pau Monne wrote:
> > RFC because:
> >  - Not sure the best way to implement is_iomem_range on Arm. BARs can
> >    be quite big, so iterating over every possible page is not ideal.
> >  - is_iomem_page cannot be used for this purpose on x86, because all
> >    the low 1MB will return true due to belonging to dom_io.
> 
> I don't see an issue there - if you were to us it, you'd end up with
> the same scalability issue as you point out for Arm.
> 
> >  - VF BARs are not checked. Should we also check them and disable VF
> >    if overlaps in a followup patch?
> 
> Not sure about "followup patch", but once we support SR-IOV for PVH,
> that'll want checking, yes.
> 
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -479,6 +479,26 @@ unsigned int page_get_ram_type(mfn_t mfn)
> >      return type ?: RAM_TYPE_UNKNOWN;
> >  }
> >  
> > +bool is_iomem_range(uint64_t start, uint64_t size)
> 
> Might be nice to have this sit next it is_iomem_page(). And wouldn't
> at least "start" better be paddr_t?

(paddr_t, size_t) would be better, but AFAICT size_t can be an
unsigned long and on Arm32 that won't be suitable for holding a 64bit
BAR size.

> > +{
> > +    unsigned int i;
> > +
> > +    for ( i = 0; i < e820.nr_map; i++ )
> > +    {
> > +        struct e820entry *entry = &e820.map[i];
> 
> const?
> 
> > +        if ( entry->type != E820_RAM && entry->type != E820_ACPI &&
> > +             entry->type != E820_NVS )
> > +            continue;
> 
> I think UNUSABLE also needs checking for here. Even further, it might
> be best to reject any range overlapping an E820 entry, since colliding
> with a RESERVED one could mean an overlap with e.g. MMCFG space.

Hm, I've though about adding UNUSABLE and RESERVED (and should have
added a note about this), but that might be too restrictive.

> > @@ -267,11 +270,74 @@ static void check_pdev(const struct pci_dev *pdev)
> >          }
> >          break;
> >  
> > +    case PCI_HEADER_TYPE_NORMAL:
> > +        nbars = PCI_HEADER_NORMAL_NR_BARS;
> > +        rom_pos = PCI_ROM_ADDRESS;
> > +        break;
> > +
> >      case PCI_HEADER_TYPE_CARDBUS:
> >          /* TODO */
> >          break;
> >      }
> >  #undef PCI_STATUS_CHECK
> > +
> > +    /* Check if BARs overlap with RAM regions. */
> > +    val = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
> > +    if ( !(val & PCI_COMMAND_MEMORY) || pdev->ignore_bars )
> > +        return;
> > +
> > +    pci_conf_write16(pdev->sbdf, PCI_COMMAND, val & ~PCI_COMMAND_MEMORY);
> > +    for ( i = 0; i < nbars; )
> > +    {
> > +        uint64_t addr, size;
> > +        unsigned int reg = PCI_BASE_ADDRESS_0 + i * 4;
> > +        int rc = 1;
> > +
> > +        if ( (pci_conf_read32(pdev->sbdf, reg) & PCI_BASE_ADDRESS_SPACE) !=
> > +             PCI_BASE_ADDRESS_SPACE_MEMORY )
> > +            goto next;
> > +
> > +        rc = pci_size_mem_bar(pdev->sbdf, reg, &addr, &size,
> > +                              (i == nbars - 1) ? PCI_BAR_LAST : 0);
> > +        if ( rc < 0 )
> > +            return;
> 
> This isn't very nice, but probably the best you can do. Might be worth
> a comment, though.
> 
> > +        if ( size && !is_iomem_range(addr, size) )
> > +        {
> > +            /*
> > +             * Return without enabling memory decoding if BAR overlaps with
> > +             * RAM region.
> > +             */
> > +            printk(XENLOG_WARNING
> > +                   "%pp: disabled: BAR%u [%" PRIx64 ", %" PRIx64
> > +                   ") overlaps with RAM\n",
> 
> Mentioning "RAM" in comment and log message is potentially misleading
> when considering what other types get also checked (irrespective of my
> earlier comment). (Ftaod I don't mind the title using "RAM".)

Would the message below be better?

"%pp disabled: BAR%u [%" PRIx64 ", %" PRIx64 ") overlap detected\n"

> > @@ -399,8 +465,8 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
> >              break;
> >      }
> >  
> > -    check_pdev(pdev);
> >      apply_quirks(pdev);
> > +    check_pdev(pdev);
> 
> I can't find the description say anything about this re-ordering. What's
> the deal here?

Should have mentioned: this is required so that ignore_bars is set
prior to calling check_pdev.

Thanks, Roger.

Re: [PATCH RFC] xen/pci: detect when BARs overlap RAM
Posted by Jan Beulich 2 years, 2 months ago
On 26.01.2022 16:45, Roger Pau Monné wrote:
> On Wed, Jan 26, 2022 at 03:08:28PM +0100, Jan Beulich wrote:
>> On 26.01.2022 13:26, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -479,6 +479,26 @@ unsigned int page_get_ram_type(mfn_t mfn)
>>>      return type ?: RAM_TYPE_UNKNOWN;
>>>  }
>>>  
>>> +bool is_iomem_range(uint64_t start, uint64_t size)
>>
>> Might be nice to have this sit next it is_iomem_page(). And wouldn't
>> at least "start" better be paddr_t?
> 
> (paddr_t, size_t) would be better, but AFAICT size_t can be an
> unsigned long and on Arm32 that won't be suitable for holding a 64bit
> BAR size.

Talking of representing the range - a BAR occupying one part of a
page overlapping an E820 entry covering another part of that same
page is going to be equally bad, I think. The range may therefore
want expressing in page granularity. This may then be easier as
[start,end], at least on the calling side (can use PFN_DOWN()
twice then).

Jan


Re: [PATCH RFC] xen/pci: detect when BARs overlap RAM
Posted by Roger Pau Monné 2 years, 2 months ago
On Fri, Jan 28, 2022 at 07:48:44AM +0100, Jan Beulich wrote:
> On 26.01.2022 16:45, Roger Pau Monné wrote:
> > On Wed, Jan 26, 2022 at 03:08:28PM +0100, Jan Beulich wrote:
> >> On 26.01.2022 13:26, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/mm.c
> >>> +++ b/xen/arch/x86/mm.c
> >>> @@ -479,6 +479,26 @@ unsigned int page_get_ram_type(mfn_t mfn)
> >>>      return type ?: RAM_TYPE_UNKNOWN;
> >>>  }
> >>>  
> >>> +bool is_iomem_range(uint64_t start, uint64_t size)
> >>
> >> Might be nice to have this sit next it is_iomem_page(). And wouldn't
> >> at least "start" better be paddr_t?
> > 
> > (paddr_t, size_t) would be better, but AFAICT size_t can be an
> > unsigned long and on Arm32 that won't be suitable for holding a 64bit
> > BAR size.
> 
> Talking of representing the range - a BAR occupying one part of a
> page overlapping an E820 entry covering another part of that same
> page is going to be equally bad, I think. The range may therefore
> want expressing in page granularity. This may then be easier as
> [start,end], at least on the calling side (can use PFN_DOWN()
> twice then).

Yes, indeed, would likely be better. Then I can make both parameters
unsigned long.

I also think is_memory_hole would be a suitable name given the new
implementation?

Thanks, Roger.

Re: [PATCH RFC] xen/pci: detect when BARs overlap RAM
Posted by Jan Beulich 2 years, 2 months ago
On 31.01.2022 10:57, Roger Pau Monné wrote:
> On Fri, Jan 28, 2022 at 07:48:44AM +0100, Jan Beulich wrote:
>> On 26.01.2022 16:45, Roger Pau Monné wrote:
>>> On Wed, Jan 26, 2022 at 03:08:28PM +0100, Jan Beulich wrote:
>>>> On 26.01.2022 13:26, Roger Pau Monne wrote:
>>>>> --- a/xen/arch/x86/mm.c
>>>>> +++ b/xen/arch/x86/mm.c
>>>>> @@ -479,6 +479,26 @@ unsigned int page_get_ram_type(mfn_t mfn)
>>>>>      return type ?: RAM_TYPE_UNKNOWN;
>>>>>  }
>>>>>  
>>>>> +bool is_iomem_range(uint64_t start, uint64_t size)
>>>>
>>>> Might be nice to have this sit next it is_iomem_page(). And wouldn't
>>>> at least "start" better be paddr_t?
>>>
>>> (paddr_t, size_t) would be better, but AFAICT size_t can be an
>>> unsigned long and on Arm32 that won't be suitable for holding a 64bit
>>> BAR size.
>>
>> Talking of representing the range - a BAR occupying one part of a
>> page overlapping an E820 entry covering another part of that same
>> page is going to be equally bad, I think. The range may therefore
>> want expressing in page granularity. This may then be easier as
>> [start,end], at least on the calling side (can use PFN_DOWN()
>> twice then).
> 
> Yes, indeed, would likely be better. Then I can make both parameters
> unsigned long.
> 
> I also think is_memory_hole would be a suitable name given the new
> implementation?

I'd be fine with this name, fwiw.

Jan


Re: [PATCH RFC] xen/pci: detect when BARs overlap RAM
Posted by Jan Beulich 2 years, 3 months ago
On 26.01.2022 16:45, Roger Pau Monné wrote:
> On Wed, Jan 26, 2022 at 03:08:28PM +0100, Jan Beulich wrote:
>> On 26.01.2022 13:26, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -479,6 +479,26 @@ unsigned int page_get_ram_type(mfn_t mfn)
>>>      return type ?: RAM_TYPE_UNKNOWN;
>>>  }
>>>  
>>> +bool is_iomem_range(uint64_t start, uint64_t size)
>>
>> Might be nice to have this sit next it is_iomem_page(). And wouldn't
>> at least "start" better be paddr_t?
> 
> (paddr_t, size_t) would be better, but AFAICT size_t can be an
> unsigned long and on Arm32 that won't be suitable for holding a 64bit
> BAR size.

Indeed. We'd need a resource_size_t or alike.

>>> +{
>>> +    unsigned int i;
>>> +
>>> +    for ( i = 0; i < e820.nr_map; i++ )
>>> +    {
>>> +        struct e820entry *entry = &e820.map[i];
>>
>> const?
>>
>>> +        if ( entry->type != E820_RAM && entry->type != E820_ACPI &&
>>> +             entry->type != E820_NVS )
>>> +            continue;
>>
>> I think UNUSABLE also needs checking for here. Even further, it might
>> be best to reject any range overlapping an E820 entry, since colliding
>> with a RESERVED one could mean an overlap with e.g. MMCFG space.
> 
> Hm, I've though about adding UNUSABLE and RESERVED (and should have
> added a note about this), but that might be too restrictive.

Why (other than because of firmware bugs)?

>>> +        if ( size && !is_iomem_range(addr, size) )
>>> +        {
>>> +            /*
>>> +             * Return without enabling memory decoding if BAR overlaps with
>>> +             * RAM region.
>>> +             */
>>> +            printk(XENLOG_WARNING
>>> +                   "%pp: disabled: BAR%u [%" PRIx64 ", %" PRIx64
>>> +                   ") overlaps with RAM\n",
>>
>> Mentioning "RAM" in comment and log message is potentially misleading
>> when considering what other types get also checked (irrespective of my
>> earlier comment). (Ftaod I don't mind the title using "RAM".)
> 
> Would the message below be better?
> 
> "%pp disabled: BAR%u [%" PRIx64 ", %" PRIx64 ") overlap detected\n"

This or maybe

"%pp disabled: BAR%u [%" PRIx64 ", %" PRIx64 ") overlaps with memory map\n"

in particular if, on x86, we'd be checking for any E820 entry type.

>>> @@ -399,8 +465,8 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
>>>              break;
>>>      }
>>>  
>>> -    check_pdev(pdev);
>>>      apply_quirks(pdev);
>>> +    check_pdev(pdev);
>>
>> I can't find the description say anything about this re-ordering. What's
>> the deal here?
> 
> Should have mentioned: this is required so that ignore_bars is set
> prior to calling check_pdev.

Ah, I see. Makes sense, but indeed wants mentioning.

Jan


Re: [PATCH RFC] xen/pci: detect when BARs overlap RAM
Posted by Roger Pau Monné 2 years, 2 months ago
On Wed, Jan 26, 2022 at 05:09:56PM +0100, Jan Beulich wrote:
> On 26.01.2022 16:45, Roger Pau Monné wrote:
> > On Wed, Jan 26, 2022 at 03:08:28PM +0100, Jan Beulich wrote:
> >> On 26.01.2022 13:26, Roger Pau Monne wrote:
> >>> +        if ( entry->type != E820_RAM && entry->type != E820_ACPI &&
> >>> +             entry->type != E820_NVS )
> >>> +            continue;
> >>
> >> I think UNUSABLE also needs checking for here. Even further, it might
> >> be best to reject any range overlapping an E820 entry, since colliding
> >> with a RESERVED one could mean an overlap with e.g. MMCFG space.
> > 
> > Hm, I've though about adding UNUSABLE and RESERVED (and should have
> > added a note about this), but that might be too restrictive.
> 
> Why (other than because of firmware bugs)?

Handling overlaps with those wasn't strictly needed to solve the issue
at hand, but I didn't take into account that an overlap with a MMCFG
region would be equally bad. I've extended the check to cover any
E820 entry.

> >>> +        if ( size && !is_iomem_range(addr, size) )
> >>> +        {
> >>> +            /*
> >>> +             * Return without enabling memory decoding if BAR overlaps with
> >>> +             * RAM region.
> >>> +             */
> >>> +            printk(XENLOG_WARNING
> >>> +                   "%pp: disabled: BAR%u [%" PRIx64 ", %" PRIx64
> >>> +                   ") overlaps with RAM\n",
> >>
> >> Mentioning "RAM" in comment and log message is potentially misleading
> >> when considering what other types get also checked (irrespective of my
> >> earlier comment). (Ftaod I don't mind the title using "RAM".)
> > 
> > Would the message below be better?
> > 
> > "%pp disabled: BAR%u [%" PRIx64 ", %" PRIx64 ") overlap detected\n"
> 
> This or maybe
> 
> "%pp disabled: BAR%u [%" PRIx64 ", %" PRIx64 ") overlaps with memory map\n"
> 
> in particular if, on x86, we'd be checking for any E820 entry type.

Done.

Thanks, Roger.