[PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope

Roger Pau Monne posted 3 patches 1 year ago
There is a newer version of this series
[PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope
Posted by Roger Pau Monne 1 year ago
The PCI segment value is limited to 16 bits, however there are buses like VMD
that fake being part of the PCI topology by adding segment with a number
outside the scope of the PCI firmware specification range (>= 0x10000). The
MCFG ACPI Table "PCI Segment Group Number" field is defined as having a 16 bit
width.

Attempting to register or manage those devices with Xen would result in errors
at best, or overlaps with existing devices living on the truncated equivalent
segment values.

Skip notifying Xen about those devices.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 drivers/xen/pci.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 416f231809cb..08e82fd1263e 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -43,6 +43,13 @@ static int xen_add_device(struct device *dev)
 		pci_mcfg_reserved = true;
 	}
 #endif
+
+	if (pci_domain_nr(pci_dev->bus) >> 16) {
+		dev_info(dev,
+			 "not registering with Xen: invalid PCI segment\n");
+		return 0;
+	}
+
 	if (pci_seg_supported) {
 		DEFINE_RAW_FLEX(struct physdev_pci_device_add, add, optarr, 1);
 
@@ -149,6 +156,12 @@ static int xen_remove_device(struct device *dev)
 	int r;
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 
+	if (pci_domain_nr(pci_dev->bus) >> 16) {
+		dev_info(dev,
+			 "not unregistering with Xen: invalid PCI segment\n");
+		return 0;
+	}
+
 	if (pci_seg_supported) {
 		struct physdev_pci_device device = {
 			.seg = pci_domain_nr(pci_dev->bus),
@@ -182,6 +195,12 @@ int xen_reset_device(const struct pci_dev *dev)
 		.flags = PCI_DEVICE_RESET_FLR,
 	};
 
+	if (pci_domain_nr(dev->bus) >> 16) {
+		dev_info(&dev->dev,
+			 "unable to notify Xen of device reset: invalid PCI segment\n");
+		return 0;
+	}
+
 	return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_reset, &device);
 }
 EXPORT_SYMBOL_GPL(xen_reset_device);
-- 
2.46.0


Re: [PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope
Posted by Jan Beulich 1 year ago
On 10.01.2025 15:01, Roger Pau Monne wrote:
> The PCI segment value is limited to 16 bits, however there are buses like VMD
> that fake being part of the PCI topology by adding segment with a number
> outside the scope of the PCI firmware specification range (>= 0x10000). The
> MCFG ACPI Table "PCI Segment Group Number" field is defined as having a 16 bit
> width.
> 
> Attempting to register or manage those devices with Xen would result in errors
> at best, or overlaps with existing devices living on the truncated equivalent
> segment values.
> 
> Skip notifying Xen about those devices.

Hmm, is simply omitting the notification really all it takes? How would Xen
manage MSI / MSI-X, for example, without knowing of the device? As per the
BoF on the summit in Prague(?), I continue to think we need partial driver
logic in Xen for VMD ...

Jan
Re: [PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope
Posted by Roger Pau Monné 1 year ago
On Mon, Jan 13, 2025 at 08:17:23AM +0100, Jan Beulich wrote:
> On 10.01.2025 15:01, Roger Pau Monne wrote:
> > The PCI segment value is limited to 16 bits, however there are buses like VMD
> > that fake being part of the PCI topology by adding segment with a number
> > outside the scope of the PCI firmware specification range (>= 0x10000). The
> > MCFG ACPI Table "PCI Segment Group Number" field is defined as having a 16 bit
> > width.
> > 
> > Attempting to register or manage those devices with Xen would result in errors
> > at best, or overlaps with existing devices living on the truncated equivalent
> > segment values.
> > 
> > Skip notifying Xen about those devices.
> 
> Hmm, is simply omitting the notification really all it takes? How would Xen
> manage MSI / MSI-X, for example, without knowing of the device? As per the
> BoF on the summit in Prague(?), I continue to think we need partial driver
> logic in Xen for VMD ...

The basic mode of operation of devices behind a VMD bridge is to
reference the interrupts of the bridge device in its MSI(-X) entries,
so the VMD bridge acts as a de-multiplexer and forwards the interrupts
to the device behind the VMD bridge.  See vmd_alloc_irqs() (and
calling context) in the VMD driver for a reference about how this is
setup and operates.  Note also that devices behind a VMD bridge
operate using a different MSI domain, that uses a custom
irq_compose_msi_msg hook.

Thanks, Roger.
Re: [PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope
Posted by Bjorn Helgaas 1 year ago
On Fri, Jan 10, 2025 at 03:01:48PM +0100, Roger Pau Monne wrote:
> The PCI segment value is limited to 16 bits, however there are buses like VMD
> that fake being part of the PCI topology by adding segment with a number
> outside the scope of the PCI firmware specification range (>= 0x10000). The
> MCFG ACPI Table "PCI Segment Group Number" field is defined as having a 16 bit
> width.
>
> Attempting to register or manage those devices with Xen would result in errors
> at best, or overlaps with existing devices living on the truncated equivalent
> segment values.

The ACPI _SEG method (ACPI r6.5, sec 6.5.6) and the corresponding
value in the MCFG table (PCI Firmware r3.3, sec 4.1.2) are clearly
16-bit values.

But otherwise, the segment value is pretty much an arbitrary software
value, and the kernel works fine with the larger domain values from
vmd_find_free_domain(), so this isn't quite enough to explain what the
issue with Xen is.

Does Xen truncate the domain to 16 bits or use it to look up something
in ACPI?

> Skip notifying Xen about those devices.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  drivers/xen/pci.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
> index 416f231809cb..08e82fd1263e 100644
> --- a/drivers/xen/pci.c
> +++ b/drivers/xen/pci.c
> @@ -43,6 +43,13 @@ static int xen_add_device(struct device *dev)
>  		pci_mcfg_reserved = true;
>  	}
>  #endif
> +
> +	if (pci_domain_nr(pci_dev->bus) >> 16) {
> +		dev_info(dev,
> +			 "not registering with Xen: invalid PCI segment\n");
> +		return 0;
> +	}
> +
>  	if (pci_seg_supported) {
>  		DEFINE_RAW_FLEX(struct physdev_pci_device_add, add, optarr, 1);
>  
> @@ -149,6 +156,12 @@ static int xen_remove_device(struct device *dev)
>  	int r;
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  
> +	if (pci_domain_nr(pci_dev->bus) >> 16) {
> +		dev_info(dev,
> +			 "not unregistering with Xen: invalid PCI segment\n");
> +		return 0;
> +	}
> +
>  	if (pci_seg_supported) {
>  		struct physdev_pci_device device = {
>  			.seg = pci_domain_nr(pci_dev->bus),
> @@ -182,6 +195,12 @@ int xen_reset_device(const struct pci_dev *dev)
>  		.flags = PCI_DEVICE_RESET_FLR,
>  	};
>  
> +	if (pci_domain_nr(dev->bus) >> 16) {
> +		dev_info(&dev->dev,
> +			 "unable to notify Xen of device reset: invalid PCI segment\n");
> +		return 0;
> +	}
> +
>  	return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_reset, &device);
>  }
>  EXPORT_SYMBOL_GPL(xen_reset_device);
> -- 
> 2.46.0
> 

Re: [PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope
Posted by Roger Pau Monné 1 year ago
On Fri, Jan 10, 2025 at 04:21:29PM -0600, Bjorn Helgaas wrote:
> On Fri, Jan 10, 2025 at 03:01:48PM +0100, Roger Pau Monne wrote:
> > The PCI segment value is limited to 16 bits, however there are buses like VMD
> > that fake being part of the PCI topology by adding segment with a number
> > outside the scope of the PCI firmware specification range (>= 0x10000). The
> > MCFG ACPI Table "PCI Segment Group Number" field is defined as having a 16 bit
> > width.
> >
> > Attempting to register or manage those devices with Xen would result in errors
> > at best, or overlaps with existing devices living on the truncated equivalent
> > segment values.
> 
> The ACPI _SEG method (ACPI r6.5, sec 6.5.6) and the corresponding
> value in the MCFG table (PCI Firmware r3.3, sec 4.1.2) are clearly
> 16-bit values.
> 
> But otherwise, the segment value is pretty much an arbitrary software
> value, and the kernel works fine with the larger domain values from
> vmd_find_free_domain(), so this isn't quite enough to explain what the
> issue with Xen is.
> 
> Does Xen truncate the domain to 16 bits or use it to look up something
> in ACPI?

In the interface between Xen and Linux the segment field is 16 bit
width, so with the current interface is not possible to reference
devices that are past the 0xffff segment.

I also wonder whether Xen and Linux (or guest OSes in general) would
agree on how to reference such devices.  AFAICT VMD segment numbers
are purely a software construct, but not something enforced by the
specification.  Could for example FreeBSD assign a different segment
to VMD devices?

If so we would need some kind of specification about how VMD segment
values are assigned so that OSes have a coherent way of referencing
VMD devices without ambiguity.

Thanks, Roger.
Re: [PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope
Posted by Bjorn Helgaas 1 year ago
On Mon, Jan 13, 2025 at 11:18:57AM +0100, Roger Pau Monné wrote:
> On Fri, Jan 10, 2025 at 04:21:29PM -0600, Bjorn Helgaas wrote:
> > On Fri, Jan 10, 2025 at 03:01:48PM +0100, Roger Pau Monne wrote:
> > > The PCI segment value is limited to 16 bits, however there are buses like VMD
> > > that fake being part of the PCI topology by adding segment with a number
> > > outside the scope of the PCI firmware specification range (>= 0x10000). The
> > > MCFG ACPI Table "PCI Segment Group Number" field is defined as having a 16 bit
> > > width.
> > >
> > > Attempting to register or manage those devices with Xen would result in errors
> > > at best, or overlaps with existing devices living on the truncated equivalent
> > > segment values.
> > 
> > The ACPI _SEG method (ACPI r6.5, sec 6.5.6) and the corresponding
> > value in the MCFG table (PCI Firmware r3.3, sec 4.1.2) are clearly
> > 16-bit values.
> > 
> > But otherwise, the segment value is pretty much an arbitrary software
> > value, and the kernel works fine with the larger domain values from
> > vmd_find_free_domain(), so this isn't quite enough to explain what the
> > issue with Xen is.
> > 
> > Does Xen truncate the domain to 16 bits or use it to look up something
> > in ACPI?
> 
> In the interface between Xen and Linux the segment field is 16 bit
> width, so with the current interface is not possible to reference
> devices that are past the 0xffff segment.

I think this specific reason (and maybe even struct
physdev_pci_device_add) would be more useful than the ACPI _SEG and
MCFG things, which are not as directly connected here.

Bjorn

Re: [PATCH 1/3] xen/pci: do not register devices outside of PCI segment scope
Posted by Jan Beulich 1 year ago
On 10.01.2025 23:21, Bjorn Helgaas wrote:
> On Fri, Jan 10, 2025 at 03:01:48PM +0100, Roger Pau Monne wrote:
>> The PCI segment value is limited to 16 bits, however there are buses like VMD
>> that fake being part of the PCI topology by adding segment with a number
>> outside the scope of the PCI firmware specification range (>= 0x10000). The
>> MCFG ACPI Table "PCI Segment Group Number" field is defined as having a 16 bit
>> width.
>>
>> Attempting to register or manage those devices with Xen would result in errors
>> at best, or overlaps with existing devices living on the truncated equivalent
>> segment values.
> 
> The ACPI _SEG method (ACPI r6.5, sec 6.5.6) and the corresponding
> value in the MCFG table (PCI Firmware r3.3, sec 4.1.2) are clearly
> 16-bit values.
> 
> But otherwise, the segment value is pretty much an arbitrary software
> value, and the kernel works fine with the larger domain values from
> vmd_find_free_domain(), so this isn't quite enough to explain what the
> issue with Xen is.
> 
> Does Xen truncate the domain to 16 bits or use it to look up something
> in ACPI?

One of the involved public interface structs starts like this:

struct physdev_pci_device_add {
    /* IN */
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
    ...

So yes, wider segment values would be truncated. Plus, even if they weren't,
there would need to be coordination between Dom0 and Xen on which devices
gets which segment number, since - as you say - the assignment in Linux is
pretty much arbitrary.

Jan