[PATCH v2 1/2] PCI: Allow per function PCI slots

Farhan Ali posted 2 patches 3 months, 2 weeks ago
[PATCH v2 1/2] PCI: Allow per function PCI slots
Posted by Farhan Ali 3 months, 2 weeks ago
On s390 systems, which use a machine level hypervisor, PCI devices are
always accessed through a form of PCI pass-through which fundamentally
operates on a per PCI function granularity. This is also reflected in the
s390 PCI hotplug driver which creates hotplug slots for individual PCI
functions. Its reset_slot() function, which is a wrapper for
zpci_hot_reset_device(), thus also resets individual functions.

Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
to multifunction devices. This approach worked fine on s390 systems that
only exposed virtual functions as individual PCI domains to the operating
system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
s390 supports exposing the topology of multifunction PCI devices by
grouping them in a shared PCI domain. When attempting to reset a function
through the hotplug driver, the shared slot assignment causes the wrong
function to be reset instead of the intended one. It also leaks memory as
we do create a pci_slot object for the function, but don't correctly free
it in pci_slot_release().

Add a flag for struct pci_slot to allow per function PCI slots for
functions managed through a hypervisor, which exposes individual PCI
functions while retaining the topology.

Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
Cc: stable@vger.kernel.org
Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/pci/pci.c   |  5 +++--
 drivers/pci/slot.c  | 25 ++++++++++++++++++++++---
 include/linux/pci.h |  1 +
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b14dd064006c..36ee38e0d817 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4980,8 +4980,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
 
 static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
 {
-	if (dev->multifunction || dev->subordinate || !dev->slot ||
-	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
+	if (dev->subordinate || !dev->slot ||
+	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
+	    (dev->multifunction && !dev->slot->per_func_slot))
 		return -ENOTTY;
 
 	return pci_reset_hotplug_slot(dev->slot->hotplug, probe);
diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 50fb3eb595fe..ed10fa3ae727 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -63,6 +63,22 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf)
 	return bus_speed_read(slot->bus->cur_bus_speed, buf);
 }
 
+static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot)
+{
+	if (slot->per_func_slot)
+		return dev->devfn == slot->number;
+
+	return PCI_SLOT(dev->devfn) == slot->number;
+}
+
+static bool pci_slot_enabled_per_func(void)
+{
+	if (IS_ENABLED(CONFIG_S390))
+		return true;
+
+	return false;
+}
+
 static void pci_slot_release(struct kobject *kobj)
 {
 	struct pci_dev *dev;
@@ -73,7 +89,7 @@ static void pci_slot_release(struct kobject *kobj)
 
 	down_read(&pci_bus_sem);
 	list_for_each_entry(dev, &slot->bus->devices, bus_list)
-		if (PCI_SLOT(dev->devfn) == slot->number)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = NULL;
 	up_read(&pci_bus_sem);
 
@@ -166,7 +182,7 @@ void pci_dev_assign_slot(struct pci_dev *dev)
 
 	mutex_lock(&pci_slot_mutex);
 	list_for_each_entry(slot, &dev->bus->slots, list)
-		if (PCI_SLOT(dev->devfn) == slot->number)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = slot;
 	mutex_unlock(&pci_slot_mutex);
 }
@@ -265,6 +281,9 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
 	slot->bus = pci_bus_get(parent);
 	slot->number = slot_nr;
 
+	if (pci_slot_enabled_per_func())
+		slot->per_func_slot = 1;
+
 	slot->kobj.kset = pci_slots_kset;
 
 	slot_name = make_slot_name(name);
@@ -285,7 +304,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
 
 	down_read(&pci_bus_sem);
 	list_for_each_entry(dev, &parent->devices, bus_list)
-		if (PCI_SLOT(dev->devfn) == slot_nr)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = slot;
 	up_read(&pci_bus_sem);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index d1fdf81fbe1e..6ad194597ab5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -78,6 +78,7 @@ struct pci_slot {
 	struct list_head	list;		/* Node in list of slots */
 	struct hotplug_slot	*hotplug;	/* Hotplug info (move here) */
 	unsigned char		number;		/* PCI_SLOT(pci_dev->devfn) */
+	unsigned int		per_func_slot:1; /* Allow per function slot */
 	struct kobject		kobj;
 };
 
-- 
2.43.0
Re: [PATCH v2 1/2] PCI: Allow per function PCI slots
Posted by Niklas Schnelle 3 months, 1 week ago
On Wed, 2025-10-22 at 14:24 -0700, Farhan Ali wrote:
> On s390 systems, which use a machine level hypervisor, PCI devices are
> always accessed through a form of PCI pass-through which fundamentally
> operates on a per PCI function granularity. This is also reflected in the
> s390 PCI hotplug driver which creates hotplug slots for individual PCI
> functions. Its reset_slot() function, which is a wrapper for
> zpci_hot_reset_device(), thus also resets individual functions.
> 
> Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
> to multifunction devices. This approach worked fine on s390 systems that
> only exposed virtual functions as individual PCI domains to the operating
> system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
> s390 supports exposing the topology of multifunction PCI devices by
> grouping them in a shared PCI domain. When attempting to reset a function
> through the hotplug driver, the shared slot assignment causes the wrong
> function to be reset instead of the intended one. It also leaks memory as
> we do create a pci_slot object for the function, but don't correctly free
> it in pci_slot_release().
> 
> Add a flag for struct pci_slot to allow per function PCI slots for
> functions managed through a hypervisor, which exposes individual PCI
> functions while retaining the topology.

I wonder if LoongArch which now also does per PCI function pass-through
might need this too. Adding their KVM maintainers.

> 
> Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
> Cc: stable@vger.kernel.org
> Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  drivers/pci/pci.c   |  5 +++--
>  drivers/pci/slot.c  | 25 ++++++++++++++++++++++---
>  include/linux/pci.h |  1 +
>  3 files changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b14dd064006c..36ee38e0d817 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4980,8 +4980,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
>  
>  static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
>  {
> -	if (dev->multifunction || dev->subordinate || !dev->slot ||
> -	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
> +	if (dev->subordinate || !dev->slot ||
> +	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
> +	    (dev->multifunction && !dev->slot->per_func_slot))
>  		return -ENOTTY;
>  
>  	return pci_reset_hotplug_slot(dev->slot->hotplug, probe);
> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
> index 50fb3eb595fe..ed10fa3ae727 100644
> --- a/drivers/pci/slot.c
> +++ b/drivers/pci/slot.c
> @@ -63,6 +63,22 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf)
>  	return bus_speed_read(slot->bus->cur_bus_speed, buf);
>  }
>  
> +static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot)
> +{
> +	if (slot->per_func_slot)
> +		return dev->devfn == slot->number;
> +
> +	return PCI_SLOT(dev->devfn) == slot->number;
> +}
> +
> +static bool pci_slot_enabled_per_func(void)
> +{
> +	if (IS_ENABLED(CONFIG_S390))
> +		return true;
> +
> +	return false;
> +}
> +
--- snip ---
>  
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index d1fdf81fbe1e..6ad194597ab5 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -78,6 +78,7 @@ struct pci_slot {
>  	struct list_head	list;		/* Node in list of slots */
>  	struct hotplug_slot	*hotplug;	/* Hotplug info (move here) */
>  	unsigned char		number;		/* PCI_SLOT(pci_dev->devfn) */
> +	unsigned int		per_func_slot:1; /* Allow per function slot */
>  	struct kobject		kobj;
>  };
>  

Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Re: [PATCH v2 1/2] PCI: Allow per function PCI slots
Posted by Bibo Mao 3 months, 1 week ago

On 2025/10/27 下午8:29, Niklas Schnelle wrote:
> On Wed, 2025-10-22 at 14:24 -0700, Farhan Ali wrote:
>> On s390 systems, which use a machine level hypervisor, PCI devices are
>> always accessed through a form of PCI pass-through which fundamentally
>> operates on a per PCI function granularity. This is also reflected in the
>> s390 PCI hotplug driver which creates hotplug slots for individual PCI
>> functions. Its reset_slot() function, which is a wrapper for
>> zpci_hot_reset_device(), thus also resets individual functions.
>>
>> Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
>> to multifunction devices. This approach worked fine on s390 systems that
>> only exposed virtual functions as individual PCI domains to the operating
>> system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
>> s390 supports exposing the topology of multifunction PCI devices by
>> grouping them in a shared PCI domain. When attempting to reset a function
>> through the hotplug driver, the shared slot assignment causes the wrong
>> function to be reset instead of the intended one. It also leaks memory as
>> we do create a pci_slot object for the function, but don't correctly free
>> it in pci_slot_release().
>>
>> Add a flag for struct pci_slot to allow per function PCI slots for
>> functions managed through a hypervisor, which exposes individual PCI
>> functions while retaining the topology.
> 
> I wonder if LoongArch which now also does per PCI function pass-through
> might need this too. Adding their KVM maintainers.

Hi Niklas,

Thanks for your reminder. Yes, LoongArch do per PCI function 
pass-throught. In theory, function pci_slot_enabled_per_func() should 
return true on LoongArch also. Only that now IOMMU driver is not merged, 
there is no way to test it, however we will write down this as a note 
inside about this issue and verify it once IOMMU driver is merged.


Regards
Bibo Mao
> 
>>
>> Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
>> Cc: stable@vger.kernel.org
>> Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>   drivers/pci/pci.c   |  5 +++--
>>   drivers/pci/slot.c  | 25 ++++++++++++++++++++++---
>>   include/linux/pci.h |  1 +
>>   3 files changed, 26 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index b14dd064006c..36ee38e0d817 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4980,8 +4980,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
>>   
>>   static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
>>   {
>> -	if (dev->multifunction || dev->subordinate || !dev->slot ||
>> -	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
>> +	if (dev->subordinate || !dev->slot ||
>> +	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
>> +	    (dev->multifunction && !dev->slot->per_func_slot))
>>   		return -ENOTTY;
>>   
>>   	return pci_reset_hotplug_slot(dev->slot->hotplug, probe);
>> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
>> index 50fb3eb595fe..ed10fa3ae727 100644
>> --- a/drivers/pci/slot.c
>> +++ b/drivers/pci/slot.c
>> @@ -63,6 +63,22 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf)
>>   	return bus_speed_read(slot->bus->cur_bus_speed, buf);
>>   }
>>   
>> +static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot)
>> +{
>> +	if (slot->per_func_slot)
>> +		return dev->devfn == slot->number;
>> +
>> +	return PCI_SLOT(dev->devfn) == slot->number;
>> +}
>> +
>> +static bool pci_slot_enabled_per_func(void)
>> +{
>> +	if (IS_ENABLED(CONFIG_S390))
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
> --- snip ---
>>   
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index d1fdf81fbe1e..6ad194597ab5 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -78,6 +78,7 @@ struct pci_slot {
>>   	struct list_head	list;		/* Node in list of slots */
>>   	struct hotplug_slot	*hotplug;	/* Hotplug info (move here) */
>>   	unsigned char		number;		/* PCI_SLOT(pci_dev->devfn) */
>> +	unsigned int		per_func_slot:1; /* Allow per function slot */
>>   	struct kobject		kobj;
>>   };
>>   
> 
> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
> 

Re: [PATCH v2 1/2] PCI: Allow per function PCI slots
Posted by Niklas Schnelle 3 months, 1 week ago
On Tue, 2025-10-28 at 09:11 +0800, Bibo Mao wrote:
> 
> On 2025/10/27 下午8:29, Niklas Schnelle wrote:
> > On Wed, 2025-10-22 at 14:24 -0700, Farhan Ali wrote:
> > > On s390 systems, which use a machine level hypervisor, PCI devices are
> > > always accessed through a form of PCI pass-through which fundamentally
> > > operates on a per PCI function granularity. This is also reflected in the
> > > s390 PCI hotplug driver which creates hotplug slots for individual PCI
> > > functions. Its reset_slot() function, which is a wrapper for
> > > zpci_hot_reset_device(), thus also resets individual functions.
> > > 
> > > Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
> > > to multifunction devices. This approach worked fine on s390 systems that
> > > only exposed virtual functions as individual PCI domains to the operating
> > > system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
> > > s390 supports exposing the topology of multifunction PCI devices by
> > > grouping them in a shared PCI domain. When attempting to reset a function
> > > through the hotplug driver, the shared slot assignment causes the wrong
> > > function to be reset instead of the intended one. It also leaks memory as
> > > we do create a pci_slot object for the function, but don't correctly free
> > > it in pci_slot_release().
> > > 
> > > Add a flag for struct pci_slot to allow per function PCI slots for
> > > functions managed through a hypervisor, which exposes individual PCI
> > > functions while retaining the topology.
> > 
> > I wonder if LoongArch which now also does per PCI function pass-through
> > might need this too. Adding their KVM maintainers.
> 
> Hi Niklas,
> 
> Thanks for your reminder. Yes, LoongArch do per PCI function 
> pass-throught. In theory, function pci_slot_enabled_per_func() should 
> return true on LoongArch also. Only that now IOMMU driver is not merged, 
> there is no way to test it, however we will write down this as a note 
> inside about this issue and verify it once IOMMU driver is merged.
> 
> 
> Regards
> Bibo Mao
> 

Hi Bibo,

Is your ability to test if it works for you also hindered for my other
patch touching pci_scan_slot()? It sounded like Huacai was looking into
testing that.

Thanks,
Niklas