[PATCH v9 1/9] PCI: Allow per function PCI slots

Farhan Ali posted 9 patches 1 month, 2 weeks ago
There is a newer version of this series
[PATCH v9 1/9] PCI: Allow per function PCI slots
Posted by Farhan Ali 1 month, 2 weeks ago
On s390 systems, which use a machine level hypervisor, PCI devices are
always accessed through a form of PCI pass-through which fundamentally
operates on a per PCI function granularity. This is also reflected in the
s390 PCI hotplug driver which creates hotplug slots for individual PCI
functions. Its reset_slot() function, which is a wrapper for
zpci_hot_reset_device(), thus also resets individual functions.

Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
to multifunction devices. This approach worked fine on s390 systems that
only exposed virtual functions as individual PCI domains to the operating
system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
s390 supports exposing the topology of multifunction PCI devices by
grouping them in a shared PCI domain. When attempting to reset a function
through the hotplug driver, the shared slot assignment causes the wrong
function to be reset instead of the intended one. It also leaks memory as
we do create a pci_slot object for the function, but don't correctly free
it in pci_slot_release().

Add a flag for struct pci_slot to allow per function PCI slots for
functions managed through a hypervisor, which exposes individual PCI
functions while retaining the topology.

Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
Cc: stable@vger.kernel.org
Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/pci/pci.c   |  5 +++--
 drivers/pci/slot.c  | 25 ++++++++++++++++++++++---
 include/linux/pci.h |  1 +
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f3244630bfd0..3090c727b76f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4869,8 +4869,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
 
 static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
 {
-	if (dev->multifunction || dev->subordinate || !dev->slot ||
-	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
+	if (dev->subordinate || !dev->slot ||
+	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
+	    (dev->multifunction && !dev->slot->per_func_slot))
 		return -ENOTTY;
 
 	return pci_reset_hotplug_slot(dev->slot->hotplug, probe);
diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 50fb3eb595fe..ed10fa3ae727 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -63,6 +63,22 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf)
 	return bus_speed_read(slot->bus->cur_bus_speed, buf);
 }
 
+static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot)
+{
+	if (slot->per_func_slot)
+		return dev->devfn == slot->number;
+
+	return PCI_SLOT(dev->devfn) == slot->number;
+}
+
+static bool pci_slot_enabled_per_func(void)
+{
+	if (IS_ENABLED(CONFIG_S390))
+		return true;
+
+	return false;
+}
+
 static void pci_slot_release(struct kobject *kobj)
 {
 	struct pci_dev *dev;
@@ -73,7 +89,7 @@ static void pci_slot_release(struct kobject *kobj)
 
 	down_read(&pci_bus_sem);
 	list_for_each_entry(dev, &slot->bus->devices, bus_list)
-		if (PCI_SLOT(dev->devfn) == slot->number)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = NULL;
 	up_read(&pci_bus_sem);
 
@@ -166,7 +182,7 @@ void pci_dev_assign_slot(struct pci_dev *dev)
 
 	mutex_lock(&pci_slot_mutex);
 	list_for_each_entry(slot, &dev->bus->slots, list)
-		if (PCI_SLOT(dev->devfn) == slot->number)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = slot;
 	mutex_unlock(&pci_slot_mutex);
 }
@@ -265,6 +281,9 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
 	slot->bus = pci_bus_get(parent);
 	slot->number = slot_nr;
 
+	if (pci_slot_enabled_per_func())
+		slot->per_func_slot = 1;
+
 	slot->kobj.kset = pci_slots_kset;
 
 	slot_name = make_slot_name(name);
@@ -285,7 +304,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
 
 	down_read(&pci_bus_sem);
 	list_for_each_entry(dev, &parent->devices, bus_list)
-		if (PCI_SLOT(dev->devfn) == slot_nr)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = slot;
 	up_read(&pci_bus_sem);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1c270f1d5123..a9975d0e104f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -78,6 +78,7 @@ struct pci_slot {
 	struct list_head	list;		/* Node in list of slots */
 	struct hotplug_slot	*hotplug;	/* Hotplug info (move here) */
 	unsigned char		number;		/* PCI_SLOT(pci_dev->devfn) */
+	unsigned int		per_func_slot:1; /* Allow per function slot */
 	struct kobject		kobj;
 };
 
-- 
2.43.0
Re: [PATCH v9 1/9] PCI: Allow per function PCI slots
Posted by Keith Busch 1 month, 2 weeks ago
On Tue, Feb 17, 2026 at 10:22:49AM -0800, Farhan Ali wrote:
> Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
> to multifunction devices. This approach worked fine on s390 systems that
> only exposed virtual functions as individual PCI domains to the operating
> system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
> s390 supports exposing the topology of multifunction PCI devices by
> grouping them in a shared PCI domain. When attempting to reset a function
> through the hotplug driver, the shared slot assignment causes the wrong
> function to be reset instead of the intended one. It also leaks memory as
> we do create a pci_slot object for the function, but don't correctly free
> it in pci_slot_release().

I think leakage is because s390 is passing in devfn when pci_create_slot
is expecting devnr, so things are not getting matched and assigned as
expected.

If you're able to make this change, it will clash with one existing
thing, and another proposal I've got at v5 now(*). Specifically, this
would allow all 8 bits to be used for the pci_slot 'number' when it's
currently expected to be limited to 5 bits. 0xff is special, and I'm
proposing another special value. If we are going to allow the slot
numbers to use all 8 bits, I think we need to change the type from u8 to
u16 so that there is space to encode such special values.

 * https://lore.kernel.org/linux-pci/20260217160836.2709885-3-kbusch@meta.com/
Re: [PATCH v9 1/9] PCI: Allow per function PCI slots
Posted by Farhan Ali 1 month, 2 weeks ago
On 2/17/2026 1:14 PM, Keith Busch wrote:
> On Tue, Feb 17, 2026 at 10:22:49AM -0800, Farhan Ali wrote:
>> Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
>> to multifunction devices. This approach worked fine on s390 systems that
>> only exposed virtual functions as individual PCI domains to the operating
>> system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
>> s390 supports exposing the topology of multifunction PCI devices by
>> grouping them in a shared PCI domain. When attempting to reset a function
>> through the hotplug driver, the shared slot assignment causes the wrong
>> function to be reset instead of the intended one. It also leaks memory as
>> we do create a pci_slot object for the function, but don't correctly free
>> it in pci_slot_release().
> I think leakage is because s390 is passing in devfn when pci_create_slot
> is expecting devnr, so things are not getting matched and assigned as
> expected.
>
> If you're able to make this change, it will clash with one existing
> thing, and another proposal I've got at v5 now(*). Specifically, this
> would allow all 8 bits to be used for the pci_slot 'number' when it's
> currently expected to be limited to 5 bits. 0xff is special, and I'm
> proposing another special value. If we are going to allow the slot
> numbers to use all 8 bits, I think we need to change the type from u8 to
> u16 so that there is space to encode such special values.
>
>   * https://lore.kernel.org/linux-pci/20260217160836.2709885-3-kbusch@meta.com/

I am open to suggestions on how we can better model a pci_slot per 
function. And yeah, I think it makes sense to update the pci_slot 
'number' to u16.

Thanks

Farhan
Re: [PATCH v9 1/9] PCI: Allow per function PCI slots
Posted by Niklas Schnelle 1 month, 1 week ago
On Tue, 2026-02-17 at 14:26 -0800, Farhan Ali wrote:
> On 2/17/2026 1:14 PM, Keith Busch wrote:
> > On Tue, Feb 17, 2026 at 10:22:49AM -0800, Farhan Ali wrote:
> > > Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
> > > to multifunction devices. This approach worked fine on s390 systems that
> > > only exposed virtual functions as individual PCI domains to the operating
> > > system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
> > > s390 supports exposing the topology of multifunction PCI devices by
> > > grouping them in a shared PCI domain. When attempting to reset a function
> > > through the hotplug driver, the shared slot assignment causes the wrong
> > > function to be reset instead of the intended one. It also leaks memory as
> > > we do create a pci_slot object for the function, but don't correctly free
> > > it in pci_slot_release().
> > I think leakage is because s390 is passing in devfn when pci_create_slot
> > is expecting devnr, so things are not getting matched and assigned as
> > expected.
> > 
> > If you're able to make this change, it will clash with one existing
> > thing, and another proposal I've got at v5 now(*). Specifically, this
> > would allow all 8 bits to be used for the pci_slot 'number' when it's
> > currently expected to be limited to 5 bits. 0xff is special, and I'm
> > proposing another special value. If we are going to allow the slot
> > numbers to use all 8 bits, I think we need to change the type from u8 to
> > u16 so that there is space to encode such special values.
> > 
> >   * https://lore.kernel.org/linux-pci/20260217160836.2709885-3-kbusch@meta.com/
> 
> I am open to suggestions on how we can better model a pci_slot per 
> function. And yeah, I think it makes sense to update the pci_slot 
> 'number' to u16.
> 
> Thanks
> 
> Farhan

It may give some better context to emphasize that s390 has been using a
per PCI function hotplug slot model since commit 7441b0627e22
("s390/pci: PCI hotplug support via SCLP") from 2012. Quoting the
commit description here:

   "The hotplug controller creates a slot for every PCI function in 
   stand-by or configured state. The PCI functions are named after
   the PCI function ID (fid). By writing to the power attribute 
   in /sys/bus/pci/slots/<fid>/power the PCI function
   is moved to stand-by or configured state."

So this isn't a new concept or something we can really change since
users and documentation relies on this for over a decade. It's only
that for the longest time the mismatch between the hotplug slot and the
pci slot went unnoticed because it really only causes trouble in very
specific circumstances even after we've started exposing the devfn part
of the geographic PCI addresses for SR-IOV capable multi-function
devices e.g. for the two PFs of ConnectX series NICs.

Thanks,
Niklas