[PATCH v3 03/10] PCI: Allow per function PCI slots

Farhan Ali posted 10 patches 3 weeks ago
There is a newer version of this series
[PATCH v3 03/10] PCI: Allow per function PCI slots
Posted by Farhan Ali 3 weeks ago
On s390 systems, which use a machine level hypervisor, PCI devices are
always accessed through a form of PCI pass-through which fundamentally
operates on a per PCI function granularity. This is also reflected in the
s390 PCI hotplug driver which creates hotplug slots for individual PCI
functions. Its reset_slot() function, which is a wrapper for
zpci_hot_reset_device(), thus also resets individual functions.

Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
to multifunction devices. This approach worked fine on s390 systems that
only exposed virtual functions as individual PCI domains to the operating
system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
s390 supports exposing the topology of multifunction PCI devices by
grouping them in a shared PCI domain. When attempting to reset a function
through the hotplug driver, the shared slot assignment causes the wrong
function to be reset instead of the intended one. It also leaks memory as
we do create a pci_slot object for the function, but don't correctly free
it in pci_slot_release().

Add a flag for struct pci_slot to allow per function PCI slots for
functions managed through a hypervisor, which exposes individual PCI
functions while retaining the topology.

Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/pci/hotplug/s390_pci_hpc.c | 10 ++++++++--
 drivers/pci/pci.c                  |  4 +++-
 drivers/pci/slot.c                 | 14 +++++++++++---
 include/linux/pci.h                |  1 +
 4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/hotplug/s390_pci_hpc.c b/drivers/pci/hotplug/s390_pci_hpc.c
index d9996516f49e..8b547de464bf 100644
--- a/drivers/pci/hotplug/s390_pci_hpc.c
+++ b/drivers/pci/hotplug/s390_pci_hpc.c
@@ -126,14 +126,20 @@ static const struct hotplug_slot_ops s390_hotplug_slot_ops = {
 
 int zpci_init_slot(struct zpci_dev *zdev)
 {
+	int ret;
 	char name[SLOT_NAME_SIZE];
 	struct zpci_bus *zbus = zdev->zbus;
 
 	zdev->hotplug_slot.ops = &s390_hotplug_slot_ops;
 
 	snprintf(name, SLOT_NAME_SIZE, "%08x", zdev->fid);
-	return pci_hp_register(&zdev->hotplug_slot, zbus->bus,
-			       zdev->devfn, name);
+	ret = pci_hp_register(&zdev->hotplug_slot, zbus->bus,
+				zdev->devfn, name);
+	if (ret)
+		return ret;
+
+	zdev->hotplug_slot.pci_slot->per_func_slot = 1;
+	return 0;
 }
 
 void zpci_exit_slot(struct zpci_dev *zdev)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 3994fa82df68..70296d3b1cfc 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5061,7 +5061,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
 
 static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
 {
-	if (dev->multifunction || dev->subordinate || !dev->slot ||
+	if (dev->multifunction && !dev->slot->per_func_slot)
+		return -ENOTTY;
+	if (dev->subordinate || !dev->slot ||
 	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
 		return -ENOTTY;
 
diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 50fb3eb595fe..51ee59e14393 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -63,6 +63,14 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf)
 	return bus_speed_read(slot->bus->cur_bus_speed, buf);
 }
 
+static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot)
+{
+	if (slot->per_func_slot)
+		return dev->devfn == slot->number;
+
+	return PCI_SLOT(dev->devfn) == slot->number;
+}
+
 static void pci_slot_release(struct kobject *kobj)
 {
 	struct pci_dev *dev;
@@ -73,7 +81,7 @@ static void pci_slot_release(struct kobject *kobj)
 
 	down_read(&pci_bus_sem);
 	list_for_each_entry(dev, &slot->bus->devices, bus_list)
-		if (PCI_SLOT(dev->devfn) == slot->number)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = NULL;
 	up_read(&pci_bus_sem);
 
@@ -166,7 +174,7 @@ void pci_dev_assign_slot(struct pci_dev *dev)
 
 	mutex_lock(&pci_slot_mutex);
 	list_for_each_entry(slot, &dev->bus->slots, list)
-		if (PCI_SLOT(dev->devfn) == slot->number)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = slot;
 	mutex_unlock(&pci_slot_mutex);
 }
@@ -285,7 +293,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
 
 	down_read(&pci_bus_sem);
 	list_for_each_entry(dev, &parent->devices, bus_list)
-		if (PCI_SLOT(dev->devfn) == slot_nr)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = slot;
 	up_read(&pci_bus_sem);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 59876de13860..9265f32d9786 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -78,6 +78,7 @@ struct pci_slot {
 	struct list_head	list;		/* Node in list of slots */
 	struct hotplug_slot	*hotplug;	/* Hotplug info (move here) */
 	unsigned char		number;		/* PCI_SLOT(pci_dev->devfn) */
+	unsigned int		per_func_slot:1; /* Allow per function slot */
 	struct kobject		kobj;
 };
 
-- 
2.43.0
Re: [PATCH v3 03/10] PCI: Allow per function PCI slots
Posted by Cédric Le Goater 2 weeks, 2 days ago
Hello Ali,

On 9/11/25 20:33, Farhan Ali wrote:
> On s390 systems, which use a machine level hypervisor, PCI devices are
> always accessed through a form of PCI pass-through which fundamentally
> operates on a per PCI function granularity. This is also reflected in the
> s390 PCI hotplug driver which creates hotplug slots for individual PCI
> functions. Its reset_slot() function, which is a wrapper for
> zpci_hot_reset_device(), thus also resets individual functions.
> 
> Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
> to multifunction devices. This approach worked fine on s390 systems that
> only exposed virtual functions as individual PCI domains to the operating
> system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
> s390 supports exposing the topology of multifunction PCI devices by
> grouping them in a shared PCI domain. When attempting to reset a function
> through the hotplug driver, the shared slot assignment causes the wrong
> function to be reset instead of the intended one. It also leaks memory as
> we do create a pci_slot object for the function, but don't correctly free
> it in pci_slot_release().
> 
> Add a flag for struct pci_slot to allow per function PCI slots for
> functions managed through a hypervisor, which exposes individual PCI
> functions while retaining the topology.
> 
> Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
> Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com>
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>   drivers/pci/hotplug/s390_pci_hpc.c | 10 ++++++++--
>   drivers/pci/pci.c                  |  4 +++-
>   drivers/pci/slot.c                 | 14 +++++++++++---
>   include/linux/pci.h                |  1 +
>   4 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/s390_pci_hpc.c b/drivers/pci/hotplug/s390_pci_hpc.c
> index d9996516f49e..8b547de464bf 100644
> --- a/drivers/pci/hotplug/s390_pci_hpc.c
> +++ b/drivers/pci/hotplug/s390_pci_hpc.c
> @@ -126,14 +126,20 @@ static const struct hotplug_slot_ops s390_hotplug_slot_ops = {
>   
>   int zpci_init_slot(struct zpci_dev *zdev)
>   {
> +	int ret;
>   	char name[SLOT_NAME_SIZE];
>   	struct zpci_bus *zbus = zdev->zbus;
>   
>   	zdev->hotplug_slot.ops = &s390_hotplug_slot_ops;
>   
>   	snprintf(name, SLOT_NAME_SIZE, "%08x", zdev->fid);
> -	return pci_hp_register(&zdev->hotplug_slot, zbus->bus,
> -			       zdev->devfn, name);
> +	ret = pci_hp_register(&zdev->hotplug_slot, zbus->bus,
> +				zdev->devfn, name);
> +	if (ret)
> +		return ret;
> +
> +	zdev->hotplug_slot.pci_slot->per_func_slot = 1;
> +	return 0;
>   }
>   
>   void zpci_exit_slot(struct zpci_dev *zdev)
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 3994fa82df68..70296d3b1cfc 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5061,7 +5061,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
>   
>   static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
>   {
> -	if (dev->multifunction || dev->subordinate || !dev->slot ||
> +	if (dev->multifunction && !dev->slot->per_func_slot)
> +		return -ENOTTY;
> +	if (dev->subordinate || !dev->slot ||
>   	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
>   		return -ENOTTY;
>   
> diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
> index 50fb3eb595fe..51ee59e14393 100644
> --- a/drivers/pci/slot.c
> +++ b/drivers/pci/slot.c
> @@ -63,6 +63,14 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf)
>   	return bus_speed_read(slot->bus->cur_bus_speed, buf);
>   }
>   
> +static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot)
> +{
> +	if (slot->per_func_slot)
> +		return dev->devfn == slot->number;
> +
> +	return PCI_SLOT(dev->devfn) == slot->number;
> +}
> +
>   static void pci_slot_release(struct kobject *kobj)
>   {
>   	struct pci_dev *dev;
> @@ -73,7 +81,7 @@ static void pci_slot_release(struct kobject *kobj)
>   
>   	down_read(&pci_bus_sem);
>   	list_for_each_entry(dev, &slot->bus->devices, bus_list)
> -		if (PCI_SLOT(dev->devfn) == slot->number)
> +		if (pci_dev_matches_slot(dev, slot))
>   			dev->slot = NULL;
>   	up_read(&pci_bus_sem);
>   
> @@ -166,7 +174,7 @@ void pci_dev_assign_slot(struct pci_dev *dev)
>   
>   	mutex_lock(&pci_slot_mutex);
>   	list_for_each_entry(slot, &dev->bus->slots, list)
> -		if (PCI_SLOT(dev->devfn) == slot->number)
> +		if (pci_dev_matches_slot(dev, slot))
>   			dev->slot = slot;
>   	mutex_unlock(&pci_slot_mutex);
>   }
> @@ -285,7 +293,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
>   
>   	down_read(&pci_bus_sem);
>   	list_for_each_entry(dev, &parent->devices, bus_list)
> -		if (PCI_SLOT(dev->devfn) == slot_nr)
> +		if (pci_dev_matches_slot(dev, slot))
>   			dev->slot = slot;
>   	up_read(&pci_bus_sem);
>   
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 59876de13860..9265f32d9786 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -78,6 +78,7 @@ struct pci_slot {
>   	struct list_head	list;		/* Node in list of slots */
>   	struct hotplug_slot	*hotplug;	/* Hotplug info (move here) */
>   	unsigned char		number;		/* PCI_SLOT(pci_dev->devfn) */
> +	unsigned int		per_func_slot:1; /* Allow per function slot */
>   	struct kobject		kobj;
>   };
>   

This change generates a kernel oops on x86_64. It can be reproduced in a VM.


C.

[    3.073990] BUG: kernel NULL pointer dereference, address: 0000000000000021
[    3.074976] #PF: supervisor read access in kernel mode
[    3.074976] #PF: error_code(0x0000) - not-present page
[    3.074976] PGD 0 P4D 0
[    3.074976] Oops: Oops: 0000 [#1] SMP NOPTI
[    3.074976] CPU: 18 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-rc6-clg-dirty #8 PREEMPT(voluntary)
[    3.074976] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 4.2 12/17/2024
[    3.074976] RIP: 0010:pci_reset_bus_function+0xdf/0x160
[    3.074976] Code: 4e 08 00 00 40 0f 85 83 00 00 00 48 8b 78 18 e8 27 9d ff ff 83 f8 e7 74 17 48 83 c4 08 5b 5d 41 5c c3 cc cc cc cc 48 8b 43 30 <f6> 40 21 01 75 b6 48 8b 53 10 48 83 7a 10 00 74 5e 48 83 7b 18 00
[    3.074976] RSP: 0000:ffffcd808007b9a8 EFLAGS: 00010202
[    3.074976] RAX: 0000000000000000 RBX: ffff88c4019b8000 RCX: 0000000000000000
[    3.074976] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88c4019b8000
[    3.074976] RBP: 0000000000000001 R08: 0000000000000002 R09: ffffcd808007b99c
[    3.074976] R10: ffffcd808007b950 R11: 0000000000000000 R12: 0000000000000001
[    3.074976] R13: ffff88c4019b80c8 R14: ffff88c401a7e028 R15: ffff88c401a73400
[    3.074976] FS:  0000000000000000(0000) GS:ffff88d38aad5000(0000) knlGS:0000000000000000
[    3.074976] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.074976] CR2: 0000000000000021 CR3: 0000000f66222001 CR4: 0000000000770ef0
[    3.074976] PKRU: 55555554
[    3.074976] Call Trace:
[    3.074976]  <TASK>
[    3.074976]  ? pci_pm_reset+0x39/0x180
[    3.074976]  pci_init_reset_methods+0x52/0x80
[    3.074976]  pci_device_add+0x215/0x5d0
[    3.074976]  pci_scan_single_device+0xa2/0xe0
[    3.074976]  pci_scan_slot+0x66/0x1c0
[    3.074976]  ? klist_next+0x145/0x150
[    3.074976]  pci_scan_child_bus_extend+0x3a/0x290
[    3.074976]  acpi_pci_root_create+0x236/0x2a0
[    3.074976]  pci_acpi_scan_root+0x19b/0x1f0
[    3.074976]  acpi_pci_root_add+0x1a5/0x370
[    3.074976]  acpi_bus_attach+0x1a8/0x290
[    3.074976]  ? __pfx_acpi_dev_for_one_check+0x10/0x10
[    3.074976]  device_for_each_child+0x4b/0x80
[    3.074976]  acpi_dev_for_each_child+0x28/0x40
[    3.074976]  ? __pfx_acpi_bus_attach+0x10/0x10
[    3.074976]  acpi_bus_attach+0x7a/0x290
[    3.074976]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[    3.074976]  ? __pfx_acpi_dev_for_one_check+0x10/0x10
[    3.074976]  device_for_each_child+0x4b/0x80
[    3.074976]  acpi_dev_for_each_child+0x28/0x40
[    3.074976]  ? __pfx_acpi_bus_attach+0x10/0x10
[    3.074976]  acpi_bus_attach+0x7a/0x290
[    3.074976]  acpi_bus_scan+0x6a/0x1c0
[    3.074976]  ? __pfx_acpi_init+0x10/0x10
[    3.074976]  acpi_scan_init+0xdc/0x280
[    3.074976]  ? __pfx_acpi_init+0x10/0x10
[    3.074976]  acpi_init+0x218/0x530
[    3.074976]  do_one_initcall+0x40/0x310
[    3.074976]  kernel_init_freeable+0x2fe/0x450
[    3.074976]  ? __pfx_kernel_init+0x10/0x10
[    3.074976]  kernel_init+0x16/0x1d0
[    3.074976]  ret_from_fork+0x1ab/0x1e0
[    3.074976]  ? __pfx_kernel_init+0x10/0x10
[    3.074976]  ret_from_fork_asm+0x1a/0x30
[    3.074976]  </TASK>
[    3.074976] Modules linked in:
[    3.074976] CR2: 0000000000000021
[    3.074976] ---[ end trace 0000000000000000 ]---
[    3.074976] RIP: 0010:pci_reset_bus_function+0xdf/0x160
Re: [PATCH v3 03/10] PCI: Allow per function PCI slots
Posted by Farhan Ali 2 weeks, 2 days ago
On 9/15/2025 11:52 PM, Cédric Le Goater wrote:
> Hello Ali,
>
> On 9/11/25 20:33, Farhan Ali wrote:
>> On s390 systems, which use a machine level hypervisor, PCI devices are
>> always accessed through a form of PCI pass-through which fundamentally
>> operates on a per PCI function granularity. This is also reflected in 
>> the
>> s390 PCI hotplug driver which creates hotplug slots for individual PCI
>> functions. Its reset_slot() function, which is a wrapper for
>> zpci_hot_reset_device(), thus also resets individual functions.
>>
>> Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot 
>> object
>> to multifunction devices. This approach worked fine on s390 systems that
>> only exposed virtual functions as individual PCI domains to the 
>> operating
>> system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
>> s390 supports exposing the topology of multifunction PCI devices by
>> grouping them in a shared PCI domain. When attempting to reset a 
>> function
>> through the hotplug driver, the shared slot assignment causes the wrong
>> function to be reset instead of the intended one. It also leaks 
>> memory as
>> we do create a pci_slot object for the function, but don't correctly 
>> free
>> it in pci_slot_release().
>>
>> Add a flag for struct pci_slot to allow per function PCI slots for
>> functions managed through a hypervisor, which exposes individual PCI
>> functions while retaining the topology.
>>
>> Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
>> Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>   drivers/pci/hotplug/s390_pci_hpc.c | 10 ++++++++--
>>   drivers/pci/pci.c                  |  4 +++-
>>   drivers/pci/slot.c                 | 14 +++++++++++---
>>   include/linux/pci.h                |  1 +
>>   4 files changed, 23 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/pci/hotplug/s390_pci_hpc.c 
>> b/drivers/pci/hotplug/s390_pci_hpc.c
>> index d9996516f49e..8b547de464bf 100644
>> --- a/drivers/pci/hotplug/s390_pci_hpc.c
>> +++ b/drivers/pci/hotplug/s390_pci_hpc.c
>> @@ -126,14 +126,20 @@ static const struct hotplug_slot_ops 
>> s390_hotplug_slot_ops = {
>>     int zpci_init_slot(struct zpci_dev *zdev)
>>   {
>> +    int ret;
>>       char name[SLOT_NAME_SIZE];
>>       struct zpci_bus *zbus = zdev->zbus;
>>         zdev->hotplug_slot.ops = &s390_hotplug_slot_ops;
>>         snprintf(name, SLOT_NAME_SIZE, "%08x", zdev->fid);
>> -    return pci_hp_register(&zdev->hotplug_slot, zbus->bus,
>> -                   zdev->devfn, name);
>> +    ret = pci_hp_register(&zdev->hotplug_slot, zbus->bus,
>> +                zdev->devfn, name);
>> +    if (ret)
>> +        return ret;
>> +
>> +    zdev->hotplug_slot.pci_slot->per_func_slot = 1;
>> +    return 0;
>>   }
>>     void zpci_exit_slot(struct zpci_dev *zdev)
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 3994fa82df68..70296d3b1cfc 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -5061,7 +5061,9 @@ static int pci_reset_hotplug_slot(struct 
>> hotplug_slot *hotplug, bool probe)
>>     static int pci_dev_reset_slot_function(struct pci_dev *dev, bool 
>> probe)
>>   {
>> -    if (dev->multifunction || dev->subordinate || !dev->slot ||
>> +    if (dev->multifunction && !dev->slot->per_func_slot)
>> +        return -ENOTTY;
>> +    if (dev->subordinate || !dev->slot ||
>>           dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
>>           return -ENOTTY;
>>   diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
>> index 50fb3eb595fe..51ee59e14393 100644
>> --- a/drivers/pci/slot.c
>> +++ b/drivers/pci/slot.c
>> @@ -63,6 +63,14 @@ static ssize_t cur_speed_read_file(struct pci_slot 
>> *slot, char *buf)
>>       return bus_speed_read(slot->bus->cur_bus_speed, buf);
>>   }
>>   +static bool pci_dev_matches_slot(struct pci_dev *dev, struct 
>> pci_slot *slot)
>> +{
>> +    if (slot->per_func_slot)
>> +        return dev->devfn == slot->number;
>> +
>> +    return PCI_SLOT(dev->devfn) == slot->number;
>> +}
>> +
>>   static void pci_slot_release(struct kobject *kobj)
>>   {
>>       struct pci_dev *dev;
>> @@ -73,7 +81,7 @@ static void pci_slot_release(struct kobject *kobj)
>>         down_read(&pci_bus_sem);
>>       list_for_each_entry(dev, &slot->bus->devices, bus_list)
>> -        if (PCI_SLOT(dev->devfn) == slot->number)
>> +        if (pci_dev_matches_slot(dev, slot))
>>               dev->slot = NULL;
>>       up_read(&pci_bus_sem);
>>   @@ -166,7 +174,7 @@ void pci_dev_assign_slot(struct pci_dev *dev)
>>         mutex_lock(&pci_slot_mutex);
>>       list_for_each_entry(slot, &dev->bus->slots, list)
>> -        if (PCI_SLOT(dev->devfn) == slot->number)
>> +        if (pci_dev_matches_slot(dev, slot))
>>               dev->slot = slot;
>>       mutex_unlock(&pci_slot_mutex);
>>   }
>> @@ -285,7 +293,7 @@ struct pci_slot *pci_create_slot(struct pci_bus 
>> *parent, int slot_nr,
>>         down_read(&pci_bus_sem);
>>       list_for_each_entry(dev, &parent->devices, bus_list)
>> -        if (PCI_SLOT(dev->devfn) == slot_nr)
>> +        if (pci_dev_matches_slot(dev, slot))
>>               dev->slot = slot;
>>       up_read(&pci_bus_sem);
>>   diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 59876de13860..9265f32d9786 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -78,6 +78,7 @@ struct pci_slot {
>>       struct list_head    list;        /* Node in list of slots */
>>       struct hotplug_slot    *hotplug;    /* Hotplug info (move here) */
>>       unsigned char        number;        /* PCI_SLOT(pci_dev->devfn) */
>> +    unsigned int        per_func_slot:1; /* Allow per function slot */
>>       struct kobject        kobj;
>>   };
>
> This change generates a kernel oops on x86_64. It can be reproduced in 
> a VM.
>
>
> C.
>
> [    3.073990] BUG: kernel NULL pointer dereference, address: 
> 0000000000000021
> [    3.074976] #PF: supervisor read access in kernel mode
> [    3.074976] #PF: error_code(0x0000) - not-present page
> [    3.074976] PGD 0 P4D 0
> [    3.074976] Oops: Oops: 0000 [#1] SMP NOPTI
> [    3.074976] CPU: 18 UID: 0 PID: 1 Comm: swapper/0 Not tainted 
> 6.17.0-rc6-clg-dirty #8 PREEMPT(voluntary)
> [    3.074976] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 
> 4.2 12/17/2024
> [    3.074976] RIP: 0010:pci_reset_bus_function+0xdf/0x160
> [    3.074976] Code: 4e 08 00 00 40 0f 85 83 00 00 00 48 8b 78 18 e8 
> 27 9d ff ff 83 f8 e7 74 17 48 83 c4 08 5b 5d 41 5c c3 cc cc cc cc 48 
> 8b 43 30 <f6> 40 21 01 75 b6 48 8b 53 10 48 83 7a 10 00 74 5e 48 83 7b 
> 18 00
> [    3.074976] RSP: 0000:ffffcd808007b9a8 EFLAGS: 00010202
> [    3.074976] RAX: 0000000000000000 RBX: ffff88c4019b8000 RCX: 
> 0000000000000000
> [    3.074976] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 
> ffff88c4019b8000
> [    3.074976] RBP: 0000000000000001 R08: 0000000000000002 R09: 
> ffffcd808007b99c
> [    3.074976] R10: ffffcd808007b950 R11: 0000000000000000 R12: 
> 0000000000000001
> [    3.074976] R13: ffff88c4019b80c8 R14: ffff88c401a7e028 R15: 
> ffff88c401a73400
> [    3.074976] FS:  0000000000000000(0000) GS:ffff88d38aad5000(0000) 
> knlGS:0000000000000000
> [    3.074976] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    3.074976] CR2: 0000000000000021 CR3: 0000000f66222001 CR4: 
> 0000000000770ef0
> [    3.074976] PKRU: 55555554
> [    3.074976] Call Trace:
> [    3.074976]  <TASK>
> [    3.074976]  ? pci_pm_reset+0x39/0x180
> [    3.074976]  pci_init_reset_methods+0x52/0x80
> [    3.074976]  pci_device_add+0x215/0x5d0
> [    3.074976]  pci_scan_single_device+0xa2/0xe0
> [    3.074976]  pci_scan_slot+0x66/0x1c0
> [    3.074976]  ? klist_next+0x145/0x150
> [    3.074976]  pci_scan_child_bus_extend+0x3a/0x290
> [    3.074976]  acpi_pci_root_create+0x236/0x2a0
> [    3.074976]  pci_acpi_scan_root+0x19b/0x1f0
> [    3.074976]  acpi_pci_root_add+0x1a5/0x370
> [    3.074976]  acpi_bus_attach+0x1a8/0x290
> [    3.074976]  ? __pfx_acpi_dev_for_one_check+0x10/0x10
> [    3.074976]  device_for_each_child+0x4b/0x80
> [    3.074976]  acpi_dev_for_each_child+0x28/0x40
> [    3.074976]  ? __pfx_acpi_bus_attach+0x10/0x10
> [    3.074976]  acpi_bus_attach+0x7a/0x290
> [    3.074976]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> [    3.074976]  ? __pfx_acpi_dev_for_one_check+0x10/0x10
> [    3.074976]  device_for_each_child+0x4b/0x80
> [    3.074976]  acpi_dev_for_each_child+0x28/0x40
> [    3.074976]  ? __pfx_acpi_bus_attach+0x10/0x10
> [    3.074976]  acpi_bus_attach+0x7a/0x290
> [    3.074976]  acpi_bus_scan+0x6a/0x1c0
> [    3.074976]  ? __pfx_acpi_init+0x10/0x10
> [    3.074976]  acpi_scan_init+0xdc/0x280
> [    3.074976]  ? __pfx_acpi_init+0x10/0x10
> [    3.074976]  acpi_init+0x218/0x530
> [    3.074976]  do_one_initcall+0x40/0x310
> [    3.074976]  kernel_init_freeable+0x2fe/0x450
> [    3.074976]  ? __pfx_kernel_init+0x10/0x10
> [    3.074976]  kernel_init+0x16/0x1d0
> [    3.074976]  ret_from_fork+0x1ab/0x1e0
> [    3.074976]  ? __pfx_kernel_init+0x10/0x10
> [    3.074976]  ret_from_fork_asm+0x1a/0x30
> [    3.074976]  </TASK>
> [    3.074976] Modules linked in:
> [    3.074976] CR2: 0000000000000021
> [    3.074976] ---[ end trace 0000000000000000 ]---
> [    3.074976] RIP: 0010:pci_reset_bus_function+0xdf/0x160
>
Hi Cedric,

Thanks for pointing this out. I missed that dev->slot could be NULL and 
so the per_func_slot check should be done after the check for 
!dev->slot. I tried this change on top of the patch in an x86_64 VM and 
was able to boot the VM without the oops.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 70296d3b1cfc..3631f7faa0cf 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5061,10 +5061,9 @@ static int pci_reset_hotplug_slot(struct 
hotplug_slot *hotplug, bool probe)

  static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
  {
-       if (dev->multifunction && !dev->slot->per_func_slot)
-               return -ENOTTY;
         if (dev->subordinate || !dev->slot ||
-           dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
+           dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
+           (dev->multifunction && !dev->slot->per_func_slot))
                 return -ENOTTY;



Thanks
Farhan

Re: [PATCH v3 03/10] PCI: Allow per function PCI slots
Posted by Cédric Le Goater 2 weeks, 1 day ago
Hi Farhan,

> Hi Cedric,
> 
> Thanks for pointing this out. I missed that dev->slot could be NULL and so the per_func_slot check should be done after the check for !dev->slot. I tried this change on top of the patch in an x86_64 VM and was able to boot the VM without the oops.
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 70296d3b1cfc..3631f7faa0cf 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5061,10 +5061,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
> 
>   static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
>   {
> -       if (dev->multifunction && !dev->slot->per_func_slot)
> -               return -ENOTTY;
>          if (dev->subordinate || !dev->slot ||
> -           dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
> +           dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
> +           (dev->multifunction && !dev->slot->per_func_slot))
>                  return -ENOTTY;
All good.

I have pushed the Linux branch I use for vfio :
  
    https://github.com/legoater/linux/commits/vfio/

These commits have small changes :

     PCI: Allow per function PCI slots
     vfio-pci/zdev: Add a device feature for error information

Thanks,

C.


Re: [PATCH v3 03/10] PCI: Allow per function PCI slots
Posted by Farhan Ali 2 weeks, 1 day ago
On 9/16/2025 11:21 PM, Cédric Le Goater wrote:
> Hi Farhan,
>
>> Hi Cedric,
>>
>> Thanks for pointing this out. I missed that dev->slot could be NULL 
>> and so the per_func_slot check should be done after the check for 
>> !dev->slot. I tried this change on top of the patch in an x86_64 VM 
>> and was able to boot the VM without the oops.
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 70296d3b1cfc..3631f7faa0cf 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -5061,10 +5061,9 @@ static int pci_reset_hotplug_slot(struct 
>> hotplug_slot *hotplug, bool probe)
>>
>>   static int pci_dev_reset_slot_function(struct pci_dev *dev, bool 
>> probe)
>>   {
>> -       if (dev->multifunction && !dev->slot->per_func_slot)
>> -               return -ENOTTY;
>>          if (dev->subordinate || !dev->slot ||
>> -           dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
>> +           dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
>> +           (dev->multifunction && !dev->slot->per_func_slot))
>>                  return -ENOTTY;
> All good.
>
> I have pushed the Linux branch I use for vfio :
>
>    https://github.com/legoater/linux/commits/vfio/
>
> These commits have small changes :
>
>     PCI: Allow per function PCI slots
>     vfio-pci/zdev: Add a device feature for error information
>
> Thanks,
>
> C.
>
>
Hi Cedric,

Thanks again for your help in reviewing the patches.

Thanks
Farhan


Re: [PATCH v3 03/10] PCI: Allow per function PCI slots
Posted by Benjamin Block 2 weeks, 6 days ago
On Thu, Sep 11, 2025 at 11:33:00AM -0700, Farhan Ali wrote:
> On s390 systems, which use a machine level hypervisor, PCI devices are
> always accessed through a form of PCI pass-through which fundamentally
> operates on a per PCI function granularity. This is also reflected in the
> s390 PCI hotplug driver which creates hotplug slots for individual PCI
> functions. Its reset_slot() function, which is a wrapper for
> zpci_hot_reset_device(), thus also resets individual functions.
> 
> Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
> to multifunction devices. This approach worked fine on s390 systems that
> only exposed virtual functions as individual PCI domains to the operating
> system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
> s390 supports exposing the topology of multifunction PCI devices by
> grouping them in a shared PCI domain. When attempting to reset a function
> through the hotplug driver, the shared slot assignment causes the wrong
> function to be reset instead of the intended one. It also leaks memory as
> we do create a pci_slot object for the function, but don't correctly free
> it in pci_slot_release().
> 
> Add a flag for struct pci_slot to allow per function PCI slots for
> functions managed through a hypervisor, which exposes individual PCI
> functions while retaining the topology.
> 
> Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")

Stable tag?
Reseting the wrong PCI function sounds bad enough.

-- 
Best Regards, Benjamin Block        /        Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH    /   https://www.ibm.com/privacy
Vors. Aufs.-R.: Wolfgang Wendt         /        Geschäftsführung: David Faller
Sitz der Ges.: Böblingen     /    Registergericht: AmtsG Stuttgart, HRB 243294
Re: [PATCH v3 03/10] PCI: Allow per function PCI slots
Posted by Farhan Ali 2 weeks, 6 days ago
On 9/12/2025 5:23 AM, Benjamin Block wrote:
> On Thu, Sep 11, 2025 at 11:33:00AM -0700, Farhan Ali wrote:
>> On s390 systems, which use a machine level hypervisor, PCI devices are
>> always accessed through a form of PCI pass-through which fundamentally
>> operates on a per PCI function granularity. This is also reflected in the
>> s390 PCI hotplug driver which creates hotplug slots for individual PCI
>> functions. Its reset_slot() function, which is a wrapper for
>> zpci_hot_reset_device(), thus also resets individual functions.
>>
>> Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
>> to multifunction devices. This approach worked fine on s390 systems that
>> only exposed virtual functions as individual PCI domains to the operating
>> system.  Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
>> s390 supports exposing the topology of multifunction PCI devices by
>> grouping them in a shared PCI domain. When attempting to reset a function
>> through the hotplug driver, the shared slot assignment causes the wrong
>> function to be reset instead of the intended one. It also leaks memory as
>> we do create a pci_slot object for the function, but don't correctly free
>> it in pci_slot_release().
>>
>> Add a flag for struct pci_slot to allow per function PCI slots for
>> functions managed through a hypervisor, which exposes individual PCI
>> functions while retaining the topology.
>>
>> Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
> Stable tag?
> Reseting the wrong PCI function sounds bad enough.

That's a fair point. This is definitely broken for NETD devices 
(https://www.ibm.com/docs/en/linux-on-systems?topic=express-direct-mode). 
Will cc stable.

Thanks
Farhan