On s390 systems, which use a machine level hypervisor, PCI devices are
always accessed through a form of PCI pass-through which fundamentally
operates on a per PCI function granularity. This is also reflected in the
s390 PCI hotplug driver which creates hotplug slots for individual PCI
functions. Its reset_slot() function, which is a wrapper for
zpci_hot_reset_device(), thus also resets individual functions.

Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
to multifunction devices. This approach worked fine on s390 systems that
only exposed virtual functions as individual PCI domains to the operating
system. Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
s390 supports exposing the topology of multifunction PCI devices by
grouping them in a shared PCI domain. When attempting to reset a function
through the hotplug driver, the shared slot assignment causes the wrong
function to be reset instead of the intended one. It also leaks memory as
we do create a pci_slot object for the function, but don't correctly free
it in pci_slot_release().

Add a flag for struct pci_slot to allow per function PCI slots for
functions managed through a hypervisor, which exposes individual PCI
functions while retaining the topology.

Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
drivers/pci/hotplug/s390_pci_hpc.c | 10 ++++++++--
drivers/pci/pci.c | 4 +++-
drivers/pci/slot.c | 14 +++++++++++---
include/linux/pci.h | 1 +
4 files changed, 23 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/hotplug/s390_pci_hpc.c b/drivers/pci/hotplug/s390_pci_hpc.c
index d9996516f49e..8b547de464bf 100644
--- a/drivers/pci/hotplug/s390_pci_hpc.c
+++ b/drivers/pci/hotplug/s390_pci_hpc.c
@@ -126,14 +126,20 @@ static const struct hotplug_slot_ops s390_hotplug_slot_ops = {

 int zpci_init_slot(struct zpci_dev *zdev)
 {
+	int ret;
 	char name[SLOT_NAME_SIZE];
 	struct zpci_bus *zbus = zdev->zbus;

 	zdev->hotplug_slot.ops = &s390_hotplug_slot_ops;

 	snprintf(name, SLOT_NAME_SIZE, "%08x", zdev->fid);
-	return pci_hp_register(&zdev->hotplug_slot, zbus->bus,
-			       zdev->devfn, name);
+	ret = pci_hp_register(&zdev->hotplug_slot, zbus->bus,
+				zdev->devfn, name);
+	if (ret)
+		return ret;
+
+	zdev->hotplug_slot.pci_slot->per_func_slot = 1;
+	return 0;
 }

 void zpci_exit_slot(struct zpci_dev *zdev)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 3994fa82df68..70296d3b1cfc 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5061,7 +5061,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)

 static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
 {
-	if (dev->multifunction || dev->subordinate || !dev->slot ||
+	if (dev->multifunction && !dev->slot->per_func_slot)
+		return -ENOTTY;
+	if (dev->subordinate || !dev->slot ||
 	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
 		return -ENOTTY;

diff --git a/drivers/pci/slot.c b/drivers/pci/slot.c
index 50fb3eb595fe..51ee59e14393 100644
--- a/drivers/pci/slot.c
+++ b/drivers/pci/slot.c
@@ -63,6 +63,14 @@ static ssize_t cur_speed_read_file(struct pci_slot *slot, char *buf)
 	return bus_speed_read(slot->bus->cur_bus_speed, buf);
 }

+static bool pci_dev_matches_slot(struct pci_dev *dev, struct pci_slot *slot)
+{
+	if (slot->per_func_slot)
+		return dev->devfn == slot->number;
+
+	return PCI_SLOT(dev->devfn) == slot->number;
+}
+
 static void pci_slot_release(struct kobject *kobj)
 {
 	struct pci_dev *dev;
@@ -73,7 +81,7 @@ static void pci_slot_release(struct kobject *kobj)

 	down_read(&pci_bus_sem);
 	list_for_each_entry(dev, &slot->bus->devices, bus_list)
-		if (PCI_SLOT(dev->devfn) == slot->number)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = NULL;
 	up_read(&pci_bus_sem);

@@ -166,7 +174,7 @@ void pci_dev_assign_slot(struct pci_dev *dev)

 	mutex_lock(&pci_slot_mutex);
 	list_for_each_entry(slot, &dev->bus->slots, list)
-		if (PCI_SLOT(dev->devfn) == slot->number)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = slot;
 	mutex_unlock(&pci_slot_mutex);
 }
@@ -285,7 +293,7 @@ struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,

 	down_read(&pci_bus_sem);
 	list_for_each_entry(dev, &parent->devices, bus_list)
-		if (PCI_SLOT(dev->devfn) == slot_nr)
+		if (pci_dev_matches_slot(dev, slot))
 			dev->slot = slot;
 	up_read(&pci_bus_sem);

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 59876de13860..9265f32d9786 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -78,6 +78,7 @@ struct pci_slot {
 	struct list_head list;		/* Node in list of slots */
 	struct hotplug_slot *hotplug;	/* Hotplug info (move here) */
 	unsigned char number;		/* PCI_SLOT(pci_dev->devfn) */
+	unsigned int per_func_slot:1;	/* Allow per function slot */
 	struct kobject kobj;
 };

--
2.43.0
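
The matching rule that the slot.c hunk introduces can be tried outside the kernel. The following is a minimal userspace sketch, not kernel code: PCI_DEVFN() and PCI_SLOT() use the same devfn encoding as the kernel macros, while mock_dev, mock_slot, mock_dev_matches_slot() and count_matching_functions() are hypothetical stand-ins invented for illustration. It assumes, as in zpci_init_slot(), that a per-function slot stores the full devfn in its number field.

```c
#include <assert.h>
#include <stdbool.h>

/* Same encoding as the kernel macros: devfn = (device << 3) | function. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)

/* Hypothetical stand-ins for struct pci_dev and struct pci_slot. */
struct mock_dev {
	unsigned int devfn;
};

struct mock_slot {
	unsigned char number;
	unsigned int per_func_slot:1;
};

/* Mirrors the logic of pci_dev_matches_slot() from the patch. */
static bool mock_dev_matches_slot(const struct mock_dev *dev,
				  const struct mock_slot *slot)
{
	if (slot->per_func_slot)
		return dev->devfn == slot->number;

	return PCI_SLOT(dev->devfn) == slot->number;
}

/* Count how many of the eight functions of device 0 match the slot. */
static int count_matching_functions(const struct mock_slot *slot)
{
	int func, matches = 0;

	for (func = 0; func < 8; func++) {
		struct mock_dev dev = { .devfn = PCI_DEVFN(0, func) };

		if (mock_dev_matches_slot(&dev, slot))
			matches++;
	}
	return matches;
}
```

With a shared slot (number 0, flag clear) all eight functions of device 0 match, which is the behaviour the commit message describes as resetting the wrong function. With the flag set and number holding devfn 0x01, only function 1 matches.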
Hello Ali,

On 9/11/25 20:33, Farhan Ali wrote:
> Add a flag for struct pci_slot to allow per function PCI slots for
> functions managed through a hypervisor, which exposes individual PCI
> functions while retaining the topology.

This change generates a kernel oops on x86_64. It can be reproduced in a VM.

C.

[ 3.073990] BUG: kernel NULL pointer dereference, address: 0000000000000021
[ 3.074976] #PF: supervisor read access in kernel mode
[ 3.074976] #PF: error_code(0x0000) - not-present page
[ 3.074976] PGD 0 P4D 0
[ 3.074976] Oops: Oops: 0000 [#1] SMP NOPTI
[ 3.074976] CPU: 18 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.17.0-rc6-clg-dirty #8 PREEMPT(voluntary)
[ 3.074976] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 4.2 12/17/2024
[ 3.074976] RIP: 0010:pci_reset_bus_function+0xdf/0x160
[ 3.074976] Code: 4e 08 00 00 40 0f 85 83 00 00 00 48 8b 78 18 e8 27 9d ff ff 83 f8 e7 74 17 48 83 c4 08 5b 5d 41 5c c3 cc cc cc cc 48 8b 43 30 <f6> 40 21 01 75 b6 48 8b 53 10 48 83 7a 10 00 74 5e 48 83 7b 18 00
[ 3.074976] RSP: 0000:ffffcd808007b9a8 EFLAGS: 00010202
[ 3.074976] RAX: 0000000000000000 RBX: ffff88c4019b8000 RCX: 0000000000000000
[ 3.074976] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88c4019b8000
[ 3.074976] RBP: 0000000000000001 R08: 0000000000000002 R09: ffffcd808007b99c
[ 3.074976] R10: ffffcd808007b950 R11: 0000000000000000 R12: 0000000000000001
[ 3.074976] R13: ffff88c4019b80c8 R14: ffff88c401a7e028 R15: ffff88c401a73400
[ 3.074976] FS: 0000000000000000(0000) GS:ffff88d38aad5000(0000) knlGS:0000000000000000
[ 3.074976] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.074976] CR2: 0000000000000021 CR3: 0000000f66222001 CR4: 0000000000770ef0
[ 3.074976] PKRU: 55555554
[ 3.074976] Call Trace:
[ 3.074976] <TASK>
[ 3.074976] ? pci_pm_reset+0x39/0x180
[ 3.074976] pci_init_reset_methods+0x52/0x80
[ 3.074976] pci_device_add+0x215/0x5d0
[ 3.074976] pci_scan_single_device+0xa2/0xe0
[ 3.074976] pci_scan_slot+0x66/0x1c0
[ 3.074976] ? klist_next+0x145/0x150
[ 3.074976] pci_scan_child_bus_extend+0x3a/0x290
[ 3.074976] acpi_pci_root_create+0x236/0x2a0
[ 3.074976] pci_acpi_scan_root+0x19b/0x1f0
[ 3.074976] acpi_pci_root_add+0x1a5/0x370
[ 3.074976] acpi_bus_attach+0x1a8/0x290
[ 3.074976] ? __pfx_acpi_dev_for_one_check+0x10/0x10
[ 3.074976] device_for_each_child+0x4b/0x80
[ 3.074976] acpi_dev_for_each_child+0x28/0x40
[ 3.074976] ? __pfx_acpi_bus_attach+0x10/0x10
[ 3.074976] acpi_bus_attach+0x7a/0x290
[ 3.074976] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 3.074976] ? __pfx_acpi_dev_for_one_check+0x10/0x10
[ 3.074976] device_for_each_child+0x4b/0x80
[ 3.074976] acpi_dev_for_each_child+0x28/0x40
[ 3.074976] ? __pfx_acpi_bus_attach+0x10/0x10
[ 3.074976] acpi_bus_attach+0x7a/0x290
[ 3.074976] acpi_bus_scan+0x6a/0x1c0
[ 3.074976] ? __pfx_acpi_init+0x10/0x10
[ 3.074976] acpi_scan_init+0xdc/0x280
[ 3.074976] ? __pfx_acpi_init+0x10/0x10
[ 3.074976] acpi_init+0x218/0x530
[ 3.074976] do_one_initcall+0x40/0x310
[ 3.074976] kernel_init_freeable+0x2fe/0x450
[ 3.074976] ? __pfx_kernel_init+0x10/0x10
[ 3.074976] kernel_init+0x16/0x1d0
[ 3.074976] ret_from_fork+0x1ab/0x1e0
[ 3.074976] ? __pfx_kernel_init+0x10/0x10
[ 3.074976] ret_from_fork_asm+0x1a/0x30
[ 3.074976] </TASK>
[ 3.074976] Modules linked in:
[ 3.074976] CR2: 0000000000000021
[ 3.074976] ---[ end trace 0000000000000000 ]---
[ 3.074976] RIP: 0010:pci_reset_bus_function+0xdf/0x160
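
The faulting address in the oops above looks consistent with reading per_func_slot through a NULL dev->slot. The Code bytes `48 8b 43 30 <f6> 40 21 01` decode to a load from offset 0x30 of RBX into RAX, followed by a one-byte test of bit 0 at offset 0x21 of RAX. With RAX = 0 that is address 0x21, matching CR2. The sketch below is a userspace approximation, not the kernel definition: mock_list_head and mock_pci_slot are hypothetical stand-ins that copy only the leading members of struct pci_slot shown in the pci.h hunk, and the layout it reports assumes an LP64 little-endian target with the usual GCC/Clang bit-field allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct mock_list_head {
	struct mock_list_head *next, *prev;
};

/* Leading members of struct pci_slot as shown in the patch; kobj omitted. */
struct mock_pci_slot {
	void *bus;			/* struct pci_bus * in the kernel */
	struct mock_list_head list;
	void *hotplug;			/* struct hotplug_slot * in the kernel */
	unsigned char number;
	unsigned int per_func_slot:1;
};

/* Return the offset of the byte that stores the per_func_slot bit. */
static size_t per_func_slot_byte_offset(void)
{
	struct mock_pci_slot slot;
	const unsigned char *bytes = (const unsigned char *)&slot;
	size_t i;

	memset(&slot, 0, sizeof(slot));
	slot.per_func_slot = 1;

	/* Only the byte holding the bit-field is non-zero now. */
	for (i = 0; i < sizeof(slot); i++)
		if (bytes[i])
			return i;

	return sizeof(slot);
}
```

Under those assumptions number sits at offset 0x20 and the flag lands in the byte right after it, which is where the faulting read points when the slot pointer is NULL.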
On 9/15/2025 11:52 PM, Cédric Le Goater wrote:
> This change generates a kernel oops on x86_64. It can be reproduced in
> a VM.
>
> C.
>
> [ 3.073990] BUG: kernel NULL pointer dereference, address: 0000000000000021
> [ 3.074976] RIP: 0010:pci_reset_bus_function+0xdf/0x160

Hi Cedric,

Thanks for pointing this out. I missed that dev->slot could be NULL and so
the per_func_slot check should be done after the check for !dev->slot. I
tried this change on top of the patch in an x86_64 VM and was able to boot
the VM without the oops.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 70296d3b1cfc..3631f7faa0cf 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5061,10 +5061,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)

 static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
 {
-	if (dev->multifunction && !dev->slot->per_func_slot)
-		return -ENOTTY;
 	if (dev->subordinate || !dev->slot ||
-	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
+	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
+	    (dev->multifunction && !dev->slot->per_func_slot))
 		return -ENOTTY;

Thanks
Farhan

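
The reordered condition relies on short-circuit evaluation of `||`: once !dev->slot is true, the later per_func_slot operand is never evaluated. Below is a minimal userspace sketch of that ordering. The mock_slot and mock_dev types and mock_slot_reset_allowed() are hypothetical simplifications (plain booleans replace dev->subordinate and the PCI_DEV_FLAGS_NO_BUS_RESET test), so this shows the control flow only and is not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pci_slot and struct pci_dev. */
struct mock_slot {
	unsigned int per_func_slot:1;
};

struct mock_dev {
	bool multifunction;
	bool has_subordinate;	/* stands in for dev->subordinate != NULL */
	bool no_bus_reset;	/* stands in for PCI_DEV_FLAGS_NO_BUS_RESET */
	struct mock_slot *slot;
};

/*
 * Same operand order as the fixed check: the NULL test on dev->slot comes
 * before any operand that dereferences it.
 */
static int mock_slot_reset_allowed(const struct mock_dev *dev)
{
	if (dev->has_subordinate || !dev->slot || dev->no_bus_reset ||
	    (dev->multifunction && !dev->slot->per_func_slot))
		return -ENOTTY;

	return 0;
}
```

A multifunction device without a slot is rejected without touching the slot pointer, a multifunction device on a shared slot is still rejected, and only a multifunction device on a per-function slot passes the check.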
Hi Farhan,

> Hi Cedric,
>
> Thanks for pointing this out. I missed that dev->slot could be NULL and so
> the per_func_slot check should be done after the check for !dev->slot. I
> tried this change on top of the patch in an x86_64 VM and was able to boot
> the VM without the oops.
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 70296d3b1cfc..3631f7faa0cf 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5061,10 +5061,9 @@ static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, bool probe)
>
>  static int pci_dev_reset_slot_function(struct pci_dev *dev, bool probe)
>  {
> -	if (dev->multifunction && !dev->slot->per_func_slot)
> -		return -ENOTTY;
>  	if (dev->subordinate || !dev->slot ||
> -	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
> +	    dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
> +	    (dev->multifunction && !dev->slot->per_func_slot))
>  		return -ENOTTY;

All good.

I have pushed the Linux branch I use for vfio :

  https://github.com/legoater/linux/commits/vfio/

These commits have small changes :

  PCI: Allow per function PCI slots
  vfio-pci/zdev: Add a device feature for error information

Thanks,

C.

On 9/16/2025 11:21 PM, Cédric Le Goater wrote:
> All good.
>
> I have pushed the Linux branch I use for vfio :
>
>   https://github.com/legoater/linux/commits/vfio/
>
> These commits have small changes :
>
>   PCI: Allow per function PCI slots
>   vfio-pci/zdev: Add a device feature for error information
>
> Thanks,
>
> C.

Hi Cedric,

Thanks again for your help in reviewing the patches.

Thanks
Farhan

On Thu, Sep 11, 2025 at 11:33:00AM -0700, Farhan Ali wrote:
> On s390 systems, which use a machine level hypervisor, PCI devices are
> always accessed through a form of PCI pass-through which fundamentally
> operates on a per PCI function granularity. This is also reflected in the
> s390 PCI hotplug driver which creates hotplug slots for individual PCI
> functions. Its reset_slot() function, which is a wrapper for
> zpci_hot_reset_device(), thus also resets individual functions.
>
> Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
> to multifunction devices. This approach worked fine on s390 systems that
> only exposed virtual functions as individual PCI domains to the operating
> system. Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
> s390 supports exposing the topology of multifunction PCI devices by
> grouping them in a shared PCI domain. When attempting to reset a function
> through the hotplug driver, the shared slot assignment causes the wrong
> function to be reset instead of the intended one. It also leaks memory as
> we do create a pci_slot object for the function, but don't correctly free
> it in pci_slot_release().
>
> Add a flag for struct pci_slot to allow per function PCI slots for
> functions managed through a hypervisor, which exposes individual PCI
> functions while retaining the topology.
>
> Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")

Stable tag?
Resetting the wrong PCI function sounds bad enough.

--
Best Regards, Benjamin Block / Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH / https://www.ibm.com/privacy
Vors. Aufs.-R.: Wolfgang Wendt / Geschäftsführung: David Faller
Sitz der Ges.: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

On 9/12/2025 5:23 AM, Benjamin Block wrote:
> On Thu, Sep 11, 2025 at 11:33:00AM -0700, Farhan Ali wrote:
>> On s390 systems, which use a machine level hypervisor, PCI devices are
>> always accessed through a form of PCI pass-through which fundamentally
>> operates on a per PCI function granularity. This is also reflected in the
>> s390 PCI hotplug driver which creates hotplug slots for individual PCI
>> functions. Its reset_slot() function, which is a wrapper for
>> zpci_hot_reset_device(), thus also resets individual functions.
>>
>> Currently, the kernel's PCI_SLOT() macro assigns the same pci_slot object
>> to multifunction devices. This approach worked fine on s390 systems that
>> only exposed virtual functions as individual PCI domains to the operating
>> system. Since commit 44510d6fa0c0 ("s390/pci: Handling multifunctions")
>> s390 supports exposing the topology of multifunction PCI devices by
>> grouping them in a shared PCI domain. When attempting to reset a function
>> through the hotplug driver, the shared slot assignment causes the wrong
>> function to be reset instead of the intended one. It also leaks memory as
>> we do create a pci_slot object for the function, but don't correctly free
>> it in pci_slot_release().
>>
>> Add a flag for struct pci_slot to allow per function PCI slots for
>> functions managed through a hypervisor, which exposes individual PCI
>> functions while retaining the topology.
>>
>> Fixes: 44510d6fa0c0 ("s390/pci: Handling multifunctions")
> Stable tag?
> Resetting the wrong PCI function sounds bad enough.

That's a fair point. This is definitely broken for NETD devices
(https://www.ibm.com/docs/en/linux-on-systems?topic=express-direct-mode).
Will cc stable.

Thanks
Farhan