When SCSI_SCAN_ASYNC is enabled (either via config or via the command
line), adding a device to the bus and enabling it kicks off an async
host scan:

  scsi_scan_host+0x21/0x1f0
  virtscsi_probe+0x2dd/0x350
  ..
  driver_probe_device+0x19/0x80
  ...
  driver_probe_device+0x19/0x80
  pci_bus_add_device+0x53/0x80
  pci_bus_add_devices+0x2b/0x70
  ...

which schedules a job for the async scan. That, however, breaks if
there is more than one SCSI host behind the bridge, since
acpiphp_check_bridge() walks over all slots and tries to enable each
of them regardless of whether they were already enabled.
As a result, the bridge might be reconfigured several times, which
can trigger the following sequence:

  [cpu 0] acpiphp_check_bridge()
  [cpu 0] enable_slot(a)
  [cpu 0] configure bridge
  [cpu 0] pci_bus_add_devices() -> scsi_scan_host(a1)
  [cpu 0] enable_slot(b)
  ...
  [cpu 1] do_scsi_scan_host(a1) <- async job scheduled for slot a
  ...
  [cpu 0] configure bridge <- temporarily disables bridge

and cause do_scsi_scan_host() to fail.
The same race affects SHPC (but it manages to avoid hitting the race
thanks to the 1 sec delay when enabling a slot).
To cover the case of single device hotplug (one device at a time), do
not attempt to enable slots that have already been enabled.

Fixes: 40613da52b13 ("PCI: acpiphp: Reassign resources on bridge if necessary")
Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
Reported-by: iona Ebner <f.ebner@proxmox.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
drivers/pci/hotplug/acpiphp_glue.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index 601129772b2d..6b11609927d6 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -722,7 +722,9 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
trim_stale_devices(dev);
/* configure all functions */
- enable_slot(slot, true);
+ if (slot->flags != SLOT_ENABLED) {
+ enable_slot(slot, true);
+ }
} else {
disable_slot(slot);
}
--
2.39.3
On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
> @@ -722,7 +722,9 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> trim_stale_devices(dev);
>
> /* configure all functions */
> - enable_slot(slot, true);
> + if (slot->flags != SLOT_ENABLED) {
> + enable_slot(slot, true);
> + }
Shouldn't this be following the acpiphp_enable_slot() pattern, that is
if (!(slot->flags & SLOT_ENABLED))
enable_slot(slot, true);
Also the braces are redundant.
> } else {
> disable_slot(slot);
> }
> --
On Wed, Dec 13, 2023 at 2:01 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@redhat.com> wrote:
> > @@ -722,7 +722,9 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge)
> > trim_stale_devices(dev);
> >
> > /* configure all functions */
> > - enable_slot(slot, true);
> > + if (slot->flags != SLOT_ENABLED) {
> > + enable_slot(slot, true);
> > + }
>
> Shouldn't this be following the acpiphp_enable_slot() pattern, that is
>
> if (!(slot->flags & SLOT_ENABLED))
> enable_slot(slot, true);
>
> Also the braces are redundant.
I'll fix up on respin if Bjorn is fine with the approach in general.
Patches need respin anyways to fix botched up white spacing.
>
> > } else {
> > disable_slot(slot);
> > }
> > --
>
Am 13.12.23 um 01:36 schrieb Igor Mammedov:
> Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reported-by: iona Ebner <f.ebner@proxmox.com>
Missing an F here ;)
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Thank you! Works for me:
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
On Wed, 13 Dec 2023 10:47:27 +0100
Fiona Ebner <f.ebner@proxmox.com> wrote:
> Am 13.12.23 um 01:36 schrieb Igor Mammedov:
> > Reported-by: Dongli Zhang <dongli.zhang@oracle.com>
> > Reported-by: iona Ebner <f.ebner@proxmox.com>
>
> Missing an F here ;)
Sorry for copypaste mistake, I'll fix it up on the next submission.
>
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>
> Thank you! Works for me:
>
> Tested-by: Fiona Ebner <f.ebner@proxmox.com>
>