drivers/cxl/core/port.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
In cxl_acpi_probe, we add dports and uports to host bridges iteratively:
- bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_dport);
- bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_uport);
Simultaneously, as ports are probed, memdev endpoints can also be
probed. This creates a race condition, where an endpoint can perceive
its path to the root being broken in devm_cxl_enumerate_ports.
The memdev/endpoint probe will see a heirarchy that may look like:
mem1
parent => 0000:c1:00.0
parent => 0000:c0:01.1
parent->parent => NULL
This results in find_cxl_port() returning NULL (since the port hasn't
been associated with the host bridge yet), and add_port_attach_ep
fails because the grandparent's grandparent is NULL.
When the latter condition is detected, the comments suggest:
/*
* The iteration reached the topology root without finding the
* CXL-root 'cxl_port' on a previous iteration, fail for now to
* be re-probed after platform driver attaches.
*/
This case results in an -ENXIO; however, a re-probe never occurs. Change
this return condition to -EPROBE_DEFER to explicitly cause a reprobe.
(v2: additional debug information)
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/cxl/core/port.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 1d5007e3795a..d6bebf70d142 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1553,7 +1553,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
*/
dev_dbg(&cxlmd->dev, "%s is a root dport\n",
dev_name(dport_dev));
- return -ENXIO;
+ return -EPROBE_DEFER;
}
parent_port = find_cxl_port(dparent, &parent_dport);
--
2.43.0
Gregory Price wrote: > In cxl_acpi_probe, we add dports and uports to host bridges iteratively: > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_dport); > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_uport); > > Simultaneously, as ports are probed, memdev endpoints can also be > probed. This creates a race condition, where an endpoint can perceive > its path to the root being broken in devm_cxl_enumerate_ports. > > The memdev/endpoint probe will see a heirarchy that may look like: > mem1 > parent => 0000:c1:00.0 > parent => 0000:c0:01.1 > parent->parent => NULL > > This results in find_cxl_port() returning NULL (since the port hasn't > been associated with the host bridge yet), and add_port_attach_ep > fails because the grandparent's grandparent is NULL. > > When the latter condition is detected, the comments suggest: > /* > * The iteration reached the topology root without finding the > * CXL-root 'cxl_port' on a previous iteration, fail for now to > * be re-probed after platform driver attaches. > */ > > This case results in an -ENXIO; however, a re-probe never occurs. Change > this return condition to -EPROBE_DEFER to explicitly cause a reprobe. Ok, thanks for the additional debug. Like we chatted on the CXL Discord I think this is potentially pointing to a bug in bus_rescan_devices() where it checks dev->driver without holding the lock. Can you give this fix a try to see if it also resolves the issue? Effectively, cxl_bus_rescan() is always needed in case the cxl_acpi driver loads waaaay after deferred probing has given up, and if this works then EPROBE_DEFER can remain limited to cases where it is absolutely known that no other device_attach() kick is coming to save the day. -- 8< -- diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index 1d5007e3795a..6c0cd94888a3 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -2088,11 +2088,18 @@ static void cxl_bus_remove(struct device *dev) static struct workqueue_struct *cxl_bus_wq; -static void cxl_bus_rescan_queue(struct work_struct *w) +static int attach_device(struct device *dev, void *data) { - int rc = bus_rescan_devices(&cxl_bus_type); + int rc = device_attach(dev); + + dev_vdbg(dev, "rescan: %s\n", rc ? "attach" : "detached"); - pr_debug("CXL bus rescan result: %d\n", rc); + return 0; +} + +static void cxl_bus_rescan_queue(struct work_struct *w) +{ + bus_for_each_dev(&cxl_bus_type, NULL, NULL, attach_device); } void cxl_bus_rescan(void)
On Fri, Oct 04, 2024 at 05:08:03PM -0700, Dan Williams wrote: > Gregory Price wrote: > > In cxl_acpi_probe, we add dports and uports to host bridges iteratively: > > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_dport); > > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_uport); > > > > Simultaneously, as ports are probed, memdev endpoints can also be > > probed. This creates a race condition, where an endpoint can perceive > > its path to the root being broken in devm_cxl_enumerate_ports. > > > > The memdev/endpoint probe will see a heirarchy that may look like: > > mem1 > > parent => 0000:c1:00.0 > > parent => 0000:c0:01.1 > > parent->parent => NULL > > > > This results in find_cxl_port() returning NULL (since the port hasn't > > been associated with the host bridge yet), and add_port_attach_ep > > fails because the grandparent's grandparent is NULL. > > > > When the latter condition is detected, the comments suggest: > > /* > > * The iteration reached the topology root without finding the > > * CXL-root 'cxl_port' on a previous iteration, fail for now to > > * be re-probed after platform driver attaches. > > */ > > > > This case results in an -ENXIO; however, a re-probe never occurs. Change > > this return condition to -EPROBE_DEFER to explicitly cause a reprobe. > > Ok, thanks for the additional debug. Like we chatted on the CXL Discord > I think this is potentially pointing to a bug in bus_rescan_devices() > where it checks dev->driver without holding the lock. > > Can you give this fix a try to see if it also resolves the issue? > Effectively, cxl_bus_rescan() is always needed in case the cxl_acpi > driver loads waaaay after deferred probing has given up, and if this > works then EPROBE_DEFER can remain limited to cases where it is > absolutely known that no other device_attach() kick is coming to save > the day. > Funny enough, not only did it not work, but now i get neither endpoint lol $ ls /sys/bus/cxl/devices/ decoder0.0 decoder1.0 decoder2.0 decoder3.1 mem0 port1 port3 root0 decoder0.1 decoder1.1 decoder3.0 decoder4.0 mem1 port2 port4 w/ reprobe patch # ls /sys/bus/cxl/devices decoder0.0 decoder1.0 decoder2.0 decoder3.1 decoder5.0 decoder6.0 endpoint5 mem0 port1 port3 root0 decoder0.1 decoder1.1 decoder3.0 decoder4.0 decoder5.1 decoder6.1 endpoint6 mem1 port2 port4 > -- 8< -- > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c > index 1d5007e3795a..6c0cd94888a3 100644 > --- a/drivers/cxl/core/port.c > +++ b/drivers/cxl/core/port.c > @@ -2088,11 +2088,18 @@ static void cxl_bus_remove(struct device *dev) > > static struct workqueue_struct *cxl_bus_wq; > > -static void cxl_bus_rescan_queue(struct work_struct *w) > +static int attach_device(struct device *dev, void *data) > { > - int rc = bus_rescan_devices(&cxl_bus_type); > + int rc = device_attach(dev); > + > + dev_vdbg(dev, "rescan: %s\n", rc ? "attach" : "detached"); > > - pr_debug("CXL bus rescan result: %d\n", rc); > + return 0; > +} > + > +static void cxl_bus_rescan_queue(struct work_struct *w) > +{ > + bus_for_each_dev(&cxl_bus_type, NULL, NULL, attach_device); > } > > void cxl_bus_rescan(void)
Gregory Price wrote: > On Fri, Oct 04, 2024 at 05:08:03PM -0700, Dan Williams wrote: > > Gregory Price wrote: > > > In cxl_acpi_probe, we add dports and uports to host bridges iteratively: > > > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_dport); > > > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_uport); > > > > > > Simultaneously, as ports are probed, memdev endpoints can also be > > > probed. This creates a race condition, where an endpoint can perceive > > > its path to the root being broken in devm_cxl_enumerate_ports. > > > > > > The memdev/endpoint probe will see a heirarchy that may look like: > > > mem1 > > > parent => 0000:c1:00.0 > > > parent => 0000:c0:01.1 > > > parent->parent => NULL > > > > > > This results in find_cxl_port() returning NULL (since the port hasn't > > > been associated with the host bridge yet), and add_port_attach_ep > > > fails because the grandparent's grandparent is NULL. > > > > > > When the latter condition is detected, the comments suggest: > > > /* > > > * The iteration reached the topology root without finding the > > > * CXL-root 'cxl_port' on a previous iteration, fail for now to > > > * be re-probed after platform driver attaches. > > > */ > > > > > > This case results in an -ENXIO; however, a re-probe never occurs. Change > > > this return condition to -EPROBE_DEFER to explicitly cause a reprobe. > > > > Ok, thanks for the additional debug. Like we chatted on the CXL Discord > > I think this is potentially pointing to a bug in bus_rescan_devices() > > where it checks dev->driver without holding the lock. > > > > Can you give this fix a try to see if it also resolves the issue? > > Effectively, cxl_bus_rescan() is always needed in case the cxl_acpi > > driver loads waaaay after deferred probing has given up, and if this > > works then EPROBE_DEFER can remain limited to cases where it is > > absolutely known that no other device_attach() kick is coming to save > > the day. > > > > Funny enough, not only did it not work, but now i get neither endpoint lol > > $ ls /sys/bus/cxl/devices/ > decoder0.0 decoder1.0 decoder2.0 decoder3.1 mem0 port1 port3 root0 > decoder0.1 decoder1.1 decoder3.0 decoder4.0 mem1 port2 port4 > > w/ reprobe patch > > # ls /sys/bus/cxl/devices > decoder0.0 decoder1.0 decoder2.0 decoder3.1 decoder5.0 decoder6.0 endpoint5 mem0 port1 port3 root0 > decoder0.1 decoder1.1 decoder3.0 decoder4.0 decoder5.1 decoder6.1 endpoint6 mem1 port2 port4 Such a violent result is interesting! While I would have preferred an "all fixed!" version of "interesting", making something fail reliably and completely is at least indication that the starting point was more fragile than expected. Now, I tried to get cxl_test to fail by probing memdevs asynchronously alongside the ACPI root driver. That did reveal a use-after-free bug when out-of-order shutdown causes some cleanup to be skipped, will send a separate fixup for that, but it failed to reproduce the bug you are seeing. The incremental fix here, that applies on top of the device_attach() fixup, is to make sure that all cxl_port instances registered by cxl_acpi_probe() are active before cxl_bus_rescan() runs. That can only be guaranteed when the cxl_port driver is pre-loaded. If you are running from an initial ram disk make sure cxl_port.ko is included there... -- 8< -- diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index 82b78e331d8e..432b7cfd12a8 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -924,6 +924,13 @@ static void __exit cxl_acpi_exit(void) /* load before dax_hmem sees 'Soft Reserved' CXL ranges */ subsys_initcall(cxl_acpi_init); + +/* + * Arrange for host-bridge ports to be active synchronous with + * cxl_acpi_probe() exit. + */ +MODULE_SOFTDEP("pre: cxl_port"); + module_exit(cxl_acpi_exit); MODULE_DESCRIPTION("CXL ACPI: Platform Support"); MODULE_LICENSE("GPL v2");
On Tue, Oct 08, 2024 at 03:37:45PM -0700, Dan Williams wrote: > Gregory Price wrote: > > On Fri, Oct 04, 2024 at 05:08:03PM -0700, Dan Williams wrote: > > > Gregory Price wrote: > > > > In cxl_acpi_probe, we add dports and uports to host bridges iteratively: > > > > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_dport); > > > > - bus_for_each_dev(adev->dev.bus, NULL, root_port, add_host_bridge_uport); > > > > > > > > Simultaneously, as ports are probed, memdev endpoints can also be > > > > probed. This creates a race condition, where an endpoint can perceive > > > > its path to the root being broken in devm_cxl_enumerate_ports. > > > > > > > > The memdev/endpoint probe will see a heirarchy that may look like: > > > > mem1 > > > > parent => 0000:c1:00.0 > > > > parent => 0000:c0:01.1 > > > > parent->parent => NULL > > > > > > > > This results in find_cxl_port() returning NULL (since the port hasn't > > > > been associated with the host bridge yet), and add_port_attach_ep > > > > fails because the grandparent's grandparent is NULL. > > > > > > > > When the latter condition is detected, the comments suggest: > > > > /* > > > > * The iteration reached the topology root without finding the > > > > * CXL-root 'cxl_port' on a previous iteration, fail for now to > > > > * be re-probed after platform driver attaches. > > > > */ > > > > > > > > This case results in an -ENXIO; however, a re-probe never occurs. Change > > > > this return condition to -EPROBE_DEFER to explicitly cause a reprobe. > > > > > > Ok, thanks for the additional debug. Like we chatted on the CXL Discord > > > I think this is potentially pointing to a bug in bus_rescan_devices() > > > where it checks dev->driver without holding the lock. > > > > > > Can you give this fix a try to see if it also resolves the issue? > > > Effectively, cxl_bus_rescan() is always needed in case the cxl_acpi > > > driver loads waaaay after deferred probing has given up, and if this > > > works then EPROBE_DEFER can remain limited to cases where it is > > > absolutely known that no other device_attach() kick is coming to save > > > the day. > > > > > > > Funny enough, not only did it not work, but now i get neither endpoint lol > > > > $ ls /sys/bus/cxl/devices/ > > decoder0.0 decoder1.0 decoder2.0 decoder3.1 mem0 port1 port3 root0 > > decoder0.1 decoder1.1 decoder3.0 decoder4.0 mem1 port2 port4 > > > > w/ reprobe patch > > > > # ls /sys/bus/cxl/devices > > decoder0.0 decoder1.0 decoder2.0 decoder3.1 decoder5.0 decoder6.0 endpoint5 mem0 port1 port3 root0 > > decoder0.1 decoder1.1 decoder3.0 decoder4.0 decoder5.1 decoder6.1 endpoint6 mem1 port2 port4 > > Such a violent result is interesting! While I would have preferred an > "all fixed!" version of "interesting", making something fail reliably and > completely is at least indication that the starting point was more > fragile than expected. > > Now, I tried to get cxl_test to fail by probing memdevs asynchronously > alongside the ACPI root driver. That did reveal a use-after-free bug > when out-of-order shutdown causes some cleanup to be skipped, will send > a separate fixup for that, but it failed to reproduce the bug you are > seeing. > > The incremental fix here, that applies on top of the device_attach() > fixup, is to make sure that all cxl_port instances registered by > cxl_acpi_probe() are active before cxl_bus_rescan() runs. That can only > be guaranteed when the cxl_port driver is pre-loaded. > :< unfortunately the result is... the same but also different? the worst kind of result # ls /sys/bus/cxl/devices/ decoder0.0 decoder0.2 decoder2.0 decoder3.1 mem0 port1 port3 root0 decoder0.1 decoder1.0 decoder3.0 decoder4.0 mem1 port2 port4 apparently sometimes we get decoder0.2 and sometimes we get decoder1.1 after a reboot we get it the other way # ls /sys/bus/cxl/devices/ decoder0.0 decoder0.2 decoder2.0 decoder3.1 mem0 port1 port3 root0 decoder0.1 decoder1.0 decoder3.0 decoder4.0 mem1 port2 port4 in my experimental kernel where everything "works" I get something like # ls /sys/bus/cxl/devices/ dax_region0 decoder0.0 decoder1.0 decoder3.1 decoder5.1 endpoint5 mem1 port3 region1 dax_region1 decoder0.1 decoder2.0 decoder4.0 decoder6.0 endpoint6 port1 port4 region2 dax_region2 decoder0.2 decoder3.0 decoder5.0 decoder6.1 mem0 port2 region0 root0 which does not have decoder1.1 - but i haven't confirmed whether this is consistent or not. I don't know what causes 0.N vs 1.M, maybe you do? but in the first result (past email) you can see the reprobe patch also generated decoder1.1 instead of decoder0.2 probably not related, more fun side quests! Regardless, the additional patch did not resolve the problem :< > If you are running from an initial ram disk make sure cxl_port.ko is > included there... > > -- 8< -- > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > index 82b78e331d8e..432b7cfd12a8 100644 > --- a/drivers/cxl/acpi.c > +++ b/drivers/cxl/acpi.c > @@ -924,6 +924,13 @@ static void __exit cxl_acpi_exit(void) > > /* load before dax_hmem sees 'Soft Reserved' CXL ranges */ > subsys_initcall(cxl_acpi_init); > + > +/* > + * Arrange for host-bridge ports to be active synchronous with > + * cxl_acpi_probe() exit. > + */ > +MODULE_SOFTDEP("pre: cxl_port"); > + > module_exit(cxl_acpi_exit); > MODULE_DESCRIPTION("CXL ACPI: Platform Support"); > MODULE_LICENSE("GPL v2");
© 2016 - 2024 Red Hat, Inc.