When CXL subsystem adds a cxl port to a hierarchy, there is a small
window where the new port becomes visible before it is bound to a
driver. This happens because device_add() adds a device to bus device
list before bus_probe_device() binds it to a driver.
So if two cxl memdevs are trying to add a dport to a same port via
devm_cxl_enumerate_ports(), the second cxl memdev may observe the port
and attempt to add a dport, but fails because the port has not yet been
attached to cxl port driver.
the sequence is like:
CPU 0 CPU 1
devm_cxl_enumerate_ports()
# port not found, add it
add_port_attach_ep()
# hold the parent port lock
# to add the new port
devm_cxl_create_port()
device_add()
# Add dev to bus devs list
bus_add_device()
devm_cxl_enumerate_ports()
# found the port
find_cxl_port_by_uport()
# hold port lock to add a dport
device_lock(the port)
find_or_add_dport()
cxl_port_add_dport()
return -ENXIO because port->dev.driver is NULL
device_unlock(the port)
bus_probe_device()
# hold the port lock
# for attaching
device_lock(the port)
attaching the new port
device_unlock(the port)
To fix this race, require that dport addition holds the parent port lock
of the target port. The CXL subsystem already requires holding the
parent port lock while attaching a new port. Therefore, successfully
acquiring the parent port lock ganrantees that port attaching has
completed.
Fixes: 4f06d81e7c6a ("cxl: Defer dport allocation for switch ports")
Signed-off-by: Li Ming <ming.li@zohomail.com>
---
drivers/cxl/core/port.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 54f72452fb06..fef2fe913e1f 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1817,8 +1817,12 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
/*
* RP port enumerated by cxl_acpi without dport will
* have the dport added here.
+ *
+ * Hold the parent port lock here to in case that the
+ * port can be observed but has not been attached yet.
*/
- scoped_guard(device, &port->dev) {
+ scoped_guard(device, &parent_port_of(port)->dev) {
+ guard(device)(&port->dev);
dport = find_or_add_dport(port, dport_dev);
if (IS_ERR(dport)) {
if (PTR_ERR(dport) == -EAGAIN)
--
2.43.0
Li Ming wrote: > When CXL subsystem adds a cxl port to a hierarchy, there is a small > window where the new port becomes visible before it is bound to a > driver. This happens because device_add() adds a device to bus device > list before bus_probe_device() binds it to a driver. > So if two cxl memdevs are trying to add a dport to a same port via > devm_cxl_enumerate_ports(), the second cxl memdev may observe the port > and attempt to add a dport, but fails because the port has not yet been > attached to cxl port driver. > the sequence is like: > > CPU 0 CPU 1 > devm_cxl_enumerate_ports() > # port not found, add it > add_port_attach_ep() > # hold the parent port lock > # to add the new port > devm_cxl_create_port() > device_add() > # Add dev to bus devs list > bus_add_device() > devm_cxl_enumerate_ports() > # found the port > find_cxl_port_by_uport() > # hold port lock to add a dport > device_lock(the port) > find_or_add_dport() > cxl_port_add_dport() > return -ENXIO because port->dev.driver is NULL > device_unlock(the port) > bus_probe_device() > # hold the port lock > # for attaching > device_lock(the port) > attaching the new port > device_unlock(the port) > > To fix this race, require that dport addition holds the parent port lock > of the target port. The CXL subsystem already requires holding the > parent port lock while attaching a new port. Therefore, successfully > acquiring the parent port lock ganrantees that port attaching has > completed. Are you seeing this case fail permanently? The expectation is that the one that loses the race iterates up the topology and retries. So yes, you can lose this race once, but not twice is the expectation.
From: <dan.j.williams@intel.com> To: "Li Ming"<ming.li@zohomail.com>, <dave@stgolabs.net>, <jonathan.cameron@huawei.com>, <dave.jiang@intel.com>, <alison.schofield@intel.com>, <vishal.l.verma@intel.com>, <ira.weiny@intel.com>, <dan.j.williams@intel.com> Cc: <linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>, "Li Ming"<ming.li@zohomail.com> Date: Tue, 03 Feb 2026 08:07:13 +0800 Subject: Re: [PATCH 2/2] cxl/core: Hold grandparent port lock while dport adding > Li Ming wrote: > > When CXL subsystem adds a cxl port to a hierarchy, there is a small > > window where the new port becomes visible before it is bound to a > > driver. This happens because device_add() adds a device to bus device > > list before bus_probe_device() binds it to a driver. > > So if two cxl memdevs are trying to add a dport to a same port via > > devm_cxl_enumerate_ports(), the second cxl memdev may observe the port > > and attempt to add a dport, but fails because the port has not yet been > > attached to cxl port driver. > > the sequence is like: > > > > CPU 0 CPU 1 > > devm_cxl_enumerate_ports() > > # port not found, add it > > add_port_attach_ep() > > # hold the parent port lock > > # to add the new port > > devm_cxl_create_port() > > device_add() > > # Add dev to bus devs list > > bus_add_device() > > devm_cxl_enumerate_ports() > > # found the port > > find_cxl_port_by_uport() > > # hold port lock to add a dport > > device_lock(the port) > > find_or_add_dport() > > cxl_port_add_dport() > > return -ENXIO because port->dev.driver is NULL > > device_unlock(the port) > > bus_probe_device() > > # hold the port lock > > # for attaching > > device_lock(the port) > > attaching the new port > > device_unlock(the port) > > > > To fix this race, require that dport addition holds the parent port lock > > of the target port. The CXL subsystem already requires holding the > > parent port lock while attaching a new port. Therefore, successfully > > acquiring the parent port lock ganrantees that port attaching has > > completed. > > Are you seeing this case fail permanently? The expectation is that the > one that loses the race iterates up the topology and retries. > > So yes, you can lose this race once, but not twice is the expectation. > Hi Dan, My understanding is that would not trigger enumeration retry, because enumerating ports flow retries the enumeration only when find_or_add_dport() returns a -EAGAIN. but the port's driver checking in cxl_port_add_dport() returns a -ENXIO, so it makes devm_cxl_enumerate_ports() failure directly. Ming
Li Ming wrote: [..] > > > > > > To fix this race, require that dport addition holds the parent port lock > > > of the target port. The CXL subsystem already requires holding the > > > parent port lock while attaching a new port. Therefore, successfully > > > acquiring the parent port lock ganrantees that port attaching has > > > completed. > > > > Are you seeing this case fail permanently? The expectation is that the > > one that loses the race iterates up the topology and retries. > > > > So yes, you can lose this race once, but not twice is the expectation. > > > Hi Dan, > > My understanding is that would not trigger enumeration retry, because > enumerating ports flow retries the enumeration only when > find_or_add_dport() returns a -EAGAIN. but the port's driver checking > in cxl_port_add_dport() returns a -ENXIO, so it makes > devm_cxl_enumerate_ports() failure directly. Ah, true, my mental model was still stuck in the old top-down dport enumeration scheme. So, yes, we do need to make sure that switch port creation does not race port lookup. However, I think the scoped_guard() tends to make code less readable and in this case hides the opportunity for more comments to explain what is happening. I also think, per that observation from Jonathan, that we can save the cxl_bus_resan() violence by taking the CXL platform device lock. So, please move the locking internal to find_or_add_dport(), so that plain guard() can be used. Add comments for the fact that devm_cxl_create_port() and the CXL platform init path need to be flushed by taking the device lock. And explain why the device to lock is different dependening on whether the parent_port is the cxl_root or a descendant port.
From: <dan.j.williams@intel.com> To: "Li Ming"<ming.li@zohomail.com>, "danjwilliams"<dan.j.williams@intel.com> Cc: "dave"<dave@stgolabs.net>, "jonathan.cameron"<jonathan.cameron@huawei.com>, "dave.jiang"<dave.jiang@intel.com>, "alison.schofield"<alison.schofield@intel.com>, "vishal.l.verma"<vishal.l.verma@intel.com>, "ira.weiny"<ira.weiny@intel.com>, "linux-cxl"<linux-cxl@vger.kernel.org>, "linux-kernel"<linux-kernel@vger.kernel.org> Date: Wed, 04 Feb 2026 06:25:00 +0800 Subject: Re: [PATCH 2/2] cxl/core: Hold grandparent port lock while dport adding > Li Ming wrote: > [..] > > > > > > > > To fix this race, require that dport addition holds the parent port lock > > > > of the target port. The CXL subsystem already requires holding the > > > > parent port lock while attaching a new port. Therefore, successfully > > > > acquiring the parent port lock ganrantees that port attaching has > > > > completed. > > > > > > Are you seeing this case fail permanently? The expectation is that the > > > one that loses the race iterates up the topology and retries. > > > > > > So yes, you can lose this race once, but not twice is the expectation. > > > > > Hi Dan, > > > > My understanding is that would not trigger enumeration retry, because > > enumerating ports flow retries the enumeration only when > > find_or_add_dport() returns a -EAGAIN. but the port's driver checking > > in cxl_port_add_dport() returns a -ENXIO, so it makes > > devm_cxl_enumerate_ports() failure directly. > > Ah, true, my mental model was still stuck in the old top-down dport > enumeration scheme. > > So, yes, we do need to make sure that switch port creation does not race > port lookup. However, I think the scoped_guard() tends to make code less > readable and in this case hides the opportunity for more comments to > explain what is happening. > > I also think, per that observation from Jonathan, that we can save the > cxl_bus_resan() violence by taking the CXL platform device lock. > > So, please move the locking internal to find_or_add_dport(), so that > plain guard() can be used. Add comments for the fact that > devm_cxl_create_port() and the CXL platform init path need to be flushed > by taking the device lock. And explain why the device to lock is > different dependening on whether the parent_port is the cxl_root or a > descendant port. > > Got it, will do that in V2, also remove PATCH #1 as you mentioned. Ming
On Sun, Feb 01, 2026 at 05:30:02PM +0800, Li Ming wrote: > When CXL subsystem adds a cxl port to a hierarchy, there is a small > window where the new port becomes visible before it is bound to a > driver. This happens because device_add() adds a device to bus device > list before bus_probe_device() binds it to a driver. > So if two cxl memdevs are trying to add a dport to a same port via > devm_cxl_enumerate_ports(), the second cxl memdev may observe the port > and attempt to add a dport, but fails because the port has not yet been > attached to cxl port driver. > the sequence is like: > > CPU 0 CPU 1 > devm_cxl_enumerate_ports() > # port not found, add it > add_port_attach_ep() > # hold the parent port lock > # to add the new port > devm_cxl_create_port() > device_add() > # Add dev to bus devs list > bus_add_device() > devm_cxl_enumerate_ports() > # found the port > find_cxl_port_by_uport() > # hold port lock to add a dport > device_lock(the port) > find_or_add_dport() > cxl_port_add_dport() > return -ENXIO because port->dev.driver is NULL > device_unlock(the port) > bus_probe_device() > # hold the port lock > # for attaching > device_lock(the port) > attaching the new port > device_unlock(the port) > > To fix this race, require that dport addition holds the parent port lock > of the target port. The CXL subsystem already requires holding the > parent port lock while attaching a new port. Therefore, successfully > acquiring the parent port lock ganrantees that port attaching has > completed. > With just a a cursory look, I'm immediately concerned that you're fixing a race condition with a lock inversion. Can you guarantee the following is not happening Thread A Thread B ---------------------------- lock(parent) lock(port) lock(port) lock(parent) ~Gregory
From: Gregory Price <gourry@gourry.net> To: "Li Ming"<ming.li@zohomail.com> Cc: <dave@stgolabs.net>, <jonathan.cameron@huawei.com>, <dave.jiang@intel.com>, <alison.schofield@intel.com>, <vishal.l.verma@intel.com>, <ira.weiny@intel.com>, <dan.j.williams@intel.com>, <linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org> Date: Tue, 03 Feb 2026 00:31:45 +0800 Subject: Re: [PATCH 2/2] cxl/core: Hold grandparent port lock while dport adding > On Sun, Feb 01, 2026 at 05:30:02PM +0800, Li Ming wrote: > > When CXL subsystem adds a cxl port to a hierarchy, there is a small > > window where the new port becomes visible before it is bound to a > > driver. This happens because device_add() adds a device to bus device > > list before bus_probe_device() binds it to a driver. > > So if two cxl memdevs are trying to add a dport to a same port via > > devm_cxl_enumerate_ports(), the second cxl memdev may observe the port > > and attempt to add a dport, but fails because the port has not yet been > > attached to cxl port driver. > > the sequence is like: > > > > CPU 0 CPU 1 > > devm_cxl_enumerate_ports() > > # port not found, add it > > add_port_attach_ep() > > # hold the parent port lock > > # to add the new port > > devm_cxl_create_port() > > device_add() > > # Add dev to bus devs list > > bus_add_device() > > devm_cxl_enumerate_ports() > > # found the port > > find_cxl_port_by_uport() > > # hold port lock to add a dport > > device_lock(the port) > > find_or_add_dport() > > cxl_port_add_dport() > > return -ENXIO because port->dev.driver is NULL > > device_unlock(the port) > > bus_probe_device() > > # hold the port lock > > # for attaching > > device_lock(the port) > > attaching the new port > > device_unlock(the port) > > > > To fix this race, require that dport addition holds the parent port lock > > of the target port. The CXL subsystem already requires holding the > > parent port lock while attaching a new port. Therefore, successfully > > acquiring the parent port lock ganrantees that port attaching has > > completed. > > > > With just a a cursory look, I'm immediately concerned that you're fixing > a race condition with a lock inversion. > > Can you guarantee the following is not happening > > Thread A Thread B > ---------------------------- > lock(parent) lock(port) > lock(port) lock(parent) > > ~Gregory > Hi Gregory, I think no other scenario where driver needs to hold a child port lock together with its parent port lock than during a new port attaching or a port removal. After re-reading a new port attaching and a port removal flows, I believe both operations acquire the parent port first before acquiring a child port lock. So I think the thread B case would not happen. If I miss something please correct me. Thanks Ming
On Sun, 1 Feb 2026 17:30:02 +0800
Li Ming <ming.li@zohomail.com> wrote:
> When CXL subsystem adds a cxl port to a hierarchy, there is a small
> window where the new port becomes visible before it is bound to a
> driver. This happens because device_add() adds a device to bus device
> list before bus_probe_device() binds it to a driver.
> So if two cxl memdevs are trying to add a dport to a same port via
> devm_cxl_enumerate_ports(), the second cxl memdev may observe the port
> and attempt to add a dport, but fails because the port has not yet been
> attached to cxl port driver.
> the sequence is like:
>
> CPU 0 CPU 1
> devm_cxl_enumerate_ports()
> # port not found, add it
> add_port_attach_ep()
> # hold the parent port lock
> # to add the new port
> devm_cxl_create_port()
> device_add()
> # Add dev to bus devs list
> bus_add_device()
> devm_cxl_enumerate_ports()
> # found the port
Indenting not consistent here as this call is in devm_cxl_enumerate_ports()
> find_cxl_port_by_uport()
> # hold port lock to add a dport
> device_lock(the port)
> find_or_add_dport()
> cxl_port_add_dport()
> return -ENXIO because port->dev.driver is NULL
> device_unlock(the port)
> bus_probe_device()
> # hold the port lock
> # for attaching
> device_lock(the port)
> attaching the new port
> device_unlock(the port)
>
> To fix this race, require that dport addition holds the parent port lock
> of the target port. The CXL subsystem already requires holding the
> parent port lock while attaching a new port. Therefore, successfully
> acquiring the parent port lock ganrantees that port attaching has
Spell check. Guarantees
> completed.
>
> Fixes: 4f06d81e7c6a ("cxl: Defer dport allocation for switch ports")
> Signed-off-by: Li Ming <ming.li@zohomail.com>
Analysis looks reasonable to me, but I'm not hugely confident on this
one so would like others to take a close look as well.
Question inline.
> ---
> drivers/cxl/core/port.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 54f72452fb06..fef2fe913e1f 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1817,8 +1817,12 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> /*
> * RP port enumerated by cxl_acpi without dport will
> * have the dport added here.
> + *
> + * Hold the parent port lock here to in case that the
> + * port can be observed but has not been attached yet.
> */
> - scoped_guard(device, &port->dev) {
> + scoped_guard(device, &parent_port_of(port)->dev) {
I'm nervous about whether this is the right lock. For unregister_port()
(which is easier to track down that the add path locking) the lock
taken depends on where the port is that is being unregistered.
Specifically root ports are unregistered under parent->uport_dev, not
parent->dev.
> + guard(device)(&port->dev);
> dport = find_or_add_dport(port, dport_dev);
> if (IS_ERR(dport)) {
> if (PTR_ERR(dport) == -EAGAIN)
From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: "Li Ming"<ming.li@zohomail.com>
Cc: <dave@stgolabs.net>, <dave.jiang@intel.com>, <alison.schofield@intel.com>, <vishal.l.verma@intel.com>, <ira.weiny@intel.com>, <dan.j.williams@intel.com>, <linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Date: Mon, 02 Feb 2026 23:39:24 +0800
Subject: Re: [PATCH 2/2] cxl/core: Hold grandparent port lock while dport adding
> On Sun, 1 Feb 2026 17:30:02 +0800
> Li Ming <ming.li@zohomail.com> wrote:
>
> > When CXL subsystem adds a cxl port to a hierarchy, there is a small
> > window where the new port becomes visible before it is bound to a
> > driver. This happens because device_add() adds a device to bus device
> > list before bus_probe_device() binds it to a driver.
> > So if two cxl memdevs are trying to add a dport to a same port via
> > devm_cxl_enumerate_ports(), the second cxl memdev may observe the port
> > and attempt to add a dport, but fails because the port has not yet been
> > attached to cxl port driver.
> > the sequence is like:
> >
> > CPU 0 CPU 1
> > devm_cxl_enumerate_ports()
> > # port not found, add it
> > add_port_attach_ep()
> > # hold the parent port lock
> > # to add the new port
> > devm_cxl_create_port()
> > device_add()
> > # Add dev to bus devs list
> > bus_add_device()
> > devm_cxl_enumerate_ports()
> > # found the port
>
> Indenting not consistent here as this call is in devm_cxl_enumerate_ports()
Thanks for that, will fix it in next version.
>
> > find_cxl_port_by_uport()
> > # hold port lock to add a dport
> > device_lock(the port)
> > find_or_add_dport()
> > cxl_port_add_dport()
> > return -ENXIO because port->dev.driver is NULL
> > device_unlock(the port)
> > bus_probe_device()
> > # hold the port lock
> > # for attaching
> > device_lock(the port)
> > attaching the new port
> > device_unlock(the port)
> >
> > To fix this race, require that dport addition holds the parent port lock
> > of the target port. The CXL subsystem already requires holding the
> > parent port lock while attaching a new port. Therefore, successfully
> > acquiring the parent port lock ganrantees that port attaching has
>
> Spell check. Guarantees
Got it.
>
> > completed.
> >
> > Fixes: 4f06d81e7c6a ("cxl: Defer dport allocation for switch ports")
> > Signed-off-by: Li Ming <ming.li@zohomail.com>
>
> Analysis looks reasonable to me, but I'm not hugely confident on this
> one so would like others to take a close look as well.
> Question inline.
>
>
> > ---
> > drivers/cxl/core/port.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> > index 54f72452fb06..fef2fe913e1f 100644
> > --- a/drivers/cxl/core/port.c
> > +++ b/drivers/cxl/core/port.c
> > @@ -1817,8 +1817,12 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd)
> > /*
> > * RP port enumerated by cxl_acpi without dport will
> > * have the dport added here.
> > + *
> > + * Hold the parent port lock here to in case that the
> > + * port can be observed but has not been attached yet.
> > */
> > - scoped_guard(device, &port->dev) {
> > + scoped_guard(device, &parent_port_of(port)->dev) {
> I'm nervous about whether this is the right lock. For unregister_port()
> (which is easier to track down that the add path locking) the lock
> taken depends on where the port is that is being unregistered.
> Specifically root ports are unregistered under parent->uport_dev, not
> parent->dev.
You are right.
When cxl acpi driver attempts to add a cxl host bridge port, it will hold cxl_root->uport_dev.
Otherwide, hold parent_port->dev lock for the new port addition.
So I think it is possible that memdev can observe a cxl host bridge port but it has not been attach yet.
Maybe I should hold the lock of the new port's host.
Let's see if other reviewers have more comments on that.
Ming
>
> > + guard(device)(&port->dev);
> > dport = find_or_add_dport(port, dport_dev);
> > if (IS_ERR(dport)) {
> > if (PTR_ERR(dport) == -EAGAIN)
>
>
Li Ming wrote:
[..]
> > > - scoped_guard(device, &port->dev) {
> > > + scoped_guard(device, &parent_port_of(port)->dev) {
> > I'm nervous about whether this is the right lock. For unregister_port()
> > (which is easier to track down that the add path locking) the lock
> > taken depends on where the port is that is being unregistered.
> > Specifically root ports are unregistered under parent->uport_dev, not
> > parent->dev.
> You are right.
> When cxl acpi driver attempts to add a cxl host bridge port, it will hold cxl_root->uport_dev.
> Otherwide, hold parent_port->dev lock for the new port addition.
> So I think it is possible that memdev can observe a cxl host bridge port but it has not been attach yet.
> Maybe I should hold the lock of the new port's host.
> Let's see if other reviewers have more comments on that.
So, I think current code is ok, if cxl_mem_probe() races
cxl_acpi_probe() in the same way that case is already caught by the
cxl_bus_rescan() at the end of cxl_acpi_probe().
© 2016 - 2026 Red Hat, Inc.