[PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.

Sungwoo Kim posted 2 patches 1 month, 3 weeks ago
drivers/cxl/core/port.c   |  6 +++++
drivers/cxl/core/region.c | 47 ++++++++++++++++++++++++++++++++++-----
drivers/cxl/cxl.h         |  9 ++++++++
3 files changed, 57 insertions(+), 5 deletions(-)
[PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
Posted by Sungwoo Kim 1 month, 3 weeks ago
This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].

Overview
========
This patch series fixes race conditions in cxl region unregistration.

devm_release_action() should be called once, otherwise, it warns about
the second call. However, the current implementation has a race condition
that allows multiple calls to devm_release_action(). The details are in
each patch.

To fix these, the first patch adds a new function that guarantees that
devm_release_action() is called only once.
Using this function, the second patch subsitutes the current use of 
devm_release_action() in cxl region with the new function.

Change in v3
============
Addressed Dave's comments:
- Split and made this in a patch series.
- Enhanced a commit log explaining why a workqueue is used in the first patch.
- Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
Sashiko AI review fixes:
- Fixed construct_region() as it also can race.
- Used a driver's wq instead of system wq so unbinding can drain a work.

[1] https://sashiko.dev/#/patchset/20260422045637.3048249-2-iam%40sung-woo.kim

Earlier approach:
v2: https://lore.kernel.org/linux-cxl/20260422045637.3048249-2-iam@sung-woo.kim/
v1: https://lore.kernel.org/linux-cxl/20260308185958.2453707-2-iam@sung-woo.kim/

Sungwoo Kim (2):
  cxl/region: serialize devm action removal via scheduled work
  cxl/region: Fix a race bug in delete_region_store()

 drivers/cxl/core/port.c   |  6 +++++
 drivers/cxl/core/region.c | 47 ++++++++++++++++++++++++++++++++++-----
 drivers/cxl/cxl.h         |  9 ++++++++
 3 files changed, 57 insertions(+), 5 deletions(-)

-- 
2.47.3
Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
Posted by Dave Jiang 1 month, 2 weeks ago

On 4/26/26 8:20 PM, Sungwoo Kim wrote:
> This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].
> 
> Overview
> ========
> This patch series fixes race conditions in cxl region unregistration.
> 
> devm_release_action() should be called once, otherwise, it warns about
> the second call. However, the current implementation has a race condition
> that allows multiple calls to devm_release_action(). The details are in
> each patch.
> 
> To fix these, the first patch adds a new function that guarantees that
> devm_release_action() is called only once.
> Using this function, the second patch subsitutes the current use of 
> devm_release_action() in cxl region with the new function.
> 
> Change in v3
> ============
> Addressed Dave's comments:
> - Split and made this in a patch series.
> - Enhanced a commit log explaining why a workqueue is used in the first patch.
> - Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
> Sashiko AI review fixes:
> - Fixed construct_region() as it also can race.
> - Used a driver's wq instead of system wq so unbinding can drain a work.

A few issues Sashiko raised with the v3.
https://sashiko.dev/#/patchset/20260427032010.916681-2-iam%40sung-woo.kim

DJ


> 
> [1] https://sashiko.dev/#/patchset/20260422045637.3048249-2-iam%40sung-woo.kim
> 
> Earlier approach:
> v2: https://lore.kernel.org/linux-cxl/20260422045637.3048249-2-iam@sung-woo.kim/
> v1: https://lore.kernel.org/linux-cxl/20260308185958.2453707-2-iam@sung-woo.kim/
> 
> Sungwoo Kim (2):
>   cxl/region: serialize devm action removal via scheduled work
>   cxl/region: Fix a race bug in delete_region_store()
> 
>  drivers/cxl/core/port.c   |  6 +++++
>  drivers/cxl/core/region.c | 47 ++++++++++++++++++++++++++++++++++-----
>  drivers/cxl/cxl.h         |  9 ++++++++
>  3 files changed, 57 insertions(+), 5 deletions(-)
>
Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
Posted by Sungwoo Kim 1 month, 2 weeks ago
On Mon, Apr 27, 2026 at 1:42 PM Dave Jiang <dave.jiang@intel.com> wrote:
>
>
>
> On 4/26/26 8:20 PM, Sungwoo Kim wrote:
> > This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].
> >
> > Overview
> > ========
> > This patch series fixes race conditions in cxl region unregistration.
> >
> > devm_release_action() should be called once, otherwise, it warns about
> > the second call. However, the current implementation has a race condition
> > that allows multiple calls to devm_release_action(). The details are in
> > each patch.
> >
> > To fix these, the first patch adds a new function that guarantees that
> > devm_release_action() is called only once.
> > Using this function, the second patch subsitutes the current use of
> > devm_release_action() in cxl region with the new function.
> >
> > Change in v3
> > ============
> > Addressed Dave's comments:
> > - Split and made this in a patch series.
> > - Enhanced a commit log explaining why a workqueue is used in the first patch.
> > - Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
> > Sashiko AI review fixes:
> > - Fixed construct_region() as it also can race.
> > - Used a driver's wq instead of system wq so unbinding can drain a work.
>
> A few issues Sashiko raised with the v3.
> https://sashiko.dev/#/patchset/20260427032010.916681-2-iam%40sung-woo.kim

A Sashiko's reasoning makes sense to me. In the middle of
construct_region(), users can perform a sysfs write, which then calls
unregister_region().
unregister_region() must not be called before construct_region() has
completed, since it calls device_del() and put_device().

This isn't introduced by this patch, but I can fix it in v4.

An enable/disable flag to allow sysfs write might fix this, like:

static struct cxl_region *construct_region(...)
{
... // when a construction is done
  set_bit(CXL_REGION_F_ENABLE_SYSFS);
  return cxlr;
}

This prevents a sysfs write before it's done:

static ssize_t delete_region_store(...)
{
  if (!test_bit(CXL_REGION_F_ENABLE_SYSFS)) {
    return -EBUSY;
  }
...
}

Would there be a better solution? I'd like to ask for comments on this.
This patch series already introduces two new flags, and I'm concerned
about adding another.

Thank you!
Sungwoo.

>
> DJ
>
>
> >
> > [1] https://sashiko.dev/#/patchset/20260422045637.3048249-2-iam%40sung-woo.kim
> >
> > Earlier approach:
> > v2: https://lore.kernel.org/linux-cxl/20260422045637.3048249-2-iam@sung-woo.kim/
> > v1: https://lore.kernel.org/linux-cxl/20260308185958.2453707-2-iam@sung-woo.kim/
> >
> > Sungwoo Kim (2):
> >   cxl/region: serialize devm action removal via scheduled work
> >   cxl/region: Fix a race bug in delete_region_store()
> >
> >  drivers/cxl/core/port.c   |  6 +++++
> >  drivers/cxl/core/region.c | 47 ++++++++++++++++++++++++++++++++++-----
> >  drivers/cxl/cxl.h         |  9 ++++++++
> >  3 files changed, 57 insertions(+), 5 deletions(-)
> >
>
>
Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
Posted by Dave Jiang 1 month, 2 weeks ago

On 4/27/26 10:42 PM, Sungwoo Kim wrote:
> On Mon, Apr 27, 2026 at 1:42 PM Dave Jiang <dave.jiang@intel.com> wrote:
>>
>>
>>
>> On 4/26/26 8:20 PM, Sungwoo Kim wrote:
>>> This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].
>>>
>>> Overview
>>> ========
>>> This patch series fixes race conditions in cxl region unregistration.
>>>
>>> devm_release_action() should be called once, otherwise, it warns about
>>> the second call. However, the current implementation has a race condition
>>> that allows multiple calls to devm_release_action(). The details are in
>>> each patch.
>>>
>>> To fix these, the first patch adds a new function that guarantees that
>>> devm_release_action() is called only once.
>>> Using this function, the second patch subsitutes the current use of
>>> devm_release_action() in cxl region with the new function.
>>>
>>> Change in v3
>>> ============
>>> Addressed Dave's comments:
>>> - Split and made this in a patch series.
>>> - Enhanced a commit log explaining why a workqueue is used in the first patch.
>>> - Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
>>> Sashiko AI review fixes:
>>> - Fixed construct_region() as it also can race.
>>> - Used a driver's wq instead of system wq so unbinding can drain a work.
>>
>> A few issues Sashiko raised with the v3.
>> https://sashiko.dev/#/patchset/20260427032010.916681-2-iam%40sung-woo.kim
> 
> A Sashiko's reasoning makes sense to me. In the middle of
> construct_region(), users can perform a sysfs write, which then calls
> unregister_region().
> unregister_region() must not be called before construct_region() has
> completed, since it calls device_del() and put_device().
> 
> This isn't introduced by this patch, but I can fix it in v4.
> 
> An enable/disable flag to allow sysfs write might fix this, like:
> 
> static struct cxl_region *construct_region(...)
> {
> ... // when a construction is done
>   set_bit(CXL_REGION_F_ENABLE_SYSFS);
>   return cxlr;
> }
> 
> This prevents a sysfs write before it's done:
> 
> static ssize_t delete_region_store(...)
> {
>   if (!test_bit(CXL_REGION_F_ENABLE_SYSFS)) {
>     return -EBUSY;
>   }
> ...
> }
> 
> Would there be a better solution? I'd like to ask for comments on this.
> This patch series already introduces two new flags, and I'm concerned
> about adding another.

Sungwoo,
I looked at the sachiko raised issues and looked at the current patches. I threw the sachiko objections and the current patches at Claude and it says the workqueue change made this more complicated and proposed this solution. Seems reasonable. What do you think?

---

    Race 1: Concurrent delete_region_store() calls
    ===============================================
    CPU 0                                  CPU 1
    delete_region_store()                  delete_region_store()
      cxl_find_region_by_name()              cxl_find_region_by_name()
      devm_release_action()                  devm_release_action()
        unregister_region()                    WARN_ON(-ENOENT)
    
    The second call fails because the devm action was already removed.
    
    Race 2: Concurrent delete and driver unbind
    ============================================
    CPU 0                                  CPU 1
    delete_region_store()                  driver unbind
      cxl_find_region_by_name()              devres_release_all()
      devm_release_action()                    unregister_region()
        unregister_region()                    // cleanup done
        // cleanup done again!
                                             WARN_ON(-ENOENT)
    
    Race 3: Use-after-free during construction
    ===========================================
    CPU 0: __construct_region()            CPU 1: delete_region_store()
    device_add(&cxlr->dev)
      // Region visible in sysfs!
                                           cxl_find_region_by_name()
                                             // SUCCESS!
    __construct_region(cxlr)
      p = &cxlr->params
      // p->interleave_ways still 0
                                           unregister_region(cxlr)
                                             device_del()
                                             for (i = 0; i < 0; i++)
                                               // LOOP SKIPPED!
                                             put_device(&cxlr->dev)
                                             // refcount -> 0
                                           // cxlr FREED!
    
      p->interleave_ways = ctx->...        // USE-AFTER-FREE!
      p->state = INTERLEAVE_ACTIVE         // USE-AFTER-FREE!
    
    The Solution
    ============
    
    1. Make unregister_region() idempotent using CXL_REGION_F_UNREGISTER flag
       Multiple concurrent calls will have one do the cleanup and others
       return early safely.
    
    2. Use synchronous devm_remove_action_nowarn() instead of devm_release_action()
       - Call unregister_region() first (idempotent cleanup)
       - Then remove devm action with _nowarn variant
       - -ENOENT is benign in all race scenarios:
         * devres_release_all() already handled it
         * Driver unbind/rebind race (old action gone, new devres list)
    
    3. Protect params read in unregister_region() with lock
       Use scoped_guard(rwsem_read) to safely read interleave_ways even
       during concurrent construction. If state < INTERLEAVE_ACTIVE, the
       region is still initializing and ways=0 is correct.
    
    4. Add memory barriers for state transitions
       Use smp_store_release() in __construct_region() and smp_load_acquire()
       in unregister_region() to ensure proper ordering of initialization.
    
    5. Hold extra reference during construct_region()
       Prevents premature freeing if unregister races during initialization.
    
    6. Check unregister flag in __construct_region()
       Abort construction early if delete_region_store() raced after
       device_add() but before acquiring the write lock.
    
    This approach is simpler and more correct than async workers because:
    - Relies on fundamental kernel primitives (idempotency + locks)
    - No TOCTOU races with driver binding state
    - Fewer lines of code
    - Handles all race scenarios correctly
    
    Race Scenario Analysis:
    
    | Scenario                  | Outcome                              |
    |---------------------------|--------------------------------------|
    | Normal delete             | Cleanup done, action removed         |
    | Concurrent deletes        | One cleans up, others return early   |
    | Delete + driver unbind    | One cleans up, other gets -ENOENT    |
    | Delete + unbind/rebind    | Old action handled, -ENOENT benign   |
    | Delete during construct   | Extra ref prevents UAF, lock protects|
    
---

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e50dc716d4e8..8474756913e6 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2541,16 +2541,45 @@ static void unregister_region(void *_cxlr)
 {
 	struct cxl_region *cxlr = _cxlr;
 	struct cxl_region_params *p = &cxlr->params;
-	int i;
+	int i, ways;
+
+	/*
+	 * Idempotency: prevent multiple concurrent calls to unregister_region().
+	 * This can happen if user calls delete_region_store() while driver
+	 * unbind triggers devres_release_all() concurrently.
+	 */
+	if (test_and_set_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags))
+		return;
 
 	device_del(&cxlr->dev);
 
 	/*
-	 * Now that region sysfs is shutdown, the parameter block is now
-	 * read-only, so no need to hold the region rwsem to access the
-	 * region parameters.
+	 * Even though sysfs is shutdown, the region may still be initializing
+	 * if construct_region() is running concurrently. We must safely read
+	 * interleave_ways under lock to avoid reading uninitialized values.
+	 *
+	 * Use a read lock (not write) to allow concurrent operations if needed.
+	 * Read the ways count under lock, then release before the loop to
+	 * minimize lock hold time.
+	 *
+	 * Memory ordering: use smp_load_acquire() to ensure we see the fully
+	 * initialized value of interleave_ways if state >= INTERLEAVE_ACTIVE.
 	 */
-	for (i = 0; i < p->interleave_ways; i++)
+	scoped_guard(rwsem_read, &cxl_rwsem.region) {
+		/*
+		 * Pairs with smp_store_release() in __construct_region().
+		 * Ensures if we see state >= INTERLEAVE_ACTIVE, we also see
+		 * the initialized interleave_ways value.
+		 */
+		enum cxl_config_state state = smp_load_acquire(&p->state);
+
+		if (state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
+			ways = READ_ONCE(p->interleave_ways);
+		else
+			ways = 0; /* Still initializing, nothing to detach */
+	}
+
+	for (i = 0; i < ways; i++)
 		detach_target(cxlr, i);
 
 	cxlr->hpa_range = DEFINE_RANGE(0, -1);
@@ -2838,14 +2867,48 @@ static ssize_t delete_region_store(struct device *dev,
 	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
 	struct cxl_port *port = to_cxl_port(dev->parent);
 	struct cxl_region *cxlr;
+	int rc;
 
 	cxlr = cxl_find_region_by_name(cxlrd, buf);
 	if (IS_ERR(cxlr))
 		return PTR_ERR(cxlr);
 
-	devm_release_action(port->uport_dev, unregister_region, cxlr);
+	/*
+	 * Call unregister_region() first to do the cleanup. This is
+	 * idempotent (CXL_REGION_F_UNREGISTER flag), so if
+	 * devres_release_all() calls it concurrently, one will do the
+	 * work and the other will return early.
+	 */
+	unregister_region(cxlr);
+
+	/*
+	 * Now remove the devm action to prevent devres_release_all()
+	 * from calling unregister_region() again during driver unbind.
+	 *
+	 * Three possible outcomes:
+	 *
+	 * 1. Success: We removed the action before driver unbind.
+	 *    Driver unbind won't call unregister_region().
+	 *
+	 * 2. -ENOENT: devres_release_all() already removed the action
+	 *    and called unregister_region() (which returned early due
+	 *    to F_UNREGISTER flag). This is benign.
+	 *
+	 * 3. -ENOENT: Driver was unbound and rebound between
+	 *    unregister_region() above and here. The old action was
+	 *    already handled by the old unbind. The new binding has
+	 *    no action for this region. This is benign.
+	 *
+	 * Use _nowarn variant since -ENOENT is expected and benign.
+	 * The devres_lock serializes access, so we can't corrupt the
+	 * devres list.
+	 */
+	rc = devm_remove_action_nowarn(port->uport_dev, unregister_region, cxlr);
+	if (rc && rc != -ENOENT)
+		dev_warn(&cxlr->dev,
+			 "Unexpected error removing devm action: %d\n", rc);
+
 	put_device(&cxlr->dev);
-
 	return len;
 }
 DEVICE_ATTR_WO(delete_region);
@@ -3644,6 +3707,16 @@ static int __construct_region(struct cxl_region *cxlr,
 		return -EBUSY;
 	}
 
+	/*
+	 * Check if region is being unregistered concurrently. This can happen
+	 * if user triggers delete_region_store() right after device_add() but
+	 * before we acquire the write lock above.
+	 */
+	if (test_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags)) {
+		dev_dbg(&cxlr->dev, "Region unregister in progress, aborting construction\n");
+		return -EBUSY;
+	}
+
 	set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
 	cxlr->hpa_range = *hpa_range;
 
@@ -3686,7 +3759,16 @@ static int __construct_region(struct cxl_region *cxlr,
 	p->res = res;
 	p->interleave_ways = ctx->interleave_ways;
 	p->interleave_granularity = ctx->interleave_granularity;
-	p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
+
+	/*
+	 * Use smp_store_release() to ensure all initialization above is
+	 * visible before state becomes INTERLEAVE_ACTIVE. This pairs with
+	 * smp_load_acquire() in unregister_region().
+	 *
+	 * This ensures that if unregister_region() sees state >=
+	 * INTERLEAVE_ACTIVE, it will also see the initialized interleave_ways.
+	 */
+	smp_store_release(&p->state, CXL_CONFIG_INTERLEAVE_ACTIVE);
 
 	rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
 	if (rc)
@@ -3728,12 +3810,31 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
 		return cxlr;
 	}
 
+	/*
+	 * Hold an extra reference during initialization to prevent
+	 * unregister_region() from freeing the object if delete_region_store()
+	 * races with __construct_region(). The device_initialize() in
+	 * cxl_region_alloc() set refcount=1, and device_add() above made it
+	 * visible, so a concurrent unregister could drop that reference while
+	 * we're still initializing.
+	 */
+	get_device(&cxlr->dev);
+
 	rc = __construct_region(cxlr, ctx);
 	if (rc) {
-		devm_release_action(port->uport_dev, unregister_region, cxlr);
+		/*
+		 * Construction failed. Unregister the partially initialized
+		 * region and remove the devm action. Both are safe to call
+		 * due to idempotency and proper locking.
+		 */
+		unregister_region(cxlr);
+		devm_remove_action_nowarn(port->uport_dev, unregister_region, cxlr);
+
+		put_device(&cxlr->dev); /* Drop construction reference */
 		return ERR_PTR(rc);
 	}
 
+	put_device(&cxlr->dev); /* Drop construction reference */
 	return cxlr;
 }
 
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1297594beaec..81952f0763e1 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -447,6 +447,13 @@ struct cxl_region_params {
  */
 #define CXL_REGION_F_NORMALIZED_ADDRESSING 3
 
+/*
+ * Indicate that this region is being unregistered. Used to make
+ * unregister_region() idempotent and prevent race conditions between
+ * delete_region_store() and devres_release_all().
+ */
+#define CXL_REGION_F_UNREGISTER 4
+
 /**
  * struct cxl_region - CXL region
  * @dev: This region's device
Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
Posted by Sungwoo Kim 1 month, 2 weeks ago
Dear Dave, thank you for sharing the patch that doesn't use the workqueue.

Claude suggests not using wq, since it's simpler. I agree that it's
simple, but it's overly tailored to fix a specific bug.
Actually, v1[1] proposed a similar patch. So let me bring a patch and
discussion from v1:

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 08fa3deef70ab..7ade9aa2aeecc 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2745,12 +2745,19 @@ static ssize_t delete_region_store(struct device *dev,
  struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
  struct cxl_port *port = to_cxl_port(dev->parent);
  struct cxl_region *cxlr;
+ int err;

  cxlr = cxl_find_region_by_name(cxlrd, buf);
  if (IS_ERR(cxlr))
  return PTR_ERR(cxlr);

- devm_release_action(port->uport_dev, unregister_region, cxlr);
+ err = devm_remove_action_nowarn(port->uport_dev, unregister_region,
+       cxlr);
+ if (err) {
+ put_device(&cxlr->dev);
+ return err;
+ }
+ unregister_region(cxlr);
  put_device(&cxlr->dev);

  return len;

However, v1 has not been merged. Dan[2] commented that "No, that is
not an acceptable or comprehensive fix. A subsystem should never try
to double unregister a device." Also in the following thread[3], "The
patch was technically correct but it relies on a design that requires
depending on a double free semantic."

I respect this design decision. Then, I need to execute
devm_release_[action|all]() only once, which requires a device lock,
guard(device)(port->uport_dev). Under a lock, I can check a flag to
execute devm_release_[action|all]() only once.
To use the lock, a clean work without a prior lock is required. That's
a reason this patch ended up in wq.

I hope I've explained the rationale for using wq. What do you think?

Thank you!
Sungwoo.

[1] https://lore.kernel.org/linux-cxl/20260308185958.2453707-2-iam@sung-woo.kim/
[2] https://lore.kernel.org/linux-cxl/69b0a0f8bfb0b_213210026@dwillia2-mobl4.notmuch/
[3] https://lore.kernel.org/linux-cxl/69cc961ef12e8_17890410036@dwillia2-mobl4.notmuch/

On Tue, Apr 28, 2026 at 3:05 PM Dave Jiang <dave.jiang@intel.com> wrote:
>
>
>
> On 4/27/26 10:42 PM, Sungwoo Kim wrote:
> > On Mon, Apr 27, 2026 at 1:42 PM Dave Jiang <dave.jiang@intel.com> wrote:
> >>
> >>
> >>
> >> On 4/26/26 8:20 PM, Sungwoo Kim wrote:
> >>> This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].
> >>>
> >>> Overview
> >>> ========
> >>> This patch series fixes race conditions in cxl region unregistration.
> >>>
> >>> devm_release_action() should be called once, otherwise, it warns about
> >>> the second call. However, the current implementation has a race condition
> >>> that allows multiple calls to devm_release_action(). The details are in
> >>> each patch.
> >>>
> >>> To fix these, the first patch adds a new function that guarantees that
> >>> devm_release_action() is called only once.
> >>> Using this function, the second patch subsitutes the current use of
> >>> devm_release_action() in cxl region with the new function.
> >>>
> >>> Change in v3
> >>> ============
> >>> Addressed Dave's comments:
> >>> - Split and made this in a patch series.
> >>> - Enhanced a commit log explaining why a workqueue is used in the first patch.
> >>> - Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
> >>> Sashiko AI review fixes:
> >>> - Fixed construct_region() as it also can race.
> >>> - Used a driver's wq instead of system wq so unbinding can drain a work.
> >>
> >> A few issues Sashiko raised with the v3.
> >> https://sashiko.dev/#/patchset/20260427032010.916681-2-iam%40sung-woo.kim
> >
> > A Sashiko's reasoning makes sense to me. In the middle of
> > construct_region(), users can perform a sysfs write, which then calls
> > unregister_region().
> > unregister_region() must not be called before construct_region() has
> > completed, since it calls device_del() and put_device().
> >
> > This isn't introduced by this patch, but I can fix it in v4.
> >
> > An enable/disable flag to allow sysfs write might fix this, like:
> >
> > static struct cxl_region *construct_region(...)
> > {
> > ... // when a construction is done
> >   set_bit(CXL_REGION_F_ENABLE_SYSFS);
> >   return cxlr;
> > }
> >
> > This prevents a sysfs write before it's done:
> >
> > static ssize_t delete_region_store(...)
> > {
> >   if (!test_bit(CXL_REGION_F_ENABLE_SYSFS)) {
> >     return -EBUSY;
> >   }
> > ...
> > }
> >
> > Would there be a better solution? I'd like to ask for comments on this.
> > This patch series already introduces two new flags, and I'm concerned
> > about adding another.
>
> Sungwoo,
> I looked at the sachiko raised issues and looked at the current patches. I threw the sachiko objections and the current patches at Claude and it says the workqueue change made this more complicated and proposed this solution. Seems reasonable. What do you think?
>
> ---
>
>     Race 1: Concurrent delete_region_store() calls
>     ===============================================
>     CPU 0                                  CPU 1
>     delete_region_store()                  delete_region_store()
>       cxl_find_region_by_name()              cxl_find_region_by_name()
>       devm_release_action()                  devm_release_action()
>         unregister_region()                    WARN_ON(-ENOENT)
>
>     The second call fails because the devm action was already removed.
>
>     Race 2: Concurrent delete and driver unbind
>     ============================================
>     CPU 0                                  CPU 1
>     delete_region_store()                  driver unbind
>       cxl_find_region_by_name()              devres_release_all()
>       devm_release_action()                    unregister_region()
>         unregister_region()                    // cleanup done
>         // cleanup done again!
>                                              WARN_ON(-ENOENT)
>
>     Race 3: Use-after-free during construction
>     ===========================================
>     CPU 0: __construct_region()            CPU 1: delete_region_store()
>     device_add(&cxlr->dev)
>       // Region visible in sysfs!
>                                            cxl_find_region_by_name()
>                                              // SUCCESS!
>     __construct_region(cxlr)
>       p = &cxlr->params
>       // p->interleave_ways still 0
>                                            unregister_region(cxlr)
>                                              device_del()
>                                              for (i = 0; i < 0; i++)
>                                                // LOOP SKIPPED!
>                                              put_device(&cxlr->dev)
>                                              // refcount -> 0
>                                            // cxlr FREED!
>
>       p->interleave_ways = ctx->...        // USE-AFTER-FREE!
>       p->state = INTERLEAVE_ACTIVE         // USE-AFTER-FREE!
>
>     The Solution
>     ============
>
>     1. Make unregister_region() idempotent using CXL_REGION_F_UNREGISTER flag
>        Multiple concurrent calls will have one do the cleanup and others
>        return early safely.
>
>     2. Use synchronous devm_remove_action_nowarn() instead of devm_release_action()
>        - Call unregister_region() first (idempotent cleanup)
>        - Then remove devm action with _nowarn variant
>        - -ENOENT is benign in all race scenarios:
>          * devres_release_all() already handled it
>          * Driver unbind/rebind race (old action gone, new devres list)
>
>     3. Protect params read in unregister_region() with lock
>        Use scoped_guard(rwsem_read) to safely read interleave_ways even
>        during concurrent construction. If state < INTERLEAVE_ACTIVE, the
>        region is still initializing and ways=0 is correct.
>
>     4. Add memory barriers for state transitions
>        Use smp_store_release() in __construct_region() and smp_load_acquire()
>        in unregister_region() to ensure proper ordering of initialization.
>
>     5. Hold extra reference during construct_region()
>        Prevents premature freeing if unregister races during initialization.
>
>     6. Check unregister flag in __construct_region()
>        Abort construction early if delete_region_store() raced after
>        device_add() but before acquiring the write lock.
>
>     This approach is simpler and more correct than async workers because:
>     - Relies on fundamental kernel primitives (idempotency + locks)
>     - No TOCTOU races with driver binding state
>     - Fewer lines of code
>     - Handles all race scenarios correctly
>
>     Race Scenario Analysis:
>
>     | Scenario                  | Outcome                              |
>     |---------------------------|--------------------------------------|
>     | Normal delete             | Cleanup done, action removed         |
>     | Concurrent deletes        | One cleans up, others return early   |
>     | Delete + driver unbind    | One cleans up, other gets -ENOENT    |
>     | Delete + unbind/rebind    | Old action handled, -ENOENT benign   |
>     | Delete during construct   | Extra ref prevents UAF, lock protects|
>
> ---
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e50dc716d4e8..8474756913e6 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2541,16 +2541,45 @@ static void unregister_region(void *_cxlr)
>  {
>         struct cxl_region *cxlr = _cxlr;
>         struct cxl_region_params *p = &cxlr->params;
> -       int i;
> +       int i, ways;
> +
> +       /*
> +        * Idempotency: prevent multiple concurrent calls to unregister_region().
> +        * This can happen if user calls delete_region_store() while driver
> +        * unbind triggers devres_release_all() concurrently.
> +        */
> +       if (test_and_set_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags))
> +               return;
>
>         device_del(&cxlr->dev);
>
>         /*
> -        * Now that region sysfs is shutdown, the parameter block is now
> -        * read-only, so no need to hold the region rwsem to access the
> -        * region parameters.
> +        * Even though sysfs is shutdown, the region may still be initializing
> +        * if construct_region() is running concurrently. We must safely read
> +        * interleave_ways under lock to avoid reading uninitialized values.
> +        *
> +        * Use a read lock (not write) to allow concurrent operations if needed.
> +        * Read the ways count under lock, then release before the loop to
> +        * minimize lock hold time.
> +        *
> +        * Memory ordering: use smp_load_acquire() to ensure we see the fully
> +        * initialized value of interleave_ways if state >= INTERLEAVE_ACTIVE.
>          */
> -       for (i = 0; i < p->interleave_ways; i++)
> +       scoped_guard(rwsem_read, &cxl_rwsem.region) {
> +               /*
> +                * Pairs with smp_store_release() in __construct_region().
> +                * Ensures if we see state >= INTERLEAVE_ACTIVE, we also see
> +                * the initialized interleave_ways value.
> +                */
> +               enum cxl_config_state state = smp_load_acquire(&p->state);
> +
> +               if (state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
> +                       ways = READ_ONCE(p->interleave_ways);
> +               else
> +                       ways = 0; /* Still initializing, nothing to detach */
> +       }
> +
> +       for (i = 0; i < ways; i++)
>                 detach_target(cxlr, i);
>
>         cxlr->hpa_range = DEFINE_RANGE(0, -1);
> @@ -2838,14 +2867,48 @@ static ssize_t delete_region_store(struct device *dev,
>         struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
>         struct cxl_port *port = to_cxl_port(dev->parent);
>         struct cxl_region *cxlr;
> +       int rc;
>
>         cxlr = cxl_find_region_by_name(cxlrd, buf);
>         if (IS_ERR(cxlr))
>                 return PTR_ERR(cxlr);
>
> -       devm_release_action(port->uport_dev, unregister_region, cxlr);
> +       /*
> +        * Call unregister_region() first to do the cleanup. This is
> +        * idempotent (CXL_REGION_F_UNREGISTER flag), so if
> +        * devres_release_all() calls it concurrently, one will do the
> +        * work and the other will return early.
> +        */
> +       unregister_region(cxlr);
> +
> +       /*
> +        * Now remove the devm action to prevent devres_release_all()
> +        * from calling unregister_region() again during driver unbind.
> +        *
> +        * Three possible outcomes:
> +        *
> +        * 1. Success: We removed the action before driver unbind.
> +        *    Driver unbind won't call unregister_region().
> +        *
> +        * 2. -ENOENT: devres_release_all() already removed the action
> +        *    and called unregister_region() (which returned early due
> +        *    to F_UNREGISTER flag). This is benign.
> +        *
> +        * 3. -ENOENT: Driver was unbound and rebound between
> +        *    unregister_region() above and here. The old action was
> +        *    already handled by the old unbind. The new binding has
> +        *    no action for this region. This is benign.
> +        *
> +        * Use _nowarn variant since -ENOENT is expected and benign.
> +        * The devres_lock serializes access, so we can't corrupt the
> +        * devres list.
> +        */
> +       rc = devm_remove_action_nowarn(port->uport_dev, unregister_region, cxlr);
> +       if (rc && rc != -ENOENT)
> +               dev_warn(&cxlr->dev,
> +                        "Unexpected error removing devm action: %d\n", rc);
> +
>         put_device(&cxlr->dev);
> -
>         return len;
>  }
>  DEVICE_ATTR_WO(delete_region);
> @@ -3644,6 +3707,16 @@ static int __construct_region(struct cxl_region *cxlr,
>                 return -EBUSY;
>         }
>
> +       /*
> +        * Check if region is being unregistered concurrently. This can happen
> +        * if user triggers delete_region_store() right after device_add() but
> +        * before we acquire the write lock above.
> +        */
> +       if (test_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags)) {
> +               dev_dbg(&cxlr->dev, "Region unregister in progress, aborting construction\n");
> +               return -EBUSY;
> +       }
> +
>         set_bit(CXL_REGION_F_AUTO, &cxlr->flags);
>         cxlr->hpa_range = *hpa_range;
>
> @@ -3686,7 +3759,16 @@ static int __construct_region(struct cxl_region *cxlr,
>         p->res = res;
>         p->interleave_ways = ctx->interleave_ways;
>         p->interleave_granularity = ctx->interleave_granularity;
> -       p->state = CXL_CONFIG_INTERLEAVE_ACTIVE;
> +
> +       /*
> +        * Use smp_store_release() to ensure all initialization above is
> +        * visible before state becomes INTERLEAVE_ACTIVE. This pairs with
> +        * smp_load_acquire() in unregister_region().
> +        *
> +        * This ensures that if unregister_region() sees state >=
> +        * INTERLEAVE_ACTIVE, it will also see the initialized interleave_ways.
> +        */
> +       smp_store_release(&p->state, CXL_CONFIG_INTERLEAVE_ACTIVE);
>
>         rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
>         if (rc)
> @@ -3728,12 +3810,31 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>                 return cxlr;
>         }
>
> +       /*
> +        * Hold an extra reference during initialization to prevent
> +        * unregister_region() from freeing the object if delete_region_store()
> +        * races with __construct_region(). The device_initialize() in
> +        * cxl_region_alloc() set refcount=1, and device_add() above made it
> +        * visible, so a concurrent unregister could drop that reference while
> +        * we're still initializing.
> +        */
> +       get_device(&cxlr->dev);
> +
>         rc = __construct_region(cxlr, ctx);
>         if (rc) {
> -               devm_release_action(port->uport_dev, unregister_region, cxlr);
> +               /*
> +                * Construction failed. Unregister the partially initialized
> +                * region and remove the devm action. Both are safe to call
> +                * due to idempotency and proper locking.
> +                */
> +               unregister_region(cxlr);
> +               devm_remove_action_nowarn(port->uport_dev, unregister_region, cxlr);
> +
> +               put_device(&cxlr->dev); /* Drop construction reference */
>                 return ERR_PTR(rc);
>         }
>
> +       put_device(&cxlr->dev); /* Drop construction reference */
>         return cxlr;
>  }
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 1297594beaec..81952f0763e1 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -447,6 +447,13 @@ struct cxl_region_params {
>   */
>  #define CXL_REGION_F_NORMALIZED_ADDRESSING 3
>
> +/*
> + * Indicate that this region is being unregistered. Used to make
> + * unregister_region() idempotent and prevent race conditions between
> + * delete_region_store() and devres_release_all().
> + */
> +#define CXL_REGION_F_UNREGISTER 4
> +
>  /**
>   * struct cxl_region - CXL region
>   * @dev: This region's device
>
Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
Posted by Dave Jiang 1 month, 2 weeks ago

On 4/28/26 1:28 PM, Sungwoo Kim wrote:
> Dear Dave, thank you for sharing the patch that doesn't use the workqueue.
> 
> Claude suggests not using wq, since it's simpler. I agree that it's
> simple, but it's overly tailored to fix a specific bug.
> Actually, v1[1] proposed a similar patch. So let me bring a patch and
> discussion from v1:
> 
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 08fa3deef70ab..7ade9aa2aeecc 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2745,12 +2745,19 @@ static ssize_t delete_region_store(struct device *dev,
>   struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
>   struct cxl_port *port = to_cxl_port(dev->parent);
>   struct cxl_region *cxlr;
> + int err;
> 
>   cxlr = cxl_find_region_by_name(cxlrd, buf);
>   if (IS_ERR(cxlr))
>   return PTR_ERR(cxlr);
> 
> - devm_release_action(port->uport_dev, unregister_region, cxlr);
> + err = devm_remove_action_nowarn(port->uport_dev, unregister_region,
> +       cxlr);
> + if (err) {
> + put_device(&cxlr->dev);
> + return err;
> + }
> + unregister_region(cxlr);
>   put_device(&cxlr->dev);
> 
>   return len;
> 
> However, v1 has not been merged. Dan[2] commented that "No, that is
> not an acceptable or comprehensive fix. A subsystem should never try
> to double unregister a device." Also in the following thread[3], "The
> patch was technically correct but it relies on a design that requires
> depending on a double free semantic."
> 
> I respect this design decision. Then, I need to execute
> devm_release_[action|all]() only once, which requires a device lock,
> guard(device)(port->uport_dev). Under a lock, I can check a flag to
> execute devm_release_[action|all]() only once.
> To use the lock, a clean work without a prior lock is required. That's
> a reason this patch ended up in wq.
> 
> I hope I've explained the rationale for using wq. What do you think?

Right I went back and also read what Dan proposed. I just wonder if we are over complicating things now and introducing more issues on top by doing that. Obviously we have to address the issues sachiko brought up in v3. Below is what claude suggested to fix the Sashiko issues in v3 patches. Some of the comments may be excessive but help reading through the changes.

Issue 1: Lockless read of p->interleave_ways
=============================================
Problem: unregister_region() reads p->interleave_ways without holding
the region rwsem while __construct_region() may still be initializing it
under write lock. This can result in reading 0 and skipping target
detachment, potentially causing use-after-free.

Solution: Acquire rwsem read lock before reading p->interleave_ways to
serialize with __construct_region(). Add explicit handling for
interleave_ways == 0 (uninitialized region) and use smp_rmb() to ensure
proper memory ordering.

Issue 2: Driver unbind/rebind race with workqueue
==================================================
Problem: The async workqueue could execute after driver unbind clears
the devres list, then operate on a new/empty list after rebind.

Solution: The workqueue already checks if driver is bound before calling
devm_remove_action(). Added cancel_work_sync() in cxl_region_release()
to ensure no dangling work when region is freed. Enhanced comments to
clarify the synchronization.

Issue 3: Use-after-free in construct_region() error path
=========================================================
Problem: In the original problematic code, calling unregister_region()
directly then get_device() created a window where concurrent
delete_region_store() could free the region.

Solution: Never call unregister_region() directly. The workqueue calls
devm_remove_action() which invokes unregister_region() in a safe context.
Reference transfer is now clean - caller's reference is transferred to
the workqueue, which drops it after work completes.

Key design:
- unregister_region() is ONLY called as a devm callback (driver unbind
  or workqueue via devm_remove_action())
- CXL_REGION_F_UNREGISTER flag is ONLY set by unregister_region() itself,
  preventing concurrent execution
- CXL_REGION_F_DEVM_REMOVE flag is set by workqueue to prevent duplicate
  devm_remove_action() calls
- Reference counting is clear: lookup refs are transferred to workqueue

diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 08b93b14a2c7..4c5771bc4717 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2500,6 +2500,14 @@ static void cxl_region_release(struct device *dev)
 	struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
 	int id = atomic_read(&cxlrd->region_id);
 
+	/*
+	 * Ensure pending removal work completes before freeing.
+	 * This is safe even if called from within the work function
+	 * because cancel_work_sync() returns immediately if the work
+	 * is currently executing.
+	 */
+	cancel_work_sync(&cxlr->remove_work);
+
 	/*
 	 * Try to reuse the recently idled id rather than the cached
 	 * next id to prevent the region id space from increasing
@@ -2544,22 +2552,46 @@ static void unregister_region(void *_cxlr)
 	struct cxl_region_params *p = &cxlr->params;
 	int i;
 
+	/*
+	 * This is ONLY called as a devm callback:
+	 * 1. During driver unbind (devres_release_all)
+	 * 2. Via devm_remove_action() from workqueue
+	 *
+	 * The UNREGISTER flag prevents concurrent execution if both
+	 * paths race (e.g., user delete concurrent with driver unbind).
+	 */
 	if (test_and_set_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags))
 		return;
 
 	device_del(&cxlr->dev);
 
 	/*
-	 * Now that region sysfs is shutdown, the parameter block is now
-	 * read-only, so no need to hold the region rwsem to access the
-	 * region parameters.
+	 * Sysfs is shutdown, blocking new writers. However, __construct_region()
+	 * may still be initializing params under region rwsem write lock.
+	 * Acquire read lock to serialize and get consistent view of interleave_ways.
+	 *
+	 * If interleave_ways is 0, region was never fully initialized.
 	 */
+	guard(rwsem_read)(&cxl_rwsem.region);
+
+	/*
+	 * Ensure UNREGISTER flag check happens-before interleave_ways read.
+	 */
+	smp_rmb();
+
+	if (p->interleave_ways == 0) {
+		/* Not initialized - skip target detachment */
+		goto out;
+	}
+
 	for (i = 0; i < p->interleave_ways; i++)
 		detach_target(cxlr, i);
 
 	cxlr->hpa_range = DEFINE_RANGE(0, -1);
 
+out:
 	cxl_region_iomem_release(cxlr);
+	/* Drop the devm-owned reference */
 	put_device(&cxlr->dev);
 }
 
@@ -2842,22 +2874,59 @@ static bool remove_devm_actions(struct cxl_region *cxlr)
 	return schedule_cxl_region_remove_devm_actions(cxlr);
 }
 
+/**
+ * remove_devm_actions_work - Workqueue handler for region removal
+ * @work: work_struct embedded in cxl_region
+ *
+ * Runs in safe context where it can acquire locks and call
+ * devm_remove_action() which will invoke unregister_region() without deadlock.
+ *
+ * This ensures unregister_region() is called through the devm system.
+ */
 static void remove_devm_actions_work(struct work_struct *work)
 {
 	struct cxl_region *cxlr = container_of(work, typeof(*cxlr), remove_work);
 	struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
 	struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
 
+	/*
+	 * Prevent multiple attempts to remove devm actions.
+	 * If already set, another thread beat us to it.
+	 */
 	if (test_and_set_bit(CXL_REGION_F_DEVM_REMOVE, &cxlr->flags)) {
 		put_device(&cxlr->dev);
 		return;
 	}
 
+	/*
+	 * Lock port device and verify driver is still bound.
+	 * If driver was unbound, devres_release_all() already called
+	 * unregister_region() and the devres entry is gone.
+	 */
 	scoped_guard(device, port->uport_dev) {
-		if (port->uport_dev->driver)
+		if (port->uport_dev->driver) {
+			/*
+			 * Call devm_remove_action() which will:
+			 * 1. Find and remove the devres entry
+			 * 2. Call unregister_region(cxlr)
+			 *
+			 * The UNREGISTER flag in unregister_region() prevents
+			 * re-entry if this races with driver unbind.
+			 */
 			devm_remove_action(port->uport_dev, unregister_region, cxlr);
+		}
+		/*
+		 * If driver not bound, devres already released everything.
+		 * The UNREGISTER flag would have prevented duplicate execution.
+		 */
 	}
 
+	/*
+	 * Drop the extra reference taken when work was queued.
+	 * After this put_device(), if ref reaches 0, cxl_region_release()
+	 * will be called, which does cancel_work_sync() on this work item
+	 * (safe because we're about to return from the work function).
+	 */
 	put_device(&cxlr->dev);
 }
 
@@ -2868,17 +2937,40 @@ static ssize_t delete_region_store(struct device *dev,
 	struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
 	struct cxl_region *cxlr;
 
-	/* remove_devm_actions_work() will put cxlr->dev. */
+	/*
+	 * Look up region by name. Takes a reference that will be transferred
+	 * to the workqueue.
+	 */
 	cxlr = cxl_find_region_by_name(cxlrd, buf);
 	if (IS_ERR(cxlr))
 		return PTR_ERR(cxlr);
 
-	unregister_region(cxlr);
+	/*
+	 * Queue work to remove devm action. The workqueue will call
+	 * devm_remove_action() -> unregister_region() in safe context.
+	 *
+	 * unregister_region() will set CXL_REGION_F_UNREGISTER to prevent
+	 * concurrent execution. We don't set it here because that would
+	 * cause unregister_region() to return early without cleanup.
+	 *
+	 * The work function expects to hold a reference, and will drop it
+	 * when done. We transfer our lookup reference to the workqueue.
+	 */
 	if (!remove_devm_actions(cxlr)) {
+		/*
+		 * Failed to queue work. This can happen if:
+		 * - Work already queued (another thread racing)
+		 * - Workqueue stopped
+		 * Drop our reference and return busy.
+		 */
 		put_device(&cxlr->dev);
 		return -EBUSY;
 	}
 
+	/*
+	 * Work queued successfully. The workqueue now owns our reference
+	 * and will drop it after calling devm_remove_action().
+	 */
 	return len;
 }
 DEVICE_ATTR_WO(delete_region);
@@ -3762,11 +3854,32 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
 
 	rc = __construct_region(cxlr, ctx);
 	if (rc) {
-		/* remove_devm_actions_work() will put the device */
-		get_device(&cxlr->dev);
-		unregister_region(cxlr);
-		if (!remove_devm_actions(cxlr))
+		/*
+		 * Construction failed. We need to unregister the region.
+		 * We hold a reference from __create_region() above.
+		 *
+		 * Queue work to remove devm action. The work will call
+		 * devm_remove_action() -> unregister_region() which will
+		 * set CXL_REGION_F_UNREGISTER to prevent concurrent execution.
+		 *
+		 * The work function expects a reference. We transfer our
+		 * reference from __create_region() to the workqueue.
+		 */
+		if (!remove_devm_actions(cxlr)) {
+			/*
+			 * Failed to queue work. This can happen if:
+			 * - Another thread already queued removal
+			 * - Workqueue is stopped
+			 *
+			 * In either case, drop our reference and let the other
+			 * path handle cleanup.
+			 */
 			put_device(&cxlr->dev);
+		}
+		/*
+		 * If work queued successfully, it now owns our reference
+		 * and will drop it after calling devm_remove_action().
+		 */
 
 		return ERR_PTR(rc);
 	}
Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
Posted by Sungwoo Kim 1 month, 2 weeks ago
On Tue, Apr 28, 2026 at 6:33 PM Dave Jiang <dave.jiang@intel.com> wrote:
>
>
>
> On 4/28/26 1:28 PM, Sungwoo Kim wrote:
> > Dear Dave, thank you for sharing the patch that doesn't use the workqueue.
> >
> > Claude suggests not using wq, since it's simpler. I agree that it's
> > simple, but it's overly tailored to fix a specific bug.
> > Actually, v1[1] proposed a similar patch. So let me bring a patch and
> > discussion from v1:
> >
> > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > index 08fa3deef70ab..7ade9aa2aeecc 100644
> > --- a/drivers/cxl/core/region.c
> > +++ b/drivers/cxl/core/region.c
> > @@ -2745,12 +2745,19 @@ static ssize_t delete_region_store(struct device *dev,
> >   struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> >   struct cxl_port *port = to_cxl_port(dev->parent);
> >   struct cxl_region *cxlr;
> > + int err;
> >
> >   cxlr = cxl_find_region_by_name(cxlrd, buf);
> >   if (IS_ERR(cxlr))
> >   return PTR_ERR(cxlr);
> >
> > - devm_release_action(port->uport_dev, unregister_region, cxlr);
> > + err = devm_remove_action_nowarn(port->uport_dev, unregister_region,
> > +       cxlr);
> > + if (err) {
> > + put_device(&cxlr->dev);
> > + return err;
> > + }
> > + unregister_region(cxlr);
> >   put_device(&cxlr->dev);
> >
> >   return len;
> >
> > However, v1 has not been merged. Dan[2] commented that "No, that is
> > not an acceptable or comprehensive fix. A subsystem should never try
> > to double unregister a device." Also in the following thread[3], "The
> > patch was technically correct but it relies on a design that requires
> > depending on a double free semantic."
> >
> > I respect this design decision. Then, I need to execute
> > devm_release_[action|all]() only once, which requires a device lock,
> > guard(device)(port->uport_dev). Under a lock, I can check a flag to
> > execute devm_release_[action|all]() only once.
> > To use the lock, a clean work without a prior lock is required. That's
> > a reason this patch ended up in wq.
> >
> > I hope I've explained the rationale for using wq. What do you think?
>
> Right I went back and also read what Dan proposed. I just wonder if we are over complicating things now and introducing more issues on top by doing that. Obviously we have to address the issues sachiko brought up in v3. Below is what claude suggested to fix the Sashiko issues in v3 patches. Some of the comments may be excessive but help reading through the changes.

I (and Claude) don't have a better solution than using wq, although I
agree that not using wq is simpler.
Also, I'm not yet experienced enough to decide which is better for
CXL, so I'm happy with both directions. Would you prefer the version
without wq?

>
> Issue 1: Lockless read of p->interleave_ways
> =============================================
> Problem: unregister_region() reads p->interleave_ways without holding
> the region rwsem while __construct_region() may still be initializing it
> under write lock. This can result in reading 0 and skipping target
> detachment, potentially causing use-after-free.
>
> Solution: Acquire rwsem read lock before reading p->interleave_ways to
> serialize with __construct_region(). Add explicit handling for
> interleave_ways == 0 (uninitialized region) and use smp_rmb() to ensure
> proper memory ordering.

Memory barrier is a good catch.

>
> Issue 2: Driver unbind/rebind race with workqueue
> ==================================================
> Problem: The async workqueue could execute after driver unbind clears
> the devres list, then operate on a new/empty list after rebind.
>
> Solution: The workqueue already checks if driver is bound before calling
> devm_remove_action(). Added cancel_work_sync() in cxl_region_release()
> to ensure no dangling work when region is freed. Enhanced comments to
> clarify the synchronization.

Sounds good to me.

>
> Issue 3: Use-after-free in construct_region() error path
> =========================================================
> Problem: In the original problematic code, calling unregister_region()
> directly then get_device() created a window where concurrent
> delete_region_store() could free the region.
>
> Solution: Never call unregister_region() directly. The workqueue calls
> devm_remove_action() which invokes unregister_region() in a safe context.
> Reference transfer is now clean - caller's reference is transferred to
> the workqueue, which drops it after work completes.
>
> Key design:
> - unregister_region() is ONLY called as a devm callback (driver unbind
>   or workqueue via devm_remove_action())
> - CXL_REGION_F_UNREGISTER flag is ONLY set by unregister_region() itself,
>   preventing concurrent execution
> - CXL_REGION_F_DEVM_REMOVE flag is set by workqueue to prevent duplicate
>   devm_remove_action() calls
> - Reference counting is clear: lookup refs are transferred to workqueue
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 08b93b14a2c7..4c5771bc4717 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2500,6 +2500,14 @@ static void cxl_region_release(struct device *dev)
>         struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
>         int id = atomic_read(&cxlrd->region_id);
>
> +       /*
> +        * Ensure pending removal work completes before freeing.
> +        * This is safe even if called from within the work function
> +        * because cancel_work_sync() returns immediately if the work
> +        * is currently executing.
> +        */
> +       cancel_work_sync(&cxlr->remove_work);
> +
>         /*
>          * Try to reuse the recently idled id rather than the cached
>          * next id to prevent the region id space from increasing
> @@ -2544,22 +2552,46 @@ static void unregister_region(void *_cxlr)
>         struct cxl_region_params *p = &cxlr->params;
>         int i;
>
> +       /*
> +        * This is ONLY called as a devm callback:
> +        * 1. During driver unbind (devres_release_all)
> +        * 2. Via devm_remove_action() from workqueue
> +        *
> +        * The UNREGISTER flag prevents concurrent execution if both
> +        * paths race (e.g., user delete concurrent with driver unbind).
> +        */
>         if (test_and_set_bit(CXL_REGION_F_UNREGISTER, &cxlr->flags))
>                 return;
>
>         device_del(&cxlr->dev);
>
>         /*
> -        * Now that region sysfs is shutdown, the parameter block is now
> -        * read-only, so no need to hold the region rwsem to access the
> -        * region parameters.
> +        * Sysfs is shutdown, blocking new writers. However, __construct_region()
> +        * may still be initializing params under region rwsem write lock.
> +        * Acquire read lock to serialize and get consistent view of interleave_ways.
> +        *
> +        * If interleave_ways is 0, region was never fully initialized.
>          */
> +       guard(rwsem_read)(&cxl_rwsem.region);
> +
> +       /*
> +        * Ensure UNREGISTER flag check happens-before interleave_ways read.
> +        */
> +       smp_rmb();
> +
> +       if (p->interleave_ways == 0) {
> +               /* Not initialized - skip target detachment */
> +               goto out;
> +       }
> +
>         for (i = 0; i < p->interleave_ways; i++)
>                 detach_target(cxlr, i);
>
>         cxlr->hpa_range = DEFINE_RANGE(0, -1);
>
> +out:
>         cxl_region_iomem_release(cxlr);
> +       /* Drop the devm-owned reference */
>         put_device(&cxlr->dev);

Does this lead to dropping the refcount to below 0 in __construct_region()?
If unregister_region() is called before __construct_region() by
devm_remove_all(), cxlr->dev's refcount can be 0 at this point, and
could lead to use-after-free or refcount underflow in
__construct_region().

Such as:

__create_region()
  devm_cxl_add_region()
    cxl_region_alloc();   // refcount = 1   ... (1)
  __construct_region() ... (3)
    get_device(&cxlr->dev); // BUG: refcount is already 0 at (2)

devm_remove_all()
...
  unregister_region() ... (2)
    put_device(); // refcount = 0, release.

To fix this, __construct_region() should also check that
unregister_region() has been called. If so, __construct_region()
should return failure.

Thank you!
Sungwoo.


>  }
>
> @@ -2842,22 +2874,59 @@ static bool remove_devm_actions(struct cxl_region *cxlr)
>         return schedule_cxl_region_remove_devm_actions(cxlr);
>  }
>
> +/**
> + * remove_devm_actions_work - Workqueue handler for region removal
> + * @work: work_struct embedded in cxl_region
> + *
> + * Runs in safe context where it can acquire locks and call
> + * devm_remove_action() which will invoke unregister_region() without deadlock.
> + *
> + * This ensures unregister_region() is called through the devm system.
> + */
>  static void remove_devm_actions_work(struct work_struct *work)
>  {
>         struct cxl_region *cxlr = container_of(work, typeof(*cxlr), remove_work);
>         struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
>         struct cxl_port *port = to_cxl_port(cxlrd->cxlsd.cxld.dev.parent);
>
> +       /*
> +        * Prevent multiple attempts to remove devm actions.
> +        * If already set, another thread beat us to it.
> +        */
>         if (test_and_set_bit(CXL_REGION_F_DEVM_REMOVE, &cxlr->flags)) {
>                 put_device(&cxlr->dev);
>                 return;
>         }
>
> +       /*
> +        * Lock port device and verify driver is still bound.
> +        * If driver was unbound, devres_release_all() already called
> +        * unregister_region() and the devres entry is gone.
> +        */
>         scoped_guard(device, port->uport_dev) {
> -               if (port->uport_dev->driver)
> +               if (port->uport_dev->driver) {
> +                       /*
> +                        * Call devm_remove_action() which will:
> +                        * 1. Find and remove the devres entry
> +                        * 2. Call unregister_region(cxlr)
> +                        *
> +                        * The UNREGISTER flag in unregister_region() prevents
> +                        * re-entry if this races with driver unbind.
> +                        */
>                         devm_remove_action(port->uport_dev, unregister_region, cxlr);
> +               }
> +               /*
> +                * If driver not bound, devres already released everything.
> +                * The UNREGISTER flag would have prevented duplicate execution.
> +                */
>         }
>
> +       /*
> +        * Drop the extra reference taken when work was queued.
> +        * After this put_device(), if ref reaches 0, cxl_region_release()
> +        * will be called, which does cancel_work_sync() on this work item
> +        * (safe because we're about to return from the work function).
> +        */
>         put_device(&cxlr->dev);
>  }
>
> @@ -2868,17 +2937,40 @@ static ssize_t delete_region_store(struct device *dev,
>         struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
>         struct cxl_region *cxlr;
>
> -       /* remove_devm_actions_work() will put cxlr->dev. */
> +       /*
> +        * Look up region by name. Takes a reference that will be transferred
> +        * to the workqueue.
> +        */
>         cxlr = cxl_find_region_by_name(cxlrd, buf);
>         if (IS_ERR(cxlr))
>                 return PTR_ERR(cxlr);
>
> -       unregister_region(cxlr);
> +       /*
> +        * Queue work to remove devm action. The workqueue will call
> +        * devm_remove_action() -> unregister_region() in safe context.
> +        *
> +        * unregister_region() will set CXL_REGION_F_UNREGISTER to prevent
> +        * concurrent execution. We don't set it here because that would
> +        * cause unregister_region() to return early without cleanup.
> +        *
> +        * The work function expects to hold a reference, and will drop it
> +        * when done. We transfer our lookup reference to the workqueue.
> +        */
>         if (!remove_devm_actions(cxlr)) {
> +               /*
> +                * Failed to queue work. This can happen if:
> +                * - Work already queued (another thread racing)
> +                * - Workqueue stopped
> +                * Drop our reference and return busy.
> +                */
>                 put_device(&cxlr->dev);
>                 return -EBUSY;
>         }
>
> +       /*
> +        * Work queued successfully. The workqueue now owns our reference
> +        * and will drop it after calling devm_remove_action().
> +        */
>         return len;
>  }
>  DEVICE_ATTR_WO(delete_region);
> @@ -3762,11 +3854,32 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>
>         rc = __construct_region(cxlr, ctx);
>         if (rc) {
> -               /* remove_devm_actions_work() will put the device */
> -               get_device(&cxlr->dev);
> -               unregister_region(cxlr);
> -               if (!remove_devm_actions(cxlr))
> +               /*
> +                * Construction failed. We need to unregister the region.
> +                * We hold a reference from __create_region() above.
> +                *
> +                * Queue work to remove devm action. The work will call
> +                * devm_remove_action() -> unregister_region() which will
> +                * set CXL_REGION_F_UNREGISTER to prevent concurrent execution.
> +                *
> +                * The work function expects a reference. We transfer our
> +                * reference from __create_region() to the workqueue.
> +                */
> +               if (!remove_devm_actions(cxlr)) {
> +                       /*
> +                        * Failed to queue work. This can happen if:
> +                        * - Another thread already queued removal
> +                        * - Workqueue is stopped
> +                        *
> +                        * In either case, drop our reference and let the other
> +                        * path handle cleanup.
> +                        */
>                         put_device(&cxlr->dev);
> +               }
> +               /*
> +                * If work queued successfully, it now owns our reference
> +                * and will drop it after calling devm_remove_action().
> +                */
>
>                 return ERR_PTR(rc);
>         }
>
Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
Posted by Dave Jiang 1 month, 2 weeks ago

On 4/29/26 9:39 PM, Sungwoo Kim wrote:
> On Tue, Apr 28, 2026 at 6:33 PM Dave Jiang <dave.jiang@intel.com> wrote:
>>
>>
>>
>> On 4/28/26 1:28 PM, Sungwoo Kim wrote:
>>> Dear Dave, thank you for sharing the patch that doesn't use the workqueue.
>>>
>>> Claude suggests not using wq, since it's simpler. I agree that it's
>>> simple, but it's overly tailored to fix a specific bug.
>>> Actually, v1[1] proposed a similar patch. So let me bring a patch and
>>> discussion from v1:
>>>
>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>>> index 08fa3deef70ab..7ade9aa2aeecc 100644
>>> --- a/drivers/cxl/core/region.c
>>> +++ b/drivers/cxl/core/region.c
>>> @@ -2745,12 +2745,19 @@ static ssize_t delete_region_store(struct device *dev,
>>>   struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
>>>   struct cxl_port *port = to_cxl_port(dev->parent);
>>>   struct cxl_region *cxlr;
>>> + int err;
>>>
>>>   cxlr = cxl_find_region_by_name(cxlrd, buf);
>>>   if (IS_ERR(cxlr))
>>>   return PTR_ERR(cxlr);
>>>
>>> - devm_release_action(port->uport_dev, unregister_region, cxlr);
>>> + err = devm_remove_action_nowarn(port->uport_dev, unregister_region,
>>> +       cxlr);
>>> + if (err) {
>>> + put_device(&cxlr->dev);
>>> + return err;
>>> + }
>>> + unregister_region(cxlr);
>>>   put_device(&cxlr->dev);
>>>
>>>   return len;
>>>
>>> However, v1 has not been merged. Dan[2] commented that "No, that is
>>> not an acceptable or comprehensive fix. A subsystem should never try
>>> to double unregister a device." Also in the following thread[3], "The
>>> patch was technically correct but it relies on a design that requires
>>> depending on a double free semantic."
>>>
>>> I respect this design decision. Then, I need to execute
>>> devm_release_[action|all]() only once, which requires a device lock,
>>> guard(device)(port->uport_dev). Under a lock, I can check a flag to
>>> execute devm_release_[action|all]() only once.
>>> To use the lock, a clean work without a prior lock is required. That's
>>> a reason this patch ended up in wq.
>>>
>>> I hope I've explained the rationale for using wq. What do you think?
>>
>> Right I went back and also read what Dan proposed. I just wonder if we are over complicating things now and introducing more issues on top by doing that. Obviously we have to address the issues sachiko brought up in v3. Below is what claude suggested to fix the Sashiko issues in v3 patches. Some of the comments may be excessive but help reading through the changes.
> 
> I (and Claude) don't have a better solution than using wq, although I
> agree that not using wq is simpler.
> Also, I'm not yet experienced enough to decide which is better for
> CXL, so I'm happy with both directions. Would you prefer the version
> without wq?

Looks like Dan is working on something [1]. So maybe we wait and see what he comes up with.

[1]: https://lore.kernel.org/linux-cxl/c65851c1-4fca-46ba-8dde-fa10b7cb9cd3@amd.com/T/#mdd1ab49c012321fcd3dc34bd0cb7c0846cf1d1f9

DJ
Re: [PATCH v3 0/2] cxl/region: Fix race conditions in cxl region unregistration.
Posted by Jonathan Cameron 1 month, 2 weeks ago
On Sun, 26 Apr 2026 23:20:07 -0400
Sungwoo Kim <iam@sung-woo.kim> wrote:

> This version mainly addresses Dave's comments and meaningful issues reported by Sashiko AI[1].
> 
> Overview
> ========
> This patch series fixes race conditions in cxl region unregistration.
> 
> devm_release_action() should be called once, otherwise, it warns about
> the second call. However, the current implementation has a race condition
> that allows multiple calls to devm_release_action(). The details are in
> each patch.
> 
> To fix these, the first patch adds a new function that guarantees that
> devm_release_action() is called only once.
> Using this function, the second patch subsitutes the current use of 
> devm_release_action() in cxl region with the new function.
> 
> Change in v3
> ============
> Addressed Dave's comments:
> - Split and made this in a patch series.
> - Enhanced a commit log explaining why a workqueue is used in the first patch.
> - Added a context on how these issues were found and what impact was observed and how that effects a user in the second patch.
> Sashiko AI review fixes:
> - Fixed construct_region() as it also can race.
> - Used a driver's wq instead of system wq so unbinding can drain a work.
> 
> [1] https://sashiko.dev/#/patchset/20260422045637.3048249-2-iam%40sung-woo.kim
> 
> Earlier approach:
> v2: https://lore.kernel.org/linux-cxl/20260422045637.3048249-2-iam@sung-woo.kim/
> v1: https://lore.kernel.org/linux-cxl/20260308185958.2453707-2-iam@sung-woo.kim/

Looks fine to me - I'm not particular keen on delayed releasing as it
may end up with unordered tear down but I don't have a better idea.

Reviewed-by: Jonathan Cameron <jic23@kernel.org>


> 
> Sungwoo Kim (2):
>   cxl/region: serialize devm action removal via scheduled work
>   cxl/region: Fix a race bug in delete_region_store()
> 
>  drivers/cxl/core/port.c   |  6 +++++
>  drivers/cxl/core/region.c | 47 ++++++++++++++++++++++++++++++++++-----
>  drivers/cxl/cxl.h         |  9 ++++++++
>  3 files changed, 57 insertions(+), 5 deletions(-)
>