The scoped_cond_guard() helper has two problems. First, with "convert to
cleanup" style patches, it results in messy diffs that re-indent large
swaths of code. Second, it swallows the return value of interruptible /
killable lock attempts, which is not always -EINTR.
The last attempt at improving this situation, if_not_guard() [1], was
reverted with a suggestion:
"I really think the basic issue is that 'cond_guard' itself is a pretty
broken concept. It simply doesn't work very well in the C syntax."
However, the observation is that just exposing the CLASS() to callers
solves the safety concern (compiler-enforced "expected expression"
checks), conveys the conditional acquisition state of the lock, and
avoids the re-indentation caused by scoped_cond_guard(). See the commit
below for the analysis of the revert.
Commit b4d83c8323b0 ("headers/cleanup.h: Remove the if_not_guard() facility")
The DEFINE_ACQUIRE() macro wraps a lock type like 'struct mutex' into a
'struct mutex_acquire' type that can only be acquired and automatically
released by scope-based helpers. E.g.:
	[scoped_]guard(mutex_acquire)(...)
	CLASS(mutex_intr_acquire, ...)
	CLASS(mutex_try_acquire, ...)
Use DEFINE_ACQUIRE to create the new classes above and use
mutex_intr_acquire in the CXL subsystem to demonstrate the conversion.
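For example, contention on the _try class becomes an ERR_PTR(-EBUSY)
(hypothetical caller, 'obj' is illustrative):

	CLASS(mutex_try_acquire, lock)(&obj->lock);
	if (IS_ERR(lock))
		return PTR_ERR(lock); /* -EBUSY when contended */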
Link: https://lore.kernel.org/all/CAHk-=whn07tnDosPfn+UcAtWHBcLg=KqA16SHVv0GV4t8P1fHw@mail.gmail.com [1]
Cc: David Lechner <dlechner@baylibre.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Fabio M. De Francesco" <fabio.maria.de.francesco@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/mbox.c | 9 +++---
drivers/cxl/cxlmem.h | 2 +-
include/linux/cleanup.h | 62 +++++++++++++++++++++++++++++++++++++++++
include/linux/mutex.h | 24 ++++++++++++++++
4 files changed, 91 insertions(+), 6 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index d72764056ce6..cec9dfb22567 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1394,9 +1394,9 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
int nr_records = 0;
int rc;
- rc = mutex_lock_interruptible(&mds->poison.lock);
- if (rc)
- return rc;
+ CLASS(mutex_intr_acquire, lock)(&mds->poison.lock);
+ if (IS_ERR(lock))
+ return PTR_ERR(lock);
po = mds->poison.list_out;
pi.offset = cpu_to_le64(offset);
@@ -1430,7 +1430,6 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
}
} while (po->flags & CXL_POISON_FLAG_MORE);
- mutex_unlock(&mds->poison.lock);
return rc;
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison, "CXL");
@@ -1466,7 +1465,7 @@ int cxl_poison_state_init(struct cxl_memdev_state *mds)
return rc;
}
- mutex_init(&mds->poison.lock);
+ mutex_acquire_init(&mds->poison.lock);
return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, "CXL");
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 3ec6b906371b..9b4ab5d1a7c4 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -257,7 +257,7 @@ struct cxl_poison_state {
u32 max_errors;
DECLARE_BITMAP(enabled_cmds, CXL_POISON_ENABLED_MAX);
struct cxl_mbox_poison_out *list_out;
- struct mutex lock; /* Protect reads of poison list */
+ struct mutex_acquire lock; /* Protect reads of poison list */
};
/*
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index 7e57047e1564..926b95daa4b5 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -424,5 +424,67 @@ __DEFINE_LOCK_GUARD_0(_name, _lock)
static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \
{ return class_##_name##_lock_ptr(_T); }
+/*
+ * DEFINE_ACQUIRE(acquire_class_name, lock_type, unlock, cond_lock):
+ * Define a CLASS() that instantiates and acquires a conditional lock
+ * within an existing scope. In contrast to DEFINE_GUARD[_COND](), which
+ * hides the variable tracking the lock scope, CLASS(@acquire_class_name,
+ * @lock) instantiates @lock as either an ERR_PTR() or a cookie that drops
+ * the lock when it goes out of scope. An "_acquire" suffix is appended to
+ * @lock_type to provide type-safety against mixing explicit and implicit
+ * (scope-based) cleanup.
+ *
+ * Ex.
+ *
+ * DEFINE_ACQUIRE(mutex_intr_acquire, mutex, mutex_unlock,
+ * mutex_lock_interruptible)
+ *
+ * int interruptible_operation(...)
+ * {
+ * ...
+ * CLASS(mutex_intr_acquire, lock)(&obj->lock);
+ * if (IS_ERR(lock))
+ * return PTR_ERR(lock);
+ * ...
+ * } <= obj->lock dropped here.
+ *
+ * Attempts to perform:
+ *
+ * mutex_unlock(&obj->lock);
+ *
+ * ...fail because obj->lock is a 'struct mutex_acquire' not 'struct mutex'
+ * instance.
+ *
+ * Also, attempts to use the CLASS() conditionally require the ambiguous
+ * scope to be clarified (compiler enforced):
+ *
+ * if (...)
+ * CLASS(mutex_intr_acquire, lock)(&obj->lock); // <-- "error: expected expression"
+ * if (IS_ERR(lock))
+ * return PTR_ERR(lock);
+ *
+ * vs:
+ *
+ * if (...) {
+ * CLASS(mutex_intr_acquire, lock)(&obj->lock);
+ * if (IS_ERR(lock))
+ * return PTR_ERR(lock);
+ * } // <-- lock released here
+ */
+#define DEFINE_ACQUIRE(_name, _locktype, _unlock, _cond_lock) \
+ DEFINE_CLASS(_name, struct _locktype##_acquire *, \
+ if (!IS_ERR_OR_NULL(_T)) _unlock(&_T->_locktype), ({ \
+ struct _locktype##_acquire *lock_result; \
+ int ret = _cond_lock(&to_lock->_locktype); \
+ \
+ if (ret) \
+ lock_result = ERR_PTR(ret); \
+ else \
+ lock_result = \
+ (struct _locktype##_acquire \
+ *)&to_lock->_locktype; \
+ lock_result; \
+ }), \
+ struct _locktype##_acquire *to_lock)
#endif /* _LINUX_CLEANUP_H */
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 2143d05116be..283111f43b0f 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -64,6 +64,8 @@ do { \
__mutex_init((mutex), #mutex, &__key); \
} while (0)
+#define mutex_acquire_init(lock) mutex_init(&(lock)->mutex)
+
/**
* mutex_init_with_key - initialize a mutex with a given lockdep key
* @mutex: the mutex to be initialized
@@ -202,6 +204,28 @@ DEFINE_GUARD(mutex, struct mutex *, mutex_lock(_T), mutex_unlock(_T))
DEFINE_GUARD_COND(mutex, _try, mutex_trylock(_T))
DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T) == 0)
+/* mutex type that only implements scope-based unlock */
+struct mutex_acquire {
+ /* private: */
+ struct mutex mutex;
+};
+DEFINE_GUARD(mutex_acquire, struct mutex_acquire *, mutex_lock(&_T->mutex),
+ mutex_unlock(&_T->mutex))
+DEFINE_GUARD_COND(mutex_acquire, _try, mutex_trylock(&_T->mutex))
+DEFINE_GUARD_COND(mutex_acquire, _intr, mutex_lock_interruptible(&_T->mutex) == 0)
+DEFINE_ACQUIRE(mutex_intr_acquire, mutex, mutex_unlock,
+ mutex_lock_interruptible)
+
+static inline int mutex_try_or_busy(struct mutex *lock)
+{
+ int ret[] = { -EBUSY, 0 };
+
+ return ret[mutex_trylock(lock)];
+}
+
+DEFINE_ACQUIRE(mutex_try_acquire, mutex, mutex_unlock,
+ mutex_try_or_busy)
+
extern unsigned long mutex_get_owner(struct mutex *lock);
#endif /* __LINUX_MUTEX_H */
--
2.49.0
Hi Dan,
kernel test robot noticed the following build errors:
[auto build test ERROR on b4432656b36e5cc1d50a1f2dc15357543add530e]
url: https://github.com/intel-lab-lkp/linux/commits/Dan-Williams/cleanup-Introduce-DEFINE_ACQUIRE-a-CLASS-for-conditional-locking/20250507-152728
base: b4432656b36e5cc1d50a1f2dc15357543add530e
patch link: https://lore.kernel.org/r/20250507072145.3614298-2-dan.j.williams%40intel.com
patch subject: [PATCH 1/7] cleanup: Introduce DEFINE_ACQUIRE() a CLASS() for conditional locking
config: arm-randconfig-001-20250509 (https://download.01.org/0day-ci/archive/20250510/202505100206.85k3mymM-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250510/202505100206.85k3mymM-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505100206.85k3mymM-lkp@intel.com/
All error/warnings (new ones prefixed by >>):
In file included from include/linux/irqflags.h:17,
from arch/arm/include/asm/bitops.h:28,
from include/linux/bitops.h:68,
from arch/arm/boot/compressed/../../../../lib/lz4/lz4defs.h:40,
from arch/arm/boot/compressed/../../../../lib/lz4/lz4_decompress.c:36,
from arch/arm/boot/compressed/../../../../lib/decompress_unlz4.c:10,
from arch/arm/boot/compressed/decompress.c:60:
include/linux/mutex.h: In function 'class_mutex_intr_acquire_destructor':
>> include/linux/cleanup.h:476:13: error: implicit declaration of function 'IS_ERR_OR_NULL' [-Werror=implicit-function-declaration]
if (!IS_ERR_OR_NULL(_T)) _unlock(&_T->_locktype), ({ \
^~~~~~~~~~~~~~
include/linux/cleanup.h:246:18: note: in definition of macro 'DEFINE_CLASS'
{ _type _T = *p; _exit; } \
^~~~~
include/linux/mutex.h:216:1: note: in expansion of macro 'DEFINE_ACQUIRE'
DEFINE_ACQUIRE(mutex_intr_acquire, mutex, mutex_unlock,
^~~~~~~~~~~~~~
include/linux/mutex.h: In function 'class_mutex_intr_acquire_constructor':
>> include/linux/cleanup.h:481:24: error: implicit declaration of function 'ERR_PTR'; did you mean 'PERCPU_PTR'? [-Werror=implicit-function-declaration]
lock_result = ERR_PTR(ret); \
^~~~~~~
include/linux/cleanup.h:248:13: note: in definition of macro 'DEFINE_CLASS'
{ _type t = _init; return t; }
^~~~~
include/linux/mutex.h:216:1: note: in expansion of macro 'DEFINE_ACQUIRE'
DEFINE_ACQUIRE(mutex_intr_acquire, mutex, mutex_unlock,
^~~~~~~~~~~~~~
>> include/linux/cleanup.h:481:22: warning: assignment to 'struct mutex_acquire *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
lock_result = ERR_PTR(ret); \
^
include/linux/cleanup.h:248:13: note: in definition of macro 'DEFINE_CLASS'
{ _type t = _init; return t; }
^~~~~
include/linux/mutex.h:216:1: note: in expansion of macro 'DEFINE_ACQUIRE'
DEFINE_ACQUIRE(mutex_intr_acquire, mutex, mutex_unlock,
^~~~~~~~~~~~~~
include/linux/mutex.h: In function 'class_mutex_try_acquire_constructor':
>> include/linux/cleanup.h:481:22: warning: assignment to 'struct mutex_acquire *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
lock_result = ERR_PTR(ret); \
^
include/linux/cleanup.h:248:13: note: in definition of macro 'DEFINE_CLASS'
{ _type t = _init; return t; }
^~~~~
include/linux/mutex.h:226:1: note: in expansion of macro 'DEFINE_ACQUIRE'
DEFINE_ACQUIRE(mutex_try_acquire, mutex, mutex_unlock,
^~~~~~~~~~~~~~
In file included from include/linux/rwsem.h:17,
from include/linux/mm_types.h:13,
from include/linux/mmzone.h:22,
from include/linux/gfp.h:7,
from include/linux/umh.h:4,
from include/linux/kmod.h:9,
from include/linux/module.h:17,
from arch/arm/boot/compressed/../../../../lib/lz4/lz4_decompress.c:38,
from arch/arm/boot/compressed/../../../../lib/decompress_unlz4.c:10,
from arch/arm/boot/compressed/decompress.c:60:
include/linux/err.h: At top level:
>> include/linux/err.h:39:35: error: conflicting types for 'ERR_PTR'
static inline void * __must_check ERR_PTR(long error)
^~~~~~~
In file included from include/linux/irqflags.h:17,
from arch/arm/include/asm/bitops.h:28,
from include/linux/bitops.h:68,
from arch/arm/boot/compressed/../../../../lib/lz4/lz4defs.h:40,
from arch/arm/boot/compressed/../../../../lib/lz4/lz4_decompress.c:36,
from arch/arm/boot/compressed/../../../../lib/decompress_unlz4.c:10,
from arch/arm/boot/compressed/decompress.c:60:
include/linux/cleanup.h:481:24: note: previous implicit declaration of 'ERR_PTR' was here
lock_result = ERR_PTR(ret); \
^~~~~~~
include/linux/cleanup.h:248:13: note: in definition of macro 'DEFINE_CLASS'
{ _type t = _init; return t; }
^~~~~
include/linux/mutex.h:216:1: note: in expansion of macro 'DEFINE_ACQUIRE'
DEFINE_ACQUIRE(mutex_intr_acquire, mutex, mutex_unlock,
^~~~~~~~~~~~~~
In file included from include/linux/rwsem.h:17,
from include/linux/mm_types.h:13,
from include/linux/mmzone.h:22,
from include/linux/gfp.h:7,
from include/linux/umh.h:4,
from include/linux/kmod.h:9,
from include/linux/module.h:17,
from arch/arm/boot/compressed/../../../../lib/lz4/lz4_decompress.c:38,
from arch/arm/boot/compressed/../../../../lib/decompress_unlz4.c:10,
from arch/arm/boot/compressed/decompress.c:60:
>> include/linux/err.h:82:33: error: conflicting types for 'IS_ERR_OR_NULL'
static inline bool __must_check IS_ERR_OR_NULL(__force const void *ptr)
^~~~~~~~~~~~~~
In file included from include/linux/irqflags.h:17,
from arch/arm/include/asm/bitops.h:28,
from include/linux/bitops.h:68,
from arch/arm/boot/compressed/../../../../lib/lz4/lz4defs.h:40,
from arch/arm/boot/compressed/../../../../lib/lz4/lz4_decompress.c:36,
from arch/arm/boot/compressed/../../../../lib/decompress_unlz4.c:10,
from arch/arm/boot/compressed/decompress.c:60:
include/linux/cleanup.h:476:13: note: previous implicit declaration of 'IS_ERR_OR_NULL' was here
if (!IS_ERR_OR_NULL(_T)) _unlock(&_T->_locktype), ({ \
^~~~~~~~~~~~~~
include/linux/cleanup.h:246:18: note: in definition of macro 'DEFINE_CLASS'
{ _type _T = *p; _exit; } \
^~~~~
include/linux/mutex.h:216:1: note: in expansion of macro 'DEFINE_ACQUIRE'
DEFINE_ACQUIRE(mutex_intr_acquire, mutex, mutex_unlock,
^~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +/IS_ERR_OR_NULL +476 include/linux/cleanup.h
416
417 #define DEFINE_LOCK_GUARD_1_COND(_name, _ext, _condlock) \
418 __DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
419 EXTEND_CLASS(_name, _ext, \
420 ({ class_##_name##_t _t = { .lock = l }, *_T = &_t;\
421 if (_T->lock && !(_condlock)) _T->lock = NULL; \
422 _t; }), \
423 typeof_member(class_##_name##_t, lock) l) \
424 static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \
425 { return class_##_name##_lock_ptr(_T); }
426
427 /*
428 * DEFINE_ACQUIRE(acquire_class_name, lock_type, unlock, cond_lock):
429 * Define a CLASS() that instantiates and acquires a conditional lock
430 * within an existing scope. In contrast to DEFINE_GUARD[_COND](), which
431 * hides the variable tracking the lock scope, CLASS(@acquire_class_name,
432 * @lock) instantiates @lock as either an ERR_PTR() or a cookie that drops
433 * the lock when it goes out of scope. An "_acquire" suffix is appended to
434 * @lock_type to provide type-safety against mixing explicit and implicit
435 * (scope-based) cleanup.
436 *
437 * Ex.
438 *
439 * DEFINE_ACQUIRE(mutex_intr_acquire, mutex, mutex_unlock,
440 * mutex_lock_interruptible)
441 *
442 * int interruptible_operation(...)
443 * {
444 * ...
445 * CLASS(mutex_intr_acquire, lock)(&obj->lock);
446 * if (IS_ERR(lock))
447 * return PTR_ERR(lock);
448 * ...
449 * } <= obj->lock dropped here.
450 *
451 * Attempts to perform:
452 *
453 * mutex_unlock(&obj->lock);
454 *
455 * ...fail because obj->lock is a 'struct mutex_acquire' not 'struct mutex'
456 * instance.
457 *
458 * Also, attempts to use the CLASS() conditionally require the ambiguous
459 * scope to be clarified (compiler enforced):
460 *
461 * if (...)
462 * CLASS(mutex_intr_acquire, lock)(&obj->lock); // <-- "error: expected expression"
463 * if (IS_ERR(lock))
464 * return PTR_ERR(lock);
465 *
466 * vs:
467 *
468 * if (...) {
469 * CLASS(mutex_intr_acquire, lock)(&obj->lock);
470 * if (IS_ERR(lock))
471 * return PTR_ERR(lock);
472 * } // <-- lock released here
473 */
474 #define DEFINE_ACQUIRE(_name, _locktype, _unlock, _cond_lock) \
475 DEFINE_CLASS(_name, struct _locktype##_acquire *, \
> 476 if (!IS_ERR_OR_NULL(_T)) _unlock(&_T->_locktype), ({ \
477 struct _locktype##_acquire *lock_result; \
478 int ret = _cond_lock(&to_lock->_locktype); \
479 \
480 if (ret) \
> 481 lock_result = ERR_PTR(ret); \
482 else \
483 lock_result = \
484 (struct _locktype##_acquire \
485 *)&to_lock->_locktype; \
486 lock_result; \
487 }), \
488 struct _locktype##_acquire *to_lock)
489
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Wed, May 07, 2025 at 12:21:39AM -0700, Dan Williams wrote:
> The scoped_cond_guard() helper has two problems. First, with "convert to
> cleanup" style patches, it results in messy diffs that re-indent large
> swaths of code. Second, it swallows the return value of interruptible /
> killable lock attempts, which is not always -EINTR.
>
> The last attempt at improving this situation, if_not_guard() [1], was
> reverted with a suggestion:
>
> "I really think the basic issue is that 'cond_guard' itself is a pretty
> broken concept. It simply doesn't work very well in the C syntax."
>
> However, the observation is that just exposing the CLASS() to callers
> solves the safety concern (compiler-enforced "expected expression"
> checks), conveys the conditional acquisition state of the lock, and
> avoids the re-indentation caused by scoped_cond_guard(). See the commit
> below for the analysis of the revert.
>
> Commit b4d83c8323b0 ("headers/cleanup.h: Remove the if_not_guard() facility")
>
> The DEFINE_ACQUIRE() macro wraps a lock type like 'struct mutex' into a
> 'struct mutex_acquire' type that can only be acquired and automatically
> released by scope-based helpers. E.g.:
>
> [scoped_]guard(mutex_acquire)(...)
> CLASS(mutex_intr_acquire, ...)
> CLASS(mutex_try_acquire, ...)
>
> Use DEFINE_ACQUIRE to create the new classes above and use
> mutex_intr_acquire in the CXL subsystem to demonstrate the conversion.
>
> +#define DEFINE_ACQUIRE(_name, _locktype, _unlock, _cond_lock) \
> + DEFINE_CLASS(_name, struct _locktype##_acquire *, \
> + if (!IS_ERR_OR_NULL(_T)) _unlock(&_T->_locktype), ({ \
> + struct _locktype##_acquire *lock_result; \
> + int ret = _cond_lock(&to_lock->_locktype); \
> + \
> + if (ret) \
> + lock_result = ERR_PTR(ret); \
> + else \
> + lock_result = \
> + (struct _locktype##_acquire \
> + *)&to_lock->_locktype; \
> + lock_result; \
> + }), \
> + struct _locktype##_acquire *to_lock)
>
> #endif /* _LINUX_CLEANUP_H */
> diff --git a/include/linux/mutex.h b/include/linux/mutex.h
> index 2143d05116be..283111f43b0f 100644
> --- a/include/linux/mutex.h
> +++ b/include/linux/mutex.h
> @@ -64,6 +64,8 @@ do { \
> __mutex_init((mutex), #mutex, &__key); \
> } while (0)
>
> +#define mutex_acquire_init(lock) mutex_init(&(lock)->mutex)
> +
> /**
> * mutex_init_with_key - initialize a mutex with a given lockdep key
> * @mutex: the mutex to be initialized
> @@ -202,6 +204,28 @@ DEFINE_GUARD(mutex, struct mutex *, mutex_lock(_T), mutex_unlock(_T))
> DEFINE_GUARD_COND(mutex, _try, mutex_trylock(_T))
> DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T) == 0)
>
> +/* mutex type that only implements scope-based unlock */
> +struct mutex_acquire {
> + /* private: */
> + struct mutex mutex;
> +};
> +DEFINE_GUARD(mutex_acquire, struct mutex_acquire *, mutex_lock(&_T->mutex),
> + mutex_unlock(&_T->mutex))
> +DEFINE_GUARD_COND(mutex_acquire, _try, mutex_trylock(&_T->mutex))
> +DEFINE_GUARD_COND(mutex_acquire, _intr, mutex_lock_interruptible(&_T->mutex) == 0)
> +DEFINE_ACQUIRE(mutex_intr_acquire, mutex, mutex_unlock,
> + mutex_lock_interruptible)
> +
> +static inline int mutex_try_or_busy(struct mutex *lock)
> +{
> + int ret[] = { -EBUSY, 0 };
> +
> + return ret[mutex_trylock(lock)];
> +}
> +
> +DEFINE_ACQUIRE(mutex_try_acquire, mutex, mutex_unlock,
> + mutex_try_or_busy)
> +
> extern unsigned long mutex_get_owner(struct mutex *lock);
>
> #endif /* __LINUX_MUTEX_H */
I'm terribly confused...
What's wrong with:
	CLASS(mutex_intr, lock)(&foo);
	if (IS_ERR(__guard_ptr(mutex_intr)(lock)))
		return __guard_ptr(mutex_intr)(lock);
I mean, yeah __guard_ptr(mutex_intr) doesn't really roll off the tongue,
but if that is the whole objection, surely we can try and fix that bit
instead of building an entire parallel set of primitives.
Notably, you're going to be running into trouble the moment you want to
use your acquire stuff on things like raw_spin_trylock_irqsave(),
because all that already wraps the return type, in order to hide the
flags thing etc.
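(For reference, the irqsave guard classes already carry the flags inside
the guard type, roughly as declared in include/linux/spinlock.h:)

	DEFINE_LOCK_GUARD_1(raw_spinlock_irqsave, raw_spinlock_t,
			    raw_spin_lock_irqsave(_T->lock, _T->flags),
			    raw_spin_unlock_irqrestore(_T->lock, _T->flags),
			    unsigned long flags)
	DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irqsave, _try,
				 raw_spin_trylock_irqsave(_T->lock, _T->flags))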
Peter Zijlstra wrote:
[..]
> > @@ -202,6 +204,28 @@ DEFINE_GUARD(mutex, struct mutex *, mutex_lock(_T), mutex_unlock(_T))
> > DEFINE_GUARD_COND(mutex, _try, mutex_trylock(_T))
> > DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T) == 0)
> >
> > +/* mutex type that only implements scope-based unlock */
> > +struct mutex_acquire {
> > + /* private: */
> > + struct mutex mutex;
> > +};
> > +DEFINE_GUARD(mutex_acquire, struct mutex_acquire *, mutex_lock(&_T->mutex),
> > + mutex_unlock(&_T->mutex))
> > +DEFINE_GUARD_COND(mutex_acquire, _try, mutex_trylock(&_T->mutex))
> > +DEFINE_GUARD_COND(mutex_acquire, _intr, mutex_lock_interruptible(&_T->mutex) == 0)
> > +DEFINE_ACQUIRE(mutex_intr_acquire, mutex, mutex_unlock,
> > + mutex_lock_interruptible)
> > +
> > +static inline int mutex_try_or_busy(struct mutex *lock)
> > +{
> > + int ret[] = { -EBUSY, 0 };
> > +
> > + return ret[mutex_trylock(lock)];
> > +}
> > +
> > +DEFINE_ACQUIRE(mutex_try_acquire, mutex, mutex_unlock,
> > + mutex_try_or_busy)
> > +
> > extern unsigned long mutex_get_owner(struct mutex *lock);
> >
> > #endif /* __LINUX_MUTEX_H */
>
> I'm terribly confused...
I suspect the disconnect is that this proposal adds safety where guard()
does not today. That was driven by the mistake that Linus caught in the
RFC [1]:

    at the same time I note that your patch is horribly broken. Look
    at your change to drivers/cxl/core/mbox.c: you made it use

    + struct mutex *lock __drop(mutex_unlock) =
    +         mutex_intr_acquire(&mds->poison.lock);

    but then you didn't remove the existing unlocking, so that
    function still has

[1]: http://lore.kernel.org/CAHk-=wgRPDGvofj1PU=NemF6iFu308pFZ=w5P+sQyOMGd978fA@mail.gmail.com

I.e. in my haste I forgot to clean up a straggling open-coded
mutex_unlock(), but that is something the compiler warns about iff we
switch to the parallel primitive universe.
> What's wrong with:
>
> CLASS(mutex_intr, lock)(&foo);
> if (IS_ERR(__guard_ptr(mutex_intr)(lock)))
> return __guard_ptr(mutex_intr)(lock);
__guard_ptr() returns NULL on error, not an ERR_PTR, but I get the gist.
> I mean, yeah __guard_ptr(mutex_intr) doesn't really roll off the tongue,
> but if that is the whole objection, surely we can try and fix that bit
> instead of building an entire parallel set of primitives.
Yes, the "entire set of parallel primitives" was the least confident
part of this proposal, but the more I look, that is a feature (albeit
inelegant) not a bug.
Today one can write:
	guard(mutex_intr)(&lock);
	...
	mutex_unlock(&lock);
...and the compiler does not tell you that the lock may not even be held
upon return, nor that this is unlocking a lock that will also be
unlocked when @lock goes out of scope.
The only type safety today is the BUILD_BUG_ON() in scoped_cond_guard()
when passing in the wrong lock class.
So the proposal is, if you know what you are doing, or have a need to
switch back and forth between scope-based and explicit unlock for a given
lock, use the base primitives. If instead you want to fully convert to
scope-based lock management (excise all explicit unlock() calls) *and*
you want the compiler to validate the conversion, switch to the _acquire
parallel universe.
> Notably, you're going to be running into trouble the moment you want to
> use your acquire stuff on things like raw_spin_trylock_irqsave(),
> because all that already wraps the return type, in order to hide the
> flags thing etc.
I think that is solvable, but only with a new DEFINE_LOCK_GUARD_1() that
knows that the @lock member of class_##name##_t needs to be cast to the
base lock type.
On Wed, May 07, 2025 at 02:18:25PM -0700, Dan Williams wrote:
> Peter Zijlstra wrote:
> [..]
> > > @@ -202,6 +204,28 @@ DEFINE_GUARD(mutex, struct mutex *, mutex_lock(_T), mutex_unlock(_T))
> > > DEFINE_GUARD_COND(mutex, _try, mutex_trylock(_T))
> > > DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T) == 0)
> > >
> > > +/* mutex type that only implements scope-based unlock */
> > > +struct mutex_acquire {
> > > + /* private: */
> > > + struct mutex mutex;
> > > +};
> > > +DEFINE_GUARD(mutex_acquire, struct mutex_acquire *, mutex_lock(&_T->mutex),
> > > + mutex_unlock(&_T->mutex))
> > > +DEFINE_GUARD_COND(mutex_acquire, _try, mutex_trylock(&_T->mutex))
> > > +DEFINE_GUARD_COND(mutex_acquire, _intr, mutex_lock_interruptible(&_T->mutex) == 0)
> > > +DEFINE_ACQUIRE(mutex_intr_acquire, mutex, mutex_unlock,
> > > + mutex_lock_interruptible)
> > > +
> > > +static inline int mutex_try_or_busy(struct mutex *lock)
> > > +{
> > > + int ret[] = { -EBUSY, 0 };
> > > +
> > > + return ret[mutex_trylock(lock)];
> > > +}
> > > +
> > > +DEFINE_ACQUIRE(mutex_try_acquire, mutex, mutex_unlock,
> > > + mutex_try_or_busy)
> > > +
> > > extern unsigned long mutex_get_owner(struct mutex *lock);
> > >
> > > #endif /* __LINUX_MUTEX_H */
> >
> > I'm terribly confused...
>
> I suspect the disconnect is that this proposal adds safety where guard()
> does not today. That was driven by the mistake that Linus caught in the
> RFC [1]
>
> at the same time I note that your patch is horribly broken. Look
> at your change to drivers/cxl/core/mbox.c: you made it use
>
> + struct mutex *lock __drop(mutex_unlock) =
> + mutex_intr_acquire(&mds->poison.lock);
>
> but then you didn't remove the existing unlocking, so that
> function still has
>
> [1]: http://lore.kernel.org/CAHk-=wgRPDGvofj1PU=NemF6iFu308pFZ=w5P+sQyOMGd978fA@mail.gmail.com
>
> I.e. in my haste I forgot to clean up a straggling open-coded
> mutex_unlock(), but that is something the compiler warns about iff we
> switch to the parallel primitive universe.
AFAICT all you've done is an effective rename. You can still write:

	mutex_unlock(&foo->lock.mutex);

and the compiler will again happily do so.
> > What's wrong with:
> >
> > CLASS(mutex_intr, lock)(&foo);
> > if (IS_ERR(__guard_ptr(mutex_intr)(lock)))
> > return __guard_ptr(mutex_intr)(lock);
>
> __guard_ptr() returns NULL on error, not an ERR_PTR, but I get the gist.
(Yeah, I realized after sending, but didn't want to self-reply for
something minor like that :-)
> > I mean, yeah __guard_ptr(mutex_intr) doesn't really roll off the tongue,
> > but if that is the whole objection, surely we can try and fix that bit
> > instead of building an entire parallel set of primitives.
>
> Yes, the "entire set of parallel primitives" was the least confident
> part of this proposal, but the more I look, that is a feature (albeit
> inelegant) not a bug.
>
> Today one can write:
>
> guard(mutex_intr)(&lock);
> ...
> mutex_unlock(lock);
>
> ...and the compiler does not tell you that the lock may not even be held
> upon return, nor that this is unlocking a lock that will also be
> unlocked when @lock goes out of scope.
>
> The only type safety today is the BUILD_BUG_ON() in scoped_cond_guard()
> when passing in the wrong lock class.
>
> So the proposal is, if you know what you are doing, or have a need to
> switch back and forth between scope-based and explicit unlock for a given
> lock, use the base primitives. If instead you want to fully convert to
> scope-based lock management (excise all explicit unlock() calls) *and*
> you want the compiler to validate the conversion, switch to the _acquire
> parallel universe.
As with all refactoring ever, the rename trick always works. But I don't
think that warrants building a parallel infrastructure just for that.
Specifically, it very much does not disallow calling mutex_unlock() on
your new member. So all you get is some compiler help during refactor,
and again, just rename the lock member already.
Also, if we ever actually get LLVM's Thread Safety Analysis working,
that will help us with all these problems:
https://lore.kernel.org/all/20250304092417.2873893-1-elver@google.com/
But the compiler needs a little more work to grok C :-)
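(For a taste of that analysis, a sketch using Clang's documented
-Wthread-safety attributes directly; the kernel series above wraps these
in its own macro spellings:)

	struct __attribute__((capability("mutex"))) tsa_mutex { int dummy; };

	struct foo {
		struct tsa_mutex lock;
		int counter __attribute__((guarded_by(lock)));
	};

	void foo_lock(struct foo *f) __attribute__((acquire_capability(f->lock)));
	void foo_unlock(struct foo *f) __attribute__((release_capability(f->lock)));

	static void bump(struct foo *f)
	{
		foo_lock(f);
		f->counter++;	/* ok: f->lock held */
		foo_unlock(f);
		f->counter++;	/* -Wthread-safety: requires holding f->lock */
	}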
Peter Zijlstra wrote:
[..]
> > So the proposal is, if you know what you are doing, or have a need to
> > switch back and forth between scope-based and explicit unlock for a given
> > lock, use the base primitives. If instead you want to fully convert to
> > scope-based lock management (excise all explicit unlock() calls) *and*
> > you want the compiler to validate the conversion, switch to the _acquire
> > parallel universe.
>
> As with all refactoring ever, the rename trick always works. But I don't
> think that warrants building a parallel infrastructure just for that.
>
> Specifically, it very much does not disallow calling mutex_unlock() on
> your new member. So all you get is some compiler help during refactor,
> and again, just rename the lock member already.
>
> Also, if we ever actually get LLVM's Thread Safety Analysis working,
> that will help us with all these problems:
>
> https://lore.kernel.org/all/20250304092417.2873893-1-elver@google.com/
That looks lovely.
> But the compiler needs a little more work to grok C :-)
Ok, here is a last shot that incorporates all the feedback:
1/ Conceptually no need for a new CLASS() save for the fact that
__guard_ptr() returns NULL on failure, not an ERR_PTR().
2/ The rename trick is not true type safety, especially if it leads to
parallel universe of primitives, but it is a useful trick.
3/ "IS_ERR(__guard_ptr(mutex_intr)(lock))" is a mouthful, would be nice
to have something more succinct while maintaining some safety.
That leads me to a scheme like the following:
	DEFINE_GUARD_ERR(mutex, _intr, mutex_lock_interruptible(_T))
	...
	ACQUIRE(mutex_intr, lock)(&obj->lock);
	if (IS_ERR(lock))
		return PTR_ERR(lock);
That guard class differs from mutex_intr in that it returns an
ERR_PTR(). Some type-safety is provided by ACQUIRE(), which looks for
the "mutex_intr_err" class, not "mutex_intr".
The rename trick can be opt-in for helping to validate refactoring, but
the expectation is that something like Thread Safety Analysis should
make that rename trick moot. In the meantime code keeps all the same
named primitives; there is only ever one primitive universe to handle.
I am open to making the rule that "#define MUTEX_ACQUIRE" never ships
upstream; it's only a local hack during code refactoring to check
assumptions.
-- 8< --
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index d72764056ce6..b767d3e8f9c7 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1,9 +1,10 @@
// SPDX-License-Identifier: GPL-2.0-only
/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+#define MUTEX_ACQUIRE
+#include <linux/mutex.h>
#include <linux/security.h>
#include <linux/debugfs.h>
#include <linux/ktime.h>
-#include <linux/mutex.h>
#include <linux/unaligned.h>
#include <cxlpci.h>
#include <cxlmem.h>
@@ -1394,9 +1395,9 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
int nr_records = 0;
int rc;
- rc = mutex_lock_interruptible(&mds->poison.lock);
- if (rc)
- return rc;
+ ACQUIRE(mutex_intr, lock)(&mds->poison.lock);
+ if (IS_ERR(lock))
+ return PTR_ERR(lock);
po = mds->poison.list_out;
pi.offset = cpu_to_le64(offset);
@@ -1430,7 +1431,6 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
}
} while (po->flags & CXL_POISON_FLAG_MORE);
- mutex_unlock(&mds->poison.lock);
return rc;
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison, "CXL");
@@ -1466,7 +1466,7 @@ int cxl_poison_state_init(struct cxl_memdev_state *mds)
return rc;
}
- mutex_init(&mds->poison.lock);
+ mutex_init(&mds->poison.lock.mutex);
return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, "CXL");
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 3ec6b906371b..9b4ab5d1a7c4 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -257,7 +257,7 @@ struct cxl_poison_state {
u32 max_errors;
DECLARE_BITMAP(enabled_cmds, CXL_POISON_ENABLED_MAX);
struct cxl_mbox_poison_out *list_out;
- struct mutex lock; /* Protect reads of poison list */
+ struct mutex_acquire lock; /* Protect reads of poison list */
};
/*
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index 7e57047e1564..403947d2537e 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -291,16 +291,21 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
#define __DEFINE_CLASS_IS_CONDITIONAL(_name, _is_cond) \
static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
+#define __DEFINE_CLASS_IS_ERR_PTR(_name, _is_err) \
+static __maybe_unused const bool class_##_name##_is_err_ptr = _is_err
+
#define __DEFINE_GUARD_LOCK_PTR(_name, _exp) \
static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \
{ return (void *)(__force unsigned long)*(_exp); }
-#define DEFINE_CLASS_IS_GUARD(_name) \
+#define DEFINE_CLASS_IS_GUARD(_name) \
__DEFINE_CLASS_IS_CONDITIONAL(_name, false); \
+ __DEFINE_CLASS_IS_ERR_PTR(_name, false); \
__DEFINE_GUARD_LOCK_PTR(_name, _T)
-#define DEFINE_CLASS_IS_COND_GUARD(_name) \
+#define DEFINE_CLASS_IS_COND_GUARD(_name) \
__DEFINE_CLASS_IS_CONDITIONAL(_name, true); \
+ __DEFINE_CLASS_IS_ERR_PTR(_name, false); \
__DEFINE_GUARD_LOCK_PTR(_name, _T)
#define DEFINE_GUARD(_name, _type, _lock, _unlock) \
@@ -309,6 +314,7 @@ static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
#define DEFINE_GUARD_COND(_name, _ext, _condlock) \
__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
+ __DEFINE_CLASS_IS_ERR_PTR(_name##_ext, false); \
EXTEND_CLASS(_name, _ext, \
({ void *_t = _T; if (_T && !(_condlock)) _t = NULL; _t; }), \
class_##_name##_t _T) \
@@ -320,6 +326,7 @@ static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
#define __guard_ptr(_name) class_##_name##_lock_ptr
#define __is_cond_ptr(_name) class_##_name##_is_conditional
+#define __is_guard_err_ptr(_name) class_##_name##_is_err_ptr
/*
* Helper macro for scoped_guard().
@@ -346,6 +353,7 @@ _label: \
for (CLASS(_name, scope)(args); true; ({ goto _label; })) \
if (!__guard_ptr(_name)(&scope)) { \
BUILD_BUG_ON(!__is_cond_ptr(_name)); \
+ BUILD_BUG_ON(__is_guard_err_ptr(_name)); \
_fail; \
_label: \
break; \
@@ -424,5 +432,43 @@ __DEFINE_LOCK_GUARD_0(_name, _lock)
static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \
{ return class_##_name##_lock_ptr(_T); }
+#define EXTEND_CLASS_ERR(_name, ext, _init, _init_args...) \
+ typedef class_##_name##_t class_##_name##ext##_err_t; \
+ static inline void class_##_name##ext##_err_destructor( \
+ class_##_name##_t *p) \
+ { \
+ /* base destructors do not expect ERR_PTR */ \
+ if (IS_ERR(p)) \
+ p = NULL; \
+ class_##_name##_destructor(p); \
+ } \
+ static inline class_##_name##_t class_##_name##ext##_err_constructor( \
+ _init_args) \
+ { \
+ class_##_name##_t t = _init; \
+ return t; \
+ }
+
+#define DEFINE_GUARD_ERR(_name, _ext, _condlock) \
+ __DEFINE_CLASS_IS_CONDITIONAL(_name##_ext##_err, true); \
+ __DEFINE_CLASS_IS_ERR_PTR(_name##_ext##_err, true); \
+ EXTEND_CLASS_ERR(_name, _ext, ({ \
+ void *_t = _T; \
+ int err = _condlock; \
+ \
+ if (err) \
+ _t = ERR_PTR(err); \
+ _t; \
+ }), \
+ class_##_name##_t _T) \
+ static inline void *class_##_name##_ext##_err_lock_ptr( \
+ class_##_name##_t *_T) \
+ { \
+ return class_##_name##_lock_ptr(_T); \
+ }
+
+#define ACQUIRE(_name, var) \
+ class_##_name##_err_t var __cleanup(class_##_name##_err_destructor) = \
+ class_##_name##_err_constructor
#endif /* _LINUX_CLEANUP_H */
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 2143d05116be..bd4b449ea6bd 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -198,9 +198,37 @@ extern void mutex_unlock(struct mutex *lock);
extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
+/*
+ * For subsystems that want to use a "rename trick" to use type-safety
+ * to validate debug scope-based unlock, define MUTEX_ACQUIRE before
+ * including mutex.h.
+ */
+struct mutex_acquire {
+ /* private: */
+ struct mutex mutex;
+};
+
+static inline int mutex_try_or_busy(struct mutex *lock)
+{
+ int ret[] = { -EBUSY, 0 };
+
+ return ret[mutex_trylock(lock)];
+}
+
+#ifndef MUTEX_ACQUIRE
DEFINE_GUARD(mutex, struct mutex *, mutex_lock(_T), mutex_unlock(_T))
DEFINE_GUARD_COND(mutex, _try, mutex_trylock(_T))
DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T) == 0)
+DEFINE_GUARD_ERR(mutex, _intr, mutex_lock_interruptible(_T))
+DEFINE_GUARD_ERR(mutex, _try, mutex_try_or_busy(_T))
+#else
+DEFINE_GUARD(mutex, struct mutex_acquire *, mutex_lock(&_T->mutex),
+ mutex_unlock(&_T->mutex))
+DEFINE_GUARD_COND(mutex, _try, mutex_trylock(&_T->mutex))
+DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(&_T->mutex) == 0)
+DEFINE_GUARD_ERR(mutex, _intr, mutex_lock_interruptible(&_T->mutex))
+DEFINE_GUARD_ERR(mutex, _try, mutex_try_or_busy(&_T->mutex))
+#endif
extern unsigned long mutex_get_owner(struct mutex *lock);
On Thu, May 08, 2025 at 10:04:32PM -0700, Dan Williams wrote:
> Peter Zijlstra wrote:
> [..]
> > > So the proposal is, if you know what you are doing, or have a need to
> > > switch back and forth between scope-based and explicit unlock for a given
> > > lock, use the base primitives. If instead you want to fully convert to
> > > scope-based lock management (excise all explicit unlock() calls) *and*
> > > you want the compiler to validate the conversion, switch to the _acquire
> > > parallel universe.
> >
> > As with all refactoring ever, the rename trick always works. But I don't
> > think that warrants building a parallel infrastructure just for that.
> >
> > Specifically, it very much does not disallow calling mutex_unlock() on
> > your new member. So all you get is some compiler help during refactor,
> > and again, just rename the lock member already.
> >
> > Also, if we ever actually get LLVM's Thread Safety Analysis working,
> > that will help us with all these problems:
> >
> > https://lore.kernel.org/all/20250304092417.2873893-1-elver@google.com/
>
> That looks lovely.
>
> > But the compiler needs a little more work to grok C :-)
>
> Ok, here is a last shot that incorporates all the feedback:
>
> 1/ Conceptually no need for a new CLASS() save for the fact that
> __guard_ptr() returns NULL on failure, not an ERR_PTR().
>
> 2/ The rename trick is not true type safety, especially if it leads to
> parallel universe of primitives, but it is a useful trick.
>
> 3/ "IS_ERR(__guard_ptr(mutex_intr)(lock))" is a mouthful, would be nice
> > to have something more succinct while maintaining some safety.
>
> That leads me to a scheme like the following:
>
> DEFINE_GUARD_ERR(mutex, _intr, mutex_lock_interruptible(_T))
> ...
> ACQUIRE(mutex_intr, lock)(&obj->lock);
> if (IS_ERR(lock))
> return PTR_ERR(lock);
Urgh.. can you live with something like this?
---
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index d72764056ce6..6b0ca400b393 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1394,8 +1394,8 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
int nr_records = 0;
int rc;
- rc = mutex_lock_interruptible(&mds->poison.lock);
- if (rc)
+ ACQUIRE(mutex_intr, lock)(&mds->poison.poison_lock);
+ if ((rc = ACQUIRE_ERR(mutex_intr, &lock)))
return rc;
po = mds->poison.list_out;
@@ -1430,7 +1430,6 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
}
} while (po->flags & CXL_POISON_FLAG_MORE);
- mutex_unlock(&mds->poison.lock);
return rc;
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison, "CXL");
@@ -1466,7 +1465,7 @@ int cxl_poison_state_init(struct cxl_memdev_state *mds)
return rc;
}
- mutex_init(&mds->poison.lock);
+ mutex_init(&mds->poison.poison_lock);
return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, "CXL");
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 3ec6b906371b..85a160c778ae 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -257,7 +257,7 @@ struct cxl_poison_state {
u32 max_errors;
DECLARE_BITMAP(enabled_cmds, CXL_POISON_ENABLED_MAX);
struct cxl_mbox_poison_out *list_out;
- struct mutex lock; /* Protect reads of poison list */
+ struct mutex poison_lock; /* Protect reads of poison list */
};
/*
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index 7093e1d08af0..c0439fd63915 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -3,6 +3,8 @@
#define _LINUX_CLEANUP_H
#include <linux/compiler.h>
+#include <linux/err.h>
+#include <linux/args.h>
/**
* DOC: scope-based cleanup helpers
@@ -323,23 +325,40 @@ static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
__DEFINE_GUARD_LOCK_PTR(_name, _T)
#define DEFINE_GUARD(_name, _type, _lock, _unlock) \
- DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
+ DEFINE_CLASS(_name, _type, if (!IS_ERR_OR_NULL(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \
DEFINE_CLASS_IS_GUARD(_name)
-#define DEFINE_GUARD_COND(_name, _ext, _condlock) \
+#define DEFINE_GUARD_COND_4(_name, _ext, _lock, _cond) \
__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
EXTEND_CLASS(_name, _ext, \
- ({ void *_t = _T; if (_T && !(_condlock)) _t = NULL; _t; }), \
+ ({ void *_t = _T; int _RET = (_lock); if (_T && !(_cond)) _t = ERR_PTR(_RET); _t; }), \
class_##_name##_t _T) \
static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \
{ return class_##_name##_lock_ptr(_T); }
+/*
+ * Default binary condition; success on 'true'.
+ */
+#define DEFINE_GUARD_COND_3(_name, _ext, _lock) \
+ DEFINE_GUARD_COND_4(_name, _ext, _lock, _RET)
+
+#define DEFINE_GUARD_COND(X...) CONCATENATE(DEFINE_GUARD_COND_, COUNT_ARGS(X))(X)
+
#define guard(_name) \
CLASS(_name, __UNIQUE_ID(guard))
#define __guard_ptr(_name) class_##_name##_lock_ptr
#define __is_cond_ptr(_name) class_##_name##_is_conditional
+#define ACQUIRE(_name, _var) \
+ CLASS(_name, _var)
+
+#define ACQUIRE_ERR(_name, _var) \
+ ({ long _rc = PTR_ERR(__guard_ptr(_name)(_var)); \
+ if (!_rc) _rc = -EBUSY; \
+ if (!IS_ERR_VALUE(_rc)) _rc = 0; \
+ _rc; })
+
/*
* Helper macro for scoped_guard().
*
@@ -401,7 +420,7 @@ typedef struct { \
\
static inline void class_##_name##_destructor(class_##_name##_t *_T) \
{ \
- if (_T->lock) { _unlock; } \
+ if (!IS_ERR_OR_NULL(_T->lock)) { _unlock; } \
} \
\
__DEFINE_GUARD_LOCK_PTR(_name, &_T->lock)
@@ -433,15 +452,20 @@ __DEFINE_CLASS_IS_CONDITIONAL(_name, false); \
__DEFINE_UNLOCK_GUARD(_name, void, _unlock, __VA_ARGS__) \
__DEFINE_LOCK_GUARD_0(_name, _lock)
-#define DEFINE_LOCK_GUARD_1_COND(_name, _ext, _condlock) \
+#define DEFINE_LOCK_GUARD_1_COND_4(_name, _ext, _lock, _cond) \
__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
EXTEND_CLASS(_name, _ext, \
({ class_##_name##_t _t = { .lock = l }, *_T = &_t;\
- if (_T->lock && !(_condlock)) _T->lock = NULL; \
+ int _RET = (_lock); \
+ if (_T->lock && !(_cond)) _T->lock = ERR_PTR(_RET);\
_t; }), \
typeof_member(class_##_name##_t, lock) l) \
static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \
{ return class_##_name##_lock_ptr(_T); }
+#define DEFINE_LOCK_GUARD_1_COND_3(_name, _ext, _lock) \
+ DEFINE_LOCK_GUARD_1_COND_4(_name, _ext, _lock, _RET);
+
+#define DEFINE_LOCK_GUARD_1_COND(X...) CONCATENATE(DEFINE_LOCK_GUARD_1_COND_, COUNT_ARGS(X))(X)
#endif /* _LINUX_CLEANUP_H */
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 2143d05116be..232fdde82bbb 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -200,7 +200,7 @@ extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
DEFINE_GUARD(mutex, struct mutex *, mutex_lock(_T), mutex_unlock(_T))
DEFINE_GUARD_COND(mutex, _try, mutex_trylock(_T))
-DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T) == 0)
+DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T), _RET == 0)
extern unsigned long mutex_get_owner(struct mutex *lock);
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index c8b543d428b0..c810deb88d13 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -240,7 +240,7 @@ extern void up_write(struct rw_semaphore *sem);
DEFINE_GUARD(rwsem_read, struct rw_semaphore *, down_read(_T), up_read(_T))
DEFINE_GUARD_COND(rwsem_read, _try, down_read_trylock(_T))
-DEFINE_GUARD_COND(rwsem_read, _intr, down_read_interruptible(_T) == 0)
+DEFINE_GUARD_COND(rwsem_read, _intr, down_read_interruptible(_T), _RET == 0)
DEFINE_GUARD(rwsem_write, struct rw_semaphore *, down_write(_T), up_write(_T))
DEFINE_GUARD_COND(rwsem_write, _try, down_write_trylock(_T))
Peter Zijlstra wrote:
> On Thu, May 08, 2025 at 10:04:32PM -0700, Dan Williams wrote:
> > Peter Zijlstra wrote:
> > [..]
> > > > So the proposal is, if you know what you are doing, or have a need to
> > > > switch back and forth between scope-based and explicit unlock for a given
> > > > lock, use the base primitives. If instead you want to fully convert to
> > > > scope-based lock management (excise all explicit unlock() calls) *and*
> > > > you want the compiler to validate the conversion, switch to the _acquire
> > > > parallel universe.
> > >
> > > As with all refactoring ever, the rename trick always works. But I don't
> > > think that warrants building a parallel infrastructure just for that.
> > >
> > > Specifically, it very much does not disallow calling mutex_unlock() on
> > > your new member. So all you get is some compiler help during refactor,
> > > and again, just rename the lock member already.
> > >
> > > Also, if we ever actually get LLVM's Thread Safety Analysis working,
> > > that will help us with all these problems:
> > >
> > > https://lore.kernel.org/all/20250304092417.2873893-1-elver@google.com/
> >
> > That looks lovely.
> >
> > > But the compiler needs a little more work to grok C :-)
> >
> > Ok, here is a last shot that incorporates all the feedback:
> >
> > 1/ Conceptually no need for a new CLASS() save for the fact that
> > __guard_ptr() returns NULL on failure, not an ERR_PTR().
> >
> > 2/ The rename trick is not true type safety, especially if it leads to
> > parallel universe of primitives, but it is a useful trick.
> >
> > 3/ "IS_ERR(__guard_ptr(mutex_intr)(lock))" is a mouthful, would be nice
> > > to have something more succinct while maintaining some safety.
> >
> > That leads me to a scheme like the following:
> >
> > DEFINE_GUARD_ERR(mutex, _intr, mutex_lock_interruptible(_T))
> > ...
> > ACQUIRE(mutex_intr, lock)(&obj->lock);
> > if (IS_ERR(lock))
> > return PTR_ERR(lock);
>
> Urgh.. can you live with something like this?
>
> ---
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index d72764056ce6..6b0ca400b393 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1394,8 +1394,8 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> int nr_records = 0;
> int rc;
>
> - rc = mutex_lock_interruptible(&mds->poison.lock);
> - if (rc)
> + ACQUIRE(mutex_intr, lock)(&mds->poison.poison_lock);
> + if ((rc = ACQUIRE_ERR(mutex_intr, &lock)))
> return rc;
Yes, I can live with that, and I like the compactness of the resulting
cleanup.h macros better than my attempt.
Although, how about this small ergonomic fixup for more symmetry between
ACQUIRE() and ACQUIRE_ERR()?
---
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 6b0ca400b393..e5d2171c9a48 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1395,7 +1395,7 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
int rc;
ACQUIRE(mutex_intr, lock)(&mds->poison.poison_lock);
- if ((rc = ACQUIRE_ERR(mutex_intr, &lock)))
+ if ((rc = ACQUIRE_ERR(mutex_intr, lock)))
return rc;
po = mds->poison.list_out;
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index 17d4655a6b73..b379ff445179 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -335,7 +342,7 @@ static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
CLASS(_name, _var)
#define ACQUIRE_ERR(_name, _var) \
- ({ long _rc = PTR_ERR(__guard_ptr(_name)(_var)); \
+ ({ long _rc = PTR_ERR(__guard_ptr(_name)(&(_var))); \
if (!_rc) _rc = -EBUSY; \
if (!IS_ERR_VALUE(_rc)) _rc = 0; \
_rc; })
---
Also, I needed the following to get this to compile:
---
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index 17d4655a6b73..052bbad6ac68 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -305,8 +305,15 @@ static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
__DEFINE_CLASS_IS_CONDITIONAL(_name, true); \
__DEFINE_GUARD_LOCK_PTR(_name, _T)
+/*
+ * For guard with a potential percpu lock, the address space needs to be
+ * cast away.
+ */
+#define IS_ERR_OR_NULL_ANY(x) \
+IS_ERR_OR_NULL((const void *)(__force const unsigned long)(x))
+
#define DEFINE_GUARD(_name, _type, _lock, _unlock) \
- DEFINE_CLASS(_name, _type, if (!IS_ERR_OR_NULL(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \
+ DEFINE_CLASS(_name, _type, if (!IS_ERR_OR_NULL_ANY(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \
DEFINE_CLASS_IS_GUARD(_name)
#define DEFINE_GUARD_COND_4(_name, _ext, _lock, _cond) \
@@ -401,7 +408,7 @@ typedef struct { \
\
static inline void class_##_name##_destructor(class_##_name##_t *_T) \
{ \
- if (!IS_ERR_OR_NULL(_T->lock)) { _unlock; } \
+ if (!IS_ERR_OR_NULL_ANY(_T->lock)) { _unlock; } \
} \
\
__DEFINE_GUARD_LOCK_PTR(_name, &_T->lock)
On Fri, May 09, 2025 at 06:11:49PM -0700, dan.j.williams@intel.com wrote:
> ---
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 6b0ca400b393..e5d2171c9a48 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1395,7 +1395,7 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> int rc;
>
> ACQUIRE(mutex_intr, lock)(&mds->poison.poison_lock);
> - if ((rc = ACQUIRE_ERR(mutex_intr, &lock)))
> + if ((rc = ACQUIRE_ERR(mutex_intr, lock)))
> return rc;
>
> po = mds->poison.list_out;
> diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
> index 17d4655a6b73..b379ff445179 100644
> --- a/include/linux/cleanup.h
> +++ b/include/linux/cleanup.h
> @@ -335,7 +342,7 @@ static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
> CLASS(_name, _var)
>
> #define ACQUIRE_ERR(_name, _var) \
> - ({ long _rc = PTR_ERR(__guard_ptr(_name)(_var)); \
> + ({ long _rc = PTR_ERR(__guard_ptr(_name)(&(_var))); \
> if (!_rc) _rc = -EBUSY; \
> if (!IS_ERR_VALUE(_rc)) _rc = 0; \
> _rc; })
>
No strong feelings about this. I see arguments either way.
> ---
>
> Also, I needed the following to get this to compile:
Ah, I didn't get beyond nvim/clangd not complaining about things :-)
> ---
>
> diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
> index 17d4655a6b73..052bbad6ac68 100644
> --- a/include/linux/cleanup.h
> +++ b/include/linux/cleanup.h
> @@ -305,8 +305,15 @@ static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
> __DEFINE_CLASS_IS_CONDITIONAL(_name, true); \
> __DEFINE_GUARD_LOCK_PTR(_name, _T)
>
> +/*
> + * For guard with a potential percpu lock, the address space needs to be
> + * cast away.
> + */
> +#define IS_ERR_OR_NULL_ANY(x) \
> +IS_ERR_OR_NULL((const void *)(__force const unsigned long)(x))
> +
> #define DEFINE_GUARD(_name, _type, _lock, _unlock) \
> - DEFINE_CLASS(_name, _type, if (!IS_ERR_OR_NULL(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \
> + DEFINE_CLASS(_name, _type, if (!IS_ERR_OR_NULL_ANY(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \
> DEFINE_CLASS_IS_GUARD(_name)
>
> #define DEFINE_GUARD_COND_4(_name, _ext, _lock, _cond) \
> @@ -401,7 +408,7 @@ typedef struct { \
> \
> static inline void class_##_name##_destructor(class_##_name##_t *_T) \
> { \
> - if (!IS_ERR_OR_NULL(_T->lock)) { _unlock; } \
> + if (!IS_ERR_OR_NULL_ANY(_T->lock)) { _unlock; } \
> } \
> \
> __DEFINE_GUARD_LOCK_PTR(_name, &_T->lock)
So looking at this I realized I might have inadvertently broken
scoped_guard() and scoped_cond_guard(); both rely on !__guard_ptr() for
the failure case, and this is no longer true.
The below seems to cover things. I even did a simple compile :-)
---
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index d72764056ce6..8a52e337dd0f 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1394,8 +1394,8 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
int nr_records = 0;
int rc;
- rc = mutex_lock_interruptible(&mds->poison.lock);
- if (rc)
+ ACQUIRE(mutex_intr, lock)(&mds->poison.mutex);
+ if ((rc = ACQUIRE_ERR(mutex_intr, lock)))
return rc;
po = mds->poison.list_out;
@@ -1430,7 +1430,6 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
}
} while (po->flags & CXL_POISON_FLAG_MORE);
- mutex_unlock(&mds->poison.lock);
return rc;
}
EXPORT_SYMBOL_NS_GPL(cxl_mem_get_poison, "CXL");
@@ -1466,7 +1465,7 @@ int cxl_poison_state_init(struct cxl_memdev_state *mds)
return rc;
}
- mutex_init(&mds->poison.lock);
+ mutex_init(&mds->poison.mutex);
return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, "CXL");
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 3ec6b906371b..1dfd361ed295 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -257,7 +257,7 @@ struct cxl_poison_state {
u32 max_errors;
DECLARE_BITMAP(enabled_cmds, CXL_POISON_ENABLED_MAX);
struct cxl_mbox_poison_out *list_out;
- struct mutex lock; /* Protect reads of poison list */
+ struct mutex mutex; /* Protect reads of poison list */
};
/*
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index 7093e1d08af0..a20d70e563e1 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -3,6 +3,8 @@
#define _LINUX_CLEANUP_H
#include <linux/compiler.h>
+#include <linux/err.h>
+#include <linux/args.h>
/**
* DOC: scope-based cleanup helpers
@@ -310,9 +312,17 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
#define __DEFINE_CLASS_IS_CONDITIONAL(_name, _is_cond) \
static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
+#define __GUARD_ERR(_ptr) \
+ ({ long _rc = (__force unsigned long)(_ptr); \
+ if (!_rc) { _rc = -EBUSY; } if (!IS_ERR_VALUE(_rc)) { _rc = 0; } \
+ _rc; })
+
#define __DEFINE_GUARD_LOCK_PTR(_name, _exp) \
static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \
- { return (void *)(__force unsigned long)*(_exp); }
+ { void *_ptr = (void *)(__force unsigned long)*(_exp); \
+ if (IS_ERR(_ptr)) { _ptr = NULL; } return _ptr; } \
+ static inline int class_##_name##_lock_err(class_##_name##_t *_T) \
+ { return __GUARD_ERR(*(_exp)); }
#define DEFINE_CLASS_IS_GUARD(_name) \
__DEFINE_CLASS_IS_CONDITIONAL(_name, false); \
@@ -323,23 +333,37 @@ static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
__DEFINE_GUARD_LOCK_PTR(_name, _T)
#define DEFINE_GUARD(_name, _type, _lock, _unlock) \
- DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
+ DEFINE_CLASS(_name, _type, if (!__GUARD_ERR(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \
DEFINE_CLASS_IS_GUARD(_name)
-#define DEFINE_GUARD_COND(_name, _ext, _condlock) \
+#define DEFINE_GUARD_COND_4(_name, _ext, _lock, _cond) \
__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
EXTEND_CLASS(_name, _ext, \
- ({ void *_t = _T; if (_T && !(_condlock)) _t = NULL; _t; }), \
+ ({ void *_t = _T; int _RET = (_lock); if (_T && !(_cond)) _t = ERR_PTR(_RET); _t; }), \
class_##_name##_t _T) \
static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \
- { return class_##_name##_lock_ptr(_T); }
+ { return class_##_name##_lock_ptr(_T); } \
+ static inline int class_##_name##_ext##_lock_err(class_##_name##_t *_T) \
+ { return class_##_name##_lock_err(_T); }
+
+/*
+ * Default binary condition; success on 'true'.
+ */
+#define DEFINE_GUARD_COND_3(_name, _ext, _lock) \
+ DEFINE_GUARD_COND_4(_name, _ext, _lock, _RET)
+
+#define DEFINE_GUARD_COND(X...) CONCATENATE(DEFINE_GUARD_COND_, COUNT_ARGS(X))(X)
#define guard(_name) \
CLASS(_name, __UNIQUE_ID(guard))
#define __guard_ptr(_name) class_##_name##_lock_ptr
+#define __guard_err(_name) class_##_name##_lock_err
#define __is_cond_ptr(_name) class_##_name##_is_conditional
+#define ACQUIRE(_name, _var) CLASS(_name, _var)
+#define ACQUIRE_ERR(_name, _var) __guard_err(_name)(&(_var))
+
/*
* Helper macro for scoped_guard().
*
@@ -401,7 +425,7 @@ typedef struct { \
\
static inline void class_##_name##_destructor(class_##_name##_t *_T) \
{ \
- if (_T->lock) { _unlock; } \
+ if (!__GUARD_ERR(_T->lock)) { _unlock; } \
} \
\
__DEFINE_GUARD_LOCK_PTR(_name, &_T->lock)
@@ -433,15 +457,22 @@ __DEFINE_CLASS_IS_CONDITIONAL(_name, false); \
__DEFINE_UNLOCK_GUARD(_name, void, _unlock, __VA_ARGS__) \
__DEFINE_LOCK_GUARD_0(_name, _lock)
-#define DEFINE_LOCK_GUARD_1_COND(_name, _ext, _condlock) \
+#define DEFINE_LOCK_GUARD_1_COND_4(_name, _ext, _lock, _cond) \
__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
EXTEND_CLASS(_name, _ext, \
({ class_##_name##_t _t = { .lock = l }, *_T = &_t;\
- if (_T->lock && !(_condlock)) _T->lock = NULL; \
+ int _RET = (_lock); \
+ if (_T->lock && !(_cond)) _T->lock = ERR_PTR(_RET);\
_t; }), \
typeof_member(class_##_name##_t, lock) l) \
static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \
- { return class_##_name##_lock_ptr(_T); }
+ { return class_##_name##_lock_ptr(_T); } \
+ static inline int class_##_name##_ext##_lock_err(class_##_name##_t *_T) \
+ { return class_##_name##_lock_err(_T); }
+
+#define DEFINE_LOCK_GUARD_1_COND_3(_name, _ext, _lock) \
+ DEFINE_LOCK_GUARD_1_COND_4(_name, _ext, _lock, _RET);
+#define DEFINE_LOCK_GUARD_1_COND(X...) CONCATENATE(DEFINE_LOCK_GUARD_1_COND_, COUNT_ARGS(X))(X)
#endif /* _LINUX_CLEANUP_H */
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 2143d05116be..232fdde82bbb 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -200,7 +200,7 @@ extern int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
DEFINE_GUARD(mutex, struct mutex *, mutex_lock(_T), mutex_unlock(_T))
DEFINE_GUARD_COND(mutex, _try, mutex_trylock(_T))
-DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T) == 0)
+DEFINE_GUARD_COND(mutex, _intr, mutex_lock_interruptible(_T), _RET == 0)
extern unsigned long mutex_get_owner(struct mutex *lock);
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index c8b543d428b0..c810deb88d13 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -240,7 +240,7 @@ extern void up_write(struct rw_semaphore *sem);
DEFINE_GUARD(rwsem_read, struct rw_semaphore *, down_read(_T), up_read(_T))
DEFINE_GUARD_COND(rwsem_read, _try, down_read_trylock(_T))
-DEFINE_GUARD_COND(rwsem_read, _intr, down_read_interruptible(_T) == 0)
+DEFINE_GUARD_COND(rwsem_read, _intr, down_read_interruptible(_T), _RET == 0)
DEFINE_GUARD(rwsem_write, struct rw_semaphore *, down_write(_T), up_write(_T))
DEFINE_GUARD_COND(rwsem_write, _try, down_write_trylock(_T))
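For reference, a minimal sketch of what a caller of one of these
conditional guards looks like, using the ACQUIRE()/ACQUIRE_ERR()
helpers from the cleanup.h hunk above (struct widget and widget_poke()
are made-up names for illustration):

static int widget_poke(struct widget *w)
{
        ACQUIRE(mutex_intr, lock)(&w->lock);
        int ret = ACQUIRE_ERR(mutex_intr, lock);

        /* Propagate the actual return value of mutex_lock_interruptible() */
        if (ret)
                return ret;

        w->count++;     /* lock held here, dropped at end of scope */
        return 0;
}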
On Mon, May 12, 2025 at 12:50:26PM +0200, Peter Zijlstra wrote:
> +#define __GUARD_ERR(_ptr) \
> + ({ long _rc = (__force unsigned long)(_ptr); \
> + if (!_rc) { _rc = -EBUSY; } if (!IS_ERR_VALUE(_rc)) { _rc = 0; } \
> + _rc; })
> +
> #define DEFINE_GUARD(_name, _type, _lock, _unlock) \
> - DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
> + DEFINE_CLASS(_name, _type, if (!__GUARD_ERR(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \
> DEFINE_CLASS_IS_GUARD(_name)
GCC is 'stupid' and this generates atrocious code. I'll play with it.
On Mon, May 12, 2025 at 08:25:59PM +0200, Peter Zijlstra wrote:
> On Mon, May 12, 2025 at 12:50:26PM +0200, Peter Zijlstra wrote:
>
> > +#define __GUARD_ERR(_ptr) \
> > + ({ long _rc = (__force unsigned long)(_ptr); \
> > + if (!_rc) { _rc = -EBUSY; } if (!IS_ERR_VALUE(_rc)) { _rc = 0; } \
> > + _rc; })
> > +
>
> > #define DEFINE_GUARD(_name, _type, _lock, _unlock) \
> > - DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
> > + DEFINE_CLASS(_name, _type, if (!__GUARD_ERR(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \
> > DEFINE_CLASS_IS_GUARD(_name)
>
> GCC is 'stupid' and this generates atrocious code. I'll play with it.
PRE:
bf9e: 48 85 db test %rbx,%rbx
bfa1: 74 1a je bfbd <foo+0x5d>
bfa3: 48 81 fb 00 f0 ff ff cmp $0xfffffffffffff000,%rbx
bfaa: 77 11 ja bfbd <foo+0x5d>
POST:
bf9e: 48 8d 43 ff lea -0x1(%rbx),%rax
bfa2: 48 3d ff ef ff ff cmp $0xffffffffffffefff,%rax
bfa8: 77 11 ja bfbb <foo+0x5b>
---
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -293,17 +293,18 @@ static inline class_##_name##_t class_##
#define __DEFINE_CLASS_IS_CONDITIONAL(_name, _is_cond) \
static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
-#define __GUARD_ERR(_ptr) \
- ({ long _rc = (__force unsigned long)(_ptr); \
- if (!_rc) { _rc = -EBUSY; } if (!IS_ERR_VALUE(_rc)) { _rc = 0; } \
- _rc; })
+#define __GUARD_IS_ERR(_ptr) \
+ ({ unsigned long _rc = (__force unsigned long)(_ptr); \
+ unlikely((_rc-1) >= -MAX_ERRNO-1); })
#define __DEFINE_GUARD_LOCK_PTR(_name, _exp) \
static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \
{ void *_ptr = (void *)(__force unsigned long)*(_exp); \
if (IS_ERR(_ptr)) { _ptr = NULL; } return _ptr; } \
static inline int class_##_name##_lock_err(class_##_name##_t *_T) \
- { return __GUARD_ERR(*(_exp)); }
+ { long _rc = (__force unsigned long)*(_exp); \
+ if (!_rc) { _rc = -EBUSY; } if (!IS_ERR_VALUE(_rc)) { _rc = 0; } \
+ return _rc; }
#define DEFINE_CLASS_IS_GUARD(_name) \
__DEFINE_CLASS_IS_CONDITIONAL(_name, false); \
@@ -314,7 +315,7 @@ static __maybe_unused const bool class_#
__DEFINE_GUARD_LOCK_PTR(_name, _T)
#define DEFINE_GUARD(_name, _type, _lock, _unlock) \
- DEFINE_CLASS(_name, _type, if (!__GUARD_ERR(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \
+ DEFINE_CLASS(_name, _type, if (!__GUARD_IS_ERR(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \
DEFINE_CLASS_IS_GUARD(_name)
#define DEFINE_GUARD_COND_4(_name, _ext, _lock, _cond) \
@@ -406,7 +407,7 @@ typedef struct { \
\
static inline void class_##_name##_destructor(class_##_name##_t *_T) \
{ \
- if (!__GUARD_ERR(_T->lock)) { _unlock; } \
+ if (!__GUARD_IS_ERR(_T->lock)) { _unlock; } \
} \
\
__DEFINE_GUARD_LOCK_PTR(_name, &_T->lock)
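As an aside, the arithmetic behind that single compare can be checked
with a throwaway userspace program (a sketch assuming a 64-bit build
and the kernel's MAX_ERRNO of 4095; not part of the patch):

#include <assert.h>

#define MAX_ERRNO 4095UL

/*
 * Stand-alone copy of the folded test above: one unsigned compare
 * covers both "pointer is NULL" and "pointer is in the ERR_PTR()
 * range [-MAX_ERRNO, -1]".
 */
static int guard_is_err(unsigned long ptr)
{
        return (ptr - 1) >= -MAX_ERRNO - 1;
}

int main(void)
{
        assert(guard_is_err(0));                        /* NULL wraps to ~0UL */
        assert(guard_is_err(-4UL));                     /* ERR_PTR(-EINTR) */
        assert(guard_is_err(-MAX_ERRNO));               /* ERR_PTR(-MAX_ERRNO) */
        assert(!guard_is_err(0xffff888000000000UL));    /* kernel pointer */
        return 0;
}

Subtracting 1 maps NULL onto ULONG_MAX, right next to the error range,
so NULL and all error values land on one side of a single unsigned
comparison; that is why the two branches in PRE collapse to one in
POST.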
Peter Zijlstra wrote:
> On Mon, May 12, 2025 at 08:25:59PM +0200, Peter Zijlstra wrote:
> > On Mon, May 12, 2025 at 12:50:26PM +0200, Peter Zijlstra wrote:
> >
> > > +#define __GUARD_ERR(_ptr) \
> > > + ({ long _rc = (__force unsigned long)(_ptr); \
> > > + if (!_rc) { _rc = -EBUSY; } if (!IS_ERR_VALUE(_rc)) { _rc = 0; } \
> > > + _rc; })
> > > +
> >
> > > #define DEFINE_GUARD(_name, _type, _lock, _unlock) \
> > > - DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
> > > + DEFINE_CLASS(_name, _type, if (!__GUARD_ERR(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \
> > > DEFINE_CLASS_IS_GUARD(_name)
> >
> > GCC is 'stupid' and this generates atrocious code. I'll play with it.
>
> PRE:
> bf9e: 48 85 db test %rbx,%rbx
> bfa1: 74 1a je bfbd <foo+0x5d>
> bfa3: 48 81 fb 00 f0 ff ff cmp $0xfffffffffffff000,%rbx
> bfaa: 77 11 ja bfbd <foo+0x5d>
>
> POST:
> bf9e: 48 8d 43 ff lea -0x1(%rbx),%rax
> bfa2: 48 3d ff ef ff ff cmp $0xffffffffffffefff,%rax
> bfa8: 77 11 ja bfbb <foo+0x5b>
>
FWIW this looks good to me, and the conversion to drop all explicit
locking passed tests. I threw it out on a branch to get some bot
coverage in the meantime.
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=cxl-acquire
On Mon, 12 May 2025 at 11:58, Peter Zijlstra <peterz@infradead.org> wrote:
>
> > GCC is 'stupid' and this generates atrocious code. I'll play with it.
>
> PRE:
> bf9e: 48 85 db test %rbx,%rbx
> bfa1: 74 1a je bfbd <foo+0x5d>
> bfa3: 48 81 fb 00 f0 ff ff cmp $0xfffffffffffff000,%rbx
> bfaa: 77 11 ja bfbd <foo+0x5d>
>
> POST:
> bf9e: 48 8d 43 ff lea -0x1(%rbx),%rax
> bfa2: 48 3d ff ef ff ff cmp $0xffffffffffffefff,%rax
> bfa8: 77 11 ja bfbb <foo+0x5b>
I'm not convinced that's actually an improvement.
Yes, it's one less instruction, and three bytes shorter. But it uses
an extra register, so now it might make surrounding code much worse by
making register allocation have a harder time.
If you *really* care about this, I think you should realize that the
non-error case is a valid kernel pointer.
And we could add some architecture-specific function to check for "is
this a valid non-NULL and non-error pointer" with a fallback to the
generic case.
Because then on a platform like x86, where kernel pointers are always
negative, but not *as* negative as the error pointers, you can check
for that with a single compare.
The logic is "add MAX_ERRNO, and if it's still negative, it wasn't
NULL and it wasn't ERR_PTR".
And while 'add' needs a destination register, 'sub' with the negated
value does not, and is called 'cmp'.
So I think you can do that with
cmp $-MAX_ERRNO,...
js ...
Sadly, I can't seem to get gcc to generate that code. But I didn't try
very hard.
Linus
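To make that logic concrete (my arithmetic, with MAX_ERRNO == 4095 on
x86-64; not from the mail):

        kernel ptr       0xffff888000000000 + 4095 -> still negative: valid
        NULL             0x0000000000000000 + 4095 -> non-negative:  reject
        ERR_PTR(-EINTR)  0xfffffffffffffffc + 4095 -> non-negative:  reject

And since "cmp $-MAX_ERRNO, %reg" computes %reg + 4095 purely for the
flags, the sign flag after the compare distinguishes the two cases
without consuming a destination register; "js" then branches on the
valid-pointer (negative) side.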
On Mon, May 12, 2025 at 01:39:19PM -0700, Linus Torvalds wrote:
> On Mon, 12 May 2025 at 11:58, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > > GCC is 'stupid' and this generates atrocious code. I'll play with it.
> >
> > PRE:
> > bf9e: 48 85 db test %rbx,%rbx
> > bfa1: 74 1a je bfbd <foo+0x5d>
> > bfa3: 48 81 fb 00 f0 ff ff cmp $0xfffffffffffff000,%rbx
> > bfaa: 77 11 ja bfbd <foo+0x5d>
> >
> > POST:
> > bf9e: 48 8d 43 ff lea -0x1(%rbx),%rax
> > bfa2: 48 3d ff ef ff ff cmp $0xffffffffffffefff,%rax
> > bfa8: 77 11 ja bfbb <foo+0x5b>
>
> I'm not convinced that's actually an improvement.
>
> Yes, it's one less instruction, and three bytes shorter. But it uses
> an extra register, so now it might make surrounding code much worse by
> making register allocation have a harder time.

I was going for the one less branch, but yeah, register pressure :/

Typically this is at the end of a scope, and I was hoping this is where
you have free regs etc.

> If you *really* care about this, I think you should realize that the
> non-error case is a valid kernel pointer.
>
> And we could add some architecture-specific function to check for "is
> this a valid non-NULL and non-error pointer" with a fallback to the
> generic case.
>
> Because then on a platform like x86, where kernel pointers are always
> negative, but not *as* negative as the error pointers, you can check
> for that with a single compare.
>
> The logic is "add MAX_ERRNO, and if it's still negative, it wasn't
> NULL and it wasn't ERR_PTR".
>
> And while 'add' needs a destination register, 'sub' with the negated
> value does not, and is called 'cmp'.
>
> So I think you can do that with
>
>         cmp $-MAX_ERRNO,...
>         js ...
>
> Sadly, I can't seem to get gcc to generate that code. But I didn't try
> very hard.

And so try I must :-)
On Tue, May 13, 2025 at 09:09:18AM +0200, Peter Zijlstra wrote:
> On Mon, May 12, 2025 at 01:39:19PM -0700, Linus Torvalds wrote:
> > If you *really* care about this, I think you should realize that the
> > non-error case is a valid kernel pointer.
> >
> > And we could add some architecture-specific function to check for "is
> > this a valid non-NULL and non-error pointer" with a fallback to the
> > generic case.
> >
> > Because then on a platform like x86, where kernel pointers are always
> > negative, but not *as* negative as the error pointers, you can check
> > for that with a single compare.
> >
> > The logic is "add MAX_ERRNO, and if it's still negative, it wasn't
> > NULL and it wasn't ERR_PTR".
> >
> > And while 'add' needs a destination register, 'sub' with the negated
> > value does not, and is called 'cmp'.
> >
> > So I think you can do that with
> >
> > cmp $-MAX_ERRNO,...
> > js ...
> >
> > Sadly, I can't seem to get gcc to generate that code. But I didn't try
> > very hard.
Yeah, it seems to really like emitting add and lea.
Inline asm obviously works:
003e c09e: 48 81 fb 01 f0 ff ff cmp $0xfffffffffffff001,%rbx
0045 c0a5: 79 11 jns c0b8 <foo+0x58>
0047 c0a7: 48 89 df mov %rbx,%rdi
004a c0aa: e8 00 00 00 00 call c0af <foo+0x4f> c0ab: R_X86_64_PLT32 raw_spin_rq_unlock-0x4
...
0058 c0b8: 5b pop %rbx
0059 c0b9: 5d pop %rbp
005a c0ba: e9 00 00 00 00 jmp c0bf <foo+0x5f> c0bb: R_X86_64_PLT32 __x86_return_thunk-0x4
Just not sure it's worth it at this point.
---
diff --git a/arch/x86/include/asm/cleanup.h b/arch/x86/include/asm/cleanup.h
new file mode 100644
index 000000000000..7cef49be8570
--- /dev/null
+++ b/arch/x86/include/asm/cleanup.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_CLEANUP_H
+#define _ASM_X86_CLEANUP_H
+
+#define __GUARD_IS_ERR(_ptr) \
+ ({ unsigned long _var = (__force unsigned long)(_ptr); \
+ bool _s; \
+ asm_inline volatile ("cmp %[val], %[var]" \
+ : "=@ccns" (_s) \
+ : [val] "i" (-MAX_ERRNO), \
+ [var] "r" (_var)); \
+ unlikely(_s); })
+
+#endif /* _ASM_X86_CLEANUP_H */
+
diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild
index 8675b7b4ad23..a59a88c95277 100644
--- a/include/asm-generic/Kbuild
+++ b/include/asm-generic/Kbuild
@@ -12,6 +12,7 @@ mandatory-y += bug.h
mandatory-y += cacheflush.h
mandatory-y += cfi.h
mandatory-y += checksum.h
+mandatory-y += cleanup.h
mandatory-y += compat.h
mandatory-y += current.h
mandatory-y += delay.h
diff --git a/include/asm-generic/cleanup.h b/include/asm-generic/cleanup.h
new file mode 100644
index 000000000000..616ae558638e
--- /dev/null
+++ b/include/asm-generic/cleanup.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_GENERIC_CLEANUP_H
+#define _ASM_GENERIC_CLEANUP_H
+
+#endif /* _ASM_GENERIC_CLEANUP_H */
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index 18209e191973..00e5ef7aa314 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -6,6 +6,8 @@
#include <linux/err.h>
#include <linux/args.h>
+#include <asm/cleanup.h>
+
/**
* DOC: scope-based cleanup helpers
*
@@ -312,9 +314,11 @@ static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
#define __DEFINE_CLASS_IS_CONDITIONAL(_name, _is_cond) \
static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
+#ifndef __GUARD_IS_ERR
#define __GUARD_IS_ERR(_ptr) \
({ unsigned long _rc = (__force unsigned long)(_ptr); \
unlikely((_rc-1) >= -(MAX_ERRNO+1)); })
+#endif
#define __DEFINE_GUARD_LOCK_PTR(_name, _exp) \
static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \
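For readers following along: "cmp %[val], %[var]" computes
_var - (-MAX_ERRNO), i.e. _var + MAX_ERRNO, purely for the flags, and
the "=@ccns" flag-output constraint sets _s when the sign flag ends up
clear. So _s is true exactly when the sum is non-negative, which per
the logic quoted above means NULL or an ERR_PTR() value; as a plain C
expression (my paraphrase) that is roughly

        unlikely((long)(_ptr) + MAX_ERRNO >= 0)

with the same user-pointer caveat that comes up later in the thread.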
On Tue, 13 May 2025 at 01:50, Peter Zijlstra <peterz@infradead.org> wrote:
>
> +#define __GUARD_IS_ERR(_ptr) \
> + ({ unsigned long _var = (__force unsigned long)(_ptr); \
> + bool _s; \
> + asm_inline volatile ("cmp %[val], %[var]" \
> + : "=@ccns" (_s) \
> + : [val] "i" (-MAX_ERRNO), \
> + [var] "r" (_var)); \
> + unlikely(_s); })
I think that this might be acceptable if it were some actual common operation.
But for just the conditional guard test, I think it's cute, but I
don't think it's really worth it.
Put another way: if we actually used this for IS_ERR_OR_NULL(), it
might be a worthwhile thing to look at. We have lots of those - some
of them in important core places.
Right now IS_ERR_OR_NULL() generates pretty disgusting code, with
clang doing things like this:
testq %rdi, %rdi
sete %al
cmpq $-4095, %rdi # imm = 0xF001
setae %cl
orb %al, %cl
je .LBB3_1
in order to avoid two jumps, while gcc generates that
testq %rdi, %rdi
je .L189
cmpq $-4096, %rdi
ja .L189
pattern.
Linus
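Those comparisons are easy to reproduce outside the kernel; a
throwaway harness along these lines (my sketch, not from the thread),
built with "gcc -O2 -S" and "clang -O2 -S", is enough to see what each
compiler emits:

#define MAX_ERRNO 4095

/* Mirrors IS_ERR_OR_NULL(): NULL, or in the top-4095 error range. */
static inline int is_err_or_null(const void *ptr)
{
        return !ptr || (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

extern void use(const void *ptr);

void foo(const void *ptr)
{
        if (is_err_or_null(ptr))
                return;
        use(ptr);
}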
On Tue, May 13, 2025 at 12:46:29PM -0700, Linus Torvalds wrote:
> On Tue, 13 May 2025 at 01:50, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > +#define __GUARD_IS_ERR(_ptr) \
> > + ({ unsigned long _var = (__force unsigned long)(_ptr); \
> > + bool _s; \
> > + asm_inline volatile ("cmp %[val], %[var]" \
> > + : "=@ccns" (_s) \
> > + : [val] "i" (-MAX_ERRNO), \
> > + [var] "r" (_var)); \
> > + unlikely(_s); })
>
> I think that this might be acceptable if it was some actual common operation.
>
> But for just the conditional guard test, I think it's cute, but I
> don't think it's really worth it.
It's actually every guard; the destructor is shared between unconditional
and conditional locks.
> Right now IS_ERR_OR_NULL() generates pretty disgusting code, with
> clang doing things like this:
>
> testq %rdi, %rdi
> sete %al
> cmpq $-4095, %rdi # imm = 0xF001
> setae %cl
> orb %al, %cl
> je .LBB3_1
Whee, that is creative indeed :-)
On Tue, May 13, 2025 at 12:46:29PM -0700, Linus Torvalds wrote:
> Right now IS_ERR_OR_NULL() generates pretty disgusting code, with
> clang doing things like this:
>
> testq %rdi, %rdi
> sete %al
> cmpq $-4095, %rdi # imm = 0xF001
> setae %cl
> orb %al, %cl
> je .LBB3_1
>
> in order to avoid two jumps, while gcc generates that
>
> testq %rdi, %rdi
> je .L189
> cmpq $-4096, %rdi
> ja .L189
>
> pattern.
FWIW (unsigned long)v - 1 >= (unsigned long)(-MAX_ERRNO-1) yields
leaq -1(%rdi), %rax
cmpq $-4097, %rax
ja .L4
from gcc and
leaq 4095(%rdi), %rax
cmpq $4095, %rax # imm = 0xFFF
ja .LBB0_1
from clang...
On Tue, May 13, 2025 at 09:06:19PM +0100, Al Viro wrote:
> FWIW (unsigned long)v - 1 >= (unsigned long)(-MAX_ERRNO-1) yields
>
>         leaq -1(%rdi), %rax
>         cmpq $-4097, %rax
>         ja .L4
>
> from gcc and
>
>         leaq 4095(%rdi), %rax
>         cmpq $4095, %rax # imm = 0xFFF
>         ja .LBB0_1
>
> from clang...

Nevermind - should've read back through the thread for context.
Sorry.
On Tue, 13 May 2025 at 13:31, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Nevermind - should've read back through the thread for context.
Well, your comment did make me test what I can make gcc generate..
I still can't get gcc to do
cmpq $-4095,%rdi
jns .L189
for IS_ERR_OR_NULL() however hard I try.
The best I *can* get both gcc and clang to at least do
movq %rdi, %rcx
addq $4095, %rcx
jns .L189
which I suspect is much better than the "lea+cmpq", because a pure
register move is handled by the renaming and has no cost aside from
the front end (ie decoding).
So
#define IS_ERR_OR_NULL(ptr) (MAX_ERRNO + (long)(ptr) >= 0)
does seem to be potentially something we could use, and maybe we could
push the compiler people to realize that their current code generation
is bad.
Of course, it doesn't actually *really* work for IS_ERR_OR_NULL(),
because it gets the wrong results for user pointers, and while the
*common* case for the kernel is to test various kernel pointers, the
user pointer case does happen (ie mmap() and friends).
IOW, it's not actually correct in that form, I just wanted to see what
we could do for some limited form of this common pattern.
Anyway, I am surprised that neither gcc nor clang seems to have
realized that you can turn an "add" that just checks the condition
codes for sign or equality into a "cmp" of the negative value.
It seems such a trivial and obvious thing to do. But maybe I'm
confused and am missing something.
Linus
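To make the user-pointer failure concrete (my arithmetic, not from the
mail):

        MAX_ERRNO + (long)0x00007f0000000000 = 0x00007f0000000fff >= 0  /* valid mmap() pointer: false positive */
        MAX_ERRNO + (long)0xffff888000000000 < 0                        /* kernel pointer: correctly passes */
        MAX_ERRNO + (long)ERR_PTR(-EINTR)    = 4091 >= 0                /* error value: correctly caught */

So the single signed compare is only sound where the value is known to
be a kernel address (always negative on x86-64), which covers the guard
case but not the general IS_ERR_OR_NULL() case.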
On Tue, 13 May 2025 14:28:37 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, 13 May 2025 at 13:31, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Nevermind - should've read back through the thread for context.
>
> Well, your comment did make me test what I can make gcc generate..
>
> I still can't get gcc to do
>
>         cmpq $-4095,%rdi
>         jns .L189
>
> for IS_ERR_OR_NULL() however hard I try.
>
> The best I *can* get both gcc and clang to at least do
>
>         movq %rdi, %rcx
>         addq $4095, %rcx
>         jns .L189
>
> which I suspect is much better than the "lea+cmpq", because a pure
> register move is handled by the renaming and has no cost aside from
> the front end (ie decoding).
>
> So
>
>         #define IS_ERR_OR_NULL(ptr) (MAX_ERRNO + (long)(ptr) >= 0)
>
> does seem to be potentially something we could use, and maybe we could
> push the compiler people to realize that their current code generation
> is bad.
>
> Of course, it doesn't actually *really* work for IS_ERR_OR_NULL(),
> because it gets the wrong results for user pointers, and while the
> *common* case for the kernel is to test various kernel pointers, the
> user pointer case does happen (ie mmap() and friends).
>
> IOW, it's not actually correct in that form, I just wanted to see what
> we could do for some limited form of this common pattern.
>
> Anyway, I am surprised that neither gcc nor clang seems to have
> realized that you can turn an "add" that just checks the condition
> codes for sign or equality into a "cmp" of the negative value.
>
> It seems such a trivial and obvious thing to do. But maybe I'm
> confused and am missing something.

Doing the signed compare (long)(ptr) >= -MAX_ERRNO generates cmp + jl
(sign != overflow), which is a better test.

To let user pointers through it might be possible to generate:

        leaq -1(%reg), %reg
        cmpq $-4097, %reg
        leaq 1(%reg), %reg
        ja label

which trades a register for an instruction. It wouldn't be too bad if
the second 'leaq' is moved to the branch target - especially for any
CPU that doesn't have an inc/dec that leaves the flags alone.

        David