Capability analysis is a C language extension, which enables statically
checking that user-definable "capabilities" are acquired and released where
required. An obvious application is lock-safety checking for the kernel's
various synchronization primitives (each of which represents a "capability"),
and checking that locking rules are not violated.
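As a rough standalone sketch of what the analysis checks, here is a toy example using Clang's raw attributes directly rather than the kernel macros this series introduces (the toy_mutex type and all names are made up for illustration):

```c
/* Fall back to no-op macros on compilers without the attributes. */
#if defined(__clang__)
#define CAPABILITY(x)  __attribute__((capability(x)))
#define GUARDED_BY(x)  __attribute__((guarded_by(x)))
#define ACQUIRE(x)     __attribute__((acquire_capability(x)))
#define RELEASE(x)     __attribute__((release_capability(x)))
#else
#define CAPABILITY(x)
#define GUARDED_BY(x)
#define ACQUIRE(x)
#define RELEASE(x)
#endif

/* A type annotated as a capability; holding it permits guarded accesses. */
struct CAPABILITY("mutex") toy_mutex { int taken; };

static struct toy_mutex m;
static int counter GUARDED_BY(m);	/* may only be accessed with m held */

/* Real lock implementations opt their own bodies out of the analysis. */
static void toy_lock(struct toy_mutex *l) ACQUIRE(*l);
static void toy_unlock(struct toy_mutex *l) RELEASE(*l);
static void toy_lock(struct toy_mutex *l) { l->taken = 1; }
static void toy_unlock(struct toy_mutex *l) { l->taken = 0; }

int increment(void)
{
	int v;

	toy_lock(&m);
	v = ++counter;	/* OK: m is held on this path */
	toy_unlock(&m);
	/* Incrementing counter here, without m, would trigger -Wthread-safety. */
	return v;
}
```

With `clang -Wthread-safety`, moving the `++counter` outside the locked region produces a compile-time warning; other compilers just see the no-op macros.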
Clang originally called the feature "Thread Safety Analysis" [1], with
some terminology still using the thread-safety-analysis-only names. This
was later changed and the feature became more flexible, gaining the
ability to define custom "capabilities". Its foundations can be found in
"capability systems" [2], used to specify the permissibility of
operations to depend on some capability being held (or not held).
Because the feature is not just able to express capabilities related to
synchronization primitives, the naming chosen for the kernel departs
from Clang's initial "Thread Safety" nomenclature and refers to the
feature as "Capability Analysis" to avoid confusion. The implementation
still makes references to the older terminology in some places, such as
`-Wthread-safety` being the option that enables the warnings, which
also still appears in diagnostic messages.
Enabling capability analysis can be seen as enabling a dialect of Linux
C with a Capability System.
Additional details can be found in the added kernel-doc documentation.
[1] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
[2] https://www.cs.cornell.edu/talc/papers/capabilities.pdf
=== Development Approach ===
Prior art exists in the form of Sparse's context tracking. Locking
annotations on functions exist, so the concept of analyzing locking rules
is not foreign to the kernel's codebase.
However, Clang's analysis is more complete than Sparse's, with the
typical trade-off in static analysis: improved completeness comes at
the cost of more possible false positives or additional annotations
required by the programmer. Numerous options exist to disable the
analysis for certain code, or opt certain code out of it.
This series initially aimed to retain compatibility with Sparse, which
can provide tree-wide analysis of a subset of the capability analysis
introduced, but it was later decided to drop Sparse compatibility. For
the most part, the new (and old) keywords used for annotations remain
the same, and many of the pre-existing annotations remain valid.
One big question is how to enable this feature, given we end up with a
new dialect of C -- two approaches have been considered:
A. Tree-wide all-or-nothing approach. This approach requires tree-wide
changes, adding annotations or selective opt-outs. Making additional
primitives capability-enabled increases churn, esp. where maintainers
are unaware of the feature's existence and how to use it.
Because we can't change the programming language (even if from one C
dialect to another) of the kernel overnight, a different approach might
cause less friction.
B. A selective, incremental, and much less intrusive approach.
Maintainers of subsystems opt in their modules or directories into
"capability analysis" (via Makefile):
CAPABILITY_ANALYSIS_foo.o := y # foo.o only
CAPABILITY_ANALYSIS := y # all TUs
Most (eventually all) synchronization primitives and more
capabilities (including ones that could track "irq disabled",
"preemption" disabled, etc.) could be supported.
The approach taken by this series is B. This ensures that only
subsystems where maintainers are willing to deal with any warnings are
opted-in. Introducing the feature can be done incrementally, without
large tree-wide changes and adding numerous opt-outs and annotations to
the majority of code.
Note: Bart Van Assche concurrently worked on enabling -Wthread-safety:
https://lore.kernel.org/all/20250206175114.1974171-1-bvanassche@acm.org/
Bart's work illustrates what it might take to go with approach A
(tree-wide, restricted to 'mutex' usage), and demonstrates that the
analysis finds real issues when applied to enough subsystems! We hope
this serves as motivation to eventually enable the analysis in as many
subsystems as possible, particularly subsystems that are not as easily
tested by CI systems and test robots.
=== Initial Uses ===
With this initial series, the following synchronization primitives are
supported: `raw_spinlock_t`, `spinlock_t`, `rwlock_t`, `mutex`,
`seqlock_t`, `bit_spinlock`, RCU, SRCU (`srcu_struct`), `rw_semaphore`,
`local_lock_t`, `ww_mutex`.
To demonstrate use of the feature on real kernel code, the series also
enables capability analysis for the following subsystems:
* mm/kfence
* kernel/kcov
* lib/stackdepot
* lib/rhashtable
* drivers/tty
* security/tomoyo
* crypto/
The initial benefit is static detection of locking-rule violations. As
more capabilities are added, we would see more static checking beyond
what regular C can provide, all while remaining easy (read: quick) to
use via the Clang compiler.
Note: The kernel already provides dynamic analysis tools Lockdep and
KCSAN for lock-safety checking and data-race detection respectively.
Unlike those, Clang's capability analysis is a compile-time static
analysis with no runtime impact. The static analysis complements
existing dynamic analysis tools, as it may catch some issues before
even getting into a running kernel, but is *not* a replacement for
whole-kernel testing with the dynamic analysis tools enabled!
=== Appendix ===
A Clang version that supports `-Wthread-safety-pointer` is recommended,
but not a strong dependency:
https://github.com/llvm/llvm-project/commit/de10e44b6fe7f3d3cfde3afd8e1222d251172ade
This series is also available at this Git tree:
https://git.kernel.org/pub/scm/linux/kernel/git/melver/linux.git/log/?h=cap-analysis/dev
=== Changelog ===
v2:
- Remove Sparse context tracking support - after the introduction of
Clang support, so that backports can skip removal of Sparse support.
- Remove __cond_lock() function-like helper.
- ww_mutex support.
- -Wthread-safety-addressof was reworked and committed in upstream
Clang as -Wthread-safety-pointer.
- Make __cond_acquires() and __cond_acquires_shared() take abstract
value, since compiler only cares about zero and non-zero.
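For illustration, only zero vs. non-zero matters because Clang's underlying try_acquire_capability attribute is defined that way; a sketch with the raw attributes (toy names, not the kernel's macros):

```c
#if defined(__clang__)
#define CAPABILITY(x)      __attribute__((capability(x)))
#define TRY_ACQUIRE(b, x)  __attribute__((try_acquire_capability(b, x)))
#define RELEASE(x)         __attribute__((release_capability(x)))
#else
#define CAPABILITY(x)
#define TRY_ACQUIRE(b, x)
#define RELEASE(x)
#endif

struct CAPABILITY("mutex") toy_mutex { int taken; };

static struct toy_mutex m;

/*
 * The first attribute argument only distinguishes zero from non-zero:
 * non-zero-on-success for trylock-style APIs, or zero-on-success for
 * mutex_lock_interruptible()-style APIs.
 */
static int toy_trylock(struct toy_mutex *l) TRY_ACQUIRE(1, *l);
static void toy_unlock(struct toy_mutex *l) RELEASE(*l);

static int toy_trylock(struct toy_mutex *l)
{
	if (l->taken)
		return 0;
	l->taken = 1;
	return 1;
}

static void toy_unlock(struct toy_mutex *l) { l->taken = 0; }

int try_work(void)
{
	if (!toy_trylock(&m))
		return -1;	/* analysis knows m is NOT held here */
	/* ... analysis knows m IS held on this path ... */
	toy_unlock(&m);
	return 0;
}
```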
- Rename __var_guarded_by to simply __guarded_by. Initially the idea
was to be explicit about if the variable itself or the pointed-to
data is guarded, but in the long-term, making this shorter might be
better.
- Likewise rename __ref_guarded_by to __pt_guarded_by.
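The semantic difference between the two can be sketched with Clang's underlying guarded_by/pt_guarded_by attributes, which the kernel's __guarded_by/__pt_guarded_by wrap (the toy_mutex type and all names below are illustrative):

```c
#if defined(__clang__)
#define CAPABILITY(x)     __attribute__((capability(x)))
#define GUARDED_BY(x)     __attribute__((guarded_by(x)))
#define PT_GUARDED_BY(x)  __attribute__((pt_guarded_by(x)))
#define ACQUIRE(x)        __attribute__((acquire_capability(x)))
#define RELEASE(x)        __attribute__((release_capability(x)))
#else
#define CAPABILITY(x)
#define GUARDED_BY(x)
#define PT_GUARDED_BY(x)
#define ACQUIRE(x)
#define RELEASE(x)
#endif

struct CAPABILITY("mutex") toy_mutex { int taken; };

static struct toy_mutex m;
static int backing GUARDED_BY(m) = 42;		/* the int itself requires m */
static int *stats PT_GUARDED_BY(m) = &backing;	/* only *stats requires m */

static void toy_lock(struct toy_mutex *l) ACQUIRE(*l);
static void toy_unlock(struct toy_mutex *l) RELEASE(*l);
static void toy_lock(struct toy_mutex *l) { l->taken = 1; }
static void toy_unlock(struct toy_mutex *l) { l->taken = 0; }

int *peek_stats(void)
{
	return stats;	/* OK without m: the pointer value is not guarded */
}

int read_stats(void)
{
	int v;

	toy_lock(&m);
	v = *stats;	/* dereferencing stats requires holding m */
	toy_unlock(&m);
	return v;
}
```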
- Introduce common header warning suppressions - this is a better
solution than guarding header inclusions with disable_ +
enable_capability_analysis(). Header suppressions are disabled when
selecting CONFIG_WARN_CAPABILITY_ANALYSIS_ALL=y. This bumps the
minimum Clang version required to 20+.
- Make the data_race() macro imply disabled capability analysis.
Writing capability_unsafe(data_race(..)) is unnecessarily verbose
and data_race() on its own already indicates something subtly unsafe
is happening. This change was made after analysis of a finding in
security/tomoyo.
- Enable analysis in the following subsystems as additional examples
of larger subsystems. Where it was obvious, the __guarded_by
attribute was added to lock-guarded variables to improve coverage.
* drivers/tty
* security/tomoyo
* crypto/
RFC v1: https://lore.kernel.org/lkml/20250206181711.1902989-1-elver@google.com
Marco Elver (34):
compiler_types: Move lock checking attributes to
compiler-capability-analysis.h
compiler-capability-analysis: Add infrastructure for Clang's
capability analysis
compiler-capability-analysis: Add test stub
Documentation: Add documentation for Compiler-Based Capability
Analysis
checkpatch: Warn about capability_unsafe() without comment
cleanup: Basic compatibility with capability analysis
lockdep: Annotate lockdep assertions for capability analysis
locking/rwlock, spinlock: Support Clang's capability analysis
compiler-capability-analysis: Change __cond_acquires to take return
value
locking/mutex: Support Clang's capability analysis
locking/seqlock: Support Clang's capability analysis
bit_spinlock: Include missing <asm/processor.h>
bit_spinlock: Support Clang's capability analysis
rcu: Support Clang's capability analysis
srcu: Support Clang's capability analysis
kref: Add capability-analysis annotations
locking/rwsem: Support Clang's capability analysis
locking/local_lock: Include missing headers
locking/local_lock: Support Clang's capability analysis
locking/ww_mutex: Support Clang's capability analysis
debugfs: Make debugfs_cancellation a capability struct
compiler-capability-analysis: Remove Sparse support
compiler-capability-analysis: Remove __cond_lock() function-like
helper
compiler-capability-analysis: Introduce header suppressions
compiler: Let data_race() imply disabled capability analysis
kfence: Enable capability analysis
kcov: Enable capability analysis
stackdepot: Enable capability analysis
rhashtable: Enable capability analysis
printk: Move locking annotation to printk.c
drivers/tty: Enable capability analysis for core files
security/tomoyo: Enable capability analysis
crypto: Enable capability analysis
MAINTAINERS: Add entry for Capability Analysis
.../dev-tools/capability-analysis.rst | 148 +++++
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/sparse.rst | 19 -
Documentation/mm/process_addrs.rst | 6 +-
MAINTAINERS | 11 +
Makefile | 1 +
crypto/Makefile | 2 +
crypto/algapi.c | 2 +
crypto/api.c | 1 +
crypto/crypto_engine.c | 2 +-
crypto/drbg.c | 5 +
crypto/internal.h | 2 +-
crypto/proc.c | 3 +
crypto/scompress.c | 8 +-
.../net/wireless/intel/iwlwifi/iwl-trans.c | 4 +-
.../net/wireless/intel/iwlwifi/iwl-trans.h | 6 +-
.../wireless/intel/iwlwifi/pcie/internal.h | 5 +-
.../net/wireless/intel/iwlwifi/pcie/trans.c | 4 +-
drivers/tty/Makefile | 3 +
drivers/tty/n_tty.c | 16 +
drivers/tty/pty.c | 1 +
drivers/tty/sysrq.c | 1 +
drivers/tty/tty.h | 8 +-
drivers/tty/tty_buffer.c | 8 +-
drivers/tty/tty_io.c | 12 +-
drivers/tty/tty_ioctl.c | 2 +-
drivers/tty/tty_ldisc.c | 35 +-
drivers/tty/tty_ldsem.c | 2 +
drivers/tty/tty_mutex.c | 4 +
drivers/tty/tty_port.c | 2 +
fs/dlm/lock.c | 2 +-
include/crypto/internal/engine.h | 2 +-
include/linux/bit_spinlock.h | 24 +-
include/linux/cleanup.h | 18 +-
include/linux/compiler-capability-analysis.h | 368 ++++++++++++
include/linux/compiler.h | 2 +
include/linux/compiler_types.h | 18 +-
include/linux/console.h | 4 +-
include/linux/debugfs.h | 12 +-
include/linux/kref.h | 2 +
include/linux/list_bl.h | 2 +
include/linux/local_lock.h | 18 +-
include/linux/local_lock_internal.h | 43 +-
include/linux/lockdep.h | 12 +-
include/linux/mm.h | 33 +-
include/linux/mutex.h | 29 +-
include/linux/mutex_types.h | 4 +-
include/linux/rcupdate.h | 86 +--
include/linux/refcount.h | 6 +-
include/linux/rhashtable.h | 14 +-
include/linux/rwlock.h | 22 +-
include/linux/rwlock_api_smp.h | 43 +-
include/linux/rwlock_rt.h | 44 +-
include/linux/rwlock_types.h | 10 +-
include/linux/rwsem.h | 56 +-
include/linux/sched/signal.h | 14 +-
include/linux/seqlock.h | 24 +
include/linux/seqlock_types.h | 5 +-
include/linux/spinlock.h | 64 +-
include/linux/spinlock_api_smp.h | 34 +-
include/linux/spinlock_api_up.h | 112 +++-
include/linux/spinlock_rt.h | 37 +-
include/linux/spinlock_types.h | 10 +-
include/linux/spinlock_types_raw.h | 5 +-
include/linux/srcu.h | 61 +-
include/linux/tty.h | 14 +-
include/linux/tty_flip.h | 4 +-
include/linux/tty_ldisc.h | 19 +-
include/linux/ww_mutex.h | 21 +-
kernel/Makefile | 2 +
kernel/kcov.c | 36 +-
kernel/printk/printk.c | 2 +
kernel/signal.c | 4 +-
kernel/time/posix-timers.c | 10 +-
lib/Kconfig.debug | 45 ++
lib/Makefile | 6 +
lib/dec_and_lock.c | 8 +-
lib/rhashtable.c | 5 +-
lib/stackdepot.c | 20 +-
lib/test_capability-analysis.c | 548 ++++++++++++++++++
mm/kfence/Makefile | 2 +
mm/kfence/core.c | 20 +-
mm/kfence/kfence.h | 14 +-
mm/kfence/report.c | 4 +-
mm/memory.c | 4 +-
mm/pgtable-generic.c | 19 +-
net/ipv4/tcp_sigpool.c | 2 +-
scripts/Makefile.capability-analysis | 11 +
scripts/Makefile.lib | 10 +
scripts/capability-analysis-suppression.txt | 32 +
scripts/checkpatch.pl | 8 +
security/tomoyo/Makefile | 2 +
security/tomoyo/common.c | 52 +-
security/tomoyo/common.h | 77 +--
security/tomoyo/domain.c | 1 +
security/tomoyo/environ.c | 1 +
security/tomoyo/file.c | 5 +
security/tomoyo/gc.c | 28 +-
security/tomoyo/mount.c | 2 +
security/tomoyo/network.c | 3 +
tools/include/linux/compiler_types.h | 2 -
101 files changed, 2086 insertions(+), 521 deletions(-)
create mode 100644 Documentation/dev-tools/capability-analysis.rst
create mode 100644 include/linux/compiler-capability-analysis.h
create mode 100644 lib/test_capability-analysis.c
create mode 100644 scripts/Makefile.capability-analysis
create mode 100644 scripts/capability-analysis-suppression.txt
--
2.48.1.711.g2feabab25a-goog
Right, so since this is all awesome, I figured I should try and have it
compile kernel/sched/, see how far I get.
I know I can't use the __guarded_by() things, they're just too primitive
for that I need, but I figured I should try and have it track the
lock/unlock thingies at least.
I've had to rework how the GUARD bits work, since some guards are
defined on objects that contain locks, rather than the locks themselves.
We've also had discussions on IRC about all the things I've run across,
like needing to be able to reference the return object for things like:
struct sighand_struct *
lock_task_sighand(struct task_struct *task, unsigned long *flags)
__cond_acquires(nonnull, return->siglock);
The patch below tried working around that by doing:
__cond_acquires(nonnull, task->sighand->siglock)
But I'm not quite sure it does the right thing, and it definitely
doesn't always work for other such cases. Notably, I've managed to ICE
clang (will send you the details).
This is on top of queue.git/locking/core, but I imagine that tree won't
live very long at this point.
It is basically tip/locking/core with your patches run through:
sed -e 's/__acquire/__assume/g'
before applying them.
---
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 248416ecd01c..d27607d9c2dc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -945,6 +945,7 @@ static inline unsigned int blk_boundary_sectors_left(sector_t offset,
*/
static inline struct queue_limits
queue_limits_start_update(struct request_queue *q)
+ __acquires(q->limits_lock)
{
mutex_lock(&q->limits_lock);
return q->limits;
@@ -965,6 +966,7 @@ int blk_validate_limits(struct queue_limits *lim);
* starting update.
*/
static inline void queue_limits_cancel_update(struct request_queue *q)
+ __releases(q->limits_lock)
{
mutex_unlock(&q->limits_lock);
}
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f3f50e29d639..2b6d8cc1f144 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2196,6 +2196,7 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
static __always_inline u32
bpf_prog_run_array_uprobe(const struct bpf_prog_array *array,
const void *ctx, bpf_prog_run_fn run_prog)
+ __no_capability_analysis /* too stupid for cond rcu lock */
{
const struct bpf_prog_array_item *item;
const struct bpf_prog *prog;
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index f8ef47f8a634..cbda68703858 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -364,11 +364,13 @@ static inline void cgroup_put(struct cgroup *cgrp)
extern struct mutex cgroup_mutex;
static inline void cgroup_lock(void)
+ __acquires(cgroup_mutex)
{
mutex_lock(&cgroup_mutex);
}
static inline void cgroup_unlock(void)
+ __releases(cgroup_mutex)
{
mutex_unlock(&cgroup_mutex);
}
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index 2ce479b1873e..69da4f20ec57 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -249,12 +249,13 @@ static inline _type class_##_name##_constructor(_init_args) \
__no_capability_analysis \
{ _type t = _init; return t; }
-#define EXTEND_CLASS(_name, ext, ctor_attrs, _init, _init_args...) \
+#define EXTEND_CLASS(_name, ext, _init, _init_args...) \
+typedef lock_##_name##_t lock_##_name##ext##_t; \
typedef class_##_name##_t class_##_name##ext##_t; \
static inline void class_##_name##ext##_destructor(class_##_name##_t *p)\
{ class_##_name##_destructor(p); } \
static inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
- __no_capability_analysis ctor_attrs \
+ __no_capability_analysis \
{ class_##_name##_t t = _init; return t; }
#define CLASS(_name, var) \
@@ -302,7 +303,7 @@ static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
#define DEFINE_GUARD_COND(_name, _ext, _condlock) \
__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
- EXTEND_CLASS(_name, _ext,, \
+ EXTEND_CLASS(_name, _ext, \
({ void *_t = _T; if (_T && !(_condlock)) _t = NULL; _t; }), \
class_##_name##_t _T) \
static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \
@@ -368,6 +369,7 @@ _label: \
*/
#define __DEFINE_UNLOCK_GUARD(_name, _type, _unlock, ...) \
+typedef _type lock_##_name##_t; \
typedef struct { \
_type *lock; \
__VA_ARGS__; \
@@ -387,7 +389,7 @@ static inline void *class_##_name##_lock_ptr(class_##_name##_t *_T) \
#define __DEFINE_LOCK_GUARD_1(_name, _type, _lock) \
static inline class_##_name##_t class_##_name##_constructor(_type *l) \
- __no_capability_analysis __assumes_cap(l) \
+ __no_capability_analysis \
{ \
class_##_name##_t _t = { .lock = l }, *_T = &_t; \
_lock; \
@@ -408,6 +410,10 @@ static inline class_##_name##_t class_##_name##_constructor(void) \
static inline class_##_name##_t class_##_name##_constructor(void) _lock;\
static inline void class_##_name##_destructor(class_##_name##_t *_T) _unlock
+#define DECLARE_LOCK_GUARD_1_ATTRS(_name, _lock, _unlock) \
+static inline class_##_name##_t class_##_name##_constructor(lock_##_name##_t *_T) _lock;\
+static inline void class_##_name##_destructor(class_##_name##_t *_T) _unlock
+
#define DEFINE_LOCK_GUARD_1(_name, _type, _lock, _unlock, ...) \
__DEFINE_CLASS_IS_CONDITIONAL(_name, false); \
__DEFINE_UNLOCK_GUARD(_name, _type, _unlock, __VA_ARGS__) \
@@ -420,7 +426,7 @@ __DEFINE_LOCK_GUARD_0(_name, _lock)
#define DEFINE_LOCK_GUARD_1_COND(_name, _ext, _condlock) \
__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
- EXTEND_CLASS(_name, _ext, __assumes_cap(l), \
+ EXTEND_CLASS(_name, _ext, \
({ class_##_name##_t _t = { .lock = l }, *_T = &_t;\
if (_T->lock && !(_condlock)) _T->lock = NULL; \
_t; }), \
@@ -428,5 +434,4 @@ __DEFINE_LOCK_GUARD_0(_name, _lock)
static inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \
{ return class_##_name##_lock_ptr(_T); }
-
#endif /* _LINUX_CLEANUP_H */
diff --git a/include/linux/device.h b/include/linux/device.h
index 80a5b3268986..283fb85d96c8 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1026,21 +1026,25 @@ static inline bool dev_pm_test_driver_flags(struct device *dev, u32 flags)
}
static inline void device_lock(struct device *dev)
+ __acquires(dev->mutex)
{
mutex_lock(&dev->mutex);
}
static inline int device_lock_interruptible(struct device *dev)
+ __cond_acquires(0, dev->mutex)
{
return mutex_lock_interruptible(&dev->mutex);
}
static inline int device_trylock(struct device *dev)
+ __cond_acquires(true, dev->mutex)
{
return mutex_trylock(&dev->mutex);
}
static inline void device_unlock(struct device *dev)
+ __releases(dev->mutex)
{
mutex_unlock(&dev->mutex);
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2c3b2f8a621f..52b69fed3e65 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -540,31 +540,37 @@ static inline bool mapping_tagged(struct address_space *mapping, xa_mark_t tag)
}
static inline void i_mmap_lock_write(struct address_space *mapping)
+ __acquires(mapping->i_mmap_rwsem)
{
down_write(&mapping->i_mmap_rwsem);
}
static inline int i_mmap_trylock_write(struct address_space *mapping)
+ __cond_acquires(true, mapping->i_mmap_rwsem)
{
return down_write_trylock(&mapping->i_mmap_rwsem);
}
static inline void i_mmap_unlock_write(struct address_space *mapping)
+ __releases(mapping->i_mmap_rwsem)
{
up_write(&mapping->i_mmap_rwsem);
}
static inline int i_mmap_trylock_read(struct address_space *mapping)
+ __cond_acquires_shared(true, mapping->i_mmap_rwsem)
{
return down_read_trylock(&mapping->i_mmap_rwsem);
}
static inline void i_mmap_lock_read(struct address_space *mapping)
+ __acquires_shared(mapping->i_mmap_rwsem)
{
down_read(&mapping->i_mmap_rwsem);
}
static inline void i_mmap_unlock_read(struct address_space *mapping)
+ __releases_shared(mapping->i_mmap_rwsem)
{
up_read(&mapping->i_mmap_rwsem);
}
@@ -873,31 +879,37 @@ enum inode_i_mutex_lock_class
};
static inline void inode_lock(struct inode *inode)
+ __acquires(inode->i_rwsem)
{
down_write(&inode->i_rwsem);
}
static inline void inode_unlock(struct inode *inode)
+ __releases(inode->i_rwsem)
{
up_write(&inode->i_rwsem);
}
static inline void inode_lock_shared(struct inode *inode)
+ __acquires_shared(inode->i_rwsem)
{
down_read(&inode->i_rwsem);
}
static inline void inode_unlock_shared(struct inode *inode)
+ __releases_shared(inode->i_rwsem)
{
up_read(&inode->i_rwsem);
}
static inline int inode_trylock(struct inode *inode)
+ __cond_acquires(true, inode->i_rwsem)
{
return down_write_trylock(&inode->i_rwsem);
}
static inline int inode_trylock_shared(struct inode *inode)
+ __cond_acquires_shared(true, inode->i_rwsem)
{
return down_read_trylock(&inode->i_rwsem);
}
@@ -908,38 +920,45 @@ static inline int inode_is_locked(struct inode *inode)
}
static inline void inode_lock_nested(struct inode *inode, unsigned subclass)
+ __acquires(inode->i_rwsem)
{
down_write_nested(&inode->i_rwsem, subclass);
}
static inline void inode_lock_shared_nested(struct inode *inode, unsigned subclass)
+ __acquires_shared(inode->i_rwsem)
{
down_read_nested(&inode->i_rwsem, subclass);
}
static inline void filemap_invalidate_lock(struct address_space *mapping)
+ __acquires(mapping->invalidate_lock)
{
down_write(&mapping->invalidate_lock);
}
static inline void filemap_invalidate_unlock(struct address_space *mapping)
+ __releases(mapping->invalidate_lock)
{
up_write(&mapping->invalidate_lock);
}
static inline void filemap_invalidate_lock_shared(struct address_space *mapping)
+ __acquires_shared(mapping->invalidate_lock)
{
down_read(&mapping->invalidate_lock);
}
static inline int filemap_invalidate_trylock_shared(
struct address_space *mapping)
+ __cond_acquires_shared(true, mapping->invalidate_lock)
{
return down_read_trylock(&mapping->invalidate_lock);
}
static inline void filemap_invalidate_unlock_shared(
struct address_space *mapping)
+ __releases_shared(mapping->invalidate_lock)
{
up_read(&mapping->invalidate_lock);
}
@@ -3873,6 +3892,7 @@ static inline bool dir_emit_dots(struct file *file, struct dir_context *ctx)
return true;
}
static inline bool dir_relax(struct inode *inode)
+ __releases(inode->i_rwsem) __acquires(inode->i_rwsem)
{
inode_unlock(inode);
inode_lock(inode);
@@ -3880,6 +3900,7 @@ static inline bool dir_relax(struct inode *inode)
}
static inline bool dir_relax_shared(struct inode *inode)
+ __releases_shared(inode->i_rwsem) __acquires_shared(inode->i_rwsem)
{
inode_unlock_shared(inode);
inode_lock_shared(inode);
diff --git a/include/linux/idr.h b/include/linux/idr.h
index da5f5fa4a3a6..c157f872240c 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -170,6 +170,7 @@ static inline bool idr_is_empty(const struct idr *idr)
* function. See idr_preload() for details.
*/
static inline void idr_preload_end(void)
+ __releases(radix_tree_preloads.lock)
{
local_unlock(&radix_tree_preloads.lock);
}
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 8cd9327e4e78..74ef6df7549c 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -236,8 +236,8 @@ extern void enable_percpu_irq(unsigned int irq, unsigned int type);
extern bool irq_percpu_is_enabled(unsigned int irq);
extern void irq_wake_thread(unsigned int irq, void *dev_id);
-DEFINE_LOCK_GUARD_1(disable_irq, int,
- disable_irq(*_T->lock), enable_irq(*_T->lock))
+DEFINE_GUARD(disable_irq, int,
+ disable_irq(_T), enable_irq(_T));
extern void disable_nmi_nosync(unsigned int irq);
extern void disable_percpu_nmi(unsigned int irq);
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 6e74b8254d9b..9afbea99feea 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1512,17 +1512,20 @@ static inline struct lruvec *parent_lruvec(struct lruvec *lruvec)
}
static inline void unlock_page_lruvec(struct lruvec *lruvec)
+ __releases(lruvec->lru_lock)
{
spin_unlock(&lruvec->lru_lock);
}
static inline void unlock_page_lruvec_irq(struct lruvec *lruvec)
+ __releases(lruvec->lru_lock)
{
spin_unlock_irq(&lruvec->lru_lock);
}
static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec,
- unsigned long flags)
+ unsigned long flags)
+ __releases(lruvec->lru_lock)
{
spin_unlock_irqrestore(&lruvec->lru_lock, flags);
}
@@ -1537,7 +1540,8 @@ static inline bool folio_matches_lruvec(struct folio *folio,
/* Don't lock again iff page's lruvec locked */
static inline struct lruvec *folio_lruvec_relock_irq(struct folio *folio,
- struct lruvec *locked_lruvec)
+ struct lruvec *locked_lruvec)
+ __no_capability_analysis /* no __cond_releases */
{
if (locked_lruvec) {
if (folio_matches_lruvec(folio, locked_lruvec))
@@ -1551,7 +1555,9 @@ static inline struct lruvec *folio_lruvec_relock_irq(struct folio *folio,
/* Don't lock again iff folio's lruvec locked */
static inline void folio_lruvec_relock_irqsave(struct folio *folio,
- struct lruvec **lruvecp, unsigned long *flags)
+ struct lruvec **lruvecp,
+ unsigned long *flags)
+ __no_capability_analysis
{
if (*lruvecp) {
if (folio_matches_lruvec(folio, *lruvecp))
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index eaac5ae8c05c..aebdcaa83b86 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -99,18 +99,22 @@ bool mhp_supports_memmap_on_memory(void);
* can't be changed while pgdat_resize_lock() held.
*/
static inline unsigned zone_span_seqbegin(struct zone *zone)
+ __acquires_shared(zone->span_seqlock)
{
return read_seqbegin(&zone->span_seqlock);
}
static inline int zone_span_seqretry(struct zone *zone, unsigned iv)
+ __releases_shared(zone->span_seqlock)
{
return read_seqretry(&zone->span_seqlock, iv);
}
static inline void zone_span_writelock(struct zone *zone)
+ __acquires(zone->span_seqlock)
{
write_seqlock(&zone->span_seqlock);
}
static inline void zone_span_writeunlock(struct zone *zone)
+ __releases(zone->span_seqlock)
{
write_sequnlock(&zone->span_seqlock);
}
@@ -178,11 +182,13 @@ void mem_hotplug_done(void);
/* See kswapd_is_running() */
static inline void pgdat_kswapd_lock(pg_data_t *pgdat)
+ __acquires(pgdat->kswapd_lock)
{
mutex_lock(&pgdat->kswapd_lock);
}
static inline void pgdat_kswapd_unlock(pg_data_t *pgdat)
+ __releases(pgdat->kswapd_lock)
{
mutex_unlock(&pgdat->kswapd_lock);
}
@@ -252,11 +258,13 @@ struct range arch_get_mappable_range(void);
*/
static inline
void pgdat_resize_lock(struct pglist_data *pgdat, unsigned long *flags)
+ __acquires(pgdat->node_size_lock)
{
spin_lock_irqsave(&pgdat->node_size_lock, *flags);
}
static inline
void pgdat_resize_unlock(struct pglist_data *pgdat, unsigned long *flags)
+ __releases(pgdat->node_size_lock)
{
spin_unlock_irqrestore(&pgdat->node_size_lock, *flags);
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index dbf4eb414bd1..3979b6546082 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -703,6 +703,7 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
* using mmap_lock. The function should never yield false unlocked result.
*/
static inline bool vma_start_read(struct vm_area_struct *vma)
+ __cond_acquires_shared(true, vma->vm_lock->lock)
{
/*
* Check before locking. A race might cause false locked result.
@@ -736,6 +737,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
}
static inline void vma_end_read(struct vm_area_struct *vma)
+ __releases_shared(vma->vm_lock->lock)
{
rcu_read_lock(); /* keeps vma alive till the end of up_read */
up_read(&vma->vm_lock->lock);
@@ -800,6 +802,7 @@ static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
}
static inline void release_fault_lock(struct vm_fault *vmf)
+ __no_capability_analysis
{
if (vmf->flags & FAULT_FLAG_VMA_LOCK)
vma_end_read(vmf->vma);
@@ -3016,7 +3019,8 @@ static inline bool pagetable_pte_ctor(struct ptdesc *ptdesc)
return true;
}
-pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp);
+pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
+ __cond_acquires_shared(nonnull, RCU);
static inline pte_t *pte_offset_map(pmd_t *pmd, unsigned long addr)
{
@@ -3092,6 +3096,7 @@ static inline bool pmd_ptlock_init(struct ptdesc *ptdesc) { return true; }
#endif
static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
+ __no_capability_analysis
{
spinlock_t *ptl = pmd_lockptr(mm, pmd);
spin_lock(ptl);
@@ -3119,6 +3124,7 @@ static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
}
static inline spinlock_t *pud_lock(struct mm_struct *mm, pud_t *pud)
+ __no_capability_analysis
{
spinlock_t *ptl = pud_lockptr(mm, pud);
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 45a21faa3ff6..17aa837362ae 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -129,6 +129,7 @@ static inline void mmap_init_lock(struct mm_struct *mm)
}
static inline void mmap_write_lock(struct mm_struct *mm)
+ __acquires(mm->mmap_lock)
{
__mmap_lock_trace_start_locking(mm, true);
down_write(&mm->mmap_lock);
@@ -137,6 +138,7 @@ static inline void mmap_write_lock(struct mm_struct *mm)
}
static inline void mmap_write_lock_nested(struct mm_struct *mm, int subclass)
+ __acquires(mm->mmap_lock)
{
__mmap_lock_trace_start_locking(mm, true);
down_write_nested(&mm->mmap_lock, subclass);
@@ -145,6 +147,7 @@ static inline void mmap_write_lock_nested(struct mm_struct *mm, int subclass)
}
static inline int mmap_write_lock_killable(struct mm_struct *mm)
+ __cond_acquires(0, mm->mmap_lock)
{
int ret;
@@ -171,6 +174,7 @@ static inline void vma_end_write_all(struct mm_struct *mm)
}
static inline void mmap_write_unlock(struct mm_struct *mm)
+ __releases(mm->mmap_lock)
{
__mmap_lock_trace_released(mm, true);
vma_end_write_all(mm);
@@ -178,6 +182,7 @@ static inline void mmap_write_unlock(struct mm_struct *mm)
}
static inline void mmap_write_downgrade(struct mm_struct *mm)
+ __releases(mm->mmap_lock) __acquires_shared(mm->mmap_lock)
{
__mmap_lock_trace_acquire_returned(mm, false, true);
vma_end_write_all(mm);
@@ -185,6 +190,7 @@ static inline void mmap_write_downgrade(struct mm_struct *mm)
}
static inline void mmap_read_lock(struct mm_struct *mm)
+ __acquires_shared(mm->mmap_lock)
{
__mmap_lock_trace_start_locking(mm, false);
down_read(&mm->mmap_lock);
@@ -192,6 +198,7 @@ static inline void mmap_read_lock(struct mm_struct *mm)
}
static inline int mmap_read_lock_killable(struct mm_struct *mm)
+ __cond_acquires_shared(0, mm->mmap_lock)
{
int ret;
@@ -202,6 +209,7 @@ static inline int mmap_read_lock_killable(struct mm_struct *mm)
}
static inline bool mmap_read_trylock(struct mm_struct *mm)
+ __cond_acquires_shared(true, mm->mmap_lock)
{
bool ret;
@@ -212,12 +220,14 @@ static inline bool mmap_read_trylock(struct mm_struct *mm)
}
static inline void mmap_read_unlock(struct mm_struct *mm)
+ __releases_shared(mm->mmap_lock)
{
__mmap_lock_trace_released(mm, false);
up_read(&mm->mmap_lock);
}
static inline void mmap_read_unlock_non_owner(struct mm_struct *mm)
+ __releases_shared(mm->mmap_lock)
{
__mmap_lock_trace_released(mm, false);
up_read_non_owner(&mm->mmap_lock);
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index d653692078ad..4b1f794f58a6 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -203,4 +203,8 @@ DEFINE_LOCK_GUARD_1(mutex, struct mutex, mutex_lock(_T->lock), mutex_unlock(_T->
DEFINE_LOCK_GUARD_1_COND(mutex, _try, mutex_trylock(_T->lock))
DEFINE_LOCK_GUARD_1_COND(mutex, _intr, mutex_lock_interruptible(_T->lock) == 0)
+DECLARE_LOCK_GUARD_1_ATTRS(mutex, __assumes_cap(_T), /* */);
+DECLARE_LOCK_GUARD_1_ATTRS(mutex_try, __assumes_cap(_T), /* */);
+DECLARE_LOCK_GUARD_1_ATTRS(mutex_intr, __assumes_cap(_T), /* */);
+
#endif /* __LINUX_MUTEX_H */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 94d267d02372..9f5841f1e0bd 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -132,6 +132,7 @@ static inline pte_t *__pte_map(pmd_t *pmd, unsigned long address)
return pte_offset_kernel(pmd, address);
}
static inline void pte_unmap(pte_t *pte)
+ __releases_shared(RCU)
{
rcu_read_unlock();
}
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index eae67015ce51..46771d36af34 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -236,7 +236,7 @@ void *radix_tree_delete(struct radix_tree_root *, unsigned long);
unsigned int radix_tree_gang_lookup(const struct radix_tree_root *,
void **results, unsigned long first_index,
unsigned int max_items);
-int radix_tree_preload(gfp_t gfp_mask);
+int radix_tree_preload(gfp_t gfp_mask) __acquires(radix_tree_preloads.lock);
int radix_tree_maybe_preload(gfp_t gfp_mask);
void radix_tree_init(void);
void *radix_tree_tag_set(struct radix_tree_root *,
@@ -256,6 +256,7 @@ unsigned int radix_tree_gang_lookup_tag_slot(const struct radix_tree_root *,
int radix_tree_tagged(const struct radix_tree_root *, unsigned int tag);
static inline void radix_tree_preload_end(void)
+ __releases(radix_tree_preloads.lock)
{
local_unlock(&radix_tree_preloads.lock);
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 82c486b67e92..ecb7b2fd8381 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -733,10 +733,12 @@ static inline int thread_group_empty(struct task_struct *p)
(thread_group_leader(p) && !thread_group_empty(p))
extern struct sighand_struct *lock_task_sighand(struct task_struct *task,
- unsigned long *flags);
+ unsigned long *flags)
+ __cond_acquires(nonnull, task->sighand->siglock);
static inline void unlock_task_sighand(struct task_struct *task,
unsigned long *flags)
+ __releases(task->sighand->siglock)
{
spin_unlock_irqrestore(&task->sighand->siglock, *flags);
}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ca1db4b92c32..d215d42449b2 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -226,11 +226,13 @@ static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
* neither inside nor outside.
*/
static inline void task_lock(struct task_struct *p)
+ __acquires(p->alloc_lock)
{
spin_lock(&p->alloc_lock);
}
static inline void task_unlock(struct task_struct *p)
+ __releases(p->alloc_lock)
{
spin_unlock(&p->alloc_lock);
}
diff --git a/include/linux/sched/wake_q.h b/include/linux/sched/wake_q.h
index 0f28b4623ad4..765bbc3d54be 100644
--- a/include/linux/sched/wake_q.h
+++ b/include/linux/sched/wake_q.h
@@ -66,6 +66,7 @@ extern void wake_up_q(struct wake_q_head *head);
/* Spin unlock helpers to unlock and call wake_up_q with preempt disabled */
static inline
void raw_spin_unlock_wake(raw_spinlock_t *lock, struct wake_q_head *wake_q)
+ __releases(lock)
{
guard(preempt)();
raw_spin_unlock(lock);
@@ -77,6 +78,7 @@ void raw_spin_unlock_wake(raw_spinlock_t *lock, struct wake_q_head *wake_q)
static inline
void raw_spin_unlock_irq_wake(raw_spinlock_t *lock, struct wake_q_head *wake_q)
+ __releases(lock)
{
guard(preempt)();
raw_spin_unlock_irq(lock);
@@ -89,6 +91,7 @@ void raw_spin_unlock_irq_wake(raw_spinlock_t *lock, struct wake_q_head *wake_q)
static inline
void raw_spin_unlock_irqrestore_wake(raw_spinlock_t *lock, unsigned long flags,
struct wake_q_head *wake_q)
+ __releases(lock)
{
guard(preempt)();
raw_spin_unlock_irqrestore(lock, flags);
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index d05ed91de15f..ec7abf93fad9 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -359,11 +359,13 @@ static __always_inline int spin_trylock(spinlock_t *lock)
#define spin_lock_nested(lock, subclass) \
do { \
raw_spin_lock_nested(spinlock_check(lock), subclass); \
+ __release(spinlock_check(lock)); __acquire(lock); \
} while (0)
#define spin_lock_nest_lock(lock, nest_lock) \
do { \
raw_spin_lock_nest_lock(spinlock_check(lock), nest_lock); \
+ __release(spinlock_check(lock)); __acquire(lock); \
} while (0)
static __always_inline void spin_lock_irq(spinlock_t *lock)
@@ -535,73 +537,92 @@ void free_bucket_spinlocks(spinlock_t *locks);
DEFINE_LOCK_GUARD_1(raw_spinlock, raw_spinlock_t,
raw_spin_lock(_T->lock),
raw_spin_unlock(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1_COND(raw_spinlock, _try, raw_spin_trylock(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_try, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1(raw_spinlock_nested, raw_spinlock_t,
raw_spin_lock_nested(_T->lock, SINGLE_DEPTH_NESTING),
raw_spin_unlock(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_nested, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1(raw_spinlock_irq, raw_spinlock_t,
raw_spin_lock_irq(_T->lock),
raw_spin_unlock_irq(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_irq, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irq, _try, raw_spin_trylock_irq(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_irq_try, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1(raw_spinlock_irqsave, raw_spinlock_t,
raw_spin_lock_irqsave(_T->lock, _T->flags),
raw_spin_unlock_irqrestore(_T->lock, _T->flags),
unsigned long flags)
+DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_irqsave, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1_COND(raw_spinlock_irqsave, _try,
raw_spin_trylock_irqsave(_T->lock, _T->flags))
+DECLARE_LOCK_GUARD_1_ATTRS(raw_spinlock_irqsave_try, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1(spinlock, spinlock_t,
spin_lock(_T->lock),
spin_unlock(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(spinlock, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1_COND(spinlock, _try, spin_trylock(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(spinlock_try, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1(spinlock_irq, spinlock_t,
spin_lock_irq(_T->lock),
spin_unlock_irq(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(spinlock_irq, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1_COND(spinlock_irq, _try,
spin_trylock_irq(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(spinlock_irq_try, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1(spinlock_irqsave, spinlock_t,
spin_lock_irqsave(_T->lock, _T->flags),
spin_unlock_irqrestore(_T->lock, _T->flags),
unsigned long flags)
+DECLARE_LOCK_GUARD_1_ATTRS(spinlock_irqsave, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1_COND(spinlock_irqsave, _try,
spin_trylock_irqsave(_T->lock, _T->flags))
+DECLARE_LOCK_GUARD_1_ATTRS(spinlock_irqsave_try, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1(read_lock, rwlock_t,
read_lock(_T->lock),
read_unlock(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(read_lock, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1(read_lock_irq, rwlock_t,
read_lock_irq(_T->lock),
read_unlock_irq(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(read_lock_irq, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1(read_lock_irqsave, rwlock_t,
read_lock_irqsave(_T->lock, _T->flags),
read_unlock_irqrestore(_T->lock, _T->flags),
unsigned long flags)
+DECLARE_LOCK_GUARD_1_ATTRS(read_lock_irqsave, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1(write_lock, rwlock_t,
write_lock(_T->lock),
write_unlock(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(write_lock, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1(write_lock_irq, rwlock_t,
write_lock_irq(_T->lock),
write_unlock_irq(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(write_lock_irq, __assumes_cap(_T), /* */);
DEFINE_LOCK_GUARD_1(write_lock_irqsave, rwlock_t,
write_lock_irqsave(_T->lock, _T->flags),
write_unlock_irqrestore(_T->lock, _T->flags),
unsigned long flags)
+DECLARE_LOCK_GUARD_1_ATTRS(write_lock_irqsave, __assumes_cap(_T), /* */);
#undef __LINUX_INSIDE_SPINLOCK_H
#endif /* __LINUX_SPINLOCK_H */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b93c8c3dc05a..c3a1beaafdb6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1512,28 +1512,37 @@ static inline void lockdep_assert_rq_held(struct rq *rq)
lockdep_assert_held(__rq_lockp(rq));
}
-extern void raw_spin_rq_lock_nested(struct rq *rq, int subclass);
-extern bool raw_spin_rq_trylock(struct rq *rq);
-extern void raw_spin_rq_unlock(struct rq *rq);
+extern void raw_spin_rq_lock_nested(struct rq *rq, int subclass)
+ __acquires(rq->__lock);
+
+extern bool raw_spin_rq_trylock(struct rq *rq)
+ __cond_acquires(true, rq->__lock);
+
+extern void raw_spin_rq_unlock(struct rq *rq)
+ __releases(rq->__lock);
static inline void raw_spin_rq_lock(struct rq *rq)
+ __acquires(rq->__lock)
{
raw_spin_rq_lock_nested(rq, 0);
}
static inline void raw_spin_rq_lock_irq(struct rq *rq)
+ __acquires(rq->__lock)
{
local_irq_disable();
raw_spin_rq_lock(rq);
}
static inline void raw_spin_rq_unlock_irq(struct rq *rq)
+ __releases(rq->__lock)
{
raw_spin_rq_unlock(rq);
local_irq_enable();
}
static inline unsigned long _raw_spin_rq_lock_irqsave(struct rq *rq)
+ __acquires(rq->__lock)
{
unsigned long flags;
@@ -1544,6 +1553,7 @@ static inline unsigned long _raw_spin_rq_lock_irqsave(struct rq *rq)
}
static inline void raw_spin_rq_unlock_irqrestore(struct rq *rq, unsigned long flags)
+ __releases(rq->__lock)
{
raw_spin_rq_unlock(rq);
local_irq_restore(flags);
@@ -1803,15 +1813,14 @@ static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
extern
struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
- __acquires(rq->lock);
+ __acquires(task_rq(p)->__lock) /* ICEs clang!!, wants to be: return->__lock */
extern
struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
- __acquires(p->pi_lock)
- __acquires(rq->lock);
+ __acquires(p->pi_lock);
static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf)
- __releases(rq->lock)
+ __releases(rq->__lock)
{
rq_unpin_lock(rq, rf);
raw_spin_rq_unlock(rq);
@@ -1819,7 +1828,7 @@ static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf)
static inline void
task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
- __releases(rq->lock)
+ __releases(rq->__lock)
__releases(p->pi_lock)
{
rq_unpin_lock(rq, rf);
@@ -1832,43 +1841,45 @@ DEFINE_LOCK_GUARD_1(task_rq_lock, struct task_struct,
task_rq_unlock(_T->rq, _T->lock, &_T->rf),
struct rq *rq; struct rq_flags rf)
+DECLARE_LOCK_GUARD_1_ATTRS(task_rq_lock, __assumes_cap(_T->pi_lock), /* nothing */);
+
static inline void rq_lock_irqsave(struct rq *rq, struct rq_flags *rf)
- __acquires(rq->lock)
+ __acquires(rq->__lock)
{
raw_spin_rq_lock_irqsave(rq, rf->flags);
rq_pin_lock(rq, rf);
}
static inline void rq_lock_irq(struct rq *rq, struct rq_flags *rf)
- __acquires(rq->lock)
+ __acquires(rq->__lock)
{
raw_spin_rq_lock_irq(rq);
rq_pin_lock(rq, rf);
}
static inline void rq_lock(struct rq *rq, struct rq_flags *rf)
- __acquires(rq->lock)
+ __acquires(rq->__lock)
{
raw_spin_rq_lock(rq);
rq_pin_lock(rq, rf);
}
static inline void rq_unlock_irqrestore(struct rq *rq, struct rq_flags *rf)
- __releases(rq->lock)
+ __releases(rq->__lock)
{
rq_unpin_lock(rq, rf);
raw_spin_rq_unlock_irqrestore(rq, rf->flags);
}
static inline void rq_unlock_irq(struct rq *rq, struct rq_flags *rf)
- __releases(rq->lock)
+ __releases(rq->__lock)
{
rq_unpin_lock(rq, rf);
raw_spin_rq_unlock_irq(rq);
}
static inline void rq_unlock(struct rq *rq, struct rq_flags *rf)
- __releases(rq->lock)
+ __releases(rq->__lock)
{
rq_unpin_lock(rq, rf);
raw_spin_rq_unlock(rq);
@@ -1879,18 +1890,24 @@ DEFINE_LOCK_GUARD_1(rq_lock, struct rq,
rq_unlock(_T->lock, &_T->rf),
struct rq_flags rf)
+DECLARE_LOCK_GUARD_1_ATTRS(rq_lock, __assumes_cap(_T->__lock), /* */);
+
DEFINE_LOCK_GUARD_1(rq_lock_irq, struct rq,
rq_lock_irq(_T->lock, &_T->rf),
rq_unlock_irq(_T->lock, &_T->rf),
struct rq_flags rf)
+DECLARE_LOCK_GUARD_1_ATTRS(rq_lock_irq, __assumes_cap(_T->__lock), /* */);
+
DEFINE_LOCK_GUARD_1(rq_lock_irqsave, struct rq,
rq_lock_irqsave(_T->lock, &_T->rf),
rq_unlock_irqrestore(_T->lock, &_T->rf),
struct rq_flags rf)
+DECLARE_LOCK_GUARD_1_ATTRS(rq_lock_irqsave, __assumes_cap(_T->__lock), /* */);
+
static inline struct rq *this_rq_lock_irq(struct rq_flags *rf)
- __acquires(rq->lock)
+ __no_capability_analysis /* need return value */
{
struct rq *rq;
@@ -2954,9 +2971,15 @@ static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2)
#define DEFINE_LOCK_GUARD_2(name, type, _lock, _unlock, ...) \
__DEFINE_UNLOCK_GUARD(name, type, _unlock, type *lock2; __VA_ARGS__) \
static inline class_##name##_t class_##name##_constructor(type *lock, type *lock2) \
+ __no_capability_analysis \
{ class_##name##_t _t = { .lock = lock, .lock2 = lock2 }, *_T = &_t; \
_lock; return _t; }
+#define DECLARE_LOCK_GUARD_2_ATTRS(_name, _lock, _unlock) \
+static inline class_##_name##_t class_##_name##_constructor(lock_##_name##_t *_T1, \
+ lock_##_name##_t *_T2) _lock; \
+static inline void class_##_name##_destructor(class_##_name##_t *_T) _unlock
+
#ifdef CONFIG_SMP
static inline bool rq_order_less(struct rq *rq1, struct rq *rq2)
@@ -2985,7 +3008,8 @@ static inline bool rq_order_less(struct rq *rq1, struct rq *rq2)
return rq1->cpu < rq2->cpu;
}
-extern void double_rq_lock(struct rq *rq1, struct rq *rq2);
+extern void double_rq_lock(struct rq *rq1, struct rq *rq2)
+ __acquires(rq1->__lock) __acquires(rq2->__lock);
#ifdef CONFIG_PREEMPTION
@@ -2998,9 +3022,9 @@ extern void double_rq_lock(struct rq *rq1, struct rq *rq2);
* also adds more overhead and therefore may reduce throughput.
*/
static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
- __releases(this_rq->lock)
- __acquires(busiest->lock)
- __acquires(this_rq->lock)
+ __releases(this_rq->__lock)
+ __acquires(busiest->__lock)
+ __acquires(this_rq->__lock)
{
raw_spin_rq_unlock(this_rq);
double_rq_lock(this_rq, busiest);
@@ -3017,9 +3041,9 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
* regardless of entry order into the function.
*/
static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
- __releases(this_rq->lock)
- __acquires(busiest->lock)
- __acquires(this_rq->lock)
+ __releases(this_rq->__lock)
+ __acquires(busiest->__lock)
+ __acquires(this_rq->__lock)
{
if (__rq_lockp(this_rq) == __rq_lockp(busiest) ||
likely(raw_spin_rq_trylock(busiest))) {
@@ -3045,6 +3069,9 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
* double_lock_balance - lock the busiest runqueue, this_rq is locked already.
*/
static inline int double_lock_balance(struct rq *this_rq, struct rq *busiest)
+ __releases(this_rq->__lock)
+ __acquires(this_rq->__lock)
+ __acquires(busiest->__lock)
{
lockdep_assert_irqs_disabled();
@@ -3052,7 +3079,8 @@ static inline int double_lock_balance(struct rq *this_rq, struct rq *busiest)
}
static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
- __releases(busiest->lock)
+ __releases(busiest->__lock)
+ __no_capability_analysis /* conditional code */
{
if (__rq_lockp(this_rq) != __rq_lockp(busiest))
raw_spin_rq_unlock(busiest);
@@ -3060,6 +3088,8 @@ static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
}
static inline void double_lock(spinlock_t *l1, spinlock_t *l2)
+ __acquires(l1)
+ __acquires(l2)
{
if (l1 > l2)
swap(l1, l2);
@@ -3069,6 +3099,8 @@ static inline void double_lock(spinlock_t *l1, spinlock_t *l2)
}
static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2)
+ __acquires(l1)
+ __acquires(l2)
{
if (l1 > l2)
swap(l1, l2);
@@ -3078,6 +3110,8 @@ static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2)
}
static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
+ __acquires(l1)
+ __acquires(l2)
{
if (l1 > l2)
swap(l1, l2);
@@ -3087,6 +3121,8 @@ static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
}
static inline void double_raw_unlock(raw_spinlock_t *l1, raw_spinlock_t *l2)
+ __releases(l1)
+ __releases(l2)
{
raw_spin_unlock(l1);
raw_spin_unlock(l2);
@@ -3096,6 +3132,8 @@ DEFINE_LOCK_GUARD_2(double_raw_spinlock, raw_spinlock_t,
double_raw_lock(_T->lock, _T->lock2),
double_raw_unlock(_T->lock, _T->lock2))
+DECLARE_LOCK_GUARD_2_ATTRS(double_raw_spinlock, __assumes_cap(_T1) __assumes_cap(_T2), /* */);
+
/*
* double_rq_unlock - safely unlock two runqueues
*
@@ -3103,13 +3141,13 @@ DEFINE_LOCK_GUARD_2(double_raw_spinlock, raw_spinlock_t,
* you need to do so manually after calling.
*/
static inline void double_rq_unlock(struct rq *rq1, struct rq *rq2)
- __releases(rq1->lock)
- __releases(rq2->lock)
+ __releases(rq1->__lock)
+ __releases(rq2->__lock)
{
if (__rq_lockp(rq1) != __rq_lockp(rq2))
raw_spin_rq_unlock(rq2);
else
- __release(rq2->lock);
+ __release_cap(&rq2->__lock);
raw_spin_rq_unlock(rq1);
}
@@ -3127,8 +3165,8 @@ extern bool sched_smp_initialized;
* you need to do so manually before calling.
*/
static inline void double_rq_lock(struct rq *rq1, struct rq *rq2)
- __acquires(rq1->lock)
- __acquires(rq2->lock)
+ __acquires(rq1->__lock)
+ __acquires(rq2->__lock)
{
WARN_ON_ONCE(!irqs_disabled());
WARN_ON_ONCE(rq1 != rq2);
@@ -3144,8 +3182,8 @@ static inline void double_rq_lock(struct rq *rq1, struct rq *rq2)
* you need to do so manually after calling.
*/
static inline void double_rq_unlock(struct rq *rq1, struct rq *rq2)
- __releases(rq1->lock)
- __releases(rq2->lock)
+ __releases(rq1->__lock)
+ __releases(rq2->__lock)
{
WARN_ON_ONCE(rq1 != rq2);
raw_spin_rq_unlock(rq1);
On Wed, Mar 05, 2025 at 12:20PM +0100, Peter Zijlstra wrote:
>
> Right, so since this is all awesome, I figured I should try and have it
> compile kernel/sched/, see how far I get.
>
[...]
It's been a while, but teaching Clang new tricks for this analysis has
taken time (and I've only been looking into it on and off).
Clang has already gained __attribute__((reentrant_capability)).
Of course, that alone doesn't help all that much.
But what we really wanted, I think, per this Clang discussion thread
[1], was some "simple" form of intra-procedural alias analysis.
[1] https://lore.kernel.org/all/CANpmjNPquO=W1JAh1FNQb8pMQjgeZAKCPQUAd7qUg=5pjJ6x=Q@mail.gmail.com/
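To illustrate what that alias analysis has to see through (a hypothetical
standalone sketch, not code from the thread; names and the pthread stand-in
are made up): a helper returns a pointer that aliases a lock embedded in a
larger object, and locking via the alias must count as holding the object's
lock, the way pmd_lockptr()/spin_lock(ptl) pairs up with accesses guarded
by the pmd lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical example: a lock embedded in a larger object, with a helper
 * that returns a pointer into it (cf. pmd_lockptr()). */
struct data {
	pthread_mutex_t lock;
	int value;
};

/* Returns an alias of &d->lock. */
static pthread_mutex_t *data_lockptr(struct data *d)
{
	return &d->lock;
}

static int locked_update(struct data *d, int v)
{
	pthread_mutex_t *ptl = data_lockptr(d);	/* alias of &d->lock */

	pthread_mutex_lock(ptl);
	d->value = v;	/* guarded by d->lock, held via the alias */
	pthread_mutex_unlock(ptl);
	return d->value;
}
```

Without tracking that ptl == &d->lock, a capability checker would flag the
d->value access as unguarded; with the intra-procedural alias analysis, the
two spellings of the lock are unified.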
Anyway, this evolving Clang PR probably gets us pretty close:
https://github.com/llvm/llvm-project/pull/142955
With Clang from that PR, I can compile kernel/sched/{core.c, fair.c}
with modest changes (see below - work in progress) without warnings.
Notably, this can also deal with "capability acquired in returned
object" with some macro magic.
The full v3 series preview is here:
https://git.kernel.org/pub/scm/linux/kernel/git/melver/linux.git/log/?h=cap-analysis/dev
The whole tree compiles cleanly, although I might have missed testing
some kernel configs.
If/when that Clang PR lands (ETA: probably another month), I would think
about sending the next version of this series.
Thanks,
-- Marco
------ >8 ------
From: Marco Elver <elver@google.com>
Date: Sun, 3 Aug 2025 20:21:39 +0200
Subject: [PATCH] sched: Enable capability analysis for core.c and fair.c
This demonstrates a larger conversion to use Clang's capability
analysis. The benefit is additional static checking of locking rules,
along with better documentation.
Arguably, kernel/sched is the "final boss" of Clang's capability
analysis, and application to core.c & fair.c demonstrates that the
latest Clang version has become powerful enough to start applying this
to more complex subsystems (with some modest annotations and changes).
Signed-off-by: Marco Elver <elver@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
---
v3:
* New patch.
---
include/linux/sched.h | 6 +-
include/linux/sched/signal.h | 4 +-
include/linux/sched/task.h | 5 +-
include/linux/sched/wake_q.h | 3 +
kernel/sched/Makefile | 3 +
kernel/sched/core.c | 82 ++++++++++-----
kernel/sched/fair.c | 6 +-
kernel/sched/sched.h | 108 +++++++++++++-------
scripts/capability-analysis-suppression.txt | 1 +
9 files changed, 148 insertions(+), 70 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index aa9c5be7a632..3ac9d2407773 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2125,9 +2125,9 @@ static inline int _cond_resched(void)
_cond_resched(); \
})
-extern int __cond_resched_lock(spinlock_t *lock);
-extern int __cond_resched_rwlock_read(rwlock_t *lock);
-extern int __cond_resched_rwlock_write(rwlock_t *lock);
+extern int __cond_resched_lock(spinlock_t *lock) __must_hold(lock);
+extern int __cond_resched_rwlock_read(rwlock_t *lock) __must_hold_shared(lock);
+extern int __cond_resched_rwlock_write(rwlock_t *lock) __must_hold(lock);
#define MIGHT_RESCHED_RCU_SHIFT 8
#define MIGHT_RESCHED_PREEMPT_MASK ((1U << MIGHT_RESCHED_RCU_SHIFT) - 1)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index bc7f83b012fb..6f581a750e84 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -734,10 +734,12 @@ static inline int thread_group_empty(struct task_struct *p)
(thread_group_leader(p) && !thread_group_empty(p))
extern struct sighand_struct *lock_task_sighand(struct task_struct *task,
- unsigned long *flags);
+ unsigned long *flags)
+ __acquires(&task->sighand->siglock);
static inline void unlock_task_sighand(struct task_struct *task,
unsigned long *flags)
+ __releases(&task->sighand->siglock)
{
spin_unlock_irqrestore(&task->sighand->siglock, *flags);
}
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index ca1db4b92c32..a4373fc687bd 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -226,15 +226,18 @@ static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
* neither inside nor outside.
*/
static inline void task_lock(struct task_struct *p)
+ __acquires(&p->alloc_lock)
{
spin_lock(&p->alloc_lock);
}
static inline void task_unlock(struct task_struct *p)
+ __releases(&p->alloc_lock)
{
spin_unlock(&p->alloc_lock);
}
-DEFINE_GUARD(task_lock, struct task_struct *, task_lock(_T), task_unlock(_T))
+DEFINE_LOCK_GUARD_1(task_lock, struct task_struct, task_lock(_T->lock), task_unlock(_T->lock))
+DECLARE_LOCK_GUARD_1_ATTRS(task_lock, __assumes_cap(_T->alloc_lock), /* */)
#endif /* _LINUX_SCHED_TASK_H */
diff --git a/include/linux/sched/wake_q.h b/include/linux/sched/wake_q.h
index 0f28b4623ad4..765bbc3d54be 100644
--- a/include/linux/sched/wake_q.h
+++ b/include/linux/sched/wake_q.h
@@ -66,6 +66,7 @@ extern void wake_up_q(struct wake_q_head *head);
/* Spin unlock helpers to unlock and call wake_up_q with preempt disabled */
static inline
void raw_spin_unlock_wake(raw_spinlock_t *lock, struct wake_q_head *wake_q)
+ __releases(lock)
{
guard(preempt)();
raw_spin_unlock(lock);
@@ -77,6 +78,7 @@ void raw_spin_unlock_wake(raw_spinlock_t *lock, struct wake_q_head *wake_q)
static inline
void raw_spin_unlock_irq_wake(raw_spinlock_t *lock, struct wake_q_head *wake_q)
+ __releases(lock)
{
guard(preempt)();
raw_spin_unlock_irq(lock);
@@ -89,6 +91,7 @@ void raw_spin_unlock_irq_wake(raw_spinlock_t *lock, struct wake_q_head *wake_q)
static inline
void raw_spin_unlock_irqrestore_wake(raw_spinlock_t *lock, unsigned long flags,
struct wake_q_head *wake_q)
+ __releases(lock)
{
guard(preempt)();
raw_spin_unlock_irqrestore(lock, flags);
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 8ae86371ddcd..8603987ce4c1 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -1,5 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
+CAPABILITY_ANALYSIS_core.o := y
+CAPABILITY_ANALYSIS_fair.o := y
+
# The compilers are complaining about unused variables inside an if(0) scope
# block. This is daft, shut them up.
ccflags-y += $(call cc-disable-warning, unused-but-set-variable)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 81c6df746df1..0182d0246f44 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -664,16 +664,17 @@ void double_rq_lock(struct rq *rq1, struct rq *rq2)
raw_spin_rq_lock(rq1);
if (__rq_lockp(rq1) != __rq_lockp(rq2))
raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);
+ else
+ __acquire_cap(__rq_lockp(rq2)); /* fake acquire */
double_rq_clock_clear_update(rq1, rq2);
}
#endif
/*
- * __task_rq_lock - lock the rq @p resides on.
+ * ___task_rq_lock - lock the rq @p resides on.
*/
-struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
- __acquires(rq->lock)
+struct rq *___task_rq_lock(struct task_struct *p, struct rq_flags *rf)
{
struct rq *rq;
@@ -696,9 +697,7 @@ struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
/*
* task_rq_lock - lock p->pi_lock and lock the rq @p resides on.
*/
-struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
- __acquires(p->pi_lock)
- __acquires(rq->lock)
+struct rq *_task_rq_lock(struct task_struct *p, struct rq_flags *rf)
{
struct rq *rq;
@@ -2494,6 +2493,7 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
*/
static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
struct task_struct *p, int new_cpu)
+ __must_hold(&rq->__lock)
{
lockdep_assert_rq_held(rq);
@@ -2540,6 +2540,7 @@ struct set_affinity_pending {
*/
static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
struct task_struct *p, int dest_cpu)
+ __must_hold(&rq->__lock)
{
/* Affinity changed (again). */
if (!is_cpu_allowed(p, dest_cpu))
@@ -2576,6 +2577,12 @@ static int migration_cpu_stop(void *data)
*/
flush_smp_call_function_queue();
+ /*
+ * We may change the underlying rq, but the locks held will
+ * appropriately be "transferred" when switching.
+ */
+ capability_unsafe_alias(rq);
+
raw_spin_lock(&p->pi_lock);
rq_lock(rq, &rf);
@@ -2685,6 +2692,8 @@ int push_cpu_stop(void *arg)
if (!lowest_rq)
goto out_unlock;
+ lockdep_assert_rq_held(lowest_rq);
+
// XXX validate p is still the highest prio task
if (task_rq(p) == rq) {
move_queued_task_locked(rq, lowest_rq, p);
@@ -2930,8 +2939,7 @@ void release_user_cpus_ptr(struct task_struct *p)
*/
static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flags *rf,
int dest_cpu, unsigned int flags)
- __releases(rq->lock)
- __releases(p->pi_lock)
+ __releases(&rq->__lock, &p->pi_lock)
{
struct set_affinity_pending my_pending = { }, *pending = NULL;
bool stop_pending, complete = false;
@@ -3079,8 +3087,7 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
struct affinity_context *ctx,
struct rq *rq,
struct rq_flags *rf)
- __releases(rq->lock)
- __releases(p->pi_lock)
+ __releases(&rq->__lock, &p->pi_lock)
{
const struct cpumask *cpu_allowed_mask = task_cpu_possible_mask(p);
const struct cpumask *cpu_valid_mask = cpu_active_mask;
@@ -4400,29 +4407,30 @@ static bool __task_needs_rq_lock(struct task_struct *p)
*/
int task_call_func(struct task_struct *p, task_call_f func, void *arg)
{
- struct rq *rq = NULL;
struct rq_flags rf;
int ret;
raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
- if (__task_needs_rq_lock(p))
- rq = __task_rq_lock(p, &rf);
+ if (__task_needs_rq_lock(p)) {
+ struct rq *rq = __task_rq_lock(p, &rf);
- /*
- * At this point the task is pinned; either:
- * - blocked and we're holding off wakeups (pi->lock)
- * - woken, and we're holding off enqueue (rq->lock)
- * - queued, and we're holding off schedule (rq->lock)
- * - running, and we're holding off de-schedule (rq->lock)
- *
- * The called function (@func) can use: task_curr(), p->on_rq and
- * p->__state to differentiate between these states.
- */
- ret = func(p, arg);
+ /*
+ * At this point the task is pinned; either:
+ * - blocked and we're holding off wakeups (pi->lock)
+ * - woken, and we're holding off enqueue (rq->lock)
+ * - queued, and we're holding off schedule (rq->lock)
+ * - running, and we're holding off de-schedule (rq->lock)
+ *
+ * The called function (@func) can use: task_curr(), p->on_rq and
+ * p->__state to differentiate between these states.
+ */
+ ret = func(p, arg);
- if (rq)
rq_unlock(rq, &rf);
+ } else {
+ ret = func(p, arg);
+ }
raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
return ret;
@@ -5118,6 +5126,8 @@ static inline void __balance_callbacks(struct rq *rq)
static inline void
prepare_lock_switch(struct rq *rq, struct task_struct *next, struct rq_flags *rf)
+ __releases(__rq_lockp(rq))
+ __acquires(__rq_lockp(this_rq()))
{
/*
* Since the runqueue lock will be released by the next
@@ -5131,9 +5141,15 @@ prepare_lock_switch(struct rq *rq, struct task_struct *next, struct rq_flags *rf
/* this is a valid case when another task releases the spinlock */
rq_lockp(rq)->owner = next;
#endif
+ /*
+ * Model the rq reference switcheroo.
+ */
+ __release(__rq_lockp(rq));
+ __acquire(__rq_lockp(this_rq()));
}
static inline void finish_lock_switch(struct rq *rq)
+ __releases(&rq->__lock)
{
/*
* If we are tracking spinlock dependencies then we have to
@@ -5189,6 +5205,7 @@ static inline void kmap_local_sched_in(void)
static inline void
prepare_task_switch(struct rq *rq, struct task_struct *prev,
struct task_struct *next)
+ __must_hold(&rq->__lock)
{
kcov_prepare_switch(prev);
sched_info_switch(rq, prev, next);
@@ -5220,7 +5237,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
* because prev may have moved to another CPU.
*/
static struct rq *finish_task_switch(struct task_struct *prev)
- __releases(rq->lock)
+ __releases(__rq_lockp(this_rq()))
{
struct rq *rq = this_rq();
struct mm_struct *mm = rq->prev_mm;
@@ -5308,7 +5325,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
* @prev: the thread we just switched away from.
*/
asmlinkage __visible void schedule_tail(struct task_struct *prev)
- __releases(rq->lock)
+ __releases(&this_rq()->__lock)
{
/*
* New tasks start with FORK_PREEMPT_COUNT, see there and
@@ -5340,6 +5357,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
static __always_inline struct rq *
context_switch(struct rq *rq, struct task_struct *prev,
struct task_struct *next, struct rq_flags *rf)
+ __releases(&rq->__lock)
{
prepare_task_switch(rq, prev, next);
@@ -6026,6 +6044,7 @@ static void prev_balance(struct rq *rq, struct task_struct *prev,
*/
static inline struct task_struct *
__pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+ __must_hold(__rq_lockp(rq))
{
const struct sched_class *class;
struct task_struct *p;
@@ -6118,6 +6137,7 @@ static void queue_core_balance(struct rq *rq);
static struct task_struct *
pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+ __must_hold(__rq_lockp(rq))
{
struct task_struct *next, *p, *max = NULL;
const struct cpumask *smt_mask;
@@ -6562,6 +6582,7 @@ static inline void sched_core_cpu_dying(unsigned int cpu) {}
static struct task_struct *
pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+ __must_hold(__rq_lockp(rq))
{
return __pick_next_task(rq, prev, rf);
}
@@ -8004,6 +8025,12 @@ static int __balance_push_cpu_stop(void *arg)
struct rq_flags rf;
int cpu;
+ /*
+ * We may change the underlying rq, but the locks held will
+ * appropriately be "transferred" when switching.
+ */
+ capability_unsafe_alias(rq);
+
raw_spin_lock_irq(&p->pi_lock);
rq_lock(rq, &rf);
@@ -8031,6 +8058,7 @@ static DEFINE_PER_CPU(struct cpu_stop_work, push_work);
* effective when the hotplug motion is down.
*/
static void balance_push(struct rq *rq)
+ __must_hold(&rq->__lock)
{
struct task_struct *push_task = rq->curr;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a14da5396fb..260158287ddb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4842,7 +4842,8 @@ static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq)
return cfs_rq->avg.load_avg;
}
-static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf);
+static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
+ __must_hold(__rq_lockp(this_rq));
static inline unsigned long task_util(struct task_struct *p)
{
@@ -8737,6 +8738,7 @@ static void set_cpus_allowed_fair(struct task_struct *p, struct affinity_context
static int
balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+ __must_hold(__rq_lockp(rq))
{
if (sched_fair_runnable(rq))
return 1;
@@ -8884,6 +8886,7 @@ static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
struct task_struct *
pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
+ __must_hold(__rq_lockp(rq))
{
struct sched_entity *se;
struct task_struct *p;
@@ -8970,6 +8973,7 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
}
static struct task_struct *__pick_next_task_fair(struct rq *rq, struct task_struct *prev)
+ __must_hold(__rq_lockp(rq))
{
return pick_next_task_fair(rq, prev, NULL);
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 83e3aa917142..0da7c8b89030 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1343,8 +1343,13 @@ static inline bool is_migration_disabled(struct task_struct *p)
DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+static __always_inline struct rq *__this_rq(void)
+{
+ return this_cpu_ptr(&runqueues);
+}
+
#define cpu_rq(cpu) (&per_cpu(runqueues, (cpu)))
-#define this_rq() this_cpu_ptr(&runqueues)
+#define this_rq() __this_rq()
#define task_rq(p) cpu_rq(task_cpu(p))
#define cpu_curr(cpu) (cpu_rq(cpu)->curr)
#define raw_rq() raw_cpu_ptr(&runqueues)
@@ -1473,11 +1478,13 @@ static inline bool sched_core_disabled(void)
}
static inline raw_spinlock_t *rq_lockp(struct rq *rq)
+ __returns_cap(&rq->__lock)
{
return &rq->__lock;
}
static inline raw_spinlock_t *__rq_lockp(struct rq *rq)
+ __returns_cap(&rq->__lock)
{
return &rq->__lock;
}
@@ -1519,32 +1526,42 @@ static inline bool rt_group_sched_enabled(void)
#endif /* CONFIG_RT_GROUP_SCHED */
static inline void lockdep_assert_rq_held(struct rq *rq)
+ __assumes_cap(__rq_lockp(rq))
{
lockdep_assert_held(__rq_lockp(rq));
}
-extern void raw_spin_rq_lock_nested(struct rq *rq, int subclass);
-extern bool raw_spin_rq_trylock(struct rq *rq);
-extern void raw_spin_rq_unlock(struct rq *rq);
+extern void raw_spin_rq_lock_nested(struct rq *rq, int subclass)
+ __acquires(&rq->__lock);
+
+extern bool raw_spin_rq_trylock(struct rq *rq)
+ __cond_acquires(true, &rq->__lock);
+
+extern void raw_spin_rq_unlock(struct rq *rq)
+ __releases(&rq->__lock);
static inline void raw_spin_rq_lock(struct rq *rq)
+ __acquires(&rq->__lock)
{
raw_spin_rq_lock_nested(rq, 0);
}
static inline void raw_spin_rq_lock_irq(struct rq *rq)
+ __acquires(&rq->__lock)
{
local_irq_disable();
raw_spin_rq_lock(rq);
}
static inline void raw_spin_rq_unlock_irq(struct rq *rq)
+ __releases(&rq->__lock)
{
raw_spin_rq_unlock(rq);
local_irq_enable();
}
static inline unsigned long _raw_spin_rq_lock_irqsave(struct rq *rq)
+ __acquires(&rq->__lock)
{
unsigned long flags;
@@ -1555,6 +1572,7 @@ static inline unsigned long _raw_spin_rq_lock_irqsave(struct rq *rq)
}
static inline void raw_spin_rq_unlock_irqrestore(struct rq *rq, unsigned long flags)
+ __releases(&rq->__lock)
{
raw_spin_rq_unlock(rq);
local_irq_restore(flags);
@@ -1805,17 +1823,15 @@ static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
rq->clock_update_flags |= rf->clock_update_flags;
}
-extern
-struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
- __acquires(rq->lock);
+#define __task_rq_lock(...) __acquire_ret(___task_rq_lock(__VA_ARGS__), &__ret->__lock)
+extern struct rq *___task_rq_lock(struct task_struct *p, struct rq_flags *rf) __acquires_ret;
-extern
-struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
- __acquires(p->pi_lock)
- __acquires(rq->lock);
+#define task_rq_lock(...) __acquire_ret(_task_rq_lock(__VA_ARGS__), &__ret->__lock)
+extern struct rq *_task_rq_lock(struct task_struct *p, struct rq_flags *rf)
+ __acquires(&p->pi_lock) __acquires_ret;
static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf)
- __releases(rq->lock)
+ __releases(&rq->__lock)
{
rq_unpin_lock(rq, rf);
raw_spin_rq_unlock(rq);
@@ -1823,8 +1839,7 @@ static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf)
static inline void
task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
- __releases(rq->lock)
- __releases(p->pi_lock)
+ __releases(&rq->__lock, &p->pi_lock)
{
rq_unpin_lock(rq, rf);
raw_spin_rq_unlock(rq);
@@ -1835,44 +1850,45 @@ DEFINE_LOCK_GUARD_1(task_rq_lock, struct task_struct,
_T->rq = task_rq_lock(_T->lock, &_T->rf),
task_rq_unlock(_T->rq, _T->lock, &_T->rf),
struct rq *rq; struct rq_flags rf)
+DECLARE_LOCK_GUARD_1_ATTRS(task_rq_lock, __assumes_cap(_T->pi_lock), /* */)
static inline void rq_lock_irqsave(struct rq *rq, struct rq_flags *rf)
- __acquires(rq->lock)
+ __acquires(&rq->__lock)
{
raw_spin_rq_lock_irqsave(rq, rf->flags);
rq_pin_lock(rq, rf);
}
static inline void rq_lock_irq(struct rq *rq, struct rq_flags *rf)
- __acquires(rq->lock)
+ __acquires(&rq->__lock)
{
raw_spin_rq_lock_irq(rq);
rq_pin_lock(rq, rf);
}
static inline void rq_lock(struct rq *rq, struct rq_flags *rf)
- __acquires(rq->lock)
+ __acquires(&rq->__lock)
{
raw_spin_rq_lock(rq);
rq_pin_lock(rq, rf);
}
static inline void rq_unlock_irqrestore(struct rq *rq, struct rq_flags *rf)
- __releases(rq->lock)
+ __releases(&rq->__lock)
{
rq_unpin_lock(rq, rf);
raw_spin_rq_unlock_irqrestore(rq, rf->flags);
}
static inline void rq_unlock_irq(struct rq *rq, struct rq_flags *rf)
- __releases(rq->lock)
+ __releases(&rq->__lock)
{
rq_unpin_lock(rq, rf);
raw_spin_rq_unlock_irq(rq);
}
static inline void rq_unlock(struct rq *rq, struct rq_flags *rf)
- __releases(rq->lock)
+ __releases(&rq->__lock)
{
rq_unpin_lock(rq, rf);
raw_spin_rq_unlock(rq);
@@ -1883,18 +1899,24 @@ DEFINE_LOCK_GUARD_1(rq_lock, struct rq,
rq_unlock(_T->lock, &_T->rf),
struct rq_flags rf)
+DECLARE_LOCK_GUARD_1_ATTRS(rq_lock, __assumes_cap(_T->__lock), /* */);
+
DEFINE_LOCK_GUARD_1(rq_lock_irq, struct rq,
rq_lock_irq(_T->lock, &_T->rf),
rq_unlock_irq(_T->lock, &_T->rf),
struct rq_flags rf)
+DECLARE_LOCK_GUARD_1_ATTRS(rq_lock_irq, __assumes_cap(_T->__lock), /* */);
+
DEFINE_LOCK_GUARD_1(rq_lock_irqsave, struct rq,
rq_lock_irqsave(_T->lock, &_T->rf),
rq_unlock_irqrestore(_T->lock, &_T->rf),
struct rq_flags rf)
-static inline struct rq *this_rq_lock_irq(struct rq_flags *rf)
- __acquires(rq->lock)
+DECLARE_LOCK_GUARD_1_ATTRS(rq_lock_irqsave, __assumes_cap(_T->__lock), /* */);
+
+#define this_rq_lock_irq(...) __acquire_ret(_this_rq_lock_irq(__VA_ARGS__), &__ret->__lock)
+static inline struct rq *_this_rq_lock_irq(struct rq_flags *rf) __acquires_ret
{
struct rq *rq;
@@ -2927,9 +2949,15 @@ static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2)
#define DEFINE_LOCK_GUARD_2(name, type, _lock, _unlock, ...) \
__DEFINE_UNLOCK_GUARD(name, type, _unlock, type *lock2; __VA_ARGS__) \
static inline class_##name##_t class_##name##_constructor(type *lock, type *lock2) \
+ __no_capability_analysis \
{ class_##name##_t _t = { .lock = lock, .lock2 = lock2 }, *_T = &_t; \
_lock; return _t; }
+#define DECLARE_LOCK_GUARD_2_ATTRS(_name, _lock, _unlock) \
+static inline class_##_name##_t class_##_name##_constructor(lock_##_name##_t *_T1, \
+ lock_##_name##_t *_T2) _lock; \
+static inline void class_##_name##_destructor(class_##_name##_t *_T) _unlock
+
#ifdef CONFIG_SMP
static inline bool rq_order_less(struct rq *rq1, struct rq *rq2)
@@ -2958,7 +2986,8 @@ static inline bool rq_order_less(struct rq *rq1, struct rq *rq2)
return rq1->cpu < rq2->cpu;
}
-extern void double_rq_lock(struct rq *rq1, struct rq *rq2);
+extern void double_rq_lock(struct rq *rq1, struct rq *rq2)
+ __acquires(&rq1->__lock, &rq2->__lock);
#ifdef CONFIG_PREEMPTION
@@ -2971,9 +3000,8 @@ extern void double_rq_lock(struct rq *rq1, struct rq *rq2);
* also adds more overhead and therefore may reduce throughput.
*/
static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
- __releases(this_rq->lock)
- __acquires(busiest->lock)
- __acquires(this_rq->lock)
+ __must_hold(&this_rq->__lock)
+ __acquires(&busiest->__lock)
{
raw_spin_rq_unlock(this_rq);
double_rq_lock(this_rq, busiest);
@@ -2990,9 +3018,8 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
* regardless of entry order into the function.
*/
static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
- __releases(this_rq->lock)
- __acquires(busiest->lock)
- __acquires(this_rq->lock)
+ __must_hold(&this_rq->__lock)
+ __acquires(&busiest->__lock)
{
if (__rq_lockp(this_rq) == __rq_lockp(busiest) ||
likely(raw_spin_rq_trylock(busiest))) {
@@ -3018,6 +3045,8 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
* double_lock_balance - lock the busiest runqueue, this_rq is locked already.
*/
static inline int double_lock_balance(struct rq *this_rq, struct rq *busiest)
+ __must_hold(&this_rq->__lock)
+ __acquires(&busiest->__lock)
{
lockdep_assert_irqs_disabled();
@@ -3025,14 +3054,17 @@ static inline int double_lock_balance(struct rq *this_rq, struct rq *busiest)
}
static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
- __releases(busiest->lock)
+ __releases(&busiest->__lock)
{
if (__rq_lockp(this_rq) != __rq_lockp(busiest))
raw_spin_rq_unlock(busiest);
+ else
+ __release(__rq_lockp(busiest)); /* fake release */
lock_set_subclass(&__rq_lockp(this_rq)->dep_map, 0, _RET_IP_);
}
static inline void double_lock(spinlock_t *l1, spinlock_t *l2)
+ __acquires(l1, l2)
{
if (l1 > l2)
swap(l1, l2);
@@ -3042,6 +3074,7 @@ static inline void double_lock(spinlock_t *l1, spinlock_t *l2)
}
static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2)
+ __acquires(l1, l2)
{
if (l1 > l2)
swap(l1, l2);
@@ -3051,6 +3084,7 @@ static inline void double_lock_irq(spinlock_t *l1, spinlock_t *l2)
}
static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
+ __acquires(l1, l2)
{
if (l1 > l2)
swap(l1, l2);
@@ -3060,6 +3094,7 @@ static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
}
static inline void double_raw_unlock(raw_spinlock_t *l1, raw_spinlock_t *l2)
+ __releases(l1, l2)
{
raw_spin_unlock(l1);
raw_spin_unlock(l2);
@@ -3069,6 +3104,8 @@ DEFINE_LOCK_GUARD_2(double_raw_spinlock, raw_spinlock_t,
double_raw_lock(_T->lock, _T->lock2),
double_raw_unlock(_T->lock, _T->lock2))
+DECLARE_LOCK_GUARD_2_ATTRS(double_raw_spinlock, __assumes_cap(_T1) __assumes_cap(_T2), /* */);
+
/*
* double_rq_unlock - safely unlock two runqueues
*
@@ -3076,13 +3113,12 @@ DEFINE_LOCK_GUARD_2(double_raw_spinlock, raw_spinlock_t,
* you need to do so manually after calling.
*/
static inline void double_rq_unlock(struct rq *rq1, struct rq *rq2)
- __releases(rq1->lock)
- __releases(rq2->lock)
+ __releases(&rq1->__lock, &rq2->__lock)
{
if (__rq_lockp(rq1) != __rq_lockp(rq2))
raw_spin_rq_unlock(rq2);
else
- __release(rq2->lock);
+ __release(&rq2->__lock); /* fake release */
raw_spin_rq_unlock(rq1);
}
@@ -3100,8 +3136,7 @@ extern bool sched_smp_initialized;
* you need to do so manually before calling.
*/
static inline void double_rq_lock(struct rq *rq1, struct rq *rq2)
- __acquires(rq1->lock)
- __acquires(rq2->lock)
+ __acquires(&rq1->__lock, &rq2->__lock)
{
WARN_ON_ONCE(!irqs_disabled());
WARN_ON_ONCE(rq1 != rq2);
@@ -3117,8 +3152,7 @@ static inline void double_rq_lock(struct rq *rq1, struct rq *rq2)
* you need to do so manually after calling.
*/
static inline void double_rq_unlock(struct rq *rq1, struct rq *rq2)
- __releases(rq1->lock)
- __releases(rq2->lock)
+ __releases(&rq1->__lock, &rq2->__lock)
{
WARN_ON_ONCE(rq1 != rq2);
raw_spin_rq_unlock(rq1);
diff --git a/scripts/capability-analysis-suppression.txt b/scripts/capability-analysis-suppression.txt
index 95fb0b65a8e6..7ecd888ac522 100644
--- a/scripts/capability-analysis-suppression.txt
+++ b/scripts/capability-analysis-suppression.txt
@@ -26,6 +26,7 @@ src:*include/linux/refcount.h=emit
src:*include/linux/rhashtable.h=emit
src:*include/linux/rwlock*.h=emit
src:*include/linux/rwsem.h=emit
+src:*include/linux/sched*=emit
src:*include/linux/seqlock*.h=emit
src:*include/linux/spinlock*.h=emit
src:*include/linux/srcu*.h=emit
--
2.50.1.565.gc32cd1483b-goog
On 3/5/25 3:20 AM, Peter Zijlstra wrote:
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 248416ecd01c..d27607d9c2dc 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -945,6 +945,7 @@ static inline unsigned int blk_boundary_sectors_left(sector_t offset,
> */
> static inline struct queue_limits
> queue_limits_start_update(struct request_queue *q)
> + __acquires(q->limits_lock)
> {
> mutex_lock(&q->limits_lock);
> return q->limits;
> @@ -965,6 +966,7 @@ int blk_validate_limits(struct queue_limits *lim);
> * starting update.
> */
> static inline void queue_limits_cancel_update(struct request_queue *q)
> + __releases(q->limits_lock)
> {
> mutex_unlock(&q->limits_lock);
> }
The above is incomplete. Here is what I came up with myself:
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 248416ecd01c..0d011270e642 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -945,15 +945,19 @@ static inline unsigned int blk_boundary_sectors_left(sector_t offset,
*/
static inline struct queue_limits
queue_limits_start_update(struct request_queue *q)
+ ACQUIRE(q->limits_lock)
{
mutex_lock(&q->limits_lock);
return q->limits;
}
int queue_limits_commit_update_frozen(struct request_queue *q,
- struct queue_limits *lim);
+ struct queue_limits *lim)
+ RELEASE(q->limits_lock);
int queue_limits_commit_update(struct request_queue *q,
- struct queue_limits *lim);
-int queue_limits_set(struct request_queue *q, struct queue_limits *lim);
+ struct queue_limits *lim)
+ RELEASE(q->limits_lock);
+int queue_limits_set(struct request_queue *q, struct queue_limits *lim)
+ EXCLUDES(q->limits_lock);
int blk_validate_limits(struct queue_limits *lim);
/**
@@ -965,6 +969,7 @@ int blk_validate_limits(struct queue_limits *lim);
* starting update.
*/
static inline void queue_limits_cancel_update(struct request_queue *q)
+ RELEASE(q->limits_lock)
{
mutex_unlock(&q->limits_lock);
}
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 80a5b3268986..283fb85d96c8 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -1026,21 +1026,25 @@ static inline bool dev_pm_test_driver_flags(struct device *dev, u32 flags)
> }
>
> static inline void device_lock(struct device *dev)
> + __acquires(dev->mutex)
> {
> mutex_lock(&dev->mutex);
> }
>
> static inline int device_lock_interruptible(struct device *dev)
> + __cond_acquires(0, dev->mutex)
> {
> return mutex_lock_interruptible(&dev->mutex);
> }
>
> static inline int device_trylock(struct device *dev)
> + __cond_acquires(true, dev->mutex)
> {
> return mutex_trylock(&dev->mutex);
> }
>
> static inline void device_unlock(struct device *dev)
> + __releases(dev->mutex)
> {
> mutex_unlock(&dev->mutex);
> }
I propose to annotate these functions with __no_capability_analysis as a
first step. Review of all callers of these functions in the entire
kernel tree taught me that annotating these functions results in a
significant number of false positives and not in the discovery of any
bugs. The false positives are triggered by conditional locking. An
example of code that triggers false positive thread-safety warnings:
static void ath9k_hif_usb_firmware_fail(struct hif_device_usb *hif_dev)
{
struct device *dev = &hif_dev->udev->dev;
struct device *parent = dev->parent;
complete_all(&hif_dev->fw_done);
if (parent)
device_lock(parent);
device_release_driver(dev);
if (parent)
device_unlock(parent);
}
Thanks,
Bart.
On Wed, Mar 05, 2025 at 07:27:32AM -0800, Bart Van Assche wrote:
> On 3/5/25 3:20 AM, Peter Zijlstra wrote:
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 248416ecd01c..d27607d9c2dc 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -945,6 +945,7 @@ static inline unsigned int blk_boundary_sectors_left(sector_t offset,
> > */
> > static inline struct queue_limits
> > queue_limits_start_update(struct request_queue *q)
> > + __acquires(q->limits_lock)
> > {
> > mutex_lock(&q->limits_lock);
> > return q->limits;
> > @@ -965,6 +966,7 @@ int blk_validate_limits(struct queue_limits *lim);
> > * starting update.
> > */
> > static inline void queue_limits_cancel_update(struct request_queue *q)
> > + __releases(q->limits_lock)
> > {
> > mutex_unlock(&q->limits_lock);
> > }
>
> The above is incomplete. Here is what I came up with myself:
Oh, I'm sure. I simply fixed whatever was topmost in the compile output
when trying to build kernel/sched/. After fixing these two, it stopped
complaining about blkdev.
I think it complains about these because they're inline, even though
they're otherwise unused.
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index 80a5b3268986..283fb85d96c8 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -1026,21 +1026,25 @@ static inline bool dev_pm_test_driver_flags(struct device *dev, u32 flags)
> > }
> > static inline void device_lock(struct device *dev)
> > + __acquires(dev->mutex)
> > {
> > mutex_lock(&dev->mutex);
> > }
> > static inline int device_lock_interruptible(struct device *dev)
> > + __cond_acquires(0, dev->mutex)
> > {
> > return mutex_lock_interruptible(&dev->mutex);
> > }
> > static inline int device_trylock(struct device *dev)
> > + __cond_acquires(true, dev->mutex)
> > {
> > return mutex_trylock(&dev->mutex);
> > }
> > static inline void device_unlock(struct device *dev)
> > + __releases(dev->mutex)
> > {
> > mutex_unlock(&dev->mutex);
> > }
>
> I propose to annotate these functions with __no_capability_analysis as a
> first step. Review of all callers of these functions in the entire
> kernel tree taught me that annotating these functions results in a
> significant number of false positives and not in the discovery of any
> bugs. The false positives are triggered by conditional locking. An
> example of code that triggers false positive thread-safety warnings:
Yeah, I've run into this as well. The thing is entirely stupid when it
sees a branch. This is really unfortunate. But I disagree; I would
annotate those functions that have conditional locking with
__no_capability_analysis, or possibly:
#define __confused_by_conditionals __no_capability_analysis
I'm also not quite sure how to annotate things like pte_lockptr().
Anyway, this thing has some promise, however it is *really*, as in
*really* *REALLY* simple. Anything remotely interesting, where you
actually want the help, it falls over.
But you gotta start somewhere I suppose. I think the thing that is
important here is how receptive the clang folks are to working on this
-- because it definitely needs work.
On Tue, Mar 04, 2025 at 10:20:59AM +0100, Marco Elver wrote:

> === Initial Uses ===
>
> With this initial series, the following synchronization primitives are
> supported: `raw_spinlock_t`, `spinlock_t`, `rwlock_t`, `mutex`,
> `seqlock_t`, `bit_spinlock`, RCU, SRCU (`srcu_struct`), `rw_semaphore`,
> `local_lock_t`, `ww_mutex`.

Wasn't there a limitation wrt recursion -- specifically RCU is very much
a recursive lock and TS didn't really fancy that?

> - Rename __var_guarded_by to simply __guarded_by. Initially the idea
>   was to be explicit about if the variable itself or the pointed-to
>   data is guarded, but in the long-term, making this shorter might be
>   better.
>
> - Likewise rename __ref_guarded_by to __pt_guarded_by.

Shorter is better :-)

Anyway; I think I would like to start talking about extensions for these
asap.

Notably I feel like we should have a means to annotate the rules for
access/read vs modify/write to a variable.

The obvious case is RCU; where holding RCU is sufficient to read, but
modification requires a 'real' lock. This is not something that can be
currently expressed.

The other is the lock pattern I touched upon the other day, where
reading is permitted when holding one of two locks, while writing
requires holding both locks.

Being able to explicitly write that in the __guarded_by() annotations is
the cleanest way I think.

Anyway, let me go stare at the actual patches :-)
On Tue, 4 Mar 2025 at 12:21, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Mar 04, 2025 at 10:20:59AM +0100, Marco Elver wrote:
>
> > === Initial Uses ===
> >
> > With this initial series, the following synchronization primitives are
> > supported: `raw_spinlock_t`, `spinlock_t`, `rwlock_t`, `mutex`,
> > `seqlock_t`, `bit_spinlock`, RCU, SRCU (`srcu_struct`), `rw_semaphore`,
> > `local_lock_t`, `ww_mutex`.
>
> Wasn't there a limitation wrt recursion -- specifically RCU is very much
> a recursive lock and TS didn't really fancy that?

Yup, I mentioned that in the rcu patch. Make it more prominent in
documentation?

> > - Rename __var_guarded_by to simply __guarded_by. Initially the idea
> >   was to be explicit about if the variable itself or the pointed-to
> >   data is guarded, but in the long-term, making this shorter might be
> >   better.
> >
> > - Likewise rename __ref_guarded_by to __pt_guarded_by.
>
> Shorter is better :-)
>
> Anyway; I think I would like to start talking about extensions for these
> asap.
>
> Notably I feel like we should have a means to annotate the rules for
> access/read vs modify/write to a variable.
>
> The obvious case is RCU; where holding RCU is sufficient to read, but
> modification requires a 'real' lock. This is not something that can be
> currently expressed.

It can. The analysis distinguishes between holding shared/read locks and
exclusive/read-write locks. RCU is a bit special because we also have
rcu_dereference() and rcu_assign_pointer() and such, but in general if
you only hold a "shared capability", e.g. the RCU read lock only, it
won't let you write to __guarded_by variables. Again, the RCU case is
special because updating RCU-guarded pointers can be done any number of
ways, so I had to make rcu_assign_pointer() a bit more relaxed.

But besides RCU, the distinction between holding a lock exclusively or
shared does what one would expect: holding the lock exclusively lets you
write, and holding it shared only lets you read a __guarded_by() member.

> The other is the lock pattern I touched upon the other day, where
> reading is permitted when holding one of two locks, while writing
> requires holding both locks.
>
> Being able to explicitly write that in the __guarded_by() annotations is
> the cleanest way I think.

Simpler forms of this are possible if you stack __guarded_by(): you must
hold both locks exclusively to write, otherwise you can only read (but
must still hold both locks "shared", or "shared"+"exclusive").

The special case regarding "hold lock A -OR- B to read" is problematic
of course - that can be solved by designing lock-wrappers that "fake
acquire" some lock, or we design some extension. We can go off and
propose something to the Clang maintainers, but I fear that there are
only a few cases where we need __guarded_by(A OR B). If you say we need
an extension, then we need a list of requirements from which we can
design a clear and implementable extension.

In general, yes, the analysis imposes additional constraints, and not
all kernel locking patterns will be expressible (if ever). But a lot of
the "regular" code (drivers!) can be opted in today.

Thanks,
-- Marco