Use the newly introduced zone_lock_irqsave lock guard in
reserve_highatomic_pageblock() to replace the explicit lock/unlock and
goto out_unlock pattern with automatic scope-based cleanup.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com>
---
 include/linux/mmzone_lock.h |  9 +++++++++
 mm/page_alloc.c             | 13 +++++--------
 2 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/include/linux/mmzone_lock.h b/include/linux/mmzone_lock.h
index 6bd8b026029f..fe399a4505ba 100644
--- a/include/linux/mmzone_lock.h
+++ b/include/linux/mmzone_lock.h
@@ -97,4 +97,13 @@ static inline void zone_unlock_irq(struct zone *zone)
spin_unlock_irq(&zone->_lock);
}
+DEFINE_LOCK_GUARD_1(zone_lock_irqsave, struct zone,
+ zone_lock_irqsave(_T->lock, _T->flags),
+ zone_unlock_irqrestore(_T->lock, _T->flags),
+ unsigned long flags)
+DECLARE_LOCK_GUARD_1_ATTRS(zone_lock_irqsave,
+ __acquires(_T), __releases(*(struct zone **)_T))
+#define class_zone_lock_irqsave_constructor(_T) \
+ WITH_LOCK_GUARD_1_ATTRS(zone_lock_irqsave, _T)
+
#endif /* _LINUX_MMZONE_LOCK_H */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 75ee81445640..260fb003822a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3407,7 +3407,7 @@ static void reserve_highatomic_pageblock(struct page *page, int order,
struct zone *zone)
{
int mt;
- unsigned long max_managed, flags;
+ unsigned long max_managed;
/*
* The number reserved as: minimum is 1 pageblock, maximum is
@@ -3421,29 +3421,26 @@ static void reserve_highatomic_pageblock(struct page *page, int order,
if (zone->nr_reserved_highatomic >= max_managed)
return;
- zone_lock_irqsave(zone, flags);
+ guard(zone_lock_irqsave)(zone);
/* Recheck the nr_reserved_highatomic limit under the lock */
if (zone->nr_reserved_highatomic >= max_managed)
- goto out_unlock;
+ return;
/* Yoink! */
mt = get_pageblock_migratetype(page);
/* Only reserve normal pageblocks (i.e., they can merge with others) */
if (!migratetype_is_mergeable(mt))
- goto out_unlock;
+ return;
if (order < pageblock_order) {
if (move_freepages_block(zone, page, mt, MIGRATE_HIGHATOMIC) == -1)
- goto out_unlock;
+ return;
zone->nr_reserved_highatomic += pageblock_nr_pages;
} else {
change_pageblock_range(page, order, MIGRATE_HIGHATOMIC);
zone->nr_reserved_highatomic += 1 << order;
}
-
-out_unlock:
- zone_unlock_irqrestore(zone, flags);
}
/*
--
2.47.3
On Fri, 6 Mar 2026 16:05:35 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:

> Use the newly introduced zone_lock_irqsave lock guard in
> reserve_highatomic_pageblock() to replace the explicit lock/unlock and
> goto out_unlock pattern with automatic scope-based cleanup.
>
> ...
>
> -	zone_lock_irqsave(zone, flags);
> +	guard(zone_lock_irqsave)(zone);

guard() is cute, but this patch adds a little overhead - defconfig
page_alloc.o text increases by 32 bytes, presumably all in
reserve_highatomic_pageblock().  More instructions, larger cache
footprint.

So we're adding a little overhead to every user's Linux machine for all
time.  In return for which the developers get a little convenience and
maintainability.

Is it worth it?
On Fri, Mar 06, 2026 at 09:53:36AM -0800, Andrew Morton wrote:
> On Fri, 6 Mar 2026 16:05:35 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:
>
> > Use the newly introduced zone_lock_irqsave lock guard in
> > reserve_highatomic_pageblock() to replace the explicit lock/unlock and
> > goto out_unlock pattern with automatic scope-based cleanup.
> >
> > ...
> >
> > -	zone_lock_irqsave(zone, flags);
> > +	guard(zone_lock_irqsave)(zone);
>
> guard() is cute, but this patch adds a little overhead - defconfig
> page_alloc.o text increases by 32 bytes, presumably all in
> reserve_highatomic_pageblock().  More instructions, larger cache
> footprint.
>
> So we're adding a little overhead to every user's Linux machine for all
> time.  In return for which the developers get a little convenience and
> maintainability.
>
> Is it worth it?

Hi Andrew,

Before respinning this series, I wanted to check if it's worth pursuing.

At the time you noted the text size increase and questioned whether the
trade-off makes sense.  Since then, the guard infrastructure was fixed
by Peter, so the code generation situation has improved.

The main benefit of the series is still simplifying control flow in
these functions (removing multiple unlock paths, gotos, etc.).

Would you be open to this direction if the overhead is negligible, or
would you prefer to avoid this kind of transformation regardless?

I can also limit the series to only the more complex cases if that
helps.
On Thu, 26 Mar 2026 18:04:35 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:

> On Fri, Mar 06, 2026 at 09:53:36AM -0800, Andrew Morton wrote:
> > On Fri, 6 Mar 2026 16:05:35 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:
> >
> > > Use the newly introduced zone_lock_irqsave lock guard in
> > > reserve_highatomic_pageblock() to replace the explicit lock/unlock and
> > > goto out_unlock pattern with automatic scope-based cleanup.
> > >
> > > ...
> > >
> > > -	zone_lock_irqsave(zone, flags);
> > > +	guard(zone_lock_irqsave)(zone);
> >
> > guard() is cute, but this patch adds a little overhead - defconfig
> > page_alloc.o text increases by 32 bytes, presumably all in
> > reserve_highatomic_pageblock().  More instructions, larger cache
> > footprint.
> >
> > So we're adding a little overhead to every user's Linux machine for all
> > time.  In return for which the developers get a little convenience and
> > maintainability.
> >
> > Is it worth it?
>
> Hi Andrew,
>
> Before respinning this series, I wanted to check if it's worth pursuing.

Probably.  Much depends on the views of the people who regularly work on
this code.  Do they like guard(), or do they prefer the current explicit
open-coded locking?

> At the time you noted the text size increase and questioned whether the
> trade-off makes sense.  Since then, the guard infrastructure was fixed
> by Peter, so the code generation situation has improved.

Great.

> The main benefit of the series is still simplifying control flow in
> these functions (removing multiple unlock paths, gotos, etc.).
>
> Would you be open to this direction if the overhead is negligible, or
> would you prefer to avoid this kind of transformation regardless?
>
> I can also limit the series to only the more complex cases if that
> helps.

Gee.  I think it would be helpful to prepare a respin which reflects
your current thinking, see what others think.

Please understand that I'm resisting adding new material during this
cycle
(https://lkml.kernel.org/r/20260323202941.08ddf2b0411501cae801ab4c@linux-foundation.org)
so you'd best be targeting 7.1-rc1 at the earliest.  But sending out a
new version during this cycle for people to consider would be a good
step.
[ Adding Peter ]

On Fri, 6 Mar 2026 09:53:36 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri, 6 Mar 2026 16:05:35 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:
>
> > Use the newly introduced zone_lock_irqsave lock guard in
> > reserve_highatomic_pageblock() to replace the explicit lock/unlock and
> > goto out_unlock pattern with automatic scope-based cleanup.
> >
> > ...
> >
> > -	zone_lock_irqsave(zone, flags);
> > +	guard(zone_lock_irqsave)(zone);
>
> guard() is cute, but this patch adds a little overhead - defconfig
> page_alloc.o text increases by 32 bytes, presumably all in
> reserve_highatomic_pageblock().  More instructions, larger cache
> footprint.
>
> So we're adding a little overhead to every user's Linux machine for all
> time.  In return for which the developers get a little convenience and
> maintainability.

I think maintainability is of importance. Is there any measurable slowdown?
Or are we only worried about the text size increase?

>
> Is it worth it?

This is being done all over the kernel. Perhaps we should look at ways to
make the generic infrastructure more performant?

-- Steve
On 3/6/26 19:00, Steven Rostedt wrote:
>
> [ Adding Peter ]
>
> On Fri, 6 Mar 2026 09:53:36 -0800
> Andrew Morton <akpm@linux-foundation.org> wrote:
>
>> On Fri, 6 Mar 2026 16:05:35 +0000 Dmitry Ilvokhin <d@ilvokhin.com> wrote:
>>
>> > Use the newly introduced zone_lock_irqsave lock guard in
>> > reserve_highatomic_pageblock() to replace the explicit lock/unlock and
>> > goto out_unlock pattern with automatic scope-based cleanup.
>> >
>> > ...
>> >
>> > -	zone_lock_irqsave(zone, flags);
>> > +	guard(zone_lock_irqsave)(zone);
>>
>> guard() is cute, but this patch adds a little overhead - defconfig
>> page_alloc.o text increases by 32 bytes, presumably all in
>> reserve_highatomic_pageblock().  More instructions, larger cache
>> footprint.

I get this:

Function                   old     new   delta
get_page_from_freelist    6389    6452     +63

>> So we're adding a little overhead to every user's Linux machine for all
>> time.  In return for which the developers get a little convenience and
>> maintainability.
>
> I think maintainability is of importance. Is there any measurable slowdown?
> Or are we only worried about the text size increase?
>
>>
>> Is it worth it?
>
> This is being done all over the kernel. Perhaps we should look at ways to
> make the generic infrastructure more performant?

Yeah I don't think the guard construct in this case should be doing anything
here that wouldn't allow the compiler to compile to the exactly same result
as before? Either there's some problem with the infra, or we're just victim
of compiler heuristics. In both cases imho worth looking into rather than
rejecting the construct.

> -- Steve
On Fri, Mar 06, 2026 at 07:24:56PM +0100, Vlastimil Babka wrote:

> Yeah I don't think the guard construct in this case should be doing anything
> here that wouldn't allow the compiler to compile to the exactly same result
> as before? Either there's some problem with the infra, or we're just victim
> of compiler heuristics. In both cases imho worth looking into rather than
> rejecting the construct.

I'd love to look into it, but I can't seem to apply these patches to
anything.

By virtue of not actually having the patches, I had to resort to b4, and
I think the incantation is something like:

  b4 shazam cover.1772811429.git.d@ilvokhin.com

but it doesn't want to apply to anything I have at hand. Specifically, I
tried Linus' tree and tip, which is most of what I have at hand.
On Sat, Mar 07, 2026 at 02:16:41PM +0100, Peter Zijlstra wrote:
> On Fri, Mar 06, 2026 at 07:24:56PM +0100, Vlastimil Babka wrote:
>
> > Yeah I don't think the guard construct in this case should be doing anything
> > here that wouldn't allow the compiler to compile to the exactly same result
> > as before? Either there's some problem with the infra, or we're just victim
> > of compiler heuristics. In both cases imho worth looking into rather than
> > rejecting the construct.
>
> I'd love to look into it, but I can't seem to apply these patches to
> anything.
>
> By virtue of not actually having the patches, I had to resort to b4, and
> I think the incantation is something like:
>
>   b4 shazam cover.1772811429.git.d@ilvokhin.com
>
> but it doesn't want to apply to anything I have at hand. Specifically, I
> tried Linus' tree and tip, which is most of what I have at hand.

Thanks for taking a look, Peter.

This series is based on mm-new and depends on my earlier patchset:

  https://lore.kernel.org/all/cover.1772206930.git.d@ilvokhin.com/

Those patches are currently only in Andrew's mm-new tree, so this series
won't apply cleanly on Linus' tree or tip.

It should apply on top of mm-new from:

  https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
On Sat, Mar 07, 2026 at 02:09:41PM +0000, Dmitry Ilvokhin wrote:
> On Sat, Mar 07, 2026 at 02:16:41PM +0100, Peter Zijlstra wrote:
> > On Fri, Mar 06, 2026 at 07:24:56PM +0100, Vlastimil Babka wrote:
> >
> > > Yeah I don't think the guard construct in this case should be doing anything
> > > here that wouldn't allow the compiler to compile to the exactly same result
> > > as before? Either there's some problem with the infra, or we're just victim
> > > of compiler heuristics. In both cases imho worth looking into rather than
> > > rejecting the construct.
> >
> > I'd love to look into it, but I can't seem to apply these patches to
> > anything.
> >
> > By virtue of not actually having the patches, I had to resort to b4, and
> > I think the incantation is something like:
> >
> > b4 shazam cover.1772811429.git.d@ilvokhin.com
> >
> > but it doesn't want to apply to anything I have at hand. Specifically, I
> > tried Linus' tree and tip, which is most of what I have at hand.
>
> Thanks for taking a look, Peter.
>
> This series is based on mm-new and depends on my earlier patchset:
>
> https://lore.kernel.org/all/cover.1772206930.git.d@ilvokhin.com/
>
> Those patches are currently only in Andrew's mm-new tree, so this series
> won't apply cleanly on Linus' tree or tip.
>
> It should apply on top of mm-new from:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
OK, so the big problem is __GUARD_IS_ERR(), and that came up before, but
while Linus told me how to fix it, he didn't actually like it very much:
https://lore.kernel.org/all/20250513085001.GC25891@noisy.programming.kicks-ass.net/
However it does help with this:
$ ./scripts/bloat-o-meter defconfig-build/mm/page_alloc-pre-gcc-16.o defconfig-build/mm/page_alloc-post-gcc-16.o | grep -v __UNIQUE
add/remove: 24/24 grow/shrink: 3/2 up/down: 296/-224 (72)
Function old new delta
get_page_from_freelist 6158 6198 +40
free_pcppages_bulk 678 714 +36
unreserve_highatomic_pageblock 708 736 +28
make_alloc_exact 280 264 -16
alloc_pages_bulk_noprof 1415 1399 -16
Total: Before=45299, After=45371, chg +0.16%
$ ./scripts/bloat-o-meter defconfig-build/mm/page_alloc-pre-gcc-16.o defconfig-build/mm/page_alloc.o | grep -v __UNIQUE
add/remove: 24/24 grow/shrink: 3/15 up/down: 277/-363 (-86)
Function old new delta
unreserve_highatomic_pageblock 708 757 +49
free_pcppages_bulk 678 707 +29
get_page_from_freelist 6158 6165 +7
try_to_claim_block 1729 1726 -3
setup_per_zone_wmarks 656 653 -3
free_pages_prepare 924 921 -3
calculate_totalreserve_pages 282 279 -3
alloc_frozen_pages_nolock_noprof 622 619 -3
__free_pages_prepare 924 921 -3
__free_pages_ok 1197 1194 -3
__free_one_page 1330 1327 -3
__free_frozen_pages 1303 1300 -3
__rmqueue_pcplist 2786 2777 -9
free_unref_folios 1905 1894 -11
setup_per_zone_lowmem_reserve 388 374 -14
make_alloc_exact 280 264 -16
__alloc_frozen_pages_noprof 5411 5368 -43
nr_free_zone_pages 189 138 -51
Total: Before=45299, After=45213, chg -0.19%
However, looking at things again, I think we can get rid of that
unconditional __GUARD_IS_ERR(), something like the below, Dan?
This then gives:
$ ./scripts/bloat-o-meter defconfig-build/mm/page_alloc-pre-gcc-16.o defconfig-build/mm/page_alloc.o | grep -v __UNIQUE
add/remove: 24/24 grow/shrink: 1/16 up/down: 213/-486 (-273)
Function old new delta
free_pcppages_bulk 678 699 +21
try_to_claim_block 1729 1723 -6
setup_per_zone_wmarks 656 650 -6
free_pages_prepare 924 918 -6
calculate_totalreserve_pages 282 276 -6
alloc_frozen_pages_nolock_noprof 622 616 -6
__free_pages_prepare 924 918 -6
__free_pages_ok 1197 1191 -6
__free_one_page 1330 1324 -6
__free_frozen_pages 1303 1297 -6
free_pages_exact 199 183 -16
setup_per_zone_lowmem_reserve 388 371 -17
free_unref_folios 1905 1888 -17
__rmqueue_pcplist 2786 2768 -18
nr_free_zone_pages 189 138 -51
__alloc_frozen_pages_noprof 5411 5359 -52
get_page_from_freelist 6158 6089 -69
Total: Before=45299, After=45026, chg -0.60%
Anyway, if you all care about the size of things -- those tracepoints
consume *WAAY* more bytes than any of this.
---
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -286,15 +286,18 @@ static __always_inline _type class_##_na
__no_context_analysis \
{ _type t = _init; return t; }
-#define EXTEND_CLASS(_name, ext, _init, _init_args...) \
-typedef lock_##_name##_t lock_##_name##ext##_t; \
+#define EXTEND_CLASS_COND(_name, ext, _cond, _init, _init_args...) \
+typedef lock_##_name##_t lock_##_name##ext##_t; \
typedef class_##_name##_t class_##_name##ext##_t; \
-static __always_inline void class_##_name##ext##_destructor(class_##_name##_t *p) \
-{ class_##_name##_destructor(p); } \
+static __always_inline void class_##_name##ext##_destructor(class_##_name##_t *_T) \
+{ if (_cond) return; class_##_name##_destructor(_T); } \
static __always_inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
__no_context_analysis \
{ class_##_name##_t t = _init; return t; }
+#define EXTEND_CLASS(_name, ext, _init, _init_args...) \
+ EXTEND_CLASS_COND(_name, ext, 0, _init, _init_args)
+
#define CLASS(_name, var) \
class_##_name##_t var __cleanup(class_##_name##_destructor) = \
class_##_name##_constructor
@@ -394,12 +397,12 @@ static __maybe_unused const bool class_#
__DEFINE_GUARD_LOCK_PTR(_name, _T)
#define DEFINE_GUARD(_name, _type, _lock, _unlock) \
- DEFINE_CLASS(_name, _type, if (!__GUARD_IS_ERR(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \
+ DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
DEFINE_CLASS_IS_GUARD(_name)
#define DEFINE_GUARD_COND_4(_name, _ext, _lock, _cond) \
__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
- EXTEND_CLASS(_name, _ext, \
+ EXTEND_CLASS_COND(_name, _ext, __GUARD_IS_ERR(*_T), \
({ void *_t = _T; int _RET = (_lock); if (_T && !(_cond)) _t = ERR_PTR(_RET); _t; }), \
class_##_name##_t _T) \
static __always_inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \
@@ -488,7 +491,7 @@ typedef struct { \
static __always_inline void class_##_name##_destructor(class_##_name##_t *_T) \
__no_context_analysis \
{ \
- if (!__GUARD_IS_ERR(_T->lock)) { _unlock; } \
+ if (_T->lock) { _unlock; } \
} \
\
__DEFINE_GUARD_LOCK_PTR(_name, &_T->lock)
@@ -565,7 +568,7 @@ __DEFINE_LOCK_GUARD_0(_name, _lock)
#define DEFINE_LOCK_GUARD_1_COND_4(_name, _ext, _lock, _cond) \
__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
- EXTEND_CLASS(_name, _ext, \
+ EXTEND_CLASS_COND(_name, _ext, __GUARD_IS_ERR(_T->lock), \
({ class_##_name##_t _t = { .lock = l }, *_T = &_t;\
int _RET = (_lock); \
if (_T->lock && !(_cond)) _T->lock = ERR_PTR(_RET);\
Peter Zijlstra wrote: [..] > However, looking at things again, I think we can get rid of that > unconditional __GUARD_IS_ERR(), something like the below, Dan? I think it makes sense, do not make everyone pay the cost of __GUARD_IS_ERR() at least until a better __GUARD_IS_ERR() comes along. I gave the below a run through the CXL subsystem tests which uses conditional guards quite a bit. Worked fine, and looks good to me. So feel free to add a "tested by" from me. Not putting the actual tag here so that b4 does not slurp a tag for the wrong patchset. > --- > --- a/include/linux/cleanup.h > +++ b/include/linux/cleanup.h [..]
On Thu, Mar 12, 2026 at 04:40:45PM -0700, Dan Williams wrote:
> Peter Zijlstra wrote:
> [..]
> > However, looking at things again, I think we can get rid of that
> > unconditional __GUARD_IS_ERR(), something like the below, Dan?
>
> I think it makes sense, do not make everyone pay the cost of
> __GUARD_IS_ERR() at least until a better __GUARD_IS_ERR() comes along.
>
> I gave the below a run through the CXL subsystem tests which uses
> conditional guards quite a bit. Worked fine, and looks good to me.
>
> So feel free to add a "tested by" from me. Not putting the actual tag
> here so that b4 does not slurp a tag for the wrong patchset.

Excellent, thanks for having a look. Let me go write it up.
On Mon, Mar 09, 2026 at 05:45:16PM +0100, Peter Zijlstra wrote:
> On Sat, Mar 07, 2026 at 02:09:41PM +0000, Dmitry Ilvokhin wrote:
> > On Sat, Mar 07, 2026 at 02:16:41PM +0100, Peter Zijlstra wrote:
> > > On Fri, Mar 06, 2026 at 07:24:56PM +0100, Vlastimil Babka wrote:
> > >
> > > > Yeah I don't think the guard construct in this case should be doing anything
> > > > here that wouldn't allow the compiler to compile to the exactly same result
> > > > as before? Either there's some problem with the infra, or we're just victim
> > > > of compiler heuristics. In both cases imho worth looking into rather than
> > > > rejecting the construct.
> > >
> > > I'd love to look into it, but I can't seem to apply these patches to
> > > anything.
> > >
> > > By virtue of not actually having the patches, I had to resort to b4, and
> > > I think the incantation is something like:
> > >
> > >   b4 shazam cover.1772811429.git.d@ilvokhin.com
> > >
> > > but it doesn't want to apply to anything I have at hand. Specifically, I
> > > tried Linus' tree and tip, which is most of what I have at hand.
> >
> > Thanks for taking a look, Peter.
> >
> > This series is based on mm-new and depends on my earlier patchset:
> >
> >   https://lore.kernel.org/all/cover.1772206930.git.d@ilvokhin.com/
> >
> > Those patches are currently only in Andrew's mm-new tree, so this series
> > won't apply cleanly on Linus' tree or tip.
> >
> > It should apply on top of mm-new from:
> >
> >   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>
> OK, so the big problem is __GUARD_IS_ERR(), and that came up before, but
> while Linus told me how to fix it, he didn't actually like it very much:
>
>   https://lore.kernel.org/all/20250513085001.GC25891@noisy.programming.kicks-ass.net/

Thanks for taking a look and digging into this.

[...]

> Anyway, if you all care about the size of things -- those tracepoints
> consume *WAAY* more bytes than any of this.

That's a fair point, but as I understand Andrew's main concern, the
guard() usage becomes part of the code unconditionally, with no way to
disable it, whereas tracepoints can be compiled out. Any overhead
introduced by guards is therefore carried by all kernel builds.

Given that, improvements to the guard infrastructure itself seem worth
exploring regardless of whether this particular patchset ends up going
in. If the overhead can be reduced or eliminated in the common case, it
should make the trade-off much easier.

Thanks again for investigating this and suggesting a possible approach.
The following commit has been merged into the locking/core branch of tip:
Commit-ID: 2deccd5c862a0337a691bcfaa87919b4216e6103
Gitweb: https://git.kernel.org/tip/2deccd5c862a0337a691bcfaa87919b4216e6103
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 09 Mar 2026 17:40:42 +01:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 16 Mar 2026 13:16:49 +01:00
cleanup: Optimize guards
Andrew reported that a guard() conversion of zone_lock increased the
code size unnecessarily.
It turns out the unconditional __GUARD_IS_ERR() is to blame. As
explored earlier [1], __GUARD_IS_ERR(), similar to IS_ERR_OR_NULL(),
generates somewhat sub-optimal code.
However, looking at things again, it is possible to avoid doing the
__GUARD_IS_ERR() unconditionally. Revert the normal destructors to a
simple NULL test and only add the IS_ERR bit to COND guards.
This cures the reported overhead; as compiled by GCC-16:
page_alloc.o:
pre: Total: Before=45299, After=45371, chg +0.16%
post: Total: Before=45299, After=45026, chg -0.60%
[1] https://lkml.kernel.org/r/20250513085001.GC25891@noisy.programming.kicks-ass.net
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20260309164516.GE606826@noisy.programming.kicks-ass.net
---
include/linux/cleanup.h | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index dbc4162..ea95ca4 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -286,15 +286,18 @@ static __always_inline _type class_##_name##_constructor(_init_args) \
__no_context_analysis \
{ _type t = _init; return t; }
-#define EXTEND_CLASS(_name, ext, _init, _init_args...) \
-typedef lock_##_name##_t lock_##_name##ext##_t; \
+#define EXTEND_CLASS_COND(_name, ext, _cond, _init, _init_args...) \
+typedef lock_##_name##_t lock_##_name##ext##_t; \
typedef class_##_name##_t class_##_name##ext##_t; \
-static __always_inline void class_##_name##ext##_destructor(class_##_name##_t *p) \
-{ class_##_name##_destructor(p); } \
+static __always_inline void class_##_name##ext##_destructor(class_##_name##_t *_T) \
+{ if (_cond) return; class_##_name##_destructor(_T); } \
static __always_inline class_##_name##_t class_##_name##ext##_constructor(_init_args) \
__no_context_analysis \
{ class_##_name##_t t = _init; return t; }
+#define EXTEND_CLASS(_name, ext, _init, _init_args...) \
+ EXTEND_CLASS_COND(_name, ext, 0, _init, _init_args)
+
#define CLASS(_name, var) \
class_##_name##_t var __cleanup(class_##_name##_destructor) = \
class_##_name##_constructor
@@ -394,12 +397,12 @@ static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
__DEFINE_GUARD_LOCK_PTR(_name, _T)
#define DEFINE_GUARD(_name, _type, _lock, _unlock) \
- DEFINE_CLASS(_name, _type, if (!__GUARD_IS_ERR(_T)) { _unlock; }, ({ _lock; _T; }), _type _T); \
+ DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
DEFINE_CLASS_IS_GUARD(_name)
#define DEFINE_GUARD_COND_4(_name, _ext, _lock, _cond) \
__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
- EXTEND_CLASS(_name, _ext, \
+ EXTEND_CLASS_COND(_name, _ext, __GUARD_IS_ERR(*_T), \
({ void *_t = _T; int _RET = (_lock); if (_T && !(_cond)) _t = ERR_PTR(_RET); _t; }), \
class_##_name##_t _T) \
static __always_inline void * class_##_name##_ext##_lock_ptr(class_##_name##_t *_T) \
@@ -488,7 +491,7 @@ typedef struct { \
static __always_inline void class_##_name##_destructor(class_##_name##_t *_T) \
__no_context_analysis \
{ \
- if (!__GUARD_IS_ERR(_T->lock)) { _unlock; } \
+ if (_T->lock) { _unlock; } \
} \
\
__DEFINE_GUARD_LOCK_PTR(_name, &_T->lock)
@@ -565,7 +568,7 @@ __DEFINE_LOCK_GUARD_0(_name, _lock)
#define DEFINE_LOCK_GUARD_1_COND_4(_name, _ext, _lock, _cond) \
__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
- EXTEND_CLASS(_name, _ext, \
+ EXTEND_CLASS_COND(_name, _ext, __GUARD_IS_ERR(_T->lock), \
({ class_##_name##_t _t = { .lock = l }, *_T = &_t;\
int _RET = (_lock); \
if (_T->lock && !(_cond)) _T->lock = ERR_PTR(_RET);\
On Fri, 6 Mar 2026 19:24:56 +0100 Vlastimil Babka <vbabka@kernel.org> wrote:

> >>
> >> Is it worth it?
> >
> > This is being done all over the kernel. Perhaps we should look at ways to
> > make the generic infrastructure more performant?
>
> Yeah I don't think the guard construct in this case should be doing anything
> here that wouldn't allow the compiler to compile to the exactly same result
> as before? Either there's some problem with the infra, or we're just victim
> of compiler heuristics.

Sure, it'd be good to figure this out.

> In both cases imho worth looking into rather than
> rejecting the construct.

I'm not enjoying the idea of penalizing billions of machines all of the
time in order to make life a little easier for the developers.  Seems
like a poor tradeoff.
On Fri, 6 Mar 2026 10:33:07 -0800
Andrew Morton <akpm@linux-foundation.org> wrote:

> I'm not enjoying the idea of penalizing billions of machines all of the
> time in order to make life a little easier for the developers.  Seems
> like a poor tradeoff.

But if there's a bug due to not being as maintainable, that too will
affect billions of machines! To me, that balances the tradeoff.

-- Steve