[PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()

Oleg Nesterov posted 4 patches 4 months ago
[PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Oleg Nesterov 4 months ago
The read_seqbegin/need_seqretry/done_seqretry API is cumbersome and
error prone. With the new helper the "typical" code like

	int seq, nextseq;
	unsigned long flags;

	nextseq = 0;
	do {
		seq = nextseq;
		flags = read_seqbegin_or_lock_irqsave(&seqlock, &seq);

		// read-side critical section

		nextseq = 1;
	} while (need_seqretry(&seqlock, seq));
	done_seqretry_irqrestore(&seqlock, seq, flags);

can be rewritten as

	scoped_seqlock_read_irqsave (&seqlock) {
		// read-side critical section
	}
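
and, where irqs need not be disabled on the locking pass,

	scoped_seqlock_read (&seqlock) {
		// read-side critical section
	}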

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/seqlock.h | 64 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 5ce48eab7a2a..fa8d73668f4b 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -1209,4 +1209,68 @@ done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
 	if (seq & 1)
 		read_sequnlock_excl_irqrestore(lock, flags);
 }
+
+/* internal helper for scoped_seqlock_read/scoped_seqlock_read_irqsave */
+static inline bool
+__scoped_seqlock_read_retry(seqlock_t *lock, int *seq, unsigned long *flags)
+{
+	bool retry = false;
+
+	if (*seq & 1) {
+		if (flags)
+			read_sequnlock_excl_irqrestore(lock, *flags);
+		else
+			read_sequnlock_excl(lock);
+	} else if (read_seqretry(lock, *seq)) {
+		*seq = 1;
+		retry = true;
+		if (flags)
+			read_seqlock_excl_irqsave(lock, *flags);
+		else
+			read_seqlock_excl(lock);
+	}
+
+	return retry;
+}
+
+#define __scoped_seqlock_read(lock, s)	\
+	for (struct { bool lockless; int seq; }					\
+		s = { .lockless = true, .seq = read_seqbegin(lock) };		\
+	     s.lockless || __scoped_seqlock_read_retry(lock, &s.seq, NULL);	\
+	     s.lockless = false)
+
+#define __scoped_seqlock_read_irqsave(lock, s)	\
+	for (struct { bool lockless; int seq; unsigned long flags; }		\
+		s = { .lockless = true, .seq = read_seqbegin(lock) };		\
+	     s.lockless || __scoped_seqlock_read_retry(lock, &s.seq, &s.flags);	\
+	     s.lockless = false)
+
+/**
+ * scoped_seqlock_read(lock) - execute the read side critical section
+ *                             without manual sequence counter handling
+ *                             or calls to other helpers
+ * @lock: pointer to the seqlock_t protecting the data
+ *
+ * Example:
+ *
+ *	scoped_seqlock_read(&lock) {
+ *		// read-side critical section
+ *	}
+ *
+ * Starts with a lockless pass first. If it fails, restarts the critical
+ * section with the lock held.
+ *
+ * The critical section must not contain control flow that escapes the loop.
+ */
+#define scoped_seqlock_read(lock)	\
+	__scoped_seqlock_read(lock, __UNIQUE_ID(s))
+
+/**
+ * scoped_seqlock_read_irqsave(lock) - same as scoped_seqlock_read() but
+ *                                     disables irqs on a locking pass
+ * @lock: pointer to the seqlock_t protecting the data
+ */
+#define scoped_seqlock_read_irqsave(lock)	\
+	__scoped_seqlock_read_irqsave(lock, __UNIQUE_ID(s))
+
 #endif /* __LINUX_SEQLOCK_H */
-- 
2.25.1.362.g51ebf55
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Linus Torvalds 4 months ago
Bah.

Now I'm reading this again, and going through it more carefully, and
now I hate your helper.

Why?

Because I think that with the new organization, you can do so much better.

So the old code used "seq" as two different values:

 - the first loop around, it's the sequence count for the lockless
attempt and is always even because of how read_seqbegin() works (and
because the original code had that whole 'nextseq' hackery).

 - the second loop around, it's just a flag that now we hold the lock

and you converted the new scheme to that same model.

Which works, but in the new scheme, it's just all *pointless*.

Why do I say that?

Because in the new scheme, you have that explicit "lockless" flag
which is so much more legible in the first place.

You use it in the for-loop, but you don't use it in the helper function.

So all the games with

        if (*seq & 1) {
       ...
                *seq = 1;

are horribly non-intuitive and shouldn't exist because it would be
much more logical to just use the 'lockless' boolean instead.

Those games only make the code harder to understand, and I think they
also make the compiler unable to just unroll the "loop" twice.

So instead of this:

  static inline bool
  __scoped_seqlock_read_retry(seqlock_t *lock, int *seq, unsigned long *flags)
  {
         bool retry = false;

         if (*seq & 1) {
                 if (flags)
                         read_sequnlock_excl_irqrestore(lock, *flags);
                 else
                         read_sequnlock_excl(lock);
         } else if (read_seqretry(lock, *seq)) {
                 *seq = 1;
                 retry = true;
                 if (flags)
                         read_seqlock_excl_irqsave(lock, *flags);
                 else
                         read_seqlock_excl(lock);
         }

         return retry;
  }

I think that you should make that helper function be a macro - so that
it can take the unnamed union type as its argument - and then make it
do something like this instead:

  #define __scoped_seqlock_read_retry(s) ({ s.lockless ?       \
        (s.lockless = false) || read_seqretry(lock, s.seq) ?   \
                ({ read_seqlock_excl(lock); true }) : false :  \
        ({ read_sequnlock_excl(lock); false }) })

Ok, that hasn't even been close to a compiler and may be completely
buggy, but basically the logic is

 - if lockless: set lockless to false (unconditionally to make it real
easy for the compiler), then do the seq_retry, and if that says we
should retry, it does the locking and returns true (so that the loop
is re-done)

 - if not lockless, just unlock the lock and return false to say we're done

And yes, you'd need the irqsafe version too, and yes, you could do it
with the "if (flags)" thing or you could duplicate that macro, but you
could also just pass the lock/unlock sequence as an argument, and now
that macro looks even simpler

  #define __scoped_seqlock_retry(s, lock, unlock) ({ s.lockless ?  \
        (s.lockless = false) || read_seqretry(lock, s.seq) ?       \
                ({ lock ; true; }) : false :                       \
        ({ unlock ; false }) })

although then the users of that macro will be a bit uglier (maybe do
that with another level of macro to make each step simpler).

And I think the compiler will be able to optimize this better, because
the compiler actually can just follow the 's.lockless' tests and turn
all those tests into static jumps when it unrolls that loop twice
(first the lockless = true, then the lockless = false).

Again: I DID NOT FEED THIS TO A COMPILER. It may have syntax errors.
It may have logic errors.

Or it may be me having a complete breakdown and spouting nonsense. But
when re-reading your __scoped_seqlock_read_retry() helper, I really
really hated how it played those games with even/odd *seq, and I think
it's unnecessary.

Am I making sense? Who knows?

             Linus
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Linus Torvalds 4 months ago
On Wed, 8 Oct 2025 at 09:05, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I think that you should make that helper function be a macro - so that
> it can take the unnamed union type as its argument - and then make it
> do something like this instead:
>
>   #define __scoped_seqlock_read_retry(s) ({ s.lockless ?       \
>         (s.lockless = false) || read_seqretry(lock, s.seq) ?   \
>                 ({ read_seqlock_excl(lock); true }) : false :  \
>         ({ read_sequnlock_excl(lock); false }) })

Actually, it shouldn't do that "s.lockless = false" thing at all -
because that's done by the for() loop.

So that was just a thinko from trying to translate the whole "*seq =
1" hackery from the old model.

But that hackery shouldn't even exist since it's all handled naturally
and much more cleanly by the surrounding for-loop.

The only thing that the macro needs to worry about is whether it
should retry after the lockless case (and take the lock if so), or
whether it should just unlock after the locked case.

             Linus
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Linus Torvalds 4 months ago
On Wed, 8 Oct 2025 at 22:31, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But that hackery shouldn't even exist since it's all handled naturally
> and much more cleanly by the surrounding for-loop.

I'm batting zero. I still think it's true that this logic should be
handled by the for-loop, but the "*seq = 1" hackery did actually do
something.

Because it, together with the 'lockless' flag, enumerated three
states, not two. It's just that the first state didn't _do_ anything,
so it was kind of easy to miss it when I read through the code.

So the three states are

 (a) before anything has been done (lockless, seq is even)

 (b) the "we've done the lockless, we need to end the loop or take the
lock" (!lockless, seq is even)

 (c) we've done the locked, we need to unlock and end (seq is odd)

and so relying on just a 'bool lockless' in the loop isn't enough.

I do still think this could be done at the for-loop level, so that the
compiler can statically see the stages and unroll it all to not be a
loop at all.

But I think it means that "lockless" would have to be a "phase", and go 0/1/2.
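
Something like

  #define __scoped_seqlock_read(lock, s, cond) \
	unrolled_full for (struct { unsigned long phase, data; } s = { 0 }; \
		s.phase < 3 && cond(lock, s.phase, &s.data); s.phase++)

where 'cond' does the begin/retry/lock/unlock thing depending on the phase.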

And I did get it to work that way, and did get gcc to unroll it to
generate really nice code. IOW, I can make this code

        scoped_seqlock_read(lock) {
                asm volatile("TEST");
        }

generate this assembly:

.L12:
        movl    (%rdi), %eax    #* lock, _25
        testb   $1, %al #, _25
        jne     .L14    #,
        TEST
        movl    (%rdi), %edx    # MEM[(const volatile unsigned int *)lock_6(D)], _70
        cmpl    %edx, %eax      # _70, _25
        jne     .L15    #,
        ret
.L15:
        subq    $8, %rsp        #,
        addq    $4, %rdi        #, _74
        movq    %rdi, (%rsp)    # _74, %sfp
        call    _raw_spin_lock  #
        TEST
        movq    (%rsp), %rdi    # %sfp, _74
        addq    $8, %rsp        #,
        jmp     _raw_spin_unlock        #
.L14:
        pause
        jmp     .L12    #

which is basically optimal, but I have to say, the hoops I had to jump
through to get there make me suspect it's not worth it.

(Note above how there is no sign of 'phase' left in the code, and the
only conditionals are on the actual sequence count state, and the nice
default action of no sequence count problems is all linear
fall-through code).

The irqsave version looks the same, just (obviously) calling a
different set of spinlock/unlock functions and having one extra
register argument for that.

But to get there, I had to use not only linux/unroll.h, but also make
that for-loop explicitly have that s.phase < 3 test just to get the
unrolling to actually trigger.

I'm attaching the test-case I wrote that does this. It's not
*horrific*, but the extra unrolling hackery makes me doubt that it's
really worth it.

So I suspect your patch is fine as-is, and I was not just wrong on
relying on the (insufficient) "lockless" boolean, but the more complete
solution is just too damn ugly.

But hey, I *did* get gcc to generate nice code. So you might look at
my test-case below and decide that maybe there's something to be said
for this.

                 Linus
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Oleg Nesterov 4 months ago
On 10/09, Linus Torvalds wrote:
>
> I'm batting zero. I still think it's true that this logic should be
> handled by the for-loop, but the "*seq = 1" hackery did actually do
> something.

Yes...

> But I think it means that "lockless" would have to be a "phase", and go 0/1/2.

OK, I like your version more than mine. In particular, it puts seq/flags
into the unsigned long data "union"; somehow I didn't think of that
option. Plus, yes, it removes the "make/check seq is odd" games. Nice.

But do we really want to unroll the loop? This code should be optimized
for the likely case when the lockless pass succeeds.

Let me think a bit more before I send V3...


> #define __scoped_seqlock_read(lock, s, cond) \
> 	unrolled_full for (struct { unsigned long phase, data; } s = { 0 }; \
> 		s.phase < 3 && cond(lock, s.phase, &s.data); s.phase++)

I guess in your version "struct" doesn't buy too much, just saves one
__UNIQUE_ID(). But this is a matter of taste.

Thanks!

Oleg.
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Peter Zijlstra 4 months ago
On Thu, Oct 09, 2025 at 04:37:49PM +0200, Oleg Nesterov wrote:

> Let me think a bit more before I send V3...

How do we feel about something a little like so?


diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 5ce48eab7a2a..9786b8d14164 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -1209,4 +1209,85 @@ done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
 	if (seq & 1)
 		read_sequnlock_excl_irqrestore(lock, flags);
 }
+
+enum ss_state {
+	sss_done = 0,
+	sss_locked,
+	sss_lockless,
+};
+
+enum ss_type {
+	ss_lockless,
+	ss_lock,
+	ss_lock_irqsave,
+};
+
+struct ss_tmp {
+	enum ss_type	type;
+	enum ss_state	state;
+	int		seq;
+	unsigned long	flags;
+	spinlock_t	*lock;
+};
+
+static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst)
+{
+	if (!sst->lock)
+		return;
+
+	if (sst->type == ss_lock_irqsave) {
+		spin_unlock_irqrestore(sst->lock, sst->flags);
+		return;
+	}
+
+	spin_unlock(sst->lock);
+}
+
+static inline void
+__scoped_seqlock_next(struct ss_tmp *sst, seqlock_t *lock)
+{
+	switch (sst->state) {
+	case sss_lockless:
+		if (!read_seqretry(lock, sst->seq)) {
+			sst->state = sss_done;
+			return;
+		}
+
+		switch (sst->type) {
+		case ss_lock:
+			sst->lock = &lock->lock;
+			spin_lock(sst->lock);
+			sst->state = sss_locked;
+			return;
+
+		case ss_lock_irqsave:
+			sst->lock = &lock->lock;
+			spin_lock_irqsave(sst->lock, sst->flags);
+			sst->state = sss_locked;
+			return;
+
+		case ss_lockless:
+			sst->seq = read_seqbegin(lock);
+			return;
+		}
+
+	case sss_locked:
+		sst->state = sss_done;
+		return;
+
+	case sss_done:
+		BUG();
+	}
+}
+
+#define __scoped_seqlock_read(_seqlock, _type, _s)			\
+	for (struct ss_tmp _s __cleanup(__scoped_seqlock_cleanup) = {	\
+		.type = _type, .state = sss_lockless,			\
+		.seq = read_seqbegin(_seqlock), .lock = NULL };		\
+	     _s.state != sss_done;					\
+	     __scoped_seqlock_next(&_s, _seqlock))
+
+#define scoped_seqlock_read(_seqlock, _type)				\
+	__scoped_seqlock_read(_seqlock, _type, __UNIQUE_ID(seqlock))
+
 #endif /* __LINUX_SEQLOCK_H */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 7097de2c8cda..d2b3f987c888 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -313,10 +313,8 @@ static u64 read_sum_exec_runtime(struct task_struct *t)
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 {
 	struct signal_struct *sig = tsk->signal;
-	u64 utime, stime;
 	struct task_struct *t;
-	unsigned int seq, nextseq;
-	unsigned long flags;
+	u64 utime, stime;
 
 	/*
 	 * Update current task runtime to account pending time since last
@@ -329,27 +327,19 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 	if (same_thread_group(current, tsk))
 		(void) task_sched_runtime(current);
 
-	rcu_read_lock();
-	/* Attempt a lockless read on the first round. */
-	nextseq = 0;
-	do {
-		seq = nextseq;
-		flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
+	guard(rcu)();
+	scoped_seqlock_read(&sig->stats_lock, ss_lock_irqsave) {
 		times->utime = sig->utime;
 		times->stime = sig->stime;
 		times->sum_exec_runtime = sig->sum_sched_runtime;
 
-		for_each_thread(tsk, t) {
+		__for_each_thread(sig, t) {
 			task_cputime(t, &utime, &stime);
 			times->utime += utime;
 			times->stime += stime;
 			times->sum_exec_runtime += read_sum_exec_runtime(t);
 		}
-		/* If lockless access failed, take the lock. */
-		nextseq = 1;
-	} while (need_seqretry(&sig->stats_lock, seq));
-	done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
-	rcu_read_unlock();
+	}
 }
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Peter Zijlstra 4 months ago
On Thu, Oct 09, 2025 at 09:50:24PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 09, 2025 at 04:37:49PM +0200, Oleg Nesterov wrote:
> 
> > Let me think a bit more before I send V3...
> 
> How do we feel about something a little like so?

Slightly nicer version that's actually compiled :-)

---
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 5ce48eab7a2a..7273fddc19a2 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -1209,4 +1209,87 @@ done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
 	if (seq & 1)
 		read_sequnlock_excl_irqrestore(lock, flags);
 }
+
+enum ss_state {
+	ss_done = 0,
+	ss_lock,
+	ss_lock_irqsave,
+	ss_lockless,
+};
+
+struct ss_tmp {
+	enum ss_state	state;
+	int		seq;
+	unsigned long	flags;
+	spinlock_t	*lock;
+};
+
+static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst)
+{
+	if (!sst->lock)
+		return;
+
+	if (sst->state == ss_lock_irqsave) {
+		spin_unlock_irqrestore(sst->lock, sst->flags);
+		return;
+	}
+
+	spin_unlock(sst->lock);
+}
+
+extern void __scoped_seqlock_fail(void);
+
+static inline void
+__scoped_seqlock_next(struct ss_tmp *sst, seqlock_t *lock, enum ss_state target)
+{
+	switch (sst->state) {
+	case ss_lockless:
+		if (!read_seqretry(lock, sst->seq)) {
+			sst->state = ss_done;
+			return;
+		}
+
+		switch (target) {
+		case ss_done:
+			__scoped_seqlock_fail();
+			return;
+
+		case ss_lock:
+			sst->lock = &lock->lock;
+			spin_lock(sst->lock);
+			sst->state = ss_lock;
+			return;
+
+		case ss_lock_irqsave:
+			sst->lock = &lock->lock;
+			spin_lock_irqsave(sst->lock, sst->flags);
+		sst->state = ss_lock_irqsave;
+			return;
+
+		case ss_lockless:
+			sst->seq = read_seqbegin(lock);
+			return;
+		}
+		BUG();
+
+	case ss_lock:
+	case ss_lock_irqsave:
+		sst->state = ss_done;
+		return;
+
+	case ss_done:
+		BUG();
+	}
+}
+
+#define __scoped_seqlock_read(_seqlock, _target, _s)			\
+	for (struct ss_tmp _s __cleanup(__scoped_seqlock_cleanup) = {	\
+		.state = ss_lockless, .seq = read_seqbegin(_seqlock),	\
+		.lock = NULL };						\
+	     _s.state != ss_done;					\
+	     __scoped_seqlock_next(&_s, _seqlock, _target))
+
+#define scoped_seqlock_read(_seqlock, _target)				\
+	__scoped_seqlock_read(_seqlock, _target, __UNIQUE_ID(seqlock))
+
 #endif /* __LINUX_SEQLOCK_H */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 7097de2c8cda..d2b3f987c888 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -313,10 +313,8 @@ static u64 read_sum_exec_runtime(struct task_struct *t)
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 {
 	struct signal_struct *sig = tsk->signal;
-	u64 utime, stime;
 	struct task_struct *t;
-	unsigned int seq, nextseq;
-	unsigned long flags;
+	u64 utime, stime;
 
 	/*
 	 * Update current task runtime to account pending time since last
@@ -329,27 +327,19 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 	if (same_thread_group(current, tsk))
 		(void) task_sched_runtime(current);
 
-	rcu_read_lock();
-	/* Attempt a lockless read on the first round. */
-	nextseq = 0;
-	do {
-		seq = nextseq;
-		flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
+	guard(rcu)();
+	scoped_seqlock_read(&sig->stats_lock, ss_lock_irqsave) {
 		times->utime = sig->utime;
 		times->stime = sig->stime;
 		times->sum_exec_runtime = sig->sum_sched_runtime;
 
-		for_each_thread(tsk, t) {
+		__for_each_thread(sig, t) {
 			task_cputime(t, &utime, &stime);
 			times->utime += utime;
 			times->stime += stime;
 			times->sum_exec_runtime += read_sum_exec_runtime(t);
 		}
-		/* If lockless access failed, take the lock. */
-		nextseq = 1;
-	} while (need_seqretry(&sig->stats_lock, seq));
-	done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
-	rcu_read_unlock();
+	}
 }
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Linus Torvalds 4 months ago
On Thu, 9 Oct 2025 at 13:12, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Slightly nicer version that's actually compiled :-)

I assume that "target of ss_lockless" is an intentional extension to
just make the loop never take a lock at all?
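
I.e. the scope then just degenerates to the plain retry loop,

	do {
		seq = read_seqbegin(&seqlock);
		// read-side section, never taking the spinlock
	} while (read_seqretry(&seqlock, seq));

without the manual 'seq' bookkeeping.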

I do like it, but I think you'll find that having a separate 'seq' and
'flags' like this:

> +struct ss_tmp {
> +       enum ss_state   state;
> +       int             seq;
> +       unsigned long   flags;
> +       spinlock_t      *lock;
> +};

makes it unnecessarily waste a register.

You never need both seq and flags at the same time, since if you take
the spinlock the sequence number is pointless.

So please make that a union, and I think it will help avoid wasting a
register in the loop.

Other than that I like it. Except that "BUG()" really bugs me. It will
generate horrendous code for no reason and we *really* shouldn't add
BUG statements anyway.

Either that inline function is fine, or it isn't. Don't make it
generate stupid code for "I'm not fine" that will also be a huge pain
to debug because if that code is buggy it will presumably trigger in
context where the machine will be dead, dead, dead.
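
The __scoped_seqlock_fail() thing you already have is the right pattern
for this: an extern function that is intentionally never defined, so a
call the compiler can't prove to be dead code becomes a link-time error
instead of a runtime trap:

	/* never defined anywhere: any reachable call breaks the link */
	extern void __scoped_seqlock_fail(void);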

             Linus
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Peter Zijlstra 4 months ago
On Thu, Oct 09, 2025 at 01:24:51PM -0700, Linus Torvalds wrote:
> On Thu, 9 Oct 2025 at 13:12, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Slightly nicer version that's actually compiled :-)
> 
> I assume that "target of ss_lockless" is an intentional extension to
> just make the loop never take a lock at all?

Yep. Almost came for free, so might as well do it.

> I do like it, but I think you'll find that having a separate 'seq' and
> 'flags' like this:
> 
> > +struct ss_tmp {
> > +       enum ss_state   state;
> > +       int             seq;
> > +       unsigned long   flags;
> > +       spinlock_t      *lock;
> > +};
> 
> makes it unnecessarily waste a register.
> 
> You never need both seq and flags at the same time, since if you take
> the spinlock the sequence number is pointless.
> 
> So please make that a union, and I think it will help avoid wasting a
> register in the loop.

Sure; otoh compiler should be able to tell the same using liveness
analysis I suppose, but perhaps they're not *that* clever.

> Other than that I like it. Except that "BUG()" really bugs me. It will
> generate horrendous code for no reason and we *really* shouldn't add
> BUG statements anyway.
> 
> Either that inline function is fine, or it isn't. Don't make it
> generate stupid code for "I'm not fine" that will also be a huge pain
> to debug because if that code is buggy it will presumably trigger in
> context where the machine will be dead, dead, dead.

So I thought they were fine; we handle all the enum cases with 'return'
so its impossible to not exit the switch() but the silly compiler was
complaining about possible fall-through, so clearly it was getting
confused.

The ss_done case should never trip either -- perhaps I should make that
one of those __scoped_seqlock_fail like things as well.

Anyway, I'll play around with it some (and look at actual code-gen)
tomorrow.
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Peter Zijlstra 4 months ago
On Fri, Oct 10, 2025 at 12:12:42AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 09, 2025 at 01:24:51PM -0700, Linus Torvalds wrote:
> > On Thu, 9 Oct 2025 at 13:12, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > Slightly nicer version that's actually compiled :-)
> > 
> > I assume that "target of ss_lockless" is an intentional extension to
> > just make the loop never take a lock at all?
> 
> Yep. Almost came for free, so might as well do it.
> 
> > I do like it, but I think you'll find that having a separate 'seq' and
> > 'flags' like this:
> > 
> > > +struct ss_tmp {
> > > +       enum ss_state   state;
> > > +       int             seq;
> > > +       unsigned long   flags;
> > > +       spinlock_t      *lock;
> > > +};
> > 
> > makes it unnecessarily waste a register.
> > 
> > You never need both seq and flags at the same time, since if you take
> > the spinlock the sequence number is pointless.
> > 
> > So please make that a union, and I think it will help avoid wasting a
> > register in the loop.
> 
> Sure; otoh compiler should be able to tell the same using liveness
> analysis I suppose, but perhaps they're not *that* clever.

The moment I use a union gcc-14 creates the whole structure on the
stack, while without the union it mostly manages to keep the things
in registers without too much spilling.

Also, there's a rather silly bug in the code: the __cleanup function
looks at ->state to determine which spin_unlock variant to pick, but
->state will be ss_done. Additionally, when someone does 'break' or
'goto' out of the scope the state isn't reliable anyway.

The sanest code gen happens when I do:

struct ss_tmp {
	enum ss_state	state;
	int		seq;
	unsigned long	flags;
	spinlock_t	*lock;
	spinlock_t	*lock_irqsave;
};

static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst)
{
	if (sst->lock)
		spin_unlock(sst->lock);
	if (sst->lock_irqsave)
		spin_unlock_irqrestore(sst->lock_irqsave, sst->flags);
}

liveness analysis seems to work well, and it is able to determine which
fields are active and only track those in registers and utterly discard
the rest. Specifically in the thread_group_cputime code, there are only
_raw_spin_lock_irqsave and _raw_spin_unlock_irqrestore calls left, all
the other stuff is just optimized out.

I'll look at what clang does ... but that's really tomorrow, or rather,
somewhat later this morning :/
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Linus Torvalds 4 months ago
On Thu, 9 Oct 2025 at 16:20, Peter Zijlstra <peterz@infradead.org> wrote:
>
> The moment I use a union gcc-14 creates the whole structure on the
> stack, while without the union it mostly manages to keep the things
> in registers without too much spilling.

Ouch.

Do the union manually like I did by just using a "unsigned long" for
both 'seq' and 'flags'.

I bet that works fine - it seemed to work well for my - admittedly
somewhat different - version.
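
I.e. something like

	struct ss_tmp {
		enum ss_state	state;
		unsigned long	data;	/* 'seq' for the lockless pass,
					 * irq 'flags' for the locked one */
		spinlock_t	*lock;
	};

since you never need the sequence number once you hold the spinlock.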

              Linus
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Linus Torvalds 4 months ago
On Thu, 9 Oct 2025 at 15:12, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Sure; otoh compiler should be able to tell the same using liveness
> analysis I suppose, but perhaps they're not *that* clever.

They are that clever, but only if they end up unrolling the loop
statically. If the loop remains a loop, the two variables end up live
in the same code.

And while a compiler _could_ in theory still see that they aren't
actually live in the same _iteration_, I don't think any compiler
actually ends up being that clever in practice.

So making it a union then hopefully gets the compiler to basically use
that explicit information.

> So I thought they were fine; we handle all the enum cases with 'return'
> so it's impossible to not exit the switch(), but the silly compiler was
> complaining about possible fall-through, so clearly it was getting
> confused.

Yeah, I found the same thing with the 0/1/2 approach - the compiler
wouldn't realize that the range was limited until I added a very
explicit limit check that "shouldn't matter", but did.

This might obviously end up depending on compiler version and other
random things, but in general the whole value range analysis tends to
be a pretty fragile thing.

In practice, compilers tend to be good at doing value range analysis
if they see particular patterns (like initializing it to some value,
always incrementing it by one, and comparing against another value).

But when it's written more like a state machine like this, it's
clearly very hit and miss.
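
Schematically, the canonical

	for (i = 0; i < 3; i++)
		do_something(i);

gets a precise range for 'i' pretty much everywhere, while the same three
values written as explicit states

	switch (state) {
	case 0: ... state = 1; break;
	case 1: ... state = 2; break;
	}

often lose the "this is only ever 0/1/2" information along the way.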

             Linus
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Peter Zijlstra 4 months ago
On Thu, Oct 09, 2025 at 03:55:15PM -0700, Linus Torvalds wrote:
> On Thu, 9 Oct 2025 at 15:12, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Sure; otoh compiler should be able to tell the same using liveness
> > analysis I suppose, but perhaps they're not *that* clever.
> 
> They are that clever, but only if they end up unrolling the loop
> statically. If the loop remains a loop, the two variables end up live
> in the same code.
> 
> And while a compiler _could_ in theory still see that they aren't
> actually live in the same _iteration_, I don't think any compiler
> actually ends up being that clever in practice.
> 
> So making it a union then hopefully gets the compiler to basically use
> that explicit information.

Right, so I had to use -Os to not make it unroll the thing, but then
indeed, sharing the variable helps it.

> > So I thought they were fine; we handle all the enum cases with 'return'
> > so it's impossible to not exit the switch(), but the silly compiler was
> > complaining about possible fall-through, so clearly it was getting
> > confused.
> 
> Yeah, I found the same thing with the 0/1/2 approach - the compiler
> wouldn't realize that the range was limited until I added a very
> explicit limit check that "shouldn't matter", but did.
> 
> This might obviously end up depending on compiler version and other
> random things, but in general the whole value range analysis tends to
> be a pretty fragile thing.
> 
> In practice, compilers tend to be good at doing value range analysis
> if they see particular patterns (like initializing it to some value,
> always incrementing it by one, and comparing against another value).
> 
> But when it's written more like a state machine like this, it's
> clearly very hit and miss.

I reordered the code, it is happier now.

Anyway, the below seems to generate decent code for
{-O2,-Os}x{gcc-14,clang-22}. Yay for optimizing compilers I suppose :-)

---
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 5ce48eab7a2a..45fab026f7d6 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -1209,4 +1209,83 @@ done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
 	if (seq & 1)
 		read_sequnlock_excl_irqrestore(lock, flags);
 }
+
+enum ss_state {
+	ss_done = 0,
+	ss_lock,
+	ss_lock_irqsave,
+	ss_lockless,
+};
+
+struct ss_tmp {
+	enum ss_state	state;
+	unsigned long	data;
+	spinlock_t	*lock;
+	spinlock_t	*lock_irqsave;
+};
+
+static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst)
+{
+	if (sst->lock)
+		spin_unlock(sst->lock);
+	if (sst->lock_irqsave)
+		spin_unlock_irqrestore(sst->lock_irqsave, sst->data);
+}
+
+extern void __scoped_seqlock_invalid_target(void);
+extern void __scoped_seqlock_bug(void);
+
+static inline void
+__scoped_seqlock_next(struct ss_tmp *sst, seqlock_t *lock, enum ss_state target)
+{
+	switch (sst->state) {
+	case ss_done:
+		__scoped_seqlock_bug();
+		return;
+
+	case ss_lock:
+	case ss_lock_irqsave:
+		sst->state = ss_done;
+		return;
+
+	case ss_lockless:
+		if (!read_seqretry(lock, sst->data)) {
+			sst->state = ss_done;
+			return;
+		}
+		break;
+	}
+
+	switch (target) {
+	case ss_done:
+		__scoped_seqlock_invalid_target();
+		return;
+
+	case ss_lock:
+		sst->lock = &lock->lock;
+		spin_lock(sst->lock);
+		sst->state = ss_lock;
+		return;
+
+	case ss_lock_irqsave:
+		sst->lock_irqsave = &lock->lock;
+		spin_lock_irqsave(sst->lock_irqsave, sst->data);
+		sst->state = ss_lock_irqsave;
+		return;
+
+	case ss_lockless:
+		sst->data = read_seqbegin(lock);
+		return;
+	}
+}
+
+#define __scoped_seqlock_read(_seqlock, _target, _s)			\
+	for (struct ss_tmp _s __cleanup(__scoped_seqlock_cleanup) =	\
+	     { .state = ss_lockless, .data = read_seqbegin(_seqlock) };	\
+	     _s.state != ss_done;					\
+	     __scoped_seqlock_next(&_s, _seqlock, _target))
+
+#define scoped_seqlock_read(_seqlock, _target)				\
+	__scoped_seqlock_read(_seqlock, _target, __UNIQUE_ID(seqlock))
+
 #endif /* __LINUX_SEQLOCK_H */
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 7097de2c8cda..d2b3f987c888 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -313,10 +313,8 @@ static u64 read_sum_exec_runtime(struct task_struct *t)
 void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 {
 	struct signal_struct *sig = tsk->signal;
-	u64 utime, stime;
 	struct task_struct *t;
-	unsigned int seq, nextseq;
-	unsigned long flags;
+	u64 utime, stime;
 
 	/*
 	 * Update current task runtime to account pending time since last
@@ -329,27 +327,19 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 	if (same_thread_group(current, tsk))
 		(void) task_sched_runtime(current);
 
-	rcu_read_lock();
-	/* Attempt a lockless read on the first round. */
-	nextseq = 0;
-	do {
-		seq = nextseq;
-		flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
+	guard(rcu)();
+	scoped_seqlock_read(&sig->stats_lock, ss_lock_irqsave) {
 		times->utime = sig->utime;
 		times->stime = sig->stime;
 		times->sum_exec_runtime = sig->sum_sched_runtime;
 
-		for_each_thread(tsk, t) {
+		__for_each_thread(sig, t) {
 			task_cputime(t, &utime, &stime);
 			times->utime += utime;
 			times->stime += stime;
 			times->sum_exec_runtime += read_sum_exec_runtime(t);
 		}
-		/* If lockless access failed, take the lock. */
-		nextseq = 1;
-	} while (need_seqretry(&sig->stats_lock, seq));
-	done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
-	rcu_read_unlock();
+	}
 }
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Oleg Nesterov 4 months ago
On 10/10, Peter Zijlstra wrote:
>
> I reordered the code, it is happier now.
>
> Anyway, the below seems to generate decent code for
> {-O2,-Os}x{gcc-14,clang-22}. Yay for optimizing compilers I suppose :-)

Another approach which looks better than mine ;)

Linus's version is simpler, but yours can handle break/return and
the "only lockless" case, good.

I leave this patch to you and Linus, he seems to like your code too.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>


But... perhaps we should not "export" the _target names and instead
add the additional defines, something like

	scoped_seqlock_read()
	scoped_seqlock_read_or_lock()
	scoped_seqlock_read_or_lock_irqsave()

?
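
IOW, something like (completely untested)

	#define scoped_seqlock_read(_seqlock)				\
		__scoped_seqlock_read(_seqlock, ss_lockless, __UNIQUE_ID(seqlock))

	#define scoped_seqlock_read_or_lock(_seqlock)			\
		__scoped_seqlock_read(_seqlock, ss_lock, __UNIQUE_ID(seqlock))

	#define scoped_seqlock_read_or_lock_irqsave(_seqlock)		\
		__scoped_seqlock_read(_seqlock, ss_lock_irqsave, __UNIQUE_ID(seqlock))

on top of your __scoped_seqlock_read().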

Oleg.
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Linus Torvalds 4 months ago
On Fri, 10 Oct 2025 at 05:33, Oleg Nesterov <oleg@redhat.com> wrote:
>
> I leave this patch to you and Linus, he seems to like your code too.

Yeah, I think PeterZ's is the best of the bunch, I like how it looks
in that latest iteration.

           Linus
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Oleg Nesterov 4 months ago
On 10/10, Oleg Nesterov wrote:
>
> On 10/10, Peter Zijlstra wrote:
> >
> > I reordered the code, it is happier now.
> >
> > Anyway, the below seems to generate decent code for
> > {-O2,-Os}x{gcc-14,clang-22}. Yay for optimizing compilers I suppose :-)
>
> Another approach which looks better than mine ;)
>
> Linus's version is simpler, but yours can handle break/return and
> the "only lockless" case, good.
>
> I leave this patch to you and Linus, he seems to like your code too.
>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
>
>
> But... perhaps we should not "export" the _target names and instead
> add the additional defines, something like
>
> 	scoped_seqlock_read()
> 	scoped_seqlock_read_or_lock()
> 	scoped_seqlock_read_or_lock_irqsave()
>
> ?

And... perhaps we can simplify this code a little bit? I mean

	enum ss_state {
		ss_lockless	= 0,
		ss_lock		= 1,
		ss_lock_irqsave	= 2,
		ss_done		= 4,
	};

	struct ss_tmp {
		enum ss_state	state;
		unsigned long	data;
		seqlock_t	*lock;
	};

	static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst)
	{
		if (sst->state & ss_lock)
			spin_unlock(&sst->lock->lock);
		if (sst->state & ss_lock_irqsave)
			spin_unlock_irqrestore(&sst->lock->lock, sst->data);
	}

	static inline void
	__scoped_seqlock_next(struct ss_tmp *sst, enum ss_state target)
	{
		switch (sst->state) {
		case ss_lock:
		case ss_lock_irqsave:
			sst->state |= ss_done;
			return;

		case ss_lockless:
			if (!read_seqretry(sst->lock, sst->data)) {
				sst->state = ss_done;
				return;
			}
			break;
		}

		switch (target) {
		case ss_lock:
			spin_lock(&sst->lock->lock);
			sst->state = ss_lock;
			return;

		case ss_lock_irqsave:
			spin_lock_irqsave(&sst->lock->lock, sst->data);
			sst->state = ss_lock_irqsave;
			return;

		case ss_lockless:
			sst->data = read_seqbegin(sst->lock);
			return;
		}
	}

	#define __scoped_seqlock_read(_seqlock, _target, _s)			\
		for (struct ss_tmp _s __cleanup(__scoped_seqlock_cleanup) =				\
		     { .state = ss_lockless, .data = read_seqbegin(_seqlock), .lock = _seqlock };	\
		     !(_s.state & ss_done);								\
		     __scoped_seqlock_next(&_s, _target))


(I removed __scoped_seqlock_invalid_target/__scoped_seqlock_bug to reduce the amount of code).

Not sure this makes sense. Plus I didn't even try to compile this code and I have
no idea how this change can affect the code generation. But let me ask anyway...

Oleg.
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Peter Zijlstra 4 months ago
On Fri, Oct 10, 2025 at 03:14:39PM +0200, Oleg Nesterov wrote:
> On 10/10, Oleg Nesterov wrote:
> >
> > On 10/10, Peter Zijlstra wrote:
> > >
> > > I reordered the code, it is happier now.
> > >
> > > Anyway, the below seems to generate decent code for
> > > {-O2,-Os}x{gcc-14,clang-22}. Yay for optimizing compilers I suppose :-)
> >
> > Another approach which looks better than mine ;)
> >
> > Linus's version is simpler, but yours can handle break/return and
> > the "only lockless" case, good.
> >
> > I leave this patch to you and Linus, he seems to like your code too.
> >
> > Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> >
> >
> > But... perhaps we should not "export" the _target names and instead
> > add the additional defines, something like
> >
> > 	scoped_seqlock_read()
> > 	scoped_seqlock_read_or_lock()
> > 	scoped_seqlock_read_or_lock_irqsave()
> >
> > ?
> 
> And... perhaps we can simplify this code a little bit? I mean
> 
> 	enum ss_state {
> 		ss_lockless	= 0,
> 		ss_lock		= 1,
> 		ss_lock_irqsave	= 2,
> 		ss_done		= 4,
> 	};
> 
> 	struct ss_tmp {
> 		enum ss_state	state;
> 		unsigned long	data;
> 		seqlock_t	*lock;
> 	};
> 
> 	static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst)
> 	{
> 		if (sst->state & ss_lock)
> 			spin_unlock(&sst->lock->lock);
> 		if (sst->state & ss_lock_irqsave)
> 			spin_unlock_irqrestore(&sst->lock->lock, sst->data);
> 	}
> 
> 	static inline void
> 	__scoped_seqlock_next(struct ss_tmp *sst, enum ss_state target)
> 	{
> 		switch (sst->state) {
> 		case ss_lock:
> 		case ss_lock_irqsave:
> 			sst->state |= ss_done;
> 			return;
> 
> 		case ss_lockless:
> 			if (!read_seqretry(sst->lock, sst->data)) {
> 				sst->state = ss_done;
> 				return;
> 			}
> 			break;
> 		}
> 
> 		switch (target) {
> 		case ss_lock:
> 			spin_lock(&sst->lock->lock);
> 			sst->state = ss_lock;
> 			return;
> 
> 		case ss_lock_irqsave:
> 			spin_lock_irqsave(&sst->lock->lock, sst->data);
> 			sst->state = ss_lock_irqsave;
> 			return;
> 
> 		case ss_lockless:
> 			sst->data = read_seqbegin(sst->lock);
> 			return;
> 		}
> 	}
> 
> 	#define __scoped_seqlock_read(_seqlock, _target, _s)			\
> 		for (struct ss_tmp _s __cleanup(__scoped_seqlock_cleanup) =				\
> 		     { .state = ss_lockless, .data = read_seqbegin(_seqlock), .lock = _seqlock };	\
> 		     !(_s.state & ss_done);								\
> 		     __scoped_seqlock_next(&_s, _target))
> 
> 
> (I removed __scoped_seqlock_invalid_target/__scoped_seqlock_bug to reduce the amount of code).
> 
> Not sure this makes sense. Plus I didn't even try to compile this code and I have
> no idea how this change can affect the code generation. But let me ask anyway...

So GCC is clever enough to see through this scheme, but Clang gets
confused and generates worse code. Specifically it emits the whole
__scoped_seqlock_cleanup() sequence, testing both bits and both unlock
options.

Where previously it would only have to discover which field was written
to and could delete all code for the unwritten field, it now has to
track the state and discover ss_lock|ss_done is not possible while
ss_lock_irqsave|ss_done is.

So while that additional pointer might seem wasteful, it actually makes
the state tracking easier and allows the compiler to more easily throw
away stuff.
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Oleg Nesterov 4 months ago
On 10/13, Peter Zijlstra wrote:
>
> On Fri, Oct 10, 2025 at 03:14:39PM +0200, Oleg Nesterov wrote:
> > 	static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst)
> > 	{
> > 		if (sst->state & ss_lock)
> > 			spin_unlock(&sst->lock->lock);
> > 		if (sst->state & ss_lock_irqsave)
> > 			spin_unlock_irqrestore(&sst->lock->lock, sst->data);
> > 	}
> > 
> > 	static inline void
> > 	__scoped_seqlock_next(struct ss_tmp *sst, enum ss_state target)
> > 	{
> > 		switch (sst->state) {
> > 		case ss_lock:
> > 		case ss_lock_irqsave:
> > 			sst->state |= ss_done;
> 
> So GCC is clever enough to see through this scheme, but Clang gets
> confused and generates worse code. Specifically it emits the whole
> __scoped_seqlock_cleanup() sequence, testing both bits and both unlock
> options.

OK, thanks and sorry for the noise then.

> So while that additional pointer might seem wasteful, it actually makes
> the state tracking easier and allows the compiler to more easily throw
> away stuff.

Great ;)

Oleg.
[tip: locking/core] seqlock: Introduce scoped_seqlock_read()
Posted by tip-bot2 for Peter Zijlstra 3 months, 3 weeks ago
The following commit has been merged into the locking/core branch of tip:

Commit-ID:     cc39f3872c0865bef992b713338df369554fa9e0
Gitweb:        https://git.kernel.org/tip/cc39f3872c0865bef992b713338df369554fa9e0
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Thu, 09 Oct 2025 22:11:54 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 21 Oct 2025 12:31:57 +02:00

seqlock: Introduce scoped_seqlock_read()

The read_seqbegin/need_seqretry/done_seqretry API is cumbersome and
error prone. With the new helper the "typical" code like

	int seq, nextseq;
	unsigned long flags;

	nextseq = 0;
	do {
		seq = nextseq;
		flags = read_seqbegin_or_lock_irqsave(&seqlock, &seq);

		// read-side critical section

		nextseq = 1;
	} while (need_seqretry(&seqlock, seq));
	done_seqretry_irqrestore(&seqlock, seq, flags);

can be rewritten as

	scoped_seqlock_read (&seqlock, ss_lock_irqsave) {
		// read-side critical section
	}

Original idea by Oleg Nesterov; with contributions from Linus.

Originally-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/seqlock.h | 111 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 111 insertions(+)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 5ce48ea..b7bcc41 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -1209,4 +1209,115 @@ done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags)
 	if (seq & 1)
 		read_sequnlock_excl_irqrestore(lock, flags);
 }
+
+enum ss_state {
+	ss_done = 0,
+	ss_lock,
+	ss_lock_irqsave,
+	ss_lockless,
+};
+
+struct ss_tmp {
+	enum ss_state	state;
+	unsigned long	data;
+	spinlock_t	*lock;
+	spinlock_t	*lock_irqsave;
+};
+
+static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst)
+{
+	if (sst->lock)
+		spin_unlock(sst->lock);
+	if (sst->lock_irqsave)
+		spin_unlock_irqrestore(sst->lock_irqsave, sst->data);
+}
+
+extern void __scoped_seqlock_invalid_target(void);
+
+#if defined(CONFIG_CC_IS_GCC) && CONFIG_GCC_VERSION < 90000
+/*
+ * For some reason some GCC-8 architectures (nios2, alpha) have trouble
+ * determining that the ss_done state is impossible in __scoped_seqlock_next()
+ * below.
+ */
+static inline void __scoped_seqlock_bug(void) { }
+#else
+/*
+ * Canary for compiler optimization -- if the compiler doesn't realize this is
+ * an impossible state, it very likely generates sub-optimal code here.
+ */
+extern void __scoped_seqlock_bug(void);
+#endif
+
+static inline void
+__scoped_seqlock_next(struct ss_tmp *sst, seqlock_t *lock, enum ss_state target)
+{
+	switch (sst->state) {
+	case ss_done:
+		__scoped_seqlock_bug();
+		return;
+
+	case ss_lock:
+	case ss_lock_irqsave:
+		sst->state = ss_done;
+		return;
+
+	case ss_lockless:
+		if (!read_seqretry(lock, sst->data)) {
+			sst->state = ss_done;
+			return;
+		}
+		break;
+	}
+
+	switch (target) {
+	case ss_done:
+		__scoped_seqlock_invalid_target();
+		return;
+
+	case ss_lock:
+		sst->lock = &lock->lock;
+		spin_lock(sst->lock);
+		sst->state = ss_lock;
+		return;
+
+	case ss_lock_irqsave:
+		sst->lock_irqsave = &lock->lock;
+		spin_lock_irqsave(sst->lock_irqsave, sst->data);
+		sst->state = ss_lock_irqsave;
+		return;
+
+	case ss_lockless:
+		sst->data = read_seqbegin(lock);
+		return;
+	}
+}
+
+#define __scoped_seqlock_read(_seqlock, _target, _s)			\
+	for (struct ss_tmp _s __cleanup(__scoped_seqlock_cleanup) =	\
+	     { .state = ss_lockless, .data = read_seqbegin(_seqlock) };	\
+	     _s.state != ss_done;					\
+	     __scoped_seqlock_next(&_s, _seqlock, _target))
+
+/**
+ * scoped_seqlock_read (lock, ss_state) - execute the read side critical
+ *                                        section without manual sequence
+ *                                        counter handling or calls to other
+ *                                        helpers
+ * @lock: pointer to seqlock_t protecting the data
+ * @ss_state: one of {ss_lock, ss_lock_irqsave, ss_lockless} indicating
+ *            the type of critical read section
+ *
+ * Example:
+ *
+ *     scoped_seqlock_read (&lock, ss_lock) {
+ *         // read-side critical section
+ *     }
+ *
+ * Starts with a lockless pass first. If it fails, restarts the critical
+ * section with the lock held.
+ */
+#define scoped_seqlock_read(_seqlock, _target)				\
+	__scoped_seqlock_read(_seqlock, _target, __UNIQUE_ID(seqlock))
+
 #endif /* __LINUX_SEQLOCK_H */
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Linus Torvalds 4 months ago
On Thu, 9 Oct 2025 at 07:39, Oleg Nesterov <oleg@redhat.com> wrote:
>
> But do we really want to unroll the loop? This code should be optimized
> for the likely case when the lockless pass succeeds.

Dropping the unroll magic certainly makes the patch more palatable,
and maybe it doesn't matter that much.

It really was mainly me going "this should be easy for the compiler to
see the flow statically", and the need to keep two state variables
annoyed me.

But I guess many of the users are complicated enough that you actually
don't want to have two copies of the inner body. If they weren't
complicated, they'd just use the simpler

   do { } while (read_seqcount_retry(...));

model.
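
i.e., spelled out for some seqcount_t 'seq_count',

	do {
		seq = read_seqcount_begin(&seq_count);
		// simple enough to just redo the whole thing on retry
	} while (read_seqcount_retry(&seq_count, seq));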

So you are right, I was probably chasing that optimal code generation
for no real reason.

            Linus
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Oleg Nesterov 4 months ago
On 10/08, Linus Torvalds wrote:
>
> Bah.
>
> Now I'm reading this again, and going through it more carefully, and
> now I hate your helper.

OK ;)

let me read/check your email and reply tomorrow, after sleep.

Oleg.
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Peter Zijlstra 4 months ago
On Wed, Oct 08, 2025 at 02:30:45PM +0200, Oleg Nesterov wrote:
> The read_seqbegin/need_seqretry/done_seqretry API is cumbersome and
> error prone. With the new helper the "typical" code like
> 
> 	int seq, nextseq;
> 	unsigned long flags;
> 
> 	nextseq = 0;
> 	do {
> 		seq = nextseq;
> 		flags = read_seqbegin_or_lock_irqsave(&seqlock, &seq);
> 
> 		// read-side critical section
> 
> 		nextseq = 1;
> 	} while (need_seqretry(&seqlock, seq));
> 	done_seqretry_irqrestore(&seqlock, seq, flags);
> 
> can be rewritten as
> 
> 	scoped_seqlock_read_irqsave (&seqlock) {
> 		// read-side critical section
> 	}
> 

Hmm, on first reading I was expecting that to be:

	do {
		seq = read_seqbegin(&seqlock);

		// read-side section

	} while (read_seqretry(&seqlock, seq));

for lack of that _or_lock() wording, but I suppose we can make that
something like:

 	scoped_seqbegin_read (&seqlock) {
		// read-side section
	}

which is distinctive enough.
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Oleg Nesterov 4 months ago
On 10/08, Peter Zijlstra wrote:
>
> On Wed, Oct 08, 2025 at 02:30:45PM +0200, Oleg Nesterov wrote:
> >
> > 	scoped_seqlock_read_irqsave (&seqlock) {
> > 		// read-side critical section
> > 	}
> >
>
> Hmm, on first reading I was expecting that to be:
>
> 	do {
> 		seq = read_seqbegin(&seqlock);
>
> 		// read-side section
>
> 	} while (read_seqretry(&seqlock, seq));
>
> for lack of that _or_lock() wording, but I suppose we can make that
> something like:
>
>  	scoped_seqbegin_read (&seqlock) {

I was thinking about scoped_seqcount_read() but I agree with any naming

Oleg.
Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
Posted by Peter Zijlstra 4 months ago
On Wed, Oct 08, 2025 at 02:59:03PM +0200, Oleg Nesterov wrote:
> On 10/08, Peter Zijlstra wrote:
> >
> > On Wed, Oct 08, 2025 at 02:30:45PM +0200, Oleg Nesterov wrote:
> > >
> > > 	scoped_seqlock_read_irqsave (&seqlock) {
> > > 		// read-side critical section
> > > 	}
> > >
> >
> > Hmm, on first reading I was expecting that to be:
> >
> > 	do {
> > 		seq = read_seqbegin(&seqlock);
> >
> > 		// read-side section
> >
> > 	} while (read_seqretry(&seqlock, seq));
> >
> > for lack of that _or_lock() wording, but I suppose we can make that
> > something like:
> >
> >  	scoped_seqbegin_read (&seqlock) {
> 
> I was thinking about scoped_seqcount_read() but I agree with any naming

Ah, but that would take a seqcount_*t, not a seqlock_t, no?

Or we can do:

	scoped_seqcount_read (&seqlock.seqcount)

But yeah, for later I suppose.