Hello,

I already tried to discuss this about 2 years ago, but nobody replied.
Let me make another attempt.

1/1 is the trivial doc fix, but what do you think about "RFC 2/1"?
Does it make any sense?

Oleg.
Another problem is that this API is error prone. Two years ago half of the
read_seqbegin_or_lock() users were buggy (they followed the wrong example
from Documentation/locking/seqlock.rst). And even if the code is mostly
correct, it is very easy to add a hard-to-detect mistake, see for example

        [PATCH][v3] afs: Remove erroneous seq |= 1 in volume lookup loop
        https://lore.kernel.org/all/20250910084235.2630-1-lirongqing@baidu.com/

Can we improve this API?

-------------------------------------------------------------------------------
To simplify, suppose we add the new helper

        static inline int need_seqretry_xxx(seqlock_t *lock, int *seq)
        {
                int ret = !(*seq & 1) && read_seqretry(lock, *seq);

                if (ret)
                        ++*seq; /* make this counter odd */

                return ret;
        }

which can be used instead of need_seqretry(). This way the users do not
need to modify "seq" in the main loop. For example, we can simplify
thread_group_cputime() as follows:

--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -314,7 +314,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 	struct signal_struct *sig = tsk->signal;
 	u64 utime, stime;
 	struct task_struct *t;
-	unsigned int seq, nextseq;
+	unsigned int seq;
 	unsigned long flags;
 
 	/*
@@ -330,9 +330,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 
 	rcu_read_lock();
 	/* Attempt a lockless read on the first round. */
-	nextseq = 0;
+	seq = 0;
 	do {
-		seq = nextseq;
 		flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
 		times->utime = sig->utime;
 		times->stime = sig->stime;
@@ -344,9 +343,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 		times->stime += stime;
 		times->sum_exec_runtime += read_sum_exec_runtime(t);
 	}
-	/* If lockless access failed, take the lock. */
-	nextseq = 1;
-	} while (need_seqretry(&sig->stats_lock, seq));
+	} while (need_seqretry_xxx(&sig->stats_lock, &seq));
 	done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
 	rcu_read_unlock();
 }

most (if not all) of other users can be changed the same way.

-------------------------------------------------------------------------------
Or perhaps we can even add a helper that hides all the details, something like

        int xxx(seqlock_t *lock, int *seq, int lockless)
        {
                if (lockless) {
                        *seq = read_seqbegin(lock);
                        return 1;
                }

                if (*seq & 1) {
                        read_sequnlock_excl(lock);
                        return 0;
                }

                if (read_seqretry(lock, *seq)) {
                        read_seqlock_excl(lock);
                        *seq = 1;
                        return 1;
                }

                return 0;
        }

        #define __XXX(lock, seq, lockless) \
                for (int lockless = 1, seq; xxx(lock, &seq, lockless); lockless = 0)

        #define XXX(lock) \
                __XXX(lock, __UNIQUE_ID(seq), __UNIQUE_ID(lockless))

?

This way the users can simply do:

        seqlock_t sl;

        void func(void)
        {
                XXX(&sl) {
                        ... read-side critical section ...
                }
        }

using only the new XXX() helper. No need to declare/initialize seq, no need
for need_seqretry/done_seqretry.

What do you think?

Oleg.
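[To trace what the proposed helper does, here is a rough expansion of the
XXX(&sl) example above; this is only an illustration of the code in the
message, modulo the __UNIQUE_ID() renaming, not additional API:]

        /* XXX(&sl) { ... } expands to roughly: */
        for (int lockless = 1, seq; xxx(&sl, &seq, lockless); lockless = 0) {
                ... read-side critical section ...
        }

        /*
         * Pass 1: lockless == 1, so xxx() does *seq = read_seqbegin(&sl)
         *         (an even value) and returns 1; the body runs locklessly.
         * Pass 2: lockless == 0 and seq is even. If read_seqretry() sees
         *         no race, xxx() returns 0 and the loop ends. Otherwise it
         *         takes read_seqlock_excl(&sl), sets *seq = 1 (odd means
         *         "locked") and returns 1; the body runs under the lock.
         * Pass 3: seq is odd, so xxx() drops the lock with
         *         read_sequnlock_excl(&sl) and returns 0.
         */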
On Sun, Sep 28, 2025 at 06:20:54PM +0200, Oleg Nesterov wrote:

> Can we improve this API?

Please :-)

> -------------------------------------------------------------------------------
> To simplify, suppose we add the new helper
>
>         static inline int need_seqretry_xxx(seqlock_t *lock, int *seq)
>         {
>                 int ret = !(*seq & 1) && read_seqretry(lock, *seq);
>
>                 if (ret)
>                         ++*seq; /* make this counter odd */
>
>                 return ret;
>         }

How about need_seqretry_or_lock() to stay in theme with
read_seqbegin_or_lock().

But then there's done_seqretry() without the _or_lock() :/

> which can be used instead of need_seqretry(). This way the users do not
> need to modify "seq" in the main loop. For example, we can simplify
> thread_group_cputime() as follows:
>
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -314,7 +314,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  	struct signal_struct *sig = tsk->signal;
>  	u64 utime, stime;
>  	struct task_struct *t;
> -	unsigned int seq, nextseq;
> +	unsigned int seq;
>  	unsigned long flags;
>
>  	/*
> @@ -330,9 +330,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>
>  	rcu_read_lock();
>  	/* Attempt a lockless read on the first round. */
> -	nextseq = 0;
> +	seq = 0;
>  	do {
> -		seq = nextseq;
>  		flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
>  		times->utime = sig->utime;
>  		times->stime = sig->stime;
> @@ -344,9 +343,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  		times->stime += stime;
>  		times->sum_exec_runtime += read_sum_exec_runtime(t);
>  	}
> -	/* If lockless access failed, take the lock. */
> -	nextseq = 1;
> -	} while (need_seqretry(&sig->stats_lock, seq));
> +	} while (need_seqretry_xxx(&sig->stats_lock, &seq));
>  	done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
>  	rcu_read_unlock();
>  }
>
> most (if not all) of other users can be changed the same way.
>
> -------------------------------------------------------------------------------
> Or perhaps we can even add a helper that hides all the details, something like
>
>         int xxx(seqlock_t *lock, int *seq, int lockless)
>         {
>                 if (lockless) {
>                         *seq = read_seqbegin(lock);
>                         return 1;
>                 }
>
>                 if (*seq & 1) {
>                         read_sequnlock_excl(lock);
>                         return 0;
>                 }
>
>                 if (read_seqretry(lock, *seq)) {
>                         read_seqlock_excl(lock);
>                         *seq = 1;
>                         return 1;
>                 }
>
>                 return 0;
>         }
>
>         #define __XXX(lock, seq, lockless) \
>                 for (int lockless = 1, seq; xxx(lock, &seq, lockless); lockless = 0)
>
>         #define XXX(lock) \
>                 __XXX(lock, __UNIQUE_ID(seq), __UNIQUE_ID(lockless))
>
> ?

Oh gawd, that thing had better not have control flow escape that loop.

But yes, I suppose something like this is far more useable than the
current thing.

> This way the users can simply do:
>
>         seqlock_t sl;
>
>         void func(void)
>         {
>                 XXX(&sl) {
>                         ... read-side critical section ...
>                 }
>         }
>
> using only the new XXX() helper. No need to declare/initialize seq, no need
> for need_seqretry/done_seqretry.
>
> What do you think?

I'm thinking we want something like this for the normal seqcount loops too :-)
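[To make the control-flow concern concrete, a hypothetical misuse; the
try_something() condition and the error path are made up for illustration:]

        XXX(&sl) {
                if (!try_something())   /* hypothetical condition */
                        return -EAGAIN; /* BAD: on the locked pass seq is
                                         * odd, and only the next call to
                                         * xxx() would drop the lock via
                                         * read_sequnlock_excl(&sl), so this
                                         * can return with sl read-locked */
        }

[A bare "break" has the same problem; "continue" merely re-evaluates the
loop condition and is safe, as noted later in the thread.]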
On 10/01, Peter Zijlstra wrote:
> On Sun, Sep 28, 2025 at 06:20:54PM +0200, Oleg Nesterov wrote:
>
> > To simplify, suppose we add the new helper
> >
> >         static inline int need_seqretry_xxx(seqlock_t *lock, int *seq)
> >         {
> >                 int ret = !(*seq & 1) && read_seqretry(lock, *seq);
> >
> >                 if (ret)
> >                         ++*seq; /* make this counter odd */
                            ^^^^^^

Hmm. just *seq = 1; makes more sense

> How about need_seqretry_or_lock() to stay in theme with
> read_seqbegin_or_lock().

I am fine with any name ;) This one looks good to me.

> >         #define __XXX(lock, seq, lockless) \
> >                 for (int lockless = 1, seq; xxx(lock, &seq, lockless); lockless = 0)
> >
> >         #define XXX(lock) \
> >                 __XXX(lock, __UNIQUE_ID(seq), __UNIQUE_ID(lockless))
> >
> > ?
>
> Oh gawd, that thing had better not have control flow escape that loop.

Yes, yes. "continue" is fine, but break/return won't work.

> But yes, I suppose something like this is far more useable than the
> current thing.

OK, great. So, modulo naming, how about the patch below? The new stuff
should obviously go to include/linux/seqlock.h, xxx() can probably be
uninlined. thread_group_cputime() is changed as an example.

Oleg.

--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -306,6 +306,35 @@ static u64 read_sum_exec_runtime(struct task_struct *t)
 }
 #endif /* !CONFIG_64BIT */
 
+static inline int xxx(seqlock_t *lock, int lockless, int *seq, unsigned long *flags)
+{
+	if (lockless) {
+		*seq = read_seqbegin(lock);
+		return 1;
+	} else if (*seq & 1) {
+		if (flags)
+			read_sequnlock_excl_irqrestore(lock, *flags);
+		else
+			read_sequnlock_excl(lock);
+		return 0;
+	} else if (read_seqretry(lock, *seq)) {
+		if (flags)
+			read_seqlock_excl_irqsave(lock, *flags);
+		else
+			read_seqlock_excl(lock);
+		*seq = 1;
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+#define __XXX(lock, lockless, seq, flags) \
+	for (int lockless = 1, seq; xxx(lock, lockless, &seq, flags); lockless = 0)
+
+#define XXX(lock, flags) \
+	__XXX(lock, __UNIQUE_ID(lockless), __UNIQUE_ID(seq), flags)
+
 /*
  * Accumulate raw cputime values of dead tasks (sig->[us]time) and live
  * tasks (sum on group iteration) belonging to @tsk's group.
@@ -315,7 +344,6 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 	struct signal_struct *sig = tsk->signal;
 	u64 utime, stime;
 	struct task_struct *t;
-	unsigned int seq, nextseq;
 	unsigned long flags;
 
 	/*
@@ -330,11 +358,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 	(void) task_sched_runtime(current);
 
 	rcu_read_lock();
-	/* Attempt a lockless read on the first round. */
-	nextseq = 0;
-	do {
-		seq = nextseq;
-		flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
+	XXX(&sig->stats_lock, &flags) {
 		times->utime = sig->utime;
 		times->stime = sig->stime;
 		times->sum_exec_runtime = sig->sum_sched_runtime;
@@ -345,10 +369,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 		times->stime += stime;
 		times->sum_exec_runtime += read_sum_exec_runtime(t);
 	}
-	/* If lockless access failed, take the lock. */
-	nextseq = 1;
-	} while (need_seqretry(&sig->stats_lock, seq));
-	done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
+	}
 	rcu_read_unlock();
 }
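[A sketch of the intended call sites for this flags-aware version,
mirroring the thread_group_cputime() conversion above; "sl" is the example
seqlock_t from earlier in the thread, illustration only:]

        unsigned long flags;

        /* irq-safe reader: the locked pass uses the _irqsave/_irqrestore
         * variants of read_seqlock_excl()/read_sequnlock_excl() */
        XXX(&sl, &flags) {
                ... read-side critical section ...
        }

        /* plain reader: pass NULL, the locked pass uses the non-irq
         * read_seqlock_excl()/read_sequnlock_excl() */
        XXX(&sl, NULL) {
                ... read-side critical section ...
        }

[Since xxx() is inline and "flags" is either NULL or not at each call site,
the if (flags) tests should constant-fold away.]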
On 10/01, Oleg Nesterov wrote:
>
> +static inline int xxx(seqlock_t *lock, int lockless, int *seq, unsigned long *flags)
> +{
> +	if (lockless) {
> +		*seq = read_seqbegin(lock);
> +		return 1;
> +	} else if (*seq & 1) {
> +		if (flags)
> +			read_sequnlock_excl_irqrestore(lock, *flags);
> +		else
> +			read_sequnlock_excl(lock);
> +		return 0;
> +	} else if (read_seqretry(lock, *seq)) {
> +		if (flags)
> +			read_seqlock_excl_irqsave(lock, *flags);
> +		else
> +			read_seqlock_excl(lock);
> +		*seq = 1;
> +		return 1;
> +	} else {
> +		return 0;
> +	}
> +}
> +
> +#define __XXX(lock, lockless, seq, flags) \
> +	for (int lockless = 1, seq; xxx(lock, lockless, &seq, flags); lockless = 0)
> +
> +#define XXX(lock, flags) \
> +	__XXX(lock, __UNIQUE_ID(lockless), __UNIQUE_ID(seq), flags)

Note that __XXX() can have users too. See the patch below. Not that it
makes a lot of sense, just for example.

Oleg.

diff --git a/fs/d_path.c b/fs/d_path.c
index bb365511066b..b4cde188be4f 100644
--- a/fs/d_path.c
+++ b/fs/d_path.c
@@ -332,28 +332,23 @@ static char *__dentry_path(const struct dentry *d, struct prepend_buffer *p)
 {
 	const struct dentry *dentry;
 	struct prepend_buffer b;
-	int seq = 0;
 
 	rcu_read_lock();
-restart:
-	dentry = d;
-	b = *p;
-	read_seqbegin_or_lock(&rename_lock, &seq);
-	while (!IS_ROOT(dentry)) {
-		const struct dentry *parent = dentry->d_parent;
+	__XXX(&rename_lock, lockless, seq, NULL) {
+		dentry = d;
+		b = *p;
+		while (!IS_ROOT(dentry)) {
+			const struct dentry *parent = dentry->d_parent;
 
-		prefetch(parent);
-		if (!prepend_name(&b, &dentry->d_name))
-			break;
-		dentry = parent;
-	}
-	if (!(seq & 1))
-		rcu_read_unlock();
-	if (need_seqretry(&rename_lock, seq)) {
-		seq = 1;
-		goto restart;
+			prefetch(parent);
+			if (!prepend_name(&b, &dentry->d_name))
+				break;
+			dentry = parent;
+		}
+		if (lockless)
+			rcu_read_unlock();
 	}
-	done_seqretry(&rename_lock, seq);
+
 	if (b.len == p->len)
 		prepend_char(&b, '/');
 	return extract_string(&b);
Oleg Nesterov <oleg@redhat.com> wrote:

> Can we improve this API?

It would also be nice to fix the static lock-balance detection stuff that
you get when you enable advanced checking during a kernel build. It doesn't
seem to understand seqlocks.

> -	nextseq = 0;
> +	seq = 0;

Perhaps an init function or macro that hides this bit?

        void init_read_seqlock(int *seq)
        {
                *seq = 0;
        }

        init_read_seqlock(&seq);

or:

        #define INIT_READ_SEQBEGIN 0

        seq = INIT_READ_SEQBEGIN;

Though if we can fold the whole loop inside a macro, that might make it
easier to use.

d_walk() in fs/dcache.c might give you issues, though.

David
On 09/30, David Howells wrote:
>
> It would also be nice to fix the static lock-balance detection stuff that
> you get when you enable advanced checking during a kernel build. It doesn't
> seem to understand seqlocks.
>
> > -	nextseq = 0;
> > +	seq = 0;
>
> Perhaps an init function or macro that hides this bit?
>
>         void init_read_seqlock(int *seq)
>         {
>                 *seq = 0;
>         }
>
>         init_read_seqlock(&seq);
>
> or:
>
>         #define INIT_READ_SEQBEGIN 0
>
>         seq = INIT_READ_SEQBEGIN;

Agreed,

> Though if we can fold the whole loop inside a macro, that might make it
> easier to use.

Yes...

> d_walk() in fs/dcache.c might give you issues, though.

Yes. But note that it can use need_seqretry_xxx() too, see the patch below.

So. Do you think it makes any sense to add the new helper for a start?
Can you suggest a better name? Say, check_restart()?

Oleg.

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1355,7 +1355,7 @@ static void d_walk(struct dentry *parent, void *data,
 		spin_lock(&this_parent->d_lock);
 
 		/* might go back up the wrong parent if we have had a rename. */
-		if (need_seqretry(&rename_lock, seq))
+		if (need_seqretry_xxx(&rename_lock, &seq))
 			goto rename_retry;
 		/* go into the first sibling still alive */
 		hlist_for_each_entry_continue(dentry, d_sib) {
@@ -1366,7 +1366,7 @@ static void d_walk(struct dentry *parent, void *data,
 		}
 		goto ascend;
 	}
-	if (need_seqretry(&rename_lock, seq))
+	if (need_seqretry_xxx(&rename_lock, &seq))
 		goto rename_retry;
 	rcu_read_unlock();
 
@@ -1378,11 +1378,8 @@ static void d_walk(struct dentry *parent, void *data,
 rename_retry:
 	spin_unlock(&this_parent->d_lock);
 	rcu_read_unlock();
-	BUG_ON(seq & 1);
-	if (!retry)
-		return;
-	seq = 1;
-	goto again;
+	if (retry)
+		goto again;
 }
 
 struct check_mount {
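[Tracing the helper through d_walk()'s retry path may explain why the patch
can drop the manual bookkeeping; this is a reading of the patch above, not
part of it:]

        /*
         * Lockless pass: seq == 0 (even). If a rename raced,
         * need_seqretry_xxx() returns 1 and, unlike need_seqretry(),
         * already leaves seq odd (++*seq, or *seq = 1 as suggested
         * earlier), so rename_retry can simply "goto again" and the next
         * read_seqbegin_or_lock() takes rename_lock -- the manual
         * "seq = 1" is gone.
         *
         * Locked pass: seq is odd, so the !(*seq & 1) test makes the
         * helper return 0 and "goto rename_retry" cannot fire; the retry
         * label is only ever reached from the lockless pass, which is
         * what the old BUG_ON(seq & 1) asserted.
         */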
> Another problem is that this API is error prone. Two years ago half of the
> read_seqbegin_or_lock() users were buggy (they followed the wrong example
> from Documentation/locking/seqlock.rst). And even if the code is mostly
> correct, it is very easy to add a hard-to-detect mistake, see for example
>
> 	[PATCH][v3] afs: Remove erroneous seq |= 1 in volume lookup loop
> 	https://lore.kernel.org/all/20250910084235.2630-1-lirongqing@baidu.com/
>
> Can we improve this API?
>
> -------------------------------------------------------------------------------
> To simplify, suppose we add the new helper
>
>         static inline int need_seqretry_xxx(seqlock_t *lock, int *seq)
>         {
>                 int ret = !(*seq & 1) && read_seqretry(lock, *seq);
>
>                 if (ret)
>                         ++*seq; /* make this counter odd */
>
>                 return ret;
>         }
>
> which can be used instead of need_seqretry(). This way the users do not
> need to modify "seq" in the main loop. For example, we can simplify
> thread_group_cputime() as follows:
>
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -314,7 +314,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  	struct signal_struct *sig = tsk->signal;
>  	u64 utime, stime;
>  	struct task_struct *t;
> -	unsigned int seq, nextseq;
> +	unsigned int seq;
>  	unsigned long flags;
>
>  	/*
> @@ -330,9 +330,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>
>  	rcu_read_lock();
>  	/* Attempt a lockless read on the first round. */
> -	nextseq = 0;
> +	seq = 0;
>  	do {
> -		seq = nextseq;
>  		flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
>  		times->utime = sig->utime;
>  		times->stime = sig->stime;
> @@ -344,9 +343,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  		times->stime += stime;
>  		times->sum_exec_runtime += read_sum_exec_runtime(t);
>  	}
> -	/* If lockless access failed, take the lock. */
> -	nextseq = 1;
> -	} while (need_seqretry(&sig->stats_lock, seq));
> +	} while (need_seqretry_xxx(&sig->stats_lock, &seq));
>  	done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
>  	rcu_read_unlock();
>  }

If this API can be simplified, it should prevent future errors.

I submitted a patch, inspired by your previous patch, hoping that all places
can use a fixed syntax to prevent future errors:

	https://lkml.org/lkml/2025/7/31/616

thanks
-Li
On 09/29, Li,Rongqing wrote:
>
> > Another problem is that this API is error prone. Two years ago half of the
> > read_seqbegin_or_lock() users were buggy (they followed the wrong example
> > from Documentation/locking/seqlock.rst). And even if the code is mostly
> > correct, it is very easy to add a hard-to-detect mistake, see for example
> >
> > 	[PATCH][v3] afs: Remove erroneous seq |= 1 in volume lookup loop
> > 	https://lore.kernel.org/all/20250910084235.2630-1-lirongqing@baidu.com/
> >
> > Can we improve this API?
>
> If this API can be simplified, it should prevent future errors.
>
> I submitted a patch, inspired by your previous patch, hoping that all places
> can use a fixed syntax to prevent future errors:
>
> 	https://lkml.org/lkml/2025/7/31/616

Well, I am not sure it makes a lot of sense to change thread_group_cputime()
this way, "nextseq" or the "seq++" trick is a matter of taste. I tried to
suggest a simplified API to avoid the manipulation of "seq" altogether.

Oleg.