[PATCH net-next 1/4] u64_stats: Introduce u64_stats_copy()

David Yang posted 4 patches 2 weeks, 6 days ago
Posted by David Yang 2 weeks, 6 days ago
The following (anti-)pattern was observed in the code tree:

        do {
                start = u64_stats_fetch_begin(&pstats->syncp);
                memcpy(&temp, &pstats->stats, sizeof(temp));
        } while (u64_stats_fetch_retry(&pstats->syncp, start));

On 64-bit arches, struct u64_stats_sync is empty and provides no
protection against load/store tearing, especially for memcpy(), for
which arches may provide their own highly optimized implementations.

In theory the affected code should convert to u64_stats_t, or use
READ_ONCE()/WRITE_ONCE() properly.

However, since there is a need to copy chunks of statistics, provide a
safe memcpy() variant for u64_stats instead of writing open-coded loops
at random places.

Signed-off-by: David Yang <mmyangfl@gmail.com>
---
 include/linux/u64_stats_sync.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index 457879938fc1..849ff6e159c6 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -79,6 +79,14 @@ static inline u64 u64_stats_read(const u64_stats_t *p)
 	return local64_read(&p->v);
 }
 
+static inline void *u64_stats_copy(void *dst, const void *src, size_t len)
+{
+	BUILD_BUG_ON(len % sizeof(u64_stats_t));
+	for (size_t i = 0; i < len / sizeof(u64_stats_t); i++)
+		((u64 *)dst)[i] = local64_read(&((local64_t *)src)[i]);
+	return dst;
+}
+
 static inline void u64_stats_set(u64_stats_t *p, u64 val)
 {
 	local64_set(&p->v, val);
@@ -110,6 +118,7 @@ static inline bool __u64_stats_fetch_retry(const struct u64_stats_sync *syncp,
 }
 
 #else /* 64 bit */
+#include <linux/string.h>
 
 typedef struct {
 	u64		v;
@@ -120,6 +129,12 @@ static inline u64 u64_stats_read(const u64_stats_t *p)
 	return p->v;
 }
 
+static inline void *u64_stats_copy(void *dst, const void *src, size_t len)
+{
+	BUILD_BUG_ON(len % sizeof(u64_stats_t));
+	return memcpy(dst, src, len);
+}
+
 static inline void u64_stats_set(u64_stats_t *p, u64 val)
 {
 	p->v = val;
-- 
2.51.0
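
The tearing concern in the commit message can be illustrated outside the
kernel. The following is a minimal userspace sketch, not the patch's code:
read_once_u64() and stats_copy_sketch() are hypothetical names, and the
volatile cast is only a stand-in for the kernel's READ_ONCE(). It assumes a
64-bit target where an aligned 64-bit volatile load is a single access. Note
that the 64-bit variant posted above still calls memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for READ_ONCE() on a u64: the volatile
 * access asks the compiler to emit exactly one load of the full object
 * instead of splitting or merging accesses.
 */
static inline uint64_t read_once_u64(const uint64_t *p)
{
	return *(const volatile uint64_t *)p;
}

/*
 * Copy len bytes of 64-bit counters one word at a time. Unlike memcpy(),
 * which may use any access width, every counter is read with one marked
 * load. Returns dst, mirroring memcpy().
 */
static inline void *stats_copy_sketch(void *dst, const void *src, size_t len)
{
	uint64_t *d = dst;
	const uint64_t *s = src;

	/* Runtime check here; the patch uses BUILD_BUG_ON() instead. */
	assert(len % sizeof(uint64_t) == 0);
	for (size_t i = 0; i < len / sizeof(uint64_t); i++)
		d[i] = read_once_u64(&s[i]);
	return dst;
}
```

A caller would still wrap this in the usual fetch_begin/fetch_retry loop; the
sketch only addresses the width of each individual load.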
Re: [PATCH net-next 1/4] u64_stats: Introduce u64_stats_copy()
Posted by Sabrina Dubroca 2 weeks, 4 days ago
2026-01-20, 17:21:29 +0800, David Yang wrote:
> The following (anti-)pattern was observed in the code tree:
> 
>         do {
>                 start = u64_stats_fetch_begin(&pstats->syncp);
>                 memcpy(&temp, &pstats->stats, sizeof(temp));
>         } while (u64_stats_fetch_retry(&pstats->syncp, start));
> 
> On 64-bit arches, struct u64_stats_sync is empty and provides no
> protection against load/store tearing, especially for memcpy(), for
> which arches may provide their own highly optimized implementations.
> 
> In theory the affected code should convert to u64_stats_t, or use
> READ_ONCE()/WRITE_ONCE() properly.
> 
> However, since there is a need to copy chunks of statistics, provide a
> safe memcpy() variant for u64_stats instead of writing open-coded loops
> at random places.
> 
> Signed-off-by: David Yang <mmyangfl@gmail.com>
> ---
>  include/linux/u64_stats_sync.h | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
> index 457879938fc1..849ff6e159c6 100644
> --- a/include/linux/u64_stats_sync.h
> +++ b/include/linux/u64_stats_sync.h
> @@ -79,6 +79,14 @@ static inline u64 u64_stats_read(const u64_stats_t *p)
>  	return local64_read(&p->v);
>  }
>  
> +static inline void *u64_stats_copy(void *dst, const void *src, size_t len)
> +{
> +	BUILD_BUG_ON(len % sizeof(u64_stats_t));
> +	for (size_t i = 0; i < len / sizeof(u64_stats_t); i++)
> +		((u64 *)dst)[i] = local64_read(&((local64_t *)src)[i]);

Maybe u64_stats_read/u64_stats_t instead of local64_read/local64_t?

> +	return dst;
> +}

Since this new helper is always used within a
u64_stats_fetch_begin/u64_stats_fetch_retry loop, maybe it would be
nicer to push the retry loop into the helper as well?  Not a strong
opinion. It would be a bit "simpler" for the callers, but your current
proposal has the advantage of looking like memcpy(), and of also
looking (for the caller) like other retry loops fetching each counter
explicitly.
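
For comparison, here is a rough userspace sketch of what a helper with the
retry loop pushed inside might look like. Everything in it is hypothetical:
struct sync_sketch, fetch_begin_sketch() and fetch_retry_sketch() are
simplified stand-ins for u64_stats_sync and its fetch helpers, with the
memory barriers a real sequence counter needs left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct u64_stats_sync: a bare sequence
 * counter that a writer would bump before and after each update.
 * Memory barriers are omitted, so this is for illustration only.
 */
struct sync_sketch {
	volatile unsigned int seq;
};

static inline unsigned int fetch_begin_sketch(const struct sync_sketch *s)
{
	unsigned int seq;

	/* An odd value means a writer is mid-update: wait for an even one. */
	while ((seq = s->seq) & 1)
		;
	return seq;
}

static inline int fetch_retry_sketch(const struct sync_sketch *s,
				     unsigned int start)
{
	/* Nonzero if a writer ran since fetch_begin_sketch(). */
	return s->seq != start;
}

/*
 * Copy n counters and retry internally until the snapshot is consistent.
 * The caller no longer sees the begin/retry pair at all.
 */
static inline void stats_copy_retry_sketch(const struct sync_sketch *s,
					   uint64_t *dst, const uint64_t *src,
					   size_t n)
{
	unsigned int start;

	do {
		start = fetch_begin_sketch(s);
		for (size_t i = 0; i < n; i++)
			dst[i] = *(const volatile uint64_t *)&src[i];
	} while (fetch_retry_sketch(s, start));
}
```

The call site shrinks to a single line, at the cost of an extra sync argument
and of no longer looking like the other open-coded retry loops.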

Either way, I think extending the "Usage" section of the big comment
at the top of the file with this new helper would be nice.

-- 
Sabrina
Re: [PATCH net-next 1/4] u64_stats: Introduce u64_stats_copy()
Posted by Yangfl 2 weeks, 4 days ago
On Thu, Jan 22, 2026 at 1:23 AM Sabrina Dubroca <sd@queasysnail.net> wrote:
>
> 2026-01-20, 17:21:29 +0800, David Yang wrote:
> > The following (anti-)pattern was observed in the code tree:
> >
> >         do {
> >                 start = u64_stats_fetch_begin(&pstats->syncp);
> >                 memcpy(&temp, &pstats->stats, sizeof(temp));
> >         } while (u64_stats_fetch_retry(&pstats->syncp, start));
> >
> > On 64-bit arches, struct u64_stats_sync is empty and provides no
> > protection against load/store tearing, especially for memcpy(), for
> > which arches may provide their own highly optimized implementations.
> >
> > In theory the affected code should convert to u64_stats_t, or use
> > READ_ONCE()/WRITE_ONCE() properly.
> >
> > However, since there is a need to copy chunks of statistics, provide a
> > safe memcpy() variant for u64_stats instead of writing open-coded loops
> > at random places.
> >
> > Signed-off-by: David Yang <mmyangfl@gmail.com>
> > ---
> >  include/linux/u64_stats_sync.h | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> >
> > diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
> > index 457879938fc1..849ff6e159c6 100644
> > --- a/include/linux/u64_stats_sync.h
> > +++ b/include/linux/u64_stats_sync.h
> > @@ -79,6 +79,14 @@ static inline u64 u64_stats_read(const u64_stats_t *p)
> >       return local64_read(&p->v);
> >  }
> >
> > +static inline void *u64_stats_copy(void *dst, const void *src, size_t len)
> > +{
> > +     BUILD_BUG_ON(len % sizeof(u64_stats_t));
> > +     for (size_t i = 0; i < len / sizeof(u64_stats_t); i++)
> > +             ((u64 *)dst)[i] = local64_read(&((local64_t *)src)[i]);
>
> Maybe u64_stats_read/u64_stats_t instead of local64_read/local64_t?
>

I think casting to u64_stats_t is a bit overkill here since we accept
const void * and we are the actual implementation.

Again, they should convert to u64_stats_t, and this solution already
implies that u64_stats_t is binary compatible with u64. I've already
sent several patches related to that, but that's another issue.

> > +     return dst;
> > +}
>
> Since this new helper is always used within a
> u64_stats_fetch_begin/u64_stats_fetch_retry loop, maybe it would be
> nicer to push the retry loop into the helper as well?  Not a strong
> opinion. It would be a bit "simpler" for the callers, but your current
> proposal has the advantage of looking like memcpy(), and of also
> looking (for the caller) like other retry loops fetching each counter
> explicitly.
>

Callers may want to copy other, non-contiguous data in the same loop
as well, although no one does so yet.

         do {
                 start = u64_stats_fetch_begin(&pstats->syncp);
                 memcpy(...);
                 u64_stats_read(...);
         } while (u64_stats_fetch_retry(&pstats->syncp, start));

It would be redundant to provide two variants of the function.
Moreover, callers can (and already do) invent their own reader/writer
helpers, for example:

         #define SLIC_GET_STATS_COUNTER(newst, st, counter) \
         { \
                  unsigned int start; \
                  do { \
                          start = u64_stats_fetch_begin(&(st)->syncp); \
                          newst = u64_stats_read(&(st)->counter); \
                  } while (u64_stats_fetch_retry(&(st)->syncp, start)); \
         }

> Either way, I think extending the "Usage" section of the big comment
> at the top of the file with this new helper would be nice.
>

I think callers should avoid memcpy() eventually, and they almost
certainly copy more data than they need. However, I took a look at
some instances, and it would be non-trivial to modify those drivers.

> --
> Sabrina
Re: [PATCH net-next 1/4] u64_stats: Introduce u64_stats_copy()
Posted by Sabrina Dubroca 2 weeks, 4 days ago
2026-01-22, 02:22:49 +0800, Yangfl wrote:
> On Thu, Jan 22, 2026 at 1:23 AM Sabrina Dubroca <sd@queasysnail.net> wrote:
> >
> > 2026-01-20, 17:21:29 +0800, David Yang wrote:
> > > The following (anti-)pattern was observed in the code tree:
> > >
> > >         do {
> > >                 start = u64_stats_fetch_begin(&pstats->syncp);
> > >                 memcpy(&temp, &pstats->stats, sizeof(temp));
> > >         } while (u64_stats_fetch_retry(&pstats->syncp, start));
> > >
> > > On 64-bit arches, struct u64_stats_sync is empty and provides no
> > > protection against load/store tearing, especially for memcpy(), for
> > > which arches may provide their own highly optimized implementations.
> > >
> > > In theory the affected code should convert to u64_stats_t, or use
> > > READ_ONCE()/WRITE_ONCE() properly.
> > >
> > > However, since there is a need to copy chunks of statistics, provide a
> > > safe memcpy() variant for u64_stats instead of writing open-coded loops
> > > at random places.
> > >
> > > Signed-off-by: David Yang <mmyangfl@gmail.com>
> > > ---
> > >  include/linux/u64_stats_sync.h | 15 +++++++++++++++
> > >  1 file changed, 15 insertions(+)
> > >
> > > diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
> > > index 457879938fc1..849ff6e159c6 100644
> > > --- a/include/linux/u64_stats_sync.h
> > > +++ b/include/linux/u64_stats_sync.h
> > > @@ -79,6 +79,14 @@ static inline u64 u64_stats_read(const u64_stats_t *p)
> > >       return local64_read(&p->v);
> > >  }
> > >
> > > +static inline void *u64_stats_copy(void *dst, const void *src, size_t len)
> > > +{
> > > +     BUILD_BUG_ON(len % sizeof(u64_stats_t));
> > > +     for (size_t i = 0; i < len / sizeof(u64_stats_t); i++)
> > > +             ((u64 *)dst)[i] = local64_read(&((local64_t *)src)[i]);
> >
> > Maybe u64_stats_read/u64_stats_t instead of local64_read/local64_t?
> >
> 
> I think casting to u64_stats_t is a bit overkill here since we accept
> const void * and we are the actual implementation.

It would be a bit more consistent. Just within this function you have
2 lines using u64_stats_t and the 3rd uses local64_t. And reusing
types/helpers within a similar context doesn't seem overkill.


[...]
> > Since this new helper is always used within a
> > u64_stats_fetch_begin/u64_stats_fetch_retry loop, maybe it would be
> > nicer to push the retry loop into the helper as well?  Not a strong
> > opinion. It would be a bit "simpler" for the callers, but your current
> > proposal has the advantage of looking like memcpy(), and of also
> > looking (for the caller) like other retry loops fetching each counter
> > explicitly.
> >
> 
> Callers may want to copy other, non-contiguous data in the same loop
> as well, although no one does so yet.

I'm not sure why they would. I think the main point of using memcpy is
"I don't want to copy each counter by name one by one", and possibly
"I don't want to have to patch this code as well if we add a new
counter". If you already have a batch copy for a bunch of counters,
it's usually easier to add others in a contiguous block.


> It would be redundant to provide two variants of the function.
> Moreover, callers can (and already do) invent their own reader/writer
> helpers, for example:
> 
>          #define SLIC_GET_STATS_COUNTER(newst, st, counter) \
>          { \
>                   unsigned int start; \
>                   do { \
>                           start = u64_stats_fetch_begin(&(st)->syncp); \
>                           newst = u64_stats_read(&(st)->counter); \
>                   } while (u64_stats_fetch_retry(&(st)->syncp, start)); \
>          }

Probably because the retry loop is a bit cumbersome, and they'd rather
not copy/paste it everywhere or see it in the middle of whatever
function needs it.

> > Either way, I think extending the "Usage" section of the big comment
> > at the top of the file with this new helper would be nice.
> >
> 
> I think callers should avoid memcpy() eventually, and they almost
> certainly copy more data than they need. However, I took a look at
> some instances, and it would be non-trivial to modify those drivers.

I'm not asking you to fix something else. But for example commit
316580b69d0a ("u64_stats: provide u64_stats_t type") modified the bit
of documentation we have at the top of the file to help developers who
want to use this API. This patch is introducing a new function and
should also describe how to use it, so that new users aren't tempted
to re-introduce a memcpy.

-- 
Sabrina
Re: [PATCH net-next 1/4] u64_stats: Introduce u64_stats_copy()
Posted by Yangfl 2 weeks, 3 days ago
On Thu, Jan 22, 2026 at 7:20 PM Sabrina Dubroca <sd@queasysnail.net> wrote:
>
> 2026-01-22, 02:22:49 +0800, Yangfl wrote:
> > On Thu, Jan 22, 2026 at 1:23 AM Sabrina Dubroca <sd@queasysnail.net> wrote:
> > >
> > > 2026-01-20, 17:21:29 +0800, David Yang wrote:
> > > > The following (anti-)pattern was observed in the code tree:
> > > >
> > > >         do {
> > > >                 start = u64_stats_fetch_begin(&pstats->syncp);
> > > >                 memcpy(&temp, &pstats->stats, sizeof(temp));
> > > >         } while (u64_stats_fetch_retry(&pstats->syncp, start));
> > > >
> > > > On 64-bit arches, struct u64_stats_sync is empty and provides no
> > > > protection against load/store tearing, especially for memcpy(), for
> > > > which arches may provide their own highly optimized implementations.
> > > >
> > > > In theory the affected code should convert to u64_stats_t, or use
> > > > READ_ONCE()/WRITE_ONCE() properly.
> > > >
> > > > However, since there is a need to copy chunks of statistics, provide a
> > > > safe memcpy() variant for u64_stats instead of writing open-coded loops
> > > > at random places.
> > > >
> > > > Signed-off-by: David Yang <mmyangfl@gmail.com>
> > > > ---
> > > >  include/linux/u64_stats_sync.h | 15 +++++++++++++++
> > > >  1 file changed, 15 insertions(+)
> > > >
> > > > diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
> > > > index 457879938fc1..849ff6e159c6 100644
> > > > --- a/include/linux/u64_stats_sync.h
> > > > +++ b/include/linux/u64_stats_sync.h
> > > > @@ -79,6 +79,14 @@ static inline u64 u64_stats_read(const u64_stats_t *p)
> > > >       return local64_read(&p->v);
> > > >  }
> > > >
> > > > +static inline void *u64_stats_copy(void *dst, const void *src, size_t len)
> > > > +{
> > > > +     BUILD_BUG_ON(len % sizeof(u64_stats_t));
> > > > +     for (size_t i = 0; i < len / sizeof(u64_stats_t); i++)
> > > > +             ((u64 *)dst)[i] = local64_read(&((local64_t *)src)[i]);
> > >
> > > Maybe u64_stats_read/u64_stats_t instead of local64_read/local64_t?
> > >
> >
> > I think casting to u64_stats_t is a bit overkill here since we accept
> > const void * and we are the actual implementation.
>
> It would be a bit more consistent. Just within this function you have
> 2 lines using u64_stats_t and the 3rd uses local64_t. And reusing
> types/helpers within a similar context doesn't seem overkill.
>
>
> [...]
> > > Since this new helper is always used within a
> > > u64_stats_fetch_begin/u64_stats_fetch_retry loop, maybe it would be
> > > nicer to push the retry loop into the helper as well?  Not a strong
> > > opinion. It would be a bit "simpler" for the callers, but your current
> > > proposal has the advantage of looking like memcpy(), and of also
> > > looking (for the caller) like other retry loops fetching each counter
> > > explicitly.
> > >
> >
> > Callers may want to copy other, non-contiguous data in the same loop
> > as well, although no one does so yet.
>
> I'm not sure why they would. I think the main point of using memcpy is
> "I don't want to copy each counter by name one by one", and possibly
> "I don't want to have to patch this code as well if we add a new
> counter". If you already have a batch copy for a bunch of counters,
> it's usually easier to add others in a contiguous block.
>

While I agree with your statement, I don't think it's a good idea to
push the retry loop into the helper. u64_stats_copy(syncp, dst, src,
len) would be a strange API, since none of the other accessors take a
syncp argument. It would also mislead readers of the driver code,
because fetch_begin()/fetch_retry() would no longer appear explicitly.

In my opinion, it would be better to introduce a for-like macro,
#define with_u64_stats_fetch(syncp), which would eliminate the two
explicit function calls as well as the variable declaration from each
caller.
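
One possible shape for such a for-like macro is sketched below in plain
userspace C. Nothing here comes from the series: with_stats_fetch_sketch(),
struct sync_sketch, struct pstats_sketch and the fetch helpers are hypothetical
stand-ins, and the memory barriers a real sequence counter needs are omitted.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct u64_stats_sync (no barriers). */
struct sync_sketch {
	volatile unsigned int seq;
};

static inline unsigned int fetch_begin_sketch(const struct sync_sketch *s)
{
	unsigned int seq;

	/* An odd value means a writer is mid-update: wait for an even one. */
	while ((seq = s->seq) & 1)
		;
	return seq;
}

static inline int fetch_retry_sketch(const struct sync_sketch *s,
				     unsigned int start)
{
	return s->seq != start;
}

/*
 * Run the following statement or block at least once, and again for as
 * long as a writer interfered. The loop variables live inside the for
 * statement, so the caller declares nothing.
 */
#define with_stats_fetch_sketch(syncp)					\
	for (unsigned int start_ = fetch_begin_sketch(syncp), done_ = 0; \
	     !done_;							\
	     done_ = !fetch_retry_sketch((syncp), start_),		\
	     start_ = done_ ? start_ : fetch_begin_sketch(syncp))

/* Hypothetical per-device statistics block. */
struct pstats_sketch {
	struct sync_sketch syncp;
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

static inline void snapshot_sketch(const struct pstats_sketch *p,
				   uint64_t *packets, uint64_t *bytes)
{
	with_stats_fetch_sketch(&p->syncp) {
		*packets = p->rx_packets;
		*bytes = p->rx_bytes;
	}
}
```

One caveat of this shape: a break or return inside the body leaves the loop
without running the retry check, so the body has to fall through to its end.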