Some recent commits incorrectly assumed 4-byte alignment of locks.
That assumption fails on Linux/m68k (and, interestingly, would have
failed on Linux/cris also). Specify the minimum alignment of atomic
variables for fewer surprises and (hopefully) better performance.

Consistent with i386, atomic64_t is not given natural alignment here.

Cc: Lance Yang <lance.yang@linux.dev>
Link: https://lore.kernel.org/lkml/CAMuHMdW7Ab13DdGs2acMQcix5ObJK0O2dG_Fxzr8_g58Rc1_0g@mail.gmail.com/
---
Changed since v1:
- atomic64_t now gets an __aligned attribute too.
- The 'Fixes' tag has been dropped because Lance sent a different fix
for commit e711faaafbe5 ("hung_task: replace blocker_mutex with encoded
blocker") that's suitable for -stable.
---
include/asm-generic/atomic64.h | 2 +-
include/linux/types.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/asm-generic/atomic64.h b/include/asm-generic/atomic64.h
index 100d24b02e52..7ae82ac17645 100644
--- a/include/asm-generic/atomic64.h
+++ b/include/asm-generic/atomic64.h
@@ -10,7 +10,7 @@
 #include <linux/types.h>
 
 typedef struct {
-	s64 counter;
+	s64 counter __aligned(sizeof(long));
 } atomic64_t;
 
 #define ATOMIC64_INIT(i) { (i) }
diff --git a/include/linux/types.h b/include/linux/types.h
index 6dfdb8e8e4c3..cd5b2b0f4b02 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -179,7 +179,7 @@ typedef phys_addr_t resource_size_t;
 typedef unsigned long irq_hw_number_t;
 
 typedef struct {
-	int counter;
+	int counter __aligned(sizeof(int));
 } atomic_t;
 
 #define ATOMIC_INIT(i) { (i) }
--
2.49.1
Hi Finn,

On Sun, 14 Sept 2025 at 02:59, Finn Thain <fthain@linux-m68k.org> wrote:
> Some recent commits incorrectly assumed 4-byte alignment of locks.
> That assumption fails on Linux/m68k (and, interestingly, would have
> failed on Linux/cris also). Specify the minimum alignment of atomic
> variables for fewer surprises and (hopefully) better performance.
>
> Consistent with i386, atomic64_t is not given natural alignment here.
>
> Cc: Lance Yang <lance.yang@linux.dev>
> Link: https://lore.kernel.org/lkml/CAMuHMdW7Ab13DdGs2acMQcix5ObJK0O2dG_Fxzr8_g58Rc1_0g@mail.gmail.com/

Thanks for your patch!

> --- a/include/asm-generic/atomic64.h
> +++ b/include/asm-generic/atomic64.h
> @@ -10,7 +10,7 @@
>  #include <linux/types.h>
>
>  typedef struct {
> -	s64 counter;
> +	s64 counter __aligned(sizeof(long));
>  } atomic64_t;
>
>  #define ATOMIC64_INIT(i) { (i) }
> diff --git a/include/linux/types.h b/include/linux/types.h
> index 6dfdb8e8e4c3..cd5b2b0f4b02 100644
> --- a/include/linux/types.h
> +++ b/include/linux/types.h
> @@ -179,7 +179,7 @@ typedef phys_addr_t resource_size_t;
>  typedef unsigned long irq_hw_number_t;
>
>  typedef struct {
> -	int counter;
> +	int counter __aligned(sizeof(int));
>  } atomic_t;
>
>  #define ATOMIC_INIT(i) { (i) }

This triggers a failure in kernel/bpf/rqspinlock.c:

kernel/bpf/rqspinlock.c: In function ‘bpf_res_spin_lock’:
include/linux/compiler_types.h:572:45: error: call to ‘__compiletime_assert_397’ declared with attribute error: BUILD_BUG_ON failed: __alignof__(rqspinlock_t) != __alignof__(struct bpf_res_spin_lock)
  572 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |                                             ^
include/linux/compiler_types.h:553:25: note: in definition of macro ‘__compiletime_assert’
  553 |                         prefix ## suffix();                             \
      |                         ^~~~~~
include/linux/compiler_types.h:572:9: note: in expansion of macro ‘_compiletime_assert’
  572 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |         ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
   39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
      |                                     ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:9: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
   50 |         BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
      |         ^~~~~~~~~~~~~~~~
kernel/bpf/rqspinlock.c:695:9: note: in expansion of macro ‘BUILD_BUG_ON’
  695 |         BUILD_BUG_ON(__alignof__(rqspinlock_t) != __alignof__(struct bpf_res_spin_lock));
      |         ^~~~~~~~~~~~

I haven't investigated it yet.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker.
But when I'm talking to journalists I just say "programmer" or something
like that.
                                -- Linus Torvalds
On Mon, 22 Sep 2025, Geert Uytterhoeven wrote:

>
> This triggers a failure in kernel/bpf/rqspinlock.c:
>
> kernel/bpf/rqspinlock.c: In function ‘bpf_res_spin_lock’:
> include/linux/compiler_types.h:572:45: error: call to
> ‘__compiletime_assert_397’ declared with attribute error: BUILD_BUG_ON
> failed: __alignof__(rqspinlock_t) != __alignof__(struct
> bpf_res_spin_lock)
>   572 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>       |                                             ^
> include/linux/compiler_types.h:553:25: note: in definition of macro
> ‘__compiletime_assert’
>   553 |                         prefix ## suffix();                             \
>       |                         ^~~~~~
> include/linux/compiler_types.h:572:9: note: in expansion of macro
> ‘_compiletime_assert’
>   572 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>       |         ^~~~~~~~~~~~~~~~~~~
> include/linux/build_bug.h:39:37: note: in expansion of macro
> ‘compiletime_assert’
>    39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
>       |                                     ^~~~~~~~~~~~~~~~~~
> include/linux/build_bug.h:50:9: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
>    50 |         BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
>       |         ^~~~~~~~~~~~~~~~
> kernel/bpf/rqspinlock.c:695:9: note: in expansion of macro ‘BUILD_BUG_ON’
>   695 |         BUILD_BUG_ON(__alignof__(rqspinlock_t) !=
> __alignof__(struct bpf_res_spin_lock));
>       |         ^~~~~~~~~~~~
>
> I haven't investigated it yet.
>

Yes, I noticed that also, after I started building with defconfigs.
I pushed a new patch to my github repo.

https://github.com/fthain/linux/tree/atomic_t
On Mon, Sep 22, 2025, at 10:16, Finn Thain wrote:
>
> Yes, I noticed that also, after I started building with defconfigs.
> I pushed a new patch to my github repo.
>
> https://github.com/fthain/linux/tree/atomic_t

I had a look at that repository, and I think the extra annotations you
added on a lot of structures have the same mistake that I made when
looking at annotating ABI structures for -malign-int earlier:

Adding __aligned(sizeof(long)) to a structure definition only changes
the /outer/ alignment of the structure itself, so e.g.

struct {
	short a;
	int b;
} __attribute__((aligned(sizeof(long))));

creates a structure with 4-byte alignment and 8-byte size when building
with -mno-align-int, but the padding is added after 'b', which is still
misaligned.

In order to align members the same way that -malign-int does, each
unaligned member needs a separate attribute to force aligning it, like

struct {
	short a;
	int b __attribute__((aligned(sizeof(int))));
};

This is different from how __attribute__((packed)) works, as that is
always applied to each member of the struct if you add it to the end.

     Arnd
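The distinction is easy to check outside the kernel. A minimal userspace
sketch (hypothetical code, not from any patch in this thread; the
interesting numbers only appear when built for m68k with -mno-align-int,
where int has 2-byte ABI alignment and long is 4 bytes):

/*
 * Hypothetical demo of outer vs. member alignment. On m68k with
 * -mno-align-int, outer_only reports align 4, size 8, offsetof(b) 2
 * (b itself stays misaligned); member_aligned reports offsetof(b) 4.
 * On ABIs where int is already 4-byte aligned, both look the same.
 */
#include <stdio.h>
#include <stddef.h>

struct outer_only {
	short a;
	int b;				/* padding goes after b, not before */
} __attribute__((aligned(sizeof(long))));

struct member_aligned {
	short a;
	int b __attribute__((aligned(sizeof(int))));	/* b pushed to offset 4 */
};

int main(void)
{
	printf("outer_only:     align %zu, size %zu, offsetof(b) %zu\n",
	       _Alignof(struct outer_only), sizeof(struct outer_only),
	       offsetof(struct outer_only, b));
	printf("member_aligned: align %zu, size %zu, offsetof(b) %zu\n",
	       _Alignof(struct member_aligned), sizeof(struct member_aligned),
	       offsetof(struct member_aligned, b));
	return 0;
}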
On Mon, 22 Sep 2025, Arnd Bergmann wrote:

> On Mon, Sep 22, 2025, at 10:16, Finn Thain wrote:
> >
> > Yes, I noticed that also, after I started building with defconfigs.
> > I pushed a new patch to my github repo.
> >
> > https://github.com/fthain/linux/tree/atomic_t
>
> I had a look at that repository, and I think the extra annotations you
> added on a lot of structures have the same mistake that I made when
> looking at annotating ABI structures for -malign-int earlier:
>
> Adding __aligned(sizeof(long)) to a structure definition only changes
> the /outer/ alignment of the structure itself,

Right. I should have dropped those changes earlier.

@@ -8,7 +8,7 @@ struct vfsmount;
 struct path {
 	struct vfsmount *mnt;
 	struct dentry *dentry;
-} __randomize_layout;
+} __aligned(sizeof(long)) __randomize_layout;

There's no need: struct path contains a struct dentry, which contains a
seqcount_spinlock_t, which contains a spinlock_t which contains an
atomic_t member, which is explicitly aligned.

Despite that, there's still some kmem cache or other allocator somewhere
that has produced some misaligned path and dentry structures. So we get
misaligned atomics somewhere in the VFS and TTY layers. I was unable to
find those allocations.

> so e.g.
>
> struct {
> 	short a;
> 	int b;
> } __attribute__((aligned(sizeof(long))));
>
> creates a structure with 4-byte alignment and 8-byte size when building
> with -mno-align-int, but the padding is added after 'b', which is still
> misaligned.
>

Sure but -mno-align-int isn't particularly relevant here as it's only for
m68k. Besides, I'd like to avoid the mess that would create.

The basic aim here is to naturally align both atomic_t and atomic64_t on
all architectures (my testing is confined to m68k) and to try to find out
what that would cost and what benefit it might bring. The patches you're
reviewing were a hack to try to silence the WARN from CONFIG_DEBUG_ATOMIC,
because that way I could try to assess the cost/benefit.
On Tue, 23 Sep 2025, I wrote:

>
> ... there's still some kmem cache or other allocator somewhere that has
> produced some misaligned path and dentry structures. So we get
> misaligned atomics somewhere in the VFS and TTY layers. I was unable to
> find those allocations.
>

It turned out that the problem wasn't dynamic allocations, it was a local
variable in the core locking code (kernel/locking/rwsem.c): a misaligned
long used with an atomic operation (cmpxchg). To get natural alignment
for 64-bit quantities, I had to align other local variables as well, such
as the one in ktime_get_real_ts64_mg() that's used with
atomic64_try_cmpxchg(). The atomic_t branch in my github repo has the
patches I wrote for that.

To silence the misalignment WARN from CONFIG_DEBUG_ATOMIC, for 64-bit
atomic operations, for my small m68k .config, it was also necessary to
increase ARCH_SLAB_MINALIGN to 8. However, I'm not advocating an
ARCH_SLAB_MINALIGN increase, as that wastes memory. I think it might be
more useful to limit the alignment test for CONFIG_DEBUG_ATOMIC, as
follows.

diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
index 402a999a0d6b..cd569a87c0a8 100644
--- a/include/linux/instrumented.h
+++ b/include/linux/instrumented.h
@@ -68,7 +68,7 @@ static __always_inline void instrument_atomic_read(const volatile void *v, size_
 {
 	kasan_check_read(v, size);
 	kcsan_check_atomic_read(v, size);
-	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1)));
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1) & 3));
 }
 
 /**
@@ -83,7 +83,7 @@ static __always_inline void instrument_atomic_write(const volatile void *v, size
 {
 	kasan_check_write(v, size);
 	kcsan_check_atomic_write(v, size);
-	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1)));
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1) & 3));
 }
 
 /**
@@ -98,7 +98,7 @@ static __always_inline void instrument_atomic_read_write(const volatile void *v,
 {
 	kasan_check_write(v, size);
 	kcsan_check_atomic_read_write(v, size);
-	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1)));
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1) & 3));
 }
 
 /**
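For reference, a small userspace sketch (hypothetical, not part of the
patch above) of what the extra '& 3' changes: an 8-byte atomic on a
4-byte boundary no longer warns, while a 2-byte-aligned address still
does:

#include <stdio.h>

/* Old condition: natural alignment required for the access size. */
static int old_check(unsigned long v, unsigned long size)
{
	return (v & (size - 1)) != 0;
}

/* New condition: only the low two address bits can trigger the WARN. */
static int new_check(unsigned long v, unsigned long size)
{
	return (v & (size - 1) & 3) != 0;
}

int main(void)
{
	unsigned long addrs[] = { 0x1000, 0x1002, 0x1004 };

	/* Prints: 0x1000 old=0 new=0; 0x1002 old=1 new=1; 0x1004 old=1 new=0 */
	for (int i = 0; i < 3; i++)
		printf("addr=0x%lx size=8: old=%d new=%d\n",
		       addrs[i], old_check(addrs[i], 8), new_check(addrs[i], 8));
	return 0;
}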
Hi Finn,

On Tue, 30 Sept 2025 at 04:18, Finn Thain <fthain@linux-m68k.org> wrote:
> On Tue, 23 Sep 2025, I wrote:
> > ... there's still some kmem cache or other allocator somewhere that has
> > produced some misaligned path and dentry structures. So we get
> > misaligned atomics somewhere in the VFS and TTY layers. I was unable to
> > find those allocations.
>
> It turned out that the problem wasn't dynamic allocations, it was a local
> variable in the core locking code (kernel/locking/rwsem.c): a misaligned
> long used with an atomic operation (cmpxchg). To get natural alignment for
> 64-bit quantities, I had to align other local variables as well, such as
> the one in ktime_get_real_ts64_mg() that's used with
> atomic64_try_cmpxchg(). The atomic_t branch in my github repo has the
> patches I wrote for that.

So cmpxchg() and friends should not take a volatile void *, but (new)
properly-aligned types, using the new _Generic()?

> To silence the misalignment WARN from CONFIG_DEBUG_ATOMIC, for 64-bit
> atomic operations, for my small m68k .config, it was also necessary to
> increase ARCH_SLAB_MINALIGN to 8. However, I'm not advocating an

Probably ARCH_SLAB_MINALIGN should be 4 on m68k. Somehow I thought that
was already the case, but it is __alignof__(unsigned long long) = 2.

> ARCH_SLAB_MINALIGN increase, as that wastes memory. I think it might be
> more useful to limit the alignment test for CONFIG_DEBUG_ATOMIC, as
> follows.

Did you check what would be the actual impact of increasing it to 4 or 8?

> --- a/include/linux/instrumented.h
> +++ b/include/linux/instrumented.h
> @@ -68,7 +68,7 @@ static __always_inline void instrument_atomic_read(const volatile void *v, size_
>  {
>  	kasan_check_read(v, size);
>  	kcsan_check_atomic_read(v, size);
> -	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1)));
> +	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1) & 3));

I'd make that an arch-overridable define instead of hardcoded 3.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker.
But when I'm talking to journalists I just say "programmer" or something
like that.
                                -- Linus Torvalds
On Tue, 30 Sep 2025, Geert Uytterhoeven wrote:

> > To silence the misalignment WARN from CONFIG_DEBUG_ATOMIC, for 64-bit
> > atomic operations, for my small m68k .config, it was also necessary to
> > increase ARCH_SLAB_MINALIGN to 8. However, I'm not advocating an
>
> Probably ARCH_SLAB_MINALIGN should be 4 on m68k. Somehow I thought that
> was already the case, but it is __alignof__(unsigned long long) = 2.
>

I agree -- setting ARCH_SLAB_MINALIGN to 4 would be better style, and may
avoid surprises in future. Right now that won't have any effect because
that value gets increased to sizeof(void *) by calculate_alignment() and
gets increased to ARCH_KMALLOC_MINALIGN or ARCH_DMA_MINALIGN by
__kmalloc_minalign().

> > ARCH_SLAB_MINALIGN increase, as that wastes memory. I think it might
> > be more useful to limit the alignment test for CONFIG_DEBUG_ATOMIC,
> > as follows.
>
> Did you check what would be the actual impact of increasing it to 4 or 8?
>
> > --- a/include/linux/instrumented.h
> > +++ b/include/linux/instrumented.h
> > @@ -68,7 +68,7 @@ static __always_inline void instrument_atomic_read(const volatile void *v, size_
> >  {
> >  	kasan_check_read(v, size);
> >  	kcsan_check_atomic_read(v, size);
> > -	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1)));
> > +	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1) & 3));
>
> I'd make that an arch-overridable define instead of hardcoded 3.
>

How about (sizeof(atomic_long_t) - 1)?

Can you comment on this, Peter?
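For illustration, such a define might take the following shape (a sketch
only; the name ARCH_ATOMIC_ALIGN_MASK is invented here, and nothing in
the thread settles on it):

/* Sketch for include/linux/instrumented.h; hypothetical macro name.
 * The default masks down to the ABI alignment of atomic_long_t, and
 * an architecture with stricter or looser requirements could define
 * its own value in its asm headers before this point. */
#ifndef ARCH_ATOMIC_ALIGN_MASK
#define ARCH_ATOMIC_ALIGN_MASK	(sizeof(atomic_long_t) - 1)
#endif

	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) &&
		     ((unsigned long)v & (size - 1) & ARCH_ATOMIC_ALIGN_MASK));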
Hi Finn,

On Wed, 1 Oct 2025 at 03:46, Finn Thain <fthain@linux-m68k.org> wrote:
> On Tue, 30 Sep 2025, Geert Uytterhoeven wrote:
> > > To silence the misalignment WARN from CONFIG_DEBUG_ATOMIC, for 64-bit
> > > atomic operations, for my small m68k .config, it was also necessary
> > > to increase ARCH_SLAB_MINALIGN to 8. However, I'm not advocating an
> >
> > Probably ARCH_SLAB_MINALIGN should be 4 on m68k. Somehow I thought that
> > was already the case, but it is __alignof__(unsigned long long) = 2.
>
> I agree -- setting ARCH_SLAB_MINALIGN to 4 would be better style, and may
> avoid surprises in future. Right now that won't have any effect because
> that value gets increased to sizeof(void *) by calculate_alignment() and

Ah, so there it happens...

> gets increased to ARCH_KMALLOC_MINALIGN or ARCH_DMA_MINALIGN by
> __kmalloc_minalign().

Thanks for checking!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker.
But when I'm talking to journalists I just say "programmer" or something
like that.
                                -- Linus Torvalds
On Tue, Sep 30, 2025, at 04:18, Finn Thain wrote:
> On Tue, 23 Sep 2025, I wrote:
>>
>> ... there's still some kmem cache or other allocator somewhere that has
>> produced some misaligned path and dentry structures. So we get
>> misaligned atomics somewhere in the VFS and TTY layers. I was unable to
>> find those allocations.
>>
>
> It turned out that the problem wasn't dynamic allocations, it was a local
> variable in the core locking code (kernel/locking/rwsem.c): a misaligned
> long used with an atomic operation (cmpxchg). To get natural alignment for
> 64-bit quantities, I had to align other local variables as well, such as
> the one in ktime_get_real_ts64_mg() that's used with
> atomic64_try_cmpxchg(). The atomic_t branch in my github repo has the
> patches I wrote for that.

It looks like the variable you get the warning for is not even the
atomic64_t but the 'old' argument to atomic64_try_cmpxchg(), at least in
some of the cases you found if not all of them.

I don't see why there is a requirement to have that aligned at all, even
if we do require the atomic64_t to be naturally aligned, and I would
expect the same warning to hit on x86-32 and the other architectures with
4-byte alignment of u64 variables on stack and .data.

> To silence the misalignment WARN from CONFIG_DEBUG_ATOMIC, for 64-bit
> atomic operations, for my small m68k .config, it was also necessary to
> increase ARCH_SLAB_MINALIGN to 8. However, I'm not advocating an
> ARCH_SLAB_MINALIGN increase, as that wastes memory.

Have you tried to quantify the memory waste here? I assume that most slab
allocations are already 8-byte aligned, at least kmalloc() with size>4,
while custom caches are usually done for larger structures where an extra
average of 2 bytes per allocation may not be that bad.

> diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
> index 402a999a0d6b..cd569a87c0a8 100644
> --- a/include/linux/instrumented.h
> +++ b/include/linux/instrumented.h
> @@ -68,7 +68,7 @@ static __always_inline void instrument_atomic_read(const volatile void *v, size_
>  {
>  	kasan_check_read(v, size);
>  	kcsan_check_atomic_read(v, size);
> -	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1)));
> +	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1) & 3));
>  }

What is the alignment of stack variables on m68k? E.g. if you have a
function with two local variables, would that still be able to trigger
the check?

int f(atomic64_t *a)
{
	u16 pad;
	u64 old;

	g(&pad);
	atomic64_try_cmpxchg(a, &old, 0);
}

Since there is nothing telling the compiler that the 'old' argument to
atomic*_try_cmpxchg() needs to be naturally aligned, maybe that check
should be changed to only test for the ABI-guaranteed alignment? I think
that would still be needed on x86-32.

     Arnd

diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h
index 9409a6ddf3e0..e57763a889bd 100644
--- a/include/linux/atomic/atomic-instrumented.h
+++ b/include/linux/atomic/atomic-instrumented.h
@@ -1276,7 +1276,7 @@ atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 {
 	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_atomic_read_write(old, alignof(*old));
 	return raw_atomic_try_cmpxchg(v, old, new);
 }
 
@@ -1298,7 +1298,7 @@ static __always_inline bool
 atomic_try_cmpxchg_acquire(atomic_t *v, int *old, int new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_atomic_read_write(old, alignof(*old));
 	return raw_atomic_try_cmpxchg_acquire(v, old, new);
 }
 
@@ -1321,7 +1321,7 @@ atomic_try_cmpxchg_release(atomic_t *v, int *old, int new)
 {
 	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_atomic_read_write(old, alignof(*old));
 	return raw_atomic_try_cmpxchg_release(v, old, new);
 }
 
@@ -1343,7 +1343,7 @@ static __always_inline bool
 atomic_try_cmpxchg_relaxed(atomic_t *v, int *old, int new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_atomic_read_write(old, alignof(*old));
 	return raw_atomic_try_cmpxchg_relaxed(v, old, new);
 }
 
@@ -2854,7 +2854,7 @@ atomic64_try_cmpxchg(atomic64_t *v, s64 *old, s64 new)
 {
 	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_atomic_read_write(old, alignof(*old));
 	return raw_atomic64_try_cmpxchg(v, old, new);
 }
 
@@ -2876,7 +2876,7 @@ static __always_inline bool
 atomic64_try_cmpxchg_acquire(atomic64_t *v, s64 *old, s64 new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_atomic_read_write(old, alignof(*old));
 	return raw_atomic64_try_cmpxchg_acquire(v, old, new);
 }
 
@@ -2899,7 +2899,7 @@ atomic64_try_cmpxchg_release(atomic64_t *v, s64 *old, s64 new)
 {
 	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_atomic_read_write(old, alignof(*old));
 	return raw_atomic64_try_cmpxchg_release(v, old, new);
 }
 
@@ -2921,7 +2921,7 @@ static __always_inline bool
 atomic64_try_cmpxchg_relaxed(atomic64_t *v, s64 *old, s64 new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_atomic_read_write(old, alignof(*old));
 	return raw_atomic64_try_cmpxchg_relaxed(v, old, new);
 }
 
@@ -4432,7 +4432,7 @@ atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
 {
 	kcsan_mb();
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_atomic_read_write(old, alignof(*old));
 	return raw_atomic_long_try_cmpxchg(v, old, new);
 }
 
@@ -4454,7 +4454,7 @@ static __always_inline bool
 atomic_long_try_cmpxchg_acquire(atomic_long_t *v, long *old, long new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_atomic_read_write(old, alignof(*old));
 	return raw_atomic_long_try_cmpxchg_acquire(v, old, new);
 }
 
@@ -4477,7 +4477,7 @@ atomic_long_try_cmpxchg_release(atomic_long_t *v, long *old, long new)
 {
 	kcsan_release();
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_atomic_read_write(old, alignof(*old));
 	return raw_atomic_long_try_cmpxchg_release(v, old, new);
 }
 
@@ -4499,7 +4499,7 @@ static __always_inline bool
 atomic_long_try_cmpxchg_relaxed(atomic_long_t *v, long *old, long new)
 {
 	instrument_atomic_read_write(v, sizeof(*v));
-	instrument_atomic_read_write(old, sizeof(*old));
+	instrument_atomic_read_write(old, alignof(*old));
 	return raw_atomic_long_try_cmpxchg_relaxed(v, old, new);
 }
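For reference, a small userspace sketch (hypothetical, not part of the
diff above) of what switching the instrumented size from sizeof(*old) to
alignof(*old) changes for a u64 'old' on an ABI such as x86-32, where
__alignof__(long long) is 4:

#include <stdio.h>

int main(void)
{
	unsigned long addr = 0x1004;	/* 4-byte aligned, not 8-byte */
	unsigned long size = 8;		/* sizeof(s64) */
	unsigned long abi_align = 4;	/* __alignof__(s64) on x86-32 */

	/* Old mask: 0x1004 & 7 == 4, so the WARN would fire. */
	printf("sizeof mask:  %lu\n", addr & (size - 1));
	/* New mask: 0x1004 & 3 == 0, so a stack u64 stays silent. */
	printf("alignof mask: %lu\n", addr & (abi_align - 1));
	return 0;
}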
On Tue, 30 Sep 2025, Arnd Bergmann wrote:

> On Tue, Sep 30, 2025, at 04:18, Finn Thain wrote:
> >
> > It turned out that the problem wasn't dynamic allocations, it was a
> > local variable in the core locking code (kernel/locking/rwsem.c): a
> > misaligned long used with an atomic operation (cmpxchg). To get
> > natural alignment for 64-bit quantities, I had to align other local
> > variables as well, such as the one in ktime_get_real_ts64_mg() that's
> > used with atomic64_try_cmpxchg(). The atomic_t branch in my github
> > repo has the patches I wrote for that.
>
> It looks like the variable you get the warning for is not even the
> atomic64_t but the 'old' argument to atomic64_try_cmpxchg(), at least in
> some of the cases you found if not all of them.
>
> I don't see why there is a requirement to have that aligned at all, even
> if we do require the atomic64_t to be naturally aligned, and I would
> expect the same warning to hit on x86-32 and the other architectures
> with 4-byte alignment of u64 variables on stack and .data.
>

Right -- there's only one memory operand in a CAS instruction on m68k,
and there's only one pointer in the C implementation in asm-generic.

> > To silence the misalignment WARN from CONFIG_DEBUG_ATOMIC, for 64-bit
> > atomic operations, for my small m68k .config, it was also necessary to
> > increase ARCH_SLAB_MINALIGN to 8. However, I'm not advocating an
> > ARCH_SLAB_MINALIGN increase, as that wastes memory.
>
> Have you tried to quantify the memory waste here?

I think it's entirely workload dependent. The memory efficiency question
comes down to the misalignment distance as a proportion of the size of
the allocation.

> I assume that most slab allocations are already 8-byte aligned, at least
> kmalloc() with size>4, while custom caches are usually done for larger
> structures where an extra average of 2 bytes per allocation may not be
> that bad.
>
> > diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
> > index 402a999a0d6b..cd569a87c0a8 100644
> > --- a/include/linux/instrumented.h
> > +++ b/include/linux/instrumented.h
> > @@ -68,7 +68,7 @@ static __always_inline void instrument_atomic_read(const volatile void *v, size_
> >  {
> >  	kasan_check_read(v, size);
> >  	kcsan_check_atomic_read(v, size);
> > -	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1)));
> > +	WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ATOMIC) && ((unsigned long)v & (size - 1) & 3));
> >  }
>
> What is the alignment of stack variables on m68k? E.g. if you have a
> function with two local variables, would that still be able to trigger
> the check?
>
> int f(atomic64_t *a)
> {
> 	u16 pad;
> 	u64 old;
>
> 	g(&pad);
> 	atomic64_try_cmpxchg(a, &old, 0);
> }
>

I assume so:

int foo(void) {
	short s;
	long long ll;
	return alignof(ll);
}

# Compilation provided by Compiler Explorer at https://godbolt.org/
foo():
        link.w %fp,#0
        moveq #2,%d0
        unlk %fp
        rts

> Since there is nothing telling the compiler that the 'old' argument to
> atomic*_try_cmpxchg() needs to be naturally aligned, maybe that check
> should be changed to only test for the ABI-guaranteed alignment? I think
> that would still be needed on x86-32.
>

I don't know why we would check the alignment of the 'old' quantity. It's
going to be loaded into a register before being used, right?
On Wed, Oct 1, 2025, at 03:03, Finn Thain wrote:
> On Tue, 30 Sep 2025, Arnd Bergmann wrote:
>> What is the alignment of stack variables on m68k? E.g. if you have a
>> function with two local variables, would that still be able to trigger
>> the check?
>>
>> int f(atomic64_t *a)
>> {
>> 	u16 pad;
>> 	u64 old;
>>
>> 	g(&pad);
>> 	atomic64_try_cmpxchg(a, &old, 0);
>> }
>>
>
> I assume so:
>
> int foo(void) {
> 	short s;
> 	long long ll;
> 	return alignof(ll);
> }
>
> # Compilation provided by Compiler Explorer at https://godbolt.org/
> foo():
>         link.w %fp,#0
>         moveq #2,%d0
>         unlk %fp
>         rts

This just returns the guaranteed alignment of the 'long long' type based
on -malign-int/-mno-align-int. Checking again I find that gcc's
m68k-linux target aligns stack allocations to 4 bytes, though the
m68k-unknown target apparently keeps the 2-byte alignment:

https://gcc.gnu.org/legacy-ml/gcc-patches/2007-09/msg01572.html
https://godbolt.org/z/48fGMj56W

Surprisingly the godbolt.org link also shows a significant overhead when
building the same code with -malign-int in the second tab. This is
unrelated to the issue here, but I wonder if that is something to report
to the gcc bug tracker if we ever get to building the kernel with
-malign-int. Similarly, I noticed that clang does not support the
-malign-int flag on m68k at all.

>> Since there is nothing telling the compiler that the 'old' argument to
>> atomic*_try_cmpxchg() needs to be naturally aligned, maybe that check
>> should be changed to only test for the ABI-guaranteed alignment? I
>> think that would still be needed on x86-32.
>>
>
> I don't know why we would check the alignment of the 'old' quantity.
> It's going to be loaded into a register before being used, right?

I was wondering about that as well, but checking for alignof(*old)
probably can't hurt. The only architectures that actually have a custom
arch_try_cmpxchg*() are s390 and x86 and those don't care about alignment
of 'old', but it's possible that another architecture that can't handle
unaligned load/store would add an inline asm implementation in the future
and break if an alignment fixup happens in the middle of an ll/sc loop.

     Arnd
On Tue, Sep 23, 2025, at 08:28, Finn Thain wrote:
> On Mon, 22 Sep 2025, Arnd Bergmann wrote:
>> On Mon, Sep 22, 2025, at 10:16, Finn Thain wrote:
>
> @@ -8,7 +8,7 @@ struct vfsmount;
>  struct path {
>  	struct vfsmount *mnt;
>  	struct dentry *dentry;
> -} __randomize_layout;
> +} __aligned(sizeof(long)) __randomize_layout;
>
> There's no need: struct path contains a struct dentry, which contains a
> seqcount_spinlock_t, which contains a spinlock_t which contains an
> atomic_t member, which is explicitly aligned.
>
> Despite that, there's still some kmem cache or other allocator somewhere
> that has produced some misaligned path and dentry structures. So we get
> misaligned atomics somewhere in the VFS and TTY layers. I was unable to
> find those allocations.

Ok, I see. Those would certainly be good to find. I would assume that all
kmem caches have a sensible alignment on each architecture, but I think
the definition in linux/slab.h actually ends up setting the minimum to 2
here:

#ifndef ARCH_KMALLOC_MINALIGN
#define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
#elif ARCH_KMALLOC_MINALIGN > 8
#define KMALLOC_MIN_SIZE ARCH_KMALLOC_MINALIGN
#define KMALLOC_SHIFT_LOW ilog2(KMALLOC_MIN_SIZE)
#endif

#ifndef ARCH_SLAB_MINALIGN
#define ARCH_SLAB_MINALIGN __alignof__(unsigned long long)
#endif

Maybe we should just change __alignof__(unsigned long long) to a plain
'8' here and make that the minimum alignment everywhere, same as the
atomic64_t alignment change.

Alternatively, we can keep the __alignof__ here in order to reduce
padding on architectures with a default 4-byte alignment for __u64, but
then override ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN on m68k to be
'4' instead of '2'.

     Arnd
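For illustration, the second option might amount to something like this
sketch on m68k (the asm/cache.h placement is an assumption, as is
overriding both values in one place):

/* arch/m68k/include/asm/cache.h (hypothetical placement): raise the
 * slab minimums from __alignof__(unsigned long long), which is 2 on
 * m68k, to the 4-byte ABI alignment of longs and pointers. */
#define ARCH_SLAB_MINALIGN	4
#define ARCH_KMALLOC_MINALIGN	4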
On Tue, 23 Sep 2025, Arnd Bergmann wrote:

> On Tue, Sep 23, 2025, at 08:28, Finn Thain wrote:
> > On Mon, 22 Sep 2025, Arnd Bergmann wrote:
> >> On Mon, Sep 22, 2025, at 10:16, Finn Thain wrote:
> > @@ -8,7 +8,7 @@ struct vfsmount;
> >  struct path {
> >  	struct vfsmount *mnt;
> >  	struct dentry *dentry;
> > -} __randomize_layout;
> > +} __aligned(sizeof(long)) __randomize_layout;
> >
> > There's no need: struct path contains a struct dentry, which contains
> > a seqcount_spinlock_t, which contains a spinlock_t which contains an
> > atomic_t member, which is explicitly aligned.
> >
> > Despite that, there's still some kmem cache or other allocator
> > somewhere that has produced some misaligned path and dentry
> > structures. So we get misaligned atomics somewhere in the VFS and TTY
> > layers. I was unable to find those allocations.
>
> Ok, I see. Those would certainly be good to find. I would assume that
> all kmem caches have a sensible alignment on each architecture, but I
> think the definition in linux/slab.h actually ends up setting the
> minimum to 2 here:
>
> #ifndef ARCH_KMALLOC_MINALIGN
> #define ARCH_KMALLOC_MINALIGN __alignof__(unsigned long long)
> #elif ARCH_KMALLOC_MINALIGN > 8
> #define KMALLOC_MIN_SIZE ARCH_KMALLOC_MINALIGN
> #define KMALLOC_SHIFT_LOW ilog2(KMALLOC_MIN_SIZE)
> #endif
>
> #ifndef ARCH_SLAB_MINALIGN
> #define ARCH_SLAB_MINALIGN __alignof__(unsigned long long)
> #endif
>
> Maybe we should just change __alignof__(unsigned long long) to a plain
> '8' here and make that the minimum alignment everywhere, same as the
> atomic64_t alignment change.
>

It would be better (less wasteful of memory) to specify the alignment
parameter to kmem_cache_create() only at those call sites where it
matters.

> Alternatively, we can keep the __alignof__ here in order to reduce
> padding on architectures with a default 4-byte alignment for __u64, but
> then override ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN on m68k to
> be '4' instead of '2'.
>

Raising that to 4 would probably have little or no effect (for m68k or
any other arch). Back when I prepared the RFC patch series, I
instrumented create_cache() in mm/slab_common.c, and those caches that
were allocated at boot (for my usual minimal m68k .config) were already
aligned to 4 bytes or 16.

Also, increasing ARCH_SLAB_MINALIGN to 8 didn't solve the VFS/TTY layer
problem I have with CONFIG_DEBUG_ATOMIC on m68k. So the culprit is not
the obvious suspect (a kmem cache of objects with atomic64_t members).
There's some other allocator at work and it's aligning objects to 2 bytes
not 4.
On Tue, Sep 23, 2025, at 10:05, Finn Thain wrote:
> On Tue, 23 Sep 2025, Arnd Bergmann wrote:
>> On Tue, Sep 23, 2025, at 08:28, Finn Thain wrote:
>>
>> Maybe we should just change __alignof__(unsigned long long) to a plain
>> '8' here and make that the minimum alignment everywhere, same as the
>> atomic64_t alignment change.
>>
>
> It would be better (less wasteful of memory) to specify the alignment
> parameter to kmem_cache_create() only at those call sites where it
> matters.

I think that's the idea: __alignof__(unsigned long long) is the alignment
that the ABI requires for allocating arbitrary data structures that may
contain 64-bit integers, so this is already the minimum. Architectures
that may have noncoherent DMA master devices often need to raise the
ARCH_KMALLOC_MINALIGN to the cache line size but leave the
ARCH_SLAB_MINALIGN at the ABI-required minimum value. Any caller that
needs a higher alignment than the ABI minimum needs to specify that.

>> Alternatively, we can keep the __alignof__ here in order to reduce
>> padding on architectures with a default 4-byte alignment for __u64,
>> but then override ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN on m68k
>> to be '4' instead of '2'.
>>
>
> Raising that to 4 would probably have little or no effect (for m68k or
> any other arch). Back when I prepared the RFC patch series, I
> instrumented create_cache() in mm/slab_common.c, and those caches that
> were allocated at boot (for my usual minimal m68k .config) were already
> aligned to 4 bytes or 16.

Ok

> Also, increasing ARCH_SLAB_MINALIGN to 8 didn't solve the VFS/TTY layer
> problem I have with CONFIG_DEBUG_ATOMIC on m68k. So the culprit is not
> the obvious suspect (a kmem cache of objects with atomic64_t members).
> There's some other allocator at work and it's aligning objects to 2
> bytes not 4.

In that case, my guess would be something that concatenates structures in
some variant of

struct s1 { int a; short b; int rest[]; };
struct s2 { atomic_t a; };

struct s1 *obj1 = kmalloc(sizeof(struct s1) + sizeof(struct s2), GFP_KERNEL);
struct s2 *obj2 = (void *)&obj1[1];

which causes problems because s2 has a larger alignment than s1. We've
had problems with DMA to structures like this in the past.

The more common construct that does

struct s1 { short a; struct s2; };

should not produce misaligned members in the inner struct, unless the
outer one has a __packed attribute. Unfortunately we disabled
-Wpacked-not-aligned, which would otherwise catch that problem.

     Arnd
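A userspace sketch of that first pattern (hypothetical structures, with
malloc() standing in for kmalloc()); on the m68k ABI sizeof(struct s1) is
6, so obj2 lands on a 2-byte boundary, while on most other ABIs it is 8
and the problem stays hidden:

#include <stdio.h>
#include <stdlib.h>

struct s1 { int a; short b; int rest[]; };
/* Stand-in for atomic_t with the explicit alignment from the patch. */
struct s2 { int counter __attribute__((aligned(sizeof(int)))); };

int main(void)
{
	struct s1 *obj1 = malloc(sizeof(struct s1) + sizeof(struct s2));
	struct s2 *obj2 = (void *)&obj1[1];	/* trailing object */

	printf("sizeof(struct s1) = %zu\n", sizeof(struct s1));
	/* Non-zero here means the trailing atomic is misaligned. */
	printf("obj2 misalignment = %lu\n",
	       (unsigned long)obj2 % _Alignof(struct s2));
	free(obj1);
	return 0;
}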
Hi Finn,

On Mon, 22 Sept 2025 at 10:16, Finn Thain <fthain@linux-m68k.org> wrote:
> On Mon, 22 Sep 2025, Geert Uytterhoeven wrote:
> > This triggers a failure in kernel/bpf/rqspinlock.c:
> >
> > kernel/bpf/rqspinlock.c: In function ‘bpf_res_spin_lock’:
> > include/linux/compiler_types.h:572:45: error: call to
> > ‘__compiletime_assert_397’ declared with attribute error: BUILD_BUG_ON
> > failed: __alignof__(rqspinlock_t) != __alignof__(struct
> > bpf_res_spin_lock)
> >   572 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> >       |                                             ^
> > include/linux/compiler_types.h:553:25: note: in definition of macro
> > ‘__compiletime_assert’
> >   553 |                         prefix ## suffix();                             \
> >       |                         ^~~~~~
> > include/linux/compiler_types.h:572:9: note: in expansion of macro
> > ‘_compiletime_assert’
> >   572 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> >       |         ^~~~~~~~~~~~~~~~~~~
> > include/linux/build_bug.h:39:37: note: in expansion of macro
> > ‘compiletime_assert’
> >    39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> >       |                                     ^~~~~~~~~~~~~~~~~~
> > include/linux/build_bug.h:50:9: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
> >    50 |         BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
> >       |         ^~~~~~~~~~~~~~~~
> > kernel/bpf/rqspinlock.c:695:9: note: in expansion of macro ‘BUILD_BUG_ON’
> >   695 |         BUILD_BUG_ON(__alignof__(rqspinlock_t) !=
> > __alignof__(struct bpf_res_spin_lock));
> >       |         ^~~~~~~~~~~~
> >
> > I haven't investigated it yet.
>
> Yes, I noticed that also, after I started building with defconfigs.
> I pushed a new patch to my github repo.
>
> https://github.com/fthain/linux/tree/atomic_t

Thanks, the updated version fixes the issue.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker.
But when I'm talking to journalists I just say "programmer" or something
like that.
                                -- Linus Torvalds
Hi Finn,

Thanks for your patch!

On Sun, 14 Sept 2025 at 02:59, Finn Thain <fthain@linux-m68k.org> wrote:
> Some recent commits incorrectly assumed 4-byte alignment of locks.
> That assumption fails on Linux/m68k (and, interestingly, would have
> failed on Linux/cris also). Specify the minimum alignment of atomic
> variables for fewer surprises and (hopefully) better performance.
>
> Consistent with i386, atomic64_t is not given natural alignment here.

You forgot to drop this line.

>
> Cc: Lance Yang <lance.yang@linux.dev>
> Link: https://lore.kernel.org/lkml/CAMuHMdW7Ab13DdGs2acMQcix5ObJK0O2dG_Fxzr8_g58Rc1_0g@mail.gmail.com/
> ---
> Changed since v1:
> - atomic64_t now gets an __aligned attribute too.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker.
But when I'm talking to journalists I just say "programmer" or something
like that.
                                -- Linus Torvalds