sized_strscpy() performs word-at-a-time writes to the destination
buffer. If the destination buffer is not aligned to unsigned long,
direct assignment causes UBSAN misaligned-access errors.
Use put_unaligned() to safely write the words to the destination.
Fixes: 30035e45753b7 ("string: provide strscpy()")
Signed-off-by: Fuad Tabba <tabba@google.com>
---
lib/string.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/string.c b/lib/string.c
index b632c71df1a5..a1697bf72078 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -157,16 +157,16 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
 		if (has_zero(c, &data, &constants)) {
 			data = prep_zero_mask(c, data, &constants);
 			data = create_zero_mask(data);
-			*(unsigned long *)(dest+res) = c & zero_bytemask(data);
+			put_unaligned(c & zero_bytemask(data), (unsigned long *)(dest+res));
 			return res + find_zero(data);
 		}
 		count -= sizeof(unsigned long);
 		if (unlikely(!count)) {
 			c &= ALLBUTLAST_BYTE_MASK;
-			*(unsigned long *)(dest+res) = c;
+			put_unaligned(c, (unsigned long *)(dest+res));
 			return -E2BIG;
 		}
-		*(unsigned long *)(dest+res) = c;
+		put_unaligned(c, (unsigned long *)(dest+res));
 		res += sizeof(unsigned long);
 		max -= sizeof(unsigned long);
 	}
--
2.53.0.371.g1d285c8824-goog
On Tue, Feb 24, 2026 at 05:04:27PM +0000, Fuad Tabba wrote:
> sized_strscpy() performs word-at-a-time writes to the destination
> buffer. If the destination buffer is not aligned to unsigned long,
> direct assignment causes UBSAN misaligned-access errors.
Is this via CONFIG_UBSAN_ALIGNMENT=y ? Note this in the Kconfig:
	  Enabling this option on architectures that support unaligned
	  accesses may produce a lot of false positives.
which architecture are you checking this on?
> Use put_unaligned() to safely write the words to the destination.
Also, I thought the word-at-a-time work in sized_strscpy() was
specifically to take advantage of aligned word writes? This doesn't seem
like the right solution, and I think we're already disabling the
unaligned access by using "max=0" in the earlier checks.
I think the bug may be that you got CONFIG_UBSAN_ALIGNMENT enabled for
an arch that doesn't suffer from unaligned access problems. :) We should
fix the Kconfig!
-Kees
>
> Fixes: 30035e45753b7 ("string: provide strscpy()")
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> lib/string.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/lib/string.c b/lib/string.c
> index b632c71df1a5..a1697bf72078 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -157,16 +157,16 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
>  		if (has_zero(c, &data, &constants)) {
>  			data = prep_zero_mask(c, data, &constants);
>  			data = create_zero_mask(data);
> -			*(unsigned long *)(dest+res) = c & zero_bytemask(data);
> +			put_unaligned(c & zero_bytemask(data), (unsigned long *)(dest+res));
>  			return res + find_zero(data);
>  		}
>  		count -= sizeof(unsigned long);
>  		if (unlikely(!count)) {
>  			c &= ALLBUTLAST_BYTE_MASK;
> -			*(unsigned long *)(dest+res) = c;
> +			put_unaligned(c, (unsigned long *)(dest+res));
>  			return -E2BIG;
>  		}
> -		*(unsigned long *)(dest+res) = c;
> +		put_unaligned(c, (unsigned long *)(dest+res));
>  		res += sizeof(unsigned long);
>  		max -= sizeof(unsigned long);
>  	}
> --
> 2.53.0.371.g1d285c8824-goog
>
>
--
Kees Cook
Hi Kees,
On Tue, 24 Feb 2026 at 21:07, Kees Cook <kees@kernel.org> wrote:
>
> On Tue, Feb 24, 2026 at 05:04:27PM +0000, Fuad Tabba wrote:
> > sized_strscpy() performs word-at-a-time writes to the destination
> > buffer. If the destination buffer is not aligned to unsigned long,
> > direct assignment causes UBSAN misaligned-access errors.
>
> Is this via CONFIG_UBSAN_ALIGNMENT=y ? Note this in the Kconfig:
>
> Enabling this option on architectures that support unaligned
> accesses may produce a lot of false positives.
>
> which architecture are you checking this on?
This is with CONFIG_UBSAN_ALIGNMENT=y on arm64. Although the
architecture supports unaligned accesses, I was running the UBSAN
checks (including the alignment ones) the other day while debugging an
unrelated issue. That said, the alignment checks ensure C standard
compliance and prevent the compiler from optimizing unaligned UB casts
into alignment-strict instructions (like ldp/stp or vector
instructions on arm64, which cause hardware faults).
> > Use put_unaligned() to safely write the words to the destination.
>
> Also, I thought the word-at-a-time work in sized_strscpy() was
> specifically to take advantage of aligned word writes? This doesn't seem
> like the right solution, and I think we're already disabling the
> unaligned access by using "max=0" in the earlier checks.
The max=0 check is guarded by config options. Both x86 and arm64
select CONFIG_DCACHE_WORD_ACCESS, which bypasses it entirely:
#ifndef CONFIG_DCACHE_WORD_ACCESS
// ... alignment checks that set max = 0 ...
#endif
I also noticed that the read path already expects and handles
unaligned addresses. If you look at load_unaligned_zeropad() (called
above the write), it explicitly loads an unaligned word and handles
potential page-crossing faults. The write path lacked the equivalent
put_unaligned() wrapper, leaving it exposed to UB.
I checked the disassembly on both x86 and aarch64: put_unaligned()
(via __builtin_memcpy) compiles to the same instructions (mov and
str), preserving the optimization while making the code UBSAN-clean.
> I think the bug may be that you got CONFIG_UBSAN_ALIGNMENT enabled for
> an arch that doesn't suffer from unaligned access problems. :) We should
> fix the Kconfig!
Does that reasoning make sense for keeping the fix here rather than in
the Kconfig?
Cheers,
/fuad
> -Kees
>
> >
> > Fixes: 30035e45753b7 ("string: provide strscpy()")
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> > lib/string.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/string.c b/lib/string.c
> > index b632c71df1a5..a1697bf72078 100644
> > --- a/lib/string.c
> > +++ b/lib/string.c
> > @@ -157,16 +157,16 @@ ssize_t sized_strscpy(char *dest, const char *src, size_t count)
> >  		if (has_zero(c, &data, &constants)) {
> >  			data = prep_zero_mask(c, data, &constants);
> >  			data = create_zero_mask(data);
> > -			*(unsigned long *)(dest+res) = c & zero_bytemask(data);
> > +			put_unaligned(c & zero_bytemask(data), (unsigned long *)(dest+res));
> >  			return res + find_zero(data);
> >  		}
> >  		count -= sizeof(unsigned long);
> >  		if (unlikely(!count)) {
> >  			c &= ALLBUTLAST_BYTE_MASK;
> > -			*(unsigned long *)(dest+res) = c;
> > +			put_unaligned(c, (unsigned long *)(dest+res));
> >  			return -E2BIG;
> >  		}
> > -		*(unsigned long *)(dest+res) = c;
> > +		put_unaligned(c, (unsigned long *)(dest+res));
> >  		res += sizeof(unsigned long);
> >  		max -= sizeof(unsigned long);
> >  	}
> > --
> > 2.53.0.371.g1d285c8824-goog
> >
> >
>
> --
> Kees Cook
On Wed, 25 Feb 2026 08:08:34 +0000
Fuad Tabba <tabba@google.com> wrote:
...
> I also noticed that the read path already expects and handles
> unaligned addresses. If you look at load_unaligned_zeropad() (called
> above the write), it explicitly loads an unaligned word and handles
> potential page-crossing faults. The write path lacked the equivalent
> put_unaligned() wrapper, leaving it exposed to UB.
Not really, the read side is doing reads that might go past the '\0'
that terminates the string.
If they are misaligned they can fault even though the string is
correctly terminated.
OTOH the write side must not write beyond the terminating '\0'
so the memory must always be there.
I didn't look at exactly how the 'word at a time' version terminates.
For strlen() doing it at all is pretty marginal for 32bit.
For strcpy() it may depend on whether the byte writes get merged
in the cpu's store buffer.
On 64bit you have to try quite hard to stop the compiler making
'a right pigs breakfast' of generating the constants; it is pretty
bad on x86-64, mips64 is a lot worse.
On LE with fast 'bit scan' you can quickly determine the number of
partial bytes - so they can be written without re-reading from
memory.
The actual problem with that version of strlen() is that it is
only faster for strings above (about, IIRC) 64 bytes long.
So in the kernel it is pretty much a complete waste of time.
The same is probably true for strscpy().
My guess is that the fastest code uses the 'unrolled once' loop:
	do {
		if ((dst[len] = src[len]) == 0)
			break;
		if ((dst[len + 1] = src[len + 1]) == 0) {
			len++;
			break;
		}
	} while ((len += 2) < lim);
(Provided it gets compiled reasonably).
Without the writes that was pretty much the best strlen() on the few
CPUs I tried it on.
David
On Wed, Feb 25, 2026 at 08:08:34AM +0000, Fuad Tabba wrote:
> On Tue, 24 Feb 2026 at 21:07, Kees Cook <kees@kernel.org> wrote:
> > On Tue, Feb 24, 2026 at 05:04:27PM +0000, Fuad Tabba wrote:
> > > sized_strscpy() performs word-at-a-time writes to the destination
> > > buffer. If the destination buffer is not aligned to unsigned long,
> > > direct assignment causes UBSAN misaligned-access errors.
> >
> > Is this via CONFIG_UBSAN_ALIGNMENT=y ? Note this in the Kconfig:
> >
> >	  Enabling this option on architectures that support unaligned
> >	  accesses may produce a lot of false positives.
> >
> > which architecture are you checking this on?
>
> This is with CONFIG_UBSAN_ALIGNMENT=y on arm64. Although the
> architecture supports unaligned accesses, I was running the UBSAN
> checks (including the alignment ones) the other day while debugging an
> unrelated issue. That said, the alignment checks ensure C standard
> compliance and prevent the compiler from optimizing unaligned UB casts
> into alignment-strict instructions (like ldp/stp or vector
> instructions on arm64, which cause hardware faults).
>
> > > Use put_unaligned() to safely write the words to the destination.
> >
> > Also, I thought the word-at-a-time work in sized_strscpy() was
> > specifically to take advantage of aligned word writes? This doesn't seem
> > like the right solution, and I think we're already disabling the
> > unaligned access by using "max=0" in the earlier checks.
>
> The max=0 check is heavily guarded. Both x86 and arm64 select
> CONFIG_DCACHE_WORD_ACCESS, bypassing it:
>
> #ifndef CONFIG_DCACHE_WORD_ACCESS
> // ... alignment checks that set max = 0 ...
> #endif
>
> I also noticed that the read path already expects and handles
> unaligned addresses. If you look at load_unaligned_zeropad() (called
> above the write), it explicitly loads an unaligned word and handles
> potential page-crossing faults. The write path lacked the equivalent
> put_unaligned() wrapper, leaving it exposed to UB.

Probably it needs to be reworked differently to provide a
write_at_a_time() helper?

> I checked the disassembly on both x86 and aarch64: put_unaligned()
> (via __builtin_memcpy) compiles to the same instructions (mov and
> str), preserving the optimization while making the code UBSAN-clean.

You need to check this on _all_ supported architectures with all possible
related configuration option combinations.

> > I think the bug may be that you got CONFIG_UBSAN_ALIGNMENT enabled for
> > an arch that doesn't suffer from unaligned access problems. :) We should
> > fix the Kconfig!
>
> Does that reasoning make sense for keeping the fix here rather than in
> the Kconfig?

--
With Best Regards,
Andy Shevchenko
On Tue, Feb 24, 2026 at 05:04:27PM +0000, Fuad Tabba wrote:
> sized_strscpy() performs word-at-a-time writes to the destination
> buffer. If the destination buffer is not aligned to unsigned long,
> direct assignment causes UBSAN misaligned-access errors.
>
> Use put_unaligned() to safely write the words to the destination.
Have you measured the performance impact?
Have you read the comment near to
if (IS_ENABLED(CONFIG_KMSAN))
?
--
With Best Regards,
Andy Shevchenko
Hi Andy,

On Tue, 24 Feb 2026 at 17:21, Andy Shevchenko
<andriy.shevchenko@intel.com> wrote:
>
> On Tue, Feb 24, 2026 at 05:04:27PM +0000, Fuad Tabba wrote:
> > sized_strscpy() performs word-at-a-time writes to the destination
> > buffer. If the destination buffer is not aligned to unsigned long,
> > direct assignment causes UBSAN misaligned-access errors.
> >
> > Use put_unaligned() to safely write the words to the destination.
>
> Have you measured the performance impact?

Not directly. I verified the disassembly for both x86_64 and aarch64.
On x86_64, both the raw pointer cast and put_unaligned() compile down
to mov %rdi,(%rsi). On aarch64, both compile to str x0, [x1].

> Have you read the comment near to
>
>	if (IS_ENABLED(CONFIG_KMSAN))

Not until now, to be honest. However, are you asking whether
put_unaligned() breaks KMSAN? I don't think it does: max is set to 0
when KMSAN is enabled, so this entire while loop is bypassed.

Thanks,
/fuad

> ?
>
> --
> With Best Regards,
> Andy Shevchenko
On Tue, 24 Feb 2026 17:54:07 +0000
Fuad Tabba <tabba@google.com> wrote:

> Hi Andy,
>
> On Tue, 24 Feb 2026 at 17:21, Andy Shevchenko
> <andriy.shevchenko@intel.com> wrote:
> >
> > On Tue, Feb 24, 2026 at 05:04:27PM +0000, Fuad Tabba wrote:
> > > sized_strscpy() performs word-at-a-time writes to the destination
> > > buffer. If the destination buffer is not aligned to unsigned long,
> > > direct assignment causes UBSAN misaligned-access errors.
> > >
> > > Use put_unaligned() to safely write the words to the destination.
> >
> > Have you measured the performance impact?
>
> Not directly. I verified the disassembly for both x86_64 and aarch64.
> On x86_64, both the raw pointer cast and put_unaligned() compile down
> to mov %rdi,(%rsi). On aarch64, both compile to str x0, [x1].

What happens on a cpu that traps misaligned accesses (eg sparc64)?
put_unaligned() exists because it can be horrid.

	David

> > Have you read the comment near to
> >
> >	if (IS_ENABLED(CONFIG_KMSAN))
>
> Not until now to be honest. However, are you asking whether
> put_unaligned() breaks KMSAN? I don't think it does, max is set to 0
> when KMSAN is enabled, this entire while loop is bypassed.
>
> Thanks,
> /fuad
Hi David,
On Tue, 24 Feb 2026 at 23:06, David Laight <david.laight.linux@gmail.com> wrote:
>
> On Tue, 24 Feb 2026 17:54:07 +0000
> Fuad Tabba <tabba@google.com> wrote:
>
> > Hi Andy,
> >
> > On Tue, 24 Feb 2026 at 17:21, Andy Shevchenko
> > <andriy.shevchenko@intel.com> wrote:
> > >
> > > On Tue, Feb 24, 2026 at 05:04:27PM +0000, Fuad Tabba wrote:
> > > > sized_strscpy() performs word-at-a-time writes to the destination
> > > > buffer. If the destination buffer is not aligned to unsigned long,
> > > > direct assignment causes UBSAN misaligned-access errors.
> > > >
> > > > Use put_unaligned() to safely write the words to the destination.
> > >
> > > Have you measured the performance impact?
> >
> > Not directly. I verified the disassembly for both x86_64 and aarch64.
> > On x86_64, both the raw pointer cast and put_unaligned() compile down
> > to mov %rdi,(%rsi). On aarch64, both compile to str x0, [x1].
>
> What happens on cpu that trap misaligned accesses (eg sparc64)?
> put_unaligned() exists because it can be horrid.
To be honest, I hadn't considered this until now. But looking at it, I
believe that the existing guards in sized_strscpy() already protect
architectures like sparc from the unaligned paths you are concerned
about. Looking at the code and configs, these do not select
CONFIG_DCACHE_WORD_ACCESS nor CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
Because of this, they fall into the #else block early in
sized_strscpy():
#ifndef CONFIG_DCACHE_WORD_ACCESS
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
...
#else
	/* If src or dest is unaligned, don't do word-at-a-time. */
	if (((long) dest | (long) src) & (sizeof(long) - 1))
		max = 0;
#endif
#endif
If either dest or src is unaligned, max is set to 0. This bypasses the
loop with put_unaligned(). I checked by compiling this for sparc: if
aligned, the compiler sees that, and optimizes it into an 8-byte store
(stx %i0, [%i1]), identical to the raw pointer cast.
So this patch shouldn't introduce memcpy fallback penalties on sparc,
but it still fixes the UB on architectures like x86 and arm64.
Cheers,
/fuad
> David
>
> >
> > > Have you read the comment near to
> > >
> > > if (IS_ENABLED(CONFIG_KMSAN))
> >
> > Not until now to be honest. However, are you asking whether
> > put_unaligned() breaks KMSAN? I don't think it does, max is set to 0
> > when KMSAN is enabled, this entire while loop is bypassed.
> >
> > Thanks,
> > /fuad
> >
> > > ?
> > >
> > > --
> > > With Best Regards,
> > > Andy Shevchenko
> > >
> > >
> >
>
On Wed, 25 Feb 2026 08:33:01 +0000
Fuad Tabba <tabba@google.com> wrote:

> Hi David,
>
> On Tue, 24 Feb 2026 at 23:06, David Laight <david.laight.linux@gmail.com> wrote:
> >
> > What happens on a cpu that traps misaligned accesses (eg sparc64)?
> > put_unaligned() exists because it can be horrid.
>
> To be honest, I hadn't considered this until now. But looking at it, I
> believe that the existing guards in sized_strscpy() already protect
> architectures like sparc from the unaligned paths you are concerned
> about. Looking at the code and configs, these do not select
> CONFIG_DCACHE_WORD_ACCESS nor CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
> Because of this, they fall into the #else block early in
> sized_strscpy().
>
> If either dest or src is unaligned, max is set to 0. This bypasses the
> loop with put_unaligned(). I checked by compiling this for sparc: if
> aligned, the compiler sees that, and optimizes it into an 8-byte store
> (stx %i0, [%i1]), identical to the raw pointer cast.

That very much depends on exactly how get/put_unaligned are implemented
(and the behaviour of the compiler).
ISTR something about not using 'casts to packed types' for them,
which might cause the compiler to generate other code.
(Brain can't quite remember...)

	David

> So this patch shouldn't introduce memcpy fallback penalties on sparc,
> but it still fixes the UB on architectures like x86 and arm64.
>
> Cheers,
> /fuad
Hi,

[Andy]
> You need to check this on _all_ supported architectures with all possible
> related configuration option combinations.

[David]
> That very much depends on the exactly how get/put_unaligned are implemented
> (and the behaviour of the compiler).
> ISTR something about not using 'casts to packed types' for the them,
> which might cause the compiler to generate other code.
> (Brain can't quite remember...)

Fair points all around. To be honest, proving the behavior across all
supported architecture/config combinations is more than I can chew on
right now. I initially stumbled across this UBSAN splat on arm64 while
debugging an unrelated issue, and I thought a targeted put_unaligned()
swap would be a straightforward fix.

Given the complexity of the architectural and compiler quirks you've
raised, I agree this needs a much deeper investigation, or potentially
a new write_word_at_a_time() abstraction, as Andy suggested. I'll drop
this patch for the time being. If I have the bandwidth in the future,
and if this splat starts causing real problems, I might give it
another go.

Thanks for the thorough review and insights! At least I learned a bit
from this.

Cheers,
/fuad