tools/include/nolibc/arch-i386.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
Hi Willy,
Just a single quick fix.
The ABI mandates that the %esp register must be a multiple of 16 when
executing a call instruction.
Commit 2ab446336b17 simplified the _start function, but it didn't take
care of the %esp alignment, causing SIGSEGV in SSE and AVX programs that
use aligned move instructions (e.g., movdqa, movaps, and vmovdqa).
$eax : 0x56559000 → 0x00003f90
$ebx : 0x56559000 → 0x00003f90
$ecx : 0x1
$edx : 0xf7fcaaa0 → endbr32
$esp : 0xffffcdbc → 0x00000001
$ebp : 0x0
$esi : 0xffffce7c → 0xffffd096
$edi : 0x56556060 → <_start+0> xor %ebp, %ebp
$eip : 0x56556489 → <sse_pq_add+25> movaps %xmm0, 0x30(%esp)
<sse_pq_add+11> pop %eax
<sse_pq_add+12> add $0x2b85, %eax
<sse_pq_add+18> movups -0x1fd0(%eax), %xmm0
→ <sse_pq_add+25> movaps %xmm0, 0x30(%esp) <== trapping instruction
<sse_pq_add+30> movups -0x1fe0(%eax), %xmm1
<sse_pq_add+37> movaps %xmm1, 0x20(%esp)
<sse_pq_add+42> movups -0x1ff0(%eax), %xmm2
<sse_pq_add+49> movaps %xmm2, 0x10(%esp)
<sse_pq_add+54> movups -0x2000(%eax), %xmm3
[#0] Id 1, Name: "test", stopped 0x56556489 in sse_pq_add (), reason: SIGSEGV
(gdb) bt
#0 0x56556489 in sse_pq_add ()
#1 0x5655608e in main ()
Ensure the %esp is a multiple of 16 when executing the call instruction.
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---
Ammar Faizi (1):
tools/nolibc: i386: Fix a stack misalign bug on _start
tools/include/nolibc/arch-i386.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
base-commit: 6269320850097903b30be8f07a5c61d9f7592393
--
Ammar Faizi
Hi, Ammar
> Hi Willy,
>
> Just a single quick fix.
>
> The ABI mandates that the %esp register must be a multiple of 16 when
> executing a call instruction.
>
> Commit 2ab446336b17 simplified the _start function, but it didn't take
> care of the %esp alignment, causing SIGSEGV in SSE and AVX programs that
> use aligned move instructions (e.g., movdqa, movaps, and vmovdqa).
>
Yeah, I carefully studied the old 'sub $4, %esp' instruction that went
with the old 3 'push' instructions, but in the end forgot to add a new
instruction alongside the new single 'push' instruction to preserve the
16-byte alignment. Very sorry for this bad regression.
> $eax : 0x56559000 → 0x00003f90
> $ebx : 0x56559000 → 0x00003f90
> $ecx : 0x1
> $edx : 0xf7fcaaa0 → endbr32
> $esp : 0xffffcdbc → 0x00000001
> $ebp : 0x0
> $esi : 0xffffce7c → 0xffffd096
> $edi : 0x56556060 → <_start+0> xor %ebp, %ebp
> $eip : 0x56556489 → <sse_pq_add+25> movaps %xmm0, 0x30(%esp)
>
> <sse_pq_add+11> pop %eax
> <sse_pq_add+12> add $0x2b85, %eax
> <sse_pq_add+18> movups -0x1fd0(%eax), %xmm0
> → <sse_pq_add+25> movaps %xmm0, 0x30(%esp) <== trapping instruction
> <sse_pq_add+30> movups -0x1fe0(%eax), %xmm1
> <sse_pq_add+37> movaps %xmm1, 0x20(%esp)
> <sse_pq_add+42> movups -0x1ff0(%eax), %xmm2
> <sse_pq_add+49> movaps %xmm2, 0x10(%esp)
> <sse_pq_add+54> movups -0x2000(%eax), %xmm3
>
> [#0] Id 1, Name: "test", stopped 0x56556489 in sse_pq_add (), reason: SIGSEGV
>
> (gdb) bt
> #0 0x56556489 in sse_pq_add ()
> #1 0x5655608e in main ()
>
Since we have a new 'startup' test group, do you have a short function
to trigger this error?
Perhaps it is time for us to add a new 'stack alignment' test case for
all of the architectures.
Thanks,
Zhangjin
> Ensure the %esp is a multiple of 16 when executing the call instruction.
>
> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> ---
> Ammar Faizi (1):
> tools/nolibc: i386: Fix a stack misalign bug on _start
>
> tools/include/nolibc/arch-i386.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>
> base-commit: 6269320850097903b30be8f07a5c61d9f7592393
> --
> Ammar Faizi
On Sat, Aug 26, 2023 at 11:20:24PM +0800, Zhangjin Wu wrote:
> > $eax : 0x56559000 → 0x00003f90
> > $ebx : 0x56559000 → 0x00003f90
> > $ecx : 0x1
> > $edx : 0xf7fcaaa0 → endbr32
> > $esp : 0xffffcdbc → 0x00000001
> > $ebp : 0x0
> > $esi : 0xffffce7c → 0xffffd096
> > $edi : 0x56556060 → <_start+0> xor %ebp, %ebp
> > $eip : 0x56556489 → <sse_pq_add+25> movaps %xmm0, 0x30(%esp)
> >
> > <sse_pq_add+11> pop %eax
> > <sse_pq_add+12> add $0x2b85, %eax
> > <sse_pq_add+18> movups -0x1fd0(%eax), %xmm0
> > → <sse_pq_add+25> movaps %xmm0, 0x30(%esp) <== trapping instruction
> > <sse_pq_add+30> movups -0x1fe0(%eax), %xmm1
> > <sse_pq_add+37> movaps %xmm1, 0x20(%esp)
> > <sse_pq_add+42> movups -0x1ff0(%eax), %xmm2
> > <sse_pq_add+49> movaps %xmm2, 0x10(%esp)
> > <sse_pq_add+54> movups -0x2000(%eax), %xmm3
> >
> > [#0] Id 1, Name: "test", stopped 0x56556489 in sse_pq_add (), reason: SIGSEGV
> >
> > (gdb) bt
> > #0 0x56556489 in sse_pq_add ()
> > #1 0x5655608e in main ()
> >
>
> Since we have a new 'startup' test group, do you have a short function
> to trigger this error?
Here is a simple program to test the stack alignment.
#include "tools/include/nolibc/nolibc.h"

__asm__ (
"main:\n"
    /*
     * When the call main is executed, the
     * %esp is 16 bytes aligned.
     *
     * Then, on function entry (%esp mod 16) == 12
     * because the call instruction pushes 4 bytes
     * onto the stack.
     *
     * subl $12, %esp will make (%esp mod 16) == 0
     * again.
     */
    "subl $12, %esp\n"

    /*
     * These move instructions will crash if %esp is
     * not a multiple of 16.
     */
    "movdqa (%esp), %xmm0\n"
    "movdqa %xmm0, (%esp)\n"
    "movaps (%esp), %xmm0\n"
    "movaps %xmm0, (%esp)\n"

    "addl $12, %esp\n"
    "xorl %eax, %eax\n"
    "ret\n"
);
> Perhaps it is time for us to add a new 'stack alignment' test case for
> all of the architectures.
I don't know the alignment rules for other architectures (I only work on
x86 and x86-64). While waiting for the maintainers' comment, I'll leave
the test case decision to you. Feel free to take the above code.
Extra:
It's also fine if you take my patch with the 'sub $(16 - 4), %esp'
change and batch it together in your next series.
--
Ammar Faizi
Hi, Ammar
> On Sat, Aug 26, 2023 at 11:20:24PM +0800, Zhangjin Wu wrote:
> > > $eax : 0x56559000 → 0x00003f90
> > > $ebx : 0x56559000 → 0x00003f90
> > > $ecx : 0x1
> > > $edx : 0xf7fcaaa0 → endbr32
> > > $esp : 0xffffcdbc → 0x00000001
> > > $ebp : 0x0
> > > $esi : 0xffffce7c → 0xffffd096
> > > $edi : 0x56556060 → <_start+0> xor %ebp, %ebp
> > > $eip : 0x56556489 → <sse_pq_add+25> movaps %xmm0, 0x30(%esp)
> > >
> > > <sse_pq_add+11> pop %eax
> > > <sse_pq_add+12> add $0x2b85, %eax
> > > <sse_pq_add+18> movups -0x1fd0(%eax), %xmm0
> > > → <sse_pq_add+25> movaps %xmm0, 0x30(%esp) <== trapping instruction
> > > <sse_pq_add+30> movups -0x1fe0(%eax), %xmm1
> > > <sse_pq_add+37> movaps %xmm1, 0x20(%esp)
> > > <sse_pq_add+42> movups -0x1ff0(%eax), %xmm2
> > > <sse_pq_add+49> movaps %xmm2, 0x10(%esp)
> > > <sse_pq_add+54> movups -0x2000(%eax), %xmm3
> > >
> > > [#0] Id 1, Name: "test", stopped 0x56556489 in sse_pq_add (), reason: SIGSEGV
> > >
> > > (gdb) bt
> > > #0 0x56556489 in sse_pq_add ()
> > > #1 0x5655608e in main ()
> > >
> >
> > Since we have a new 'startup' test group, do you have a short function
> > to trigger this error?
>
> Here is a simple program to test the stack alignment.
>
> #include "tools/include/nolibc/nolibc.h"
>
> __asm__ (
> "main:\n"
>     /*
>      * When the call main is executed, the
>      * %esp is 16 bytes aligned.
>      *
>      * Then, on function entry (%esp mod 16) == 12
>      * because the call instruction pushes 4 bytes
>      * onto the stack.
>      *
>      * subl $12, %esp will make (%esp mod 16) == 0
>      * again.
>      */
>     "subl $12, %esp\n"
>
>     /*
>      * These move instructions will crash if %esp is
>      * not a multiple of 16.
>      */
>     "movdqa (%esp), %xmm0\n"
>     "movdqa %xmm0, (%esp)\n"
>     "movaps (%esp), %xmm0\n"
>     "movaps %xmm0, (%esp)\n"
>
>     "addl $12, %esp\n"
>     "xorl %eax, %eax\n"
>     "ret\n"
> );
>
Thanks very much for sharing this code.
> > Perhaps it is time for us to add a new 'stack alignment' test case for
> > all of the architectures.
>
> I don't know the alignment rules for other architectures (I only work on
> x86 and x86-64). While waiting for the maintainers' comment, I'll leave
> the test case decision to you. Feel free to take the above code.
>
Yes, the stack alignment rule is architecture dependent, so we need more
discussion and more work. I'm not sure whether there is a 'C' test
function that works for all of them; let's delay this until after v6.6.
> Extra:
> It's also fine if you take my patch with the 'sub $(16 - 4), %esp'
> change and batch it together in your next series.
>
Ammar, your fixup patch is urgent since our _start_c() is for v6.6-rc1
(already in linux-next). Let's wait for comments from Thomas or Willy;
they will decide whether to merge it directly or require a v2.
I'm ok with the v1 code, but the old comment does not look that clear.
Thanks,
Zhangjin
> --
> Ammar Faizi