[PATCH v3 03/19] x86/startup_64: Drop long return to initial_code pointer

Ard Biesheuvel posted 19 patches 1 year, 10 months ago
There is a newer version of this series
[PATCH v3 03/19] x86/startup_64: Drop long return to initial_code pointer
Posted by Ard Biesheuvel 1 year, 10 months ago
From: Ard Biesheuvel <ardb@kernel.org>

Since commit 866b556efa12 ("x86/head/64: Install startup GDT"), the
primary startup sequence sets the code segment register (CS) to __KERNEL_CS
before calling into the startup code shared between primary and
secondary boot.

This means a simple indirect call is sufficient here.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/kernel/head_64.S | 35 ++------------------
 1 file changed, 3 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d4918d03efb4..4017a49d7b76 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -428,39 +428,10 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	movq	%r15, %rdi
 
 .Ljump_to_C_code:
-	/*
-	 * Jump to run C code and to be on a real kernel address.
-	 * Since we are running on identity-mapped space we have to jump
-	 * to the full 64bit address, this is only possible as indirect
-	 * jump.  In addition we need to ensure %cs is set so we make this
-	 * a far return.
-	 *
-	 * Note: do not change to far jump indirect with 64bit offset.
-	 *
-	 * AMD does not support far jump indirect with 64bit offset.
-	 * AMD64 Architecture Programmer's Manual, Volume 3: states only
-	 *	JMP FAR mem16:16 FF /5 Far jump indirect,
-	 *		with the target specified by a far pointer in memory.
-	 *	JMP FAR mem16:32 FF /5 Far jump indirect,
-	 *		with the target specified by a far pointer in memory.
-	 *
-	 * Intel64 does support 64bit offset.
-	 * Software Developer Manual Vol 2: states:
-	 *	FF /5 JMP m16:16 Jump far, absolute indirect,
-	 *		address given in m16:16
-	 *	FF /5 JMP m16:32 Jump far, absolute indirect,
-	 *		address given in m16:32.
-	 *	REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
-	 *		address given in m16:64.
-	 */
-	pushq	$.Lafter_lret	# put return address on stack for unwinder
 	xorl	%ebp, %ebp	# clear frame pointer
-	movq	initial_code(%rip), %rax
-	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
-	lretq
-.Lafter_lret:
-	ANNOTATE_NOENDBR
+	ANNOTATE_RETPOLINE_SAFE
+	callq	*initial_code(%rip)
+	int3
 SYM_CODE_END(secondary_startup_64)
 
 #include "verify_cpu.S"
-- 
2.43.0.429.g432eaa2c6b-goog
Re: [PATCH v3 03/19] x86/startup_64: Drop long return to initial_code pointer
Posted by Borislav Petkov 1 year, 10 months ago
On Mon, Jan 29, 2024 at 07:05:06PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Since commit 866b556efa12 ("x86/head/64: Install startup GDT"), the
> primary startup sequence sets the code segment register (CS) to __KERNEL_CS
> before calling into the startup code shared between primary and
> secondary boot.
> 
> This means a simple indirect call is sufficient here.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/kernel/head_64.S | 35 ++------------------
>  1 file changed, 3 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index d4918d03efb4..4017a49d7b76 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -428,39 +428,10 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>  	movq	%r15, %rdi
>  
>  .Ljump_to_C_code:
> -	/*
> -	 * Jump to run C code and to be on a real kernel address.
> -	 * Since we are running on identity-mapped space we have to jump
> -	 * to the full 64bit address, this is only possible as indirect
> -	 * jump.  In addition we need to ensure %cs is set so we make this
> -	 * a far return.
> -	 *
> -	 * Note: do not change to far jump indirect with 64bit offset.
> -	 *
> -	 * AMD does not support far jump indirect with 64bit offset.
> -	 * AMD64 Architecture Programmer's Manual, Volume 3: states only
> -	 *	JMP FAR mem16:16 FF /5 Far jump indirect,
> -	 *		with the target specified by a far pointer in memory.
> -	 *	JMP FAR mem16:32 FF /5 Far jump indirect,
> -	 *		with the target specified by a far pointer in memory.
> -	 *
> -	 * Intel64 does support 64bit offset.
> -	 * Software Developer Manual Vol 2: states:
> -	 *	FF /5 JMP m16:16 Jump far, absolute indirect,
> -	 *		address given in m16:16
> -	 *	FF /5 JMP m16:32 Jump far, absolute indirect,
> -	 *		address given in m16:32.
> -	 *	REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
> -	 *		address given in m16:64.
> -	 */
> -	pushq	$.Lafter_lret	# put return address on stack for unwinder
>  	xorl	%ebp, %ebp	# clear frame pointer
> -	movq	initial_code(%rip), %rax
> -	pushq	$__KERNEL_CS	# set correct cs
> -	pushq	%rax		# target address in negative space
> -	lretq
> -.Lafter_lret:
> -	ANNOTATE_NOENDBR
> +	ANNOTATE_RETPOLINE_SAFE
> +	callq	*initial_code(%rip)
> +	int3
>  SYM_CODE_END(secondary_startup_64)
>  
>  #include "verify_cpu.S"

objtool doesn't like it yet:

vmlinux.o: warning: objtool: verify_cpu+0x0: stack state mismatch: cfa1=4+8 cfa2=-1+0

Once we've solved this, I'll take this one even now - very nice cleanup!

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v3 03/19] x86/startup_64: Drop long return to initial_code pointer
Posted by Ard Biesheuvel 1 year, 10 months ago
On Wed, 31 Jan 2024 at 14:45, Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, Jan 29, 2024 at 07:05:06PM +0100, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Since commit 866b556efa12 ("x86/head/64: Install startup GDT"), the
> > primary startup sequence sets the code segment register (CS) to __KERNEL_CS
> > before calling into the startup code shared between primary and
> > secondary boot.
> >
> > This means a simple indirect call is sufficient here.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/x86/kernel/head_64.S | 35 ++------------------
> >  1 file changed, 3 insertions(+), 32 deletions(-)
> >
> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > index d4918d03efb4..4017a49d7b76 100644
> > --- a/arch/x86/kernel/head_64.S
> > +++ b/arch/x86/kernel/head_64.S
> > @@ -428,39 +428,10 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
> >       movq    %r15, %rdi
> >
> >  .Ljump_to_C_code:
> > -     /*
> > -      * Jump to run C code and to be on a real kernel address.
> > -      * Since we are running on identity-mapped space we have to jump
> > -      * to the full 64bit address, this is only possible as indirect
> > -      * jump.  In addition we need to ensure %cs is set so we make this
> > -      * a far return.
> > -      *
> > -      * Note: do not change to far jump indirect with 64bit offset.
> > -      *
> > -      * AMD does not support far jump indirect with 64bit offset.
> > -      * AMD64 Architecture Programmer's Manual, Volume 3: states only
> > -      *      JMP FAR mem16:16 FF /5 Far jump indirect,
> > -      *              with the target specified by a far pointer in memory.
> > -      *      JMP FAR mem16:32 FF /5 Far jump indirect,
> > -      *              with the target specified by a far pointer in memory.
> > -      *
> > -      * Intel64 does support 64bit offset.
> > -      * Software Developer Manual Vol 2: states:
> > -      *      FF /5 JMP m16:16 Jump far, absolute indirect,
> > -      *              address given in m16:16
> > -      *      FF /5 JMP m16:32 Jump far, absolute indirect,
> > -      *              address given in m16:32.
> > -      *      REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
> > -      *              address given in m16:64.
> > -      */
> > -     pushq   $.Lafter_lret   # put return address on stack for unwinder
> >       xorl    %ebp, %ebp      # clear frame pointer
> > -     movq    initial_code(%rip), %rax
> > -     pushq   $__KERNEL_CS    # set correct cs
> > -     pushq   %rax            # target address in negative space
> > -     lretq
> > -.Lafter_lret:
> > -     ANNOTATE_NOENDBR
> > +     ANNOTATE_RETPOLINE_SAFE
> > +     callq   *initial_code(%rip)
> > +     int3
> >  SYM_CODE_END(secondary_startup_64)
> >
> >  #include "verify_cpu.S"
>
> objtool doesn't like it yet:
>
> vmlinux.o: warning: objtool: verify_cpu+0x0: stack state mismatch: cfa1=4+8 cfa2=-1+0
>
> Once we've solved this, I'll take this one even now - very nice cleanup!
>

s/int3/RET seems to do the trick.

As long as there is an instruction that follows the callq, the
unwinder will see secondary_startup_64 at the base of the call stack.
We never return here anyway.
Re: [PATCH v3 03/19] x86/startup_64: Drop long return to initial_code pointer
Posted by Ard Biesheuvel 1 year, 10 months ago
On Wed, 31 Jan 2024 at 14:57, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Wed, 31 Jan 2024 at 14:45, Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Mon, Jan 29, 2024 at 07:05:06PM +0100, Ard Biesheuvel wrote:
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > Since commit 866b556efa12 ("x86/head/64: Install startup GDT"), the
> > > primary startup sequence sets the code segment register (CS) to __KERNEL_CS
> > > before calling into the startup code shared between primary and
> > > secondary boot.
> > >
> > > This means a simple indirect call is sufficient here.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  arch/x86/kernel/head_64.S | 35 ++------------------
> > >  1 file changed, 3 insertions(+), 32 deletions(-)
> > >
> > > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > > index d4918d03efb4..4017a49d7b76 100644
> > > --- a/arch/x86/kernel/head_64.S
> > > +++ b/arch/x86/kernel/head_64.S
> > > @@ -428,39 +428,10 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
> > >       movq    %r15, %rdi
> > >
> > >  .Ljump_to_C_code:
> > > -     /*
> > > -      * Jump to run C code and to be on a real kernel address.
> > > -      * Since we are running on identity-mapped space we have to jump
> > > -      * to the full 64bit address, this is only possible as indirect
> > > -      * jump.  In addition we need to ensure %cs is set so we make this
> > > -      * a far return.
> > > -      *
> > > -      * Note: do not change to far jump indirect with 64bit offset.
> > > -      *
> > > -      * AMD does not support far jump indirect with 64bit offset.
> > > -      * AMD64 Architecture Programmer's Manual, Volume 3: states only
> > > -      *      JMP FAR mem16:16 FF /5 Far jump indirect,
> > > -      *              with the target specified by a far pointer in memory.
> > > -      *      JMP FAR mem16:32 FF /5 Far jump indirect,
> > > -      *              with the target specified by a far pointer in memory.
> > > -      *
> > > -      * Intel64 does support 64bit offset.
> > > -      * Software Developer Manual Vol 2: states:
> > > -      *      FF /5 JMP m16:16 Jump far, absolute indirect,
> > > -      *              address given in m16:16
> > > -      *      FF /5 JMP m16:32 Jump far, absolute indirect,
> > > -      *              address given in m16:32.
> > > -      *      REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
> > > -      *              address given in m16:64.
> > > -      */
> > > -     pushq   $.Lafter_lret   # put return address on stack for unwinder
> > >       xorl    %ebp, %ebp      # clear frame pointer
> > > -     movq    initial_code(%rip), %rax
> > > -     pushq   $__KERNEL_CS    # set correct cs
> > > -     pushq   %rax            # target address in negative space
> > > -     lretq
> > > -.Lafter_lret:
> > > -     ANNOTATE_NOENDBR
> > > +     ANNOTATE_RETPOLINE_SAFE
> > > +     callq   *initial_code(%rip)
> > > +     int3
> > >  SYM_CODE_END(secondary_startup_64)
> > >
> > >  #include "verify_cpu.S"
> >
> > objtool doesn't like it yet:
> >
> > vmlinux.o: warning: objtool: verify_cpu+0x0: stack state mismatch: cfa1=4+8 cfa2=-1+0
> >
> > Once we've solved this, I'll take this one even now - very nice cleanup!
> >
>
> s/int3/RET seems to do the trick.
>

or ud2, even better,
Re: [PATCH v3 03/19] x86/startup_64: Drop long return to initial_code pointer
Posted by Borislav Petkov 1 year, 10 months ago
On Wed, Jan 31, 2024 at 03:07:50PM +0100, Ard Biesheuvel wrote:
> > s/int3/RET seems to do the trick.
> >
> or ud2, even better,

Yap, that does it. And yes, we don't return here. I guess objtool
complains because

"7. file: warning: objtool: func()+0x5c: stack state mismatch

   The instruction's frame pointer state is inconsistent, depending on
   which execution path was taken to reach the instruction.

   ...

   Another possibility is that the code has some asm or inline asm which
   does some unusual things to the stack or the frame pointer.  In such
   cases it's probably appropriate to use the unwind hint macros in
   asm/unwind_hints.h.
"

Lemme test this one a bit on my machines and queue it.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
[tip: x86/boot] x86/startup_64: Drop long return to initial_code pointer
Posted by tip-bot2 for Ard Biesheuvel 1 year, 10 months ago
The following commit has been merged into the x86/boot branch of tip:

Commit-ID:     15675706241887ed7fdad9e91f4bf977b9896d0f
Gitweb:        https://git.kernel.org/tip/15675706241887ed7fdad9e91f4bf977b9896d0f
Author:        Ard Biesheuvel <ardb@kernel.org>
AuthorDate:    Mon, 29 Jan 2024 19:05:06 +01:00
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Wed, 31 Jan 2024 18:31:21 +01:00

x86/startup_64: Drop long return to initial_code pointer

Since

  866b556efa12 ("x86/head/64: Install startup GDT")

the primary startup sequence sets the code segment register (CS) to
__KERNEL_CS before calling into the startup code shared between primary
and secondary boot.

This means a simple indirect call is sufficient here.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240129180502.4069817-24-ardb+git@google.com
---
 arch/x86/kernel/head_64.S | 35 +++--------------------------------
 1 file changed, 3 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d4918d0..bfbac50 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -428,39 +428,10 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	movq	%r15, %rdi
 
 .Ljump_to_C_code:
-	/*
-	 * Jump to run C code and to be on a real kernel address.
-	 * Since we are running on identity-mapped space we have to jump
-	 * to the full 64bit address, this is only possible as indirect
-	 * jump.  In addition we need to ensure %cs is set so we make this
-	 * a far return.
-	 *
-	 * Note: do not change to far jump indirect with 64bit offset.
-	 *
-	 * AMD does not support far jump indirect with 64bit offset.
-	 * AMD64 Architecture Programmer's Manual, Volume 3: states only
-	 *	JMP FAR mem16:16 FF /5 Far jump indirect,
-	 *		with the target specified by a far pointer in memory.
-	 *	JMP FAR mem16:32 FF /5 Far jump indirect,
-	 *		with the target specified by a far pointer in memory.
-	 *
-	 * Intel64 does support 64bit offset.
-	 * Software Developer Manual Vol 2: states:
-	 *	FF /5 JMP m16:16 Jump far, absolute indirect,
-	 *		address given in m16:16
-	 *	FF /5 JMP m16:32 Jump far, absolute indirect,
-	 *		address given in m16:32.
-	 *	REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
-	 *		address given in m16:64.
-	 */
-	pushq	$.Lafter_lret	# put return address on stack for unwinder
 	xorl	%ebp, %ebp	# clear frame pointer
-	movq	initial_code(%rip), %rax
-	pushq	$__KERNEL_CS	# set correct cs
-	pushq	%rax		# target address in negative space
-	lretq
-.Lafter_lret:
-	ANNOTATE_NOENDBR
+	ANNOTATE_RETPOLINE_SAFE
+	callq	*initial_code(%rip)
+	ud2
 SYM_CODE_END(secondary_startup_64)
 
 #include "verify_cpu.S"