[PATCH v6 00/15] x86-64: Stack protector and percpu improvements

Brian Gerst posted 15 patches 10 months, 4 weeks ago
arch/x86/Kconfig                          |  11 +-
arch/x86/Makefile                         |  20 +--
arch/x86/boot/compressed/misc.c           |  14 +--
arch/x86/entry/entry.S                    |   2 -
arch/x86/entry/entry_64.S                 |   2 +-
arch/x86/include/asm/desc.h               |   1 -
arch/x86/include/asm/elf.h                |   3 +-
arch/x86/include/asm/percpu.h             |  22 ----
arch/x86/include/asm/processor.h          |  28 +----
arch/x86/include/asm/stackprotector.h     |  36 +-----
arch/x86/kernel/Makefile                  |   2 +
arch/x86/kernel/asm-offsets_64.c          |   6 -
arch/x86/kernel/cpu/common.c              |   9 +-
arch/x86/kernel/head64.c                  |   2 +-
arch/x86/kernel/head_64.S                 |  20 ++-
arch/x86/kernel/irq_64.c                  |   1 -
arch/x86/kernel/module.c                  |  15 +++
arch/x86/kernel/setup_percpu.c            |  12 +-
arch/x86/kernel/vmlinux.lds.S             |  35 ------
arch/x86/platform/pvh/head.S              |  14 ++-
arch/x86/tools/relocs.c                   | 147 ++--------------------
arch/x86/xen/xen-head.S                   |  10 +-
include/asm-generic/sections.h            |   2 +-
include/asm-generic/vmlinux.lds.h         |  38 +-----
include/linux/percpu-defs.h               |  12 --
init/Kconfig                              |   5 -
kernel/kallsyms.c                         |  12 +-
mm/percpu.c                               |   4 +-
scripts/gcc-x86_32-has-stack-protector.sh |   8 --
scripts/gcc-x86_64-has-stack-protector.sh |   4 -
scripts/kallsyms.c                        |  72 ++---------
scripts/link-vmlinux.sh                   |   4 -
scripts/min-tool-version.sh               |   2 +
33 files changed, 100 insertions(+), 475 deletions(-)
delete mode 100755 scripts/gcc-x86_32-has-stack-protector.sh
delete mode 100755 scripts/gcc-x86_64-has-stack-protector.sh
[PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Brian Gerst 10 months, 4 weeks ago
Currently, x86-64 uses an unusual percpu layout, where the percpu section
is linked at absolute address 0.  The reason behind this is that older GCC
versions placed the stack protector (if enabled) at a fixed offset from the
GS segment base.  Since the GS segement is also used for percpu variables,
this forced the current layout.

GCC since version 8.1 supports a configurable location for the stack
protector value, which allows removal of the restriction on how the percpu
section is linked.  This allows the percpu section to be linked normally,
like other architectures.  In turn, this allows removal of code that was
needed to support the zero-based percpu section.

v6:
- Rebased to current tip tree
- Dropped patches already applied
- Fixed typos in commit messages
- Added Reviewed-by tags

Ard Biesheuvel (1):
  x86/module: Deal with GOT based stack cookie load on Clang < 17

Brian Gerst (14):
  x86: Raise minimum GCC version to 8.1
  x86/stackprotector: Remove stack protector test scripts
  x86/boot: Disable stack protector for early boot code
  x86/pvh: Use fixed_percpu_data for early boot GSBASE
  x86/relocs: Handle R_X86_64_REX_GOTPCRELX relocations
  x86/stackprotector/64: Convert to normal percpu variable
  x86/percpu/64: Use relative percpu offsets
  x86/percpu/64: Remove fixed_percpu_data
  x86/boot/64: Remove inverse relocations
  x86/percpu/64: Remove INIT_PER_CPU macros
  percpu: Remove PER_CPU_FIRST_SECTION
  percpu: Remove PERCPU_VADDR()
  percpu: Remove __per_cpu_load
  kallsyms: Remove KALLSYMS_ABSOLUTE_PERCPU

 arch/x86/Kconfig                          |  11 +-
 arch/x86/Makefile                         |  20 +--
 arch/x86/boot/compressed/misc.c           |  14 +--
 arch/x86/entry/entry.S                    |   2 -
 arch/x86/entry/entry_64.S                 |   2 +-
 arch/x86/include/asm/desc.h               |   1 -
 arch/x86/include/asm/elf.h                |   3 +-
 arch/x86/include/asm/percpu.h             |  22 ----
 arch/x86/include/asm/processor.h          |  28 +----
 arch/x86/include/asm/stackprotector.h     |  36 +-----
 arch/x86/kernel/Makefile                  |   2 +
 arch/x86/kernel/asm-offsets_64.c          |   6 -
 arch/x86/kernel/cpu/common.c              |   9 +-
 arch/x86/kernel/head64.c                  |   2 +-
 arch/x86/kernel/head_64.S                 |  20 ++-
 arch/x86/kernel/irq_64.c                  |   1 -
 arch/x86/kernel/module.c                  |  15 +++
 arch/x86/kernel/setup_percpu.c            |  12 +-
 arch/x86/kernel/vmlinux.lds.S             |  35 ------
 arch/x86/platform/pvh/head.S              |  14 ++-
 arch/x86/tools/relocs.c                   | 147 ++--------------------
 arch/x86/xen/xen-head.S                   |  10 +-
 include/asm-generic/sections.h            |   2 +-
 include/asm-generic/vmlinux.lds.h         |  38 +-----
 include/linux/percpu-defs.h               |  12 --
 init/Kconfig                              |   5 -
 kernel/kallsyms.c                         |  12 +-
 mm/percpu.c                               |   4 +-
 scripts/gcc-x86_32-has-stack-protector.sh |   8 --
 scripts/gcc-x86_64-has-stack-protector.sh |   4 -
 scripts/kallsyms.c                        |  72 ++---------
 scripts/link-vmlinux.sh                   |   4 -
 scripts/min-tool-version.sh               |   2 +
 33 files changed, 100 insertions(+), 475 deletions(-)
 delete mode 100755 scripts/gcc-x86_32-has-stack-protector.sh
 delete mode 100755 scripts/gcc-x86_64-has-stack-protector.sh


base-commit: b79d90e018587507fd42c4c888956668692ff431
-- 
2.47.1
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Ingo Molnar 10 months ago
* Brian Gerst <brgerst@gmail.com> wrote:

> Currently, x86-64 uses an unusual percpu layout, where the percpu section
> is linked at absolute address 0.  The reason behind this is that older GCC
> versions placed the stack protector (if enabled) at a fixed offset from the
> GS segment base.  Since the GS segement is also used for percpu variables,
> this forced the current layout.
> 
> GCC since version 8.1 supports a configurable location for the stack
> protector value, which allows removal of the restriction on how the percpu
> section is linked.  This allows the percpu section to be linked normally,
> like other architectures.  In turn, this allows removal of code that was
> needed to support the zero-based percpu section.
> 
> v6:
> - Rebased to current tip tree
> - Dropped patches already applied
> - Fixed typos in commit messages
> - Added Reviewed-by tags
> 
> Ard Biesheuvel (1):
>   x86/module: Deal with GOT based stack cookie load on Clang < 17
> 
> Brian Gerst (14):
>   x86: Raise minimum GCC version to 8.1
>   x86/stackprotector: Remove stack protector test scripts
>   x86/boot: Disable stack protector for early boot code
>   x86/pvh: Use fixed_percpu_data for early boot GSBASE
>   x86/relocs: Handle R_X86_64_REX_GOTPCRELX relocations
>   x86/stackprotector/64: Convert to normal percpu variable
>   x86/percpu/64: Use relative percpu offsets
>   x86/percpu/64: Remove fixed_percpu_data
>   x86/boot/64: Remove inverse relocations
>   x86/percpu/64: Remove INIT_PER_CPU macros
>   percpu: Remove PER_CPU_FIRST_SECTION
>   percpu: Remove PERCPU_VADDR()
>   percpu: Remove __per_cpu_load
>   kallsyms: Remove KALLSYMS_ABSOLUTE_PERCPU

>  33 files changed, 100 insertions(+), 475 deletions(-)
>  delete mode 100755 scripts/gcc-x86_32-has-stack-protector.sh
>  delete mode 100755 scripts/gcc-x86_64-has-stack-protector.sh

Thank you for doing this series - it all looks pretty good from my side 
and I've applied it experimentally to tip:x86/asm. I fixed up the trivial 
details other reviewers and me noticed.

Note that the merge is tentative, it might still need a rebase if some 
fundamental problem comes up - but let's see how testing goes in -next.

Thanks,

	Ingo
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Uros Bizjak 10 months ago
On Tue, Feb 18, 2025 at 10:22 AM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Brian Gerst <brgerst@gmail.com> wrote:
>
> > Currently, x86-64 uses an unusual percpu layout, where the percpu section
> > is linked at absolute address 0.  The reason behind this is that older GCC
> > versions placed the stack protector (if enabled) at a fixed offset from the
> > GS segment base.  Since the GS segement is also used for percpu variables,
> > this forced the current layout.
> >
> > GCC since version 8.1 supports a configurable location for the stack
> > protector value, which allows removal of the restriction on how the percpu
> > section is linked.  This allows the percpu section to be linked normally,
> > like other architectures.  In turn, this allows removal of code that was
> > needed to support the zero-based percpu section.
> >
> > v6:
> > - Rebased to current tip tree
> > - Dropped patches already applied
> > - Fixed typos in commit messages
> > - Added Reviewed-by tags
> >
> > Ard Biesheuvel (1):
> >   x86/module: Deal with GOT based stack cookie load on Clang < 17
> >
> > Brian Gerst (14):
> >   x86: Raise minimum GCC version to 8.1
> >   x86/stackprotector: Remove stack protector test scripts
> >   x86/boot: Disable stack protector for early boot code
> >   x86/pvh: Use fixed_percpu_data for early boot GSBASE
> >   x86/relocs: Handle R_X86_64_REX_GOTPCRELX relocations
> >   x86/stackprotector/64: Convert to normal percpu variable
> >   x86/percpu/64: Use relative percpu offsets
> >   x86/percpu/64: Remove fixed_percpu_data
> >   x86/boot/64: Remove inverse relocations
> >   x86/percpu/64: Remove INIT_PER_CPU macros
> >   percpu: Remove PER_CPU_FIRST_SECTION
> >   percpu: Remove PERCPU_VADDR()
> >   percpu: Remove __per_cpu_load
> >   kallsyms: Remove KALLSYMS_ABSOLUTE_PERCPU
>
> >  33 files changed, 100 insertions(+), 475 deletions(-)
> >  delete mode 100755 scripts/gcc-x86_32-has-stack-protector.sh
> >  delete mode 100755 scripts/gcc-x86_64-has-stack-protector.sh
>
> Thank you for doing this series - it all looks pretty good from my side
> and I've applied it experimentally to tip:x86/asm. I fixed up the trivial
> details other reviewers and me noticed.
>
> Note that the merge is tentative, it might still need a rebase if some
> fundamental problem comes up - but let's see how testing goes in -next.

I wonder if there would be any benefit if stack canary is put into
struct pcpu_hot?

Uros.
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Ingo Molnar 10 months ago
* Uros Bizjak <ubizjak@gmail.com> wrote:

> > Thank you for doing this series - it all looks pretty good from my 
> > side and I've applied it experimentally to tip:x86/asm. I fixed up 
> > the trivial details other reviewers and me noticed.
> >
> > Note that the merge is tentative, it might still need a rebase if 
> > some fundamental problem comes up - but let's see how testing goes 
> > in -next.
> 
> I wonder if there would be any benefit if stack canary is put into 
> struct pcpu_hot?

It should definitely be one of the hottest data structures on x86, so 
moving it there makes sense even if it cannot be measured explicitly.

Thanks,

	Ingo
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Brian Gerst 10 months ago
On Wed, Feb 19, 2025 at 6:47 AM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > > Thank you for doing this series - it all looks pretty good from my
> > > side and I've applied it experimentally to tip:x86/asm. I fixed up
> > > the trivial details other reviewers and me noticed.
> > >
> > > Note that the merge is tentative, it might still need a rebase if
> > > some fundamental problem comes up - but let's see how testing goes
> > > in -next.
> >
> > I wonder if there would be any benefit if stack canary is put into
> > struct pcpu_hot?
>
> It should definitely be one of the hottest data structures on x86, so
> moving it there makes sense even if it cannot be measured explicitly.
>

It would have to be done with linker tricks, since you can't make the
compiler use a struct member directly.


Brian Gerst
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Ingo Molnar 10 months ago
* Brian Gerst <brgerst@gmail.com> wrote:

> On Wed, Feb 19, 2025 at 6:47 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > > > Thank you for doing this series - it all looks pretty good from my
> > > > side and I've applied it experimentally to tip:x86/asm. I fixed up
> > > > the trivial details other reviewers and me noticed.
> > > >
> > > > Note that the merge is tentative, it might still need a rebase if
> > > > some fundamental problem comes up - but let's see how testing goes
> > > > in -next.
> > >
> > > I wonder if there would be any benefit if stack canary is put into
> > > struct pcpu_hot?
> >
> > It should definitely be one of the hottest data structures on x86, so
> > moving it there makes sense even if it cannot be measured explicitly.
> >
> 
> It would have to be done with linker tricks, since you can't make the 
> compiler use a struct member directly.

Probably not worth it then?

Thanks,

	Ingo
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Brian Gerst 10 months ago
On Thu, Feb 20, 2025 at 8:26 AM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Brian Gerst <brgerst@gmail.com> wrote:
>
> > On Wed, Feb 19, 2025 at 6:47 AM Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > >
> > > * Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > > > Thank you for doing this series - it all looks pretty good from my
> > > > > side and I've applied it experimentally to tip:x86/asm. I fixed up
> > > > > the trivial details other reviewers and me noticed.
> > > > >
> > > > > Note that the merge is tentative, it might still need a rebase if
> > > > > some fundamental problem comes up - but let's see how testing goes
> > > > > in -next.
> > > >
> > > > I wonder if there would be any benefit if stack canary is put into
> > > > struct pcpu_hot?
> > >
> > > It should definitely be one of the hottest data structures on x86, so
> > > moving it there makes sense even if it cannot be measured explicitly.
> > >
> >
> > It would have to be done with linker tricks, since you can't make the
> > compiler use a struct member directly.
>
> Probably not worth it then?

Actually it wasn't so bad since we already had the hack for
__ref_stack_chk_guard.  Do you want the patches now or when the dust
settles on the original series?


Brian Gerst
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Ingo Molnar 9 months, 4 weeks ago
* Brian Gerst <brgerst@gmail.com> wrote:

> On Thu, Feb 20, 2025 at 8:26 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Brian Gerst <brgerst@gmail.com> wrote:
> >
> > > On Wed, Feb 19, 2025 at 6:47 AM Ingo Molnar <mingo@kernel.org> wrote:
> > > >
> > > >
> > > > * Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > > > Thank you for doing this series - it all looks pretty good from my
> > > > > > side and I've applied it experimentally to tip:x86/asm. I fixed up
> > > > > > the trivial details other reviewers and me noticed.
> > > > > >
> > > > > > Note that the merge is tentative, it might still need a rebase if
> > > > > > some fundamental problem comes up - but let's see how testing goes
> > > > > > in -next.
> > > > >
> > > > > I wonder if there would be any benefit if stack canary is put into
> > > > > struct pcpu_hot?
> > > >
> > > > It should definitely be one of the hottest data structures on x86, so
> > > > moving it there makes sense even if it cannot be measured explicitly.
> > > >
> > >
> > > It would have to be done with linker tricks, since you can't make the
> > > compiler use a struct member directly.
> >
> > Probably not worth it then?
> 
> Actually it wasn't so bad since we already had the hack for 
> __ref_stack_chk_guard.  Do you want the patches now or when the dust 
> settles on the original series?

We can add it now I suppose, while memories are fresh and all that.

Thanks,

	Ingo
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Uros Bizjak 10 months ago
On Wed, Feb 19, 2025 at 2:18 PM Brian Gerst <brgerst@gmail.com> wrote:
>
> On Wed, Feb 19, 2025 at 6:47 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > > > Thank you for doing this series - it all looks pretty good from my
> > > > side and I've applied it experimentally to tip:x86/asm. I fixed up
> > > > the trivial details other reviewers and me noticed.
> > > >
> > > > Note that the merge is tentative, it might still need a rebase if
> > > > some fundamental problem comes up - but let's see how testing goes
> > > > in -next.
> > >
> > > I wonder if there would be any benefit if stack canary is put into
> > > struct pcpu_hot?
> >
> > It should definitely be one of the hottest data structures on x86, so
> > moving it there makes sense even if it cannot be measured explicitly.
> >
>
> It would have to be done with linker tricks, since you can't make the
> compiler use a struct member directly.

Something like the attached patch?

It boots and runs without problems.

However, when building the kernel, I get "Absolute relocations
present" warning with thousands of locations:

  RELOCS  arch/x86/boot/compressed/vmlinux.relocs
WARNING: Absolute relocations present
Offset     Info     Type     Sym.Value Sym.Name
ffffffff81200826 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
__ref_stack_chk_guard
ffffffff81201493 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
__ref_stack_chk_guard
ffffffff81201714 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
__ref_stack_chk_guard
ffffffff81201d66 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
__ref_stack_chk_guard
...
ffffffff834e2a13 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
__ref_stack_chk_guard
ffffffff834e2a6a 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
__ref_stack_chk_guard

  RSTRIP  vmlinux

which I don't understand. Looking at the first one:

ffffffff8120081d <force_ibs_eilvt_setup.cold>:
ffffffff8120081d:    48 8b 44 24 08           mov    0x8(%rsp),%rax
ffffffff81200822:    65 48 2b 05 f6 ed 30     sub
%gs:0x230edf6(%rip),%rax        # ffffffff8350f620
<__ref_stack_chk_guard>
ffffffff81200829:    02

I don't think this is absolute relocation, see (%rip).

The kernel was compiled with gcc-14.2.1, so clang specific issue was not tested.

Uros.
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 20be5758c2d2..940efc07f2c1 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -691,7 +691,7 @@ SYM_CODE_START(__switch_to_asm)
 
 #ifdef CONFIG_STACKPROTECTOR
 	movl	TASK_stack_canary(%edx), %ebx
-	movl	%ebx, PER_CPU_VAR(__stack_chk_guard)
+	movl	%ebx, PER_CPU_VAR(pcpu_hot + X86_stack_canary)
 #endif
 
 	/*
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 49d3b222fe99..4f4c0cf4963f 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -193,7 +193,7 @@ SYM_FUNC_START(__switch_to_asm)
 
 #ifdef CONFIG_STACKPROTECTOR
 	movq	TASK_stack_canary(%rsi), %rbx
-	movq	%rbx, PER_CPU_VAR(__stack_chk_guard)
+	movq	%rbx, PER_CPU_VAR(pcpu_hot + X86_stack_canary)
 #endif
 
 	/*
diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h
index bf5953883ec3..a4d515cd6a31 100644
--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -22,6 +22,9 @@ struct pcpu_hot {
 			u64			call_depth;
 #endif
 			unsigned long		top_of_stack;
+#ifdef CONFIG_STACKPROTECTOR
+			unsigned long		stack_canary;
+#endif
 			void			*hardirq_stack_ptr;
 			u16			softirq_pending;
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
index d43fb589fcf6..5e5229ac1c46 100644
--- a/arch/x86/include/asm/stackprotector.h
+++ b/arch/x86/include/asm/stackprotector.h
@@ -20,8 +20,6 @@
 
 #include <linux/sched.h>
 
-DECLARE_PER_CPU(unsigned long, __stack_chk_guard);
-
 /*
  * Initialize the stackprotector canary value.
  *
@@ -38,12 +36,12 @@ static __always_inline void boot_init_stack_canary(void)
 	unsigned long canary = get_random_canary();
 
 	current->stack_canary = canary;
-	this_cpu_write(__stack_chk_guard, canary);
+	this_cpu_write(pcpu_hot.stack_canary, canary);
 }
 
 static inline void cpu_init_stack_canary(int cpu, struct task_struct *idle)
 {
-	per_cpu(__stack_chk_guard, cpu) = idle->stack_canary;
+	per_cpu(pcpu_hot.stack_canary, cpu) = idle->stack_canary;
 }
 
 #else	/* STACKPROTECTOR */
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index a98020bf31bb..59e8b294cbdc 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -109,6 +109,9 @@ static void __used common(void)
 	OFFSET(TSS_sp2, tss_struct, x86_tss.sp2);
 	OFFSET(X86_top_of_stack, pcpu_hot, top_of_stack);
 	OFFSET(X86_current_task, pcpu_hot, current_task);
+#ifdef CONFIG_STACKPROTECTOR
+	OFFSET(X86_stack_canary, pcpu_hot, stack_canary);
+#endif
 #ifdef CONFIG_MITIGATION_CALL_DEPTH_TRACKING
 	OFFSET(X86_call_depth, pcpu_hot, call_depth);
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 21078907af57..f30d6a9c4abd 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -24,7 +24,6 @@
 #include <linux/io.h>
 #include <linux/syscore_ops.h>
 #include <linux/pgtable.h>
-#include <linux/stackprotector.h>
 #include <linux/utsname.h>
 
 #include <asm/alternative.h>
@@ -2087,13 +2086,6 @@ void syscall_init(void)
 }
 #endif /* CONFIG_X86_64 */
 
-#ifdef CONFIG_STACKPROTECTOR
-DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
-#ifndef CONFIG_SMP
-EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
-#endif
-#endif
-
 /*
  * Clear all 6 debug registers:
  */
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 1769a7126224..a35e4ebe032e 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -468,7 +468,7 @@ SECTIONS
 	   "kernel image bigger than KERNEL_IMAGE_SIZE");
 
 /* needed for Clang - see arch/x86/entry/entry.S */
-PROVIDE(__ref_stack_chk_guard = __stack_chk_guard);
+PROVIDE(__ref_stack_chk_guard = pcpu_hot + X86_stack_canary);
 
 #ifdef CONFIG_X86_64
 
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Ard Biesheuvel 10 months ago
Hi Uros,

On Thu, 20 Feb 2025 at 10:51, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Feb 19, 2025 at 2:18 PM Brian Gerst <brgerst@gmail.com> wrote:
> >
> > On Wed, Feb 19, 2025 at 6:47 AM Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > >
> > > * Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > > > Thank you for doing this series - it all looks pretty good from my
> > > > > side and I've applied it experimentally to tip:x86/asm. I fixed up
> > > > > the trivial details other reviewers and me noticed.
> > > > >
> > > > > Note that the merge is tentative, it might still need a rebase if
> > > > > some fundamental problem comes up - but let's see how testing goes
> > > > > in -next.
> > > >
> > > > I wonder if there would be any benefit if stack canary is put into
> > > > struct pcpu_hot?
> > >
> > > It should definitely be one of the hottest data structures on x86, so
> > > moving it there makes sense even if it cannot be measured explicitly.
> > >
> >
> > It would have to be done with linker tricks, since you can't make the
> > compiler use a struct member directly.
>
> Something like the attached patch?
>

Interesting take. I'd have tried to put the canary at offset 0x0, and
simply use pcpu_hot as the guard symbol.


> It boots and runs without problems.
>
> However, when building the kernel, I get "Absolute relocations
> present" warning with thousands of locations:
>
>   RELOCS  arch/x86/boot/compressed/vmlinux.relocs
> WARNING: Absolute relocations present
> Offset     Info     Type     Sym.Value Sym.Name
> ffffffff81200826 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
> __ref_stack_chk_guard
> ffffffff81201493 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
> __ref_stack_chk_guard
> ffffffff81201714 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
> __ref_stack_chk_guard
> ffffffff81201d66 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
> __ref_stack_chk_guard
> ...
> ffffffff834e2a13 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
> __ref_stack_chk_guard
> ffffffff834e2a6a 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
> __ref_stack_chk_guard
>
>   RSTRIP  vmlinux
>
> which I don't understand. Looking at the first one:
>
> ffffffff8120081d <force_ibs_eilvt_setup.cold>:
> ffffffff8120081d:    48 8b 44 24 08           mov    0x8(%rsp),%rax
> ffffffff81200822:    65 48 2b 05 f6 ed 30     sub
> %gs:0x230edf6(%rip),%rax        # ffffffff8350f620
> <__ref_stack_chk_guard>
> ffffffff81200829:    02
>
> I don't think this is absolute relocation, see (%rip).
>

The warning is about the type of __ref_stack_chk_guard, not about the
type of the relocation.

$ nm vmlinux |grep \\s__ref_sta
ffffffff8350c620 A __ref_stack_chk_guard

Without your patch:

$ nm vmlinux |grep \\s__ref_sta
ffffffff834fba10 D __ref_stack_chk_guard
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Uros Bizjak 10 months ago
On Thu, Feb 20, 2025 at 11:05 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Hi Uros,
>
> On Thu, 20 Feb 2025 at 10:51, Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Wed, Feb 19, 2025 at 2:18 PM Brian Gerst <brgerst@gmail.com> wrote:
> > >
> > > On Wed, Feb 19, 2025 at 6:47 AM Ingo Molnar <mingo@kernel.org> wrote:
> > > >
> > > >
> > > > * Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > > > Thank you for doing this series - it all looks pretty good from my
> > > > > > side and I've applied it experimentally to tip:x86/asm. I fixed up
> > > > > > the trivial details other reviewers and me noticed.
> > > > > >
> > > > > > Note that the merge is tentative, it might still need a rebase if
> > > > > > some fundamental problem comes up - but let's see how testing goes
> > > > > > in -next.
> > > > >
> > > > > I wonder if there would be any benefit if stack canary is put into
> > > > > struct pcpu_hot?
> > > >
> > > > It should definitely be one of the hottest data structures on x86, so
> > > > moving it there makes sense even if it cannot be measured explicitly.
> > > >
> > >
> > > It would have to be done with linker tricks, since you can't make the
> > > compiler use a struct member directly.
> >
> > Something like the attached patch?
> >
>
> Interesting take. I'd have tried to put the canary at offset 0x0, and
> simply use pcpu_hot as the guard symbol.
>
>
> > It boots and runs without problems.
> >
> > However, when building the kernel, I get "Absolute relocations
> > present" warning with thousands of locations:
> >
> >   RELOCS  arch/x86/boot/compressed/vmlinux.relocs
> > WARNING: Absolute relocations present
> > Offset     Info     Type     Sym.Value Sym.Name
> > ffffffff81200826 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
> > __ref_stack_chk_guard
> > ffffffff81201493 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
> > __ref_stack_chk_guard
> > ffffffff81201714 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
> > __ref_stack_chk_guard
> > ffffffff81201d66 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
> > __ref_stack_chk_guard
> > ...
> > ffffffff834e2a13 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
> > __ref_stack_chk_guard
> > ffffffff834e2a6a 0003259e00000002 R_X86_64_PC32 ffffffff8350f620
> > __ref_stack_chk_guard
> >
> >   RSTRIP  vmlinux
> >
> > which I don't understand. Looking at the first one:
> >
> > ffffffff8120081d <force_ibs_eilvt_setup.cold>:
> > ffffffff8120081d:    48 8b 44 24 08           mov    0x8(%rsp),%rax
> > ffffffff81200822:    65 48 2b 05 f6 ed 30     sub
> > %gs:0x230edf6(%rip),%rax        # ffffffff8350f620
> > <__ref_stack_chk_guard>
> > ffffffff81200829:    02
> >
> > I don't think this is absolute relocation, see (%rip).
> >
>
> The warning is about the type of __ref_stack_chk_guard, not about the
> type of the relocation.

Thanks, I got distracted by the text of the warning that mentions relocation.

> $ nm vmlinux |grep \\s__ref_sta
> ffffffff8350c620 A __ref_stack_chk_guard
>
> Without your patch:
>
> $ nm vmlinux |grep \\s__ref_sta
> ffffffff834fba10 D __ref_stack_chk_guard

Is this a problem in our specific case? While the symbol is absolute,
the relocation is still relative, so IMO it should be OK even with
your ongoing rip-relative efforts in mind.

We can list the symbol in arch/x86/tools/relocs.c to quiet the
warning, but I would need some help with auditing the symbol itself.

OTOH, we could simply do it your way and put stack canary at the
beginning of pcpu_hot structure, with

static_assert(offsetof(struct pcpu_hot, stack_canary) == 0));

for good measure.

Uros.
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Ard Biesheuvel 10 months ago
On Thu, 20 Feb 2025 at 11:46, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Feb 20, 2025 at 11:05 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > Hi Uros,
> >
> > On Thu, 20 Feb 2025 at 10:51, Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Wed, Feb 19, 2025 at 2:18 PM Brian Gerst <brgerst@gmail.com> wrote:
> > > >
> > > > On Wed, Feb 19, 2025 at 6:47 AM Ingo Molnar <mingo@kernel.org> wrote:
> > > > >
> > > > > * Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > > I wonder if there would be any benefit if stack canary is put into
> > > > > > struct pcpu_hot?
> > > > >
> > > > > It should definitely be one of the hottest data structures on x86, so
> > > > > moving it there makes sense even if it cannot be measured explicitly.
> > > > >
> > > >
> > > > It would have to be done with linker tricks, since you can't make the
> > > > compiler use a struct member directly.
> > >
> >
> > Interesting take. I'd have tried to put the canary at offset 0x0, and
> > simply use pcpu_hot as the guard symbol.
> >
> >
> > > It boots and runs without problems.
> > >
> > > However, when building the kernel, I get "Absolute relocations
> > > present" warning with thousands of locations:
> > >
...
> >
> > The warning is about the type of __ref_stack_chk_guard, not about the
> > type of the relocation.
>
> Thanks, I got distracted by the text of the warning that mentions relocation.
>
> > $ nm vmlinux |grep \\s__ref_sta
> > ffffffff8350c620 A __ref_stack_chk_guard
> >
> > Without your patch:
> >
> > $ nm vmlinux |grep \\s__ref_sta
> > ffffffff834fba10 D __ref_stack_chk_guard
>
> Is this a problem in our specific case?

I don't think so - the whole notion of absolute ELF symbols is rather
flaky IME, so I don't think we should be pedantic here.

> We can list the symbol in arch/x86/tools/relocs.c to quiet the
> warning, but I would need some help with auditing the symbol itself.
>
> OTOH, we could simply do it your way and put stack canary at the
> beginning of pcpu_hot structure, with
>
> static_assert(offsetof(struct pcpu_hot, stack_canary) == 0));
>
> for good measure.

I think this would be the most straight-forward if there are no other
locality concerns this might interfere with.
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Brian Gerst 10 months ago
On Thu, Feb 20, 2025 at 5:52 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Thu, 20 Feb 2025 at 11:46, Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, Feb 20, 2025 at 11:05 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > Hi Uros,
> > >
> > > On Thu, 20 Feb 2025 at 10:51, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Wed, Feb 19, 2025 at 2:18 PM Brian Gerst <brgerst@gmail.com> wrote:
> > > > >
> > > > > On Wed, Feb 19, 2025 at 6:47 AM Ingo Molnar <mingo@kernel.org> wrote:
> > > > > >
> > > > > > * Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > >
> > > > > > > I wonder if there would be any benefit if stack canary is put into
> > > > > > > struct pcpu_hot?
> > > > > >
> > > > > > It should definitely be one of the hottest data structures on x86, so
> > > > > > moving it there makes sense even if it cannot be measured explicitly.
> > > > > >
> > > > >
> > > > > It would have to be done with linker tricks, since you can't make the
> > > > > compiler use a struct member directly.
> > > >
> > >
> > > Interesting take. I'd have tried to put the canary at offset 0x0, and
> > > simply use pcpu_hot as the guard symbol.
> > >
> > >
> > > > It boots and runs without problems.
> > > >
> > > > However, when building the kernel, I get "Absolute relocations
> > > > present" warning with thousands of locations:
> > > >
> ...
> > >
> > > The warning is about the type of __ref_stack_chk_guard, not about the
> > > type of the relocation.
> >
> > Thanks, I got distracted by the text of the warning that mentions relocation.
> >
> > > $ nm vmlinux |grep \\s__ref_sta
> > > ffffffff8350c620 A __ref_stack_chk_guard
> > >
> > > Without your patch:
> > >
> > > $ nm vmlinux |grep \\s__ref_sta
> > > ffffffff834fba10 D __ref_stack_chk_guard
> >
> > Is this a problem in our specific case?
>
> I don't think so - the whole notion of absolute ELF symbols is rather
> flaky IME, so I don't think we should be pedantic here.

From what I understand it stayed relative because there wasn't a
constant added.  As soon as you add a constant (which the linker
treats as absolute), it becomes absolute.

> > We can list the symbol in arch/x86/tools/relocs.c to quiet the
> > warning, but I would need some help with auditing the symbol itself.
> >
> > OTOH, we could simply do it your way and put stack canary at the
> > beginning of pcpu_hot structure, with
> >
> > static_assert(offsetof(struct pcpu_hot, stack_canary) == 0));
> >
> > for good measure.
>
> I think this would be the most straight-forward if there are no other
> locality concerns this might interfere with.

I'd prefer it at the end of pcpu_hot, that way the disassembler
doesn't latch on to the __stack_chk_guard symbol when referencing the
other fields of pcpu_hot.


Brian Gerst
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Ard Biesheuvel 10 months ago
On Thu, 20 Feb 2025 at 18:24, Brian Gerst <brgerst@gmail.com> wrote:
>
> On Thu, Feb 20, 2025 at 5:52 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Thu, 20 Feb 2025 at 11:46, Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > OTOH, we could simply do it your way and put stack canary at the
> > > beginning of pcpu_hot structure, with
> > >
> > > static_assert(offsetof(struct pcpu_hot, stack_canary) == 0));
> > >
> > > for good measure.
> >
> > I think this would be the most straight-forward if there are no other
> > locality concerns this might interfere with.
>
> I'd prefer it at the end of pcpu_hot, that way the disassembler
> doesn't latch on to the __stack_chk_guard symbol when referencing the
> other fields of pcpu_hot.
>

__stack_chk_guard would no longer exist, only __ref_stack_chk_guard,
which would be equal to pcpu_hot. We could just call that
__ref_pcpu_hot instead if it might cause confusion otherwise. (We
can't use pcpu_hot directly in -mstack-check-guard-symbol= for the
same reasons I had to add the indirection via __ref_stack_chk_guard)
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Brian Gerst 10 months ago
On Thu, Feb 20, 2025 at 12:36 PM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Thu, 20 Feb 2025 at 18:24, Brian Gerst <brgerst@gmail.com> wrote:
> >
> > On Thu, Feb 20, 2025 at 5:52 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Thu, 20 Feb 2025 at 11:46, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > OTOH, we could simply do it your way and put stack canary at the
> > > > beginning of pcpu_hot structure, with
> > > >
> > > > static_assert(offsetof(struct pcpu_hot, stack_canary) == 0));
> > > >
> > > > for good measure.
> > >
> > > I think this would be the most straight-forward if there are no other
> > > locality concerns this might interfere with.
> >
> > I'd prefer it at the end of pcpu_hot, that way the disassembler
> > doesn't latch on to the __stack_chk_guard symbol when referencing the
> > other fields of pcpu_hot.
> >
>
> __stack_chk_guard would no longer exist, only __ref_stack_chk_guard,
> which would be equal to pcpu_hot.  We could just call that
> __ref_pcpu_hot instead if it might cause confusion otherwise. (We
> can't use pcpu_hot directly in -mstack-check-guard-symbol= for the
> same reasons I had to add the indirection via __ref_stack_chk_guard)

That works for me.


Brian Gerst
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Brian Gerst 10 months ago
On Thu, Feb 20, 2025 at 12:47 PM Brian Gerst <brgerst@gmail.com> wrote:
>
> On Thu, Feb 20, 2025 at 12:36 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Thu, 20 Feb 2025 at 18:24, Brian Gerst <brgerst@gmail.com> wrote:
> > >
> > > On Thu, Feb 20, 2025 at 5:52 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >
> > > > On Thu, 20 Feb 2025 at 11:46, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > OTOH, we could simply do it your way and put stack canary at the
> > > > > beginning of pcpu_hot structure, with
> > > > >
> > > > > static_assert(offsetof(struct pcpu_hot, stack_canary) == 0));
> > > > >
> > > > > for good measure.
> > > >
> > > > I think this would be the most straight-forward if there are no other
> > > > locality concerns this might interfere with.
> > >
> > > I'd prefer it at the end of pcpu_hot, that way the disassembler
> > > doesn't latch on to the __stack_chk_guard symbol when referencing the
> > > other fields of pcpu_hot.
> > >
> >
> > __stack_chk_guard would no longer exist, only __ref_stack_chk_guard,
> > which would be equal to pcpu_hot.  We could just call that
> > __ref_pcpu_hot instead if it might cause confusion otherwise. (We
> > can't use pcpu_hot directly in -mstack-check-guard-symbol= for the
> > same reasons I had to add the indirection via __ref_stack_chk_guard)
>
> That works for me.

Maybe not.  One quirk of how GCC implements this is that
-mstack-protector-guard=global (used by !SMP builds) ignores the
-mstack-protector-guard-symbol option and always uses
__stack_chk_guard.  That makes things more challenging.


Brian Gerst
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Uros Bizjak 10 months ago
On Thu, Feb 20, 2025 at 6:59 PM Brian Gerst <brgerst@gmail.com> wrote:
>
> On Thu, Feb 20, 2025 at 12:47 PM Brian Gerst <brgerst@gmail.com> wrote:
> >
> > On Thu, Feb 20, 2025 at 12:36 PM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Thu, 20 Feb 2025 at 18:24, Brian Gerst <brgerst@gmail.com> wrote:
> > > >
> > > > On Thu, Feb 20, 2025 at 5:52 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > >
> > > > > On Thu, 20 Feb 2025 at 11:46, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > >
> > > > > > OTOH, we could simply do it your way and put stack canary at the
> > > > > > beginning of pcpu_hot structure, with
> > > > > >
> > > > > > static_assert(offsetof(struct pcpu_hot, stack_canary) == 0));
> > > > > >
> > > > > > for good measure.
> > > > >
> > > > > I think this would be the most straight-forward if there are no other
> > > > > locality concerns this might interfere with.
> > > >
> > > > I'd prefer it at the end of pcpu_hot, that way the disassembler
> > > > doesn't latch on to the __stack_chk_guard symbol when referencing the
> > > > other fields of pcpu_hot.
> > > >
> > >
> > > __stack_chk_guard would no longer exist, only __ref_stack_chk_guard,
> > > which would be equal to pcpu_hot.  We could just call that
> > > __ref_pcpu_hot instead if it might cause confusion otherwise. (We
> > > can't use pcpu_hot directly in -mstack-check-guard-symbol= for the
> > > same reasons I had to add the indirection via __ref_stack_chk_guard)
> >
> > That works for me.
>
> Maybe not.  One quirk of how GCC implements this is that
> -mstack-protector-guard=global (used by !SMP builds) ignores the
> -mstack-protector-guard-symbol option and always uses
> __stack_chk_guard.  That makes things more challenging.

Not really. If we put stack_canary as the first member of struct
pcpu_hot, we can just alias __stack_chk_guard to struct pcpu_hot in
the linker script, and everything starts to magically work, SMP and
!SMP. Please see the proposed patch, effectively a three liner, at
[1].

[1] https://lore.kernel.org/lkml/20250220200439.4458-1-ubizjak@gmail.com/

Uros.
Re: [PATCH v6 00/15] x86-64: Stack protector and percpu improvements
Posted by Uros Bizjak 10 months ago
On Thu, Feb 20, 2025 at 11:52 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Thu, 20 Feb 2025 at 11:46, Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Thu, Feb 20, 2025 at 11:05 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > Hi Uros,
> > >
> > > On Thu, 20 Feb 2025 at 10:51, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Wed, Feb 19, 2025 at 2:18 PM Brian Gerst <brgerst@gmail.com> wrote:
> > > > >
> > > > > On Wed, Feb 19, 2025 at 6:47 AM Ingo Molnar <mingo@kernel.org> wrote:
> > > > > >
> > > > > > * Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > > >
> > > > > > > I wonder if there would be any benefit if stack canary is put into
> > > > > > > struct pcpu_hot?
> > > > > >
> > > > > > It should definitely be one of the hottest data structures on x86, so
> > > > > > moving it there makes sense even if it cannot be measured explicitly.
> > > > > >
> > > > >
> > > > > It would have to be done with linker tricks, since you can't make the
> > > > > compiler use a struct member directly.
> > > >
> > >
> > > Interesting take. I'd have tried to put the canary at offset 0x0, and
> > > simply use pcpu_hot as the guard symbol.
> > >
> > >
> > > > It boots and runs without problems.
> > > >
> > > > However, when building the kernel, I get "Absolute relocations
> > > > present" warning with thousands of locations:
> > > >
> ...
> > >
> > > The warning is about the type of __ref_stack_chk_guard, not about the
> > > type of the relocation.
> >
> > Thanks, I got distracted by the text of the warning that mentions relocation.
> >
> > > $ nm vmlinux |grep \\s__ref_sta
> > > ffffffff8350c620 A __ref_stack_chk_guard
> > >
> > > Without your patch:
> > >
> > > $ nm vmlinux |grep \\s__ref_sta
> > > ffffffff834fba10 D __ref_stack_chk_guard
> >
> > Is this a problem in our specific case?
>
> I don't think so - the whole notion of absolute ELF symbols is rather
> flaky IME, so I don't think we should be pedantic here.
>
> > We can list the symbol in arch/x86/tools/relocs.c to quiet the
> > warning, but I would need some help with auditing the symbol itself.
> >
> > OTOH, we could simply do it your way and put stack canary at the
> > beginning of pcpu_hot structure, with
> >
> > static_assert(offsetof(struct pcpu_hot, stack_canary) == 0));
> >
> > for good measure.
>
> I think this would be the most straight-forward if there are no other
> locality concerns this might interfere with.

OK, let me prepare a patch then.

Thanks,
Uros.