[PATCH v2 08/59] x86/build: Ensure proper function alignment

Peter Zijlstra posted 59 patches 3 years, 7 months ago
There is a newer version of this series
[PATCH v2 08/59] x86/build: Ensure proper function alignment
Posted by Peter Zijlstra 3 years, 7 months ago
From: Thomas Gleixner <tglx@linutronix.de>

The Intel Architectures Optimization Reference Manual explains that
functions should be aligned at 16 bytes because for a lot of (Intel)
uarchs the I-fetch width is 16 bytes. The AMD Software Optimization
Guide (for recent chips) mentions a 32 byte I-fetch window but a 16
byte decode window.

Follow this advice and align functions to 16 bytes to optimize
instruction delivery to decode and reduce front-end bottlenecks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/Kconfig.cpu              |    6 ++++++
 arch/x86/Makefile                 |    4 ++++
 arch/x86/include/asm/linkage.h    |    7 ++++---
 include/asm-generic/vmlinux.lds.h |    7 ++++++-
 4 files changed, 20 insertions(+), 4 deletions(-)

--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -517,3 +517,9 @@ config CPU_SUP_VORTEX_32
 	  makes the kernel a tiny bit smaller.
 
 	  If unsure, say N.
+
+# Defined here so it is defined for UM too
+config FUNCTION_ALIGNMENT
+	int
+	default 16 if X86_64 || X86_ALIGNMENT_16
+	default 8
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -84,6 +84,10 @@ else
 KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
 endif
 
+ifneq ($(CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B),y)
+KBUILD_CFLAGS += -falign-functions=$(CONFIG_FUNCTION_ALIGNMENT)
+endif
+
 ifeq ($(CONFIG_X86_32),y)
         BITS := 32
         UTS_MACHINE := i386
--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -14,9 +14,10 @@
 
 #ifdef __ASSEMBLY__
 
-#if defined(CONFIG_X86_64) || defined(CONFIG_X86_ALIGNMENT_16)
-#define __ALIGN		.p2align 4, 0x90
-#define __ALIGN_STR	__stringify(__ALIGN)
+#if CONFIG_FUNCTION_ALIGNMENT == 16
+#define __ALIGN			.p2align 4, 0x90
+#define __ALIGN_STR		__stringify(__ALIGN)
+#define FUNCTION_ALIGNMENT	16
 #endif
 
 #if defined(CONFIG_RETHUNK) && !defined(__DISABLE_EXPORTS) && !defined(BUILD_VDSO)
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -82,7 +82,12 @@
 #endif
 
 /* Align . to a 8 byte boundary equals to maximum function alignment. */
-#define ALIGN_FUNCTION()  . = ALIGN(8)
+#ifndef CONFIG_FUNCTION_ALIGNMENT
+#define __FUNCTION_ALIGNMENT	8
+#else
+#define __FUNCTION_ALIGNMENT	CONFIG_FUNCTION_ALIGNMENT
+#endif
+#define ALIGN_FUNCTION()  . = ALIGN(__FUNCTION_ALIGNMENT)
 
 /*
  * LD_DEAD_CODE_DATA_ELIMINATION option enables -fdata-sections, which
RE: [PATCH v2 08/59] x86/build: Ensure proper function alignment
Posted by David Laight 3 years, 7 months ago
From: Peter Zijlstra
> Sent: 02 September 2022 14:07
> 
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> The Intel Architectures Optimization Reference Manual explains that
> functions should be aligned at 16 bytes because for a lot of (Intel)
> uarchs the I-fetch width is 16 bytes. The AMD Software Optimization
> Guide (for recent chips) mentions a 32 byte I-fetch window but a 16
> byte decode window.
> 
> Follow this advice and align functions to 16 bytes to optimize
> instruction delivery to decode and reduce front-end bottlenecks.

Performance figures?

IIRC the same document will suggest aligning all jump labels.
That is pretty much known to be harmful because of the bloat
it generates.

Also things like CFI and ENDBRA have a habit of making the
entry point unaligned unless you can pad to 16n+x values.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment
Posted by Linus Torvalds 3 years, 7 months ago
On Fri, Sep 2, 2022 at 6:55 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> --- a/arch/x86/include/asm/linkage.h
> +++ b/arch/x86/include/asm/linkage.h
> @@ -14,9 +14,10 @@
>
>  #ifdef __ASSEMBLY__
>
> -#if defined(CONFIG_X86_64) || defined(CONFIG_X86_ALIGNMENT_16)
> -#define __ALIGN                .p2align 4, 0x90
> -#define __ALIGN_STR    __stringify(__ALIGN)
> +#if CONFIG_FUNCTION_ALIGNMENT == 16
> +#define __ALIGN                        .p2align 4, 0x90
> +#define __ALIGN_STR            __stringify(__ALIGN)
> +#define FUNCTION_ALIGNMENT     16
>  #endif

Ugh.

Why is this conditional on that alignment being 16?

Is it purely because the CONFIG variable was mis-designed, and is the
number of bytes, instead of being the shift? If so, just fix that, and
then do an unconditional

  #define __ALIGN                        .p2align
CONFIG_FUNCTION_ALIGNMENT_SHIFT, 0x90
  #define __ALIGN_STR            __stringify(__ALIGN)

(leave the asm-generic one conditional, since then the condition makes
sense - it's arch-specific).

           Linus
Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment
Posted by Peter Zijlstra 3 years, 7 months ago
On Fri, Sep 02, 2022 at 09:51:17AM -0700, Linus Torvalds wrote:
> On Fri, Sep 2, 2022 at 6:55 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > --- a/arch/x86/include/asm/linkage.h
> > +++ b/arch/x86/include/asm/linkage.h
> > @@ -14,9 +14,10 @@
> >
> >  #ifdef __ASSEMBLY__
> >
> > -#if defined(CONFIG_X86_64) || defined(CONFIG_X86_ALIGNMENT_16)
> > -#define __ALIGN                .p2align 4, 0x90
> > -#define __ALIGN_STR    __stringify(__ALIGN)
> > +#if CONFIG_FUNCTION_ALIGNMENT == 16
> > +#define __ALIGN                        .p2align 4, 0x90
> > +#define __ALIGN_STR            __stringify(__ALIGN)
> > +#define FUNCTION_ALIGNMENT     16
> >  #endif
> 
> Ugh.
> 
> Why is this conditional on that alignment being 16?

There is a DEBUG case that increases the thing to 32.
Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment
Posted by Linus Torvalds 3 years, 7 months ago
On Fri, Sep 2, 2022 at 10:32 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> There is a DEBUG case that increases the thing to 32.

Well, but that should be part of the Kconfig rules too.

In fact, I think that argues for moving that FUNCTION_ALIGNMENT into
the generic Kconfig, since we already have that much hackier debug
thing there.

That would then get rid of the conditional in asm-generic too, and get
rid of the horrid hack in the main Makefile as well.

I love how commit cf536e185869 ("Makefile: extend 32B aligned debug
option to 64B aligned") took that previous random debug entry and just
made it 64B instead of 32B. What a crock that all is.

Let's just do this right.

             Linus


               Linus
Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment
Posted by Peter Zijlstra 3 years, 7 months ago
On Fri, Sep 02, 2022 at 11:08:54AM -0700, Linus Torvalds wrote:

> Let's just do this right.

Something like so then?

---
--- a/Makefile
+++ b/Makefile
@@ -940,8 +940,8 @@ KBUILD_CFLAGS	+= $(CC_FLAGS_CFI)
 export CC_FLAGS_CFI
 endif
 
-ifdef CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B
-KBUILD_CFLAGS += -falign-functions=64
+ifneq ($(CONFIG_FUNCTION_ALIGNMENT),0)
+KBUILD_CFLAGS += -falign-functions=$(CONFIG_FUNCTION_ALIGNMENT)
 endif
 
 # arch Makefile may override CC so keep this after arch Makefile is included
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1419,4 +1419,24 @@ source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
 
+config FUNCTION_ALIGNMENT_8B
+	bool
+
+config FUNCTION_ALIGNMENT_16B
+	bool
+
+config FUNCTION_ALIGNMENT_32B
+	bool
+
+config FUNCTION_ALIGNMENT_64B
+	bool
+
+config FUNCTION_ALIGNMENT
+	int
+	default 64 if FUNCTION_ALIGNMENT_64B
+	default 32 if FUNCTION_ALIGNMENT_32B
+	default 16 if FUNCTION_ALIGNMENT_16B
+	default 8 if FUNCTION_ALIGNMENT_8B
+	default 0
+
 endmenu
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -63,6 +63,7 @@ config IA64
 	select NUMA if !FLATMEM
 	select PCI_MSI_ARCH_FALLBACKS if PCI_MSI
 	select ZONE_DMA32
+	select FUNCTION_ALIGNMENT_32B
 	default y
 	help
 	  The Itanium Processor Family is Intel's 64-bit successor to
--- a/arch/ia64/Makefile
+++ b/arch/ia64/Makefile
@@ -23,7 +23,7 @@ KBUILD_AFLAGS_KERNEL := -mconstant-gp
 EXTRA		:=
 
 cflags-y	:= -pipe $(EXTRA) -ffixed-r13 -mfixed-range=f12-f15,f32-f127 \
-		   -falign-functions=32 -frename-registers -fno-optimize-sibling-calls
+		   -frename-registers -fno-optimize-sibling-calls
 KBUILD_CFLAGS_KERNEL := -mconstant-gp
 
 GAS_STATUS	= $(shell $(srctree)/arch/ia64/scripts/check-gas "$(CC)" "$(OBJDUMP)")
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -283,6 +283,8 @@ config X86
 	select X86_FEATURE_NAMES		if PROC_FS
 	select PROC_PID_ARCH_STATUS		if PROC_FS
 	select HAVE_ARCH_NODE_DEV_GROUP		if X86_SGX
+	select FUNCTION_ALIGNMENT_16B		if X86_64 || X86_ALIGNMENT_16
+	select FUNCTION_ALIGNMENT_8B
 	imply IMA_SECURE_AND_OR_TRUSTED_BOOT    if EFI
 
 config INSTRUCTION_DECODER
@@ -2442,6 +2444,7 @@ config HAVE_CALL_THUNKS
 	depends on CC_HAS_ENTRY_PADDING && RETHUNK && OBJTOOL
 
 config CALL_THUNKS
+	select FUNCTION_ALIGNMENT_16B
 	def_bool n
 
 menuconfig SPECULATION_MITIGATIONS
@@ -2515,6 +2518,7 @@ config CALL_DEPTH_TRACKING
 
 config CALL_THUNKS_DEBUG
 	bool "Enable call thunks and call depth tracking debugging"
+	select FUNCTION_ALIGNMENT_32B
 	depends on CALL_DEPTH_TRACKING
 	default n
 	help
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -517,9 +517,3 @@ config CPU_SUP_VORTEX_32
 	  makes the kernel a tiny bit smaller.
 
 	  If unsure, say N.
-
-# Defined here so it is defined for UM too
-config FUNCTION_ALIGNMENT
-	int
-	default 16 if X86_64 || X86_ALIGNMENT_16
-	default 8
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -84,10 +84,6 @@ else
 KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
 endif
 
-ifneq ($(CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B),y)
-KBUILD_CFLAGS += -falign-functions=$(CONFIG_FUNCTION_ALIGNMENT)
-endif
-
 ifeq ($(CONFIG_X86_32),y)
         BITS := 32
         UTS_MACHINE := i386
--- a/arch/x86/include/asm/linkage.h
+++ b/arch/x86/include/asm/linkage.h
@@ -12,16 +12,7 @@
 #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
 #endif /* CONFIG_X86_32 */
 
-#if CONFIG_FUNCTION_ALIGNMENT == 8
-# define __ALIGN		.p2align 3, 0x90;
-#elif CONFIG_FUNCTION_ALIGNMENT == 16
-# define __ALIGN		.p2align 4, 0x90;
-#elif CONFIG_FUNCTION_ALIGNMENT == 32
-# define __ALIGN		.p2align 5, 0x90
-#else
-# error Unsupported function alignment
-#endif
-
+#define __ALIGN			.balign CONFIG_FUNCTION_ALIGNMENT, 0x90
 #define __ALIGN_STR		__stringify(__ALIGN)
 
 #ifdef CONFIG_CFI_CLANG
--- a/include/linux/linkage.h
+++ b/include/linux/linkage.h
@@ -69,8 +69,8 @@
 #endif
 
 #ifndef __ALIGN
-#define __ALIGN		.align 4,0x90
-#define __ALIGN_STR	".align 4,0x90"
+#define __ALIGN			.balign CONFIG_FUNCTION_ALIGNMENT
+#define __ALIGN_STR		__stringify(__ALIGN)
 #endif
 
 #ifdef __ASSEMBLY__
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -459,6 +459,7 @@ config SECTION_MISMATCH_WARN_ONLY
 config DEBUG_FORCE_FUNCTION_ALIGN_64B
 	bool "Force all function address 64B aligned"
 	depends on EXPERT && (X86_64 || ARM64 || PPC32 || PPC64 || ARC)
+	select FUNCTION_ALIGNMENT_64B
 	help
 	  There are cases that a commit from one domain changes the function
 	  address alignment of other domains, and cause magic performance
Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment
Posted by Linus Torvalds 3 years, 6 months ago
On Mon, Sep 5, 2022 at 6:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Sep 02, 2022 at 11:08:54AM -0700, Linus Torvalds wrote:
>
> > Let's just do this right.
>
> Something like so then?

Sorry, I dropped this due to travel.

The patch looks sane, the only thing I worry a bit about is

> +config FUNCTION_ALIGNMENT
> +       int
> +       default 64 if FUNCTION_ALIGNMENT_64B
..
> +       default 0

Is '0' even a valid value then for things like

> +#define __ALIGN                        .balign CONFIG_FUNCTION_ALIGNMENT
> +#define __ALIGN_STR            __stringify(__ALIGN)

because it doesn't really seem like a sensible byte alignment.

Maybe "default 4" would be a safer choice?

                   Linus
Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment
Posted by Peter Zijlstra 3 years, 6 months ago
On Mon, Sep 12, 2022 at 10:09:38AM -0400, Linus Torvalds wrote:

> The patch looks sane, the only thing I worry a bit about is
> 
> > +config FUNCTION_ALIGNMENT
> > +       int
> > +       default 64 if FUNCTION_ALIGNMENT_64B
> ..
> > +       default 0
> 
> Is '0' even a valid value then for things like

At the time I thought to have read that 0 alignment effectively no-ops
the statement, but now I can't find it in a hurry, happy to make it
default to 4.
Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment
Posted by Peter Zijlstra 3 years, 6 months ago
On Mon, Sep 12, 2022 at 09:44:05PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 12, 2022 at 10:09:38AM -0400, Linus Torvalds wrote:
> 
> > The patch looks sane, the only thing I worry a bit about is
> > 
> > > +config FUNCTION_ALIGNMENT
> > > +       int
> > > +       default 64 if FUNCTION_ALIGNMENT_64B
> > ..
> > > +       default 0
> > 
> > Is '0' even a valid value then for things like
> 
> At the time I thought to have read that 0 alignment effectively no-ops
> the statement, but now I can't find it in a hurry, happy to make it
> default to 4.

Found it: https://sourceware.org/binutils/docs-2.39/as/Balign.html

7.8 .balign[wl] [abs-expr[, abs-expr[, abs-expr]]]

Pad the location counter (in the current subsection) to a particular
storage boundary. The first expression (which must be absolute) is the
alignment request in bytes. ...  If the expression is omitted then a
default value of 0 is used, effectively disabling alignment
requirements.

(for some raisin google served me a very old binutils document last
night that doesn't have mention the 0 thing)
Re: [PATCH v2 08/59] x86/build: Ensure proper function alignment
Posted by Linus Torvalds 3 years, 6 months ago
On Tue, Sep 13, 2022 at 4:09 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Found it: https://sourceware.org/binutils/docs-2.39/as/Balign.html
>
> 7.8 .balign[wl] [abs-expr[, abs-expr[, abs-expr]]]
>
> Pad the location counter [...]

Very good. All looks sane to me then.

              Linus