[PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}()

Josh Poimboeuf posted 5 patches 8 months, 2 weeks ago
[PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}()
Posted by Josh Poimboeuf 8 months, 2 weeks ago
Use asm_inline() to prevent the compiler from making poor inlining
decisions based on the length of the objtool annotations.

For a defconfig kernel built with GCC 14.2.1, bloat-o-meter reports a
0.18% text size increase:

  add/remove: 88/433 grow/shrink: 967/487 up/down: 87579/-52630 (34949)
  Total: Before=19448407, After=19483356, chg +0.18%

Presumably the text growth is due to increased inlining.  A net total of
345 functions were removed.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 include/linux/instrumentation.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/instrumentation.h b/include/linux/instrumentation.h
index bf675a8aef8a..b1007407d272 100644
--- a/include/linux/instrumentation.h
+++ b/include/linux/instrumentation.h
@@ -9,7 +9,7 @@
 
 /* Begin/end of an instrumentation safe region */
 #define __instrumentation_begin(c) ({					\
-	asm volatile(__stringify(c) ": nop\n\t"				\
+	asm_inline volatile(__stringify(c) ": nop\n\t"			\
 		     ANNOTATE_INSTR_BEGIN(__ASM_BREF(c))		\
 		     : : "i" (c));					\
 })
@@ -47,7 +47,7 @@
  * part of the condition block and does not escape.
  */
 #define __instrumentation_end(c) ({					\
-	asm volatile(__stringify(c) ": nop\n\t"				\
+	asm_inline volatile(__stringify(c) ": nop\n\t"			\
 		     ANNOTATE_INSTR_END(__ASM_BREF(c))			\
 		     : : "i" (c));					\
 })
-- 
2.49.0
Re: [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}()
Posted by Ingo Molnar 8 months, 2 weeks ago
* Josh Poimboeuf <jpoimboe@kernel.org> wrote:

> Use asm_inline() to prevent the compiler from making poor inlining
> decisions based on the length of the objtool annotations.
> 
> For a defconfig kernel built with GCC 14.2.1, bloat-o-meter reports a
> 0.18% text size increase:
> 
>   add/remove: 88/433 grow/shrink: 967/487 up/down: 87579/-52630 (34949)
>   Total: Before=19448407, After=19483356, chg +0.18%
> 
> Presumably the text growth is due to increased inlining.  A net total of
> 345 functions were removed.

Since +0.18% puts this into the 'significant' category of .text size 
increases, it would be nice to see a bit more details about the nature 
of these function calls removed: were they really explicit calls to 
__instrumentation_begin()/end(), or somehow tail-call optimized out, or 
something else?

Also, I'm wondering where the 34,949 bytes bloat comes from: with 345 
functions removed that's 100 bytes per function? Doesn't sound right.

Also, is the bloat-o-meter output limited to the .text section, or does 
it include growth in out-of-line sections too?

Thanks,

	Ingo
Re: [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}()
Posted by Uros Bizjak 8 months, 2 weeks ago
On Tue, Apr 8, 2025 at 10:30 AM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> > Use asm_inline() to prevent the compiler from making poor inlining
> > decisions based on the length of the objtool annotations.
> >
> > For a defconfig kernel built with GCC 14.2.1, bloat-o-meter reports a
> > 0.18% text size increase:
> >
> >   add/remove: 88/433 grow/shrink: 967/487 up/down: 87579/-52630 (34949)
> >   Total: Before=19448407, After=19483356, chg +0.18%
> >
> > Presumably the text growth is due to increased inlining.  A net total of
> > 345 functions were removed.
>
> Since +0.18% puts this into the 'significant' category of .text size
> increases, it would be nice to see a bit more details about the nature
> of these function calls removed: were they really explicit calls to
> __instrumentation_begin()/end(), or somehow tail-call optimized out, or
> something else?

This increase is mainly due to different inlining decisions.

> Also, I'm wondering where the 34,949 bytes bloat comes from: with 345
> functions removed that's 100 bytes per function? Doesn't sound right.

Please note that removed functions can be inlined at several places. E.g.:

$ grep "<encode_string>"  objdump.old

00000000004506e0 <encode_string>:
 45113c:       e8 9f f5 ff ff          call   4506e0 <encode_string>
 452bcb:       e9 10 db ff ff          jmp    4506e0 <encode_string>
 453d33:       e8 a8 c9 ff ff          call   4506e0 <encode_string>
 453ef7:       e8 e4 c7 ff ff          call   4506e0 <encode_string>
 45549f:       e8 3c b2 ff ff          call   4506e0 <encode_string>
 455843:       e8 98 ae ff ff          call   4506e0 <encode_string>
 455b37:       e8 a4 ab ff ff          call   4506e0 <encode_string>
 455b47:       e8 94 ab ff ff          call   4506e0 <encode_string>
 4564fa:       e8 e1 a1 ff ff          call   4506e0 <encode_string>
 456669:       e8 72 a0 ff ff          call   4506e0 <encode_string>
 456691:       e8 4a a0 ff ff          call   4506e0 <encode_string>
 4566a0:       e8 3b a0 ff ff          call   4506e0 <encode_string>
 4569aa:       e8 31 9d ff ff          call   4506e0 <encode_string>
 456e79:       e9 62 98 ff ff          jmp    4506e0 <encode_string>
 456efe:       e9 dd 97 ff ff          jmp    4506e0 <encode_string>

all these calls now inline:

encode_string                                 58       -     -58

where for example encode_putfh() grows by:

encode_putfh                                  70     118     +48

> Also, is the bloat-o-meter output limited to the .text section, or does
> it include growth in out-of-line sections too?

bloat-o-meter by default looks at all sybol types ("tTdDbBrRvVwW" as
returned by nm). Adding "-c" option will categorize the output by
symbol type (this is x86_64 defconfig change with gcc-4.2.1):

add/remove: 72/350 grow/shrink: 918/348 up/down: 80532/-46857 (33675)
Function                                     old     new   delta
...
Total: Before=16685068, After=16718743, chg +0.20%
add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
Data                                         old     new   delta
Total: Before=4471889, After=4471889, chg +0.00%
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-12 (-12)
RO Data                                      old     new   delta
icl_voltage_level_max_cdclk                   12       -     -12
Total: Before=1783310, After=1783298, chg -0.00%

Uros.
Re: [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}()
Posted by Josh Poimboeuf 8 months, 2 weeks ago
On Tue, Apr 08, 2025 at 01:10:14PM +0200, Uros Bizjak wrote:
> On Tue, Apr 8, 2025 at 10:30 AM Ingo Molnar <mingo@kernel.org> wrote:
> > * Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > > Use asm_inline() to prevent the compiler from making poor inlining
> > > decisions based on the length of the objtool annotations.
> > >
> > > For a defconfig kernel built with GCC 14.2.1, bloat-o-meter reports a
> > > 0.18% text size increase:
> > >
> > >   add/remove: 88/433 grow/shrink: 967/487 up/down: 87579/-52630 (34949)
> > >   Total: Before=19448407, After=19483356, chg +0.18%
> > >
> > > Presumably the text growth is due to increased inlining.  A net total of
> > > 345 functions were removed.
> >
> > Since +0.18% puts this into the 'significant' category of .text size
> > increases, it would be nice to see a bit more details about the nature
> > of these function calls removed: were they really explicit calls to
> > __instrumentation_begin()/end(), or somehow tail-call optimized out, or
> > something else?

The instrumentation macros are used by WARN*() and lockdep, so this can
affect a lot of functions.

BTW, without the objtool annotations, each of those macros resolves to a
single NOP.  So using inline_asm() seems obviously correct here as it
accurately communicates the code size to the compiler.  I'm not sure if
that was clear from the description.

> > Also, I'm wondering where the 34,949 bytes bloat comes from: with 345
> > functions removed that's 100 bytes per function? Doesn't sound right.
> 
> Please note that removed functions can be inlined at several places. E.g.:
> 
> $ grep "<encode_string>"  objdump.old
> 
> 00000000004506e0 <encode_string>:
>  45113c:       e8 9f f5 ff ff          call   4506e0 <encode_string>
>  452bcb:       e9 10 db ff ff          jmp    4506e0 <encode_string>
>  453d33:       e8 a8 c9 ff ff          call   4506e0 <encode_string>
>  453ef7:       e8 e4 c7 ff ff          call   4506e0 <encode_string>
>  45549f:       e8 3c b2 ff ff          call   4506e0 <encode_string>
>  455843:       e8 98 ae ff ff          call   4506e0 <encode_string>
>  455b37:       e8 a4 ab ff ff          call   4506e0 <encode_string>
>  455b47:       e8 94 ab ff ff          call   4506e0 <encode_string>
>  4564fa:       e8 e1 a1 ff ff          call   4506e0 <encode_string>
>  456669:       e8 72 a0 ff ff          call   4506e0 <encode_string>
>  456691:       e8 4a a0 ff ff          call   4506e0 <encode_string>
>  4566a0:       e8 3b a0 ff ff          call   4506e0 <encode_string>
>  4569aa:       e8 31 9d ff ff          call   4506e0 <encode_string>
>  456e79:       e9 62 98 ff ff          jmp    4506e0 <encode_string>
>  456efe:       e9 dd 97 ff ff          jmp    4506e0 <encode_string>
> 
> all these calls now inline:
> 
> encode_string                                 58       -     -58
> 
> where for example encode_putfh() grows by:
> 
> encode_putfh                                  70     118     +48

Thanks for looking!  That makes sense: encode_string() uses
WARN_ON_ONCE() which uses instrumentation_begin()/end().
encode_string() is getting inlined now that GCC has more accurate
information about its size.

> > Also, is the bloat-o-meter output limited to the .text section, or does
> > it include growth in out-of-line sections too?
>
> bloat-o-meter by default looks at all sybol types ("tTdDbBrRvVwW" as
> returned by nm). Adding "-c" option will categorize the output by
> symbol type (this is x86_64 defconfig change with gcc-4.2.1):
> 
> add/remove: 72/350 grow/shrink: 918/348 up/down: 80532/-46857 (33675)
> Function                                     old     new   delta
> ...
> Total: Before=16685068, After=16718743, chg +0.20%
> add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
> Data                                         old     new   delta
> Total: Before=4471889, After=4471889, chg +0.00%
> add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-12 (-12)
> RO Data                                      old     new   delta
> icl_voltage_level_max_cdclk                   12       -     -12
> Total: Before=1783310, After=1783298, chg -0.00%

Right.  That means that bloat-o-meter's results include any
text.unlikely growth/shrinkage, because that code is contained by
symbols (typically .cold subfunctons).

I suppose it would be more helpful if .text.unlikely were excluded or
separated out.  .text.unlikely code growth is always a good thing, as
opposed to .text for which growth can be good or bad.

-- 
Josh