[PATCH v2 5/6] objtool: list `noreturn` Rust functions

Miguel Ojeda posted 6 patches 1 year, 4 months ago
There is a newer version of this series
[PATCH v2 5/6] objtool: list `noreturn` Rust functions
Posted by Miguel Ojeda 1 year, 4 months ago
Rust functions may be `noreturn` (i.e. diverging) by returning the
"never" type, `!`, e.g.

    fn f() -> ! {
        loop {}
    }

Thus list the known `noreturn` functions to avoid such warnings.

Without this, `objtool` would complain if enabled for Rust, e.g.:

    rust/core.o: warning: objtool:
    _R...9panic_fmt() falls through to next function _R...18panic_nounwind_fmt()

    rust/alloc.o: warning: objtool:
    .text: unexpected end of section

In order to do so, we cannot match symbols' names exactly, for two
reasons:

  - Rust mangling scheme [1] contains disambiguators [2] which we
    cannot predict (e.g. they may vary depending on the compiler version).

    One possibility to solve this would be to parse v0 and ignore/zero
    those before comparison.

  - Some of the diverging functions come from `core` and `alloc`, i.e.
    the Rust standard library, which may change with each compiler version
    since they are implementation details (e.g. `panic_internals`).

Thus, to workaround both issues, only part of the symbols are matched,
instead of using the `NORETURN` macro in `noreturns.h`.

Ideally, just like for the C side, we should have a better solution. For
instance, the compiler could give us the list via something like:

    $ rustc --print noreturns ...

Link: https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html [1]
Link: https://doc.rust-lang.org/rustc/symbol-mangling/v0.html#disambiguator [2]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
---
Please let me know if there is a better solution -- what kind of solution was
being thought about for C as mentioned in `noreturns.h`? Would it help for Rust?

 tools/objtool/check.c     | 36 +++++++++++++++++++++++++++++++++++-
 tools/objtool/noreturns.h |  2 ++
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 0a33d9195b7a..0afdcee038fd 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -177,6 +177,20 @@ static bool is_sibling_call(struct instruction *insn)
 	return (is_static_jump(insn) && insn_call_dest(insn));
 }

+/*
+ * Checks if a string ends with another.
+ */
+static bool str_ends_with(const char *s, const char *sub)
+{
+	const int slen = strlen(s);
+	const int sublen = strlen(sub);
+
+	if (sublen > slen)
+		return 0;
+
+	return !memcmp(s + slen - sublen, sub, sublen);
+}
+
 /*
  * This checks to see if the given function is a "noreturn" function.
  *
@@ -202,10 +216,30 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func,
 	if (!func)
 		return false;

-	if (func->bind == STB_GLOBAL || func->bind == STB_WEAK)
+	if (func->bind == STB_GLOBAL || func->bind == STB_WEAK) {
+		/*
+		 * Rust standard library functions.
+		 *
+		 * These are just heuristics -- we do not control the precise symbol
+		 * name, due to the crate disambiguators (which depend on the compiler)
+		 * as well as changes to the source code itself between versions.
+		 */
+		if (!strncmp(func->name, "_R", 2) &&
+		    (str_ends_with(func->name, "_4core6option13unwrap_failed")			||
+		     str_ends_with(func->name, "_4core6result13unwrap_failed")			||
+		     str_ends_with(func->name, "_4core9panicking5panic")			||
+		     str_ends_with(func->name, "_4core9panicking9panic_fmt")			||
+		     str_ends_with(func->name, "_4core9panicking14panic_explicit")		||
+		     str_ends_with(func->name, "_4core9panicking18panic_bounds_check")		||
+		     strstr(func->name, "_4core9panicking11panic_const24panic_const_")		||
+		     (strstr(func->name, "_4core5slice5index24slice_") &&
+		      str_ends_with(func->name, "_fail"))))
+			return true;
+
 		for (i = 0; i < ARRAY_SIZE(global_noreturns); i++)
 			if (!strcmp(func->name, global_noreturns[i]))
 				return true;
+	}

 	if (func->bind == STB_WEAK)
 		return false;
diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
index 7ebf29c91184..82a001ac433b 100644
--- a/tools/objtool/noreturns.h
+++ b/tools/objtool/noreturns.h
@@ -35,6 +35,8 @@ NORETURN(panic)
 NORETURN(panic_smp_self_stop)
 NORETURN(rest_init)
 NORETURN(rewind_stack_and_make_dead)
+NORETURN(rust_begin_unwind)
+NORETURN(rust_helper_BUG)
 NORETURN(sev_es_terminate)
 NORETURN(snp_abort)
 NORETURN(start_kernel)
--
2.45.2
Re: [PATCH v2 5/6] objtool: list `noreturn` Rust functions
Posted by Gary Guo 1 year, 4 months ago
On Wed, 24 Jul 2024 18:14:58 +0200
Miguel Ojeda <ojeda@kernel.org> wrote:

> Rust functions may be `noreturn` (i.e. diverging) by returning the
> "never" type, `!`, e.g.
> 
>     fn f() -> ! {
>         loop {}
>     }
> 
> Thus list the known `noreturn` functions to avoid such warnings.
> 
> Without this, `objtool` would complain if enabled for Rust, e.g.:
> 
>     rust/core.o: warning: objtool:
>     _R...9panic_fmt() falls through to next function _R...18panic_nounwind_fmt()
> 
>     rust/alloc.o: warning: objtool:
>     .text: unexpected end of section
> 
> In order to do so, we cannot match symbols' names exactly, for two
> reasons:
> 
>   - Rust mangling scheme [1] contains disambiguators [2] which we
>     cannot predict (e.g. they may vary depending on the compiler version).
> 
>     One possibility to solve this would be to parse v0 and ignore/zero
>     those before comparison.
> 
>   - Some of the diverging functions come from `core` and `alloc`, i.e.
>     the Rust standard library, which may change with each compiler version
>     since they are implementation details (e.g. `panic_internals`).
> 
> Thus, to workaround both issues, only part of the symbols are matched,
> instead of using the `NORETURN` macro in `noreturns.h`.
> 
> Ideally, just like for the C side, we should have a better solution. For
> instance, the compiler could give us the list via something like:
> 
>     $ rustc --print noreturns ...
> 
> Link: https://rust-lang.github.io/rfcs/2603-rust-symbol-name-mangling-v0.html [1]
> Link: https://doc.rust-lang.org/rustc/symbol-mangling/v0.html#disambiguator [2]
> Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
> ---
> Please let me know if there is a better solution -- what kind of solution was
> being thought about for C as mentioned in `noreturns.h`? Would it help for Rust?
> 
>  tools/objtool/check.c     | 36 +++++++++++++++++++++++++++++++++++-
>  tools/objtool/noreturns.h |  2 ++
>  2 files changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> index 0a33d9195b7a..0afdcee038fd 100644
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -177,6 +177,20 @@ static bool is_sibling_call(struct instruction *insn)
>  	return (is_static_jump(insn) && insn_call_dest(insn));
>  }
> 
> +/*
> + * Checks if a string ends with another.
> + */
> +static bool str_ends_with(const char *s, const char *sub)
> +{
> +	const int slen = strlen(s);
> +	const int sublen = strlen(sub);
> +
> +	if (sublen > slen)
> +		return 0;
> +
> +	return !memcmp(s + slen - sublen, sub, sublen);
> +}
> +
>  /*
>   * This checks to see if the given function is a "noreturn" function.
>   *
> @@ -202,10 +216,30 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func,
>  	if (!func)
>  		return false;
> 
> -	if (func->bind == STB_GLOBAL || func->bind == STB_WEAK)
> +	if (func->bind == STB_GLOBAL || func->bind == STB_WEAK) {
> +		/*
> +		 * Rust standard library functions.
> +		 *
> +		 * These are just heuristics -- we do not control the precise symbol
> +		 * name, due to the crate disambiguators (which depend on the compiler)
> +		 * as well as changes to the source code itself between versions.
> +		 */
> +		if (!strncmp(func->name, "_R", 2) &&
> +		    (str_ends_with(func->name, "_4core6option13unwrap_failed")			||
> +		     str_ends_with(func->name, "_4core6result13unwrap_failed")			||
> +		     str_ends_with(func->name, "_4core9panicking5panic")			||
> +		     str_ends_with(func->name, "_4core9panicking9panic_fmt")			||
> +		     str_ends_with(func->name, "_4core9panicking14panic_explicit")		||
> +		     str_ends_with(func->name, "_4core9panicking18panic_bounds_check")		||
> +		     strstr(func->name, "_4core9panicking11panic_const24panic_const_")		||
> +		     (strstr(func->name, "_4core5slice5index24slice_") &&
> +		      str_ends_with(func->name, "_fail"))))
> +			return true;
> +

I wonder if we should use dwarf for this. There's DW_AT_noreturn which
tells us exactly what we want. This would also remove the need for the
hardcoded C symbol list. I do recognise that debug info is not required
for objtool though...

>  		for (i = 0; i < ARRAY_SIZE(global_noreturns); i++)
>  			if (!strcmp(func->name, global_noreturns[i]))
>  				return true;
> +	}
> 
>  	if (func->bind == STB_WEAK)
>  		return false;
> diff --git a/tools/objtool/noreturns.h b/tools/objtool/noreturns.h
> index 7ebf29c91184..82a001ac433b 100644
> --- a/tools/objtool/noreturns.h
> +++ b/tools/objtool/noreturns.h
> @@ -35,6 +35,8 @@ NORETURN(panic)
>  NORETURN(panic_smp_self_stop)
>  NORETURN(rest_init)
>  NORETURN(rewind_stack_and_make_dead)
> +NORETURN(rust_begin_unwind)
> +NORETURN(rust_helper_BUG)
>  NORETURN(sev_es_terminate)
>  NORETURN(snp_abort)
>  NORETURN(start_kernel)
> --
> 2.45.2
Re: [PATCH v2 5/6] objtool: list `noreturn` Rust functions
Posted by Peter Zijlstra 1 year, 4 months ago
On Wed, Jul 24, 2024 at 08:35:49PM +0100, Gary Guo wrote:
> > @@ -202,10 +216,30 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func,
> >  	if (!func)
> >  		return false;
> > 
> > -	if (func->bind == STB_GLOBAL || func->bind == STB_WEAK)
> > +	if (func->bind == STB_GLOBAL || func->bind == STB_WEAK) {
> > +		/*
> > +		 * Rust standard library functions.
> > +		 *
> > +		 * These are just heuristics -- we do not control the precise symbol
> > +		 * name, due to the crate disambiguators (which depend on the compiler)
> > +		 * as well as changes to the source code itself between versions.
> > +		 */
> > +		if (!strncmp(func->name, "_R", 2) &&
> > +		    (str_ends_with(func->name, "_4core6option13unwrap_failed")			||
> > +		     str_ends_with(func->name, "_4core6result13unwrap_failed")			||
> > +		     str_ends_with(func->name, "_4core9panicking5panic")			||
> > +		     str_ends_with(func->name, "_4core9panicking9panic_fmt")			||
> > +		     str_ends_with(func->name, "_4core9panicking14panic_explicit")		||
> > +		     str_ends_with(func->name, "_4core9panicking18panic_bounds_check")		||
> > +		     strstr(func->name, "_4core9panicking11panic_const24panic_const_")		||
> > +		     (strstr(func->name, "_4core5slice5index24slice_") &&
> > +		      str_ends_with(func->name, "_fail"))))
> > +			return true;
> > +

Perhaps add a helper function: is_rust_noreturn() or somesuch.

> I wonder if we should use dwarf for this. There's DW_AT_noreturn which
> tells us exactly what we want. This would also remove the need for the
> hardcoded C symbol list. I do recognise that debug info is not required
> for objtool though...

I think the slightly longer term plan here was to have noreturn.h
generated from a DWARF build but still included verbatim in the objtool
source, such that it works even on regular builds.

(and can be augmented where the compilers are being 'funny')

It's just that nobody has had time to implement this... you fancy giving
this a stab?
Re: [PATCH v2 5/6] objtool: list `noreturn` Rust functions
Posted by Gary Guo 1 year, 3 months ago
On Thu, 25 Jul 2024 10:33:55 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Jul 24, 2024 at 08:35:49PM +0100, Gary Guo wrote:
> > > @@ -202,10 +216,30 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func,
> > >  	if (!func)
> > >  		return false;
> > > 
> > > -	if (func->bind == STB_GLOBAL || func->bind == STB_WEAK)
> > > +	if (func->bind == STB_GLOBAL || func->bind == STB_WEAK) {
> > > +		/*
> > > +		 * Rust standard library functions.
> > > +		 *
> > > +		 * These are just heuristics -- we do not control the precise symbol
> > > +		 * name, due to the crate disambiguators (which depend on the compiler)
> > > +		 * as well as changes to the source code itself between versions.
> > > +		 */
> > > +		if (!strncmp(func->name, "_R", 2) &&
> > > +		    (str_ends_with(func->name, "_4core6option13unwrap_failed")			||
> > > +		     str_ends_with(func->name, "_4core6result13unwrap_failed")			||
> > > +		     str_ends_with(func->name, "_4core9panicking5panic")			||
> > > +		     str_ends_with(func->name, "_4core9panicking9panic_fmt")			||
> > > +		     str_ends_with(func->name, "_4core9panicking14panic_explicit")		||
> > > +		     str_ends_with(func->name, "_4core9panicking18panic_bounds_check")		||
> > > +		     strstr(func->name, "_4core9panicking11panic_const24panic_const_")		||
> > > +		     (strstr(func->name, "_4core5slice5index24slice_") &&
> > > +		      str_ends_with(func->name, "_fail"))))
> > > +			return true;
> > > +  
> 
> Perhaps add a helper function: is_rust_noreturn() or somesuch.
> 
> > I wonder if we should use dwarf for this. There's DW_AT_noreturn which
> > tells us exactly what we want. This would also remove the need for the
> > hardcoded C symbol list. I do recognise that debug info is not required
> > for objtool though...  
> 
> I think the slightly longer term plan here was to have noreturn.h
> generated from a DWARF build but still included verbatim in the objtool
> source, such that it works even on regular builds.
> 
> (and can be augmented where the compilers are being 'funny')
> 
> It's just that nobody has had time to implement this... you fancy giving
> this a stab?

The information extraction from dwarf is quite easy: I produced a tiny
cargo script that would be able to extract the info from an object file
(Miguel mentioned it in one of the replies):

https://gist.github.com/nbdd0121/449692570622c2f46a29ad9f47c3379a

I think the issue is that the list of noreturn symbols changes
depending on the configuration, so it's kind of difficult to
pregenerate this information.

Do you think it makes sense to have Rust object files always have DWARF
enabled, and we check Rust noreturn symbols using DWARF, while keep the
hardcoded C symbol list?

Best,
Gary