[PATCH] symbols: discard stray file symbols

Jan Beulich posted 1 patch 6 months, 2 weeks ago
[PATCH] symbols: discard stray file symbols
Posted by Jan Beulich 6 months, 2 weeks ago
By observation GNU ld 2.25 may emit file symbols for .data.read_mostly
when linking xen.efi. Due to the nature of file symbols in COFF symbol
tables (see the code comment) the symbols_offsets[] entries for such
symbols would cause assembler warnings regarding value truncation. Of
course the resulting entries would also be both meaningless and useless.
Add a heuristic to get rid of them, really taking effect only when
--all-symbols is specified (otherwise these symbols are discarded
anyway).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Factor 2 may in principle still be too small: We zap what looks like
real file symbols already in read_symbol(), so table_cnt doesn't really
reflect the number of symbol table entries encountered. It has proven to
work for me in practice though, with still some leeway left.

--- a/xen/tools/symbols.c
+++ b/xen/tools/symbols.c
@@ -213,6 +213,16 @@ static int symbol_valid(struct sym_entry
 	if (strstr((char *)s->sym + offset, "_compiled."))
 		return 0;
 
+	/* At least GNU ld 2.25 may emit bogus file symbols referencing a
+	 * section name while linking xen.efi. In COFF symbol tables the
+	 * "value" of file symbols is a link (symbol table index) to the next
+	 * file symbol. Since file (and other) symbols (can) come with one
+	 * (or in principle more) auxiliary symbol table entries, the value in
+	 * this heuristic is bounded to twice the number of symbols we have
+	 * found. See also read_symbol() as to the '?' checked for here. */
+	if (s->sym[0] == '?' && s->sym[1] == '.' && s->addr < table_cnt * 2)
+		return 0;
+
 	return 1;
 }
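
For readers who want to try the new check in isolation, here is a minimal sketch. It is not the code in xen/tools/symbols.c: the struct and function names (sym_entry_sketch, looks_like_stray_file_symbol, sketch_demo) are hypothetical stand-ins, and table_cnt is passed as a parameter rather than read from the tool's global. Only the condition itself is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the tool's struct sym_entry (assumption). */
struct sym_entry_sketch {
    unsigned long long addr;  /* "value" column as printed by nm */
    const char *sym;          /* sym[0] is nm's type character, the name follows */
};

/*
 * Same condition as in the patch: unknown type ('?'), a name starting
 * with a dot (as section names conventionally do), and a value small
 * enough to plausibly be a symbol table index rather than an address.
 */
static bool looks_like_stray_file_symbol(const struct sym_entry_sketch *s,
                                         unsigned int table_cnt)
{
    return s->sym[0] == '?' && s->sym[1] == '.' &&
           s->addr < table_cnt * 2;
}

/* Small self-check with made-up values (assumption: 1000 symbols seen). */
void sketch_demo(void)
{
    struct sym_entry_sketch stray = { 42, "?.data.read_mostly" };
    struct sym_entry_sketch real  = { 0xffff82d040200000ULL, "T.text" };

    assert(looks_like_stray_file_symbol(&stray, 1000));
    assert(!looks_like_stray_file_symbol(&real, 1000));
}
```

The bound is deliberately tied to the number of symbols seen rather than to any address constant, which is why the post-commit-message remark about the factor of 2 matters: a file symbol whose link value is at or above table_cnt * 2 would slip through.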
Re: [PATCH] symbols: discard stray file symbols
Posted by Roger Pau Monné 1 week, 2 days ago
On Wed, Apr 16, 2025 at 11:00:57AM +0200, Jan Beulich wrote:
> By observation GNU ld 2.25 may emit file symbols for .data.read_mostly
> when linking xen.efi. Due to the nature of file symbols in COFF symbol
> tables (see the code comment) the symbols_offsets[] entries for such
> symbols would cause assembler warnings regarding value truncation. Of
> course the resulting entries would also be both meaningless and useless.
> Add a heuristic to get rid of them, really taking effect only when
> --all-symbols is specified (otherwise these symbols are discarded
> anyway).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> Factor 2 may in principle still be too small: We zap what looks like
> real file symbols already in read_symbol(), so table_cnt doesn't really
> reflect the number of symbol table entries encountered. It has proven to
> work for me in practice though, with still some leeway left.
> 
> --- a/xen/tools/symbols.c
> +++ b/xen/tools/symbols.c
> @@ -213,6 +213,16 @@ static int symbol_valid(struct sym_entry
>  	if (strstr((char *)s->sym + offset, "_compiled."))
>  		return 0;
>  
> +	/* At least GNU ld 2.25 may emit bogus file symbols referencing a
> +	 * section name while linking xen.efi. In COFF symbol tables the
> +	 * "value" of file symbols is a link (symbol table index) to the next
> +	 * file symbol. Since file (and other) symbols (can) come with one
> +	 * (or in principle more) auxiliary symbol table entries, the value in
> +	 * this heuristic is bounded to twice the number of symbols we have
> +	 * found. See also read_symbol() as to the '?' checked for here. */
> +	if (s->sym[0] == '?' && s->sym[1] == '.' && s->addr < table_cnt * 2)

Maybe a naive question, but couldn't you drop everything below
__XEN_VIRT_START, as we shouldn't have any symbols below that
address?

Thanks, Roger.

Re: [PATCH] symbols: discard stray file symbols
Posted by Jan Beulich 1 week, 2 days ago
On 21.10.2025 11:56, Roger Pau Monné wrote:
> On Wed, Apr 16, 2025 at 11:00:57AM +0200, Jan Beulich wrote:
>> By observation GNU ld 2.25 may emit file symbols for .data.read_mostly
>> when linking xen.efi. Due to the nature of file symbols in COFF symbol
>> tables (see the code comment) the symbols_offsets[] entries for such
>> symbols would cause assembler warnings regarding value truncation. Of
>> course the resulting entries would also be both meaningless and useless.
>> Add a heuristic to get rid of them, really taking effect only when
>> --all-symbols is specified (otherwise these symbols are discarded
>> anyway).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> --- a/xen/tools/symbols.c
>> +++ b/xen/tools/symbols.c
>> @@ -213,6 +213,16 @@ static int symbol_valid(struct sym_entry
>>  	if (strstr((char *)s->sym + offset, "_compiled."))
>>  		return 0;
>>  
>> +	/* At least GNU ld 2.25 may emit bogus file symbols referencing a
>> +	 * section name while linking xen.efi. In COFF symbol tables the
>> +	 * "value" of file symbols is a link (symbol table index) to the next
>> +	 * file symbol. Since file (and other) symbols (can) come with one
>> +	 * (or in principle more) auxiliary symbol table entries, the value in
>> +	 * this heuristic is bounded to twice the number of symbols we have
>> +	 * found. See also read_symbol() as to the '?' checked for here. */
>> +	if (s->sym[0] == '?' && s->sym[1] == '.' && s->addr < table_cnt * 2)
> 
> Maybe a naive question, but couldn't you drop everything below
> __XEN_VIRT_START, as we shouldn't have any symbols below that
> address?

If we assumed so, then that might be an option. Such an assumption doesn't
look safe to me, though. See how e.g. hv_hcall_page is outside of the Xen
image (albeit still within __XEN_VIRT_{START,END}). I wouldn't want to
preclude architectures playing "interesting" games with symbols, in
particular when - unlike x86 - they have the entire VA space for their use.

Jan

Ping: [PATCH] symbols: discard stray file symbols
Posted by Jan Beulich 1 month ago
On 16.04.2025 11:00, Jan Beulich wrote:
> By observation GNU ld 2.25 may emit file symbols for .data.read_mostly
> when linking xen.efi. Due to the nature of file symbols in COFF symbol
> tables (see the code comment) the symbols_offsets[] entries for such
> symbols would cause assembler warnings regarding value truncation. Of
> course the resulting entries would also be both meaningless and useless.
> Add a heuristic to get rid of them, really taking effect only when
> --all-symbols is specified (otherwise these symbols are discarded
> anyway).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

May I please ask for feedback here, so that hopefully we can have this
sorted in 4.21?

Jan

> ---
> Factor 2 may in principle still be too small: We zap what looks like
> real file symbols already in read_symbol(), so table_cnt doesn't really
> reflect the number of symbol table entries encountered. It has proven to
> work for me in practice though, with still some leeway left.
> 
> --- a/xen/tools/symbols.c
> +++ b/xen/tools/symbols.c
> @@ -213,6 +213,16 @@ static int symbol_valid(struct sym_entry
>  	if (strstr((char *)s->sym + offset, "_compiled."))
>  		return 0;
>  
> +	/* At least GNU ld 2.25 may emit bogus file symbols referencing a
> +	 * section name while linking xen.efi. In COFF symbol tables the
> +	 * "value" of file symbols is a link (symbol table index) to the next
> +	 * file symbol. Since file (and other) symbols (can) come with one
> +	 * (or in principle more) auxiliary symbol table entries, the value in
> +	 * this heuristic is bounded to twice the number of symbols we have
> +	 * found. See also read_symbol() as to the '?' checked for here. */
> +	if (s->sym[0] == '?' && s->sym[1] == '.' && s->addr < table_cnt * 2)
> +		return 0;
> +
>  	return 1;
>  }
>
Re: Ping: [PATCH] symbols: discard stray file symbols
Posted by Oleksii Kurochko 1 month ago
On 9/25/25 9:36 AM, Jan Beulich wrote:
> On 16.04.2025 11:00, Jan Beulich wrote:
>> By observation GNU ld 2.25 may emit file symbols for .data.read_mostly
>> when linking xen.efi. Due to the nature of file symbols in COFF symbol
>> tables (see the code comment) the symbols_offsets[] entries for such
>> symbols would cause assembler warnings regarding value truncation. Of
>> course the resulting entries would also be both meaningless and useless.
>> Add a heuristic to get rid of them, really taking effect only when
>> --all-symbols is specified (otherwise these symbols are discarded
>> anyway).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> May I please ask for feedback here, so that hopefully we can have this
> sorted in 4.21?

It is okay for me to have this change in 4.21:
  Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

~ Oleksii

>
> Jan
>
>> ---
>> Factor 2 may in principle still be too small: We zap what looks like
>> real file symbols already in read_symbol(), so table_cnt doesn't really
>> reflect the number of symbol table entries encountered. It has proven to
>> work for me in practice though, with still some leeway left.
>>
>> --- a/xen/tools/symbols.c
>> +++ b/xen/tools/symbols.c
>> @@ -213,6 +213,16 @@ static int symbol_valid(struct sym_entry
>>   	if (strstr((char *)s->sym + offset, "_compiled."))
>>   		return 0;
>>   
>> +	/* At least GNU ld 2.25 may emit bogus file symbols referencing a
>> +	 * section name while linking xen.efi. In COFF symbol tables the
>> +	 * "value" of file symbols is a link (symbol table index) to the next
>> +	 * file symbol. Since file (and other) symbols (can) come with one
>> +	 * (or in principle more) auxiliary symbol table entries, the value in
>> +	 * this heuristic is bounded to twice the number of symbols we have
>> +	 * found. See also read_symbol() as to the '?' checked for here. */
>> +	if (s->sym[0] == '?' && s->sym[1] == '.' && s->addr < table_cnt * 2)
>> +		return 0;
>> +
>>   	return 1;
>>   }
>>   
Re: [PATCH] symbols: discard stray file symbols
Posted by Jason Andryuk 1 month, 3 weeks ago
On 2025-04-16 05:00, Jan Beulich wrote:
> By observation GNU ld 2.25 may emit file symbols for .data.read_mostly
> when linking xen.efi. Due to the nature of file symbols in COFF symbol
> tables (see the code comment) the symbols_offsets[] entries for such
> symbols would cause assembler warnings regarding value truncation. Of
> course the resulting entries would also be both meaningless and useless.
> Add a heuristic to get rid of them, really taking effect only when
> --all-symbols is specified (otherwise these symbols are discarded
> anyway).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Factor 2 may in principle still be too small: We zap what looks like
> real file symbols already in read_symbol(), so table_cnt doesn't really
> reflect the number of symbol table entries encountered. It has proven to
> work for me in practice though, with still some leeway left.
> 
> --- a/xen/tools/symbols.c
> +++ b/xen/tools/symbols.c
> @@ -213,6 +213,16 @@ static int symbol_valid(struct sym_entry
>   	if (strstr((char *)s->sym + offset, "_compiled."))
>   		return 0;
>   
> +	/* At least GNU ld 2.25 may emit bogus file symbols referencing a
> +	 * section name while linking xen.efi. In COFF symbol tables the
> +	 * "value" of file symbols is a link (symbol table index) to the next
> +	 * file symbol. Since file (and other) symbols (can) come with one
> +	 * (or in principle more) auxiliary symbol table entries, the value in
> +	 * this heuristic is bounded to twice the number of symbols we have
> +	 * found. See also read_symbol() as to the '?' checked for here. */
> +	if (s->sym[0] == '?' && s->sym[1] == '.' && s->addr < table_cnt * 2)
> +		return 0;
> +
>   	return 1;
>   }

I looked at this.  It'll drop symbols, but I don't know enough to give 
an R-b.  I can't give an actionable A-b either.   Maybe someone else can 
chime in.

Maybe this is just showing my lack of knowledge, but could any symbol 
starting "?." be considered invalid?  I don't think I've ever seen any 
like that.

Regards,
Jason
Re: [PATCH] symbols: discard stray file symbols
Posted by Jan Beulich 1 month, 3 weeks ago
On 04.09.2025 23:53, Jason Andryuk wrote:
> On 2025-04-16 05:00, Jan Beulich wrote:
>> By observation GNU ld 2.25 may emit file symbols for .data.read_mostly
>> when linking xen.efi. Due to the nature of file symbols in COFF symbol
>> tables (see the code comment) the symbols_offsets[] entries for such
>> symbols would cause assembler warnings regarding value truncation. Of
>> course the resulting entries would also be both meaningless and useless.
>> Add a heuristic to get rid of them, really taking effect only when
>> --all-symbols is specified (otherwise these symbols are discarded
>> anyway).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Factor 2 may in principle still be too small: We zap what looks like
>> real file symbols already in read_symbol(), so table_cnt doesn't really
>> reflect the number of symbol table entries encountered. It has proven to
>> work for me in practice though, with still some leeway left.
>>
>> --- a/xen/tools/symbols.c
>> +++ b/xen/tools/symbols.c
>> @@ -213,6 +213,16 @@ static int symbol_valid(struct sym_entry
>>   	if (strstr((char *)s->sym + offset, "_compiled."))
>>   		return 0;
>>   
>> +	/* At least GNU ld 2.25 may emit bogus file symbols referencing a
>> +	 * section name while linking xen.efi. In COFF symbol tables the
>> +	 * "value" of file symbols is a link (symbol table index) to the next
>> +	 * file symbol. Since file (and other) symbols (can) come with one
>> +	 * (or in principle more) auxiliary symbol table entries, the value in
>> +	 * this heuristic is bounded to twice the number of symbols we have
>> +	 * found. See also read_symbol() as to the '?' checked for here. */
>> +	if (s->sym[0] == '?' && s->sym[1] == '.' && s->addr < table_cnt * 2)
>> +		return 0;
>> +
>>   	return 1;
>>   }
> 
> I looked at this.  It'll drop symbols, but I don't know enough to give 
> an R-b.  I can't give an actionable A-b either.   Maybe someone else can 
> chime in.
> 
> Maybe this is just showing my lack of knowledge, but could any symbol 
> starting "?." be considered invalid?  I don't think I've ever seen any 
> like that.

With quotation, almost any symbol name can appear in principle. I wouldn't
want to judge symbol validity by its name. What's more important here,
though, is that sym[0] isn't part of the name; it's the symbol's type as
taken from nm's output. We're therefore heuristically looking at symbols
of unknown type with a dot as the first character (as section names would
conventionally have it).

Jan
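
To illustrate the point that sym[0] holds the type rather than the first character of the name, here is a rough sketch of how one line of nm output ("address type name") could end up in such a buffer. The struct and function names (parsed_sym, parse_nm_line) and the buffer sizes are assumptions made for this example; the actual read_symbol() in xen/tools/symbols.c does more than this and may differ in detail.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container: type character first, then the symbol name. */
struct parsed_sym {
    unsigned long long addr;
    char sym[128];
};

/*
 * Parse a single "address type name" line as printed by nm.
 * Returns 0 on success, -1 if the line does not have three fields.
 */
static int parse_nm_line(const char *line, struct parsed_sym *out)
{
    char type;
    char name[127];

    if (sscanf(line, "%llx %c %126s", &out->addr, &type, name) != 3)
        return -1;

    out->sym[0] = type;          /* nm's type character, e.g. 'T' or '?' */
    strcpy(out->sym + 1, name);  /* the name proper starts at sym[1] */
    return 0;
}
```

With an input line such as "0000000000001234 ? .data.read_mostly", sym[0] becomes '?' (nm's marker for an unknown type) and sym[1] becomes '.', which is exactly the pair of characters the heuristic inspects.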