By observation GNU ld 2.25 may emit file symbols for .data.read_mostly
when linking xen.efi. Due to the nature of file symbols in COFF symbol
tables (see the code comment) the symbols_offsets[] entries for such
symbols would cause assembler warnings regarding value truncation. Of
course the resulting entries would also be both meaningless and useless.
Add a heuristic to get rid of them, really taking effect only when
--all-symbols is specified (otherwise these symbols are discarded
anyway).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Factor 2 may in principle still be too small: We zap what looks like
real file symbols already in read_symbol(), so table_cnt doesn't really
reflect the number of symbol table entries encountered. It has proven to
work for me in practice though, with still some leeway left.
--- a/xen/tools/symbols.c
+++ b/xen/tools/symbols.c
@@ -213,6 +213,16 @@ static int symbol_valid(struct sym_entry
 	if (strstr((char *)s->sym + offset, "_compiled."))
 		return 0;
 
+	/* At least GNU ld 2.25 may emit bogus file symbols referencing a
+	 * section name while linking xen.efi. In COFF symbol tables the
+	 * "value" of file symbols is a link (symbol table index) to the next
+	 * file symbol. Since file (and other) symbols (can) come with one
+	 * (or in principle more) auxiliary symbol table entries, the value in
+	 * this heuristic is bounded to twice the number of symbols we have
+	 * found. See also read_symbol() as to the '?' checked for here. */
+	if (s->sym[0] == '?' && s->sym[1] == '.' && s->addr < table_cnt * 2)
+		return 0;
+
 	return 1;
 }

On Wed, Apr 16, 2025 at 11:00:57AM +0200, Jan Beulich wrote:
> By observation GNU ld 2.25 may emit file symbols for .data.read_mostly
> when linking xen.efi. Due to the nature of file symbols in COFF symbol
> tables (see the code comment) the symbols_offsets[] entries for such
> symbols would cause assembler warnings regarding value truncation. Of
> course the resulting entries would also be both meaningless and useless.
> Add a heuristic to get rid of them, really taking effect only when
> --all-symbols is specified (otherwise these symbols are discarded
> anyway).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

> [...]
> +	if (s->sym[0] == '?' && s->sym[1] == '.' && s->addr < table_cnt * 2)

Maybe a naive question, but couldn't you drop everything below
__XEN_VIRT_START, as we shouldn't have any symbols below that
address?

Thanks, Roger.
On 21.10.2025 11:56, Roger Pau Monné wrote:
> On Wed, Apr 16, 2025 at 11:00:57AM +0200, Jan Beulich wrote:
>> By observation GNU ld 2.25 may emit file symbols for .data.read_mostly
>> when linking xen.efi. Due to the nature of file symbols in COFF symbol
>> tables (see the code comment) the symbols_offsets[] entries for such
>> symbols would cause assembler warnings regarding value truncation. Of
>> course the resulting entries would also be both meaningless and useless.
>> Add a heuristic to get rid of them, really taking effect only when
>> --all-symbols is specified (otherwise these symbols are discarded
>> anyway).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> --- a/xen/tools/symbols.c
>> +++ b/xen/tools/symbols.c
>> @@ -213,6 +213,16 @@ static int symbol_valid(struct sym_entry
>>  	if (strstr((char *)s->sym + offset, "_compiled."))
>>  		return 0;
>>  
>> +	/* At least GNU ld 2.25 may emit bogus file symbols referencing a
>> +	 * section name while linking xen.efi. In COFF symbol tables the
>> +	 * "value" of file symbols is a link (symbol table index) to the next
>> +	 * file symbol. Since file (and other) symbols (can) come with one
>> +	 * (or in principle more) auxiliary symbol table entries, the value in
>> +	 * this heuristic is bounded to twice the number of symbols we have
>> +	 * found. See also read_symbol() as to the '?' checked for here. */
>> +	if (s->sym[0] == '?' && s->sym[1] == '.' && s->addr < table_cnt * 2)
> 
> Maybe a naive question, but couldn't you drop everything below
> __XEN_VIRT_START, as we shouldn't have any symbols below that
> address?

If we assumed so, then that might be an option. Such an assumption doesn't
look safe to me, though. See how e.g. hv_hcall_page is outside of the Xen
image (albeit still within __XEN_VIRT_{START,END}). I wouldn't want to
preclude architectures playing "interesting" games with symbols, in
particular when - unlike x86 - they have the entire VA space for their use.

Jan
On 16.04.2025 11:00, Jan Beulich wrote:
> By observation GNU ld 2.25 may emit file symbols for .data.read_mostly
> when linking xen.efi. [...]
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

May I please ask for feedback here, so that hopefully we can have this
sorted in 4.21?

Jan
On 9/25/25 9:36 AM, Jan Beulich wrote:
> On 16.04.2025 11:00, Jan Beulich wrote:
>> By observation GNU ld 2.25 may emit file symbols for .data.read_mostly
>> when linking xen.efi. [...]
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> May I please ask for feedback here, so that hopefully we can have this
> sorted in 4.21?

It is okay for me to have this change in 4.21:
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

~ Oleksii
On 2025-04-16 05:00, Jan Beulich wrote:
> By observation GNU ld 2.25 may emit file symbols for .data.read_mostly
> when linking xen.efi. [...]
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> [...]
> +	if (s->sym[0] == '?' && s->sym[1] == '.' && s->addr < table_cnt * 2)
> +		return 0;
> +
>  	return 1;
>  }

I looked at this. It'll drop symbols, but I don't know enough to give
an R-b. I can't give an actionable A-b either. Maybe someone else can
chime in.

Maybe this is just showing my lack of knowledge, but could any symbol
starting "?." be considered invalid? I don't think I've ever seen any
like that.

Regards,
Jason
On 04.09.2025 23:53, Jason Andryuk wrote:
> On 2025-04-16 05:00, Jan Beulich wrote:
>> [...]
>> +	if (s->sym[0] == '?' && s->sym[1] == '.' && s->addr < table_cnt * 2)
>
> I looked at this. It'll drop symbols, but I don't know enough to give
> an R-b. I can't give an actionable A-b either. Maybe someone else can
> chime in.
>
> Maybe this is just showing my lack of knowledge, but could any symbol
> starting "?." be considered invalid? I don't think I've ever seen any
> like that.

With quotation, almost any symbol name can appear in principle. I
wouldn't want to judge symbol validity by its name. What's more
important here, though, is that sym[0] isn't part of the name; it's the
symbol's type as taken from nm's output. We're therefore heuristically
looking at symbols of unknown type with a dot as the first character
(as section names would conventionally have it).

Jan