When determining the symbol for a given address (e.g. for the %pS
logging format specifier), so far the size of a symbol (function) was
assumed to extend all the way to the next symbol. There may be gaps,
though, and these should be recognizable in the output, as they often
suggest something odd is going on.

Insert "fake" end symbols in the address table, accompanied by zero-
length type/name entries (to keep lookup reasonably close to how it
was).

Note however that this, with present GNU binutils, won't work for
xen.efi: The linker loses function sizes (they're not part of a normal
symbol table entry), and hence nm has no way of reporting them.

The address table growth is quite significant on x86 release builds,
though: Its size almost doubles. An end symbol is only inserted where a
symbol's end doesn't coincide with the next symbol's start, and with
functions aligned to 16-byte boundaries the padding leaves such a gap
after most of them.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Note: Style-wise this is a horrible mix. I'm trying to match styles with
what's used in the respective functions.

Older GNU ld retains section symbols, which nm then also lists. Should
we perhaps strip those as we read in nm's output? They don't provide any
useful extra information, as our linker scripts add section start
symbols anyway. (For the purposes here, luckily such section symbols are
at least emitted without size.)
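
The note above asks whether section symbols should be dropped while reading nm's output. As a hedged sketch only, the following models the patched fmt_sysv branch of read_symbol() on a single line held in a string: it uses the same sscanf() conversions, plus %n, because sscanf() on a string does not advance between calls the way fscanf() on a stream does. struct parsed_sym, parse_sysv_line() and is_section_symbol() are hypothetical names, the sample lines in the test are assumed layouts of sysv-format nm output rather than captured output, and the "no type at all means section symbol" rule is taken from the ELF discussion later in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of what read_symbol() extracts in the fmt_sysv case. */
struct parsed_sym {
    char name[500];          /* matches the %499[...] width */
    unsigned long long addr;
    char stype;              /* nm's one-letter class, e.g. 'T' or 't' */
    char type[20];           /* matches the %19[...] width; "" if absent */
    unsigned long size;      /* 0 if absent */
};

/*
 * Parse one sysv-format nm line with the patch's conversions. Returns 0 on
 * success, -1 if the name/address/class fields could not be read.
 * %n records how far each step got, since sscanf() on a string does not
 * advance between calls the way fscanf() on a stream does.
 */
static int parse_sysv_line(const char *line, struct parsed_sym *s)
{
    int pos = 0, n = 0;

    assert(line && s);
    memset(s, 0, sizeof(*s));

    if (sscanf(line, "%499[^ |] |%llx | %c |%n",
               s->name, &s->addr, &s->stype, &pos) != 3)
        return -1;
    if (!pos)                /* line ended before the class column's '|' */
        return 0;

    /* No type at all (e.g. ELF section symbols): leave type and size empty. */
    if (sscanf(line + pos, " %19[^ |] |%n", s->type, &n) != 1) {
        s->type[0] = '\0';
        return 0;
    }

    /* Size column, as added by the patch; stays 0 if it can't be read. */
    if (n && sscanf(line + pos + n, "%lx |", &s->size) != 1)
        s->size = 0;

    return 0;
}

/* Hypothetical filter: treat entries without any type as section symbols. */
static int is_section_symbol(const struct parsed_sym *s)
{
    return s->type[0] == '\0';
}
```

Dropping entries for which is_section_symbol() is true would be one way to strip section symbols on input; section start symbols added by the linker scripts would still need a separate, name-based list.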
--- a/xen/common/symbols.c
+++ b/xen/common/symbols.c
@@ -116,6 +116,13 @@ const char *symbols_lookup(unsigned long
else high = mid;
}
+ /* If we hit an END symbol, move to the previous (real) one. */
+ if (!symbols_names[get_symbol_offset(low)]) {
+ ASSERT(low);
+ symbol_end = symbols_address(low);
+ --low;
+ }
+
/* search for the first aliased symbol. Aliased symbols are
symbols with the same address */
while (low && symbols_address(low - 1) == symbols_address(low))
@@ -124,11 +131,13 @@ const char *symbols_lookup(unsigned long
/* Grab name */
symbols_expand_symbol(get_symbol_offset(low), namebuf);
- /* Search for next non-aliased symbol */
- for (i = low + 1; i < symbols_num_addrs; i++) {
- if (symbols_address(i) > symbols_address(low)) {
- symbol_end = symbols_address(i);
- break;
+ if (!symbol_end) {
+ /* Search for next non-aliased symbol */
+ for (i = low + 1; i < symbols_num_addrs; i++) {
+ if (symbols_address(i) > symbols_address(low)) {
+ symbol_end = symbols_address(i);
+ break;
+ }
}
}
@@ -170,6 +179,7 @@ int xensyms_read(uint32_t *symnum, char
return -ERANGE;
if ( *symnum == symbols_num_addrs )
{
+ no_symbol:
/* No more symbols */
name[0] = '\0';
return 0;
@@ -183,10 +193,31 @@ int xensyms_read(uint32_t *symnum, char
/* Non-sequential access */
next_offset = get_symbol_offset(*symnum);
+ /*
+ * If we're at an END symbol, skip to the next (real) one. This can
+ * happen if the caller ignores the *symnum output from an earlier
+ * iteration (Linux'es /proc/xen/xensyms handling does as of 6.14-rc).
+ */
+ if ( !symbols_names[next_offset] )
+ {
+ ++next_offset;
+ if ( ++*symnum == symbols_num_addrs )
+ goto no_symbol;
+ }
+
*type = symbols_get_symbol_type(next_offset);
next_offset = symbols_expand_symbol(next_offset, name);
*address = symbols_address(*symnum);
+ /* If next one is an END symbol, skip it. */
+ if ( !symbols_names[next_offset] )
+ {
+ ++next_offset;
+ /* Make sure not to increment past symbols_num_addrs below. */
+ if ( *symnum + 1 < symbols_num_addrs )
+ ++*symnum;
+ }
+
next_symbol = ++*symnum;
spin_unlock(&symbols_mutex);
--- a/xen/tools/symbols.c
+++ b/xen/tools/symbols.c
@@ -38,6 +38,7 @@
struct sym_entry {
unsigned long long addr;
+ unsigned long size;
unsigned int len;
unsigned char *sym;
char *orig_symbol;
@@ -87,6 +88,8 @@ static int read_symbol(FILE *in, struct
static char *filename;
int rc = -1;
+ s->size = 0;
+
switch (input_format) {
case fmt_bsd:
rc = fscanf(in, "%llx %c %499s\n", &s->addr, &stype, str);
@@ -96,8 +99,12 @@ static int read_symbol(FILE *in, struct
/* nothing */;
rc = fscanf(in, "%499[^ |] |%llx | %c |",
str, &s->addr, &stype);
- if (rc == 3 && fscanf(in, " %19[^ |] |", type) != 1)
- *type = '\0';
+ if (rc == 3) {
+ if (fscanf(in, " %19[^ |] |", type) != 1)
+ *type = '\0';
+ else if (fscanf(in, "%lx |", &s->size) != 1)
+ s->size = 0;
+ }
break;
}
if (rc != 3) {
@@ -287,9 +294,18 @@ static int compare_name_orig(const void
return rc;
}
+/* Determine whether the symbol at address table @idx wants a fake END
+ * symbol (address only) emitted as well. */
+static bool want_symbol_end(unsigned int idx)
+{
+ return table[idx].size &&
+ (idx + 1 == table_cnt ||
+ table[idx].addr + table[idx].size < table[idx + 1].addr);
+}
+
static void write_src(void)
{
- unsigned int i, k, off;
+ unsigned int i, k, off, ends;
unsigned int best_idx[256];
unsigned int *markers;
char buf[KSYM_NAME_LEN+1];
@@ -318,24 +334,32 @@ static void write_src(void)
printf("#else\n");
output_label("symbols_offsets");
printf("#endif\n");
- for (i = 0; i < table_cnt; i++) {
+ for (i = 0, ends = 0; i < table_cnt; i++) {
printf("\tPTR\t%#llx - SYMBOLS_ORIGIN\n", table[i].addr);
+
+ table[i].addr_idx = i + ends;
+
+ if (!want_symbol_end(i))
+ continue;
+
+ ++ends;
+ printf("\tPTR\t%#llx - SYMBOLS_ORIGIN\n",
+ table[i].addr + table[i].size);
}
printf("\n");
output_label("symbols_num_addrs");
- printf("\t.long\t%d\n", table_cnt);
+ printf("\t.long\t%d\n", table_cnt + ends);
printf("\n");
/* table of offset markers, that give the offset in the compressed stream
* every 256 symbols */
- markers = (unsigned int *) malloc(sizeof(unsigned int) * ((table_cnt + 255) / 256));
+ markers = malloc(sizeof(*markers) * ((table_cnt + ends + 255) >> 8));
output_label("symbols_names");
- off = 0;
- for (i = 0; i < table_cnt; i++) {
- if ((i & 0xFF) == 0)
- markers[i >> 8] = off;
+ for (i = 0, off = 0, ends = 0; i < table_cnt; i++) {
+ if (((i + ends) & 0xFF) == 0)
+ markers[(i + ends) >> 8] = off;
printf("\t.byte 0x%02x", table[i].len);
for (k = 0; k < table[i].len; k++)
@@ -344,11 +368,22 @@ static void write_src(void)
table[i].stream_offset = off;
off += table[i].len + 1;
+
+ if (!want_symbol_end(i))
+ continue;
+
+ /* END symbols have no name or type. */
+ ++ends;
+ if (((i + ends) & 0xFF) == 0)
+ markers[(i + ends) >> 8] = off;
+
+ printf("\t.byte 0\n");
+ ++off;
}
printf("\n");
output_label("symbols_markers");
- for (i = 0; i < ((table_cnt + 255) >> 8); i++)
+ for (i = 0; i < ((table_cnt + ends + 255) >> 8); i++)
printf("\t.long\t%d\n", markers[i]);
printf("\n");
@@ -450,7 +485,6 @@ static void compress_symbols(unsigned ch
len = table[i].len;
p1 = table[i].sym;
- table[i].addr_idx = i;
/* find the token on the symbol */
p2 = memmem_pvt(p1, len, str, 2);
if (!p2) continue;
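
To make the END-entry handling in the symbols_lookup() hunk above easier to follow, here is a standalone model of the patched lookup over a hypothetical flattened table, in which an entry with an empty name stands for a fake END symbol. model_lookup(), struct entry and the fallback_end parameter (a stand-in for whatever end the real code uses when no later symbol exists) are illustrative assumptions, not Xen interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened table: an empty name marks a fake END entry. */
struct entry {
    unsigned long addr;
    const char *name;
};

static const char *model_lookup(const struct entry *tab, size_t n,
                                unsigned long addr, unsigned long fallback_end,
                                unsigned long *size, unsigned long *offset)
{
    size_t low = 0, high = n, i;
    unsigned long end = 0;

    if (!n || addr < tab[0].addr)
        return NULL;

    /* Binary search for the last entry with tab[low].addr <= addr. */
    while (high - low > 1) {
        size_t mid = low + (high - low) / 2;

        if (tab[mid].addr <= addr)
            low = mid;
        else
            high = mid;
    }

    /* Landed on an END entry: it gives the end of the previous symbol. */
    if (*tab[low].name == '\0') {
        assert(low);
        end = tab[low].addr;
        --low;
    }

    /* Step back to the first of several aliases at the same address. */
    while (low && tab[low - 1].addr == tab[low].addr)
        --low;

    /* No END entry hit: the symbol runs up to the next higher address. */
    if (!end) {
        for (i = low + 1; i < n; i++) {
            if (tab[i].addr > tab[low].addr) {
                end = tab[i].addr;
                break;
            }
        }
    }
    if (!end)
        end = fallback_end;

    *size = end - tab[low].addr;
    *offset = addr - tab[low].addr;
    return tab[low].name;
}
```

With a table of foo@0x1000, END@0x100d, bar@0x1010, baz@0x1020 and END@0x1028, an address inside the gap after foo (0x100e) resolves to foo with an offset (0xe) larger than its size (0xd). That offset-beyond-size result is what makes such gaps recognizable in symbol+offset/size style output, while addresses inside a symbol still use the next entry (real or END) as the end.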
On 13/03/2025 1:54 pm, Jan Beulich wrote:
> When determining the symbol for a given address (e.g. for the %pS
> logging format specifier), so far the size of a symbol (function) was
> assumed to be everything until the next symbol. There may be gaps
> though, which would better be recognizable in output (often suggesting
> something odd is going on).

Do you have an example %pS for this new case?

> Insert "fake" end symbols in the address table, accompanied by zero-
> length type/name entries (to keep lookup reasonably close to how it
> was).
>
> Note however that this, with present GNU binutils, won't work for
> xen.efi: The linker loses function sizes (they're not part of a normal
> symbol table entry), and hence nm has no way of reporting them.

By "present GNU binutils", does this mean that you've got a fix in mind
(or in progress), or that it's an open problem to be solved?

> The address table growth is quite significant on x86 release builds (due
> to functions being aligned to 16-byte boundaries), though: Its size
> almost doubles.

Why does the function alignment affect the growth?

> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Note: Style-wise this is a horrible mix. I'm trying to match styles with
> what's used in the respective functions.
>
> Older GNU ld retains section symbols, which nm then also lists. Should
> we perhaps strip those as we read in nm's output? They don't provide any
> useful extra information, as our linker scripts add section start
> symbols anyway. (For the purposes here, luckily such section symbols are
> at least emitted without size.)

Will symbols_lookup() ever produce these? If not, it might be better to
ignore the problem.

Taking extra logic to work around a benign issue in older toolchains
isn't necessarily ideal.

~Andrew
On 13.03.2025 17:39, Andrew Cooper wrote:
> On 13/03/2025 1:54 pm, Jan Beulich wrote:
>> When determining the symbol for a given address (e.g. for the %pS
>> logging format specifier), so far the size of a symbol (function) was
>> assumed to be everything until the next symbol. There may be gaps
>> though, which would better be recognizable in output (often suggesting
>> something odd is going on).
>
> Do you have an example %pS for this new case?

I haven't encountered one yet, and I wasn't particularly trying to
make up one.

>> Insert "fake" end symbols in the address table, accompanied by zero-
>> length type/name entries (to keep lookup reasonably close to how it
>> was).
>>
>> Note however that this, with present GNU binutils, won't work for
>> xen.efi: The linker loses function sizes (they're not part of a normal
>> symbol table entry), and hence nm has no way of reporting them.
>
> By "present GNU binutils", does this mean that you've got a fix in mind
> (or in progress), or that it's an open problem to be solved?

The latter; I can't even tell yet whether this is legitimate to be
arranged for in a PE executable's symbol table.

>> The address table growth is quite significant on x86 release builds (due
>> to functions being aligned to 16-byte boundaries), though: Its size
>> almost doubles.
>
> Why does the function alignment affect the growth?

I only insert fake end symbols when the following symbol doesn't match
the prior one's end. Hence with minimal alignment (and thus no gaps)
there wouldn't be any "end" symbols at all.

>> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Note: Style-wise this is a horrible mix. I'm trying to match styles with
>> what's used in the respective functions.
>>
>> Older GNU ld retains section symbols, which nm then also lists. Should
>> we perhaps strip those as we read in nm's output? They don't provide any
>> useful extra information, as our linker scripts add section start
>> symbols anyway. (For the purposes here, luckily such section symbols are
>> at least emitted without size.)
>
> Will symbols_lookup() ever produce these? If not, it might be better to
> ignore the problem.
>
> Taking extra logic to work around a benign issue in older toolchains
> isn't necessarily ideal.

Afaict it's unpredictable from Xen's pov. All depends on the order of
entries after we sorted the table by address. The only criteria the
tool's compare_value() applies for multiple symbols at the same address
is to prefer global over local. As long as the first symbol in a section
is global, we wouldn't see section symbols as lookup result.

Jan
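
Jan's answer on the growth can be checked with a small model of the patch's want_symbol_end(): only sized symbols whose end falls short of the next symbol's start (or the last symbol in the table) get an END entry, and the count of symbols_markers[] slots, one per 256 entries, grows with the combined total. The helper names (wants_end(), count_entries(), count_markers(), layout()) and the function sizes used in the test are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sym {
    unsigned long long addr;
    unsigned long size;
};

/* Same predicate as the patch's want_symbol_end(), over a plain array. */
static bool wants_end(const struct sym *t, size_t cnt, size_t idx)
{
    return t[idx].size &&
           (idx + 1 == cnt || t[idx].addr + t[idx].size < t[idx + 1].addr);
}

/* Total address table entries once END entries are added. */
static size_t count_entries(const struct sym *t, size_t cnt)
{
    size_t i, ends = 0;

    for (i = 0; i < cnt; i++) {
        /* The table is expected to be sorted by address already. */
        assert(i + 1 == cnt || t[i].addr <= t[i + 1].addr);
        ends += wants_end(t, cnt, i);
    }
    return cnt + ends;
}

/* Number of symbols_markers[] slots: one per 256 table entries. */
static size_t count_markers(size_t entries)
{
    return (entries + 255) >> 8;
}

/* Hypothetical layout helper: place functions back to back, each
 * rounded up to 'align' (a power of two). */
static void layout(const unsigned long *sizes, size_t n, unsigned long align,
                   unsigned long long base, struct sym *out)
{
    unsigned long long addr = base;

    for (size_t i = 0; i < n; i++) {
        addr = (addr + align - 1) & ~(unsigned long long)(align - 1);
        out[i].addr = addr;
        out[i].size = sizes[i];
        addr += sizes[i];
    }
}
```

With the made-up sizes 13, 7, 16, 40 and 3 placed back to back, only the final symbol gets an END entry (6 entries in total). With 16-byte alignment, three of the interior gaps need one as well, giving 9 entries. This illustrates why the growth is driven by alignment padding rather than by the number of symbols alone.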
On 13/03/2025 4:48 pm, Jan Beulich wrote:
> On 13.03.2025 17:39, Andrew Cooper wrote:
>> On 13/03/2025 1:54 pm, Jan Beulich wrote:
>>> When determining the symbol for a given address (e.g. for the %pS
>>> logging format specifier), so far the size of a symbol (function) was
>>> assumed to be everything until the next symbol. There may be gaps
>>> though, which would better be recognizable in output (often suggesting
>>> something odd is going on).
>> Do you have an example %pS for this new case?
> I haven't encountered one yet, and I wasn't particularly trying to
> make up one.
>
>>> Insert "fake" end symbols in the address table, accompanied by zero-
>>> length type/name entries (to keep lookup reasonably close to how it
>>> was).
>>>
>>> Note however that this, with present GNU binutils, won't work for
>>> xen.efi: The linker loses function sizes (they're not part of a normal
>>> symbol table entry), and hence nm has no way of reporting them.
>> By "present GNU binutils", does this mean that you've got a fix in mind
>> (or in progress), or that it's an open problem to be solved?
> The latter; I can't even tell yet whether this is legitimate to be
> arranged for in a PE executable's symbol table.

In which case, I'd suggest using the phrase "open problem" to make it
clear that there's no fix.

>
>>> Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> Note: Style-wise this is a horrible mix. I'm trying to match styles with
>>> what's used in the respective functions.
>>>
>>> Older GNU ld retains section symbols, which nm then also lists. Should
>>> we perhaps strip those as we read in nm's output? They don't provide any
>>> useful extra information, as our linker scripts add section start
>>> symbols anyway. (For the purposes here, luckily such section symbols are
>>> at least emitted without size.)
>> Will symbols_lookup() ever produce these? If not, it might be better to
>> ignore the problem.
>>
>> Taking extra logic to work around a benign issue in older toolchains
>> isn't necessarily ideal.
> Afaict it's unpredictable from Xen's pov. All depends on the order of
> entries after we sorted the table by address. The only criteria the
> tool's compare_value() applies for multiple symbols at the same address
> is to prefer global over local. As long as the first symbol in a section
> is global, we wouldn't see section symbols as lookup result.

Hmm, thinking about it, the global-ness does cause problems.

e.g. we get _stextentry()+x rather than restore_all_guest()+x, and RAG
is more likely than some to show up in a backtrace.

So maybe we should strip section symbols, even the explicit linker ones,
from the symbol table. I can't offhand think of a case where we want to
look up a symbol by address and get back a section name.

(Feel free to leave this as a todo. I wasn't intending to scope creep
like this, but it would be a nice to have.)

~Andrew
On 13.03.2025 18:13, Andrew Cooper wrote:
> On 13/03/2025 4:48 pm, Jan Beulich wrote:
>> On 13.03.2025 17:39, Andrew Cooper wrote:
>>> On 13/03/2025 1:54 pm, Jan Beulich wrote:
>>>> When determining the symbol for a given address (e.g. for the %pS
>>>> logging format specifier), so far the size of a symbol (function) was
>>>> assumed to be everything until the next symbol. There may be gaps
>>>> though, which would better be recognizable in output (often suggesting
>>>> something odd is going on).
>>> Do you have an example %pS for this new case?
>> I haven't encountered one yet, and I wasn't particularly trying to
>> make up one.
>>
>>>> Insert "fake" end symbols in the address table, accompanied by zero-
>>>> length type/name entries (to keep lookup reasonably close to how it
>>>> was).
>>>>
>>>> Note however that this, with present GNU binutils, won't work for
>>>> xen.efi: The linker loses function sizes (they're not part of a normal
>>>> symbol table entry), and hence nm has no way of reporting them.
>>> By "present GNU binutils", does this mean that you've got a fix in mind
>>> (or in progress), or that it's an open problem to be solved?
>> The latter; I can't even tell yet whether this is legitimate to be
>> arranged for in a PE executable's symbol table.
>
> In which case, I'd suggest using the phrase "open problem" to make it
> clear that there's no fix.

I'd like to leave it as is; right here it's not overly important what
state the binutils side is. Furthermore, by the time this goes in the
binutils side may have changed state already (e.g. from "open problem"
to "fix in progress").

>>>> Older GNU ld retains section symbols, which nm then also lists. Should
>>>> we perhaps strip those as we read in nm's output? They don't provide any
>>>> useful extra information, as our linker scripts add section start
>>>> symbols anyway. (For the purposes here, luckily such section symbols are
>>>> at least emitted without size.)
>>> Will symbols_lookup() ever produce these? If not, it might be better to
>>> ignore the problem.
>>>
>>> Taking extra logic to work around a benign issue in older toolchains
>>> isn't necessarily ideal.
>> Afaict it's unpredictable from Xen's pov. All depends on the order of
>> entries after we sorted the table by address. The only criteria the
>> tool's compare_value() applies for multiple symbols at the same address
>> is to prefer global over local. As long as the first symbol in a section
>> is global, we wouldn't see section symbols as lookup result.
>
> Hmm, thinking about it, the global-ness does cause problems.
>
> e.g. we get _stextentry()+x rather than restore_all_guest()+x, and RAG
> is more likely than some to show up in a backtrace.
>
> So maybe we should strip section symbols, even the explicit linker ones,
> from the symbol table.

So one thing we could do is to prefer FUNC/OBJECT symbols over NOTYPE
ones, and only use global-ness as a last resort criteria. But "prefer"
!= "strip" in any event.

Stripping section symbols is reasonably easy for ELF, as rather than
being NOTYPE they have no type at all. Stripping section start symbols,
otoh, can only be done by name, and hence we'd need to maintain a list
of them in the symbols tool. Not overly nice, but doable of course.

An intrusive - to the symbol table - alternative may be to simply strip
all NOTYPE symbols. Yet that would take as a prereq marking quite a few
more as FUNC or OBJECT.

> I can't offhand think of a case where we want to
> look up a symbol by address and get back a section name.

We also need to keep in mind the opposite (lookup by name) for
livepatch. I for one have no idea how (un)likely it might be for there
to be a need to lookup a section symbol (then we'd be in trouble with
newer binutils) or a section start symbol.

> (Feel free to leave this as a todo. I wasn't intending to scope creep
> like this, but it would be a nice to have.)

If we can agree on what behavior we want, I can see about adding
further patches to the series.

Jan
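
One way to express the ordering Jan suggests (FUNC/OBJECT ahead of NOTYPE, with global-ness only as a later tie-break) is a qsort() comparator along these lines. This is a sketch under assumptions: struct vsym, enum sym_kind, compare_value_typed() and the final name tie-break are hypothetical, the real tool's compare_value() works on its own sym_entry type, and the kinds and bindings assigned to _stextentry and restore_all_guest in the test are assumed for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* KIND_NONE models entries with no type at all (e.g. ELF section symbols). */
enum sym_kind { KIND_NONE, KIND_NOTYPE, KIND_OBJECT, KIND_FUNC };

struct vsym {
    unsigned long long addr;
    enum sym_kind kind;
    bool global;
    const char *name;
};

static bool is_typed(const struct vsym *s)
{
    return s->kind == KIND_FUNC || s->kind == KIND_OBJECT;
}

static int compare_value_typed(const void *p1, const void *p2)
{
    const struct vsym *a = p1, *b = p2;

    if (a->addr != b->addr)
        return a->addr < b->addr ? -1 : 1;
    /* At equal addresses, FUNC/OBJECT win over NOTYPE and untyped ones. */
    if (is_typed(a) != is_typed(b))
        return is_typed(a) ? -1 : 1;
    /* Only then fall back to preferring global over local. */
    if (a->global != b->global)
        return a->global ? -1 : 1;
    /* Hypothetical final tie-break by name, for a deterministic order. */
    return strcmp(a->name, b->name);
}
```

Preferring rather than stripping keeps every name in the table, which matters for the lookup-by-name (livepatch) concern above. Address lookups, which walk back to the first of several aliases at one address, would then report the typed symbol.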