From: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf tools annotation code used for a long time parsing the output
of binutils's objdump (or its reimplementations, like llvm's) to then
parse and augment it with samples, allow navigation, etc.
More recently disassemblers from the capstone and llvm (libraries, not
parsing the output of tools using those libraries to mimic binutils's
objdump output) were introduced.
So when all those methods are available, there is a static preference
for a series of attempts of disassembling a binary, with the 'llvm,
capstone, objdump' sequence being hard coded.
This patch allows users to change that sequence, specifying via a 'perf
config' 'annotate.disassemblers' entry which and in what order
disassemblers should be attempted.
As alluded to in the comments in the source code of this series, this
flexibility is useful for users and developers alike, elliminating the
requirement to rebuild the tool with some specific set of libraries to
see how the output of disassembling would be for one of these methods.
root@x1:~# rm -f ~/.perfconfig
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
symbol__disassemble:
filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux,
sym=update_load_avg, start=0xffffffffb6148fe0, en>
annotating [0x6ff7170]
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux :
[0x7407ca0] update_load_avg
Disassembled with llvm
annotate.disassemblers=llvm,capstone,objdump
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = capstone
root@x1:~#
root@x1:~# perf annotate -v --stdio2 update_load_avg
<SNIP>
Disassembled with capstone
annotate.disassemblers=capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent 0xffffffff81148fe0 <update_load_avg>:
1.61 pushq %r15
pushq %r14
1.00 pushq %r13
movl %edx,%r13d
1.90 pushq %r12
pushq %rbp
movq %rsi,%rbp
pushq %rbx
movq %rdi,%rbx
subq $0x18,%rsp
15.14 movl 0x1a4(%rdi),%eax
root@x1:~# perf config annotate.disassemblers=objdump,capstone
root@x1:~# perf config annotate.disassemblers
annotate.disassemblers=objdump,capstone
root@x1:~# cat ~/.perfconfig
# this file is auto-generated.
[annotate]
disassemblers = objdump,capstone
root@x1:~# perf annotate -v --stdio2 update_load_avg
Executing: objdump --start-address=0xffffffff81148fe0 \
--stop-address=0xffffffff811497aa \
-d --no-show-raw-insn -S -C "$1"
Disassembled with objdump
annotate.disassemblers=objdump,capstone
Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz,
Event count (approx.): 5185444, [percent: local period]
update_load_avg()
/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux
Percent
Disassembly of section .text:
ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
ffffffff81148fe0 <update_load_avg>:
#define DO_ATTACH 0x4
#define DO_DETACH 0x8
/* Update task and its cfs_rq load average */
static inline void update_load_avg(struct cfs_rq *cfs_rq,
struct sched_entity *se,
int flags)
{
1.61 push %r15
push %r14
1.00 push %r13
mov %edx,%r13d
1.90 push %r12
push %rbp
mov %rsi,%rbp
push %rbx
mov %rdi,%rbx
sub $0x18,%rsp
}
/* rq->task_clock normalized against any time
this cfs_rq has spent throttled */
static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq)
{
if (unlikely(cfs_rq->throttle_count))
15.14 mov 0x1a4(%rdi),%eax
root@x1:~#
After adding a way to select the disassembler from the command line a
'perf test' comparing the output of the various diassemblers should be
introduced, to test these codebases.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/Documentation/perf-config.txt | 13 ++++
tools/perf/util/annotate.c | 6 ++
tools/perf/util/annotate.h | 6 ++
tools/perf/util/disasm.c | 77 ++++++++++++++++++++++--
4 files changed, 96 insertions(+), 6 deletions(-)
diff --git a/tools/perf/Documentation/perf-config.txt b/tools/perf/Documentation/perf-config.txt
index 379f9d7a8ab11a02..1f668d4724e3749a 100644
--- a/tools/perf/Documentation/perf-config.txt
+++ b/tools/perf/Documentation/perf-config.txt
@@ -247,6 +247,19 @@ annotate.*::
These are in control of addresses, jump function, source code
in lines of assembly code from a specific program.
+ annotate.disassemblers::
+ Choose the disassembler to use: "objdump", "llvm", "capstone",
+ if not specified it will first try, if available, the "llvm" one,
+ then, if it fails, "capstone", and finally the original "objdump"
+ based one.
+
+ Choosing a different one is useful when handling some feature that
+ is known to be best support at some point by one of the options,
+ to compare the output when in doubt about some bug, etc.
+
+ This can be a list, in order of preference, the first one that works
+ finishes the process.
+
annotate.addr2line::
addr2line binary to use for file names and line numbers.
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index b1d98da79be8b2b0..32e15c9f53f3c0a3 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2116,6 +2116,12 @@ static int annotation__config(const char *var, const char *value, void *data)
opt->offset_level = ANNOTATION__MAX_OFFSET_LEVEL;
else if (opt->offset_level < ANNOTATION__MIN_OFFSET_LEVEL)
opt->offset_level = ANNOTATION__MIN_OFFSET_LEVEL;
+ } else if (!strcmp(var, "annotate.disassemblers")) {
+ opt->disassemblers_str = strdup(value);
+ if (!opt->disassemblers_str) {
+ pr_err("Not enough memory for annotate.disassemblers\n");
+ return -1;
+ }
} else if (!strcmp(var, "annotate.hide_src_code")) {
opt->hide_src_code = perf_config_bool("hide_src_code", value);
} else if (!strcmp(var, "annotate.jump_arrows")) {
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 8b9e05a1932f2f9e..194a05cbc506e4da 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -34,6 +34,9 @@ struct annotated_data_type;
#define ANNOTATION__BR_CNTR_WIDTH 30
#define ANNOTATION_DUMMY_LEN 256
+// llvm, capstone, objdump
+#define MAX_DISASSEMBLERS 3
+
struct annotation_options {
bool hide_src_code,
use_offset,
@@ -49,11 +52,14 @@ struct annotation_options {
annotate_src,
full_addr;
u8 offset_level;
+ u8 nr_disassemblers;
int min_pcnt;
int max_lines;
int context;
char *objdump_path;
char *disassembler_style;
+ const char *disassemblers_str;
+ const char *disassemblers[MAX_DISASSEMBLERS];
const char *prefix;
const char *prefix_strip;
unsigned int percent_type;
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 83df1da20a7b16cd..df6c172c9c7f86d9 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -2210,13 +2210,65 @@ static int symbol__disassemble_objdump(const char *filename, struct symbol *sym,
return err;
}
+static int annotation_options__init_disassemblers(struct annotation_options *options)
+{
+ char *disassembler;
+
+ if (options->disassemblers_str == NULL) {
+ const char *default_disassemblers_str =
+#ifdef HAVE_LIBLLVM_SUPPORT
+ "llvm,"
+#endif
+#ifdef HAVE_LIBCAPSTONE_SUPPORT
+ "capstone,"
+#endif
+ "objdump";
+
+ options->disassemblers_str = strdup(default_disassemblers_str);
+ if (!options->disassemblers_str)
+ goto out_enomem;
+ }
+
+ disassembler = strdup(options->disassemblers_str);
+ if (disassembler == NULL)
+ goto out_enomem;
+
+ while (1) {
+ char *comma = strchr(disassembler, ',');
+
+ if (comma != NULL)
+ *comma = '\0';
+
+ options->disassemblers[options->nr_disassemblers++] = strim(disassembler);
+
+ if (comma == NULL)
+ break;
+
+ disassembler = comma + 1;
+
+ if (options->nr_disassemblers >= MAX_DISASSEMBLERS) {
+ pr_debug("annotate.disassemblers can have at most %d entries, ignoring \"%s\"\n",
+ MAX_DISASSEMBLERS, disassembler);
+ break;
+ }
+ }
+
+ return 0;
+
+out_enomem:
+ pr_err("Not enough memory for annotate.disassemblers\n");
+ return -1;
+}
+
int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
{
+ struct annotation_options *options = args->options;
struct map *map = args->ms.map;
struct dso *dso = map__dso(map);
char symfs_filename[PATH_MAX];
bool delete_extract = false;
struct kcore_extract kce;
+ const char *disassembler;
bool decomp = false;
int err = dso__disassemble_filename(dso, symfs_filename, sizeof(symfs_filename));
@@ -2276,16 +2328,29 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args)
}
}
- err = symbol__disassemble_llvm(symfs_filename, sym, args);
- if (err == 0)
+ err = annotation_options__init_disassemblers(options);
+ if (err)
goto out_remove_tmp;
- err = symbol__disassemble_capstone(symfs_filename, sym, args);
- if (err == 0)
- goto out_remove_tmp;
+ err = -1;
- err = symbol__disassemble_objdump(symfs_filename, sym, args);
+ for (int i = 0; i < options->nr_disassemblers && err != 0; ++i) {
+ disassembler = options->disassemblers[i];
+ if (!strcmp(disassembler, "llvm"))
+ err = symbol__disassemble_llvm(symfs_filename, sym, args);
+ else if (!strcmp(disassembler, "capstone"))
+ err = symbol__disassemble_capstone(symfs_filename, sym, args);
+ else if (!strcmp(disassembler, "objdump"))
+ err = symbol__disassemble_objdump(symfs_filename, sym, args);
+ else
+ pr_debug("Unknown disassembler %s, skipping...\n", disassembler);
+ }
+
+ if (err == 0) {
+ pr_debug("Disassembled with %s\nannotate.disassemblers=%s\n",
+ disassembler, options->disassemblers_str);
+ }
out_remove_tmp:
if (decomp)
unlink(symfs_filename);
--
2.47.0
On Mon, Nov 11, 2024 at 7:18 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > From: Arnaldo Carvalho de Melo <acme@redhat.com> > > The perf tools annotation code used for a long time parsing the output > of binutils's objdump (or its reimplementations, like llvm's) to then > parse and augment it with samples, allow navigation, etc. > > More recently disassemblers from the capstone and llvm (libraries, not > parsing the output of tools using those libraries to mimic binutils's > objdump output) were introduced. > > So when all those methods are available, there is a static preference > for a series of attempts of disassembling a binary, with the 'llvm, > capstone, objdump' sequence being hard coded. So it LLVM is the preference can we just switch to using the LLVM ELF libraries, etc? :-) I was a bit surprised to see LLVM as preferable to capstone, which feels more agnostic in the LLVM vs GCC/binutils wars. Fwiw, I'm happy with LLVM being the preference. > This patch allows users to change that sequence, specifying via a 'perf > config' 'annotate.disassemblers' entry which and in what order > disassemblers should be attempted. > > As alluded to in the comments in the source code of this series, this > flexibility is useful for users and developers alike, elliminating the > requirement to rebuild the tool with some specific set of libraries to > see how the output of disassembling would be for one of these methods. > > root@x1:~# rm -f ~/.perfconfig > root@x1:~# perf annotate -v --stdio2 update_load_avg > <SNIP> > symbol__disassemble: > filename=/usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux, > sym=update_load_avg, start=0xffffffffb6148fe0, en> > annotating [0x6ff7170] > /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux : > [0x7407ca0] update_load_avg > Disassembled with llvm > annotate.disassemblers=llvm,capstone,objdump > Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, > Event count (approx.): 5185444, [percent: local period] > update_load_avg() > /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux > Percent 0xffffffff81148fe0 <update_load_avg>: > 1.61 pushq %r15 > pushq %r14 > 1.00 pushq %r13 > movl %edx,%r13d > 1.90 pushq %r12 > pushq %rbp > movq %rsi,%rbp > pushq %rbx > movq %rdi,%rbx > subq $0x18,%rsp > 15.14 movl 0x1a4(%rdi),%eax > > root@x1:~# perf config annotate.disassemblers=capstone > root@x1:~# cat ~/.perfconfig > # this file is auto-generated. > [annotate] > disassemblers = capstone > root@x1:~# > root@x1:~# perf annotate -v --stdio2 update_load_avg > <SNIP> > Disassembled with capstone > annotate.disassemblers=capstone > Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, > Event count (approx.): 5185444, [percent: local period] > update_load_avg() > /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux > Percent 0xffffffff81148fe0 <update_load_avg>: > 1.61 pushq %r15 > pushq %r14 > 1.00 pushq %r13 > movl %edx,%r13d > 1.90 pushq %r12 > pushq %rbp > movq %rsi,%rbp > pushq %rbx > movq %rdi,%rbx > subq $0x18,%rsp > 15.14 movl 0x1a4(%rdi),%eax > root@x1:~# perf config annotate.disassemblers=objdump,capstone > root@x1:~# perf config annotate.disassemblers > annotate.disassemblers=objdump,capstone > root@x1:~# cat ~/.perfconfig > # this file is auto-generated. > [annotate] > disassemblers = objdump,capstone > root@x1:~# perf annotate -v --stdio2 update_load_avg > Executing: objdump --start-address=0xffffffff81148fe0 \ > --stop-address=0xffffffff811497aa \ > -d --no-show-raw-insn -S -C "$1" > Disassembled with objdump > annotate.disassemblers=objdump,capstone > Samples: 66 of event 'cpu_atom/cycles/P', 10000 Hz, > Event count (approx.): 5185444, [percent: local period] > update_load_avg() > /usr/lib/debug/lib/modules/6.11.4-201.fc40.x86_64/vmlinux > Percent > > Disassembly of section .text: > > ffffffff81148fe0 <update_load_avg>: > #define DO_ATTACH 0x4 > > ffffffff81148fe0 <update_load_avg>: > #define DO_ATTACH 0x4 > #define DO_DETACH 0x8 > > /* Update task and its cfs_rq load average */ > static inline void update_load_avg(struct cfs_rq *cfs_rq, > struct sched_entity *se, > int flags) > { > 1.61 push %r15 > push %r14 > 1.00 push %r13 > mov %edx,%r13d > 1.90 push %r12 > push %rbp > mov %rsi,%rbp > push %rbx > mov %rdi,%rbx > sub $0x18,%rsp > } > > /* rq->task_clock normalized against any time > this cfs_rq has spent throttled */ > static inline u64 cfs_rq_clock_pelt(struct cfs_rq *cfs_rq) > { > if (unlikely(cfs_rq->throttle_count)) > 15.14 mov 0x1a4(%rdi),%eax > root@x1:~# > > After adding a way to select the disassembler from the command line a > 'perf test' comparing the output of the various diassemblers should be > introduced, to test these codebases. > > Cc: Adrian Hunter <adrian.hunter@intel.com> > Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> > Cc: Ian Rogers <irogers@google.com> > Cc: Jiri Olsa <jolsa@kernel.org> > Cc: Kan Liang <kan.liang@linux.intel.com> > Cc: Namhyung Kim <namhyung@kernel.org> > Cc: Steinar H. Gunderson <sesse@google.com> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Thanks, Ian > --- > tools/perf/Documentation/perf-config.txt | 13 ++++ > tools/perf/util/annotate.c | 6 ++ > tools/perf/util/annotate.h | 6 ++ > tools/perf/util/disasm.c | 77 ++++++++++++++++++++++-- > 4 files changed, 96 insertions(+), 6 deletions(-) > > diff --git a/tools/perf/Documentation/perf-config.txt b/tools/perf/Documentation/perf-config.txt > index 379f9d7a8ab11a02..1f668d4724e3749a 100644 > --- a/tools/perf/Documentation/perf-config.txt > +++ b/tools/perf/Documentation/perf-config.txt > @@ -247,6 +247,19 @@ annotate.*:: > These are in control of addresses, jump function, source code > in lines of assembly code from a specific program. > > + annotate.disassemblers:: > + Choose the disassembler to use: "objdump", "llvm", "capstone", > + if not specified it will first try, if available, the "llvm" one, > + then, if it fails, "capstone", and finally the original "objdump" > + based one. > + > + Choosing a different one is useful when handling some feature that > + is known to be best support at some point by one of the options, > + to compare the output when in doubt about some bug, etc. > + > + This can be a list, in order of preference, the first one that works > + finishes the process. > + > annotate.addr2line:: > addr2line binary to use for file names and line numbers. > > diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c > index b1d98da79be8b2b0..32e15c9f53f3c0a3 100644 > --- a/tools/perf/util/annotate.c > +++ b/tools/perf/util/annotate.c > @@ -2116,6 +2116,12 @@ static int annotation__config(const char *var, const char *value, void *data) > opt->offset_level = ANNOTATION__MAX_OFFSET_LEVEL; > else if (opt->offset_level < ANNOTATION__MIN_OFFSET_LEVEL) > opt->offset_level = ANNOTATION__MIN_OFFSET_LEVEL; > + } else if (!strcmp(var, "annotate.disassemblers")) { > + opt->disassemblers_str = strdup(value); > + if (!opt->disassemblers_str) { > + pr_err("Not enough memory for annotate.disassemblers\n"); > + return -1; > + } > } else if (!strcmp(var, "annotate.hide_src_code")) { > opt->hide_src_code = perf_config_bool("hide_src_code", value); > } else if (!strcmp(var, "annotate.jump_arrows")) { > diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h > index 8b9e05a1932f2f9e..194a05cbc506e4da 100644 > --- a/tools/perf/util/annotate.h > +++ b/tools/perf/util/annotate.h > @@ -34,6 +34,9 @@ struct annotated_data_type; > #define ANNOTATION__BR_CNTR_WIDTH 30 > #define ANNOTATION_DUMMY_LEN 256 > > +// llvm, capstone, objdump > +#define MAX_DISASSEMBLERS 3 > + > struct annotation_options { > bool hide_src_code, > use_offset, > @@ -49,11 +52,14 @@ struct annotation_options { > annotate_src, > full_addr; > u8 offset_level; > + u8 nr_disassemblers; > int min_pcnt; > int max_lines; > int context; > char *objdump_path; > char *disassembler_style; > + const char *disassemblers_str; > + const char *disassemblers[MAX_DISASSEMBLERS]; > const char *prefix; > const char *prefix_strip; > unsigned int percent_type; > diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c > index 83df1da20a7b16cd..df6c172c9c7f86d9 100644 > --- a/tools/perf/util/disasm.c > +++ b/tools/perf/util/disasm.c > @@ -2210,13 +2210,65 @@ static int symbol__disassemble_objdump(const char *filename, struct symbol *sym, > return err; > } > > +static int annotation_options__init_disassemblers(struct annotation_options *options) > +{ > + char *disassembler; > + > + if (options->disassemblers_str == NULL) { > + const char *default_disassemblers_str = > +#ifdef HAVE_LIBLLVM_SUPPORT > + "llvm," > +#endif > +#ifdef HAVE_LIBCAPSTONE_SUPPORT > + "capstone," > +#endif > + "objdump"; > + > + options->disassemblers_str = strdup(default_disassemblers_str); > + if (!options->disassemblers_str) > + goto out_enomem; > + } > + > + disassembler = strdup(options->disassemblers_str); > + if (disassembler == NULL) > + goto out_enomem; > + > + while (1) { > + char *comma = strchr(disassembler, ','); > + > + if (comma != NULL) > + *comma = '\0'; > + > + options->disassemblers[options->nr_disassemblers++] = strim(disassembler); > + > + if (comma == NULL) > + break; > + > + disassembler = comma + 1; > + > + if (options->nr_disassemblers >= MAX_DISASSEMBLERS) { > + pr_debug("annotate.disassemblers can have at most %d entries, ignoring \"%s\"\n", > + MAX_DISASSEMBLERS, disassembler); > + break; > + } > + } > + > + return 0; > + > +out_enomem: > + pr_err("Not enough memory for annotate.disassemblers\n"); > + return -1; > +} > + > int symbol__disassemble(struct symbol *sym, struct annotate_args *args) > { > + struct annotation_options *options = args->options; > struct map *map = args->ms.map; > struct dso *dso = map__dso(map); > char symfs_filename[PATH_MAX]; > bool delete_extract = false; > struct kcore_extract kce; > + const char *disassembler; > bool decomp = false; > int err = dso__disassemble_filename(dso, symfs_filename, sizeof(symfs_filename)); > > @@ -2276,16 +2328,29 @@ int symbol__disassemble(struct symbol *sym, struct annotate_args *args) > } > } > > - err = symbol__disassemble_llvm(symfs_filename, sym, args); > - if (err == 0) > + err = annotation_options__init_disassemblers(options); > + if (err) > goto out_remove_tmp; > > - err = symbol__disassemble_capstone(symfs_filename, sym, args); > - if (err == 0) > - goto out_remove_tmp; > + err = -1; > > - err = symbol__disassemble_objdump(symfs_filename, sym, args); > + for (int i = 0; i < options->nr_disassemblers && err != 0; ++i) { > + disassembler = options->disassemblers[i]; > > + if (!strcmp(disassembler, "llvm")) > + err = symbol__disassemble_llvm(symfs_filename, sym, args); > + else if (!strcmp(disassembler, "capstone")) > + err = symbol__disassemble_capstone(symfs_filename, sym, args); > + else if (!strcmp(disassembler, "objdump")) > + err = symbol__disassemble_objdump(symfs_filename, sym, args); > + else > + pr_debug("Unknown disassembler %s, skipping...\n", disassembler); > + } > + > + if (err == 0) { > + pr_debug("Disassembled with %s\nannotate.disassemblers=%s\n", > + disassembler, options->disassemblers_str); > + } > out_remove_tmp: > if (decomp) > unlink(symfs_filename); > -- > 2.47.0 >
On Mon, Nov 11, 2024 at 08:27:38AM -0800, Ian Rogers wrote: > On Mon, Nov 11, 2024 at 7:18 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > The perf tools annotation code used for a long time parsing the output > > of binutils's objdump (or its reimplementations, like llvm's) to then > > parse and augment it with samples, allow navigation, etc. > > More recently disassemblers from the capstone and llvm (libraries, not > > parsing the output of tools using those libraries to mimic binutils's > > objdump output) were introduced. > > So when all those methods are available, there is a static preference > > for a series of attempts of disassembling a binary, with the 'llvm, > > capstone, objdump' sequence being hard coded. > So it LLVM is the preference can we just switch to using the LLVM ELF > libraries, etc? :-) I was a bit surprised to see LLVM as preferable to I'd have to look up the discussion to see how this ended up being the default when LLVM is available, but when I wanted to have source code intermixed with it and noticed that the LLVM output doesn't have it, that lead me to try to make this selectable so that we can go from one to the other when needing something not available in one of them. On my todo list, and here Steinar could help, is to check if we an have source code intermixed with the llvm based disassembler, like we have with the objdump based one. > capstone, which feels more agnostic in the LLVM vs GCC/binutils wars. > Fwiw, I'm happy with LLVM being the preference. <SNIP> > > After adding a way to select the disassembler from the command line a > > 'perf test' comparing the output of the various diassemblers should be > > introduced, to test these codebases. > > > > Cc: Adrian Hunter <adrian.hunter@intel.com> > > Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> > > Cc: Ian Rogers <irogers@google.com> > > Cc: Jiri Olsa <jolsa@kernel.org> > > Cc: Kan Liang <kan.liang@linux.intel.com> > > Cc: Namhyung Kim <namhyung@kernel.org> > > Cc: Steinar H. Gunderson <sesse@google.com> > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> > > Acked-by: Ian Rogers <irogers@google.com> Thanks! - Arnaldo
On Mon, Nov 11, 2024 at 02:24:34PM -0300, Arnaldo Carvalho de Melo wrote: > On my todo list, and here Steinar could help, is to check if we an have > source code intermixed with the llvm based disassembler, like we have > with the objdump based one. I am no LLVM expert; the only time I ever touched it was for perf :-) TBH I'm not entirely sure what functionality this is, though; I don't think I've ever gotten perf to list source code for me, ever (in the ~15 years I've used it). I can give --line-numbers (and optionally --inlines) to objdump and then look up the line numbers by hand with an editor, but the actual source? Is there some way perf can attribute the samples back to individual source code lines the way some other profilers can (i.e., showing the source instead of instructions)? I would assume that LLVM has some way of outputting line numbers (presumably by parsing debug information), since llvm-objdump supports --line-numbers, but that's perhaps not what you're asking about? /* Steinar */
On Wed, Nov 13, 2024 at 01:56:49PM +0100, Steinar H. Gunderson wrote: > On Mon, Nov 11, 2024 at 02:24:34PM -0300, Arnaldo Carvalho de Melo wrote: > > On my todo list, and here Steinar could help, is to check if we an have > > source code intermixed with the llvm based disassembler, like we have > > with the objdump based one. > > I am no LLVM expert; the only time I ever touched it was for perf :-) Expert enough to get some more LLVM based features in perf ;-) > TBH I'm not entirely sure what functionality this is, though; I don't > think I've ever gotten perf to list source code for me, ever (in the ~15 > years I've used it). I mean this: root@number:~# uname -r 6.11.6-200.fc40.x86_64 root@number:~# rpm -q kernel-debuginfo package kernel-debuginfo is not installed root@number:~# rpm -qa | grep kernel-debuginfo root@number:~# perf probe -L icmp_rcv <icmp_rcv@/root/.cache/debuginfod_client/365ce5a0ccb899e5b203f81ed386764b3e8a6c2e/source-e181e220-#usr#src#debug#kernel-6.11.6#linux-6.11.6-200.fc40.x86_64#net#ipv4#icmp.c:0> 0 int icmp_rcv(struct sk_buff *skb) { 2 enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED; struct rtable *rt = skb_rtable(skb); struct net *net = dev_net(rt->dst.dev); struct icmphdr *icmph; if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) { 8 struct sec_path *sp = skb_sec_path(skb); 9 int nh; if (!(sp && sp->xvec[sp->len - 1]->props.flags & XFRM_STATE_ICMP)) { reason = SKB_DROP_REASON_XFRM_POLICY; goto drop; } <SNIP> /* * Parse the ICMP message */ 73 if (rt->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST)) { /* * RFC 1122: 3.2.2.6 An ICMP_ECHO to broadcast MAY be * silently ignored (we let user decide with a sysctl). * RFC 1122: 3.2.2.8 An ICMP_TIMESTAMP MAY be silently * discarded if to broadcast/multicast. */ 80 if ((icmph->type == ICMP_ECHO || icmph->type == ICMP_TIMESTAMP) && 82 READ_ONCE(net->ipv4.sysctl_icmp_echo_ignore_broadcasts)) { reason = SKB_DROP_REASON_INVALID_PROTO; goto error; } 86 if (icmph->type != ICMP_ECHO && icmph->type != ICMP_TIMESTAMP && icmph->type != ICMP_ADDRESS && icmph->type != ICMP_ADDRESSREPLY) { reason = SKB_DROP_REASON_INVALID_PROTO; goto error; } } <SNIP> So, source code, automatically obtained from a debuginfod server and places where one can put a probe. Also in annotation: root@number:~# perf record ping -f 127.0.0.1 PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data. ^C --- 127.0.0.1 ping statistics --- 552924 packets transmitted, 552924 received, 0% packet loss, time 2192ms rtt min/avg/max/mdev = 0.001/0.001/0.204/0.000 ms, ipg/ewma 0.003/0.002 ms [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.424 MB perf.data (8509 samples) ] root@number:~# root@number:~# perf config annotate.disassemblers=objdump root@number:~# perf annotate --stdio2 icmp_rcv | head -20 Samples: 1 of event 'cpu_atom/cycles/P', 4000 Hz, Event count (approx.): 2993537, [percent: local period] icmp_rcv() /proc/kcore Percent Disassembly of section load0: ffffffffaff60220 <load0>: endbr64 nop push %r15 xor %esi,%esi push %r14 push %r13 push %r12 push %rbp push %rbx mov %rdi,%rbx sub $0x8,%rsp mov 0x58(%rdi),%rbp No source code, but then its objdump that is running and we don't have the debuginfo file, i.e. the objdump tool doesn't link with libdebuginfo to use the debuginfod server, so we need to install the huge kernel-debuginfo packages: root@number:~# dnf debuginfo-install kernel And then it works: root@number:~# perf annotate --stdio2 icmp_rcv Samples: 53 of event 'cpu_core/cycles/P', 4000 Hz, Event count (approx.): 73399213, [percent: local period] icmp_rcv() /usr/lib/debug/lib/modules/6.11.6-200.fc40.x86_64/vmlinux Percent Disassembly of section .text: ffffffff81f60220 <icmp_rcv>: /* * Deal with incoming ICMP packets. */ int icmp_rcv(struct sk_buff *skb) { endbr64 → call __fentry__ push %r15 __xfrm_policy_check(sk, ndir, skb, family); } static inline int xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, unsigned short family) { return __xfrm_policy_check2(sk, dir, skb, family, 0); xor %esi,%esi push %r14 push %r13 push %r12 3.74 push %rbp push %rbx mov %rdi,%rbx sub $0x8,%rsp * rcu_read_lock section */ WARN_ON((skb->_skb_refdst & SKB_DST_NOREF) && !rcu_read_lock_held() && !rcu_read_lock_bh_held()); return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK); 3.74 mov 0x58(%rdi),%rbp and $0xfffffffffffffffe,%rbp } <SKIP some inlines used in icmp_rcv that don't have samples in this perf.data> skb_set_network_header(skb, nh); } __ICMP_INC_STATS(net, ICMP_MIB_INMSGS); f1: mov 0x1f0(%r13),%rax 9.35 incq %gs:0x8(%rax) return ((skb->ip_summed == CHECKSUM_UNNECESSARY) || movzbl 0x80(%rbx),%ecx if (skb_checksum_simple_validate(skb)) movzbl 0x83(%rbx),%eax mov %ecx,%edx and $0xfffffffe,%eax and $0x60,%edx mov %al,0x83(%rbx) skb->csum_valid || cmp $0x20,%dl ↓ je 2e5 1.87 cmp $0x60,%dl ↓ je 2a2 if (skb->ip_summed == CHECKSUM_COMPLETE) { cmp $0x40,%dl ↓ je 26e skb->csum = psum; 134: movl $0x0,0x8c(%rbx) csum = __skb_checksum_complete(skb); mov %rbx,%rdi 6.59 → call __skb_checksum_complete skb->csum_valid = !csum; movzbl 0x83(%rbx),%edx test %ax,%ax sete %cl 5.61 and $0xfffffffe,%edx <SKIP some more> This kind of thing. > I can give --line-numbers (and optionally --inlines) > to objdump and then look up the line numbers by hand with an editor, > but the actual source? Is there some way perf can attribute the samples > back to individual source code lines the way some other profilers can > (i.e., showing the source instead of instructions)? You can show just instructions or instructions + source code. There were requests but no attempt that I know of of doing just source code. Please take a look at 'perf config --help', or directly at tools/perf/Documentation/perf-config.txt to see what you can configure to govern the annotation process, for instance: ------- annotate.hide_src_code:: If a program which is analyzed has source code, this option lets 'annotate' print a list of assembly code with the source code. For example, let's see a part of a program. There're four lines. If this option is 'true', they can be printed without source code from a program as below. │ push %rbp │ mov %rsp,%rbp │ sub $0x10,%rsp │ mov (%rdi),%rdx But if this option is 'false', source code of the part can be also printed as below. Default is 'false'. │ struct rb_node *rb_next(const struct rb_node *node) │ { │ push %rbp │ mov %rsp,%rbp │ sub $0x10,%rsp │ struct rb_node *parent; │ │ if (RB_EMPTY_NODE(node)) │ mov (%rdi),%rdx │ return n; This option works with tui, stdio2 browsers. ------- > I would assume that LLVM has some way of outputting line numbers > (presumably by parsing debug information), since llvm-objdump supports > --line-numbers, but that's perhaps not what you're asking about? So, this is the command line that perf hands to objdump: root@number:~# perf annotate -vv --stdio2 icmp_rcv |& grep objdump Executing: objdump --start-address=0xffffffff81f60220 --stop-address=0xffffffff81f605cb -d --no-show-raw-insn -S -C "$1" Disassembled with objdump annotate.disassemblers=objdump root@number:~# strace -f -s512 -e execve -- perf annotate -vv --stdio2 icmp_rcv |& grep objdump [pid 24768] execve("/usr/bin/objdump", ["objdump", "--start-address=0xffffffff81f60220", "--stop-address=0xffffffff81f605cb", "-d", "--no-show-raw-insn", "-S", "-C", "/usr/lib/debug/lib/modules/6.11.6-200.fc40.x86_64/vmlinux"], 0x55efca837c50 /* 39 vars */) = 0 ^C root@number:~# ⬢ [acme@toolbox perf-tools-next]$ objdump --help |& grep -- -S -S, --source Intermix source code with disassembly --file-start-context Include context from start of file (with -S) --prefix=PREFIX Add PREFIX to absolute paths for -S --prefix-strip=LEVEL Strip initial directory names for -S ⬢ [acme@toolbox perf-tools-next]$ objdump --help |& grep -- -S -S, --source Intermix source code with disassembly --file-start-context Include context from start of file (with -S) --prefix=PREFIX Add PREFIX to absolute paths for -S --prefix-strip=LEVEL Strip initial directory names for -S ⬢ [acme@toolbox perf-tools-next]$ And llvm-objdump _has_ it: ⬢ [acme@toolbox perf-tools-next]$ llvm-objdump --help |& grep -- --source --source When disassembling, display source interleaved with the disassembly. Implies --disassemble -S Alias for --source ⬢ [acme@toolbox perf-tools-next]$ And perf allows to use llvm-objdump in the "objdump" disassembler: ⬢ [acme@toolbox perf-tools-next]$ perf annotate -h objdump Usage: perf annotate [<options>] --objdump <path> objdump binary to use for disassembly and annotations ⬢ [acme@toolbox perf-tools-next]$ Also via perf-config: root@number:~# perf config annotate.objdump=llvm-objdump root@number:~# cat ~/.perfconfig # this file is auto-generated. [annotate] disassemblers = objdump objdump = llvm-objdump root@number:~# And it works: root@number:~# perf annotate --stdio2 icmp_rcv Samples: 53 of event 'cpu_core/cycles/P', 4000 Hz, Event count (approx.): 73399213, [percent: local period] icmp_rcv() /usr/lib/debug/lib/modules/6.11.6-200.fc40.x86_64/vmlinux Percent Disassembly of section .text: ffffffff81f60220 <icmp_rcv>: ; { endbr64 → callq __fentry__ pushq %r15 ; return __xfrm_policy_check2(sk, dir, skb, family, 0); xorl %esi,%esi ; { pushq %r14 pushq %r13 pushq %r12 3.74 pushq %rbp pushq %rbx movq %rdi,%rbx subq $0x8,%rsp ; return (struct dst_entry *)(skb->_skb_refdst & SKB_DST_PTRMASK); 3.74 movq 0x58(%rdi),%rbp andq $-0x2,%rbp ; return rcu_dereference_protected(pnet->net, true); movq (%rbp),%rax movq 0x118(%rax),%r13 ; return __xfrm_policy_check2(sk, dir, skb, family, 0); → callq __xfrm_policy_check2.constprop.0 The output is different, but same instructions on a quick scan. To double check: root@number:~# strace -f -s512 -e execve -- perf annotate -vv --stdio2 icmp_rcv |& grep -m3 objdump Executing: llvm-objdump --start-address=0xffffffff81f60220 --stop-address=0xffffffff81f605cb -d --no-show-raw-insn -S -C "$1" [pid 26113] execve("/bin/sh", ["/bin/sh", "-c", "llvm-objdump --start-address=0xffffffff81f60220 --stop-address=0xffffffff81f605cb -d --no-show-raw-insn -S -C \"$1\"", "--", "/usr/lib/debug/lib/modules/6.11.6-200.fc40.x86_64/vmlinux"], 0x157fa1d0 /* 39 vars */) = 0 [pid 26113] execve("/usr/bin/llvm-objdump", ["llvm-objdump", "--start-address=0xffffffff81f60220", "--stop-address=0xffffffff81f605cb", "-d", "--no-show-raw-insn", "-S", "-C", "/usr/lib/debug/lib/modules/6.11.6-200.fc40.x86_64/vmlinux"], 0x559f670e7bd0 /* 39 vars */) = 0 root@number:~# So, in symbol__disassemble_llvm() it should look at args->options->hide_src_code and use that to ask the llvm lib to have that source code in the disasm somehow. - Arnaldo
On Wed, Nov 13, 2024 at 04:14:06PM -0300, Arnaldo Carvalho de Melo wrote: > So, source code, automatically obtained from a debuginfod server and > places where one can put a probe. Mm, OK. Unfortunately I don't have any debuginfod stuff available on this side, but I understand the desire. I looked at the LLVM disassembler and it _does_ insert line numbers: llvm_addr2line(filename, pc, &args->fileloc, (unsigned int *)&args->line_nr, false, NULL); But this is maybe not enough? We don't have any machinery in perf to get from the file + line number to the source ourselves? (What does capstone and the embedded binutils disassembler do?) > You can show just instructions or instructions + source code. > > There were requests but no attempt that I know of of doing just source > code. TBH I'd love the latter, but I'm not going to sign up for doing it. :-) /* Steinar */
© 2016 - 2024 Red Hat, Inc.