Kernel module elf contains executable code in non-".text" sections as
well, for ex: ".noinstr.text". Plus, kernel module's memory layout
differs from its binary layout because .ko elf does not contain
program header table.

Perf tries to solve it by creating special maps for allocated (SHF_ALLOC)
elf sections, but perf uses elf addresses for map address range and thus
these special maps remain unused because no real ip falls into their
address range.

Solve this by preparing section specific special maps using addresses
provided by sysfs /sys/module/.../sections/. Also save these details in
PERF_RECORD_KMOD_SEC_MAP format in perf.data which can be consumed at
perf-report time.

Without patchset:

  # perf record -a -c 5000000
  # perf report
  Overhead  Command          Shared Object     Symbol
    13.20%  qemu-system-x86  [unknown]         [.] 0x00005557527b1973
     6.58%  qemu-system-x86  [kvm_amd]         [k] 0x00000000000151e6
     6.36%  qemu-system-x86  [kernel.vmlinux]  [k] native_load_gdt
     6.21%  qemu-system-x86  [kernel.vmlinux]  [k] native_load_tr_desc
     4.71%  qemu-system-x86  [kvm]             [k] vcpu_run
     4.52%  qemu-system-x86  [kvm_amd]         [k] svm_vcpu_run
     3.50%  qemu-system-x86  [kvm]             [k] kvm_cpuid
     2.09%  qemu-system-x86  [kvm]             [k] kvm_pmu_trigger_event
     1.98%  qemu-system-x86  [kvm_amd]         [k] 0x0000000000015171
     1.05%  qemu-system-x86  [kvm_amd]         [k] svm_handle_exit
     1.04%  qemu-system-x86  [kvm_amd]         [k] 0x00000000000151e2
     0.94%  qemu-system-x86  [kvm_amd]         [k] 0x0000000000015174

Same perf.data with kallsyms:

  # perf report --kallsyms=/proc/kallsyms
  Overhead  Command          Shared Object     Symbol
    14.22%  qemu-system-x86  [kvm_amd]         [k] __svm_vcpu_run
    13.20%  qemu-system-x86  [unknown]         [.] 0x00005557527b1973
     6.36%  qemu-system-x86  [kernel.vmlinux]  [k] native_load_gdt
     6.21%  qemu-system-x86  [kernel.vmlinux]  [k] native_load_tr_desc
     4.71%  qemu-system-x86  [kvm]             [k] vcpu_run
     4.52%  qemu-system-x86  [kvm_amd]         [k] svm_vcpu_run
     3.50%  qemu-system-x86  [kvm]             [k] kvm_cpuid
     2.09%  qemu-system-x86  [kvm]             [k] kvm_pmu_trigger_event
     1.05%  qemu-system-x86  [kvm_amd]         [k] svm_handle_exit

With patchset:

  # perf record -a -c 5000000
  # perf report
  Overhead  Command          Shared Object           Symbol
    13.44%  qemu-system-x86  [kvm-amd].noinstr.text  [k] __svm_vcpu_run
    13.25%  qemu-system-x86  [unknown]               [.] 0x000055f4c6563973
     7.13%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_gdt
     6.00%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_tr_desc
     5.13%  qemu-system-x86  [kvm_amd]               [k] svm_vcpu_run
     4.83%  qemu-system-x86  [kvm]                   [k] vcpu_run
     3.65%  qemu-system-x86  [kvm]                   [k] kvm_cpuid

Same perf.data with kallsyms:

  # perf report --kallsyms=/proc/kallsyms
  Overhead  Command          Shared Object     Symbol
    13.44%  qemu-system-x86  [kernel.vmlinux]  [k] __svm_vcpu_run
    13.25%  qemu-system-x86  [unknown]         [.] 0x000055f4c6563973
     7.13%  qemu-system-x86  [kernel.vmlinux]  [k] native_load_gdt
     6.00%  qemu-system-x86  [kernel.vmlinux]  [k] native_load_tr_desc
     5.13%  qemu-system-x86  [kernel.vmlinux]  [k] svm_vcpu_run
     4.83%  qemu-system-x86  [kernel.vmlinux]  [k] vcpu_run
     3.65%  qemu-system-x86  [kernel.vmlinux]  [k] kvm_cpuid

This is an RFC only series. TODOs:
 - I'm just recording module path in PERF_RECORD_KMOD_SEC_MAP. It's very
   much possible that, at perf report time, a module file exists at the
   same path but its internal layout is different. I think I need to add
   some buildid check. Any ideas?
 - I've enabled host perf-record/report only. It doesn't work for guest
   modules because host does not have access to guest sysfs. I'm yet to
   figure out how to fix it. Maybe we can add --guest-mod-sysfs option.
   Any ideas?
 - Also, I'm currently assuming that module files are not compressed.
 - I've seen perf build failures when compiling with NO_LIBELF=1.
 - I've seen perf report not honoring --kallsyms in certain conditions.

Prepared on top of acme/perf/core (69b41ac87e4a6)

Ravi Bangoria (4):
  perf tool: Simplify machine__create_modules() a bit
  perf tool: Refactor perf_event__synthesize_modules()
  perf tool: Introduce PERF_RECORD_KMOD_SEC_MAP
  perf tool: Fix non-".text" symbol resolution for kernel modules

 tools/lib/perf/Documentation/libperf.txt |   1 +
 tools/lib/perf/include/perf/event.h      |  25 +++
 tools/perf/builtin-annotate.c            |   1 +
 tools/perf/builtin-c2c.c                 |   1 +
 tools/perf/builtin-diff.c                |   1 +
 tools/perf/builtin-inject.c              |   1 +
 tools/perf/builtin-kmem.c                |   1 +
 tools/perf/builtin-mem.c                 |   1 +
 tools/perf/builtin-record.c              |  14 ++
 tools/perf/builtin-report.c              |   1 +
 tools/perf/builtin-script.c              |  13 ++
 tools/perf/builtin-trace.c               |   1 +
 tools/perf/util/build-id.c               |   1 +
 tools/perf/util/data-convert-bt.c        |   1 +
 tools/perf/util/data-convert-json.c      |   1 +
 tools/perf/util/event.c                  |  22 ++
 tools/perf/util/event.h                  |   5 +
 tools/perf/util/machine.c                | 268 ++++++++++++++++++++++-
 tools/perf/util/machine.h                |   2 +
 tools/perf/util/map.c                    |   1 +
 tools/perf/util/map.h                    |   4 +
 tools/perf/util/session.c                |  17 ++
 tools/perf/util/symbol-elf.c             |   9 +-
 tools/perf/util/symbol.c                 |   2 +-
 tools/perf/util/synthetic-events.c       | 136 +++++++++---
 tools/perf/util/tool.h                   |   3 +-
 26 files changed, 494 insertions(+), 39 deletions(-)

-- 
2.39.0
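For readers unfamiliar with the sysfs source mentioned in the cover letter, here is a minimal Python sketch of collecting per-section load addresses for one module. It is not the patchset's code (that lives in C under tools/perf/util/). The function names are hypothetical, and the sketch assumes each file under /sys/module/<module>/sections/ holds a single hexadecimal address and is typically readable only by root.

```python
from pathlib import Path


def parse_section_addr(text):
    """Parse one sysfs section file body such as '0xffffffffc0a51000' plus newline."""
    return int(text.strip(), 16)


def read_module_sections(module, sysfs_root="/sys/module"):
    """Return {section_name: runtime_address} for one loaded module.

    Hypothetical helper: assumes one hex address per file under
    <sysfs_root>/<module>/sections/.
    """
    sec_dir = Path(sysfs_root) / module / "sections"
    sections = {}
    for entry in sorted(sec_dir.iterdir()):
        if not entry.is_file():
            continue
        try:
            sections[entry.name] = parse_section_addr(entry.read_text())
        except (PermissionError, ValueError):
            # Unreadable file or unexpected content: skip rather than guess.
            continue
    return sections
```

Something like read_module_sections("kvm_amd") would then give the addresses from which section specific maps could be built. The sysfs_root parameter exists only so the helper can be pointed at a test directory.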
Hi all,

On 10-Jan-23 11:28 AM, Ravi Bangoria wrote:
> Kernel module elf contains executable code in non-".text" sections as
> well, for ex: ".noinstr.text". Plus, kernel module's memory layout
> differs from its binary layout because .ko elf does not contain
> program header table.
>
> Perf tries to solve it by creating special maps for allocated (SHF_ALLOC)
> elf sections, but perf uses elf addresses for map address range and thus
> these special maps remain unused because no real ip falls into their
> address range.
>
> Solve this by preparing section specific special maps using addresses
> provided by sysfs /sys/module/.../sections/. Also save these details in
> PERF_RECORD_KMOD_SEC_MAP format in perf.data which can be consumed at
> perf-report time.

Do you feel this is worth fixing, and should I continue? Or are the
--kcore / --kallsyms workarounds sufficient?

Thanks,
Ravi
On 10/01/23 07:58, Ravi Bangoria wrote:
> Kernel module elf contains executable code in non-".text" sections as
> well, for ex: ".noinstr.text". Plus, kernel module's memory layout
> differs from its binary layout because .ko elf does not contain
> program header table.

Have you looked at using the perf record --kcore option?

> [...]
Hi Adrian,

On 10-Jan-23 12:05 PM, Adrian Hunter wrote:
> On 10/01/23 07:58, Ravi Bangoria wrote:
>> Kernel module elf contains executable code in non-".text" sections as
>> well, for ex: ".noinstr.text". Plus, kernel module's memory layout
>> differs from its binary layout because .ko elf does not contain
>> program header table.
>
> Have you looked at using the perf record --kcore option?

Nice! We can also use --kallsyms with perf report and it resolves symbols
fine.

But what about normal perf record/report? The reason I'm insisting on normal
perf-record/report is that users generally don't specify these options.
Especially with root privileges, a user expects symbol resolution to just
work. When they see inconsistent symbol resolution within the same kernel
module, they will have no clue what's missing. This patchset is trying to
solve that, although I too feel adding section specific maps to perf.data is
overkill as --kcore or --kallsyms can also resolve those symbols.

Thanks for your feedback,
Ravi
On 10-Jan-23 2:13 PM, Ravi Bangoria wrote:
> Hi Adrian,
>
> On 10-Jan-23 12:05 PM, Adrian Hunter wrote:
>> On 10/01/23 07:58, Ravi Bangoria wrote:
>>> Kernel module elf contains executable code in non-".text" sections as
>>> well, for ex: ".noinstr.text". Plus, kernel module's memory layout
>>> differs from its binary layout because .ko elf does not contain
>>> program header table.
>>
>> Have you looked at using the perf record --kcore option?
>
> Nice! We can also use --kallsyms with perf report and it resolves symbols
> fine.
>
> But what about normal perf record/report? The reason I'm insisting on normal
> perf-record/report is that users generally don't specify these options.
> Especially with root privileges, a user expects symbol resolution to just
> work. When they see inconsistent symbol resolution within the same kernel
> module, they will have no clue what's missing. This patchset is trying to
> solve that, although I too feel adding section specific maps to perf.data is
> overkill as --kcore or --kallsyms can also resolve those symbols.

FWIW, what this patchset does is not new. Perf already creates (pseudo) maps
for module elf sections while parsing the symbol table, in
dso__process_kernel_symbol(). But perf does it incorrectly, so this patchset
tries to fix it.

Thanks,
Ravi
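The point about the existing pseudo maps being built incorrectly can be illustrated with a toy lookup in Python. This is a sketch only, not perf's implementation. SectionMap, build_maps() and find_map() are hypothetical names, every address and size used with them is made up, and the map naming is illustrative. The sketch also assumes that section addresses in a relocatable .ko are zero, which is one way for ELF-derived ranges to miss every real ip.

```python
from bisect import bisect_right
from typing import NamedTuple


class SectionMap(NamedTuple):
    start: int  # first address covered by the map
    end: int    # one past the last covered address
    name: str   # illustrative name, e.g. "[kvm_amd].noinstr.text"


def build_maps(addrs, sizes, module):
    """Build sorted per-section maps from {section: start} and {section: size}."""
    maps = [
        SectionMap(start, start + sizes[sec], f"[{module}]{sec}")
        for sec, start in addrs.items()
        if sizes.get(sec)
    ]
    return sorted(maps)


def find_map(maps, ip):
    """Return the map whose [start, end) range contains ip, or None."""
    starts = [m.start for m in maps]
    i = bisect_right(starts, ip) - 1
    if i >= 0 and ip < maps[i].end:
        return maps[i]
    return None
```

With made-up sizes {".text": 0x1000, ".noinstr.text": 0x200}, maps built from start addresses of 0 cover only [0, size) and find_map() returns None for a sampled ip such as 0xffffffffc0a51134. Maps built from runtime addresses, say ".noinstr.text" at 0xffffffffc0a51000, contain that ip, so the sample can be attributed to the right section.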