[RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules

Ravi Bangoria posted 4 patches 2 years, 8 months ago
Posted by Ravi Bangoria 2 years, 8 months ago
A kernel module ELF also contains executable code in sections other than
".text", e.g. ".noinstr.text". In addition, a kernel module's memory layout
differs from its binary layout, because a .ko ELF file does not contain a
program header table.
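
Both properties can be checked directly on an uncompressed .ko. The sketch below is not part of the patchset, and list_exec_sections() is a hypothetical helper. It assumes a native-endian ELFCLASS64 file without extended section numbering, prints every SHF_EXECINSTR section, and reports e_phnum. For a relocatable module you would expect e_phnum to be 0 and more than one executable section to be listed.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: print the name and size of every SHF_EXECINSTR
 * section in a 64-bit ELF file and store the program-header count in
 * *phnum. Returns the number of executable sections, or -1 on error.
 */
static int list_exec_sections(const char *path, unsigned int *phnum)
{
	FILE *f = fopen(path, "rb");
	Elf64_Ehdr eh;
	Elf64_Shdr *sh = NULL;
	char *names = NULL;
	int count = -1;

	if (!f)
		return -1;
	if (fread(&eh, sizeof(eh), 1, f) != 1 ||
	    memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64 ||
	    eh.e_shentsize != sizeof(Elf64_Shdr) ||
	    eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
		goto out;

	*phnum = eh.e_phnum;	/* a relocatable .ko has no program headers */

	sh = calloc(eh.e_shnum, sizeof(*sh));
	if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
	    fread(sh, sizeof(*sh), eh.e_shnum, f) != eh.e_shnum)
		goto out;

	/* Section names live in the section-header string table. */
	names = calloc(1, sh[eh.e_shstrndx].sh_size + 1);
	if (!names || fseek(f, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET) != 0 ||
	    fread(names, 1, sh[eh.e_shstrndx].sh_size, f) != sh[eh.e_shstrndx].sh_size)
		goto out;

	count = 0;
	for (unsigned int i = 0; i < eh.e_shnum; i++) {
		if (!(sh[i].sh_flags & SHF_EXECINSTR) ||
		    sh[i].sh_name >= sh[eh.e_shstrndx].sh_size)
			continue;
		printf("%-24s size=0x%llx\n", names + sh[i].sh_name,
		       (unsigned long long)sh[i].sh_size);
		count++;
	}
out:
	free(names);
	free(sh);
	fclose(f);
	return count;
}
```

Because only section headers are consulted, the sketch works the same way on executables (which do have program headers) and on relocatable objects like .ko files.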

Perf tries to handle this by creating special maps for allocated
(SHF_ALLOC) ELF sections, but it uses the ELF addresses for the map address
ranges, so these special maps remain unused: no real ip ever falls into
their address range.
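
The failure mode above comes down to a range lookup. In a relocatable .ko, section addresses are link-time values (typically 0), while sampled ips are run-time kernel addresses. The sketch below uses made-up addresses and a hypothetical struct sec_map, not perf's real map structures, to show why a map built from the ELF address never matches, while one anchored at the load address does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one section map: covers [start, end). */
struct sec_map {
	const char *name;
	uint64_t start;
	uint64_t end;
};

/* Linear search: return the map whose range contains ip, or NULL. */
static const struct sec_map *find_map(const struct sec_map *maps, size_t n,
				      uint64_t ip)
{
	for (size_t i = 0; i < n; i++)
		if (ip >= maps[i].start && ip < maps[i].end)
			return &maps[i];
	return NULL;
}

/* Range derived from the .ko's own section address: near zero. */
static const struct sec_map elf_addr_maps[] = {
	{ ".noinstr.text", 0x0, 0x400 },
};

/* Same section, anchored at an illustrative run-time load address. */
static const struct sec_map runtime_maps[] = {
	{ ".noinstr.text", 0xffffffffc1200000ULL, 0xffffffffc1200400ULL },
};
```

A sample at 0xffffffffc1200123 finds nothing in elf_addr_maps but resolves to offset 0x123 inside .noinstr.text through runtime_maps. That offset is what symbol lookup within the section would then use.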

Solve this by preparing section-specific special maps using the addresses
provided by sysfs /sys/module/.../sections/. Also save these details in
perf.data as PERF_RECORD_KMOD_SEC_MAP records, which can be consumed at
perf-report time.
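
As a rough illustration of the record-time input, each file in a module's sysfs sections directory is named after a section and holds that section's start address in hex. The helpers below are hypothetical and are not the patchset's code. Sysfs provides only start addresses, so this sketch assumes each map's end would come from the matching section's sh_size in the .ko. Reading these attributes normally requires root.

```c
#include <dirent.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct kmod_section {
	char name[64];		/* e.g. ".noinstr.text" */
	uint64_t addr;		/* run-time start address */
};

/* Parse one attribute such as "0xffffffffc0a3e000\n". Returns 0 on success. */
static int read_addr_stream(FILE *f, uint64_t *addr)
{
	unsigned long long v;

	if (fscanf(f, "%llx", &v) != 1)	/* %llx accepts a 0x prefix */
		return -1;
	*addr = v;
	return 0;
}

/*
 * Read every attribute in a module's sections directory, e.g.
 * /sys/module/kvm_amd/sections. Returns the number of entries stored in
 * secs[], or -1 if the directory cannot be opened.
 */
static int read_section_addrs(const char *dir, struct kmod_section *secs, int max)
{
	DIR *d = opendir(dir);
	struct dirent *de;
	int n = 0;

	if (!d)
		return -1;
	while (n < max && (de = readdir(d)) != NULL) {
		char path[4096];
		FILE *f;

		/* Section names start with '.', so skip only "." and "..". */
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
		f = fopen(path, "r");
		if (!f)
			continue;
		if (read_addr_stream(f, &secs[n].addr) == 0) {
			snprintf(secs[n].name, sizeof(secs[n].name), "%s", de->d_name);
			n++;
		}
		fclose(f);
	}
	closedir(d);
	return n;
}
```

Each (name, addr) pair, together with the module path, is roughly the information the cover letter describes saving in perf.data. The actual PERF_RECORD_KMOD_SEC_MAP layout is defined in the patches themselves and is not reproduced here.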

Without the patchset:

  # perf record -a -c 5000000
  # perf report
  Overhead  Command          Shared Object           Symbol
    13.20%  qemu-system-x86  [unknown]               [.] 0x00005557527b1973
     6.58%  qemu-system-x86  [kvm_amd]               [k] 0x00000000000151e6
     6.36%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_gdt
     6.21%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_tr_desc
     4.71%  qemu-system-x86  [kvm]                   [k] vcpu_run
     4.52%  qemu-system-x86  [kvm_amd]               [k] svm_vcpu_run
     3.50%  qemu-system-x86  [kvm]                   [k] kvm_cpuid
     2.09%  qemu-system-x86  [kvm]                   [k] kvm_pmu_trigger_event
     1.98%  qemu-system-x86  [kvm_amd]               [k] 0x0000000000015171
     1.05%  qemu-system-x86  [kvm_amd]               [k] svm_handle_exit
     1.04%  qemu-system-x86  [kvm_amd]               [k] 0x00000000000151e2
     0.94%  qemu-system-x86  [kvm_amd]               [k] 0x0000000000015174

  Same perf.data with kallsyms:

  # perf report --kallsyms=/proc/kallsyms
  Overhead  Command          Shared Object           Symbol
    14.22%  qemu-system-x86  [kvm_amd]               [k] __svm_vcpu_run
    13.20%  qemu-system-x86  [unknown]               [.] 0x00005557527b1973
     6.36%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_gdt
     6.21%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_tr_desc
     4.71%  qemu-system-x86  [kvm]                   [k] vcpu_run
     4.52%  qemu-system-x86  [kvm_amd]               [k] svm_vcpu_run
     3.50%  qemu-system-x86  [kvm]                   [k] kvm_cpuid
     2.09%  qemu-system-x86  [kvm]                   [k] kvm_pmu_trigger_event
     1.05%  qemu-system-x86  [kvm_amd]               [k] svm_handle_exit

With the patchset:

  # perf record -a -c 5000000
  # perf report
  Overhead  Command          Shared Object           Symbol
    13.44%  qemu-system-x86  [kvm-amd].noinstr.text  [k] __svm_vcpu_run
    13.25%  qemu-system-x86  [unknown]               [.] 0x000055f4c6563973
     7.13%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_gdt
     6.00%  qemu-system-x86  [kernel.vmlinux]        [k] native_load_tr_desc
     5.13%  qemu-system-x86  [kvm_amd]               [k] svm_vcpu_run
     4.83%  qemu-system-x86  [kvm]                   [k] vcpu_run
     3.65%  qemu-system-x86  [kvm]                   [k] kvm_cpuid

  Same perf.data with kallsyms:

  # perf report --kallsyms=/proc/kallsyms
  Overhead  Command          Shared Object       Symbol
    13.44%  qemu-system-x86  [kernel.vmlinux]    [k] __svm_vcpu_run
    13.25%  qemu-system-x86  [unknown]           [.] 0x000055f4c6563973
     7.13%  qemu-system-x86  [kernel.vmlinux]    [k] native_load_gdt
     6.00%  qemu-system-x86  [kernel.vmlinux]    [k] native_load_tr_desc
     5.13%  qemu-system-x86  [kernel.vmlinux]    [k] svm_vcpu_run
     4.83%  qemu-system-x86  [kernel.vmlinux]    [k] vcpu_run
     3.65%  qemu-system-x86  [kernel.vmlinux]    [k] kvm_cpuid

This is an RFC-only series. TODOs:
 - I'm only recording the module path in PERF_RECORD_KMOD_SEC_MAP. It's quite
   possible that, at perf report time, a module file exists at the same path
   but its internal layout is different. I think I need to add some build-id
   check. Any ideas?
 - I've enabled host perf-record/report only. It doesn't work for guest
   modules because the host does not have access to the guest's sysfs. I have
   yet to figure out how to fix this. Maybe we can add a --guest-mod-sysfs
   option. Any ideas?
 - I'm also currently assuming that module files are not compressed.
 - I've seen perf build failures when compiling with NO_LIBELF=1.
 - I've seen perf report not honoring --kallsyms in certain conditions.
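
One possible direction for the build-id TODO, sketched purely as an illustration: at record time, read the running module's GNU build ID (the kernel exposes it under /sys/module/<name>/notes/.note.gnu.build-id, assuming the module was built with one). At report time, compare it with the .ko's ".note.gnu.build-id" section. Both blobs use the standard ELF note format, so one parser covers both. find_build_id() and build_ids_match() are hypothetical names.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF note name and descriptor fields are padded to 4-byte boundaries. */
#define NOTE_ALIGN(x) (((size_t)(x) + 3) & ~(size_t)3)

/*
 * Scan a raw ELF note blob for the GNU build-id note. On success, copy the
 * ID into id (at most max bytes) and return its length; otherwise return -1.
 */
static int find_build_id(const unsigned char *buf, size_t len,
			 unsigned char *id, size_t max)
{
	size_t off = 0;

	while (off + sizeof(Elf64_Nhdr) <= len) {
		Elf64_Nhdr nh;
		size_t name_off, desc_off;

		memcpy(&nh, buf + off, sizeof(nh));
		name_off = off + sizeof(nh);
		desc_off = name_off + NOTE_ALIGN(nh.n_namesz);
		if (desc_off > len || nh.n_descsz > len - desc_off)
			return -1;	/* truncated note */
		if (nh.n_type == NT_GNU_BUILD_ID && nh.n_namesz == 4 &&
		    memcmp(buf + name_off, "GNU", 4) == 0) {
			if (nh.n_descsz > max)
				return -1;
			memcpy(id, buf + desc_off, nh.n_descsz);
			return (int)nh.n_descsz;
		}
		off = desc_off + NOTE_ALIGN(nh.n_descsz);
	}
	return -1;
}

/* Report-time check: only trust the .ko if both IDs exist and are equal. */
static int build_ids_match(const unsigned char *a, int alen,
			   const unsigned char *b, int blen)
{
	return alen > 0 && alen == blen && memcmp(a, b, (size_t)alen) == 0;
}
```

If the IDs differ, a cautious report side could skip the section maps for that module and fall back to the current behaviour rather than mis-attribute samples.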

Prepared on top of acme/perf/core (69b41ac87e4a6)

Ravi Bangoria (4):
  perf tool: Simplify machine__create_modules() a bit
  perf tool: Refactor perf_event__synthesize_modules()
  perf tool: Introduce PERF_RECORD_KMOD_SEC_MAP
  perf tool: Fix non-".text" symbol resolution for kernel modules

 tools/lib/perf/Documentation/libperf.txt |   1 +
 tools/lib/perf/include/perf/event.h      |  25 +++
 tools/perf/builtin-annotate.c            |   1 +
 tools/perf/builtin-c2c.c                 |   1 +
 tools/perf/builtin-diff.c                |   1 +
 tools/perf/builtin-inject.c              |   1 +
 tools/perf/builtin-kmem.c                |   1 +
 tools/perf/builtin-mem.c                 |   1 +
 tools/perf/builtin-record.c              |  14 ++
 tools/perf/builtin-report.c              |   1 +
 tools/perf/builtin-script.c              |  13 ++
 tools/perf/builtin-trace.c               |   1 +
 tools/perf/util/build-id.c               |   1 +
 tools/perf/util/data-convert-bt.c        |   1 +
 tools/perf/util/data-convert-json.c      |   1 +
 tools/perf/util/event.c                  |  22 ++
 tools/perf/util/event.h                  |   5 +
 tools/perf/util/machine.c                | 268 ++++++++++++++++++++++-
 tools/perf/util/machine.h                |   2 +
 tools/perf/util/map.c                    |   1 +
 tools/perf/util/map.h                    |   4 +
 tools/perf/util/session.c                |  17 ++
 tools/perf/util/symbol-elf.c             |   9 +-
 tools/perf/util/symbol.c                 |   2 +-
 tools/perf/util/synthetic-events.c       | 136 +++++++++---
 tools/perf/util/tool.h                   |   3 +-
 26 files changed, 494 insertions(+), 39 deletions(-)

-- 
2.39.0
Re: [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules
Posted by Ravi Bangoria 2 years, 8 months ago
Hi all,

On 10-Jan-23 11:28 AM, Ravi Bangoria wrote:
> A kernel module ELF also contains executable code in sections other than
> ".text", e.g. ".noinstr.text". In addition, a kernel module's memory layout
> differs from its binary layout, because a .ko ELF file does not contain a
> program header table.
> 
> Perf tries to handle this by creating special maps for allocated
> (SHF_ALLOC) ELF sections, but it uses the ELF addresses for the map address
> ranges, so these special maps remain unused: no real ip ever falls into
> their address range.
> 
> Solve this by preparing section-specific special maps using the addresses
> provided by sysfs /sys/module/.../sections/. Also save these details in
> perf.data as PERF_RECORD_KMOD_SEC_MAP records, which can be consumed at
> perf-report time.

Do you think this is worth fixing, and should I continue? Or are the
--kcore / --kallsyms workarounds sufficient?

Thanks,
Ravi
Re: [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules
Posted by Adrian Hunter 2 years, 8 months ago
On 10/01/23 07:58, Ravi Bangoria wrote:
> A kernel module ELF also contains executable code in sections other than
> ".text", e.g. ".noinstr.text". In addition, a kernel module's memory layout
> differs from its binary layout, because a .ko ELF file does not contain a
> program header table.

Have you looked at using the perf record --kcore option?

Re: [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules
Posted by Ravi Bangoria 2 years, 8 months ago
Hi Adrian,

On 10-Jan-23 12:05 PM, Adrian Hunter wrote:
> On 10/01/23 07:58, Ravi Bangoria wrote:
>> A kernel module ELF also contains executable code in sections other than
>> ".text", e.g. ".noinstr.text". In addition, a kernel module's memory layout
>> differs from its binary layout, because a .ko ELF file does not contain a
>> program header table.
> 
> Have you looked at using the perf record --kcore option?

Nice! We can also use --kallsyms with perf report, and it resolves the
symbols fine.

But what about normal perf record/report? The reason I'm focusing on normal
perf record/report is that users generally don't specify these options;
especially with root privileges, they expect symbol resolution to just work.
But when they see inconsistent symbol resolution within the same kernel
module, they will have no clue what's missing. This patchset tries to solve
that, although I too feel that adding section-specific maps to perf.data is
overkill, since --kcore or --kallsyms can also resolve those symbols.
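
The reason --kallsyms sidesteps the problem is that /proc/kallsyms already lists module symbols at their run-time addresses, tagged with the module name, regardless of which section they live in. Resolution then reduces to a nearest-preceding-symbol lookup. The sketch below illustrates that idea and is not perf's implementation. The helper names are hypothetical, the sample lines and addresses are invented, and symbol sizes are ignored because kallsyms does not provide them.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct ksym {
	uint64_t addr;
	char type;
	char name[128];
	char mod[64];		/* empty for vmlinux symbols */
};

/* Parse "ffffffffc1a00000 t __svm_vcpu_run\t[kvm_amd]". Returns 0 on success. */
static int parse_kallsyms_line(const char *line, struct ksym *s)
{
	int n;

	s->mod[0] = '\0';
	n = sscanf(line, "%" SCNx64 " %c %127s [%63[^]]]",
		   &s->addr, &s->type, s->name, s->mod);
	return n >= 3 ? 0 : -1;	/* the [module] suffix is optional */
}

static int cmp_ksym(const void *a, const void *b)
{
	const struct ksym *x = a, *y = b;

	return (x->addr > y->addr) - (x->addr < y->addr);
}

/* Binary search for the symbol with the greatest addr <= ip (syms sorted). */
static const struct ksym *resolve(const struct ksym *syms, size_t n, uint64_t ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (syms[mid].addr <= ip)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? &syms[lo - 1] : NULL;
}
```

After parsing the lines and sorting them with qsort(syms, n, sizeof(*syms), cmp_ksym), an ip inside a module's .noinstr.text resolves like any other module address. No section information is involved.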

Thanks for your feedback,
Ravi
Re: [RFC 0/4] perf tool: Fix non-".text" symbol resolution for kernel modules
Posted by Ravi Bangoria 2 years, 8 months ago
On 10-Jan-23 2:13 PM, Ravi Bangoria wrote:
> Hi Adrian,
> 
> On 10-Jan-23 12:05 PM, Adrian Hunter wrote:
>> On 10/01/23 07:58, Ravi Bangoria wrote:
>>> A kernel module ELF also contains executable code in sections other than
>>> ".text", e.g. ".noinstr.text". In addition, a kernel module's memory layout
>>> differs from its binary layout, because a .ko ELF file does not contain a
>>> program header table.
>>
>> Have you looked at using the perf record --kcore option?
> 
> Nice! We can also use --kallsyms with perf report, and it resolves the
> symbols fine.
> 
> But what about normal perf record/report? The reason I'm focusing on normal
> perf record/report is that users generally don't specify these options;
> especially with root privileges, they expect symbol resolution to just work.
> But when they see inconsistent symbol resolution within the same kernel
> module, they will have no clue what's missing. This patchset tries to solve
> that, although I too feel that adding section-specific maps to perf.data is
> overkill, since --kcore or --kallsyms can also resolve those symbols.

FWIW, what this patchset does is not new. Perf already creates (pseudo)
maps for module ELF sections while parsing the symbol table, in
dso__process_kernel_symbol(). But perf does it incorrectly, and this
patchset tries to fix that.

Thanks,
Ravi