[PATCH v2 00/16] perf arm64: Support data type profiling

Tengda Wu posted 16 patches 2 months, 1 week ago
[PATCH v2 00/16] perf arm64: Support data type profiling
Posted by Tengda Wu 2 months, 1 week ago
This patch series implements data type profiling support for arm64,
building upon the foundational work previously contributed by Huafei [1].
While the initial version laid the groundwork for arm64 data type analysis,
this series iterates on that work by refining instruction parsing and
extending support for core architectural features.

The series is organized as follows:

1. Fix disassembly mismatches (Patches 01-02)
   Current perf annotate supports three disassembly backends: llvm,
   capstone, and objdump. On arm64, inconsistencies between the output
   of these backends (specifically llvm/capstone vs. objdump) often
   prevent the tracker from correctly identifying registers and offsets.
   These patches resolve these mismatches, ensuring consistent instruction
   parsing across all supported backends.

2. Infrastructure for arm64 operand parsing (Patches 03-07)
   These patches establish the necessary infrastructure for arm64-specific
   operand handling. This includes implementing new callbacks and data
   structures to manage arm64's unique addressing modes and register sets.
   This foundation is essential for the subsequent type-tracking logic.

3. Core instruction tracking (Patches 08-16)
   These patches implement the core logic for type tracking on arm64,
   covering a wide range of instructions including:

   * Memory Access: ldr/str variants (including stack-based access).
   * Arithmetic & Data Processing: mov, add, and adrp.
   * Special Access: System register access (mrs) and per-cpu variable
     tracking.

The implementation draws inspiration from the existing x86 logic while
adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
perf annotate can successfully resolve memory locations and register
types, enabling comprehensive data type profiling on arm64 platforms.

Example Result
==============

# perf mem record -a -K -- sleep 1
# perf annotate --data-type --type-stat --stdio
Annotate data type stats:
total 6204, ok 5091 (82.1%), bad 1113 (17.9%)
-----------------------------------------------------------
        29 : no_sym
       196 : no_var
       806 : no_typeinfo
        82 : bad_offset
      1370 : insn_track

Annotate type: 'struct page' in [kernel.kallsyms] (59208 samples):
============================================================================
 Percent     offset       size  field
  100.00          0       0x40  struct page      {
    9.95          0        0x8      long unsigned int   flags;
   52.83        0x8       0x28      union        {
   52.83        0x8       0x28          struct   {
   37.21        0x8       0x10              union        {
   37.21        0x8       0x10                  struct list_head        lru {
   37.21        0x8        0x8                      struct list_head*   next;
    0.00       0x10        0x8                      struct list_head*   prev;
                                                };
   37.21        0x8       0x10                  struct   {
   37.21        0x8        0x8                      void*       __filler;
    0.00       0x10        0x4                      unsigned int        mlock_count;
   ...
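
As a cross-check of the stats block above, here is a minimal sketch in C that redoes the arithmetic. The counts are copied from the output; the helper names (sum_bad, permille) are made up for illustration and are not part of perf. Note that the four failure buckets already add up to the "bad" total, so insn_track appears to be a separate counter rather than one of the failure reasons. That reading is inferred from the numbers only.

```c
#include <assert.h>
#include <stddef.h>

/* Failure buckets copied from the output above; insn_track is left out on purpose. */
struct bad_bucket {
	const char *name;
	unsigned int count;
};

static const struct bad_bucket buckets[] = {
	{ "no_sym",       29 },
	{ "no_var",      196 },
	{ "no_typeinfo", 806 },
	{ "bad_offset",   82 },
};

/* Sum of all failure buckets. */
static unsigned int sum_bad(void)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < sizeof(buckets) / sizeof(buckets[0]); i++)
		sum += buckets[i].count;
	return sum;
}

/* Share in tenths of a percent, rounded to nearest (821 means 82.1%). */
static unsigned int permille(unsigned int part, unsigned int total)
{
	assert(total > 0);
	return (part * 1000 + total / 2) / total;
}
```

With these helpers, sum_bad() gives 1113, permille(5091, 6204) gives 821 and permille(1113, 6204) gives 179, which matches the 82.1% / 17.9% split printed by --type-stat.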

Changes since v1 (reworked from Huafei's series):

 - Fix inconsistencies in arm64 instruction output across llvm, capstone,
   and objdump disassembly backends.
 - Support arm64-specific addressing modes and operand formats. (Leo Yan)
 - Extend instruction tracking to support mov and add instructions,
   along with per-cpu and stack variables.
 - Include real-world examples in commit messages to demonstrate
   practical effects. (Namhyung Kim)
 - Improve type-tracking success rate (type stat) from 64.2% to 82.1%.
   https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/

Please let me know if you have any feedback.

Thanks,
Tengda

[1] https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
[2] https://developer.arm.com/documentation/102374/0103
[3] https://github.com/flynd/asmsheets/releases/tag/v8

---

Tengda Wu (16):
  perf llvm: Fix arm64 adrp instruction disassembly mismatch with
    objdump
  perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
  perf annotate-arm64: Generalize arm64_mov__parse to support standard
    operands
  perf annotate-arm64: Handle load and store instructions
  perf annotate: Introduce extract_op_location callback for
    arch-specific parsing
  perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
  perf annotate-arm64: Implement extract_op_location() callback
  perf annotate-arm64: Enable instruction tracking support
  perf annotate-arm64: Support load instruction tracking
  perf annotate-arm64: Support store instruction tracking
  perf annotate-arm64: Support stack variable tracking
  perf annotate-arm64: Support 'mov' instruction tracking
  perf annotate-arm64: Support 'add' instruction tracking
  perf annotate-arm64: Support 'adrp' instruction to track global
    variables
  perf annotate-arm64: Support per-cpu variable access tracking
  perf annotate-arm64: Support 'mrs' instruction to track 'current'
    pointer

 .../perf/util/annotate-arch/annotate-arm64.c  | 642 +++++++++++++++++-
 .../util/annotate-arch/annotate-powerpc.c     |  10 +
 tools/perf/util/annotate-arch/annotate-x86.c  |  88 ++-
 tools/perf/util/annotate-data.c               |  72 +-
 tools/perf/util/annotate-data.h               |   7 +-
 tools/perf/util/annotate.c                    | 108 +--
 tools/perf/util/annotate.h                    |  12 +
 tools/perf/util/capstone.c                    | 107 ++-
 tools/perf/util/disasm.c                      |   5 +
 tools/perf/util/disasm.h                      |   5 +
 .../util/dwarf-regs-arch/dwarf-regs-arm64.c   |  20 +
 tools/perf/util/dwarf-regs.c                  |   2 +-
 tools/perf/util/include/dwarf-regs.h          |   1 +
 tools/perf/util/llvm.c                        |  50 ++
 14 files changed, 984 insertions(+), 145 deletions(-)


base-commit: cf7c3c02fdd0dfccf4d6611714273dcb538af2cb
-- 
2.34.1
Re: [PATCH v2 00/16] perf arm64: Support data type profiling
Posted by James Clark 1 month, 3 weeks ago

On 03/04/2026 10:47, Tengda Wu wrote:
> This patch series implements data type profiling support for arm64,
> building upon the foundational work previously contributed by Huafei [1].
> While the initial version laid the groundwork for arm64 data type analysis,
> this series iterates on that work by refining instruction parsing and
> extending support for core architectural features.
> 
> The series is organized as follows:
> 
> 1. Fix disassembly mismatches (Patches 01-02)
>     Current perf annotate supports three disassembly backends: llvm,
>     capstone, and objdump. On arm64, inconsistencies between the output
>     of these backends (specifically llvm/capstone vs. objdump) often
>     prevent the tracker from correctly identifying registers and offsets.
>     These patches resolve these mismatches, ensuring consistent instruction
>     parsing across all supported backends.

Did you try recording the Perf datasym workload? With llvm-objdump I 
only get hits on data1 and not data2. And with binutils I don't get any 
hits on that struct at all, although the rest of the samples in 
ld-linux-aarch64.so etc look roughly the same between binutils and llvm. 
I would have thought such a simple example like datasym would work with 
both.

  $ perf record -e arm_spe_0/load_filter=1,store_filter=1,min_latency=30/u \
     -c 10000 -- perf test -w datasym

  $ perf annotate --data-type --type-stat --itrace=i1i --stdio

With llvm-objdump-14:

   Annotate data type stats:
   total 25, ok 19 (76.0%), bad 6 (24.0%)
   -----------------------------------------------------------
          1 : no_sym
          1 : no_mem_ops
          3 : no_var
          1 : no_typeinfo
          9 : insn_track

   Annotate type: 'struct buf' in build/local/perf (6663 samples):
   ===============================================================
    Percent     offset       size  field
     100.00          0       0x40  struct buf       {
     100.00          0        0x1      char        data1;
       0.00        0x1       0x37      char[]      reserved;
       0.00       0x38        0x1      char        data2;
                                 };



With binutils that entry is missing:

   Annotate data type stats:
   total 25, ok 14 (56.0%), bad 11 (44.0%)
   -----------------------------------------------------------
          1 : no_sym
          1 : no_cuinfo
          3 : no_var
          6 : no_typeinfo
          4 : insn_track

...


But with the following patch I get plausible output for datasym with 
llvm, where both entries in the struct have hits. It looks like you need 
to add the offset when calling get_global_var_type() for 
TSR_KIND_GLOBAL_ADDR, otherwise all entries point to the first member of 
the struct:

   Annotate data type stats:
   total 4, ok 2 (50.0%), bad 2 (50.0%)
   -----------------------------------------------------------
          1 : no_sym
          1 : no_typeinfo
          2 : insn_track

   Annotate type: 'struct buf' in build/local/perf (35 samples):
   =====================================================================
    Percent     offset       size  field
     100.00          0       0x39  struct buf       {
      40.00          0        0x1      char        data1;
       0.00        0x1       0x37      char[]      reserved;
      60.00       0x38        0x1      char        data2;
                                 };


diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
index 7161417d1c76..0e5825121227 100644
--- a/tools/perf/util/annotate-data.c
+++ b/tools/perf/util/annotate-data.c
@@ -1287,7 +1287,9 @@ static enum type_match_result check_matching_type(struct type_state *state,
                  * The register holds the address of a global variable.  Try to
                  * find the variable by the address and get its type.
                  */
-               if (get_global_var_type(cu_die, dloc, dloc->ip, state->regs[reg].addr,
+               var_addr = state->regs[reg].addr + dloc->op->offset;
+
+               if (get_global_var_type(cu_die, dloc, dloc->ip, var_addr,
                                         &var_offset, type_die)) {
                         dloc->type_offset = var_offset;
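
To illustrate why the operand offset matters, here is a small standalone sketch in C. It does not use any perf internals: the struct layout is a hypothetical one chosen to match the 0x39-byte 'struct buf' output above, and member_at() / resolve() are made-up helpers. The point is only that looking up the bare register address always lands on the first member, while register address plus operand offset lands on the member that was actually accessed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout matching the 'struct buf' output above (data2 at 0x38). */
struct buf {
	char data1;
	char reserved[0x37];
	char data2;
};

enum member { MEMBER_NONE, MEMBER_DATA1, MEMBER_RESERVED, MEMBER_DATA2 };

/* Return the member that contains byte offset 'off' inside struct buf. */
static enum member member_at(uint64_t off)
{
	if (off < offsetof(struct buf, reserved))
		return MEMBER_DATA1;
	if (off < offsetof(struct buf, data2))
		return MEMBER_RESERVED;
	if (off < sizeof(struct buf))
		return MEMBER_DATA2;
	return MEMBER_NONE;
}

/*
 * Resolve an access of the form [reg, #op_offset], where the register
 * holds 'reg_addr' and the variable starts at 'var_start'. With
 * 'add_offset' cleared, the operand offset is ignored, which models the
 * behaviour before the patch above.
 */
static enum member resolve(uint64_t var_start, uint64_t reg_addr,
			   int64_t op_offset, int add_offset)
{
	uint64_t addr = reg_addr + (add_offset ? (uint64_t)op_offset : 0);

	assert(addr >= var_start);
	return member_at(addr - var_start);
}
```

For a variable at a made-up address 0x1000 and an access [x0, #56], resolve(0x1000, 0x1000, 56, 0) yields MEMBER_DATA1 and resolve(0x1000, 0x1000, 56, 1) yields MEMBER_DATA2.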


Re: [PATCH v2 00/16] perf arm64: Support data type profiling
Posted by Tengda Wu 1 month, 2 weeks ago
Hi James,

Sorry for the late reply.

On 2026/4/22 17:50, James Clark wrote:
> 
> 
> On 03/04/2026 10:47, Tengda Wu wrote:
>> This patch series implements data type profiling support for arm64,
>> building upon the foundational work previously contributed by Huafei [1].
>> While the initial version laid the groundwork for arm64 data type analysis,
>> this series iterates on that work by refining instruction parsing and
>> extending support for core architectural features.
>>
>> The series is organized as follows:
>>
>> 1. Fix disassembly mismatches (Patches 01-02)
>>     Current perf annotate supports three disassembly backends: llvm,
>>     capstone, and objdump. On arm64, inconsistencies between the output
>>     of these backends (specifically llvm/capstone vs. objdump) often
>>     prevent the tracker from correctly identifying registers and offsets.
>>     These patches resolve these mismatches, ensuring consistent instruction
>>     parsing across all supported backends.
> 
> Did you try recording the Perf datasym workload? With llvm-objdump I only
> get hits on data1 and not data2. And with binutils I don't get any hits on
> that struct at all, although the rest of the samples in
> ld-linux-aarch64.so etc look roughly the same between binutils and llvm. I
> would have thought such a simple example like datasym would work with both.
>
>  $ perf record -e arm_spe_0/load_filter=1,store_filter=1,
>     min_latency=30/u -c 10000 -- perf test -w datasym
> 
>  $ perf annotate --data-type --type-stat --itrace=i1i --stdio
> 
> With llvm-objdump-14:
> 
>   Annotate data type stats:
>   total 25, ok 19 (76.0%), bad 6 (24.0%)
>   -----------------------------------------------------------
>          1 : no_sym
>          1 : no_mem_ops
>          3 : no_var
>          1 : no_typeinfo
>          9 : insn_track
> 
>   Annotate type: 'struct buf' in build/local/perf (6663 samples):
>   ===============================================================
>    Percent     offset       size  field
>     100.00          0       0x40  struct buf       {
>     100.00          0        0x1      char        data1;
>       0.00        0x1       0x37      char[]      reserved;
>       0.00       0x38        0x1      char        data2;
>                                 };
> 
> 
> 
> With binutils that entry is missing:
> 
>   Annotate data type stats:
>   total 25, ok 14 (56.0%), bad 11 (44.0%)
>   -----------------------------------------------------------
>          1 : no_sym
>          1 : no_cuinfo
>          3 : no_var
>          6 : no_typeinfo
>          4 : insn_track
> 
> ...
> 

To clarify, perf annotate currently supports three disassembly backends:

1) libllvm, 2) libcapstone, and 3) objdump.

When you compared LLVM and binutils, were you referring to switching the
backend from _libllvm_ to _objdump_?

If so, my local results are actually the opposite:

1. Using libllvm: the 'struct buf' entry is missing.
2. Using objdump: the entry is present, but it only hits data1, not data2.

For issue 1, the root cause is in llvm_name_for_data() within
tools/perf/util/llvm.c when parsing ADRP symbols. It seems these ADRP
address symbols in userspace consistently fail to resolve, preventing
the name from appearing in disasm_buf.

Subsequently, in arm64_mov__parse, an unnecessary '<' character check
(as noted by Namhyung) prevents the address in the ADRP instruction
from being extracted correctly. This ultimately causes the instruction
tracking to fail to identify the type. I will remove the '<' check
in arm64_mov__parse(), which should resolve this issue.
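
A rough sketch of the intended behaviour, in standalone C: the address should be extracted whether or not a "<symbol+offset>" suffix follows it. parse_adrp_target() is a hypothetical helper written for this mail, not the real arm64_mov__parse(), and it assumes the operand text looks like the objdump-style "x0, 61e000 <sym+0x...>" form.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the real arm64_mov__parse(): extract the
 * target address from adrp operands such as
 *   "x0, 61e000 <fake_callchains+0xb90>"   (symbol resolved)
 *   "x0, 61e000"                           (symbol missing)
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_adrp_target(const char *ops, uint64_t *addr)
{
	const char *p;
	char *end;
	unsigned long long val;

	assert(ops != NULL && addr != NULL);

	/* The address is the second operand, after the comma. */
	p = strchr(ops, ',');
	if (p == NULL)
		return -1;
	p++;
	while (*p == ' ' || *p == '\t')
		p++;

	errno = 0;
	val = strtoull(p, &end, 16);
	if (end == p || errno)
		return -1;

	/* Accept end of string or a trailing " <sym+off>"; '<' is not required. */
	if (*end != '\0' && *end != ' ' && *end != '\t')
		return -1;

	*addr = val;
	return 0;
}
```

Both "x0, 61e000 <fake_callchains+0xb90>" and "x0, 61e000" yield 0x61e000 here, so a missing symbol name no longer blocks the address extraction.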

> 
> But with the following patch I get plausible output for datasym with llvm
> where both entries in the struct have hits. It looks like you need to add
> the offset when calling get_global_var_type() for TSR_KIND_GLOBAL_ADDR
> otherwise all entries point to the first member of the struct:
> 
>   Annotate data type stats:
>   total 4, ok 2 (50.0%), bad 2 (50.0%)
>   -----------------------------------------------------------
>          1 : no_sym
>          1 : no_typeinfo
>          2 : insn_track
> 
>   Annotate type: 'struct buf' in build/local/perf (35 samples):
>   =====================================================================
>    Percent     offset       size  field
>     100.00          0       0x39  struct buf       {
>      40.00          0        0x1      char        data1;
>       0.00        0x1       0x37      char[]      reserved;
>      60.00       0x38        0x1      char        data2;
>                                 };
> 
> 
> diff --git a/tools/perf/util/annotate-data.c b/tools/perf/util/annotate-data.c
> index 7161417d1c76..0e5825121227 100644
> --- a/tools/perf/util/annotate-data.c
> +++ b/tools/perf/util/annotate-data.c
> @@ -1287,7 +1287,9 @@ static enum type_match_result check_matching_type(struct type_state *state,
>                  * The register holds the address of a global variable.  Try to
>                  * find the variable by the address and get its type.
>                  */
> -               if (get_global_var_type(cu_die, dloc, dloc->ip, state->regs[reg].addr,
> +               var_addr = state->regs[reg].addr + dloc->op->offset;
> +
> +               if (get_global_var_type(cu_die, dloc, dloc->ip, var_addr,
>                                         &var_offset, type_die)) {
>                         dloc->type_offset = var_offset;
> 
> 

For issue 2 above, I checked with --code-with-type and found that the
instruction missing data2 is an LDR with an offset, which was indeed
overlooked. Thank you for providing the fix; I will include it in v3.

    0.00 :   1fc030: adrp    x0, 61e000 <fake_callchains+0xb90>
    0.00 :   1fc034: add     x0, x0, #0x440
    0.60 :   1fc038: ldrb    w0, [x0, #56]              # data-type: struct buf +0 (data1)
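
For reference, the address arithmetic behind these three instructions can be sketched in standalone C. The struct and helpers below (reg_state, track_adrp, track_add_imm, load_address) are simplified stand-ins invented for this example; perf's real type_state carries much more information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified register state; perf's real type_state is richer. */
struct reg_state {
	bool     is_global_addr;  /* register holds an address of global data */
	uint64_t addr;
};

/* adrp xN, <page>: the register now holds a page address. */
static void track_adrp(struct reg_state *r, uint64_t page)
{
	r->is_global_addr = true;
	r->addr = page;
}

/* add xN, xN, #imm: the tracked address moves by the immediate. */
static void track_add_imm(struct reg_state *r, uint64_t imm)
{
	if (r->is_global_addr)
		r->addr += imm;
}

/* ldr* wT, [xN, #off]: address actually accessed by the load. */
static uint64_t load_address(const struct reg_state *r, int64_t off)
{
	assert(r->is_global_addr);
	return r->addr + (uint64_t)off;
}
```

Feeding in the values from the listing (page 0x61e000, immediate 0x440, load offset 56) gives a variable address of 0x61e440 and an accessed address of 0x61e478, i.e. offset 0x38 into the variable, which is data2 rather than data1.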

Best Regards,
Tengda


Re: [PATCH v2 00/16] perf arm64: Support data type profiling
Posted by James Clark 1 month, 4 weeks ago

On 03/04/2026 10:47, Tengda Wu wrote:
> This patch series implements data type profiling support for arm64,
> building upon the foundational work previously contributed by Huafei [1].
> While the initial version laid the groundwork for arm64 data type analysis,
> this series iterates on that work by refining instruction parsing and
> extending support for core architectural features.
> 
> The series is organized as follows:
> 
> 1. Fix disassembly mismatches (Patches 01-02)
>     Current perf annotate supports three disassembly backends: llvm,
>     capstone, and objdump. On arm64, inconsistencies between the output
>     of these backends (specifically llvm/capstone vs. objdump) often
>     prevent the tracker from correctly identifying registers and offsets.
>     These patches resolve these mismatches, ensuring consistent instruction
>     parsing across all supported backends.
> 
> 2. Infrastructure for arm64 operand parsing (Patches 03-07)
>     These patches establish the necessary infrastructure for arm64-specific
>     operand handling. This includes implementing new callbacks and data
>     structures to manage arm64's unique addressing modes and register sets.
>     This foundation is essential for the subsequent type-tracking logic.
> 
> 3. Core instruction tracking (Patches 08-16)
>     These patches implement the core logic for type tracking on arm64,
>     covering a wide range of instructions including:
> 
>     * Memory Access: ldr/str variants (including stack-based access).
>     * Arithmetic & Data Processing: mov, add, and adrp.
>     * Special Access: System register access (mrs) and per-cpu variable
>       tracking.
> 
> The implementation draws inspiration from the existing x86 logic while
> adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
> perf annotate can successfully resolve memory locations and register
> types, enabling comprehensive data type profiling on arm64 platforms.
> 
> Example Result
> ==============
> 
> # perf mem record -a -K -- sleep 1
> # perf annotate --data-type --type-stat --stdio

Hi Tengda,

Did you run this with any itrace options? If I run your command I get 
repeated blocks of duplicate stats and types, one for each sample type 
that we generate when decoding SPE, which is very confusing.

For example the default perf report output has all these groups:

   Available samples
   0 arm_spe_0/ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=30/
   0 dummy:u
   3 l1d-miss
   18 l1d-access
   0 llc-miss
   0 llc-access
   0 tlb-miss
   22 tlb-access
   0 branch
   0 remote-access
   22 memory
   22 instructions

Obviously there are 22 samples total (instructions) and they get 
duplicated into whatever other categories they happen to have flags for.

To remove the duplicates you have to use --itrace=i1i. Should that be the 
default for perf annotate with SPE?
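
A toy model of the duplication, in standalone C, in case it helps the discussion. The flag names and the synthesized() helper are invented for this sketch and do not correspond to the real SPE decoder. It also assumes that --itrace=i1i removes the duplicates by leaving only the "instructions" samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real SPE decoder uses its own definitions. */
enum {
	F_L1D_ACCESS = 1u << 0,
	F_L1D_MISS   = 1u << 1,
	F_TLB_ACCESS = 1u << 2,
	F_MEMORY     = 1u << 3,
};

/* Number of set bits, i.e. how many per-category samples one record yields. */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Samples synthesized for 'n' records: one per matching category, plus
 * one "instructions" sample per record when 'with_insn' is set.
 * 'category_mask' selects which categories are synthesized at all.
 */
static size_t synthesized(const uint32_t *flags, size_t n,
			  uint32_t category_mask, int with_insn)
{
	size_t total = 0;

	assert(flags != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += popcount32(flags[i] & category_mask) + (with_insn ? 1 : 0);
	return total;
}
```

In this model two records with three and four flags set produce nine samples when every category is synthesized, but only two when the category mask is empty and just the instructions samples remain.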

Re: [PATCH v2 00/16] perf arm64: Support data type profiling
Posted by Tengda Wu 1 month, 4 weeks ago

On 2026/4/16 23:31, James Clark wrote:
> 
> 
> On 03/04/2026 10:47, Tengda Wu wrote:
>> This patch series implements data type profiling support for arm64,
>> building upon the foundational work previously contributed by Huafei [1].
>> While the initial version laid the groundwork for arm64 data type analysis,
>> this series iterates on that work by refining instruction parsing and
>> extending support for core architectural features.
>>
>> The series is organized as follows:
>>
>> 1. Fix disassembly mismatches (Patches 01-02)
>>     Current perf annotate supports three disassembly backends: llvm,
>>     capstone, and objdump. On arm64, inconsistencies between the output
>>     of these backends (specifically llvm/capstone vs. objdump) often
>>     prevent the tracker from correctly identifying registers and offsets.
>>     These patches resolve these mismatches, ensuring consistent instruction
>>     parsing across all supported backends.
>>
>> 2. Infrastructure for arm64 operand parsing (Patches 03-07)
>>     These patches establish the necessary infrastructure for arm64-specific
>>     operand handling. This includes implementing new callbacks and data
>>     structures to manage arm64's unique addressing modes and register sets.
>>     This foundation is essential for the subsequent type-tracking logic.
>>
>> 3. Core instruction tracking (Patches 08-16)
>>     These patches implement the core logic for type tracking on arm64,
>>     covering a wide range of instructions including:
>>
>>     * Memory Access: ldr/str variants (including stack-based access).
>>     * Arithmetic & Data Processing: mov, add, and adrp.
>>     * Special Access: System register access (mrs) and per-cpu variable
>>       tracking.
>>
>> The implementation draws inspiration from the existing x86 logic while
>> adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
>> perf annotate can successfully resolve memory locations and register
>> types, enabling comprehensive data type profiling on arm64 platforms.
>>
>> Example Result
>> ==============
>>
>> # perf mem record -a -K -- sleep 1
>> # perf annotate --data-type --type-stat --stdio
> 
> Hi Tengda,
> 
> Did you run this with any itrace options? If I run your command I get repeated blocks of duplicate stats and types, which is very confusing. One for each sample type that we generate decoding SPE.
> 
> For example the default perf report output has all these groups:
> 
>   Available samples
>   0 arm_spe_0/
>     ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=30/
>   0 dummy:u
>   3 l1d-miss
>   18 l1d-access
>   0 llc-miss
>   0 llc-access
>   0 tlb-miss
>   22 tlb-access
>   0 branch
>   0 remote-access
>   22 memory
>   22 instructions
> 
> Obviously there are 22 samples total (instructions) and they get duplicated into whatever other categories they happen to have flags for.
> 

Yes, I agree. The duplication makes the type stats misleading.
De-duplication is definitely necessary here.

> To remove the duplicates you have to do --itrace=i1i. Should that be the default for perf annotate with SPE?
> 
I'll look into making this the default behavior for SPE data-type
annotation.

>> Annotate data type stats:
>> total 6204, ok 5091 (82.1%), bad 1113 (17.9%)
>> -----------------------------------------------------------
>>          29 : no_sym
>>         196 : no_var
>>         806 : no_typeinfo
>>          82 : bad_offset
>>        1370 : insn_track
>>

Here are the results with --itrace=i1i (a slight decrease in accuracy):

Annotate data type stats:
total 1138, ok 877 (77.1%), bad 261 (22.9%)
-----------------------------------------------------------
         6 : no_sym
        44 : no_var
       197 : no_typeinfo
        14 : bad_offset
       238 : insn_track

This will be the new baseline, and I will work on further optimizations
from here.
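
As a quick sanity check on the two stat blocks above, the sketch below (plain Python, not perf code) recomputes the derived fields from the raw counts. The helper name `summarize` is hypothetical. Judging only by these numbers, the four reasons no_sym, no_var, no_typeinfo and bad_offset add up to 'bad', while insn_track is not part of that sum.

```python
def summarize(total, ok, reasons):
    """Recompute the derived fields of an 'Annotate data type stats' block."""
    bad = total - ok
    return {
        "bad": bad,
        "ok_pct": round(100 * ok / total, 1),
        "bad_pct": round(100 * bad / total, 1),
        "reasons_sum": sum(reasons.values()),
    }

# Counts from the default run (no --itrace option).
default_run = summarize(
    6204, 5091,
    {"no_sym": 29, "no_var": 196, "no_typeinfo": 806, "bad_offset": 82},
)

# Counts from the run with --itrace=i1i.
i1i_run = summarize(
    1138, 877,
    {"no_sym": 6, "no_var": 44, "no_typeinfo": 197, "bad_offset": 14},
)

print(default_run)
print(i1i_run)
```

Both blocks are internally consistent under this reading; the difference between the runs is in the raw counts, not in how the percentages are derived.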

Best regards,
Tengda


Re: [PATCH v2 00/16] perf arm64: Support data type profiling
Posted by James Clark 1 month, 3 weeks ago

On 17/04/2026 02:53, Tengda Wu wrote:
> 
> 
> On 2026/4/16 23:31, James Clark wrote:
>>
>>
>> On 03/04/2026 10:47, Tengda Wu wrote:
>>> This patch series implements data type profiling support for arm64,
>>> building upon the foundational work previously contributed by Huafei [1].
>>> While the initial version laid the groundwork for arm64 data type analysis,
>>> this series iterates on that work by refining instruction parsing and
>>> extending support for core architectural features.
>>>
>>> The series is organized as follows:
>>>
>>> 1. Fix disassembly mismatches (Patches 01-02)
>>>      Current perf annotate supports three disassembly backends: llvm,
>>>      capstone, and objdump. On arm64, inconsistencies between the output
>>>      of these backends (specifically llvm/capstone vs. objdump) often
>>>      prevent the tracker from correctly identifying registers and offsets.
>>>      These patches resolve these mismatches, ensuring consistent instruction
>>>      parsing across all supported backends.
>>>
>>> 2. Infrastructure for arm64 operand parsing (Patches 03-07)
>>>      These patches establish the necessary infrastructure for arm64-specific
>>>      operand handling. This includes implementing new callbacks and data
>>>      structures to manage arm64's unique addressing modes and register sets.
>>>      This foundation is essential for the subsequent type-tracking logic.
>>>
>>> 3. Core instruction tracking (Patches 08-16)
>>>      These patches implement the core logic for type tracking on arm64,
>>>      covering a wide range of instructions including:
>>>
>>>      * Memory Access: ldr/str variants (including stack-based access).
>>>      * Arithmetic & Data Processing: mov, add, and adrp.
>>>      * Special Access: System register access (mrs) and per-cpu variable
>>>        tracking.
>>>
>>> The implementation draws inspiration from the existing x86 logic while
>>> adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
>>> perf annotate can successfully resolve memory locations and register
>>> types, enabling comprehensive data type profiling on arm64 platforms.
>>>
>>> Example Result
>>> ==============
>>>
>>> # perf mem record -a -K -- sleep 1
>>> # perf annotate --data-type --type-stat --stdio
>>
>> Hi Tengda,
>>
>> Did you run this with any itrace options? If I run your command I get repeated blocks of duplicate stats and types, which is very confusing. One for each sample type that we generate decoding SPE.
>>
>> For example the default perf report output has all these groups:
>>
>>    Available samples
>>    0 arm_spe_0/
>>      ts_enable=1,pa_enable=1,load_filter=1,store_filter=1,min_latency=30/
>>    0 dummy:u
>>    3 l1d-miss
>>    18 l1d-access
>>    0 llc-miss
>>    0 llc-access
>>    0 tlb-miss
>>    22 tlb-access
>>    0 branch
>>    0 remote-access
>>    22 memory
>>    22 instructions
>>
>> Obviously there are 22 samples total (instructions) and they get duplicated into whatever other categories they happen to have flags for.
>>
> 
> Yes, I agree. The duplication makes the type stats misleading.
> De-duplication is definitely necessary here.
> 
>> To remove the duplicates you have to do --itrace=i1i. Should that be the default for perf annotate with SPE?
>>
> I'll look into making this the default behavior for SPE data-type
> annotation.
> 
>>> Annotate data type stats:
>>> total 6204, ok 5091 (82.1%), bad 1113 (17.9%)
>>> -----------------------------------------------------------
>>>           29 : no_sym
>>>          196 : no_var
>>>          806 : no_typeinfo
>>>           82 : bad_offset
>>>         1370 : insn_track
>>>
> 
> Here are the results with --itrace=i1i (a slight decrease in accuracy):
> 
> Annotate data type stats:
> total 1138, ok 877 (77.1%), bad 261 (22.9%)

I'm still a bit confused why you seem to get a 'total' count that is 
a sum of all the sample groups, if it went from 6204 to 1138 when you 
only asked for the instructions samples. Whereas I get separate groups, 
and asking for only instructions samples doesn't change the value for 
the last 'total', it just removes the other outputs.

It shouldn't change the accuracy either because the instruction group is 
the top level one which contains all of the samples.

> -----------------------------------------------------------
>           6 : no_sym
>          44 : no_var
>         197 : no_typeinfo
>          14 : bad_offset
>         238 : insn_track
> 
> This will be the new baseline, and I will work on further optimizations
> from here.
> 
> Best regards,
> Tengda
> 

Re: [PATCH v2 00/16] perf arm64: Support data type profiling
Posted by James Clark 2 months ago

On 03/04/2026 10:47, Tengda Wu wrote:
> This patch series implements data type profiling support for arm64,
> building upon the foundational work previously contributed by Huafei [1].
> While the initial version laid the groundwork for arm64 data type analysis,
> this series iterates on that work by refining instruction parsing and
> extending support for core architectural features.
> 
> The series is organized as follows:
> 
> 1. Fix disassembly mismatches (Patches 01-02)
>     Current perf annotate supports three disassembly backends: llvm,
>     capstone, and objdump. On arm64, inconsistencies between the output
>     of these backends (specifically llvm/capstone vs. objdump) often
>     prevent the tracker from correctly identifying registers and offsets.
>     These patches resolve these mismatches, ensuring consistent instruction
>     parsing across all supported backends.
> 
> 2. Infrastructure for arm64 operand parsing (Patches 03-07)
>     These patches establish the necessary infrastructure for arm64-specific
>     operand handling. This includes implementing new callbacks and data
>     structures to manage arm64's unique addressing modes and register sets.
>     This foundation is essential for the subsequent type-tracking logic.
> 
> 3. Core instruction tracking (Patches 08-16)
>     These patches implement the core logic for type tracking on arm64,
>     covering a wide range of instructions including:
> 
>     * Memory Access: ldr/str variants (including stack-based access).
>     * Arithmetic & Data Processing: mov, add, and adrp.
>     * Special Access: System register access (mrs) and per-cpu variable
>       tracking.
> 
> The implementation draws inspiration from the existing x86 logic while
> adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
> perf annotate can successfully resolve memory locations and register
> types, enabling comprehensive data type profiling on arm64 platforms.
> 
> Example Result
> ==============
> 
> # perf mem record -a -K -- sleep 1
> # perf annotate --data-type --type-stat --stdio
> Annotate data type stats:
> total 6204, ok 5091 (82.1%), bad 1113 (17.9%)
> -----------------------------------------------------------
>          29 : no_sym
>         196 : no_var
>         806 : no_typeinfo
>          82 : bad_offset
>        1370 : insn_track
> 
> Annotate type: 'struct page' in [kernel.kallsyms] (59208 samples):
> ============================================================================
>   Percent     offset       size  field
>    100.00          0       0x40  struct page      {
>      9.95          0        0x8      long unsigned int   flags;
>     52.83        0x8       0x28      union        {
>     52.83        0x8       0x28          struct   {
>     37.21        0x8       0x10              union        {
>     37.21        0x8       0x10                  struct list_head        lru {
>     37.21        0x8        0x8                      struct list_head*   next;
>      0.00       0x10        0x8                      struct list_head*   prev;
>                                                  };
>     37.21        0x8       0x10                  struct   {
>     37.21        0x8        0x8                      void*       __filler;
>      0.00       0x10        0x4                      unsigned int        mlock_count;
>     ...
> 
> Changes since v1: (reworked from Huafei's series):
> 
>   - Fix inconsistencies in arm64 instruction output across llvm, capstone,
>     and objdump disassembly backends.
>   - Support arm64-specific addressing modes and operand formats. (Leo Yan)
>   - Extend instruction tracking to support mov and add instructions,
>     along with per-cpu and stack variables.
>   - Include real-world examples in commit messages to demonstrate
>     practical effects. (Namhyung Kim)
>   - Improve type-tracking success rate (type stat) from 64.2% to 82.1%.
>     https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
> 
> Please let me know if you have any feedback.
> 

LGTM apart from a few small comments. I will do some testing as well.

Re: [PATCH v2 00/16] perf arm64: Support data type profiling
Posted by Tengda Wu 2 months ago

On 2026/4/14 23:10, James Clark wrote:
> 
> 
> On 03/04/2026 10:47, Tengda Wu wrote:
>> This patch series implements data type profiling support for arm64,
>> building upon the foundational work previously contributed by Huafei [1].
>> While the initial version laid the groundwork for arm64 data type analysis,
>> this series iterates on that work by refining instruction parsing and
>> extending support for core architectural features.
>>
>> The series is organized as follows:
>>
>> 1. Fix disassembly mismatches (Patches 01-02)
>>     Current perf annotate supports three disassembly backends: llvm,
>>     capstone, and objdump. On arm64, inconsistencies between the output
>>     of these backends (specifically llvm/capstone vs. objdump) often
>>     prevent the tracker from correctly identifying registers and offsets.
>>     These patches resolve these mismatches, ensuring consistent instruction
>>     parsing across all supported backends.
>>
>> 2. Infrastructure for arm64 operand parsing (Patches 03-07)
>>     These patches establish the necessary infrastructure for arm64-specific
>>     operand handling. This includes implementing new callbacks and data
>>     structures to manage arm64's unique addressing modes and register sets.
>>     This foundation is essential for the subsequent type-tracking logic.
>>
>> 3. Core instruction tracking (Patches 08-16)
>>     These patches implement the core logic for type tracking on arm64,
>>     covering a wide range of instructions including:
>>
>>     * Memory Access: ldr/str variants (including stack-based access).
>>     * Arithmetic & Data Processing: mov, add, and adrp.
>>     * Special Access: System register access (mrs) and per-cpu variable
>>       tracking.
>>
>> The implementation draws inspiration from the existing x86 logic while
>> adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
>> perf annotate can successfully resolve memory locations and register
>> types, enabling comprehensive data type profiling on arm64 platforms.
>>
>> Example Result
>> ==============
>>
>> # perf mem record -a -K -- sleep 1
>> # perf annotate --data-type --type-stat --stdio
>> Annotate data type stats:
>> total 6204, ok 5091 (82.1%), bad 1113 (17.9%)
>> -----------------------------------------------------------
>>          29 : no_sym
>>         196 : no_var
>>         806 : no_typeinfo
>>          82 : bad_offset
>>        1370 : insn_track
>>
>> Annotate type: 'struct page' in [kernel.kallsyms] (59208 samples):
>> ============================================================================
>>   Percent     offset       size  field
>>    100.00          0       0x40  struct page      {
>>      9.95          0        0x8      long unsigned int   flags;
>>     52.83        0x8       0x28      union        {
>>     52.83        0x8       0x28          struct   {
>>     37.21        0x8       0x10              union        {
>>     37.21        0x8       0x10                  struct list_head        lru {
>>     37.21        0x8        0x8                      struct list_head*   next;
>>      0.00       0x10        0x8                      struct list_head*   prev;
>>                                                  };
>>     37.21        0x8       0x10                  struct   {
>>     37.21        0x8        0x8                      void*       __filler;
>>      0.00       0x10        0x4                      unsigned int        mlock_count;
>>     ...
>>
>> Changes since v1: (reworked from Huafei's series):
>>
>>   - Fix inconsistencies in arm64 instruction output across llvm, capstone,
>>     and objdump disassembly backends.
>>   - Support arm64-specific addressing modes and operand formats. (Leo Yan)
>>   - Extend instruction tracking to support mov and add instructions,
>>     along with per-cpu and stack variables.
>>   - Include real-world examples in commit messages to demonstrate
>>     practical effects. (Namhyung Kim)
>>   - Improve type-tracking success rate (type stat) from 64.2% to 82.1%.
>>     https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
>>
>> Please let me know if you have any feedback.
>>
> 
> LGTM apart from a few small comments. I will do some testing as well.
> 

Thanks for the feedback! I'll look into your comments and address them.
Please let me know if any issues come up during your testing.

-- Tengda

>> Thanks,
>> Tengda
>>
>> [1] https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
>> [2] https://developer.arm.com/documentation/102374/0103
>> [3] https://github.com/flynd/asmsheets/releases/tag/v8
>>
>> ---
>>
>> Tengda Wu (16):
>>    perf llvm: Fix arm64 adrp instruction disassembly mismatch with
>>      objdump
>>    perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
>>    perf annotate-arm64: Generalize arm64_mov__parse to support standard
>>      operands
>>    perf annotate-arm64: Handle load and store instructions
>>    perf annotate: Introduce extract_op_location callback for
>>      arch-specific parsing
>>    perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
>>    perf annotate-arm64: Implement extract_op_location() callback
>>    perf annotate-arm64: Enable instruction tracking support
>>    perf annotate-arm64: Support load instruction tracking
>>    perf annotate-arm64: Support store instruction tracking
>>    perf annotate-arm64: Support stack variable tracking
>>    perf annotate-arm64: Support 'mov' instruction tracking
>>    perf annotate-arm64: Support 'add' instruction tracking
>>    perf annotate-arm64: Support 'adrp' instruction to track global
>>      variables
>>    perf annotate-arm64: Support per-cpu variable access tracking
>>    perf annotate-arm64: Support 'mrs' instruction to track 'current'
>>      pointer
>>
>>   .../perf/util/annotate-arch/annotate-arm64.c  | 642 +++++++++++++++++-
>>   .../util/annotate-arch/annotate-powerpc.c     |  10 +
>>   tools/perf/util/annotate-arch/annotate-x86.c  |  88 ++-
>>   tools/perf/util/annotate-data.c               |  72 +-
>>   tools/perf/util/annotate-data.h               |   7 +-
>>   tools/perf/util/annotate.c                    | 108 +--
>>   tools/perf/util/annotate.h                    |  12 +
>>   tools/perf/util/capstone.c                    | 107 ++-
>>   tools/perf/util/disasm.c                      |   5 +
>>   tools/perf/util/disasm.h                      |   5 +
>>   .../util/dwarf-regs-arch/dwarf-regs-arm64.c   |  20 +
>>   tools/perf/util/dwarf-regs.c                  |   2 +-
>>   tools/perf/util/include/dwarf-regs.h          |   1 +
>>   tools/perf/util/llvm.c                        |  50 ++
>>   14 files changed, 984 insertions(+), 145 deletions(-)
>>
>>
>> base-commit: cf7c3c02fdd0dfccf4d6611714273dcb538af2cb
> 

Re: [PATCH v2 00/16] perf arm64: Support data type profiling
Posted by Namhyung Kim 2 months, 1 week ago
Hello,

On Fri, Apr 03, 2026 at 09:47:44AM +0000, Tengda Wu wrote:
> This patch series implements data type profiling support for arm64,
> building upon the foundational work previously contributed by Huafei [1].
> While the initial version laid the groundwork for arm64 data type analysis,
> this series iterates on that work by refining instruction parsing and
> extending support for core architectural features.

Thanks for working on this!  I'm happy to see that the changes are well
organized and that each commit explains its issue clearly.

> 
> The series is organized as follows:
> 
> 1. Fix disassembly mismatches (Patches 01-02)
>    Current perf annotate supports three disassembly backends: llvm,
>    capstone, and objdump. On arm64, inconsistencies between the output
>    of these backends (specifically llvm/capstone vs. objdump) often
>    prevent the tracker from correctly identifying registers and offsets.
>    These patches resolve these mismatches, ensuring consistent instruction
>    parsing across all supported backends.
> 
> 2. Infrastructure for arm64 operand parsing (Patches 03-07)
>    These patches establish the necessary infrastructure for arm64-specific
>    operand handling. This includes implementing new callbacks and data
>    structures to manage arm64's unique addressing modes and register sets.
>    This foundation is essential for the subsequent type-tracking logic.

I've only reviewed up to this part so far.  I'll send replies soon and
continue reviewing later this week.

> 
> 3. Core instruction tracking (Patches 08-16)
>    These patches implement the core logic for type tracking on arm64,
>    covering a wide range of instructions including:
> 
>    * Memory Access: ldr/str variants (including stack-based access).
>    * Arithmetic & Data Processing: mov, add, and adrp.
>    * Special Access: System register access (mrs) and per-cpu variable
>      tracking.
> 
> The implementation draws inspiration from the existing x86 logic while
> adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
> perf annotate can successfully resolve memory locations and register
> types, enabling comprehensive data type profiling on arm64 platforms.
> 
> Example Result
> ==============
> 
> # perf mem record -a -K -- sleep 1
> # perf annotate --data-type --type-stat --stdio
> Annotate data type stats:
> total 6204, ok 5091 (82.1%), bad 1113 (17.9%)

I'm impressed that the success rate is quite high.  But I think you need
to confirm that the findings are correct by taking a close look at each
result.  You can try `perf annotate --code-with-type`.

Thanks,
Namhyung


> -----------------------------------------------------------
>         29 : no_sym
>        196 : no_var
>        806 : no_typeinfo
>         82 : bad_offset
>       1370 : insn_track
> 
> Annotate type: 'struct page' in [kernel.kallsyms] (59208 samples):
> ============================================================================
>  Percent     offset       size  field
>   100.00          0       0x40  struct page      {
>     9.95          0        0x8      long unsigned int   flags;
>    52.83        0x8       0x28      union        {
>    52.83        0x8       0x28          struct   {
>    37.21        0x8       0x10              union        {
>    37.21        0x8       0x10                  struct list_head        lru {
>    37.21        0x8        0x8                      struct list_head*   next;
>     0.00       0x10        0x8                      struct list_head*   prev;
>                                                 };
>    37.21        0x8       0x10                  struct   {
>    37.21        0x8        0x8                      void*       __filler;
>     0.00       0x10        0x4                      unsigned int        mlock_count;
>    ...
> 
> Changes since v1 (reworked from Huafei's series):
> 
>  - Fix inconsistencies in arm64 instruction output across llvm, capstone,
>    and objdump disassembly backends.
>  - Support arm64-specific addressing modes and operand formats. (Leo Yan)
>  - Extend instruction tracking to support mov and add instructions,
>    along with per-cpu and stack variables.
>  - Include real-world examples in commit messages to demonstrate
>    practical effects. (Namhyung Kim)
>  - Improve type-tracking success rate (type stat) from 64.2% to 82.1%.
>    https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
> 
> Please let me know if you have any feedback.
> 
> Thanks,
> Tengda
> 
> [1] https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
> [2] https://developer.arm.com/documentation/102374/0103
> [3] https://github.com/flynd/asmsheets/releases/tag/v8
> 
> ---
> 
> Tengda Wu (16):
>   perf llvm: Fix arm64 adrp instruction disassembly mismatch with
>     objdump
>   perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
>   perf annotate-arm64: Generalize arm64_mov__parse to support standard
>     operands
>   perf annotate-arm64: Handle load and store instructions
>   perf annotate: Introduce extract_op_location callback for
>     arch-specific parsing
>   perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
>   perf annotate-arm64: Implement extract_op_location() callback
>   perf annotate-arm64: Enable instruction tracking support
>   perf annotate-arm64: Support load instruction tracking
>   perf annotate-arm64: Support store instruction tracking
>   perf annotate-arm64: Support stack variable tracking
>   perf annotate-arm64: Support 'mov' instruction tracking
>   perf annotate-arm64: Support 'add' instruction tracking
>   perf annotate-arm64: Support 'adrp' instruction to track global
>     variables
>   perf annotate-arm64: Support per-cpu variable access tracking
>   perf annotate-arm64: Support 'mrs' instruction to track 'current'
>     pointer
> 
>  .../perf/util/annotate-arch/annotate-arm64.c  | 642 +++++++++++++++++-
>  .../util/annotate-arch/annotate-powerpc.c     |  10 +
>  tools/perf/util/annotate-arch/annotate-x86.c  |  88 ++-
>  tools/perf/util/annotate-data.c               |  72 +-
>  tools/perf/util/annotate-data.h               |   7 +-
>  tools/perf/util/annotate.c                    | 108 +--
>  tools/perf/util/annotate.h                    |  12 +
>  tools/perf/util/capstone.c                    | 107 ++-
>  tools/perf/util/disasm.c                      |   5 +
>  tools/perf/util/disasm.h                      |   5 +
>  .../util/dwarf-regs-arch/dwarf-regs-arm64.c   |  20 +
>  tools/perf/util/dwarf-regs.c                  |   2 +-
>  tools/perf/util/include/dwarf-regs.h          |   1 +
>  tools/perf/util/llvm.c                        |  50 ++
>  14 files changed, 984 insertions(+), 145 deletions(-)
> 
> 
> base-commit: cf7c3c02fdd0dfccf4d6611714273dcb538af2cb
> -- 
> 2.34.1
>
Re: [PATCH v2 00/16] perf arm64: Support data type profiling
Posted by Tengda Wu 2 months, 1 week ago
Hi Namhyung,

Thank you for the feedback.
I have received your comments and will get back to this as soon as I can.

Best regards,
Tengda

On 2026/4/7 14:31, Namhyung Kim wrote:
> Hello,
> 
> On Fri, Apr 03, 2026 at 09:47:44AM +0000, Tengda Wu wrote:
>> This patch series implements data type profiling support for arm64,
>> building upon the foundational work previously contributed by Huafei [1].
>> While the initial version laid the groundwork for arm64 data type analysis,
>> this series iterates on that work by refining instruction parsing and
>> extending support for core architectural features.
> 
> Thanks for working on this!  I'm happy to see that the changes are well
> organized and that each commit explains its issue clearly.
> 
>>
>> The series is organized as follows:
>>
>> 1. Fix disassembly mismatches (Patches 01-02)
>>    Current perf annotate supports three disassembly backends: llvm,
>>    capstone, and objdump. On arm64, inconsistencies between the output
>>    of these backends (specifically llvm/capstone vs. objdump) often
>>    prevent the tracker from correctly identifying registers and offsets.
>>    These patches resolve these mismatches, ensuring consistent instruction
>>    parsing across all supported backends.
>>
>> 2. Infrastructure for arm64 operand parsing (Patches 03-07)
>>    These patches establish the necessary infrastructure for arm64-specific
>>    operand handling. This includes implementing new callbacks and data
>>    structures to manage arm64's unique addressing modes and register sets.
>>    This foundation is essential for the subsequent type-tracking logic.
> 
> I've only reviewed up to this part so far.  I'll send replies soon and
> continue reviewing later this week.
> 
>>
>> 3. Core instruction tracking (Patches 08-16)
>>    These patches implement the core logic for type tracking on arm64,
>>    covering a wide range of instructions including:
>>
>>    * Memory Access: ldr/str variants (including stack-based access).
>>    * Arithmetic & Data Processing: mov, add, and adrp.
>>    * Special Access: System register access (mrs) and per-cpu variable
>>      tracking.
>>
>> The implementation draws inspiration from the existing x86 logic while
>> adapting it to the nuances of the AArch64 ISA [2][3]. With these changes,
>> perf annotate can successfully resolve memory locations and register
>> types, enabling comprehensive data type profiling on arm64 platforms.
>>
>> Example Result
>> ==============
>>
>> # perf mem record -a -K -- sleep 1
>> # perf annotate --data-type --type-stat --stdio
>> Annotate data type stats:
>> total 6204, ok 5091 (82.1%), bad 1113 (17.9%)
> 
> I'm impressed that the success rate is quite high.  But I think you need
> to confirm that the findings are correct by taking a close look at each
> result.  You can try `perf annotate --code-with-type`.
> 
> Thanks,
> Namhyung
> 
> 
>> -----------------------------------------------------------
>>         29 : no_sym
>>        196 : no_var
>>        806 : no_typeinfo
>>         82 : bad_offset
>>       1370 : insn_track
>>
>> Annotate type: 'struct page' in [kernel.kallsyms] (59208 samples):
>> ============================================================================
>>  Percent     offset       size  field
>>   100.00          0       0x40  struct page      {
>>     9.95          0        0x8      long unsigned int   flags;
>>    52.83        0x8       0x28      union        {
>>    52.83        0x8       0x28          struct   {
>>    37.21        0x8       0x10              union        {
>>    37.21        0x8       0x10                  struct list_head        lru {
>>    37.21        0x8        0x8                      struct list_head*   next;
>>     0.00       0x10        0x8                      struct list_head*   prev;
>>                                                 };
>>    37.21        0x8       0x10                  struct   {
>>    37.21        0x8        0x8                      void*       __filler;
>>     0.00       0x10        0x4                      unsigned int        mlock_count;
>>    ...
>>
>> Changes since v1 (reworked from Huafei's series):
>>
>>  - Fix inconsistencies in arm64 instruction output across llvm, capstone,
>>    and objdump disassembly backends.
>>  - Support arm64-specific addressing modes and operand formats. (Leo Yan)
>>  - Extend instruction tracking to support mov and add instructions,
>>    along with per-cpu and stack variables.
>>  - Include real-world examples in commit messages to demonstrate
>>    practical effects. (Namhyung Kim)
>>  - Improve type-tracking success rate (type stat) from 64.2% to 82.1%.
>>    https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
>>
>> Please let me know if you have any feedback.
>>
>> Thanks,
>> Tengda
>>
>> [1] https://lore.kernel.org/all/20250314162137.528204-1-lihuafei1@huawei.com/
>> [2] https://developer.arm.com/documentation/102374/0103
>> [3] https://github.com/flynd/asmsheets/releases/tag/v8
>>
>> ---
>>
>> Tengda Wu (16):
>>   perf llvm: Fix arm64 adrp instruction disassembly mismatch with
>>     objdump
>>   perf capstone: Fix arm64 jump/adrp disassembly mismatch with objdump
>>   perf annotate-arm64: Generalize arm64_mov__parse to support standard
>>     operands
>>   perf annotate-arm64: Handle load and store instructions
>>   perf annotate: Introduce extract_op_location callback for
>>     arch-specific parsing
>>   perf dwarf-regs: Adapt get_dwarf_regnum() for arm64
>>   perf annotate-arm64: Implement extract_op_location() callback
>>   perf annotate-arm64: Enable instruction tracking support
>>   perf annotate-arm64: Support load instruction tracking
>>   perf annotate-arm64: Support store instruction tracking
>>   perf annotate-arm64: Support stack variable tracking
>>   perf annotate-arm64: Support 'mov' instruction tracking
>>   perf annotate-arm64: Support 'add' instruction tracking
>>   perf annotate-arm64: Support 'adrp' instruction to track global
>>     variables
>>   perf annotate-arm64: Support per-cpu variable access tracking
>>   perf annotate-arm64: Support 'mrs' instruction to track 'current'
>>     pointer
>>
>>  .../perf/util/annotate-arch/annotate-arm64.c  | 642 +++++++++++++++++-
>>  .../util/annotate-arch/annotate-powerpc.c     |  10 +
>>  tools/perf/util/annotate-arch/annotate-x86.c  |  88 ++-
>>  tools/perf/util/annotate-data.c               |  72 +-
>>  tools/perf/util/annotate-data.h               |   7 +-
>>  tools/perf/util/annotate.c                    | 108 +--
>>  tools/perf/util/annotate.h                    |  12 +
>>  tools/perf/util/capstone.c                    | 107 ++-
>>  tools/perf/util/disasm.c                      |   5 +
>>  tools/perf/util/disasm.h                      |   5 +
>>  .../util/dwarf-regs-arch/dwarf-regs-arm64.c   |  20 +
>>  tools/perf/util/dwarf-regs.c                  |   2 +-
>>  tools/perf/util/include/dwarf-regs.h          |   1 +
>>  tools/perf/util/llvm.c                        |  50 ++
>>  14 files changed, 984 insertions(+), 145 deletions(-)
>>
>>
>> base-commit: cf7c3c02fdd0dfccf4d6611714273dcb538af2cb
>> -- 
>> 2.34.1
>>
Re: [PATCH v2 00/16] perf arm64: Support data type profiling
Posted by Namhyung Kim 2 months ago
On Wed, Apr 08, 2026 at 07:35:22PM +0800, Tengda Wu wrote:
> Hi Namhyung,
> 
> Thank you for the feedback.
> I have received your comments and will get back to this as soon as I can.
 
Thanks, I've reviewed the instruction tracking part.

My main concern is that unknown or unhandled cases should be invalidated
so that the tracker strictly maintains correct state for registers and
stack slots.  Otherwise the output would contain many incorrect results.
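
To illustrate the invalidation concern above, here is a minimal sketch in
C.  It is not code from the series: struct reg_state, struct insn,
update_state() and invalidate_reg() are hypothetical names, the model
covers only general-purpose registers, and an integer stands in for a
DWARF type.  The point it tries to show is that any instruction the
tracker does not understand must clear the destination register's state
instead of leaving a stale type behind.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_REGS 31	/* x0..x30; simplified, hypothetical model */

/* Hypothetical per-register type state, not perf's real structure. */
struct reg_state {
	bool ok;	/* true only while the type is known to be valid */
	int type_id;	/* stand-in for a DWARF type reference */
};

enum insn_kind { INSN_MOV, INSN_LDR, INSN_OTHER };

/* Hypothetical decoded instruction. */
struct insn {
	enum insn_kind kind;
	int dst;		/* destination register number */
	int src;		/* source (mov) or base (ldr) register */
	int loaded_type;	/* type of the loaded member, -1 if unknown */
};

static void invalidate_reg(struct reg_state *regs, int reg)
{
	assert(reg >= 0 && reg < NR_REGS);
	regs[reg].ok = false;
	regs[reg].type_id = -1;
}

static void update_state(struct reg_state *regs, const struct insn *in)
{
	assert(in->dst >= 0 && in->dst < NR_REGS);
	assert(in->src >= 0 && in->src < NR_REGS);

	switch (in->kind) {
	case INSN_MOV:
		/* Copies the validity flag too, so an unknown source stays unknown. */
		regs[in->dst] = regs[in->src];
		break;
	case INSN_LDR:
		/* Trust the result only if the base and the member type are known. */
		if (regs[in->src].ok && in->loaded_type >= 0) {
			regs[in->dst].ok = true;
			regs[in->dst].type_id = in->loaded_type;
		} else {
			invalidate_reg(regs, in->dst);
		}
		break;
	default:
		/* Unhandled instruction: the old type of dst cannot be trusted. */
		invalidate_reg(regs, in->dst);
		break;
	}
}
```

With this shape, a register written by an unhandled instruction is
reported as unknown rather than with the type it held earlier, which
lowers the "ok" count but keeps the reported types trustworthy.  The
same rule would apply to stack slots.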

Also, please take a look at the sashiko reviews, which spotted many issues.

https://sashiko.dev/#/patchset/20260403094800.1418825-1-wutengda%40huaweicloud.com

Thanks,
Namhyung
Re: [PATCH v2 00/16] perf arm64: Support data type profiling
Posted by Tengda Wu 2 months ago

On 2026/4/10 15:00, Namhyung Kim wrote:
> On Wed, Apr 08, 2026 at 07:35:22PM +0800, Tengda Wu wrote:
>> Hi Namhyung,
>>
>> Thank you for the feedback.
>> I have received your comments and will get back to this as soon as I can.
>  
> Thanks, I've reviewed the instruction tracking part.
> 
> My main concern is that unknown or unhandled cases should be invalidated
> so that the tracker strictly maintains correct state for registers and
> stack slots.  Otherwise the output would contain many incorrect results.
> 
> Also, please take a look at the sashiko reviews, which spotted many issues.
> 
> https://sashiko.dev/#/patchset/20260403094800.1418825-1-wutengda%40huaweicloud.com
> 
> Thanks,
> Namhyung

Much appreciated, and thanks for the thorough review.

There is a lot of valuable input here. I will take some time to dig into
each comment and make sure all concerns are properly addressed in the next
version.

Thanks,
Tengda