[PATCH v6 00/30] objtool: Function validation tracing

Alexandre Chartre posted 30 patches 1 week, 3 days ago
.../x86/tools/gen-cpu-feature-names-x86.awk   |   33 +
tools/build/Makefile.feature                  |    4 +-
tools/objtool/.gitignore                      |    3 +
tools/objtool/Build                           |    3 +
tools/objtool/Makefile                        |   26 +
tools/objtool/arch/loongarch/decode.c         |   23 +
tools/objtool/arch/loongarch/special.c        |    5 +
tools/objtool/arch/powerpc/decode.c           |   24 +
tools/objtool/arch/powerpc/special.c          |    5 +
tools/objtool/arch/x86/Build                  |    8 +
tools/objtool/arch/x86/decode.c               |   20 +
tools/objtool/arch/x86/special.c              |   10 +
tools/objtool/builtin-check.c                 |    4 +
tools/objtool/check.c                         |  648 +++++----
tools/objtool/disas.c                         | 1245 +++++++++++++++++
tools/objtool/include/objtool/arch.h          |   11 +
tools/objtool/include/objtool/builtin.h       |    3 +
tools/objtool/include/objtool/check.h         |   35 +-
tools/objtool/include/objtool/disas.h         |   81 ++
tools/objtool/include/objtool/special.h       |    4 +-
tools/objtool/include/objtool/trace.h         |  141 ++
tools/objtool/include/objtool/warn.h          |   17 +-
tools/objtool/special.c                       |    2 +
tools/objtool/trace.c                         |  203 +++
24 files changed, 2259 insertions(+), 299 deletions(-)
create mode 100644 tools/arch/x86/tools/gen-cpu-feature-names-x86.awk
create mode 100644 tools/objtool/disas.c
create mode 100644 tools/objtool/include/objtool/disas.h
create mode 100644 tools/objtool/include/objtool/trace.h
create mode 100644 tools/objtool/trace.c
Hi,

These patches change objtool to disassemble code with libopcodes instead
of running objdump. You will find below:

- Changes: list of changes made in this version
- Overview: overview of the changes
- Notes: description of some particular behavior
- Examples: output examples

Patches are now based on tip/master.

I am deferring the following changes to future patches:
- Josh: convert --disas option to subcommand
- David: provide branch distance for small branches

Thanks,

alex.

-----

Changes:
========

V6:
---
- Josh: fix commit comments (PATCH 22 and 23)
- Josh: update .gitignore (PATCH 03 and 26)
- clean feature discovery (PATCH 03)

V5: 
---
- patches are now based on tip/master
- remove the resolution of direct/PV calls
  (added in V4 but this needs more work)
- Josh: fix rname
- Josh: change names for jump tables and exception tables
- Josh: display header line for single-line alternatives
- Josh: make NOP<n> lowercase
- Josh: fix alternatives order
- Josh: trim trailing NOPs
- David: indicate the number of trailing NOPs (nop*<n>)
- David: provide compact output for alternatives disassembly.
  The compact output is now the default, and there is a --wide
  option to provide a wide output where alternatives are displayed
  side-by-side.

V4:
---
This version fixes a build issue when disassembly is not available. Compared
with V3, this is addressed by changes in patch 14 (objtool: Improve tracing
of alternative instructions). Other patches are similar to V3.

V3:
---
This version addresses comments from Josh and Peter, in particular:

- Josh: replace ERROR in disas_context_create with WARN
- Josh: do not change offstr() outside the disassembler
- Josh: duplicated "falls through to next function" warning
- Josh: validate_symbol() has extra newline before return
- Josh: "make -s" should be completely silent
- Josh: instructions with unwinding state changes are printing twice
- Josh: explain TRACE_INSN(insn, NULL): this prints an instruction with no
  	additional message.

- Peter: display alternative on a single line
- Peter: display nop-like instruction as NOP<n>
- Peter: in alternative show differences between jmp.d8 and jmp.d32 (jmp/jmpq is used now)
- Peter: show alternative feature name and flags
- Peter: alternative jumps to altinstr_aux - see NOTE below:
         Disassembly can show default alternative jumping to .altinstr_aux
- Peter: some jump label target seems wrong (jmp +0) - NOTE below:
         Disassembly can show alternative jumping to the next instruction

Other improvements:

- An alternative is displayed on a single line if each alternative has a
  single instruction. Otherwise alternatives are displayed side-by-side,
  with one column for each alternative. XXX option?

- Each alternative of a group alternative is displayed with its feature
  name and flags: <flags><feature-name>

  <flags> is made of the following characters:

    '!' : ALT_FLAG_NOT
    '+' : ALT_FLAG_DIRECT_CALL
    '?' : unknown flag (i.e. any other flags)

- A jump table is displayed the same way as an alternative, with the default
  instruction (which may or may not be a branch), and the corresponding
  substitute instruction. It is identified with the "JUMP" name.

- An exception table is displayed the same way as an alternative, with the
  default instruction (which can cause an exception), and a "resume at <desc>"
  string which indicates where the execution resumes if there is an exception.
  It is identified with the "EXCEPTION" name.

- An exception table can be present for an instruction which also has an
  alternative. In that case, the exception table is displayed in the same
  way as the other group alternatives for this instruction.

- Print the destination name of pv_ops calls when we can figure out if
  XENPV mode is used or not. If the PV mode can't be predicted then print
  the default pv_ops destination as a destination example. **REMOVED IN V5**

- If a group alternative is a direct call then print the corresponding
  pv_ops call. **REMOVED IN V5**
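
As a rough illustration of the <flags><feature-name> naming described in the
list above, here is a minimal sketch that builds such a label. The flag bit
values, the alt_label() helper and its buffer handling are assumptions made
for this example only; they are not taken from the patches.

```c
#include <stdio.h>
#include <string.h>

/* Assumed bit values, for illustration only. */
#define ALT_FLAG_NOT		(1 << 0)
#define ALT_FLAG_DIRECT_CALL	(1 << 1)

/*
 * Hypothetical helper: write "<flags><feature-name>" into buf.
 *   '!' : ALT_FLAG_NOT
 *   '+' : ALT_FLAG_DIRECT_CALL
 *   '?' : any other flag
 */
static const char *alt_label(char *buf, size_t len,
			     unsigned int flags, const char *feature)
{
	char prefix[4];
	size_t n = 0;

	if (flags & ALT_FLAG_NOT)
		prefix[n++] = '!';
	if (flags & ALT_FLAG_DIRECT_CALL)
		prefix[n++] = '+';
	if (flags & ~(ALT_FLAG_NOT | ALT_FLAG_DIRECT_CALL))
		prefix[n++] = '?';
	prefix[n] = '\0';

	snprintf(buf, len, "%s%s", prefix, feature);
	return buf;
}
```

With these assumptions, calling alt_label() with ALT_FLAG_NOT and
"X86_FEATURE_ALWAYS" gives the "!X86_FEATURE_ALWAYS" form that appears in
the __switch_to_asm example.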


Overview:
=========

This provides the following changes to objtool.

- Disassemble code with libopcodes instead of running objdump

  objtool currently executes the objdump command to disassemble code. In
  particular, if objtool fails to validate a function, it uses objdump to
  disassemble the entire file, which is not very helpful when processing
  a large file (like vmlinux.o).

  Using libopcodes provides more control over the disassembly scope and
  output, and it is possible to disassemble a single instruction or
  a single function. Now when objtool fails to validate a function it
  will disassemble that single function instead of disassembling the
  entire file.

- Add the --trace <function> option to trace function validation

  Figuring out why a function validation has failed can be difficult because
  objtool checks all code flows (including alternatives) and maintains
  per-instruction state (in particular call frame information).

  The trace option makes it possible to follow the function validation done
  by objtool instruction by instruction, see what objtool is doing and get
  function validation information. An output example is shown below.

- Add the --disas <function> option to disassemble functions

  Disassembly is done using libopcodes and it will show the different code
  alternatives.

Note: some changes are architecture-specific (x86, powerpc, loongarch). Any
feedback about the behavior on powerpc and loongarch is welcome.
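
To give an idea of what following all code flows means for the --trace
output, here is a small sketch of a depth-first walk over a toy instruction
array. It prints one "| " per nesting level and tags lines with "jump taken",
"jump not taken" or "already visited". Everything in it (struct insn, walk(),
trace_demo(), the toy instruction list) is hypothetical and much simpler than
what objtool actually does.

```c
#include <stdbool.h>
#include <stdio.h>

enum kind { INSN_OTHER, INSN_JCC, INSN_JMP, INSN_RET };

struct insn {
	const char *text;
	enum kind kind;
	int target;		/* index of the jump destination, -1 if none */
	bool visited;
};

/* Print one trace line, indented with "| " for each nested branch. */
static void trace_line(const struct insn *insn, int idx, int depth,
		       const char *msg)
{
	printf("%3d:  ", idx);
	for (int i = 0; i < depth; i++)
		printf("| ");
	printf("%-20s", insn->text);
	if (msg)
		printf(" - %s", msg);
	printf("\n");
}

/* Walk every code flow from idx; return how many insns were newly visited. */
static int walk(struct insn *code, int n, int idx, int depth)
{
	int count = 0;

	while (idx >= 0 && idx < n) {
		struct insn *insn = &code[idx];

		if (insn->visited) {
			trace_line(insn, idx, depth, "already visited");
			return count;
		}
		insn->visited = true;
		count++;

		switch (insn->kind) {
		case INSN_JCC:
			/* Follow the taken branch first, then fall through. */
			trace_line(insn, idx, depth, "jump taken");
			count += walk(code, n, insn->target, depth + 1);
			trace_line(insn, idx, depth, "jump not taken");
			break;
		case INSN_JMP:
			trace_line(insn, idx, depth, "unconditional jump");
			return count + walk(code, n, insn->target, depth + 1);
		case INSN_RET:
			trace_line(insn, idx, depth, "return");
			return count;
		default:
			trace_line(insn, idx, depth, NULL);
			break;
		}
		idx++;
	}
	return count;
}

/* Toy function: a conditional jump to a cold block that jumps back. */
static int trace_demo(void)
{
	struct insn code[] = {
		{ "test %eax,%eax", INSN_OTHER, -1, false },
		{ "je     4",       INSN_JCC,    4, false },
		{ "xor  %edx,%edx", INSN_OTHER, -1, false },
		{ "ret",            INSN_RET,   -1, false },
		{ "ud2",            INSN_OTHER, -1, false },
		{ "jmp    2",       INSN_JMP,    2, false },
	};

	return walk(code, 6, 0, 0);
}
```

Calling trace_demo() walks the taken branch first, then comes back to the
conditional jump for the not-taken path and stops as soon as it reaches an
instruction that was already visited, which is the same shape as the
os_xsave trace in Example 1.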


Notes:
======

Disassembly can show default alternative jumping to .altinstr_aux
-----------------------------------------------------------------
Disassembly can show a default alternative jumping to .altinstr_aux. This
happens when the _static_cpu_has() function is used. Its default code
jumps to .altinstr_aux where a test sequence is executed (test; jnz; jmp).

At runtime, this sequence is not used because _static_cpu_has() has
an alternative with the X86_FEATURE_ALWAYS feature.


  debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> | NOP5  (X86_FEATURE_HYBRID_CPU) | jmpq   0x61a <perf_get_x86_pmu_capability+0x37>  (X86_FEATURE_ALWAYS)   # <alternative.debc>
  dec1:  perf_get_x86_pmu_capability+0x11     ud2                                                       


Disassembly can show alternative jumping to the next instruction
----------------------------------------------------------------

The disassembly can show jump tables with an alternative which jumps
to the next instruction.

For example:

def9:  perf_get_x86_pmu_capability+0x49    NOP2 | jmp    defb <perf_get_x86_pmu_capability+0x4b>  (JUMP)   # <alternative.def9>
defb:  perf_get_x86_pmu_capability+0x4b	   mov    0x0(%rip),%rdi        # 0xdf02 <x86_pmu+0xd8>      

This disassembly is correct. These instructions come from:

        cap->num_counters_gp = x86_pmu_num_counters(NULL);

which will end up executing this statement:

        if (static_branch_unlikely(&perf_is_hybrid) && NULL)
                <do something>;

This statement is optimized to do nothing because the condition is
always false. But static_branch_unlikely() is implemented with a jump
table inside an "asm goto" statement, and "asm goto" statements are
not optimized.

So basically the code is optimized like this:

        if (static_branch_unlikely(&perf_is_hybrid))
                ;

And this translates to the assembly code above.
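
The effect can be reproduced outside the kernel with a small stand-in. The
sketch below is not the kernel's static_branch_unlikely() implementation:
sketch_static_branch() and sketch_count() are made-up names, the "asm goto"
only contains a nop, and it assumes GCC or Clang with "asm goto" support on
a target that has a nop mnemonic.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-in for static_branch_unlikely(): an "asm goto" whose single nop
 * could later be patched into a jump to l_yes. The compiler cannot look
 * inside the asm statement, so it has to keep it.
 */
static inline bool sketch_static_branch(void)
{
	__asm__ goto("nop" : : : : l_yes);
	return false;
l_yes:
	return true;
}

/* Mirrors the "if (static_branch_unlikely(...) && NULL)" statement. */
int sketch_count(int start)
{
	if (sketch_static_branch() && NULL)
		start++;	/* dead code: the condition is always false */

	return start;
}
```

Building this with optimization and looking at the generated assembly should
show the nop being kept while the body of the if is gone, so both the nop and
its possible jump replacement continue at the same next instruction.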


Examples:
=========

Example 1 (--trace option): Trace the validation of the os_xsave() function
---------------------------------------------------------------------------

$ ./tools/objtool/objtool --hacks=jump_label --hacks=noinstr --hacks=skylake --ibt --orc --retpoline --rethunk --sls --static-call --uaccess --prefix=16 --link --trace os_xsave -v vmlinux.o
os_xsave: validation begin
 59be0:  os_xsave+0x0                  push   %r12                                          - state: cfa=rsp+16 r12=(cfa-16) stack_size=16 
 59be2:  os_xsave+0x2		       mov    0x0(%rip),%eax        # 0x59be8 <alternatives_patched>
 59be8:  os_xsave+0x8		       push   %rbp                                          - state: cfa=rsp+24 rbp=(cfa-24) stack_size=24 
 59be9:  os_xsave+0x9		       mov    %rdi,%rbp                                          
 59bec:  os_xsave+0xc		       push   %rbx					     - state: cfa=rsp+32 rbx=(cfa-32) stack_size=32 
 59bed:  os_xsave+0xd		       mov    0x8(%rdi),%rbx                                     
 59bf1:  os_xsave+0x11		       mov    %rbx,%r12                                          
 59bf4:  os_xsave+0x14		       shr    $0x20,%r12                                         
 59bf8:  os_xsave+0x18		       test   %eax,%eax                                          
 59bfa:  os_xsave+0x1a		       je     0x59c22 <os_xsave+0x42>                        - jump taken
 59c22:  os_xsave+0x42		       | ud2                                                     
 59c24:  os_xsave+0x44		       | jmp    0x59bfc <os_xsave+0x1c>                      - unconditional jump
 59bfc:  os_xsave+0x1c		       | | xor    %edx,%edx                                      
 59bfe:  os_xsave+0x1e		       | | mov    %rbx,%rsi                                      
 59c01:  os_xsave+0x21		       | | mov    %rbp,%rdi                                      
 59c04:  os_xsave+0x24		       | | callq  0x59c09 <xfd_validate_state>               - call
 59c09:  os_xsave+0x29		       | | mov    %ebx,%eax                                      
 59c0b:  os_xsave+0x2b		       | | mov    %r12d,%edx                                     
 	 			       | | / <alternative.59c0e> X86_FEATURE_XSAVEOPT
  1b29:  .altinstr_replacement+0x1b29  | | | xsaveopt64 0x40(%rbp)                               
 59c13:  os_xsave+0x33		       | | | xor    %ebx,%ebx                                    
 59c15:  os_xsave+0x35		       | | | test   %ebx,%ebx                                    
 59c17:  os_xsave+0x37		       | | | jne    0x59c26 <os_xsave+0x46>                  - jump taken
 59c26:  os_xsave+0x46		       | | | | ud2                                               
 59c28:  os_xsave+0x48		       | | | | pop    %rbx                                   - state: cfa=rsp+24 rbx=<undef> stack_size=24 
 59c29:  os_xsave+0x49		       | | | | pop    %rbp				     - state: cfa=rsp+16 rbp=<undef> stack_size=16 
 59c2a:  os_xsave+0x4a		       | | | | pop    %r12				     - state: cfa=rsp+8 r12=<undef> stack_size=8 
 59c2c:  os_xsave+0x4c		       | | | | jmpq   0x59c31 <__x86_return_thunk>	     - return
 59c17:  os_xsave+0x37		       | | | jne    0x59c26 <os_xsave+0x46>		     - jump not taken
 59c19:  os_xsave+0x39		       | | | pop    %rbx    				     - state: cfa=rsp+24 rbx=<undef> stack_size=24 
 59c1a:  os_xsave+0x3a		       | | | pop    %rbp				     - state: cfa=rsp+16 rbp=<undef> stack_size=16 
 59c1b:  os_xsave+0x3b		       | | | pop    %r12				     - state: cfa=rsp+8 r12=<undef> stack_size=8 
 59c1d:  os_xsave+0x3d		       | | | jmpq   0x59c22 <__x86_return_thunk>	     - return
 	 			       | | \ <alternative.59c0e> X86_FEATURE_XSAVEOPT
				       | | / <alternative.59c0e> X86_FEATURE_XSAVEC
  1b2e:  .altinstr_replacement+0x1b2e  | | | xsavec64 0x40(%rbp)                                 
 59c13:  os_xsave+0x33		       | | | xor    %ebx,%ebx                                - already visited
 	 			       | | \ <alternative.59c0e> X86_FEATURE_XSAVEC
				       | | / <alternative.59c0e> X86_FEATURE_XSAVES
  1b33:  .altinstr_replacement+0x1b33  | | | xsaves64 0x40(%rbp)                                 
 59c13:  os_xsave+0x33		       | | | xor    %ebx,%ebx                                - already visited
 	 			       | | \ <alternative.59c0e> X86_FEATURE_XSAVES
				       | | / <alternative.59c0e> EXCEPTION for instruction at 0x59c0e <os_xsave+0x2e>
 59c15:  os_xsave+0x35		       | | | test   %ebx,%ebx                                - already visited
 	 			       | | \ <alternative.59c0e> EXCEPTION
				       | | / <alternative.59c0e> DEFAULT
 59c0e:  os_xsave+0x2e		       | | xsave64 0x40(%rbp)                                    
 59c13:  os_xsave+0x33		       | | xor    %ebx,%ebx                                  - already visited
 59bfa:  os_xsave+0x1a		       je     0x59c22 <os_xsave+0x42>                        - jump not taken
 59bfc:  os_xsave+0x1c		       xor    %edx,%edx                                      - already visited
os_xsave: validation end
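
The "state:" annotations in this trace follow a simple pattern for push and
pop: each push grows the stack by 8 bytes and records where the register was
saved relative to the CFA, each pop shrinks it and marks the register as
undefined. The sketch below reproduces only that bookkeeping. struct
cfi_sketch, trace_push() and trace_pop() are names invented for this example
and the real state tracking in objtool handles many more cases.

```c
#include <stdio.h>

/* Minimal stand-in for the tracked call frame state. */
struct cfi_sketch {
	int stack_size;		/* bytes between rsp and the CFA */
};

/* push <reg>: the stack grows by 8, the register is saved at the new top. */
static const char *trace_push(struct cfi_sketch *s, const char *reg,
			      char *buf, size_t len)
{
	s->stack_size += 8;
	snprintf(buf, len, "cfa=rsp+%d %s=(cfa-%d) stack_size=%d",
		 s->stack_size, reg, s->stack_size, s->stack_size);
	return buf;
}

/* pop <reg>: the stack shrinks by 8, the saved location is gone. */
static const char *trace_pop(struct cfi_sketch *s, const char *reg,
			     char *buf, size_t len)
{
	s->stack_size -= 8;
	snprintf(buf, len, "cfa=rsp+%d %s=<undef> stack_size=%d",
		 s->stack_size, reg, s->stack_size);
	return buf;
}
```

Starting from stack_size=8 (the return address), pushing r12, rbp and rbx and
then popping rbx gives the same state strings as the ones shown for os_xsave
at offsets 0x0, 0x8, 0xc and 0x48.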


Example 2 (--disas option): Single Instruction Alternatives
-----------------------------------------------------------

Compact Output (default):

Alternatives with single instructions are displayed each on one line,
with the instruction and a description of the alternative.

$ ./tools/objtool/objtool --disas=perf_get_x86_pmu_capability --link vmlinux.o
perf_get_x86_pmu_capability:
  deb0:  perf_get_x86_pmu_capability+0x0     endbr64                                                   
  deb4:  perf_get_x86_pmu_capability+0x4     callq  0xdeb9 <__fentry__>                                
  deb9:  perf_get_x86_pmu_capability+0x9     mov    %rdi,%rdx                                          
  debc:  perf_get_x86_pmu_capability+0xc     <alternative.debc>
  	 				     = jmpq   0xdec1 <.altinstr_aux+0xfc>                 (if DEFAULT)
					     = jmpq   0x622 <perf_get_x86_pmu_capability+0x37>    (if X86_FEATURE_ALWAYS)
					     = nop5                                               (if X86_FEATURE_HYBRID_CPU)
  dec1:  perf_get_x86_pmu_capability+0x11    ud2                                                       
  dec3:  perf_get_x86_pmu_capability+0x13    movq   $0x0,(%rdx)                                        
  deca:  perf_get_x86_pmu_capability+0x1a    movq   $0x0,0x8(%rdx)                                     
  ded2:  perf_get_x86_pmu_capability+0x22    movq   $0x0,0x10(%rdx)                                    
  deda:  perf_get_x86_pmu_capability+0x2a    movq   $0x0,0x18(%rdx)                                    
  dee2:  perf_get_x86_pmu_capability+0x32    jmpq   0xdee7 <__x86_return_thunk>                        
  dee7:  perf_get_x86_pmu_capability+0x37    cmpq   $0x0,0x0(%rip)        # 0xdeef <x86_pmu+0x10>      
  deef:  perf_get_x86_pmu_capability+0x3f    je     0xdec3 <perf_get_x86_pmu_capability+0x13>          
  def1:  perf_get_x86_pmu_capability+0x41    mov    0x0(%rip),%eax        # 0xdef7 <x86_pmu+0x8>       
  def7:  perf_get_x86_pmu_capability+0x47    mov    %eax,(%rdi)                                        
  def9:  perf_get_x86_pmu_capability+0x49    <jump_table.def9>
  	 				     = nop2                                              (if DEFAULT)
					     = jmp    defb <perf_get_x86_pmu_capability+0x4b>    (if JUMP)
  defb:  perf_get_x86_pmu_capability+0x4b    mov    0x0(%rip),%rdi        # 0xdf02 <x86_pmu+0xd8>      
  df02:  perf_get_x86_pmu_capability+0x52    <alternative.df02>
  	 				     = callq  0xdf07 <__sw_hweight64>    (if DEFAULT)
					     = popcnt %rdi,%rax                  (if X86_FEATURE_POPCNT)
  df07:  perf_get_x86_pmu_capability+0x57    mov    %eax,0x4(%rdx)                                     
  df0a:  perf_get_x86_pmu_capability+0x5a    <jump_table.df0a>
  	 				     = nop2                                              (if DEFAULT)
					     = jmp    df0c <perf_get_x86_pmu_capability+0x5c>    (if JUMP)
  df0c:  perf_get_x86_pmu_capability+0x5c    mov    0x0(%rip),%rdi        # 0xdf13 <x86_pmu+0xe0>      
  df13:  perf_get_x86_pmu_capability+0x63    <alternative.df13>
  	 				     = callq  0xdf18 <__sw_hweight64>    (if DEFAULT)
					     = popcnt %rdi,%rax                  (if X86_FEATURE_POPCNT)
  df18:  perf_get_x86_pmu_capability+0x68    mov    %eax,0x8(%rdx)                                     
  df1b:  perf_get_x86_pmu_capability+0x6b    mov    0x0(%rip),%eax        # 0xdf21 <x86_pmu+0xf8>      
  df21:  perf_get_x86_pmu_capability+0x71    mov    %eax,0xc(%rdx)                                     
  df24:  perf_get_x86_pmu_capability+0x74    mov    %eax,0x10(%rdx)                                    
  df27:  perf_get_x86_pmu_capability+0x77    mov    0x0(%rip),%rax        # 0xdf2e <x86_pmu+0x108>     
  df2e:  perf_get_x86_pmu_capability+0x7e    mov    %eax,0x14(%rdx)                                    
  df31:  perf_get_x86_pmu_capability+0x81    mov    0x0(%rip),%eax        # 0xdf37 <x86_pmu+0x110>     
  df37:  perf_get_x86_pmu_capability+0x87    mov    %eax,0x18(%rdx)                                    
  df3a:  perf_get_x86_pmu_capability+0x8a    movzbl 0x0(%rip),%ecx        # 0xdf41 <x86_pmu+0x1d1>     
  df41:  perf_get_x86_pmu_capability+0x91    movzbl 0x1c(%rdx),%eax                                    
  df45:  perf_get_x86_pmu_capability+0x95    shr    %cl                                                
  df47:  perf_get_x86_pmu_capability+0x97    and    $0x1,%ecx                                          
  df4a:  perf_get_x86_pmu_capability+0x9a    and    $0xfffffffe,%eax                                   
  df4d:  perf_get_x86_pmu_capability+0x9d    or     %ecx,%eax                                          
  df4f:  perf_get_x86_pmu_capability+0x9f    mov    %al,0x1c(%rdx)                                     
  df52:  perf_get_x86_pmu_capability+0xa2    jmpq   0xdf57 <__x86_return_thunk>                        


Wide Output (--wide option):

Alternatives with single instructions are displayed side-by-side,
with a header.

$ ./tools/objtool/objtool --disas=perf_get_x86_pmu_capability --wide --link vmlinux.o
perf_get_x86_pmu_capability:
  deb0:  perf_get_x86_pmu_capability+0x0       endbr64                                                   
  deb4:  perf_get_x86_pmu_capability+0x4       callq  0xdeb9 <__fentry__>                                
  deb9:  perf_get_x86_pmu_capability+0x9       mov    %rdi,%rdx                                          
  debc:  perf_get_x86_pmu_capability+0xc     | <alternative.debc>                 | X86_FEATURE_ALWAYS                              | X86_FEATURE_HYBRID_CPU 
  debc:  perf_get_x86_pmu_capability+0xc     | jmpq   0xdec1 <.altinstr_aux+0xfc> | jmpq   0x622 <perf_get_x86_pmu_capability+0x37> | nop5                   
  dec1:  perf_get_x86_pmu_capability+0x11      ud2                                                       
  dec3:  perf_get_x86_pmu_capability+0x13      movq   $0x0,(%rdx)                                        
  deca:  perf_get_x86_pmu_capability+0x1a      movq   $0x0,0x8(%rdx)                                     
  ded2:  perf_get_x86_pmu_capability+0x22      movq   $0x0,0x10(%rdx)                                    
  deda:  perf_get_x86_pmu_capability+0x2a      movq   $0x0,0x18(%rdx)                                    
  dee2:  perf_get_x86_pmu_capability+0x32      jmpq   0xdee7 <__x86_return_thunk>                        
  dee7:  perf_get_x86_pmu_capability+0x37      cmpq   $0x0,0x0(%rip)        # 0xdeef <x86_pmu+0x10>      
  deef:  perf_get_x86_pmu_capability+0x3f      je     0xdec3 <perf_get_x86_pmu_capability+0x13>          
  def1:  perf_get_x86_pmu_capability+0x41      mov    0x0(%rip),%eax        # 0xdef7 <x86_pmu+0x8>       
  def7:  perf_get_x86_pmu_capability+0x47      mov    %eax,(%rdi)                                        
  def9:  perf_get_x86_pmu_capability+0x49    | <jump_table.def9> | JUMP                                           
  def9:  perf_get_x86_pmu_capability+0x49    | nop2              | jmp    defb <perf_get_x86_pmu_capability+0x4b> 
  defb:  perf_get_x86_pmu_capability+0x4b      mov    0x0(%rip),%rdi        # 0xdf02 <x86_pmu+0xd8>      
  df02:  perf_get_x86_pmu_capability+0x52    | <alternative.df02>             | X86_FEATURE_POPCNT 
  df02:  perf_get_x86_pmu_capability+0x52    | callq  0xdf07 <__sw_hweight64> | popcnt %rdi,%rax   
  df07:  perf_get_x86_pmu_capability+0x57      mov    %eax,0x4(%rdx)                                     
  df0a:  perf_get_x86_pmu_capability+0x5a    | <jump_table.df0a> | JUMP                                           
  df0a:  perf_get_x86_pmu_capability+0x5a    | nop2              | jmp    df0c <perf_get_x86_pmu_capability+0x5c> 
  df0c:  perf_get_x86_pmu_capability+0x5c      mov    0x0(%rip),%rdi        # 0xdf13 <x86_pmu+0xe0>      
  df13:  perf_get_x86_pmu_capability+0x63    | <alternative.df13>             | X86_FEATURE_POPCNT 
  df13:  perf_get_x86_pmu_capability+0x63    | callq  0xdf18 <__sw_hweight64> | popcnt %rdi,%rax   
  df18:  perf_get_x86_pmu_capability+0x68      mov    %eax,0x8(%rdx)                                     
  df1b:  perf_get_x86_pmu_capability+0x6b      mov    0x0(%rip),%eax        # 0xdf21 <x86_pmu+0xf8>      
  df21:  perf_get_x86_pmu_capability+0x71      mov    %eax,0xc(%rdx)                                     
  df24:  perf_get_x86_pmu_capability+0x74      mov    %eax,0x10(%rdx)                                    
  df27:  perf_get_x86_pmu_capability+0x77      mov    0x0(%rip),%rax        # 0xdf2e <x86_pmu+0x108>     
  df2e:  perf_get_x86_pmu_capability+0x7e      mov    %eax,0x14(%rdx)                                    
  df31:  perf_get_x86_pmu_capability+0x81      mov    0x0(%rip),%eax        # 0xdf37 <x86_pmu+0x110>     
  df37:  perf_get_x86_pmu_capability+0x87      mov    %eax,0x18(%rdx)                                    
  df3a:  perf_get_x86_pmu_capability+0x8a      movzbl 0x0(%rip),%ecx        # 0xdf41 <x86_pmu+0x1d1>     
  df41:  perf_get_x86_pmu_capability+0x91      movzbl 0x1c(%rdx),%eax                                    
  df45:  perf_get_x86_pmu_capability+0x95      shr    %cl                                                
  df47:  perf_get_x86_pmu_capability+0x97      and    $0x1,%ecx                                          
  df4a:  perf_get_x86_pmu_capability+0x9a      and    $0xfffffffe,%eax                                   
  df4d:  perf_get_x86_pmu_capability+0x9d      or     %ecx,%eax                                          
  df4f:  perf_get_x86_pmu_capability+0x9f      mov    %al,0x1c(%rdx)                                     
  df52:  perf_get_x86_pmu_capability+0xa2      jmpq   0xdf57 <__x86_return_thunk>                        


Example 3 (--disas option): Alternatives with multiple instructions
-------------------------------------------------------------------

Compact Output (default):

Alternatives with multiple instructions are displayed one above the other,
with a header describing the alternative.

$ ./tools/objtool/objtool --disas=__switch_to_asm --link vmlinux.o
__switch_to_asm:
  82c0:  __switch_to_asm+0x0       push   %rbp                                               
  82c1:  __switch_to_asm+0x1	   push   %rbx                                               
  82c2:  __switch_to_asm+0x2	   push   %r12                                               
  82c4:  __switch_to_asm+0x4	   push   %r13                                               
  82c6:  __switch_to_asm+0x6	   push   %r14                                               
  82c8:  __switch_to_asm+0x8	   push   %r15                                               
  82ca:  __switch_to_asm+0xa	   mov    %rsp,0x1670(%rdi)                                  
  82d1:  __switch_to_asm+0x11	   mov    0x1670(%rsi),%rsp                                  
  82d8:  __switch_to_asm+0x18	   mov    0xad8(%rsi),%rbx                                   
  82df:  __switch_to_asm+0x1f	   mov    %rbx,%gs:0x0(%rip)        # 0x82e7 <__stack_chk_guard>
  82e7:  __switch_to_asm+0x27	   <alternative.82e7>
  	 			   = DEFAULT
  82e7:  __switch_to_asm+0x27	   | jmp    0x8312 <__switch_to_asm+0x52>
  82e9:  __switch_to_asm+0x29	   | nop*41
  	 			   |
				   = X86_FEATURE_RSB_CTXSW
  82e7:  __switch_to_asm+0x27	   | mov    $0x10,%r12
  82ee:  __switch_to_asm+0x2e	   | callq  0x82f4 <__switch_to_asm+0x34>
  82f3:  __switch_to_asm+0x33	   | int3   
  82f4:  __switch_to_asm+0x34	   | callq  0x82fa <__switch_to_asm+0x3a>
  82f9:  __switch_to_asm+0x39	   | int3   
  82fa:  __switch_to_asm+0x3a	   | add    $0x10,%rsp
  82fe:  __switch_to_asm+0x3e	   | dec    %r12
  8301:  __switch_to_asm+0x41	   | jne    0x82ee <__switch_to_asm+0x2e>
  8303:  __switch_to_asm+0x43	   | lfence 
  8306:  __switch_to_asm+0x46	   | movq   $0xffffffffffffffff,%gs:0x0(%rip)        # 0x20b <__x86_call_depth>
  	 			   |
				   = !X86_FEATURE_ALWAYS
  82e7:  __switch_to_asm+0x27	   | nop1
  82e8:  __switch_to_asm+0x28	   | nop1
  82e9:  __switch_to_asm+0x29	   | callq  0x82ef <__switch_to_asm+0x2f>
  82ee:  __switch_to_asm+0x2e	   | int3   
  82ef:  __switch_to_asm+0x2f	   | add    $0x8,%rsp
  82f3:  __switch_to_asm+0x33	   | lfence 
  	 			   |
  8312:  __switch_to_asm+0x52	   pop    %r15                                               
  8314:  __switch_to_asm+0x54	   pop    %r14                                               
  8316:  __switch_to_asm+0x56	   pop    %r13                                               
  8318:  __switch_to_asm+0x58	   pop    %r12                                               
  831a:  __switch_to_asm+0x5a	   pop    %rbx                                               
  831b:  __switch_to_asm+0x5b	   pop    %rbp                                               
  831c:  __switch_to_asm+0x5c	   jmpq   0x8321 <__switch_to>                               


Wide Output (--wide option):

Alternatives with multiple instructions are displayed side-by-side, with
a header describing the alternative. The code in the first column is the
default code of the alternative.

$ ./tools/objtool/objtool --disas=__switch_to_asm --wide  --link vmlinux.o
__switch_to_asm:
  82c0:  __switch_to_asm+0x0       push   %rbp                                               
  82c1:  __switch_to_asm+0x1	   push   %rbx                                               
  82c2:  __switch_to_asm+0x2	   push   %r12                                               
  82c4:  __switch_to_asm+0x4	   push   %r13                                               
  82c6:  __switch_to_asm+0x6	   push   %r14                                               
  82c8:  __switch_to_asm+0x8	   push   %r15                                               
  82ca:  __switch_to_asm+0xa	   mov    %rsp,0x1670(%rdi)                                  
  82d1:  __switch_to_asm+0x11	   mov    0x1670(%rsi),%rsp                                  
  82d8:  __switch_to_asm+0x18	   mov    0xad8(%rsi),%rbx                                   
  82df:  __switch_to_asm+0x1f	   mov    %rbx,%gs:0x0(%rip)        # 0x82e7 <__stack_chk_guard>
  82e7:  __switch_to_asm+0x27	 | <alternative.82e7>                   | X86_FEATURE_RSB_CTXSW                                               | !X86_FEATURE_ALWAYS
  82e7:  __switch_to_asm+0x27	 | jmp    0x8312 <__switch_to_asm+0x52> | mov    $0x10,%r12						      | nop1
  82e8:  __switch_to_asm+0x28	 |                                      | 	 							      | nop1
  82e9:  __switch_to_asm+0x29	 | nop*41                               |                                                                     | callq  0x82ef <__switch_to_asm+0x2f>
  82ee:  __switch_to_asm+0x2e	 |                                      | callq  0x82f4 <__switch_to_asm+0x34>                                | int3
  82ef:  __switch_to_asm+0x2f	 |                                      |                                                                     | add    $0x8,%rsp
  82f3:  __switch_to_asm+0x33	 |                                      | int3                                                                | lfence
  82f4:  __switch_to_asm+0x34	 |                                      | callq  0x82fa <__switch_to_asm+0x3a>                                |
  82f9:  __switch_to_asm+0x39	 |                                      | int3                                                                |
  82fa:  __switch_to_asm+0x3a	 |                                      | add    $0x10,%rsp                                                   |
  82fe:  __switch_to_asm+0x3e	 |                                      | dec    %r12                                                         |
  8301:  __switch_to_asm+0x41	 |                                      | jne    0x82ee <__switch_to_asm+0x2e>                                |
  8303:  __switch_to_asm+0x43	 |                                      | lfence                                                              |
  8306:  __switch_to_asm+0x46	 |                                      | movq   $0xffffffffffffffff,%gs:0x0(%rip) # 0x20b <__x86_call_depth> |
  8312:  __switch_to_asm+0x52	   pop    %r15                                               
  8314:  __switch_to_asm+0x54	   pop    %r14                                               
  8316:  __switch_to_asm+0x56	   pop    %r13                                               
  8318:  __switch_to_asm+0x58	   pop    %r12                                               
  831a:  __switch_to_asm+0x5a	   pop    %rbx                                               
  831b:  __switch_to_asm+0x5b	   pop    %rbp                                               
  831c:  __switch_to_asm+0x5c	   jmpq   0x8321 <__switch_to>
  
-----

Alexandre Chartre (30):
  objtool: Move disassembly functions to a separated file
  objtool: Create disassembly context
  objtool: Disassemble code with libopcodes instead of running objdump
  tool build: Remove annoying newline in build output
  objtool: Print symbol during disassembly
  objtool: Store instruction disassembly result
  objtool: Disassemble instruction on warning or backtrace
  objtool: Extract code to validate instruction from the validate branch
    loop
  objtool: Record symbol name max length
  objtool: Add option to trace function validation
  objtool: Trace instruction state changes during function validation
  objtool: Improve register reporting during function validation
  objtool: Identify the different types of alternatives
  objtool: Add functions to better name alternatives
  objtool: Improve tracing of alternative instructions
  objtool: Do not validate IBT for .return_sites and .call_sites
  objtool: Add the --disas=<function-pattern> action
  objtool: Preserve alternatives order
  objtool: Print headers for alternatives
  objtool: Disassemble group alternatives
  objtool: Print addresses with alternative instructions
  objtool: Disassemble exception table alternatives
  objtool: Disassemble jump table alternatives
  objtool: Fix address references in alternatives
  objtool: Provide access to feature and flags of group alternatives
  objtool: Function to get the name of a CPU feature
  objtool: Improve naming of group alternatives
  objtool: Compact output for alternatives with one instruction
  objtool: Add wide output for disassembly
  objtool: Trim trailing NOPs in alternative

 .../x86/tools/gen-cpu-feature-names-x86.awk   |   33 +
 tools/build/Makefile.feature                  |    4 +-
 tools/objtool/.gitignore                      |    3 +
 tools/objtool/Build                           |    3 +
 tools/objtool/Makefile                        |   26 +
 tools/objtool/arch/loongarch/decode.c         |   23 +
 tools/objtool/arch/loongarch/special.c        |    5 +
 tools/objtool/arch/powerpc/decode.c           |   24 +
 tools/objtool/arch/powerpc/special.c          |    5 +
 tools/objtool/arch/x86/Build                  |    8 +
 tools/objtool/arch/x86/decode.c               |   20 +
 tools/objtool/arch/x86/special.c              |   10 +
 tools/objtool/builtin-check.c                 |    4 +
 tools/objtool/check.c                         |  648 +++++----
 tools/objtool/disas.c                         | 1245 +++++++++++++++++
 tools/objtool/include/objtool/arch.h          |   11 +
 tools/objtool/include/objtool/builtin.h       |    3 +
 tools/objtool/include/objtool/check.h         |   35 +-
 tools/objtool/include/objtool/disas.h         |   81 ++
 tools/objtool/include/objtool/special.h       |    4 +-
 tools/objtool/include/objtool/trace.h         |  141 ++
 tools/objtool/include/objtool/warn.h          |   17 +-
 tools/objtool/special.c                       |    2 +
 tools/objtool/trace.c                         |  203 +++
 24 files changed, 2259 insertions(+), 299 deletions(-)
 create mode 100644 tools/arch/x86/tools/gen-cpu-feature-names-x86.awk
 create mode 100644 tools/objtool/disas.c
 create mode 100644 tools/objtool/include/objtool/disas.h
 create mode 100644 tools/objtool/include/objtool/trace.h
 create mode 100644 tools/objtool/trace.c

-- 
2.43.5
Re: [PATCH v6 00/30] objtool: Function validation tracing
Posted by Peter Zijlstra 1 week, 3 days ago
On Fri, Nov 21, 2025 at 10:53:10AM +0100, Alexandre Chartre wrote:
> Alexandre Chartre (30):
>   objtool: Move disassembly functions to a separated file
>   objtool: Create disassembly context
>   objtool: Disassemble code with libopcodes instead of running objdump
>   tool build: Remove annoying newline in build output
>   objtool: Print symbol during disassembly
>   objtool: Store instruction disassembly result
>   objtool: Disassemble instruction on warning or backtrace
>   objtool: Extract code to validate instruction from the validate branch
>     loop
>   objtool: Record symbol name max length
>   objtool: Add option to trace function validation
>   objtool: Trace instruction state changes during function validation
>   objtool: Improve register reporting during function validation
>   objtool: Identify the different types of alternatives
>   objtool: Add functions to better name alternatives
>   objtool: Improve tracing of alternative instructions
>   objtool: Do not validate IBT for .return_sites and .call_sites
>   objtool: Add the --disas=<function-pattern> action
>   objtool: Preserve alternatives order
>   objtool: Print headers for alternatives
>   objtool: Disassemble group alternatives
>   objtool: Print addresses with alternative instructions
>   objtool: Disassemble exception table alternatives
>   objtool: Disassemble jump table alternatives
>   objtool: Fix address references in alternatives
>   objtool: Provide access to feature and flags of group alternatives
>   objtool: Function to get the name of a CPU feature
>   objtool: Improve naming of group alternatives
>   objtool: Compact output for alternatives with one instruction
>   objtool: Add wide output for disassembly
>   objtool: Trim trailing NOPs in alternative

I've pushed out these patches to queue/objtool/core, however when
building defconfig I get this:

  CC      /mnt/hirez/usr/src/linux-2.6/defconfig-build/tools/objtool/librbtree.o
arch/x86/special.c:10:10: fatal error: lib/cpu-feature-names.c: No such file or directory
   10 | #include "lib/cpu-feature-names.c"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

:-(
Re: [PATCH v6 00/30] objtool: Function validation tracing
Posted by Alexandre Chartre 1 week, 3 days ago
On 11/21/25 11:36, Peter Zijlstra wrote:
> On Fri, Nov 21, 2025 at 10:53:10AM +0100, Alexandre Chartre wrote:
>> Alexandre Chartre (30):
>>    objtool: Move disassembly functions to a separated file
>>    objtool: Create disassembly context
>>    objtool: Disassemble code with libopcodes instead of running objdump
>>    tool build: Remove annoying newline in build output
>>    objtool: Print symbol during disassembly
>>    objtool: Store instruction disassembly result
>>    objtool: Disassemble instruction on warning or backtrace
>>    objtool: Extract code to validate instruction from the validate branch
>>      loop
>>    objtool: Record symbol name max length
>>    objtool: Add option to trace function validation
>>    objtool: Trace instruction state changes during function validation
>>    objtool: Improve register reporting during function validation
>>    objtool: Identify the different types of alternatives
>>    objtool: Add functions to better name alternatives
>>    objtool: Improve tracing of alternative instructions
>>    objtool: Do not validate IBT for .return_sites and .call_sites
>>    objtool: Add the --disas=<function-pattern> action
>>    objtool: Preserve alternatives order
>>    objtool: Print headers for alternatives
>>    objtool: Disassemble group alternatives
>>    objtool: Print addresses with alternative instructions
>>    objtool: Disassemble exception table alternatives
>>    objtool: Disassemble jump table alternatives
>>    objtool: Fix address references in alternatives
>>    objtool: Provide access to feature and flags of group alternatives
>>    objtool: Function to get the name of a CPU feature
>>    objtool: Improve naming of group alternatives
>>    objtool: Compact output for alternatives with one instruction
>>    objtool: Add wide output for disassembly
>>    objtool: Trim trailing NOPs in alternative
> 
> I've pushed out these patches to queue/objtool/core, however when
> building defconfig I get this:
> 
>    CC      /mnt/hirez/usr/src/linux-2.6/defconfig-build/tools/objtool/librbtree.o
> arch/x86/special.c:10:10: fatal error: lib/cpu-feature-names.c: No such file or directory
>     10 | #include "lib/cpu-feature-names.c"
>        |          ^~~~~~~~~~~~~~~~~~~~~~~~~
> compilation terminated.
>

I am having a look. The problem occurs when building with the O=<something> option.

Sorry,

alex.
Re: [PATCH v6 00/30] objtool: Function validation tracing
Posted by Alexandre Chartre 1 week, 3 days ago
On 11/21/25 14:16, Alexandre Chartre wrote:
> 
> On 11/21/25 11:36, Peter Zijlstra wrote:
>> On Fri, Nov 21, 2025 at 10:53:10AM +0100, Alexandre Chartre wrote:
>>> Alexandre Chartre (30):
>>>    objtool: Move disassembly functions to a separated file
>>>    objtool: Create disassembly context
>>>    objtool: Disassemble code with libopcodes instead of running objdump
>>>    tool build: Remove annoying newline in build output
>>>    objtool: Print symbol during disassembly
>>>    objtool: Store instruction disassembly result
>>>    objtool: Disassemble instruction on warning or backtrace
>>>    objtool: Extract code to validate instruction from the validate branch
>>>      loop
>>>    objtool: Record symbol name max length
>>>    objtool: Add option to trace function validation
>>>    objtool: Trace instruction state changes during function validation
>>>    objtool: Improve register reporting during function validation
>>>    objtool: Identify the different types of alternatives
>>>    objtool: Add functions to better name alternatives
>>>    objtool: Improve tracing of alternative instructions
>>>    objtool: Do not validate IBT for .return_sites and .call_sites
>>>    objtool: Add the --disas=<function-pattern> action
>>>    objtool: Preserve alternatives order
>>>    objtool: Print headers for alternatives
>>>    objtool: Disassemble group alternatives
>>>    objtool: Print addresses with alternative instructions
>>>    objtool: Disassemble exception table alternatives
>>>    objtool: Disassemble jump table alternatives
>>>    objtool: Fix address references in alternatives
>>>    objtool: Provide access to feature and flags of group alternatives
>>>    objtool: Function to get the name of a CPU feature
>>>    objtool: Improve naming of group alternatives
>>>    objtool: Compact output for alternatives with one instruction
>>>    objtool: Add wide output for disassembly
>>>    objtool: Trim trailing NOPs in alternative
>>
>> I've pushed out these patches to queue/objtool/core, however when
>> building defconfig I get this:
>>
>>    CC      /mnt/hirez/usr/src/linux-2.6/defconfig-build/tools/objtool/librbtree.o
>> arch/x86/special.c:10:10: fatal error: lib/cpu-feature-names.c: No such file or directory
>>     10 | #include "lib/cpu-feature-names.c"
>>        |          ^~~~~~~~~~~~~~~~~~~~~~~~~
>> compilation terminated.
>>

See fix below. This should be folded into patch 26 ("objtool: Function to get the name of a CPU feature").
I can resend this patch or the entire patchset if you want.

alex.

----

diff --git a/tools/objtool/arch/x86/Build b/tools/objtool/arch/x86/Build
index 1067355361b56..b95448ee01ee4 100644
--- a/tools/objtool/arch/x86/Build
+++ b/tools/objtool/arch/x86/Build
@@ -11,6 +11,8 @@ $(OUTPUT)arch/x86/lib/inat-tables.c: $(inat_tables_script) $(inat_tables_maps)
  
  $(OUTPUT)arch/x86/decode.o: $(OUTPUT)arch/x86/lib/inat-tables.c
  
+CFLAGS_decode.o += -I$(OUTPUT)arch/x86/lib
+
  cpu_features = ../arch/x86/include/asm/cpufeatures.h
  cpu_features_script = ../arch/x86/tools/gen-cpu-feature-names-x86.awk
  
@@ -19,4 +21,4 @@ $(OUTPUT)arch/x86/lib/cpu-feature-names.c: $(cpu_features_script) $(cpu_features
  
  $(OUTPUT)arch/x86/special.o: $(OUTPUT)arch/x86/lib/cpu-feature-names.c
  
-CFLAGS_decode.o += -I$(OUTPUT)arch/x86/lib
+CFLAGS_special.o := -I$(OUTPUT)arch/x86/lib
diff --git a/tools/objtool/arch/x86/special.c b/tools/objtool/arch/x86/special.c
index b6b40c56da896..e817a3fff4491 100644
--- a/tools/objtool/arch/x86/special.c
+++ b/tools/objtool/arch/x86/special.c
@@ -7,7 +7,7 @@
  #include <asm/cpufeatures.h>
  
  /* cpu feature name array generated from cpufeatures.h */
-#include "lib/cpu-feature-names.c"
+#include "cpu-feature-names.c"
  
  void arch_handle_alternative(struct special_alt *alt)
  {


Re: [PATCH v6 00/30] objtool: Function validation tracing
Posted by Peter Zijlstra 1 week, 3 days ago
On Fri, Nov 21, 2025 at 02:51:57PM +0100, Alexandre Chartre wrote:

> > > I've pushed out these patches to queue/objtool/core, however when
> > > building defconfig I get this:
> > > 
> > >    CC      /mnt/hirez/usr/src/linux-2.6/defconfig-build/tools/objtool/librbtree.o
> > > arch/x86/special.c:10:10: fatal error: lib/cpu-feature-names.c: No such file or directory
> > >     10 | #include "lib/cpu-feature-names.c"
> > >        |          ^~~~~~~~~~~~~~~~~~~~~~~~~
> > > compilation terminated.
> > > 
> 
> See fix below. This should be folded into patch 26 ("objtool: Function to get the name of a CPU feature").
> I can resend this patch or the entire patchset if you want.

Folded and pushed out again. Local build succeeds, let's see if the
robots like it.

Thanks!
Re: [PATCH v6 00/30] objtool: Function validation tracing
Posted by Borislav Petkov 1 week ago
On Fri, Nov 21, 2025 at 10:53:10AM +0100, Alexandre Chartre wrote:
> Notes:
> ======

So those examples below look very useful and cool. Can you pls document them
so that I can find the breadcrumbs next time and use them?

:-)

Thx.

> Disassembly can show default alternative jumping to .altinstr_aux
> -----------------------------------------------------------------
> Disassembly can show a default alternative jumping to .altinstr_aux. This
> happens when the _static_cpu_has() function is used. Its default code
> jumps to .altinstr_aux where a test sequence is executed (test; jnz; jmp).
> 
> At runtime, this sequence is not used because _static_cpu_has() has
> an alternative with the X86_FEATURE_ALWAYS feature.
> 
> 
>   debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> | NOP5  (X86_FEATURE_HYBRID_CPU) | jmpq   0x61a <perf_get_x86_pmu_capability+0x37>  (X86_FEATURE_ALWAYS)   # <alternative.debc>
>   dec1:  perf_get_x86_pmu_capability+0x11     ud2                                                       
> 
> 
> Disassembly can show alternative jumping to the next instruction
> ----------------------------------------------------------------
> 
> The disassembly can show jump tables with an alternative which jumps
> to the next instruction.
> 
> For example:
> 
> def9:  perf_get_x86_pmu_capability+0x49    NOP2 | jmp    defb <perf_get_x86_pmu_capability+0x4b>  (JUMP)   # <alternative.def9>
> defb:  perf_get_x86_pmu_capability+0x4b	   mov    0x0(%rip),%rdi        # 0xdf02 <x86_pmu+0xd8>      
> 
> This disassembly is correct. These instructions come from:
> 
>         cap->num_counters_gp = x86_pmu_num_counters(NULL);
> 
> which will end up executing this statement:
> 
>         if (static_branch_unlikely(&perf_is_hybrid) && NULL)
> 	        <do something>;
> 
> This statement is optimized to do nothing because the condition is
> always false. But static_branch_unlikely() is implemented with a jump
> table inside an "asm goto" statement, and "asm goto" statements are
> not optimized.
> 
> So basically the code is optimized like this:
> 
>         if (static_branch_unlikely(&perf_is_hybrid))
> 	        ;
> 
> And this translates to the assembly code above.
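
The effect described above can be reproduced outside the kernel. This is a
minimal sketch, assuming GCC or Clang with "asm goto" support and a target
whose assembler accepts "nop"; fake_static_branch(), model_num_counters(),
model_caller() and hybrid_body_runs are made-up names, not kernel APIs:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for static_branch_unlikely(): an "asm goto" that
 * emits a single nop and names a label it never jumps to. The real kernel
 * macro also records a jump table entry so the nop can later be patched
 * into a jmp; that part is left out of this sketch.
 */
static inline int fake_static_branch(void)
{
	__asm__ goto("nop" : : : : l_yes);
	return 0;
l_yes:
	return 1;
}

/* Counts how often the guarded body ran; it stays 0 in this sketch. */
int hybrid_body_runs;

/* Models the inlined callee: `pmu` is NULL at the only call site below. */
static inline void model_num_counters(const void *pmu)
{
	/* The right-hand side is constant false once NULL is propagated. */
	if (fake_static_branch() && pmu)
		hybrid_body_runs++;
}

void model_caller(void)
{
	model_num_counters(NULL);
	assert(hybrid_body_runs == 0);
}
```

With optimization enabled, I would expect the compiler to drop the guarded
body while still emitting the nop from the asm statement, which matches the
"jump to the next instruction" shape shown above. Inspecting the generated
assembly (for example with -O2 -S) is the way to confirm this for a given
compiler.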
> 
> 
> Examples:
> =========
> 
> Example 1 (--trace option): Trace the validation of the os_xsave() function
> ---------------------------------------------------------------------------
> 
> $ ./tools/objtool/objtool --hacks=jump_label --hacks=noinstr --hacks=skylake --ibt --orc --retpoline --rethunk --sls --static-call --uaccess --prefix=16 --link --trace os_xsave -v vmlinux.o
> os_xsave: validation begin
>  59be0:  os_xsave+0x0                  push   %r12                                          - state: cfa=rsp+16 r12=(cfa-16) stack_size=16 
>  59be2:  os_xsave+0x2		       mov    0x0(%rip),%eax        # 0x59be8 <alternatives_patched>
>  59be8:  os_xsave+0x8		       push   %rbp                                          - state: cfa=rsp+24 rbp=(cfa-24) stack_size=24 
>  59be9:  os_xsave+0x9		       mov    %rdi,%rbp                                          
>  59bec:  os_xsave+0xc		       push   %rbx					     - state: cfa=rsp+32 rbx=(cfa-32) stack_size=32 
>  59bed:  os_xsave+0xd		       mov    0x8(%rdi),%rbx                                     
>  59bf1:  os_xsave+0x11		       mov    %rbx,%r12                                          
>  59bf4:  os_xsave+0x14		       shr    $0x20,%r12                                         
>  59bf8:  os_xsave+0x18		       test   %eax,%eax                                          
>  59bfa:  os_xsave+0x1a		       je     0x59c22 <os_xsave+0x42>                        - jump taken
>  59c22:  os_xsave+0x42		       | ud2                                                     
>  59c24:  os_xsave+0x44		       | jmp    0x59bfc <os_xsave+0x1c>                      - unconditional jump
>  59bfc:  os_xsave+0x1c		       | | xor    %edx,%edx                                      
>  59bfe:  os_xsave+0x1e		       | | mov    %rbx,%rsi                                      
>  59c01:  os_xsave+0x21		       | | mov    %rbp,%rdi                                      
>  59c04:  os_xsave+0x24		       | | callq  0x59c09 <xfd_validate_state>               - call
>  59c09:  os_xsave+0x29		       | | mov    %ebx,%eax                                      
>  59c0b:  os_xsave+0x2b		       | | mov    %r12d,%edx                                     
>  	 			       | | / <alternative.59c0e> X86_FEATURE_XSAVEOPT
>   1b29:  .altinstr_replacement+0x1b29  | | | xsaveopt64 0x40(%rbp)                               
>  59c13:  os_xsave+0x33		       | | | xor    %ebx,%ebx                                    
>  59c15:  os_xsave+0x35		       | | | test   %ebx,%ebx                                    
>  59c17:  os_xsave+0x37		       | | | jne    0x59c26 <os_xsave+0x46>                  - jump taken
>  59c26:  os_xsave+0x46		       | | | | ud2                                               
>  59c28:  os_xsave+0x48		       | | | | pop    %rbx                                   - state: cfa=rsp+24 rbx=<undef> stack_size=24 
>  59c29:  os_xsave+0x49		       | | | | pop    %rbp				     - state: cfa=rsp+16 rbp=<undef> stack_size=16 
>  59c2a:  os_xsave+0x4a		       | | | | pop    %r12				     - state: cfa=rsp+8 r12=<undef> stack_size=8 
>  59c2c:  os_xsave+0x4c		       | | | | jmpq   0x59c31 <__x86_return_thunk>	     - return
>  59c17:  os_xsave+0x37		       | | | jne    0x59c26 <os_xsave+0x46>		     - jump not taken
>  59c19:  os_xsave+0x39		       | | | pop    %rbx    				     - state: cfa=rsp+24 rbx=<undef> stack_size=24 
>  59c1a:  os_xsave+0x3a		       | | | pop    %rbp				     - state: cfa=rsp+16 rbp=<undef> stack_size=16 
>  59c1b:  os_xsave+0x3b		       | | | pop    %r12				     - state: cfa=rsp+8 r12=<undef> stack_size=8 
>  59c1d:  os_xsave+0x3d		       | | | jmpq   0x59c22 <__x86_return_thunk>	     - return
>  	 			       | | \ <alternative.59c0e> X86_FEATURE_XSAVEOPT
> 				       | | / <alternative.59c0e> X86_FEATURE_XSAVEC
>   1b2e:  .altinstr_replacement+0x1b2e  | | | xsavec64 0x40(%rbp)                                 
>  59c13:  os_xsave+0x33		       | | | xor    %ebx,%ebx                                - already visited
>  	 			       | | \ <alternative.59c0e> X86_FEATURE_XSAVEC
> 				       | | / <alternative.59c0e> X86_FEATURE_XSAVES
>   1b33:  .altinstr_replacement+0x1b33  | | | xsaves64 0x40(%rbp)                                 
>  59c13:  os_xsave+0x33		       | | | xor    %ebx,%ebx                                - already visited
>  	 			       | | \ <alternative.59c0e> X86_FEATURE_XSAVES
> 				       | | / <alternative.59c0e> EXCEPTION for instruction at 0x59c0e <os_xsave+0x2e>
>  59c15:  os_xsave+0x35		       | | | test   %ebx,%ebx                                - already visited
>  	 			       | | \ <alternative.59c0e> EXCEPTION
> 				       | | / <alternative.59c0e> DEFAULT
>  59c0e:  os_xsave+0x2e		       | | xsave64 0x40(%rbp)                                    
>  59c13:  os_xsave+0x33		       | | xor    %ebx,%ebx                                  - already visited
>  59bfa:  os_xsave+0x1a		       je     0x59c22 <os_xsave+0x42>                        - jump not taken
>  59bfc:  os_xsave+0x1c		       xor    %edx,%edx                                      - already visited
> os_xsave: validation end
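
To read the "state:" column in the trace above, it helps to see that each
push or pop moves the CFA offset and the stack size by 8 bytes. The sketch
below is a deliberately reduced model under that assumption; struct
cfa_state, cfa_init() and cfa_apply() are hypothetical names, and objtool's
real state tracking covers far more (register save slots, frame pointers,
alternatives):

```c
#include <assert.h>

/* Only the two operations that change the state in the trace above. */
enum stack_op { OP_PUSH, OP_POP };

/* Hypothetical, simplified state with the CFA expressed as rsp + offset. */
struct cfa_state {
	int cfa_offset;	/* printed as "cfa=rsp+N" */
	int stack_size;	/* printed as "stack_size=N" */
};

/* On entry only the return address is on the stack: cfa=rsp+8. */
void cfa_init(struct cfa_state *s)
{
	s->cfa_offset = 8;
	s->stack_size = 8;
}

/* A push grows the frame by 8 bytes, a pop shrinks it by 8 bytes. */
void cfa_apply(struct cfa_state *s, enum stack_op op)
{
	int delta = (op == OP_PUSH) ? 8 : -8;

	s->cfa_offset += delta;
	s->stack_size += delta;

	/* Popping below the return address would be a broken frame. */
	assert(s->stack_size >= 8);
}
```

Applying three pushes and then three pops reproduces the sequence in the
trace: 16, 24, 32 after the pushes of %r12, %rbp and %rbx, then 24, 16, 8
after the pops that precede the return.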
> 
> 
> Example 2 (--disas option): Single Instruction Alternatives
> -----------------------------------------------------------
> 
> Compact Output (default):
> 
> Alternatives with single instructions are displayed each on one line,
> with the instruction and a description of the alternative.
> 
> $ ./tools/objtool/objtool --disas=perf_get_x86_pmu_capability --link vmlinux.o
> perf_get_x86_pmu_capability:
>   deb0:  perf_get_x86_pmu_capability+0x0     endbr64                                                   
>   deb4:  perf_get_x86_pmu_capability+0x4     callq  0xdeb9 <__fentry__>                                
>   deb9:  perf_get_x86_pmu_capability+0x9     mov    %rdi,%rdx                                          
>   debc:  perf_get_x86_pmu_capability+0xc     <alternative.debc>
>   	 				     = jmpq   0xdec1 <.altinstr_aux+0xfc>                 (if DEFAULT)
> 					     = jmpq   0x622 <perf_get_x86_pmu_capability+0x37>    (if X86_FEATURE_ALWAYS)
> 					     = nop5                                               (if X86_FEATURE_HYBRID_CPU)
>   dec1:  perf_get_x86_pmu_capability+0x11    ud2                                                       
>   dec3:  perf_get_x86_pmu_capability+0x13    movq   $0x0,(%rdx)                                        
>   deca:  perf_get_x86_pmu_capability+0x1a    movq   $0x0,0x8(%rdx)                                     
>   ded2:  perf_get_x86_pmu_capability+0x22    movq   $0x0,0x10(%rdx)                                    
>   deda:  perf_get_x86_pmu_capability+0x2a    movq   $0x0,0x18(%rdx)                                    
>   dee2:  perf_get_x86_pmu_capability+0x32    jmpq   0xdee7 <__x86_return_thunk>                        
>   dee7:  perf_get_x86_pmu_capability+0x37    cmpq   $0x0,0x0(%rip)        # 0xdeef <x86_pmu+0x10>      
>   deef:  perf_get_x86_pmu_capability+0x3f    je     0xdec3 <perf_get_x86_pmu_capability+0x13>          
>   def1:  perf_get_x86_pmu_capability+0x41    mov    0x0(%rip),%eax        # 0xdef7 <x86_pmu+0x8>       
>   def7:  perf_get_x86_pmu_capability+0x47    mov    %eax,(%rdi)                                        
>   def9:  perf_get_x86_pmu_capability+0x49    <jump_table.def9>
>   	 				     = nop2                                              (if DEFAULT)
> 					     = jmp    defb <perf_get_x86_pmu_capability+0x4b>    (if JUMP)
>   defb:  perf_get_x86_pmu_capability+0x4b    mov    0x0(%rip),%rdi        # 0xdf02 <x86_pmu+0xd8>      
>   df02:  perf_get_x86_pmu_capability+0x52    <alternative.df02>
>   	 				     = callq  0xdf07 <__sw_hweight64>    (if DEFAULT)
> 					     = popcnt %rdi,%rax                  (if X86_FEATURE_POPCNT)
>   df07:  perf_get_x86_pmu_capability+0x57    mov    %eax,0x4(%rdx)                                     
>   df0a:  perf_get_x86_pmu_capability+0x5a    <jump_table.df0a>
>   	 				     = nop2                                              (if DEFAULT)
> 					     = jmp    df0c <perf_get_x86_pmu_capability+0x5c>    (if JUMP)
>   df0c:  perf_get_x86_pmu_capability+0x5c    mov    0x0(%rip),%rdi        # 0xdf13 <x86_pmu+0xe0>      
>   df13:  perf_get_x86_pmu_capability+0x63    <alternative.df13>
>   	 				     = callq  0xdf18 <__sw_hweight64>    (if DEFAULT)
> 					     = popcnt %rdi,%rax                  (if X86_FEATURE_POPCNT)
>   df18:  perf_get_x86_pmu_capability+0x68    mov    %eax,0x8(%rdx)                                     
>   df1b:  perf_get_x86_pmu_capability+0x6b    mov    0x0(%rip),%eax        # 0xdf21 <x86_pmu+0xf8>      
>   df21:  perf_get_x86_pmu_capability+0x71    mov    %eax,0xc(%rdx)                                     
>   df24:  perf_get_x86_pmu_capability+0x74    mov    %eax,0x10(%rdx)                                    
>   df27:  perf_get_x86_pmu_capability+0x77    mov    0x0(%rip),%rax        # 0xdf2e <x86_pmu+0x108>     
>   df2e:  perf_get_x86_pmu_capability+0x7e    mov    %eax,0x14(%rdx)                                    
>   df31:  perf_get_x86_pmu_capability+0x81    mov    0x0(%rip),%eax        # 0xdf37 <x86_pmu+0x110>     
>   df37:  perf_get_x86_pmu_capability+0x87    mov    %eax,0x18(%rdx)                                    
>   df3a:  perf_get_x86_pmu_capability+0x8a    movzbl 0x0(%rip),%ecx        # 0xdf41 <x86_pmu+0x1d1>     
>   df41:  perf_get_x86_pmu_capability+0x91    movzbl 0x1c(%rdx),%eax                                    
>   df45:  perf_get_x86_pmu_capability+0x95    shr    %cl                                                
>   df47:  perf_get_x86_pmu_capability+0x97    and    $0x1,%ecx                                          
>   df4a:  perf_get_x86_pmu_capability+0x9a    and    $0xfffffffe,%eax                                   
>   df4d:  perf_get_x86_pmu_capability+0x9d    or     %ecx,%eax                                          
>   df4f:  perf_get_x86_pmu_capability+0x9f    mov    %al,0x1c(%rdx)                                     
>   df52:  perf_get_x86_pmu_capability+0xa2    jmpq   0xdf57 <__x86_return_thunk>                        
> 
> 
> Wide Output (--wide option):
> 
> Alternatives with single instructions are displayed side-by-side,
> with a header.
> 
> $ ./tools/objtool/objtool --disas=perf_get_x86_pmu_capability --wide --link vmlinux.o
> perf_get_x86_pmu_capability:
>   deb0:  perf_get_x86_pmu_capability+0x0       endbr64                                                   
>   deb4:  perf_get_x86_pmu_capability+0x4       callq  0xdeb9 <__fentry__>                                
>   deb9:  perf_get_x86_pmu_capability+0x9       mov    %rdi,%rdx                                          
>   debc:  perf_get_x86_pmu_capability+0xc     | <alternative.debc>                 | X86_FEATURE_ALWAYS                              | X86_FEATURE_HYBRID_CPU 
>   debc:  perf_get_x86_pmu_capability+0xc     | jmpq   0xdec1 <.altinstr_aux+0xfc> | jmpq   0x622 <perf_get_x86_pmu_capability+0x37> | nop5                   
>   dec1:  perf_get_x86_pmu_capability+0x11      ud2                                                       
>   dec3:  perf_get_x86_pmu_capability+0x13      movq   $0x0,(%rdx)                                        
>   deca:  perf_get_x86_pmu_capability+0x1a      movq   $0x0,0x8(%rdx)                                     
>   ded2:  perf_get_x86_pmu_capability+0x22      movq   $0x0,0x10(%rdx)                                    
>   deda:  perf_get_x86_pmu_capability+0x2a      movq   $0x0,0x18(%rdx)                                    
>   dee2:  perf_get_x86_pmu_capability+0x32      jmpq   0xdee7 <__x86_return_thunk>                        
>   dee7:  perf_get_x86_pmu_capability+0x37      cmpq   $0x0,0x0(%rip)        # 0xdeef <x86_pmu+0x10>      
>   deef:  perf_get_x86_pmu_capability+0x3f      je     0xdec3 <perf_get_x86_pmu_capability+0x13>          
>   def1:  perf_get_x86_pmu_capability+0x41      mov    0x0(%rip),%eax        # 0xdef7 <x86_pmu+0x8>       
>   def7:  perf_get_x86_pmu_capability+0x47      mov    %eax,(%rdi)                                        
>   def9:  perf_get_x86_pmu_capability+0x49    | <jump_table.def9> | JUMP                                           
>   def9:  perf_get_x86_pmu_capability+0x49    | nop2              | jmp    defb <perf_get_x86_pmu_capability+0x4b> 
>   defb:  perf_get_x86_pmu_capability+0x4b      mov    0x0(%rip),%rdi        # 0xdf02 <x86_pmu+0xd8>      
>   df02:  perf_get_x86_pmu_capability+0x52    | <alternative.df02>             | X86_FEATURE_POPCNT 
>   df02:  perf_get_x86_pmu_capability+0x52    | callq  0xdf07 <__sw_hweight64> | popcnt %rdi,%rax   
>   df07:  perf_get_x86_pmu_capability+0x57      mov    %eax,0x4(%rdx)                                     
>   df0a:  perf_get_x86_pmu_capability+0x5a    | <jump_table.df0a> | JUMP                                           
>   df0a:  perf_get_x86_pmu_capability+0x5a    | nop2              | jmp    df0c <perf_get_x86_pmu_capability+0x5c> 
>   df0c:  perf_get_x86_pmu_capability+0x5c      mov    0x0(%rip),%rdi        # 0xdf13 <x86_pmu+0xe0>      
>   df13:  perf_get_x86_pmu_capability+0x63    | <alternative.df13>             | X86_FEATURE_POPCNT 
>   df13:  perf_get_x86_pmu_capability+0x63    | callq  0xdf18 <__sw_hweight64> | popcnt %rdi,%rax   
>   df18:  perf_get_x86_pmu_capability+0x68      mov    %eax,0x8(%rdx)                                     
>   df1b:  perf_get_x86_pmu_capability+0x6b      mov    0x0(%rip),%eax        # 0xdf21 <x86_pmu+0xf8>      
>   df21:  perf_get_x86_pmu_capability+0x71      mov    %eax,0xc(%rdx)                                     
>   df24:  perf_get_x86_pmu_capability+0x74      mov    %eax,0x10(%rdx)                                    
>   df27:  perf_get_x86_pmu_capability+0x77      mov    0x0(%rip),%rax        # 0xdf2e <x86_pmu+0x108>     
>   df2e:  perf_get_x86_pmu_capability+0x7e      mov    %eax,0x14(%rdx)                                    
>   df31:  perf_get_x86_pmu_capability+0x81      mov    0x0(%rip),%eax        # 0xdf37 <x86_pmu+0x110>     
>   df37:  perf_get_x86_pmu_capability+0x87      mov    %eax,0x18(%rdx)                                    
>   df3a:  perf_get_x86_pmu_capability+0x8a      movzbl 0x0(%rip),%ecx        # 0xdf41 <x86_pmu+0x1d1>     
>   df41:  perf_get_x86_pmu_capability+0x91      movzbl 0x1c(%rdx),%eax                                    
>   df45:  perf_get_x86_pmu_capability+0x95      shr    %cl                                                
>   df47:  perf_get_x86_pmu_capability+0x97      and    $0x1,%ecx                                          
>   df4a:  perf_get_x86_pmu_capability+0x9a      and    $0xfffffffe,%eax                                   
>   df4d:  perf_get_x86_pmu_capability+0x9d      or     %ecx,%eax                                          
>   df4f:  perf_get_x86_pmu_capability+0x9f      mov    %al,0x1c(%rdx)                                     
>   df52:  perf_get_x86_pmu_capability+0xa2      jmpq   0xdf57 <__x86_return_thunk>                        
> 
> 
> Example 3 (--disas option): Alternatives with multiple instructions
> -------------------------------------------------------------------
> 
> Compact Output (default):
> 
> Alternatives with multiple instructions are displayed one above the other,
> with a header describing the alternative.
> 
> $ ./tools/objtool/objtool --disas=__switch_to_asm --link vmlinux.o
> __switch_to_asm:
>   82c0:  __switch_to_asm+0x0       push   %rbp                                               
>   82c1:  __switch_to_asm+0x1	   push   %rbx                                               
>   82c2:  __switch_to_asm+0x2	   push   %r12                                               
>   82c4:  __switch_to_asm+0x4	   push   %r13                                               
>   82c6:  __switch_to_asm+0x6	   push   %r14                                               
>   82c8:  __switch_to_asm+0x8	   push   %r15                                               
>   82ca:  __switch_to_asm+0xa	   mov    %rsp,0x1670(%rdi)                                  
>   82d1:  __switch_to_asm+0x11	   mov    0x1670(%rsi),%rsp                                  
>   82d8:  __switch_to_asm+0x18	   mov    0xad8(%rsi),%rbx                                   
>   82df:  __switch_to_asm+0x1f	   mov    %rbx,%gs:0x0(%rip)        # 0x82e7 <__stack_chk_guard>
>   82e7:  __switch_to_asm+0x27	   <alternative.82e7>
>   	 			   = DEFAULT
>   82e7:  __switch_to_asm+0x27	   | jmp    0x8312 <__switch_to_asm+0x52>
>   82e9:  __switch_to_asm+0x29	   | nop*41
>   	 			   |
> 				   = X86_FEATURE_RSB_CTXSW
>   82e7:  __switch_to_asm+0x27	   | mov    $0x10,%r12
>   82ee:  __switch_to_asm+0x2e	   | callq  0x82f4 <__switch_to_asm+0x34>
>   82f3:  __switch_to_asm+0x33	   | int3   
>   82f4:  __switch_to_asm+0x34	   | callq  0x82fa <__switch_to_asm+0x3a>
>   82f9:  __switch_to_asm+0x39	   | int3   
>   82fa:  __switch_to_asm+0x3a	   | add    $0x10,%rsp
>   82fe:  __switch_to_asm+0x3e	   | dec    %r12
>   8301:  __switch_to_asm+0x41	   | jne    0x82ee <__switch_to_asm+0x2e>
>   8303:  __switch_to_asm+0x43	   | lfence 
>   8306:  __switch_to_asm+0x46	   | movq   $0xffffffffffffffff,%gs:0x0(%rip)        # 0x20b <__x86_call_depth>
>   	 			   |
> 				   = !X86_FEATURE_ALWAYS
>   82e7:  __switch_to_asm+0x27	   | nop1
>   82e8:  __switch_to_asm+0x28	   | nop1
>   82e9:  __switch_to_asm+0x29	   | callq  0x82ef <__switch_to_asm+0x2f>
>   82ee:  __switch_to_asm+0x2e	   | int3   
>   82ef:  __switch_to_asm+0x2f	   | add    $0x8,%rsp
>   82f3:  __switch_to_asm+0x33	   | lfence 
>   	 			   |
>   8312:  __switch_to_asm+0x52	   pop    %r15                                               
>   8314:  __switch_to_asm+0x54	   pop    %r14                                               
>   8316:  __switch_to_asm+0x56	   pop    %r13                                               
>   8318:  __switch_to_asm+0x58	   pop    %r12                                               
>   831a:  __switch_to_asm+0x5a	   pop    %rbx                                               
>   831b:  __switch_to_asm+0x5b	   pop    %rbp                                               
>   831c:  __switch_to_asm+0x5c	   jmpq   0x8321 <__switch_to>                               
> 
> 
> Wide Output (--wide option):
> 
> Alternatives with multiple instructions are displayed side-by-side, with
> a header describing the alternative. The code in the first column is the
> default code of the alternative.
> 
> $ ./tools/objtool/objtool --disas=__switch_to_asm --wide  --link vmlinux.o
> __switch_to_asm:
>   82c0:  __switch_to_asm+0x0       push   %rbp                                               
>   82c1:  __switch_to_asm+0x1	   push   %rbx                                               
>   82c2:  __switch_to_asm+0x2	   push   %r12                                               
>   82c4:  __switch_to_asm+0x4	   push   %r13                                               
>   82c6:  __switch_to_asm+0x6	   push   %r14                                               
>   82c8:  __switch_to_asm+0x8	   push   %r15                                               
>   82ca:  __switch_to_asm+0xa	   mov    %rsp,0x1670(%rdi)                                  
>   82d1:  __switch_to_asm+0x11	   mov    0x1670(%rsi),%rsp                                  
>   82d8:  __switch_to_asm+0x18	   mov    0xad8(%rsi),%rbx                                   
>   82df:  __switch_to_asm+0x1f	   mov    %rbx,%gs:0x0(%rip)        # 0x82e7 <__stack_chk_guard>
>   82e7:  __switch_to_asm+0x27	 | <alternative.82e7>                   | X86_FEATURE_RSB_CTXSW                                               | !X86_FEATURE_ALWAYS
>   82e7:  __switch_to_asm+0x27	 | jmp    0x8312 <__switch_to_asm+0x52> | mov    $0x10,%r12						      | nop1
>   82e8:  __switch_to_asm+0x28	 |                                      | 	 							      | nop1
>   82e9:  __switch_to_asm+0x29	 | nop*41                               |                                                                     | callq  0x82ef <__switch_to_asm+0x2f>
>   82ee:  __switch_to_asm+0x2e	 |                                      | callq  0x82f4 <__switch_to_asm+0x34>                                | int3
>   82ef:  __switch_to_asm+0x2f	 |                                      |                                                                     | add    $0x8,%rsp
>   82f3:  __switch_to_asm+0x33	 |                                      | int3                                                                | lfence
>   82f4:  __switch_to_asm+0x34	 |                                      | callq  0x82fa <__switch_to_asm+0x3a>                                |
>   82f9:  __switch_to_asm+0x39	 |                                      | int3                                                                |
>   82fa:  __switch_to_asm+0x3a	 |                                      | add    $0x10,%rsp                                                   |
>   82fe:  __switch_to_asm+0x3e	 |                                      | dec    %r12                                                         |
>   8301:  __switch_to_asm+0x41	 |                                      | jne    0x82ee <__switch_to_asm+0x2e>                                |
>   8303:  __switch_to_asm+0x43	 |                                      | lfence                                                              |
>   8306:  __switch_to_asm+0x46	 |                                      | movq   $0xffffffffffffffff,%gs:0x0(%rip) # 0x20b <__x86_call_depth> |
>   8312:  __switch_to_asm+0x52	   pop    %r15                                               
>   8314:  __switch_to_asm+0x54	   pop    %r14                                               
>   8316:  __switch_to_asm+0x56	   pop    %r13                                               
>   8318:  __switch_to_asm+0x58	   pop    %r12                                               
>   831a:  __switch_to_asm+0x5a	   pop    %rbx                                               
>   831b:  __switch_to_asm+0x5b	   pop    %rbp                                               
>   831c:  __switch_to_asm+0x5c	   jmpq   0x8321 <__switch_to>
>   
> -----
> 
> Alexandre Chartre (30):
>   objtool: Move disassembly functions to a separated file
>   objtool: Create disassembly context
>   objtool: Disassemble code with libopcodes instead of running objdump
>   tool build: Remove annoying newline in build output
>   objtool: Print symbol during disassembly
>   objtool: Store instruction disassembly result
>   objtool: Disassemble instruction on warning or backtrace
>   objtool: Extract code to validate instruction from the validate branch
>     loop
>   objtool: Record symbol name max length
>   objtool: Add option to trace function validation
>   objtool: Trace instruction state changes during function validation
>   objtool: Improve register reporting during function validation
>   objtool: Identify the different types of alternatives
>   objtool: Add functions to better name alternatives
>   objtool: Improve tracing of alternative instructions
>   objtool: Do not validate IBT for .return_sites and .call_sites
>   objtool: Add the --disas=<function-pattern> action
>   objtool: Preserve alternatives order
>   objtool: Print headers for alternatives
>   objtool: Disassemble group alternatives
>   objtool: Print addresses with alternative instructions
>   objtool: Disassemble exception table alternatives
>   objtool: Disassemble jump table alternatives
>   objtool: Fix address references in alternatives
>   objtool: Provide access to feature and flags of group alternatives
>   objtool: Function to get the name of a CPU feature
>   objtool: Improve naming of group alternatives
>   objtool: Compact output for alternatives with one instruction
>   objtool: Add wide output for disassembly
>   objtool: Trim trailing NOPs in alternative
> 
>  .../x86/tools/gen-cpu-feature-names-x86.awk   |   33 +
>  tools/build/Makefile.feature                  |    4 +-
>  tools/objtool/.gitignore                      |    3 +
>  tools/objtool/Build                           |    3 +
>  tools/objtool/Makefile                        |   26 +
>  tools/objtool/arch/loongarch/decode.c         |   23 +
>  tools/objtool/arch/loongarch/special.c        |    5 +
>  tools/objtool/arch/powerpc/decode.c           |   24 +
>  tools/objtool/arch/powerpc/special.c          |    5 +
>  tools/objtool/arch/x86/Build                  |    8 +
>  tools/objtool/arch/x86/decode.c               |   20 +
>  tools/objtool/arch/x86/special.c              |   10 +
>  tools/objtool/builtin-check.c                 |    4 +
>  tools/objtool/check.c                         |  648 +++++----
>  tools/objtool/disas.c                         | 1245 +++++++++++++++++
>  tools/objtool/include/objtool/arch.h          |   11 +
>  tools/objtool/include/objtool/builtin.h       |    3 +
>  tools/objtool/include/objtool/check.h         |   35 +-
>  tools/objtool/include/objtool/disas.h         |   81 ++
>  tools/objtool/include/objtool/special.h       |    4 +-
>  tools/objtool/include/objtool/trace.h         |  141 ++
>  tools/objtool/include/objtool/warn.h          |   17 +-
>  tools/objtool/special.c                       |    2 +
>  tools/objtool/trace.c                         |  203 +++
>  24 files changed, 2259 insertions(+), 299 deletions(-)
>  create mode 100644 tools/arch/x86/tools/gen-cpu-feature-names-x86.awk
>  create mode 100644 tools/objtool/disas.c
>  create mode 100644 tools/objtool/include/objtool/disas.h
>  create mode 100644 tools/objtool/include/objtool/trace.h
>  create mode 100644 tools/objtool/trace.c
> 
> -- 
> 2.43.5
> 
> 
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v6 00/30] objtool: Function validation tracing
Posted by David Laight 1 week ago
On Mon, 24 Nov 2025 12:04:25 +0100
Borislav Petkov <bp@alien8.de> wrote:

> On Fri, Nov 21, 2025 at 10:53:10AM +0100, Alexandre Chartre wrote:
> > Notes:
> > ======  
> 
> So those examples below look very useful and cool. Can you pls document them
> so that I can find the breadcrumbs next time and use them?
> 

Yes, it looks like the correct objtool 'spell' will be more useful
than 'objdump -dr[w]' for looking at object files.

	David
Re: [PATCH v6 00/30] objtool: Function validation tracing
Posted by Alexandre Chartre 1 week ago
On 11/24/25 12:04, Borislav Petkov wrote:
> On Fri, Nov 21, 2025 at 10:53:10AM +0100, Alexandre Chartre wrote:
>> Notes:
>> ======
> 
> So those examples below look very useful and cool. Can you pls document them
> so that I can find the breadcrumbs next time and use them?
> 
> :-)
> 
> Thx.

Sure. I will add a doc in tools/objtool/Documentation/ in followup patches.

Rgds,

alex.

> 
>> Disassembly can show default alternative jumping to .altinstr_aux
>> -----------------------------------------------------------------
>> Disassembly can show a default alternative jumping to .altinstr_aux. This
>> happens when the _static_cpu_has() function is used. Its default code
>> jumps to .altinstr_aux where a test sequence is executed (test; jnz; jmp).
>>
>> At runtime, this sequence is not used because _static_cpu_has() also
>> has an alternative with the X86_FEATURE_ALWAYS feature.
>>
>>
>>    debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> | NOP5  (X86_FEATURE_HYBRID_CPU) | jmpq   0x61a <perf_get_x86_pmu_capability+0x37>  (X86_FEATURE_ALWAYS)   # <alternative.debc>
>>    dec1:  perf_get_x86_pmu_capability+0x11     ud2
>>
>>
>> Disassembly can show alternative jumping to the next instruction
>> ----------------------------------------------------------------
>>
>> The disassembly can show jump tables with an alternative which jumps
>> to the next instruction.
>>
>> For example:
>>
>> def9:  perf_get_x86_pmu_capability+0x49    NOP2 | jmp    defb <perf_get_x86_pmu_capability+0x4b>  (JUMP)   # <alternative.def9>
>> defb:  perf_get_x86_pmu_capability+0x4b	   mov    0x0(%rip),%rdi        # 0xdf02 <x86_pmu+0xd8>
>>
>> This disassembly is correct. These instructions come from:
>>
>>          cap->num_counters_gp = x86_pmu_num_counters(NULL);
>>
>> which will end up executing this statement:
>>
>>          if (static_branch_unlikely(&perf_is_hybrid) && NULL)
>> 	        <do something>;
>>
>> This statement is optimized to do nothing because the condition is
>> always false. But static_branch_unlikely() is implemented with a jump
>> table inside an "asm goto" statement, and "asm goto" statements are
>> not optimized.
>>
>> So basically the code is optimized like this:
>>
>>          if (static_branch_unlikely(&perf_is_hybrid))
>> 	        ;
>>
>> And this translates to the assembly code above.
>>
>>
>> Examples:
>> =========
>>
>> Example 1 (--trace option): Trace the validation of the os_xsave() function
>> ---------------------------------------------------------------------------
>>
>> $ ./tools/objtool/objtool --hacks=jump_label --hacks=noinstr --hacks=skylake --ibt --orc --retpoline --rethunk --sls --static-call --uaccess --prefix=16 --link --trace os_xsave -v vmlinux.o
>> os_xsave: validation begin
>>   59be0:  os_xsave+0x0                  push   %r12                                          - state: cfa=rsp+16 r12=(cfa-16) stack_size=16
>>   59be2:  os_xsave+0x2		       mov    0x0(%rip),%eax        # 0x59be8 <alternatives_patched>
>>   59be8:  os_xsave+0x8		       push   %rbp                                          - state: cfa=rsp+24 rbp=(cfa-24) stack_size=24
>>   59be9:  os_xsave+0x9		       mov    %rdi,%rbp
>>   59bec:  os_xsave+0xc		       push   %rbx					     - state: cfa=rsp+32 rbx=(cfa-32) stack_size=32
>>   59bed:  os_xsave+0xd		       mov    0x8(%rdi),%rbx
>>   59bf1:  os_xsave+0x11		       mov    %rbx,%r12
>>   59bf4:  os_xsave+0x14		       shr    $0x20,%r12
>>   59bf8:  os_xsave+0x18		       test   %eax,%eax
>>   59bfa:  os_xsave+0x1a		       je     0x59c22 <os_xsave+0x42>                        - jump taken
>>   59c22:  os_xsave+0x42		       | ud2
>>   59c24:  os_xsave+0x44		       | jmp    0x59bfc <os_xsave+0x1c>                      - unconditional jump
>>   59bfc:  os_xsave+0x1c		       | | xor    %edx,%edx
>>   59bfe:  os_xsave+0x1e		       | | mov    %rbx,%rsi
>>   59c01:  os_xsave+0x21		       | | mov    %rbp,%rdi
>>   59c04:  os_xsave+0x24		       | | callq  0x59c09 <xfd_validate_state>               - call
>>   59c09:  os_xsave+0x29		       | | mov    %ebx,%eax
>>   59c0b:  os_xsave+0x2b		       | | mov    %r12d,%edx
>>   	 			       | | / <alternative.59c0e> X86_FEATURE_XSAVEOPT
>>    1b29:  .altinstr_replacement+0x1b29  | | | xsaveopt64 0x40(%rbp)
>>   59c13:  os_xsave+0x33		       | | | xor    %ebx,%ebx
>>   59c15:  os_xsave+0x35		       | | | test   %ebx,%ebx
>>   59c17:  os_xsave+0x37		       | | | jne    0x59c26 <os_xsave+0x46>                  - jump taken
>>   59c26:  os_xsave+0x46		       | | | | ud2
>>   59c28:  os_xsave+0x48		       | | | | pop    %rbx                                   - state: cfa=rsp+24 rbx=<undef> stack_size=24
>>   59c29:  os_xsave+0x49		       | | | | pop    %rbp				     - state: cfa=rsp+16 rbp=<undef> stack_size=16
>>   59c2a:  os_xsave+0x4a		       | | | | pop    %r12				     - state: cfa=rsp+8 r12=<undef> stack_size=8
>>   59c2c:  os_xsave+0x4c		       | | | | jmpq   0x59c31 <__x86_return_thunk>	     - return
>>   59c17:  os_xsave+0x37		       | | | jne    0x59c26 <os_xsave+0x46>		     - jump not taken
>>   59c19:  os_xsave+0x39		       | | | pop    %rbx    				     - state: cfa=rsp+24 rbx=<undef> stack_size=24
>>   59c1a:  os_xsave+0x3a		       | | | pop    %rbp				     - state: cfa=rsp+16 rbp=<undef> stack_size=16
>>   59c1b:  os_xsave+0x3b		       | | | pop    %r12				     - state: cfa=rsp+8 r12=<undef> stack_size=8
>>   59c1d:  os_xsave+0x3d		       | | | jmpq   0x59c22 <__x86_return_thunk>	     - return
>>   	 			       | | \ <alternative.59c0e> X86_FEATURE_XSAVEOPT
>> 				       | | / <alternative.59c0e> X86_FEATURE_XSAVEC
>>    1b2e:  .altinstr_replacement+0x1b2e  | | | xsavec64 0x40(%rbp)
>>   59c13:  os_xsave+0x33		       | | | xor    %ebx,%ebx                                - already visited
>>   	 			       | | \ <alternative.59c0e> X86_FEATURE_XSAVEC
>> 				       | | / <alternative.59c0e> X86_FEATURE_XSAVES
>>    1b33:  .altinstr_replacement+0x1b33  | | | xsaves64 0x40(%rbp)
>>   59c13:  os_xsave+0x33		       | | | xor    %ebx,%ebx                                - already visited
>>   	 			       | | \ <alternative.59c0e> X86_FEATURE_XSAVES
>> 				       | | / <alternative.59c0e> EXCEPTION for instruction at 0x59c0e <os_xsave+0x2e>
>>   59c15:  os_xsave+0x35		       | | | test   %ebx,%ebx                                - already visited
>>   	 			       | | \ <alternative.59c0e> EXCEPTION
>> 				       | | / <alternative.59c0e> DEFAULT
>>   59c0e:  os_xsave+0x2e		       | | xsave64 0x40(%rbp)
>>   59c13:  os_xsave+0x33		       | | xor    %ebx,%ebx                                  - already visited
>>   59bfa:  os_xsave+0x1a		       je     0x59c22 <os_xsave+0x42>                        - jump not taken
>>   59bfc:  os_xsave+0x1c		       xor    %edx,%edx                                      - already visited
>> os_xsave: validation end
>>
>>
>> Example 2 (--disas option): Single Instruction Alternatives
>> -----------------------------------------------------------
>>
>> Compact Output (default):
>>
>> Alternatives with single instructions are displayed each on one line,
>> with the instruction and a description of the alternative.
>>
>> $ ./tools/objtool/objtool --disas=perf_get_x86_pmu_capability --link vmlinux.o
>> perf_get_x86_pmu_capability:
>>    deb0:  perf_get_x86_pmu_capability+0x0     endbr64
>>    deb4:  perf_get_x86_pmu_capability+0x4     callq  0xdeb9 <__fentry__>
>>    deb9:  perf_get_x86_pmu_capability+0x9     mov    %rdi,%rdx
>>    debc:  perf_get_x86_pmu_capability+0xc     <alternative.debc>
>>    	 				     = jmpq   0xdec1 <.altinstr_aux+0xfc>                 (if DEFAULT)
>> 					     = jmpq   0x622 <perf_get_x86_pmu_capability+0x37>    (if X86_FEATURE_ALWAYS)
>> 					     = nop5                                               (if X86_FEATURE_HYBRID_CPU)
>>    dec1:  perf_get_x86_pmu_capability+0x11    ud2
>>    dec3:  perf_get_x86_pmu_capability+0x13    movq   $0x0,(%rdx)
>>    deca:  perf_get_x86_pmu_capability+0x1a    movq   $0x0,0x8(%rdx)
>>    ded2:  perf_get_x86_pmu_capability+0x22    movq   $0x0,0x10(%rdx)
>>    deda:  perf_get_x86_pmu_capability+0x2a    movq   $0x0,0x18(%rdx)
>>    dee2:  perf_get_x86_pmu_capability+0x32    jmpq   0xdee7 <__x86_return_thunk>
>>    dee7:  perf_get_x86_pmu_capability+0x37    cmpq   $0x0,0x0(%rip)        # 0xdeef <x86_pmu+0x10>
>>    deef:  perf_get_x86_pmu_capability+0x3f    je     0xdec3 <perf_get_x86_pmu_capability+0x13>
>>    def1:  perf_get_x86_pmu_capability+0x41    mov    0x0(%rip),%eax        # 0xdef7 <x86_pmu+0x8>
>>    def7:  perf_get_x86_pmu_capability+0x47    mov    %eax,(%rdi)
>>    def9:  perf_get_x86_pmu_capability+0x49    <jump_table.def9>
>>    	 				     = nop2                                              (if DEFAULT)
>> 					     = jmp    defb <perf_get_x86_pmu_capability+0x4b>    (if JUMP)
>>    defb:  perf_get_x86_pmu_capability+0x4b    mov    0x0(%rip),%rdi        # 0xdf02 <x86_pmu+0xd8>
>>    df02:  perf_get_x86_pmu_capability+0x52    <alternative.df02>
>>    	 				     = callq  0xdf07 <__sw_hweight64>    (if DEFAULT)
>> 					     = popcnt %rdi,%rax                  (if X86_FEATURE_POPCNT)
>>    df07:  perf_get_x86_pmu_capability+0x57    mov    %eax,0x4(%rdx)
>>    df0a:  perf_get_x86_pmu_capability+0x5a    <jump_table.df0a>
>>    	 				     = nop2                                              (if DEFAULT)
>> 					     = jmp    df0c <perf_get_x86_pmu_capability+0x5c>    (if JUMP)
>>    df0c:  perf_get_x86_pmu_capability+0x5c    mov    0x0(%rip),%rdi        # 0xdf13 <x86_pmu+0xe0>
>>    df13:  perf_get_x86_pmu_capability+0x63    <alternative.df13>
>>    	 				     = callq  0xdf18 <__sw_hweight64>    (if DEFAULT)
>> 					     = popcnt %rdi,%rax                  (if X86_FEATURE_POPCNT)
>>    df18:  perf_get_x86_pmu_capability+0x68    mov    %eax,0x8(%rdx)
>>    df1b:  perf_get_x86_pmu_capability+0x6b    mov    0x0(%rip),%eax        # 0xdf21 <x86_pmu+0xf8>
>>    df21:  perf_get_x86_pmu_capability+0x71    mov    %eax,0xc(%rdx)
>>    df24:  perf_get_x86_pmu_capability+0x74    mov    %eax,0x10(%rdx)
>>    df27:  perf_get_x86_pmu_capability+0x77    mov    0x0(%rip),%rax        # 0xdf2e <x86_pmu+0x108>
>>    df2e:  perf_get_x86_pmu_capability+0x7e    mov    %eax,0x14(%rdx)
>>    df31:  perf_get_x86_pmu_capability+0x81    mov    0x0(%rip),%eax        # 0xdf37 <x86_pmu+0x110>
>>    df37:  perf_get_x86_pmu_capability+0x87    mov    %eax,0x18(%rdx)
>>    df3a:  perf_get_x86_pmu_capability+0x8a    movzbl 0x0(%rip),%ecx        # 0xdf41 <x86_pmu+0x1d1>
>>    df41:  perf_get_x86_pmu_capability+0x91    movzbl 0x1c(%rdx),%eax
>>    df45:  perf_get_x86_pmu_capability+0x95    shr    %cl
>>    df47:  perf_get_x86_pmu_capability+0x97    and    $0x1,%ecx
>>    df4a:  perf_get_x86_pmu_capability+0x9a    and    $0xfffffffe,%eax
>>    df4d:  perf_get_x86_pmu_capability+0x9d    or     %ecx,%eax
>>    df4f:  perf_get_x86_pmu_capability+0x9f    mov    %al,0x1c(%rdx)
>>    df52:  perf_get_x86_pmu_capability+0xa2    jmpq   0xdf57 <__x86_return_thunk>
>>
>>
>> Wide Output (--wide option):
>>
>> Alternatives with single instructions are displayed side-by-side,
>> with a header.
>>
>> $ ./tools/objtool/objtool --disas=perf_get_x86_pmu_capability --wide --link vmlinux.o
>> perf_get_x86_pmu_capability:
>>    deb0:  perf_get_x86_pmu_capability+0x0       endbr64
>>    deb4:  perf_get_x86_pmu_capability+0x4       callq  0xdeb9 <__fentry__>
>>    deb9:  perf_get_x86_pmu_capability+0x9       mov    %rdi,%rdx
>>    debc:  perf_get_x86_pmu_capability+0xc     | <alternative.debc>                 | X86_FEATURE_ALWAYS                              | X86_FEATURE_HYBRID_CPU
>>    debc:  perf_get_x86_pmu_capability+0xc     | jmpq   0xdec1 <.altinstr_aux+0xfc> | jmpq   0x622 <perf_get_x86_pmu_capability+0x37> | nop5
>>    dec1:  perf_get_x86_pmu_capability+0x11      ud2
>>    dec3:  perf_get_x86_pmu_capability+0x13      movq   $0x0,(%rdx)
>>    deca:  perf_get_x86_pmu_capability+0x1a      movq   $0x0,0x8(%rdx)
>>    ded2:  perf_get_x86_pmu_capability+0x22      movq   $0x0,0x10(%rdx)
>>    deda:  perf_get_x86_pmu_capability+0x2a      movq   $0x0,0x18(%rdx)
>>    dee2:  perf_get_x86_pmu_capability+0x32      jmpq   0xdee7 <__x86_return_thunk>
>>    dee7:  perf_get_x86_pmu_capability+0x37      cmpq   $0x0,0x0(%rip)        # 0xdeef <x86_pmu+0x10>
>>    deef:  perf_get_x86_pmu_capability+0x3f      je     0xdec3 <perf_get_x86_pmu_capability+0x13>
>>    def1:  perf_get_x86_pmu_capability+0x41      mov    0x0(%rip),%eax        # 0xdef7 <x86_pmu+0x8>
>>    def7:  perf_get_x86_pmu_capability+0x47      mov    %eax,(%rdi)
>>    def9:  perf_get_x86_pmu_capability+0x49    | <jump_table.def9> | JUMP
>>    def9:  perf_get_x86_pmu_capability+0x49    | nop2              | jmp    defb <perf_get_x86_pmu_capability+0x4b>
>>    defb:  perf_get_x86_pmu_capability+0x4b      mov    0x0(%rip),%rdi        # 0xdf02 <x86_pmu+0xd8>
>>    df02:  perf_get_x86_pmu_capability+0x52    | <alternative.df02>             | X86_FEATURE_POPCNT
>>    df02:  perf_get_x86_pmu_capability+0x52    | callq  0xdf07 <__sw_hweight64> | popcnt %rdi,%rax
>>    df07:  perf_get_x86_pmu_capability+0x57      mov    %eax,0x4(%rdx)
>>    df0a:  perf_get_x86_pmu_capability+0x5a    | <jump_table.df0a> | JUMP
>>    df0a:  perf_get_x86_pmu_capability+0x5a    | nop2              | jmp    df0c <perf_get_x86_pmu_capability+0x5c>
>>    df0c:  perf_get_x86_pmu_capability+0x5c      mov    0x0(%rip),%rdi        # 0xdf13 <x86_pmu+0xe0>
>>    df13:  perf_get_x86_pmu_capability+0x63    | <alternative.df13>             | X86_FEATURE_POPCNT
>>    df13:  perf_get_x86_pmu_capability+0x63    | callq  0xdf18 <__sw_hweight64> | popcnt %rdi,%rax
>>    df18:  perf_get_x86_pmu_capability+0x68      mov    %eax,0x8(%rdx)
>>    df1b:  perf_get_x86_pmu_capability+0x6b      mov    0x0(%rip),%eax        # 0xdf21 <x86_pmu+0xf8>
>>    df21:  perf_get_x86_pmu_capability+0x71      mov    %eax,0xc(%rdx)
>>    df24:  perf_get_x86_pmu_capability+0x74      mov    %eax,0x10(%rdx)
>>    df27:  perf_get_x86_pmu_capability+0x77      mov    0x0(%rip),%rax        # 0xdf2e <x86_pmu+0x108>
>>    df2e:  perf_get_x86_pmu_capability+0x7e      mov    %eax,0x14(%rdx)
>>    df31:  perf_get_x86_pmu_capability+0x81      mov    0x0(%rip),%eax        # 0xdf37 <x86_pmu+0x110>
>>    df37:  perf_get_x86_pmu_capability+0x87      mov    %eax,0x18(%rdx)
>>    df3a:  perf_get_x86_pmu_capability+0x8a      movzbl 0x0(%rip),%ecx        # 0xdf41 <x86_pmu+0x1d1>
>>    df41:  perf_get_x86_pmu_capability+0x91      movzbl 0x1c(%rdx),%eax
>>    df45:  perf_get_x86_pmu_capability+0x95      shr    %cl
>>    df47:  perf_get_x86_pmu_capability+0x97      and    $0x1,%ecx
>>    df4a:  perf_get_x86_pmu_capability+0x9a      and    $0xfffffffe,%eax
>>    df4d:  perf_get_x86_pmu_capability+0x9d      or     %ecx,%eax
>>    df4f:  perf_get_x86_pmu_capability+0x9f      mov    %al,0x1c(%rdx)
>>    df52:  perf_get_x86_pmu_capability+0xa2      jmpq   0xdf57 <__x86_return_thunk>
>>
>>
>> Example 3 (--disas option): Alternatives with multiple instructions
>> -------------------------------------------------------------------
>>
>> Compact Output (default):
>>
>> Alternatives with multiple instructions are displayed one above the other,
>> with a header describing the alternative.
>>
>> $ ./tools/objtool/objtool --disas=__switch_to_asm --link vmlinux.o
>> __switch_to_asm:
>>    82c0:  __switch_to_asm+0x0       push   %rbp
>>    82c1:  __switch_to_asm+0x1	   push   %rbx
>>    82c2:  __switch_to_asm+0x2	   push   %r12
>>    82c4:  __switch_to_asm+0x4	   push   %r13
>>    82c6:  __switch_to_asm+0x6	   push   %r14
>>    82c8:  __switch_to_asm+0x8	   push   %r15
>>    82ca:  __switch_to_asm+0xa	   mov    %rsp,0x1670(%rdi)
>>    82d1:  __switch_to_asm+0x11	   mov    0x1670(%rsi),%rsp
>>    82d8:  __switch_to_asm+0x18	   mov    0xad8(%rsi),%rbx
>>    82df:  __switch_to_asm+0x1f	   mov    %rbx,%gs:0x0(%rip)        # 0x82e7 <__stack_chk_guard>
>>    82e7:  __switch_to_asm+0x27	   <alternative.82e7>
>>    	 			   = DEFAULT
>>    82e7:  __switch_to_asm+0x27	   | jmp    0x8312 <__switch_to_asm+0x52>
>>    82e9:  __switch_to_asm+0x29	   | nop*41
>>    	 			   |
>> 				   = X86_FEATURE_RSB_CTXSW
>>    82e7:  __switch_to_asm+0x27	   | mov    $0x10,%r12
>>    82ee:  __switch_to_asm+0x2e	   | callq  0x82f4 <__switch_to_asm+0x34>
>>    82f3:  __switch_to_asm+0x33	   | int3
>>    82f4:  __switch_to_asm+0x34	   | callq  0x82fa <__switch_to_asm+0x3a>
>>    82f9:  __switch_to_asm+0x39	   | int3
>>    82fa:  __switch_to_asm+0x3a	   | add    $0x10,%rsp
>>    82fe:  __switch_to_asm+0x3e	   | dec    %r12
>>    8301:  __switch_to_asm+0x41	   | jne    0x82ee <__switch_to_asm+0x2e>
>>    8303:  __switch_to_asm+0x43	   | lfence
>>    8306:  __switch_to_asm+0x46	   | movq   $0xffffffffffffffff,%gs:0x0(%rip)        # 0x20b <__x86_call_depth>
>>    	 			   |
>> 				   = !X86_FEATURE_ALWAYS
>>    82e7:  __switch_to_asm+0x27	   | nop1
>>    82e8:  __switch_to_asm+0x28	   | nop1
>>    82e9:  __switch_to_asm+0x29	   | callq  0x82ef <__switch_to_asm+0x2f>
>>    82ee:  __switch_to_asm+0x2e	   | int3
>>    82ef:  __switch_to_asm+0x2f	   | add    $0x8,%rsp
>>    82f3:  __switch_to_asm+0x33	   | lfence
>>    	 			   |
>>    8312:  __switch_to_asm+0x52	   pop    %r15
>>    8314:  __switch_to_asm+0x54	   pop    %r14
>>    8316:  __switch_to_asm+0x56	   pop    %r13
>>    8318:  __switch_to_asm+0x58	   pop    %r12
>>    831a:  __switch_to_asm+0x5a	   pop    %rbx
>>    831b:  __switch_to_asm+0x5b	   pop    %rbp
>>    831c:  __switch_to_asm+0x5c	   jmpq   0x8321 <__switch_to>
>>
>>
>> Wide Output (--wide option):
>>
>> Alternatives with multiple instructions are displayed side-by-side, with
>> a header describing the alternative. The code in the first column is the
>> default code of the alternative.
>>
>> $ ./tools/objtool/objtool --disas=__switch_to_asm --wide  --link vmlinux.o
>> __switch_to_asm:
>>    82c0:  __switch_to_asm+0x0       push   %rbp
>>    82c1:  __switch_to_asm+0x1	   push   %rbx
>>    82c2:  __switch_to_asm+0x2	   push   %r12
>>    82c4:  __switch_to_asm+0x4	   push   %r13
>>    82c6:  __switch_to_asm+0x6	   push   %r14
>>    82c8:  __switch_to_asm+0x8	   push   %r15
>>    82ca:  __switch_to_asm+0xa	   mov    %rsp,0x1670(%rdi)
>>    82d1:  __switch_to_asm+0x11	   mov    0x1670(%rsi),%rsp
>>    82d8:  __switch_to_asm+0x18	   mov    0xad8(%rsi),%rbx
>>    82df:  __switch_to_asm+0x1f	   mov    %rbx,%gs:0x0(%rip)        # 0x82e7 <__stack_chk_guard>
>>    82e7:  __switch_to_asm+0x27	 | <alternative.82e7>                   | X86_FEATURE_RSB_CTXSW                                               | !X86_FEATURE_ALWAYS
>>    82e7:  __switch_to_asm+0x27	 | jmp    0x8312 <__switch_to_asm+0x52> | mov    $0x10,%r12						      | nop1
>>    82e8:  __switch_to_asm+0x28	 |                                      | 	 							      | nop1
>>    82e9:  __switch_to_asm+0x29	 | nop*41                               |                                                                     | callq  0x82ef <__switch_to_asm+0x2f>
>>    82ee:  __switch_to_asm+0x2e	 |                                      | callq  0x82f4 <__switch_to_asm+0x34>                                | int3
>>    82ef:  __switch_to_asm+0x2f	 |                                      |                                                                     | add    $0x8,%rsp
>>    82f3:  __switch_to_asm+0x33	 |                                      | int3                                                                | lfence
>>    82f4:  __switch_to_asm+0x34	 |                                      | callq  0x82fa <__switch_to_asm+0x3a>                                |
>>    82f9:  __switch_to_asm+0x39	 |                                      | int3                                                                |
>>    82fa:  __switch_to_asm+0x3a	 |                                      | add    $0x10,%rsp                                                   |
>>    82fe:  __switch_to_asm+0x3e	 |                                      | dec    %r12                                                         |
>>    8301:  __switch_to_asm+0x41	 |                                      | jne    0x82ee <__switch_to_asm+0x2e>                                |
>>    8303:  __switch_to_asm+0x43	 |                                      | lfence                                                              |
>>    8306:  __switch_to_asm+0x46	 |                                      | movq   $0xffffffffffffffff,%gs:0x0(%rip) # 0x20b <__x86_call_depth> |
>>    8312:  __switch_to_asm+0x52	   pop    %r15
>>    8314:  __switch_to_asm+0x54	   pop    %r14
>>    8316:  __switch_to_asm+0x56	   pop    %r13
>>    8318:  __switch_to_asm+0x58	   pop    %r12
>>    831a:  __switch_to_asm+0x5a	   pop    %rbx
>>    831b:  __switch_to_asm+0x5b	   pop    %rbp
>>    831c:  __switch_to_asm+0x5c	   jmpq   0x8321 <__switch_to>
>>    
>
Re: [PATCH v6 00/30] objtool: Function validation tracing
Posted by Borislav Petkov 1 week ago
On Mon, Nov 24, 2025 at 12:39:08PM +0100, Alexandre Chartre wrote:
> Sure. I will add a doc in tools/objtool/Documentation/ in followup patches.

Thanks, sounds good.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette