[PATCH v4 00/28] objtool: Function validation tracing

Alexandre Chartre posted 28 patches 2 months, 3 weeks ago
.../x86/tools/gen-cpu-feature-names-x86.awk   |   33 +
tools/build/Makefile.feature                  |    4 +-
tools/objtool/Build                           |    3 +
tools/objtool/Makefile                        |   23 +
tools/objtool/arch/loongarch/decode.c         |   30 +
tools/objtool/arch/loongarch/special.c        |    5 +
tools/objtool/arch/powerpc/decode.c           |   31 +
tools/objtool/arch/powerpc/special.c          |    5 +
tools/objtool/arch/x86/Build                  |    8 +
tools/objtool/arch/x86/decode.c               |   36 +-
tools/objtool/arch/x86/special.c              |   10 +
tools/objtool/builtin-check.c                 |    5 +-
tools/objtool/check.c                         |  739 ++++++-----
tools/objtool/disas.c                         | 1137 +++++++++++++++++
tools/objtool/include/objtool/arch.h          |   14 +-
tools/objtool/include/objtool/builtin.h       |    2 +
tools/objtool/include/objtool/check.h         |   41 +-
tools/objtool/include/objtool/disas.h         |   75 ++
tools/objtool/include/objtool/elf.h           |    7 +
tools/objtool/include/objtool/objtool.h       |    6 +-
tools/objtool/include/objtool/special.h       |    4 +-
tools/objtool/include/objtool/trace.h         |  139 ++
tools/objtool/include/objtool/warn.h          |   17 +-
tools/objtool/objtool.c                       |   27 +-
tools/objtool/special.c                       |    2 +
tools/objtool/trace.c                         |  204 +++
26 files changed, 2287 insertions(+), 320 deletions(-)
create mode 100644 tools/arch/x86/tools/gen-cpu-feature-names-x86.awk
create mode 100644 tools/objtool/disas.c
create mode 100644 tools/objtool/include/objtool/disas.h
create mode 100644 tools/objtool/include/objtool/trace.h
create mode 100644 tools/objtool/trace.c
[PATCH v4 00/28] objtool: Function validation tracing
Posted by Alexandre Chartre 2 months, 3 weeks ago
Hi,

These patches change objtool to disassemble code with libopcodes instead
of running objdump. You will find below:

- Changes: list of changes made in this version
- Overview: overview of the changes
- Notes: description of some particular behavior
- Examples: output examples

Thanks,

alex.

-----

Changes:
========

V4:
---
This version fixes a build issue when disassembly is not available. Compared
with V3, this is addressed by changes in patch 14 (objtool: Improve tracing
of alternative instructions). Other patches are similar to V3.

V3:
---
This version addresses comments from Josh and Peter, in particular:

- Josh: replace ERROR in disas_context_create with WARN
- Josh: do not change offstr() outside the disassembler
- Josh: duplicated "falls through to next function" warning
- Josh: validate_symbol() has extra newline before return
- Josh: "make -s" should be completely silent
- Josh: instructions with unwinding state changes are printing twice
- Josh: explain TRACE_INSN(insn, NULL): this prints an instruction with no
  additional message.

- Peter: display alternative on a single line
- Peter: display nop-like instruction as NOP<n>
- Peter: in alternatives, show the difference between jmp.d8 and jmp.d32 (jmp/jmpq is now used)
- Peter: show alternative feature name and flags
- Peter: alternative jumps to altinstr_aux - see NOTE below:
         Disassembly can show default alternative jumping to .altinstr_aux
- Peter: some jump label target seems wrong (jmp +0) - see NOTE below:
         Disassembly can show alternative jumping to the next instruction

Other improvements:

- An alternative is displayed on a single line if each alternative has a
  single instruction. Otherwise, alternatives are displayed side-by-side,
  with one column for each alternative.

- Each alternative of a group alternative is displayed with its feature
  name and flags: <flags><feature-name>

  <flags> is made of the following characters:

    '!' : ALT_FLAG_NOT
    '+' : ALT_FLAG_DIRECT_CALL
    '?' : unknown flag (i.e. any other flags)

- If an alternative is a jump table then "JUMP" is used as the feature
  name.

- If an alternative is an exception table then "EXCEPTION" is used as the
  feature name.

- Print the destination name of pv_ops calls when it can be determined
  whether XENPV mode is used. If the PV mode can't be predicted, then print
  the default pv_ops destination as an example destination.

- If a group alternative is a direct call then print the corresponding
  pv_ops call.
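  The <flags><feature-name> label described above can be sketched as
  follows. This is a minimal illustration, not code from the series:
  format_alt_name() is a hypothetical helper, and the ALT_FLAG_* bit values
  are assumed for the example (the kernel's own definitions live in
  arch/x86/include/asm/alternative.h).

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed flag bits for this sketch; check asm/alternative.h for real ones. */
#define ALT_FLAG_NOT		(1u << 0)
#define ALT_FLAG_DIRECT_CALL	(1u << 1)
#define ALT_FLAG_KNOWN		(ALT_FLAG_NOT | ALT_FLAG_DIRECT_CALL)

/*
 * Hypothetical helper: write "<flags><feature-name>" into buf, for example
 * "!X86_FEATURE_XENPV" or "+X86_FEATURE_ALWAYS". Returns the snprintf()
 * result, i.e. the length the full label needs.
 */
static int format_alt_name(char *buf, size_t size, unsigned int flags,
			   const char *feature)
{
	char prefix[4];		/* up to three flag characters + NUL */
	size_t n = 0;

	if (flags & ALT_FLAG_NOT)
		prefix[n++] = '!';
	if (flags & ALT_FLAG_DIRECT_CALL)
		prefix[n++] = '+';
	if (flags & ~ALT_FLAG_KNOWN)
		prefix[n++] = '?';	/* any other flag bit */
	prefix[n] = '\0';

	return snprintf(buf, size, "%s%s", prefix, feature);
}
```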

Examples are shown below.


Overview:
=========

This provides the following changes to objtool.

- Disassemble code with libopcodes instead of running objdump

  objtool executes the objdump command to disassemble code. In particular,
  if objtool fails to validate a function, then it uses objdump to
  disassemble the entire file, which is not very helpful when processing
  a large file (like vmlinux.o).

  Using libopcodes provides more control over the disassembly scope and
  output, and makes it possible to disassemble a single instruction or
  a single function. Now, when objtool fails to validate a function, it
  disassembles that single function instead of the entire file.

- Add the --trace <function> option to trace function validation

  Figuring out why a function validation has failed can be difficult because
  objtool checks all code flows (including alternatives) and maintains
  instruction states (in particular, call frame information).

  The trace option makes it possible to follow the function validation done
  by objtool instruction by instruction, to see what objtool is doing and to
  get function validation information. An output example is shown below.

- Add the --disas <function> option to disassemble functions

  Disassembly is done with libopcodes and shows the different code
  alternatives.

Note: some changes are architecture specific (x86, powerpc, loongarch). Any
feedback about the behavior on powerpc and loongarch is welcome.


Notes:
======

Disassembly can show default alternative jumping to .altinstr_aux
-----------------------------------------------------------------
Disassembly can show a default alternative jumping to .altinstr_aux. This
happens when _static_cpu_has() is used: its default code jumps to
.altinstr_aux, where a test sequence (test; jnz; jmp) is executed.

Once alternatives have been applied, this sequence is not used because
_static_cpu_has() also provides an alternative with the X86_FEATURE_ALWAYS
feature, which always replaces the jump to .altinstr_aux.


  debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> | NOP5  (X86_FEATURE_HYBRID_CPU) | jmpq   0x61a <perf_get_x86_pmu_capability+0x37>  (X86_FEATURE_ALWAYS)   # <alternative.debc>
  dec1:  perf_get_x86_pmu_capability+0x11     ud2                                                       


Disassembly can show alternative jumping to the next instruction
----------------------------------------------------------------

The disassembly can show jump tables with an alternative which jumps
to the next instruction.

For example:

  def9:  perf_get_x86_pmu_capability+0x49     NOP2 | jmp    defb <perf_get_x86_pmu_capability+0x4b>  (JUMP)   # <alternative.def9>
  defb:  perf_get_x86_pmu_capability+0x4b     mov    0x0(%rip),%rdi        # 0xdf02 <x86_pmu+0xd8>

This disassembly is correct. These instructions come from:

        cap->num_counters_gp = x86_pmu_num_counters(NULL);

which will end up executing this statement:

        if (static_branch_unlikely(&perf_is_hybrid) && NULL)
                <do something>;

This statement is optimized to do nothing because the condition is
always false. However, static_branch_unlikely() is implemented with a jump
table inside an "asm goto" statement, and "asm goto" statements are
implicitly volatile, so the compiler does not remove them.

So, in effect, the code is optimized like this:

        if (static_branch_unlikely(&perf_is_hybrid))
                ;

And this translates to the assembly code above.
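The effect can be reproduced outside the kernel with a small sketch.
branch_unlikely() below is a hypothetical, simplified stand-in for
static_branch_unlikely(): it only emits a one-byte nop through "asm goto",
records no jump table entry and is never patched, so it always returns
false. Because GCC and Clang treat "asm goto" as volatile, the nop is still
emitted when num_counters() is called with a constant NULL, even though
both outcomes of the branch lead to the same code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for static_branch_unlikely(): a one-byte "nop"
 * that a runtime patcher could rewrite into "jmp l_yes". Nothing patches
 * it here, so it always falls through and returns false.
 */
static inline __attribute__((always_inline)) bool branch_unlikely(void)
{
	__asm__ goto("nop" : : : : l_yes);
	return false;
l_yes:
	return true;
}

/* Mirrors the "if (static_branch_unlikely(...) && pmu)" pattern. */
static int num_counters(const void *pmu)
{
	if (branch_unlikely() && pmu)
		return 1;	/* hybrid path */
	return 0;		/* default path */
}

/*
 * With pmu == NULL the "if" body is dead, but "asm goto" is implicitly
 * volatile, so the nop (the would-be jump site) is kept.
 */
int count_without_pmu(void)
{
	return num_counters(NULL);
}
```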


Examples:
=========

Example 1 (--trace option): Trace the validation of the os_xsave() function
---------------------------------------------------------------------------

$ ./tools/objtool/objtool --hacks=jump_label --hacks=noinstr --hacks=skylake --ibt --orc --retpoline --rethunk --sls --static-call --uaccess --prefix=16 --link --trace os_xsave -v vmlinux.o
os_xsave: validation begin
 59b20:  os_xsave+0x0                    push   %r12                                       - state: cfa=rsp+16 r12=(cfa-16) stack_size=16 
 59b22:  os_xsave+0x2	     		 mov    0x0(%rip),%eax        # 0x59b28 <alternatives_patched>
 59b28:  os_xsave+0x8	     		 push   %rbp                                       - state: cfa=rsp+24 rbp=(cfa-24) stack_size=24 
 59b29:  os_xsave+0x9	     		 mov    %rdi,%rbp                                          
 59b2c:  os_xsave+0xc	     		 push   %rbx                                       - state: cfa=rsp+32 rbx=(cfa-32) stack_size=32 
 59b2d:  os_xsave+0xd	     		 mov    0x8(%rdi),%rbx                                     
 59b31:  os_xsave+0x11	     		 mov    %rbx,%r12                                          
 59b34:  os_xsave+0x14	     		 shr    $0x20,%r12                                         
 59b38:  os_xsave+0x18	    		 test   %eax,%eax                                          
 59b3a:  os_xsave+0x1a	     		 je     0x59b62 <os_xsave+0x42>                    - jump taken
 59b62:  os_xsave+0x42	   	       | ud2                                                     
 59b64:  os_xsave+0x44	   	       | jmp    0x59b3c <os_xsave+0x1c>                    - unconditional jump
 59b3c:  os_xsave+0x1c	   	       | | xor    %edx,%edx                                      
 59b3e:  os_xsave+0x1e	   	       | | mov    %rbx,%rsi                                      
 59b41:  os_xsave+0x21	   	       | | mov    %rbp,%rdi                                      
 59b44:  os_xsave+0x24	   	       | | callq  0x59b49 <xfd_validate_state>             - call
 59b49:  os_xsave+0x29	   	       | | mov    %ebx,%eax                                      
 59b4b:  os_xsave+0x2b	   	       | | mov    %r12d,%edx                                     
 	 		   	       | | / <alternative.59b4e> EXCEPTION for instruction at 0x59b4e <os_xsave+0x2e>
 59b55:  os_xsave+0x35	   	       | | | test   %ebx,%ebx                                    
 59b57:  os_xsave+0x37	   	       | | | jne    0x59b66 <os_xsave+0x46>                - jump taken
 59b66:  os_xsave+0x46	   	       | | | | ud2                                               
 59b68:  os_xsave+0x48	   	       | | | | pop    %rbx                                 - state: cfa=rsp+24 rbx=<undef> stack_size=24 
 59b69:  os_xsave+0x49	   	       | | | | pop    %rbp                                 - state: cfa=rsp+16 rbp=<undef> stack_size=16 
 59b6a:  os_xsave+0x4a	   	       | | | | pop    %r12                                 - state: cfa=rsp+8 r12=<undef> stack_size=8 
 59b6c:  os_xsave+0x4c	   	       | | | | jmpq   0x59b71 <__x86_return_thunk>         - return
 59b57:  os_xsave+0x37	   	       | | | jne    0x59b66 <os_xsave+0x46>                - jump not taken
 59b59:  os_xsave+0x39	   	       | | | pop    %rbx                                   - state: cfa=rsp+24 rbx=<undef> stack_size=24 
 59b5a:  os_xsave+0x3a	   	       | | | pop    %rbp                                   - state: cfa=rsp+16 rbp=<undef> stack_size=16 
 59b5b:  os_xsave+0x3b	   	       | | | pop    %r12                                   - state: cfa=rsp+8 r12=<undef> stack_size=8 
 59b5d:  os_xsave+0x3d	   	       | | | jmpq   0x59b62 <__x86_return_thunk>           - return
 	 		   	       | | \ <alternative.59b4e> EXCEPTION end
				       | | / <alternative.59b4e> X86_FEATURE_XSAVES
  1b2b:  .altinstr_replacement+0x1b2b  | | | xsaves64 0x40(%rbp)                                 
 59b53:  os_xsave+0x33		       | | | xor    %ebx,%ebx                                    
 59b55:  os_xsave+0x35		       | | | test   %ebx,%ebx                              - already visited
 	 			       | | \ <alternative.59b4e> X86_FEATURE_XSAVES end
				       | | / <alternative.59b4e> X86_FEATURE_XSAVEC
  1b26:  .altinstr_replacement+0x1b26  | | | xsavec64 0x40(%rbp)                                 
 59b53:  os_xsave+0x33		       | | | xor    %ebx,%ebx                              - already visited
 	 			       | | \ <alternative.59b4e> X86_FEATURE_XSAVEC end
				       | | / <alternative.59b4e> X86_FEATURE_XSAVEOPT
  1b21:  .altinstr_replacement+0x1b21  | | | xsaveopt64 0x40(%rbp)                               
 59b53:  os_xsave+0x33		       | | | xor    %ebx,%ebx                              - already visited
 	 			       | | \ <alternative.59b4e> X86_FEATURE_XSAVEOPT end
				       | | / <alternative.59b4e> DEFAULT
 59b4e:  os_xsave+0x2e		       | | xsave64 0x40(%rbp)                                    
 59b53:  os_xsave+0x33		       | | xor    %ebx,%ebx                                - already visited
 59b3a:  os_xsave+0x1a		       je     0x59b62 <os_xsave+0x42>                      - jump not taken
 59b3c:  os_xsave+0x1c		       xor    %edx,%edx                                    - already visited
os_xsave: validation end
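The "state:" annotations above follow simple arithmetic for an rsp-based
frame on x86-64: at function entry the CFA is rsp+8 (the return address),
each 8-byte push adds 8 to the CFA offset, each pop subtracts 8, and a pushed
register is saved at the CFA minus the new offset. In this trace, stack_size
tracks the same value. The toy model below, with hypothetical names,
reproduces the os_xsave() push/pop sequence; it is not objtool's CFI
tracking code.

```c
#include <stdio.h>

/* Toy model: CFA = rsp + cfa_offset for a frame that only uses push/pop. */
struct toy_cfi {
	int cfa_offset;	/* also the stack_size in this rsp-based model */
};

static void toy_entry(struct toy_cfi *s)
{
	s->cfa_offset = 8;	/* only the return address is on the stack */
}

static void toy_push(struct toy_cfi *s, const char *reg)
{
	s->cfa_offset += 8;	/* push lowers rsp by 8 */
	printf("push %%%s - state: cfa=rsp+%d %s=(cfa-%d) stack_size=%d\n",
	       reg, s->cfa_offset, reg, s->cfa_offset, s->cfa_offset);
}

static void toy_pop(struct toy_cfi *s, const char *reg)
{
	s->cfa_offset -= 8;	/* pop raises rsp by 8 */
	printf("pop %%%s - state: cfa=rsp+%d %s=<undef> stack_size=%d\n",
	       reg, s->cfa_offset, reg, s->cfa_offset);
}
```

Calling toy_entry(), then toy_push() for r12, rbp and rbx gives offsets
16, 24 and 32, and the three pops bring the offset back to 8, matching the
trace.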


Example 2 (--disas option): Single Instruction Alternatives
-----------------------------------------------------------
Alternatives with single instructions are displayed on a single line.
Instructions of the different alternatives are separated by a vertical
bar (|).
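The choice of the single-line layout can be sketched as a check that every
alternative has exactly one instruction, followed by joining the
instructions with " | ". struct alt_column, can_print_single_line() and
format_single_line() are hypothetical names used for illustration only;
they are not objtool's data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical view of one alternative: its label and disassembled insns. */
struct alt_column {
	const char *label;	/* feature name, NULL for the default code */
	const char *insns[8];	/* disassembled instructions */
	size_t nr_insns;
};

/* The single-line layout is used only if each alternative has one insn. */
static bool can_print_single_line(const struct alt_column *cols, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (cols[i].nr_insns != 1)
			return false;
	return true;
}

/*
 * Join "insn  (label)" entries with " | " into buf (size must be > 0).
 * Call only when can_print_single_line() is true.
 * Returns 0 on success, -1 if the output does not fit.
 */
static int format_single_line(char *buf, size_t size,
			      const struct alt_column *cols, size_t n)
{
	size_t len = 0;

	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		const char *sep = i ? " | " : "";
		int w;

		if (cols[i].label)
			w = snprintf(buf + len, size - len, "%s%s  (%s)",
				     sep, cols[i].insns[0], cols[i].label);
		else
			w = snprintf(buf + len, size - len, "%s%s",
				     sep, cols[i].insns[0]);
		if (w < 0 || (size_t)w >= size - len)
			return -1;	/* encoding error or truncated */
		len += (size_t)w;
	}
	return 0;
}
```

With the default "callq  0xdf07 <__sw_hweight64>" and the
X86_FEATURE_POPCNT replacement "popcnt %rdi,%rax", format_single_line()
builds the instruction part of the df02 line in the output below.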


$ ./tools/objtool/objtool --disas=perf_get_x86_pmu_capability --link vmlinux.o
perf_get_x86_pmu_capability:
  deb0:  perf_get_x86_pmu_capability+0x0      endbr64                                                   
  deb4:  perf_get_x86_pmu_capability+0x4      callq  0xdeb9 <__fentry__>                                
  deb9:  perf_get_x86_pmu_capability+0x9      mov    %rdi,%rdx                                          
  debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> | NOP5  (X86_FEATURE_HYBRID_CPU) | jmpq   0x61a <perf_get_x86_pmu_capability+0x37>  (X86_FEATURE_ALWAYS)   # <alternative.debc>
  dec1:  perf_get_x86_pmu_capability+0x11     ud2                                                       
  dec3:  perf_get_x86_pmu_capability+0x13     movq   $0x0,(%rdx)                                        
  deca:  perf_get_x86_pmu_capability+0x1a     movq   $0x0,0x8(%rdx)                                     
  ded2:  perf_get_x86_pmu_capability+0x22     movq   $0x0,0x10(%rdx)                                    
  deda:  perf_get_x86_pmu_capability+0x2a     movq   $0x0,0x18(%rdx)                                    
  dee2:  perf_get_x86_pmu_capability+0x32     jmpq   0xdee7 <__x86_return_thunk>                        
  dee7:  perf_get_x86_pmu_capability+0x37     cmpq   $0x0,0x0(%rip)        # 0xdeef <x86_pmu+0x10>      
  deef:  perf_get_x86_pmu_capability+0x3f     je     0xdec3 <perf_get_x86_pmu_capability+0x13>          
  def1:  perf_get_x86_pmu_capability+0x41     mov    0x0(%rip),%eax        # 0xdef7 <x86_pmu+0x8>       
  def7:  perf_get_x86_pmu_capability+0x47     mov    %eax,(%rdi)                                        
  def9:  perf_get_x86_pmu_capability+0x49     NOP2 | jmp    defb <perf_get_x86_pmu_capability+0x4b>  (JUMP)   # <alternative.def9>
  defb:  perf_get_x86_pmu_capability+0x4b     mov    0x0(%rip),%rdi        # 0xdf02 <x86_pmu+0xd8>      
  df02:  perf_get_x86_pmu_capability+0x52     callq  0xdf07 <__sw_hweight64> | popcnt %rdi,%rax  (X86_FEATURE_POPCNT)   # <alternative.df02>
  df07:  perf_get_x86_pmu_capability+0x57     mov    %eax,0x4(%rdx)                                     
  df0a:  perf_get_x86_pmu_capability+0x5a     NOP2 | jmp    df0c <perf_get_x86_pmu_capability+0x5c>  (JUMP)   # <alternative.df0a>
  df0c:  perf_get_x86_pmu_capability+0x5c     mov    0x0(%rip),%rdi        # 0xdf13 <x86_pmu+0xe0>      
  df13:  perf_get_x86_pmu_capability+0x63     callq  0xdf18 <__sw_hweight64> | popcnt %rdi,%rax  (X86_FEATURE_POPCNT)   # <alternative.df13>
  df18:  perf_get_x86_pmu_capability+0x68     mov    %eax,0x8(%rdx)                                     
  df1b:  perf_get_x86_pmu_capability+0x6b     mov    0x0(%rip),%eax        # 0xdf21 <x86_pmu+0xf8>      
  df21:  perf_get_x86_pmu_capability+0x71     mov    %eax,0xc(%rdx)                                     
  df24:  perf_get_x86_pmu_capability+0x74     mov    %eax,0x10(%rdx)                                    
  df27:  perf_get_x86_pmu_capability+0x77     mov    0x0(%rip),%rax        # 0xdf2e <x86_pmu+0x108>     
  df2e:  perf_get_x86_pmu_capability+0x7e     mov    %eax,0x14(%rdx)                                    
  df31:  perf_get_x86_pmu_capability+0x81     mov    0x0(%rip),%eax        # 0xdf37 <x86_pmu+0x110>     
  df37:  perf_get_x86_pmu_capability+0x87     mov    %eax,0x18(%rdx)                                    
  df3a:  perf_get_x86_pmu_capability+0x8a     movzbl 0x0(%rip),%ecx        # 0xdf41 <x86_pmu+0x1d1>     
  df41:  perf_get_x86_pmu_capability+0x91     movzbl 0x1c(%rdx),%eax                                    
  df45:  perf_get_x86_pmu_capability+0x95     shr    %cl                                                
  df47:  perf_get_x86_pmu_capability+0x97     and    $0x1,%ecx                                          
  df4a:  perf_get_x86_pmu_capability+0x9a     and    $0xfffffffe,%eax                                   
  df4d:  perf_get_x86_pmu_capability+0x9d     or     %ecx,%eax                                          
  df4f:  perf_get_x86_pmu_capability+0x9f     mov    %al,0x1c(%rdx)                                     
  df52:  perf_get_x86_pmu_capability+0xa2     jmpq   0xdf57 <__x86_return_thunk>


Example 3 (--disas option): Alternatives with multiple instructions
-------------------------------------------------------------------
Alternatives with multiple instructions are displayed side-by-side, with
a header describing the alternative. The code in the first column is the
default code of the alternative.
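A possible way to build the side-by-side view, sketched under stated
assumptions: each column holds the instructions of one alternative with
their offsets from the start of the alternative, and one row is printed for
every offset at which at least one column starts an instruction, with empty
cells elsewhere. This matches the rows in the example below (for instance,
row 82e8 only has an instruction in the second column), but struct
col_insn, struct column and print_side_by_side() are hypothetical and not
taken from the patches.

```c
#include <stdio.h>
#include <stddef.h>

#define MAX_COLS 8	/* arbitrary limit for this sketch */

/* Hypothetical: one instruction of an alternative, offset from its start. */
struct col_insn {
	unsigned int offset;
	const char *text;
};

/* Hypothetical: all instructions of one alternative, sorted by offset. */
struct column {
	const struct col_insn *insns;
	size_t nr;
};

/*
 * Print one row per offset at which any column starts an instruction.
 * Columns without an instruction at that offset get an empty cell.
 * Returns the number of rows printed (0 if there are too many columns).
 */
static size_t print_side_by_side(FILE *out, unsigned long base,
				 const struct column *cols, size_t ncols,
				 int width)
{
	size_t pos[MAX_COLS] = { 0 };	/* next instruction in each column */
	size_t rows = 0;

	if (ncols > MAX_COLS)
		return 0;

	for (;;) {
		unsigned int next = ~0u;

		/* Smallest offset that has not been printed yet. */
		for (size_t c = 0; c < ncols; c++)
			if (pos[c] < cols[c].nr &&
			    cols[c].insns[pos[c]].offset < next)
				next = cols[c].insns[pos[c]].offset;
		if (next == ~0u)
			break;	/* every column is exhausted */

		fprintf(out, "%6lx:", base + next);
		for (size_t c = 0; c < ncols; c++) {
			const char *cell = "";

			if (pos[c] < cols[c].nr &&
			    cols[c].insns[pos[c]].offset == next)
				cell = cols[c].insns[pos[c]++].text;
			fprintf(out, " | %-*s", width, cell);
		}
		fputc('\n', out);
		rows++;
	}
	return rows;
}
```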


$ ./tools/objtool/objtool --disas=__switch_to_asm --link vmlinux.o
__switch_to_asm:
  82c0:  __switch_to_asm+0x0      push   %rbp                                               
  82c1:  __switch_to_asm+0x1	  push   %rbx                                               
  82c2:  __switch_to_asm+0x2	  push   %r12                                               
  82c4:  __switch_to_asm+0x4	  push   %r13                                               
  82c6:  __switch_to_asm+0x6	  push   %r14                                               
  82c8:  __switch_to_asm+0x8	  push   %r15                                               
  82ca:  __switch_to_asm+0xa	  mov    %rsp,0x1670(%rdi)                                  
  82d1:  __switch_to_asm+0x11	  mov    0x1670(%rsi),%rsp                                  
  82d8:  __switch_to_asm+0x18	  mov    0xad8(%rsi),%rbx                                   
  82df:  __switch_to_asm+0x1f	  mov    %rbx,%gs:0x0(%rip)        # 0x82e7 <__stack_chk_guard>
  82e7:  __switch_to_asm+0x27	  | <alternative.82e7>                   | !X86_FEATURE_ALWAYS                  | X86_FEATURE_RSB_CTXSW
  82e7:  __switch_to_asm+0x27	  | jmp    0x8312 <__switch_to_asm+0x52> | NOP1                                 | mov    $0x10,%r12
  82e8:  __switch_to_asm+0x28	  |                                      | NOP1                                 |
  82e9:  __switch_to_asm+0x29	  | NOP1                                 | callq  0x82ef <__switch_to_asm+0x2f> |
  82ea:  __switch_to_asm+0x2a	  | NOP1                                 |                                      |
  82eb:  __switch_to_asm+0x2b	  | NOP1                                 |                                      |
  82ec:  __switch_to_asm+0x2c	  | NOP1                                 |                                      |
  82ed:  __switch_to_asm+0x2d	  | NOP1                                 |                                      |
  82ee:  __switch_to_asm+0x2e	  | NOP1                                 | int3                                 | callq  0x82f4 <__switch_to_asm+0x34>
  82ef:  __switch_to_asm+0x2f	  | NOP1                                 | add    $0x8,%rsp                     |
  82f0:  __switch_to_asm+0x30	  | NOP1                                 |                                      |
  82f1:  __switch_to_asm+0x31	  | NOP1                                 |                                      |
  82f2:  __switch_to_asm+0x32	  | NOP1                                 |                                      |
  82f3:  __switch_to_asm+0x33	  | NOP1                                 | lfence                               | int3
  82f4:  __switch_to_asm+0x34	  | NOP1                                 |                                      | callq  0x82fa <__switch_to_asm+0x3a>
  82f5:  __switch_to_asm+0x35	  | NOP1                                 |                                      |
  82f6:  __switch_to_asm+0x36	  | NOP1                                 |                                      |
  82f7:  __switch_to_asm+0x37	  | NOP1                                 |                                      |
  82f8:  __switch_to_asm+0x38	  | NOP1                                 |                                      |
  82f9:  __switch_to_asm+0x39	  | NOP1                                 |                                      | int3
  82fa:  __switch_to_asm+0x3a	  | NOP1                                 |                                      | add    $0x10,%rsp
  82fb:  __switch_to_asm+0x3b	  | NOP1                                 |                                      |
  82fc:  __switch_to_asm+0x3c	  | NOP1                                 |                                      |
  82fd:  __switch_to_asm+0x3d	  | NOP1                                 |                                      |
  82fe:  __switch_to_asm+0x3e	  | NOP1                                 |                                      | dec    %r12
  82ff:  __switch_to_asm+0x3f	  | NOP1                                 |                                      |
  8300:  __switch_to_asm+0x40	  | NOP1                                 |                                      |
  8301:  __switch_to_asm+0x41	  | NOP1                                 |                                      | jne    0x82ee <__switch_to_asm+0x2e>
  8302:  __switch_to_asm+0x42	  | NOP1                                 |                                      |
  8303:  __switch_to_asm+0x43	  | NOP1                                 |                                      | lfence
  8304:  __switch_to_asm+0x44	  | NOP1                                 |                                      |
  8305:  __switch_to_asm+0x45	  | NOP1                                 |                                      |
  8306:  __switch_to_asm+0x46	  | NOP1                                 |                                      | movq   $0xffffffffffffffff,%gs:0x0(%rip)        # 0x20b <__x86_call_depth>
  8307:  __switch_to_asm+0x47	  | NOP1                                 |                                      |
  8308:  __switch_to_asm+0x48	  | NOP1                                 |                                      |
  8309:  __switch_to_asm+0x49	  | NOP1                                 |                                      |
  830a:  __switch_to_asm+0x4a	  | NOP1                                 |                                      |
  830b:  __switch_to_asm+0x4b	  | NOP1                                 |                                      |
  830c:  __switch_to_asm+0x4c	  | NOP1                                 |                                      |
  830d:  __switch_to_asm+0x4d	  | NOP1                                 |                                      |
  830e:  __switch_to_asm+0x4e	  | NOP1                                 |                                      |
  830f:  __switch_to_asm+0x4f	  | NOP1                                 |                                      |
  8310:  __switch_to_asm+0x50	  | NOP1                                 |                                      |
  8311:  __switch_to_asm+0x51	  | NOP1                                 |                                      |
  8312:  __switch_to_asm+0x52	    pop    %r15                                               
  8314:  __switch_to_asm+0x54	    pop    %r14                                               
  8316:  __switch_to_asm+0x56	    pop    %r13                                               
  8318:  __switch_to_asm+0x58	    pop    %r12                                               
  831a:  __switch_to_asm+0x5a	    pop    %rbx                                               
  831b:  __switch_to_asm+0x5b	    pop    %rbp                                               
  831c:  __switch_to_asm+0x5c	    jmpq   0x8321 <__switch_to>                               


Example 4 (--disas option): Alternative with direct call
--------------------------------------------------------
An alternative with a direct call shows the pv_ops call and
the default pv_ops function for this call.
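The pv_ops[22] index in this output can be derived from the relocation
offset. Assuming pv_ops is laid out as consecutive 8-byte function pointers
on x86-64, the slot is the offset divided by 8: 0xb0 / 8 = 22. The same
arithmetic maps the pv_ops+0x150 reference in Example 5 to slot 42.
pv_ops_slot() is a hypothetical helper for illustration, not the series'
implementation.

```c
/* Assumed slot size: one 8-byte function pointer per pv_ops entry (x86-64). */
#define PV_OPS_SLOT_SIZE 8UL

/*
 * Hypothetical helper: turn the byte offset from a <pv_ops+0x...>
 * reference into a slot index. Returns -1 if the offset is not
 * slot-aligned.
 */
static long pv_ops_slot(unsigned long offset)
{
	if (offset % PV_OPS_SLOT_SIZE)
		return -1;
	return (long)(offset / PV_OPS_SLOT_SIZE);
}
```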

$ ./tools/objtool/objtool --disas=paravirt_read_msr --link vmlinux.o
paravirt_read_msr:
  c3d0:  paravirt_read_msr+0x0      mov    %edi,%edi                                          
  c3d2:  paravirt_read_msr+0x2	    callq  *0x0(%rip)  # 0xc3d8 <pv_ops+0xb0> | callq  pv_ops[22] ~ native_read_msr  (+X86_FEATURE_ALWAYS)   # <alternative.c3d2>
  c3d8:  paravirt_read_msr+0x8	    jmpq   0xc3dd <__x86_return_thunk>                        


Example 5 (--disas option): Alternative with direct call for XENPV
------------------------------------------------------------------
An alternative with a direct call in the XENPV case shows the pv_ops
function for Xen.

early_ioremap_pmd:
 332d0:  early_ioremap_pmd+0x0      push   %rbx                                               
 332d1:  early_ioremap_pmd+0x1	    mov    %rdi,%rbx                                          
 332d4:  early_ioremap_pmd+0x4	    callq  *0x0(%rip)        # 0x332da <pv_ops+0x150> | mov    %cr3,%rax  (!X86_FEATURE_XENPV) | callq  xen_read_cr3  (+X86_FEATURE_ALWAYS)   # <alternative.332d4>
 332da:  early_ioremap_pmd+0xa	    mov    0x0(%rip),%rdx        # 0x332e1 <sme_me_mask>      
 332e1:  early_ioremap_pmd+0x11	    mov    0x0(%rip),%ecx        # 0x332e7 <pgdir_shift>      
 332e7:  early_ioremap_pmd+0x17	    mov    %rbx,%rsi                                          
 332ea:  early_ioremap_pmd+0x1a	    and    0x0(%rip),%rax        # 0x332f1 <physical_mask>    
 332f1:  early_ioremap_pmd+0x21	    not    %rdx                                               
 332f4:  early_ioremap_pmd+0x24	    and    %rdx,%rax                                          
 332f7:  early_ioremap_pmd+0x27	    mov    %rbx,%rdx                                          
 332fa:  early_ioremap_pmd+0x2a	    shr    %cl,%rdx                                           
 332fd:  early_ioremap_pmd+0x2d	    and    $0xfffffffffffff000,%rax                           
 33303:  early_ioremap_pmd+0x33	    add    0x0(%rip),%rax        # 0x3330a <page_offset_base> 
 3330a:  early_ioremap_pmd+0x3a	    and    $0x1ff,%edx                                        
 33310:  early_ioremap_pmd+0x40	    lea    (%rax,%rdx,8),%rdi                                 
 33314:  early_ioremap_pmd+0x44	    callq  0x33319 <p4d_offset+0x0>                           
 33319:  early_ioremap_pmd+0x49	    mov    (%rax),%rdi                                        
 3331c:  early_ioremap_pmd+0x4c	    callq  *0x0(%rip)        # 0x33322 <pv_ops+0x228> | mov    %rdi,%rax  (!X86_FEATURE_XENPV) | callq  __raw_callee_save_xen_p4d_val  (+X86_FEATURE_ALWAYS)   # <alternative.3331c>
 33322:  early_ioremap_pmd+0x52	    mov    0x0(%rip),%rdx        # 0x33329 <page_offset_base> 
 33329:  early_ioremap_pmd+0x59	    and    0x0(%rip),%rax        # 0x33330 <physical_mask>    
 33330:  early_ioremap_pmd+0x60	    and    $0xfffffffffffff000,%rax                           
 33336:  early_ioremap_pmd+0x66	    mov    0xff8(%rax,%rdx,1),%rdi                            
 3333e:  early_ioremap_pmd+0x6e	    callq  *0x0(%rip)        # 0x33344 <pv_ops+0x210> | mov    %rdi,%rax  (!X86_FEATURE_XENPV) | callq  __raw_callee_save_xen_pud_val  (+X86_FEATURE_ALWAYS)   # <alternative.3333e>
 33344:  early_ioremap_pmd+0x74	    mov    0x0(%rip),%rcx        # 0x3334b <physical_mask>    
 3334b:  early_ioremap_pmd+0x7b	    mov    %rcx,%rdx                                          
 3334e:  early_ioremap_pmd+0x7e	    and    $0xfffffffffffff000,%rdx                           
 33355:  early_ioremap_pmd+0x85	    and    $0x80,%dil                                         
 33359:  early_ioremap_pmd+0x89	    je     0x33365 <early_ioremap_pmd+0x95>                   
 3335b:  early_ioremap_pmd+0x8b	    and    $0xffffffffc0000000,%rcx                           
 33362:  early_ioremap_pmd+0x92	    mov    %rcx,%rdx                                          
 33365:  early_ioremap_pmd+0x95	    and    %rax,%rdx                                          
 33368:  early_ioremap_pmd+0x98	    add    0x0(%rip),%rdx        # 0x3336f <page_offset_base> 
 3336f:  early_ioremap_pmd+0x9f	    pop    %rbx                                               
 33370:  early_ioremap_pmd+0xa0	    lea    0xfc8(%rdx),%rax                                   
 33377:  early_ioremap_pmd+0xa7	    jmpq   0x3337c <__x86_return_thunk>                       

-----

Alexandre Chartre (28):
  objtool: Move disassembly functions to a separated file
  objtool: Create disassembly context
  objtool: Disassemble code with libopcodes instead of running objdump
  tool build: Remove annoying newline in build output
  objtool: Print symbol during disassembly
  objtool: Store instruction disassembly result
  objtool: Disassemble instruction on warning or backtrace
  objtool: Extract code to validate instruction from the validate branch
    loop
  objtool: Record symbol name max length
  objtool: Add option to trace function validation
  objtool: Trace instruction state changes during function validation
  objtool: Improve register reporting during function validation
  objtool: Identify the different types of alternatives
  objtool: Improve tracing of alternative instructions
  objtool: Do not validate IBT for .return_sites and .call_sites
  objtool: Add the --disas=<function-pattern> action
  objtool: Print headers for alternatives
  objtool: Disassemble group alternatives
  objtool: Print addresses with alternative instructions
  objtool: Disassemble exception table alternatives
  objtool: Disassemble jump table alternatives
  objtool: Fix address references in alternatives
  objtool: Provide access to feature and flags of group alternatives
  objtool: Function to get the name of a CPU feature
  objtool: Improve naming of group alternatives
  objtool: Get the destination name of a PV call
  objtool: Improve the disassembly of the pv_ops call
  objtool: Print single line for alternatives with one instruction

-- 
2.43.5
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by Josh Poimboeuf 2 months, 3 weeks ago
On Thu, Nov 13, 2025 at 05:48:49PM +0100, Alexandre Chartre wrote:
> Changes:
> ========
> 
> V4:
> ---
> This version fixes a build issue when disassembly is not available. Compared
> with V3, this is addressed by changes in patch 14 (objtool: Improve tracing
> of alternative instructions). Other patches are similar to V3.

For the next revision, please base on tip/master, as there are some
major objtool changes pending for the next merge window.

Most of my comments below are bikeshedding, they are not required for
the next revision and can be addressed in followup patch sets if you'd
rather do it that way.

> - Each alternative of a group alternative is displayed with its feature
>   name and flags: <flags><feature-name>
> 
>   <flags> is made of the following characters:
> 
>     '!' : ALT_FLAG_NOT
>     '+' : ALT_FLAG_DIRECT_CALL
>     '?' : unknown flag (i.e. any other flags)

Other than '!', the meaning of the flags isn't intuitive.  Maybe it
should just show the source code names:

  ALT_NOT(X86_FEATURE_FOO)

  ALT_DIRECT_CALL(X86_FEATURE_BAR)

  ALT_UNKNOWN_FLAG(X86_FEATURE_BAZ)

?

> - If an alternative is a jump table then "JUMP" is used as the feature
>   name.

Hm, it's a bit confusing to label a jump label as an "alternative" as
those are two distinct things (though I'm aware that objtool conflates
the two).

> - If an alternative is an exception table then "EXCEPTION" is used as the
>   feature name.

Ditto.

> Disassembly can show default alternative jumping to .altinstr_aux
> -----------------------------------------------------------------
> Disassembly can show a default alternative jumping to .altinstr_aux. This
> happens when the _static_cpu_has() function is used. Its default code
> jumps to .altinstr_aux where a test sequence is executed (test; jnz; jmp).
> 
> At runtime, this sequence is not used because _static_cpu_has() also has
> an alternative with the X86_FEATURE_ALWAYS feature.
> 
> 
>   debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> | NOP5  (X86_FEATURE_HYBRID_CPU) | jmpq   0x61a <perf_get_x86_pmu_capability+0x37>  (X86_FEATURE_ALWAYS)   # <alternative.debc>

I'm finding this one-line format considerably more difficult to parse
than the slightly longer two-line form:

    debc:  perf_get_x86_pmu_capability+0xc      <alternative.debc>		   | X86_FEATURE_HYBRID_CPU | X86_FEATURE_ALWAYS
    debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> | NOP5		    | jmpq   0x61a <perf_get_x86_pmu_capability+0x37>

Also, I wonder if we can make NOP5 lowercase (nop5), since it really is
just an instruction, not something special like a feature.
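For reference, here is a minimal C sketch of how the two-line form above could be produced. It assumes each column is reduced to a header cell (the group label or a feature name) and one instruction string. The struct and function names are hypothetical, not objtool's, and the address/symbol prefix is left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of one column: the text for the header row (group
 * label or feature name) and the single instruction shown below it.
 * This is not objtool's struct alternative.
 */
struct alt_col {
	const char *label;
	const char *insn;
};

/*
 * Fill 'hdr' and 'body' with the two rows. Each column except the last is
 * padded to the width of its longer cell so the '|' separators line up.
 * Returns 0 on success or -1 if a buffer is too small.
 */
static int format_two_line(const struct alt_col *cols, int n,
			   char *hdr, size_t hdr_sz, char *body, size_t body_sz)
{
	size_t hlen = 0, blen = 0;

	if (hdr_sz == 0 || body_sz == 0)
		return -1;
	hdr[0] = body[0] = '\0';

	for (int i = 0; i < n; i++) {
		int last = (i == n - 1);
		int w = 0;
		int r;

		if (!last) {
			int wl = (int)strlen(cols[i].label);
			int wi = (int)strlen(cols[i].insn);

			w = wl > wi ? wl : wi;
		}

		r = snprintf(hdr + hlen, hdr_sz - hlen, "%-*s%s",
			     w, cols[i].label, last ? "" : " | ");
		if (r < 0 || (size_t)r >= hdr_sz - hlen)
			return -1;
		hlen += (size_t)r;

		r = snprintf(body + blen, body_sz - blen, "%-*s%s",
			     w, cols[i].insn, last ? "" : " | ");
		if (r < 0 || (size_t)r >= body_sz - blen)
			return -1;
		blen += (size_t)r;
	}
	return 0;
}
```

The separators stay aligned across the two rows because each column is padded to its widest cell. A real implementation would also print the address/symbol prefix on both lines.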

> Disassembly can show alternative jumping to the next instruction
> ----------------------------------------------------------------
> 
> The disassembly can show jump tables with an alternative which jumps
> to the next instruction.
> 
> For example:
> 
> def9:  perf_get_x86_pmu_capability+0x49    NOP2 | jmp    defb <perf_get_x86_pmu_capability+0x4b>  (JUMP)   # <alternative.def9>

I'm also struggling to read this one.

Maybe this needs a two-line form as well:

  def9:  perf_get_x86_pmu_capability+0x49    <static_branch.def9> |
  def9:  perf_get_x86_pmu_capability+0x49    NOP2		  | jmp    defb <perf_get_x86_pmu_capability+0x4b>
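The jump in this example is effectively a no-op. The 2-byte jmp at 0xdef9 targets 0xdefb, which is 0xdef9 + 2, the address of the next instruction. A minimal sketch of that check, with a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a jump at 'addr' of 'len' bytes targets the very next
 * instruction, which makes it equivalent to a NOP of the same size.
 */
static bool jump_is_fallthrough(uint64_t addr, unsigned int len, uint64_t dest)
{
	return dest == addr + len;
}
```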

> Example 2 (--disas option): Single Instruction Alternatives
> -----------------------------------------------------------

I would like to convert this to a dedicated "disas" subcommand which can
be run like "objtool disas <func>" or so.  But again that can probably
be done in a followup.
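As a rough illustration of the subcommand idea, a table-driven dispatcher in C could look like the following. The handler names and the "objtool disas <function-pattern> <object-file>" usage are assumptions made for this sketch, not existing objtool code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical handlers; the real objtool entry points will differ. */
static int cmd_check(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	return 0;
}

/* Assumed usage: objtool disas <function-pattern> <object-file> */
static int cmd_disas(int argc, const char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: objtool disas <function-pattern> <object-file>\n");
		return -1;
	}
	printf("would disassemble '%s' in %s\n", argv[1], argv[2]);
	return 0;
}

struct subcmd {
	const char *name;
	int (*fn)(int argc, const char **argv);
};

static const struct subcmd subcmds[] = {
	{ "check", cmd_check },
	{ "disas", cmd_disas },
};

/* Dispatch on argv[1]; the handler sees argv shifted by one. */
static int run_subcmd(int argc, const char **argv)
{
	if (argc < 2)
		return -1;
	for (size_t i = 0; i < sizeof(subcmds) / sizeof(subcmds[0]); i++)
		if (strcmp(argv[1], subcmds[i].name) == 0)
			return subcmds[i].fn(argc - 1, argv + 1);
	fprintf(stderr, "unknown subcommand '%s'\n", argv[1]);
	return -1;
}
```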

> Example 3 (--disas option): Alternatives with multiple instructions
> -------------------------------------------------------------------
> Alternatives with multiple instructions are displayed side-by-side, with
> a header describing the alternative. The code in the first column is the
> default code of the alternative.
> 
> 
> $ ./tools/objtool/objtool --disas=__switch_to_asm --link vmlinux.o
> __switch_to_asm:
>   82c0:  __switch_to_asm+0x0      push   %rbp                                               
>   82c1:  __switch_to_asm+0x1	  push   %rbx                                               
>   82c2:  __switch_to_asm+0x2	  push   %r12                                               
>   82c4:  __switch_to_asm+0x4	  push   %r13                                               
>   82c6:  __switch_to_asm+0x6	  push   %r14                                               
>   82c8:  __switch_to_asm+0x8	  push   %r15                                               
>   82ca:  __switch_to_asm+0xa	  mov    %rsp,0x1670(%rdi)                                  
>   82d1:  __switch_to_asm+0x11	  mov    0x1670(%rsi),%rsp                                  
>   82d8:  __switch_to_asm+0x18	  mov    0xad8(%rsi),%rbx                                   
>   82df:  __switch_to_asm+0x1f	  mov    %rbx,%gs:0x0(%rip)        # 0x82e7 <__stack_chk_guard>
>   82e7:  __switch_to_asm+0x27	  | <alternative.82e7>                   | !X86_FEATURE_ALWAYS                  | X86_FEATURE_RSB_CTXSW

Are the alternatives swapped?  I believe this comes from the following
code, so the !X86_FEATURE_ALWAYS column should be last?

.macro FILL_RETURN_BUFFER reg:req nr:req ftr:req ftr2=ALT_NOT(X86_FEATURE_ALWAYS)
	ALTERNATIVE_2 "jmp .Lskip_rsb_\@", \
		__stringify(__FILL_RETURN_BUFFER(\reg,\nr)), \ftr, \
		__stringify(nop;nop;__FILL_ONE_RETURN), \ftr2

.Lskip_rsb_\@:
.endm

>   82e7:  __switch_to_asm+0x27	  | jmp    0x8312 <__switch_to_asm+0x52> | NOP1                                 | mov    $0x10,%r12
>   82e8:  __switch_to_asm+0x28	  |                                      | NOP1                                 |
>   82e9:  __switch_to_asm+0x29	  | NOP1                                 | callq  0x82ef <__switch_to_asm+0x2f> |
>   82ea:  __switch_to_asm+0x2a	  | NOP1                                 |                                      |
>   82eb:  __switch_to_asm+0x2b	  | NOP1                                 |                                      |
>   82ec:  __switch_to_asm+0x2c	  | NOP1                                 |                                      |
>   82ed:  __switch_to_asm+0x2d	  | NOP1                                 |                                      |
>   82ee:  __switch_to_asm+0x2e	  | NOP1                                 | int3                                 | callq  0x82f4 <__switch_to_asm+0x34>
>   82ef:  __switch_to_asm+0x2f	  | NOP1                                 | add    $0x8,%rsp                     |
>   82f0:  __switch_to_asm+0x30	  | NOP1                                 |                                      |
>   82f1:  __switch_to_asm+0x31	  | NOP1                                 |                                      |
>   82f2:  __switch_to_asm+0x32	  | NOP1                                 |                                      |
>   82f3:  __switch_to_asm+0x33	  | NOP1                                 | lfence                               | int3
>   82f4:  __switch_to_asm+0x34	  | NOP1                                 |                                      | callq  0x82fa <__switch_to_asm+0x3a>
>   82f5:  __switch_to_asm+0x35	  | NOP1                                 |                                      |
>   82f6:  __switch_to_asm+0x36	  | NOP1                                 |                                      |
>   82f7:  __switch_to_asm+0x37	  | NOP1                                 |                                      |
>   82f8:  __switch_to_asm+0x38	  | NOP1                                 |                                      |
>   82f9:  __switch_to_asm+0x39	  | NOP1                                 |                                      | int3
>   82fa:  __switch_to_asm+0x3a	  | NOP1                                 |                                      | add    $0x10,%rsp
>   82fb:  __switch_to_asm+0x3b	  | NOP1                                 |                                      |
>   82fc:  __switch_to_asm+0x3c	  | NOP1                                 |                                      |
>   82fd:  __switch_to_asm+0x3d	  | NOP1                                 |                                      |
>   82fe:  __switch_to_asm+0x3e	  | NOP1                                 |                                      | dec    %r12
>   82ff:  __switch_to_asm+0x3f	  | NOP1                                 |                                      |
>   8300:  __switch_to_asm+0x40	  | NOP1                                 |                                      |
>   8301:  __switch_to_asm+0x41	  | NOP1                                 |                                      | jne    0x82ee <__switch_to_asm+0x2e>
>   8302:  __switch_to_asm+0x42	  | NOP1                                 |                                      |
>   8303:  __switch_to_asm+0x43	  | NOP1                                 |                                      | lfence
>   8304:  __switch_to_asm+0x44	  | NOP1                                 |                                      |
>   8305:  __switch_to_asm+0x45	  | NOP1                                 |                                      |
>   8306:  __switch_to_asm+0x46	  | NOP1                                 |                                      | movq   $0xffffffffffffffff,%gs:0x0(%rip)        # 0x20b <__x86_call_depth>
>   8307:  __switch_to_asm+0x47	  | NOP1                                 |                                      |
>   8308:  __switch_to_asm+0x48	  | NOP1                                 |                                      |
>   8309:  __switch_to_asm+0x49	  | NOP1                                 |                                      |
>   830a:  __switch_to_asm+0x4a	  | NOP1                                 |                                      |
>   830b:  __switch_to_asm+0x4b	  | NOP1                                 |                                      |
>   830c:  __switch_to_asm+0x4c	  | NOP1                                 |                                      |
>   830d:  __switch_to_asm+0x4d	  | NOP1                                 |                                      |
>   830e:  __switch_to_asm+0x4e	  | NOP1                                 |                                      |
>   830f:  __switch_to_asm+0x4f	  | NOP1                                 |                                      |
>   8310:  __switch_to_asm+0x50	  | NOP1                                 |                                      |
>   8311:  __switch_to_asm+0x51	  | NOP1                                 |                                      |

I like this a lot, but I think it could be vertically compressed quite a
bit, and superfluous NOPs removed:

    82e7:  __switch_to_asm+0x27	  | <alternative.82e7>                   | !X86_FEATURE_ALWAYS                  | X86_FEATURE_RSB_CTXSW
    82e7:  __switch_to_asm+0x27	  | jmp    0x8312 <__switch_to_asm+0x52> | nop1                                 | mov    $0x10,%r12
    82e8:  __switch_to_asm+0x28	  |                                      | nop1                                 |
    82e9:  __switch_to_asm+0x29	  |                                      | callq  0x82ef <__switch_to_asm+0x2f> |
    82ee:  __switch_to_asm+0x2e	  |                                      | int3                                 | callq  0x82f4 <__switch_to_asm+0x34>
    82ef:  __switch_to_asm+0x2f	  |                                      | add    $0x8,%rsp                     |
    82f3:  __switch_to_asm+0x33	  |                                      | lfence                               | int3
    82f4:  __switch_to_asm+0x34	  |                                      |                                      | callq  0x82fa <__switch_to_asm+0x3a>
    82f9:  __switch_to_asm+0x39	  |                                      |                                      | int3
    82fa:  __switch_to_asm+0x3a	  |                                      |                                      | add    $0x10,%rsp
    82fe:  __switch_to_asm+0x3e	  |                                      |                                      | dec    %r12
    8301:  __switch_to_asm+0x41	  |                                      |                                      | jne    0x82ee <__switch_to_asm+0x2e>
    8303:  __switch_to_asm+0x43	  |                                      |                                      | lfence
    8306:  __switch_to_asm+0x46	  |                                      |                                      | movq   $0xffffffffffffffff,%gs:0x0(%rip)        # 0x20b <__x86_call_depth>

That reads much nicer to me.

-- 
Josh
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by Alexandre Chartre 2 months, 3 weeks ago
On 11/14/25 02:48, Josh Poimboeuf wrote:
> On Thu, Nov 13, 2025 at 05:48:49PM +0100, Alexandre Chartre wrote:
>> Changes:
>> ========
>>
>> V4:
>> ---
>> This version fixes a build issue when disassembly is not available. Compared
>> with V3, this is addressed by changes in patch 14 (objtool: Improve tracing
>> of alternative instructions). Other patches are similar to V3.
> 
> For the next revision, please base on tip/master, as there are some
> major objtool changes pending for the next merge window.

Ok, I will rebase the next revision on tip/master.


> Most of my comments below are bikeshedding, they are not required for
> the next revision and can be addressed in followup patch sets if you'd
> rather do it that way.

If the changes are simple, I will try to address them immediately;
otherwise I will defer them to follow-up patches.


>> - Each alternative of a group alternative is displayed with its feature
>>    name and flags: <flags><feature-name>
>>
>>    <flags> is made of the following characters:
>>
>>      '!' : ALT_FLAG_NOT
>>      '+' : ALT_FLAG_DIRECT_CALL
>>      '?' : unknown flag (i.e. any other flags)
> 
> Other than '!', the meaning of the flags isn't intuitive.  Maybe it
> should just show the source code names:
> 
>    ALT_NOT(X86_FEATURE_FOO)
> 
>    ALT_DIRECT_CALL(X86_FEATURE_BAR)
> 
>    ALT_UNKNOWN_FLAG(X86_FEATURE_BAZ)
> 

I think '?' is meaningful too, but I wasn't sure about '+'.

I am using single characters to keep the alternative name short. It can already
be fairly long because of the feature name (like "X86_FEATURE_SPEC_STORE_BYPASS_DISABLE")

Also I am assuming that flags can be combined (although that's not currently
the case) so that would be more difficult with full ALT_* names and the
result would be much longer.
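Here is a minimal sketch of the single-character scheme. It assumes flags can be combined and are printed in a fixed order. The ALT_FLAG_* values below are placeholders for the sketch and are not necessarily the kernel's.

```c
#include <stdio.h>
#include <string.h>

/*
 * Flag values assumed for this sketch; the kernel defines its own
 * ALT_FLAG_* values in the x86 alternative headers.
 */
#define ALT_FLAG_NOT		(1u << 0)
#define ALT_FLAG_DIRECT_CALL	(1u << 1)

/*
 * Format "<flags><feature-name>": '!' for ALT_FLAG_NOT, '+' for
 * ALT_FLAG_DIRECT_CALL and one '?' for any other bits. Prefix characters
 * are emitted in a fixed order so combined flags stay short ("!+").
 * Returns the length written, or -1 if 'buf' is too small.
 */
static int format_alt_name(char *buf, size_t size, unsigned int flags,
			   const char *feature)
{
	char prefix[4];
	size_t n = 0;

	if (flags & ALT_FLAG_NOT)
		prefix[n++] = '!';
	if (flags & ALT_FLAG_DIRECT_CALL)
		prefix[n++] = '+';
	if (flags & ~(ALT_FLAG_NOT | ALT_FLAG_DIRECT_CALL))
		prefix[n++] = '?';
	prefix[n] = '\0';

	if (n + strlen(feature) + 1 > size)
		return -1;
	return snprintf(buf, size, "%s%s", prefix, feature);
}
```

Even with every flag set, the prefix never grows beyond three characters, which keeps names such as "X86_FEATURE_SPEC_STORE_BYPASS_DISABLE" readable.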


>> - If an alternative is a jump table then "JUMP" is used as the feature
>>    name.
> 
> Hm, it's a bit confusing to label a jump label as an "alternative" as
> those are two distinct things (though I'm aware that objtool conflates
> the two).
> 
>> - If an alternative is an exception table then "EXCEPTION" is used as the
>>    feature name.
> 
> Ditto.
> 

Yes, the wording is not good; I use it just because objtool handles jump
labels and exception tables as alternatives. I will reword it to something
better.


>> Disassembly can show default alternative jumping to .altinstr_aux
>> -----------------------------------------------------------------
>> Disassembly can show a default alternative jumping to .altinstr_aux. This
>> happens when the _static_cpu_has() function is used. Its default code
>> jumps to .altinstr_aux where a test sequence is executed (test; jnz; jmp).
>>
>> At runtime, this sequence is not used because _static_cpu_has() also has
>> an alternative with the X86_FEATURE_ALWAYS feature.
>>
>>
>>    debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> | NOP5  (X86_FEATURE_HYBRID_CPU) | jmpq   0x61a <perf_get_x86_pmu_capability+0x37>  (X86_FEATURE_ALWAYS)   # <alternative.debc>
> 
> I'm finding this one-line format considerably more difficult to parse
> than the slightly longer two-line form:
> 
>      debc:  perf_get_x86_pmu_capability+0xc      <alternative.debc>		   | X86_FEATURE_HYBRID_CPU | X86_FEATURE_ALWAYS
>      debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> | NOP5		    | jmpq   0x61a <perf_get_x86_pmu_capability+0x37>


Another option could be:

     debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> (<alternative.debc>) |
                                                 NOP5  (X86_FEATURE_HYBRID_CPU) |
                                                 jmpq   0x61a <perf_get_x86_pmu_capability+0x37>  (X86_FEATURE_ALWAYS)

I think I will use this option when displaying alternatives one after the other,
and your suggestion when displaying side-by-side, and add an option to select
the display.

> 
> Also, I wonder if we can make NOP5 lowercase (nop5), since it really is
> just an instruction, not something special like a feature.

This indicates that this is a pseudo instruction: NOP5 is actually nopl 0x00(%eax,%eax,1).
Even NOP1 can be a simple nop but also xchg %rax,%rax.


>> Disassembly can show alternative jumping to the next instruction
>> ----------------------------------------------------------------
>>
>> The disassembly can show jump tables with an alternative which jumps
>> to the next instruction.
>>
>> For example:
>>
>> def9:  perf_get_x86_pmu_capability+0x49    NOP2 | jmp    defb <perf_get_x86_pmu_capability+0x4b>  (JUMP)   # <alternative.def9>
> 
> I'm also struggling to read this one.
> 
> Maybe this needs a two-line form as well:
> 
>    def9:  perf_get_x86_pmu_capability+0x49    <static_branch.def9> |
>    def9:  perf_get_x86_pmu_capability+0x49    NOP2		  | jmp    defb <perf_get_x86_pmu_capability+0x4b>
  
I will do something similar to what you suggested above.


>> Example 2 (--disas option): Single Instruction Alternatives
>> -----------------------------------------------------------
> 
> I would like to convert this to a dedicated "disas" subcommand which can
> be run like "objtool disas <func>" or so.  But again that can probably
> be done in a followup.

Ok, I will look at it.


>> Example 3 (--disas option): Alternatives with multiple instructions
>> -------------------------------------------------------------------
>> Alternatives with multiple instructions are displayed side-by-side, with
>> a header describing the alternative. The code in the first column is the
>> default code of the alternative.
>>
>>
>> $ ./tools/objtool/objtool --disas=__switch_to_asm --link vmlinux.o
>> __switch_to_asm:
>>    82c0:  __switch_to_asm+0x0      push   %rbp
>>    82c1:  __switch_to_asm+0x1	  push   %rbx
>>    82c2:  __switch_to_asm+0x2	  push   %r12
>>    82c4:  __switch_to_asm+0x4	  push   %r13
>>    82c6:  __switch_to_asm+0x6	  push   %r14
>>    82c8:  __switch_to_asm+0x8	  push   %r15
>>    82ca:  __switch_to_asm+0xa	  mov    %rsp,0x1670(%rdi)
>>    82d1:  __switch_to_asm+0x11	  mov    0x1670(%rsi),%rsp
>>    82d8:  __switch_to_asm+0x18	  mov    0xad8(%rsi),%rbx
>>    82df:  __switch_to_asm+0x1f	  mov    %rbx,%gs:0x0(%rip)        # 0x82e7 <__stack_chk_guard>
>>    82e7:  __switch_to_asm+0x27	  | <alternative.82e7>                   | !X86_FEATURE_ALWAYS                  | X86_FEATURE_RSB_CTXSW
> 
> Are the alternatives swapped?  I believe this comes from the following
> code, so the !X86_FEATURE_ALWAYS column should be last?
> 
> .macro FILL_RETURN_BUFFER reg:req nr:req ftr:req ftr2=ALT_NOT(X86_FEATURE_ALWAYS)
> 	ALTERNATIVE_2 "jmp .Lskip_rsb_\@", \
> 		__stringify(__FILL_RETURN_BUFFER(\reg,\nr)), \ftr, \
> 		__stringify(nop;nop;__FILL_ONE_RETURN), \ftr2
> 
> .Lskip_rsb_\@:
> .endm

I will check, but I process/print alternatives in the order provided by
objtool (in struct alternative).


>>    82e7:  __switch_to_asm+0x27	  | jmp    0x8312 <__switch_to_asm+0x52> | NOP1                                 | mov    $0x10,%r12
>>    82e8:  __switch_to_asm+0x28	  |                                      | NOP1                                 |
>>    82e9:  __switch_to_asm+0x29	  | NOP1                                 | callq  0x82ef <__switch_to_asm+0x2f> |
>>    82ea:  __switch_to_asm+0x2a	  | NOP1                                 |                                      |
>>    82eb:  __switch_to_asm+0x2b	  | NOP1                                 |                                      |
>>    82ec:  __switch_to_asm+0x2c	  | NOP1                                 |                                      |
>>    82ed:  __switch_to_asm+0x2d	  | NOP1                                 |                                      |
>>    82ee:  __switch_to_asm+0x2e	  | NOP1                                 | int3                                 | callq  0x82f4 <__switch_to_asm+0x34>
>>    82ef:  __switch_to_asm+0x2f	  | NOP1                                 | add    $0x8,%rsp                     |
>>    82f0:  __switch_to_asm+0x30	  | NOP1                                 |                                      |
>>    82f1:  __switch_to_asm+0x31	  | NOP1                                 |                                      |
>>    82f2:  __switch_to_asm+0x32	  | NOP1                                 |                                      |
>>    82f3:  __switch_to_asm+0x33	  | NOP1                                 | lfence                               | int3
>>    82f4:  __switch_to_asm+0x34	  | NOP1                                 |                                      | callq  0x82fa <__switch_to_asm+0x3a>
>>    82f5:  __switch_to_asm+0x35	  | NOP1                                 |                                      |
>>    82f6:  __switch_to_asm+0x36	  | NOP1                                 |                                      |
>>    82f7:  __switch_to_asm+0x37	  | NOP1                                 |                                      |
>>    82f8:  __switch_to_asm+0x38	  | NOP1                                 |                                      |
>>    82f9:  __switch_to_asm+0x39	  | NOP1                                 |                                      | int3
>>    82fa:  __switch_to_asm+0x3a	  | NOP1                                 |                                      | add    $0x10,%rsp
>>    82fb:  __switch_to_asm+0x3b	  | NOP1                                 |                                      |
>>    82fc:  __switch_to_asm+0x3c	  | NOP1                                 |                                      |
>>    82fd:  __switch_to_asm+0x3d	  | NOP1                                 |                                      |
>>    82fe:  __switch_to_asm+0x3e	  | NOP1                                 |                                      | dec    %r12
>>    82ff:  __switch_to_asm+0x3f	  | NOP1                                 |                                      |
>>    8300:  __switch_to_asm+0x40	  | NOP1                                 |                                      |
>>    8301:  __switch_to_asm+0x41	  | NOP1                                 |                                      | jne    0x82ee <__switch_to_asm+0x2e>
>>    8302:  __switch_to_asm+0x42	  | NOP1                                 |                                      |
>>    8303:  __switch_to_asm+0x43	  | NOP1                                 |                                      | lfence
>>    8304:  __switch_to_asm+0x44	  | NOP1                                 |                                      |
>>    8305:  __switch_to_asm+0x45	  | NOP1                                 |                                      |
>>    8306:  __switch_to_asm+0x46	  | NOP1                                 |                                      | movq   $0xffffffffffffffff,%gs:0x0(%rip)        # 0x20b <__x86_call_depth>
>>    8307:  __switch_to_asm+0x47	  | NOP1                                 |                                      |
>>    8308:  __switch_to_asm+0x48	  | NOP1                                 |                                      |
>>    8309:  __switch_to_asm+0x49	  | NOP1                                 |                                      |
>>    830a:  __switch_to_asm+0x4a	  | NOP1                                 |                                      |
>>    830b:  __switch_to_asm+0x4b	  | NOP1                                 |                                      |
>>    830c:  __switch_to_asm+0x4c	  | NOP1                                 |                                      |
>>    830d:  __switch_to_asm+0x4d	  | NOP1                                 |                                      |
>>    830e:  __switch_to_asm+0x4e	  | NOP1                                 |                                      |
>>    830f:  __switch_to_asm+0x4f	  | NOP1                                 |                                      |
>>    8310:  __switch_to_asm+0x50	  | NOP1                                 |                                      |
>>    8311:  __switch_to_asm+0x51	  | NOP1                                 |                                      |
> 
> I like this a lot, but I think it could be vertically compressed quite a
> bit, and superfluous NOPs removed:
> 
>      82e7:  __switch_to_asm+0x27	  | <alternative.82e7>                   | !X86_FEATURE_ALWAYS                  | X86_FEATURE_RSB_CTXSW
>      82e7:  __switch_to_asm+0x27	  | jmp    0x8312 <__switch_to_asm+0x52> | nop1                                 | mov    $0x10,%r12
>      82e8:  __switch_to_asm+0x28	  |                                      | nop1                                 |
>      82e9:  __switch_to_asm+0x29	  |                                      | callq  0x82ef <__switch_to_asm+0x2f> |
>      82ee:  __switch_to_asm+0x2e	  |                                      | int3                                 | callq  0x82f4 <__switch_to_asm+0x34>
>      82ef:  __switch_to_asm+0x2f	  |                                      | add    $0x8,%rsp                     |
>      82f3:  __switch_to_asm+0x33	  |                                      | lfence                               | int3
>      82f4:  __switch_to_asm+0x34	  |                                      |                                      | callq  0x82fa <__switch_to_asm+0x3a>
>      82f9:  __switch_to_asm+0x39	  |                                      |                                      | int3
>      82fa:  __switch_to_asm+0x3a	  |                                      |                                      | add    $0x10,%rsp
>      82fe:  __switch_to_asm+0x3e	  |                                      |                                      | dec    %r12
>      8301:  __switch_to_asm+0x41	  |                                      |                                      | jne    0x82ee <__switch_to_asm+0x2e>
>      8303:  __switch_to_asm+0x43	  |                                      |                                      | lfence
>      8306:  __switch_to_asm+0x46	  |                                      |                                      | movq   $0xffffffffffffffff,%gs:0x0(%rip)        # 0x20b <__x86_call_depth>
> 
> That reads much nicer to me.
> 

Yeah, better. I can easily do that by getting rid of trailing NOPs.
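Here is a minimal sketch of the trimming. It assumes the side-by-side view is held as a table of rows with one cell per alternative, with NULL marking a blank cell. This is a hypothetical layout, not objtool's. The sketch does three things:

* It blanks NOP padding at the end of each column.
* It keeps each column's first instruction, even if that instruction is a NOP.
* It drops rows that end up completely blank.

On the __switch_to_asm example, this should give the same row selection as the compressed view above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_COLS 4

/* Hypothetical row: cell[c] is the text for alternative c, or NULL. */
struct alt_row {
	unsigned long offset;
	const char *cell[MAX_COLS];
};

static bool is_nop(const char *s)
{
	return s && (strncmp(s, "NOP", 3) == 0 || strncmp(s, "nop", 3) == 0);
}

/*
 * Blank trailing NOPs in each column (keeping the column's first
 * instruction), then compact the table by dropping all-blank rows.
 * Returns the new number of rows.
 */
static size_t trim_trailing_nops(struct alt_row *rows, size_t nrows, int ncols)
{
	size_t out = 0;

	for (int c = 0; c < ncols; c++) {
		size_t first = 0;

		while (first < nrows && rows[first].cell[c] == NULL)
			first++;
		/* Walk backwards from the last row down to first + 1. */
		for (size_t r = nrows; r-- > first + 1; ) {
			if (rows[r].cell[c] == NULL)
				continue;
			if (!is_nop(rows[r].cell[c]))
				break;
			rows[r].cell[c] = NULL;
		}
	}

	for (size_t r = 0; r < nrows; r++) {
		bool blank = true;

		for (int c = 0; c < ncols; c++)
			if (rows[r].cell[c] != NULL)
				blank = false;
		if (!blank)
			rows[out++] = rows[r];
	}
	return out;
}
```

The intentional "nop; nop" at the start of the second column survives because it is not trailing padding.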

Thanks,

alex.
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by Josh Poimboeuf 2 months, 3 weeks ago
On Fri, Nov 14, 2025 at 10:56:48AM +0100, Alexandre Chartre wrote:
> > Other than '!', the meaning of the flags isn't intuitive.  Maybe it
> > should just show the source code names:
> > 
> >    ALT_NOT(X86_FEATURE_FOO)
> > 
> >    ALT_DIRECT_CALL(X86_FEATURE_BAR)
> > 
> >    ALT_UNKNOWN_FLAG(X86_FEATURE_BAZ)
> > 
> 
> I think '?' is meaningful too, but I wasn't sure about '+'.
> 
> I am using single characters to keep the alternative name short. It can already
> be fairly long because of the feature name (like "X86_FEATURE_SPEC_STORE_BYPASS_DISABLE")
> 
> Also I am assuming that flags can be combined (although that's not currently
> the case) so that would be more difficult with full ALT_* names and the
> result would be much longer.

Ok.

> > > - If an alternative is a jump table then "JUMP" is used as the feature
> > >    name.
> > 
> > Hm, it's a bit confusing to label a jump label as an "alternative" as
> > those are two distinct things (though I'm aware that objtool conflates
> > the two).
> > 
> > > - If an alternative is an exception table then "EXCEPTION" is used as the
> > >    feature name.
> > 
> > Ditto.
> > 
> 
> Yes, the wording is not good; I use it just because objtool handles jump
> labels and exception tables as alternatives. I will reword it to something
> better.

I meant not only the wording here, but also the "<alternative.def9>"
labels shown in the disassembly.
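One way to address this is to pick the label from the entry type, assuming the disassembler knows which kind of entry it is printing. Jump labels would then show up as "<static_branch.def9>", as suggested earlier. The enum and the "exception" label are hypothetical.

```c
#include <stdio.h>

/* Hypothetical classification of the entries objtool handles as alternatives. */
enum alt_kind {
	ALT_KIND_ALTERNATIVE,	/* .altinstructions entry */
	ALT_KIND_JUMP_LABEL,	/* static branch (jump label) */
	ALT_KIND_EXCEPTION,	/* exception table entry */
};

static const char *alt_kind_label(enum alt_kind kind)
{
	switch (kind) {
	case ALT_KIND_ALTERNATIVE:
		return "alternative";
	case ALT_KIND_JUMP_LABEL:
		return "static_branch";
	case ALT_KIND_EXCEPTION:
		return "exception";
	}
	return "unknown";
}

/* Produce e.g. "<static_branch.def9>" instead of "<alternative.def9>". */
static int format_alt_label(char *buf, size_t size, enum alt_kind kind,
			    unsigned long offset)
{
	return snprintf(buf, size, "<%s.%lx>", alt_kind_label(kind), offset);
}
```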

> > > Disassembly can show default alternative jumping to .altinstr_aux
> > > -----------------------------------------------------------------
> > > Disassembly can show a default alternative jumping to .altinstr_aux. This
> > > happens when the _static_cpu_has() function is used. Its default code
> > > jumps to .altinstr_aux where a test sequence is executed (test; jnz; jmp).
> > > 
> > > At runtime, this sequence is not used because _static_cpu_has() also has
> > > an alternative with the X86_FEATURE_ALWAYS feature.
> > > 
> > > 
> > >    debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> | NOP5  (X86_FEATURE_HYBRID_CPU) | jmpq   0x61a <perf_get_x86_pmu_capability+0x37>  (X86_FEATURE_ALWAYS)   # <alternative.debc>
> > 
> > I'm finding this one-line format considerably more difficult to parse
> > than the slightly longer two-line form:
> > 
> >      debc:  perf_get_x86_pmu_capability+0xc      <alternative.debc>		   | X86_FEATURE_HYBRID_CPU | X86_FEATURE_ALWAYS
> >      debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> | NOP5		    | jmpq   0x61a <perf_get_x86_pmu_capability+0x37>
> 
> 
> Another option could be:
> 
>     debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> (<alternative.debc>) |
>                                                 NOP5  (X86_FEATURE_HYBRID_CPU) |
>                                                 jmpq   0x61a <perf_get_x86_pmu_capability+0x37>  (X86_FEATURE_ALWAYS)
> 
> I think I will use this option when displaying alternatives one after the other,
> and your suggestion when displaying side-by-side, and add an option to select
> the display.

Hm, but how would that "one after the other" display look for an
alternative with multiple instructions?  A "compact vs wide" option is
ok, but within each of those I think it's helpful to use a consistent
format regardless of whether the alternative has one or multiple
instructions.
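Here is a minimal sketch of that principle. It assumes a hypothetical struct alt (a label plus an instruction list) and a fixed column width. The layout option selects compact or wide output, and the same code path handles one or many instructions.

```c
#include <stdarg.h>
#include <stdio.h>

/* Small append-only output buffer; output is truncated when full. */
struct sbuf {
	char *buf;
	size_t size;
	size_t len;
};

static void sb_printf(struct sbuf *sb, const char *fmt, ...)
{
	va_list ap;
	size_t room;
	int n;

	if (sb->len + 1 >= sb->size)
		return;
	room = sb->size - sb->len;
	va_start(ap, fmt);
	n = vsnprintf(sb->buf + sb->len, room, fmt, ap);
	va_end(ap);
	if (n > 0)
		sb->len += (size_t)n < room ? (size_t)n : room - 1;
}

/* Hypothetical model: one alternative is a label plus its instructions. */
struct alt {
	const char *label;
	const char *const *insns;
	int ninsns;
};

enum alt_layout { ALT_LAYOUT_COMPACT, ALT_LAYOUT_WIDE };

#define COL_WIDTH 24	/* fixed column width, assumed for simplicity */

/* Single- and multi-instruction alternatives share one code path. */
static void render_alts(struct sbuf *sb, const struct alt *alts, int n,
			enum alt_layout layout)
{
	int rows = 0;

	if (layout == ALT_LAYOUT_COMPACT) {
		/* One alternative after the other: label, then its code. */
		for (int a = 0; a < n; a++) {
			sb_printf(sb, "%s\n", alts[a].label);
			for (int i = 0; i < alts[a].ninsns; i++)
				sb_printf(sb, "    %s\n", alts[a].insns[i]);
		}
		return;
	}

	/* Side by side: a header row of labels, then one row per instruction. */
	for (int a = 0; a < n; a++)
		if (alts[a].ninsns > rows)
			rows = alts[a].ninsns;

	for (int r = -1; r < rows; r++) {
		for (int a = 0; a < n; a++) {
			const char *cell = "";
			int last = (a == n - 1);

			if (r < 0)
				cell = alts[a].label;
			else if (r < alts[a].ninsns)
				cell = alts[a].insns[r];
			sb_printf(sb, "%-*s%s", last ? 0 : COL_WIDTH, cell,
				  last ? "\n" : " | ");
		}
	}
}
```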

> > Also, I wonder if we can make NOP5 lowercase (nop5), since it really is
> > just an instruction, not something special like a feature.
> 
> This indicates that this is a pseudo instruction: NOP5 is actually nopl 0x00(%eax,%eax,1).
> Even NOP1 can be a simple nop but also xchg %rax,%rax.

But it still represents a single instruction... I view it as a readable
shorthand rather than a pseudo instruction.
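If the shorthand is kept but lowercased, it could be derived from the raw bytes. Here is a minimal sketch. It assumes the byte patterns below match the x86-64 BYTES_NOP1..BYTES_NOP5 definitions in arch/x86/include/asm/nops.h, which is worth double-checking. Other encodings, such as the xchg form mentioned above, are not recognized and would fall back to normal disassembly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror the x86-64 BYTES_NOP1..BYTES_NOP5 patterns. */
static const unsigned char nop_bytes[][5] = {
	{ 0x90 },				/* nop1 */
	{ 0x66, 0x90 },				/* nop2 */
	{ 0x0f, 0x1f, 0x00 },			/* nop3 */
	{ 0x0f, 0x1f, 0x40, 0x00 },		/* nop4 */
	{ 0x0f, 0x1f, 0x44, 0x00, 0x00 },	/* nop5 */
};

/*
 * If the 'len' bytes at 'code' are one of the NOPs above, write the
 * lowercase shorthand (e.g. "nop5") into 'buf' and return true.
 */
static bool nop_shorthand(const unsigned char *code, size_t len,
			  char *buf, size_t size)
{
	if (len < 1 || len > 5)
		return false;
	if (memcmp(code, nop_bytes[len - 1], len) != 0)
		return false;
	snprintf(buf, size, "nop%zu", len);
	return true;
}
```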

> > >    82e7:  __switch_to_asm+0x27	  | <alternative.82e7>                   | !X86_FEATURE_ALWAYS                  | X86_FEATURE_RSB_CTXSW
> > 
> > Are the alternatives swapped?  I believe this comes from the following
> > code, so the !X86_FEATURE_ALWAYS column should be last?
> > 
> > .macro FILL_RETURN_BUFFER reg:req nr:req ftr:req ftr2=ALT_NOT(X86_FEATURE_ALWAYS)
> > 	ALTERNATIVE_2 "jmp .Lskip_rsb_\@", \
> > 		__stringify(__FILL_RETURN_BUFFER(\reg,\nr)), \ftr, \
> > 		__stringify(nop;nop;__FILL_ONE_RETURN), \ftr2
> > 
> > .Lskip_rsb_\@:
> > .endm
> 
> I will check but I process/print alternative in the order provided by
> objtool (in struct alternative)

It's quite possible objtool has them in reverse order.  The order
shouldn't matter for objtool validate_branch(), but definitely matters
here.

-- 
Josh
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by Alexandre Chartre 2 months, 3 weeks ago
On 11/14/25 22:34, Josh Poimboeuf wrote:
> On Fri, Nov 14, 2025 at 10:56:48AM +0100, Alexandre Chartre wrote:
>>> Other than '!', the meaning of the flags isn't intuitive.  Maybe it
>>> should just show the source code names:
>>>
>>>     ALT_NOT(X86_FEATURE_FOO)
>>>
>>>     ALT_DIRECT_CALL(X86_FEATURE_BAR)
>>>
>>>     ALT_UNKNOWN_FLAG(X86_FEATURE_BAZ)
>>>
>>
>> I think '?' is meaningful too, but I wasn't sure about '+'.
>>
>> I am using single characters to keep the alternative name short. It can already
>> be fairly long because of the feature name (like "X86_FEATURE_SPEC_STORE_BYPASS_DISABLE")
>>
>> Also I am assuming that flags can be combined (although that's not currently
>> the case) so that would be more difficult with full ALT_* names and the
>> result would be much longer.
> 
> Ok.
> 
>>>> - If an alternative is a jump table then "JUMP" is used as the feature
>>>>     name.
>>>
>>> Hm, it's a bit confusing to label a jump label as an "alternative" as
>>> those are two distinct things (though I'm aware that objtool conflates
>>> the two).
>>>
>>>> - If an alternative is an exception table then "EXCEPTION" is used as the
>>>>     feature name.
>>>
>>> Ditto.
>>>
>>
>> Yes, the wording is not good, I use it just because objtool handles jump
>> labels and exception tables as alternative. I will reword to something
>> better.
> 
> I meant not only the wording here, but also the "<alternative.def9>"
> labels shown in the disassembly.
> 

Right, I will change them to <alternative.xxx>, <jump_label.xxx> and <ex_table.xxx>.


>>>> Disassembly can show default alternative jumping to .altinstr_aux
>>>> -----------------------------------------------------------------
>>>> Disassembly can show a default alternative jumping to .altinstr_aux. This
>>>> happens when the _static_cpu_has() function is used. Its default code
>>>> jumps to .altinstr_aux where a test sequence is executed (test; jnz; jmp).
>>>>
>>>> At runtime, this sequence is not used because _static_cpu_has() has
>>>> an alternative with the X86_FEATURE_ALWAYS feature.
>>>>
>>>>
>>>>     debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> | NOP5  (X86_FEATURE_HYBRID_CPU) | jmpq   0x61a <perf_get_x86_pmu_capability+0x37>  (X86_FEATURE_ALWAYS)   # <alternative.debc>
>>>
>>> I'm finding this one-line format considerably more difficult to parse
>>> than the slightly longer two-line form:
>>>
>>>       debc:  perf_get_x86_pmu_capability+0xc      <alternative.debc>		   | X86_FEATURE_HYBRID_CPU | X86_FEATURE_ALWAYS
>>>       debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> | NOP5		    | jmpq   0x61a <perf_get_x86_pmu_capability+0x37>
>>
>>
>> Another option could be:
>>
>>      debc:  perf_get_x86_pmu_capability+0xc      jmpq   0xdec1 <.altinstr_aux+0xfc> (<alternative.debc>) |
>>                                                  NOP5  (X86_FEATURE_HYBRID_CPU) |
>>                                                  jmpq   0x61a <perf_get_x86_pmu_capability+0x37>  (X86_FEATURE_ALWAYS)
>>
>> I think I will use this option when displaying alternative one after the other,
>> and your suggestion when displaying side-by-side, and add an option to select
>> the display.
> 
> Hm, but how would that "one after the other" display look for an
> alternative with multiple instructions?  A "compact vs wide" option is
> ok, but within each of those I think it's helpful to use a consistent
> format regardless of whether the alternative has one or multiple
> instructions.
> 

David raises the issue that a side-by-side display requires a large window.

The compact display could be like this:

Alternative with single instruction:

   bb8:  do_one_initcall+0x1a8    <alternative.bb8>
                                  = callq  *0x0(%rip)        # 0xbbe <pv_ops+0xf8>    (if default)
                                  = sti                                               (if !X86_FEATURE_XENPV)
                                  = callq  BUG_func                                   (if +X86_FEATURE_ALWAYS)

Alternative with multiple instructions:

   82e7:  __switch_to_asm+0x27    <alternative.82e7>
                                  = DEFAULT
   82e7:  __switch_to_asm+0x27    | jmp    0x8312 <__switch_to_asm+0x52>
                                  |
                                  = !X86_FEATURE_ALWAYS
   82e7:  __switch_to_asm+0x27    | NOP1
   82e8:  __switch_to_asm+0x28    | NOP1
   82e9:  __switch_to_asm+0x29    | callq  0x82ef <__switch_to_asm+0x2f>
   82ee:  __switch_to_asm+0x2e    | int3
   82ef:  __switch_to_asm+0x2f    | add    $0x8,%rsp
   82f3:  __switch_to_asm+0x33    | lfence
                                  |
                                  = X86_FEATURE_RSB_CTXSW
   82e7:  __switch_to_asm+0x27    | mov    $0x10,%r12
   82ee:  __switch_to_asm+0x2e    | callq  0x82f4 <__switch_to_asm+0x34>
   82f3:  __switch_to_asm+0x33    | int3
   82f4:  __switch_to_asm+0x34    | callq  0x82fa <__switch_to_asm+0x3a>
   82f9:  __switch_to_asm+0x39    | int3
   82fa:  __switch_to_asm+0x3a    | add    $0x10,%rsp
   82fe:  __switch_to_asm+0x3e    | dec    %r12
   8301:  __switch_to_asm+0x41    | jne    0x82ee <__switch_to_asm+0x2e>
   8303:  __switch_to_asm+0x43    | lfence
   8306:  __switch_to_asm+0x46    | movq   $0xffffffffffffffff,%gs:0x0(%rip)        # 0x20b <__x86_call_depth>
                                  |


>>> Also, I wonder if we can make NOP5 lowercase (nop5), since it really is
>>> just an instruction, not something special like a feature.
>>
>> This indicates that this is a pseudo instruction, NOP5 is actually nopl 0x00(%eax,%eax,1).
>> Even NOP1 can be a simple nop but also xchg %rax,%rax.
> 
> But it still represents a single instruction... I view it as a readable
> shorthand rather than a pseudo instruction.
> 

Ok. I will change to nop<n>.


>>>>     82e7:  __switch_to_asm+0x27	  | <alternative.82e7>                   | !X86_FEATURE_ALWAYS                  | X86_FEATURE_RSB_CTXSW
>>>
>>> Are the alternatives swapped?  I believe this comes from the following
>>> code, so the !X86_FEATURE_ALWAYS column should be last?
>>>
>>> .macro FILL_RETURN_BUFFER reg:req nr:req ftr:req ftr2=ALT_NOT(X86_FEATURE_ALWAYS)
>>> 	ALTERNATIVE_2 "jmp .Lskip_rsb_\@", \
>>> 		__stringify(__FILL_RETURN_BUFFER(\reg,\nr)), \ftr, \
>>> 		__stringify(nop;nop;__FILL_ONE_RETURN), \ftr2
>>>
>>> .Lskip_rsb_\@:
>>> .endm
>>
>> I will check but I process/print alternative in the order provided by
>> objtool (in struct alternative)
> 
> It's quite possible objtool has them in reverse order.  The order
> shouldn't matter for objtool validate_branch(), but definitely matters
> here.
> 

That's probably the issue. I think objtool reads the alternatives in the
right order, but it adds each one at the beginning of the same list, so
they end up in reverse order.
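To illustrate the ordering problem: if entries are read in source order but each one is pushed onto the head of a singly linked list, the list comes out reversed, while appending at the tail keeps source order. A minimal sketch on a simplified list; struct alt, alt_add_head() and alt_add_tail() are hypothetical and are not objtool's actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for an entry in an alternatives list. */
struct alt {
	const char *feature;	/* e.g. "X86_FEATURE_RSB_CTXSW" */
	struct alt *next;
};

/* Head insertion: every new entry becomes the first, reversing read order. */
static void alt_add_head(struct alt **list, struct alt *a)
{
	assert(list != NULL && a != NULL);
	a->next = *list;
	*list = a;
}

/* Tail insertion: walk to the last 'next' pointer, preserving read order. */
static void alt_add_tail(struct alt **list, struct alt *a)
{
	assert(list != NULL && a != NULL);
	while (*list)
		list = &(*list)->next;
	a->next = NULL;
	*list = a;
}
```

With FILL_RETURN_BUFFER, \ftr (X86_FEATURE_RSB_CTXSW) is read before \ftr2 (!X86_FEATURE_ALWAYS); head insertion would then list !X86_FEATURE_ALWAYS first, which matches the swapped columns. Walking to the tail costs O(n) per insert, which should be negligible for the few entries of one alternative; keeping a separate tail pointer would avoid the walk.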

Thanks,

alex.
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by David Laight 2 months, 3 weeks ago
On Mon, 17 Nov 2025 08:50:45 +0100
Alexandre Chartre <alexandre.chartre@oracle.com> wrote:

> On 11/14/25 22:34, Josh Poimboeuf wrote:
...
> David raises the issue that a side-by-side display requires a large window.
> 
> The compact display could be like this:
> 
> Alternative with single instruction:
> 
>    bb8:  do_one_initcall+0x1a8    <alternative.bb8>
>                                   = callq  *0x0(%rip)        # 0xbbe <pv_ops+0xf8>    (if default)
>                                   = sti                                               (if !X86_FEATURE_XENPV)
>                                   = callq  BUG_func                                   (if +X86_FEATURE_ALWAYS)
> 
> Alternative with multiple instructions:
> 
>    82e7:  __switch_to_asm+0x27    <alternative.82e7>
>                                   = DEFAULT
>    82e7:  __switch_to_asm+0x27    | jmp    0x8312 <__switch_to_asm+0x52>
>                                   |
>                                   = !X86_FEATURE_ALWAYS
>    82e7:  __switch_to_asm+0x27    | NOP1
>    82e8:  __switch_to_asm+0x28    | NOP1
>    82e9:  __switch_to_asm+0x29    | callq  0x82ef <__switch_to_asm+0x2f>
>    82ee:  __switch_to_asm+0x2e    | int3
>    82ef:  __switch_to_asm+0x2f    | add    $0x8,%rsp
>    82f3:  __switch_to_asm+0x33    | lfence
>                                   |
>                                   = X86_FEATURE_RSB_CTXSW
>    82e7:  __switch_to_asm+0x27    | mov    $0x10,%r12
>    82ee:  __switch_to_asm+0x2e    | callq  0x82f4 <__switch_to_asm+0x34>
>    82f3:  __switch_to_asm+0x33    | int3
>    82f4:  __switch_to_asm+0x34    | callq  0x82fa <__switch_to_asm+0x3a>
>    82f9:  __switch_to_asm+0x39    | int3
>    82fa:  __switch_to_asm+0x3a    | add    $0x10,%rsp
>    82fe:  __switch_to_asm+0x3e    | dec    %r12
>    8301:  __switch_to_asm+0x41    | jne    0x82ee <__switch_to_asm+0x2e>
>    8303:  __switch_to_asm+0x43    | lfence
>    8306:  __switch_to_asm+0x46    | movq   $0xffffffffffffffff,%gs:0x0(%rip)        # 0x20b <__x86_call_depth>

That does look better.
Although I think there ought to be some indication of the 31 NOP bytes
at the end of the middle alternative.

I'd also decode those callq as 'callq .+6' - not sure what other people think?
It is rather specific to that code.

	David
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by Alexandre Chartre 2 months, 3 weeks ago
On 11/17/25 10:42, David Laight wrote:
> On Mon, 17 Nov 2025 08:50:45 +0100
> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> 
>> On 11/14/25 22:34, Josh Poimboeuf wrote:
> ...
>> David raises the issue that a side-by-side display requires a large window.
>>
>> The compact display could be like this:
>>
>> Alternative with single instruction:
>>
>>     bb8:  do_one_initcall+0x1a8    <alternative.bb8>
>>                                    = callq  *0x0(%rip)        # 0xbbe <pv_ops+0xf8>    (if default)
>>                                    = sti                                               (if !X86_FEATURE_XENPV)
>>                                    = callq  BUG_func                                   (if +X86_FEATURE_ALWAYS)
>>
>> Alternative with multiple instructions:
>>
>>     82e7:  __switch_to_asm+0x27    <alternative.82e7>
>>                                    = DEFAULT
>>     82e7:  __switch_to_asm+0x27    | jmp    0x8312 <__switch_to_asm+0x52>
>>                                    |
>>                                    = !X86_FEATURE_ALWAYS
>>     82e7:  __switch_to_asm+0x27    | NOP1
>>     82e8:  __switch_to_asm+0x28    | NOP1
>>     82e9:  __switch_to_asm+0x29    | callq  0x82ef <__switch_to_asm+0x2f>
>>     82ee:  __switch_to_asm+0x2e    | int3
>>     82ef:  __switch_to_asm+0x2f    | add    $0x8,%rsp
>>     82f3:  __switch_to_asm+0x33    | lfence
>>                                    |
>>                                    = X86_FEATURE_RSB_CTXSW
>>     82e7:  __switch_to_asm+0x27    | mov    $0x10,%r12
>>     82ee:  __switch_to_asm+0x2e    | callq  0x82f4 <__switch_to_asm+0x34>
>>     82f3:  __switch_to_asm+0x33    | int3
>>     82f4:  __switch_to_asm+0x34    | callq  0x82fa <__switch_to_asm+0x3a>
>>     82f9:  __switch_to_asm+0x39    | int3
>>     82fa:  __switch_to_asm+0x3a    | add    $0x10,%rsp
>>     82fe:  __switch_to_asm+0x3e    | dec    %r12
>>     8301:  __switch_to_asm+0x41    | jne    0x82ee <__switch_to_asm+0x2e>
>>     8303:  __switch_to_asm+0x43    | lfence
>>     8306:  __switch_to_asm+0x46    | movq   $0xffffffffffffffff,%gs:0x0(%rip)        # 0x20b <__x86_call_depth>
> 
> That does looks better.
> Although I think there ought to be some indication of the 31 NOP bytes
> at the end of the middle alternative.

I am now compacting the code by removing all trailing NOPs. I should probably
improve that by printing the actual number of NOPs, for example NOP31 (or nop31).
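Dropping the trailing NOPs while remembering how many bytes were dropped could look roughly like this. It is only a sketch: struct insn_info and trim_trailing_nops() are hypothetical stand-ins for whatever per-instruction data the disassembler actually keeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decoded-instruction record. */
struct insn_info {
	unsigned int len;	/* instruction length in bytes */
	bool is_nop;		/* recognised as a standard NOP sequence */
};

/*
 * Return how many instructions to print before the trailing NOP run and
 * store the number of trailing NOP bytes in *nop_bytes. Leading or
 * interior NOPs are kept.
 */
static size_t trim_trailing_nops(const struct insn_info *insns, size_t n,
				 unsigned int *nop_bytes)
{
	size_t keep = n;

	assert(insns != NULL || n == 0);
	assert(nop_bytes != NULL);
	*nop_bytes = 0;
	while (keep > 0 && insns[keep - 1].is_nop) {
		keep--;
		*nop_bytes += insns[keep].len;
	}
	return keep;
}
```

For the default alternative of __switch_to_asm (a 2-byte jmp followed by 41 NOP1), this keeps one instruction and reports 41 NOP bytes, which could then be printed as a single summary line.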

> I'd also decode those callq as 'callq .+6' - not sure what other people think?
> It is rather specific to that code.

This is done by libopcodes. I will need to check if there is an option to display
the branch distance instead of the branch target.

Thanks,

alex.
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by David Laight 2 months, 3 weeks ago
On Mon, 17 Nov 2025 10:47:06 +0100
Alexandre Chartre <alexandre.chartre@oracle.com> wrote:

> On 11/17/25 10:42, David Laight wrote:
...
> > Although I think there ought to be some indication of the 31 NOP bytes
> > at the end of the middle alternative.  
> 
> I am now compacting the code by removing all trailing NOPs. I should probably
> improve that with printing the actual number of NOPs, for example NOP31 (or nop31)

That is the sort of thing I was thinking of.
Perhaps the actual opcodes on one line - eg: NOP5; NOP5; NOP5; NOP1

> > I'd also decode those callq as 'callq .+6' - not sure what other people think?
> > It is rather specific to that code.  
> 
> This is done by libopcodes. I will need to check if there is an option to display
> the branch distance instead of the branch target.

The 'problem' is that mostly you want the branch target - except when it is small.
Then you don't need both 'address' and 'symbol+offset', and it is quicker to find
the target by looking at the branch distance.
I'm not sure how you'd please everyone :-)

I'm sure one of the disassemblers ends up giving you the target address in a form
that isn't on the instruction line!
I've definitely counted opcode bytes to find the target.

	David
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by Alexandre Chartre 2 months, 3 weeks ago
On 11/17/25 13:37, David Laight wrote:
> On Mon, 17 Nov 2025 10:47:06 +0100
> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> 
>> On 11/17/25 10:42, David Laight wrote:
> ...
>>> Although I think there ought to be some indication of the 31 NOP bytes
>>> at the end of the middle alternative.
>>
>> I am now compacting the code by removing all trailing NOPs. I should probably
>> improve that with printing the actual number of NOPs, for example NOP31 (or nop31)
> 
> That is the sort of thing I was thinking of.
> Perhaps the actual opcodes on one line - eg: NOP5; NOP5; NOP5; NOP1

That might not always be very compact. For example __switch_to_asm() has 41 NOP1.
I will use NOP<n> for now, and we can improve later.

  
>>> I'd also decode those callq as 'callq .+6' - not sure what other people think?
>>> It is rather specific to that code.
>>
>> This is done by libopcodes. I will need to check if there is an option to display
>> the branch distance instead of the branch target.
> 
> The 'problem' is that mostly you want the branch target - except when it is small.
> Then you don't need both 'address' and 'symbol+offset', and it is quicker to find
> the target by looking at the branch distance.
> I'm not sure how you'd please everyone :-)
> 
> I'm sure one of the disassemblers ends up giving you the target address in a form
> that isn't on the instruction line!
> I've definitely counted opcode bytes to find the target.
> 

I will investigate this for a later patch. Maybe have both the distance and the
destination (could be an option?).
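One way to reconcile both preferences is to print the short relative form (as in 'callq .+6') only when the target is close, and optionally print both forms. A sketch under stated assumptions: format_branch() and its near_limit and show_both parameters are hypothetical, and the distance is measured from the start of the branch instruction, which matches the '.+6' example for the callq at 0x82e9 targeting 0x82ef.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the target of a branch located at 'addr'.
 * - show_both:                 "0x<target> <sym> (.+N)"
 * - target within near_limit:  ".+N" or ".-N", relative to 'addr'
 * - otherwise:                 "0x<target> <sym>"
 */
static void format_branch(char *buf, size_t size, unsigned long addr,
			  unsigned long target, const char *sym,
			  unsigned long near_limit, bool show_both)
{
	unsigned long abs_dist = target >= addr ? target - addr : addr - target;
	long dist = target >= addr ? (long)abs_dist : -(long)abs_dist;

	assert(buf != NULL && size > 0 && sym != NULL);
	if (show_both)
		snprintf(buf, size, "0x%lx <%s> (.%+ld)", target, sym, dist);
	else if (abs_dist <= near_limit)
		snprintf(buf, size, ".%+ld", dist);
	else
		snprintf(buf, size, "0x%lx <%s>", target, sym);
}
```

This only covers the string formatting; how it would be wired into the libopcodes output is left open.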

Thanks,

alex.
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by David Laight 2 months, 3 weeks ago
On Mon, 17 Nov 2025 14:11:55 +0100
Alexandre Chartre <alexandre.chartre@oracle.com> wrote:

> On 11/17/25 13:37, David Laight wrote:
> > On Mon, 17 Nov 2025 10:47:06 +0100
> > Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> >   
> >> On 11/17/25 10:42, David Laight wrote:  
> > ...  
> >>> Although I think there ought to be some indication of the 31 NOP bytes
> >>> at the end of the middle alternative.  
> >>
> >> I am now compacting the code by removing all trailing NOPs. I should probably
> >> improve that with printing the actual number of NOPs, for example NOP31 (or nop31)  
> > 
> > That is the sort of thing I was thinking of.
> > Perhaps the actual opcodes on one line - eg: NOP5; NOP5; NOP5; NOP1  
> 
> That might not always be very compact. For example __switch_to_asm() has 41 NOP1.
> I will use NOP<n> for now, and we can improve later.

Could you use NOP1*41 (etc) so that NOP5*4 is different from NOP1*20?
(I'm guessing you hand-decode the standard NOP sequences already?)
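A run-length form along those lines could be produced from the decoded NOP lengths as follows. This is a hypothetical helper, not existing objtool code; it assumes the standard NOP sequences have already been identified and their lengths collected in order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical: summarise consecutive NOPs (given by their lengths) as
 * e.g. "NOP1*41" or "NOP5*3; NOP1", so NOP5*4 and NOP1*20 stay distinct.
 * Returns 0 on success, -1 if buf is too small.
 */
static int summarize_nops(char *buf, size_t size, const unsigned int *lens,
			  size_t n)
{
	size_t used = 0;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n;) {
		size_t run = 1;
		int ret;

		/* Count how many NOPs of the same length follow. */
		while (i + run < n && lens[i + run] == lens[i])
			run++;
		if (run > 1)
			ret = snprintf(buf + used, size - used, "%sNOP%u*%zu",
				       used ? "; " : "", lens[i], run);
		else
			ret = snprintf(buf + used, size - used, "%sNOP%u",
				       used ? "; " : "", lens[i]);
		if (ret < 0 || (size_t)ret >= size - used)
			return -1;
		used += (size_t)ret;
		i += run;
	}
	return 0;
}
```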

Hmm... you don't want to execute that many 0x90 bytes.
I think that case might have had a jump around them.

Do I remember something about the trailing nop being merged?
Perhaps that is the kernel patching code.
Something made me think objtool might (also) be doing it.

	David
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by Alexandre Chartre 2 months, 3 weeks ago
On 11/17/25 23:09, David Laight wrote:
> On Mon, 17 Nov 2025 14:11:55 +0100
> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> 
>> On 11/17/25 13:37, David Laight wrote:
>>> On Mon, 17 Nov 2025 10:47:06 +0100
>>> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
>>>    
>>>> On 11/17/25 10:42, David Laight wrote:
>>> ...
>>>>> Although I think there ought to be some indication of the 31 NOP bytes
>>>>> at the end of the middle alternative.
>>>>
>>>> I am now compacting the code by removing all trailing NOPs. I should probably
>>>> improve that with printing the actual number of NOPs, for example NOP31 (or nop31)
>>>
>>> That is the sort of thing I was thinking of.
>>> Perhaps the actual opcodes on one line - eg: NOP5; NOP5; NOP5; NOP1
>>
>> That might not always be very compact. For example __switch_to_asm() has 41 NOP1.
>> I will use NOP<n> for now, and we can improve later.
> 
> Could you use NOP1*41 (etc) so that NOP5*4 is different from NOP1*20?
> (I'm guessing you hand-decode the standard NOP sequences already?)

Yes, objtool already identifies standard NOP sequences.

> Hmm... you don't want to execute that many 0x90 bytes.
> I think that case might have had a jump around them.

In that specific case, they are not executed; they come after a jump:

   82e7:  __switch_to_asm+0x27  <alternative.82e7>
                                = DEFAULT
   82e7:  __switch_to_asm+0x27  | jmp    0x8312 <__switch_to_asm+0x52>
   82e9:  __switch_to_asm+0x29  | NOP41
                                |

alex.

> Do I remember something about the trailing nop being merged?
> Perhaps that is the kernel patching code.
> Something made me think objtool might (also) be doing it.
> 
> 	David
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by Peter Zijlstra 2 months, 3 weeks ago
On Tue, Nov 18, 2025 at 08:19:14AM +0100, Alexandre Chartre wrote:

> In that specific case, they are not executed, they are after a jump:
> 
>   82e7:  __switch_to_asm+0x27  <alternative.82e7>
>                                = DEFAULT
>   82e7:  __switch_to_asm+0x27  | jmp    0x8312 <__switch_to_asm+0x52>
>   82e9:  __switch_to_asm+0x29  | NOP41

nop41 is a bit naf since x86 can only have 15-byte instructions :-)
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by Alexandre Chartre 2 months, 3 weeks ago

On 11/18/25 10:12, Peter Zijlstra wrote:
> On Tue, Nov 18, 2025 at 08:19:14AM +0100, Alexandre Chartre wrote:
> 
>> In that specific case, they are not executed, they are after a jump:
>>
>>    82e7:  __switch_to_asm+0x27  <alternative.82e7>
>>                                 = DEFAULT
>>    82e7:  __switch_to_asm+0x27  | jmp    0x8312 <__switch_to_asm+0x52>
>>    82e9:  __switch_to_asm+0x29  | NOP41
> 
> nop41 is a bit naf since x86 can only have 15 byte instructions :-)

Yes, "nop1 x41" would be better for this specific case. But I think
I will just drop it for now as Josh suggested.

alex.
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by Josh Poimboeuf 2 months, 3 weeks ago
On Mon, Nov 17, 2025 at 10:09:53PM +0000, David Laight wrote:
> On Mon, 17 Nov 2025 14:11:55 +0100
> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> 
> > On 11/17/25 13:37, David Laight wrote:
> > > On Mon, 17 Nov 2025 10:47:06 +0100
> > > Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> > >   
> > >> On 11/17/25 10:42, David Laight wrote:  
> > > ...  
> > >>> Although I think there ought to be some indication of the 31 NOP bytes
> > >>> at the end of the middle alternative.  

I'm not sure we need that.  It's already implied those gaps will be
filled with NOPs.  This could add unnecessary visual clutter.

> > >>
> > >> I am now compacting the code by removing all trailing NOPs. I should probably
> > >> improve that with printing the actual number of NOPs, for example NOP31 (or nop31)  
> > > 
> > > That is the sort of thing I was thinking of.
> > > Perhaps the actual opcodes on one line - eg: NOP5; NOP5; NOP5; NOP1  
> > 
> > That might not always be very compact. For example __switch_to_asm() has 41 NOP1.
> > I will use NOP<n> for now, and we can improve later.
> 
> Could you use NOP1*41 (etc) so that NOP5*4 is different from NOP1*20?
> (I'm guessing you hand-decode the standard NOP sequences already?)
> 
> Hmm... you don't want to execute that many 0x90 bytes.
> I think that case might have had a jump around them.
> 
> Do I remember something about the trailing nop being merged?
> Perhaps that is the kernel patching code.
> Something made me think objtool might (also) be doing it.

Yes, IIRC, the alternatives code merges the small NOPs into bigger ones.
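As a rough illustration of why a long run of NOP1 need not stay that way, a padding gap can be re-split into the fewest NOPs of at most some maximum length. This is a hedged sketch, not the kernel's alternatives code: split_nop_gap() and max_nop are hypothetical, it assumes a NOP encoding exists for every length from 1 to max_nop, and 15 is only the architectural upper bound on x86 instruction length.

```c
#include <assert.h>
#include <stddef.h>

#define NOP_SPLIT_ERR ((size_t)-1)

/*
 * Hypothetical: split 'len' padding bytes greedily into NOPs of at most
 * 'max_nop' bytes. Writes the chosen lengths to out[] and returns how
 * many were used, or NOP_SPLIT_ERR if out[] is too small.
 */
static size_t split_nop_gap(unsigned int len, unsigned int max_nop,
			    unsigned int *out, size_t out_cap)
{
	size_t n = 0;

	/* x86 instructions are at most 15 bytes long. */
	assert(max_nop >= 1 && max_nop <= 15);
	while (len > 0) {
		unsigned int chunk = len < max_nop ? len : max_nop;

		if (n == out_cap)
			return NOP_SPLIT_ERR;
		out[n++] = chunk;
		len -= chunk;
	}
	return n;
}
```

For the 41-byte gap in __switch_to_asm, a max_nop of 15 would give 15 + 15 + 11; the longest NOP the kernel actually uses may well be shorter than that.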

-- 
Josh
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by David Laight 2 months, 3 weeks ago
On Mon, 17 Nov 2025 14:38:49 -0800
Josh Poimboeuf <jpoimboe@kernel.org> wrote:

> On Mon, Nov 17, 2025 at 10:09:53PM +0000, David Laight wrote:
> > On Mon, 17 Nov 2025 14:11:55 +0100
> > Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> >   
> > > On 11/17/25 13:37, David Laight wrote:  
> > > > On Mon, 17 Nov 2025 10:47:06 +0100
> > > > Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> > > >     
> > > >> On 11/17/25 10:42, David Laight wrote:    
> > > > ...    
> > > >>> Although I think there ought to be some indication of the 31 NOP bytes
> > > >>> at the end of the middle alternative.    
> 
> I'm not sure we need that.  It's already implied those gaps will be
> filled with NOPs.  This could add unnecessary visual clutter.

But you need some idea of the size of the gap.
A large gap isn't really a good idea and may mean it is better to
refactor the code.
While the execution time of nops might be zero, there is still
the cost of fetching and decoding them.

...
> > Do I remember something about the trailing nop being merged?
> > Perhaps that is the kernel patching code.
> > Something made me think objtool might (also) be doing it.  
> 
> Yes, IIRC, the alternatives code merges the small NOPs into bigger ones.

That probably means it doesn't matter how objdump displays them.
But perhaps it ought to output NOP*5 rather than NOP5 to make it
clear it is a block of NOPs (that will be converted later) rather
than a single NOP5 instruction.

	David
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by David Laight 2 months, 3 weeks ago
On Thu, 13 Nov 2025 17:48:49 +0100
Alexandre Chartre <alexandre.chartre@oracle.com> wrote:

> Hi,
> 
> These patches change objtool to disassemble code with libopcodes instead
> of running objdump. You will find below:
> 
> - Changes: list of changes made in this version
> - Overview: overview of the changes
> - Notes: description of some particular behavior
> - Examples: output examples
...
> Example 3 (--disas option): Alternatives with multiple instructions
> -------------------------------------------------------------------
> Alternatives with multiple instructions are displayed side-by-side, with
> an header describing the alternative. The code in the first column is the
> default code of the alternative.
> 
> 
> $ ./tools/objtool/objtool --disas=__switch_to_asm --link vmlinux.o
> __switch_to_asm:
>   82c0:  __switch_to_asm+0x0      push   %rbp                                               
>   82c1:  __switch_to_asm+0x1	  push   %rbx                                               
>   82c2:  __switch_to_asm+0x2	  push   %r12                                               
>   82c4:  __switch_to_asm+0x4	  push   %r13                                               
>   82c6:  __switch_to_asm+0x6	  push   %r14                                               
>   82c8:  __switch_to_asm+0x8	  push   %r15                                               
>   82ca:  __switch_to_asm+0xa	  mov    %rsp,0x1670(%rdi)                                  
>   82d1:  __switch_to_asm+0x11	  mov    0x1670(%rsi),%rsp                                  
>   82d8:  __switch_to_asm+0x18	  mov    0xad8(%rsi),%rbx                                   
>   82df:  __switch_to_asm+0x1f	  mov    %rbx,%gs:0x0(%rip)        # 0x82e7 <__stack_chk_guard>
>   82e7:  __switch_to_asm+0x27	  | <alternative.82e7>                   | !X86_FEATURE_ALWAYS                  | X86_FEATURE_RSB_CTXSW
>   82e7:  __switch_to_asm+0x27	  | jmp    0x8312 <__switch_to_asm+0x52> | NOP1                                 | mov    $0x10,%r12
>   82e8:  __switch_to_asm+0x28	  |                                      | NOP1                                 |
>   82e9:  __switch_to_asm+0x29	  | NOP1                                 | callq  0x82ef <__switch_to_asm+0x2f> |
>   82ea:  __switch_to_asm+0x2a	  | NOP1                                 |                                      |
>   82eb:  __switch_to_asm+0x2b	  | NOP1                                 |                                      |
>   82ec:  __switch_to_asm+0x2c	  | NOP1                                 |                                      |
>   82ed:  __switch_to_asm+0x2d	  | NOP1                                 |                                      |
>   82ee:  __switch_to_asm+0x2e	  | NOP1                                 | int3                                 | callq  0x82f4 <__switch_to_asm+0x34>
>   82ef:  __switch_to_asm+0x2f	  | NOP1                                 | add    $0x8,%rsp                     |
>   82f0:  __switch_to_asm+0x30	  | NOP1                                 |                                      |
>   82f1:  __switch_to_asm+0x31	  | NOP1                                 |                                      |
>   82f2:  __switch_to_asm+0x32	  | NOP1                                 |                                      |
>   82f3:  __switch_to_asm+0x33	  | NOP1                                 | lfence                               | int3
>   82f4:  __switch_to_asm+0x34	  | NOP1                                 |                                      | callq  0x82fa <__switch_to_asm+0x3a>
>   82f5:  __switch_to_asm+0x35	  | NOP1                                 |                                      |
>   82f6:  __switch_to_asm+0x36	  | NOP1                                 |                                      |
>   82f7:  __switch_to_asm+0x37	  | NOP1                                 |                                      |
>   82f8:  __switch_to_asm+0x38	  | NOP1                                 |                                      |
>   82f9:  __switch_to_asm+0x39	  | NOP1                                 |                                      | int3
>   82fa:  __switch_to_asm+0x3a	  | NOP1                                 |                                      | add    $0x10,%rsp
>   82fb:  __switch_to_asm+0x3b	  | NOP1                                 |                                      |
>   82fc:  __switch_to_asm+0x3c	  | NOP1                                 |                                      |
>   82fd:  __switch_to_asm+0x3d	  | NOP1                                 |                                      |
>   82fe:  __switch_to_asm+0x3e	  | NOP1                                 |                                      | dec    %r12
>   82ff:  __switch_to_asm+0x3f	  | NOP1                                 |                                      |
>   8300:  __switch_to_asm+0x40	  | NOP1                                 |                                      |
>   8301:  __switch_to_asm+0x41	  | NOP1                                 |                                      | jne    0x82ee <__switch_to_asm+0x2e>
>   8302:  __switch_to_asm+0x42	  | NOP1                                 |                                      |
>   8303:  __switch_to_asm+0x43	  | NOP1                                 |                                      | lfence
>   8304:  __switch_to_asm+0x44	  | NOP1                                 |                                      |
>   8305:  __switch_to_asm+0x45	  | NOP1                                 |                                      |
>   8306:  __switch_to_asm+0x46	  | NOP1                                 |                                      | movq   $0xffffffffffffffff,%gs:0x0(%rip)        # 0x20b <__x86_call_depth>
>   8307:  __switch_to_asm+0x47	  | NOP1                                 |                                      |
>   8308:  __switch_to_asm+0x48	  | NOP1                                 |                                      |
>   8309:  __switch_to_asm+0x49	  | NOP1                                 |                                      |
>   830a:  __switch_to_asm+0x4a	  | NOP1                                 |                                      |
>   830b:  __switch_to_asm+0x4b	  | NOP1                                 |                                      |
>   830c:  __switch_to_asm+0x4c	  | NOP1                                 |                                      |
>   830d:  __switch_to_asm+0x4d	  | NOP1                                 |                                      |
>   830e:  __switch_to_asm+0x4e	  | NOP1                                 |                                      |
>   830f:  __switch_to_asm+0x4f	  | NOP1                                 |                                      |
>   8310:  __switch_to_asm+0x50	  | NOP1                                 |                                      |
>   8311:  __switch_to_asm+0x51	  | NOP1                                 |                                      |
>   8312:  __switch_to_asm+0x52	    pop    %r15                                               
>   8314:  __switch_to_asm+0x54	    pop    %r14                                               
>   8316:  __switch_to_asm+0x56	    pop    %r13                                               
>   8318:  __switch_to_asm+0x58	    pop    %r12                                               
>   831a:  __switch_to_asm+0x5a	    pop    %rbx                                               
>   831b:  __switch_to_asm+0x5b	    pop    %rbp                                               
>   831c:  __switch_to_asm+0x5c	    jmpq   0x8321 <__switch_to>  

That might be rather easier to read if the alternatives followed each other.
Not all of us want to use a very wide window to look at object files.
(I didn't see any other example like that either.)

Similarly in Ex 5:
 332d4:  early_ioremap_pmd+0x4	    callq  *0x0(%rip)        # 0x332da <pv_ops+0x150> | mov    %cr3,%rax  (!X86_FEATURE_XENPV) | callq  xen_read_cr3  (+X86_FEATURE_ALWAYS)   # <alternative.332d4>
might be more readable flipped to something like:
 332d4:  early_ioremap_pmd+0x4	    callq  *0x0(%rip)        # 0x332da <pv_ops+0x150>
	   !X86_FEATURE_XENPV:          mov    %cr3,%rax
	   +X86_FEATURE_ALWAYS:         callq  xen_read_cr3

	David
Re: [PATCH v4 00/28] objtool: Function validation tracing
Posted by Alexandre Chartre 2 months, 3 weeks ago
On 11/13/25 20:55, David Laight wrote:
> On Thu, 13 Nov 2025 17:48:49 +0100
> Alexandre Chartre <alexandre.chartre@oracle.com> wrote:
> 
>> Hi,
>>
>> These patches change objtool to disassemble code with libopcodes instead
>> of running objdump. You will find below:
>>
>> - Changes: list of changes made in this version
>> - Overview: overview of the changes
>> - Notes: description of some particular behavior
>> - Examples: output examples
> ...
>> Example 3 (--disas option): Alternatives with multiple instructions
>> -------------------------------------------------------------------
>> Alternatives with multiple instructions are displayed side-by-side, with
>> a header describing the alternative. The code in the first column is the
>> default code of the alternative.
>>
>>
>> $ ./tools/objtool/objtool --disas=__switch_to_asm --link vmlinux.o
>> __switch_to_asm:
>>    82c0:  __switch_to_asm+0x0      push   %rbp
>>    82c1:  __switch_to_asm+0x1	  push   %rbx
>>    82c2:  __switch_to_asm+0x2	  push   %r12
>>    82c4:  __switch_to_asm+0x4	  push   %r13
>>    82c6:  __switch_to_asm+0x6	  push   %r14
>>    82c8:  __switch_to_asm+0x8	  push   %r15
>>    82ca:  __switch_to_asm+0xa	  mov    %rsp,0x1670(%rdi)
>>    82d1:  __switch_to_asm+0x11	  mov    0x1670(%rsi),%rsp
>>    82d8:  __switch_to_asm+0x18	  mov    0xad8(%rsi),%rbx
>>    82df:  __switch_to_asm+0x1f	  mov    %rbx,%gs:0x0(%rip)        # 0x82e7 <__stack_chk_guard>
>>    82e7:  __switch_to_asm+0x27	  | <alternative.82e7>                   | !X86_FEATURE_ALWAYS                  | X86_FEATURE_RSB_CTXSW
>>    82e7:  __switch_to_asm+0x27	  | jmp    0x8312 <__switch_to_asm+0x52> | NOP1                                 | mov    $0x10,%r12
>>    82e8:  __switch_to_asm+0x28	  |                                      | NOP1                                 |
>>    82e9:  __switch_to_asm+0x29	  | NOP1                                 | callq  0x82ef <__switch_to_asm+0x2f> |
>>    82ea:  __switch_to_asm+0x2a	  | NOP1                                 |                                      |
>>    82eb:  __switch_to_asm+0x2b	  | NOP1                                 |                                      |
>>    82ec:  __switch_to_asm+0x2c	  | NOP1                                 |                                      |
>>    82ed:  __switch_to_asm+0x2d	  | NOP1                                 |                                      |
>>    82ee:  __switch_to_asm+0x2e	  | NOP1                                 | int3                                 | callq  0x82f4 <__switch_to_asm+0x34>
>>    82ef:  __switch_to_asm+0x2f	  | NOP1                                 | add    $0x8,%rsp                     |
>>    82f0:  __switch_to_asm+0x30	  | NOP1                                 |                                      |
>>    82f1:  __switch_to_asm+0x31	  | NOP1                                 |                                      |
>>    82f2:  __switch_to_asm+0x32	  | NOP1                                 |                                      |
>>    82f3:  __switch_to_asm+0x33	  | NOP1                                 | lfence                               | int3
>>    82f4:  __switch_to_asm+0x34	  | NOP1                                 |                                      | callq  0x82fa <__switch_to_asm+0x3a>
>>    82f5:  __switch_to_asm+0x35	  | NOP1                                 |                                      |
>>    82f6:  __switch_to_asm+0x36	  | NOP1                                 |                                      |
>>    82f7:  __switch_to_asm+0x37	  | NOP1                                 |                                      |
>>    82f8:  __switch_to_asm+0x38	  | NOP1                                 |                                      |
>>    82f9:  __switch_to_asm+0x39	  | NOP1                                 |                                      | int3
>>    82fa:  __switch_to_asm+0x3a	  | NOP1                                 |                                      | add    $0x10,%rsp
>>    82fb:  __switch_to_asm+0x3b	  | NOP1                                 |                                      |
>>    82fc:  __switch_to_asm+0x3c	  | NOP1                                 |                                      |
>>    82fd:  __switch_to_asm+0x3d	  | NOP1                                 |                                      |
>>    82fe:  __switch_to_asm+0x3e	  | NOP1                                 |                                      | dec    %r12
>>    82ff:  __switch_to_asm+0x3f	  | NOP1                                 |                                      |
>>    8300:  __switch_to_asm+0x40	  | NOP1                                 |                                      |
>>    8301:  __switch_to_asm+0x41	  | NOP1                                 |                                      | jne    0x82ee <__switch_to_asm+0x2e>
>>    8302:  __switch_to_asm+0x42	  | NOP1                                 |                                      |
>>    8303:  __switch_to_asm+0x43	  | NOP1                                 |                                      | lfence
>>    8304:  __switch_to_asm+0x44	  | NOP1                                 |                                      |
>>    8305:  __switch_to_asm+0x45	  | NOP1                                 |                                      |
>>    8306:  __switch_to_asm+0x46	  | NOP1                                 |                                      | movq   $0xffffffffffffffff,%gs:0x0(%rip)        # 0x20b <__x86_call_depth>
>>    8307:  __switch_to_asm+0x47	  | NOP1                                 |                                      |
>>    8308:  __switch_to_asm+0x48	  | NOP1                                 |                                      |
>>    8309:  __switch_to_asm+0x49	  | NOP1                                 |                                      |
>>    830a:  __switch_to_asm+0x4a	  | NOP1                                 |                                      |
>>    830b:  __switch_to_asm+0x4b	  | NOP1                                 |                                      |
>>    830c:  __switch_to_asm+0x4c	  | NOP1                                 |                                      |
>>    830d:  __switch_to_asm+0x4d	  | NOP1                                 |                                      |
>>    830e:  __switch_to_asm+0x4e	  | NOP1                                 |                                      |
>>    830f:  __switch_to_asm+0x4f	  | NOP1                                 |                                      |
>>    8310:  __switch_to_asm+0x50	  | NOP1                                 |                                      |
>>    8311:  __switch_to_asm+0x51	  | NOP1                                 |                                      |
>>    8312:  __switch_to_asm+0x52	    pop    %r15
>>    8314:  __switch_to_asm+0x54	    pop    %r14
>>    8316:  __switch_to_asm+0x56	    pop    %r13
>>    8318:  __switch_to_asm+0x58	    pop    %r12
>>    831a:  __switch_to_asm+0x5a	    pop    %rbx
>>    831b:  __switch_to_asm+0x5b	    pop    %rbp
>>    831c:  __switch_to_asm+0x5c	    jmpq   0x8321 <__switch_to>
> 
> That might be rather easier to read if the alternatives followed each other.
> Not all of us want to use a very wide window to look at object files.
> (I didn't see any other example like that either.)
> 
> Similarly in Ex 5:
>   332d4:  early_ioremap_pmd+0x4	    callq  *0x0(%rip)        # 0x332da <pv_ops+0x150> | mov    %cr3,%rax  (!X86_FEATURE_XENPV) | callq  xen_read_cr3  (+X86_FEATURE_ALWAYS)   # <alternative.332d4>
> might be more readable flipped to something like:
>   332d4:  early_ioremap_pmd+0x4	    callq  *0x0(%rip)        # 0x332da <pv_ops+0x150>
> 	   !X86_FEATURE_XENPV:          mov    %cr3,%rax
> 	   +X86_FEATURE_ALWAYS:         callq  xen_read_cr3
> 

I initially had the alternatives following each other, but PeterZ suggested
having them on one line. I agree, though, that this requires a large window.

I can add an option to select the display mode: either alternatives following
each other (which would be the default) or side-by-side.
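A minimal sketch of how such a display option could work, assuming the
alternatives at one address are already disassembled into text. The names
(struct alt_variant, enum alt_display, alt_format, buf_append) are hypothetical
and not taken from objtool's disas.c. The "follow" mode prints one labelled
line per alternative, as David suggested, and the side-by-side mode keeps
everything on one line, as in Example 5.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: one alternative replacement for a default instruction. */
struct alt_variant {
	const char *feature;	/* e.g. "!X86_FEATURE_XENPV" */
	const char *insn;	/* disassembled instruction text */
};

enum alt_display {
	ALT_DISPLAY_FOLLOW,		/* one line per alternative (proposed default) */
	ALT_DISPLAY_SIDE_BY_SIDE,	/* all alternatives on a single line */
};

/*
 * Append s to buf without overflowing it. Invariant: *len <= size - 1 and
 * buf[*len] == '\0'. Output that does not fit is silently truncated.
 */
static void buf_append(char *buf, size_t size, size_t *len, const char *s)
{
	size_t n = strlen(s);
	size_t room = size - 1 - *len;	/* bytes left before the terminating NUL */

	if (n > room)
		n = room;
	memcpy(buf + *len, s, n);
	*len += n;
	buf[*len] = '\0';
}

/* Format the default instruction followed by its alternatives into buf. */
static void alt_format(char *buf, size_t size, enum alt_display mode,
		       const char *def_insn,
		       const struct alt_variant *alts, size_t nr_alts)
{
	size_t len = 0;
	size_t i;

	assert(size > 0);
	buf[0] = '\0';
	buf_append(buf, size, &len, def_insn);

	for (i = 0; i < nr_alts; i++) {
		if (mode == ALT_DISPLAY_FOLLOW) {
			/* New line per alternative, labelled by its feature. */
			buf_append(buf, size, &len, "\n\t");
			buf_append(buf, size, &len, alts[i].feature);
			buf_append(buf, size, &len, ": ");
			buf_append(buf, size, &len, alts[i].insn);
		} else {
			/* Same line, " | " separated, feature in parentheses. */
			buf_append(buf, size, &len, " | ");
			buf_append(buf, size, &len, alts[i].insn);
			buf_append(buf, size, &len, "  (");
			buf_append(buf, size, &len, alts[i].feature);
			buf_append(buf, size, &len, ")");
		}
	}
}
```

Keeping the formatting in one function with a mode switch means the option
would only change how lines are emitted, not how the alternatives are decoded.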

Thanks,

alex.