[PATCH V3 0/7] x86/entry: Convert error_entry() to C code
Posted by Lai Jiangshan 3 years, 10 months ago
From: Lai Jiangshan <jiangshan.ljs@antgroup.com>

Add some C equivalent functions of the ASM macros and implement the whole
error_entry() as C code.

The patches are picked and re-made from the huge patchset
https://lore.kernel.org/lkml/20211126101209.8613-1-jiangshanlai@gmail.com/
which converts a large chunk of ASM code to C code.

The C version generally has better readability and is easier to
update and improve.

This smaller patchset converts error_entry() only.
The equivalent ASM macros are not removed because they are still used by
the IST exceptions.

No functional change is intended, and the comments are copied over as well.

The compiler generates very similar code for the C code and the original
ASM code, apart from minor differences.

The compiler uses tail-call optimization when calling sync_regs(): it
emits "JMP sync_regs" while the ASM code uses "CALL+RET".

The compiler generates "AND $0xe7,%ah" (3 bytes) for the code
"cr3 = user_cr3 & ~PTI_USER_PGTABLE_AND_PCID_MASK" while the ASM code in
SWITCH_TO_KERNEL_CR3() results in "AND $0xffffffffffffe7ff,%rax" (6 bytes).

The compiler generates lengthier code for "cr3 |= X86_CR3_PCID_NOFLUSH"
because it uses "MOVABS+OR" (13 bytes) rather than a single
"BTS" (5 bytes).

Code generated via ALTERNATIVE and static_cpu_has() also differs,
depending on which alternative instructions ALTERNATIVE selects.

[V2]: https://lore.kernel.org/lkml/20220516131739.521817-1-jiangshanlai@gmail.com/
[V1]: https://lore.kernel.org/lkml/20220511072747.3960-1-jiangshanlai@gmail.com/

Changed from V2:
	Fix conflict in arch/x86/include/asm/proto.h in patch7

Changed from V1:
	remove unneeded cleanup in patch2

Changed from the old huge patchset:
	squash some patches

Lai Jiangshan (7):
  x86/entry: Introduce __entry_text for entry code written in C
  x86/entry: Move PTI_USER_* to arch/x86/include/asm/processor-flags.h
  x86: Mark __native_read_cr3() & native_write_cr3() as __always_inline
  x86/entry: Add arch/x86/entry/entry64.c for C entry code
  x86/entry: Add the C version of SWITCH_TO_KERNEL_CR3 as
    switch_to_kernel_cr3()
  x86/traps: Add fence_swapgs_{user,kernel}_entry() and
    user_entry_swapgs_and_fence()
  x86/entry: Implement the whole error_entry() as C code

 arch/x86/entry/Makefile                |   3 +-
 arch/x86/entry/calling.h               |  10 --
 arch/x86/entry/entry64.c               | 137 +++++++++++++++++++++++++
 arch/x86/entry/entry_64.S              |  85 +--------------
 arch/x86/include/asm/idtentry.h        |   3 +
 arch/x86/include/asm/processor-flags.h |  15 +++
 arch/x86/include/asm/proto.h           |   1 +
 arch/x86/include/asm/special_insns.h   |   4 +-
 arch/x86/include/asm/traps.h           |   1 +
 arch/x86/kernel/traps.c                |   2 -
 include/linux/compiler_types.h         |   8 +-
 11 files changed, 169 insertions(+), 100 deletions(-)
 create mode 100644 arch/x86/entry/entry64.c

-- 
2.19.1.6.gb485710b
Re: [PATCH V3 0/7] x86/entry: Convert error_entry() to C code
Posted by Vegard Nossum 3 years, 10 months ago
On 6/6/22 16:45, Lai Jiangshan wrote:
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
> 
> Add some C equivalent functions of the ASM macros and implement the whole
> error_entry() as C code.

Hi,

I did some testing of your patches (on top of mainline commit
34f4335c16a5) and I see these two KASAN reports very occasionally during
boot:

1)

Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
==================================================================
BUG: KASAN: wild-memory-access in rcu_nmi_enter+0x6e/0xf0
Read of size 4 at addr ff11000034e38b10 by task swapper/0/0

CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc1-14003-ga9afe081e27d
#1787
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x57/0x7d
 ? rcu_nmi_enter+0x6e/0xf0
 print_report.cold+0x188/0x667
 ? rcu_nmi_enter+0x6e/0xf0
 kasan_report+0x8a/0x1b0
 ? rcu_nmi_enter+0x6e/0xf0
 kasan_check_range+0x14d/0x1d0
 rcu_nmi_enter+0x6e/0xf0
 irqentry_enter+0x33/0x50
 common_interrupt+0x15/0xc0
 asm_common_interrupt+0x2a/0x40
RIP: 0010:identify_cpu+0x3db/0x19f0
Code: b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 d3 14
00 00 41 c7 84 24 80 00 00 00 00 00 00 00 31 c0 89 c1 0f a2 <41> 89 54
24 7c 4c 89 e2 41 89 44 24 1c 48 c1 ea 03 48 b8 00 00 00
RSP: 0000:ffffffff94807da0 EFLAGS: 00000246
RAX: 000000000000001b RBX: 00000000756e6547 RCX: 000000006c65746e
RDX: 0000000049656e69 RSI: 0000000000000000 RDI: ffffffff9509e6e0
RBP: ffffffff9509e675 R08: ffffffff9509e67c R09: 0000000000000000
R10: ffffffff9509e668 R11: fffffbfff2a13cce R12: ffffffff9509e660
R13: ffffffff9509e6e8 R14: ffffffff9509e678 R15: ffffffff9509e674
 identify_boot_cpu+0xd/0xb5
 check_bugs+0x82/0x15b5
 ? l1tf_cmdline+0x10c/0x10c
 ? do_raw_spin_unlock+0x4f/0x250
 ? _raw_spin_unlock+0x24/0x40
 ? poking_init+0x350/0x37f
 ? parse_direct_gbpages_off+0xd/0xd
 ? rcu_read_lock_bh_held+0xc0/0xc0
 start_kernel+0x38c/0x3bb
 secondary_startup_64_no_verify+0xe0/0xeb
 </TASK>
==================================================================
Disabling lock debugging due to kernel taint
x86/cpu: User Mode Instruction Prevention (UMIP) activated
numa_add_cpu cpu 0 node 0: mask now 0

2)

x86/cpu: User Mode Instruction Prevention (UMIP) activated
numa_add_cpu cpu 0 node 0: mask now 0
Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
Spectre V1 : Vulnerable: __user pointer sanitization and usercopy
barriers only; no swapgs barriers
Spectre V2 : Mitigation: Enhanced IBRS
Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context
switch
Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction
Barrier
Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled
TAA: Mitigation: TSX disabled
==================================================================
BUG: KASAN: out-of-bounds in rcu_nmi_enter+0x6e/0xf0
Read of size 4 at addr ff11000034e38b10 by task swapper/0/0

CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc1-14003-ga9afe081e27d
#1787
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x57/0x7d
 print_report.cold+0x9c/0x667
 ? rcu_nmi_enter+0x6e/0xf0
 kasan_report+0x8a/0x1b0
 ? rcu_nmi_enter+0x6e/0xf0
 kasan_check_range+0x14d/0x1d0
 rcu_nmi_enter+0x6e/0xf0
 irqentry_enter+0x33/0x50
 common_interrupt+0x15/0xc0
 asm_common_interrupt+0x2a/0x40
RIP: 0010:text_poke_early+0x97/0x9e
Code: 8e 00 f8 fd 48 85 db 74 01 fb eb 05 0f 1f 00 eb 16 8c d0 50 54 48
83 04 24 08 9c 8c c8 50 68 13 ec 12 b1 48 cf eb 03 0f 01 e8 <5b> 5d 41
5c 41 5d c3 41 57 b8 ff ff 37 00 41 56 48 c1 e0 2a 41 55
RSP: 0000:ffffffffb0007c10 EFLAGS: 00000292
RAX: 0000000000000010 RBX: 0000000000000200 RCX: 1ffffffff61137c9
RDX: 0000000000000000 RSI: ffffffffaf69f720 RDI: ffffffffaf7f9ce0
RBP: ffffffffacc6902c R08: 0000000000000001 R09: 0000000000000001
R10: ffffffffacc69030 R11: fffffbfff598d206 R12: ffffffffb0007c80
R13: 0000000000000005 R14: ffffffffb0007c85 R15: ffffffffb0007c80
 ? kasan_check_range+0x1c/0x1d0
 ? kasan_check_range+0x20/0x1d0
 ? kasan_check_range+0x1c/0x1d0
 apply_alternatives+0x79e/0x81d
 ? text_poke_early+0x9e/0x9e
 ? xas_clear_mark+0x1df/0x270
 ? apply_retpolines+0x4e4/0x535
 ? apply_alternatives+0x81d/0x81d
 ? delay_halt+0x31/0x60
 ? delay_halt+0x40/0x60
 ? delay_halt+0x36/0x60
 ? optimize_nops+0x225/0x225
 alternative_instructions+0x43/0x11a
 check_bugs+0x14fb/0x15b5
 ? l1tf_cmdline+0x10c/0x10c
 ? do_raw_spin_unlock+0x4f/0x250
 ? quirk_disable_msi.part.0+0x72/0x73
 ? poking_init+0x350/0x37f
 ? parse_direct_gbpages_off+0xd/0xd
 ? rcu_read_lock_bh_held+0xc0/0xc0
 start_kernel+0x38c/0x3bb
 secondary_startup_64_no_verify+0xe0/0xeb
 </TASK>

The buggy address belongs to the physical page:
page:ffd4000000d38e00 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x34e38
flags: 0x100000000001000(reserved|node=0|zone=1)
raw: 0100000000001000 ffd4000000d38e08 ffd4000000d38e08 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ff11000034e38a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ff11000034e38a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ff11000034e38b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
                            ^
 ff11000034e38b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ff11000034e38c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint
debug: unmapping init [mem 0xffffffffb12b8000-0xffffffffb12c2fff]
smpboot: CPU0: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (family:
0x6, model: 0x6a, stepping: 0x6)

I'll try to get more details (boot options, configs, full logs, etc.);
I just wanted to give a heads up in case anything sticks out.


Vegard
Re: [PATCH V3 0/7] x86/entry: Convert error_entry() to C code
Posted by Vegard Nossum 3 years, 10 months ago
On 6/9/22 11:11, Vegard Nossum wrote:
> On 6/6/22 16:45, Lai Jiangshan wrote:
>> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>>
>> Add some C equivalent functions of the ASM macros and implement the whole
>> error_entry() as C code.
> 
> Hi,
> 
> I did some testing of your patches (on top of mainline commit
> 34f4335c16a5) and I see these two KASAN reports very occasionally during
> boot:
> 
> 1)
> 
> Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
> ==================================================================
> BUG: KASAN: wild-memory-access in rcu_nmi_enter+0x6e/0xf0

So this one I get without your patches as well. It's only about 1% of
boots, though. Let me try to bisect this and start a new thread.

> 2)
>
> BUG: KASAN: out-of-bounds in rcu_nmi_enter+0x6e/0xf0
> Read of size 4 at addr ff11000034e38b10 by task swapper/0/0

I haven't seen this without your patches, although it's the exact same
callsite, so I assume it must be related to the first problem.


Vegard
Re: [PATCH V3 0/7] x86/entry: Convert error_entry() to C code
Posted by Lai Jiangshan 3 years, 10 months ago
On Mon, Jun 6, 2022 at 10:44 PM Lai Jiangshan <jiangshanlai@gmail.com> wrote:
>
> From: Lai Jiangshan <jiangshan.ljs@antgroup.com>
>
> Add some C equivalent functions of the ASM macros and implement the whole
> error_entry() as C code.
>
> The patches are picked and re-made from the huge patchset
> https://lore.kernel.org/lkml/20211126101209.8613-1-jiangshanlai@gmail.com/
> which converts a large chunk of ASM code to C code.
>
> The C version generally has better readability and is easier to
> update and improve.
>
> This smaller patchset converts error_entry() only.
> The equivalent ASM macros are not removed because they are still used by
> the IST exceptions.
>
> No functional change is intended, and the comments are copied over as well.
>
> The compiler generates very similar code for the C code and the original
> ASM code, apart from minor differences.
>
> The compiler uses tail-call optimization when calling sync_regs(): it
> emits "JMP sync_regs" while the ASM code uses "CALL+RET".
>
> The compiler generates "AND $0xe7,%ah" (3 bytes) for the code
> "cr3 = user_cr3 & ~PTI_USER_PGTABLE_AND_PCID_MASK" while the ASM code in
> SWITCH_TO_KERNEL_CR3() results in "AND $0xffffffffffffe7ff,%rax" (6 bytes).
>
> The compiler generates lengthier code for "cr3 |= X86_CR3_PCID_NOFLUSH"
> because it uses "MOVABS+OR" (13 bytes) rather than a single
> "BTS" (5 bytes).
>
> Code generated via ALTERNATIVE and static_cpu_has() also differs,
> depending on which alternative instructions ALTERNATIVE selects.
>
> [V2]: https://lore.kernel.org/lkml/20220516131739.521817-1-jiangshanlai@gmail.com/
> [V1]: https://lore.kernel.org/lkml/20220511072747.3960-1-jiangshanlai@gmail.com/
>
> Changed from V2:
>         Fix conflict in arch/x86/include/asm/proto.h in patch7
>
> Changed from V1:
>         remove unneeded cleanup in patch2
>
> Changed from the old huge patchset:
>         squash some patches
>

Hello, ALL,

Ping

Thanks,
Lai