[PATCH 1/2] x86: make profile_pc() use arch_stack_walk()

Qi Zheng posted 2 patches 2 years, 10 months ago
[PATCH 1/2] x86: make profile_pc() use arch_stack_walk()
Posted by Qi Zheng 2 years, 10 months ago
The profile_pc() try to get pc by doing a trick to read
the contents of the stack. This may cause false positives
for KASAN, like the following:

 BUG: KASAN: stack-out-of-bounds in profile_pc+0x5b/0x90
 Read of size 8 at addr ffff8881062a7a00 by task id/130040

 CPU: 1 PID: 130040 Comm: id Kdump: loaded Not tainted 5.15.93+ #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
 Call Trace:
  <IRQ>
  dump_stack_lvl+0x4c/0x64
  ? profile_pc+0x5b/0x90
  print_address_description.constprop.8.cold.12+0x10/0x36b
  ? profile_pc+0x5b/0x90
  ? profile_pc+0x5b/0x90
  ? tick_sched_handle.isra.20+0xa0/0xa0
  kasan_report.cold.13+0x7f/0x11b
  ? scheduler_tick+0x30/0x150
  ? profile_pc+0x5b/0x90
  ? _raw_spin_lock+0x82/0xd0
  profile_pc+0x5b/0x90
  profile_tick+0x78/0xb0
  ? tick_sched_handle.isra.20+0x83/0xa0
  tick_sched_timer+0x94/0xb0
  ? enqueue_hrtimer+0x100/0x100
  ? _raw_write_lock_irqsave+0xd0/0xd0
  ? recalibrate_cpu_khz+0x10/0x10
  ? ktime_get_update_offsets_now+0x148/0x1a0
  hrtimer_interrupt+0x1b9/0x390
  ? sched_ttwu_pending+0xf1/0x150
  __sysvec_apic_timer_interrupt+0x7c/0x150
  sysvec_apic_timer_interrupt+0x61/0x80
  </IRQ>
  <TASK>
  asm_sysvec_apic_timer_interrupt+0x16/0x20
 RIP: 0010:_raw_spin_lock+0x82/0xd0

The KASAN checking is already disabled in the ORC unwinder,
so let's make profile_pc() use arch_stack_walk() to get pc,
which fixes the above BUG and also avoids open-coding of
unwind logic.

Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 arch/x86/kernel/time.c | 36 +++++++++++++++++-------------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
index e42faa792c07..eee884306d36 100644
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -17,6 +17,7 @@
 #include <linux/i8253.h>
 #include <linux/time.h>
 #include <linux/export.h>
+#include <linux/stacktrace.h>
 
 #include <asm/vsyscall.h>
 #include <asm/x86_init.h>
@@ -25,27 +26,24 @@
 #include <asm/hpet.h>
 #include <asm/time.h>
 
+static bool profile_pc_cb(void *arg, unsigned long pc)
+{
+	unsigned long *prof_pc = arg;
+
+	if (in_lock_functions(pc))
+		return true;
+
+	*prof_pc = pc;
+	return false;
+}
+
 unsigned long profile_pc(struct pt_regs *regs)
 {
-	unsigned long pc = instruction_pointer(regs);
-
-	if (!user_mode(regs) && in_lock_functions(pc)) {
-#ifdef CONFIG_FRAME_POINTER
-		return *(unsigned long *)(regs->bp + sizeof(long));
-#else
-		unsigned long *sp = (unsigned long *)regs->sp;
-		/*
-		 * Return address is either directly at stack pointer
-		 * or above a saved flags. Eflags has bits 22-31 zero,
-		 * kernel addresses don't.
-		 */
-		if (sp[0] >> 22)
-			return sp[0];
-		if (sp[1] >> 22)
-			return sp[1];
-#endif
-	}
-	return pc;
+	unsigned long prof_pc = 0;
+
+	arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);
+
+	return prof_pc;
 }
 EXPORT_SYMBOL(profile_pc);
 
-- 
2.20.1
Re: [PATCH 1/2] x86: make profile_pc() use arch_stack_walk()
Posted by Josh Poimboeuf 2 years, 10 months ago
On Thu, Mar 30, 2023 at 04:15:51PM +0800, Qi Zheng wrote:
> The profile_pc() try to get pc by doing a trick to read
> the contents of the stack. This may cause false positives
> for KASAN, like the following:
> 
>  BUG: KASAN: stack-out-of-bounds in profile_pc+0x5b/0x90
>  Read of size 8 at addr ffff8881062a7a00 by task id/130040

I don't think this was actually a false positive.  The !FRAME_POINTER
code in profile_pc() has been badly broken for many years.

BTW, there was a similar patch here:

  https://lore.kernel.org/lkml/20230224021858.120078-1-chenzhongjin@huawei.com/

I thought CONFIG_PROFILING was obsolete but Andi said previously he
wants to keep it for at least boot-time profiling.

Andi did suggest removing the lock profiling hacks, which means all the
profile_pc() implementations can just be removed in favor of the generic
instruction_pointer().

-- 
Josh
Re: [PATCH 1/2] x86: make profile_pc() use arch_stack_walk()
Posted by Qi Zheng 2 years, 10 months ago

On 2023/4/8 12:56, Josh Poimboeuf wrote:
> On Thu, Mar 30, 2023 at 04:15:51PM +0800, Qi Zheng wrote:
>> The profile_pc() try to get pc by doing a trick to read
>> the contents of the stack. This may cause false positives
>> for KASAN, like the following:
>>
>>   BUG: KASAN: stack-out-of-bounds in profile_pc+0x5b/0x90
>>   Read of size 8 at addr ffff8881062a7a00 by task id/130040
> 
> I don't think this was actually a false positive.  The !FRAME_POINTER
> code in profile_pc() has been badly broken for many years.
> 
> BTW, there was a similar patch here:
> 
>    https://lore.kernel.org/lkml/20230224021858.120078-1-chenzhongjin@huawei.com/

Ah.

> 
> I thought CONFIG_PROFILING was obsolete but Andi said previously he
> wants to keep it for at least boot-time profiling.
> 
> Andi did suggest removing the lock profiling hacks, which means all the
> profile_pc() implementations can just be removed in favor of the generic
> instruction_pointer().

That's great, and I see Chen Zhongjin will send a new patch for this,
let him continue this work. :)

> 

-- 
Thanks,
Qi