[PATCH v2 01/11] x86: kcov: disable instrumentation of arch/x86/kernel/tsc.c

Alexander Potapenko posted 11 patches 3 months, 2 weeks ago
There is a newer version of this series
Posted by Alexander Potapenko 3 months, 2 weeks ago
sched_clock() appears to be called from interrupts, producing spurious
coverage, as reported by CONFIG_KCOV_SELFTEST:

  RIP: 0010:__sanitizer_cov_trace_pc_guard+0x66/0xe0 kernel/kcov.c:288
  ...
   fault_in_kernel_space+0x17/0x70 arch/x86/mm/fault.c:1119
   handle_page_fault arch/x86/mm/fault.c:1477
   exc_page_fault+0x56/0x110 arch/x86/mm/fault.c:1538
   asm_exc_page_fault+0x26/0x30 ./arch/x86/include/asm/idtentry.h:623
  RIP: 0010:__sanitizer_cov_trace_pc_guard+0x66/0xe0 kernel/kcov.c:288
  ...
   sched_clock+0x12/0x70 arch/x86/kernel/tsc.c:284
   __lock_pin_lock kernel/locking/lockdep.c:5628
   lock_pin_lock+0xd7/0x180 kernel/locking/lockdep.c:5959
   rq_pin_lock kernel/sched/sched.h:1761
   rq_lock kernel/sched/sched.h:1838
   __schedule+0x3a8/0x4b70 kernel/sched/core.c:6691
   preempt_schedule_irq+0xbf/0x160 kernel/sched/core.c:7090
   irqentry_exit+0x6f/0x90 kernel/entry/common.c:354
   asm_sysvec_reschedule_ipi+0x1a/0x20 ./arch/x86/include/asm/idtentry.h:707
  RIP: 0010:selftest+0x26/0x60 kernel/kcov.c:1223
  ...
   kcov_init+0x81/0xa0 kernel/kcov.c:1252
   do_one_initcall+0x2e1/0x910
   do_initcall_level+0xff/0x160 init/main.c:1319
   do_initcalls+0x4a/0xa0 init/main.c:1335
   kernel_init_freeable+0x448/0x610 init/main.c:1567
   kernel_init+0x24/0x230 init/main.c:1457
   ret_from_fork+0x60/0x90 arch/x86/kernel/process.c:153
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
   </TASK>

Signed-off-by: Alexander Potapenko <glider@google.com>
---
 arch/x86/kernel/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 84cfa179802c3..c08626d348c85 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -43,6 +43,8 @@ KCOV_INSTRUMENT_dumpstack_$(BITS).o			:= n
 KCOV_INSTRUMENT_unwind_orc.o				:= n
 KCOV_INSTRUMENT_unwind_frame.o				:= n
 KCOV_INSTRUMENT_unwind_guess.o				:= n
+# Avoid instrumenting code that produces spurious coverage in interrupts.
+KCOV_INSTRUMENT_tsc.o					:= n
 
 CFLAGS_head32.o := -fno-stack-protector
 CFLAGS_head64.o := -fno-stack-protector
-- 
2.50.0.727.gbf7dc18ff4-goog
Re: [PATCH v2 01/11] x86: kcov: disable instrumentation of arch/x86/kernel/tsc.c
Posted by Peter Zijlstra 3 months, 1 week ago
On Thu, Jun 26, 2025 at 03:41:48PM +0200, Alexander Potapenko wrote:
> sched_clock() appears to be called from interrupts, producing spurious
> coverage, as reported by CONFIG_KCOV_SELFTEST:

NMI context even. But I'm not sure how this leads to problems. What does
spurious coverage even mean?
Re: [PATCH v2 01/11] x86: kcov: disable instrumentation of arch/x86/kernel/tsc.c
Posted by Alexander Potapenko 3 months, 1 week ago
On Fri, Jun 27, 2025 at 9:59 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jun 26, 2025 at 03:41:48PM +0200, Alexander Potapenko wrote:
> > sched_clock() appears to be called from interrupts, producing spurious
> > coverage, as reported by CONFIG_KCOV_SELFTEST:
>
> NMI context even. But I'm not sure how this leads to problems. What does
> spurious coverage even mean?

This leads to KCOV collecting slightly different coverage when
executing the same syscall multiple times.
For syzkaller that means a higher chance of picking a less interesting
input, incorrectly assuming it produced some new coverage.

There's a similar discussion at
https://lore.kernel.org/all/20240619111936.GK31592@noisy.programming.kicks-ass.net/T/#u
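
To illustrate the effect described above, here is a toy model in C. It is
only a sketch and not syzkaller's actual logic: count_new_pcs() is a
hypothetical helper, and the PC values in the usage note are made up. It
counts how many PCs of a run are absent from the coverage already seen.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if pc is among the n entries of seen[]. */
static bool pc_seen(const unsigned long *seen, size_t n, unsigned long pc)
{
	for (size_t i = 0; i < n; i++) {
		if (seen[i] == pc)
			return true;
	}
	return false;
}

/*
 * Hypothetical helper: counts the entries of run[] that do not appear in
 * seen[]. A fuzzer that treats a non-zero result as "new coverage" will
 * keep the input that produced run[].
 */
static size_t count_new_pcs(const unsigned long *seen, size_t n_seen,
			    const unsigned long *run, size_t n_run)
{
	size_t fresh = 0;

	for (size_t i = 0; i < n_run; i++) {
		if (!pc_seen(seen, n_seen, run[i]))
			fresh++;
	}
	return fresh;
}
```

If the first run of a syscall records PCs {0x1000, 0x1010, 0x1020} and a
second run of the same syscall also records 0x9000 because an interrupt
reached instrumented code, count_new_pcs() returns 1 for the second run even
though the syscall itself did nothing new.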
Re: [PATCH v2 01/11] x86: kcov: disable instrumentation of arch/x86/kernel/tsc.c
Posted by Peter Zijlstra 3 months, 1 week ago
On Fri, Jun 27, 2025 at 12:51:47PM +0200, Alexander Potapenko wrote:
> On Fri, Jun 27, 2025 at 9:59 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Jun 26, 2025 at 03:41:48PM +0200, Alexander Potapenko wrote:
> > > sched_clock() appears to be called from interrupts, producing spurious
> > > coverage, as reported by CONFIG_KCOV_SELFTEST:
> >
> > NMI context even. But I'm not sure how this leads to problems. What does
> > spurious coverage even mean?
> 
> This leads to KCOV collecting slightly different coverage when
> executing the same syscall multiple times.
> For syzkaller that means a higher chance of picking a less interesting
> input, incorrectly assuming it produced some new coverage.
> 
> There's a similar discussion at
> https://lore.kernel.org/all/20240619111936.GK31592@noisy.programming.kicks-ass.net/T/#u

Clearly I'm not remembering any of that :-)

Anyway, looking at kcov again, all the __sanitizer_*() hooks seem to have
check_kcov_mode(), which in turn has something like:

 if (!in_task() ..)
   return false;

Which should be filtering out all these things, no? Is this filter
'broken'?
Re: [PATCH v2 01/11] x86: kcov: disable instrumentation of arch/x86/kernel/tsc.c
Posted by Alexander Potapenko 3 months, 1 week ago
> Anyway, looking at kcov again, all the __sanitizer_*() hooks seem to have
> check_kcov_mode(), which in turn has something like:
>
>  if (!in_task() ..)
>    return false;
>
> Which should be filtering out all these things, no? Is this filter
> 'broken'?

I think this is one of the cases where we are transitioning to the IRQ
context (so the coverage isn't really interesting for the fuzzer), but
still haven't bumped preempt_count.

In this particular case in_task() is 1, in_softirq_really() is 0, and
preempt_count() is 2.
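
To make those numbers concrete, below is a small userspace model of the
preempt_count bit layout. It is a sketch under assumptions: the field widths
are modelled on include/linux/preempt.h in recent non-PREEMPT_RT kernels and
may differ in your tree, and model_in_task() and model_preempt_depth() are
hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>

/*
 * Assumed field widths, modelled on include/linux/preempt.h
 * (recent kernels, non-PREEMPT_RT). Verify against your tree.
 */
#define PREEMPT_BITS	8
#define SOFTIRQ_BITS	8
#define HARDIRQ_BITS	4
#define NMI_BITS	4

#define PREEMPT_SHIFT	0
#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define NMI_SHIFT	(HARDIRQ_SHIFT + HARDIRQ_BITS)

#define FIELD_MASK(bits, shift)	(((1u << (bits)) - 1) << (shift))
#define PREEMPT_MASK	FIELD_MASK(PREEMPT_BITS, PREEMPT_SHIFT)
#define HARDIRQ_MASK	FIELD_MASK(HARDIRQ_BITS, HARDIRQ_SHIFT)
#define NMI_MASK	FIELD_MASK(NMI_BITS, NMI_SHIFT)

/* Lowest softirq bit: set while a softirq handler is being served. */
#define SOFTIRQ_OFFSET	(1u << SOFTIRQ_SHIFT)

/*
 * Hypothetical helper: applies the same kind of test as in_task() to a
 * plain integer, i.e. no NMI, hardirq or serving-softirq bits are set.
 */
static bool model_in_task(unsigned int pc)
{
	return !(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));
}

/* Hypothetical helper: preemption-disable nesting depth. */
static unsigned int model_preempt_depth(unsigned int pc)
{
	return pc & PREEMPT_MASK;
}
```

With a count of 2, model_in_task() returns true and model_preempt_depth()
returns 2: only the preemption-disable field is non-zero, so a filter based
on in_task() alone cannot tell this state apart from ordinary task context
with preemption disabled.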