[PATCH v3 0/4] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT

Yunseong Kim posted 4 patches 2 months ago
kernel/kcov.c | 248 +++++++++++++++++++++++++++-----------------------
1 file changed, 134 insertions(+), 114 deletions(-)
Posted by Yunseong Kim 2 months ago
This patch series fixes a "sleeping function called from invalid context"
bug that occurs when fuzzing USB with syzkaller on a PREEMPT_RT kernel.

The regression was introduced by the interaction of two separate patches:
one that made kcov's internal locks sleep on PREEMPT_RT for better latency
(d5d2c51f1e5f), and another that wrapped a kcov call in the USB softirq
path with local_irq_save() to prevent re-entrancy (f85d39dd7ed8).
This combination resulted in an attempt to acquire a sleeping lock from
within an atomic context, causing a kernel BUG.
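To make the failure mode concrete, here is a small userspace model of the check that fires. It is not kernel code: every name (sim_local_irq_save(), sim_might_sleep(), sim_sleeping_lock(), sim_raw_lock(), sim_softirq_path()) is hypothetical. It only assumes what the text states: on PREEMPT_RT a spinlock_t or local_lock_t may sleep, a raw spinlock never does, and sleeping while interrupts are disabled is reported as a BUG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model state: nonzero while "interrupts" are disabled. */
static int sim_irqs_disabled;
/* How many times a possible sleep happened in atomic context. */
static int sim_bug_count;

/* Model of local_irq_save(): remember the previous state, then disable. */
static int sim_local_irq_save(void)
{
    int flags = sim_irqs_disabled;
    sim_irqs_disabled = 1;
    return flags;
}

/* Model of local_irq_restore(): return to exactly the saved state. */
static void sim_local_irq_restore(int flags)
{
    sim_irqs_disabled = flags;
}

/* Model of might_sleep(): report a bug if called while atomic. */
static void sim_might_sleep(const char *who)
{
    if (sim_irqs_disabled) {
        sim_bug_count++;
        printf("BUG: sleeping function called from invalid context: %s\n", who);
    }
}

/* A lock that may sleep (models spinlock_t/local_lock_t on PREEMPT_RT). */
static void sim_sleeping_lock(void) { sim_might_sleep("sleeping lock"); }
static void sim_sleeping_unlock(void) { }

/* A lock that never sleeps (models raw_spinlock_t). */
static void sim_raw_lock(void) { }
static void sim_raw_unlock(void) { }

/* The USB softirq path: the kcov call is wrapped in an irq-off region. */
static void sim_softirq_path(bool use_raw_lock)
{
    int flags = sim_local_irq_save();
    if (use_raw_lock) {
        sim_raw_lock();
        sim_raw_unlock();
    } else {
        sim_sleeping_lock();
        sim_sleeping_unlock();
    }
    sim_local_irq_restore(flags);
}
```

In this model, taking the sleeping lock outside the irq-off region is fine, sim_softirq_path(false) reports one BUG, and sim_softirq_path(true) reports none, which mirrors the effect the series aims for by switching to non-sleeping primitives.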

To resolve this, this series makes the kcov remote path fully compatible
with atomic contexts by converting all its internal locking primitives to
non-sleeping variants. This approach is more robust than conditional
compilation as it creates a single, unified codebase that works correctly
on both RT and non-RT kernels.

The series is structured as follows:

Patch 1 converts the global kcov locks (kcov->lock and kcov_remote_lock)
to use the non-sleeping raw_spinlock_t.

Patch 2 replaces the per-CPU local_lock_t, which becomes a sleeping lock on
PREEMPT_RT, with the original local_irq_save/restore primitives, making the
per-CPU protection non-sleeping as well.

Patches 3 and 4 are preparatory refactoring. They move the memory
allocation for remote handles out of the locked sections in the
KCOV_REMOTE_ENABLE ioctl path. This is a prerequisite for safely
using raw_spinlock_t, which forbids anything that may sleep, such as
kmalloc() with GFP_KERNEL, inside its critical section.
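The allocate-first, publish-under-lock pattern can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kcov code: struct remote_node, remote_enable() and remote_add_locked() are hypothetical stand-ins for struct kcov_remote, the KCOV_REMOTE_ENABLE path and the reworked kcov_remote_add(); a linked list replaces the kernel's hash table; and a POSIX pthread spin lock (available on Linux with glibc; older glibc needs -pthread at link time) plays the role of the raw spinlock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kcov_remote: one node per handle. */
struct remote_node {
    uint64_t handle;
    struct remote_node *next;
};

static pthread_spinlock_t remote_lock;   /* stands in for kcov_remote_lock */
static struct remote_node *remote_list;  /* stands in for the handle table */

static void remote_init(void)
{
    pthread_spin_init(&remote_lock, PTHREAD_PROCESS_PRIVATE);
}

/* Takes an already allocated node; never allocates. Caller holds the lock. */
static void remote_add_locked(struct remote_node *node, uint64_t handle)
{
    node->handle = handle;
    node->next = remote_list;
    remote_list = node;
}

/* Allocate every node first (allowed to block), then insert under the lock. */
static int remote_enable(const uint64_t *handles, size_t n)
{
    struct remote_node **nodes = calloc(n ? n : 1, sizeof(*nodes));
    size_t i;

    if (!nodes)
        return -1;
    for (i = 0; i < n; i++) {
        nodes[i] = malloc(sizeof(**nodes));
        if (!nodes[i]) {
            while (i--)
                free(nodes[i]);
            free(nodes);
            return -1;          /* nothing was published under the lock */
        }
    }

    pthread_spin_lock(&remote_lock);
    for (i = 0; i < n; i++)
        remote_add_locked(nodes[i], handles[i]);   /* no allocation here */
    pthread_spin_unlock(&remote_lock);

    free(nodes);                /* frees the array only; nodes live in the list */
    return 0;
}

static size_t remote_count(void)
{
    size_t count = 0;

    pthread_spin_lock(&remote_lock);
    for (struct remote_node *p = remote_list; p; p = p->next)
        count++;
    pthread_spin_unlock(&remote_lock);
    return count;
}
```

The design choice is that an allocation failure is handled entirely before the lock is taken, so the critical section only links pre-built nodes and cannot fail halfway or block.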

With these changes, I have been able to run syzkaller fuzzing on a
PREEMPT_RT kernel for a full day with no issues reported.

Reproduction details are available here:
Link: https://lore.kernel.org/all/20250725201400.1078395-2-ysk@kzalloc.com/t/#u

Signed-off-by: Yunseong Kim <ysk@kzalloc.com>
---

Changes from v2:

	1. Updated kcov_remote_reset() to use raw_spin_lock_irqsave() /
	   raw_spin_unlock_irqrestore() instead of raw_spin_lock() /
	   raw_spin_unlock(), following the interrupt-disabling pattern
	   used by the original functions that take kcov_remote_lock.
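One general reason to pair a spinlock with irqsave is sketched below under a simplified single-CPU model with hypothetical names (lock_irqsave(), unlock_irqrestore(), raise_irq(), task_path()); none of this is kernel API. If a lock that may also be taken from interrupt or softirq context is held with interrupts enabled, an interrupt on the same CPU that tries to take it spins forever, because the holder cannot resume until the handler returns. The model counts that case as a deadlock instead of hanging, and shows the irqsave variant deferring the interrupt until the lock is dropped and the saved state is restored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; nothing here is kernel API. */
static bool lock_held;      /* models kcov_remote_lock being held       */
static bool irqs_off;       /* models the CPU's interrupts-disabled bit */
static bool irq_pending;    /* an interrupt that arrived while masked   */
static int deadlocks;       /* handler found the lock held by this CPU  */
static int irq_runs;        /* completed handler runs                   */

/* Interrupt handler body: it also needs the lock. */
static void irq_handler(void)
{
    if (lock_held) {
        /* A real spinlock would spin forever: the holder cannot run. */
        deadlocks++;
        return;
    }
    lock_held = true;
    irq_runs++;
    lock_held = false;
}

/* An interrupt arrives: it runs now, or is deferred if masked. */
static void raise_irq(void)
{
    if (irqs_off)
        irq_pending = true;
    else
        irq_handler();
}

/* Models raw_spin_lock_irqsave(): save state, disable, then lock. */
static bool lock_irqsave(void)
{
    bool flags = irqs_off;
    irqs_off = true;
    lock_held = true;
    return flags;
}

/* Models raw_spin_unlock_irqrestore(): unlock, restore the saved state. */
static void unlock_irqrestore(bool flags)
{
    lock_held = false;
    irqs_off = flags;
    if (!irqs_off && irq_pending) {
        irq_pending = false;
        irq_handler();      /* the deferred interrupt runs now */
    }
}

/* Task context holds the lock while an interrupt fires. */
static void task_path(bool use_irqsave)
{
    if (use_irqsave) {
        bool flags = lock_irqsave();
        raise_irq();
        unlock_irqrestore(flags);
    } else {
        lock_held = true;   /* models plain raw_spin_lock()   */
        raise_irq();
        lock_held = false;  /* models plain raw_spin_unlock() */
    }
}
```

Restoring the saved flags rather than unconditionally re-enabling interrupts keeps the pattern correct when the caller already had interrupts disabled.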

Changes from v1:

	1. Dropped the #ifdef-based PREEMPT_RT branching.

	2. Convert kcov->lock and kcov_remote_lock from spinlock_t to
	   raw_spinlock_t. This ensures they remain true, non-sleeping
	   spinlocks even on PREEMPT_RT kernels.

	3. Remove the local_lock_t protection for kcov_percpu_data in
	   kcov_remote_start/stop(). Since local_lock_t can also sleep under
	   RT, and the required protection is against local interrupts when
	   accessing per-CPU data, it is replaced with explicit
	   local_irq_save/restore().

	4. Refactor the KCOV_REMOTE_ENABLE path to move memory allocations
	   out of the critical section.

	5. Modify the ioctl handling logic to use these pre-allocated
	   structures within the critical section. kcov_remote_add() is
	   modified to accept a pre-allocated structure instead of allocating
	   one internally. All necessary struct kcov_remote structures are now
	   pre-allocated individually in kcov_ioctl() using GFP_KERNEL
	   (allowing sleep) before acquiring the raw spinlocks.

Changes from v0:

	1. On PREEMPT_RT, separated the handling of
	   kcov_remote_start_usb_softirq() and kcov_remote_stop_usb_softirq()
	   to allow sleeping when entering kcov_remote_start_usb() /
	   kcov_remote_stop().

Yunseong Kim (4):
  kcov: Use raw_spinlock_t for kcov->lock and kcov_remote_lock
  kcov: Replace per-CPU local_lock with local_irq_save/restore
  kcov: Separate KCOV_REMOTE_ENABLE ioctl helper function
  kcov: move remote handle allocation outside raw spinlock

 kernel/kcov.c | 248 +++++++++++++++++++++++++++-----------------------
 1 file changed, 134 insertions(+), 114 deletions(-)

base-commit: 186f3edfdd41f2ae87fc40a9ccba52a3bf930994

-- 
2.50.0
Re: [PATCH v3 0/4] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT
Posted by Steven Rostedt 2 months ago
On Sun,  3 Aug 2025 07:20:41 +0000
Yunseong Kim <ysk@kzalloc.com> wrote:

> This patch series resolves a sleeping function called from invalid context
> bug that occurs when fuzzing USB with syzkaller on a PREEMPT_RT kernel.
> 
> The regression was introduced by the interaction of two separate patches:
> one that made kcov's internal locks sleep on PREEMPT_RT for better latency

Just so I fully understand this change. It is basically reverting the
"better latency" changes? That is, with KCOV anyone running with PREEMPT_RT
can expect non deterministic latency behavior?

This should be fully documented. I assume this will not be a problem as
kcov is more for debugging and should not be enabled in production.

-- Steve
Re: [PATCH v3 0/4] kcov, usb: Fix invalid context sleep in softirq path on PREEMPT_RT
Posted by Yunseong Kim 2 months ago
Hi Steve,

You're absolutely right to ask for clarification, and I now realize that
I didn’t explain the background clearly enough in my cover letter.

On 8/5/25 1:24 AM, Steven Rostedt wrote:
> On Sun,  3 Aug 2025 07:20:41 +0000
> Yunseong Kim <ysk@kzalloc.com> wrote:
> 
>> This patch series resolves a sleeping function called from invalid context
>> bug that occurs when fuzzing USB with syzkaller on a PREEMPT_RT kernel.
>>
>> The regression was introduced by the interaction of two separate patches:
>> one that made kcov's internal locks sleep on PREEMPT_RT for better latency
> 
> Just so I fully understand this change. It is basically reverting the
> "better latency" changes? That is, with KCOV anyone running with PREEMPT_RT
> can expect non deterministic latency behavior?

The regression results from the interaction of two changes, and in my original
description I inaccurately characterized one of them as being
"for better latency." That was misleading.

The first change, commit d5d2c51, replaced spin_lock_irqsave() with
local_lock_irqsave() in KCOV to ensure compatibility with PREEMPT_RT. This
avoided taking a potentially sleeping lock with interrupts disabled.
At the time, as Sebastian noted:

 "There is no compelling reason to change the lock type to raw_spin_lock_t...
  Changing it would require to move memory allocation and deallocation outside
  of the locked section."

However, the situation changed after commit 8fea0c8 converted the USB
HCD tasklet to a BH workqueue. As a result, usb_giveback_urb_bh() began running
with interrupts enabled, and the KCOV remote coverage collection section in
this path became re-entrant. To prevent nested coverage sections, which KCOV
doesn't support, commit f85d39d updated kcov_remote_start_usb_softirq() to
explicitly disable interrupts during coverage collection.
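The re-entrancy problem and the irq-off fix can be imitated in userspace, with a POSIX signal standing in for an interrupt and sigprocmask() standing in for local_irq_save()/local_irq_restore(). This is only an analogy under that assumption, not how kcov is implemented, and protected_section(), on_sigusr1() and the event log are hypothetical. A "interrupt" that arrives in the middle of the section is held pending until the mask is restored, so the section is never nested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Event log: 'S' section start, 'E' section end, 'H' handler ran. */
static volatile sig_atomic_t log_len;
static volatile char event_log[16];

static void record(char c)
{
    if (log_len < (sig_atomic_t)sizeof(event_log))
        event_log[log_len++] = c;
}

/* The "interrupt": a handler that would otherwise run mid-section. */
static void on_sigusr1(int sig)
{
    (void)sig;
    record('H');
}

/* A section that must not be interrupted (single-threaded process). */
static void protected_section(void)
{
    sigset_t block, old;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);   /* analog of local_irq_save()    */
    record('S');
    raise(SIGUSR1);                         /* arrives mid-section: pending  */
    record('E');
    sigprocmask(SIG_SETMASK, &old, NULL);   /* analog of local_irq_restore() */
}

static void install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
}
```

After install_handler() and protected_section(), the log reads S, E, H: POSIX requires a pending signal that becomes unblocked to be delivered before sigprocmask() returns, so the handler runs only after the section has ended.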

This combination, acquiring a local_lock (which can sleep on RT) inside a
local_irq_save() region, inadvertently created a scenario where a sleeping lock
was acquired in atomic context, triggering a kernel BUG on PREEMPT_RT.

So while the original KCOV locking change didn't require raw spinlocks at
the time, it became effectively incompatible with the USB softirq use case once
that path began relying on interrupt disabling for correctness. In this sense,
the "no compelling reason" eventually turned into a "necessary compromise."

To clarify: this patch series doesn't revert the previous change entirely.
It keeps the local_lock behavior for task context (where it's safe and
appropriate), but ensures atomic safety in interrupt/softirq contexts by
using raw spinlocks selectively where needed.

> This should be fully documented. I assume this will not be a problem as
> kcov is more for debugging and should not be enabled in production.
> 
> -- Steve
> 

Thanks again for raising this — I’ll make sure the changelog documents this
interaction more clearly.

Best regards,
Yunseong Kim