In RT-Linux, with CONFIG_PREEMPT_RT_NEEDS_BH_LOCK enabled, calling
__local_bh_disable_ip() in !preemptible() context is illegal because it
takes a local_lock, which might sleep. The only exception is the
cgroup_init() logic in start_kernel(), which runs with preemption
disabled: cgroup_init() calls cgroup_idr_alloc(), which calls
spin_lock_bh(). It is therefore sufficient to exclude only the system
startup phase in the DEBUG_LOCKS_WARN_ON() check.
Although the original check of this_cpu_read(softirq_ctrl.cnt) also
prevents the WARN_ON from firing during the boot process, it may hide
issues that should be exposed immediately, because softirq_ctrl.cnt may
be 0 when __local_bh_disable_ip() is called in !preemptible() context.
In RT-Linux, __local_bh_disable_ip() is used by the numerous _bh lock
variants and by local_bh_disable(). Since the locks already perform the
__might_resched() check, we analyze the scenario of using
local_bh_disable() in !preemptible() context.
If CONFIG_PREEMPT_RT_NEEDS_BH_LOCK is not enabled,
__local_bh_disable_ip() does not take the local_lock, so using
local_bh_disable() in !preemptible() context does not lead to a
might-sleep problem; however, using local_bh_disable() in !preemptible()
state is not meaningful in RT-Linux.
In non-RT Linux, local_irq_save() followed by local_bh_disable() is used
to keep soft interrupts disabled after interrupts are restored. However,
in RT-Linux with CONFIG_PREEMPT_RT_NEEDS_BH_LOCK not enabled,
local_bh_disable() merely increments the softirq_ctrl.cnt counter
without actually preventing softirq processing, because other tasks on
the CPU can preempt the task that wants to disable soft interrupts and
execute softirq-related logic.
Consider the sequence diagram below:

Task A                      Task B
                            __local_bh_enable_ip()
                              __do_softirq()
                                handle_softirqs()
                                  ...
                                  local_irq_enable();
                                  ...
local_irq_save()
local_bh_disable()
local_irq_restore()
                                  h->action(); -- it is serving softirq
local_bh_enable()
Signed-off-by: Xin Zhao <jackzxcui1989@163.com>
---
kernel/softirq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 77198911b..320a52583 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -173,7 +173,7 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
 			/* Required to meet the RCU bottomhalf requirements. */
 			rcu_read_lock();
 		} else {
-			DEBUG_LOCKS_WARN_ON(this_cpu_read(softirq_ctrl.cnt));
+			DEBUG_LOCKS_WARN_ON(system_state != SYSTEM_BOOTING);
 		}
 	}
 }
--
2.34.1
On 2026-03-10 19:55:34 [+0800], Xin Zhao wrote:
> In RT-Linux, when enabling CONFIG_PREEMPT_RT_NEEDS_BH_LOCK, calling
> __local_bh_disable_ip() when !preemptible() is illegal because it uses
> local_lock, which might sleep. The only exception is during the
> cgroup_init() logic in start_kernel() while preemption is disabled,
> cgroup_init() calls cgroup_idr_alloc(), which calls spin_lock_bh().
> It is sufficient to only exclude the system startup phase in macro
> DEBUG_LOCKS_WARN_ON.

No, cgroup_init() should not be quoted as an exception. It is okay to
exclude cases while the scheduler is not active because here lock
contention can not happen.

> Although the original check of this_cpu_read(softirq_ctrl.cnt) can also
> prevent the WARN_ON print during the boot process, it may hide some issues
> that should be exposed immediately. Because softirq_ctrl.cnt maybe 0 when
> __local_bh_disabled_ip() is called in !preemptible() context.

That is actually the point. If it is known that the call chain does not
originate from bh-disabled context then it is fine. Well, not fine if
you stick to the details but good enough if you don't want to constantly
complain to everyone about the little things which don't make a
difference.

> In RT-Linux, __local_bh_disable_ip() will be used by numerous _bh variants
> locks and local_bh_disable(). Since locks call __might_resched() check, we
> analyze the scenario of using local_bh_disable() in !preemptible() context.
>
> If CONFIG_PREEMPT_RT_NEEDS_BH_LOCK is not enabled, __local_bh_disable_ip()
> does not enter the local_lock lock and thus using local_bh_disable() in
> !preemptible() context does not lead to might sleep problem, but using
> local_bh_disable() in !preemptible() state is not meaningful in RT-Linux.

but it does not cause a locking or scheduling problem either.

> In non RT-Linux, we use local_irq_save() followed by local_bh_disable() to
> keep soft interrupts disabled after restoring interrupts. However, in
> RT-Linux, when CONFIG_PREEMPT_RT_NEEDS_BH_LOCK is not enabled,
> local_bh_disable() merely increments the softirq_ctrl.cnt counter without
> actually disabling the soft interrupt behavior, because other tasks on the
> CPU can preempt the task that wants to disable soft interrupts and execute
> soft interrupt-related logic.
> Consider the sequence diagram below:
>
> Task A                      Task B
>                             __local_bh_enable_ip()
>                               __do_softirq()
>                                 handle_softirqs()
>                                   ...
>                                   local_irq_enable();
>                                   ...
> local_irq_save()
> local_bh_disable()
> local_irq_restore()
>                                   h->action(); -- it is serving softirq
> local_bh_enable()

Okay. How is this a problem? You can enter this scenario even without
disabling interrupts within Task A.

> Signed-off-by: Xin Zhao <jackzxcui1989@163.com>

Sebastian
hi, Sebastian

On 2026-03-11 9:33 UTC, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> No, cgroup_init() should not be quoted as an exception. It is okay to
> exclude cases while the scheduler is not active because here lock
> contention can not happen.

Yes, I shouldn't quote cgroup_init().

> That is actually the point. If it is known that the call chain does not
> originate from bh-disabled context then it is fine. Well, not fine if
> you stick to the details but good enough if you don't want to constantly
> complain to everyone about the little things which don't make a
> difference.

I just think that using (system_state != SYSTEM_BOOTING) for conditional
checks is more reasonable than using (softirq_ctrl.cnt).

> Okay. How is this a problem? You can enter this scenario even without
> disabling interrupts within Task A.

I should use the situation where CONFIG_PREEMPT_RT_NEEDS_BH_LOCK is
enabled to illustrate the example above. People would expect that soft
interrupts on this CPU would not execute after calling
local_bh_disable(), but as shown in the example, this cannot actually be
guaranteed.

Using (softirq_ctrl.cnt) as a condition for WARN_ON will only report
issues in scenarios similar to the one shown in the example. In many
other cases, during the execution of Task A, if there is no soft
interrupt-related logic running on the CPU, this WARN_ON may not get
printed, thus hiding potential issues in the code.

If Task A truly expects that soft interrupts remain disabled after
calling local_bh_disable(), and this expectation may not be met, we
should detect this risk early and warn the user, just like the
might_sleep() checks: provide warnings proactively rather than reporting
an issue after it has occurred.

Xin Zhao
On 2026-03-11 18:40:07 [+0800], Xin Zhao wrote:
> hi, Sebastian

Hi Xin,

> I just think that using (system_state != SYSTEM_BOOTING) for conditional
> checks is more reasonable than using (softirq_ctrl.cnt).

We had users which did
	local_irq_disable();
	local_bh_disable();

and we decided not to bother if there is no reason to.

> I should use the situation where CONFIG_PREEMPT_RT_NEEDS_BH_LOCK is
> enabled to illustrate the example above. People would expect that soft
> interrupts on this CPU would not execute after calling
> local_bh_disable(), but as shown in the example, this cannot actually be
> guaranteed.

If CONFIG_PREEMPT_RT_NEEDS_BH_LOCK is enabled then the scenario is
possible but the DEBUG_LOCKS_WARN_ON will trigger a warning.

> Using (softirq_ctrl.cnt) as a condition for WARN_ON will only report
> issues in scenarios similar to the one shown in the example. [...]
>
> If Task A truly expects that soft interrupts remain disabled after
> calling local_bh_disable(), and this expectation may not be met, we
> should detect this risk early and warn the user, just like the
> might_sleep() checks: provide warnings proactively rather than reporting
> an issue after it has occurred.

Hmm. Is there actually anything wrong in tree? Longterm I would intend
to make !CONFIG_PREEMPT_RT_NEEDS_BH_LOCK the default.

Sebastian
hi, Sebastian
On 2026-03-11 14:51 UTC, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > I just think that using (system_state != SYSTEM_BOOTING) for conditional checks
> > is more reasonable than using (softirq_ctrl.cnt).
>
> We had users which did
> local_irq_disable();
> local_bh_disable();
>
> and we decided not to bother if there is no reason to.
If there are users with such usage, they will probabilistically hit the
DEBUG_LOCKS_WARN_ON(softirq_ctrl.cnt) warning. This is because the task
running this code may preempt a task that has already entered a
local_bh_disable() critical section. Isn't that right?
> > I should use the situation where CONFIG_PREEMPT_RT_NEEDS_BH_LOCK is enabled to
> > illustrate the example above. People would expect that soft interrupts on this
> > CPU would not execute after calling local_bh_disable(), but as shown in the
> > example, this cannot actually be guaranteed.
>
> If CONFIG_PREEMPT_RT_NEEDS_BH_LOCK is enabled then the scenario is
> possible but the DEBUG_LOCKS_WARN_ON will trigger a warning.
Indeed, it will trigger this warning; the (softirq_ctrl.cnt) check just
reduces the probability of it being reported.
> Hmm. Is there actually anything wrong in tree?
> Longterm I would intend to make !CONFIG_PREEMPT_RT_NEEDS_BH_LOCK
> default.
I was tracing the soft interrupt code and found the _local_bh_enable()
interface, which only exists in non-RT Linux. It is used in sclp.c as
follows:
	local_irq_disable();
	local_ctl_load(0, &cr0);
	if (!irq_context)
		_local_bh_enable();
	local_tick_enable(old_tick);
	local_irq_restore(flags);
I did not further investigate why the above code in sclp.c is not used in RT-Linux. As
for why _local_bh_enable does not exist in RT-Linux, it may be due to the consideration
that local_bh_disable is "ineffective" in the !preemptible() state in RT-Linux, but I'm
not sure if my understanding is correct.
Since you also mentioned that CONFIG_PREEMPT_RT_NEEDS_BH_LOCK will
eventually no longer be enabled, at that point local_bh_disable() almost
loses its significance. I think it should either be removed or
implemented as a no-op, as it no longer achieves the expected effect,
and it would be better to save the instruction execution time.
Xin Zhao
On 2026-03-11 23:34:02 [+0800], Xin Zhao wrote:
> hi, Sebastian

Hi,

> If there are users with such usage, they will probabilistically hit the
> DEBUG_LOCKS_WARN_ON(softirq_ctrl.cnt) warning. This is because the task
> running this code may preempt a task that has already entered a
> local_bh_disable() critical section. Isn't that right?

Correct. And that is how we got rid of the offenders. If I remember
correctly a few remained during kernel init which were considered not an
issue. But I just booted a few boxes double checking and I don't see the
warning, meaning it went away.

> Indeed, it will trigger this warning; the (softirq_ctrl.cnt) check just
> reduces the probability of it being reported.

Yes

> I was tracing the soft interrupt code and found the _local_bh_enable()
> interface, which only exists in non-RT Linux. It is used in sclp.c as
> follows:

Funny story: I did a grep for the pattern you described and this s390
driver was the only thing that popped up.

> 	local_irq_disable();
> 	local_ctl_load(0, &cr0);
> 	if (!irq_context)
> 		_local_bh_enable();
> 	local_tick_enable(old_tick);
> 	local_irq_restore(flags);
>
> I did not further investigate why the above code in sclp.c is not used
> in RT-Linux. As for why _local_bh_enable does not exist in RT-Linux, it
> may be due to the consideration that local_bh_disable is "ineffective"
> in the !preemptible() state in RT-Linux, but I'm not sure if my
> understanding is correct.
>
> Since you also mentioned that CONFIG_PREEMPT_RT_NEEDS_BH_LOCK will
> eventually no longer be enabled, at that point local_bh_disable() almost
> loses its significance. I think it should either be removed or
> implemented as a no-op, as it no longer achieves the expected effect,
> and it would be better to save the instruction execution time.

We can't nop it entirely. local_bh_disable() needs to remain a RCU read
section and it needs to ensure that the context does not wander off to
another CPU. Also we need to count the disable/enable because once we go
back to zero, we need to run callbacks which may have queued up.

And if we queue the softirq on a per-task basis rather than per-CPU then
we don't have the problem that one task completes softirqs queued by
another one.

Sebastian
hi, Sebastian

On 2026-03-11 16:09 UTC, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > Indeed, it will trigger this warning; the (softirq_ctrl.cnt) check just
> > reduces the probability of it being reported.
>
> Yes

As you said, the current implementation is good enough. :)
If you think it's appropriate to change it to (system_state !=
SYSTEM_BOOTING), you can make that change later when you get rid of
CONFIG_PREEMPT_RT_NEEDS_BH_LOCK. :)

> Funny story: I did a grep for the pattern you described and this s390
> driver was the only thing that popped up.

I'm actually curious why the users of _local_bh_enable, specifically
those using the s390 driver, haven't raised the issue that this
interface cannot be used in RT-Linux. Could it be that s390 users have
never run on RT-Linux?

> We can't nop it entirely. local_bh_disable() needs to remain a RCU read
> section and it needs to ensure that the context does not wander off to
> another CPU. Also we need to count the disable/enable because once we go
> back to zero, we need to run callbacks which may have queued up.

I did overlook that local_bh_disable() is also considered an RCU
critical section and is used in conjunction with rcu_read_lock_bh().
Although I saw comments in the code like "/* Required to meet the RCU
bottomhalf requirements. */", I don't fully understand why
local_bh_disable() must be treated as an RCU read critical section. Is
it simply because the implementation of rcu_read_lock_bh() does not
directly call __rcu_read_lock() and instead relies on local_bh_disable()
to proxy this call? I haven't figured this out, and it seems a bit
strange to me.

> And if we queue the softirq on a per-task basis rather than per-CPU then
> we don't have the problem that one task completes softirqs queued by
> another one.

Are you suggesting that the future implementation of soft interrupts
might be optimized to use a per-task approach for queuing and processing
soft interrupts? I think this is a very good attempt, as the current
handling of soft interrupts is a bit chaotic. High-priority tasks often
end up passively dealing with many low-priority soft interrupt tasks
during local_bh_disable(), effectively acting as 'ksoftirqd'. This seems
unreasonable to me, as it elevates low-priority work to the preempting
task's priority.

If soft interrupt handling could be implemented in a per-task manner, it
could even lead to priority inheritance in the future, and possibly work
in conjunction with BH workqueues to thoroughly resolve the
long-standing issues of soft interrupts in RT-Linux. In my project,
performance problems are often related to __local_bh_disable_ip() and
various sporadic latency spikes caused by migrate_disable(). This is
quite frustrating.

Xin Zhao
On 2026-03-12 01:01:15 [+0800], Xin Zhao wrote:
> hi, Sebastian

Hi,

> As you said, the current implementation is good enough. :)
> If you think it's appropriate to change it to (system_state !=
> SYSTEM_BOOTING), you can make that change later when you get rid of
> CONFIG_PREEMPT_RT_NEEDS_BH_LOCK. :)

If I get rid of CONFIG_PREEMPT_RT_NEEDS_BH_LOCK then
!CONFIG_PREEMPT_RT_NEEDS_BH_LOCK becomes the only code and the code in
question will vanish.

> I'm actually curious why the users of _local_bh_enable, specifically
> those using the s390 driver, haven't raised the issue that this
> interface cannot be used in RT-Linux. Could it be that s390 users have
> never run on RT-Linux?

This driver is very old and s390 does not support PREEMPT_RT. You can
grep for ARCH_SUPPORTS_RT to see who supports it.

> I did overlook that local_bh_disable() is also considered an RCU
> critical section and is used in conjunction with rcu_read_lock_bh().
> [...] Is it simply because the implementation of rcu_read_lock_bh()
> does not directly call __rcu_read_lock() and instead relies on
> local_bh_disable() to proxy this call? I haven't figured this out, and
> it seems a bit strange to me.

local_bh_disable() becomes an implicit RCU read lock section on
!PREEMPT_RT and we must preserve the semantics.

> Are you suggesting that the future implementation of soft interrupts
> might be optimized to use a per-task approach for queuing and
> processing soft interrupts? [...]

Yes. Getting rid of that BH lock removed much of the pain. This would be
one additional piece.

> If soft interrupt handling could be implemented in a per-task manner,
> it could even lead to priority inheritance in the future, and possibly
> work in conjunction with BH workqueues to thoroughly resolve the
> long-standing issues of soft interrupts in RT-Linux. [...]

Ideally if task X queues soft interrupts, it handles them and a later
task does not observe them. Only a task with higher priority can add
additional softirq work. If task X queues BLOCK and gets preempted, task
Y with higher priority adds NET_RX, then task Y will handle NET_RX and
BLOCK. This can be avoided by handling the softirqs per-task. However if
both raise NET_RX then task Y will still handle both. This is because
both use the same data structure to queue work, in this case the list of
pending napi devices. In this case threaded NAPI would work because it
avoids the common data structure.

I am not a big fan of the BH workqueues because you queue work items in
the context in which they originate and then they "vanish". So all the
priorities and so on are gone. Also the work from lower priority tasks
gets mixed with high priority tasks. Not something you desire in
general. In general you are better off remaining in the threaded
interrupt, completing the work.

Sebastian
hi, Sebastian

On 2026-03-12 10:05 UTC, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> If I get rid of CONFIG_PREEMPT_RT_NEEDS_BH_LOCK then
> !CONFIG_PREEMPT_RT_NEEDS_BH_LOCK becomes the only code and the code in
> question will vanish.

Yes, you are right!

> This driver is very old and s390 does not support PREEMPT_RT. You can
> grep for ARCH_SUPPORTS_RT to see who supports it.

I see. Thanks.

> local_bh_disable() becomes an implicit RCU read lock section on
> !PREEMPT_RT and we must preserve the semantics.

My current understanding of the statement "local_bh_disable() becomes an
implicit RCU read lock section on !PREEMPT_RT" is as follows:
In a regular Linux system, during the period of local_bh_disable(), both
preemption and soft interrupts are disabled, so RCU callbacks cannot be
executed. This effectively means that the progress of the RCU grace
period is stalled during the bh-disabled period. In a PREEMPT_RT system,
RCU callbacks are executed in an RCU context and are not protected by bh
disable, so it is necessary to explicitly mark the RCU read lock state.
I don't know if my understanding is correct.

> Yes. Getting rid of that BH lock removed much of the pain. This would
> be one additional piece.

I am looking forward to the per-task softirq optimization. :)

> Ideally if task X queues soft interrupts, it handles them and a later
> task does not observe them. Only a task with higher priority can add
> additional softirq work. If task X queues BLOCK and gets preempted,
> task Y with higher priority adds NET_RX, then task Y will handle NET_RX
> and BLOCK. This can be avoided by handling the softirqs per-task.

It does sound like it can optimize quite a lot. By the way, would the
per-task manner call softirq callbacks just before a voluntary switch
out, or trigger task_work before returning to user space?

> However if both raise NET_RX then task Y will still handle both. This
> is because both use the same data structure to queue work, in this case
> the list of pending napi devices. In this case threaded NAPI would work
> because it avoids the common data structure.

I see.

> I am not a big fan of the BH workqueues because you queue work items in
> the context in which they originate and then they "vanish". So all the
> priorities and so on are gone. Also the work from lower priority tasks
> gets mixed with high priority tasks. Not something you desire in
> general. In general you are better off remaining in the threaded
> interrupt, completing the work.

Indeed, queuing the soft interrupts triggered by different-priority
tasks into a single workqueue wouldn't be very appropriate. If we want
to queue them into a bottom-half (BH) workqueue, we would also need to
create a corresponding workqueue for each priority and queue based on
that priority. I previously developed a patch for a real-time workqueue,
which has been used in our project. If certain soft interrupt tasks are
very important and do not require CPU affinity, then queuing them on
other CPUs to execute at the actual priority needed might optimize
performance to some extent from a real-time perspective.

https://lore.kernel.org/lkml/20251205125445.4154667-1-jackzxcui1989@163.com/

Xin Zhao
On 2026-03-12 20:47:53 [+0800], Xin Zhao wrote:
> hi, Sebastian

Hi Xin,

> My current understanding of the statement "local_bh_disable() becomes an
> implicit RCU read lock section on !PREEMPT_RT" is as follows:
> In a regular Linux system, during the period of local_bh_disable(), both
> preemption and soft interrupts are disabled, so RCU callbacks cannot be
> executed. This effectively means that the progress of the RCU grace
> period is stalled during the bh-disabled period.

No, that is not it. A preempt-disabled section can be interrupted by a
softirq. You can also offload the RCU callbacks from CPU1 to CPU0, at
which point you can run RCU callbacks while CPU1 has interrupts off.
This has not always been like that but this is what we have now.

> RCU callbacks are executed in an RCU context and are not protected by bh
> disable, so it is necessary to explicitly mark the RCU read lock state.
> I don't know if my understanding is correct.

You do call_rcu() at which point the pointer/callback lands on a list
for clean up. At this point, if you can observe the pointer you need to
be in a RCU section to delay processing of the list, delaying the grace
period. If the grace period starts, all new callbacks land on a new
list. In order to process "the previous" list, the rcu_read_lock()
counter needs to get to zero and every CPU needs to schedule once. This
ensures that you are not in a preempt_disable() section (or
bh_disable(), or spin_lock(), or irq off). I think this is correct but
very compressed.

The requirement that preempt_disable() needs to be considered is very
old and comes from "classic RCU" where rcu_read_lock() was
preempt_disable() and some people just did preempt_disable() because
this was the thing before rcu_read_lock() was introduced. And then
spin_lock() did the same; some people rely on it so it needs to be
preserved.

> It does sound like it can optimize quite a lot. By the way, would the
> per-task manner call softirq callbacks just before a voluntary switch
> out, or trigger task_work before returning to user space?

It wouldn't change much. It would run on the exit from the outermost BH
section, on local_bh_enable() like it is now. The difference would be
only that the callbacks raised by the task itself would be executed. So
say, TaskX raises NET_RX and BLOCK, gets preempted, TaskY gets in,
raises NET_RX and TASKLET. TaskY now runs callbacks and will do only
NET_RX and TASKLET. Then we schedule back to TaskX which does NET_RX
(empty) and BLOCK. Currently, TaskY would also do BLOCK.

> Indeed, queuing the soft interrupts triggered by different-priority
> tasks into a single workqueue wouldn't be very appropriate. [...]
> I previously developed a patch for a real-time workqueue, which has
> been used in our project. If certain soft interrupt tasks are very
> important and do not require CPU affinity, then queuing them on other
> CPUs to execute at the actual priority needed might optimize
> performance to some extent from a real-time perspective.
>
> https://lore.kernel.org/lkml/20251205125445.4154667-1-jackzxcui1989@163.com/

This encodes too much application logic. Having a kthread for a "thing"
is usually better. You can run that kthread either per-CPU or "unbound"
and let userland deal with it by either pinning it to a CPU and/or
adjusting its priority based on the setup. So it can be more important
than network or less important (depending on whether your critical
real-time work is network related or not).

Sebastian