The tasklet_unlock_spin_wait() via tasklet_disable_in_atomic() is
provided for a few legacy tasklet users. The interface is used from
atomic context (which is either softirq or disabled preemption) on
non-PREEMPT_RT and relies on spinning until the tasklet callback
completes.
On PREEMPT_RT the context is never atomic but the busy polling logic
remains. It is possible that the thread invoking
tasklet_unlock_spin_wait() has higher priority than the tasklet. If both
run on the same CPU, the tasklet makes no progress and the thread trying
to cancel the tasklet will live-lock the system.
To avoid the lockup, tasklet_unlock_spin_wait() uses local_bh_disable()/
enable(), which utilizes the local_lock_t for synchronisation. This lock
is a central per-CPU BKL and is about to be removed.
Acquire a lock in tasklet_action_common() which is held while the
tasklet's callback is invoked. This lock will be acquired from
tasklet_unlock_spin_wait() via tasklet_callback_cancel_wait_running().
After the tasklet has completed, tasklet_callback_sync_wait_running()
drops the lock and acquires it again. To avoid dropping and re-acquiring
the lock when there is no cancel request, a cb_waiters counter is
incremented while a cancel request is pending; the lock is only cycled
if the counter is non-zero.
Blocking on the lock will PI-boost the tasklet if needed, ensuring
progress is made.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
kernel/softirq.c | 60 ++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 55 insertions(+), 5 deletions(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 513b1945987cc..57a84758b8459 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -805,6 +805,56 @@ static bool tasklet_clear_sched(struct tasklet_struct *t)
return false;
}
+#ifdef CONFIG_PREEMPT_RT
+struct tasklet_sync_callback {
+ spinlock_t cb_lock;
+ atomic_t cb_waiters;
+};
+
+static DEFINE_PER_CPU(struct tasklet_sync_callback, tasklet_sync_callback) = {
+ .cb_lock = __SPIN_LOCK_UNLOCKED(tasklet_sync_callback.cb_lock),
+ .cb_waiters = ATOMIC_INIT(0),
+};
+
+static void tasklet_lock_callback(void)
+{
+ spin_lock(this_cpu_ptr(&tasklet_sync_callback.cb_lock));
+}
+
+static void tasklet_unlock_callback(void)
+{
+ spin_unlock(this_cpu_ptr(&tasklet_sync_callback.cb_lock));
+}
+
+static void tasklet_callback_cancel_wait_running(void)
+{
+ struct tasklet_sync_callback *sync_cb = this_cpu_ptr(&tasklet_sync_callback);
+
+ atomic_inc(&sync_cb->cb_waiters);
+ spin_lock(&sync_cb->cb_lock);
+ atomic_dec(&sync_cb->cb_waiters);
+ spin_unlock(&sync_cb->cb_lock);
+}
+
+static void tasklet_callback_sync_wait_running(void)
+{
+ struct tasklet_sync_callback *sync_cb = this_cpu_ptr(&tasklet_sync_callback);
+
+ if (atomic_read(&sync_cb->cb_waiters)) {
+ spin_unlock(&sync_cb->cb_lock);
+ spin_lock(&sync_cb->cb_lock);
+ }
+}
+
+#else
+
+static void tasklet_lock_callback(void) { }
+static void tasklet_unlock_callback(void) { }
+static void tasklet_callback_cancel_wait_running(void) { }
+static void tasklet_callback_sync_wait_running(void) { }
+
+#endif
+
static void tasklet_action_common(struct tasklet_head *tl_head,
unsigned int softirq_nr)
{
@@ -816,6 +866,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
tl_head->tail = &tl_head->head;
local_irq_enable();
+ tasklet_lock_callback();
while (list) {
struct tasklet_struct *t = list;
@@ -835,6 +886,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
}
}
tasklet_unlock(t);
+ tasklet_callback_sync_wait_running();
continue;
}
tasklet_unlock(t);
@@ -847,6 +899,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
__raise_softirq_irqoff(softirq_nr);
local_irq_enable();
}
+ tasklet_unlock_callback();
}
static __latent_entropy void tasklet_action(void)
@@ -897,12 +950,9 @@ void tasklet_unlock_spin_wait(struct tasklet_struct *t)
/*
* Prevent a live lock when current preempted soft
* interrupt processing or prevents ksoftirqd from
- * running. If the tasklet runs on a different CPU
- * then this has no effect other than doing the BH
- * disable/enable dance for nothing.
+ * running.
*/
- local_bh_disable();
- local_bh_enable();
+ tasklet_callback_cancel_wait_running();
} else {
cpu_relax();
}
--
2.50.1
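For illustration, the handshake added in the patch above reduces to the
following interleaving. This is a sketch only, not part of the submission;
all names are those introduced by the patch.

/*
 * Illustrative interleaving of the handshake on one CPU:
 *
 *   tasklet_action_common()              canceling thread (higher prio)
 *   -----------------------              ------------------------------
 *   tasklet_lock_callback()
 *     spin_lock(&cb_lock)
 *   t->func(t->data)                     tasklet_unlock_spin_wait()
 *                                          tasklet_callback_cancel_wait_running()
 *                                            atomic_inc(&cb_waiters)
 *                                            spin_lock(&cb_lock)   <- blocks, PI-boosts holder
 *   tasklet_unlock(t)
 *   tasklet_callback_sync_wait_running()
 *     cb_waiters != 0:
 *     spin_unlock(&cb_lock)  ---------->     acquires the lock,
 *     spin_lock(&cb_lock)                    atomic_dec(&cb_waiters), unlocks,
 *                                            outer loop sees TASKLET_STATE_RUN cleared
 */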
Hi Sebastian,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.17-rc1 next-20250813]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Sebastian-Andrzej-Siewior/softirq-Provide-a-handshake-for-canceling-tasklets-via-polling-on-PREEMPT_RT/20250812-224928
base:   linus/master
patch link:    https://lore.kernel.org/r/20250812143930.22RBn5BW%40linutronix.de
patch subject: [PATCH] softirq: Provide a handshake for canceling tasklets via polling on PREEMPT_RT
config: csky-randconfig-002-20250813 (https://download.01.org/0day-ci/archive/20250813/202508131652.C4118cYh-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 13.4.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250813/202508131652.C4118cYh-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202508131652.C4118cYh-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/softirq.c:853:13: warning: 'tasklet_callback_cancel_wait_running' defined but not used [-Wunused-function]
     853 | static void tasklet_callback_cancel_wait_running(void) { }
         |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vim +/tasklet_callback_cancel_wait_running +853 kernel/softirq.c

   850	
   851	static void tasklet_lock_callback(void) { }
   852	static void tasklet_unlock_callback(void) { }
 > 853	static void tasklet_callback_cancel_wait_running(void) { }
   854	static void tasklet_callback_sync_wait_running(void) { }
   855	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
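The warning presumably comes from a configuration where
tasklet_unlock_spin_wait() itself is only a header stub (neither SMP nor
PREEMPT_RT), leaving the new !RT stub without a caller. A minimal sketch of
one possible way to silence it, assuming the plain stubs may simply become
static inline; this is not the fix actually posted in this thread:

/*
 * Assumption: turning the !RT stubs into static inline functions is
 * acceptable. Unused static inline functions do not trigger
 * -Wunused-function, unlike plain static functions.
 */
static inline void tasklet_lock_callback(void) { }
static inline void tasklet_unlock_callback(void) { }
static inline void tasklet_callback_cancel_wait_running(void) { }
static inline void tasklet_callback_sync_wait_running(void) { }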
On 2025-08-12 16:39:32 [+0200], To linux-kernel@vger.kernel.org wrote:
> The tasklet_unlock_spin_wait() via tasklet_disable_in_atomic() is
> provided for a few legacy tasklet users. The interface is used from
> atomic context (which is either softirq or disabled preemption) on
> non-PREEMPT_RT and relies on spinning until the tasklet callback
> completes.
> On PREEMPT_RT the context is never atomic but the busy polling logic
> remains. It is possible that the thread invoking
> tasklet_unlock_spin_wait() has higher priority than the tasklet. If both
> run on the same CPU, the tasklet makes no progress and the thread trying
> to cancel the tasklet will live-lock the system.
> To avoid the lockup, tasklet_unlock_spin_wait() uses local_bh_disable()/
> enable(), which utilizes the local_lock_t for synchronisation. This lock
> is a central per-CPU BKL and is about to be removed.
>
> Acquire a lock in tasklet_action_common() which is held while the
> tasklet's callback is invoked. This lock will be acquired from
> tasklet_unlock_spin_wait() via tasklet_callback_cancel_wait_running().
> After the tasklet has completed, tasklet_callback_sync_wait_running()
> drops the lock and acquires it again. To avoid dropping and re-acquiring
> the lock when there is no cancel request, a cb_waiters counter is
> incremented while a cancel request is pending.
> Blocking on the lock will PI-boost the tasklet if needed, ensuring
> progress is made.
>

Tejun, Lai,

I noticed that the BH part of workqueue also relies on this mechanism
(__flush_work(), the PREEMPT_RT ifdef). This is a fairly recent API so
there should be no "legacy" users as we have in the tasklet interface.
The majority of users use tasklet_kill() (or seldom
tasklet_unlock_wait()) and not tasklet_unlock_spin_wait(). The plan was
to get rid of the spinning API but I didn't manage to get rid of all
users, especially since some of the code could not be sanely/safely
converted (+tested).

Does the workqueue-BH code require the canceling from atomic context or
was this just added because the API for BH and non-BH work items is the
same and __cancel_work_sync() allows it?
Could we avoid the busy-waiting for BH work items and rely on the
wait_for_completion() below, or do we need something similar to what I
added here for the tasklet API?

Sebastian
Hello, On Tue, Aug 12, 2025 at 04:53:59PM +0200, Sebastian Andrzej Siewior wrote: > Does the workqueue-BH code require the canceling from atomic context or > was this just added because the API for BH and non-BH work items is the > same and __cancel_work_sync() allows it? > Could we avoid the busy-waiting for BH work items and rely on the > wait_for_completion() below or do we need something similar to what I > added here for the tasklet API? The intention is to convert all BH users to workqueue-BH and remove BH (that's what Linus wants and why workqueue-BH came to be), so the APIs should be able to match up, I'm afraid. There were some attempts at pushing the conversion but we've only made minimal progress. If you're looking at BH users anyway and feel like it, please feel free to convert them. Thanks. -- tejun
On 2025-08-12 09:38:50 [-1000], Tejun Heo wrote:
> Hello,
Hi Tejun,

> On Tue, Aug 12, 2025 at 04:53:59PM +0200, Sebastian Andrzej Siewior wrote:
> > Does the workqueue-BH code require the canceling from atomic context or
> > was this just added because the API for BH and non-BH work items is the
> > same and __cancel_work_sync() allows it?
> > Could we avoid the busy-waiting for BH work items and rely on the
> > wait_for_completion() below or do we need something similar to what I
> > added here for the tasklet API?
>
> The intention is to convert all BH users to workqueue-BH and remove BH
> (that's what Linus wants and why workqueue-BH came to be), so the APIs
> should be able to match up, I'm afraid. There were some attempts at pushing
> the conversion but we've only made minimal progress. If you're looking at BH
> users anyway and feel like it, please feel free to convert them.

I understand this but I am talking about legacy users:

| drivers/atm/eni.c:                        tasklet_disable_in_atomic(&ENI_DEV(vcc->dev)->task);
| drivers/net/wireless/ath/ath9k/beacon.c:  tasklet_disable_in_atomic(&sc->bcon_tasklet);
| drivers/pci/controller/pci-hyperv.c:      tasklet_disable_in_atomic(&channel->callback_event);

This is what is left. (There is also i915 but this is "special").
So we are talking about establishing an API and behaviour for those here
after we painfully managed converting everyone else away:

| git grep 'tasklet_unlock_wait([^s]' | wc -l
| 5
| git grep 'tasklet_disable([^s]' | wc -l
| 97
| git grep 'tasklet_kill([^s]' | wc -l
| 304

While I think it could be possible with upstream's help to avoid the
in-atomic bits for ath9k and hyperv, I lost all hope for the ATM driver.

> Thanks.

Sebastian
Hello, On Wed, Aug 13, 2025 at 08:33:11AM +0200, Sebastian Andrzej Siewior wrote: ... > > The intention is to convert all BH users to workqueue-BH and remove BH > > (that's what Linus wants and why workqueue-BH came to be), so the APIs > > should be able to match up, I'm afraid. There were some attempts at pushing > > the conversion but we've only made minimal progress. If you're looking at BH > > users anyway and feel like it, please feel free to convert them. > > I understand this but I am talking about legacy users: > > | drivers/atm/eni.c: tasklet_disable_in_atomic(&ENI_DEV(vcc->dev)->task); > | drivers/net/wireless/ath/ath9k/beacon.c: tasklet_disable_in_atomic(&sc->bcon_tasklet); > | drivers/pci/controller/pci-hyperv.c: tasklet_disable_in_atomic(&channel->callback_event); > > This is what is left. (There is also i915 but this is "special"). > So we are talking about establishing an API and behaviour for those here > after we painfully managed converting everyone else away: Right, given how early in conversion, we can definitely leave this as something to think about later. I have no objection to leave it be for now. Thanks. -- tejun
On 2025-08-13 08:05:34 [-1000], Tejun Heo wrote:
> Hello,
Hi Tejun,

> On Wed, Aug 13, 2025 at 08:33:11AM +0200, Sebastian Andrzej Siewior wrote:
> ...
> > > The intention is to convert all BH users to workqueue-BH and remove BH
> > > (that's what Linus wants and why workqueue-BH came to be), so the APIs
> > > should be able to match up, I'm afraid. There were some attempts at pushing
> > > the conversion but we've only made minimal progress. If you're looking at BH
> > > users anyway and feel like it, please feel free to convert them.
> >
> > I understand this but I am talking about legacy users:
> >
> > | drivers/atm/eni.c:                        tasklet_disable_in_atomic(&ENI_DEV(vcc->dev)->task);
> > | drivers/net/wireless/ath/ath9k/beacon.c:  tasklet_disable_in_atomic(&sc->bcon_tasklet);
> > | drivers/pci/controller/pci-hyperv.c:      tasklet_disable_in_atomic(&channel->callback_event);
> >
> > This is what is left. (There is also i915 but this is "special").
> > So we are talking about establishing an API and behaviour for those here
> > after we painfully managed converting everyone else away:
>
> Right, given how early in conversion, we can definitely leave this as
> something to think about later. I have no objection to leave it be for now.

Okay. Do I need to update __flush_work() in any way to make it obvious?
The local_bh_disable()/local_bh_enable() will become a nop in this
regard and should be removed.
It would be the revert of commit 134874e2eee93 ("workqueue: Allow
cancel_work_sync() and disable_work() from atomic contexts on BH work
items"). The commit added the possibility to flush BH work from atomic
context but it is unclear if there is already a requirement for this or
if it was to match the legacy part of the tasklet API.

> Thanks.

Sebastian
Hello, On Mon, Aug 18, 2025 at 02:52:42PM +0200, Sebastian Andrzej Siewior wrote: ... > > Right, given how early in conversion, we can definitely leave this as > > something to think about later. I have no objection to leave it be for now. > > Okay. Do I need to update __flush_work() in anyway to make it obvious? > The local_bh_disable()/ local_bh_enable() will become a nop in this > regard and should be removed. > It would be the revert of commit 134874e2eee93 ("workqueue: Allow > cancel_work_sync() and disable_work() from atomic contexts on BH work > items"). The commit added the possibility to flush BH work from atomic > context but it is unclear if there already a requirement for this or if > it was to match the legacy part of the tasklet API. I see. Can I backtrack? If it doesn't require too invasive changes, let's just keep the two in sync. I'll get back to conversions so that we can actually achieve the goal eventually and it'll probably be more confusing if we revert that and try to redo it later. Thanks. -- tejun
On 2025-08-18 07:41:06 [-1000], Tejun Heo wrote: > Hello, Hi Tejun, > I see. Can I backtrack? If it doesn't require too invasive changes, let's > just keep the two in sync. I'll get back to conversions so that we can > actually achieve the goal eventually and it'll probably be more confusing if > we revert that and try to redo it later. Okay. Then let me repost the tasklet patch and make one for workqueue to stay in sync. I do hope that we end up with a requirement that any kind of teardown does not happen from an atomic context ;) > Thanks. > Sebastian
On 2025-08-19 17:01:07 [+0200], To Tejun Heo wrote:
> Okay. Then let me repost the tasklet patch and make one for workqueue to
> stay in sync.
> I do hope that we end up with a requirement that any kind of teardown
> does not happen from an atomic context ;)

That would be

------------->8-------------

Subject: [PATCH] workqueue: Provide a handshake for canceling BH workers

While a BH work item is canceled, the core code spins until it
determines that the item completed. On PREEMPT_RT the spinning relies
on a lock in local_bh_disable() to avoid a live lock if the canceling
thread has higher priority than the BH-worker and preempts it. This
lock ensures that the BH-worker makes progress by PI-boosting it.

This lock in local_bh_disable() is a central per-CPU BKL and is about
to be removed.

To provide the required synchronisation add a per-pool lock. The lock
is acquired by bh_worker() at the beginning and is held while the
individual callbacks are invoked. To enforce progress in case of
interruption, __flush_work() needs to acquire the lock. This will flush
all BH-work items assigned to that pool.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/workqueue.c | 51 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 42 insertions(+), 9 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c6b79b3675c31..94e226f637992 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -222,7 +222,9 @@ struct worker_pool {
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
 	struct hlist_node	hash_node;	/* PL: unbound_pool_hash node */
 	int			refcnt;		/* PL: refcnt for unbound pools */
-
+#ifdef CONFIG_PREEMPT_RT
+	spinlock_t		cb_lock;	/* BH worker cancel lock */
+#endif
 	/*
 	 * Destruction of pool is RCU protected to allow dereferences
 	 * from get_work_pool().
@@ -3078,6 +3080,31 @@ __acquires(&pool->lock)
 	goto restart;
 }
 
+#ifdef CONFIG_PREEMPT_RT
+static void worker_lock_callback(struct worker_pool *pool)
+{
+	spin_lock(&pool->cb_lock);
+}
+
+static void worker_unlock_callback(struct worker_pool *pool)
+{
+	spin_unlock(&pool->cb_lock);
+}
+
+static void workqueue_callback_cancel_wait_running(struct worker_pool *pool)
+{
+	spin_lock(&pool->cb_lock);
+	spin_unlock(&pool->cb_lock);
+}
+
+#else
+
+static void worker_lock_callback(struct worker_pool *pool) { }
+static void worker_unlock_callback(struct worker_pool *pool) { }
+static void workqueue_callback_cancel_wait_running(struct worker_pool *pool) { }
+
+#endif
+
 /**
  * manage_workers - manage worker pool
  * @worker: self
@@ -3557,6 +3584,7 @@ static void bh_worker(struct worker *worker)
 	int nr_restarts = BH_WORKER_RESTARTS;
 	unsigned long end = jiffies + BH_WORKER_JIFFIES;
 
+	worker_lock_callback(pool);
 	raw_spin_lock_irq(&pool->lock);
 	worker_leave_idle(worker);
 
@@ -3585,6 +3613,7 @@ static void bh_worker(struct worker *worker)
 	worker_enter_idle(worker);
 	kick_pool(pool);
 	raw_spin_unlock_irq(&pool->lock);
+	worker_unlock_callback(pool);
 }
 
 /*
@@ -4222,17 +4251,18 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
 		    (data & WORK_OFFQ_BH)) {
 			/*
 			 * On RT, prevent a live lock when %current preempted
-			 * soft interrupt processing or prevents ksoftirqd from
-			 * running by keeping flipping BH. If the BH work item
-			 * runs on a different CPU then this has no effect other
-			 * than doing the BH disable/enable dance for nothing.
-			 * This is copied from
-			 * kernel/softirq.c::tasklet_unlock_spin_wait().
+			 * soft interrupt processing by blocking on lock which
+			 * is owned by the thread invoking the callback.
 			 */
 			while (!try_wait_for_completion(&barr.done)) {
 				if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-					local_bh_disable();
-					local_bh_enable();
+					struct worker_pool *pool;
+
+					mutex_lock(&wq_pool_mutex);
+					pool = get_work_pool(work);
+					if (pool)
+						workqueue_callback_cancel_wait_running(pool);
+					mutex_unlock(&wq_pool_mutex);
 				} else {
 					cpu_relax();
 				}
@@ -4782,6 +4812,9 @@ static int init_worker_pool(struct worker_pool *pool)
 	ida_init(&pool->worker_ida);
 	INIT_HLIST_NODE(&pool->hash_node);
 	pool->refcnt = 1;
+#ifdef CONFIG_PREEMPT_RT
+	spin_lock_init(&pool->cb_lock);
+#endif
 
 	/* shouldn't fail above this point */
 	pool->attrs = alloc_workqueue_attrs();
-- 
2.50.1
On 2025-08-20 12:36:59 [+0200], To Tejun Heo wrote:
> Subject: [PATCH] workqueue: Provide a handshake for canceling BH workers
…
> This will flush all BH-work items assigned to that pool.

We need to flush all items because the inserted wq_barrier is at the
end of the queue. So if the cb_lock is dropped after
worker->current_func(work) then we will live lock. Just tested, I
somehow assumed it polls on the worker.

This behaviour is undesired and goes back to the requirement to be able
to cancel something from an atomic context, which can't be atomic on
PREEMPT_RT to begin with. Since the caller can never be atomic on
PREEMPT_RT, what about the following:

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c6b79b3675c31..6ce9c980a7966 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4214,28 +4214,16 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
 	 * can't currently be queued. Its data must contain OFFQ bits. If @work
 	 * was queued on a BH workqueue, we also know that it was running in the
 	 * BH context and thus can be busy-waited.
+	 * On PREEMPT_RT the BH context can not be busy-waited because it can be
+	 * preempted by the caller if it has higher priority.
 	 */
-	if (from_cancel) {
+	if (from_cancel && !IS_ENABLED(CONFIG_PREEMPT_RT)) {
 		unsigned long data = *work_data_bits(work);
 
 		if (!WARN_ON_ONCE(data & WORK_STRUCT_PWQ) &&
 		    (data & WORK_OFFQ_BH)) {
-			/*
-			 * On RT, prevent a live lock when %current preempted
-			 * soft interrupt processing or prevents ksoftirqd from
-			 * running by keeping flipping BH. If the BH work item
-			 * runs on a different CPU then this has no effect other
-			 * than doing the BH disable/enable dance for nothing.
-			 * This is copied from
-			 * kernel/softirq.c::tasklet_unlock_spin_wait().
-			 */
 			while (!try_wait_for_completion(&barr.done)) {
-				if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-					local_bh_disable();
-					local_bh_enable();
-				} else {
-					cpu_relax();
-				}
+				cpu_relax();
 			}
 			goto out_destroy;
 		}
@@ -4351,7 +4339,7 @@ static bool __cancel_work_sync(struct work_struct *work, u32 cflags)
 
 	ret = __cancel_work(work, cflags | WORK_CANCEL_DISABLE);
 
-	if (*work_data_bits(work) & WORK_OFFQ_BH)
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT) && *work_data_bits(work) & WORK_OFFQ_BH)
 		WARN_ON_ONCE(in_hardirq());
 	else
 		might_sleep();

It is not the revert I suggested. This should work for a softirq caller
and from a forced-threaded interrupt (not that I encourage such
behaviour). It will not work from an atomic context such as with a
raw_spinlock_t acquired, but this will also not work with the current
(local_bh_disable() + enable()) solution.
I prefer this because it avoids the locking in bh_worker() and the
flushing of all pending work items in the flush/cancel case.

Sebastian
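For illustration of the barrier placement mentioned above (an explanatory
sketch, not part of the mails): with the first workqueue patch, cb_lock
covers the whole bh_worker() run, so a waiter necessarily waits for
everything queued ahead of its barrier.

/*
 * BH pool queue:   [ work A ] -> [ work B ] -> [ wq_barrier for A ]
 *
 * bh_worker():     cb_lock is held across A, B and the barrier.
 * __flush_work():  blocks on cb_lock and PI-boosts the worker; the
 *                  completion can only be observed once the whole run,
 *                  including the barrier, has finished, i.e. every
 *                  pending BH item of that pool gets flushed.
 */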
Hello, On Wed, Aug 20, 2025 at 12:55:18PM +0200, Sebastian Andrzej Siewior wrote: > On 2025-08-20 12:36:59 [+0200], To Tejun Heo wrote: > > Subject: [PATCH] workqueue: Provide a handshake for canceling BH workers > … > > This will flush all BH-work items assigned to that pool. > > We need to flush all items because the inserted wq_barrier is at the > end of the queue. So if the cb_lock is dropped after > worker->current_func(work) then we will live lock. Just tested, I > somehow assumed it polls on worker. Is flushing all a problem tho? I think the main focus is keeping the semantics matching on RT, right? ... > - if (from_cancel) { > + if (from_cancel && !IS_ENABLED(CONFIG_PREEMPT_RT)) { > unsigned long data = *work_data_bits(work); > > if (!WARN_ON_ONCE(data & WORK_STRUCT_PWQ) && > (data & WORK_OFFQ_BH)) { > - /* > - * On RT, prevent a live lock when %current preempted > - * soft interrupt processing or prevents ksoftirqd from > - * running by keeping flipping BH. If the BH work item > - * runs on a different CPU then this has no effect other > - * than doing the BH disable/enable dance for nothing. > - * This is copied from > - * kernel/softirq.c::tasklet_unlock_spin_wait(). > - */ > while (!try_wait_for_completion(&barr.done)) { > - if (IS_ENABLED(CONFIG_PREEMPT_RT)) { > - local_bh_disable(); > - local_bh_enable(); > - } else { > - cpu_relax(); > - } > + cpu_relax(); I'm most likely missing something about RT but wouldn't the above still lead to deadlocks if the caller is non-hardirq but higher priority thread? Thanks. -- tejun
On 2025-08-20 09:44:53 [-1000], Tejun Heo wrote: > Hello, Hi Tejun, > On Wed, Aug 20, 2025 at 12:55:18PM +0200, Sebastian Andrzej Siewior wrote: > > On 2025-08-20 12:36:59 [+0200], To Tejun Heo wrote: > > > Subject: [PATCH] workqueue: Provide a handshake for canceling BH workers > > … > > > This will flush all BH-work items assigned to that pool. > > > > We need to flush all items because the inserted wq_barrier is at the > > end of the queue. So if the cb_lock is dropped after > > worker->current_func(work) then we will live lock. Just tested, I > > somehow assumed it polls on worker. > > Is flushing all a problem tho? It is not wrong as in something will explode. The priority-inheritance boost is meant to give the lower priority task runtime in order to leave its critical section. So the task with the higher priority can continue to make progress instead of sitting around. Spinning while waiting for completion will not succeed. In this case "leaving the critical section" would mean complete the one work item. But instead we flush all of them. It is more of semantics in this case. That is why the looping-cancel in tasklet cancels just that one workitem. > I think the main focus is keeping the > semantics matching on RT, right? Yes, but having the semantics with busy waiting on a BH work is kind of the problem. And there is no need for it which makes it a bit difficult. The previous patch would match the !RT bits but we flush all work, have and the lock for no reason. That is why I don't like it. The majority of tasklet users don't need it. It is in my opinion bad semantics. But if you insist on it, the previous patch will make it work and has been tested. There is not much I can do. > ... > > - if (from_cancel) { > > + if (from_cancel && !IS_ENABLED(CONFIG_PREEMPT_RT)) { > > unsigned long data = *work_data_bits(work); > > > > if (!WARN_ON_ONCE(data & WORK_STRUCT_PWQ) && > > (data & WORK_OFFQ_BH)) { > > - /* > > - * On RT, prevent a live lock when %current preempted > > - * soft interrupt processing or prevents ksoftirqd from > > - * running by keeping flipping BH. If the BH work item > > - * runs on a different CPU then this has no effect other > > - * than doing the BH disable/enable dance for nothing. > > - * This is copied from > > - * kernel/softirq.c::tasklet_unlock_spin_wait(). > > - */ > > while (!try_wait_for_completion(&barr.done)) { > > - if (IS_ENABLED(CONFIG_PREEMPT_RT)) { > > - local_bh_disable(); > > - local_bh_enable(); > > - } else { > > - cpu_relax(); > > - } > > + cpu_relax(); > > I'm most likely missing something about RT but wouldn't the above still lead > to deadlocks if the caller is non-hardirq but higher priority thread? Not sure what you refer to. Right now there is this lock in local_bh_disable() which forces PI. Removing the whole section for RT as in this snippet gets us to the wait_for_completion() below. It lets the task with higher priority schedule out allowing task with lower priority to run. Eventually the barrier item completes and with the wakeup the high priority task will continue. So the high-priority task will lose runtime (allowing task with lower priority to run). I don't think it will be a problem because it is mostly used in "quit" scenarios (not during normal workload) and matches tasklet_disable(). > Thanks. Sebastian
Hello, Sebastian. On Thu, Aug 21, 2025 at 11:28:27AM +0200, Sebastian Andrzej Siewior wrote: ... > It is not wrong as in something will explode. The priority-inheritance > boost is meant to give the lower priority task runtime in order to leave > its critical section. So the task with the higher priority can continue > to make progress instead of sitting around. Spinning while waiting for > completion will not succeed. > In this case "leaving the critical section" would mean complete the one > work item. But instead we flush all of them. It is more of semantics in > this case. That is why the looping-cancel in tasklet cancels just that > one workitem. Understood. However, given that these pools are per-cpu, BHs are usually not heavily contended and canceling itself is a low frequency operation, the practical difference likely won't be noticeable. > > I think the main focus is keeping the > > semantics matching on RT, right? > > Yes, but having the semantics with busy waiting on a BH work is kind of > the problem. And there is no need for it which makes it a bit difficult. > The previous patch would match the !RT bits but we flush all work, have > and the lock for no reason. That is why I don't like it. The majority of > tasklet users don't need it. It is in my opinion bad semantics. > > But if you insist on it, the previous patch will make it work and has > been tested. There is not much I can do. Oh, I'm not insisting, don't know enough to do so. Just trying to understand the situation. > > I'm most likely missing something about RT but wouldn't the above still lead > > to deadlocks if the caller is non-hardirq but higher priority thread? > > Not sure what you refer to. Right now there is this lock in > local_bh_disable() which forces PI. > Removing the whole section for RT as in this snippet gets us to the > wait_for_completion() below. It lets the task with higher priority > schedule out allowing task with lower priority to run. Eventually the > barrier item completes and with the wakeup the high priority task will > continue. > So the high-priority task will lose runtime (allowing task with lower > priority to run). I don't think it will be a problem because it is > mostly used in "quit" scenarios (not during normal workload) and matches > tasklet_disable(). Okay, so, on !RT, that busy loop section is there to allow cancel_work_sync() to be called from BH-disabled contexts and the caller is responsible for ensuring there's no recursion. It's not great but matches the existing behavior. Obviously, in !RT, we can't go to wait_for_completion() there because we can be in a non-sleepable context. Are you saying that, in RT, it'd be fine to call wait_for_completion() inside local_bh_disable() and won't trip any of the context warnings? If so, yeah, we don't need any of the looping. Thanks. -- tejun
On 2025-08-21 07:10:42 [-1000], Tejun Heo wrote:
> Hello, Sebastian.
Hello Tejun,

> Oh, I'm not insisting, don't know enough to do so. Just trying to understand
> the situation.

ah okay.

> > > I'm most likely missing something about RT but wouldn't the above still lead
> > > to deadlocks if the caller is non-hardirq but higher priority thread?
> >
> > Not sure what you refer to. Right now there is this lock in
> > local_bh_disable() which forces PI.
> > Removing the whole section for RT as in this snippet gets us to the
> > wait_for_completion() below. It lets the task with higher priority
> > schedule out allowing task with lower priority to run. Eventually the
> > barrier item completes and with the wakeup the high priority task will
> > continue.
> > So the high-priority task will lose runtime (allowing task with lower
> > priority to run). I don't think it will be a problem because it is
> > mostly used in "quit" scenarios (not during normal workload) and matches
> > tasklet_disable().
>
> Okay, so, on !RT, that busy loop section is there to allow
> cancel_work_sync() to be called from BH-disabled contexts and the caller is
> responsible for ensuring there's no recursion. It's not great but matches
> the existing behavior.

Hold on for a sec: existing behaviour for tasklet_unlock_spin_wait(),
which has three users (a fourth one if we count i915, which has its own
tasklet layer). Not something that I would call common or widespread
behaviour in the kernel (and task workqueue does not have it either).
tasklet_disable() and tasklet_kill() both sleep while waiting for
completion and don't spin.

> Obviously, in !RT, we can't go to
> wait_for_completion() there because we can be in a non-sleepable context.

Again, only a small number of users require to do so.

> Are you saying that, in RT, it'd be fine to call wait_for_completion()
> inside local_bh_disable() and won't trip any of the context warnings? If so,
> yeah, we don't need any of the looping.

No, that won't work. local_bh_disable() will start a RCU read section
and then RCU will complain during schedule().
So if the requirement is to cancel a BH workitem from within a BH
disabled section then we need the first patch in this thread.

But if we get rid of this requirement…

> Thanks.

Sebastian
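To spell out the constraint described above, a minimal illustrative snippet
(not code from the thread; it assumes RCU debugging such as PROVE_RCU is
enabled so the violation is actually reported):

/*
 * Why the sleeping path cannot simply be used under local_bh_disable()
 * on PREEMPT_RT: there, local_bh_disable() takes the per-CPU BH
 * local_lock and enters an RCU read-side critical section, so a blocking
 * wait inside it makes RCU complain once schedule() is called.
 */
local_bh_disable();
wait_for_completion(&barr.done);	/* schedules -> RCU read-side splat on RT */
local_bh_enable();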
Hello, Sebastian. On Fri, Aug 22, 2025 at 11:48:12AM +0200, Sebastian Andrzej Siewior wrote: > > Okay, so, on !RT, that busy loop section is there to allow > > cancel_work_sync() to be called from BH-disabled contexts and the caller is > > responsible for ensuring there's no recursion. It's not great but matches > > the existing behavior. > > hold on for for a sec: existing behaviour for tasklet_unlock_spin_wait() > which has three users (a fourth one if we count i915 which has its own > tasklet layer). Not something that I would call common or wide spread > behaviour in the kernel (and task workqueue does not have it either). IIRC, there was a conversion patchset which didn't get pushed to completion and it hit those, so the patch to add the busy wait. > tasklet_disable() and tasklet_kill() both sleep while waiting for > completion and don't spin. > > > Obviously, in !RT, we can't go to > > wait_for_completion() there because we can be in a non-sleepable context. > > Again, only a small amount of users require to do so. > > > Are you saying that, in RT, it'd be fine to call wait_for_completion() > > inside local_bh_disable() and won't trip any of the context warnings? If so, > > yeah, we don't need any of the looping. > > No, that won't work. local_bh_disable() will start a RCU read section > and then RCU will complain during schedule(). > So if the requirement is to cancel a BH workitem from within a BH > disabled section then we need the first patch in this thread. > > But if we get rid of this requirement… Agreed, once we get rid of them, we can drop the whole block for both RT and !RT - ie. revert the patch that added it. But, wouldn't it be more orderly to match the semantics in both cases even if the behavior isn't quite optimal? We can put some comment noting what to do once the culprits are updated to not need it. Thanks. -- tejun
On 2025-08-22 08:07:47 [-1000], Tejun Heo wrote: > Hello, Sebastian. Hi Tejun, > Agreed, once we get rid of them, we can drop the whole block for both RT and > !RT - ie. revert the patch that added it. But, wouldn't it be more orderly > to match the semantics in both cases even if the behavior isn't quite > optimal? We can put some comment noting what to do once the culprits are > updated to not need it. Sure. I am only worried that if something is possible, people will use it. I don't think things will change if we debate for another week ;) The first patch here in this thread should provide the symmetrical API. Oh. We could also hide this polling API behind a special API which is hidden just for three special cases so everyone else would do the right job. > Thanks. Sebastian
Hello, On Tue, Aug 26, 2025 at 05:49:42PM +0200, Sebastian Andrzej Siewior wrote: > On 2025-08-22 08:07:47 [-1000], Tejun Heo wrote: > > Hello, Sebastian. > Hi Tejun, > > > Agreed, once we get rid of them, we can drop the whole block for both RT and > > !RT - ie. revert the patch that added it. But, wouldn't it be more orderly > > to match the semantics in both cases even if the behavior isn't quite > > optimal? We can put some comment noting what to do once the culprits are > > updated to not need it. > > Sure. I am only worried that if something is possible, people will use > it. I don't think things will change if we debate for another week ;) > The first patch here in this thread should provide the symmetrical API. > > Oh. We could also hide this polling API behind a special API which is > hidden just for three special cases so everyone else would do the right > job. Oh yeah, that makes a lot of sense to me - splitting it out into something which is named explicitly to discourage further usages. Thanks. -- tejun
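One way the split could look, purely as a hypothetical naming sketch:
cancel_work_sync() is the existing API, while the second name is invented
here for illustration and exists neither in the thread nor in the tree.

/*
 * Hypothetical sketch only: keep the sleeping API as-is and give the
 * busy-waiting variant a name that flags it as a legacy escape hatch.
 */
bool cancel_work_sync(struct work_struct *work);		/* may sleep */
bool cancel_work_sync_busy_poll(struct work_struct *work);	/* spins; only for the few
								   legacy atomic-context callers */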
On 2025-08-26 06:27:27 [-1000], Tejun Heo wrote:
> Hello,
Hello Tejun,

> Oh yeah, that makes a lot of sense to me - splitting it out into something
> which is named explicitly to discourage further usages.

I am a bit lost now. Do you intend to apply the patch and we come up
with the bh-canceling-from-bh API later on, or what is the plan? I can
repost it of course together with the tasklet patch.

> Thanks.

Sebastian
Hello, Sebastian. On Thu, Aug 28, 2025 at 06:04:55PM +0200, Sebastian Andrzej Siewior wrote: > On 2025-08-26 06:27:27 [-1000], Tejun Heo wrote: > > Oh yeah, that makes a lot of sense to me - splitting it out into something > > which is named explicitly to discourage further usages. > > I am a bit lost now. Do you intend to apply the patch and we came up > with the bh-canceling-from-bh API later on or what is the plan? I'd much prefer if the end result is that the busy waiting is a separate API. Whether that's done in a single patch or incremental patches doesn't really matter. Thanks. -- tejun