From nobody Fri Oct 3 11:15:06 2025
From: Sebastian Andrzej Siewior
To: linux-rt-devel@lists.linux.dev, linux-kernel@vger.kernel.org
Cc: Clark Williams, Ingo Molnar, Lai Jiangshan, Peter Zijlstra,
    Steven Rostedt, Tejun Heo, Thomas Gleixner,
    Sebastian Andrzej Siewior
Subject: [PATCH v2 1/3] workqueue: Provide a handshake for canceling BH workers
Date: Mon, 1 Sep 2025 18:38:09 +0200
Message-ID: <20250901163811.963326-2-bigeasy@linutronix.de>
In-Reply-To: <20250901163811.963326-1-bigeasy@linutronix.de>
References: <20250901163811.963326-1-bigeasy@linutronix.de>

While a BH work item is canceled, the core code spins until it
determines that the item completed. On PREEMPT_RT the spinning relies
on a lock in local_bh_disable() to avoid a live lock if the canceling
thread has a higher priority than the BH worker and preempts it. This
lock ensures that the BH worker makes progress by PI-boosting it.

This lock in local_bh_disable() is a central per-CPU BKL and is about
to be removed. To provide the required synchronisation, add a per-pool
lock. The lock is acquired by the bh_worker at the beginning and held
while the individual callbacks are invoked. To enforce progress in
case of interruption, __flush_work() needs to acquire the lock. This
will flush all BH work items assigned to that pool.
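The effect of the handshake can be illustrated with a small user-space
model. The sketch below is not kernel code: pool->cb_lock is
approximated by a PRIO_INHERIT pthread mutex, bh_worker() by a thread
that holds the lock while its callbacks run, and
workqueue_callback_cancel_wait_running() by a plain lock/unlock pair.
All names are invented for the illustration.

/* User-space model of the cancel handshake; not kernel code. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t cb_lock;         /* stands in for pool->cb_lock */
static int pending = 1;                 /* stands in for queued BH work */

/* bh_worker() analogue: the lock is held while callbacks are invoked. */
static void *bh_worker_model(void *arg)
{
        pthread_mutex_lock(&cb_lock);
        while (__atomic_load_n(&pending, __ATOMIC_ACQUIRE)) {
                usleep(1000);           /* the work item's callback */
                __atomic_store_n(&pending, 0, __ATOMIC_RELEASE);
        }
        pthread_mutex_unlock(&cb_lock);
        return NULL;
}

/* workqueue_callback_cancel_wait_running() analogue: blocking on the
 * mutex boosts the lock owner (PRIO_INHERIT) instead of starving it
 * and returns only once the callback section has been left. */
static void cancel_wait_running_model(void)
{
        pthread_mutex_lock(&cb_lock);
        pthread_mutex_unlock(&cb_lock);
}

int main(void)
{
        pthread_mutexattr_t attr;
        pthread_t worker;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        pthread_mutex_init(&cb_lock, &attr);

        pthread_create(&worker, NULL, bh_worker_model, NULL);
        cancel_wait_running_model();
        pthread_join(worker, NULL);
        puts("cancel completed without busy waiting");
        return 0;
}

If the canceling side wins the race for the mutex before the worker
has started, it simply returns; in the kernel this corresponds to
__flush_work() retrying around try_wait_for_completion().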

Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/workqueue.c | 51 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 42 insertions(+), 9 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c6b79b3675c31..94e226f637992 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -222,7 +222,9 @@ struct worker_pool {
 	struct workqueue_attrs	*attrs;		/* I: worker attributes */
 	struct hlist_node	hash_node;	/* PL: unbound_pool_hash node */
 	int			refcnt;		/* PL: refcnt for unbound pools */
-
+#ifdef CONFIG_PREEMPT_RT
+	spinlock_t		cb_lock;	/* BH worker cancel lock */
+#endif
 	/*
 	 * Destruction of pool is RCU protected to allow dereferences
 	 * from get_work_pool().
@@ -3078,6 +3080,31 @@ __acquires(&pool->lock)
 	goto restart;
 }
 
+#ifdef CONFIG_PREEMPT_RT
+static void worker_lock_callback(struct worker_pool *pool)
+{
+	spin_lock(&pool->cb_lock);
+}
+
+static void worker_unlock_callback(struct worker_pool *pool)
+{
+	spin_unlock(&pool->cb_lock);
+}
+
+static void workqueue_callback_cancel_wait_running(struct worker_pool *pool)
+{
+	spin_lock(&pool->cb_lock);
+	spin_unlock(&pool->cb_lock);
+}
+
+#else
+
+static void worker_lock_callback(struct worker_pool *pool) { }
+static void worker_unlock_callback(struct worker_pool *pool) { }
+static void workqueue_callback_cancel_wait_running(struct worker_pool *pool) { }
+
+#endif
+
 /**
  * manage_workers - manage worker pool
  * @worker: self
@@ -3557,6 +3584,7 @@ static void bh_worker(struct worker *worker)
 	int nr_restarts = BH_WORKER_RESTARTS;
 	unsigned long end = jiffies + BH_WORKER_JIFFIES;
 
+	worker_lock_callback(pool);
 	raw_spin_lock_irq(&pool->lock);
 	worker_leave_idle(worker);
 
@@ -3585,6 +3613,7 @@ static void bh_worker(struct worker *worker)
 	worker_enter_idle(worker);
 	kick_pool(pool);
 	raw_spin_unlock_irq(&pool->lock);
+	worker_unlock_callback(pool);
 }
 
 /*
@@ -4222,17 +4251,18 @@ static bool __flush_work(struct work_struct *work, bool from_cancel)
 	    (data & WORK_OFFQ_BH)) {
 		/*
 		 * On RT, prevent a live lock when %current preempted
-		 * soft interrupt processing or prevents ksoftirqd from
-		 * running by keeping flipping BH. If the BH work item
-		 * runs on a different CPU then this has no effect other
-		 * than doing the BH disable/enable dance for nothing.
-		 * This is copied from
-		 * kernel/softirq.c::tasklet_unlock_spin_wait().
+		 * soft interrupt processing by blocking on the lock which
+		 * is owned by the thread invoking the callback.
		 */
 		while (!try_wait_for_completion(&barr.done)) {
 			if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
-				local_bh_disable();
-				local_bh_enable();
+				struct worker_pool *pool;
+
+				mutex_lock(&wq_pool_mutex);
+				pool = get_work_pool(work);
+				if (pool)
+					workqueue_callback_cancel_wait_running(pool);
+				mutex_unlock(&wq_pool_mutex);
 			} else {
 				cpu_relax();
 			}
@@ -4782,6 +4812,9 @@ static int init_worker_pool(struct worker_pool *pool)
 	ida_init(&pool->worker_ida);
 	INIT_HLIST_NODE(&pool->hash_node);
 	pool->refcnt = 1;
+#ifdef CONFIG_PREEMPT_RT
+	spin_lock_init(&pool->cb_lock);
+#endif
 
 	/* shouldn't fail above this point */
 	pool->attrs = alloc_workqueue_attrs();
-- 
2.51.0

From nobody Fri Oct 3 11:15:06 2025
From: Sebastian Andrzej Siewior
To: linux-rt-devel@lists.linux.dev, linux-kernel@vger.kernel.org
Cc: Clark Williams, Ingo Molnar, Lai Jiangshan, Peter Zijlstra,
    Steven Rostedt, Tejun Heo, Thomas Gleixner,
    Sebastian Andrzej Siewior
Subject: [PATCH v2 2/3] softirq: Provide a handshake for canceling tasklets via polling
Date: Mon, 1 Sep 2025 18:38:10 +0200
Message-ID: <20250901163811.963326-3-bigeasy@linutronix.de>
In-Reply-To: <20250901163811.963326-1-bigeasy@linutronix.de>
References: <20250901163811.963326-1-bigeasy@linutronix.de>

tasklet_unlock_spin_wait(), used via tasklet_disable_in_atomic(), is
provided for a few legacy tasklet users. On non-PREEMPT_RT the
interface is used from atomic context (either softirq context or with
preemption disabled) and relies on spinning until the tasklet callback
completes. On PREEMPT_RT the context is never atomic, but the busy
polling logic remains.

It is possible that the thread invoking tasklet_unlock_spin_wait() has
a higher priority than the tasklet. If both run on the same CPU, the
tasklet makes no progress and the thread trying to cancel the tasklet
will live-lock the system. To avoid the lockup,
tasklet_unlock_spin_wait() uses local_bh_disable()/enable(), which
utilizes the local_lock_t for synchronisation. This lock is a central
per-CPU BKL and is about to be removed.

Acquire a lock in tasklet_action_common() which is held while the
tasklet's callback is invoked. This lock will be acquired from
tasklet_unlock_spin_wait() via tasklet_callback_cancel_wait_running().
After the tasklet completes, tasklet_callback_sync_wait_running() drops
the lock and acquires it again. To avoid unlocking the lock when there
is no cancel request, there is a cb_waiters counter which is
incremented during a cancel request. Blocking on the lock will PI-boost
the tasklet if needed, ensuring progress is made.
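The cb_waiters counter is what keeps the common case cheap: the lock is
only handed over between two callbacks when a canceler is actually
waiting. A rough user-space model of the two sides (not kernel code,
priority inheritance omitted, all names invented):

/* User-space model of the cb_waiters handshake; not kernel code. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static pthread_mutex_t cb_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int cb_waiters;
static atomic_int callbacks_left = 5;

/* tasklet_callback_cancel_wait_running() analogue: announce the
 * waiter, then block until the callback side drops the lock. */
static void cancel_wait_running_model(void)
{
        atomic_fetch_add(&cb_waiters, 1);
        pthread_mutex_lock(&cb_lock);   /* blocks while a callback runs */
        atomic_fetch_sub(&cb_waiters, 1);
        pthread_mutex_unlock(&cb_lock);
}

/* tasklet_callback_sync_wait_running() analogue: between two
 * callbacks, hand the lock over only if somebody is waiting;
 * otherwise skip the unlock/lock cycle entirely. */
static void sync_wait_running_model(void)
{
        if (atomic_load(&cb_waiters)) {
                pthread_mutex_unlock(&cb_lock);
                pthread_mutex_lock(&cb_lock);
        }
}

/* tasklet_action_common() analogue: lock held across the whole batch. */
static void *tasklet_batch_model(void *arg)
{
        pthread_mutex_lock(&cb_lock);
        while (atomic_fetch_sub(&callbacks_left, 1) > 0)
                sync_wait_running_model();      /* after each callback */
        pthread_mutex_unlock(&cb_lock);
        return NULL;
}

int main(void)
{
        pthread_t batch;

        pthread_create(&batch, NULL, tasklet_batch_model, NULL);
        cancel_wait_running_model();    /* returns once not running */
        pthread_join(batch, NULL);
        puts("waited without busy polling");
        return 0;
}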
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/softirq.c | 62 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 57 insertions(+), 5 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 513b1945987cc..4e2c980e7712e 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -805,6 +805,58 @@ static bool tasklet_clear_sched(struct tasklet_struct *t)
 	return false;
 }
 
+#ifdef CONFIG_PREEMPT_RT
+struct tasklet_sync_callback {
+	spinlock_t	cb_lock;
+	atomic_t	cb_waiters;
+};
+
+static DEFINE_PER_CPU(struct tasklet_sync_callback, tasklet_sync_callback) = {
+	.cb_lock	= __SPIN_LOCK_UNLOCKED(tasklet_sync_callback.cb_lock),
+	.cb_waiters	= ATOMIC_INIT(0),
+};
+
+static void tasklet_lock_callback(void)
+{
+	spin_lock(this_cpu_ptr(&tasklet_sync_callback.cb_lock));
+}
+
+static void tasklet_unlock_callback(void)
+{
+	spin_unlock(this_cpu_ptr(&tasklet_sync_callback.cb_lock));
+}
+
+static void tasklet_callback_cancel_wait_running(void)
+{
+	struct tasklet_sync_callback *sync_cb = this_cpu_ptr(&tasklet_sync_callback);
+
+	atomic_inc(&sync_cb->cb_waiters);
+	spin_lock(&sync_cb->cb_lock);
+	atomic_dec(&sync_cb->cb_waiters);
+	spin_unlock(&sync_cb->cb_lock);
+}
+
+static void tasklet_callback_sync_wait_running(void)
+{
+	struct tasklet_sync_callback *sync_cb = this_cpu_ptr(&tasklet_sync_callback);
+
+	if (atomic_read(&sync_cb->cb_waiters)) {
+		spin_unlock(&sync_cb->cb_lock);
+		spin_lock(&sync_cb->cb_lock);
+	}
+}
+
+#else /* !CONFIG_PREEMPT_RT: */
+
+static void tasklet_lock_callback(void) { }
+static void tasklet_unlock_callback(void) { }
+static void tasklet_callback_sync_wait_running(void) { }
+
+#ifdef CONFIG_SMP
+static void tasklet_callback_cancel_wait_running(void) { }
+#endif
+#endif /* !CONFIG_PREEMPT_RT */
+
 static void tasklet_action_common(struct tasklet_head *tl_head,
 				  unsigned int softirq_nr)
 {
@@ -816,6 +868,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
 	tl_head->tail = &tl_head->head;
 	local_irq_enable();
 
+	tasklet_lock_callback();
 	while (list) {
 		struct tasklet_struct *t = list;
 
@@ -835,6 +888,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
 				}
 			}
 			tasklet_unlock(t);
+			tasklet_callback_sync_wait_running();
 			continue;
 		}
 		tasklet_unlock(t);
@@ -847,6 +901,7 @@ static void tasklet_action_common(struct tasklet_head *tl_head,
 		__raise_softirq_irqoff(softirq_nr);
 		local_irq_enable();
 	}
+	tasklet_unlock_callback();
 }
 
 static __latent_entropy void tasklet_action(void)
@@ -897,12 +952,9 @@ void tasklet_unlock_spin_wait(struct tasklet_struct *t)
 			/*
 			 * Prevent a live lock when current preempted soft
 			 * interrupt processing or prevents ksoftirqd from
-			 * running. If the tasklet runs on a different CPU
-			 * then this has no effect other than doing the BH
-			 * disable/enable dance for nothing.
+			 * running.
			 */
-			local_bh_disable();
-			local_bh_enable();
+			tasklet_callback_cancel_wait_running();
 		} else {
 			cpu_relax();
 		}
 	}
-- 
2.51.0

From nobody Fri Oct 3 11:15:06 2025
From: Sebastian Andrzej Siewior
To: linux-rt-devel@lists.linux.dev, linux-kernel@vger.kernel.org
Cc: Clark Williams, Ingo Molnar, Lai Jiangshan, Peter Zijlstra,
    Steven Rostedt, Tejun Heo, Thomas Gleixner,
    Sebastian Andrzej Siewior
Subject: [PATCH v2 3/3] softirq: Allow dropping the softirq-BKL lock on PREEMPT_RT
Date: Mon, 1 Sep 2025 18:38:11 +0200
Message-ID: <20250901163811.963326-4-bigeasy@linutronix.de>
In-Reply-To: <20250901163811.963326-1-bigeasy@linutronix.de>
References: <20250901163811.963326-1-bigeasy@linutronix.de>

Softirqs are preemptible on PREEMPT_RT. There is synchronisation
between individual sections which disable bottom halves. This in turn
means that a forced-threaded interrupt cannot preempt another
forced-threaded interrupt. Instead it will PI-boost the other handler
and wait for its completion. This is required because code within a
softirq section is assumed to be non-preemptible and may expect
exclusive access to per-CPU resources such as variables or pinned
timers.

Code with such expectations has been identified and updated to use
local_lock_nested_bh() for locking of the per-CPU resource. This means
the lock can be removed.

Add CONFIG_PREEMPT_RT_NEEDS_BH_LOCK, which keeps the old behaviour if
selected. Otherwise the softirq synchronisation is lifted. The
softirq_ctrl.cnt accounting remains to let the NOHZ code know whether
softirqs are currently being handled.
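The conversions referred to above follow a common pattern: per-CPU data
that used to be protected implicitly by the softirq-wide lock gets its
own nested-BH local lock. A minimal sketch of that pattern follows; the
structure and function names are invented, only the
local_lock_nested_bh() API is real:

/* Sketch of the local_lock_nested_bh() pattern that replaces the
 * reliance on the implicit softirq-wide lock; names are invented. */
#include <linux/local_lock.h>
#include <linux/percpu.h>
#include <linux/types.h>

struct foo_pcpu_stats {
	local_lock_t	bh_lock;	/* protects rx_packets in BH context */
	u64		rx_packets;
};

static DEFINE_PER_CPU(struct foo_pcpu_stats, foo_stats) = {
	.bh_lock = INIT_LOCAL_LOCK(bh_lock),
};

static void foo_rx_handler(void)	/* runs in BH context */
{
	struct foo_pcpu_stats *s = this_cpu_ptr(&foo_stats);

	/*
	 * Serialises the BH users of this data on this CPU. On
	 * !PREEMPT_RT this is mostly a lockdep annotation; on PREEMPT_RT
	 * it is a real lock, so a preempting BH section can no longer
	 * corrupt the counter once the softirq BKL is gone.
	 */
	local_lock_nested_bh(&s->bh_lock);
	s->rx_packets++;
	local_unlock_nested_bh(&s->bh_lock);
}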

Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/Kconfig.preempt | 13 +++++++
 kernel/softirq.c       | 83 ++++++++++++++++++++++++++++++++----------
 2 files changed, 76 insertions(+), 20 deletions(-)

diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index 54ea59ff8fbeb..da326800c1c9b 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -103,6 +103,19 @@ config PREEMPT_RT
 	  Select this if you are building a kernel for systems which
 	  require real-time guarantees.
 
+config PREEMPT_RT_NEEDS_BH_LOCK
+	bool "Enforce softirq synchronisation on PREEMPT_RT"
+	depends on PREEMPT_RT
+	help
+	  Enforce synchronisation across the softirq context. On PREEMPT_RT
+	  the softirq is preemptible. This enforces the same per-CPU BKL
+	  semantics non-PREEMPT_RT builds have. It should not be needed
+	  because per-CPU locks were added to avoid the per-CPU BKL.
+
+	  This switch provides the old behaviour for testing reasons. Select
+	  this if you suspect an error with preemptible softirq and want to
+	  test the old, synchronised behaviour.
+
 config PREEMPT_COUNT
 	bool
 
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 4e2c980e7712e..77198911b8dd4 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -165,7 +165,11 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
 	/* First entry of a task into a BH disabled section? */
 	if (!current->softirq_disable_cnt) {
 		if (preemptible()) {
-			local_lock(&softirq_ctrl.lock);
+			if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK))
+				local_lock(&softirq_ctrl.lock);
+			else
+				migrate_disable();
+
 			/* Required to meet the RCU bottomhalf requirements. */
 			rcu_read_lock();
 		} else {
@@ -177,17 +181,34 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
 	 * Track the per CPU softirq disabled state. On RT this is per CPU
 	 * state to allow preemption of bottom half disabled sections.
 	 */
-	newcnt = __this_cpu_add_return(softirq_ctrl.cnt, cnt);
-	/*
-	 * Reflect the result in the task state to prevent recursion on the
-	 * local lock and to make softirq_count() & al work.
-	 */
-	current->softirq_disable_cnt = newcnt;
+	if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK)) {
+		newcnt = this_cpu_add_return(softirq_ctrl.cnt, cnt);
+		/*
+		 * Reflect the result in the task state to prevent recursion on the
+		 * local lock and to make softirq_count() & al work.
+		 */
+		current->softirq_disable_cnt = newcnt;
 
-	if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && newcnt == cnt) {
-		raw_local_irq_save(flags);
-		lockdep_softirqs_off(ip);
-		raw_local_irq_restore(flags);
+		if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && newcnt == cnt) {
+			raw_local_irq_save(flags);
+			lockdep_softirqs_off(ip);
+			raw_local_irq_restore(flags);
+		}
+	} else {
+		bool sirq_dis = false;
+
+		if (!current->softirq_disable_cnt)
+			sirq_dis = true;
+
+		this_cpu_add(softirq_ctrl.cnt, cnt);
+		current->softirq_disable_cnt += cnt;
+		WARN_ON_ONCE(current->softirq_disable_cnt < 0);
+
+		if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && sirq_dis) {
+			raw_local_irq_save(flags);
+			lockdep_softirqs_off(ip);
+			raw_local_irq_restore(flags);
+		}
 	}
 }
 EXPORT_SYMBOL(__local_bh_disable_ip);
@@ -195,23 +216,42 @@ EXPORT_SYMBOL(__local_bh_disable_ip);
 static void __local_bh_enable(unsigned int cnt, bool unlock)
 {
 	unsigned long flags;
+	bool sirq_en = false;
 	int newcnt;
 
-	DEBUG_LOCKS_WARN_ON(current->softirq_disable_cnt !=
-			    this_cpu_read(softirq_ctrl.cnt));
+	if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK)) {
+		DEBUG_LOCKS_WARN_ON(current->softirq_disable_cnt !=
+				    this_cpu_read(softirq_ctrl.cnt));
+		if (softirq_count() == cnt)
+			sirq_en = true;
+	} else {
+		if (current->softirq_disable_cnt == cnt)
+			sirq_en = true;
+	}
 
-	if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && softirq_count() == cnt) {
+	if (IS_ENABLED(CONFIG_TRACE_IRQFLAGS) && sirq_en) {
 		raw_local_irq_save(flags);
 		lockdep_softirqs_on(_RET_IP_);
 		raw_local_irq_restore(flags);
 	}
 
-	newcnt = __this_cpu_sub_return(softirq_ctrl.cnt, cnt);
-	current->softirq_disable_cnt = newcnt;
+	if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK)) {
+		newcnt = this_cpu_sub_return(softirq_ctrl.cnt, cnt);
+		current->softirq_disable_cnt = newcnt;
 
-	if (!newcnt && unlock) {
-		rcu_read_unlock();
-		local_unlock(&softirq_ctrl.lock);
+		if (!newcnt && unlock) {
+			rcu_read_unlock();
+			local_unlock(&softirq_ctrl.lock);
+		}
+	} else {
+		current->softirq_disable_cnt -= cnt;
+		this_cpu_sub(softirq_ctrl.cnt, cnt);
+		if (unlock && !current->softirq_disable_cnt) {
+			migrate_enable();
+			rcu_read_unlock();
+		} else {
+			WARN_ON_ONCE(current->softirq_disable_cnt < 0);
+		}
 	}
 }
 
@@ -228,7 +268,10 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
 	lock_map_release(&bh_lock_map);
 
 	local_irq_save(flags);
-	curcnt = __this_cpu_read(softirq_ctrl.cnt);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT_NEEDS_BH_LOCK))
+		curcnt = this_cpu_read(softirq_ctrl.cnt);
+	else
+		curcnt = current->softirq_disable_cnt;
 
 	/*
	 * If this is not reenabling soft interrupts, no point in trying to
-- 
2.51.0