From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Marcelo Tosatti, Michal Hocko, Oleg Nesterov, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Vlastimil Babka, linux-mm@kvack.org
Subject: [PATCH 1/6] task_work: Provide means to check if a work is queued
Date: Thu, 3 Jul 2025 16:07:12 +0200
Message-ID: <20250703140717.25703-2-frederic@kernel.org>
In-Reply-To: <20250703140717.25703-1-frederic@kernel.org>

Some task work users implement their own ways to know whether a callback
is already queued on the current task, fiddling with the callback head
internals to do so. Provide a consolidated API for this purpose instead.

Reviewed-by: Oleg Nesterov
Reviewed-by: Valentin Schneider
Signed-off-by: Frederic Weisbecker
---
 include/linux/task_work.h | 12 ++++++++++++
 kernel/task_work.c        |  9 +++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 0646804860ff..31caf12c1313 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -5,12 +5,15 @@
 #include
 #include
 
+#define TASK_WORK_DEQUEUED	((void *) -1UL)
+
 typedef void (*task_work_func_t)(struct callback_head *);
 
 static inline void
 init_task_work(struct callback_head *twork, task_work_func_t func)
 {
 	twork->func = func;
+	twork->next = TASK_WORK_DEQUEUED;
 }
 
 enum task_work_notify_mode {
@@ -26,6 +29,15 @@ static inline bool task_work_pending(struct task_struct *task)
 	return READ_ONCE(task->task_works);
 }
 
+/*
+ * Check if a work is queued. Beware: this is inherently racy if the work can
+ * be queued elsewhere than the current task.
+ */
+static inline bool task_work_queued(struct callback_head *twork)
+{
+	return twork->next != TASK_WORK_DEQUEUED;
+}
+
 int task_work_add(struct task_struct *task, struct callback_head *twork,
 			enum task_work_notify_mode mode);
 
diff --git a/kernel/task_work.c b/kernel/task_work.c
index d1efec571a4a..56718cb824d9 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -67,8 +67,10 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
 
 	head = READ_ONCE(task->task_works);
 	do {
-		if (unlikely(head == &work_exited))
+		if (unlikely(head == &work_exited)) {
+			work->next = TASK_WORK_DEQUEUED;
 			return -ESRCH;
+		}
 		work->next = head;
 	} while (!try_cmpxchg(&task->task_works, &head, work));
 
@@ -129,8 +131,10 @@ task_work_cancel_match(struct task_struct *task,
 		if (!match(work, data)) {
 			pprev = &work->next;
 			work = READ_ONCE(*pprev);
-		} else if (try_cmpxchg(pprev, &work, work->next))
+		} else if (try_cmpxchg(pprev, &work, work->next)) {
+			work->next = TASK_WORK_DEQUEUED;
 			break;
+		}
 	}
 	raw_spin_unlock_irqrestore(&task->pi_lock, flags);
 
@@ -224,6 +228,7 @@ void task_work_run(void)
 
 		do {
 			next = work->next;
+			work->next = TASK_WORK_DEQUEUED;
 			work->func(work);
 			work = next;
 			cond_resched();
-- 
2.48.1
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Marcelo Tosatti, Michal Hocko, Oleg Nesterov, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Vlastimil Babka, linux-mm@kvack.org
Subject: [PATCH 2/6] sched/fair: Use task_work_queued() on numa_work
Date: Thu, 3 Jul 2025 16:07:13 +0200
Message-ID: <20250703140717.25703-3-frederic@kernel.org>
In-Reply-To: <20250703140717.25703-1-frederic@kernel.org>

Remove the ad-hoc implementation of task_work_queued().

Reviewed-by: Oleg Nesterov
Reviewed-by: Valentin Schneider
Signed-off-by: Frederic Weisbecker
---
 kernel/sched/fair.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a14da5396fb..b350b0f4e7a5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3318,7 +3318,6 @@ static void task_numa_work(struct callback_head *work)
 
 	WARN_ON_ONCE(p != container_of(work, struct task_struct, numa_work));
 
-	work->next = work;
 	/*
 	 * Who cares about NUMA placement when they're dying.
 	 *
@@ -3575,8 +3574,6 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
 	p->numa_scan_seq = mm ? mm->numa_scan_seq : 0;
 	p->numa_scan_period = sysctl_numa_balancing_scan_delay;
 	p->numa_migrate_retry = 0;
-	/* Protect against double add, see task_tick_numa and task_numa_work */
-	p->numa_work.next = &p->numa_work;
 	p->numa_faults = NULL;
 	p->numa_pages_migrated = 0;
 	p->total_numa_faults = 0;
@@ -3617,7 +3614,7 @@ static void task_tick_numa(struct rq *rq, struct task_struct *curr)
 	/*
 	 * We don't care about NUMA placement if we don't have memory.
 	 */
-	if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
+	if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) || task_work_queued(work))
 		return;
 
 	/*
-- 
2.48.1
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Marcelo Tosatti, Michal Hocko, Oleg Nesterov, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Vlastimil Babka, linux-mm@kvack.org
Subject: [PATCH 3/6] sched: Use task_work_queued() on cid_work
Date: Thu, 3 Jul 2025 16:07:14 +0200
Message-ID: <20250703140717.25703-4-frederic@kernel.org>
In-Reply-To: <20250703140717.25703-1-frederic@kernel.org>

Remove the ad-hoc implementation of task_work_queued().

Reviewed-by: Oleg Nesterov
Signed-off-by: Frederic Weisbecker
Reviewed-by: Valentin Schneider
---
 kernel/sched/core.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8988d38d46a3..35783a486c28 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10599,7 +10599,6 @@ static void task_mm_cid_work(struct callback_head *work)
 
 	WARN_ON_ONCE(t != container_of(work, struct task_struct, cid_work));
 
-	work->next = work;	/* Prevent double-add */
 	if (t->flags & PF_EXITING)
 		return;
 	mm = t->mm;
@@ -10643,7 +10642,6 @@ void init_sched_mm_cid(struct task_struct *t)
 		if (mm_users == 1)
 			mm->mm_cid_next_scan = jiffies + msecs_to_jiffies(MM_CID_SCAN_DELAY);
 	}
-	t->cid_work.next = &t->cid_work;	/* Protect against double add */
 	init_task_work(&t->cid_work, task_mm_cid_work);
 }
 
@@ -10652,8 +10650,7 @@ void task_tick_mm_cid(struct rq *rq, struct task_struct *curr)
 	struct callback_head *work = &curr->cid_work;
 	unsigned long now = jiffies;
 
-	if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) ||
-	    work->next != work)
+	if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) || task_work_queued(work))
 		return;
 	if (time_before(now, READ_ONCE(curr->mm->mm_cid_next_scan)))
 		return;
-- 
2.48.1
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Marcelo Tosatti, Michal Hocko, Oleg Nesterov, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Vlastimil Babka, linux-mm@kvack.org
Subject: [PATCH 4/6] tick/nohz: Move nohz_full related fields out of hot task struct's places
Date: Thu, 3 Jul 2025 16:07:15 +0200
Message-ID: <20250703140717.25703-5-frederic@kernel.org>
In-Reply-To: <20250703140717.25703-1-frederic@kernel.org>

nohz_full is a feature that only fits rare, corner-case workloads. Yet
distros enable it by default, so the related fields are always reserved
in the task struct.

Those task fields sit in the middle of cache-hot areas such as cputime
accounting and context switch counting, which makes no sense for a
feature that is disabled most of the time.

Move the nohz_full storage to colder places.

Signed-off-by: Frederic Weisbecker
Reviewed-by: Valentin Schneider
---
 include/linux/sched.h | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4f78a64beb52..117aa20b8fb6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1114,13 +1114,7 @@ struct task_struct {
 #endif
 	u64				gtime;
 	struct prev_cputime		prev_cputime;
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
-	struct vtime			vtime;
-#endif
 
-#ifdef CONFIG_NO_HZ_FULL
-	atomic_t			tick_dep_mask;
-#endif
 	/* Context switch counts: */
 	unsigned long			nvcsw;
 	unsigned long			nivcsw;
@@ -1446,6 +1440,14 @@ struct task_struct {
 	struct task_delay_info		*delays;
 #endif
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+	struct vtime			vtime;
+#endif
+
+#ifdef CONFIG_NO_HZ_FULL
+	atomic_t			tick_dep_mask;
+#endif
+
 #ifdef CONFIG_FAULT_INJECTION
 	int				make_it_fail;
 	unsigned int			fail_nth;
-- 
2.48.1
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Marcelo Tosatti, Michal Hocko, Oleg Nesterov, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Vlastimil Babka, linux-mm@kvack.org
Subject: [PATCH 5/6] sched/isolation: Introduce isolated task work
Date: Thu, 3 Jul 2025 16:07:16 +0200
Message-ID: <20250703140717.25703-6-frederic@kernel.org>
In-Reply-To: <20250703140717.25703-1-frederic@kernel.org>

Some asynchronous kernel work may still be pending when a task resumes to
userspace, and then executes later on. On isolated workloads this becomes
a problem once the process is done with preparatory work involving
syscalls and wants to run in userspace without being interrupted.

Provide an infrastructure to queue work that is executed from the current
isolated task's context right before it resumes to userspace. This relies
on the assumption that isolated tasks are pinned to a single nohz_full CPU.

Signed-off-by: Frederic Weisbecker
---
 include/linux/sched.h           |  4 ++++
 include/linux/sched/isolation.h | 17 +++++++++++++++++
 kernel/sched/core.c             |  1 +
 kernel/sched/isolation.c        | 23 +++++++++++++++++++++++
 kernel/sched/sched.h            |  1 +
 kernel/time/Kconfig             | 12 ++++++++++++
 6 files changed, 58 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 117aa20b8fb6..931065b5744f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1448,6 +1448,10 @@ struct task_struct {
 	atomic_t			tick_dep_mask;
 #endif
 
+#ifdef CONFIG_NO_HZ_FULL_WORK
+	struct callback_head		nohz_full_work;
+#endif
+
 #ifdef CONFIG_FAULT_INJECTION
 	int				make_it_fail;
 	unsigned int			fail_nth;
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index d8501f4709b5..9481b7d152c9 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -77,4 +77,21 @@ static inline bool cpu_is_isolated(int cpu)
 	       cpuset_cpu_is_isolated(cpu);
 }
 
+#if defined(CONFIG_NO_HZ_FULL_WORK)
+extern int __isolated_task_work_queue(void);
+
+static inline int isolated_task_work_queue(void)
+{
+	/* Only isolated (non-housekeeping) CPUs need the flush on resume */
+	if (housekeeping_cpu(raw_smp_processor_id(), HK_TYPE_KERNEL_NOISE))
+		return -ENOTSUPP;
+
+	return __isolated_task_work_queue();
+}
+
+extern void isolated_task_work_init(struct task_struct *tsk);
+#else
+static inline int isolated_task_work_queue(void) { return -ENOTSUPP; }
+static inline void isolated_task_work_init(struct task_struct *tsk) { }
+#endif /* CONFIG_NO_HZ_FULL_WORK */
+
 #endif /* _LINUX_SCHED_ISOLATION_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 35783a486c28..eca8242bd81d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4538,6 +4538,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->migration_pending = NULL;
 #endif
 	init_sched_mm_cid(p);
+	isolated_task_work_init(p);
 }
 
 DEFINE_STATIC_KEY_FALSE(sched_numa_balancing);
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 93b038d48900..d74c4ef91ce2 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -249,3 +249,26 @@ static int __init housekeeping_isolcpus_setup(char *str)
 	return housekeeping_setup(str, flags);
 }
 __setup("isolcpus=", housekeeping_isolcpus_setup);
+
+#ifdef CONFIG_NO_HZ_FULL_WORK
+static void isolated_task_work(struct callback_head *head)
+{
+}
+
+int __isolated_task_work_queue(void)
+{
+	if (current->flags & (PF_KTHREAD | PF_USER_WORKER | PF_IO_WORKER))
+		return -EINVAL;
+
+	guard(irqsave)();
+	if (task_work_queued(&current->nohz_full_work))
+		return 0;
+
+	return task_work_add(current, &current->nohz_full_work, TWA_RESUME);
+}
+
+void isolated_task_work_init(struct task_struct *tsk)
+{
+	init_task_work(&tsk->nohz_full_work, isolated_task_work);
+}
+#endif /* CONFIG_NO_HZ_FULL_WORK */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 475bb5998295..50e0cada1e1b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -60,6 +60,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index b0b97a60aaa6..34591fc50ab1 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -146,6 +146,18 @@ config NO_HZ_FULL
 
 endchoice
 
+config NO_HZ_FULL_WORK
+	bool "Full dynticks work flush on kernel exit"
+	depends on NO_HZ_FULL
+	help
+	  Selectively flush pending asynchronous kernel work upon user exit.
+	  Assuming userspace is not performing any critical isolated work while
+	  issuing syscalls, some per-CPU kernel works are flushed before resuming
+	  to userspace so that they don't get remotely queued later when the CPU
+	  doesn't want to be disturbed.
+
+	  If in doubt say N.
+
 config CONTEXT_TRACKING_USER
 	bool
 	depends on HAVE_CONTEXT_TRACKING_USER
-- 
2.48.1
From: Frederic Weisbecker
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Marcelo Tosatti, Michal Hocko, Oleg Nesterov, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Vlastimil Babka, linux-mm@kvack.org
Subject: [PATCH 6/6] mm: Drain LRUs upon resume to userspace on nohz_full CPUs
Date: Thu, 3 Jul 2025 16:07:17 +0200
Message-ID: <20250703140717.25703-7-frederic@kernel.org>
In-Reply-To: <20250703140717.25703-1-frederic@kernel.org>

LRU batching can be a source of disturbance for isolated workloads
running in userspace, because draining the batches requires a kernel
worker, which preempts the isolated task. The primary source of such
disruption is __lru_add_drain_all(), which can be triggered from
non-isolated CPUs.

Why would an isolated CPU have anything in the pcp cache? Many syscalls
allocate pages that may end up there. A typical and unavoidable example
is fork/exec, which leaves pages behind in the cache, just waiting for
somebody to drain them.

Address the problem by noting when a batch has been added to the cache
and scheduling a drain upon return to userspace. The work is then done
while the syscall is still executing, and there are no surprises once the
task runs in userspace, where it doesn't want to be preempted.

Signed-off-by: Frederic Weisbecker
---
 include/linux/pagevec.h  | 18 ++----------------
 include/linux/swap.h     |  1 +
 kernel/sched/isolation.c |  3 +++
 mm/swap.c                | 30 +++++++++++++++++++++++++++++-
 4 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 5d3a0cccc6bf..7e647b8df4c7 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -61,22 +61,8 @@ static inline unsigned int folio_batch_space(struct folio_batch *fbatch)
 	return PAGEVEC_SIZE - fbatch->nr;
 }
 
-/**
- * folio_batch_add() - Add a folio to a batch.
- * @fbatch: The folio batch.
- * @folio: The folio to add.
- *
- * The folio is added to the end of the batch.
- * The batch must have previously been initialised using folio_batch_init().
- *
- * Return: The number of slots still available.
- */
-static inline unsigned folio_batch_add(struct folio_batch *fbatch,
-		struct folio *folio)
-{
-	fbatch->folios[fbatch->nr++] = folio;
-	return folio_batch_space(fbatch);
-}
+unsigned int folio_batch_add(struct folio_batch *fbatch,
+			     struct folio *folio);
 
 /**
  * folio_batch_next - Return the next folio to process.
diff --git a/include/linux/swap.h b/include/linux/swap.h
index bc0e1c275fc0..d74ad6c893a1 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -401,6 +401,7 @@ extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_cpu_zone(struct zone *zone);
 extern void lru_add_drain_all(void);
+extern void lru_add_and_bh_lrus_drain(void);
 void folio_deactivate(struct folio *folio);
 void folio_mark_lazyfree(struct folio *folio);
 extern void swap_setup(void);
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index d74c4ef91ce2..06882916c24f 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -8,6 +8,8 @@
  *
  */
 
+#include
+
 enum hk_flags {
 	HK_FLAG_DOMAIN		= BIT(HK_TYPE_DOMAIN),
 	HK_FLAG_MANAGED_IRQ	= BIT(HK_TYPE_MANAGED_IRQ),
@@ -253,6 +255,7 @@ __setup("isolcpus=", housekeeping_isolcpus_setup);
 #ifdef CONFIG_NO_HZ_FULL_WORK
 static void isolated_task_work(struct callback_head *head)
 {
+	lru_add_and_bh_lrus_drain();
 }
 
 int __isolated_task_work_queue(void)
diff --git a/mm/swap.c b/mm/swap.c
index 4fc322f7111a..da08c918cef4 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 
 #include "internal.h"
 
@@ -155,6 +156,29 @@ static void lru_add(struct lruvec *lruvec, struct folio *folio)
 	trace_mm_lru_insertion(folio);
 }
 
+/**
+ * folio_batch_add() - Add a folio to a batch.
+ * @fbatch: The folio batch.
+ * @folio: The folio to add.
+ *
+ * The folio is added to the end of the batch.
+ * The batch must have previously been initialised using folio_batch_init().
+ *
+ * Return: The number of slots still available.
+ */
+unsigned int folio_batch_add(struct folio_batch *fbatch,
+			     struct folio *folio)
+{
+	unsigned int ret;
+
+	fbatch->folios[fbatch->nr++] = folio;
+	ret = folio_batch_space(fbatch);
+	isolated_task_work_queue();
+
+	return ret;
+}
+EXPORT_SYMBOL(folio_batch_add);
+
 static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 {
 	int i;
@@ -738,7 +762,7 @@ void lru_add_drain(void)
  * the same cpu. It shouldn't be a problem in !SMP case since
  * the core is only one and the locks will disable preemption.
  */
-static void lru_add_and_bh_lrus_drain(void)
+void lru_add_and_bh_lrus_drain(void)
 {
 	local_lock(&cpu_fbatches.lock);
 	lru_add_drain_cpu(smp_processor_id());
@@ -864,6 +888,10 @@ static inline void __lru_add_drain_all(bool force_all_cpus)
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
 
+		/* Isolated CPUs handle their cache upon return to userspace */
+		if (IS_ENABLED(CONFIG_NO_HZ_FULL_WORK) && !housekeeping_cpu(cpu, HK_TYPE_KERNEL_NOISE))
+			continue;
+
 		if (cpu_needs_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);
 			queue_work_on(cpu, mm_percpu_wq, work);
-- 
2.48.1