From nobody Wed Apr 8 03:07:04 2026
Date: Tue, 10 Mar 2026 21:28:53 +0100
Message-ID: <20260310202525.969061974@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, Mathieu Desnoyers, Matthieu Baerts, Jiri Slaby
Subject: [patch 1/4] sched/mmcid: Prevent CID stalls due to concurrent forks
References: <20260310201009.257617049@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

A newly forked task is accounted as a MMCID user before the task is
visible in the process' thread list and the global task list. This
creates the following problem:

  CPU1					CPU2
  fork()
    sched_mm_cid_fork(tnew1)
      tnew1->mm.mm_cid_users++;
      tnew1->mm_cid.cid = getcid()
  -> preemption
					fork()
					  sched_mm_cid_fork(tnew2)
					    tnew2->mm.mm_cid_users++;
					    // Reaches the per CPU threshold
					    mm_cid_fixup_tasks_to_cpus()
					      for_each_other(current, p)
					        ....

As tnew1 is not yet visible, the fixup fails to find the already
allocated CID of tnew1. As a consequence, a subsequent schedule-in might
fail to acquire a (transitional) CID and the machine stalls.

Prevent this by moving the invocation of sched_mm_cid_fork() to after
the point where the new task becomes visible in the thread and task
lists. This also makes it symmetrical to exit(), where the task is
removed as a CID user before it is removed from the thread and task
lists.
Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Thomas Gleixner
Tested-by: Matthieu Baerts (NGI0)
---
 include/linux/sched.h |    2 --
 kernel/fork.c         |    2 --
 kernel/sched/core.c   |   22 +++++++++++++++-------
 3 files changed, 15 insertions(+), 11 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2354,7 +2354,6 @@ static __always_inline void alloc_tag_re
 #ifdef CONFIG_SCHED_MM_CID
 void sched_mm_cid_before_execve(struct task_struct *t);
 void sched_mm_cid_after_execve(struct task_struct *t);
-void sched_mm_cid_fork(struct task_struct *t);
 void sched_mm_cid_exit(struct task_struct *t);
 static __always_inline int task_mm_cid(struct task_struct *t)
 {
@@ -2363,7 +2362,6 @@ static __always_inline int task_mm_cid(s
 #else
 static inline void sched_mm_cid_before_execve(struct task_struct *t) { }
 static inline void sched_mm_cid_after_execve(struct task_struct *t) { }
-static inline void sched_mm_cid_fork(struct task_struct *t) { }
 static inline void sched_mm_cid_exit(struct task_struct *t) { }
 static __always_inline int task_mm_cid(struct task_struct *t)
 {
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1586,7 +1586,6 @@ static int copy_mm(u64 clone_flags, stru
 
 	tsk->mm = mm;
 	tsk->active_mm = mm;
-	sched_mm_cid_fork(tsk);
 	return 0;
 }
 
@@ -2498,7 +2497,6 @@ static bool need_futex_hash_allocate_def
 	exit_nsproxy_namespaces(p);
 bad_fork_cleanup_mm:
 	if (p->mm) {
-		sched_mm_cid_exit(p);
 		mm_clear_owner(p->mm, p);
 		mmput(p->mm);
 	}
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4729,8 +4729,12 @@ void sched_cancel_fork(struct task_struc
 	scx_cancel_fork(p);
 }
 
+static void sched_mm_cid_fork(struct task_struct *t);
+
 void sched_post_fork(struct task_struct *p)
 {
+	if (IS_ENABLED(CONFIG_SCHED_MM_CID))
+		sched_mm_cid_fork(p);
 	uclamp_post_fork(p);
 	scx_post_fork(p);
 }
@@ -10646,12 +10650,13 @@ static void mm_cid_do_fixup_tasks_to_cpu
	 * possible switch back to per task mode happens either in the
	 * deferred handler function or in the next fork()/exit().
	 *
-	 * The caller has already transferred. The newly incoming task is
-	 * already accounted for, but not yet visible.
+	 * The caller has already transferred so remove it from the users
+	 * count. The incoming task is already visible and has mm_cid.active,
+	 * but has task::mm_cid::cid == UNSET. Still it needs to be accounted
+	 * for. Concurrent fork()s might add more threads, but all of them have
+	 * task::mm_cid::active = 0, so they don't affect the accounting here.
	 */
-	users = mm->mm_cid.users - 2;
-	if (!users)
-		return;
+	users = mm->mm_cid.users - 1;
 
 	guard(rcu)();
 	for_other_threads(current, t) {
@@ -10688,12 +10693,15 @@ static bool sched_mm_cid_add_user(struct
 	return mm_update_max_cids(mm);
 }
 
-void sched_mm_cid_fork(struct task_struct *t)
+static void sched_mm_cid_fork(struct task_struct *t)
 {
	struct mm_struct *mm = t->mm;
	bool percpu;
 
-	WARN_ON_ONCE(!mm || t->mm_cid.cid != MM_CID_UNSET);
+	if (!mm)
+		return;
+
+	WARN_ON_ONCE(t->mm_cid.cid != MM_CID_UNSET);
 
	guard(mutex)(&mm->mm_cid.mutex);
	scoped_guard(raw_spinlock_irq, &mm->mm_cid.lock) {

From nobody Wed Apr 8 03:07:04 2026
Date: Tue, 10 Mar 2026 21:28:58 +0100
Message-ID: <20260310202526.048657665@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, Mathieu Desnoyers, Matthieu Baerts, Jiri Slaby
Subject: [patch 2/4] sched/mmcid: Handle vfork()/CLONE_VM correctly
References: <20260310201009.257617049@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Matthieu and Jiri reported stalls where a task loops endlessly in
mm_get_cid() when scheduling in. It turned out that the logic which
handles vfork()'ed tasks is broken.
That logic is invoked when the number of tasks associated with the
process is smaller than the number of MMCID users. It then walks the
full task list to find the vfork()'ed tasks, but accounts the already
processed tasks once more. If that double accounting brings the number
of tasks still to be handled down to zero, the walk stops early and the
vfork()'ed task's CID is not fixed up. As a consequence, a subsequent
schedule-in fails to acquire a (transitional) CID and the machine
stalls.

Cure this by removing the accounting condition and making the fixup
always walk the full task list when it could not find the exact number
of users in the process' thread list.

Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Reported-by: Matthieu Baerts
Reported-by: Jiri Slaby
Signed-off-by: Thomas Gleixner
Closes: https://lore.kernel.org/b24ffcb3-09d5-4e48-9070-0b69bc654281@kernel.org
Tested-by: Matthieu Baerts (NGI0)
---
 kernel/sched/core.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10671,10 +10671,7 @@ static void mm_cid_do_fixup_tasks_to_cpu
 	for_each_process_thread(p, t) {
 		if (t == current || t->mm != mm)
 			continue;
-		if (mm_cid_fixup_task_to_cpu(t, mm)) {
-			if (--users == 0)
-				return;
-		}
+		mm_cid_fixup_task_to_cpu(t, mm);
 	}
 }

From nobody Wed Apr 8 03:07:04 2026
Date: Tue, 10 Mar 2026 21:29:04 +0100
Message-ID: <20260310202526.116363613@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, Mathieu Desnoyers, Matthieu Baerts, Jiri Slaby
Subject: [patch 3/4] sched/mmcid: Remove pointless preempt guard
References: <20260310201009.257617049@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
The preempt guard in sched_mm_cid_remove_user() is a leftover from early
versions of this function, where it could be invoked without
mm::mm_cid::lock held. Remove it and add lockdep asserts instead.

Fixes: 653fda7ae73d ("sched/mmcid: Switch over to the new mechanism")
Signed-off-by: Thomas Gleixner
Tested-by: Matthieu Baerts (NGI0)
---
 kernel/sched/core.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10685,6 +10685,8 @@ static void mm_cid_fixup_tasks_to_cpus(v
 
 static bool sched_mm_cid_add_user(struct task_struct *t, struct mm_struct *mm)
 {
+	lockdep_assert_held(&mm->mm_cid.lock);
+
 	t->mm_cid.active = 1;
 	mm->mm_cid.users++;
 	return mm_update_max_cids(mm);
@@ -10737,12 +10739,12 @@ static void sched_mm_cid_fork(struct tas
 
 static bool sched_mm_cid_remove_user(struct task_struct *t)
 {
+	lockdep_assert_held(&t->mm->mm_cid.lock);
+
 	t->mm_cid.active = 0;
-	scoped_guard(preempt) {
-		/* Clear the transition bit */
-		t->mm_cid.cid = cid_from_transit_cid(t->mm_cid.cid);
-		mm_unset_cid_on_task(t);
-	}
+	/* Clear the transition bit */
+	t->mm_cid.cid = cid_from_transit_cid(t->mm_cid.cid);
+	mm_unset_cid_on_task(t);
 	t->mm->mm_cid.users--;
 	return mm_update_max_cids(t->mm);
 }

From nobody Wed Apr 8 03:07:04 2026
Date: Tue, 10 Mar 2026 21:29:09 +0100
Message-ID: <20260310202526.183824481@kernel.org>
User-Agent: quilt/0.68
From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, Mathieu Desnoyers, Matthieu Baerts, Jiri Slaby
Subject: [patch 4/4] sched/mmcid: Avoid full tasklist walks
References: <20260310201009.257617049@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Chasing vfork()'ed tasks on a CID ownership mode switch requires a full
task list walk, which is obviously expensive on large systems.

Avoid that by keeping a list of the tasks which use a MM's MMCID in
mm::mm_cid and walking this list instead. This removes the counting
logic, which has proven to be flaky, and avoids the full task list walk
in the vfork() case.

Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Thomas Gleixner
Tested-by: Matthieu Baerts (NGI0)
---
 include/linux/rseq_types.h |    6 ++++-
 kernel/fork.c              |    1 +
 kernel/sched/core.c        |   54 ++++++++++----------------------------------
 3 files changed, 18 insertions(+), 43 deletions(-)

--- a/include/linux/rseq_types.h
+++ b/include/linux/rseq_types.h
@@ -133,10 +133,12 @@ struct rseq_data { };
  * @active:	MM CID is active for the task
  * @cid:	The CID associated to the task either permanently or
  *		borrowed from the CPU
+ * @node:	Queued in the per MM MMCID list
  */
 struct sched_mm_cid {
	unsigned int		active;
	unsigned int		cid;
+	struct hlist_node	node;
 };
 
 /**
@@ -157,6 +159,7 @@ struct mm_cid_pcpu {
  * @work:	Regular work to handle the affinity mode change case
  * @lock:	Spinlock to protect against affinity setting which can't take @mutex
  * @mutex:	Mutex to serialize forks and exits related to this mm
+ * @user_list:	List of the MM CID users of a MM
  * @nr_cpus_allowed: The number of CPUs in the per MM allowed CPUs map. The map
  *		is growth only.
  * @users:	The number of tasks sharing this MM. Separate from mm::mm_users
@@ -177,13 +180,14 @@ struct mm_mm_cid {
 
	raw_spinlock_t		lock;
	struct mutex		mutex;
+	struct hlist_head	user_list;
 
	/* Low frequency modified */
	unsigned int		nr_cpus_allowed;
	unsigned int		users;
	unsigned int		pcpu_thrs;
	unsigned int		update_deferred;
-}____cacheline_aligned_in_smp;
+} ____cacheline_aligned;
 #else /* CONFIG_SCHED_MM_CID */
 struct mm_mm_cid { };
 struct sched_mm_cid { };
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1000,6 +1000,7 @@ static struct task_struct *dup_task_stru
 #ifdef CONFIG_SCHED_MM_CID
	tsk->mm_cid.cid = MM_CID_UNSET;
	tsk->mm_cid.active = 0;
+	INIT_HLIST_NODE(&tsk->mm_cid.node);
 #endif
	return tsk;
 
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10621,13 +10621,10 @@ static inline void mm_cid_transit_to_cpu
	}
 }
 
-static bool mm_cid_fixup_task_to_cpu(struct task_struct *t, struct mm_struct *mm)
+static void mm_cid_fixup_task_to_cpu(struct task_struct *t, struct mm_struct *mm)
 {
	/* Remote access to mm::mm_cid::pcpu requires rq_lock */
	guard(task_rq_lock)(t);
-	/* If the task is not active it is not in the users count */
-	if (!t->mm_cid.active)
-		return false;
	if (cid_on_task(t->mm_cid.cid)) {
		/* If running on the CPU, put the CID in transit mode, otherwise drop it */
		if (task_rq(t)->curr == t)
@@ -10635,51 +10632,21 @@ static bool mm_cid_fixup_task_to_cpu(str
		else
			mm_unset_cid_on_task(t);
	}
-	return true;
 }
 
-static void mm_cid_do_fixup_tasks_to_cpus(struct mm_struct *mm)
+static void mm_cid_fixup_tasks_to_cpus(void)
 {
-	struct task_struct *p, *t;
-	unsigned int users;
-
-	/*
-	 * This can obviously race with a concurrent affinity change, which
-	 * increases the number of allowed CPUs for this mm, but that does
-	 * not affect the mode and only changes the CID constraints. A
-	 * possible switch back to per task mode happens either in the
-	 * deferred handler function or in the next fork()/exit().
-	 *
-	 * The caller has already transferred so remove it from the users
-	 * count. The incoming task is already visible and has mm_cid.active,
-	 * but has task::mm_cid::cid == UNSET. Still it needs to be accounted
-	 * for. Concurrent fork()s might add more threads, but all of them have
-	 * task::mm_cid::active = 0, so they don't affect the accounting here.
-	 */
-	users = mm->mm_cid.users - 1;
-
-	guard(rcu)();
-	for_other_threads(current, t) {
-		if (mm_cid_fixup_task_to_cpu(t, mm))
-			users--;
-	}
+	struct mm_struct *mm = current->mm;
+	struct task_struct *t;
 
-	if (!users)
-		return;
+	lockdep_assert_held(&mm->mm_cid.mutex);
 
-	/* Happens only for VM_CLONE processes. */
-	for_each_process_thread(p, t) {
-		if (t == current || t->mm != mm)
-			continue;
-		mm_cid_fixup_task_to_cpu(t, mm);
+	hlist_for_each_entry(t, &mm->mm_cid.user_list, mm_cid.node) {
+		/* Current has already transferred before invoking the fixup. */
+		if (t != current)
+			mm_cid_fixup_task_to_cpu(t, mm);
	}
-}
-
-static void mm_cid_fixup_tasks_to_cpus(void)
-{
-	struct mm_struct *mm = current->mm;
 
-	mm_cid_do_fixup_tasks_to_cpus(mm);
	mm_cid_complete_transit(mm, MM_CID_ONCPU);
 }
 
@@ -10688,6 +10655,7 @@ static bool sched_mm_cid_add_user(struct
	lockdep_assert_held(&mm->mm_cid.lock);
 
	t->mm_cid.active = 1;
+	hlist_add_head(&t->mm_cid.node, &mm->mm_cid.user_list);
	mm->mm_cid.users++;
	return mm_update_max_cids(mm);
 }
@@ -10745,6 +10713,7 @@ static bool sched_mm_cid_remove_user(str
	/* Clear the transition bit */
	t->mm_cid.cid = cid_from_transit_cid(t->mm_cid.cid);
	mm_unset_cid_on_task(t);
+	hlist_del_init(&t->mm_cid.node);
	t->mm->mm_cid.users--;
	return mm_update_max_cids(t->mm);
 }
@@ -10887,6 +10856,7 @@ void mm_init_cid(struct mm_struct *mm, s
	mutex_init(&mm->mm_cid.mutex);
	mm->mm_cid.irq_work = IRQ_WORK_INIT_HARD(mm_cid_irq_work);
	INIT_WORK(&mm->mm_cid.work, mm_cid_work_fn);
+	INIT_HLIST_HEAD(&mm->mm_cid.user_list);
	cpumask_copy(mm_cpus_allowed(mm), &p->cpus_mask);
	bitmap_zero(mm_cidmask(mm), num_possible_cpus());
 }