From nobody Tue Apr 7 22:03:13 2026
Date: Wed, 11 Mar 2026 11:04:04 -0000
From: "tip-bot2 for Thomas Gleixner"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/urgent] sched/mmcid: Avoid full tasklist walks
Cc: Thomas Gleixner, "Peter Zijlstra (Intel)", "Matthieu Baerts (NGI0)",
 x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260310202526.183824481@kernel.org>
References: <20260310202526.183824481@kernel.org>
MIME-Version: 1.0
Message-ID: <177322704466.1647592.10271243406990283554.tip-bot2@tip-bot2>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding:
quoted-printable

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     192d852129b1b7c4f0ddbab95d0de1efd5ee1405
Gitweb:        https://git.kernel.org/tip/192d852129b1b7c4f0ddbab95d0de1efd5ee1405
Author:        Thomas Gleixner
AuthorDate:    Tue, 10 Mar 2026 21:29:09 +01:00
Committer:     Peter Zijlstra
CommitterDate: Wed, 11 Mar 2026 12:01:07 +01:00

sched/mmcid: Avoid full tasklist walks

Chasing vfork()'ed tasks on a CID ownership mode switch requires a full
task list walk, which is obviously expensive on large systems.

Avoid that by keeping a list of the tasks using a MM's MMCID entity in
mm::mm_cid and walking this list instead.

This removes the counting logic, which has proven to be flaky, and
avoids a full task list walk in the case of vfork()'ed tasks.

Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Thomas Gleixner
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: Matthieu Baerts (NGI0)
Link: https://patch.msgid.link/20260310202526.183824481@kernel.org
---
 include/linux/rseq_types.h |  6 +++-
 kernel/fork.c              |  1 +-
 kernel/sched/core.c        | 54 ++++++++-----------------------------
 3 files changed, 18 insertions(+), 43 deletions(-)

diff --git a/include/linux/rseq_types.h b/include/linux/rseq_types.h
index da5fa6f..0b42045 100644
--- a/include/linux/rseq_types.h
+++ b/include/linux/rseq_types.h
@@ -133,10 +133,12 @@ struct rseq_data {
 };
  * @active:	MM CID is active for the task
  * @cid:	The CID associated to the task either permanently or
  *		borrowed from the CPU
+ * @node:	Queued in the per MM MMCID list
  */
 struct sched_mm_cid {
 	unsigned int		active;
 	unsigned int		cid;
+	struct hlist_node	node;
 };
 
 /**
@@ -157,6 +159,7 @@ struct mm_cid_pcpu {
  * @work:		Regular work to handle the affinity mode change case
  * @lock:		Spinlock to protect against affinity setting which can't take @mutex
  * @mutex:		Mutex to serialize forks and exits related to this mm
+ * @user_list:		List of the MM CID users of a MM
  * @nr_cpus_allowed:	The number of CPUs in the per MM allowed CPUs map. The map
  *			is growth only.
  * @users:		The number of tasks sharing this MM. Separate from mm::mm_users
@@ -177,13 +180,14 @@ struct mm_mm_cid {
 
 	raw_spinlock_t			lock;
 	struct mutex			mutex;
+	struct hlist_head		user_list;
 
 	/* Low frequency modified */
 	unsigned int			nr_cpus_allowed;
 	unsigned int			users;
 	unsigned int			pcpu_thrs;
 	unsigned int			update_deferred;
-} ____cacheline_aligned_in_smp;
+} ____cacheline_aligned;
 #else /* CONFIG_SCHED_MM_CID */
 struct mm_mm_cid { };
 struct sched_mm_cid { };
diff --git a/kernel/fork.c b/kernel/fork.c
index 7febf4c..bc2bf58 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1000,6 +1000,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #ifdef CONFIG_SCHED_MM_CID
 	tsk->mm_cid.cid = MM_CID_UNSET;
 	tsk->mm_cid.active = 0;
+	INIT_HLIST_NODE(&tsk->mm_cid.node);
 #endif
 	return tsk;
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f56156f..496dff7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10620,13 +10620,10 @@ static inline void mm_cid_transit_to_cpu(struct task_struct *t, struct mm_cid_pc
 	}
 }
 
-static bool mm_cid_fixup_task_to_cpu(struct task_struct *t, struct mm_struct *mm)
+static void mm_cid_fixup_task_to_cpu(struct task_struct *t, struct mm_struct *mm)
 {
 	/* Remote access to mm::mm_cid::pcpu requires rq_lock */
 	guard(task_rq_lock)(t);
-	/* If the task is not active it is not in the users count */
-	if (!t->mm_cid.active)
-		return false;
 	if (cid_on_task(t->mm_cid.cid)) {
 		/* If running on the CPU, put the CID in transit mode, otherwise drop it */
 		if (task_rq(t)->curr == t)
@@ -10634,51 +10631,21 @@ static bool mm_cid_fixup_task_to_cpu(struct task_struct *t, struct mm_struct *mm
 		else
 			mm_unset_cid_on_task(t);
 	}
-	return true;
 }
 
-static void mm_cid_do_fixup_tasks_to_cpus(struct mm_struct *mm)
+static void mm_cid_fixup_tasks_to_cpus(void)
 {
-	struct task_struct *p, *t;
-	unsigned int users;
-
-	/*
-	 * This can obviously race with a concurrent affinity change, which
-	 * increases the number of allowed CPUs for this mm, but that does
-	 * not affect the mode and only changes the CID constraints. A
-	 * possible switch back to per task mode happens either in the
-	 * deferred handler function or in the next fork()/exit().
-	 *
-	 * The caller has already transferred so remove it from the users
-	 * count. The incoming task is already visible and has mm_cid.active,
-	 * but has task::mm_cid::cid == UNSET. Still it needs to be accounted
-	 * for. Concurrent fork()s might add more threads, but all of them have
-	 * task::mm_cid::active = 0, so they don't affect the accounting here.
-	 */
-	users = mm->mm_cid.users - 1;
-
-	guard(rcu)();
-	for_other_threads(current, t) {
-		if (mm_cid_fixup_task_to_cpu(t, mm))
-			users--;
-	}
+	struct mm_struct *mm = current->mm;
+	struct task_struct *t;
 
-	if (!users)
-		return;
+	lockdep_assert_held(&mm->mm_cid.mutex);
 
-	/* Happens only for VM_CLONE processes. */
-	for_each_process_thread(p, t) {
-		if (t == current || t->mm != mm)
-			continue;
-		mm_cid_fixup_task_to_cpu(t, mm);
+	hlist_for_each_entry(t, &mm->mm_cid.user_list, mm_cid.node) {
+		/* Current has already transferred before invoking the fixup. */
+		if (t != current)
+			mm_cid_fixup_task_to_cpu(t, mm);
 	}
-}
-
-static void mm_cid_fixup_tasks_to_cpus(void)
-{
-	struct mm_struct *mm = current->mm;
 
-	mm_cid_do_fixup_tasks_to_cpus(mm);
 	mm_cid_complete_transit(mm, MM_CID_ONCPU);
 }
 
@@ -10687,6 +10654,7 @@ static bool sched_mm_cid_add_user(struct task_struct *t, struct mm_struct *mm)
 	lockdep_assert_held(&mm->mm_cid.lock);
 
 	t->mm_cid.active = 1;
+	hlist_add_head(&t->mm_cid.node, &mm->mm_cid.user_list);
 	mm->mm_cid.users++;
 	return mm_update_max_cids(mm);
 }
@@ -10744,6 +10712,7 @@ static bool sched_mm_cid_remove_user(struct task_struct *t)
 	/* Clear the transition bit */
 	t->mm_cid.cid = cid_from_transit_cid(t->mm_cid.cid);
 	mm_unset_cid_on_task(t);
+	hlist_del_init(&t->mm_cid.node);
 	t->mm->mm_cid.users--;
 	return mm_update_max_cids(t->mm);
 }
@@ -10886,6 +10855,7 @@ void mm_init_cid(struct mm_struct *mm, struct task_struct *p)
 	mutex_init(&mm->mm_cid.mutex);
 	mm->mm_cid.irq_work = IRQ_WORK_INIT_HARD(mm_cid_irq_work);
 	INIT_WORK(&mm->mm_cid.work, mm_cid_work_fn);
+	INIT_HLIST_HEAD(&mm->mm_cid.user_list);
 	cpumask_copy(mm_cpus_allowed(mm), &p->cpus_mask);
 	bitmap_zero(mm_cidmask(mm), num_possible_cpus());
 }