From nobody Sat Feb 7 21:15:01 2026
Date: Thu, 29 Jan 2026 22:20:54 +0100
Message-ID: <20260129211557.882759840@kernel.org>
From: Thomas Gleixner
To: LKML
Cc: Ihor Solodrai, Shrikanth Hegde, Peter Zijlstra, Mathieu Desnoyers, Michael Jeanson
Subject: [patch 4/4] sched/mmcid: Optimize transitional CIDs when scheduling out
References: <20260129210219.452851594@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

During the investigation of the various transition mode issues, instrumentation revealed that the number of bitmap operations can be significantly reduced when a task with a transitional CID schedules out after the fixup function has completed and disabled transition mode.

At that point the mode is stable, so it is not required to drop the transitional CID back into the pool. As the fixup is complete, exhaustion of the CID pool is no longer possible, so the CID can be transferred to the scheduling-out task or to the CPU, depending on the current ownership mode.

This is possible because mm_cid::mode contains both the ownership state and the transition bit, so the racy snapshot is valid under all circumstances: any subsequent modification of the mode is serialized by the corresponding runqueue lock.

Assigning the ownership right there not only spares the bitmap access for dropping the CID, it also avoids it when the task is scheduled back in, because the task directly hits the fast path in both modes when the CID is within the optimal range. If the CID is outside that range, the next schedule-in will have to converge anyway, so dropping it right away is sensible.

With a thread pool benchmark configured to cross the mode switch boundaries frequently, this reduces the number of bitmap operations by about 30% and increases fast-path utilization in the low single-digit percentage range.
Signed-off-by: Thomas Gleixner
---
 kernel/sched/sched.h | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3902,12 +3902,32 @@ static __always_inline void mm_cid_sched
 
 static __always_inline void mm_cid_schedout(struct task_struct *prev)
 {
+	struct mm_struct *mm = prev->mm;
+	unsigned int mode, cid;
+
 	/* During mode transitions CIDs are temporary and need to be dropped */
 	if (likely(!cid_in_transit(prev->mm_cid.cid)))
 		return;
 
-	mm_drop_cid(prev->mm, cid_from_transit_cid(prev->mm_cid.cid));
-	prev->mm_cid.cid = MM_CID_UNSET;
+	mode = READ_ONCE(mm->mm_cid.mode);
+	cid = cid_from_transit_cid(prev->mm_cid.cid);
+
+	/*
+	 * If transition mode is done, transfer ownership when the CID is
+	 * within the convergence range. Otherwise the next schedule in will
+	 * have to allocate or converge.
+	 */
+	if (!cid_in_transit(mode) && cid < READ_ONCE(mm->mm_cid.max_cids)) {
+		if (cid_on_cpu(mode))
+			cid = cid_to_cpu_cid(cid);
+
+		/* Update both so that the next schedule in goes into the fast path */
+		mm_cid_update_pcpu_cid(mm, cid);
+		prev->mm_cid.cid = cid;
+	} else {
+		mm_drop_cid(mm, cid);
+		prev->mm_cid.cid = MM_CID_UNSET;
+	}
 }
 
 static inline void mm_cid_switch_to(struct task_struct *prev, struct task_struct *next)