From: Tim Chen
To: Peter Zijlstra, Ingo Molnar, K Prateek Nayak, "Gautham R. Shenoy", Vincent Guittot
Cc: Chen Yu, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Madadi Vineeth Reddy, Hillf Danton, Shrikanth Hegde, Jianyong Wu, Yangyu Chen, Tingyin Duan, Vern Hao, Vern Hao, Len Brown, Tim Chen, Aubrey Li, Zhao Liu, Chen Yu, Adam Li, Aaron Lu, Tim Chen, linux-kernel@vger.kernel.org
Subject: [PATCH v2 17/23] sched/cache: Record the number of active threads per process for cache-aware scheduling
Date: Wed, 3 Dec 2025 15:07:36 -0800
X-Mailer: git-send-email 2.32.0
X-Mailing-List: linux-kernel@vger.kernel.org

From: Chen Yu

A performance regression was observed by Prateek when running hackbench
with many threads per process (high fd count). To avoid this, processes
with a large number of active threads are excluded from cache-aware
scheduling.

With sched_cache enabled, record the number of active threads in each
process during the periodic task_cache_work(). While iterating over the
CPUs, if the task currently running on a CPU belongs to the same process
as the task that launched task_cache_work(), increment the active thread
count. This number will be used by a subsequent patch to inhibit
cache-aware load balancing.

Suggested-by: K Prateek Nayak
Signed-off-by: Chen Yu
Signed-off-by: Tim Chen
---

Notes:
    v1->v2: No change.
 include/linux/mm_types.h |  1 +
 kernel/sched/fair.c      | 11 +++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 1ea16ef90566..04743983de4d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1043,6 +1043,7 @@ struct mm_struct {
 	raw_spinlock_t mm_sched_lock;
 	unsigned long mm_sched_epoch;
 	int mm_sched_cpu;
+	u64 nr_running_avg ____cacheline_aligned_in_smp;
 #endif
 
 #ifdef CONFIG_MMU
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 580a967efdac..2f38ad82688f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1421,11 +1421,11 @@ static void task_tick_cache(struct rq *rq, struct task_struct *p)
 
 static void __no_profile task_cache_work(struct callback_head *work)
 {
-	struct task_struct *p = current;
+	struct task_struct *p = current, *cur;
 	struct mm_struct *mm = p->mm;
 	unsigned long m_a_occ = 0;
 	unsigned long curr_m_a_occ = 0;
-	int cpu, m_a_cpu = -1;
+	int cpu, m_a_cpu = -1, nr_running = 0;
 	cpumask_var_t cpus;
 
 	WARN_ON_ONCE(work != &p->cache_work);
@@ -1458,6 +1458,12 @@ static void __no_profile task_cache_work(struct callback_head *work)
 				m_occ = occ;
 				m_cpu = i;
 			}
+			rcu_read_lock();
+			cur = rcu_dereference(cpu_rq(i)->curr);
+			if (cur && !(cur->flags & (PF_EXITING | PF_KTHREAD)) &&
+			    cur->mm == mm)
+				nr_running++;
+			rcu_read_unlock();
 		}
 
 		/*
@@ -1501,6 +1507,7 @@ static void __no_profile task_cache_work(struct callback_head *work)
 		mm->mm_sched_cpu = m_a_cpu;
 	}
 
+	update_avg(&mm->nr_running_avg, nr_running);
 	free_cpumask_var(cpus);
 }

-- 
2.32.0