From nobody Thu Dec 18 14:28:55 2025
Date: Mon, 2 Jun 2025 18:05:41 +0000
In-Reply-To: <20250602180544.3626909-1-zecheng@google.com>
References: <20250602180544.3626909-1-zecheng@google.com>
Message-ID: <20250602180544.3626909-2-zecheng@google.com>
Subject: [RFC PATCH v2 1/3] cache: conditionally align cache groups
From: Zecheng Li
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider, Xu Liu, Blake Jones, Josh Don,
    Madadi Vineeth Reddy, linux-kernel@vger.kernel.org, Zecheng Li

Introduce a pair of macros, __cacheline_group_begin_aligned_cond and
__cacheline_group_end_aligned_cond, to provide conditional cacheline
alignment for cache groups. If the COND parameter equals
SMP_CACHE_BYTES, the cache group is aligned to SMP_CACHE_BYTES;
otherwise no additional cacheline alignment is enforced, since the
group falls back to __aligned(1).

This gives finer control over cacheline alignment, ensuring that a
layout optimization tuned for one cache architecture does not degrade
efficiency or introduce holes on systems with a different cache line
size. (A usage sketch follows the patch.)

Signed-off-by: Zecheng Li
---
 include/linux/cache.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/include/linux/cache.h b/include/linux/cache.h
index e69768f50d53..8b5dadf1c487 100644
--- a/include/linux/cache.h
+++ b/include/linux/cache.h
@@ -147,6 +147,34 @@
 	struct { } __cacheline_group_pad__##GROUP \
 	__aligned((__VA_ARGS__ + 0) ? : SMP_CACHE_BYTES)
 
+/**
+ * __cacheline_group_begin_aligned_cond - conditionally align a cache group
+ * @GROUP: name of the group
+ * @COND: a size; if it equals SMP_CACHE_BYTES, the group will be aligned
+ *	  to SMP_CACHE_BYTES. Otherwise, no specific cacheline alignment
+ *	  is enforced.
+ *
+ */
+#define __cacheline_group_begin_aligned_cond(GROUP, COND)		\
+	__cacheline_group_begin(GROUP)					\
+	__aligned(((COND) == SMP_CACHE_BYTES) ? SMP_CACHE_BYTES : 1)
+
+/**
+ * __cacheline_group_end_aligned_cond - declare a conditionally aligned group end
+ * @GROUP: name of the group
+ * @COND: condition (size); if it equals SMP_CACHE_BYTES, padding will
+ *	  be aligned to SMP_CACHE_BYTES. Otherwise, no alignment.
+ *
+ * This complements __cacheline_group_begin_aligned_cond.
+ * The end marker itself is aligned to sizeof(long).
+ * The final padding to avoid the next field falling into this cacheline
+ * is applied conditionally based on COND.
+ */
+#define __cacheline_group_end_aligned_cond(GROUP, COND)			\
+	__cacheline_group_end(GROUP) __aligned(sizeof(long));		\
+	struct { } __cacheline_group_pad__##GROUP			\
+	__aligned(((COND) == SMP_CACHE_BYTES) ? SMP_CACHE_BYTES : 1)
+
 #ifndef CACHELINE_ASSERT_GROUP_MEMBER
 #define CACHELINE_ASSERT_GROUP_MEMBER(TYPE, GROUP, MEMBER) \
 	BUILD_BUG_ON(!(offsetof(TYPE, MEMBER) >= \
-- 
2.49.0
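To illustrate the intended use of the new macros, here is a minimal,
hypothetical sketch (the struct, its fields, the group name "stats" and
the 64-byte threshold are illustrative, not part of this patch): the
group is packed onto its own cache line only on architectures whose
cache line size really is 64 bytes, and adds no alignment padding
anywhere else.

#include <linux/cache.h>
#include <linux/init.h>
#include <linux/types.h>

/* Align the "stats" group only when this arch has 64-byte cache lines. */
#define STATS_GROUP_ALIGN_COND	64

struct demo_counters {
	unsigned long	cfg_flags;	/* cold configuration state */

	__cacheline_group_begin_aligned_cond(stats, STATS_GROUP_ALIGN_COND);
	u64		hits;		/* hot: bumped on every lookup */
	u64		misses;
	u64		last_update;
	__cacheline_group_end_aligned_cond(stats, STATS_GROUP_ALIGN_COND);

	struct list_head node;		/* cold again */
};

/*
 * On a 64-byte cache line system the begin/end markers align and pad
 * the group to its own line; with any other SMP_CACHE_BYTES both
 * macros expand to __aligned(1) and the layout is left untouched.
 * Placement can still be verified at build time from an __init path:
 */
static void __init demo_counters_struct_check(void)
{
	CACHELINE_ASSERT_GROUP_MEMBER(struct demo_counters, stats, hits);
	CACHELINE_ASSERT_GROUP_MEMBER(struct demo_counters, stats, misses);
	CACHELINE_ASSERT_GROUP_MEMBER(struct demo_counters, stats, last_update);
}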
From nobody Thu Dec 18 14:28:55 2025
Date: Mon, 2 Jun 2025 18:05:42 +0000
In-Reply-To: <20250602180544.3626909-1-zecheng@google.com>
References: <20250602180544.3626909-1-zecheng@google.com>
Message-ID: <20250602180544.3626909-3-zecheng@google.com>
Subject: [RFC PATCH v2 2/3] sched/fair: Reorder struct cfs_rq
From: Zecheng Li
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider, Xu Liu, Blake Jones, Josh Don,
    Madadi Vineeth Reddy, linux-kernel@vger.kernel.org, Zecheng Li

Move the hot fields to the head of struct cfs_rq. They total 128
bytes, i.e. two cache lines on x86. With all related config options
enabled, this also pulls in fields that previously sat around the 4th
and 5th cache line offsets, giving better locality when executing the
CFS bandwidth control functions. Because the reordering removes holes,
the struct shrinks by one cacheline on an x86 build.

The following changes are proposed:

- Move curr, rq, tg, throttle_count, and runtime_enabled to the first
  cache line, as they are frequently accessed (and mostly read). They
  are pointers to closely related structs (rq, tg) or are checked as
  conditions (curr, throttle_count and runtime_enabled).

- propagate and idle, two frequently read fields, previously sat on
  separate cache lines. Group them in cache line 2 together with the
  remaining fields of the old cache line 1, filling the hole.

- on_list is often accessed together with the throttled_clock_* fields
  in the tg_unthrottle_up() and tg_throttle_down() functions. Move
  runtime_remaining and throttled_pelt_idle, which are accessed less
  frequently, out of the way so that on_list and the throttle-related
  fields can be grouped together. This cache group is aligned to a
  64-byte boundary only when the target architecture uses a 64-byte
  cache line.

- Use the __cacheline_group_* macros to delineate the logically grouped
  fields for cache alignment, with compile-time checks added in
  cfs_rq_struct_check(). (The mechanism behind these checks is sketched
  below.)
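For illustration, a simplified user-space rendition of how such a
compile-time placement check can work (all names here are hypothetical;
the kernel's actual helper is CACHELINE_ASSERT_GROUP_MEMBER in
include/linux/cache.h): zero-size marker fields delimit the group, and
a static assertion confirms that each member's offset lies between the
markers.

/* Build with: gcc -std=gnu11 -c demo.c (empty structs are GNU C) */
#include <stddef.h>
#include <assert.h>

struct demo {
	struct { } hot_begin;	/* zero-size group-begin marker */
	long	a;
	long	b;
	struct { } hot_end;	/* zero-size group-end marker */
	long	cold;
};

/* Fails to compile if "member" moves outside the hot group. */
#define DEMO_ASSERT_IN_HOT(member)				\
	static_assert(offsetof(struct demo, member) >=		\
		      offsetof(struct demo, hot_begin) &&	\
		      offsetof(struct demo, member) <		\
		      offsetof(struct demo, hot_end),		\
		      #member " is outside the hot group")

DEMO_ASSERT_IN_HOT(a);
DEMO_ASSERT_IN_HOT(b);
/* DEMO_ASSERT_IN_HOT(cold); would break the build, as intended. */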
Signed-off-by: Zecheng Li
---
 kernel/sched/core.c  | 61 ++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h | 81 ++++++++++++++++++++++++++++++--------------
 2 files changed, 115 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c81cf642dba0..ba89cd4f2fac 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8524,6 +8524,8 @@ LIST_HEAD(task_groups);
 static struct kmem_cache *task_group_cache __ro_after_init;
 #endif
 
+static void __init cfs_rq_struct_check(void);
+
 void __init sched_init(void)
 {
 	unsigned long ptr = 0;
@@ -8540,7 +8542,7 @@ void __init sched_init(void)
 	BUG_ON(!sched_class_above(&fair_sched_class, &ext_sched_class));
 	BUG_ON(!sched_class_above(&ext_sched_class, &idle_sched_class));
 #endif
-
+	cfs_rq_struct_check();
 	wait_bit_init();
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
@@ -10746,3 +10748,60 @@ void sched_enq_and_set_task(struct sched_enq_and_set_ctx *ctx)
 	set_next_task(rq, ctx->p);
 }
 #endif /* CONFIG_SCHED_CLASS_EXT */
+
+static void __init cfs_rq_struct_check(void)
+{
+	/*
+	 * The first two cache lines are hot and mostly read
+	 * except load.inv_weight
+	 */
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, load);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, nr_queued);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, h_nr_queued);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, h_nr_runnable);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, h_nr_idle);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, curr);
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, rq);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, tg);
+
+#ifdef CONFIG_CFS_BANDWIDTH
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, throttle_count);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, runtime_enabled);
+#endif
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, idle);
+
+#ifdef CONFIG_SMP
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, propagate);
+#endif
+#endif
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, avg_vruntime);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, avg_load);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, min_vruntime);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, tasks_timeline);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, hot, next);
+
+	/*
+	 * This cache line groups hot fields of the throttling functions.
+	 * This group is enabled when CFS_BANDWIDTH is configured.
+	 */
+#ifdef CONFIG_CFS_BANDWIDTH
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, throttle, throttled);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, throttle, on_list);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, throttle,
+				      leaf_cfs_rq_list);
+
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, throttle, throttled_clock);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, throttle,
+				      throttled_clock_pelt);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, throttle,
+				      throttled_clock_pelt_time);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, throttle,
+				      throttled_clock_self);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct cfs_rq, throttle,
+				      throttled_clock_self_time);
+#endif
+#endif
+}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 47972f34ea70..b0a6c70c01ea 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -641,31 +641,55 @@ struct balance_callback {
 	void (*func)(struct rq *rq);
 };
 
+/**
+ * The `throttle` cache group is designed to group 64 bytes into a cache
+ * line, which benefits architectures with a 64-byte cache line size. To
+ * prevent performance degradation on other architectures, let's
+ * conditionally align it when the target system utilizes a 64-byte
+ * cache line.
+ */
+#define THROTTLE_GROUP_ALIGN_COND 64
+
 /* CFS-related fields in a runqueue */
 struct cfs_rq {
+	/* The first cache line group is hot and mostly read */
+	__cacheline_group_begin(hot);
 	struct load_weight	load;
 	unsigned int		nr_queued;
 	unsigned int		h_nr_queued;	/* SCHED_{NORMAL,BATCH,IDLE} */
 	unsigned int		h_nr_runnable;	/* SCHED_{NORMAL,BATCH,IDLE} */
 	unsigned int		h_nr_idle;	/* SCHED_IDLE */
+	/*
+	 * 'curr' points to currently running entity on this cfs_rq.
+	 * It is set to NULL otherwise (i.e when none are currently running).
+	 */
+	struct sched_entity	*curr;
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	struct rq		*rq;	/* CPU runqueue to which this cfs_rq is attached */
+	struct task_group	*tg;	/* group that "owns" this runqueue */
+
+#ifdef CONFIG_CFS_BANDWIDTH
+	int			throttle_count;
+	int			runtime_enabled;
+#endif
+	/* Locally cached copy of our task_group's idle value */
+	int			idle;
+
+#ifdef CONFIG_SMP
+	long			propagate;
+#endif /* CONFIG_SMP */
+#endif /* CONFIG_FAIR_GROUP_SCHED */
 
 	s64			avg_vruntime;
 	u64			avg_load;
 
 	u64			min_vruntime;
-#ifdef CONFIG_SCHED_CORE
-	unsigned int		forceidle_seq;
-	u64			min_vruntime_fi;
-#endif
 
 	struct rb_root_cached	tasks_timeline;
 
-	/*
-	 * 'curr' points to currently running entity on this cfs_rq.
-	 * It is set to NULL otherwise (i.e when none are currently running).
-	 */
-	struct sched_entity	*curr;
 	struct sched_entity	*next;
+	__cacheline_group_end(hot);
 
 #ifdef CONFIG_SMP
 	/*
@@ -686,7 +710,6 @@ struct cfs_rq {
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	u64			last_update_tg_load_avg;
 	unsigned long		tg_load_avg_contrib;
-	long			propagate;
 	long			prop_runnable_sum;
 
 	/*
@@ -702,8 +725,21 @@ struct cfs_rq {
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-	struct rq		*rq;	/* CPU runqueue to which this cfs_rq is attached */
-
+#ifdef CONFIG_CFS_BANDWIDTH
+	s64			runtime_remaining;
+	u64			throttled_pelt_idle;
+#ifndef CONFIG_64BIT
+	u64			throttled_pelt_idle_copy;
+#endif
+	/*
+	 * This cache line groups hot fields of the throttling functions.
+	 * This group is enabled when CFS_BANDWIDTH is configured.
+	 * Alignment is enforced only when the target architecture
+	 * utilizes a 64-byte cache line size.
+	 */
+	__cacheline_group_begin_aligned_cond(throttle, THROTTLE_GROUP_ALIGN_COND);
+	int			throttled;
+#endif /* CONFIG_CFS_BANDWIDTH */
 	/*
 	 * leaf cfs_rqs are those that hold tasks (lowest schedulable entity in
 	 * a hierarchy). Non-leaf lrqs hold other higher schedulable entities
@@ -714,30 +750,23 @@ struct cfs_rq {
 	 */
 	int			on_list;
 	struct list_head	leaf_cfs_rq_list;
-	struct task_group	*tg;	/* group that "owns" this runqueue */
-
-	/* Locally cached copy of our task_group's idle value */
-	int			idle;
-
 #ifdef CONFIG_CFS_BANDWIDTH
-	int			runtime_enabled;
-	s64			runtime_remaining;
-
-	u64			throttled_pelt_idle;
-#ifndef CONFIG_64BIT
-	u64			throttled_pelt_idle_copy;
-#endif
 	u64			throttled_clock;
 	u64			throttled_clock_pelt;
 	u64			throttled_clock_pelt_time;
 	u64			throttled_clock_self;
 	u64			throttled_clock_self_time;
-	int			throttled;
-	int			throttle_count;
+	__cacheline_group_end_aligned_cond(throttle, THROTTLE_GROUP_ALIGN_COND);
+
 	struct list_head	throttled_list;
 	struct list_head	throttled_csd_list;
 #endif /* CONFIG_CFS_BANDWIDTH */
 #endif /* CONFIG_FAIR_GROUP_SCHED */
+
+#ifdef CONFIG_SCHED_CORE
+	unsigned int		forceidle_seq;
+	u64			min_vruntime_fi;
+#endif
 };
 
 #ifdef CONFIG_SCHED_CLASS_EXT
-- 
2.49.0
From nobody Thu Dec 18 14:28:55 2025
Date: Mon, 2 Jun 2025 18:05:43 +0000
In-Reply-To: <20250602180544.3626909-1-zecheng@google.com>
References: <20250602180544.3626909-1-zecheng@google.com>
Message-ID: <20250602180544.3626909-4-zecheng@google.com>
Subject: [RFC PATCH v2 3/3] sched/fair: Reorder struct sched_entity
From: Zecheng Li
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider, Xu Liu, Blake Jones, Josh Don,
    Madadi Vineeth Reddy, linux-kernel@vger.kernel.org, Zecheng Li

Group the mostly-read fields of struct sched_entity at the head of the
struct when CONFIG_FAIR_GROUP_SCHED is set. The additional fields
introduced by CONFIG_FAIR_GROUP_SCHED are related to CFS cgroup
scheduling, yet they were placed far away from the hot fields load,
on_rq and vruntime. Move them together at the head of the struct to
exploit locality. Although depth is not as hot as the other fields, it
stays in the group to avoid breaking the #ifdef boundaries.

Enforce cacheline alignment of struct sched_entity so that the cache
group works as intended, and add a compile-time check of the hot-field
placement when CONFIG_FAIR_GROUP_SCHED is set. (A user-space sketch for
eyeballing the resulting layout follows.)
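As a rough, user-space way to eyeball such a layout (purely
illustrative: mock_entity is a stand-in mirroring the ordering idea,
not the real sched_entity, which is best inspected with pahole on a
build with debug info), the following program prints field offsets and
shows the hot fields fitting in the first 64-byte line on LP64:

#include <stdio.h>
#include <stddef.h>

/* Stand-in mirroring the reordered hot fields of sched_entity. */
struct mock_entity {
	struct mock_entity	*parent;
	void			*cfs_rq;
	void			*my_q;
	unsigned long		runnable_weight;
	int			depth;
	unsigned char		on_rq;
	unsigned char		sched_delayed;
	unsigned char		rel_deadline;
	unsigned char		custom_slice;
	unsigned long		load_weight;	/* struct load_weight stand-in */
	unsigned int		load_inv_weight;
	unsigned long long	vruntime;
} __attribute__((aligned(64)));			/* like ____cacheline_aligned */

int main(void)
{
	/* On LP64, vruntime sits at offset 56 and ends the first line. */
	printf("parent   @ %zu\n", offsetof(struct mock_entity, parent));
	printf("on_rq    @ %zu\n", offsetof(struct mock_entity, on_rq));
	printf("vruntime @ %zu\n", offsetof(struct mock_entity, vruntime));
	printf("sizeof   = %zu\n", sizeof(struct mock_entity));
	return 0;
}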
Signed-off-by: Zecheng Li
---
 include/linux/sched.h | 39 +++++++++++++++++++++------------------
 kernel/sched/core.c   | 20 ++++++++++++++++++++
 2 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f96ac1982893..b20b2d590cf6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -567,40 +567,43 @@ struct sched_statistics {
 } ____cacheline_aligned;
 
 struct sched_entity {
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	/* Group the mostly-read hot fields in sched_entity */
+	__cacheline_group_begin(hot);
+	struct sched_entity	*parent;
+	/* rq on which this entity is (to be) queued: */
+	struct cfs_rq		*cfs_rq;
+	/* rq "owned" by this entity/group: */
+	struct cfs_rq		*my_q;
+	/* cached value of my_q->h_nr_running */
+	unsigned long		runnable_weight;
+	int			depth;
+#endif
+	unsigned char		on_rq;
+	unsigned char		sched_delayed;
+	unsigned char		rel_deadline;
+	unsigned char		custom_slice;
 	/* For load-balancing: */
 	struct load_weight	load;
+	u64			vruntime;
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	__cacheline_group_end(hot);
+#endif
 	struct rb_node		run_node;
 	u64			deadline;
 	u64			min_vruntime;
 	u64			min_slice;
 
 	struct list_head	group_node;
-	unsigned char		on_rq;
-	unsigned char		sched_delayed;
-	unsigned char		rel_deadline;
-	unsigned char		custom_slice;
-			/* hole */
 
 	u64			exec_start;
 	u64			sum_exec_runtime;
 	u64			prev_sum_exec_runtime;
-	u64			vruntime;
 	s64			vlag;
 	u64			slice;
 
 	u64			nr_migrations;
 
-#ifdef CONFIG_FAIR_GROUP_SCHED
-	int			depth;
-	struct sched_entity	*parent;
-	/* rq on which this entity is (to be) queued: */
-	struct cfs_rq		*cfs_rq;
-	/* rq "owned" by this entity/group: */
-	struct cfs_rq		*my_q;
-	/* cached value of my_q->h_nr_running */
-	unsigned long		runnable_weight;
-#endif
-
 #ifdef CONFIG_SMP
 	/*
 	 * Per entity load average tracking.
@@ -610,7 +613,7 @@ struct sched_entity {
 	 */
 	struct sched_avg	avg;
 #endif
-};
+} ____cacheline_aligned;
 
 struct sched_rt_entity {
 	struct list_head	run_list;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ba89cd4f2fac..dcc50df9e8ca 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8525,6 +8525,7 @@ static struct kmem_cache *task_group_cache __ro_after_init;
 #endif
 
 static void __init cfs_rq_struct_check(void);
+static void __init sched_entity_struct_check(void);
 
 void __init sched_init(void)
 {
@@ -8543,6 +8544,7 @@ void __init sched_init(void)
 	BUG_ON(!sched_class_above(&ext_sched_class, &idle_sched_class));
 #endif
 	cfs_rq_struct_check();
+	sched_entity_struct_check();
 	wait_bit_init();
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
@@ -10805,3 +10807,21 @@ static void __init cfs_rq_struct_check(void)
 #endif
 #endif
 }
+
+static void __init sched_entity_struct_check(void)
+{
+	/*
+	 * The compile time check is only enabled with CONFIG_FAIR_GROUP_SCHED.
+	 * We care about the placement of the seven hottest fields below.
+	 */
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	CACHELINE_ASSERT_GROUP_MEMBER(struct sched_entity, hot, parent);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct sched_entity, hot, cfs_rq);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct sched_entity, hot, my_q);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct sched_entity, hot,
+				      runnable_weight);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct sched_entity, hot, on_rq);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct sched_entity, hot, load);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct sched_entity, hot, vruntime);
+#endif
+}
-- 
2.49.0