From: Valentin Schneider <vschneid@redhat.com>
Add related data structures for this new throttle functionality.
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
---
include/linux/sched.h | 4 ++++
kernel/sched/core.c | 3 +++
kernel/sched/fair.c | 12 ++++++++++++
kernel/sched/sched.h | 2 ++
4 files changed, 21 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f96ac19828934..0b55c79fee209 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -880,6 +880,10 @@ struct task_struct {
#ifdef CONFIG_CGROUP_SCHED
struct task_group *sched_task_group;
+#ifdef CONFIG_CFS_BANDWIDTH
+ struct callback_head sched_throttle_work;
+ struct list_head throttle_node;
+#endif
#endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 79692f85643fe..3b8735bc527da 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4492,6 +4492,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
#ifdef CONFIG_FAIR_GROUP_SCHED
p->se.cfs_rq = NULL;
+#ifdef CONFIG_CFS_BANDWIDTH
+ init_cfs_throttle_work(p);
+#endif
#endif
#ifdef CONFIG_SCHEDSTATS
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0c19459c80422..894202d232efd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5823,6 +5823,18 @@ static inline int throttled_lb_pair(struct task_group *tg,
throttled_hierarchy(dest_cfs_rq);
}
+static void throttle_cfs_rq_work(struct callback_head *work)
+{
+}
+
+void init_cfs_throttle_work(struct task_struct *p)
+{
+ init_task_work(&p->sched_throttle_work, throttle_cfs_rq_work);
+ /* Protect against double add, see throttle_cfs_rq() and throttle_cfs_rq_work() */
+ p->sched_throttle_work.next = &p->sched_throttle_work;
+ INIT_LIST_HEAD(&p->throttle_node);
+}
+
static int tg_unthrottle_up(struct task_group *tg, void *data)
{
struct rq *rq = data;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c5a6a503eb6de..921527327f107 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2703,6 +2703,8 @@ extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq);
extern void init_dl_entity(struct sched_dl_entity *dl_se);
+extern void init_cfs_throttle_work(struct task_struct *p);
+
#define BW_SHIFT 20
#define BW_UNIT (1 << BW_SHIFT)
#define RATIO_SHIFT 8
--
2.39.5
Hello Aaron,
On 4/9/2025 5:37 PM, Aaron Lu wrote:
> From: Valentin Schneider <vschneid@redhat.com>
>
> Add related data structures for this new throttle functionality.
>
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
> ---
> include/linux/sched.h | 4 ++++
> kernel/sched/core.c | 3 +++
> kernel/sched/fair.c | 12 ++++++++++++
> kernel/sched/sched.h | 2 ++
> 4 files changed, 21 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f96ac19828934..0b55c79fee209 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -880,6 +880,10 @@ struct task_struct {
>
> #ifdef CONFIG_CGROUP_SCHED
> struct task_group *sched_task_group;
> +#ifdef CONFIG_CFS_BANDWIDTH
> + struct callback_head sched_throttle_work;
> + struct list_head throttle_node;
Since throttled tasks are fully dequeued before being placed on the
"throttled_limbo_list", is it possible to reuse "p->se.group_node"?
Currently, it is used to track the task on "rq->cfs_tasks" and during
load balancing when moving a bunch of tasks between CPUs, but since a
fully throttled task is tracked by neither, it should be safe to
reuse this field (CONFIG_DEBUG_LIST will scream if I'm wrong) and
save some space in the task_struct.
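
Something like the below (completely untested; the helper name is only
for illustration) is what I have in mind:

	/*
	 * A fully dequeued task is neither on rq->cfs_tasks nor being
	 * migrated by the load balancer, so se.group_node is free and
	 * could park the task on the limbo list instead of a dedicated
	 * p->throttle_node:
	 */
	static void enqueue_task_limbo(struct cfs_rq *cfs_rq, struct task_struct *p)
	{
		list_add(&p->se.group_node, &cfs_rq->throttled_limbo_list);
	}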
Thoughts?
--
Thanks and Regards,
Prateek
> +#endif
> #endif
>
On Mon, Apr 14, 2025 at 09:28:36AM +0530, K Prateek Nayak wrote:
> Hello Aaron,
>
> On 4/9/2025 5:37 PM, Aaron Lu wrote:
> > From: Valentin Schneider <vschneid@redhat.com>
> >
> > Add related data structures for this new throttle functionality.
> >
> > Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> > Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
> > ---
> > include/linux/sched.h | 4 ++++
> > kernel/sched/core.c | 3 +++
> > kernel/sched/fair.c | 12 ++++++++++++
> > kernel/sched/sched.h | 2 ++
> > 4 files changed, 21 insertions(+)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index f96ac19828934..0b55c79fee209 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -880,6 +880,10 @@ struct task_struct {
> > #ifdef CONFIG_CGROUP_SCHED
> > struct task_group *sched_task_group;
> > +#ifdef CONFIG_CFS_BANDWIDTH
> > + struct callback_head sched_throttle_work;
> > + struct list_head throttle_node;
>
> Since throttled tasks are fully dequeued before being placed on the
> "throttled_limbo_list", is it possible to reuse "p->se.group_node"?
I think it might be possible.
> Currently, it is used to track the task on "rq->cfs_tasks" and during
> load balancing when moving a bunch of tasks between CPUs, but since a
> fully throttled task is tracked by neither, it should be safe to
> reuse this field (CONFIG_DEBUG_LIST will scream if I'm wrong) and
> save some space in the task_struct.
>
> Thoughts?
Is it that adding throttle_node would cause task_struct to just cross a
cacheline boundary? :-)
Or is it mainly a concern that the system could have many tasks, so any
saving in task_struct is worth trying?
I can see that reusing another field would make task_is_throttled() more
obscure to digest and implement, but I think it is doable.
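
With the dedicated throttle_node, the check can stay as simple as
something like the below (a sketch of the idea, not the final code):

	/* throttle_node sits on a limbo list iff the task is throttled */
	static inline bool task_is_throttled(struct task_struct *p)
	{
		return !list_empty(&p->throttle_node);
	}

while a reused p->se.group_node would need extra state to tell a
throttled task apart from one sitting on rq->cfs_tasks.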
Thanks,
Aaron
Hello Aaron,
On 4/14/2025 5:25 PM, Aaron Lu wrote:
> On Mon, Apr 14, 2025 at 09:28:36AM +0530, K Prateek Nayak wrote:
>> Hello Aaron,
>>
>> On 4/9/2025 5:37 PM, Aaron Lu wrote:
>>> From: Valentin Schneider <vschneid@redhat.com>
>>>
>>> Add related data structures for this new throttle functionality.
>>>
>>> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
>>> Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
>>> ---
>>> include/linux/sched.h | 4 ++++
>>> kernel/sched/core.c | 3 +++
>>> kernel/sched/fair.c | 12 ++++++++++++
>>> kernel/sched/sched.h | 2 ++
>>> 4 files changed, 21 insertions(+)
>>>
>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>> index f96ac19828934..0b55c79fee209 100644
>>> --- a/include/linux/sched.h
>>> +++ b/include/linux/sched.h
>>> @@ -880,6 +880,10 @@ struct task_struct {
>>> #ifdef CONFIG_CGROUP_SCHED
>>> struct task_group *sched_task_group;
>>> +#ifdef CONFIG_CFS_BANDWIDTH
>>> + struct callback_head sched_throttle_work;
>>> + struct list_head throttle_node;
>>
>> Since throttled tasks are fully dequeued before being placed on the
>> "throttled_limbo_list", is it possible to reuse "p->se.group_node"?
>
> I think it might be possible.
>
>> Currently, it is used to track the task on "rq->cfs_tasks" and during
>> load balancing when moving a bunch of tasks between CPUs, but since a
>> fully throttled task is tracked by neither, it should be safe to
>> reuse this field (CONFIG_DEBUG_LIST will scream if I'm wrong) and
>> save some space in the task_struct.
>>
>> Thoughts?
>
> Is it that adding throttle_node would cause task_struct to just cross a
> cacheline boundary? :-)
>
> Or is it mainly a concern that the system could have many tasks, so any
> saving in task_struct is worth trying?
Mostly this :)
>
> I can see that reusing another field would make task_is_throttled() more
> obscure to digest and implement, but I think it is doable.
I completely overlooked the task_is_throttled() use-case. I think the
current implementation is much cleaner in that aspect; no need to
overload "p->se.group_node" and over-complicate this.
If we really want some space saving, declaring an "unsigned char
sched_throttled" in the hole next to "sched_delayed" would be cleaner,
but I'd wait on Valentin's and Peter's comments before going down that
path.
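
i.e. something along these lines (untested; exact placement depends on
where the hole actually sits in the current layout):

	unsigned char			sched_delayed;
	/* hypothetical flag filling the adjacent hole */
	unsigned char			sched_throttled;

and task_is_throttled() could then simply test the flag.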
>
> Thanks,
> Aaron
--
Thanks and Regards,
Prateek