[RFC PATCH 0/3] sched/numa: Introduce per cgroup numa balance control
Posted by Chen Yu 11 months, 2 weeks ago
Introduce a per-cgroup interface to enable NUMA balancing
for specific cgroups. The system administrator enables the
feature by setting the NUMA balancing mode to
NUMA_BALANCING_CGROUP (4). In this mode, NUMA balancing is
disabled for all cgroups by default and runs only for the
cgroups that the administrator has explicitly enabled.
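
For illustration, below is a minimal sketch of the intended admin
flow. Only /proc/sys/kernel/numa_balancing is an existing interface;
the per-cgroup file name "cpu.numa_balance" is a placeholder guess,
not necessarily the knob the patches expose.

/*
 * Sketch of the admin flow described above.  The per-cgroup file
 * "cpu.numa_balance" is a hypothetical placeholder.
 */
#include <stdio.h>
#include <stdlib.h>

static void write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f) {
		perror(path);
		exit(EXIT_FAILURE);
	}
	fputs(val, f);
	fclose(f);
}

int main(void)
{
	/* Switch the global mode to NUMA_BALANCING_CGROUP (4). */
	write_str("/proc/sys/kernel/numa_balancing", "4");

	/* Opt one cgroup in; every other cgroup stays disabled. */
	write_str("/sys/fs/cgroup/mygroup/cpu.numa_balance", "1");

	return 0;
}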

This per-cgroup NUMA balancing control was once proposed in
2019 by Yun Wang[1]. Then, in 2024, Kaiyang Zhao mentioned
that he was working with Meta on per-cgroup NUMA control[2]
during a discussion with David Rientjes.

I could not find further discussion regarding per-cgroup NUMA
balancing from that point on. This set of RFC patches is a
rough, compile-tested version and may have unhandled cases
(for example, THP). It has not been thoroughly tested and is
intended to initiate or resume the discussion on per-cgroup
NUMA balancing.

The first patch adds NUMA balancing task migration and swap
statistics to schedstats. The second patch introduces the
per-cgroup NUMA balancing control. The third one enhances NUMA
balancing for the MPOL_INTERLEAVE memory policy.
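
For the second patch, the following is a rough, hypothetical sketch
of how the per-cgroup gate could combine with the global mode. The
NUMA_BALANCING_CGROUP bit and the numa_balance_enabled field are
assumptions based on this cover letter, not code taken from the
patches:

/*
 * Hypothetical sketch only -- not the actual patch code.
 */
#define NUMA_BALANCING_CGROUP	0x4

static bool task_numa_balance_allowed(struct task_struct *p)
{
	/* Without the cgroup mode bit, behave as before. */
	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_CGROUP))
		return sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL;

	/* Cgroup mode: off by default, on only for opted-in cgroups. */
	return task_group(p)->numa_balance_enabled;
}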

Any feedback would be appreciated.

[1] https://lore.kernel.org/linux-fsdevel/60b59306-5e36-e587-9145-e90657daec41@linux.alibaba.com/
[2] https://lore.kernel.org/linux-mm/ZrukILyQhMAKWwTe@localhost.localhost/T/

Chen Yu (3):
  sched/numa: Introduce numa balance task migration and swap in
    schedstats
  sched/numa: Introduce per cgroup numa balance control
  sched/numa: Allow interleaved memory allocation for numa balance

 include/linux/numa.h           |  1 +
 include/linux/sched.h          |  4 ++++
 include/linux/sched/sysctl.h   |  1 +
 include/linux/vm_event_item.h  |  2 ++
 include/uapi/linux/mempolicy.h |  1 +
 kernel/sched/core.c            | 42 ++++++++++++++++++++++++++++++++--
 kernel/sched/debug.c           |  4 ++++
 kernel/sched/fair.c            | 18 +++++++++++++++
 kernel/sched/sched.h           |  3 +++
 mm/memcontrol.c                |  2 ++
 mm/memory.c                    |  2 +-
 mm/mempolicy.c                 |  7 ++++++
 mm/mprotect.c                  |  5 ++--
 mm/vmstat.c                    |  2 ++
 14 files changed, 89 insertions(+), 5 deletions(-)

-- 
2.25.1
Re: [RFC PATCH 0/3] sched/numa: Introduce per cgroup numa balance control
Posted by Kaiyang Zhao 11 months, 1 week ago
On Tue, Feb 25, 2025 at 09:59:33PM +0800, Chen Yu wrote:
> This per-cgroup NUMA balancing control was once proposed in
> 2019 by Yun Wang[1]. Then, in 2024, Kaiyang Zhao mentioned
> that he was working with Meta on per-cgroup NUMA control[2]
> during a discussion with David Rientjes.
> 
> I could not find further discussion regarding per-cgroup NUMA
> balancing from that point on. This set of RFC patches is a
> rough, compile-tested version and may have unhandled cases
> (for example, THP). It has not been thoroughly tested and is
> intended to initiate or resume the discussion on per-cgroup
> NUMA balancing.

Hello Chen,

It's nice to see people interested in this. I posted a set of RFC patches
later[1] that focuses on the fairness issue in memory tiering. It mostly
concerns the demotion side of things, and the promotion / NUMA balancing
side of things was left out of the patch set.

I don't work for Meta now, but my understanding is that they'll attempt
to push through a solution for per-cgroup control of memory tiering that
is in the same vein as my RFC patches, and it may include controls for
per-cgroup NUMA balancing in the context of tiered memory.

Best,
Kaiyang

[1] https://lore.kernel.org/linux-mm/20240920221202.1734227-1-kaiyang2@cs.cmu.edu/
Re: [RFC PATCH 0/3] sched/numa: Introduce per cgroup numa balance control
Posted by Chen Yu 11 months, 1 week ago
Hi Kaiyang,

On 2025-03-05 at 14:38:14 +0000, Kaiyang Zhao wrote:
> On Tue, Feb 25, 2025 at 09:59:33PM +0800, Chen Yu wrote:
> > This per-cgroup NUMA balancing control was once proposed in
> > 2019 by Yun Wang[1]. Then, in 2024, Kaiyang Zhao mentioned
> > that he was working with Meta on per-cgroup NUMA control[2]
> > during a discussion with David Rientjes.
> > 
> > I could not find further discussion regarding per-cgroup NUMA
> > balancing from that point on. This set of RFC patches is a
> > rough, compile-tested version and may have unhandled cases
> > (for example, THP). It has not been thoroughly tested and is
> > intended to initiate or resume the discussion on per-cgroup
> > NUMA balancing.
> 
> Hello Chen,
> 
> It's nice to see people interested in this. I posted a set of RFC patches
> later[1] that focuses on the fairness issue in memory tiering. It mostly
> concerns the demotion side of things, and the promotion / NUMA balancing
> side of things was left out of the patch set.
>

I see, thanks for the information.
 
> I don't work for Meta now, but my understanding is that they'll attempt
> to push through a solution for per-cgroup control of memory tiering that
> is in the same vein as my RFC patches, and it may include controls for
> per-cgroup NUMA balancing in the context of tiered memory.
>

OK, it would be nice to see that patch set. We can continue the discussion
on this basic per-cgroup NUMA balancing control; the tiered memory promotion
could be built on top of that, IMO.

thanks,
Chenyu
 
> Best,
> Kaiyang
> 
> [1] https://lore.kernel.org/linux-mm/20240920221202.1734227-1-kaiyang2@cs.cmu.edu/