[PATCH v2 0/9] support ksm_stat showing at cgroup level
Posted by xu.xin16@zte.com.cn 9 months, 1 week ago
From: xu xin <xu.xin16@zte.com.cn>

With the enablement of container-level KSM (e.g., via prctl [1]), there is
a growing demand for container-level observability of KSM behavior. However,
current cgroup implementations lack support for exposing KSM-related
metrics.

This patch series introduces a new interface named ksm_stat at the
cgroup level, enabling users to monitor KSM merging statistics for
containers where this feature has been activated, without having to
manually inspect KSM information for each individual process within
the cgroup.

Users can obtain the KSM information of a cgroup just by:

# cat /sys/fs/cgroup/memory.ksm_stat
ksm_rmap_items 76800
ksm_zero_pages 0
ksm_merging_pages 76800
ksm_process_profit 309657600

Current implementation supports both cgroup v2 and cgroup v1.

xu xin (9):
  memcontrol: rename mem_cgroup_scan_tasks()
  memcontrol: introduce the new mem_cgroup_scan_tasks()
  memcontrol: introduce ksm_stat at memcg-v2
  memcontrol: add ksm_zero_pages in cgroup/memory.ksm_stat
  memcontrol: add ksm_merging_pages in cgroup/memory.ksm_stat
  memcontrol: add ksm_profit in cgroup/memory.ksm_stat
  memcontrol-v1: add ksm_stat at memcg-v1
  Documentation: add ksm_stat description in cgroup-v1/memory.rst
  Documentation: add ksm_stat description in cgroup-v2.rst

 Documentation/admin-guide/cgroup-v1/memory.rst | 36 +++++++++++
 Documentation/admin-guide/cgroup-v2.rst        | 12 ++++
 include/linux/memcontrol.h                     | 14 +++++
 mm/memcontrol-v1.c                             |  6 ++
 mm/memcontrol.c                                | 83 +++++++++++++++++++++++++-
 mm/oom_kill.c                                  |  6 +-
 6 files changed, 152 insertions(+), 5 deletions(-)

-- 
2.15.2
Re: [PATCH v2 0/9] support ksm_stat showing at cgroup level
Posted by Andrew Morton 9 months, 1 week ago
On Thu, 1 May 2025 12:08:54 +0800 (CST) <xu.xin16@zte.com.cn> wrote:

> With the enablement of container-level KSM (e.g., via prctl [1]), there is
> a growing demand for container-level observability of KSM behavior. However,
> current cgroup implementations lack support for exposing KSM-related
> metrics.
> 
> This patch series introduces a new interface named ksm_stat at the
> cgroup level, enabling users to monitor KSM merging statistics for
> containers where this feature has been activated, without having to
> manually inspect KSM information for each individual process within
> the cgroup.

Well, you didn't cc any of the memcg maintainers!

The feature seems desirable and the implementation straightforward. 
I'll add the patchset into mm.git for some testing, pending review
outcomes, thanks.
Re: [PATCH v2 0/9] support ksm_stat showing at cgroup level
Posted by Shakeel Butt 9 months, 1 week ago
On Thu, May 01, 2025 at 12:08:54PM +0800, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
> 
> With the enablement of container-level KSM (e.g., via prctl [1]), there is
> a growing demand for container-level observability of KSM behavior. However,
> current cgroup implementations lack support for exposing KSM-related
> metrics.
> 
> This patch series introduces a new interface named ksm_stat at the
> cgroup level, enabling users to monitor KSM merging statistics for
> containers where this feature has been activated, without having to
> manually inspect KSM information for each individual process within
> the cgroup.
> 
> Users can obtain the KSM information of a cgroup just by:
> 
> # cat /sys/fs/cgroup/memory.ksm_stat
> ksm_rmap_items 76800
> ksm_zero_pages 0
> ksm_merging_pages 76800
> ksm_process_profit 309657600
> 
> Current implementation supports both cgroup v2 and cgroup v1.
> 

Before adding these stats to memcg, add global stats for them in
enum node_stat_item and then you can expose them in memcg through
memory.stat instead of a new interface.
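
For illustration only, that suggestion might look roughly like the sketch
below; NR_KSM_MERGED and the "ksm_merged" name are hypothetical
placeholders, not identifiers from this patch series or from mainline:

    /* include/linux/mmzone.h -- add a global per-node counter */
    enum node_stat_item {
            NR_LRU_BASE,
            /* ... existing items ... */
            NR_KSM_MERGED,          /* hypothetical: pages merged by KSM */
            NR_VM_NODE_STAT_ITEMS
    };

    /* mm/memcontrol.c -- expose the counter through memory.stat */
    static const struct memory_stat memory_stats[] = {
            { "anon",               NR_ANON_MAPPED },
            /* ... existing entries ... */
            { "ksm_merged",         NR_KSM_MERGED },        /* hypothetical */
    };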
Re: [PATCH v2 0/9] support ksm_stat showing at cgroup level
Posted by xu.xin16@zte.com.cn 9 months, 1 week ago
> > Users can obtain the KSM information of a cgroup just by:
> > 
> > # cat /sys/fs/cgroup/memory.ksm_stat
> > ksm_rmap_items 76800
> > ksm_zero_pages 0
> > ksm_merging_pages 76800
> > ksm_process_profit 309657600
> > 
> > Current implementation supports both cgroup v2 and cgroup v1.
> > 
> 
> Before adding these stats to memcg, add global stats for them in
> enum node_stat_item and then you can expose them in memcg through
> memory.stat instead of a new interface.

Dear shakeel.butt,

If we add these KSM-related items to enum node_stat_item and embed extra
counter-updating code (such as __lruvec_stat_add_folio()) into the KSM
procedure, it adds extra CPU overhead whenever the normal KSM paths run.
Alternatively, we can just traverse all the processes of this memcg and sum
their KSM counters, as the current patch set implements.

Including only a single "KSM merged pages" entry in memory.stat seems
reasonable, as it reflects this memcg's KSM page count. However, adding the
other three KSM-related metrics is less advisable, since they are strongly
coupled with KSM internals and would primarily interest users monitoring
KSM-specific behavior.

Last but not least, the rationale for adding a ksm_stat entry to memcg also lies in maintaining
structural consistency with the existing /proc/<pid>/ksm_stat interface.
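
A rough sketch of the traversal approach described above (not the exact
patch code; ksm_stat_walk() and struct ksm_cgroup_sum are invented names
used only for illustration):

    struct ksm_cgroup_sum {
            unsigned long rmap_items;
            unsigned long merging_pages;
            long profit;
    };

    static int ksm_stat_walk(struct task_struct *task, void *arg)
    {
            struct ksm_cgroup_sum *sum = arg;
            struct mm_struct *mm = get_task_mm(task);

            if (!mm)
                    return 0;       /* kernel thread or exiting task */
            sum->rmap_items += mm->ksm_rmap_items;
            sum->merging_pages += mm->ksm_merging_pages;
            sum->profit += ksm_process_profit(mm);
            mmput(mm);
            return 0;               /* nonzero would stop the walk */
    }

    /* in the memory.ksm_stat show handler: */
            struct ksm_cgroup_sum sum = {};

            mem_cgroup_scan_tasks(memcg, ksm_stat_walk, &sum);
            seq_printf(m, "ksm_rmap_items %lu\n", sum.rmap_items);

The summation runs only when userspace reads the file, which is the
trade-off being argued here: no cost on the KSM hot paths, at the price of
a full task walk per read.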
Re: [PATCH v2 0/9] support ksm_stat showing at cgroup level
Posted by Shakeel Butt 9 months ago
On Tue, May 06, 2025 at 01:09:25PM +0800, xu.xin16@zte.com.cn wrote:
> > > Users can obtain the KSM information of a cgroup just by:
> > > 
> > > # cat /sys/fs/cgroup/memory.ksm_stat
> > > ksm_rmap_items 76800
> > > ksm_zero_pages 0
> > > ksm_merging_pages 76800
> > > ksm_process_profit 309657600
> > > 
> > > Current implementation supports both cgroup v2 and cgroup v1.
> > > 
> > 
> > Before adding these stats to memcg, add global stats for them in
> > enum node_stat_item and then you can expose them in memcg through
> > memory.stat instead of a new interface.
> 
> Dear shakeel.butt,
> 
> If we add these KSM-related items to enum node_stat_item and embed extra
> counter-updating code (such as __lruvec_stat_add_folio()) into the KSM
> procedure, it adds extra CPU overhead whenever the normal KSM paths run.

How is it more expensive than traversing all processes?
__lruvec_stat_add_folio() and related functions are already called in many
performance-critical code paths, so I don't see any issue with calling them
in KSM.

> Alternatively, we can just traverse all the processes of this memcg and sum
> their KSM counters, as the current patch set implements.
> 
> Including only a single "KSM merged pages" entry in memory.stat seems
> reasonable, as it reflects this memcg's KSM page count. However, adding the
> other three KSM-related metrics is less advisable, since they are strongly
> coupled with KSM internals and would primarily interest users monitoring
> KSM-specific behavior.

We can discuss each individual ksm stat and decide whether it makes sense
to add it to memcg or not.

> 
> Last but not least, the rationale for adding a ksm_stat entry to memcg also lies in maintaining
> structural consistency with the existing /proc/<pid>/ksm_stat interface.

Sorry, I don't agree with this rationale. This is a separate interface and
can be different from the existing KSM interface. We can define it however
we think is the right way for memcg, and yes, there can be stat overlap
with the older interface.

For now I would say start with the ksm metrics that are appropriate to
be exposed globally and then we can see if those are fine for memcg as
well.
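
To make the cost comparison concrete: the update side being described
would be a per-cpu counter bump at the point where KSM merges a page,
along the lines of this hypothetical hook (still assuming the invented
NR_KSM_MERGED item):

    /* mm/ksm.c -- hypothetical accounting hook, not from the patch series */
    static void ksm_account_merged_folio(struct folio *folio)
    {
            /*
             * lruvec_stat_mod_folio() folds the delta into per-cpu
             * counters for the node and the folio's memcg, so the
             * common case is a couple of per-cpu additions rather
             * than a walk over every process in the cgroup.
             */
            lruvec_stat_mod_folio(folio, NR_KSM_MERGED, folio_nr_pages(folio));
    }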
Re: [PATCH v2 0/9] support ksm_stat showing at cgroup level
Posted by xu.xin16@zte.com.cn 8 months, 1 week ago
> > > > Users can obtain the KSM information of a cgroup just by:
> > > > 
> > > > # cat /sys/fs/cgroup/memory.ksm_stat
> > > > ksm_rmap_items 76800
> > > > ksm_zero_pages 0
> > > > ksm_merging_pages 76800
> > > > ksm_process_profit 309657600
> > > > 
> > > > Current implementation supports both cgroup v2 and cgroup v1.
> > > > 
> > > 
> > > Before adding these stats to memcg, add global stats for them in
> > > enum node_stat_item and then you can expose them in memcg through
> > > memory.stat instead of a new interface.
> > 
> > Dear shakeel.butt,
> > 
> > If we add these KSM-related items to enum node_stat_item and embed extra
> > counter-updating code (such as __lruvec_stat_add_folio()) into the KSM
> > procedure, it adds extra CPU overhead whenever the normal KSM paths run.
> 
> How is it more expensive than traversing all processes?
> __lruvec_stat_add_folio() and related functions are already called in many
> performance-critical code paths, so I don't see any issue with calling them
> in KSM.
> 
> > Alternatively, we can just traverse all the processes of this memcg and sum
> > their KSM counters, as the current patch set implements.
> > 
> > Including only a single "KSM merged pages" entry in memory.stat seems
> > reasonable, as it reflects this memcg's KSM page count. However, adding the
> > other three KSM-related metrics is less advisable, since they are strongly
> > coupled with KSM internals and would primarily interest users monitoring
> > KSM-specific behavior.
> 
> We can discuss each individual ksm stat and decide whether it makes sense
> to add it to memcg or not.
> 
> > 
> > Last but not least, the rationale for adding a ksm_stat entry to memcg also lies in maintaining
> > structural consistency with the existing /proc/<pid>/ksm_stat interface.
> 
> Sorry, I don't agree with this rationale. This is a separate interface and
> can be different from the existing KSM interface. We can define it however
> we think is the right way for memcg, and yes, there can be stat overlap
> with the older interface.
> 
> For now I would say start with the ksm metrics that are appropriate to
> be exposed globally and then we can see if those are fine for memcg as
> well.

Thank you very much for your suggestion, and I'm sorry for the delayed reply
as last month I was exceptionally busy. 

Upon further consideration, I agree that adding entries to the existing
memory.stat interface is indeed preferable to arbitrarily creating new
interfaces. Therefore, my next step is to add the following global KSM
metrics to memory.stat: ksm_merged, ksm_unmergeable, ksm_zero, and
ksm_profit. (These represent the total number of merged pages, unmergeable
pages, zero pages merged by KSM, and the overall profit, respectively.)
However, please note that ksm_merging_pages and ksm_unshared need to be
converted to be represented in bytes.
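
The byte conversion mentioned above amounts to scaling page counts by
PAGE_SIZE when the values are emitted, for example (illustrative only,
again using the hypothetical NR_KSM_MERGED item):

    /* mm/memcontrol.c -- memory.stat reports sizes in bytes */
    seq_buf_printf(s, "ksm_merged %llu\n",
                   (u64)memcg_page_state(memcg, NR_KSM_MERGED) * PAGE_SIZE);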