From: xu xin <xu.xin16@zte.com.cn>

With the enablement of container-level KSM (e.g., via prctl [1]), there is
a growing demand for container-level observability of KSM behavior. However,
current cgroup implementations lack support for exposing KSM-related
metrics.

This patch series introduces a new interface named ksm_stat
at the cgroup hierarchy level, enabling users to monitor KSM merging
statistics specifically for containers where this feature has been
activated, eliminating the need to manually inspect KSM information for
each individual process within the cgroup.

Users can obtain the KSM information of a cgroup just by:

  # cat /sys/fs/cgroup/memory.ksm_stat
  ksm_rmap_items 76800
  ksm_zero_pages 0
  ksm_merging_pages 76800
  ksm_process_profit 309657600

Current implementation supports both cgroup v2 and cgroup v1.

xu xin (9):
  memcontrol: rename mem_cgroup_scan_tasks()
  memcontrol: introduce the new mem_cgroup_scan_tasks()
  memcontrol: introduce ksm_stat at memcg-v2
  memcontrol: add ksm_zero_pages in cgroup/memory.ksm_stat
  memcontrol: add ksm_merging_pages in cgroup/memory.ksm_stat
  memcontrol: add ksm_profit in cgroup/memory.ksm_stat
  memcontrol-v1: add ksm_stat at memcg-v1
  Documentation: add ksm_stat description in cgroup-v1/memory.rst
  Documentation: add ksm_stat description in cgroup-v2.rst

 Documentation/admin-guide/cgroup-v1/memory.rst | 36 +++++++++++
 Documentation/admin-guide/cgroup-v2.rst        | 12 ++++
 include/linux/memcontrol.h                     | 14 +++++
 mm/memcontrol-v1.c                             |  6 ++
 mm/memcontrol.c                                | 83 +++++++++++++++++++++++++-
 mm/oom_kill.c                                  |  6 +-
 6 files changed, 152 insertions(+), 5 deletions(-)

--
2.15.2
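[For context, a minimal sketch of how such a memory.ksm_stat read could be
backed, assuming the upstream mem_cgroup_scan_tasks()/mem_cgroup_from_seq()
helpers and the per-mm counters used by /proc/<pid>/ksm_stat. This is not
the code from the series (which reworks mem_cgroup_scan_tasks()), and
ksm_zero_pages is left out because its in-kernel representation has changed
across kernel versions.]

/*
 * Illustrative sketch only, not the code from this series: sum per-mm KSM
 * counters for every process in a memcg and print them from a seq_file
 * show callback.  Assumes mem_cgroup_scan_tasks(), mem_cgroup_from_seq(),
 * get_task_mm()/mmput(), the mm->ksm_rmap_items / mm->ksm_merging_pages
 * fields and ksm_process_profit() from <linux/ksm.h>.
 */
#include <linux/ksm.h>
#include <linux/memcontrol.h>
#include <linux/mm.h>
#include <linux/sched/mm.h>
#include <linux/seq_file.h>

struct memcg_ksm_stat {
        unsigned long rmap_items;
        unsigned long merging_pages;
        long profit;
};

/* Accumulate one task's counters; return 0 so the scan keeps going. */
static int memcg_ksm_stat_one(struct task_struct *task, void *arg)
{
        struct memcg_ksm_stat *stat = arg;
        struct mm_struct *mm;

        /* Threads share an mm; only count each process once. */
        if (!thread_group_leader(task))
                return 0;

        mm = get_task_mm(task);
        if (!mm)
                return 0;

        stat->rmap_items += mm->ksm_rmap_items;
        stat->merging_pages += mm->ksm_merging_pages;
        stat->profit += ksm_process_profit(mm);
        mmput(mm);
        return 0;
}

/* seq_file show callback that would back memory.ksm_stat. */
static int memcg_ksm_stat_show(struct seq_file *m, void *v)
{
        struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
        struct memcg_ksm_stat stat = { 0 };

        mem_cgroup_scan_tasks(memcg, memcg_ksm_stat_one, &stat);

        seq_printf(m, "ksm_rmap_items %lu\n", stat.rmap_items);
        seq_printf(m, "ksm_merging_pages %lu\n", stat.merging_pages);
        seq_printf(m, "ksm_process_profit %ld\n", stat.profit);
        return 0;
}

[As a sanity check on the sample output above: with 4 KiB pages and a
64-byte struct ksm_rmap_item (typical on x86_64), 76800 merged pages save
76800 * 4096 = 314572800 bytes, minus 76800 * 64 = 4915200 bytes of
rmap_item overhead, which matches the reported ksm_process_profit of
309657600.]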
On Thu, 1 May 2025 12:08:54 +0800 (CST) <xu.xin16@zte.com.cn> wrote:

> With the enablement of container-level KSM (e.g., via prctl [1]), there is
> a growing demand for container-level observability of KSM behavior. However,
> current cgroup implementations lack support for exposing KSM-related
> metrics.
>
> This patch series introduces a new interface named ksm_stat
> at the cgroup hierarchy level, enabling users to monitor KSM merging
> statistics specifically for containers where this feature has been
> activated, eliminating the need to manually inspect KSM information for
> each individual process within the cgroup.

Well, you didn't cc any of the memcg maintainers!

The feature seems desirable and the implementation straightforward.  I'll
add the patchset into mm.git for some testing, pending review outcomes,
thanks.
On Thu, May 01, 2025 at 12:08:54PM +0800, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
>
> With the enablement of container-level KSM (e.g., via prctl [1]), there is
> a growing demand for container-level observability of KSM behavior. However,
> current cgroup implementations lack support for exposing KSM-related
> metrics.
>
> This patch series introduces a new interface named ksm_stat
> at the cgroup hierarchy level, enabling users to monitor KSM merging
> statistics specifically for containers where this feature has been
> activated, eliminating the need to manually inspect KSM information for
> each individual process within the cgroup.
>
> Users can obtain the KSM information of a cgroup just by:
>
>   # cat /sys/fs/cgroup/memory.ksm_stat
>   ksm_rmap_items 76800
>   ksm_zero_pages 0
>   ksm_merging_pages 76800
>   ksm_process_profit 309657600
>
> Current implementation supports both cgroup v2 and cgroup v1.
>

Before adding these stats to memcg, add global stats for them in
enum node_stat_item and then you can expose them in memcg through
memory.stat instead of a new interface.
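[A rough sketch of the shape of this suggestion, assuming the existing
node_stat_item/lruvec accounting plumbing; NR_KSM_MERGED and
ksm_account_merged() are hypothetical names, not taken from this series or
from mainline.]

/*
 * Illustration only; NR_KSM_MERGED and ksm_account_merged() are made-up
 * names used to show the three touch points of the suggestion.
 */

/* 1) include/linux/mmzone.h: a new global, per-node vmstat item. */
enum node_stat_item {
        /* ... existing items ... */
        NR_KSM_MERGED,          /* hypothetical: pages currently merged by KSM */
        NR_VM_NODE_STAT_ITEMS,
};

/* 2) mm/ksm.c: bump the counter where pages get merged/unmerged.
 *    The lruvec helpers credit the folio's memcg as well as the node,
 *    which is what later makes the value visible per cgroup.
 */
static void ksm_account_merged(struct folio *folio, bool merged)
{
        if (merged)
                lruvec_stat_add_folio(folio, NR_KSM_MERGED);
        else
                lruvec_stat_sub_folio(folio, NR_KSM_MERGED);
}

/* 3) mm/memcontrol.c: one new row in the existing memory_stats[] table
 *    exposes the item as a "ksm_merged" line in memory.stat.
 */
static const struct memory_stat memory_stats[] = {
        /* ... existing entries ... */
        { "ksm_merged",         NR_KSM_MERGED },
};

[This per-folio accounting in the KSM hot path is the cost that the reply
below pushes back on.]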
> > Users can obtain the KSM information of a cgroup just by:
> >
> >   # cat /sys/fs/cgroup/memory.ksm_stat
> >   ksm_rmap_items 76800
> >   ksm_zero_pages 0
> >   ksm_merging_pages 76800
> >   ksm_process_profit 309657600
> >
> > Current implementation supports both cgroup v2 and cgroup v1.
> >
>
> Before adding these stats to memcg, add global stats for them in
> enum node_stat_item and then you can expose them in memcg through
> memory.stat instead of a new interface.

Dear shakeel.butt,

If we add these KSM-related items to enum node_stat_item and embed extra
counter-updating code such as __lruvec_stat_add_folio() into the KSM
procedure, it adds extra CPU cost every time the normal KSM procedures run.
Alternatively, we can just traverse all processes of this memcg and sum
their KSM counters, as the current patch set does.

If only a single "KSM merged pages" entry is included in memory.stat, I
think that is reasonable, as it reflects this memcg's KSM page count.
However, adding the other three KSM-related metrics is less advisable
since they are strongly coupled with KSM internals and would primarily
interest users monitoring KSM-specific behavior.

Last but not least, the rationale for adding a ksm_stat entry to memcg
also lies in maintaining structural consistency with the existing
/proc/<pid>/ksm_stat interface.
On Tue, May 06, 2025 at 01:09:25PM +0800, xu.xin16@zte.com.cn wrote:
> > > Users can obtain the KSM information of a cgroup just by:
> > >
> > >   # cat /sys/fs/cgroup/memory.ksm_stat
> > >   ksm_rmap_items 76800
> > >   ksm_zero_pages 0
> > >   ksm_merging_pages 76800
> > >   ksm_process_profit 309657600
> > >
> > > Current implementation supports both cgroup v2 and cgroup v1.
> > >
> >
> > Before adding these stats to memcg, add global stats for them in
> > enum node_stat_item and then you can expose them in memcg through
> > memory.stat instead of a new interface.
>
> Dear shakeel.butt,
>
> If we add these KSM-related items to enum node_stat_item and embed extra
> counter-updating code such as __lruvec_stat_add_folio() into the KSM
> procedure, it adds extra CPU cost every time the normal KSM procedures run.

How is it more expensive than traversing all processes?
__lruvec_stat_add_folio() and related functions are already called in many
performance-critical code paths, so I don't see any issue with calling them
in KSM.

> Alternatively, we can just traverse all processes of this memcg and sum
> their KSM counters, as the current patch set does.
>
> If only a single "KSM merged pages" entry is included in memory.stat, I
> think that is reasonable, as it reflects this memcg's KSM page count.
> However, adding the other three KSM-related metrics is less advisable
> since they are strongly coupled with KSM internals and would primarily
> interest users monitoring KSM-specific behavior.

We can discuss each individual ksm stat and decide whether it makes sense
to add it to memcg or not.

>
> Last but not least, the rationale for adding a ksm_stat entry to memcg
> also lies in maintaining structural consistency with the existing
> /proc/<pid>/ksm_stat interface.

Sorry, I don't agree with this rationale. This is a separate interface
and can be different from the existing ksm interface. We can define it
however we think is the right way for memcg, and yes, there can be stats
overlap with the older interface.

For now I would say start with the ksm metrics that are appropriate to
be exposed globally and then we can see if those are fine for memcg as
well.
> > > > Users can obtain the KSM information of a cgroup just by:
> > > >
> > > >   # cat /sys/fs/cgroup/memory.ksm_stat
> > > >   ksm_rmap_items 76800
> > > >   ksm_zero_pages 0
> > > >   ksm_merging_pages 76800
> > > >   ksm_process_profit 309657600
> > > >
> > > > Current implementation supports both cgroup v2 and cgroup v1.
> > > >
> > >
> > > Before adding these stats to memcg, add global stats for them in
> > > enum node_stat_item and then you can expose them in memcg through
> > > memory.stat instead of a new interface.
> >
> > Dear shakeel.butt,
> >
> > If we add these KSM-related items to enum node_stat_item and embed extra
> > counter-updating code such as __lruvec_stat_add_folio() into the KSM
> > procedure, it adds extra CPU cost every time the normal KSM procedures run.
>
> How is it more expensive than traversing all processes?
> __lruvec_stat_add_folio() and related functions are already called in many
> performance-critical code paths, so I don't see any issue with calling them
> in KSM.
>
> > Alternatively, we can just traverse all processes of this memcg and sum
> > their KSM counters, as the current patch set does.
> >
> > If only a single "KSM merged pages" entry is included in memory.stat, I
> > think that is reasonable, as it reflects this memcg's KSM page count.
> > However, adding the other three KSM-related metrics is less advisable
> > since they are strongly coupled with KSM internals and would primarily
> > interest users monitoring KSM-specific behavior.
>
> We can discuss each individual ksm stat and decide whether it makes sense
> to add it to memcg or not.
>
> >
> > Last but not least, the rationale for adding a ksm_stat entry to memcg
> > also lies in maintaining structural consistency with the existing
> > /proc/<pid>/ksm_stat interface.
>
> Sorry, I don't agree with this rationale. This is a separate interface
> and can be different from the existing ksm interface. We can define it
> however we think is the right way for memcg, and yes, there can be stats
> overlap with the older interface.
>
> For now I would say start with the ksm metrics that are appropriate to
> be exposed globally and then we can see if those are fine for memcg as
> well.

Thank you very much for your suggestion, and I'm sorry for the delayed
reply; last month I was exceptionally busy.

Upon further consideration, I agree that adding entries to the existing
memory.stat interface is indeed preferable to arbitrarily creating new
interfaces. Therefore, my next step is to add the following global KSM
metrics to memory.stat: ksm_merged, ksm_unmergable, ksm_zero, and
ksm_profit. (These represent the total number of merged pages, unmergeable
pages, zero pages merged by KSM, and the overall profit, respectively.)
However, please note that ksm_merging_pages and ksm_unshared need to be
converted to be represented in bytes.
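[A small sketch of the pages-to-bytes conversion mentioned here, since
memory.stat values are conventionally reported in bytes; memcg_ksm_pages()
is a hypothetical placeholder for whatever counter ends up backing
ksm_merged.]

#include <linux/memcontrol.h>
#include <linux/mm.h>
#include <linux/seq_buf.h>

/* Hypothetical accessor: stands in for reading the real vmstat item. */
static unsigned long memcg_ksm_pages(struct mem_cgroup *memcg)
{
        return 0;       /* placeholder */
}

/* Sketch only: format a byte-valued ksm_merged line for memory.stat. */
static void memcg_ksm_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
{
        /* memory.stat reports sizes in bytes, hence the PAGE_SIZE scaling. */
        u64 merged_bytes = (u64)memcg_ksm_pages(memcg) * PAGE_SIZE;

        seq_buf_printf(s, "ksm_merged %llu\n", merged_bytes);
}

[With the figures from the cover letter (76800 merged pages and 4 KiB
pages), such a ksm_merged line would read 314572800.]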