From nobody Mon Feb 9 05:59:16 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE74DEB64D7 for ; Wed, 21 Jun 2023 18:17:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230075AbjFUSR2 (ORCPT ); Wed, 21 Jun 2023 14:17:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45276 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231562AbjFUSRM (ORCPT ); Wed, 21 Jun 2023 14:17:12 -0400 Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1176319A2 for ; Wed, 21 Jun 2023 11:17:11 -0700 (PDT) Received: by mail-pj1-x104a.google.com with SMTP id 98e67ed59e1d1-25eccff0a51so3839298a91.0 for ; Wed, 21 Jun 2023 11:17:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1687371430; x=1689963430; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=P8vYzFHxLWdhm/YYBuM4cir+l3MkexuRd1yB7xK/qZo=; b=TwWoi+pZl5v8X1NfXYAXDFTjFQRH0gRf8kcFXat6SAjcaa5QAAlB1haHO0q77RAinF D7xdtDa76dKs0GBJ34FBnPIsbh4xyimY5wrW1f6snuskuEcZJCOzknkqQ824BwEme3MM E6lxh0I4tvcYJRHOS1/KATWGHFctdscLKfObeEhRzcu1IiMQFS9ynJo1hdgECo0P8Gpt Xor2kFxaCjpsZ3henaaUXtkntEKa7iivsPZyOrHfOsxsId1bV6FY8Ohj1/LpkWsG+GdH d7oCOnaC7hgluf1VkKFS52Jq6dvfFstI19Xby8uEeRklXtHAMo72yh/opONH6YPZCAbt StMg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687371430; x=1689963430; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=P8vYzFHxLWdhm/YYBuM4cir+l3MkexuRd1yB7xK/qZo=; b=dMR9ebuhYT7nBpULJyAlqdHSqj3+XYTozxyly/GtbmNU1d16GW4FinyH5XHFe26GMA iXl58N8pVVEsurCMxyXRHIMI1eLL2xFwXBKoqAJErj+49e706NKN0JrnnVE0Sd0iIuJD ID/SWziIHjo6O0X68f4rO0DrsqMw1zMV8CmWmtyq+uaXYrD+hSd3UyHpMzIwlw7iQq+B fkIKsLp02ZdxliqiEJrXvofWd+13oZcJ2DhjI8gXkd+ankHlY7hF+1f6yF2xVzGP9FeO uILbKcy8MrbY51sN8jS2yydAmkHJC+DnO31Hdt3m3EsPbK+H6skk8o5aTEoeOFiw2fCd Y6qw== X-Gm-Message-State: AC+VfDyuFNqmbppcfw5hjtAK/zjNx8HgYIiJa68Zl4dxw3t31AuAZpG0 Xkj+zUQBdFG5SriSS0A8p94rETpX11NM X-Google-Smtp-Source: ACHHUZ5ChKgXaGOZiUgiMpHUxjfmbgkoCy3pGADXdgSNrd1s/teZwOa9M7BVnVWQRd1xAjPY/MRtw95ar8Wv X-Received: from yuanchu.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:4024]) (user=yuanchu job=sendgmr) by 2002:a17:90a:cb06:b0:25b:f973:f944 with SMTP id z6-20020a17090acb0600b0025bf973f944mr2138645pjt.3.1687371430482; Wed, 21 Jun 2023 11:17:10 -0700 (PDT) Date: Wed, 21 Jun 2023 18:04:52 +0000 In-Reply-To: <20230621180454.973862-1-yuanchu@google.com> Mime-Version: 1.0 References: <20230621180454.973862-1-yuanchu@google.com> X-Mailer: git-send-email 2.41.0.162.gfafddb0af9-goog Message-ID: <20230621180454.973862-5-yuanchu@google.com> Subject: [RFC PATCH v2 4/6] mm: extend working set reporting to memcgs From: Yuanchu Xie To: Greg Kroah-Hartman , "Rafael J . Wysocki" , "Michael S . Tsirkin" , David Hildenbrand , Jason Wang , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Muchun Song , Yu Zhao , Kefeng Wang , Kairui Song , Yosry Ahmed , Yuanchu Xie , "T . J . Alumbaugh" Cc: Wei Xu , SeongJae Park , Sudarshan Rajagopalan , kai.huang@intel.com, hch@lst.de, jon@nutanix.com, Aneesh Kumar K V , Matthew Wilcox , Vasily Averin , linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-mm@kvack.org, cgroups@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Break down the system-wide working set reporting into per-memcg reports, which aggregages its children hierarchically. The per-node working set reporting histograms and refresh/report threshold files are presented as memcg files, showing a report containing all the nodes. Signed-off-by: T.J. Alumbaugh Signed-off-by: Yuanchu Xie --- include/linux/memcontrol.h | 6 + include/linux/wsr.h | 4 + mm/memcontrol.c | 262 ++++++++++++++++++++++++++++++++++++- mm/vmscan.c | 9 +- 4 files changed, 277 insertions(+), 4 deletions(-) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 85dc9b88ea379..96971aa6a48cd 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -10,6 +10,7 @@ =20 #ifndef _LINUX_MEMCONTROL_H #define _LINUX_MEMCONTROL_H +#include #include #include #include @@ -325,6 +326,11 @@ struct mem_cgroup { struct lru_gen_mm_list mm_list; #endif =20 +#ifdef CONFIG_WSR + int wsr_event; + wait_queue_head_t wsr_wait_queue; +#endif + struct mem_cgroup_per_node *nodeinfo[]; }; =20 diff --git a/include/linux/wsr.h b/include/linux/wsr.h index 85c901ce026b9..d45f7cc0672ac 100644 --- a/include/linux/wsr.h +++ b/include/linux/wsr.h @@ -48,6 +48,7 @@ ssize_t wsr_intervals_ms_parse(char *src, struct ws_bin *= bins); */ void wsr_refresh(struct wsr *wsr, struct mem_cgroup *root, struct pglist_data *pgdat); +void report_ws(struct pglist_data *pgdat, struct scan_control *sc); #else struct ws_bin; struct wsr; @@ -73,6 +74,9 @@ static inline void wsr_refresh(struct wsr *wsr, struct me= m_cgroup *root, struct pglist_data *pgdat) { } +static inline void report_ws(struct pglist_data *pgdat, struct scan_contro= l *sc) +{ +} #endif /* CONFIG_WSR */ =20 #endif /* _LINUX_WSR_H */ diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2eee092f8f119..edf5bb31bb19c 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -25,6 +25,7 @@ * Copyright (C) 2020 Alibaba, Inc, Alex Shi */ =20 +#include #include #include #include @@ -65,6 +66,7 @@ #include #include "internal.h" #include +#include #include #include "slab.h" #include "swap.h" @@ -5233,6 +5235,7 @@ static void free_mem_cgroup_per_node_info(struct mem_= cgroup *memcg, int node) if (!pn) return; =20 + wsr_destroy(&pn->lruvec); free_percpu(pn->lruvec_stats_percpu); kfree(pn); } @@ -5311,6 +5314,10 @@ static struct mem_cgroup *mem_cgroup_alloc(void) spin_lock_init(&memcg->deferred_split_queue.split_queue_lock); INIT_LIST_HEAD(&memcg->deferred_split_queue.split_queue); memcg->deferred_split_queue.split_queue_len =3D 0; +#endif +#ifdef CONFIG_WSR + memcg->wsr_event =3D 0; + init_waitqueue_head(&memcg->wsr_wait_queue); #endif idr_replace(&mem_cgroup_idr, memcg, memcg->id.id); lru_gen_init_memcg(memcg); @@ -5411,6 +5418,11 @@ static void mem_cgroup_css_offline(struct cgroup_sub= sys_state *css) } spin_unlock_irq(&memcg->event_list_lock); =20 +#ifdef CONFIG_WSR + wake_up_pollfree(&memcg->wsr_wait_queue); + synchronize_rcu(); +#endif + page_counter_set_min(&memcg->memory, 0); page_counter_set_low(&memcg->memory, 0); =20 @@ -6642,6 +6654,228 @@ static ssize_t memory_reclaim(struct kernfs_open_fi= le *of, char *buf, return nbytes; } =20 +#ifdef CONFIG_WSR +static int memory_wsr_intervals_ms_show(struct seq_file *m, void *v) +{ + int nid; + struct mem_cgroup *memcg =3D mem_cgroup_from_seq(m); + + for_each_node_state(nid, N_MEMORY) { + struct wsr *wsr; + struct ws_bin *bin; + + wsr =3D lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid))); + mutex_lock(&wsr->bins_lock); + seq_printf(m, "N%d=3D", nid); + for (bin =3D wsr->bins; bin->idle_age !=3D -1; bin++) + seq_printf(m, "%u,", jiffies_to_msecs(bin->idle_age)); + mutex_unlock(&wsr->bins_lock); + + seq_printf(m, "%lld ", LLONG_MAX); + } + seq_putc(m, '\n'); + + return 0; +} + +static ssize_t memory_wsr_intervals_ms_parse(struct kernfs_open_file *of, + char *buf, size_t nbytes, + unsigned int *nid_out, + struct ws_bin *bins) +{ + char *node, *intervals; + unsigned int nid; + int err; + + buf =3D strstrip(buf); + intervals =3D buf; + node =3D strsep(&intervals, "=3D"); + + if (*node !=3D 'N') + return -EINVAL; + + err =3D kstrtouint(node + 1, 0, &nid); + if (err) + return err; + + if (nid >=3D nr_node_ids || !node_state(nid, N_MEMORY)) + return -EINVAL; + + err =3D wsr_intervals_ms_parse(intervals, bins); + if (err) + return err; + + *nid_out =3D nid; + return 0; +} + +static ssize_t memory_wsr_intervals_ms_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, + loff_t off) +{ + unsigned int nid; + int err; + struct wsr *wsr; + struct ws_bin *bins; + struct mem_cgroup *memcg =3D mem_cgroup_from_css(of_css(of)); + + bins =3D kzalloc(sizeof(wsr->bins), GFP_KERNEL); + if (!bins) + return -ENOMEM; + + err =3D memory_wsr_intervals_ms_parse(of, buf, nbytes, &nid, bins); + if (err) + goto failed; + + wsr =3D lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid))); + mutex_lock(&wsr->bins_lock); + memcpy(wsr->bins, bins, sizeof(wsr->bins)); + mutex_unlock(&wsr->bins_lock); +failed: + kfree(bins); + return err ?: nbytes; +} + +static int memory_wsr_refresh_ms_show(struct seq_file *m, void *v) +{ + int nid; + struct mem_cgroup *memcg =3D mem_cgroup_from_seq(m); + + for_each_node_state(nid, N_MEMORY) { + struct wsr *wsr =3D + lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid))); + + seq_printf(m, "N%d=3D%u ", nid, + jiffies_to_msecs(READ_ONCE(wsr->refresh_threshold))); + } + seq_putc(m, '\n'); + + return 0; +} + +static ssize_t memory_wsr_threshold_parse(char *buf, size_t nbytes, + unsigned int *nid_out, + unsigned int *msecs) +{ + char *node, *threshold; + unsigned int nid; + int err; + + buf =3D strstrip(buf); + threshold =3D buf; + node =3D strsep(&threshold, "=3D"); + + if (*node !=3D 'N') + return -EINVAL; + + err =3D kstrtouint(node + 1, 0, &nid); + if (err) + return err; + + if (nid >=3D nr_node_ids || !node_state(nid, N_MEMORY)) + return -EINVAL; + + err =3D kstrtouint(threshold, 0, msecs); + if (err) + return err; + + *nid_out =3D nid; + + return nbytes; +} + +static ssize_t memory_wsr_refresh_ms_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + unsigned int nid, msecs; + struct wsr *wsr; + struct mem_cgroup *memcg =3D mem_cgroup_from_css(of_css(of)); + ssize_t ret =3D memory_wsr_threshold_parse(buf, nbytes, &nid, &msecs); + + if (ret < 0) + return ret; + + wsr =3D lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid))); + WRITE_ONCE(wsr->refresh_threshold, msecs_to_jiffies(msecs)); + return ret; +} + +static int memory_wsr_report_ms_show(struct seq_file *m, void *v) +{ + int nid; + struct mem_cgroup *memcg =3D mem_cgroup_from_seq(m); + + for_each_node_state(nid, N_MEMORY) { + struct wsr *wsr =3D + lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid))); + + seq_printf(m, "N%d=3D%u ", nid, + jiffies_to_msecs(READ_ONCE(wsr->report_threshold))); + } + seq_putc(m, '\n'); + + return 0; +} + +static ssize_t memory_wsr_report_ms_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + unsigned int nid, msecs; + struct wsr *wsr; + struct mem_cgroup *memcg =3D mem_cgroup_from_css(of_css(of)); + ssize_t ret =3D memory_wsr_threshold_parse(buf, nbytes, &nid, &msecs); + + if (ret < 0) + return ret; + + wsr =3D lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid))); + WRITE_ONCE(wsr->report_threshold, msecs_to_jiffies(msecs)); + return ret; +} + +static int memory_wsr_histogram_show(struct seq_file *m, void *v) +{ + int nid; + struct mem_cgroup *memcg =3D mem_cgroup_from_seq(m); + + for_each_node_state(nid, N_MEMORY) { + struct wsr *wsr =3D + lruvec_wsr(mem_cgroup_lruvec(memcg, NODE_DATA(nid))); + struct ws_bin *bin; + + seq_printf(m, "N%d\n", nid); + + mutex_lock(&wsr->bins_lock); + wsr_refresh(wsr, memcg, NODE_DATA(nid)); + for (bin =3D wsr->bins; bin->idle_age !=3D -1; bin++) + seq_printf(m, "%u anon=3D%lu file=3D%lu\n", + jiffies_to_msecs(bin->idle_age), + bin->nr_pages[0], bin->nr_pages[1]); + + seq_printf(m, "%lld anon=3D%lu file=3D%lu\n", LLONG_MAX, + bin->nr_pages[0], bin->nr_pages[1]); + + mutex_unlock(&wsr->bins_lock); + } + + return 0; +} + +__poll_t memory_wsr_histogram_poll(struct kernfs_open_file *of, + struct poll_table_struct *pt) +{ + struct mem_cgroup *memcg =3D mem_cgroup_from_css(of_css(of)); + + if (memcg->css.flags & CSS_DYING) + return DEFAULT_POLLMASK; + + poll_wait(of->file, &memcg->wsr_wait_queue, pt); + if (cmpxchg(&memcg->wsr_event, 1, 0) =3D=3D 1) + return DEFAULT_POLLMASK | EPOLLPRI; + return DEFAULT_POLLMASK; +} +#endif + static struct cftype memory_files[] =3D { { .name =3D "current", @@ -6710,7 +6944,33 @@ static struct cftype memory_files[] =3D { .flags =3D CFTYPE_NS_DELEGATABLE, .write =3D memory_reclaim, }, - { } /* terminate */ +#ifdef CONFIG_WSR + { + .name =3D "wsr.intervals_ms", + .flags =3D CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE, + .seq_show =3D memory_wsr_intervals_ms_show, + .write =3D memory_wsr_intervals_ms_write, + }, + { + .name =3D "wsr.refresh_ms", + .flags =3D CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE, + .seq_show =3D memory_wsr_refresh_ms_show, + .write =3D memory_wsr_refresh_ms_write, + }, + { + .name =3D "wsr.report_ms", + .flags =3D CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE, + .seq_show =3D memory_wsr_report_ms_show, + .write =3D memory_wsr_report_ms_write, + }, + { + .name =3D "wsr.histogram", + .flags =3D CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE, + .seq_show =3D memory_wsr_histogram_show, + .poll =3D memory_wsr_histogram_poll, + }, +#endif + {} /* terminate */ }; =20 struct cgroup_subsys memory_cgrp_subsys =3D { diff --git a/mm/vmscan.c b/mm/vmscan.c index c56fddcec88fb..ba254b6e91e19 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4559,8 +4559,6 @@ static bool age_lruvec(struct lruvec *lruvec, struct = scan_control *sc, unsigned return true; } =20 -static void report_ws(struct pglist_data *pgdat, struct scan_control *sc); - /* to protect the working set of the last N jiffies */ static unsigned long lru_gen_min_ttl __read_mostly; =20 @@ -5937,7 +5935,7 @@ void wsr_refresh(struct wsr *wsr, struct mem_cgroup *= root, } } =20 -static void report_ws(struct pglist_data *pgdat, struct scan_control *sc) +void report_ws(struct pglist_data *pgdat, struct scan_control *sc) { static DEFINE_RATELIMIT_STATE(rate, HZ, 3); =20 @@ -5969,6 +5967,8 @@ static void report_ws(struct pglist_data *pgdat, stru= ct scan_control *sc) =20 if (wsr->notifier) kernfs_notify(wsr->notifier); + if (memcg && cmpxchg(&memcg->wsr_event, 0, 1) =3D=3D 0) + wake_up_interruptible(&memcg->wsr_wait_queue); } #endif /* CONFIG_WSR */ =20 @@ -6486,6 +6486,9 @@ static void shrink_zones(struct zonelist *zonelist, s= truct scan_control *sc) if (zone->zone_pgdat =3D=3D last_pgdat) continue; last_pgdat =3D zone->zone_pgdat; + + if (!sc->proactive) + report_ws(zone->zone_pgdat, sc); shrink_node(zone->zone_pgdat, sc); } =20 --=20 2.41.0.162.gfafddb0af9-goog