From nobody Thu Sep 18 00:07:19 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86D3DC4332F for ; Wed, 14 Dec 2022 22:52:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229645AbiLNWvv (ORCPT ); Wed, 14 Dec 2022 17:51:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38658 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229620AbiLNWvt (ORCPT ); Wed, 14 Dec 2022 17:51:49 -0500 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0D45A3134E for ; Wed, 14 Dec 2022 14:51:48 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id h66-20020a252145000000b0071a7340eea9so1525245ybh.6 for ; Wed, 14 Dec 2022 14:51:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=EUc+v5HwD46tRHbZWt0mN+PX65Kt87vRY2PWRTrdra8=; b=D1ZGqVZqRfjR4k+QCHfaRDc8HXpG/JxiKLnotRDm7Q0jKKMK98yfDAMjKF0CLGdkeO BVF5VEcuWQxJx6HnJzNbK6mWrwXtEdlrjzqRS0tZNCszk9P+OwNxJmLtbk++RNZ3MJ2G CZO/woce3dSyn1ylx7eIxBwwxjPYWeaDpc35wLuMTka2EBsXgXps1k6sEw7ixcsNXF9p XS/ySLs5INBD64ztnBLJmjS2xKhcPb2uQ8HO5xHRDe3Z59Zo8PPRKLq8X555tFBm+KIb asLhS5yiHmEdXlSIpeEqhJHQ5RCV8f1SRahCwzobKiTodVo1k1IFYKNhqPiusXTCOodl f8fQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=EUc+v5HwD46tRHbZWt0mN+PX65Kt87vRY2PWRTrdra8=; b=3adXCXyHbfBPpmjVV9utohdEGI8KSkPdJTqsWIOdAPetc5DG5O5EptLPkQUV/Kj08f keclPAGJHCE20I1NZf/qiGuUHtuyKr80ZWVDgLdyZfy0DKI3jllvUBJxpyy5mPQyK9w/ Bd32g2vmwHxheWxo9BjGWXBA20NRwZ5qYi2e9Gmi/Lp6GS9nDbfpU+lGyrMf1JqAmQwl F4nqKRmCVq8r6rZIw6aXKjEhH5YTswp/xe0mHzcCS9KPyg8BzpBrnJdSfxVqpAZGVNC6 E1hD7+FBlga9nE6z3H2b+alBBKm8jsX89l/mvm6sQvwY1pVETDq3tMypD4yyReHGVQlj 8IDA== X-Gm-Message-State: ANoB5pkwCTunUvU5rFTGprwIpmll3CJPsOfwDbaOAMikwuStMlSzc9xC 7eaTA6UEkM3Ud/DmY2UCfN+PSQETb60T X-Google-Smtp-Source: AA0mqf7DGDIJUfC4VIyKgAVtfwcgyC3fcmhOUgkP+R5oeIkqE2QVIN7CL/qi6/psX2VJUox0PiyWM7a9fBN1 X-Received: from yuanchu.svl.corp.google.com ([2620:15c:2d4:203:1311:60bc:9e2a:ab1]) (user=yuanchu job=sendgmr) by 2002:a05:690c:583:b0:3bd:4b7a:c0b1 with SMTP id bo3-20020a05690c058300b003bd4b7ac0b1mr1390108ywb.220.1671058307326; Wed, 14 Dec 2022 14:51:47 -0800 (PST) Date: Wed, 14 Dec 2022 14:51:22 -0800 In-Reply-To: <20221214225123.2770216-1-yuanchu@google.com> Mime-Version: 1.0 References: <20221214225123.2770216-1-yuanchu@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20221214225123.2770216-2-yuanchu@google.com> Subject: [RFC PATCH 1/2] mm: multi-gen LRU: periodic aging From: Yuanchu Xie To: Johannes Weiner , Michal Hocko , Roman Gushchin , Yu Zhao Cc: Andrew Morton , Shakeel Butt , Muchun Song , linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, Yuanchu Xie Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Periodically age MGLRU-enabled lruvecs to turn MGLRU generations into time-based working set information. This includes an interface to set the periodic aging interval and a new kthread to perform aging. memory.periodic_aging: a new root-level only file in cgroupfs Writing to memory.periodic aging sets the aging interval and opts into periodic aging. kold: a new kthread that ages memcgs based on the set aging interval. Signed-off-by: Yuanchu Xie --- include/linux/kold.h | 44 ++++++++++++ include/linux/mmzone.h | 4 +- mm/Makefile | 3 + mm/kold.c | 150 +++++++++++++++++++++++++++++++++++++++++ mm/memcontrol.c | 52 ++++++++++++++ mm/vmscan.c | 35 +++++++++- 6 files changed, 286 insertions(+), 2 deletions(-) create mode 100644 include/linux/kold.h create mode 100644 mm/kold.c diff --git a/include/linux/kold.h b/include/linux/kold.h new file mode 100644 index 000000000000..10b0dbe09a5c --- /dev/null +++ b/include/linux/kold.h @@ -0,0 +1,44 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later + * + * Periodic aging for multi-gen LRU + * + * Copyright (C) 2022 Yuanchu Xie + */ +#ifndef KOLD_H_ +#define KOLD_H_ + +#include + +struct kold_stats { + /* late is defined as spending an entire interval aging without sleep + * stat is aggregated every aging interval + */ + unsigned int late_count; +}; + +int kold_set_interval(unsigned int interval); +unsigned int kold_get_interval(void); +int kold_get_stats(struct kold_stats *stats); + +/* returns the creation timestamp of the youngest generation */ +unsigned long lru_gen_force_age_lruvec(struct mem_cgroup *memcg, int nid, + unsigned long min_ttl); + +#ifndef CONFIG_MEMCG +int kold_set_interval(unsigned int interval) +{ + return 0; +} + +unsigned int kold_get_interval(void) +{ + return 0; +} + +int kold_get_stats(struct kold_stats *stats) +{ + return -1; +} +#endif /* CONFIG_MEMCG */ + +#endif /* KOLD_H_ */ diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 5f74891556f3..929c777b826a 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1218,7 +1218,9 @@ typedef struct pglist_data { =20 #ifdef CONFIG_LRU_GEN /* kswap mm walk data */ - struct lru_gen_mm_walk mm_walk; + struct lru_gen_mm_walk mm_walk; + /* kold periodic aging walk data */ + struct lru_gen_mm_walk kold_mm_walk; #endif =20 CACHELINE_PADDING(_pad2_); diff --git a/mm/Makefile b/mm/Makefile index 8e105e5b3e29..8bd554a6eb7d 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -98,6 +98,9 @@ obj-$(CONFIG_DEVICE_MIGRATION) +=3D migrate_device.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) +=3D huge_memory.o khugepaged.o obj-$(CONFIG_PAGE_COUNTER) +=3D page_counter.o obj-$(CONFIG_MEMCG) +=3D memcontrol.o vmpressure.o +ifdef CONFIG_LRU_GEN +obj-$(CONFIG_MEMCG) +=3D kold.o +endif ifdef CONFIG_SWAP obj-$(CONFIG_MEMCG) +=3D swap_cgroup.o endif diff --git a/mm/kold.c b/mm/kold.c new file mode 100644 index 000000000000..094574177968 --- /dev/null +++ b/mm/kold.c @@ -0,0 +1,150 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2022 Yuanchu Xie + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static struct task_struct *kold_thread __read_mostly; +/* protects kold_thread */ +static DEFINE_MUTEX(kold_mutex); + +static unsigned int aging_interval __read_mostly; +static unsigned int late_count; + +/* try to move to a cpu on the target node */ +static void try_move_current_to_node(int nid) +{ + struct cpumask node_cpus; + + cpumask_and(&node_cpus, cpumask_of_node(nid), cpu_online_mask); + if (!cpumask_empty(&node_cpus)) + set_cpus_allowed_ptr(current, &node_cpus); +} + +static int kold_run(void *none) +{ + int nid; + unsigned int flags; + unsigned long last_interval_start_time =3D jiffies; + bool sleep_since_last_full_scan =3D false; + struct mem_cgroup *memcg; + struct reclaim_state reclaim_state =3D {}; + + while (!kthread_should_stop()) { + unsigned long interval =3D + (unsigned long)(READ_ONCE(aging_interval)) * HZ; + unsigned long next_wakeup_tick =3D jiffies + interval; + long timeout_ticks; + + current->reclaim_state =3D &reclaim_state; + flags =3D memalloc_noreclaim_save(); + + for_each_node_state(nid, N_MEMORY) { + pg_data_t *pgdat =3D NODE_DATA(nid); + + try_move_current_to_node(nid); + reclaim_state.mm_walk =3D &pgdat->kold_mm_walk; + + memcg =3D mem_cgroup_iter(NULL, NULL, NULL); + do { + unsigned long young_timestamp =3D + lru_gen_force_age_lruvec(memcg, nid, + interval); + + if (time_before(young_timestamp + interval, + next_wakeup_tick)) { + next_wakeup_tick =3D young_timestamp + interval; + } + } while ((memcg =3D mem_cgroup_iter(NULL, memcg, NULL))); + } + + memalloc_noreclaim_restore(flags); + current->reclaim_state =3D NULL; + + /* late_count stats update */ + if (time_is_before_jiffies(last_interval_start_time + interval)) { + last_interval_start_time +=3D interval; + if (!sleep_since_last_full_scan) { + WRITE_ONCE(late_count, + READ_ONCE(late_count) + 1); + } + sleep_since_last_full_scan =3D false; + } + + /* sleep until next aging */ + timeout_ticks =3D -(long)(jiffies - next_wakeup_tick); + if (timeout_ticks > 0 && timeout_ticks !=3D MAX_SCHEDULE_TIMEOUT) { + sleep_since_last_full_scan =3D true; + schedule_timeout_idle(timeout_ticks); + } + } + return 0; +} + +int kold_get_stats(struct kold_stats *stats) +{ + stats->late_count =3D READ_ONCE(late_count); + return 0; +} + +unsigned int kold_get_interval(void) +{ + return READ_ONCE(aging_interval); +} + +int kold_set_interval(unsigned int interval) +{ + int err =3D 0; + + mutex_lock(&kold_mutex); + if (interval && !kold_thread) { + if (!lru_gen_enabled()) { + err =3D -EOPNOTSUPP; + goto cleanup; + } + kold_thread =3D kthread_create(kold_run, NULL, "kold"); + + if (IS_ERR(kold_thread)) { + pr_err("kold: kthread_run(kold_run) failed\n"); + err =3D PTR_ERR(kold_thread); + kold_thread =3D NULL; + goto cleanup; + } + WRITE_ONCE(aging_interval, interval); + wake_up_process(kold_thread); + } else { + if (!interval && kold_thread) { + kthread_stop(kold_thread); + kold_thread =3D NULL; + } + WRITE_ONCE(aging_interval, interval); + } + +cleanup: + mutex_unlock(&kold_mutex); + return err; +} + +static int __init kold_init(void) +{ + return 0; +} + +module_init(kold_init); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2d8549ae1b30..7d2fb3fc4580 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -63,6 +63,7 @@ #include #include #include +#include #include "internal.h" #include #include @@ -6569,6 +6570,49 @@ static ssize_t memory_oom_group_write(struct kernfs_= open_file *of, return nbytes; } =20 +#ifdef CONFIG_LRU_GEN +static int memory_periodic_aging_show(struct seq_file *m, void *v) +{ + unsigned int interval =3D kold_get_interval(); + struct kold_stats stats; + int err; + + err =3D kold_get_stats(&stats); + + if (err) + return err; + + seq_printf(m, "aging_interval %u\n", interval); + seq_printf(m, "late_count %u\n", stats.late_count); + return 0; +} + +static ssize_t memory_periodic_aging_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, + loff_t off) +{ + unsigned int new_interval; + int err; + + if (!lru_gen_enabled()) + return -EOPNOTSUPP; + + buf =3D strstrip(buf); + if (!buf) + return -EINVAL; + + err =3D kstrtouint(buf, 0, &new_interval); + if (err) + return err; + + err =3D kold_set_interval(new_interval); + if (err) + return err; + + return nbytes; +} +#endif /* CONFIG_LRU_GEN */ + static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { @@ -6679,6 +6723,14 @@ static struct cftype memory_files[] =3D { .flags =3D CFTYPE_NS_DELEGATABLE, .write =3D memory_reclaim, }, +#ifdef CONFIG_LRU_GEN + { + .name =3D "periodic_aging", + .flags =3D CFTYPE_ONLY_ON_ROOT, + .seq_show =3D memory_periodic_aging_show, + .write =3D memory_periodic_aging_write, + }, +#endif { } /* terminate */ }; =20 diff --git a/mm/vmscan.c b/mm/vmscan.c index 04d8b88e5216..0fea21366fc8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -54,6 +54,7 @@ #include #include #include +#include =20 #include #include @@ -5279,8 +5280,10 @@ static void lru_gen_change_state(bool enabled) =20 if (enabled) static_branch_enable_cpuslocked(&lru_gen_caps[LRU_GEN_CORE]); - else + else { static_branch_disable_cpuslocked(&lru_gen_caps[LRU_GEN_CORE]); + kold_set_interval(0); + } =20 memcg =3D mem_cgroup_iter(NULL, NULL, NULL); do { @@ -5760,6 +5763,36 @@ static const struct file_operations lru_gen_ro_fops = =3D { .release =3D seq_release, }; =20 +/*************************************************************************= ***** + * periodic aging (kold) + *************************************************************************= *****/ + +/* age lruvec as long as it is older than min_ttl, + * return the timestamp of the youngest generation + */ +unsigned long lru_gen_force_age_lruvec(struct mem_cgroup *memcg, int nid, + unsigned long min_ttl) +{ + struct scan_control sc =3D { + .may_writepage =3D true, + .may_unmap =3D true, + .may_swap =3D true, + .reclaim_idx =3D MAX_NR_ZONES - 1, + .gfp_mask =3D GFP_KERNEL, + }; + struct lruvec *lruvec =3D get_lruvec(memcg, nid); + DEFINE_MAX_SEQ(lruvec); + int gen =3D lru_gen_from_seq(max_seq); + unsigned long birth_timestamp =3D + READ_ONCE(lruvec->lrugen.timestamps[gen]); + + if (time_is_before_jiffies(birth_timestamp + min_ttl)) + try_to_inc_max_seq(lruvec, max_seq, &sc, true, true); + + return READ_ONCE(lruvec->lrugen.timestamps[lru_gen_from_seq( + READ_ONCE((lruvec)->lrugen.max_seq))]); +} + /*************************************************************************= ***** * initialization *************************************************************************= *****/ --=20 2.39.0.314.g84b9a713c41-goog