From nobody Mon Apr 27 09:11:33 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 6D633C43334
	for ; Tue, 14 Jun 2022 22:26:19 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1344143AbiFNW0S (ORCPT );
	Tue, 14 Jun 2022 18:26:18 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49334 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1344134AbiFNW0N (ORCPT );
	Tue, 14 Jun 2022 18:26:13 -0400
Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9EAED5002A
	for ; Tue, 14 Jun 2022 15:26:12 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
	d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
	t=1655245572; x=1686781572;
	h=from:to:cc:subject:date:message-id:in-reply-to:
	 references:mime-version:content-transfer-encoding;
	bh=gSBoEG1kj/hIctq3Ckgu1ns/0pG+bGgO59Azngc0Rcw=;
	b=ZIsaJU0rVf7VVZvUXj7TPpMDYumTsSF8iu1ZrVQr4/ABeJGOaVApKvGq
	 8mCZcCKoPQ0ynzTJHGPWGAouJHgZnuU8ZoQ6tdZ6ZG3l90w+EKOwzaAnt
	 cDVITmQvbe5xF4FmiST2kx3uBpmpzGBemYn87Wtyp6iBUIw7J1LXs+tc9
	 eM4XLOegdH6G2ZweXilimlZInWDbndyzJi7Uz4z3ofayba2YiBBKjWC75
	 QLA0TiIGmwrfFqsyI0mSsb7tbXoP1xrBHtxEbHOe6KCyZkQdhtwaJtWFd
	 LcHjYrfFO61aJ9XUOL2VrsjqU+fCf5PKAEDlg8RZm4GnxQjzwGWI52XpJ
	 w==;
X-IronPort-AV: E=McAfee;i="6400,9594,10378"; a="340422010"
X-IronPort-AV: E=Sophos;i="5.91,300,1647327600";
	d="scan'208";a="340422010"
Received: from fmsmga007.fm.intel.com ([10.253.24.52])
	by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	14 Jun 2022 15:26:11 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.91,300,1647327600";
	d="scan'208";a="588724718"
Received: from b04f130c83f2.jf.intel.com ([10.165.154.98])
	by fmsmga007.fm.intel.com with ESMTP; 14 Jun 2022 15:26:11 -0700
From: Tim Chen
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Tim Chen ,
	Wei Xu ,
	Huang Ying ,
	Greg Thelen ,
	Yang Shi ,
	Davidlohr Bueso ,
	Brice Goglin ,
	Michal Hocko ,
	Linux Kernel Mailing List ,
	Hesham Almatary ,
	Dave Hansen ,
	Jonathan Cameron ,
	Alistair Popple ,
	Dan Williams ,
	Feng Tang ,
	Jagdish Gediya ,
	Baolin Wang ,
	David Rientjes ,
	"Aneesh Kumar K . V" ,
	Shakeel Butt
Subject: [RFC PATCH 1/3] mm/memory-tiers: Add functions for tier memory usage in a cgroup
Date: Tue, 14 Jun 2022 15:25:33 -0700
Message-Id: <94a10c772c50d378b81dad654476551bfc50e3d7.1655242024.git.tim.c.chen@linux.intel.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Add functions that report a cgroup's memory usage per memory tier.
This prepares for exposing the usage through a cgroup interface file
and for controlling a cgroup's top-tier memory usage.

This patch introduces the tiered memory usage query interface and a
simple implementation that walks the online nodes. A more efficient
implementation to get toptier memory usage is introduced in the next
patch.
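For illustration only (not part of the patch), the simple per-tier
query can be modelled in user space roughly as follows. This is a
minimal sketch: struct node_usage and tier_usage() are hypothetical
stand-ins, not kernel interfaces, and the per-node page counts are
assumed to be already known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in; not a kernel type. */
struct node_usage {
	int tier;            /* tier id the node belongs to */
	unsigned long pages; /* pages the cgroup holds on this node */
};

/* Sum the pages on every node that belongs to the requested tier. */
static unsigned long tier_usage(const struct node_usage *nodes, size_t n,
				int tier)
{
	unsigned long usage = 0;
	size_t i;

	assert(nodes != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (nodes[i].tier == tier)
			usage += nodes[i].pages;
	}
	return usage;
}
```

The cost of this approach grows with the number of nodes on every
query, which is the motivation for keeping a dedicated counter instead.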
---
 include/linux/memory-tiers.h |  2 ++
 mm/memcontrol.c              | 35 +++++++++++++++++++++++++++++++++++
 mm/memory-tiers.c            |  3 ++-
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index de4098f6d5d5..1177dcbbdeda 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -31,6 +31,8 @@ struct memory_tier {
 };
 
 extern bool numa_demotion_enabled;
+extern struct list_head memory_tiers;
+
 int node_create_and_set_memory_tier(int node, int tier);
 int next_demotion_node(int node);
 int node_set_memory_tier(int node, int tier);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index abec50f31fe6..2f6e95e6d200 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -63,6 +63,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 #include
 #include
@@ -3921,6 +3922,40 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v)
 
 	return 0;
 }
+
+unsigned long mem_cgroup_memtier_usage(struct mem_cgroup *memcg,
+				       struct memory_tier *memtier)
+{
+	int node;
+	struct memory_tier *node_tier;
+	unsigned long usage = 0;
+
+	if (!memcg)
+		return 0;
+
+	rcu_read_lock();
+	for_each_online_node(node) {
+		node_tier = node_get_memory_tier(node);
+		if (node_tier == memtier)
+			usage += mem_cgroup_node_nr_lru_pages(memcg, node,
+							      LRU_ALL, true);
+		node_put_memory_tier(node_tier);
+	}
+	rcu_read_unlock();
+	return usage;
+}
+
+unsigned long mem_cgroup_toptier_usage(struct mem_cgroup *memcg)
+{
+	struct memory_tier *top_tier;
+
+	top_tier = list_first_entry(&memory_tiers, struct memory_tier, list);
+	if (top_tier)
+		return mem_cgroup_memtier_usage(memcg, top_tier);
+	else
+		return 0;
+}
+
 #endif /* CONFIG_NUMA */
 
 static const unsigned int memcg1_stats[] = {
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 0dae3114e22c..d552ac1e9d57 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -16,7 +16,8 @@ struct demotion_nodes {
 #define to_memory_tier(device) container_of(device, struct memory_tier, dev)
 static void establish_migration_targets(void);
 static DEFINE_MUTEX(memory_tier_lock);
-static LIST_HEAD(memory_tiers);
+LIST_HEAD(memory_tiers);
+EXPORT_SYMBOL(memory_tiers);
 static int top_tier_rank;
 /*
  * node_demotion[] examples:
-- 
2.35.1

From nobody Mon Apr 27 09:11:33 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 67B5DC43334
	for ; Tue, 14 Jun 2022 22:26:22 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1344866AbiFNW0T (ORCPT );
	Tue, 14 Jun 2022 18:26:19 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49354 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1344171AbiFNW0O (ORCPT );
	Tue, 14 Jun 2022 18:26:14 -0400
Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6F26C506C5
	for ; Tue, 14 Jun 2022 15:26:13 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
	d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
	t=1655245573; x=1686781573;
	h=from:to:cc:subject:date:message-id:in-reply-to:
	 references:mime-version:content-transfer-encoding;
	bh=1JcRJtvf1cqzATPkll8v8tW75scjLiK/F3DEZ16TkYw=;
	b=Ubybomb36vgEKGJfYNWM50CzIwywrqNDU4nc5ad/qSHfcnK0K29z156g
	 yCB3iGXtgYuC9SiDoLydio4Os4Mrx2bfG9QwpC4nmH0eoMLXUfRdFrNPj
	 IJ4YezAuwfRgwsKcCnSboaCgGNR6N0FvdTs0OU6DDRLlcGEYUVOTusDeX
	 ErGHcyZu1YPGbeah4BqSfzNRiSoVENNCFV4DIl5qvn+tI3Nu6qVkUWJYt
	 B3RF9Sf0myRTSdAcf2Ck3E2N4q+MyeE27Urf++c9Y60qdiNcGSn/fvvP2
	 nNTKmushInDWcM9C7tY869ADhvCtdaz7RSJ44+6ms+XpBTfQcXUYw5OgJ
	 g==;
X-IronPort-AV: E=McAfee;i="6400,9594,10378"; a="340422017"
X-IronPort-AV: E=Sophos;i="5.91,300,1647327600";
	d="scan'208";a="340422017"
Received: from fmsmga007.fm.intel.com ([10.253.24.52])
	by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	14 Jun 2022 15:26:13 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.91,300,1647327600";
	d="scan'208";a="588724728"
Received: from b04f130c83f2.jf.intel.com ([10.165.154.98])
	by fmsmga007.fm.intel.com with ESMTP; 14 Jun 2022 15:26:12 -0700
From: Tim Chen
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Tim Chen ,
	Wei Xu ,
	Huang Ying ,
	Greg Thelen ,
	Yang Shi ,
	Davidlohr Bueso ,
	Brice Goglin ,
	Michal Hocko ,
	Linux Kernel Mailing List ,
	Hesham Almatary ,
	Dave Hansen ,
	Jonathan Cameron ,
	Alistair Popple ,
	Dan Williams ,
	Feng Tang ,
	Jagdish Gediya ,
	Baolin Wang ,
	David Rientjes ,
	"Aneesh Kumar K . V" ,
	Shakeel Butt
Subject: [RFC PATCH 2/3] mm/memory-tiers: Use page counter to track toptier memory usage
Date: Tue, 14 Jun 2022 15:25:34 -0700
Message-Id:
X-Mailer: git-send-email 2.32.0
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

If we need to restrict toptier memory usage for a cgroup, we need to
retrieve usage of toptier memory efficiently. Add a page counter to
track toptier memory usage directly so its value can be returned right
away.
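For illustration only (not part of the patch), the idea of a counter
that is charged at accounting time and read in constant time can be
sketched in user space as below. The names struct counter,
counter_charge(), counter_uncharge() and toptier_account() are
hypothetical; the sketch leaves out the atomics, limits and watermarks
of the kernel's struct page_counter and only models propagation to
parent counters and the signed charge/uncharge split.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified hierarchical counter (no atomics, no limits). */
struct counter {
	unsigned long usage;
	struct counter *parent; /* NULL for the root */
};

static void counter_init(struct counter *c, struct counter *parent)
{
	c->usage = 0;
	c->parent = parent;
}

/* Charging walks up the hierarchy so every ancestor sees the pages. */
static void counter_charge(struct counter *c, unsigned long nr_pages)
{
	for (; c; c = c->parent)
		c->usage += nr_pages;
}

static void counter_uncharge(struct counter *c, unsigned long nr_pages)
{
	for (; c; c = c->parent) {
		assert(c->usage >= nr_pages); /* underflow is a caller bug */
		c->usage -= nr_pages;
	}
}

/* Signed helper: positive values charge, negative values uncharge. */
static void toptier_account(struct counter *c, int node_is_top, long nr_pages)
{
	if (!c || !node_is_top)
		return;

	if (nr_pages >= 0)
		counter_charge(c, (unsigned long)nr_pages);
	else
		counter_uncharge(c, (unsigned long)(-nr_pages));
}
```

Reading child.usage or root.usage afterwards needs no walk over the
nodes, at the price of one extra counter update on each charge path.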
---
 include/linux/memcontrol.h |  1 +
 mm/memcontrol.c            | 50 ++++++++++++++++++++++++++++++++------
 2 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9ecead1042b9..b4f727cba1de 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -241,6 +241,7 @@ struct mem_cgroup {
 
 	/* Accounted resources */
 	struct page_counter memory;		/* Both v1 & v2 */
+	struct page_counter toptier;
 
 	union {
 		struct page_counter swap;	/* v2 only */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2f6e95e6d200..2f20ec2712b8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -848,6 +848,23 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 	__this_cpu_add(memcg->vmstats_percpu->nr_page_events, nr_pages);
 }
 
+static inline void mem_cgroup_charge_toptier(struct mem_cgroup *memcg,
+					     int nid,
+					     int nr_pages)
+{
+	if (!node_is_toptier(nid) || !memcg)
+		return;
+
+	if (nr_pages >= 0) {
+		page_counter_charge(&memcg->toptier,
+				(unsigned long) nr_pages);
+	} else {
+		nr_pages = -nr_pages;
+		page_counter_uncharge(&memcg->toptier,
+				(unsigned long) nr_pages);
+	}
+}
+
 static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 				       enum mem_cgroup_events_target target)
 {
@@ -3027,6 +3044,8 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 		if (!ret) {
 			page->memcg_data = (unsigned long)objcg |
 				MEMCG_DATA_KMEM;
+			mem_cgroup_charge_toptier(page_memcg(page),
+					page_to_nid(page), 1 << order);
 			return 0;
 		}
 		obj_cgroup_put(objcg);
@@ -3050,6 +3069,8 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
 
 	objcg = __folio_objcg(folio);
 	obj_cgroup_uncharge_pages(objcg, nr_pages);
+	mem_cgroup_charge_toptier(page_memcg(page),
+			page_to_nid(page), -nr_pages);
 	folio->memcg_data = 0;
 	obj_cgroup_put(objcg);
 }
@@ -3947,13 +3968,10 @@ unsigned long mem_cgroup_memtier_usage(struct mem_cgroup *memcg,
 
 unsigned long mem_cgroup_toptier_usage(struct mem_cgroup *memcg)
 {
-	struct memory_tier *top_tier;
-
-	top_tier = list_first_entry(&memory_tiers, struct memory_tier, list);
-	if (top_tier)
-		return mem_cgroup_memtier_usage(memcg, top_tier);
-	else
+	if (!memcg)
 		return 0;
+
+	return page_counter_read(&memcg->toptier);
 }
 
 #endif /* CONFIG_NUMA */
@@ -5228,11 +5246,13 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		memcg->oom_kill_disable = parent->oom_kill_disable;
 
 		page_counter_init(&memcg->memory, &parent->memory);
+		page_counter_init(&memcg->toptier, &parent->toptier);
 		page_counter_init(&memcg->swap, &parent->swap);
 		page_counter_init(&memcg->kmem, &parent->kmem);
 		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
 	} else {
 		page_counter_init(&memcg->memory, NULL);
+		page_counter_init(&memcg->toptier, NULL);
 		page_counter_init(&memcg->swap, NULL);
 		page_counter_init(&memcg->kmem, NULL);
 		page_counter_init(&memcg->tcpmem, NULL);
@@ -5678,6 +5698,8 @@ static int mem_cgroup_move_account(struct page *page,
 	memcg_check_events(to, nid);
 	mem_cgroup_charge_statistics(from, -nr_pages);
 	memcg_check_events(from, nid);
+	mem_cgroup_charge_toptier(to, nid, nr_pages);
+	mem_cgroup_charge_toptier(from, nid, -nr_pages);
 	local_irq_enable();
 out_unlock:
 	folio_unlock(folio);
@@ -6761,6 +6783,7 @@ static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
 
 	local_irq_disable();
 	mem_cgroup_charge_statistics(memcg, nr_pages);
+	mem_cgroup_charge_toptier(memcg, folio_nid(folio), nr_pages);
 	memcg_check_events(memcg, folio_nid(folio));
 	local_irq_enable();
 out:
@@ -6853,6 +6876,7 @@ struct uncharge_gather {
 	unsigned long nr_memory;
 	unsigned long pgpgout;
 	unsigned long nr_kmem;
+	unsigned long nr_toptier;
 	int nid;
 };
 
@@ -6867,6 +6891,7 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 
 	if (ug->nr_memory) {
 		page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
+		page_counter_uncharge(&ug->memcg->toptier, ug->nr_toptier);
 		if (do_memsw_account())
 			page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
 		if (ug->nr_kmem)
@@ -6929,12 +6954,18 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 		ug->nr_memory += nr_pages;
 		ug->nr_kmem += nr_pages;
 
+		if (node_is_toptier(folio_nid(folio)))
+			ug->nr_toptier += nr_pages;
+
 		folio->memcg_data = 0;
 		obj_cgroup_put(objcg);
 	} else {
 		/* LRU pages aren't accounted at the root level */
-		if (!mem_cgroup_is_root(memcg))
+		if (!mem_cgroup_is_root(memcg)) {
 			ug->nr_memory += nr_pages;
+			if (node_is_toptier(folio_nid(folio)))
+				ug->nr_toptier += nr_pages;
+		}
 		ug->pgpgout++;
 
 		folio->memcg_data = 0;
@@ -7011,6 +7042,7 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
 	/* Force-charge the new page. The old one will be freed soon */
 	if (!mem_cgroup_is_root(memcg)) {
 		page_counter_charge(&memcg->memory, nr_pages);
+		mem_cgroup_charge_toptier(memcg, folio_nid(new), nr_pages);
 		if (do_memsw_account())
 			page_counter_charge(&memcg->memsw, nr_pages);
 	}
@@ -7231,8 +7263,10 @@ void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry)
 
 	folio->memcg_data = 0;
 
-	if (!mem_cgroup_is_root(memcg))
+	if (!mem_cgroup_is_root(memcg)) {
 		page_counter_uncharge(&memcg->memory, nr_entries);
+		mem_cgroup_charge_toptier(memcg, folio_nid(folio), -nr_entries);
+	}
 
 	if (!cgroup_memory_noswap && memcg != swap_memcg) {
 		if (!mem_cgroup_is_root(swap_memcg))
-- 
2.35.1

From nobody Mon Apr 27 09:11:33 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 37818C433EF
	for ; Tue, 14 Jun 2022 22:26:25 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1345212AbiFNW0X (ORCPT );
	Tue, 14 Jun 2022 18:26:23 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49412 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1344400AbiFNW0Q (ORCPT );
	Tue, 14 Jun 2022 18:26:16 -0400
Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9200550E21
	for ; Tue, 14 Jun 2022 15:26:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
	d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
	t=1655245574; x=1686781574;
	h=from:to:cc:subject:date:message-id:in-reply-to:
	 references:mime-version:content-transfer-encoding;
	bh=1DtxFR7jESUt0Tg8UteAbb/krwgxdAsTgRC8kfkXIEY=;
	b=acxCcwLoL1PZQU88P6J2MFdxse+tTVFE2kNcSyvoZBqsjVnZdyUBh9eX
	 G/pQM5d7Ah3BaXXHM316mXlwVQhgLNmxFyNAZI4HENRkbfd2eDzm/ujMb
	 j0Xd3HbYH15t6dnvEizbBh6hzefEl9SpC2qb5/cr8BNKfUbxPhay20U5D
	 G5Yl17mYO6eclhadne95usyJHjvURTNu2QfpCgFJwuRAvC1ABQugywN6X
	 Y53oIRPYCOXjwuV2+GU9OOhGINqVe4E/IgpqsXVUTkNzVSfbLzQX2qI3y
	 jab2D/FU5znRSkMhYeyGCHmDiXUPPziiTTfPsBNuy6hsTOqEh3ZB3JNx9
	 w==;
X-IronPort-AV: E=McAfee;i="6400,9594,10378"; a="340422024"
X-IronPort-AV: E=Sophos;i="5.91,300,1647327600";
	d="scan'208";a="340422024"
Received: from fmsmga007.fm.intel.com ([10.253.24.52])
	by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
	14 Jun 2022 15:26:14 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.91,300,1647327600";
	d="scan'208";a="588724739"
Received: from b04f130c83f2.jf.intel.com ([10.165.154.98])
	by fmsmga007.fm.intel.com with ESMTP; 14 Jun 2022 15:26:13 -0700
From: Tim Chen
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Tim Chen ,
	Wei Xu ,
	Huang Ying ,
	Greg Thelen ,
	Yang Shi ,
	Davidlohr Bueso ,
	Brice Goglin ,
	Michal Hocko ,
	Linux Kernel Mailing List ,
	Hesham Almatary ,
	Dave Hansen ,
	Jonathan Cameron ,
	Alistair Popple ,
	Dan Williams ,
	Feng Tang ,
	Jagdish Gediya ,
	Baolin Wang ,
	David Rientjes ,
	"Aneesh Kumar K . V" ,
	Shakeel Butt
Subject: [RFC PATCH 3/3] mm/memory-tiers: Show toptier memory usage for cgroup
Date: Tue, 14 Jun 2022 15:25:35 -0700
Message-Id: <766f5858d5427c0345a7bd55a16733b39022095c.1655242024.git.tim.c.chen@linux.intel.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Show toptier memory usage for a cgroup in a new cgroup interface file:

	/sys/fs/cgroup//memory.toptier

The file reports the usage in bytes and is not present in the root
cgroup.
---
 mm/memcontrol.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2f20ec2712b8..5fd1e3b686cd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6205,6 +6205,14 @@ static u64 memory_current_read(struct cgroup_subsys_state *css,
 	return (u64)page_counter_read(&memcg->memory) * PAGE_SIZE;
 }
 
+static u64 memory_toptier_read(struct cgroup_subsys_state *css,
+			       struct cftype *cft)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+	return (u64)mem_cgroup_toptier_usage(memcg) * PAGE_SIZE;
+}
+
 static u64 memory_peak_read(struct cgroup_subsys_state *css,
 			    struct cftype *cft)
 {
@@ -6516,6 +6524,11 @@ static struct cftype memory_files[] = {
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.read_u64 = memory_current_read,
 	},
+	{
+		.name = "toptier",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = memory_toptier_read,
+	},
 	{
 		.name = "peak",
 		.flags = CFTYPE_NOT_ON_ROOT,
-- 
2.35.1
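
For illustration only (not part of the series), a user-space consumer
of such a single-value file might parse its contents as sketched
below. The helper name parse_cgroup_u64() is hypothetical, and the
sketch assumes the file holds one decimal byte count followed by a
newline, as cgroup files backed by .read_u64 normally do. Opening and
reading the file is left to the caller because the cgroup path depends
on the system.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse the text of a single-value cgroup file such as "8192\n".
 * Returns 0 and stores the value in *bytes on success, -1 on
 * malformed or out-of-range input. Hypothetical helper.
 */
static int parse_cgroup_u64(const char *text, unsigned long long *bytes)
{
	char *end;
	unsigned long long value;

	assert(bytes != NULL);

	/* Reject empty input, signs, whitespace and keywords like "max". */
	if (!text || *text < '0' || *text > '9')
		return -1;

	errno = 0;
	value = strtoull(text, &end, 10);
	if (errno == ERANGE)
		return -1;

	/* Only a trailing newline (or nothing) may follow the digits. */
	if (*end != '\0' && *end != '\n')
		return -1;

	*bytes = value;
	return 0;
}
```

Dividing the parsed value by the page size, for example the result of
sysconf(_SC_PAGESIZE), gives the usage in pages again.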