From nobody Fri Dec 19 11:29:20 2025
From: Akinobu Mita
To: akinobu.mita@gmail.com
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, akpm@linux-foundation.org, axelrasmussen@google.com,
 yuanchu@google.com, weixugc@google.com, hannes@cmpxchg.org,
 david@kernel.org, mhocko@kernel.org, zhengqi.arch@bytedance.com,
 shakeel.butt@linux.dev, lorenzo.stoakes@oracle.com,
 Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
 surenb@google.com
Subject: [PATCH 1/2] mm: memory-tiers, numa_emu: enable to create memory tiers using fake numa nodes
Date: Mon, 8 Dec 2025 18:40:27 +0900
Message-ID: <20251208094028.214949-2-akinobu.mita@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251208094028.214949-1-akinobu.mita@gmail.com>
References: <20251208094028.214949-1-akinobu.mita@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

This makes it possible to create memory tiers using fake NUMA nodes
generated by NUMA emulation.

The new "numa_emulation.adistance" kernel parameter sets the abstract
distance for each NUMA node.

For example, if the system is booted with the parameters
"numa=fake=2 numa_emulation.adistance=576,704", it will configure memory
tiers with node0 having the default DRAM adistance value (576) and node1
having a higher adistance value (704), which places node1 in a lower
memory tier.

Signed-off-by: Akinobu Mita
---
 mm/numa_emulation.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/numa_emulation.c b/mm/numa_emulation.c
index 703c8fa05048..a4266da21344 100644
--- a/mm/numa_emulation.c
+++ b/mm/numa_emulation.c
@@ -6,6 +6,9 @@
 #include
 #include
 #include
+#include
+#include
+#include
 #include
 #include
 #include
@@ -344,6 +347,27 @@ static int __init setup_emu2phys_nid(int *dfl_phys_nid)
 	return max_emu_nid;
 }
 
+static int adistance[MAX_NUMNODES];
+module_param_array(adistance, int, NULL, 0400);
+MODULE_PARM_DESC(adistance, "Abstract distance values for each NUMA node");
+
+static int emu_calculate_adistance(struct notifier_block *self,
+				   unsigned long nid, void *data)
+{
+	if (adistance[nid]) {
+		int *adist = data;
+
+		*adist = adistance[nid];
+		return NOTIFY_STOP;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block emu_adist_nb = {
+	.notifier_call = emu_calculate_adistance,
+	.priority = INT_MIN,
+};
+
 /**
  * numa_emulation - Emulate NUMA nodes
  * @numa_meminfo: NUMA configuration to massage
@@ -532,6 +556,8 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 		}
 	}
 
+	register_mt_adistance_algorithm(&emu_adist_nb);
+
 	/* free the copied physical distance table */
 	memblock_free(phys_dist, phys_size);
 	return;
-- 
2.43.0

From nobody Fri Dec 19 11:29:20 2025
From: Akinobu Mita
To: akinobu.mita@gmail.com
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, akpm@linux-foundation.org, axelrasmussen@google.com,
 yuanchu@google.com, weixugc@google.com, hannes@cmpxchg.org,
 david@kernel.org, mhocko@kernel.org, zhengqi.arch@bytedance.com,
 shakeel.butt@linux.dev, lorenzo.stoakes@oracle.com,
 Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
 surenb@google.com
Subject: [PATCH 2/2] mm/vmscan: don't demote if there is not enough free memory in the lower memory tier
Date: Mon, 8 Dec 2025 18:40:28 +0900
Message-ID: <20251208094028.214949-3-akinobu.mita@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251208094028.214949-1-akinobu.mita@gmail.com>
References: <20251208094028.214949-1-akinobu.mita@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

On systems with multiple memory-tiers consisting of DRAM and CXL memory,
the OOM killer is not invoked properly.

Here's the command to reproduce:

$ sudo swapoff -a
$ stress-ng --oomable -v --memrate 20 --memrate-bytes 10G \
  --memrate-rd-mbs 1 --memrate-wr-mbs 1

The memory usage is the number of workers specified with the --memrate
option multiplied by the buffer size specified with the --memrate-bytes
option, so please adjust it so that it exceeds the total size of the
installed DRAM and CXL memory.

If swap is disabled, you can usually expect the OOM killer to terminate
the stress-ng process when memory usage approaches the installed memory
size.

However, if multiple memory-tiers exist (multiple
/sys/devices/virtual/memory_tiering/memory_tier directories exist),
and /sys/kernel/mm/numa/demotion_enabled is true and
/sys/kernel/mm/lru_gen/min_ttl_ms is 0, the OOM killer will not be invoked
and the system will become inoperable.

This issue can be reproduced using NUMA emulation even on systems with
only DRAM.
You can create two fake memory tiers by booting a single-node system with
the "numa=fake=2 numa_emulation.adistance=576,704" kernel parameters.

The reason for this issue is that if the target node for allocation has an
underlying memory tier, it is always assumed that its memory can be
reclaimed via demotion.

So this change avoids this issue by not attempting to demote if the
demotion target node has less free memory than the minimum watermark.

Signed-off-by: Akinobu Mita
---
 mm/vmscan.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index fddd168a9737..f4748f258294 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -356,7 +356,20 @@ static bool can_demote(int nid, struct scan_control *sc,
 		return false;
 
 	/* If demotion node isn't in the cgroup's mems_allowed, fall back */
-	return mem_cgroup_node_allowed(memcg, demotion_nid);
+	if (mem_cgroup_node_allowed(memcg, demotion_nid)) {
+		int z;
+		struct zone *zone;
+		struct pglist_data *pgdat = NODE_DATA(demotion_nid);
+		unsigned int highest_zoneidx = sc ? sc->reclaim_idx : MAX_NR_ZONES - 1;
+		int order = sc ? sc->order : 0;
+
+		for_each_managed_zone_pgdat(zone, pgdat, z, highest_zoneidx) {
+			if (zone_watermark_ok(zone, order, min_wmark_pages(zone),
+					      highest_zoneidx, 0))
+				return true;
+		}
+	}
+	return false;
 }
 
 static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
-- 
2.43.0
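
The decision that the can_demote() change in patch 2/2 introduces can be
modeled outside the kernel. The following is a minimal userspace sketch,
not kernel code. The names struct model_zone, struct model_node,
model_zone_above_min() and model_can_demote() are hypothetical and exist
only for illustration. The watermark test is an assumed simplification: it
covers order-0 requests only and ignores lowmem reserves and the other
adjustments the kernel's zone_watermark_ok() makes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel zone; this is NOT struct zone. */
struct model_zone {
	unsigned long free_pages;	/* pages currently free in the zone */
	unsigned long min_wmark;	/* minimum watermark, in pages */
	bool managed;			/* false models a zone with no managed pages */
};

/* Hypothetical stand-in for the demotion target node. */
struct model_node {
	const struct model_zone *zones;
	size_t nr_zones;
};

/*
 * Simplified watermark test (assumption: order 0, no lowmem reserve).
 * A zone passes only if its free pages are strictly above the minimum.
 */
static bool model_zone_above_min(const struct model_zone *z)
{
	return z->free_pages > z->min_wmark;
}

/*
 * Model of the patched decision: demotion is considered possible only if
 * the target node is allowed AND at least one managed zone, up to and
 * including highest_zoneidx, is above its minimum watermark.
 */
static bool model_can_demote(bool node_allowed,
			     const struct model_node *target,
			     size_t highest_zoneidx)
{
	assert(target != NULL);

	if (!node_allowed)
		return false;

	for (size_t i = 0; i < target->nr_zones && i <= highest_zoneidx; i++) {
		const struct model_zone *z = &target->zones[i];

		if (z->managed && model_zone_above_min(z))
			return true;
	}

	/* Every eligible zone is at or below its minimum watermark. */
	return false;
}
```

In this model, a target node whose eligible zones are all at or below
their minimum watermark makes model_can_demote() return false. That
corresponds to the case the commit message describes, where reclaim stops
assuming that demotion can free memory, so the OOM killer can be reached.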