From nobody Sat Feb 7 21:24:23 2026
From: Akinobu Mita
To: akinobu.mita@gmail.com
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, akpm@linux-foundation.org,
    axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
    hannes@cmpxchg.org, david@kernel.org, mhocko@kernel.org,
    zhengqi.arch@bytedance.com, shakeel.butt@linux.dev,
    lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
    rppt@kernel.org, surenb@google.com
Subject: [PATCH v2 1/3] mm: memory-tiers, numa_emu: enable to create memory
    tiers using fake numa nodes
Date: Mon, 22 Dec 2025 09:48:32 +0900
Message-ID: <20251222004834.10539-2-akinobu.mita@gmail.com>
In-Reply-To: <20251222004834.10539-1-akinobu.mita@gmail.com>
References: <20251222004834.10539-1-akinobu.mita@gmail.com>

This makes it possible to create memory tiers using fake NUMA nodes
generated by NUMA emulation.

The "numa_emulation.adistance=" kernel cmdline option allows you to set
the abstract distance for each NUMA node.

For example, you can create two fake nodes, each in a different memory
tier by booting with "numa=fake=2 numa_emulation.adistance=576,704".
Here, the abstract distances of node0 and node1 are set to 576 and 704,
respectively.

Each memory tier covers an abstract distance chunk size of 128. Thus,
nodes with abstract distances between 512 and 639 are classified into the
same memory tier, and nodes with abstract distances between 640 and 767
are classified into the next slower memory tier.

The abstract distance of fake nodes not specified in the parameter will
be the default DRAM abstract distance of 576.

Signed-off-by: Akinobu Mita
---
v2:
- fix the explanation about cmdline parameter in the commit log

 mm/numa_emulation.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/numa_emulation.c b/mm/numa_emulation.c
index 703c8fa05048..a4266da21344 100644
--- a/mm/numa_emulation.c
+++ b/mm/numa_emulation.c
@@ -6,6 +6,9 @@
 #include
 #include
 #include
+#include
+#include
+#include
 #include
 #include
 #include
@@ -344,6 +347,27 @@ static int __init setup_emu2phys_nid(int *dfl_phys_nid)
 	return max_emu_nid;
 }
 
+static int adistance[MAX_NUMNODES];
+module_param_array(adistance, int, NULL, 0400);
+MODULE_PARM_DESC(adistance, "Abstract distance values for each NUMA node");
+
+static int emu_calculate_adistance(struct notifier_block *self,
+				   unsigned long nid, void *data)
+{
+	if (adistance[nid]) {
+		int *adist = data;
+
+		*adist = adistance[nid];
+		return NOTIFY_STOP;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block emu_adist_nb = {
+	.notifier_call = emu_calculate_adistance,
+	.priority = INT_MIN,
+};
+
 /**
  * numa_emulation - Emulate NUMA nodes
  * @numa_meminfo: NUMA configuration to massage
@@ -532,6 +556,8 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 		}
 	}
 
+	register_mt_adistance_algorithm(&emu_adist_nb);
+
 	/* free the copied physical distance table */
 	memblock_free(phys_dist, phys_size);
 	return;
-- 
2.43.0

From nobody Sat Feb 7 21:24:23 2026
From: Akinobu Mita
To: akinobu.mita@gmail.com
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, akpm@linux-foundation.org,
    axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
    hannes@cmpxchg.org, david@kernel.org, mhocko@kernel.org,
    zhengqi.arch@bytedance.com, shakeel.butt@linux.dev,
    lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
    rppt@kernel.org, surenb@google.com
Subject: [PATCH v2 2/3] mm: numa_emu: add document for NUMA emulation
Date: Mon, 22 Dec 2025 09:48:33 +0900
Message-ID: <20251222004834.10539-3-akinobu.mita@gmail.com>
In-Reply-To: <20251222004834.10539-1-akinobu.mita@gmail.com>
References: <20251222004834.10539-1-akinobu.mita@gmail.com>

Add a document with a brief explanation of numa emulation and how to use
the newly added "numa_emulation.adistance=" kernel cmdline parameter.

Signed-off-by: Akinobu Mita
---
New in v2

 Documentation/mm/index.rst          |  1 +
 Documentation/mm/numa_emulation.rst | 30 +++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)
 create mode 100644 Documentation/mm/numa_emulation.rst

diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst
index 7aa2a8886908..7d628edd6a17 100644
--- a/Documentation/mm/index.rst
+++ b/Documentation/mm/index.rst
@@ -24,6 +24,7 @@ see the :doc:`admin guide <../admin-guide/mm/index>`.
    page_cache
    shmfs
    oom
+   numa_emulation
 
 Unsorted Documentation
 ======================
diff --git a/Documentation/mm/numa_emulation.rst b/Documentation/mm/numa_emulation.rst
new file mode 100644
index 000000000000..dce9f607c031
--- /dev/null
+++ b/Documentation/mm/numa_emulation.rst
@@ -0,0 +1,30 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+NUMA emulation
+==============
+
+If CONFIG_NUMA_EMU is enabled, you can create fake NUMA nodes with
+the ``numa=fake=`` kernel cmdline option.
+See Documentation/admin-guide/kernel-parameters.txt and
+Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst for more information.
+
+
+Multiple Memory Tiers Creation
+==============================
+
+The "numa_emulation.adistance=" kernel cmdline option allows you to set
+the abstract distance for each NUMA node.
+
+For example, you can create two fake nodes, each in a different memory
+tier by booting with "numa=fake=2 numa_emulation.adistance=576,704".
+Here, the abstract distances of node0 and node1 are set to 576 and 704,
+respectively.
+
+Each memory tier covers an abstract distance chunk size of 128. Thus,
+nodes with abstract distances between 512 and 639 are classified into the
+same memory tier, and nodes with abstract distances between 640 and 767
+are classified into the next slower memory tier.
+
+The abstract distance of fake nodes not specified in the parameter will
+be the default DRAM abstract distance of 576.
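
The tier arithmetic described in the document above can be checked with a
small userspace sketch. It is not kernel code: the macro and function names
are hypothetical, and the only fact taken from the text is that each memory
tier spans 128 abstract-distance units.

```c
#include <assert.h>

/* Assumption: a tier covers 128 (2^7) abstract-distance units, as the
 * document states. These names are illustrative, not kernel symbols. */
#define TIER_CHUNK_BITS 7
#define TIER_CHUNK_SIZE (1 << TIER_CHUNK_BITS)

/* First abstract distance of the tier that contains adistance. */
static int tier_range_start(int adistance)
{
	assert(adistance >= 0);
	return adistance & ~(TIER_CHUNK_SIZE - 1);
}

/* Last abstract distance of the tier that contains adistance. */
static int tier_range_end(int adistance)
{
	return tier_range_start(adistance) + TIER_CHUNK_SIZE - 1;
}

/* Two nodes share a tier when they fall into the same 128-wide chunk. */
static int same_tier(int a, int b)
{
	return tier_range_start(a) == tier_range_start(b);
}
```

With the example values, tier_range_start(576) is 512 and
tier_range_start(704) is 640, so same_tier(576, 704) is false and the two
fake nodes would land in different tiers.
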
-- 
2.43.0

From nobody Sat Feb 7 21:24:23 2026
From: Akinobu Mita
To: akinobu.mita@gmail.com
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, akpm@linux-foundation.org,
    axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
    hannes@cmpxchg.org, david@kernel.org, mhocko@kernel.org,
    zhengqi.arch@bytedance.com, shakeel.butt@linux.dev,
    lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
    rppt@kernel.org, surenb@google.com
Subject: [PATCH v2 3/3] mm/vmscan: don't demote if there is not enough free
    memory in the lower memory tier
Date: Mon, 22 Dec 2025 09:48:34 +0900
Message-ID: <20251222004834.10539-4-akinobu.mita@gmail.com>
In-Reply-To: <20251222004834.10539-1-akinobu.mita@gmail.com>
References: <20251222004834.10539-1-akinobu.mita@gmail.com>

On systems with multiple memory-tiers consisting of DRAM and CXL memory,
the OOM killer is not invoked properly.

Here's the command to reproduce:

$ sudo swapoff -a
$ stress-ng --oomable -v --memrate 20 --memrate-bytes 10G \
    --memrate-rd-mbs 1 --memrate-wr-mbs 1

The memory usage is the number of workers specified with the --memrate
option multiplied by the buffer size specified with the --memrate-bytes
option, so please adjust these values so that the total exceeds the
combined size of the installed DRAM and CXL memory.

If swap is disabled, you can usually expect the OOM killer to terminate
the stress-ng process when memory usage approaches the installed memory
size.

However, if multiple memory tiers exist (multiple
/sys/devices/virtual/memory_tiering/memory_tier directories exist) and
/sys/kernel/mm/numa/demotion_enabled is true, the OOM killer will not be
invoked and the system will become inoperable, regardless of whether
MGLRU is enabled or not.

This issue can be reproduced using NUMA emulation even on systems with
only DRAM. You can create two fake memory tiers by booting a single-node
system with the "numa=fake=2 numa_emulation.adistance=576,704" kernel
parameters.

The cause is that memory allocations do not directly trigger the OOM
killer: reclaim assumes that if the target node has a lower memory tier,
its pages can always be reclaimed by demotion.

This change avoids the issue by not attempting demotion when the demotion
target node has less free memory than the minimum watermark, so that the
OOM killer is triggered directly from memory allocations.

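The decision described above can be modelled in userspace as a rough
sketch. This is not the kernel implementation: struct model_zone and
model_can_demote() are hypothetical names, and the comparison ignores the
lowmem reserves and allocation flags that the real watermark check takes
into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a zone: only the two counters
 * needed for the min-watermark comparison are modelled. */
struct model_zone {
	unsigned long free_pages;
	unsigned long min_wmark;
};

/* Returns true if the demotion target node is allowed and at least one of
 * its zones still has free pages above its min watermark. */
static bool model_can_demote(const struct model_zone *zones, size_t nr_zones,
			     bool node_allowed)
{
	size_t i;

	assert(zones != NULL || nr_zones == 0);

	if (!node_allowed)
		return false;

	for (i = 0; i < nr_zones; i++) {
		if (zones[i].free_pages > zones[i].min_wmark)
			return true;
	}
	return false;
}
```

When every zone of the lower tier is at or below its min watermark, the
model returns false, which corresponds to reclaim no longer counting on
demotion and the allocation path being able to reach the OOM killer.
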
Signed-off-by: Akinobu Mita
---
v2:
- describe reproducibility with !mglru in the commit log
- removed unnecessary consideration for scan control when checking
  demotion_nid watermarks

 mm/vmscan.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 76e9864447cc..0362026e66a5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -356,7 +356,18 @@ static bool can_demote(int nid, struct scan_control *sc,
 		return false;
 
 	/* If demotion node isn't in the cgroup's mems_allowed, fall back */
-	return mem_cgroup_node_allowed(memcg, demotion_nid);
+	if (mem_cgroup_node_allowed(memcg, demotion_nid)) {
+		int z;
+		struct zone *zone;
+		struct pglist_data *pgdat = NODE_DATA(demotion_nid);
+
+		for_each_managed_zone_pgdat(zone, pgdat, z, MAX_NR_ZONES - 1) {
+			if (zone_watermark_ok(zone, 0, min_wmark_pages(zone),
+					      ZONE_MOVABLE, 0))
+				return true;
+		}
+	}
+	return false;
 }
 
 static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
-- 
2.43.0