From nobody Sat Feb 7 10:08:28 2026 Received: from mail-pj1-f47.google.com (mail-pj1-f47.google.com [209.85.216.47]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 847F237BE7E for ; Tue, 13 Jan 2026 08:15:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.47 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768292131; cv=none; b=Y/b1HYb87pscIDleMrUoFRaSwerh5nJKLuoX7Cg2A4c8xXDUK+hSSVbBcaoyBIKDFEh/DbtjiUmQ5R8Rdn0Jv5D2Y6/UpySS8Lp1xHpEoW/El3UVlUNncBQNbLk3rKlBhSwBJ6y4eK956tZb+G4q0L54R1nEZXt4XSM1qG1GfdI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1768292131; c=relaxed/simple; bh=eUfMDALs3oQGUoNDnjd6BXOv9jf2PYXIXtYjEOUdMWY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kWOAxNh6KqlYdEL8jlKzicqvTexopqqlihxbm8v2WgS0LxQs7wiih4wndu/MjX3TxuaY8Q08jLBiF8+hCNuzI2K2MZ7UCiWsXyYhW6vLrCyBHVVrvhGaAKjHKE7yBInBh0eUwYp86LQkgMguY7aqDpx/gVC9i32bUAokI4d0b2w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=VyN8i23A; arc=none smtp.client-ip=209.85.216.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="VyN8i23A" Received: by mail-pj1-f47.google.com with SMTP id 98e67ed59e1d1-34e730f5fefso5166714a91.0 for ; Tue, 13 Jan 2026 00:15:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1768292130; x=1768896930; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=PwTnQsuEWMmzKeRtMHdsKcZZ3Am1ZXUylBtZW2knjDQ=; b=VyN8i23AloqYp0gSyrI+vUwcHa4eMdWQHy/uUYsuS3qYXoletVPJxWWpWhQ9/i4MVZ 2kFR/oxG7IwrwFvuTVW6Dd+bLP6ohpdrrtidR/t55Sip2zwv+mxhUX+M0VLfmavY7cKo 0T7zlf623+NNLcbcVvC/C1sWt1RhLqVh+GS3RoC6cEwsgqdz2ZnM3wksTgD99n+AOxU9 CgCTsUe4ZWY3E8xcSWv4Iav9m9FTAAMRc3k24U4LayQ27sTSfyrdy5EoB5LIVySC+nXS 9WO01iSqpp+x4/ROIGkTE7lZquRiV2fw5ltMK8EQQ+q3stFs5PJ22WHQeWZnelH10Kiu TKgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1768292130; x=1768896930; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=PwTnQsuEWMmzKeRtMHdsKcZZ3Am1ZXUylBtZW2knjDQ=; b=DbK+LbBlpLufZhMoT44+l7ilxTxD5ilyaFERyxkoXFoNUnOAozGyS23kqBXIpN3pif sVXCq3SfCRIGyRDyVqRuFpsrmOumj/SglP9AK5XHJVZZwx+ENNyPPkbn1Qz4tcsEyfYZ rIYGByFYskHiHN52V4KEvwFF1LMBvHbjbmVTSbvFVCaIRuxb9QwJPMEP1Y6osZ/IZGdn W9cJAUbTqMA3DGnpkmP238pTha96UmGDLJr4qIkvmFjQu11OGD0opG2cyRR2O6dRU+uR U+vVhnHjYo88QEsq4PTTw+zNDOOUnmuSu+0KSSgYcq944TTY44XJhVWrP70aP/S807yy 2/TA== X-Forwarded-Encrypted: i=1; AJvYcCWm0MQcK/2t1bg12m54yE7eWq5KIBzCURMx3fbJfnnm4lJ5YbXcBwvjZLxsTIglt0EtMRervAE7yh551K4=@vger.kernel.org X-Gm-Message-State: AOJu0YzLdQORuouih4WU/78kYiQhbBcNSb/spR4XFuqttEnpFt6qfw+/ atGiQ3xEQRjENH8kM355O2Ph3rS8FP9PLopkicp7pWoveumjVhpA2GAA X-Gm-Gg: AY/fxX4fWeolbWNQCzqUhdb4acU+wW0se5nh4QV0bl2MqzKgHZCV9lWpMbyOTbZG2Zo NZTozeJVhooBsxXXXM+3vsBc0hEY1V9iOfajLCO0yoxO7S+lKpe5/ePUmxaojv84HICiL7uymw0 GX432KXFFgRrO+5hTRIXUgmlkHex9C6+pnFkvMfqLCDAYLKOv6V0POwyMuxtYlE4MXpif//Hauu 5jcwc3QGdZEb+AqxAuHbfv/PeDRCmUR6Bi0ATws2CO6RLTu6NttfXVBVPP5NWQztKrDmugdA6s5 +GjpYmSxCAilWaWZVh4KCjLMYdTg1lhV03zUN6UPJTJiqMQ955t6vJe8I3OO35uzEddDp4tbm/w YigG2qsNa85cOwayIILbCPs6x+3Qh9JQxIioGt/psmvbpJEqFJUDsmf4trQpDCEWSKLh72oNgmb G/q4jIdk1YCvehWgE7iDlaT2tlqw== 
X-Google-Smtp-Source: AGHT+IGyoLWqkhk6HNGdCwYOoM3jG2DoR/KQRVZcJr9YqWUTu1tHClglXbICbAvlyxhP0k0QSxD03g==
X-Received: by 2002:a17:90b:3941:b0:34c:253d:581d with SMTP id 98e67ed59e1d1-34f68c4cc61mr17622637a91.9.1768292129845;
 Tue, 13 Jan 2026 00:15:29 -0800 (PST)
Received: from localhost.localdomain ([240f:34:212d:1:180a:3788:c683:2f64])
 by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-350ff05492dsm657199a91.3.2026.01.13.00.15.24
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 13 Jan 2026 00:15:29 -0800 (PST)
From: Akinobu Mita
To: akinobu.mita@gmail.com
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, akpm@linux-foundation.org, axelrasmussen@google.com,
 yuanchu@google.com, weixugc@google.com, hannes@cmpxchg.org,
 david@kernel.org, mhocko@kernel.org, zhengqi.arch@bytedance.com,
 shakeel.butt@linux.dev, lorenzo.stoakes@oracle.com,
 Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
 surenb@google.com, ziy@nvidia.com, matthew.brost@intel.com,
 joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
 gourry@gourry.net, ying.huang@linux.alibaba.com, apopple@nvidia.com,
 bingjiao@google.com, jonathan.cameron@huawei.com,
 pratyush.brahma@oss.qualcomm.com
Subject: [PATCH v4 1/3] mm: memory-tiers, numa_emu: enable to create memory
 tiers using fake numa nodes
Date: Tue, 13 Jan 2026 17:14:51 +0900
Message-ID: <20260113081453.8293-2-akinobu.mita@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260113081453.8293-1-akinobu.mita@gmail.com>
References: <20260113081453.8293-1-akinobu.mita@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This makes it possible to create memory tiers using fake NUMA nodes
generated by NUMA emulation.

The "numa_emulation.adistance=" kernel cmdline option allows you to set
the abstract distance for each NUMA node.

For example, you can create two fake nodes, each in a different memory
tier by booting with "numa=fake=2 numa_emulation.adistance=576,704".
Here, the abstract distances of node0 and node1 are set to 576 and 704,
respectively.

Each memory tier covers an abstract distance chunk size of 128. Thus,
nodes with abstract distances between 512 and 639 are classified into the
same memory tier, and nodes with abstract distances between 640 and 767
are classified into the next slower memory tier.

The abstract distance of fake nodes not specified in the parameter will be
the default DRAM abstract distance of 576.

Signed-off-by: Akinobu Mita
Reviewed-by: Jonathan Cameron
Reviewed-by: Pratyush Brahma
---
v4:
- remove unnecessary include of linux/node.h, suggested by Jonathan Cameron
- include linux/notifier.h for the notifier_block, suggested by Jonathan Cameron
- fix typo in abstract distance value in the commit log

v2:
- fix the explanation about cmdline parameter in the commit log

 mm/numa_emulation.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/numa_emulation.c b/mm/numa_emulation.c
index 703c8fa05048..2d05e61570cc 100644
--- a/mm/numa_emulation.c
+++ b/mm/numa_emulation.c
@@ -6,6 +6,9 @@
 #include
 #include
 #include
+#include
+#include
+#include
 #include
 #include
 #include
@@ -344,6 +347,27 @@ static int __init setup_emu2phys_nid(int *dfl_phys_nid)
 	return max_emu_nid;
 }
 
+static int adistance[MAX_NUMNODES];
+module_param_array(adistance, int, NULL, 0400);
+MODULE_PARM_DESC(adistance, "Abstract distance values for each NUMA node");
+
+static int emu_calculate_adistance(struct notifier_block *self,
+				    unsigned long nid, void *data)
+{
+	if (adistance[nid]) {
+		int *adist = data;
+
+		*adist = adistance[nid];
+		return NOTIFY_STOP;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block emu_adist_nb = {
+	.notifier_call = emu_calculate_adistance,
+	.priority = INT_MIN,
+};
+
 /**
  * numa_emulation - Emulate NUMA nodes
  * @numa_meminfo: NUMA configuration to massage
@@ -532,6 +556,8 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
 		}
 	}
 
+	register_mt_adistance_algorithm(&emu_adist_nb);
+
 	/* free the copied physical distance table */
 	memblock_free(phys_dist, phys_size);
 	return;
-- 
2.43.0

From nobody Sat Feb 7 10:08:28 2026
From: Akinobu Mita
To: akinobu.mita@gmail.com
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, akpm@linux-foundation.org, axelrasmussen@google.com,
 yuanchu@google.com, weixugc@google.com, hannes@cmpxchg.org,
 david@kernel.org, mhocko@kernel.org, zhengqi.arch@bytedance.com,
 shakeel.butt@linux.dev, lorenzo.stoakes@oracle.com,
 Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
 surenb@google.com, ziy@nvidia.com, matthew.brost@intel.com,
 joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
 gourry@gourry.net, ying.huang@linux.alibaba.com, apopple@nvidia.com,
 bingjiao@google.com, jonathan.cameron@huawei.com,
 pratyush.brahma@oss.qualcomm.com
Subject: [PATCH v4 2/3] mm: numa_emu: add document for NUMA emulation
Date: Tue, 13 Jan 2026 17:14:52 +0900
Message-ID: <20260113081453.8293-3-akinobu.mita@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260113081453.8293-1-akinobu.mita@gmail.com>
References: <20260113081453.8293-1-akinobu.mita@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
 charset="utf-8"

Add a document with a brief explanation of NUMA emulation and how to use
the newly added "numa_emulation.adistance=" kernel cmdline parameter.

Signed-off-by: Akinobu Mita
Reviewed-by: Jonathan Cameron
Reviewed-by: Pratyush Brahma
---
v4:
- fix typo in abstract distance value
- add information about supported architectures in numa_emulation.rst

v2:
- added in v2

 Documentation/mm/index.rst          |  1 +
 Documentation/mm/numa_emulation.rst | 31 +++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)
 create mode 100644 Documentation/mm/numa_emulation.rst

diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst
index 7aa2a8886908..7d628edd6a17 100644
--- a/Documentation/mm/index.rst
+++ b/Documentation/mm/index.rst
@@ -24,6 +24,7 @@ see the :doc:`admin guide <../admin-guide/mm/index>`.
    page_cache
    shmfs
    oom
+   numa_emulation
 
 Unsorted Documentation
 ======================
diff --git a/Documentation/mm/numa_emulation.rst b/Documentation/mm/numa_emulation.rst
new file mode 100644
index 000000000000..81f15ea68022
--- /dev/null
+++ b/Documentation/mm/numa_emulation.rst
@@ -0,0 +1,31 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+NUMA emulation
+==============
+
+NUMA emulation is currently supported on x86, arm64, and risc-v architectures.
+If CONFIG_NUMA_EMU is enabled, you can create fake NUMA nodes with
+``numa=fake=`` kernel cmdline option.
+See Documentation/admin-guide/kernel-parameters.txt and
+Documentation/arch/x86/x86_64/fake-numa-for-cpusets.rst for more information.
+
+
+Multiple Memory Tiers Creation
+==============================
+
+The "numa_emulation.adistance=" kernel cmdline option allows you to set
+the abstract distance for each NUMA node.
+
+For example, you can create two fake nodes, each in a different memory
+tier by booting with "numa=fake=2 numa_emulation.adistance=576,704".
+Here, the abstract distances of node0 and node1 are set to 576 and 704,
+respectively.
+
+Each memory tier covers an abstract distance chunk size of 128. Thus,
+nodes with abstract distances between 512 and 639 are classified into the
+same memory tier, and nodes with abstract distances between 640 and 767
+are classified into the next slower memory tier.
+
+The abstract distance of fake nodes not specified in the parameter will be
+the default DRAM abstract distance of 576.
-- 
2.43.0

From nobody Sat Feb 7 10:08:28 2026
From: Akinobu Mita
To: akinobu.mita@gmail.com
Cc: linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, akpm@linux-foundation.org, axelrasmussen@google.com,
 yuanchu@google.com, weixugc@google.com, hannes@cmpxchg.org,
 david@kernel.org, mhocko@kernel.org, zhengqi.arch@bytedance.com,
 shakeel.butt@linux.dev, lorenzo.stoakes@oracle.com,
 Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
 surenb@google.com, ziy@nvidia.com, matthew.brost@intel.com,
 joshua.hahnjy@gmail.com, rakie.kim@sk.com, byungchul@sk.com,
 gourry@gourry.net, ying.huang@linux.alibaba.com, apopple@nvidia.com,
 bingjiao@google.com, jonathan.cameron@huawei.com,
 pratyush.brahma@oss.qualcomm.com
Subject: [PATCH v4 3/3] mm/vmscan: don't demote if there is not enough free
 memory in the lower memory tier
Date: Tue, 13 Jan 2026 17:14:53 +0900
Message-ID:
 <20260113081453.8293-4-akinobu.mita@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260113081453.8293-1-akinobu.mita@gmail.com>
References: <20260113081453.8293-1-akinobu.mita@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

On systems with multiple memory tiers consisting of DRAM and CXL memory,
the OOM killer is not invoked properly.

Here's the command to reproduce:

  $ sudo swapoff -a
  $ stress-ng --oomable -v --memrate 20 --memrate-bytes 10G \
      --memrate-rd-mbs 1 --memrate-wr-mbs 1

The memory usage is the number of workers specified with the --memrate
option multiplied by the buffer size specified with the --memrate-bytes
option, so please adjust it so that it exceeds the total size of the
installed DRAM and CXL memory.

If swap is disabled, you can usually expect the OOM killer to terminate
the stress-ng process when memory usage approaches the installed memory
size.

However, if multiple memory tiers exist (multiple
/sys/devices/virtual/memory_tiering/memory_tier directories exist) and
/sys/kernel/mm/numa/demotion_enabled is true, the OOM killer will not be
invoked and the system will become inoperable, regardless of whether
MGLRU is enabled or not.

This issue can be reproduced using NUMA emulation even on systems with
only DRAM. You can create two fake memory tiers by booting a single-node
system with the "numa=fake=2 numa_emulation.adistance=576,704" kernel
parameters.

The reason for this issue is that memory allocations do not directly
trigger the OOM killer, on the assumption that a node with a lower memory
tier beneath it can always be reclaimed by demotion.

This change avoids the issue by not attempting demotion when no demotion
target node has free memory above the min watermark, so that the OOM
killer is triggered directly from memory allocations.

Signed-off-by: Akinobu Mita
---
v4:
- add a code comment in can_demote()

v3:
- rebase to linux-next (next-20260108), where demotion target has changed
  from node id to node mask.

v2:
- describe reproducibility with !mglru in the commit log
- removed unnecessary consideration for scan control when checking
  demotion_nid watermarks

 mm/vmscan.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f35afc5093dc..f980c533c778 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -358,7 +358,22 @@ static bool can_demote(int nid, struct scan_control *sc,
 
 	/* Filter out nodes that are not in cgroup's mems_allowed. */
 	mem_cgroup_node_filter_allowed(memcg, &allowed_mask);
-	return !nodes_empty(allowed_mask);
+	if (nodes_empty(allowed_mask))
+		return false;
+
+	/* Check if there is enough free memory in the demotion target */
+	for_each_node_mask(nid, allowed_mask) {
+		int z;
+		struct zone *zone;
+		struct pglist_data *pgdat = NODE_DATA(nid);
+
+		for_each_managed_zone_pgdat(zone, pgdat, z, MAX_NR_ZONES - 1) {
+			if (zone_watermark_ok(zone, 0, min_wmark_pages(zone),
+					      ZONE_MOVABLE, 0))
+				return true;
+		}
+	}
+	return false;
 }
 
 static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
-- 
2.43.0