From nobody Wed Feb 11 09:22:46 2026
From: Donet Tom
To: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Aneesh Kumar, Huang Ying, Michal Hocko, Dave Hansen, Mel Gorman,
    Feng Tang, Andrea Arcangeli, Peter Zijlstra, Ingo Molnar, Rik van Riel,
    Johannes Weiner, Matthew Wilcox, Vlastimil Babka, Dan Williams,
    Hugh Dickins, Kefeng Wang, Suren Baghdasaryan, Donet Tom
Subject: [PATCH v2 1/2] mm/mempolicy: Use numa_node_id() instead of cpu_to_node()
Date: Fri, 8 Mar 2024 09:15:37 -0600
Message-Id: <744646531af02cc687cde8ae788fb1779e99d02c.1709909210.git.donettom@linux.ibm.com>

Instead of using cpu_to_node(), use numa_node_id(), which is faster. The
smp_processor_id() value is guaranteed to be stable in mpol_misplaced()
because it is called with the ptl held.
A lockdep_assert_held() was added to ensure that. To make vmf->ptl
available there, mpol_misplaced() and numa_migrate_prep() now take the
struct vm_fault instead of the vma.

No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V (IBM)
Signed-off-by: Donet Tom
---
 include/linux/mempolicy.h |  5 +++--
 mm/huge_memory.c          |  2 +-
 mm/internal.h             |  2 +-
 mm/memory.c               |  8 +++++---
 mm/mempolicy.c            | 12 +++++++++---
 5 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 931b118336f4..1add16f21612 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -167,7 +167,8 @@ extern void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol);
 /* Check if a vma is migratable */
 extern bool vma_migratable(struct vm_area_struct *vma);
 
-int mpol_misplaced(struct folio *, struct vm_area_struct *, unsigned long);
+int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
+		   unsigned long addr);
 extern void mpol_put_task_policy(struct task_struct *);
 
 static inline bool mpol_is_preferred_many(struct mempolicy *pol)
@@ -282,7 +283,7 @@ static inline int mpol_parse_str(char *str, struct mempolicy **mpol)
 #endif
 
 static inline int mpol_misplaced(struct folio *folio,
-				 struct vm_area_struct *vma,
+				 struct vm_fault *vmf,
 				 unsigned long address)
 {
 	return -1; /* no node preference */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 94c958f7ebb5..7f944e0c4571 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1752,7 +1752,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
 	 */
 	if (node_is_toptier(nid))
 		last_cpupid = folio_last_cpupid(folio);
-	target_nid = numa_migrate_prep(folio, vma, haddr, nid, &flags);
+	target_nid = numa_migrate_prep(folio, vmf, haddr, nid, &flags);
 	if (target_nid == NUMA_NO_NODE) {
 		folio_put(folio);
 		goto out_map;
diff --git a/mm/internal.h b/mm/internal.h
index f309a010d50f..ae175be9165e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -992,7 +992,7 @@ void vunmap_range_noflush(unsigned long start, unsigned long end);
 
 void __vunmap_range_noflush(unsigned long start, unsigned long end);
 
-int numa_migrate_prep(struct folio *folio, struct vm_area_struct *vma,
+int numa_migrate_prep(struct folio *folio, struct vm_fault *vmf,
 		      unsigned long addr, int page_nid, int *flags);
 
 void free_zone_device_page(struct page *page);
diff --git a/mm/memory.c b/mm/memory.c
index 0bfc8b007c01..4e258a8564ca 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4899,9 +4899,11 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 	return ret;
 }
 
-int numa_migrate_prep(struct folio *folio, struct vm_area_struct *vma,
+int numa_migrate_prep(struct folio *folio, struct vm_fault *vmf,
 		      unsigned long addr, int page_nid, int *flags)
 {
+	struct vm_area_struct *vma = vmf->vma;
+
 	folio_get(folio);
 
 	/* Record the current PID acceesing VMA */
@@ -4913,7 +4915,7 @@ int numa_migrate_prep(struct folio *folio, struct vm_area_struct *vma,
 		*flags |= TNF_FAULT_LOCAL;
 	}
 
-	return mpol_misplaced(folio, vma, addr);
+	return mpol_misplaced(folio, vmf, addr);
 }
 
 static vm_fault_t do_numa_page(struct vm_fault *vmf)
@@ -4987,7 +4989,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
 		last_cpupid = (-1 & LAST_CPUPID_MASK);
 	else
 		last_cpupid = folio_last_cpupid(folio);
-	target_nid = numa_migrate_prep(folio, vma, vmf->address, nid, &flags);
+	target_nid = numa_migrate_prep(folio, vmf, vmf->address, nid, &flags);
 	if (target_nid == NUMA_NO_NODE) {
 		folio_put(folio);
 		goto out_map;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 10a590ee1c89..e635d7ed501b 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2477,18 +2477,24 @@ static void sp_free(struct sp_node *n)
  * Return: NUMA_NO_NODE if the page is in a node that is valid for this
  * policy, or a suitable node ID to allocate a replacement folio from.
  */
-int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
+int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
 		   unsigned long addr)
 {
 	struct mempolicy *pol;
 	pgoff_t ilx;
 	struct zoneref *z;
 	int curnid = folio_nid(folio);
+	struct vm_area_struct *vma = vmf->vma;
 	int thiscpu = raw_smp_processor_id();
-	int thisnid = cpu_to_node(thiscpu);
+	int thisnid = numa_node_id();
 	int polnid = NUMA_NO_NODE;
 	int ret = NUMA_NO_NODE;
 
+	/*
+	 * Make sure ptl is held so that we don't preempt and we
+	 * have a stable smp processor id
+	 */
+	lockdep_assert_held(vmf->ptl);
 	pol = get_vma_policy(vma, addr, folio_order(folio), &ilx);
 	if (!(pol->flags & MPOL_F_MOF))
 		goto out;
@@ -2526,7 +2532,7 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
 	if (node_isset(curnid, pol->nodes))
 		goto out;
 	z = first_zones_zonelist(
-			node_zonelist(numa_node_id(), GFP_HIGHUSER),
+			node_zonelist(thisnid, GFP_HIGHUSER),
 			gfp_zone(GFP_HIGHUSER),
 			&pol->nodes);
 	polnid = zone_to_nid(z->zone);
-- 
2.39.3
From nobody Wed Feb 11 09:22:46 2026
From: Donet Tom
To: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Aneesh Kumar, Huang Ying, Michal Hocko, Dave Hansen, Mel Gorman,
    Feng Tang, Andrea Arcangeli, Peter Zijlstra, Ingo Molnar, Rik van Riel,
    Johannes Weiner, Matthew Wilcox, Vlastimil Babka, Dan Williams,
    Hugh Dickins, Kefeng Wang, Suren Baghdasaryan, Donet Tom
Subject: [PATCH v2 2/2] mm/numa_balancing: Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
Date: Fri, 8 Mar 2024 09:15:38 -0600
Message-Id: <369d6a58758396335fd1176d97bbca4e7730d75a.1709909210.git.donettom@linux.ibm.com>
commit bda420b98505 ("numa balancing: migrate on fault among multiple
bound nodes") added support for migrate on protnone reference with the
MPOL_BIND memory policy. This allowed numa fault migration when the
executing node is part of the policy mask for MPOL_BIND. This patch
extends migration support to the MPOL_PREFERRED_MANY policy.

Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
MPOL_F_NUMA_BALANCING. This causes issues when we want to use
NUMA_BALANCING_MEMORY_TIERING. To effectively use the slow memory tier,
the kernel should not allocate pages from the slower memory tier via
allocation control zonelist fallback. Instead, we should move cold pages
from the faster memory node via memory demotion. For a page allocation,
kswapd is only woken up after we try to allocate pages from all nodes in
the allocation zone list. This implies that, without using memory
policies, we will end up allocating hot pages in the slower memory tier.
MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
allocation control when we have memory tiers in the system. With
MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
of faster memory nodes. When we fail to allocate pages from the faster
memory node, kswapd is woken up, allowing demotion of cold pages to
slower memory nodes.

With the current kernel, such usage of memory policies implies we can't
do page promotion from a slower memory tier to a faster memory tier
using numa faults. This patch fixes that: for MPOL_PREFERRED_MANY, if
the executing node is in the policy node mask, we allow numa migration
to the executing node. If the executing node is not in the policy node
mask, we do not allow numa migration.

Signed-off-by: Aneesh Kumar K.V (IBM)
Signed-off-by: Donet Tom
---
 mm/mempolicy.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e635d7ed501b..ccd9c6c5fcf5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1458,9 +1458,10 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
 	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
 		return -EINVAL;
 	if (*flags & MPOL_F_NUMA_BALANCING) {
-		if (*mode != MPOL_BIND)
+		if (*mode == MPOL_BIND || *mode == MPOL_PREFERRED_MANY)
+			*flags |= (MPOL_F_MOF | MPOL_F_MORON);
+		else
 			return -EINVAL;
-		*flags |= (MPOL_F_MOF | MPOL_F_MORON);
 	}
 	return 0;
 }
@@ -2515,15 +2516,26 @@ int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
 		break;
 
 	case MPOL_BIND:
-		/* Optimize placement among multiple nodes via NUMA balancing */
+	case MPOL_PREFERRED_MANY:
+		/*
+		 * Even though MPOL_PREFERRED_MANY can allocate pages outside
+		 * policy nodemask we don't allow numa migration to nodes
+		 * outside policy nodemask for now. This is done so that if we
+		 * want demotion to slow memory to happen, before allocating
+		 * from some DRAM node say 'x', we will end up using a
+		 * MPOL_PREFERRED_MANY mask excluding node 'x'. In such scenario
+		 * we should not promote to node 'x' from slow memory node.
+		 */
 		if (pol->flags & MPOL_F_MORON) {
+			/*
+			 * Optimize placement among multiple nodes
+			 * via NUMA balancing
+			 */
 			if (node_isset(thisnid, pol->nodes))
 				break;
 			goto out;
 		}
-		fallthrough;
 
-	case MPOL_PREFERRED_MANY:
 		/*
 		 * use current page if in policy nodemask,
 		 * else select nearest allowed node, if any.
-- 
2.39.3