From nobody Sat Feb 7 06:21:44 2026
From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, nehagholkar@meta.com, abhishekd@meta.com,
	kernel-team@meta.com, david@redhat.com, ying.huang@intel.com,
	nphamcs@gmail.com, gourry@gourry.net, akpm@linux-foundation.org,
	hannes@cmpxchg.org, feng.tang@intel.com, kbusch@meta.com
Subject: [PATCH 1/4] migrate: Allow migrate_misplaced_folio APIs without a VMA
Date: Wed, 27 Nov 2024 03:21:58 -0500
Message-ID: <20241127082201.1276-2-gourry@gourry.net>
In-Reply-To: <20241127082201.1276-1-gourry@gourry.net>
References: <20241127082201.1276-1-gourry@gourry.net>

To migrate unmapped pagecache folios, migrate_misplaced_folio and
migrate_misplaced_folio_prepare must handle folios without VMAs.

migrate_misplaced_folio_prepare checks the VMA for exec bits, so allow
a NULL VMA when the folio has no mapping.

migrate_misplaced_folio must call migrate_pages with MIGRATE_SYNC when
on the pagecache path, because that path is a synchronous context.
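The change is easiest to see in isolation: the exec/shared rejection must short-circuit before the VMA is dereferenced. A minimal userspace sketch of the guard pattern, using hypothetical stand-in types (`struct vma_stub` and an illustrative `VM_EXEC` value, not the kernel's definitions):

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_EXEC 0x00000004UL        /* illustrative flag value, not from kernel headers */

struct vma_stub {                   /* stand-in for struct vm_area_struct */
	unsigned long vm_flags;
};

/*
 * Mirror of the patched check: consult vm_flags only when a VMA is
 * present; unmapped pagecache folios pass vma == NULL and fall through.
 */
int exec_shared_check(const struct vma_stub *vma, bool likely_shared)
{
	if (vma && (vma->vm_flags & VM_EXEC) && likely_shared)
		return -EACCES;
	return 0;                   /* migration may proceed */
}
```

With the original `(vma->vm_flags & VM_EXEC)` form, passing NULL would dereference a null pointer; the added `vma &&` is what makes the unmapped-folio path safe.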
Suggested-by: Johannes Weiner
Signed-off-by: Gregory Price
Reviewed-by: Raghavendra K T
Suggested-by: Feng Tang
Suggested-by: Huang Ying
Suggested-by: Keith Busch
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index dfb5eba3c522..3b0bd3f21ac3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2632,7 +2632,7 @@ int migrate_misplaced_folio_prepare(struct folio *folio,
 	 * See folio_likely_mapped_shared() on possible imprecision
 	 * when we cannot easily detect if a folio is shared.
 	 */
-	if ((vma->vm_flags & VM_EXEC) &&
+	if (vma && (vma->vm_flags & VM_EXEC) &&
 	    folio_likely_mapped_shared(folio))
 		return -EACCES;

-- 
2.43.0

From nobody Sat Feb 7 06:21:44 2026
From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, nehagholkar@meta.com, abhishekd@meta.com,
	kernel-team@meta.com, david@redhat.com, ying.huang@intel.com,
	nphamcs@gmail.com, gourry@gourry.net, akpm@linux-foundation.org,
	hannes@cmpxchg.org, feng.tang@intel.com, kbusch@meta.com
Subject: [PATCH 2/4] memory: allow non-fault migration in numa_migrate_check path
Date: Wed, 27 Nov 2024 03:21:59 -0500
Message-ID: <20241127082201.1276-3-gourry@gourry.net>
In-Reply-To: <20241127082201.1276-1-gourry@gourry.net>
References: <20241127082201.1276-1-gourry@gourry.net>

numa_migrate_check and mpol_misplaced presume callers are in the fault
path with access to a VMA. To enable migrations from the page cache,
reusing the same logic to handle migration prep is preferable.
Mildly refactor numa_migrate_check and mpol_misplaced so that they may
be called with (vmf = NULL) from non-faulting paths. Also move the NUMA
balancing event counts inside the appropriate ifdef.

Signed-off-by: Gregory Price
Suggested-by: Feng Tang
Suggested-by: Huang Ying
Suggested-by: Johannes Weiner
Suggested-by: Keith Busch
---
 mm/memory.c    | 28 ++++++++++++++++------------
 mm/mempolicy.c | 25 +++++++++++++++++--------
 2 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 209885a4134f..a373b6ad0b34 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5471,7 +5471,20 @@ int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
 		       unsigned long addr, int *flags,
 		       bool writable, int *last_cpupid)
 {
-	struct vm_area_struct *vma = vmf->vma;
+	if (vmf) {
+		struct vm_area_struct *vma = vmf->vma;
+		const vm_flags_t vmflags = vma->vm_flags;
+
+		/*
+		 * Flag if the folio is shared between multiple address spaces.
+		 * This used later when determining whether to group tasks.
+		 */
+		if (folio_likely_mapped_shared(folio))
+			*flags |= vmflags & VM_SHARED ? TNF_SHARED : 0;
+
+		/* Record the current PID acceesing VMA */
+		vma_set_access_pid_bit(vma);
+	}

 	/*
 	 * Avoid grouping on RO pages in general. RO pages shouldn't hurt as
@@ -5484,12 +5497,6 @@ int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
 	if (!writable)
 		*flags |= TNF_NO_GROUP;

-	/*
-	 * Flag if the folio is shared between multiple address spaces. This
-	 * is later used when determining whether to group tasks together
-	 */
-	if (folio_likely_mapped_shared(folio) && (vma->vm_flags & VM_SHARED))
-		*flags |= TNF_SHARED;
 	/*
 	 * For memory tiering mode, cpupid of slow memory page is used
 	 * to record page access time. So use default value.
@@ -5499,17 +5506,14 @@ int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
 	else
 		*last_cpupid = folio_last_cpupid(folio);

-	/* Record the current PID acceesing VMA */
-	vma_set_access_pid_bit(vma);
-
-	count_vm_numa_event(NUMA_HINT_FAULTS);
 #ifdef CONFIG_NUMA_BALANCING
+	count_vm_numa_event(NUMA_HINT_FAULTS);
 	count_memcg_folio_events(folio, NUMA_HINT_FAULTS, 1);
-#endif
 	if (folio_nid(folio) == numa_node_id()) {
 		count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);
 		*flags |= TNF_FAULT_LOCAL;
 	}
+#endif

 	return mpol_misplaced(folio, vmf, addr);
 }
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index bb37cd1a51d8..eb6c97bccea3 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2727,12 +2727,16 @@ static void sp_free(struct sp_node *n)
  * mpol_misplaced - check whether current folio node is valid in policy
  *
  * @folio: folio to be checked
- * @vmf: structure describing the fault
+ * @vmf: structure describing the fault (NULL if called outside fault path)
  * @addr: virtual address in @vma for shared policy lookup and interleave policy
+ *	Ignored if vmf is NULL.
  *
  * Lookup current policy node id for vma,addr and "compare to" folio's
- * node id. Policy determination "mimics" alloc_page_vma().
- * Called from fault path where we know the vma and faulting address.
+ * node id - or task's policy node id if vmf is NULL. Policy determination
+ * "mimics" alloc_page_vma().
+ *
+ * vmf must be non-NULL if called from fault path where we know the vma and
+ * faulting address. The PTL must be held by caller if vmf is not NULL.
  *
  * Return: NUMA_NO_NODE if the page is in a node that is valid for this
  *	policy, or a suitable node ID to allocate a replacement folio from.
@@ -2744,7 +2748,6 @@ int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
 	pgoff_t ilx;
 	struct zoneref *z;
 	int curnid = folio_nid(folio);
-	struct vm_area_struct *vma = vmf->vma;
 	int thiscpu = raw_smp_processor_id();
 	int thisnid = numa_node_id();
 	int polnid = NUMA_NO_NODE;
@@ -2754,18 +2757,24 @@ int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
 	 * Make sure ptl is held so that we don't preempt and we
 	 * have a stable smp processor id
 	 */
-	lockdep_assert_held(vmf->ptl);
-	pol = get_vma_policy(vma, addr, folio_order(folio), &ilx);
+	if (vmf) {
+		lockdep_assert_held(vmf->ptl);
+		pol = get_vma_policy(vmf->vma, addr, folio_order(folio), &ilx);
+	} else {
+		pol = get_task_policy(current);
+	}
 	if (!(pol->flags & MPOL_F_MOF))
 		goto out;

 	switch (pol->mode) {
 	case MPOL_INTERLEAVE:
-		polnid = interleave_nid(pol, ilx);
+		polnid = vmf ? interleave_nid(pol, ilx) :
+			       interleave_nodes(pol);
 		break;

 	case MPOL_WEIGHTED_INTERLEAVE:
-		polnid = weighted_interleave_nid(pol, ilx);
+		polnid = vmf ? weighted_interleave_nid(pol, ilx) :
+			       weighted_interleave_nodes(pol);
 		break;

 	case MPOL_PREFERRED:
-- 
2.43.0

From nobody Sat Feb 7 06:21:44 2026
From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, nehagholkar@meta.com, abhishekd@meta.com,
	kernel-team@meta.com, david@redhat.com, ying.huang@intel.com,
	nphamcs@gmail.com, gourry@gourry.net, akpm@linux-foundation.org,
	hannes@cmpxchg.org, feng.tang@intel.com, kbusch@meta.com
Subject: [PATCH 3/4] vmstat: add page-cache numa hints
Date: Wed, 27 Nov 2024 03:22:00 -0500
Message-ID: <20241127082201.1276-4-gourry@gourry.net>
In-Reply-To: <20241127082201.1276-1-gourry@gourry.net>
References: <20241127082201.1276-1-gourry@gourry.net>

Count non-page-fault events as page-cache numa hints instead of fault
hints in vmstat.
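The accounting rule this patch implements — fault-path accesses increment the fault-hint counters, syscall-path accesses (no fault context) increment the page-cache-hint counter — can be sketched standalone. The enum and counter array below are illustrative stand-ins, not kernel code:

```c
#include <stdbool.h>

/* illustrative stand-ins for the vm_event_item counters */
enum hint_event {
	HINT_FAULTS,
	HINT_FAULTS_LOCAL,
	HINT_PAGE_CACHE,
	HINT_NR,
};

static unsigned long hint_events[HINT_NR];

/*
 * Select counters the way the patched numa_migrate_check does: a
 * present fault context (vmf != NULL) counts as a hint fault, with a
 * _LOCAL bump when the folio already sits on the local node; the
 * non-fault (pagecache) path counts as a page-cache hint.
 */
void count_numa_hint(bool have_vmf, bool folio_on_local_node)
{
	if (have_vmf) {
		hint_events[HINT_FAULTS]++;
		if (folio_on_local_node)
			hint_events[HINT_FAULTS_LOCAL]++;
	} else {
		hint_events[HINT_PAGE_CACHE]++;
	}
}
```

Note the series also adds a `numa_hint_page_cache_local` vmstat entry, but the hunk shown in this patch increments only the non-local page-cache counter, so the sketch does the same.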
Signed-off-by: Gregory Price
Suggested-by: Feng Tang
Suggested-by: Huang Ying
Suggested-by: Johannes Weiner
Suggested-by: Keith Busch
---
 include/linux/vm_event_item.h |  2 ++
 mm/memory.c                   | 15 ++++++++++-----
 mm/vmstat.c                   |  2 ++
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index f70d0958095c..9fee15d9ba48 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -63,6 +63,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		NUMA_HUGE_PTE_UPDATES,
 		NUMA_HINT_FAULTS,
 		NUMA_HINT_FAULTS_LOCAL,
+		NUMA_HINT_PAGE_CACHE,
+		NUMA_HINT_PAGE_CACHE_LOCAL,
 		NUMA_PAGE_MIGRATE,
 #endif
 #ifdef CONFIG_MIGRATION
diff --git a/mm/memory.c b/mm/memory.c
index a373b6ad0b34..35b72a1cfbd5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5507,11 +5507,16 @@ int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
 		*last_cpupid = folio_last_cpupid(folio);

 #ifdef CONFIG_NUMA_BALANCING
-	count_vm_numa_event(NUMA_HINT_FAULTS);
-	count_memcg_folio_events(folio, NUMA_HINT_FAULTS, 1);
-	if (folio_nid(folio) == numa_node_id()) {
-		count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);
-		*flags |= TNF_FAULT_LOCAL;
+	if (vmf) {
+		count_vm_numa_event(NUMA_HINT_FAULTS);
+		count_memcg_folio_events(folio, NUMA_HINT_FAULTS, 1);
+		if (folio_nid(folio) == numa_node_id()) {
+			count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);
+			*flags |= TNF_FAULT_LOCAL;
+		}
+	} else {
+		count_vm_numa_event(NUMA_HINT_PAGE_CACHE);
+		count_memcg_folio_events(folio, NUMA_HINT_PAGE_CACHE, 1);
 	}
 #endif

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4d016314a56c..bcd9be11e957 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1338,6 +1338,8 @@ const char * const vmstat_text[] = {
 	"numa_huge_pte_updates",
 	"numa_hint_faults",
 	"numa_hint_faults_local",
+	"numa_hint_page_cache",
+	"numa_hint_page_cache_local",
 	"numa_pages_migrated",
 #endif
 #ifdef CONFIG_MIGRATION
-- 
2.43.0

From nobody Sat Feb 7 06:21:44 2026
From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, nehagholkar@meta.com, abhishekd@meta.com,
	kernel-team@meta.com, david@redhat.com, ying.huang@intel.com,
	nphamcs@gmail.com, gourry@gourry.net, akpm@linux-foundation.org,
	hannes@cmpxchg.org, feng.tang@intel.com, kbusch@meta.com
Subject: [PATCH 4/4] migrate,sysfs: add pagecache promotion
Date: Wed, 27 Nov 2024 03:22:01 -0500
Message-ID: <20241127082201.1276-5-gourry@gourry.net>
In-Reply-To: <20241127082201.1276-1-gourry@gourry.net>
References: <20241127082201.1276-1-gourry@gourry.net>

Adds /sys/kernel/mm/numa/pagecache_promotion_enabled.

When page cache lands on lower tiers, there is no way for promotion to
occur unless it becomes memory-mapped and exposed to NUMA hint faults.
Just adding a mechanism to promote pages unconditionally, however,
opens up significant possibility of performance regressions.

Similar to the `demotion_enabled` sysfs entry, provide a sysfs toggle
to enable and disable page cache promotion. This option will enable
opportunistic promotion of unmapped page cache during syscall access.

This option is intended for operational conditions where demoted page
cache will eventually contain memory which becomes hot - and where said
memory is likely to cause performance issues due to being trapped on
the lower tier of memory.

A page cache folio is considered a promotion candidate when:
0) tiering and pagecache-promotion are enabled
1) the folio resides on a node not in the top tier
2) the folio is already marked referenced and active.
3) Multiple accesses in (referenced & active) state occur quickly. Since promotion is not safe to execute unconditionally from within folio_mark_accessed, we defer promotion to a new task_work captured in the task_struct. This ensures that the task doing the access has some hand in promoting pages - even among deduplicated read only files. We use numa_hint_fault_latency to help identify when a folio is accessed multiple times in a short period. Along with folio flag checks, this helps us minimize promoting pages on the first few accesses. The promotion node is always the local node of the promoting cpu. Suggested-by: Johannes Weiner Signed-off-by: Gregory Price Suggested-by: Feng Tang Suggested-by: Huang Ying Suggested-by: Keith Busch --- .../ABI/testing/sysfs-kernel-mm-numa | 20 +++++++ include/linux/memory-tiers.h | 2 + include/linux/migrate.h | 4 ++ include/linux/sched.h | 3 + include/linux/sched/numa_balancing.h | 5 ++ init/init_task.c | 1 + kernel/sched/fair.c | 26 ++++++++- mm/memory-tiers.c | 27 +++++++++ mm/migrate.c | 56 +++++++++++++++++++ mm/swap.c | 3 + 10 files changed, 146 insertions(+), 1 deletion(-) diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-numa b/Documentation= /ABI/testing/sysfs-kernel-mm-numa index 77e559d4ed80..b846e7d80cba 100644 --- a/Documentation/ABI/testing/sysfs-kernel-mm-numa +++ b/Documentation/ABI/testing/sysfs-kernel-mm-numa @@ -22,3 +22,23 @@ Description: Enable/disable demoting pages during reclaim the guarantees of cpusets. This should not be enabled on systems which need strict cpuset location guarantees. + +What: /sys/kernel/mm/numa/pagecache_promotion_enabled +Date: November 2024 +Contact: Linux memory management mailing list +Description: Enable/disable promoting pages during file access + + Page migration during file access is intended for systems + with tiered memory configurations that have significant + unmapped file cache usage. 
+		By default, file cache memory on slower tiers will not
+		be opportunistically promoted by normal NUMA hint
+		faults, because the system has no way to track them.
+		This option enables opportunistic promotion of pages
+		that are accessed via syscall (e.g. read/write) if
+		multiple accesses occur in quick succession.
+
+		It may move data to a NUMA node that does not fall into
+		the cpuset of the allocating process which might be
+		construed to violate the guarantees of cpusets.  This
+		should not be enabled on systems which need strict
+		cpuset location guarantees.
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 0dc0cf2863e2..fa96a67b8996 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -37,6 +37,7 @@ struct access_coordinate;
 
 #ifdef CONFIG_NUMA
 extern bool numa_demotion_enabled;
+extern bool numa_pagecache_promotion_enabled;
 extern struct memory_dev_type *default_dram_type;
 extern nodemask_t default_dram_nodes;
 struct memory_dev_type *alloc_memory_type(int adistance);
@@ -76,6 +77,7 @@ static inline bool node_is_toptier(int node)
 #else
 
 #define numa_demotion_enabled	false
+#define numa_pagecache_promotion_enabled	false
 #define default_dram_type	NULL
 #define default_dram_nodes	NODE_MASK_NONE
 /*
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 002e49b2ebd9..c288c16b1311 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -146,6 +146,7 @@ int migrate_misplaced_folio_prepare(struct folio *folio,
 		struct vm_area_struct *vma, int node);
 int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
 		int node);
+void promotion_candidate(struct folio *folio);
 #else
 static inline int migrate_misplaced_folio_prepare(struct folio *folio,
 		struct vm_area_struct *vma, int node)
@@ -157,6 +158,9 @@ static inline int migrate_misplaced_folio(struct folio *folio,
 {
 	return -EAGAIN; /* can't migrate now */
 }
+static inline void promotion_candidate(struct folio *folio)
+{
+}
 #endif /* CONFIG_NUMA_BALANCING */
 
 #ifdef CONFIG_MIGRATION
diff --git a/include/linux/sched.h b/include/linux/sched.h
index bb343136ddd0..8ddd4986e57f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1353,6 +1353,9 @@ struct task_struct {
 	unsigned long numa_faults_locality[3];
 
 	unsigned long numa_pages_migrated;
+
+	struct callback_head numa_promo_work;
+	struct list_head promo_list;
 #endif /* CONFIG_NUMA_BALANCING */
 
 #ifdef CONFIG_RSEQ
diff --git a/include/linux/sched/numa_balancing.h b/include/linux/sched/numa_balancing.h
index 52b22c5c396d..cc7750d754ff 100644
--- a/include/linux/sched/numa_balancing.h
+++ b/include/linux/sched/numa_balancing.h
@@ -32,6 +32,7 @@ extern void set_numabalancing_state(bool enabled);
 extern void task_numa_free(struct task_struct *p, bool final);
 bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
 			int src_nid, int dst_cpu);
+int numa_hint_fault_latency(struct folio *folio);
 #else
 static inline void task_numa_fault(int last_node, int node, int pages,
 				   int flags)
@@ -52,6 +53,10 @@ static inline bool should_numa_migrate_memory(struct task_struct *p,
 {
 	return true;
 }
+static inline int numa_hint_fault_latency(struct folio *folio)
+{
+	return 0;
+}
 #endif
 
 #endif /* _LINUX_SCHED_NUMA_BALANCING_H */
diff --git a/init/init_task.c b/init/init_task.c
index 136a8231355a..ee33e508067e 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -186,6 +186,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
 	.numa_preferred_nid = NUMA_NO_NODE,
 	.numa_group	= NULL,
 	.numa_faults	= NULL,
+	.promo_list	= LIST_HEAD_INIT(init_task.promo_list),
 #endif
 #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
 	.kasan_depth	= 1,
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2d16c8545c71..34d66faa50f9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -42,6 +42,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1842,7 +1843,7 @@ static bool pgdat_free_space_enough(struct pglist_data *pgdat)
  * The smaller the hint page fault latency, the higher the possibility
  * for the page to be hot.
  */
-static int numa_hint_fault_latency(struct folio *folio)
+int numa_hint_fault_latency(struct folio *folio)
 {
 	int last_time, time;
 
@@ -3528,6 +3529,27 @@ static void task_numa_work(struct callback_head *work)
 	}
 }
 
+static void task_numa_promotion_work(struct callback_head *work)
+{
+	struct task_struct *p = current;
+	struct list_head *promo_list = &p->promo_list;
+	struct folio *folio, *tmp;
+	int nid = numa_node_id();
+
+	SCHED_WARN_ON(p != container_of(work, struct task_struct, numa_promo_work));
+
+	work->next = work;
+
+	if (list_empty(promo_list))
+		return;
+
+	list_for_each_entry_safe(folio, tmp, promo_list, lru) {
+		list_del_init(&folio->lru);
+		migrate_misplaced_folio(folio, NULL, nid);
+	}
+}
+
+
 void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
 {
 	int mm_users = 0;
@@ -3552,8 +3574,10 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
 	RCU_INIT_POINTER(p->numa_group, NULL);
 	p->last_task_numa_placement = 0;
 	p->last_sum_exec_runtime = 0;
+	INIT_LIST_HEAD(&p->promo_list);
 
 	init_task_work(&p->numa_work, task_numa_work);
+	init_task_work(&p->numa_promo_work, task_numa_promotion_work);
 
 	/* New address space, reset the preferred nid */
 	if (!(clone_flags & CLONE_VM)) {
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index fc14fe53e9b7..4c44598e485e 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -935,6 +935,7 @@ static int __init memory_tier_init(void)
 subsys_initcall(memory_tier_init);
 
 bool numa_demotion_enabled = false;
+bool numa_pagecache_promotion_enabled;
 
 #ifdef CONFIG_MIGRATION
 #ifdef CONFIG_SYSFS
@@ -957,11 +958,37 @@ static ssize_t demotion_enabled_store(struct kobject *kobj,
 	return count;
 }
 
+static ssize_t pagecache_promotion_enabled_show(struct kobject *kobj,
+					struct kobj_attribute *attr,
+					char *buf)
+{
+	return sysfs_emit(buf, "%s\n",
+			numa_pagecache_promotion_enabled ? "true" : "false");
+}
+
+static ssize_t pagecache_promotion_enabled_store(struct kobject *kobj,
+					struct kobj_attribute *attr,
+					const char *buf, size_t count)
+{
+	ssize_t ret;
+
+	ret = kstrtobool(buf, &numa_pagecache_promotion_enabled);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
+
 static struct kobj_attribute numa_demotion_enabled_attr =
 	__ATTR_RW(demotion_enabled);
 
+static struct kobj_attribute numa_pagecache_promotion_enabled_attr =
+	__ATTR_RW(pagecache_promotion_enabled);
+
 static struct attribute *numa_attrs[] = {
 	&numa_demotion_enabled_attr.attr,
+	&numa_pagecache_promotion_enabled_attr.attr,
 	NULL,
 };
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 3b0bd3f21ac3..2cd9faed6ab8 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -44,6 +44,8 @@
 #include
 #include
 #include
+#include
+#include
 
 #include
 
@@ -2711,5 +2713,59 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
 	BUG_ON(!list_empty(&migratepages));
 	return nr_remaining ? -EAGAIN : 0;
 }
+
+/**
+ * promotion_candidate() - report a promotion candidate folio
+ *
+ * @folio: The folio reported as a candidate
+ *
+ * Records folio access time and places the folio on the task promotion list
+ * if access time is less than the threshold. The folio will be isolated from
+ * LRU if selected, and task_work will putback the folio on promotion failure.
+ *
+ * Takes a folio reference that will be released in task work.
+ */
+void promotion_candidate(struct folio *folio)
+{
+	struct task_struct *task = current;
+	struct list_head *promo_list = &task->promo_list;
+	struct callback_head *work = &task->numa_promo_work;
+	struct address_space *mapping = folio_mapping(folio);
+	bool write = mapping ? mapping->gfp_mask & __GFP_WRITE : false;
+	int nid = folio_nid(folio);
+	int flags, last_cpupid;
+
+	/*
+	 * Only do this work if:
+	 * 1) tiering and pagecache promotion are enabled
+	 * 2) the page can actually be promoted
+	 * 3) the hint-fault latency is relatively hot
+	 * 4) the folio is not already isolated
+	 * 5) this is not a kernel thread context
+	 */
+	if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) ||
+	    !numa_pagecache_promotion_enabled ||
+	    node_is_toptier(nid) ||
+	    numa_hint_fault_latency(folio) >= PAGE_ACCESS_TIME_MASK ||
+	    folio_test_isolated(folio) ||
+	    (current->flags & PF_KTHREAD)) {
+		return;
+	}
+
+	nid = numa_migrate_check(folio, NULL, 0, &flags, write, &last_cpupid);
+	if (nid == NUMA_NO_NODE)
+		return;
+
+	if (migrate_misplaced_folio_prepare(folio, NULL, nid))
+		return;
+
+	/* Ensure task can schedule work, otherwise we'll leak folios */
+	if (list_empty(promo_list) && task_work_add(task, work, TWA_RESUME)) {
+		folio_putback_lru(folio);
+		return;
+	}
+	list_add(&folio->lru, promo_list);
+}
+EXPORT_SYMBOL(promotion_candidate);
 #endif /* CONFIG_NUMA_BALANCING */
 #endif /* CONFIG_NUMA */
diff --git a/mm/swap.c b/mm/swap.c
index 10decd9dffa1..9cf4c1f73fe5 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 
 #include "internal.h"
 
@@ -453,6 +454,8 @@ void folio_mark_accessed(struct folio *folio)
 		__lru_cache_activate_folio(folio);
 		folio_clear_referenced(folio);
 		workingset_activation(folio);
+	} else {
+		promotion_candidate(folio);
 	}
 	if (folio_test_idle(folio))
 		folio_clear_idle(folio);
-- 
2.43.0
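
P.S. A rough userspace smoke test for the knob and the repeated-read trigger, not part of the patch: the temp-file size, read count, and the graceful fallback when the sysfs file is absent are all arbitrary choices for illustration, and the sysfs path only exists on kernels with this patch applied.

```shell
# Enable pagecache promotion if the knob exists (requires this patch + root);
# otherwise record that it is unavailable rather than failing.
KNOB=/sys/kernel/mm/numa/pagecache_promotion_enabled
if [ -w "$KNOB" ]; then
	echo true > "$KNOB"
	state=$(cat "$KNOB")
else
	state=unavailable
fi

# Repeated buffered reads of the same (unmapped) file in quick succession:
# the first access marks the folio referenced, and subsequent accesses go
# through folio_mark_accessed(), which is where promotion_candidate() hooks in.
f=$(mktemp)
dd if=/dev/urandom of="$f" bs=4096 count=16 2>/dev/null
reads=0
for i in 1 2 3; do
	cat "$f" > /dev/null
	reads=$((reads + 1))
done
rm -f "$f"
echo "knob=$state reads=$reads"
```

Actual promotions can then be observed via the pgpromote_success counter in /proc/vmstat on a tiered-memory system.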