From nobody Wed Dec 17 12:47:59 2025
From: Gregory Price
To: linux-mm@kvack.org
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, akpm@linux-foundation.org, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, donettom@linux.ibm.com
Subject: [RFC PATCH v4 1/6] migrate: Allow migrate_misplaced_folio_prepare() to accept a NULL VMA.
Date: Fri, 11 Apr 2025 18:11:06 -0400
Message-ID: <20250411221111.493193-2-gourry@gourry.net>
In-Reply-To: <20250411221111.493193-1-gourry@gourry.net>
References: <20250411221111.493193-1-gourry@gourry.net>

migrate_misplaced_folio_prepare() may be called on a folio without a
VMA, so it must be made to accept a NULL VMA.

Suggested-by: Johannes Weiner
Signed-off-by: Gregory Price
Suggested-by: Feng Tang
Suggested-by: Huang Ying
Suggested-by: Keith Busch
Tested-by: Neha Gholkar
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index f3ee6d8d5e2e..047131f6c839 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2654,7 +2654,7 @@ int migrate_misplaced_folio_prepare(struct folio *folio,
	 * See folio_maybe_mapped_shared() on possible imprecision
	 * when we cannot easily detect if a folio is shared.
	 */
-	if ((vma->vm_flags & VM_EXEC) && folio_maybe_mapped_shared(folio))
+	if (vma && (vma->vm_flags & VM_EXEC) && folio_maybe_mapped_shared(folio))
 		return -EACCES;
 
	/*
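To illustrate the intent, here is a minimal, hypothetical sketch of a
non-fault caller (the helper name is illustrative and not part of this
series): with this change, passing a NULL VMA simply skips the VM_EXEC
shared-folio check.

	/*
	 * Hypothetical sketch (not from this patch): promote an unmapped
	 * page-cache folio to @nid from a non-fault path.  Assumes the
	 * caller already holds a reference to @folio.
	 */
	static int promote_unmapped_folio(struct folio *folio, int nid)
	{
		int err;

		/* Isolate the folio; takes an extra reference on success. */
		err = migrate_misplaced_folio_prepare(folio, NULL, nid);
		if (err)
			return err;

		/* Perform the actual migration to the target node. */
		return migrate_misplaced_folio(folio, nid);
	}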
-- 
2.49.0

From nobody Wed Dec 17 12:47:59 2025
From: Gregory Price
Subject: [RFC PATCH v4 2/6] memory: allow non-fault migration in numa_migrate_check path
Date: Fri, 11 Apr 2025 18:11:07 -0400
Message-ID: <20250411221111.493193-3-gourry@gourry.net>
In-Reply-To: <20250411221111.493193-1-gourry@gourry.net>

numa_migrate_check and mpol_misplaced presume callers are in the fault
path with access to a VMA. To enable migrations from the page cache,
re-using the same logic to handle migration prep is preferable.

Mildly refactor numa_migrate_check and mpol_misplaced so that they may
be called with (vmf = NULL) from non-faulting paths.

Signed-off-by: Gregory Price
Suggested-by: Feng Tang
Suggested-by: Huang Ying
Suggested-by: Johannes Weiner
Suggested-by: Keith Busch
Tested-by: Neha Gholkar
---
 mm/memory.c    | 24 ++++++++++++++----------
 mm/mempolicy.c | 25 +++++++++++++++++--------
 2 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 3900225d99c5..e72b0d8df647 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5665,7 +5665,20 @@ int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
 		      unsigned long addr, int *flags, bool writable,
 		      int *last_cpupid)
 {
-	struct vm_area_struct *vma = vmf->vma;
+	if (vmf) {
+		struct vm_area_struct *vma = vmf->vma;
+		const vm_flags_t vmflags = vma->vm_flags;
+
+		/*
+		 * Flag if the folio is shared between multiple address spaces. This
+		 * is later used when determining whether to group tasks together
+		 */
+		if (folio_maybe_mapped_shared(folio) && (vmflags & VM_SHARED))
+			*flags |= TNF_SHARED;
+
+		/* Record the current PID acceesing VMA */
+		vma_set_access_pid_bit(vma);
+	}
 
	/*
	 * Avoid grouping on RO pages in general.
RO pages shouldn't hurt as @@ -5678,12 +5691,6 @@ int numa_migrate_check(struct folio *folio, struct v= m_fault *vmf, if (!writable) *flags |=3D TNF_NO_GROUP; =20 - /* - * Flag if the folio is shared between multiple address spaces. This - * is later used when determining whether to group tasks together - */ - if (folio_maybe_mapped_shared(folio) && (vma->vm_flags & VM_SHARED)) - *flags |=3D TNF_SHARED; /* * For memory tiering mode, cpupid of slow memory page is used * to record page access time. So use default value. @@ -5693,9 +5700,6 @@ int numa_migrate_check(struct folio *folio, struct vm= _fault *vmf, else *last_cpupid =3D folio_last_cpupid(folio); =20 - /* Record the current PID acceesing VMA */ - vma_set_access_pid_bit(vma); - count_vm_numa_event(NUMA_HINT_FAULTS); #ifdef CONFIG_NUMA_BALANCING count_memcg_folio_events(folio, NUMA_HINT_FAULTS, 1); diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 530e71fe9147..f86a4a9087f4 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -2747,12 +2747,16 @@ static void sp_free(struct sp_node *n) * mpol_misplaced - check whether current folio node is valid in policy * * @folio: folio to be checked - * @vmf: structure describing the fault + * @vmf: structure describing the fault (NULL if called outside fault path) * @addr: virtual address in @vma for shared policy lookup and interleave = policy + * Ignored if vmf is NULL. * * Lookup current policy node id for vma,addr and "compare to" folio's - * node id. Policy determination "mimics" alloc_page_vma(). - * Called from fault path where we know the vma and faulting address. + * node id - or task's policy node id if vmf is NULL. Policy determination + * "mimics" alloc_page_vma(). + * + * vmf must be non-NULL if called from fault path where we know the vma and + * faulting address. The PTL must be held by caller if vmf is not NULL. * * Return: NUMA_NO_NODE if the page is in a node that is valid for this * policy, or a suitable node ID to allocate a replacement folio from. @@ -2764,7 +2768,6 @@ int mpol_misplaced(struct folio *folio, struct vm_fau= lt *vmf, pgoff_t ilx; struct zoneref *z; int curnid =3D folio_nid(folio); - struct vm_area_struct *vma =3D vmf->vma; int thiscpu =3D raw_smp_processor_id(); int thisnid =3D numa_node_id(); int polnid =3D NUMA_NO_NODE; @@ -2774,18 +2777,24 @@ int mpol_misplaced(struct folio *folio, struct vm_f= ault *vmf, * Make sure ptl is held so that we don't preempt and we * have a stable smp processor id */ - lockdep_assert_held(vmf->ptl); - pol =3D get_vma_policy(vma, addr, folio_order(folio), &ilx); + if (vmf) { + lockdep_assert_held(vmf->ptl); + pol =3D get_vma_policy(vmf->vma, addr, folio_order(folio), &ilx); + } else { + pol =3D get_task_policy(current); + } if (!(pol->flags & MPOL_F_MOF)) goto out; =20 switch (pol->mode) { case MPOL_INTERLEAVE: - polnid =3D interleave_nid(pol, ilx); + polnid =3D vmf ? interleave_nid(pol, ilx) : + interleave_nodes(pol); break; =20 case MPOL_WEIGHTED_INTERLEAVE: - polnid =3D weighted_interleave_nid(pol, ilx); + polnid =3D vmf ? 
weighted_interleave_nid(pol, ilx) :
+				 weighted_interleave_nodes(pol);
 		break;
 
 	case MPOL_PREFERRED:
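As a rough illustration of the non-fault usage this enables (a
hypothetical sketch, not part of the patch): a caller holding a
reference to an unmapped folio can ask for a migration target without
a vm_fault, and mpol_misplaced() then falls back to the task policy.

	/*
	 * Hypothetical sketch: pick a migration target node for an
	 * unmapped folio.  With vmf == NULL, numa_migrate_check() skips
	 * the VMA-specific bookkeeping.  Returns a node id, or
	 * NUMA_NO_NODE if the folio is already well placed.
	 */
	static int unmapped_folio_target_node(struct folio *folio)
	{
		int flags = 0, last_cpupid;

		return numa_migrate_check(folio, NULL, 0, &flags,
					  folio_test_dirty(folio), &last_cpupid);
	}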
-- 
2.49.0

From nobody Wed Dec 17 12:47:59 2025
From: Gregory Price
Subject: [RFC PATCH v4 3/6] vmstat: add page-cache numa hints
Date: Fri, 11 Apr 2025 18:11:08 -0400
Message-ID: <20250411221111.493193-4-gourry@gourry.net>
In-Reply-To: <20250411221111.493193-1-gourry@gourry.net>

Count non-page-fault events as page-cache numa hints instead of fault
hints in vmstat. Add a define to select the hint type to keep the code
clean.

Signed-off-by: Gregory Price
Suggested-by: Feng Tang
Suggested-by: Huang Ying
Suggested-by: Johannes Weiner
Suggested-by: Keith Busch
Tested-by: Neha Gholkar
---
 include/linux/vm_event_item.h | 8 ++++++++
 mm/memcontrol.c               | 1 +
 mm/memory.c                   | 6 +++---
 mm/vmstat.c                   | 2 ++
 4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index f11b6fa9c5b3..fa66d784c9ec 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -65,6 +65,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		NUMA_HUGE_PTE_UPDATES,
 		NUMA_HINT_FAULTS,
 		NUMA_HINT_FAULTS_LOCAL,
+		NUMA_HINT_PAGE_CACHE,
+		NUMA_HINT_PAGE_CACHE_LOCAL,
 		NUMA_PAGE_MIGRATE,
 #endif
 #ifdef CONFIG_MIGRATION
@@ -187,6 +189,12 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 	NR_VM_EVENT_ITEMS
 };
 
+#ifdef CONFIG_NUMA_BALANCING
+#define NUMA_HINT_TYPE(vmf)	(vmf ? NUMA_HINT_FAULTS : NUMA_HINT_PAGE_CACHE)
+#define NUMA_HINT_TYPE_LOCAL(vmf)	(vmf ? NUMA_HINT_FAULTS_LOCAL : \
+					       NUMA_HINT_PAGE_CACHE_LOCAL)
+#endif
+
 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
 #define THP_FILE_ALLOC ({ BUILD_BUG(); 0; })
 #define THP_FILE_FALLBACK ({ BUILD_BUG(); 0; })

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 40c07b8699ae..d50f7522863c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -463,6 +463,7 @@ static const unsigned int memcg_vm_event_stat[] = {
 	NUMA_PAGE_MIGRATE,
 	NUMA_PTE_UPDATES,
 	NUMA_HINT_FAULTS,
+	NUMA_HINT_PAGE_CACHE,
 #endif
 };
 
diff --git a/mm/memory.c b/mm/memory.c
index e72b0d8df647..8d3257ee9ab1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5700,12 +5700,12 @@ int numa_migrate_check(struct folio *folio, struct vm_fault *vmf,
 	else
 		*last_cpupid = folio_last_cpupid(folio);
 
-	count_vm_numa_event(NUMA_HINT_FAULTS);
+	count_vm_numa_event(NUMA_HINT_TYPE(vmf));
 #ifdef CONFIG_NUMA_BALANCING
-	count_memcg_folio_events(folio, NUMA_HINT_FAULTS, 1);
+	count_memcg_folio_events(folio, NUMA_HINT_TYPE(vmf), 1);
 #endif
 	if (folio_nid(folio) == numa_node_id()) {
-		count_vm_numa_event(NUMA_HINT_FAULTS_LOCAL);
+		count_vm_numa_event(NUMA_HINT_TYPE_LOCAL(vmf));
 		*flags |= TNF_FAULT_LOCAL;
 	}
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index ab5c840941f3..0f1cc0f2c68f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1343,6 +1343,8 @@ const char * const vmstat_text[] = {
 	"numa_huge_pte_updates",
 	"numa_hint_faults",
 	"numa_hint_faults_local",
+	"numa_hint_page_cache",
+	"numa_hint_page_cache_local",
 	"numa_pages_migrated",
 #endif
 #ifdef CONFIG_MIGRATION
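With this change applied, the new events appear in /proc/vmstat as
"numa_hint_page_cache" and "numa_hint_page_cache_local", alongside the
existing "numa_hint_faults" counters, so fault-driven and syscall-driven
(page cache) hints can be monitored separately; the NUMA_HINT_PAGE_CACHE
event is also accounted per memcg.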
-- 
2.49.0

From nobody Wed Dec 17 12:47:59 2025
From: Gregory Price
Subject: [RFC PATCH v4 4/6] migrate: implement migrate_misplaced_folio_batch
Date: Fri, 11 Apr 2025 18:11:09 -0400
Message-ID: <20250411221111.493193-5-gourry@gourry.net>
In-Reply-To: <20250411221111.493193-1-gourry@gourry.net>

A common operation in tiering is to migrate multiple pages at once.
The migrate_misplaced_folio function requires one call for each
individual folio.
Expose a batch-variant of the same call for use when doing batch migrations. Signed-off-by: Gregory Price Suggested-by: Feng Tang Suggested-by: Huang Ying Suggested-by: Johannes Weiner Suggested-by: Keith Busch Tested-by: Neha Gholkar --- include/linux/migrate.h | 6 ++++++ mm/migrate.c | 31 +++++++++++++++++++++++++++++++ 2 files changed, 37 insertions(+) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 61899ec7a9a3..2df756128316 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -145,6 +145,7 @@ const struct movable_operations *page_movable_ops(struc= t page *page) int migrate_misplaced_folio_prepare(struct folio *folio, struct vm_area_struct *vma, int node); int migrate_misplaced_folio(struct folio *folio, int node); +int migrate_misplaced_folio_batch(struct list_head *foliolist, int node); #else static inline int migrate_misplaced_folio_prepare(struct folio *folio, struct vm_area_struct *vma, int node) @@ -155,6 +156,11 @@ static inline int migrate_misplaced_folio(struct folio= *folio, int node) { return -EAGAIN; /* can't migrate now */ } +static inline int migrate_misplaced_folio_batch(struct list_head *foliolis= t, + int node) +{ + return -EAGAIN; /* can't migrate now */ +} #endif /* CONFIG_NUMA_BALANCING */ =20 #ifdef CONFIG_MIGRATION diff --git a/mm/migrate.c b/mm/migrate.c index 047131f6c839..7e1ba6001596 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -2731,5 +2731,36 @@ int migrate_misplaced_folio(struct folio *folio, int= node) BUG_ON(!list_empty(&migratepages)); return nr_remaining ? -EAGAIN : 0; } + +/* + * Batch variant of migrate_misplaced_folio. Attempts to migrate + * a folio list to the specified destination. + * + * Caller is expected to have isolated the folios by calling + * migrate_misplaced_folio_prepare(), which will result in an + * elevated reference count on the folio. + * + * This function will un-isolate the folios, dereference them, and + * remove them from the list before returning. + */ +int migrate_misplaced_folio_batch(struct list_head *folio_list, int node) +{ + pg_data_t *pgdat =3D NODE_DATA(node); + unsigned int nr_succeeded; + int nr_remaining; + + nr_remaining =3D migrate_pages(folio_list, alloc_misplaced_dst_folio, + NULL, node, MIGRATE_ASYNC, + MR_NUMA_MISPLACED, &nr_succeeded); + if (nr_remaining) + putback_movable_pages(folio_list); + + if (nr_succeeded) { + count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded); + mod_node_page_state(pgdat, PGPROMOTE_SUCCESS, nr_succeeded); + } + BUG_ON(!list_empty(folio_list)); + return nr_remaining ? 
-EAGAIN : 0;
+}
 #endif /* CONFIG_NUMA_BALANCING */
 #endif /* CONFIG_NUMA */
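To sketch the intended usage (the helper below is hypothetical and not
part of this patch): isolate each candidate with
migrate_misplaced_folio_prepare(), collect the folios on a local list
via folio->lru, then hand the whole list to
migrate_misplaced_folio_batch().

	/*
	 * Hypothetical sketch of batch usage.  Assumes the caller already
	 * holds references to the candidate folios.
	 */
	static void promote_folios_to_node(struct folio **folios, int nr, int nid)
	{
		LIST_HEAD(folio_list);
		int i;

		for (i = 0; i < nr; i++) {
			/* Isolation takes an extra reference on success. */
			if (migrate_misplaced_folio_prepare(folios[i], NULL, nid))
				continue;
			list_add_tail(&folios[i]->lru, &folio_list);
		}

		/* Migrates, un-isolates and drains the list, even on failure. */
		migrate_misplaced_folio_batch(&folio_list, nid);
	}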
-- 
2.49.0

From nobody Wed Dec 17 12:47:59 2025
From: Gregory Price
Subject: [RFC PATCH v4 5/6] migrate,sysfs: add pagecache promotion
Date: Fri, 11 Apr 2025 18:11:10 -0400
Message-ID: <20250411221111.493193-6-gourry@gourry.net>
In-Reply-To: <20250411221111.493193-1-gourry@gourry.net>

Add /sys/kernel/mm/numa/pagecache_promotion_enabled.

When page cache lands on lower tiers, there is no way for promotion to
occur unless it becomes memory-mapped and exposed to NUMA hint faults.
Just adding a mechanism to promote pages unconditionally, however,
opens up a significant possibility of performance regressions.

Similar to the `demotion_enabled` sysfs entry, provide a sysfs toggle
to enable and disable page cache promotion. This option enables
opportunistic promotion of unmapped page cache during syscall access.
It is intended for operational conditions where demoted page cache will
eventually contain memory that becomes hot, and where that memory is
likely to cause performance issues by being trapped on the lower memory
tier.

A page cache folio is considered a promotion candidate when:

  0) tiering and pagecache-promotion are enabled
  1) the folio resides on a node not in the top tier
  2) the folio is already marked referenced and active
  3) multiple accesses occur quickly while in the (referenced & active)
     state

Since promotion is not safe to execute unconditionally from within
folio_mark_accessed(), we defer promotion to a new task_work captured
in the task_struct. This ensures that the task doing the access has
some hand in promoting pages, even among deduplicated read-only files.

We limit the total number of folios on the promotion list to the
promotion rate limit, bounding the amount of inline work done during
large reads and avoiding significant overhead. We do not reuse the
existing rate-limit check function here, since that check is applied
during the migration itself anyway. The promotion node is always the
local node of the promoting CPU.
Suggested-by: Johannes Weiner Signed-off-by: Gregory Price Suggested-by: Feng Tang Suggested-by: Huang Ying Suggested-by: Keith Busch Tested-by: Neha Gholkar --- .../ABI/testing/sysfs-kernel-mm-numa | 20 +++++++ include/linux/memory-tiers.h | 2 + include/linux/migrate.h | 5 ++ include/linux/sched.h | 4 ++ include/linux/sched/sysctl.h | 1 + init/init_task.c | 2 + kernel/sched/fair.c | 24 +++++++- mm/memory-tiers.c | 27 +++++++++ mm/migrate.c | 55 +++++++++++++++++++ mm/swap.c | 8 +++ 10 files changed, 147 insertions(+), 1 deletion(-) diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-numa b/Documentation= /ABI/testing/sysfs-kernel-mm-numa index 77e559d4ed80..ebb041891db2 100644 --- a/Documentation/ABI/testing/sysfs-kernel-mm-numa +++ b/Documentation/ABI/testing/sysfs-kernel-mm-numa @@ -22,3 +22,23 @@ Description: Enable/disable demoting pages during reclaim the guarantees of cpusets. This should not be enabled on systems which need strict cpuset location guarantees. + +What: /sys/kernel/mm/numa/pagecache_promotion_enabled +Date: January 2025 +Contact: Linux memory management mailing list +Description: Enable/disable promoting pages during file access + + Page migration during file access is intended for systems + with tiered memory configurations that have significant + unmapped file cache usage. By default, file cache memory + on slower tiers will not be opportunistically promoted by + normal NUMA hint faults, because the system has no way to + track them. This option enables opportunistic promotion + of pages that are accessed via syscall (e.g. read/write) + if multiple accesses occur in quick succession. + + It may move data to a NUMA node that does not fall into + the cpuset of the allocating process which might be + construed to violate the guarantees of cpusets. This + should not be enabled on systems which need strict cpuset + location guarantees. 
diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index 0dc0cf2863e2..fa96a67b8996 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -37,6 +37,7 @@ struct access_coordinate; =20 #ifdef CONFIG_NUMA extern bool numa_demotion_enabled; +extern bool numa_pagecache_promotion_enabled; extern struct memory_dev_type *default_dram_type; extern nodemask_t default_dram_nodes; struct memory_dev_type *alloc_memory_type(int adistance); @@ -76,6 +77,7 @@ static inline bool node_is_toptier(int node) #else =20 #define numa_demotion_enabled false +#define numa_pagecache_promotion_enabled false #define default_dram_type NULL #define default_dram_nodes NODE_MASK_NONE /* diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 2df756128316..3f8f30ae3a67 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -146,6 +146,7 @@ int migrate_misplaced_folio_prepare(struct folio *folio, struct vm_area_struct *vma, int node); int migrate_misplaced_folio(struct folio *folio, int node); int migrate_misplaced_folio_batch(struct list_head *foliolist, int node); +void promotion_candidate(struct folio *folio); #else static inline int migrate_misplaced_folio_prepare(struct folio *folio, struct vm_area_struct *vma, int node) @@ -161,6 +162,10 @@ static inline int migrate_misplaced_folio_batch(struct= list_head *foliolist, { return -EAGAIN; /* can't migrate now */ } +static inline void promotion_candidate(struct folio *folio) +{ + return; +} #endif /* CONFIG_NUMA_BALANCING */ =20 #ifdef CONFIG_MIGRATION diff --git a/include/linux/sched.h b/include/linux/sched.h index 9c15365a30c0..392aec1f947c 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1370,6 +1370,10 @@ struct task_struct { unsigned long numa_faults_locality[3]; =20 unsigned long numa_pages_migrated; + + struct callback_head numa_promo_work; + struct list_head promo_list; + unsigned long promo_count; #endif /* CONFIG_NUMA_BALANCING */ =20 #ifdef CONFIG_RSEQ diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 5a64582b086b..50b1d1dc27e2 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -25,6 +25,7 @@ enum sched_tunable_scaling { =20 #ifdef CONFIG_NUMA_BALANCING extern int sysctl_numa_balancing_mode; +extern unsigned int sysctl_numa_balancing_promote_rate_limit; #else #define sysctl_numa_balancing_mode 0 #endif diff --git a/init/init_task.c b/init/init_task.c index e557f622bd90..47162ed14106 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -187,6 +187,8 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = =3D { .numa_preferred_nid =3D NUMA_NO_NODE, .numa_group =3D NULL, .numa_faults =3D NULL, + .promo_list =3D LIST_HEAD_INIT(init_task.promo_list), + .promo_count =3D 0, #endif #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) .kasan_depth =3D 1, diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c798d2795243..68efbd4a9452 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -43,6 +43,7 @@ #include #include #include +#include #include #include #include @@ -129,7 +130,7 @@ static unsigned int sysctl_sched_cfs_bandwidth_slice = =3D 5000UL; =20 #ifdef CONFIG_NUMA_BALANCING /* Restrict the NUMA promotion throughput (MB/s) for each target node. 
*/ -static unsigned int sysctl_numa_balancing_promote_rate_limit =3D 65536; +unsigned int sysctl_numa_balancing_promote_rate_limit =3D 65536; #endif =20 #ifdef CONFIG_SYSCTL @@ -3535,6 +3536,25 @@ static void task_numa_work(struct callback_head *wor= k) } } =20 +static void task_numa_promotion_work(struct callback_head *work) +{ + struct task_struct *p =3D current; + struct list_head *promo_list =3D &p->promo_list; + int nid =3D numa_node_id(); + + SCHED_WARN_ON(p !=3D container_of(work, struct task_struct, numa_promo_wo= rk)); + + work->next =3D work; + + if (list_empty(promo_list)) + return; + + migrate_misplaced_folio_batch(promo_list, nid); + current->promo_count =3D 0; + return; +} + + void init_numa_balancing(unsigned long clone_flags, struct task_struct *p) { int mm_users =3D 0; @@ -3559,8 +3579,10 @@ void init_numa_balancing(unsigned long clone_flags, = struct task_struct *p) RCU_INIT_POINTER(p->numa_group, NULL); p->last_task_numa_placement =3D 0; p->last_sum_exec_runtime =3D 0; + INIT_LIST_HEAD(&p->promo_list); =20 init_task_work(&p->numa_work, task_numa_work); + init_task_work(&p->numa_promo_work, task_numa_promotion_work); =20 /* New address space, reset the preferred nid */ if (!(clone_flags & CLONE_VM)) { diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c index fc14fe53e9b7..e8acb54aa8df 100644 --- a/mm/memory-tiers.c +++ b/mm/memory-tiers.c @@ -935,6 +935,7 @@ static int __init memory_tier_init(void) subsys_initcall(memory_tier_init); =20 bool numa_demotion_enabled =3D false; +bool numa_pagecache_promotion_enabled; =20 #ifdef CONFIG_MIGRATION #ifdef CONFIG_SYSFS @@ -957,11 +958,37 @@ static ssize_t demotion_enabled_store(struct kobject = *kobj, return count; } =20 +static ssize_t pagecache_promotion_enabled_show(struct kobject *kobj, + struct kobj_attribute *attr, + char *buf) +{ + return sysfs_emit(buf, "%s\n", + numa_pagecache_promotion_enabled ? "true" : "false"); +} + +static ssize_t pagecache_promotion_enabled_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + ssize_t ret; + + ret =3D kstrtobool(buf, &numa_pagecache_promotion_enabled); + if (ret) + return ret; + + return count; +} + + static struct kobj_attribute numa_demotion_enabled_attr =3D __ATTR_RW(demotion_enabled); =20 +static struct kobj_attribute numa_pagecache_promotion_enabled_attr =3D + __ATTR_RW(pagecache_promotion_enabled); + static struct attribute *numa_attrs[] =3D { &numa_demotion_enabled_attr.attr, + &numa_pagecache_promotion_enabled_attr.attr, NULL, }; =20 diff --git a/mm/migrate.c b/mm/migrate.c index 7e1ba6001596..e6b4bf364837 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -44,6 +44,8 @@ #include #include #include +#include +#include =20 #include =20 @@ -2762,5 +2764,58 @@ int migrate_misplaced_folio_batch(struct list_head *= folio_list, int node) BUG_ON(!list_empty(folio_list)); return nr_remaining ? -EAGAIN : 0; } + +/** + * promotion_candidate: report a promotion candidate folio + * + * The folio will be isolated from LRU if selected, and task_work will + * putback the folio on promotion failure. + * + * Candidates may not be promoted and may be returned to the LRU. + * + * Takes a folio reference that will be released in task work. 
+ */
+void promotion_candidate(struct folio *folio)
+{
+	struct task_struct *task = current;
+	struct list_head *promo_list = &task->promo_list;
+	struct callback_head *work = &task->numa_promo_work;
+	int nid = folio_nid(folio);
+	int flags, last_cpupid;
+
+	/* do not migrate toptier folios or in kernel context */
+	if (node_is_toptier(nid) || task->flags & PF_KTHREAD)
+		return;
+
+	/*
+	 * Limit per-syscall migration rate to balancing rate limit. This avoids
+	 * excessive work during large reads knowing that task work is likely to
+	 * hit the rate limit and put excess folios back on the LRU anyway.
+	 */
+	if (task->promo_count >= sysctl_numa_balancing_promote_rate_limit)
+		return;
+
+	/* Isolate the folio to prepare for migration */
+	nid = numa_migrate_check(folio, NULL, 0, &flags, folio_test_dirty(folio),
+				 &last_cpupid);
+	if (nid == NUMA_NO_NODE)
+		return;
+
+	if (migrate_misplaced_folio_prepare(folio, NULL, nid))
+		return;
+
+	/*
+	 * If work is pending, add this folio to the list. Otherwise, ensure
+	 * the task will execute the work, otherwise we can leak folios.
+	 */
+	if (list_empty(promo_list) && task_work_add(task, work, TWA_RESUME)) {
+		folio_putback_lru(folio);
+		return;
+	}
+	list_add_tail(&folio->lru, promo_list);
+	task->promo_count += folio_nr_pages(folio);
+	return;
+}
+EXPORT_SYMBOL(promotion_candidate);
 #endif /* CONFIG_NUMA_BALANCING */
 #endif /* CONFIG_NUMA */
diff --git a/mm/swap.c b/mm/swap.c
index 7523b65d8caa..382828fde505 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -37,6 +37,10 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
 
 #include "internal.h"
 
@@ -476,6 +480,10 @@ void folio_mark_accessed(struct folio *folio)
 		__lru_cache_activate_folio(folio);
 		folio_clear_referenced(folio);
 		workingset_activation(folio);
+	} else if (!folio_test_isolated(folio) &&
+		   (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
+		   numa_pagecache_promotion_enabled) {
+		promotion_candidate(folio);
 	}
 	if (folio_test_idle(folio))
 		folio_clear_idle(folio);
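As a usage note (not spelled out in the diff above): the
folio_mark_accessed() hook checks both sysctl_numa_balancing_mode and
numa_pagecache_promotion_enabled, so promotion_candidate() is only
reached when NUMA balancing runs in memory tiering mode and the new
sysfs knob is on, typically by writing 2 (NUMA_BALANCING_MEMORY_TIERING)
to /proc/sys/kernel/numa_balancing and true to
/sys/kernel/mm/numa/pagecache_promotion_enabled.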
-- 
2.49.0

From nobody Wed Dec 17 12:47:59 2025
From: Gregory Price
Subject: [RFC PATCH v4 6/6] mm/swap.c: Enable promotion of unmapped MGLRU page cache pages
Date: Fri, 11 Apr 2025 18:11:11 -0400
Message-ID: <20250411221111.493193-7-gourry@gourry.net>
In-Reply-To: <20250411221111.493193-1-gourry@gourry.net>

From: Donet Tom

Extend MGLRU to support promotion of page cache pages.

An MGLRU page cache page is eligible for promotion when:

  1. Memory tiering and pagecache_promotion_enabled are enabled.
  2. It resides in a lower memory tier.
  3. It is referenced.
  4. It is part of the working set.
  5. Its folio reference count is at the maximum (LRU_REFS_MASK).

When a page is accessed through a file descriptor, folio_inc_refs() is
invoked. The first access sets the folio's referenced flag, and
subsequent accesses increment the reference count in the folio flags
(the reference counter in the folio flags is 2 bits wide). Once the
referenced flag is set and the folio's reference count reaches the
maximum value (LRU_REFS_MASK), the working-set flag is set as well.

If a folio has both the referenced and working-set flags set, and its
reference count equals LRU_REFS_MASK, it becomes a good candidate for
promotion. These pages are added to the promotion list, and the
per-process task work task_numa_promotion_work() takes the pages from
the promotion list and promotes them to a higher memory tier.

In MGLRU, for folios accessed through a file descriptor, if the folio's
referenced and working-set flags are set and its reference count equals
LRU_REFS_MASK, the folio is lazily promoted to the second-oldest
generation in the eviction path. When folio_inc_gen() does this, it
clears LRU_REFS_FLAGS so that lru_gen_inc_refs() can start over.

Test process: We measured the read time in the scenarios below for both
LRU and MGLRU.
Scenario 1: Pages are on lower tier + promotion off
Scenario 2: Pages are on lower tier + promotion on
Scenario 3: Pages are on higher tier

Test Results MGLRU
------------------------------------------------------------------
Pages on higher | Pages on lower tier | Pages on lower tier       |
tier            | promotion off       | promotion on              |
------------------------------------------------------------------
0.48s           | 1.6s                | During promotion - 3.3s   |
                |                     | After promotion  - 0.48s  |
------------------------------------------------------------------

Test Results LRU
------------------------------------------------------------------
Pages on higher | Pages on lower tier | Pages on lower tier       |
tier            | promotion off       | promotion on              |
------------------------------------------------------------------
0.48s           | 1.6s                | During promotion - 3.3s   |
                |                     | After promotion  - 0.48s  |
------------------------------------------------------------------

MGLRU and LRU show a similar performance benefit.

Suggested-by: Feng Tang
Suggested-by: Huang Ying
Suggested-by: Johannes Weiner
Suggested-by: Keith Busch
Tested-by: Neha Gholkar
Signed-off-by: Donet Tom
---
 mm/swap.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/swap.c b/mm/swap.c
index 382828fde505..3af2377515ad 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -399,8 +399,13 @@ static void lru_gen_inc_refs(struct folio *folio)
 
 	do {
 		if ((old_flags & LRU_REFS_MASK) == LRU_REFS_MASK) {
-			if (!folio_test_workingset(folio))
+			if (!folio_test_workingset(folio)) {
 				folio_set_workingset(folio);
+			} else if (!folio_test_isolated(folio) &&
+				   (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
+				   numa_pagecache_promotion_enabled) {
+				promotion_candidate(folio);
+			}
 			return;
 		}
 
-- 
2.49.0