From nobody Thu Apr 9 05:46:21 2026
From: Usama Arif
To: Andrew Morton, ryan.roberts@arm.com, david@kernel.org
Cc: ajd@linux.ibm.com, anshuman.khandual@arm.com, apopple@nvidia.com,
	baohua@kernel.org, baolin.wang@linux.alibaba.com, brauner@kernel.org,
	catalin.marinas@arm.com, dev.jain@arm.com, jack@suse.cz, kees@kernel.org,
	kevin.brodsky@arm.com, lance.yang@linux.dev, Liam.Howlett@oracle.com,
	linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	lorenzo.stoakes@oracle.com, npache@redhat.com, rmclure@linux.ibm.com,
	Al Viro, will@kernel.org, willy@infradead.org, ziy@nvidia.com,
	hannes@cmpxchg.org, kas@kernel.org, shakeel.butt@linux.dev,
	kernel-team@meta.com, Usama Arif
Subject: [PATCH 1/4] arm64: request contpte-sized folios for exec memory
Date: Tue, 10 Mar 2026 07:51:14 -0700
Message-ID: <20260310145406.3073394-2-usama.arif@linux.dev>
In-Reply-To: <20260310145406.3073394-1-usama.arif@linux.dev>
References: <20260310145406.3073394-1-usama.arif@linux.dev>

exec_folio_order() was introduced [1] to request readahead of executable
file-backed pages at an arch-preferred folio order, so that the hardware
can coalesce contiguous PTEs into fewer iTLB entries (contpte).
The current implementation uses ilog2(SZ_64K >> PAGE_SHIFT), which
requests 64K folios. This is optimal for 4K base pages (where
CONT_PTES = 16, contpte size = 64K), but suboptimal for 16K and 64K
base pages:

Page size | Before (order) | After (order) | contpte
----------|----------------|---------------|----------------
4K        | 4 (64K)        | 4 (64K)       | Yes (unchanged)
16K       | 2 (64K)        | 7 (2M)        | Yes (new)
64K       | 0 (64K)        | 5 (2M)        | Yes (new)

For 16K pages, CONT_PTES = 128 and the contpte size is 2M (order 7).
For 64K pages, CONT_PTES = 32 and the contpte size is 2M (order 5).

Use ilog2(CONT_PTES) instead, which directly evaluates to the
contpte-aligned order for all page sizes. The worst-case waste is
bounded to one folio (up to 2MB - 64KB) at the end of the file, since
page_cache_ra_order() reduces the folio order near EOF to avoid
allocating past i_size.

[1] https://lore.kernel.org/all/20250430145920.3748738-6-ryan.roberts@arm.com/

Signed-off-by: Usama Arif
---
 arch/arm64/include/asm/pgtable.h | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b3e58735c49bd..a1110a33acb35 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1600,12 +1600,11 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
 #define arch_wants_old_prefaulted_pte	cpu_has_hw_af
 
 /*
- * Request exec memory is read into pagecache in at least 64K folios. This size
- * can be contpte-mapped when 4K base pages are in use (16 pages into 1 iTLB
- * entry), and HPA can coalesce it (4 pages into 1 TLB entry) when 16K base
- * pages are in use.
+ * Request exec memory is read into pagecache in contpte-sized folios. The
+ * contpte size is the number of contiguous PTEs that the hardware can coalesce
+ * into a single iTLB entry: 64K for 4K pages, 2M for 16K and 64K pages.
  */
-#define exec_folio_order() ilog2(SZ_64K >> PAGE_SHIFT)
+#define exec_folio_order() ilog2(CONT_PTES)
 
 static inline bool pud_sect_supported(void)
 {
-- 
2.47.3
From: Usama Arif
To: Andrew Morton, ryan.roberts@arm.com, david@kernel.org
Subject: [PATCH 2/4] mm: bypass mmap_miss heuristic for VM_EXEC readahead
Date: Tue, 10 Mar 2026 07:51:15 -0700
Message-ID: <20260310145406.3073394-3-usama.arif@linux.dev>
In-Reply-To: <20260310145406.3073394-1-usama.arif@linux.dev>
References: <20260310145406.3073394-1-usama.arif@linux.dev>

The mmap_miss counter in do_sync_mmap_readahead() tracks whether
readahead is useful for mmap'd file access.
It is incremented by 1 on every page cache miss in
do_sync_mmap_readahead(), and decremented in two places:

- filemap_map_pages(): decremented by N for each of N pages successfully
  mapped via fault-around (pages found already in cache, evidence that
  readahead was useful). Only pages not in the workingset count as hits.

- do_async_mmap_readahead(): decremented by 1 when a page with
  PG_readahead is found in cache.

When the counter exceeds MMAP_LOTSAMISS (100), all readahead is
disabled, including the targeted VM_EXEC readahead [1] that requests
arch-preferred folio orders for contpte mapping.

On arm64 with 64K base pages, both decrement paths are inactive:

1. filemap_map_pages() is never called because fault_around_pages
   (65536 >> PAGE_SHIFT = 1) disables should_fault_around(), which
   requires fault_around_pages > 1. With only 1 page in the
   fault-around window, there is nothing "around" to map.

2. do_async_mmap_readahead() never fires for exec mappings because exec
   readahead sets async_size = 0, so no PG_readahead markers are placed.

With no decrements, mmap_miss increases monotonically past
MMAP_LOTSAMISS after 100 page faults, disabling all subsequent exec
readahead.

Fix this by moving the VM_EXEC readahead block above the mmap_miss
check. The exec readahead path is targeted: it reads a single folio at
the fault location with async_size = 0, not a speculative prefetch, so
the mmap_miss heuristic, designed to throttle wasteful speculative
readahead, should not gate it. The page would need to be faulted in
regardless; the only question is at what order.
[1] https://lore.kernel.org/all/20250430145920.3748738-6-ryan.roberts@arm.com/

Signed-off-by: Usama Arif
---
 mm/filemap.c | 72 ++++++++++++++++++++++++++++------------------------
 1 file changed, 39 insertions(+), 33 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 6cd7974d4adab..c064f31ecec5a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3331,6 +3331,37 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 		}
 	}
 
+	if (vm_flags & VM_EXEC) {
+		/*
+		 * Allow arch to request a preferred minimum folio order for
+		 * executable memory. This can often be beneficial to
+		 * performance if (e.g.) arm64 can contpte-map the folio.
+		 * Executable memory rarely benefits from readahead, due to its
+		 * random access nature, so set async_size to 0.
+		 *
+		 * Limit to the boundaries of the VMA to avoid reading in any
+		 * pad that might exist between sections, which would be a waste
+		 * of memory.
+		 *
+		 * This is targeted readahead (one folio at the fault location),
+		 * not speculative prefetch, so bypass the mmap_miss heuristic
+		 * which would otherwise disable it after MMAP_LOTSAMISS faults.
+		 */
+		struct vm_area_struct *vma = vmf->vma;
+		unsigned long start = vma->vm_pgoff;
+		unsigned long end = start + vma_pages(vma);
+		unsigned long ra_end;
+
+		ra->order = exec_folio_order();
+		ra->start = round_down(vmf->pgoff, 1UL << ra->order);
+		ra->start = max(ra->start, start);
+		ra_end = round_up(ra->start + ra->ra_pages, 1UL << ra->order);
+		ra_end = min(ra_end, end);
+		ra->size = ra_end - ra->start;
+		ra->async_size = 0;
+		goto do_readahead;
+	}
+
 	if (!(vm_flags & VM_SEQ_READ)) {
 		/* Avoid banging the cache line if not needed */
 		mmap_miss = READ_ONCE(ra->mmap_miss);
@@ -3361,40 +3392,15 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 		return fpin;
 	}
 
-	if (vm_flags & VM_EXEC) {
-		/*
-		 * Allow arch to request a preferred minimum folio order for
-		 * executable memory. This can often be beneficial to
-		 * performance if (e.g.) arm64 can contpte-map the folio.
-		 * Executable memory rarely benefits from readahead, due to its
-		 * random access nature, so set async_size to 0.
-		 *
-		 * Limit to the boundaries of the VMA to avoid reading in any
-		 * pad that might exist between sections, which would be a waste
-		 * of memory.
-		 */
-		struct vm_area_struct *vma = vmf->vma;
-		unsigned long start = vma->vm_pgoff;
-		unsigned long end = start + vma_pages(vma);
-		unsigned long ra_end;
-
-		ra->order = exec_folio_order();
-		ra->start = round_down(vmf->pgoff, 1UL << ra->order);
-		ra->start = max(ra->start, start);
-		ra_end = round_up(ra->start + ra->ra_pages, 1UL << ra->order);
-		ra_end = min(ra_end, end);
-		ra->size = ra_end - ra->start;
-		ra->async_size = 0;
-	} else {
-		/*
-		 * mmap read-around
-		 */
-		ra->start = max_t(long, 0, vmf->pgoff - ra->ra_pages / 2);
-		ra->size = ra->ra_pages;
-		ra->async_size = ra->ra_pages / 4;
-		ra->order = 0;
-	}
+	/*
+	 * mmap read-around
+	 */
+	ra->start = max_t(long, 0, vmf->pgoff - ra->ra_pages / 2);
+	ra->size = ra->ra_pages;
+	ra->async_size = ra->ra_pages / 4;
+	ra->order = 0;
 
+do_readahead:
 	fpin = maybe_unlock_mmap_for_io(vmf, fpin);
 	ractl._index = ra->start;
 	page_cache_ra_order(&ractl, ra);
-- 
2.47.3
From: Usama Arif
To: Andrew Morton, ryan.roberts@arm.com, david@kernel.org
Subject: [PATCH 3/4] elf: align ET_DYN base to exec folio order for contpte mapping
Date: Tue, 10 Mar 2026 07:51:16 -0700
Message-ID: <20260310145406.3073394-4-usama.arif@linux.dev>
In-Reply-To: <20260310145406.3073394-1-usama.arif@linux.dev>
References: <20260310145406.3073394-1-usama.arif@linux.dev>

For PIE binaries (ET_DYN), the load address is randomized at PAGE_SIZE
granularity via arch_mmap_rnd(). On arm64 with 64K base pages, this
means the binary is 64K-aligned, but contpte mapping requires 2M
(CONT_PTE_SIZE) alignment.
Without proper virtual address alignment, the readahead patches that
allocate 2M folios with 2M-aligned file offsets and physical addresses
cannot benefit from contpte mapping. The contpte fold check in
contpte_set_ptes() requires the virtual address to be
CONT_PTE_SIZE-aligned, and since the misalignment from vma->vm_start is
constant across all folios in the VMA, no folio gets the contiguous PTE
bit set, resulting in zero iTLB coalescing benefit.

Fix this by bumping the ELF alignment to PAGE_SIZE << exec_folio_order()
when the arch defines a non-zero exec_folio_order(). This ensures
load_bias is aligned to the folio size, so that file-offset-aligned
folios map to properly aligned virtual addresses.

Signed-off-by: Usama Arif
---
 fs/binfmt_elf.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 8e89cc5b28200..2d2b3e9fd474f 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -49,6 +49,7 @@
 #include
 #include
 #include
+#include
 
 #ifndef ELF_COMPAT
 #define ELF_COMPAT 0
@@ -1106,6 +1107,20 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	/* Calculate any requested alignment. */
 	alignment = maximum_alignment(elf_phdata, elf_ex->e_phnum);
 
+	/*
+	 * If the arch requested large folios for exec
+	 * memory via exec_folio_order(), ensure the
+	 * binary is mapped with sufficient alignment so
+	 * that virtual addresses of exec pages are
+	 * aligned to the folio boundary. Without this,
+	 * the hardware cannot coalesce PTEs (e.g. arm64
+	 * contpte) even though the physical memory and
+	 * file offset are correctly aligned.
+	 */
+	if (exec_folio_order())
+		alignment = max(alignment,
+				(unsigned long)PAGE_SIZE << exec_folio_order());
+
 	/**
 	 * DOC: PIE handling
 	 *
-- 
2.47.3
From: Usama Arif
To: Andrew Morton, ryan.roberts@arm.com, david@kernel.org
Subject: [PATCH 4/4] mm: align file-backed mmap to exec folio order in thp_get_unmapped_area
Date: Tue, 10 Mar 2026 07:51:17 -0700
Message-ID: <20260310145406.3073394-5-usama.arif@linux.dev>
In-Reply-To: <20260310145406.3073394-1-usama.arif@linux.dev>
References: <20260310145406.3073394-1-usama.arif@linux.dev>

thp_get_unmapped_area() is the get_unmapped_area callback for
filesystems like ext4, xfs, and btrfs.
It attempts to align the virtual address for PMD_SIZE THP mappings, but
on arm64 with 64K base pages PMD_SIZE is 512M, which is too large for
typical shared library mappings, so the alignment always fails and falls
back to PAGE_SIZE. This means shared libraries loaded by ld.so via
mmap() get 64K-aligned virtual addresses, preventing contpte mapping
even when 2M folios are allocated with properly aligned file offsets and
physical addresses.

Add a fallback in thp_get_unmapped_area_vmflags() that tries
PAGE_SIZE << exec_folio_order() alignment (2M on arm64 with 64K pages)
when PMD_SIZE alignment fails. This is small enough that shared
libraries can qualify, enabling contpte mapping for their executable
segments.

This applies to all file-backed mappings (not just exec). Non-exec
file-backed mappings also benefit from contpte mapping when large folios
are used. Aligning all file-backed mappings ensures that any large folio
in the page cache can be contpte-mapped regardless of the mapping's
protection flags, reducing dTLB misses for read-heavy workloads.

The fallback is gated by exec_folio_order(), which returns 0 by default,
making this a no-op on architectures that don't define it.

Signed-off-by: Usama Arif
---
 mm/huge_memory.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8e2746ea74adf..1c9476a5ed51c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1242,6 +1242,23 @@ unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long add
 	if (ret)
 		return ret;
 
+	/*
+	 * If the arch requested large folios for exec memory, try to align
+	 * to the folio size as a fallback. This is much smaller than PMD_SIZE
+	 * (e.g. 2M vs 512M on arm64 64K pages), so it succeeds for mappings
+	 * that are too small for PMD alignment. Proper alignment ensures that
+	 * the hardware can coalesce PTEs (e.g. arm64 contpte) when large
+	 * folios are mapped.
+	 */
+	if (exec_folio_order()) {
+		unsigned long folio_size = PAGE_SIZE << exec_folio_order();
+
+		ret = __thp_get_unmapped_area(filp, addr, len, off, flags,
+					      folio_size, vm_flags);
+		if (ret)
+			return ret;
+	}
+
 	return mm_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags,
 					    vm_flags);
 }
-- 
2.47.3