From nobody Mon Jun 8 06:35:47 2026 Received: from out-178.mta1.migadu.com (out-178.mta1.migadu.com [95.215.58.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9E7033988F1 for ; Mon, 1 Jun 2026 10:22:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780309351; cv=none; b=E1mE49FS0u5/0MUPobUR+Ui5u7agACXgMx6RuL7ZD4AWf8SuV6ISMfyERVueBJmJAG/O7BUarbyX8+1HDN53G9+ELcZj9PX474IfKnyXrxjQjWWKlGSU0z05AIr1u6FYmGHkJxO5Idn5uDLxU4WLq6AmNk4/xIlejFJLUT/6t5A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780309351; c=relaxed/simple; bh=vMBTbZD/lSHEHatgPrv4IJ3lUE+xbmEaPFoEDjImcNc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KBkngc8S9+pkNHReqI5v7nmuYIR6BWoh0cyvoXRp9ILMzPGMffErrah/OvHNVDomqMP9jJLptQlD0QWAuBvvvZdKsPqNaqw0ZO/KZXMqkyQUPWDlU0IHgjJziPHEeQBNGU4cAfnrHLzrb5lfwPYEaSlBwrT9abaZKg8w+vHDCpM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=PvRsohE/; arc=none smtp.client-ip=95.215.58.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="PvRsohE/" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1780309345;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=ODVkNxNqhWTJOtKAdxjrGaooJs+N26LEMo3Jm83wAZk=;
	b=PvRsohE/ZO3afbXRx9ClOgEAbzgQczwO49YoCB7UQqiTfEVBmtIxaF8Vwuow2kGr0loz9v
	 4wQPHODBVox1noVmxIaEZhk4PUwfCGWuvg+Uu3skr76sAsSqd72Z2jFbLzJ5KIADisNOXt
	 uXoCMcINK8Ff2a53qPVmFtSnw/S+72g=
From: Usama Arif
To: Andrew Morton, david@kernel.org, willy@infradead.org,
	ryan.roberts@arm.com, linux-mm@kvack.org
Cc: pfalcato@suse.de, r@hev.cc, jack@suse.cz, Andrew Donnellan,
	apopple@nvidia.com, baohua@kernel.org, baolin.wang@linux.alibaba.com,
	brauner@kernel.org, catalin.marinas@arm.com, dev.jain@arm.com,
	kees@kernel.org, kevin.brodsky@arm.com, lance.yang@linux.dev,
	Liam R. Howlett, linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	ljs@kernel.org, mhocko@suse.com, npache@redhat.com,
	pasha.tatashin@soleen.com, rmclure@linux.ibm.com, rppt@kernel.org,
	surenb@google.com, vbabka@kernel.org, Al Viro, ziy@nvidia.com,
	hannes@cmpxchg.org, kas@kernel.org, shakeel.butt@linux.dev,
	kernel-team@meta.com, Usama Arif
Subject: [PATCH v7 1/2] mm: bypass mmap_miss heuristic for VM_EXEC readahead
Date: Mon, 1 Jun 2026 03:21:17 -0700
Message-ID: <20260601102205.3985788-2-usama.arif@linux.dev>
In-Reply-To: <20260601102205.3985788-1-usama.arif@linux.dev>
References: <20260601102205.3985788-1-usama.arif@linux.dev>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Migadu-Flow: FLOW_OUT
Content-Type: text/plain; charset="utf-8"

The mmap_miss heuristic is intended to stop speculative mmap readahead
when a file looks like a random-access workload. That does not fit the
VM_EXEC path very well.

VM_EXEC readahead is already constrained differently from ordinary mmap
read-around: it is bounded by the VMA, uses exec_folio_order() to choose
an order useful for executable mappings, and sets async_size to 0 so it
does not create follow-on readahead. When VM_HUGEPAGE is also present,
the larger readahead is an explicit userspace opt-in.

The mmap_miss counter is decremented from cache-hit paths in
do_async_mmap_readahead() and filemap_map_pages(). Those paths are not
always enough to balance the synchronous miss increments for executable
mappings. In particular, when fault-around is effectively disabled, such
as configurations where fault_around_pages is 1, filemap_map_pages() is
not reached from the fault path. The counter can then become a stale
throttle for VM_EXEC mappings and suppress the readahead behavior that
the executable-specific path is trying to provide.

Skip both mmap_miss increments and decrements for VM_EXEC mappings,
matching the existing VM_SEQ_READ treatment and keeping the counter
accounting symmetric.

Signed-off-by: Usama Arif
Reviewed-by: Jan Kara
Reviewed-by: Kiryl Shutsemau (Meta)
Reviewed-by: Oscar Salvador (SUSE)
Reviewed-by: Pedro Falcato
---
 mm/filemap.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index cca20e350c95..a16b33e0fc71 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3339,7 +3339,7 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 		}
 	}
 
-	if (!(vm_flags & VM_SEQ_READ)) {
+	if (!(vm_flags & (VM_SEQ_READ | VM_EXEC))) {
 		/* Avoid banging the cache line if not needed */
 		mmap_miss = READ_ONCE(ra->mmap_miss);
 		if (mmap_miss < MMAP_LOTSAMISS * 10)
@@ -3434,12 +3434,12 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
 	 * times for a single folio and break the balance with mmap_miss
 	 * increase in do_sync_mmap_readahead().
 	 *
-	 * VM_SEQ_READ mappings skip the mmap_miss increment in
+	 * VM_SEQ_READ and VM_EXEC mappings skip the mmap_miss increment in
 	 * do_sync_mmap_readahead(), so skip the decrement here as well to
 	 * keep the counter symmetric.
 	 */
 	if (likely(!folio_test_locked(folio)) &&
-	    !(vmf->vma->vm_flags & VM_SEQ_READ)) {
+	    !(vmf->vma->vm_flags & (VM_SEQ_READ | VM_EXEC))) {
 		mmap_miss = READ_ONCE(ra->mmap_miss);
 		if (mmap_miss)
 			WRITE_ONCE(ra->mmap_miss, --mmap_miss);
@@ -3941,14 +3941,14 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		 * Don't decrease mmap_miss in this scenario to make sure
 		 * we can stop read-ahead.
 		 *
-		 * VM_SEQ_READ mappings skip the mmap_miss increment in
-		 * do_sync_mmap_readahead(), so skip the decrement here as
-		 * well to keep the counter symmetric.
+		 * VM_SEQ_READ and VM_EXEC mappings skip the mmap_miss
+		 * increment in do_sync_mmap_readahead(), so skip the
+		 * decrement here as well to keep the counter symmetric.
 		 */
 		if ((map_ret & VM_FAULT_NOPAGE) &&
 		    !(vmf->flags & FAULT_FLAG_TRIED) &&
 		    !folio_test_workingset(folio) &&
-		    !(vma->vm_flags & VM_SEQ_READ)) {
+		    !(vma->vm_flags & (VM_SEQ_READ | VM_EXEC))) {
 			unsigned short mmap_miss;
 
 			mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
-- 
2.52.0

From nobody Mon Jun 8 06:35:47 2026
Received: from out-189.mta0.migadu.com (out-189.mta0.migadu.com [91.218.175.189])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1268E39A046
	for ; Mon, 1 Jun 2026 10:22:34 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=91.218.175.189
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780309359; cv=none;
	b=UVz1XZgldWT3b/IQBE5nTcdJZ+W0Ya3bPDsgwtqcfyXRt4pQRviRWmsDwsexOrMBJ6x9Sq+rZeTxBXoXKF0KQH/60qtGDDFq+kbUuZybAG+/SjNji6wBpDVp7OmUK4eWssGDJ7as8nklnwbNUhEGrwm96J+4bVlDpY6Q64oKLXI=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
t=1780309359; c=relaxed/simple; bh=BNjqGBL9lpVcrJTmJbHdzy0LxyQsqyqrFTJgqqYpO0s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=G2p8RmOz9d7KRX9Lw/lb27sJro4IV4Sm8p+HSzUFZNTD55ICEYrRwB7rtvebNpwheQgwOBQb5+byBpv4qZIuY1ZlEtmEw1/l787XclELBBulz+gjYV+8ep5hcaL6Izj5p1b4eNwkDeIntFJsj2pruuR5hMiTmCj+uUYLwRhkIKs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=p7dsjWCH; arc=none smtp.client-ip=91.218.175.189 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="p7dsjWCH" X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1780309352; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oPZZdv/NbNMHW5NCcV0iaaUZxovjPZu3VG+R2rQdX+U=; b=p7dsjWCHL7EpFMCaLyQrf7JLU/eEAG83YHK41hodap461fEKBTCpvHcg6QaMvbIVabbEuD rDXBjEqPL6/pOhxj51YoHh1WwSj5xBhRRt9AWX50DXF6rv3q1XKIBhQBBSGoDKv6cg4/fP JZs/KGtesM/+F98iHSLMIvREohG76CM= From: Usama Arif To: Andrew Morton , david@kernel.org, willy@infradead.org, ryan.roberts@arm.com, linux-mm@kvack.org Cc: pfalcato@suse.de, r@hev.cc, jack@suse.cz, Andrew Donnellan , apopple@nvidia.com, baohua@kernel.org, baolin.wang@linux.alibaba.com, brauner@kernel.org, catalin.marinas@arm.com, dev.jain@arm.com, kees@kernel.org, kevin.brodsky@arm.com, lance.yang@linux.dev, Liam R. 
	Howlett, linux-arm-kernel@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	ljs@kernel.org, mhocko@suse.com, npache@redhat.com,
	pasha.tatashin@soleen.com, rmclure@linux.ibm.com, rppt@kernel.org,
	surenb@google.com, vbabka@kernel.org, Al Viro, ziy@nvidia.com,
	hannes@cmpxchg.org, kas@kernel.org, shakeel.butt@linux.dev,
	kernel-team@meta.com, Usama Arif
Subject: [PATCH v7 2/2] mm: use mapping_max_folio_order() for force_thp_readahead order
Date: Mon, 1 Jun 2026 03:21:18 -0700
Message-ID: <20260601102205.3985788-3-usama.arif@linux.dev>
In-Reply-To: <20260601102205.3985788-1-usama.arif@linux.dev>
References: <20260601102205.3985788-1-usama.arif@linux.dev>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
X-Migadu-Flow: FLOW_OUT
Content-Type: text/plain; charset="utf-8"

The force_thp_readahead path in do_sync_mmap_readahead() is gated on
HPAGE_PMD_ORDER <= MAX_PAGECACHE_ORDER and always requests
HPAGE_PMD_ORDER / HPAGE_PMD_NR. On configurations where HPAGE_PMD_ORDER
exceeds MAX_PAGECACHE_ORDER, notably arm64 with a 64K base page size,
VM_HUGEPAGE mappings cannot use this path and fall back to the non-forced
mmap readahead path even when the mapping supports useful large folios.

Enable forced readahead for mappings that support large folios and
request the max folio order supported by the mapping, capped at 2M.

2MB is chosen as the cap because it matches the PMD size on x86_64 and
on arm64 with 4K base pages, so the size/memory-pressure tradeoff for
folios of that size is already well understood. On arm64 with 16K and
64K base page sizes, 2MB is also the contiguous-PTE (contpte) block
size, so the resulting folios coalesce into a single TLB entry and
reduce TLB pressure on the readahead path.
This will result in 32M folios not being faulted in with 16K base page
size for arm64, but with contpte, the performance difference should be
negligible.

The final allocation order may still be clamped by page_cache_ra_order()
to the mapping and request geometry, but this gives VM_HUGEPAGE mappings
on such configurations a large-folio readahead request instead of
dropping back to base-page readahead.

Signed-off-by: Usama Arif
Reviewed-by: Jan Kara
Reviewed-by: Pedro Falcato
---
 mm/filemap.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index a16b33e0fc71..9cf89efaf3f1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3312,14 +3312,26 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 	struct file *fpin = NULL;
 	vm_flags_t vm_flags = vmf->vma->vm_flags;
 	bool force_thp_readahead = false;
+	unsigned int thp_order = 0;
 	unsigned short mmap_miss;
 
 	ractl._max_index = vmf->vma->vm_pgoff + vma_pages(vmf->vma) - 1;
 
 	/* Use the readahead code, even if readahead is disabled */
-	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
-	    (vm_flags & VM_HUGEPAGE) && HPAGE_PMD_ORDER <= MAX_PAGECACHE_ORDER)
-		force_thp_readahead = true;
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && (vm_flags & VM_HUGEPAGE)) {
+		/*
+		 * Cap max THP order at 2MB: this is the common PMD-sized
+		 * hugepage size, and it avoids memory pressure from very
+		 * large forced readahead when mapping_max_folio_order() is
+		 * high (for example, 128MB with 64K base pages on arm64).
+		 */
+		if (mapping_large_folio_support(mapping)) {
+			force_thp_readahead = true;
+			thp_order = min_t(unsigned int,
+					  mapping_max_folio_order(mapping),
+					  get_order(SZ_2M));
+		}
+	}
 
 	if (!force_thp_readahead) {
 		/*
@@ -3354,17 +3366,19 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf)
 	}
 
 	if (force_thp_readahead) {
+		unsigned long folio_nr_pages = 1UL << thp_order;
+
 		fpin = maybe_unlock_mmap_for_io(vmf, fpin);
-		ractl._index &= ~((unsigned long)HPAGE_PMD_NR - 1);
-		ra->size = HPAGE_PMD_NR;
+		ractl._index &= ~(folio_nr_pages - 1);
+		ra->size = folio_nr_pages;
 		/*
-		 * Fetch two PMD folios, so we get the chance to actually
+		 * Fetch two folios so we get the chance to actually
 		 * readahead, unless we've been told not to.
 		 */
 		if (!(vm_flags & VM_RAND_READ))
 			ra->size *= 2;
-		ra->async_size = HPAGE_PMD_NR;
-		ra->order = HPAGE_PMD_ORDER;
+		ra->async_size = folio_nr_pages;
+		ra->order = thp_order;
 		page_cache_ra_order(&ractl, ra);
 		return fpin;
 	}
-- 
2.52.0
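
For readers who want to check the order arithmetic in patch 2/2 outside
the kernel, the following is a minimal userspace sketch, not kernel code.
It assumes get_order(SZ_2M) behaves as "smallest order whose folio covers
2MB" for a given base page shift. The helper names order_for_size(),
capped_thp_order(), align_index() and ra_size_pages() are hypothetical
stand-ins invented for illustration, and max_folio_order stands in for
whatever mapping_max_folio_order(mapping) would return.

```c
#include <assert.h>

#define SZ_2M (2UL * 1024 * 1024)

/*
 * Hypothetical userspace stand-in for get_order(): the smallest order
 * such that (1 << order) pages of (1 << page_shift) bytes cover size.
 */
static unsigned int order_for_size(unsigned long size, unsigned int page_shift)
{
	unsigned long pages;
	unsigned int order = 0;

	assert(page_shift < 8 * sizeof(unsigned long));
	pages = (size + (1UL << page_shift) - 1) >> page_shift;
	while ((1UL << order) < pages)
		order++;
	return order;
}

/*
 * Mirrors the min_t() cap in the patch. max_folio_order is an assumed
 * input standing in for mapping_max_folio_order(mapping).
 */
static unsigned int capped_thp_order(unsigned int max_folio_order,
				     unsigned int page_shift)
{
	unsigned int cap = order_for_size(SZ_2M, page_shift);

	return max_folio_order < cap ? max_folio_order : cap;
}

/* Mirrors "ractl._index &= ~(folio_nr_pages - 1)": round down to a folio boundary. */
static unsigned long align_index(unsigned long index, unsigned int order)
{
	unsigned long folio_nr_pages = 1UL << order;

	return index & ~(folio_nr_pages - 1);
}

/* Mirrors ra->size: one folio, doubled unless VM_RAND_READ is set. */
static unsigned long ra_size_pages(unsigned int order, int rand_read)
{
	unsigned long folio_nr_pages = 1UL << order;

	return rand_read ? folio_nr_pages : 2 * folio_nr_pages;
}
```

Under these assumptions the 2MB cap corresponds to order 9 with 4K base
pages, order 7 with 16K and order 5 with 64K, so a mapping that reports a
larger maximum order is clamped to those values, while a mapping that
reports a smaller one keeps its own limit.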