From nobody Sat Feb 7 08:27:09 2026
From: Zhiguo Zhou
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Cc: willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	muchun.song@linux.dev, osalvador@suse.de, linux-kernel@vger.kernel.org,
	tianyou.li@intel.com, tim.c.chen@linux.intel.com, gang.deng@intel.com,
	Zhiguo Zhou
Subject: [PATCH 1/2] mm/filemap: refactor __filemap_add_folio to separate
 critical section
Date: Mon, 19 Jan 2026 14:50:24 +0800
Message-ID: <20260119065027.918085-2-zhiguo.zhou@intel.com>
In-Reply-To: <20260119065027.918085-1-zhiguo.zhou@intel.com>
References: <20260119065027.918085-1-zhiguo.zhou@intel.com>

This patch refactors __filemap_add_folio() to extract its core critical
section into a new helper function, __filemap_add_folio_xa_locked().
The refactoring preserves the existing functionality while enabling finer
control over locking granularity for callers.

Key changes:
- Move the xarray insertion logic from __filemap_add_folio() into
  __filemap_add_folio_xa_locked()
- Modify __filemap_add_folio() to accept a pre-initialized xa_state and
  an 'xa_locked' parameter
- Update the function signature in the header file accordingly
- Adjust the existing callers (filemap_add_folio() and
  hugetlb_add_to_page_cache()) to use the new interface

The refactoring is functionally equivalent to the previous code:
- When 'xa_locked' is false, __filemap_add_folio() acquires the xarray
  lock internally (existing behavior)
- When 'xa_locked' is true, the caller is responsible for holding the
  xarray lock, and __filemap_add_folio() only executes the critical
  section

This separation prepares for the subsequent patch, which introduces batch
folio insertion so that multiple folios can be added to the page cache
within a single lock hold.

No performance change is expected from this patch alone, as it only
reorganizes code without altering the execution flow.

Reported-by: Gang Deng
Reviewed-by: Tianyou Li
Reviewed-by: Tim Chen
Signed-off-by: Zhiguo Zhou
---
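A minimal usage sketch of the two calling modes (illustrative only, not part
of the diff below; it assumes a locked folio whose memcg charge is already
handled, as filemap_add_folio() arranges before calling in):

	/* Unlocked mode: same behaviour as before this change. */
	XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
	void *shadow = NULL;
	int err;

	/*
	 * xa_lock is taken and dropped internally; allocation retries go
	 * through xas_nomem(), exactly as the old loop did.
	 */
	err = __filemap_add_folio(mapping, folio, &xas, gfp, &shadow, false);

	/*
	 * Locked mode (hypothetical batch caller, see patch 2/2): the caller
	 * holds xa_lock and only the critical section runs.  Failures,
	 * including xarray allocation failures, surface via xas_error();
	 * the caller may retry with xas_nomem() after dropping the lock.
	 */
	xas_lock_irq(&xas);
	err = __filemap_add_folio(mapping, folio, &xas, gfp, &shadow, true);
	xas_unlock_irq(&xas);
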
 include/linux/pagemap.h |   2 +-
 mm/filemap.c            | 173 +++++++++++++++++++++++-----------------
 mm/hugetlb.c            |   3 +-
 3 files changed, 103 insertions(+), 75 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9..59cbf57fb55b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1297,7 +1297,7 @@ loff_t mapping_seek_hole_data(struct address_space *, loff_t start, loff_t end,
 
 /* Must be non-static for BPF error injection */
 int __filemap_add_folio(struct address_space *mapping, struct folio *folio,
-		pgoff_t index, gfp_t gfp, void **shadowp);
+		struct xa_state *xas, gfp_t gfp, void **shadowp, bool xa_locked);
 
 bool filemap_range_has_writeback(struct address_space *mapping,
 				 loff_t start_byte, loff_t end_byte);
diff --git a/mm/filemap.c b/mm/filemap.c
index ebd75684cb0a..eb9e28e5cbd7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -845,95 +845,114 @@ void replace_page_cache_folio(struct folio *old, struct folio *new)
 }
 EXPORT_SYMBOL_GPL(replace_page_cache_folio);
 
-noinline int __filemap_add_folio(struct address_space *mapping,
-		struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp)
+/*
+ * The critical section for storing a folio in an XArray.
+ * Context: Expects xas->xa->xa_lock to be held.
+ */
+static void __filemap_add_folio_xa_locked(struct xa_state *xas,
+		struct address_space *mapping, struct folio *folio, void **shadowp)
 {
-	XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
 	bool huge;
 	long nr;
 	unsigned int forder = folio_order(folio);
+	int order = -1;
+	void *entry, *old = NULL;
 
-	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
-	VM_BUG_ON_FOLIO(folio_test_swapbacked(folio), folio);
-	VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping),
-			folio);
-	mapping_set_update(&xas, mapping);
+	lockdep_assert_held(xas->xa->xa_lock);
 
-	VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
 	huge = folio_test_hugetlb(folio);
 	nr = folio_nr_pages(folio);
 
-	gfp &= GFP_RECLAIM_MASK;
-	folio_ref_add(folio, nr);
-	folio->mapping = mapping;
-	folio->index = xas.xa_index;
-
-	for (;;) {
-		int order = -1;
-		void *entry, *old = NULL;
-
-		xas_lock_irq(&xas);
-		xas_for_each_conflict(&xas, entry) {
-			old = entry;
-			if (!xa_is_value(entry)) {
-				xas_set_err(&xas, -EEXIST);
-				goto unlock;
-			}
-			/*
-			 * If a larger entry exists,
-			 * it will be the first and only entry iterated.
-			 */
-			if (order == -1)
-				order = xas_get_order(&xas);
+	xas_for_each_conflict(xas, entry) {
+		old = entry;
+		if (!xa_is_value(entry)) {
+			xas_set_err(xas, -EEXIST);
+			return;
 		}
+		/*
+		 * If a larger entry exists,
+		 * it will be the first and only entry iterated.
+		 */
+		if (order == -1)
+			order = xas_get_order(xas);
+	}
 
-		if (old) {
-			if (order > 0 && order > forder) {
-				unsigned int split_order = max(forder,
-						xas_try_split_min_order(order));
-
-				/* How to handle large swap entries? */
-				BUG_ON(shmem_mapping(mapping));
-
-				while (order > forder) {
-					xas_set_order(&xas, index, split_order);
-					xas_try_split(&xas, old, order);
-					if (xas_error(&xas))
-						goto unlock;
-					order = split_order;
-					split_order =
-						max(xas_try_split_min_order(
-							    split_order),
-						    forder);
-				}
-				xas_reset(&xas);
+	if (old) {
+		if (order > 0 && order > forder) {
+			unsigned int split_order = max(forder,
+					xas_try_split_min_order(order));
+
+			/* How to handle large swap entries? */
+			BUG_ON(shmem_mapping(mapping));
+
+			while (order > forder) {
+				xas_set_order(xas, xas->xa_index, split_order);
+				xas_try_split(xas, old, order);
+				if (xas_error(xas))
+					return;
+				order = split_order;
+				split_order =
+					max(xas_try_split_min_order(
						    split_order),
+					    forder);
 			}
-			if (shadowp)
-				*shadowp = old;
+			xas_reset(xas);
 		}
+		if (shadowp)
+			*shadowp = old;
+	}
 
-		xas_store(&xas, folio);
-		if (xas_error(&xas))
-			goto unlock;
+	xas_store(xas, folio);
+	if (xas_error(xas))
+		return;
 
-		mapping->nrpages += nr;
+	mapping->nrpages += nr;
 
-		/* hugetlb pages do not participate in page cache accounting */
-		if (!huge) {
-			lruvec_stat_mod_folio(folio, NR_FILE_PAGES, nr);
-			if (folio_test_pmd_mappable(folio))
-				lruvec_stat_mod_folio(folio,
-						NR_FILE_THPS, nr);
-		}
+	/* hugetlb pages do not participate in page cache accounting */
+	if (!huge) {
+		lruvec_stat_mod_folio(folio, NR_FILE_PAGES, nr);
+		if (folio_test_pmd_mappable(folio))
+			lruvec_stat_mod_folio(folio,
+					NR_FILE_THPS, nr);
+	}
+}
 
-unlock:
-		xas_unlock_irq(&xas);
+noinline int __filemap_add_folio(struct address_space *mapping,
+		struct folio *folio, struct xa_state *xas,
+		gfp_t gfp, void **shadowp, bool xa_locked)
+{
+	long nr;
 
-		if (!xas_nomem(&xas, gfp))
-			break;
+	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+	VM_BUG_ON_FOLIO(folio_test_swapbacked(folio), folio);
+	VM_BUG_ON_FOLIO(folio_order(folio) < mapping_min_folio_order(mapping),
+			folio);
+	mapping_set_update(xas, mapping);
+
+	VM_BUG_ON_FOLIO(xas->xa_index & (folio_nr_pages(folio) - 1), folio);
+	nr = folio_nr_pages(folio);
+
+	gfp &= GFP_RECLAIM_MASK;
+	folio_ref_add(folio, nr);
+	folio->mapping = mapping;
+	folio->index = xas->xa_index;
+
+	if (xa_locked) {
+		lockdep_assert_held(xas->xa->xa_lock);
+		__filemap_add_folio_xa_locked(xas, mapping, folio, shadowp);
+	} else {
+		lockdep_assert_not_held(xas->xa->xa_lock);
+		for (;;) {
+			xas_lock_irq(xas);
+			__filemap_add_folio_xa_locked(xas, mapping, folio, shadowp);
+			xas_unlock_irq(xas);
+
+			if (!xas_nomem(xas, gfp))
+				break;
+		}
 	}
 
-	if (xas_error(&xas))
+	if (xas_error(xas))
 		goto error;
 
 	trace_mm_filemap_add_to_page_cache(folio);
@@ -942,12 +961,12 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 	folio->mapping = NULL;
 	/* Leave folio->index set: truncation relies upon it */
 	folio_put_refs(folio, nr);
-	return xas_error(&xas);
+	return xas_error(xas);
 }
 ALLOW_ERROR_INJECTION(__filemap_add_folio, ERRNO);
 
-int filemap_add_folio(struct address_space *mapping, struct folio *folio,
-		pgoff_t index, gfp_t gfp)
+static int _filemap_add_folio(struct address_space *mapping, struct folio *folio,
+		struct xa_state *xas, gfp_t gfp, bool xa_locked)
 {
 	void *shadow = NULL;
 	int ret;
@@ -963,7 +982,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
 		return ret;
 
 	__folio_set_locked(folio);
-	ret = __filemap_add_folio(mapping, folio, index, gfp, &shadow);
+	ret = __filemap_add_folio(mapping, folio, xas, gfp, &shadow, xa_locked);
 	if (unlikely(ret)) {
 		mem_cgroup_uncharge(folio);
 		__folio_clear_locked(folio);
@@ -987,6 +1006,14 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
 	}
 	return ret;
 }
+
+int filemap_add_folio(struct address_space *mapping, struct folio *folio,
+		pgoff_t index, gfp_t gfp)
+{
+	XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
+
+	return _filemap_add_folio(mapping, folio, &xas, gfp, false);
+}
 EXPORT_SYMBOL_GPL(filemap_add_folio);
 
 #ifdef CONFIG_NUMA
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 51273baec9e5..5c6c6b9e463f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5657,10 +5657,11 @@ int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping
 	struct inode *inode = mapping->host;
 	struct hstate *h = hstate_inode(inode);
 	int err;
+	XA_STATE_ORDER(xas, &mapping->i_pages, idx, folio_order(folio));
 
 	idx <<= huge_page_order(h);
 	__folio_set_locked(folio);
-	err = __filemap_add_folio(mapping, folio, idx, GFP_KERNEL, NULL);
+	err = __filemap_add_folio(mapping, folio, &xas, GFP_KERNEL, NULL, false);
 
 	if (unlikely(err)) {
 		__folio_clear_locked(folio);
-- 
2.43.0

From nobody Sat Feb 7 08:27:09 2026
From: Zhiguo Zhou
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Cc: willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	muchun.song@linux.dev, osalvador@suse.de, linux-kernel@vger.kernel.org,
	tianyou.li@intel.com, tim.c.chen@linux.intel.com, gang.deng@intel.com,
	Zhiguo Zhou
Subject: [PATCH 2/2] mm/readahead: batch folio insertion to improve
 performance
Date: Mon, 19 Jan 2026 14:50:25 +0800
Message-ID: <20260119065027.918085-3-zhiguo.zhou@intel.com>
In-Reply-To: <20260119065027.918085-1-zhiguo.zhou@intel.com>
References: <20260119065027.918085-1-zhiguo.zhou@intel.com>

When the `readahead` syscall is invoked, `page_cache_ra_unbounded` inserts
folios into the page cache (`xarray`) one at a time. Because the
`xa_lock`-protected critical section can be scheduled across different
cores, the cost of cacheline transfers, together with lock contention, can
account for a significant part of the execution time.

To optimize the performance of `readahead`, the folio insertions are
batched into a single critical section. This patch introduces
`filemap_add_folio_range()`, which allows inserting an array of folios
into a contiguous range of the `xarray` while taking the lock only once.
`page_cache_ra_unbounded` is updated to pre-allocate the folios and use
this new batching interface, falling back to the original single-folio
approach when memory is under pressure.

The performance of RocksDB's `db_bench` for the `readseq` subcase [1] was
measured on a 32-vCPU instance [2]:

- The IPC of `page_cache_ra_unbounded` (excluding the `raw_spin_lock_irq`
  overhead) improved by 2.18x.
- Throughput (ops/sec) improved by 1.51x.
- Latency dropped significantly: P50 by 63.9%, P75 by 42.1%, P99 by 31.4%.
+------------+------------------+-----------------+-----------+
| Percentile | Latency (before) | Latency (after) | Reduction |
+------------+------------------+-----------------+-----------+
| P50        | 6.15 usec        | 2.22 usec       | 63.92%    |
| P75        | 13.38 usec       | 7.75 usec       | 42.09%    |
| P99        | 507.95 usec      | 348.54 usec     | 31.38%    |
+------------+------------------+-----------------+-----------+

[1] Command to launch the test
    ./db_bench --benchmarks=readseq,stats --use_existing_db=1
        --num_multi_db=32 --threads=32 --num=1600000
        --value_size=8192 --cache_size=16GB

[2] Hardware: Intel Ice Lake server
    Kernel  : v6.19-rc5
    Memory  : 256GB

Reported-by: Gang Deng
Reviewed-by: Tianyou Li
Reviewed-by: Tim Chen
Signed-off-by: Zhiguo Zhou
---
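A minimal sketch of the intended calling pattern for
filemap_add_folio_range() (illustrative only; example_batch_insert() is a
made-up name, and the sketch assumes mapping_min_folio_order(mapping) == 0,
i.e. plain order-0 folios):

	/* Try to populate [start, start + nr) of @mapping with order-0 folios. */
	static long example_batch_insert(struct address_space *mapping,
					 pgoff_t start, unsigned long nr, gfp_t gfp)
	{
		struct folio **folios;
		unsigned long i, allocated = 0;
		long ret;

		folios = kvcalloc(nr, sizeof(*folios), GFP_KERNEL);
		if (!folios)
			return -ENOMEM;

		while (allocated < nr) {
			folios[allocated] = filemap_alloc_folio(gfp, 0);
			if (!folios[allocated])
				break;
			allocated++;
		}
		if (!allocated) {
			kvfree(folios);
			return -ENOMEM;
		}

		/*
		 * Slots that were inserted come back NULL; whatever is left
		 * still belongs to us and must be released.
		 */
		ret = filemap_add_folio_range(mapping, folios, start,
					      start + allocated, gfp);

		for (i = 0; i < allocated; i++)
			if (folios[i])
				folio_put(folios[i]);
		kvfree(folios);

		return ret;	/* > 0: pages inserted; < 0: nothing inserted */
	}

On partial success the return value says how many leading pages made it in;
the readahead caller below uses that count to advance its window.
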
 include/linux/pagemap.h |   2 +
 mm/filemap.c            |  65 +++++++++++
 mm/readahead.c          | 196 +++++++++++++++++++++++++++++++---------
 3 files changed, 222 insertions(+), 41 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 59cbf57fb55b..62cb90471372 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1286,6 +1286,8 @@ int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
 		pgoff_t index, gfp_t gfp);
 int filemap_add_folio(struct address_space *mapping, struct folio *folio,
 		pgoff_t index, gfp_t gfp);
+long filemap_add_folio_range(struct address_space *mapping, struct folio **folios,
+		pgoff_t start, pgoff_t end, gfp_t gfp);
 void filemap_remove_folio(struct folio *folio);
 void __filemap_remove_folio(struct folio *folio, void *shadow);
 void replace_page_cache_folio(struct folio *old, struct folio *new);
diff --git a/mm/filemap.c b/mm/filemap.c
index eb9e28e5cbd7..d0d79599c7fa 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1016,6 +1016,71 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
 }
 EXPORT_SYMBOL_GPL(filemap_add_folio);
 
+/**
+ * filemap_add_folio_range - add folios to the page range [start, end) of filemap.
+ * @mapping: The address space structure to add folios to.
+ * @folios: The array of folios to add to page cache.
+ * @start: The starting page cache index.
+ * @end: The ending page cache index (exclusive).
+ * @gfp: The memory allocator flags to use.
+ *
+ * This function adds folios to mapping->i_pages with contiguous indices.
+ *
+ * If an entry for an index in the range [start, end) already exists, a folio is
+ * invalid, or _filemap_add_folio fails, this function aborts. All folios up
+ * to the point of failure will have been inserted, the rest are left uninserted.
+ *
+ * Return: If the pages are partially or fully added to the page cache, the number
+ * of pages (instead of folios) is returned. Otherwise, if no pages are inserted,
+ * the error number is returned.
+ */
+long filemap_add_folio_range(struct address_space *mapping, struct folio **folios,
+		pgoff_t start, pgoff_t end, gfp_t gfp)
+{
+	int ret;
+	XA_STATE_ORDER(xas, &mapping->i_pages, start, mapping_min_folio_order(mapping));
+	unsigned long min_nrpages = mapping_min_folio_nrpages(mapping);
+
+	do {
+		xas_lock_irq(&xas);
+
+		while (xas.xa_index < end) {
+			unsigned long index = (xas.xa_index - start) / min_nrpages;
+			struct folio *folio;
+
+			folio = xas_load(&xas);
+			if (folio && !xa_is_value(folio)) {
+				ret = -EEXIST;
+				break;
+			}
+
+			folio = folios[index];
+			if (!folio) {
+				ret = -EINVAL;
+				break;
+			}
+
+			ret = _filemap_add_folio(mapping, folio, &xas, gfp, true);
+
+			if (unlikely(ret))
+				break;
+
+			/*
+			 * On successful insertion, the folio's array entry is set to NULL.
+			 * The caller is responsible for reclaiming any uninserted folios.
+			 */
+			folios[index] = NULL;
+			for (unsigned int i = 0; i < min_nrpages; i++)
+				xas_next(&xas);
+		}
+
+		xas_unlock_irq(&xas);
+	} while (xas_nomem(&xas, gfp & GFP_RECLAIM_MASK));
+
+	return xas.xa_index > start ? (long) xas.xa_index - start : ret;
+}
+EXPORT_SYMBOL_GPL(filemap_add_folio_range);
+
 #ifdef CONFIG_NUMA
 struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order,
 		struct mempolicy *policy)
diff --git a/mm/readahead.c b/mm/readahead.c
index b415c9969176..4fe87b467d61 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -193,6 +193,149 @@ static struct folio *ractl_alloc_folio(struct readahead_control *ractl,
 	return folio;
 }
 
+static void ractl_free_folios(struct folio **folios, unsigned long folio_count)
+{
+	unsigned long i;
+
+	if (!folios)
+		return;
+
+	for (i = 0; i < folio_count; ++i) {
+		if (folios[i])
+			folio_put(folios[i]);
+	}
+	kvfree(folios);
+}
+
+static struct folio **ractl_alloc_folios(struct readahead_control *ractl,
+		gfp_t gfp_mask, unsigned int order,
+		unsigned long folio_count)
+{
+	struct folio **folios;
+	unsigned long i;
+
+	folios = kvcalloc(folio_count, sizeof(struct folio *), GFP_KERNEL);
+
+	if (!folios)
+		return NULL;
+
+	for (i = 0; i < folio_count; ++i) {
+		struct folio *folio = ractl_alloc_folio(ractl, gfp_mask, order);
+
+		if (!folio)
+			break;
+		folios[i] = folio;
+	}
+
+	if (i != folio_count) {
+		ractl_free_folios(folios, i);
+		i = 0;
+		folios = NULL;
+	}
+
+	return folios;
+}
+
+static void ra_fill_folios_batched(struct readahead_control *ractl,
+		struct folio **folios, unsigned long nr_to_read,
+		unsigned long start_index, unsigned long mark,
+		gfp_t gfp_mask)
+{
+	struct address_space *mapping = ractl->mapping;
+	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
+	unsigned long added_folios = 0;
+	unsigned long i = 0;
+
+	while (i < nr_to_read) {
+		long ret;
+		unsigned long added_nrpages;
+
+		ret = filemap_add_folio_range(mapping, folios + added_folios,
+					      start_index + i,
+					      start_index + nr_to_read,
+					      gfp_mask);
+
+		if (unlikely(ret < 0)) {
+			if (ret == -ENOMEM)
+				break;
+			read_pages(ractl);
+			ractl->_index += min_nrpages;
+			i = ractl->_index + ractl->_nr_pages - start_index;
+			continue;
+		}
+
+		if (unlikely(ret == 0))
+			break;
+
+		added_nrpages = ret;
+		/*
+		 * `added_nrpages` is a multiple of min_nrpages.
+		 */
+		added_folios += added_nrpages / min_nrpages;
+
+		if (i <= mark && mark < i + added_nrpages)
+			folio_set_readahead(xa_load(&mapping->i_pages,
+						    start_index + mark));
+		for (unsigned long j = i; j < i + added_nrpages; j += min_nrpages)
+			ractl->_workingset |= folio_test_workingset(xa_load(&mapping->i_pages,
+									    start_index + j));
+		ractl->_nr_pages += added_nrpages;
+
+		i += added_nrpages;
+	}
+}
+
+static void ra_fill_folios_single(struct readahead_control *ractl,
+		unsigned long nr_to_read,
+		unsigned long start_index, unsigned long mark,
+		gfp_t gfp_mask)
+{
+	struct address_space *mapping = ractl->mapping;
+	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
+	unsigned long i = 0;
+
+	while (i < nr_to_read) {
+		struct folio *folio = xa_load(&mapping->i_pages, start_index + i);
+		int ret;
+
+		if (folio && !xa_is_value(folio)) {
+			/*
+			 * Page already present? Kick off the current batch
+			 * of contiguous pages before continuing with the
+			 * next batch. This page may be the one we would
+			 * have intended to mark as Readahead, but we don't
+			 * have a stable reference to this page, and it's
+			 * not worth getting one just for that.
+			 */
+			read_pages(ractl);
+			ractl->_index += min_nrpages;
+			i = ractl->_index + ractl->_nr_pages - start_index;
+			continue;
+		}
+
+		folio = ractl_alloc_folio(ractl, gfp_mask,
+					  mapping_min_folio_order(mapping));
+		if (!folio)
+			break;
+
+		ret = filemap_add_folio(mapping, folio, start_index + i, gfp_mask);
+		if (ret < 0) {
+			folio_put(folio);
+			if (ret == -ENOMEM)
+				break;
+			read_pages(ractl);
+			ractl->_index += min_nrpages;
+			i = ractl->_index + ractl->_nr_pages - start_index;
+			continue;
+		}
+		if (i == mark)
+			folio_set_readahead(folio);
+		ractl->_workingset |= folio_test_workingset(folio);
+		ractl->_nr_pages += min_nrpages;
+		i += min_nrpages;
+	}
+}
+
 /**
  * page_cache_ra_unbounded - Start unchecked readahead.
  * @ractl: Readahead control.
@@ -213,8 +356,10 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	struct address_space *mapping = ractl->mapping;
 	unsigned long index = readahead_index(ractl);
 	gfp_t gfp_mask = readahead_gfp_mask(mapping);
-	unsigned long mark = ULONG_MAX, i = 0;
+	unsigned long mark = ULONG_MAX;
 	unsigned int min_nrpages = mapping_min_folio_nrpages(mapping);
+	struct folio **folios = NULL;
+	unsigned long alloc_folios = 0;
 
 	/*
 	 * Partway through the readahead operation, we will have added
@@ -249,49 +394,18 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
 	}
 	nr_to_read += readahead_index(ractl) - index;
 	ractl->_index = index;
-
+	alloc_folios = DIV_ROUND_UP(nr_to_read, min_nrpages);
 	/*
 	 * Preallocate as many pages as we will need.
 	 */
-	while (i < nr_to_read) {
-		struct folio *folio = xa_load(&mapping->i_pages, index + i);
-		int ret;
-
-		if (folio && !xa_is_value(folio)) {
-			/*
-			 * Page already present? Kick off the current batch
-			 * of contiguous pages before continuing with the
-			 * next batch. This page may be the one we would
-			 * have intended to mark as Readahead, but we don't
-			 * have a stable reference to this page, and it's
-			 * not worth getting one just for that.
-			 */
-			read_pages(ractl);
-			ractl->_index += min_nrpages;
-			i = ractl->_index + ractl->_nr_pages - index;
-			continue;
-		}
-
-		folio = ractl_alloc_folio(ractl, gfp_mask,
-					  mapping_min_folio_order(mapping));
-		if (!folio)
-			break;
-
-		ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
-		if (ret < 0) {
-			folio_put(folio);
-			if (ret == -ENOMEM)
-				break;
-			read_pages(ractl);
-			ractl->_index += min_nrpages;
-			i = ractl->_index + ractl->_nr_pages - index;
-			continue;
-		}
-		if (i == mark)
-			folio_set_readahead(folio);
-		ractl->_workingset |= folio_test_workingset(folio);
-		ractl->_nr_pages += min_nrpages;
-		i += min_nrpages;
+	folios = ractl_alloc_folios(ractl, gfp_mask,
+				    mapping_min_folio_order(mapping),
+				    alloc_folios);
+	if (folios) {
+		ra_fill_folios_batched(ractl, folios, nr_to_read, index, mark, gfp_mask);
+		ractl_free_folios(folios, alloc_folios);
+	} else {
+		ra_fill_folios_single(ractl, nr_to_read, index, mark, gfp_mask);
 	}
 
 	/*
-- 
2.43.0