From nobody Fri Dec 19 10:42:15 2025
Date: Thu, 9 Oct 2025 18:19:44 -0700
In-Reply-To: <20251010011951.2136980-1-surenb@google.com>
References: <20251010011951.2136980-1-surenb@google.com>
Message-ID: <20251010011951.2136980-2-surenb@google.com>
Subject: [PATCH 1/8] mm: implement cleancache
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: david@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz, alexandru.elisei@arm.com, peterx@redhat.com, sj@kernel.org, rppt@kernel.org, mhocko@suse.com, corbet@lwn.net, axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, jack@suse.cz, willy@infradead.org, m.szyprowski@samsung.com, robin.murphy@arm.com, hannes@cmpxchg.org, zhengqi.arch@bytedance.com, shakeel.butt@linux.dev, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, minchan@kernel.org, surenb@google.com, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev, Minchan Kim

Cleancache can be thought of as a page-granularity victim cache for clean
pages that the kernel's pageframe replacement algorithm (PFRA) would like
to keep around, but can't since there isn't enough memory. So when the
PFRA "evicts" a page, it first attempts to use cleancache code to put the
data contained in that page into "transcendent memory", memory that is not
directly accessible or addressable by the kernel. Later, when the system
needs to access a page in a file on disk, it first checks cleancache to
see if it already contains it; if it does, the page of data is copied into
the kernel and a disk access is avoided.

The patchset borrows the idea, some code and documentation from the
previous cleancache implementation, but as opposed to being a thin
pass-through layer, it now implements housekeeping code to associate
cleancache pages with their inodes and to handle page pools donated by the
cleancache backends. It also avoids intrusive hooks into filesystem code,
limiting itself to hooks in the mm reclaim and page-in paths and two hooks
to detect new filesystem mount/unmount events.
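For illustration only (not part of the patch): a hypothetical backend
might use the Backend API added here roughly as sketched below. The
function names and the "example" pool name are invented, and the folios
are assumed to be backend-owned and unused (zero refcount), as
cleancache_backend_put_folios() requires.

/*
 * Hypothetical backend sketch: donate a list of backend-owned, unused
 * folios to cleancache and later recall one of them.
 */
#include <linux/cleancache.h>
#include <linux/list.h>

static int example_pool_id;

/* Donate folios; entries still on 'folios' afterwards were rejected. */
static int example_donate_folios(struct list_head *folios)
{
	example_pool_id = cleancache_backend_register_pool("example");
	if (example_pool_id < 0)
		return example_pool_id;

	return cleancache_backend_put_folios(example_pool_id, folios);
}

/* Take a folio back, e.g. before returning the memory to its owner. */
static int example_recall_folio(struct folio *folio)
{
	return cleancache_backend_get_folio(example_pool_id, folio);
}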
Signed-off-by: Suren Baghdasaryan Signed-off-by: Minchan Kim --- MAINTAINERS | 7 + block/bdev.c | 6 + fs/super.c | 3 + include/linux/cleancache.h | 67 +++ include/linux/fs.h | 6 + include/linux/pagemap.h | 1 + mm/Kconfig | 17 + mm/Makefile | 1 + mm/cleancache.c | 869 +++++++++++++++++++++++++++++++++++++ mm/filemap.c | 26 ++ mm/truncate.c | 4 + mm/vmscan.c | 1 + 12 files changed, 1008 insertions(+) create mode 100644 include/linux/cleancache.h create mode 100644 mm/cleancache.c diff --git a/MAINTAINERS b/MAINTAINERS index 8f5208ad442b..de7a89cd44a0 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6049,6 +6049,13 @@ F: scripts/Makefile.clang F: scripts/clang-tools/ K: \b(?i:clang|llvm)\b =20 +CLEANCACHE +M: Suren Baghdasaryan +L: linux-mm@kvack.org +S: Maintained +F: include/linux/cleancache.h +F: mm/cleancache.c + CLK API M: Russell King L: linux-clk@vger.kernel.org diff --git a/block/bdev.c b/block/bdev.c index 810707cca970..8411b639d6db 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -28,6 +28,7 @@ #include #include #include +#include #include "../fs/internal.h" #include "blk.h" =20 @@ -101,6 +102,11 @@ void invalidate_bdev(struct block_device *bdev) lru_add_drain_all(); /* make sure all lru add caches are flushed */ invalidate_mapping_pages(mapping, 0, -1); } + /* + * 99% of the time, we don't need to flush the cleancache on the bdev. + * But, for the strange corners, lets be cautious + */ + cleancache_invalidate_inode(mapping, mapping->host); } EXPORT_SYMBOL(invalidate_bdev); =20 diff --git a/fs/super.c b/fs/super.c index 5bab94fb7e03..5639dc069528 100644 --- a/fs/super.c +++ b/fs/super.c @@ -31,6 +31,7 @@ #include #include #include +#include #include #include #include @@ -374,6 +375,7 @@ static struct super_block *alloc_super(struct file_syst= em_type *type, int flags, s->s_time_gran =3D 1000000000; s->s_time_min =3D TIME64_MIN; s->s_time_max =3D TIME64_MAX; + cleancache_add_fs(s); =20 s->s_shrink =3D shrinker_alloc(SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE, "sb-%s", type->name); @@ -469,6 +471,7 @@ void deactivate_locked_super(struct super_block *s) { struct file_system_type *fs =3D s->s_type; if (atomic_dec_and_test(&s->s_active)) { + cleancache_remove_fs(s); shrinker_free(s->s_shrink); fs->kill_sb(s); =20 diff --git a/include/linux/cleancache.h b/include/linux/cleancache.h new file mode 100644 index 000000000000..458a7a25a8af --- /dev/null +++ b/include/linux/cleancache.h @@ -0,0 +1,67 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_CLEANCACHE_H +#define _LINUX_CLEANCACHE_H + +#include +#include +#include + +/* super_block->cleancache_id value for an invalid ID */ +#define CLEANCACHE_ID_INVALID -1 + +#define CLEANCACHE_KEY_MAX 6 + + +#ifdef CONFIG_CLEANCACHE + +/* Hooks into MM and FS */ +void cleancache_add_fs(struct super_block *sb); +void cleancache_remove_fs(struct super_block *sb); +bool cleancache_store_folio(struct inode *inode, struct folio *folio); +bool cleancache_restore_folio(struct inode *inode, struct folio *folio); +bool cleancache_invalidate_folio(struct address_space *mapping, + struct inode *inode, struct folio *folio); +bool cleancache_invalidate_inode(struct address_space *mapping, + struct inode *inode); + +/* + * Backend API + * + * Cleancache does not touch page reference. Page refcount should be 1 when + * page is placed or returned into cleancache and pages obtained from + * cleancache will also have their refcount at 1. 
+ */ +int cleancache_backend_register_pool(const char *name); +int cleancache_backend_get_folio(int pool_id, struct folio *folio); +int cleancache_backend_put_folio(int pool_id, struct folio *folio); +int cleancache_backend_put_folios(int pool_id, struct list_head *folios); + +#else /* CONFIG_CLEANCACHE */ + +static inline void cleancache_add_fs(struct super_block *sb) {} +static inline void cleancache_remove_fs(struct super_block *sb) {} +static inline bool cleancache_store_folio(struct inode *inode, + struct folio *folio) + { return false; } +static inline bool cleancache_restore_folio(struct inode *inode, + struct folio *folio) + { return false; } +static inline bool cleancache_invalidate_folio(struct address_space *mappi= ng, + struct inode *inode, + struct folio *folio) + { return false; } +static inline bool cleancache_invalidate_inode(struct address_space *mappi= ng, + struct inode *inode) + { return false; } +static inline int cleancache_backend_register_pool(const char *name) + { return -EOPNOTSUPP; } +static inline int cleancache_backend_get_folio(int pool_id, struct folio *= folio) + { return -EOPNOTSUPP; } +static inline int cleancache_backend_put_folio(int pool_id, struct folio *= folio) + { return -EOPNOTSUPP; } +static inline int cleancache_backend_put_folios(int pool_id, struct list_h= ead *folios) + { return -EOPNOTSUPP; } + +#endif /* CONFIG_CLEANCACHE */ + +#endif /* _LINUX_CLEANCACHE_H */ diff --git a/include/linux/fs.h b/include/linux/fs.h index 43f3ef76db46..a24e36913cda 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1582,6 +1582,12 @@ struct super_block { =20 spinlock_t s_inode_wblist_lock; struct list_head s_inodes_wb; /* writeback inodes */ +#ifdef CONFIG_CLEANCACHE + /* + * Saved identifier for cleancache (CLEANCACHE_ID_INVALID means none) + */ + int cleancache_id; +#endif } __randomize_layout; =20 static inline struct user_namespace *i_user_ns(const struct inode *inode) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 09b581c1d878..7d9fa68ad0c9 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -1269,6 +1269,7 @@ int add_to_page_cache_lru(struct page *page, struct a= ddress_space *mapping, int filemap_add_folio(struct address_space *mapping, struct folio *folio, pgoff_t index, gfp_t gfp); void filemap_remove_folio(struct folio *folio); +void store_into_cleancache(struct address_space *mapping, struct folio *fo= lio); void __filemap_remove_folio(struct folio *folio, void *shadow); void replace_page_cache_folio(struct folio *old, struct folio *new); void delete_from_page_cache_batch(struct address_space *mapping, diff --git a/mm/Kconfig b/mm/Kconfig index 0e26f4fc8717..7e2482c522a0 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -948,6 +948,23 @@ config USE_PERCPU_NUMA_NODE_ID config HAVE_SETUP_PER_CPU_AREA bool =20 +config CLEANCACHE + bool "Enable cleancache to cache clean pages" + help + Cleancache can be thought of as a page-granularity victim cache + for clean pages that the kernel's pageframe replacement algorithm + (PFRA) would like to keep around, but can't since there isn't enough + memory. So when the PFRA "evicts" a page, it first attempts to use + cleancache code to put the data contained in that page into + "transcendent memory", memory that is not directly accessible or + addressable by the kernel and is of unknown and possibly + time-varying size. 
When system wishes to access a page in a file + on disk, it first checks cleancache to see if it already contains + it; if it does, the page is copied into the kernel and a disk + access is avoided. + + If unsure, say N. + config CMA bool "Contiguous Memory Allocator" depends on MMU diff --git a/mm/Makefile b/mm/Makefile index 21abb3353550..b78073b87aea 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -146,3 +146,4 @@ obj-$(CONFIG_SHRINKER_DEBUG) +=3D shrinker_debug.o obj-$(CONFIG_EXECMEM) +=3D execmem.o obj-$(CONFIG_TMPFS_QUOTA) +=3D shmem_quota.o obj-$(CONFIG_PT_RECLAIM) +=3D pt_reclaim.o +obj-$(CONFIG_CLEANCACHE) +=3D cleancache.o diff --git a/mm/cleancache.c b/mm/cleancache.c new file mode 100644 index 000000000000..0023962de024 --- /dev/null +++ b/mm/cleancache.c @@ -0,0 +1,869 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Lock nesting: + * ccinode->folios.xa_lock + * fs->hash_lock + * + * ccinode->folios.xa_lock + * pool->lock + */ + +#define INODE_HASH_BITS 6 + +/* represents each file system instance hosted by the cleancache */ +struct cleancache_fs { + refcount_t ref_count; + DECLARE_HASHTABLE(inode_hash, INODE_HASH_BITS); + spinlock_t hash_lock; /* protects inode_hash */ + struct rcu_head rcu; +}; + +/* + * @cleancache_inode represents each ccinode in @cleancache_fs + * + * The cleancache_inode will be freed by RCU when the last folio from xarr= ay + * is freed, except for invalidate_inode() case. + */ +struct cleancache_inode { + struct inode *inode; + struct hlist_node hash; + refcount_t ref_count; + struct xarray folios; /* protected by folios.xa_lock */ + struct cleancache_fs *fs; + struct rcu_head rcu; +}; + +/* Cleancache backend memory pool */ +struct cleancache_pool { + struct list_head folio_list; + spinlock_t lock; /* protects folio_list */ +}; + +#define CLEANCACHE_MAX_POOLS 64 + +static DEFINE_IDR(fs_idr); +static DEFINE_SPINLOCK(fs_lock); +static struct kmem_cache *slab_inode; /* cleancache_inode slab */ +static struct cleancache_pool pools[CLEANCACHE_MAX_POOLS]; +static atomic_t nr_pools =3D ATOMIC_INIT(0); +static DEFINE_SPINLOCK(pools_lock); /* protects pools */ + +/* + * Folio attributes: + * folio->_mapcount - pool_id + * folio->mapping - ccinode reference or NULL if folio is unused + * folio->index - file offset + * + * Locking: + * pool_id is set when folio gets donated and cleared when it's revoked, + * therefore no locking is performed. + * folio->mapping and folio->index are accessed under pool->lock. 
+ */ +static inline void init_cleancache_folio(struct folio *folio, int pool_id) +{ + atomic_set(&folio->_mapcount, pool_id); + folio->mapping =3D NULL; + folio->index =3D 0; +} + +static inline void clear_cleancache_folio(struct folio *folio) +{ + atomic_set(&folio->_mapcount, -1); +} + +static inline int folio_pool_id(struct folio *folio) +{ + return atomic_read(&folio->_mapcount); +} + +static inline struct cleancache_pool *folio_pool(struct folio *folio) +{ + return &pools[folio_pool_id(folio)]; +} + +static void attach_folio(struct folio *folio, struct cleancache_inode *cci= node, + unsigned long offset) +{ + lockdep_assert_held(&(folio_pool(folio)->lock)); + + folio->mapping =3D (struct address_space *)ccinode; + folio->index =3D offset; +} + +static void detach_folio(struct folio *folio) +{ + lockdep_assert_held(&(folio_pool(folio)->lock)); + + folio->mapping =3D NULL; + folio->index =3D 0; +} + +static void folio_attachment(struct folio *folio, struct cleancache_inode = **ccinode, + unsigned long *offset) +{ + lockdep_assert_held(&(folio_pool(folio)->lock)); + + *ccinode =3D (struct cleancache_inode *)folio->mapping; + *offset =3D folio->index; +} + +static inline bool is_folio_attached(struct folio *folio) +{ + lockdep_assert_held(&(folio_pool(folio)->lock)); + + return folio->mapping !=3D NULL; +} + +/* + * Folio pool helpers. + * Only detached folios are stored in the pool->folio_list. + * + * Locking: + * pool->folio_list is accessed under pool->lock. + */ +static void add_folio_to_pool(struct folio *folio, struct cleancache_pool = *pool) +{ + lockdep_assert_held(&pool->lock); + VM_BUG_ON(folio_pool(folio) !=3D pool); + VM_BUG_ON(!list_empty(&folio->lru)); + VM_BUG_ON(is_folio_attached(folio)); + + list_add(&folio->lru, &pool->folio_list); +} + +static struct folio *remove_folio_from_pool(struct folio *folio, struct cl= eancache_pool *pool) +{ + lockdep_assert_held(&pool->lock); + VM_BUG_ON(folio_pool(folio) !=3D pool); + + if (is_folio_attached(folio)) + return NULL; + + list_del_init(&folio->lru); + + return folio; +} + +static struct folio *pick_folio_from_any_pool(void) +{ + struct cleancache_pool *pool; + struct folio *folio =3D NULL; + int count; + + /* nr_pools can only increase, so the following loop is safe */ + count =3D atomic_read_acquire(&nr_pools); + for (int i =3D 0; i < count; i++) { + pool =3D &pools[i]; + spin_lock(&pool->lock); + if (!list_empty(&pool->folio_list)) { + folio =3D list_last_entry(&pool->folio_list, + struct folio, lru); + WARN_ON(!remove_folio_from_pool(folio, pool)); + spin_unlock(&pool->lock); + break; + } + spin_unlock(&pool->lock); + } + + return folio; +} + +/* FS helpers */ +static struct cleancache_fs *get_fs(int fs_id) +{ + struct cleancache_fs *fs; + + rcu_read_lock(); + fs =3D idr_find(&fs_idr, fs_id); + if (fs && !refcount_inc_not_zero(&fs->ref_count)) + fs =3D NULL; + rcu_read_unlock(); + + return fs; +} + +static unsigned int invalidate_inode(struct cleancache_fs *fs, + struct inode *inode); + +static void put_fs(struct cleancache_fs *fs) +{ + if (refcount_dec_and_test(&fs->ref_count)) { + struct cleancache_inode *ccinode; + struct hlist_node *tmp; + int cursor; + + /* + * There are no concurrent RCU walkers because they + * would have taken fs reference. + * We don't need to hold fs->hash_lock because there + * are no other users and no way to reach fs. 
+ */ + hash_for_each_safe(fs->inode_hash, cursor, tmp, ccinode, hash) + invalidate_inode(fs, ccinode->inode); + /* + * Don't need to synchronize_rcu() and wait for all inodes to be + * freed because RCU read walkers can't take fs refcount anymore + * to start their walk. + */ + kfree_rcu(fs, rcu); + } +} + +/* cleancache_inode helpers. */ +static struct cleancache_inode *alloc_cleancache_inode(struct cleancache_f= s *fs, + struct inode *inode) +{ + struct cleancache_inode *ccinode; + + ccinode =3D kmem_cache_alloc(slab_inode, GFP_ATOMIC|__GFP_NOWARN); + if (ccinode) { + ccinode->inode =3D inode; + xa_init_flags(&ccinode->folios, XA_FLAGS_LOCK_IRQ); + INIT_HLIST_NODE(&ccinode->hash); + ccinode->fs =3D fs; + refcount_set(&ccinode->ref_count, 1); + } + + return ccinode; +} + +static void inode_free_rcu(struct rcu_head *rcu) +{ + struct cleancache_inode *ccinode; + + ccinode =3D container_of(rcu, struct cleancache_inode, rcu); + VM_BUG_ON(!xa_empty(&ccinode->folios)); + kmem_cache_free(slab_inode, ccinode); +} + +static inline bool get_inode(struct cleancache_inode *ccinode) +{ + return refcount_inc_not_zero(&ccinode->ref_count); +} + +static unsigned int erase_folios_from_inode(struct cleancache_inode *ccino= de, + struct xa_state *xas); + +static void put_inode(struct cleancache_inode *ccinode) +{ + VM_BUG_ON(refcount_read(&ccinode->ref_count) =3D=3D 0); + if (!refcount_dec_and_test(&ccinode->ref_count)) + return; + + lockdep_assert_not_held(&ccinode->folios.xa_lock); + VM_BUG_ON(!xa_empty(&ccinode->folios)); + call_rcu(&ccinode->rcu, inode_free_rcu); +} + +static void remove_inode_if_empty(struct cleancache_inode *ccinode) +{ + struct cleancache_fs *fs =3D ccinode->fs; + + lockdep_assert_held(&ccinode->folios.xa_lock); + + if (!xa_empty(&ccinode->folios)) + return; + + spin_lock(&fs->hash_lock); + hlist_del_init_rcu(&ccinode->hash); + spin_unlock(&fs->hash_lock); + /* + * Drop the refcount set in alloc_cleancache_inode(). Caller should + * have taken an extra refcount to keep ccinode valid, so ccinode + * will be freed once the caller releases it. 
+ */ + put_inode(ccinode); +} + +static bool store_folio_in_inode(struct cleancache_inode *ccinode, + unsigned long offset, struct folio *folio) +{ + struct cleancache_pool *pool =3D folio_pool(folio); + int err; + + lockdep_assert_held(&ccinode->folios.xa_lock); + VM_BUG_ON(!list_empty(&folio->lru)); + + spin_lock(&pool->lock); + err =3D xa_err(__xa_store(&ccinode->folios, offset, folio, + GFP_ATOMIC|__GFP_NOWARN)); + if (!err) + attach_folio(folio, ccinode, offset); + spin_unlock(&pool->lock); + + return err =3D=3D 0; +} + +static void erase_folio_from_inode(struct cleancache_inode *ccinode, + unsigned long offset, struct folio *folio) +{ + bool removed; + + lockdep_assert_held(&ccinode->folios.xa_lock); + + removed =3D __xa_erase(&ccinode->folios, offset); + VM_BUG_ON(!removed); + remove_inode_if_empty(ccinode); +} + +static void move_folio_from_inode_to_pool(struct cleancache_inode *ccinode, + unsigned long offset, struct folio *folio) +{ + struct cleancache_pool *pool =3D folio_pool(folio); + + erase_folio_from_inode(ccinode, offset, folio); + spin_lock(&pool->lock); + detach_folio(folio); + add_folio_to_pool(folio, pool); + spin_unlock(&pool->lock); +} + +static bool isolate_folio_from_inode(struct cleancache_inode *ccinode, + unsigned long offset, struct folio *folio) +{ + bool isolated =3D false; + + xa_lock(&ccinode->folios); + if (xa_load(&ccinode->folios, offset) =3D=3D folio) { + struct cleancache_pool *pool =3D folio_pool(folio); + + erase_folio_from_inode(ccinode, offset, folio); + spin_lock(&pool->lock); + detach_folio(folio); + spin_unlock(&pool->lock); + isolated =3D true; + } + xa_unlock(&ccinode->folios); + + return isolated; +} + +static unsigned int erase_folios_from_inode(struct cleancache_inode *ccino= de, + struct xa_state *xas) +{ + unsigned int ret =3D 0; + struct folio *folio; + + lockdep_assert_held(&ccinode->folios.xa_lock); + + xas_for_each(xas, folio, ULONG_MAX) { + move_folio_from_inode_to_pool(ccinode, xas->xa_index, folio); + ret++; + } + + return ret; +} + +static struct cleancache_inode *find_and_get_inode(struct cleancache_fs *f= s, + struct inode *inode) +{ + struct cleancache_inode *ccinode =3D NULL; + struct cleancache_inode *tmp; + + rcu_read_lock(); + hash_for_each_possible_rcu(fs->inode_hash, tmp, hash, inode->i_ino) { + if (tmp->inode !=3D inode) + continue; + + if (get_inode(tmp)) { + ccinode =3D tmp; + break; + } + } + rcu_read_unlock(); + + return ccinode; +} + +static struct cleancache_inode *add_and_get_inode(struct cleancache_fs *fs, + struct inode *inode) +{ + struct cleancache_inode *ccinode, *tmp; + + ccinode =3D alloc_cleancache_inode(fs, inode); + if (!ccinode) + return ERR_PTR(-ENOMEM); + + spin_lock(&fs->hash_lock); + tmp =3D find_and_get_inode(fs, inode); + if (tmp) { + spin_unlock(&fs->hash_lock); + /* someone already added it */ + put_inode(ccinode); + put_inode(tmp); + return ERR_PTR(-EEXIST); + } + hash_add_rcu(fs->inode_hash, &ccinode->hash, inode->i_ino); + get_inode(ccinode); + spin_unlock(&fs->hash_lock); + + return ccinode; +} + +static void copy_folio_content(struct folio *from, struct folio *to) +{ + void *src =3D kmap_local_folio(from, 0); + void *dst =3D kmap_local_folio(to, 0); + + memcpy(dst, src, PAGE_SIZE); + kunmap_local(dst); + kunmap_local(src); +} + +/* + * We want to store only workingset folios in the cleancache to increase h= it + * ratio so there are four cases: + * + * @folio is workingset but cleancache doesn't have it: use new cleancache= folio + * @folio is workingset and cleancache has it: overwrite the 
stale data + * @folio is !workingset and cleancache doesn't have it: just bail out + * @folio is !workingset and cleancache has it: remove the stale @folio + */ +static bool store_into_inode(struct cleancache_fs *fs, + struct inode *inode, + pgoff_t offset, struct folio *folio) +{ + bool workingset =3D folio_test_workingset(folio); + struct cleancache_inode *ccinode; + struct folio *stored_folio; + bool new_inode =3D false; + bool ret =3D false; + +find_inode: + ccinode =3D find_and_get_inode(fs, inode); + if (!ccinode) { + if (!workingset) + return false; + + ccinode =3D add_and_get_inode(fs, inode); + if (IS_ERR_OR_NULL(ccinode)) { + /* + * Retry if someone just added new ccinode from under us. + */ + if (PTR_ERR(ccinode) =3D=3D -EEXIST) + goto find_inode; + + return false; + } + new_inode =3D true; + } + + xa_lock(&ccinode->folios); + stored_folio =3D xa_load(&ccinode->folios, offset); + if (stored_folio) { + if (!workingset) { + move_folio_from_inode_to_pool(ccinode, offset, stored_folio); + goto out_unlock; + } + } else { + if (!workingset) + goto out_unlock; + + stored_folio =3D pick_folio_from_any_pool(); + if (!stored_folio) { + /* No free folios, TODO: try reclaiming */ + goto out_unlock; + } + + if (!store_folio_in_inode(ccinode, offset, stored_folio)) { + struct cleancache_pool *pool =3D folio_pool(stored_folio); + + /* Return stored_folio back into pool */ + spin_lock(&pool->lock); + add_folio_to_pool(stored_folio, pool); + spin_unlock(&pool->lock); + goto out_unlock; + } + } + copy_folio_content(folio, stored_folio); + + ret =3D true; +out_unlock: + /* Free ccinode if it was created but no folio was stored in it. */ + if (new_inode) + remove_inode_if_empty(ccinode); + xa_unlock(&ccinode->folios); + put_inode(ccinode); + + return ret; +} + +static bool load_from_inode(struct cleancache_fs *fs, + struct inode *inode, + pgoff_t offset, struct folio *folio) +{ + struct cleancache_inode *ccinode; + struct folio *stored_folio; + bool ret =3D false; + + ccinode =3D find_and_get_inode(fs, inode); + if (!ccinode) + return false; + + xa_lock(&ccinode->folios); + stored_folio =3D xa_load(&ccinode->folios, offset); + if (stored_folio) { + copy_folio_content(stored_folio, folio); + ret =3D true; + } + xa_unlock(&ccinode->folios); + put_inode(ccinode); + + return ret; +} + +static bool invalidate_folio(struct cleancache_fs *fs, + struct inode *inode, pgoff_t offset) +{ + struct cleancache_inode *ccinode; + struct folio *folio; + + ccinode =3D find_and_get_inode(fs, inode); + if (!ccinode) + return false; + + xa_lock(&ccinode->folios); + folio =3D xa_load(&ccinode->folios, offset); + if (folio) + move_folio_from_inode_to_pool(ccinode, offset, folio); + xa_unlock(&ccinode->folios); + put_inode(ccinode); + + return folio !=3D NULL; +} + +static unsigned int invalidate_inode(struct cleancache_fs *fs, + struct inode *inode) +{ + struct cleancache_inode *ccinode; + unsigned int ret; + + ccinode =3D find_and_get_inode(fs, inode); + if (ccinode) { + XA_STATE(xas, &ccinode->folios, 0); + + xas_lock(&xas); + ret =3D erase_folios_from_inode(ccinode, &xas); + xas_unlock(&xas); + put_inode(ccinode); + + return ret; + } + + return 0; +} + +/* Hooks into MM and FS */ +void cleancache_add_fs(struct super_block *sb) +{ + int fs_id; + struct cleancache_fs *fs; + + fs =3D kzalloc(sizeof(struct cleancache_fs), GFP_KERNEL); + if (!fs) + goto err; + + spin_lock_init(&fs->hash_lock); + hash_init(fs->inode_hash); + refcount_set(&fs->ref_count, 1); + idr_preload(GFP_KERNEL); + spin_lock(&fs_lock); + fs_id =3D 
idr_alloc(&fs_idr, fs, 0, 0, GFP_NOWAIT); + spin_unlock(&fs_lock); + idr_preload_end(); + if (fs_id < 0) { + pr_warn("too many file systems\n"); + goto err_free; + } + sb->cleancache_id =3D fs_id; + + return; +err_free: + kfree(fs); +err: + sb->cleancache_id =3D CLEANCACHE_ID_INVALID; +} + +void cleancache_remove_fs(struct super_block *sb) +{ + int fs_id =3D sb->cleancache_id; + struct cleancache_fs *fs; + + sb->cleancache_id =3D CLEANCACHE_ID_INVALID; + fs =3D get_fs(fs_id); + if (!fs) + return; + + spin_lock(&fs_lock); + idr_remove(&fs_idr, fs_id); + spin_unlock(&fs_lock); + put_fs(fs); + + /* free the object */ + put_fs(fs); +} + +bool cleancache_store_folio(struct inode *inode, struct folio *folio) +{ + struct cleancache_fs *fs; + int fs_id; + bool ret; + + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); + + if (!inode) + return false; + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return false; + + fs_id =3D folio->mapping->host->i_sb->cleancache_id; + if (fs_id =3D=3D CLEANCACHE_ID_INVALID) + return false; + + fs =3D get_fs(fs_id); + if (!fs) + return false; + + ret =3D store_into_inode(fs, inode, folio->index, folio); + put_fs(fs); + + return ret; +} + +bool cleancache_restore_folio(struct inode *inode, struct folio *folio) +{ + struct cleancache_fs *fs; + int fs_id; + bool ret; + + if (!inode) + return false; + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return false; + + fs_id =3D folio->mapping->host->i_sb->cleancache_id; + if (fs_id =3D=3D CLEANCACHE_ID_INVALID) + return false; + + fs =3D get_fs(fs_id); + if (!fs) + return false; + + ret =3D load_from_inode(fs, inode, folio->index, folio); + put_fs(fs); + + return ret; +} + +bool cleancache_invalidate_folio(struct address_space *mapping, + struct inode *inode, struct folio *folio) +{ + struct cleancache_fs *fs; + int fs_id; + bool ret; + + VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); + + if (!inode) + return false; + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return false; + + /* Careful, folio->mapping can be NULL */ + fs_id =3D mapping->host->i_sb->cleancache_id; + if (fs_id =3D=3D CLEANCACHE_ID_INVALID) + return false; + + fs =3D get_fs(fs_id); + if (!fs) + return false; + + ret =3D invalidate_folio(fs, inode, folio->index); + put_fs(fs); + + return ret; +} + +bool cleancache_invalidate_inode(struct address_space *mapping, + struct inode *inode) +{ + struct cleancache_fs *fs; + unsigned int count; + int fs_id; + + if (!inode) + return false; + + fs_id =3D mapping->host->i_sb->cleancache_id; + if (fs_id =3D=3D CLEANCACHE_ID_INVALID) + return false; + + fs =3D get_fs(fs_id); + if (!fs) + return false; + + count =3D invalidate_inode(fs, inode); + put_fs(fs); + + return count > 0; +} + +/* Backend API */ +/* + * Register a new backend and add its folios for cleancache to use. + * Returns pool id on success or a negative error code on failure. 
+ */ +int cleancache_backend_register_pool(const char *name) +{ + struct cleancache_pool *pool; + int pool_id; + + /* pools_lock prevents concurrent registrations */ + spin_lock(&pools_lock); + pool_id =3D atomic_read(&nr_pools); + if (pool_id >=3D CLEANCACHE_MAX_POOLS) { + spin_unlock(&pools_lock); + return -ENOMEM; + } + + pool =3D &pools[pool_id]; + INIT_LIST_HEAD(&pool->folio_list); + spin_lock_init(&pool->lock); + /* Ensure above stores complete before we increase the count */ + atomic_set_release(&nr_pools, pool_id + 1); + spin_unlock(&pools_lock); + + pr_info("Registered \'%s\' cleancache backend, pool id %d\n", + name ? : "none", pool_id); + + return pool_id; +} +EXPORT_SYMBOL(cleancache_backend_register_pool); + +int cleancache_backend_get_folio(int pool_id, struct folio *folio) +{ + struct cleancache_inode *ccinode; + struct cleancache_pool *pool; + unsigned long offset; + + /* Do not support large folios yet */ + if (folio_test_large(folio)) + return -EOPNOTSUPP; + + /* Does the folio belong to the requesting backend */ + if (folio_pool_id(folio) !=3D pool_id) + return -EINVAL; + + pool =3D &pools[pool_id]; +again: + spin_lock(&pool->lock); + + /* If folio is free in the pool, return it */ + if (remove_folio_from_pool(folio, pool)) { + spin_unlock(&pool->lock); + goto out; + } + /* + * The folio is not free, therefore it has to belong to a valid ccinode. + * Operations on CCacheFree and folio->mapping are done under + * pool->lock which we are currently holding and CCacheFree + * always gets cleared before folio->mapping is set. + */ + folio_attachment(folio, &ccinode, &offset); + if (WARN_ON(!ccinode || !get_inode(ccinode))) { + spin_unlock(&pool->lock); + return -EINVAL; + } + + spin_unlock(&pool->lock); + + /* Retry if the folio got erased from the ccinode */ + if (!isolate_folio_from_inode(ccinode, offset, folio)) { + put_inode(ccinode); + goto again; + } + + put_inode(ccinode); +out: + VM_BUG_ON_FOLIO(folio_ref_count(folio) !=3D 0, (folio)); + clear_cleancache_folio(folio); + + return 0; +} +EXPORT_SYMBOL(cleancache_backend_get_folio); + +int cleancache_backend_put_folio(int pool_id, struct folio *folio) +{ + struct cleancache_pool *pool =3D &pools[pool_id]; + + /* Do not support large folios yet */ + VM_BUG_ON_FOLIO(folio_test_large(folio), folio); + + /* Can't put a still used folio into cleancache */ + if (folio_ref_count(folio) !=3D 0) + return -EINVAL; + + /* Reset struct folio fields */ + init_cleancache_folio(folio, pool_id); + INIT_LIST_HEAD(&folio->lru); + spin_lock(&pool->lock); + add_folio_to_pool(folio, pool); + spin_unlock(&pool->lock); + + return 0; +} +EXPORT_SYMBOL(cleancache_backend_put_folio); + +int cleancache_backend_put_folios(int pool_id, struct list_head *folios) +{ + struct cleancache_pool *pool =3D &pools[pool_id]; + LIST_HEAD(unused_folios); + struct folio *folio; + struct folio *tmp; + + list_for_each_entry_safe(folio, tmp, folios, lru) { + /* Do not support large folios yet */ + VM_BUG_ON_FOLIO(folio_test_large(folio), folio); + if (folio_ref_count(folio) !=3D 0) + continue; + + init_cleancache_folio(folio, pool_id); + list_move(&folio->lru, &unused_folios); + } + + spin_lock(&pool->lock); + list_splice_init(&unused_folios, &pool->folio_list); + spin_unlock(&pool->lock); + + return list_empty(folios) ? 
0 : -EINVAL; +} +EXPORT_SYMBOL(cleancache_backend_put_folios); + +static int __init init_cleancache(void) +{ + slab_inode =3D KMEM_CACHE(cleancache_inode, 0); + if (!slab_inode) + return -ENOMEM; + + return 0; +} +core_initcall(init_cleancache); diff --git a/mm/filemap.c b/mm/filemap.c index 893ba49808b7..6ed495960021 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -213,6 +214,19 @@ static void filemap_unaccount_folio(struct address_spa= ce *mapping, folio_account_cleaned(folio, inode_to_wb(mapping->host)); } =20 +void store_into_cleancache(struct address_space *mapping, struct folio *fo= lio) +{ + /* + * If we're uptodate, flush out into the cleancache, otherwise + * invalidate any existing cleancache entries. We can't leave + * stale data around in the cleancache once our page is gone. + */ + if (folio_test_uptodate(folio) && folio_test_mappedtodisk(folio)) + cleancache_store_folio(mapping->host, folio); + else + cleancache_invalidate_folio(mapping, mapping->host, folio); +} + /* * Delete a page from the page cache and free it. Caller has to make * sure the page is locked and that nobody else uses it - or that usage @@ -251,6 +265,9 @@ void filemap_remove_folio(struct folio *folio) struct address_space *mapping =3D folio->mapping; =20 BUG_ON(!folio_test_locked(folio)); + + store_into_cleancache(mapping, folio); + spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); __filemap_remove_folio(folio, NULL); @@ -324,6 +341,9 @@ void delete_from_page_cache_batch(struct address_space = *mapping, if (!folio_batch_count(fbatch)) return; =20 + for (i =3D 0; i < folio_batch_count(fbatch); i++) + store_into_cleancache(mapping, fbatch->folios[i]); + spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); for (i =3D 0; i < folio_batch_count(fbatch); i++) { @@ -2438,6 +2458,12 @@ static int filemap_read_folio(struct file *file, fil= ler_t filler, unsigned long pflags; int error; =20 + if (cleancache_restore_folio(folio->mapping->host, folio)) { + folio_mark_uptodate(folio); + folio_unlock(folio); + return 0; + } + /* Start the actual read. The read will unlock the page. 
*/ if (unlikely(workingset)) psi_memstall_enter(&pflags); diff --git a/mm/truncate.c b/mm/truncate.c index 91eb92a5ce4f..ed947314321b 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include "internal.h" =20 @@ -136,6 +137,7 @@ void folio_invalidate(struct folio *folio, size_t offse= t, size_t length) { const struct address_space_operations *aops =3D folio->mapping->a_ops; =20 + cleancache_invalidate_folio(folio->mapping, folio->mapping->host, folio); if (aops->invalidate_folio) aops->invalidate_folio(folio, offset, length); } @@ -613,6 +615,8 @@ int folio_unmap_invalidate(struct address_space *mappin= g, struct folio *folio, if (!filemap_release_folio(folio, gfp)) return -EBUSY; =20 + cleancache_invalidate_folio(mapping, mapping->host, folio); + spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); if (folio_test_dirty(folio)) diff --git a/mm/vmscan.c b/mm/vmscan.c index c80fcae7f2a1..5ff1009e68e0 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -716,6 +716,7 @@ static int __remove_mapping(struct address_space *mappi= ng, struct folio *folio, if (folio_test_swapcache(folio)) { ci =3D swap_cluster_get_and_lock_irq(folio); } else { + store_into_cleancache(mapping, folio); spin_lock(&mapping->host->i_lock); xa_lock_irq(&mapping->i_pages); } --=20 2.51.0.740.g6adb054d12-goog From nobody Fri Dec 19 10:42:15 2025 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6372C22D9ED for ; Fri, 10 Oct 2025 01:20:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760059203; cv=none; b=fF4d2/Ds0p2gQA5Ft9Ao9sbcdJROvc3qsJK+EyPFtaYw/AZZgQ6R6+uCC9acs+oVSR/pwyg/nICN0tfUAmPgDQb1QoCJuW3236Gey0qk81hM6kxJ5/SFrweckDn3EKdyjy08rz+teVGgbG6RMX3/2gmgDViNzw0JBv7QSa4Id1Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760059203; c=relaxed/simple; bh=qqMh5neXihNma1bqqODHksMYAs2CpreAk4attEWQEuY=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=c50JfGB2H+tOZWOJmyGQIOT9gCP5s1C6jlh9TZKouHcVfXLudTqItL9miHBveNsQwD2MpLR4qX/9S+V195cMK0ISN1wplQ3ziiCIMq/8cQKjeZzXug44t0noZzG4ywk8IBna/NSyQpaarqew3fbFdjyh5BXEfQprRwEbYgMitYw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=JCujioF7; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="JCujioF7" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-33428befc5bso3863573a91.0 for ; Thu, 09 Oct 2025 18:20:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1760059200; x=1760664000; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=Q5NnfUqcWLVy127Caashokc+Ty+gAwqDLPiFwXfeq7Q=; 
Date: Thu, 9 Oct 2025 18:19:45 -0700
In-Reply-To: <20251010011951.2136980-1-surenb@google.com>
References: <20251010011951.2136980-1-surenb@google.com>
Message-ID: <20251010011951.2136980-3-surenb@google.com>
Subject: [PATCH 2/8] mm/cleancache: add cleancache LRU for folio aging
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: david@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz, alexandru.elisei@arm.com, peterx@redhat.com, sj@kernel.org, rppt@kernel.org, mhocko@suse.com, corbet@lwn.net, axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, jack@suse.cz, willy@infradead.org, m.szyprowski@samsung.com, robin.murphy@arm.com, hannes@cmpxchg.org, zhengqi.arch@bytedance.com, shakeel.butt@linux.dev, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, minchan@kernel.org, surenb@google.com, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev, Minchan Kim

Once all folios in the cleancache are used to store data from previously
evicted folios, no more data can be stored there. To avoid that situation
we can drop older data and make room for new data. Add an LRU for
cleancache folios and reclaim the oldest folio when the cleancache is full
and a new folio needs to be stored.
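For illustration only (not part of the patch): the aging policy added
below boils down to a standard list-based LRU. A minimal, self-contained
sketch follows; the entry type and function names are invented for this
example.

/* Minimal LRU sketch: newest entries at the head, victim taken from the tail. */
#include <linux/list.h>

struct example_entry {
	struct list_head lru;
	/* cached data would live here */
};

static LIST_HEAD(example_lru);

static void example_insert(struct example_entry *e)
{
	list_add(&e->lru, &example_lru);	/* becomes most recently used */
}

static void example_hit(struct example_entry *e)
{
	list_move(&e->lru, &example_lru);	/* rotate to the head on access */
}

static struct example_entry *example_pick_victim(void)
{
	if (list_empty(&example_lru))
		return NULL;
	return list_last_entry(&example_lru, struct example_entry, lru);
}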
Signed-off-by: Suren Baghdasaryan Signed-off-by: Minchan Kim --- mm/cleancache.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 88 insertions(+), 2 deletions(-) diff --git a/mm/cleancache.c b/mm/cleancache.c index 0023962de024..73a8b2655def 100644 --- a/mm/cleancache.c +++ b/mm/cleancache.c @@ -19,6 +19,13 @@ * * ccinode->folios.xa_lock * pool->lock + * + * ccinode->folios.xa_lock + * lru_lock + * + * ccinode->folios.xa_lock + * lru_lock + * pool->lock */ =20 #define INODE_HASH_BITS 6 @@ -60,6 +67,8 @@ static struct kmem_cache *slab_inode; /* cleancache_inode= slab */ static struct cleancache_pool pools[CLEANCACHE_MAX_POOLS]; static atomic_t nr_pools =3D ATOMIC_INIT(0); static DEFINE_SPINLOCK(pools_lock); /* protects pools */ +static LIST_HEAD(cleancache_lru); +static DEFINE_SPINLOCK(lru_lock); /* protects cleancache_lru */ =20 /* * Folio attributes: @@ -130,6 +139,7 @@ static inline bool is_folio_attached(struct folio *foli= o) /* * Folio pool helpers. * Only detached folios are stored in the pool->folio_list. + * Once a folio gets attached, it's placed on the cleancache LRU list. * * Locking: * pool->folio_list is accessed under pool->lock. @@ -181,6 +191,32 @@ static struct folio *pick_folio_from_any_pool(void) return folio; } =20 +/* Folio LRU helpers. Only attached folios are stored in the cleancache_lr= u. */ +static void add_folio_to_lru(struct folio *folio) +{ + VM_BUG_ON(!list_empty(&folio->lru)); + + spin_lock(&lru_lock); + list_add(&folio->lru, &cleancache_lru); + spin_unlock(&lru_lock); +} + +static void rotate_lru_folio(struct folio *folio) +{ + spin_lock(&lru_lock); + if (!list_empty(&folio->lru)) + list_move(&folio->lru, &cleancache_lru); + spin_unlock(&lru_lock); +} + +static void delete_folio_from_lru(struct folio *folio) +{ + spin_lock(&lru_lock); + if (!list_empty(&folio->lru)) + list_del_init(&folio->lru); + spin_unlock(&lru_lock); +} + /* FS helpers */ static struct cleancache_fs *get_fs(int fs_id) { @@ -316,6 +352,7 @@ static void erase_folio_from_inode(struct cleancache_in= ode *ccinode, =20 removed =3D __xa_erase(&ccinode->folios, offset); VM_BUG_ON(!removed); + delete_folio_from_lru(folio); remove_inode_if_empty(ccinode); } =20 @@ -413,6 +450,48 @@ static struct cleancache_inode *add_and_get_inode(stru= ct cleancache_fs *fs, return ccinode; } =20 +static struct folio *reclaim_folio_from_lru(void) +{ + struct cleancache_inode *ccinode; + unsigned long offset; + struct folio *folio; + +again: + spin_lock(&lru_lock); + if (list_empty(&cleancache_lru)) { + spin_unlock(&lru_lock); + return NULL; + } + ccinode =3D NULL; + /* Get the ccinode of the folio at the LRU tail */ + list_for_each_entry_reverse(folio, &cleancache_lru, lru) { + struct cleancache_pool *pool =3D folio_pool(folio); + + /* Find and get ccinode */ + spin_lock(&pool->lock); + folio_attachment(folio, &ccinode, &offset); + if (ccinode && !get_inode(ccinode)) + ccinode =3D NULL; + spin_unlock(&pool->lock); + if (ccinode) + break; + } + spin_unlock(&lru_lock); + + if (!ccinode) + return NULL; /* No ccinode to reclaim */ + + if (!isolate_folio_from_inode(ccinode, offset, folio)) { + /* Retry if the folio got erased from the ccinode */ + put_inode(ccinode); + goto again; + } + + put_inode(ccinode); + + return folio; +} + static void copy_folio_content(struct folio *from, struct folio *to) { void *src =3D kmap_local_folio(from, 0); @@ -468,14 +547,19 @@ static bool store_into_inode(struct cleancache_fs *fs, move_folio_from_inode_to_pool(ccinode, offset, stored_folio); goto out_unlock; } 
+ rotate_lru_folio(stored_folio); } else { if (!workingset) goto out_unlock; =20 stored_folio =3D pick_folio_from_any_pool(); if (!stored_folio) { - /* No free folios, TODO: try reclaiming */ - goto out_unlock; + /* No free folios, try reclaiming */ + xa_unlock(&ccinode->folios); + stored_folio =3D reclaim_folio_from_lru(); + xa_lock(&ccinode->folios); + if (!stored_folio) + goto out_unlock; } =20 if (!store_folio_in_inode(ccinode, offset, stored_folio)) { @@ -487,6 +571,7 @@ static bool store_into_inode(struct cleancache_fs *fs, spin_unlock(&pool->lock); goto out_unlock; } + add_folio_to_lru(stored_folio); } copy_folio_content(folio, stored_folio); =20 @@ -516,6 +601,7 @@ static bool load_from_inode(struct cleancache_fs *fs, xa_lock(&ccinode->folios); stored_folio =3D xa_load(&ccinode->folios, offset); if (stored_folio) { + rotate_lru_folio(stored_folio); copy_folio_content(stored_folio, folio); ret =3D true; } --=20 2.51.0.740.g6adb054d12-goog From nobody Fri Dec 19 10:42:15 2025 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9CABC1EF0B0 for ; Fri, 10 Oct 2025 01:20:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760059205; cv=none; b=bJzAQ9mMPsj4QI5kCnksorQZ7k50TVcaqP0qtEVC0o2pjwZiUAMSa0rk64WUs0SjjLI3N5KUwAub9q1j3/LnUCagroWjs7FBbOP833NuP3DzD1C3/q83YeTbDbDSnUBhBRCzkKv817v0YKXmkWmSQABJgAEy+guyo7MQv4osYs8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760059205; c=relaxed/simple; bh=B/QFPXniGwEFGN4sNohbRDTRW/JKndUEcycvqtnTGMQ=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=sGPb2YsxIQaNLPnAMeRI3j6kMGpSEMtGFXnAJehN6gj4+oRqXo1bqpIq/qbFaWN2xosxQxdiEiaB9EoZWQ/RUFUsovjAU6kl1L0AL75npKPc8qs2n62C8lsTt8bF4vLdBwN8xRPt0piAtc0rVD0Nw11YpwU7k/MrXdXckMkfT0k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Nh5EllIq; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Nh5EllIq" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-32eae48beaaso3344856a91.0 for ; Thu, 09 Oct 2025 18:20:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1760059203; x=1760664003; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=qfFmaAKl937kSaDktiPuEwSl7VxkpxzCXx+g6sFaHjQ=; b=Nh5EllIqzXs9BFjaNGglTDUbfKAr8u8Xu3LkPIG4yCuKwkExy7mwfckEIaVzq5pKMb Dtc0q0UI10Yl3tlIaeGsP5z8hZaSuBsfMFzijhbAwvBKz2fm/FzaTGZjsitUdZA/AFMy JnS1OuSnXmoeVi5rFJN3usCU10tYg6Du8dduaDGcIURAGwPhmGzkyA4FGhP6IKm6ZKBH +z8VwcNOxbcSnm/P+tA/6ULN6JVDmg30rdQ3uKqJm6P6R9ljywpLUKGxqnFZ+pPfp6+j Sdl+J4yv7+RQW0nLsCMcb4HPkuLxkqtl8biXQ/OdjGQJ2GAswadmwcnjnR0k2Nm1/zMo udpw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
Date: Thu, 9 Oct 2025 18:19:46 -0700
In-Reply-To: <20251010011951.2136980-1-surenb@google.com>
References: <20251010011951.2136980-1-surenb@google.com>
Message-ID: <20251010011951.2136980-4-surenb@google.com>
Subject: [PATCH 3/8] mm/cleancache: readahead support
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: david@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz, alexandru.elisei@arm.com, peterx@redhat.com, sj@kernel.org, rppt@kernel.org, mhocko@suse.com, corbet@lwn.net, axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, jack@suse.cz, willy@infradead.org, m.szyprowski@samsung.com, robin.murphy@arm.com, hannes@cmpxchg.org, zhengqi.arch@bytedance.com, shakeel.butt@linux.dev, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, minchan@kernel.org, surenb@google.com, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev

Restore pages from the cleancache during the readahead operation.
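For illustration only (not part of the patch): a rough sketch of how the
inode-walk API added below is meant to be bracketed by a readahead-style
caller. The wrapper name and parameters are invented, and the folio is
assumed to be locked and not yet uptodate.

#include <linux/cleancache.h>
#include <linux/err.h>
#include <linux/pagemap.h>

/* Sketch: try to fill one locked folio from cleancache instead of reading disk. */
static bool example_restore_one(struct address_space *mapping,
				struct folio *folio, unsigned long count)
{
	struct cleancache_inode *ccinode;
	bool restored = false;

	ccinode = cleancache_start_inode_walk(mapping, mapping->host, count);
	if (IS_ERR_OR_NULL(ccinode))
		return false;

	if (cleancache_restore_from_inode(ccinode, folio)) {
		folio_mark_uptodate(folio);
		folio_unlock(folio);
		restored = true;
	}

	cleancache_end_inode_walk(ccinode);
	return restored;
}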
Signed-off-by: Suren Baghdasaryan --- include/linux/cleancache.h | 17 +++++++++++ mm/cleancache.c | 59 ++++++++++++++++++++++++++++++++++++++ mm/readahead.c | 55 +++++++++++++++++++++++++++++++++++ 3 files changed, 131 insertions(+) diff --git a/include/linux/cleancache.h b/include/linux/cleancache.h index 458a7a25a8af..28b6d7b25964 100644 --- a/include/linux/cleancache.h +++ b/include/linux/cleancache.h @@ -11,6 +11,7 @@ =20 #define CLEANCACHE_KEY_MAX 6 =20 +struct cleancache_inode; =20 #ifdef CONFIG_CLEANCACHE =20 @@ -24,6 +25,14 @@ bool cleancache_invalidate_folio(struct address_space *m= apping, bool cleancache_invalidate_inode(struct address_space *mapping, struct inode *inode); =20 +struct cleancache_inode * +cleancache_start_inode_walk(struct address_space *mapping, + struct inode *inode, + unsigned long count); +void cleancache_end_inode_walk(struct cleancache_inode *ccinode); +bool cleancache_restore_from_inode(struct cleancache_inode *ccinode, + struct folio *folio); + /* * Backend API * @@ -53,6 +62,14 @@ static inline bool cleancache_invalidate_folio(struct ad= dress_space *mapping, static inline bool cleancache_invalidate_inode(struct address_space *mappi= ng, struct inode *inode) { return false; } +static inline struct cleancache_inode * +cleancache_start_inode_walk(struct address_space *mapping, struct inode *i= node, + unsigned long count) + { return NULL; } +static inline void cleancache_end_inode_walk(struct cleancache_inode *ccin= ode) {} +static inline bool cleancache_restore_from_inode(struct cleancache_inode *= ccinode, + struct folio *folio) + { return false; } static inline int cleancache_backend_register_pool(const char *name) { return -EOPNOTSUPP; } static inline int cleancache_backend_get_folio(int pool_id, struct folio *= folio) diff --git a/mm/cleancache.c b/mm/cleancache.c index 73a8b2655def..59b8fd309619 100644 --- a/mm/cleancache.c +++ b/mm/cleancache.c @@ -813,6 +813,65 @@ bool cleancache_invalidate_inode(struct address_space = *mapping, return count > 0; } =20 +struct cleancache_inode * +cleancache_start_inode_walk(struct address_space *mapping, struct inode *i= node, + unsigned long count) +{ + struct cleancache_inode *ccinode; + struct cleancache_fs *fs; + int fs_id; + + if (!inode) + return ERR_PTR(-EINVAL); + + fs_id =3D mapping->host->i_sb->cleancache_id; + if (fs_id =3D=3D CLEANCACHE_ID_INVALID) + return ERR_PTR(-EINVAL); + + fs =3D get_fs(fs_id); + if (!fs) + return NULL; + + ccinode =3D find_and_get_inode(fs, inode); + if (!ccinode) { + put_fs(fs); + return NULL; + } + + return ccinode; +} + +void cleancache_end_inode_walk(struct cleancache_inode *ccinode) +{ + struct cleancache_fs *fs =3D ccinode->fs; + + put_inode(ccinode); + put_fs(fs); +} + +bool cleancache_restore_from_inode(struct cleancache_inode *ccinode, + struct folio *folio) +{ + struct folio *stored_folio; + void *src, *dst; + bool ret =3D false; + + xa_lock(&ccinode->folios); + stored_folio =3D xa_load(&ccinode->folios, folio->index); + if (stored_folio) { + rotate_lru_folio(stored_folio); + src =3D kmap_local_folio(stored_folio, 0); + dst =3D kmap_local_folio(folio, 0); + memcpy(dst, src, PAGE_SIZE); + kunmap_local(dst); + kunmap_local(src); + ret =3D true; + } + xa_unlock(&ccinode->folios); + + return ret; +} + /* Backend API */ /* * Register a new backend and add its folios for cleancache to use. 
diff --git a/mm/readahead.c b/mm/readahead.c index 3a4b5d58eeb6..6f4986a5e14a 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -128,6 +128,7 @@ #include #include #include +#include =20 #define CREATE_TRACE_POINTS #include @@ -146,12 +147,66 @@ file_ra_state_init(struct file_ra_state *ra, struct a= ddress_space *mapping) } EXPORT_SYMBOL_GPL(file_ra_state_init); =20 +static inline bool restore_from_cleancache(struct readahead_control *rac) +{ + XA_STATE(xas, &rac->mapping->i_pages, rac->_index); + struct cleancache_inode *ccinode; + struct folio *folio; + unsigned long end; + bool ret =3D true; + + int count =3D readahead_count(rac); + + /* Readahead should not have started yet. */ + VM_BUG_ON(rac->_batch_count !=3D 0); + + if (!count) + return true; + + ccinode =3D cleancache_start_inode_walk(rac->mapping, rac->mapping->host, + count); + if (!ccinode) + return false; + + end =3D rac->_index + rac->_nr_pages - 1; + xas_for_each(&xas, folio, end) { + unsigned long nr; + + if (xas_retry(&xas, folio)) { + ret =3D false; + break; + } + + if (!cleancache_restore_from_inode(ccinode, folio)) { + ret =3D false; + break; + } + + nr =3D folio_nr_pages(folio); + folio_mark_uptodate(folio); + folio_unlock(folio); + rac->_index +=3D nr; + rac->_nr_pages -=3D nr; + rac->ra->size -=3D nr; + if (rac->ra->async_size >=3D nr) + rac->ra->async_size -=3D nr; + } + + cleancache_end_inode_walk(ccinode); + + return ret; +} + static void read_pages(struct readahead_control *rac) { const struct address_space_operations *aops =3D rac->mapping->a_ops; struct folio *folio; struct blk_plug plug; =20 + /* Try to read all pages from the cleancache */ + if (restore_from_cleancache(rac)) + return; + if (!readahead_count(rac)) return; =20 --=20 2.51.0.740.g6adb054d12-goog From nobody Fri Dec 19 10:42:15 2025 Received: from mail-pf1-f201.google.com (mail-pf1-f201.google.com [209.85.210.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D6AE3246BD5 for ; Fri, 10 Oct 2025 01:20:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760059209; cv=none; b=lsi3SNRQeDAOU+pYu9f2qRiLylEgIIMz8frDGEMg7iwRuXKGdFmJIvrP9KU3yQwQRqtAlYwnAjMOMIdc0uoi5x7Z4G8tGI5pQiV5oRLUXKymEda0ocJkISEoQnFLpD+dalid0kK1JKYuy5JUuP+jVe8DIbCSVw/m9Ytn1PCyxTc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760059209; c=relaxed/simple; bh=bn9lTdYGL9Ujc3sLUhzNwz04Ju6VtGy9ZkLXqXBGZBs=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=soN+5WbaSs3dXcSr5Uesa2CvbVW4WQSUwGkJ5vPXTh1LgQ/82tKFKMmrxQMUtt8TtZHJSsGcjEsR1uLUshms8ylrMipXo5IoW4IIHOhVIH6XgcOjDK0j/uy8PR5pQt4vIt9v9VPdXjc1gMeSxKIABWA+wQK54R4NWXMYBkM0wag= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=YLDhr6Kk; arc=none smtp.client-ip=209.85.210.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="YLDhr6Kk" Received: by 
mail-pf1-f201.google.com with SMTP id d2e1a72fcca58-76e2eb787f2so2238049b3a.3 for ; Thu, 09 Oct 2025 18:20:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1760059205; x=1760664005; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=tcN6OGexdDN4REs0WUDwtbwKs1bNtyrXTB87BI/xDL8=; b=YLDhr6KkTJ2Y3psZfw/wE2SBhy/5uyKGG7mofjp7VdjDyoTkXBge7lzgi6aldCh4Sl anOGxmM7YeXqVzA84SmAkSf0BI9pdcBhy430oURG2KEHVtGxMo4toTceAZYB95pxagyM caaRfj9ArGjik0PCzjBtH/x+LcZoiyEjzhgedSC0Jz42yQa2BLTwhf1eqwB7dT40RQ87 H8OR5fJiczZZiIdwm6ftHfee9pTRltDkS3Fn6tobyk3EqHm6f2EkxjrrB7PxERj7KMwz E5uY+zeDSoosFU80gvslgMCCsjeFd7w0JxiXSg8K5tcL0Eqgx2s5tZEee6qRa0qOFi1M vwEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1760059205; x=1760664005; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=tcN6OGexdDN4REs0WUDwtbwKs1bNtyrXTB87BI/xDL8=; b=hCDsRyl08hPdrv52/GMR5ce58rV00VK3hDzb0Rt8B3TaDEqXwq7sHZKzgVwLO5Tfx2 sdFwF237P077XNqQaJISD11UbZ6Jnbb3ll8nZ2JmYidWGIpi+YFznNi/FYTMLZikpkt6 CyXT/yBSJAUycwI/iw40RTI/yay/hsi0N/HH2EynKW84ak8VRJ9Ksuk/egMEDN6LGdUq +NFFmTpaDLcoiz/NzuAhTejJsz8swqzF4PE2dEgm/5pSWBT1/RRK7bpYrCMvIMegogb4 GYq1dVk8ZPcHvF2snteegW6BeE1Z//PknuD67/9R5GFonGVGZ0m/0dafQlPZ4r4RSJLg 6TMg== X-Forwarded-Encrypted: i=1; AJvYcCU2SRjfqz5qVLrnS8H5Kg288SpRVBJ7xan9NiQrhti0v0622Lcot6zDyiIkUkJqaPUvivhxXXYTZRzk4/4=@vger.kernel.org X-Gm-Message-State: AOJu0Yxs+fOEhKY6jZ72ed3/wgyskOhOntxVRnhKRH45rtUtZ/kICvOP zUShqcCWoSc5lH7J9Kyjnos+88W+SfqgkBhegAbsso//mUz+s41UJ8+NcqrGtiECHREGYNjDOgG QT1cV5Q== X-Google-Smtp-Source: AGHT+IGWo2umirCkNnF3dRSzHnJXKgnnR9cLHUFGdmby3Duxq7TZVj0L2Xz38+ViMs5EtsI8nLTHCdiBjWM= X-Received: from pgar12.prod.google.com ([2002:a05:6a02:2e8c:b0:b63:7a61:419c]) (user=surenb job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a20:748b:b0:32b:6e08:fa19 with SMTP id adf61e73a8af0-32da81345fdmr12546348637.1.1760059205122; Thu, 09 Oct 2025 18:20:05 -0700 (PDT) Date: Thu, 9 Oct 2025 18:19:47 -0700 In-Reply-To: <20251010011951.2136980-1-surenb@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20251010011951.2136980-1-surenb@google.com> X-Mailer: git-send-email 2.51.0.740.g6adb054d12-goog Message-ID: <20251010011951.2136980-5-surenb@google.com> Subject: [PATCH 4/8] mm/cleancache: add sysfs interface From: Suren Baghdasaryan To: akpm@linux-foundation.org Cc: david@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz, alexandru.elisei@arm.com, peterx@redhat.com, sj@kernel.org, rppt@kernel.org, mhocko@suse.com, corbet@lwn.net, axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, jack@suse.cz, willy@infradead.org, m.szyprowski@samsung.com, robin.murphy@arm.com, hannes@cmpxchg.org, zhengqi.arch@bytedance.com, shakeel.butt@linux.dev, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, minchan@kernel.org, surenb@google.com, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Create sysfs API under /sys/kernel/mm/cleancache/ to report the following metrics: stored - number of successful cleancache folio stores skipped - number of 
folios skipped during cleancache store operation restored - number of successful cleancache folio restore operations missed - number of failed cleancache folio restore operations reclaimed - number of folios dropped due to their age recalled - number of folios dropped because cleancache backend took them back invalidated - number of folios dropped due to invalidation cached - number of folios currently cached in the cleancache In addition, each pool creates a /sys/kernel/mm/cleancache/<pool name> directory containing the following metrics: size - number of folios in the pool cached - number of folios currently cached in the pool recalled - number of folios dropped from the pool because cleancache backend took them back Signed-off-by: Suren Baghdasaryan --- MAINTAINERS | 2 + mm/Kconfig | 8 ++ mm/Makefile | 1 + mm/cleancache.c | 113 +++++++++++++++++++++-- mm/cleancache_sysfs.c | 209 ++++++++++++++++++++++++++++++++++++++++++ mm/cleancache_sysfs.h | 58 ++++++++++++ 6 files changed, 383 insertions(+), 8 deletions(-) create mode 100644 mm/cleancache_sysfs.c create mode 100644 mm/cleancache_sysfs.h diff --git a/MAINTAINERS b/MAINTAINERS index de7a89cd44a0..f66307cd9c4b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6055,6 +6055,8 @@ L: linux-mm@kvack.org S: Maintained F: include/linux/cleancache.h F: mm/cleancache.c +F: mm/cleancache_sysfs.c +F: mm/cleancache_sysfs.h =20 CLK API M: Russell King diff --git a/mm/Kconfig b/mm/Kconfig index 7e2482c522a0..9f4da8a848f4 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -965,6 +965,14 @@ config CLEANCACHE =20 If unsure, say N. =20 +config CLEANCACHE_SYSFS + bool "Cleancache information through sysfs interface" + depends on CLEANCACHE && SYSFS + help + This option exposes sysfs attributes to get information from + cleancache. User space can use the interface to query cleancache + and individual cleancache pool metrics. 
+ config CMA bool "Contiguous Memory Allocator" depends on MMU diff --git a/mm/Makefile b/mm/Makefile index b78073b87aea..a7a635f762ee 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -147,3 +147,4 @@ obj-$(CONFIG_EXECMEM) +=3D execmem.o obj-$(CONFIG_TMPFS_QUOTA) +=3D shmem_quota.o obj-$(CONFIG_PT_RECLAIM) +=3D pt_reclaim.o obj-$(CONFIG_CLEANCACHE) +=3D cleancache.o +obj-$(CONFIG_CLEANCACHE_SYSFS) +=3D cleancache_sysfs.o diff --git a/mm/cleancache.c b/mm/cleancache.c index 59b8fd309619..56dce7e03709 100644 --- a/mm/cleancache.c +++ b/mm/cleancache.c @@ -12,6 +12,8 @@ #include #include =20 +#include "cleancache_sysfs.h" + /* * Lock nesting: * ccinode->folios.xa_lock @@ -57,6 +59,8 @@ struct cleancache_inode { struct cleancache_pool { struct list_head folio_list; spinlock_t lock; /* protects folio_list */ + char *name; + struct cleancache_pool_stats *stats; }; =20 #define CLEANCACHE_MAX_POOLS 64 @@ -110,6 +114,7 @@ static void attach_folio(struct folio *folio, struct cl= eancache_inode *ccinode, =20 folio->mapping =3D (struct address_space *)ccinode; folio->index =3D offset; + cleancache_pool_stat_inc(folio_pool(folio)->stats, POOL_CACHED); } =20 static void detach_folio(struct folio *folio) @@ -118,6 +123,7 @@ static void detach_folio(struct folio *folio) =20 folio->mapping =3D NULL; folio->index =3D 0; + cleancache_pool_stat_dec(folio_pool(folio)->stats, POOL_CACHED); } =20 static void folio_attachment(struct folio *folio, struct cleancache_inode = **ccinode, @@ -525,7 +531,7 @@ static bool store_into_inode(struct cleancache_fs *fs, ccinode =3D find_and_get_inode(fs, inode); if (!ccinode) { if (!workingset) - return false; + goto out; =20 ccinode =3D add_and_get_inode(fs, inode); if (IS_ERR_OR_NULL(ccinode)) { @@ -545,6 +551,7 @@ static bool store_into_inode(struct cleancache_fs *fs, if (stored_folio) { if (!workingset) { move_folio_from_inode_to_pool(ccinode, offset, stored_folio); + cleancache_stat_inc(RECLAIMED); goto out_unlock; } rotate_lru_folio(stored_folio); @@ -560,6 +567,8 @@ static bool store_into_inode(struct cleancache_fs *fs, xa_lock(&ccinode->folios); if (!stored_folio) goto out_unlock; + + cleancache_stat_inc(RECLAIMED); } =20 if (!store_folio_in_inode(ccinode, offset, stored_folio)) { @@ -571,6 +580,7 @@ static bool store_into_inode(struct cleancache_fs *fs, spin_unlock(&pool->lock); goto out_unlock; } + cleancache_stat_inc(STORED); add_folio_to_lru(stored_folio); } copy_folio_content(folio, stored_folio); @@ -582,6 +592,8 @@ static bool store_into_inode(struct cleancache_fs *fs, remove_inode_if_empty(ccinode); xa_unlock(&ccinode->folios); put_inode(ccinode); +out: + cleancache_stat_inc(SKIPPED); =20 return ret; } @@ -592,23 +604,26 @@ static bool load_from_inode(struct cleancache_fs *fs, { struct cleancache_inode *ccinode; struct folio *stored_folio; - bool ret =3D false; =20 ccinode =3D find_and_get_inode(fs, inode); - if (!ccinode) + if (!ccinode) { + cleancache_stat_inc(MISSED); return false; + } =20 xa_lock(&ccinode->folios); stored_folio =3D xa_load(&ccinode->folios, offset); if (stored_folio) { rotate_lru_folio(stored_folio); copy_folio_content(stored_folio, folio); - ret =3D true; + cleancache_stat_inc(RESTORED); + } else { + cleancache_stat_inc(MISSED); } xa_unlock(&ccinode->folios); put_inode(ccinode); =20 - return ret; + return !!stored_folio; } =20 static bool invalidate_folio(struct cleancache_fs *fs, @@ -623,8 +638,10 @@ static bool invalidate_folio(struct cleancache_fs *fs, =20 xa_lock(&ccinode->folios); folio =3D xa_load(&ccinode->folios, offset); - if (folio) 
+ if (folio) { move_folio_from_inode_to_pool(ccinode, offset, folio); + cleancache_stat_inc(INVALIDATED); + } xa_unlock(&ccinode->folios); put_inode(ccinode); =20 @@ -645,6 +662,7 @@ static unsigned int invalidate_inode(struct cleancache_= fs *fs, ret =3D erase_folios_from_inode(ccinode, &xas); xas_unlock(&xas); put_inode(ccinode); + cleancache_stat_add(INVALIDATED, ret); =20 return ret; } @@ -652,6 +670,53 @@ static unsigned int invalidate_inode(struct cleancache= _fs *fs, return 0; } =20 +/* Sysfs helpers */ +#ifdef CONFIG_CLEANCACHE_SYSFS + +static struct kobject *kobj_sysfs_root; + +static void __init cleancache_sysfs_init(void) +{ + struct cleancache_pool *pool; + int pool_id, pool_count; + struct kobject *kobj; + + kobj =3D cleancache_sysfs_create_root(); + if (IS_ERR(kobj)) { + pr_warn("Failed to create cleancache sysfs root\n"); + return; + } + + kobj_sysfs_root =3D kobj; + if (!kobj_sysfs_root) + return; + + pool_count =3D atomic_read(&nr_pools); + pool =3D &pools[0]; + for (pool_id =3D 0; pool_id < pool_count; pool_id++, pool++) + if (cleancache_sysfs_create_pool(kobj_sysfs_root, pool->stats, pool->nam= e)) + pr_warn("Failed to create sysfs nodes for \'%s\' cleancache backend\n", + pool->name); +} + +static void cleancache_sysfs_pool_init(struct cleancache_pool_stats *pool_= stats, + const char *name) +{ + /* Skip if sysfs was not initialized yet. */ + if (!kobj_sysfs_root) + return; + + if (cleancache_sysfs_create_pool(kobj_sysfs_root, pool_stats, name)) + pr_warn("Failed to create sysfs nodes for \'%s\' cleancache backend\n", + name); +} + +#else /* CONFIG_CLEANCACHE_SYSFS */ +static inline void cleancache_sysfs_init(void) {} +static inline void cleancache_sysfs_pool_init(struct cleancache_pool_stats= *pool_stats, + const char *name) {} +#endif /* CONFIG_CLEANCACHE_SYSFS */ + /* Hooks into MM and FS */ void cleancache_add_fs(struct super_block *sb) { @@ -835,6 +900,7 @@ cleancache_start_inode_walk(struct address_space *mappi= ng, struct inode *inode, ccinode =3D find_and_get_inode(fs, inode); if (!ccinode) { put_fs(fs); + cleancache_stat_add(MISSED, count); return NULL; } =20 @@ -865,7 +931,10 @@ bool cleancache_restore_from_inode(struct cleancache_i= node *ccinode, memcpy(dst, src, PAGE_SIZE); kunmap_local(dst); kunmap_local(src); + cleancache_stat_inc(RESTORED); ret =3D true; + } else { + cleancache_stat_inc(MISSED); } xa_unlock(&ccinode->folios); =20 @@ -879,9 +948,18 @@ bool cleancache_restore_from_inode(struct cleancache_i= node *ccinode, */ int cleancache_backend_register_pool(const char *name) { + struct cleancache_pool_stats *pool_stats; struct cleancache_pool *pool; + char *pool_name; int pool_id; =20 + if (!name) + return -EINVAL; + + pool_name =3D kstrdup(name, GFP_KERNEL); + if (!pool_name) + return -ENOMEM; + /* pools_lock prevents concurrent registrations */ spin_lock(&pools_lock); pool_id =3D atomic_read(&nr_pools); @@ -893,12 +971,22 @@ int cleancache_backend_register_pool(const char *name) pool =3D &pools[pool_id]; INIT_LIST_HEAD(&pool->folio_list); spin_lock_init(&pool->lock); + pool->name =3D pool_name; /* Ensure above stores complete before we increase the count */ atomic_set_release(&nr_pools, pool_id + 1); spin_unlock(&pools_lock); =20 + pool_stats =3D cleancache_create_pool_stats(pool_id); + if (!IS_ERR(pool_stats)) { + pool->stats =3D pool_stats; + cleancache_sysfs_pool_init(pool_stats, pool->name); + } else { + pr_warn("Failed to create pool stats for \'%s\' cleancache backend\n", + pool->name); + } + pr_info("Registered \'%s\' cleancache backend, pool 
id %d\n", - name ? : "none", pool_id); + name, pool_id); =20 return pool_id; } @@ -947,10 +1035,13 @@ int cleancache_backend_get_folio(int pool_id, struct= folio *folio) goto again; } =20 + cleancache_stat_inc(RECALLED); + cleancache_pool_stat_inc(folio_pool(folio)->stats, POOL_RECALLED); put_inode(ccinode); out: VM_BUG_ON_FOLIO(folio_ref_count(folio) !=3D 0, (folio)); clear_cleancache_folio(folio); + cleancache_pool_stat_dec(pool->stats, POOL_SIZE); =20 return 0; } @@ -972,6 +1063,7 @@ int cleancache_backend_put_folio(int pool_id, struct f= olio *folio) INIT_LIST_HEAD(&folio->lru); spin_lock(&pool->lock); add_folio_to_pool(folio, pool); + cleancache_pool_stat_inc(pool->stats, POOL_SIZE); spin_unlock(&pool->lock); =20 return 0; @@ -984,6 +1076,7 @@ int cleancache_backend_put_folios(int pool_id, struct = list_head *folios) LIST_HEAD(unused_folios); struct folio *folio; struct folio *tmp; + int count =3D 0; =20 list_for_each_entry_safe(folio, tmp, folios, lru) { /* Do not support large folios yet */ @@ -993,10 +1086,12 @@ int cleancache_backend_put_folios(int pool_id, struc= t list_head *folios) =20 init_cleancache_folio(folio, pool_id); list_move(&folio->lru, &unused_folios); + count++; } =20 spin_lock(&pool->lock); list_splice_init(&unused_folios, &pool->folio_list); + cleancache_pool_stat_add(pool->stats, POOL_SIZE, count); spin_unlock(&pool->lock); =20 return list_empty(folios) ? 0 : -EINVAL; @@ -1009,6 +1104,8 @@ static int __init init_cleancache(void) if (!slab_inode) return -ENOMEM; =20 + cleancache_sysfs_init(); + return 0; } -core_initcall(init_cleancache); +subsys_initcall(init_cleancache); diff --git a/mm/cleancache_sysfs.c b/mm/cleancache_sysfs.c new file mode 100644 index 000000000000..5ad7ae84ca1d --- /dev/null +++ b/mm/cleancache_sysfs.c @@ -0,0 +1,209 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include +#include +#include +#include "cleancache_sysfs.h" + +static atomic64_t stats[CLEANCACHE_STAT_NR]; + +void cleancache_stat_inc(enum cleancache_stat type) +{ + atomic64_inc(&stats[type]); +} + +void cleancache_stat_add(enum cleancache_stat type, unsigned long delta) +{ + atomic64_add(delta, &stats[type]); +} + +void cleancache_pool_stat_inc(struct cleancache_pool_stats *pool_stats, + enum cleancache_pool_stat type) +{ + atomic64_inc(&pool_stats->stats[type]); +} + +void cleancache_pool_stat_dec(struct cleancache_pool_stats *pool_stats, + enum cleancache_pool_stat type) +{ + atomic64_dec(&pool_stats->stats[type]); +} + +void cleancache_pool_stat_add(struct cleancache_pool_stats *pool_stats, + enum cleancache_pool_stat type, long delta) +{ + atomic64_add(delta, &pool_stats->stats[type]); +} + +#define CLEANCACHE_ATTR_RO(_name) \ + static struct kobj_attribute _name##_attr =3D __ATTR_RO(_name) + +static inline struct cleancache_pool_stats *kobj_to_stats(struct kobject *= kobj) +{ + return container_of(kobj, struct cleancache_pool_stats, kobj); +} + +static ssize_t stored_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%llu\n", (u64)atomic64_read(&stats[STORED])); +} +CLEANCACHE_ATTR_RO(stored); + +static ssize_t skipped_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%llu\n", (u64)atomic64_read(&stats[SKIPPED])); +} +CLEANCACHE_ATTR_RO(skipped); + +static ssize_t restored_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%llu\n", (u64)atomic64_read(&stats[RESTORED])); +} +CLEANCACHE_ATTR_RO(restored); + +static ssize_t 
missed_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%llu\n", (u64)atomic64_read(&stats[MISSED])); +} +CLEANCACHE_ATTR_RO(missed); + +static ssize_t reclaimed_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%llu\n", (u64)atomic64_read(&stats[RECLAIMED])); +} +CLEANCACHE_ATTR_RO(reclaimed); + +static ssize_t recalled_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%llu\n", (u64)atomic64_read(&stats[RECALLED])); +} +CLEANCACHE_ATTR_RO(recalled); + +static ssize_t invalidated_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%llu\n", (u64)atomic64_read(&stats[INVALIDATED])); +} +CLEANCACHE_ATTR_RO(invalidated); + +static ssize_t cached_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + s64 dropped =3D atomic64_read(&stats[INVALIDATED]) + + atomic64_read(&stats[RECLAIMED]) + + atomic64_read(&stats[RECALLED]); + + return sysfs_emit(buf, "%llu\n", (u64)(atomic64_read(&stats[STORED]) - dr= opped)); +} +CLEANCACHE_ATTR_RO(cached); + +static struct attribute *cleancache_attrs[] =3D { + &stored_attr.attr, + &skipped_attr.attr, + &restored_attr.attr, + &missed_attr.attr, + &reclaimed_attr.attr, + &recalled_attr.attr, + &invalidated_attr.attr, + &cached_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(cleancache); + +#define CLEANCACHE_POOL_ATTR_RO(_name) \ + static struct kobj_attribute _name##_pool_attr =3D { \ + .attr =3D { .name =3D __stringify(_name), .mode =3D 0444 }, \ + .show =3D _name##_pool_show, \ +} + +static ssize_t size_pool_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%llu\n", + (u64)atomic64_read(&kobj_to_stats(kobj)->stats[POOL_SIZE])); +} +CLEANCACHE_POOL_ATTR_RO(size); + +static ssize_t cached_pool_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%llu\n", + (u64)atomic64_read(&kobj_to_stats(kobj)->stats[POOL_CACHED])); +} +CLEANCACHE_POOL_ATTR_RO(cached); + +static ssize_t recalled_pool_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%llu\n", + (u64)atomic64_read(&kobj_to_stats(kobj)->stats[POOL_RECALLED])); +} +CLEANCACHE_POOL_ATTR_RO(recalled); + + +static struct attribute *cleancache_pool_attrs[] =3D { + &size_pool_attr.attr, + &cached_pool_attr.attr, + &recalled_pool_attr.attr, + NULL, +}; +ATTRIBUTE_GROUPS(cleancache_pool); + +static void cleancache_pool_release(struct kobject *kobj) +{ + kfree(kobj_to_stats(kobj)); +} + +static const struct kobj_type cleancache_pool_ktype =3D { + .release =3D &cleancache_pool_release, + .sysfs_ops =3D &kobj_sysfs_ops, + .default_groups =3D cleancache_pool_groups, +}; + +struct cleancache_pool_stats *cleancache_create_pool_stats(int pool_id) +{ + struct cleancache_pool_stats *pool_stats; + + pool_stats =3D kzalloc(sizeof(*pool_stats), GFP_KERNEL); + if (!pool_stats) + return ERR_PTR(-ENOMEM); + + pool_stats->pool_id =3D pool_id; + + return pool_stats; +} + +struct kobject * __init cleancache_sysfs_create_root(void) +{ + struct kobject *kobj; + int err; + + kobj =3D kobject_create_and_add("cleancache", mm_kobj); + if (unlikely(!kobj)) { + pr_err("Failed to create cleancache kobject\n"); + return ERR_PTR(-ENOMEM); + } + + err =3D sysfs_create_group(kobj, cleancache_groups[0]); + if (err) { + kobject_put(kobj); + pr_err("Failed to create cleancache group kobject\n"); + return 
ERR_PTR(err); + } + + return kobj; +} + +int cleancache_sysfs_create_pool(struct kobject *root_kobj, + struct cleancache_pool_stats *pool_stats, + const char *name) +{ + return kobject_init_and_add(&pool_stats->kobj, &cleancache_pool_ktype, + root_kobj, name); +} diff --git a/mm/cleancache_sysfs.h b/mm/cleancache_sysfs.h new file mode 100644 index 000000000000..fb8d2a72be63 --- /dev/null +++ b/mm/cleancache_sysfs.h @@ -0,0 +1,58 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __CLEANCACHE_SYSFS_H__ +#define __CLEANCACHE_SYSFS_H__ + +enum cleancache_stat { + STORED, + SKIPPED, + RESTORED, + MISSED, + RECLAIMED, + RECALLED, + INVALIDATED, + CLEANCACHE_STAT_NR +}; + +enum cleancache_pool_stat { + POOL_SIZE, + POOL_CACHED, + POOL_RECALLED, + CLEANCACHE_POOL_STAT_NR +}; + +struct cleancache_pool_stats { + struct kobject kobj; + int pool_id; + atomic64_t stats[CLEANCACHE_POOL_STAT_NR]; +}; + +#ifdef CONFIG_CLEANCACHE_SYSFS +void cleancache_stat_inc(enum cleancache_stat type); +void cleancache_stat_add(enum cleancache_stat type, unsigned long delta); +void cleancache_pool_stat_inc(struct cleancache_pool_stats *pool_stats, + enum cleancache_pool_stat type); +void cleancache_pool_stat_dec(struct cleancache_pool_stats *pool_stats, + enum cleancache_pool_stat type); +void cleancache_pool_stat_add(struct cleancache_pool_stats *pool_stats, + enum cleancache_pool_stat type, long delta); +struct cleancache_pool_stats *cleancache_create_pool_stats(int pool_id); +struct kobject * __init cleancache_sysfs_create_root(void); +int cleancache_sysfs_create_pool(struct kobject *root_kobj, + struct cleancache_pool_stats *pool_stats, + const char *name); + +#else /* CONFIG_CLEANCACHE_SYSFS */ +static inline void cleancache_stat_inc(enum cleancache_stat type) {} +static inline void cleancache_stat_add(enum cleancache_stat type, unsigned= long delta) {} +static inline void cleancache_pool_stat_inc(struct cleancache_pool_stats *= pool_stats, + enum cleancache_pool_stat type) {} +static inline void cleancache_pool_stat_dec(struct cleancache_pool_stats *= pool_stats, + enum cleancache_pool_stat type) {} +static inline void cleancache_pool_stat_add(struct cleancache_pool_stats *= pool_stats, + enum cleancache_pool_stat type, long delta) {} +static inline +struct cleancache_pool_stats *cleancache_create_pool_stats(int pool_id) { = return NULL; } + +#endif /* CONFIG_CLEANCACHE_SYSFS */ + +#endif /* __CLEANCACHE_SYSFS_H__ */ --=20 2.51.0.740.g6adb054d12-goog From nobody Fri Dec 19 10:42:15 2025 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 389402561A7 for ; Fri, 10 Oct 2025 01:20:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760059211; cv=none; b=Mlyn8DIyMHrEN8nl+xG74pMl48Yem1+2mI5+BWJBxCO1OEFm05J99x5nVFEZtGAxHv+4rvHSAJks7DwZ+UC2BX+TGmlr7MyK1AjRUImg3n9h9wZP5FHziTnrfgJqJmJRUXqZTn1P+PJ+Zk3c0946rHsdd0rcl5mUlugRvxgQcFo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760059211; c=relaxed/simple; bh=VRxXrZZF2VXTYZ8iIq1VsjEC5bkGWZOb9Rg6YcJl7ZM=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
b=jzDWLMxC9bYptq20492CUJYM+k+25CmBYyForCuy+85AbRGBPeMQAfyNSJKZXpoNOxfafFIC8vyKsekoKGOp/4HPUY7DYiqUP3RYt8uEmx0o5etTjbT7wPNVJXebr2f0ugYiL4WcOGEWuASC5ET3d2RH15SvGe/OGfdWMLaftdk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Ft0cEvhy; arc=none smtp.client-ip=209.85.214.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Ft0cEvhy" Received: by mail-pl1-f201.google.com with SMTP id d9443c01a7336-2711a55da20so17104195ad.1 for ; Thu, 09 Oct 2025 18:20:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1760059207; x=1760664007; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=pEIUd4ZHclhGCkol/EpGDU6dqFXNl4VLKrOL2ZuQAlU=; b=Ft0cEvhyhiTFZ2+PEFVmDwsHgKsVMYkRlLvN7WXeiLeq65/qRMVuUjlvJrAEnikO47 jL/3z5jMuWFUDs7ofHI/bRyXhG4bzecMTsjErODXVAe1lagfZG2YJTip6vJgcQ/QMyZi 1+ACipDWfzEBiGEzmnIFGpP6YdqP1/YFv2Y+L9cH/1dLUGDQ6sTe7cB22Rj6FnLnbGLm Mvo5HcpXPMMdSHzinBFlZSmmJnaLPx9SZSuD+8coIasAd1reKQPjGwk20fb3tOhKj66b KC79fhbCkog2YUh+x4UVydMK8EHjEu2JN+2Ao7DsyDj2DgDhFcnTgKjSpTJ4GU1PiZmj GlVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1760059207; x=1760664007; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=pEIUd4ZHclhGCkol/EpGDU6dqFXNl4VLKrOL2ZuQAlU=; b=bAOQI/c29ZEkPARd6Uuw8yG8n9RUOrbmjIOqHFwayo+n21X3lM+5XUEM+K8w48N7KE 0Ry6k2sdIOhSPwstXH9c9wxdjjFtxYdgIdGtqQ5KQr6FZJxcijGzdum3ky9cQIvytOWu aLxfYh+aSBiOfECIh2CDgzov9BYNnE2Zc3trLqtBVmvEVVRD9420WzjlMTaBP3objZBQ saujSqk1zw40vG9zVqhAZu9aN9yfLqF49BwIhJz98a1QUUp4oWXvu+HLyd74YadzIqyY 1e385X1pCzjcguNnLUq6VBzo7nryqXa4nNx6qxWQnAgZ1GaiqeAx3b4Ka9csIHe5Ff9B jocA== X-Forwarded-Encrypted: i=1; AJvYcCVf1Dr72HXOCmeSs58xM4SFli1LWs/t1O/E9JgUGjevVY02ZjdNz2bUrvPGSxj+dcdeVBO2R0L19yoDocs=@vger.kernel.org X-Gm-Message-State: AOJu0YyFRBb1D+gpGnDFZbwGh7jV9bUiXNUk1dQ/onQpc7VJfGsdpJzb OIbhGlvVfnC1N2WSrv2/eUGwvDrQG9uvwUxC6BbCqFitFYzWcKTXyozq+cIQuZRsHF/+bZ0Yc5R pXXqCMw== X-Google-Smtp-Source: AGHT+IHEfcZb//+A/XoFxHeEt6Ogc1LHbtC/aY5NYrm9NviJtA9b0iGFosX3+0YfoF7OGZYkFgLjPVJGLKw= X-Received: from plps24.prod.google.com ([2002:a17:902:9898:b0:267:de1d:2687]) (user=surenb job=prod-delivery.src-stubby-dispatcher) by 2002:a17:902:e943:b0:264:f714:8dce with SMTP id d9443c01a7336-290272c2542mr118120205ad.36.1760059207349; Thu, 09 Oct 2025 18:20:07 -0700 (PDT) Date: Thu, 9 Oct 2025 18:19:48 -0700 In-Reply-To: <20251010011951.2136980-1-surenb@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20251010011951.2136980-1-surenb@google.com> X-Mailer: git-send-email 2.51.0.740.g6adb054d12-goog Message-ID: <20251010011951.2136980-6-surenb@google.com> Subject: [PATCH 5/8] mm/tests: add cleancache kunit test From: Suren Baghdasaryan To: akpm@linux-foundation.org Cc: david@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz, alexandru.elisei@arm.com, 
peterx@redhat.com, sj@kernel.org, rppt@kernel.org, mhocko@suse.com, corbet@lwn.net, axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, jack@suse.cz, willy@infradead.org, m.szyprowski@samsung.com, robin.murphy@arm.com, hannes@cmpxchg.org, zhengqi.arch@bytedance.com, shakeel.butt@linux.dev, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, minchan@kernel.org, surenb@google.com, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Introduce a kunit test that creates fake inodes, fills them with folios of predefined content, registers a cleancache pool, allocates and donates folios to the new pool. After this initialization it runs several scenarios: 1. cleancache_restore_test - stores fake inode pages into cleancache, then restores them into auxiliary folios and checks restored content; 2. cleancache_invalidate_test - stores a folio, successfully restores it, invalidates it and tries to restore again expecting a failure; 3. cleancache_reclaim_test - fills up the cleancache, stores one more folio and verifies that the oldest folio got reclaimed; 4. cleancache_backend_api_test - takes all donated folios and puts them back, verifying the results. Signed-off-by: Suren Baghdasaryan --- MAINTAINERS | 1 + mm/Kconfig.debug | 13 ++ mm/Makefile | 1 + mm/cleancache.c | 35 ++- mm/tests/Makefile | 6 + mm/tests/cleancache_kunit.c | 425 ++++++++++++++++++++++++++++++++++++ 6 files changed, 480 insertions(+), 1 deletion(-) create mode 100644 mm/tests/Makefile create mode 100644 mm/tests/cleancache_kunit.c diff --git a/MAINTAINERS b/MAINTAINERS index f66307cd9c4b..1c97227e7ffa 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6057,6 +6057,7 @@ F: include/linux/cleancache.h F: mm/cleancache.c F: mm/cleancache_sysfs.c F: mm/cleancache_sysfs.h +F: mm/tests/cleancache_kunit.c =20 CLK API M: Russell King diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug index 32b65073d0cc..c3482f7bc977 100644 --- a/mm/Kconfig.debug +++ b/mm/Kconfig.debug @@ -309,3 +309,16 @@ config PER_VMA_LOCK_STATS overhead in the page fault path. =20 If in doubt, say N. + +config CLEANCACHE_KUNIT + tristate "KUnit test for cleancache" if !KUNIT_ALL_TESTS + depends on KUNIT + depends on CLEANCACHE + default KUNIT_ALL_TESTS + help + This builds the cleancache unit test. + Tests the cleancache functionality. + For more information on KUnit and unit tests in general please refer + to the KUnit documentation in Documentation/dev-tools/kunit/. + + If unsure, say N. 
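With the option above enabled, the suite runs like any other KUnit test. One possible invocation under UML with the standard wrapper is sketched below (the .kunitconfig fragment and paths are assumptions; they presume CLEANCACHE's own dependencies are satisfied in the UML config):

# .kunitconfig fragment (assumed)
CONFIG_KUNIT=y
CONFIG_CLEANCACHE=y
CONFIG_CLEANCACHE_KUNIT=y

$ ./tools/testing/kunit/kunit.py run --kunitconfig=.kunitconfig 'cleancache*'

Because the option is tristate, the test can also be built as a module and loaded with modprobe cleancache_kunit on a kernel configured with CONFIG_CLEANCACHE.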
diff --git a/mm/Makefile b/mm/Makefile index a7a635f762ee..845841a140e3 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -70,6 +70,7 @@ obj-y +=3D init-mm.o obj-y +=3D memblock.o obj-y +=3D $(memory-hotplug-y) obj-y +=3D slub.o +obj-y +=3D tests/ =20 ifdef CONFIG_MMU obj-$(CONFIG_ADVISE_SYSCALLS) +=3D madvise.o diff --git a/mm/cleancache.c b/mm/cleancache.c index 56dce7e03709..fd18486b0407 100644 --- a/mm/cleancache.c +++ b/mm/cleancache.c @@ -11,6 +11,8 @@ #include #include #include +#include +#include =20 #include "cleancache_sysfs.h" =20 @@ -74,6 +76,28 @@ static DEFINE_SPINLOCK(pools_lock); /* protects pools */ static LIST_HEAD(cleancache_lru); static DEFINE_SPINLOCK(lru_lock); /* protects cleancache_lru */ =20 +#if IS_ENABLED(CONFIG_CLEANCACHE_KUNIT) + +static bool is_pool_allowed(int pool_id) +{ + struct kunit *test =3D kunit_get_current_test(); + + /* Restrict kunit tests to using only the test pool */ + return test && *((int *)test->priv) =3D=3D pool_id; +} + +#else /* CONFIG_CLEANCACHE_KUNIT */ + +static bool is_pool_allowed(int pool_id) { return true; } + +#endif /* CONFIG_CLEANCACHE_KUNIT */ + +#if IS_MODULE(CONFIG_CLEANCACHE_KUNIT) +#define EXPORT_SYMBOL_FOR_KUNIT(x) EXPORT_SYMBOL(x) +#else +#define EXPORT_SYMBOL_FOR_KUNIT(x) +#endif + /* * Folio attributes: * folio->_mapcount - pool_id @@ -184,7 +208,7 @@ static struct folio *pick_folio_from_any_pool(void) for (int i =3D 0; i < count; i++) { pool =3D &pools[i]; spin_lock(&pool->lock); - if (!list_empty(&pool->folio_list)) { + if (!list_empty(&pool->folio_list) && is_pool_allowed(i)) { folio =3D list_last_entry(&pool->folio_list, struct folio, lru); WARN_ON(!remove_folio_from_pool(folio, pool)); @@ -747,6 +771,7 @@ void cleancache_add_fs(struct super_block *sb) err: sb->cleancache_id =3D CLEANCACHE_ID_INVALID; } +EXPORT_SYMBOL_FOR_KUNIT(cleancache_add_fs); =20 void cleancache_remove_fs(struct super_block *sb) { @@ -766,6 +791,7 @@ void cleancache_remove_fs(struct super_block *sb) /* free the object */ put_fs(fs); } +EXPORT_SYMBOL_FOR_KUNIT(cleancache_remove_fs); =20 bool cleancache_store_folio(struct inode *inode, struct folio *folio) { @@ -795,6 +821,7 @@ bool cleancache_store_folio(struct inode *inode, struct= folio *folio) =20 return ret; } +EXPORT_SYMBOL_FOR_KUNIT(cleancache_store_folio); =20 bool cleancache_restore_folio(struct inode *inode, struct folio *folio) { @@ -822,6 +849,7 @@ bool cleancache_restore_folio(struct inode *inode, stru= ct folio *folio) =20 return ret; } +EXPORT_SYMBOL_FOR_KUNIT(cleancache_restore_folio); =20 bool cleancache_invalidate_folio(struct address_space *mapping, struct inode *inode, struct folio *folio) @@ -853,6 +881,7 @@ bool cleancache_invalidate_folio(struct address_space *= mapping, =20 return ret; } +EXPORT_SYMBOL_FOR_KUNIT(cleancache_invalidate_folio); =20 bool cleancache_invalidate_inode(struct address_space *mapping, struct inode *inode) @@ -877,6 +906,7 @@ bool cleancache_invalidate_inode(struct address_space *= mapping, =20 return count > 0; } +EXPORT_SYMBOL_FOR_KUNIT(cleancache_invalidate_inode); =20 struct cleancache_inode * cleancache_start_inode_walk(struct address_space *mapping, struct inode *i= node, @@ -906,6 +936,7 @@ cleancache_start_inode_walk(struct address_space *mappi= ng, struct inode *inode, =20 return ccinode; } +EXPORT_SYMBOL_FOR_KUNIT(cleancache_start_inode_walk); =20 void cleancache_end_inode_walk(struct cleancache_inode *ccinode) { @@ -914,6 +945,7 @@ void cleancache_end_inode_walk(struct cleancache_inode = *ccinode) put_inode(ccinode); put_fs(fs); } 
+EXPORT_SYMBOL_FOR_KUNIT(cleancache_end_inode_walk); =20 bool cleancache_restore_from_inode(struct cleancache_inode *ccinode, struct folio *folio) @@ -940,6 +972,7 @@ bool cleancache_restore_from_inode(struct cleancache_in= ode *ccinode, =20 return ret; } +EXPORT_SYMBOL_FOR_KUNIT(cleancache_restore_from_inode); =20 /* Backend API */ /* diff --git a/mm/tests/Makefile b/mm/tests/Makefile new file mode 100644 index 000000000000..fac2e964b4d5 --- /dev/null +++ b/mm/tests/Makefile @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Makefile for tests of kernel mm subsystem. + +# KUnit tests +obj-$(CONFIG_CLEANCACHE_KUNIT) +=3D cleancache_kunit.o diff --git a/mm/tests/cleancache_kunit.c b/mm/tests/cleancache_kunit.c new file mode 100644 index 000000000000..18b4386d6322 --- /dev/null +++ b/mm/tests/cleancache_kunit.c @@ -0,0 +1,425 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * KUnit test for the Cleancache. + * + * Copyright (C) 2025, Google LLC. + * Author: Suren Baghdasaryan + */ +#include + +#include +#include +#include + +#include "../internal.h" + +#define INODE_COUNT 5 +#define FOLIOS_PER_INODE 4 +#define FOLIO_COUNT (INODE_COUNT * FOLIOS_PER_INODE) + +static const u32 TEST_CONTENT =3D 0xBADCAB32; + +struct inode_data { + struct address_space mapping; + struct inode inode; + struct folio *folios[FOLIOS_PER_INODE]; +}; + +static struct test_data { + /* Mock a fs */ + struct super_block sb; + struct inode_data inodes[INODE_COUNT]; + /* Folios donated to the cleancache pools */ + struct folio *pool_folios[FOLIO_COUNT]; + /* Auxiliary folio */ + struct folio *aux_folio; + int pool_id; +} test_data; + +static void set_folio_content(struct folio *folio, u32 value) +{ + u32 *data; + + data =3D kmap_local_folio(folio, 0); + *data =3D value; + kunmap_local(data); +} + +static u32 get_folio_content(struct folio *folio) +{ + unsigned long value; + u32 *data; + + data =3D kmap_local_folio(folio, 0); + value =3D *data; + kunmap_local(data); + + return value; +} + +static void fill_cleancache(struct kunit *test) +{ + struct inode_data *inode_data; + struct folio *folio; + + /* Store inode folios into cleancache */ + for (int inode =3D 0; inode < INODE_COUNT; inode++) { + inode_data =3D &test_data.inodes[inode]; + for (int fidx =3D 0; fidx < FOLIOS_PER_INODE; fidx++) { + folio =3D inode_data->folios[fidx]; + KUNIT_EXPECT_NOT_NULL(test, folio); + folio_lock(folio); /* Folio has to be locked */ + folio_set_workingset(folio); + KUNIT_EXPECT_TRUE(test, cleancache_store_folio(&inode_data->inode, foli= o)); + folio_unlock(folio); + } + } +} + +static int cleancache_suite_init(struct kunit_suite *suite) +{ + LIST_HEAD(pool_folios); + + /* Add a fake fs superblock */ + cleancache_add_fs(&test_data.sb); + + /* Initialize fake inodes */ + for (int inode =3D 0; inode < INODE_COUNT; inode++) { + struct inode_data *inode_data =3D &test_data.inodes[inode]; + + inode_data->inode.i_sb =3D &test_data.sb; + inode_data->inode.i_ino =3D inode; + inode_data->mapping.host =3D &inode_data->inode; + + /* Allocate folios for the inode */ + for (int fidx =3D 0; fidx < FOLIOS_PER_INODE; fidx++) { + struct folio *folio =3D folio_alloc(GFP_KERNEL | __GFP_ZERO, 0); + + if (!folio) + return -ENOMEM; + + set_folio_content(folio, (u32)fidx); + folio->mapping =3D &inode_data->mapping; + folio->index =3D PAGE_SIZE * fidx; + inode_data->folios[fidx] =3D folio; + } + } + + /* Register new cleancache pool and donate test folios */ + test_data.pool_id =3D cleancache_backend_register_pool("kunit_pool"); + if (test_data.pool_id < 0) 
+ return -EINVAL; + + /* Allocate folios and put them to cleancache */ + for (int fidx =3D 0; fidx < FOLIO_COUNT; fidx++) { + struct folio *folio =3D folio_alloc(GFP_KERNEL | __GFP_ZERO, 0); + + if (!folio) + return -ENOMEM; + + folio_ref_freeze(folio, 1); + test_data.pool_folios[fidx] =3D folio; + list_add(&folio->lru, &pool_folios); + } + + cleancache_backend_put_folios(test_data.pool_id, &pool_folios); + + /* Allocate auxiliary folio for testing */ + test_data.aux_folio =3D folio_alloc(GFP_KERNEL | __GFP_ZERO, 0); + if (!test_data.aux_folio) + return -ENOMEM; + + return 0; +} + +static void cleancache_suite_exit(struct kunit_suite *suite) +{ + /* Take back donated folios and free them */ + for (int fidx =3D 0; fidx < FOLIO_COUNT; fidx++) { + struct folio *folio =3D test_data.pool_folios[fidx]; + + if (folio) { + if (!cleancache_backend_get_folio(test_data.pool_id, + folio)) + set_page_refcounted(&folio->page); + folio_put(folio); + } + } + + /* Free the auxiliary folio */ + if (test_data.aux_folio) { + test_data.aux_folio->mapping =3D NULL; + folio_put(test_data.aux_folio); + } + + /* Free inode folios */ + for (int inode =3D 0; inode < INODE_COUNT; inode++) { + for (int fidx =3D 0; fidx < FOLIOS_PER_INODE; fidx++) { + struct folio *folio =3D test_data.inodes[inode].folios[fidx]; + + if (folio) { + folio->mapping =3D NULL; + folio_put(folio); + } + } + } + + cleancache_remove_fs(&test_data.sb); +} + +static int cleancache_test_init(struct kunit *test) +{ + /* Pass pool_id to cleancache to restrict pools that can be used for test= s */ + test->priv =3D &test_data.pool_id; + + return 0; +} + +static void cleancache_restore_test(struct kunit *test) +{ + struct inode_data *inode_data; + struct folio *folio; + + /* Store inode folios into cleancache */ + fill_cleancache(test); + + /* Restore and validate folios stored in cleancache */ + for (int inode =3D 0; inode < INODE_COUNT; inode++) { + inode_data =3D &test_data.inodes[inode]; + for (int fidx =3D 0; fidx < FOLIOS_PER_INODE; fidx++) { + folio =3D inode_data->folios[fidx]; + test_data.aux_folio->mapping =3D folio->mapping; + test_data.aux_folio->index =3D folio->index; + KUNIT_EXPECT_TRUE(test, cleancache_restore_folio(&inode_data->inode, + test_data.aux_folio)); + KUNIT_EXPECT_EQ(test, get_folio_content(test_data.aux_folio), + get_folio_content(folio)); + } + } +} + +static void cleancache_walk_and_restore_test(struct kunit *test) +{ + struct cleancache_inode *ccinode; + struct inode_data *inode_data; + struct folio *folio; + + /* Store inode folios into cleancache */ + fill_cleancache(test); + + /* Restore and validate folios stored in the first inode */ + inode_data =3D &test_data.inodes[0]; + ccinode =3D cleancache_start_inode_walk(&inode_data->mapping, &inode_data= ->inode, + FOLIOS_PER_INODE); + KUNIT_EXPECT_NOT_NULL(test, ccinode); + for (int fidx =3D 0; fidx < FOLIOS_PER_INODE; fidx++) { + folio =3D inode_data->folios[fidx]; + test_data.aux_folio->mapping =3D folio->mapping; + test_data.aux_folio->index =3D folio->index; + KUNIT_EXPECT_TRUE(test, cleancache_restore_from_inode(ccinode, + test_data.aux_folio)); + KUNIT_EXPECT_EQ(test, get_folio_content(test_data.aux_folio), + get_folio_content(folio)); + } + cleancache_end_inode_walk(ccinode); +} + +static void cleancache_invalidate_test(struct kunit *test) +{ + struct inode_data *inode_data; + struct folio *folio; + + /* Store inode folios into cleancache */ + fill_cleancache(test); + + /* Invalidate one folio */ + inode_data =3D &test_data.inodes[0]; + folio =3D 
inode_data->folios[0]; + test_data.aux_folio->mapping =3D folio->mapping; + test_data.aux_folio->index =3D folio->index; + KUNIT_EXPECT_TRUE(test, cleancache_restore_folio(&inode_data->inode, + test_data.aux_folio)); + folio_lock(folio); /* Folio has to be locked */ + KUNIT_EXPECT_TRUE(test, cleancache_invalidate_folio(&inode_data->mapping, + &inode_data->inode, + inode_data->folios[0])); + folio_unlock(folio); + KUNIT_EXPECT_FALSE(test, cleancache_restore_folio(&inode_data->inode, + test_data.aux_folio)); + + /* Invalidate one node */ + inode_data =3D &test_data.inodes[1]; + KUNIT_EXPECT_TRUE(test, cleancache_invalidate_inode(&inode_data->mapping, + &inode_data->inode)); + + /* Verify results */ + for (int inode =3D 0; inode < INODE_COUNT; inode++) { + inode_data =3D &test_data.inodes[inode]; + for (int fidx =3D 0; fidx < FOLIOS_PER_INODE; fidx++) { + folio =3D inode_data->folios[fidx]; + test_data.aux_folio->mapping =3D folio->mapping; + test_data.aux_folio->index =3D folio->index; + if (inode =3D=3D 0 && fidx =3D=3D 0) { + /* Folio should be missing */ + KUNIT_EXPECT_FALSE(test, + cleancache_restore_folio(&inode_data->inode, + test_data.aux_folio)); + continue; + } + if (inode =3D=3D 1) { + /* Folios in the node should be missing */ + KUNIT_EXPECT_FALSE(test, + cleancache_restore_folio(&inode_data->inode, + test_data.aux_folio)); + continue; + } + KUNIT_EXPECT_TRUE(test, + cleancache_restore_folio(&inode_data->inode, + test_data.aux_folio)); + KUNIT_EXPECT_EQ(test, get_folio_content(test_data.aux_folio), + get_folio_content(folio)); + } + } +} + +static void cleancache_reclaim_test(struct kunit *test) +{ + struct inode_data *inode_data; + struct inode_data *inode_new; + unsigned long new_index; + struct folio *folio; + + /* Store inode folios into cleancache */ + fill_cleancache(test); + + /* + * Store one extra new folio. There should be no free folios, so the + * oldest folio will be reclaimed to store new folio. Add it into the + * last node at the next unoccupied offset. + */ + inode_new =3D &test_data.inodes[INODE_COUNT - 1]; + new_index =3D inode_new->folios[FOLIOS_PER_INODE - 1]->index + PAGE_SIZE; + + test_data.aux_folio->mapping =3D &inode_new->mapping; + test_data.aux_folio->index =3D new_index; + set_folio_content(test_data.aux_folio, TEST_CONTENT); + folio_lock(test_data.aux_folio); /* Folio has to be locked */ + folio_set_workingset(test_data.aux_folio); + KUNIT_EXPECT_TRUE(test, cleancache_store_folio(&inode_new->inode, test_da= ta.aux_folio)); + folio_unlock(test_data.aux_folio); + + /* Verify results */ + for (int inode =3D 0; inode < INODE_COUNT; inode++) { + inode_data =3D &test_data.inodes[inode]; + for (int fidx =3D 0; fidx < FOLIOS_PER_INODE; fidx++) { + folio =3D inode_data->folios[fidx]; + test_data.aux_folio->mapping =3D folio->mapping; + test_data.aux_folio->index =3D folio->index; + /* + * The first folio of the first node was added first, + * so it's the oldest and must have been reclaimed. 
+ */ + if (inode =3D=3D 0 && fidx =3D=3D 0) { + /* Reclaimed folio should be missing */ + KUNIT_EXPECT_FALSE_MSG(test, + cleancache_restore_folio(&inode_data->inode, + test_data.aux_folio), + "inode %d, folio %d is invalid\n", inode, fidx); + continue; + } + KUNIT_EXPECT_TRUE_MSG(test, + cleancache_restore_folio(&inode_data->inode, + test_data.aux_folio), + "inode %d, folio %d is invalid\n", + inode, fidx); + KUNIT_EXPECT_EQ_MSG(test, get_folio_content(test_data.aux_folio), + get_folio_content(folio), + "inode %d, folio %d content is invalid\n", + inode, fidx); + } + } + + /* Auxiliary folio should be stored */ + test_data.aux_folio->mapping =3D &inode_new->mapping; + test_data.aux_folio->index =3D new_index; + KUNIT_EXPECT_TRUE_MSG(test, + cleancache_restore_folio(&inode_new->inode, test_data.aux_folio), + "inode %lu, folio %ld is invalid\n", + inode_new->inode.i_ino, new_index); + KUNIT_EXPECT_EQ_MSG(test, get_folio_content(test_data.aux_folio), TEST_CO= NTENT, + "inode %lu, folio %ld content is invalid\n", + inode_new->inode.i_ino, new_index); +} + +static void cleancache_backend_api_test(struct kunit *test) +{ + struct folio *folio; + LIST_HEAD(folios); + int unused =3D 0; + int used =3D 0; + + /* Store inode folios into cleancache */ + fill_cleancache(test); + + /* Get all donated folios back */ + for (int fidx =3D 0; fidx < FOLIO_COUNT; fidx++) { + KUNIT_EXPECT_EQ(test, cleancache_backend_get_folio(test_data.pool_id, + test_data.pool_folios[fidx]), 0); + set_page_refcounted(&test_data.pool_folios[fidx]->page); + } + + /* Try putting a refcounted folio */ + KUNIT_EXPECT_NE(test, cleancache_backend_put_folio(test_data.pool_id, + test_data.pool_folios[0]), 0); + + /* Put some of the folios back into cleancache */ + for (int fidx =3D 0; fidx < FOLIOS_PER_INODE; fidx++) { + folio_ref_freeze(test_data.pool_folios[fidx], 1); + KUNIT_EXPECT_EQ(test, cleancache_backend_put_folio(test_data.pool_id, + test_data.pool_folios[fidx]), 0); + } + + /* Put the rest back into cleancache but keep half of folios still refcou= nted */ + for (int fidx =3D FOLIOS_PER_INODE; fidx < FOLIO_COUNT; fidx++) { + if (fidx % 2) { + folio_ref_freeze(test_data.pool_folios[fidx], 1); + unused++; + } else { + used++; + } + list_add(&test_data.pool_folios[fidx]->lru, &folios); + } + KUNIT_EXPECT_NE(test, cleancache_backend_put_folios(test_data.pool_id, + &folios), 0); + /* Used folios should be still in the list */ + KUNIT_EXPECT_EQ(test, list_count_nodes(&folios), used); + + /* Release refcounts and put the remaining folios into cleancache */ + list_for_each_entry(folio, &folios, lru) + folio_ref_freeze(folio, 1); + KUNIT_EXPECT_EQ(test, cleancache_backend_put_folios(test_data.pool_id, + &folios), 0); + KUNIT_EXPECT_TRUE(test, list_empty(&folios)); +} + +static struct kunit_case cleancache_test_cases[] =3D { + KUNIT_CASE(cleancache_restore_test), + KUNIT_CASE(cleancache_walk_and_restore_test), + KUNIT_CASE(cleancache_invalidate_test), + KUNIT_CASE(cleancache_reclaim_test), + KUNIT_CASE(cleancache_backend_api_test), + {}, +}; + +static struct kunit_suite hashtable_test_module =3D { + .name =3D "cleancache", + .init =3D cleancache_test_init, + .suite_init =3D cleancache_suite_init, + .suite_exit =3D cleancache_suite_exit, + .test_cases =3D cleancache_test_cases, +}; + +kunit_test_suites(&hashtable_test_module); + +MODULE_DESCRIPTION("KUnit test for the Kernel Cleancache"); +MODULE_LICENSE("GPL"); --=20 2.51.0.740.g6adb054d12-goog From nobody Fri Dec 19 10:42:15 2025 Received: from mail-pg1-f201.google.com 
(mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7889125A323 for ; Fri, 10 Oct 2025 01:20:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760059212; cv=none; b=GLN+4w+/ONjT3xqa2jUIZO2URtcejwfsWCJG008WlsS5dQLrPnY0cBT1I+lEJkhhrGYZ3wzu4qGQXEMPutW5wnhmZ0OYVzGs3uRhudUt8zBLor6VJA8YTai2Ud3TvmtJ1fPt6g6tH2m3OlrI2Qx81Eulz6CieDMqjG0SsEWbmso= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760059212; c=relaxed/simple; bh=NHw3RZ90+Da7UyHMBKVXx8p20p9oL5hkY0RUKVTk0us=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=p+v1xiZGRycki1XAWWY+3ku6liIEHBmlvzKFUskxT+egcuH64tbb5bOs/ihXkQORcq6J6VdKBfrwL5Xrwxeuv6nXEtH6sYO0zBpoSx+1TJzGHfeVtkCuf+rIWpwzaZYhiFc+mSH8RFYZe4w09oALmwqT1VksX8L8wJUT2e396LM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=0bdtkpWH; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="0bdtkpWH" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-b5527f0d39bso3663657a12.2 for ; Thu, 09 Oct 2025 18:20:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1760059210; x=1760664010; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=TF3568yP8arSICTc3vnXw7wkTlYOmrE0XKsy2eUArt4=; b=0bdtkpWHo8pLk3hXZk4zATYYmGusDjPfsoPfZoRL3ObRJPNCXeM+xyjBtE7m/VbJL4 1032p7sGoy/QgkXTH3APiB4HfhJM0YQWSoHXcEkVTlNVzqzj4Fi7M7UrMqLaPGH0InXd nigNOI63sa+yccPwFYAx/+xUrqmLqk4QoALc4U1S/c5OmL4tT1nFJlYko6gefmUIUa0j gAzsYWCYUu3RHRdwBgfW31gNBkLgmLOqR/wLetQJJnNtdWAN5E1TSwvnju3k4gGCevU5 QkkEDzDMkV56UKJ8/61lF37Ir0bLo0sWj8aWFjr7Ybkf8texhT8tGFZvJ3b0q70QDAzN XFZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1760059210; x=1760664010; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=TF3568yP8arSICTc3vnXw7wkTlYOmrE0XKsy2eUArt4=; b=q/7qvf9GAW/xhFvyT6+qRhBTHLyVL0qZk2WrCucJlQvnHEn+kPOdqvVB6Rig3Uyw4m INK8tLaQvM0IxtJIfetJFS3CknzVqKklcKnKpkJyk0qhs/LSPtEkqdKCBlXV38NdcIQK JQZXqfNmyzkSUPMMtAYLCC9JGB6OXuOI8t3Hv0yCrUbdc0D5V2yacl5kD7xxiwxVLxk8 vncgqeRCKWTE307XGtY1TG2+hJV5dFvl2JfJNLj8Z3mUU9EMSjfX3ANSVUPxDX/11xXi JTYAsTZQryC9XAJfEewpPSEd3D5Sz0M7RzD+a0mH4MTinKwG6Sn8QweEmyq+TFXHOAgJ hXLg== X-Forwarded-Encrypted: i=1; AJvYcCWOBYms9NKoEN2KpAzdOpgbAgElJYrtRp3xaydVnQOoTd8dV3s2GirYcgjpyAXAEI/kGFxLBywBOexasxQ=@vger.kernel.org X-Gm-Message-State: AOJu0YyZbSrojWHu+Wda4sdfivY38fgkma2HCFNCnIocXmOrXuGTeSPC r/X96Ch1kW201pcmnTxQtApFAGhZumLsuzdyMBmMr07DcAtNWMvha8R9vflNYrKBlaz9YkhOBXF uGg/p0Q== X-Google-Smtp-Source: AGHT+IHpnUj2OMWlXlYxp3ZZZRkgvkpKUji8FWq+Is/jpL7rJv/HUvTEG2aUVmhGkdq+yIG7uf4JBu02rqQ= X-Received: from 
pgbcq6.prod.google.com ([2002:a05:6a02:4086:b0:b49:de56:6e3c]) (user=surenb job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a20:431f:b0:32d:80ab:7132 with SMTP id adf61e73a8af0-32da83de283mr12170415637.37.1760059209572; Thu, 09 Oct 2025 18:20:09 -0700 (PDT) Date: Thu, 9 Oct 2025 18:19:49 -0700 In-Reply-To: <20251010011951.2136980-1-surenb@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20251010011951.2136980-1-surenb@google.com> X-Mailer: git-send-email 2.51.0.740.g6adb054d12-goog Message-ID: <20251010011951.2136980-7-surenb@google.com> Subject: [PATCH 6/8] add cleancache documentation From: Suren Baghdasaryan To: akpm@linux-foundation.org Cc: david@redhat.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz, alexandru.elisei@arm.com, peterx@redhat.com, sj@kernel.org, rppt@kernel.org, mhocko@suse.com, corbet@lwn.net, axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org, hch@infradead.org, jack@suse.cz, willy@infradead.org, m.szyprowski@samsung.com, robin.murphy@arm.com, hannes@cmpxchg.org, zhengqi.arch@bytedance.com, shakeel.butt@linux.dev, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, minchan@kernel.org, surenb@google.com, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Document cleancache, its APIs and sysfs interface. Signed-off-by: Suren Baghdasaryan --- Documentation/mm/cleancache.rst | 112 ++++++++++++++++++++++++++++++++ MAINTAINERS | 1 + 2 files changed, 113 insertions(+) create mode 100644 Documentation/mm/cleancache.rst diff --git a/Documentation/mm/cleancache.rst b/Documentation/mm/cleancache.= rst new file mode 100644 index 000000000000..deaf7de51829 --- /dev/null +++ b/Documentation/mm/cleancache.rst @@ -0,0 +1,112 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +Cleancache +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Motivation +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Cleancache is a feature to utilize unused reserved memory for extending +the page cache. + +Cleancache can be thought of as a folio-granularity victim cache for clean +file-backed pages that the kernel's pageframe replacement algorithm (PFRA) +would like to keep around, but can't since there isn't enough memory. So +when the PFRA "evicts" a folio, it stores the data contained in the folio +into cleancache memory which is not directly accessible or addressable by +the kernel (transcendent memory) and is of unknown and possibly +time-varying size. + +Later, when a filesystem wishes to access a folio in a file on disk, it +first checks cleancache to see if it already contains the required data; if it +does, the folio data is copied into the kernel and a disk access is +avoided. + +The memory cleancache uses is donated by other system components, which +reserve memory not directly addressable by the kernel. By donating this +memory to cleancache, the memory owner enables its utilization while it +is not used. Memory donation is done using the cleancache backend API and any +donated memory can be taken back at any time by its donor without delay +and with guaranteed success. Since cleancache uses this memory only to +store clean file-backed data, it can be dropped at any time and therefore +the donor's request to take back the memory can always be satisfied. 
+ +Implementation Overview +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Cleancache "backend" (donor that provides transcendent memory) registers +itself with cleancache "frontend" and receives a unique pool_id which it +can use in all later API calls to identify the pool of folios it donates. +Once registered, the backend can call cleancache_backend_put_folio() or +cleancache_backend_put_folios() to donate memory to cleancache. Note that +cleancache currently supports only 0-order folios and will not accept +larger-order ones. Once the backend needs that memory back, it can get it +by calling cleancache_backend_get_folio(). Only the original backend can +take back a folio it donated to the cleancache. + +The kernel uses cleancache by first calling cleancache_add_fs() to register +each file system and then using a combination of cleancache_store_folio(), +cleancache_restore_folio(), cleancache_invalidate_{folio|inode} to store, +restore and invalidate folio content. +cleancache_{start|end}_inode_walk() are used to walk over folios inside +an inode and cleancache_restore_from_inode() is used to restore folios +during such walks. + +From the kernel's point of view, folios copied into cleancache have +an indefinite lifetime which is completely unknowable by the kernel and so +may or may not still be in cleancache at any later time. Thus, as its name +implies, cleancache is not suitable for dirty folios. Cleancache has +complete discretion over what folios to preserve and what folios to discard +and when. + +Cleancache Performance Metrics +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D + +If CONFIG_CLEANCACHE_SYSFS is enabled, monitoring of cleancache performance +can be done via sysfs in the `/sys/kernel/mm/cleancache` directory. +The effectiveness of cleancache can be measured (across all filesystems) +with the provided stats. +Global stats are published directly under `/sys/kernel/mm/cleancache` and +include: + +``stored`` + number of successful cleancache folio stores. + +``skipped`` + number of folios skipped during cleancache store operation. + +``restored`` + number of successful cleancache folio restore operations. + +``missed`` + number of failed cleancache folio restore operations. + +``reclaimed`` + number of folios reclaimed from the cleancache due to insufficient + memory. + +``recalled`` + number of times cleancache folio content was discarded as a result + of the cleancache backend taking the folio back. + +``invalidated`` + number of times cleancache folio content was discarded as a result + of invalidation. + +``cached`` + number of folios currently cached in the cleancache. + +Per-pool stats are published under `/sys/kernel/mm/cleancache/<pool name>` +where <pool name> is the name the pool was registered under. These stats +include: + +``size`` + number of folios donated to this pool. + +``cached`` + number of folios currently cached in the pool. + +``recalled`` + number of times cleancache folio content was discarded as a result + of the cleancache backend taking the folio back from the pool. 
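To make the backend flow described above concrete, here is a minimal sketch of a hypothetical backend (not part of this series) that registers a pool and donates a handful of order-0 folios, using only the APIs introduced earlier:

static int example_backend_init(void)
{
	LIST_HEAD(donation);
	struct folio *folio;
	int pool_id, i;

	pool_id = cleancache_backend_register_pool("example_pool");
	if (pool_id < 0)
		return pool_id;

	for (i = 0; i < 16; i++) {
		folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, 0);
		if (!folio)
			break;
		/* Donated folios must not be referenced by anyone else. */
		folio_ref_freeze(folio, 1);
		list_add(&folio->lru, &donation);
	}

	/* Folios that cleancache could not accept remain on the list. */
	return cleancache_backend_put_folios(pool_id, &donation);
}

When the backend later needs a donated folio back, it calls cleancache_backend_get_folio(pool_id, folio); since cleancache only ever holds clean, droppable data, the request succeeds without delay.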
diff --git a/MAINTAINERS b/MAINTAINERS index 1c97227e7ffa..441e68c94177 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6053,6 +6053,7 @@ CLEANCACHE M: Suren Baghdasaryan L: linux-mm@kvack.org S: Maintained +F: Documentation/mm/cleancache.rst F: include/linux/cleancache.h F: mm/cleancache.c F: mm/cleancache_sysfs.c --=20 2.51.0.740.g6adb054d12-goog From nobody Fri Dec 19 10:42:15 2025 Received: from mail-pl1-f202.google.com (mail-pl1-f202.google.com [209.85.214.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 55D21261593 for ; Fri, 10 Oct 2025 01:20:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760059214; cv=none; b=DMV6nn8wdxj/SKYttfFpjHw7V3duTYSIltMeQHwRlVNbNRRnsLOTXr0FWgntVd3IA0vfbHwxksRDPnD8elnEsU+RyrF4ZLdJ04WLkSFYRSm80RgFyb94C5zrw4JHMkM1LL7hWlftd1nzcWal8g7FgRrof2d4mpV3crLvyDThAZo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760059214; c=relaxed/simple; bh=2Kc+vG/8MkGEpxWJcD4ifjS1MgE6MwSmoOS8ymE7jeg=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=fY3FGfceuFH6QHplVbKf4eCtb+pXT5VS+5Ze2NOStntBYhRjNyuDtFIqpmCgUJ9aeAXBtSu23BC1GXc3yr9En5s+r88cTJW/RpwcTcxSDWb6ht0AbLjdgqXwEhUvTacrQKXcOeokwgp/6uJiyqrTXjUaN6xn5nE6JnAaVxQoJ3I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=awxNyvXl; arc=none smtp.client-ip=209.85.214.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--surenb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="awxNyvXl" Received: by mail-pl1-f202.google.com with SMTP id d9443c01a7336-269880a7bd9so35732815ad.3 for ; Thu, 09 Oct 2025 18:20:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1760059211; x=1760664011; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=BMspCuJAq7oIjVtiLnvMgH+BOK3foud4K2myRpZR2Ss=; b=awxNyvXl+ijY1RCQm6mHIVE9wySJvO1dlrvcLl5Dr/5nLnFp6vlL3axKr5CueA8QMF 0gg+kYcu7g0EGpQ7tf1nw3I54Cjv19bfKGMNlRNwckqZmZOPvc98pbmojy3ipEsVurJ4 gE1VfpgHvjZtBQ5z7fiS9aXVCZM/rxYG/LJeMcrZzmMd5boTIxofnqfNb8PVWvGRP69U gYIAySDSsHF+oyXAMjmJ3nT7uU3vzzR6AgtCZSJAwxP43kyzL4AfVDn3HQHuKXSgau5C tYoUDatBLySucc8JuX0w/LOp24jhO/Xc5+zklAPfAwUGtj26+I9Ofcn/Xzu2QbRaDKDD ZFrA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1760059211; x=1760664011; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=BMspCuJAq7oIjVtiLnvMgH+BOK3foud4K2myRpZR2Ss=; b=E0yitJ7lQLPWpFK5hnFUFjXpntkPdBxLhgQwAlq8dw0zU0DyW1RPOkR/oz9hOPmOqW al46BdZHxmyZdVWALme2GXwR3IVse6RQO+2Y0QqLTwVcQFllo4al4mGwcRI21140oAOh yu6+A3IWhrAxGGtak2VqvJStR26kCa9WCwdNVSw49V/gayUk6ofEpgAmY8qDCHrAWoe1 BLx6aU9PiibhK9tHLjDeG/bUt6uXn+Ou23hFKwLTEQSpf1F9sT3NL/i8HH25BjbgMxCF uPnrbcbKZY73CQJo81O/+dvQFRJbbAyNVWCMaW8VDagQx84t8Lpd3ACqe/XmDE1M8801 Rpaw== 
Date: Thu, 9 Oct 2025 18:19:50 -0700
In-Reply-To: <20251010011951.2136980-1-surenb@google.com>
References: <20251010011951.2136980-1-surenb@google.com>
Message-ID: <20251010011951.2136980-8-surenb@google.com>
Subject: [PATCH 7/8] mm: introduce GCMA
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org

From: Minchan Kim <minchan@kernel.org>

This patch introduces GCMA (Guaranteed Contiguous Memory Allocator), a
cleancache backend which reserves some amount of memory at boot and then
donates it to store clean file-backed pages in the cleancache. GCMA aims
to guarantee contiguous memory allocation success as well as low and
deterministic allocation latency.

Notes:
  Originally, the idea was posted by SeongJae Park and Minchan Kim [1].
  Later Minchan reworked it to be used in Android as a reference for
  Android vendors to use [2].
[1] https://lwn.net/Articles/619865/
[2] https://android-review.googlesource.com/q/topic:%22gcma_6.12%22

Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 MAINTAINERS          |   2 +
 include/linux/gcma.h |  36 +++++++
 mm/Kconfig           |  15 +++
 mm/Makefile          |   1 +
 mm/gcma.c            | 231 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 285 insertions(+)
 create mode 100644 include/linux/gcma.h
 create mode 100644 mm/gcma.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 441e68c94177..95b5ad26ec11 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16361,6 +16361,7 @@ F:	Documentation/admin-guide/mm/
 F:	Documentation/mm/
 F:	include/linux/cma.h
 F:	include/linux/dmapool.h
+F:	include/linux/gcma.h
 F:	include/linux/ioremap.h
 F:	include/linux/memory-tiers.h
 F:	include/linux/page_idle.h
@@ -16372,6 +16373,7 @@ F:	mm/dmapool.c
 F:	mm/dmapool_test.c
 F:	mm/early_ioremap.c
 F:	mm/fadvise.c
+F:	mm/gcma.c
 F:	mm/ioremap.c
 F:	mm/mapping_dirty_helpers.c
 F:	mm/memory-tiers.c
diff --git a/include/linux/gcma.h b/include/linux/gcma.h
new file mode 100644
index 000000000000..20b2c85de87b
--- /dev/null
+++ b/include/linux/gcma.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __GCMA_H__
+#define __GCMA_H__
+
+#include <linux/types.h>
+
+#ifdef CONFIG_GCMA
+
+int gcma_register_area(const char *name,
+		       unsigned long start_pfn, unsigned long count);
+
+/*
+ * NOTE: allocated pages are still marked reserved and when freeing them
+ * the caller should ensure they are isolated and not referenced by anyone
+ * other than the caller.
+ */
+int gcma_alloc_range(unsigned long start_pfn, unsigned long count, gfp_t gfp);
+int gcma_free_range(unsigned long start_pfn, unsigned long count);
+
+#else /* CONFIG_GCMA */
+
+static inline int gcma_register_area(const char *name,
+				     unsigned long start_pfn,
+				     unsigned long count)
+	{ return -EOPNOTSUPP; }
+static inline int gcma_alloc_range(unsigned long start_pfn,
+				   unsigned long count, gfp_t gfp)
+	{ return -EOPNOTSUPP; }
+
+static inline int gcma_free_range(unsigned long start_pfn,
+				  unsigned long count)
+	{ return -EOPNOTSUPP; }
+
+#endif /* CONFIG_GCMA */
+
+#endif /* __GCMA_H__ */
diff --git a/mm/Kconfig b/mm/Kconfig
index 9f4da8a848f4..41ce5ef8db55 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1013,6 +1013,21 @@ config CMA_AREAS
 
 	  If unsure, leave the default value "8" in UMA and "20" in NUMA.
 
+config GCMA
+	bool "GCMA (Guaranteed Contiguous Memory Allocator)"
+	depends on CLEANCACHE
+	help
+	  This enables the Guaranteed Contiguous Memory Allocator to allow
+	  low latency guaranteed contiguous memory allocations. Memory
+	  reserved by GCMA is donated to cleancache to be used as pagecache
+	  extension. Once GCMA allocation is requested, necessary pages are
+	  taken back from the cleancache and used to satisfy the request.
+	  Cleancache guarantees low latency successful allocation as long
+	  as the total size of GCMA allocations does not exceed the size of
+	  the memory donated to the cleancache.
+
+	  If unsure, say "N".
+
 #
 # Select this config option from the architecture Kconfig, if available, to set
 # the max page order for physically contiguous allocations.
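As a rough usage sketch of the gcma.h interface above (illustrative, not
part of the patch): a platform that has reserved a physically contiguous
range at boot could donate it and later carve allocations out of it as
follows. The area name, PFN range and allocation size are hypothetical;
in this series the actual caller is CMA itself (patch 8), which routes
cma_alloc()/cma_release() through gcma_alloc_range()/gcma_free_range()
for GCMA-backed regions.

/*
 * Hypothetical standalone user of the GCMA API. The reservation is
 * assumed to have been carved out earlier (e.g. from memblock at boot)
 * and is identified here only by a start PFN and page count.
 */
#include <linux/gcma.h>
#include <linux/gfp.h>

static int example_gcma_user(unsigned long area_pfn, unsigned long nr_pages)
{
	int err;

	/* Donate the whole reserved range to cleancache via GCMA. */
	err = gcma_register_area("example_area", area_pfn, nr_pages);
	if (err)
		return err;

	/* Later: claim 16 contiguous pages back for an allocation... */
	err = gcma_alloc_range(area_pfn, 16, GFP_KERNEL);
	if (err)
		return err;

	/* ...and return them to the cleancache pool when done. */
	return gcma_free_range(area_pfn, 16);
}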
diff --git a/mm/Makefile b/mm/Makefile
index 845841a140e3..05aee66a8b07 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -149,3 +149,4 @@ obj-$(CONFIG_TMPFS_QUOTA) += shmem_quota.o
 obj-$(CONFIG_PT_RECLAIM) += pt_reclaim.o
 obj-$(CONFIG_CLEANCACHE) += cleancache.o
 obj-$(CONFIG_CLEANCACHE_SYSFS) += cleancache_sysfs.o
+obj-$(CONFIG_GCMA) += gcma.o
diff --git a/mm/gcma.c b/mm/gcma.c
new file mode 100644
index 000000000000..3ee0e1340db3
--- /dev/null
+++ b/mm/gcma.c
@@ -0,0 +1,231 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * GCMA (Guaranteed Contiguous Memory Allocator)
+ *
+ */
+
+#define pr_fmt(fmt) "gcma: " fmt
+
+#include <linux/cleancache.h>
+#include <linux/gcma.h>
+#include <linux/log2.h>
+#include <linux/mm.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/xarray.h>
+#include "internal.h"
+
+#define MAX_GCMA_AREAS		64
+#define GCMA_AREA_NAME_MAX_LEN	32
+
+struct gcma_area {
+	int pool_id;
+	unsigned long start_pfn;
+	unsigned long end_pfn;
+	char name[GCMA_AREA_NAME_MAX_LEN];
+};
+
+static struct gcma_area areas[MAX_GCMA_AREAS];
+static atomic_t nr_gcma_area = ATOMIC_INIT(0);
+static DEFINE_SPINLOCK(gcma_area_lock);
+
+static int free_folio_range(struct gcma_area *area,
+			    unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long scanned = 0;
+	struct folio *folio;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+		int err;
+
+		if (!(++scanned % XA_CHECK_SCHED))
+			cond_resched();
+
+		folio = pfn_folio(pfn);
+		err = cleancache_backend_put_folio(area->pool_id, folio);
+		if (WARN(err, "PFN %lu: folio is still in use\n", pfn))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int alloc_folio_range(struct gcma_area *area,
+			     unsigned long start_pfn, unsigned long end_pfn,
+			     gfp_t gfp)
+{
+	unsigned long scanned = 0;
+	unsigned long pfn;
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+		int err;
+
+		if (!(++scanned % XA_CHECK_SCHED))
+			cond_resched();
+
+		err = cleancache_backend_get_folio(area->pool_id, pfn_folio(pfn));
+		if (err) {
+			free_folio_range(area, start_pfn, pfn);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+static struct gcma_area *find_area(unsigned long start_pfn, unsigned long end_pfn)
+{
+	int nr_area = atomic_read_acquire(&nr_gcma_area);
+	int i;
+
+	for (i = 0; i < nr_area; i++) {
+		struct gcma_area *area = &areas[i];
+
+		if (area->end_pfn <= start_pfn)
+			continue;
+
+		if (area->start_pfn > end_pfn)
+			continue;
+
+		/* The entire range should belong to a single area */
+		if (start_pfn < area->start_pfn || end_pfn > area->end_pfn)
+			break;
+
+		/* Found the area containing the entire range */
+		return area;
+	}
+
+	return NULL;
+}
+
+int gcma_register_area(const char *name,
+		       unsigned long start_pfn, unsigned long count)
+{
+	LIST_HEAD(folios);
+	int i, pool_id;
+	int nr_area;
+	int ret = 0;
+
+	pool_id = cleancache_backend_register_pool(name);
+	if (pool_id < 0)
+		return pool_id;
+
+	for (i = 0; i < count; i++) {
+		struct folio *folio;
+
+		folio = pfn_folio(start_pfn + i);
+		folio_clear_reserved(folio);
+		folio_set_count(folio, 0);
+		list_add(&folio->lru, &folios);
+	}
+
+	cleancache_backend_put_folios(pool_id, &folios);
+
+	spin_lock(&gcma_area_lock);
+
+	nr_area = atomic_read(&nr_gcma_area);
+	if (nr_area < MAX_GCMA_AREAS) {
+		struct gcma_area *area = &areas[nr_area];
+
+		area->pool_id = pool_id;
+		area->start_pfn = start_pfn;
+		area->end_pfn = start_pfn + count;
+		strscpy(area->name, name);
+		/* Ensure above stores complete before we increase the count */
+		atomic_set_release(&nr_gcma_area, nr_area + 1);
+	} else {
+		ret = -ENOMEM;
+	}
+
+	spin_unlock(&gcma_area_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(gcma_register_area);
+
+int gcma_alloc_range(unsigned long start_pfn, unsigned long count, gfp_t gfp)
+{
+	unsigned long end_pfn = start_pfn + count;
+	struct gcma_area *area;
+	struct folio *folio;
+	int err, order = 0;
+
+	gfp = current_gfp_context(gfp);
+	if (gfp & __GFP_COMP) {
+		if (!is_power_of_2(count))
+			return -EINVAL;
+
+		order = ilog2(count);
+		if (order >= MAX_PAGE_ORDER)
+			return -EINVAL;
+	}
+
+	area = find_area(start_pfn, end_pfn);
+	if (!area)
+		return -EINVAL;
+
+	err = alloc_folio_range(area, start_pfn, end_pfn, gfp);
+	if (err)
+		return err;
+
+	/*
+	 * GCMA returns pages with refcount 1 and expects them to have
+	 * the same refcount 1 when they are freed.
+	 */
+	if (order) {
+		folio = pfn_folio(start_pfn);
+		set_page_count(&folio->page, 1);
+		prep_compound_page(&folio->page, order);
+	} else {
+		for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++) {
+			folio = pfn_folio(pfn);
+			set_page_count(&folio->page, 1);
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(gcma_alloc_range);
+
+int gcma_free_range(unsigned long start_pfn, unsigned long count)
+{
+	unsigned long end_pfn = start_pfn + count;
+	struct gcma_area *area;
+	struct folio *folio;
+
+	area = find_area(start_pfn, end_pfn);
+	if (!area)
+		return -EINVAL;
+
+	folio = pfn_folio(start_pfn);
+	if (folio_test_large(folio)) {
+		int expected = folio_nr_pages(folio);
+
+		if (WARN(count != expected, "PFN %lu: count %lu != expected %d\n",
+			 start_pfn, count, expected))
+			return -EINVAL;
+
+		if (WARN(!folio_ref_dec_and_test(folio),
+			 "PFN %lu: invalid folio refcount when freeing\n", start_pfn))
+			return -EINVAL;
+
+		free_pages_prepare(&folio->page, folio_order(folio));
+	} else {
+		for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++) {
+			folio = pfn_folio(pfn);
+			if (folio_nr_pages(folio) == 1)
+				count--;
+
+			if (WARN(!folio_ref_dec_and_test(folio),
+				 "PFN %lu: invalid folio refcount when freeing\n", pfn))
+				return -EINVAL;
+
+			free_pages_prepare(&folio->page, 0);
+		}
+		WARN(count != 0, "%lu pages are still in use!\n", count);
+	}
+
+	return free_folio_range(area, start_pfn, end_pfn);
+}
+EXPORT_SYMBOL_GPL(gcma_free_range);
-- 
2.51.0.740.g6adb054d12-goog

From nobody Fri Dec 19 10:42:15 2025
Date: Thu, 9 Oct 2025 18:19:51 -0700
In-Reply-To: <20251010011951.2136980-1-surenb@google.com>
References: <20251010011951.2136980-1-surenb@google.com>
Message-ID: <20251010011951.2136980-9-surenb@google.com>
Subject: [PATCH 8/8] mm: integrate GCMA with CMA using dt-bindings
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Introduce a new "guarantee" property for shared-dma-pool to enable
GCMA-backed memory pools. Memory allocations from such pools will have
low latency and will be guaranteed to succeed as long as there is
contiguous space inside the reservation. (An illustrative setup sketch
follows at the end of this message.)

The dt-schema for shared-dma-pool [1] will need to be updated once this
patch is accepted.

[1] https://github.com/devicetree-org/dt-schema/blob/main/dtschema/schemas/reserved-memory/shared-dma-pool.yaml

Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/cma.h     | 11 +++++++++--
 kernel/dma/contiguous.c | 11 ++++++++++-
 mm/Kconfig              |  2 +-
 mm/cma.c                | 37 +++++++++++++++++++++++++++----------
 mm/cma.h                |  1 +
 mm/cma_sysfs.c          | 10 ++++++++++
 mm/gcma.c               |  2 +-
 7 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index 62d9c1cf6326..3ec2e76a8666 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -43,10 +43,17 @@ static inline int __init cma_declare_contiguous(phys_addr_t base,
 extern int __init cma_declare_contiguous_multi(phys_addr_t size,
 			phys_addr_t align, unsigned int order_per_bit,
 			const char *name, struct cma **res_cma, int nid);
-extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
+extern int __cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 					unsigned int order_per_bit,
 					const char *name,
-					struct cma **res_cma);
+					struct cma **res_cma, bool gcma);
+static inline int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
+					unsigned int order_per_bit,
+					const char *name,
+					struct cma **res_cma)
+{
+	return __cma_init_reserved_mem(base, size, order_per_bit, name, res_cma, false);
+}
 extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
 			      bool no_warn);
 extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index d9b9dcba6ff7..73a699ef0377 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -461,6 +461,7 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
 	unsigned long node = rmem->fdt_node;
 	bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
 	struct cma *cma;
+	bool gcma;
 	int err;
 
 	if (size_cmdline != -1 && default_cma) {
@@ -478,7 +479,15 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
 		return -EINVAL;
 	}
 
-	err = cma_init_reserved_mem(rmem->base, rmem->size, 0, rmem->name, &cma);
+	gcma = !!of_get_flat_dt_prop(node, "guarantee", NULL);
+#ifndef CONFIG_GCMA
+	if (gcma) {
+		pr_err("Reserved memory: unable to setup GCMA region, GCMA is not enabled\n");
+		return -EINVAL;
+	}
+#endif
+	err = __cma_init_reserved_mem(rmem->base, rmem->size, 0, rmem->name,
+				      &cma, gcma);
 	if (err) {
 		pr_err("Reserved memory: unable to setup CMA region\n");
 		return err;
diff --git a/mm/Kconfig b/mm/Kconfig
index 41ce5ef8db55..729f150369cc 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1015,7 +1015,7 @@ config CMA_AREAS
 
 config GCMA
 	bool "GCMA (Guaranteed Contiguous Memory Allocator)"
-	depends on CLEANCACHE
+	depends on CLEANCACHE && CMA
 	help
 	  This enables the Guaranteed Contiguous Memory Allocator to allow
 	  low latency guaranteed contiguous memory allocations. Memory
diff --git a/mm/cma.c b/mm/cma.c
index 813e6dc7b095..71fb494ef2a4 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include <linux/gcma.h>
 #include
 
 #include "internal.h"
@@ -161,11 +162,18 @@ static void __init cma_activate_area(struct cma *cma)
 			count = early_pfn[r] - cmr->base_pfn;
 			bitmap_count = cma_bitmap_pages_to_bits(cma, count);
 			bitmap_set(cmr->bitmap, 0, bitmap_count);
+		} else {
+			count = 0;
 		}
 
-		for (pfn = early_pfn[r]; pfn < cmr->base_pfn + cmr->count;
-		     pfn += pageblock_nr_pages)
-			init_cma_reserved_pageblock(pfn_to_page(pfn));
+		if (cma->gcma) {
+			gcma_register_area(cma->name, early_pfn[r],
+					   cma->count - count);
+		} else {
+			for (pfn = early_pfn[r]; pfn < cmr->base_pfn + cmr->count;
+			     pfn += pageblock_nr_pages)
+				init_cma_reserved_pageblock(pfn_to_page(pfn));
+		}
 	}
 
 	spin_lock_init(&cma->lock);
@@ -252,7 +260,7 @@ static void __init cma_drop_area(struct cma *cma)
 }
 
 /**
- * cma_init_reserved_mem() - create custom contiguous area from reserved memory
+ * __cma_init_reserved_mem() - create custom contiguous area from reserved memory
 * @base: Base address of the reserved area
 * @size: Size of the reserved area (in bytes),
 * @order_per_bit: Order of pages represented by one bit on bitmap.
@@ -260,13 +268,14 @@ static void __init cma_drop_area(struct cma *cma)
 *	the area will be set to "cmaN", where N is a running counter of
 *	used areas.
 * @res_cma: Pointer to store the created cma region.
+ * @gcma: Flag to reserve guaranteed reserved memory area.
 *
 * This function creates custom contiguous area from already reserved memory.
 */
-int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
-				 unsigned int order_per_bit,
-				 const char *name,
-				 struct cma **res_cma)
+int __init __cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
+				   unsigned int order_per_bit,
+				   const char *name,
+				   struct cma **res_cma, bool gcma)
 {
 	struct cma *cma;
 	int ret;
@@ -297,6 +306,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 	cma->ranges[0].count = cma->count;
 	cma->nranges = 1;
 	cma->nid = NUMA_NO_NODE;
+	cma->gcma = gcma;
 
 	*res_cma = cma;
 
@@ -836,7 +846,11 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
 		spin_unlock_irq(&cma->lock);
 
 		mutex_lock(&cma->alloc_mutex);
-		ret = alloc_contig_range(pfn, pfn + count, ACR_FLAGS_CMA, gfp);
+		if (cma->gcma)
+			ret = gcma_alloc_range(pfn, count, gfp);
+		else
+			ret = alloc_contig_range(pfn, pfn + count,
+						 ACR_FLAGS_CMA, gfp);
 		mutex_unlock(&cma->alloc_mutex);
 		if (!ret)
 			break;
@@ -1009,7 +1023,10 @@ bool cma_release(struct cma *cma, const struct page *pages,
 	if (r == cma->nranges)
 		return false;
 
-	free_contig_range(pfn, count);
+	if (cma->gcma)
+		gcma_free_range(pfn, count);
+	else
+		free_contig_range(pfn, count);
 	cma_clear_bitmap(cma, cmr, pfn, count);
 	cma_sysfs_account_release_pages(cma, count);
 	trace_cma_release(cma->name, pfn, pages, count);
diff --git a/mm/cma.h b/mm/cma.h
index c70180c36559..3b09e8619082 100644
--- a/mm/cma.h
+++ b/mm/cma.h
@@ -49,6 +49,7 @@ struct cma {
 	char name[CMA_MAX_NAME];
 	int nranges;
 	struct cma_memrange ranges[CMA_MAX_RANGES];
+	bool gcma;
 #ifdef CONFIG_CMA_SYSFS
 	/* the number of CMA page successful allocations */
 	atomic64_t nr_pages_succeeded;
diff --git a/mm/cma_sysfs.c b/mm/cma_sysfs.c
index 97acd3e5a6a5..4ecc36270a4d 100644
--- a/mm/cma_sysfs.c
+++ b/mm/cma_sysfs.c
@@ -80,6 +80,15 @@ static ssize_t available_pages_show(struct kobject *kobj,
 }
 CMA_ATTR_RO(available_pages);
 
+static ssize_t gcma_show(struct kobject *kobj,
+			 struct kobj_attribute *attr, char *buf)
+{
+	struct cma *cma = cma_from_kobj(kobj);
+
+	return sysfs_emit(buf, "%d\n", cma->gcma);
+}
+CMA_ATTR_RO(gcma);
+
 static void cma_kobj_release(struct kobject *kobj)
 {
 	struct cma *cma = cma_from_kobj(kobj);
@@ -95,6 +104,7 @@ static struct attribute *cma_attrs[] = {
 	&release_pages_success_attr.attr,
 	&total_pages_attr.attr,
 	&available_pages_attr.attr,
+	&gcma_attr.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(cma);
diff --git a/mm/gcma.c b/mm/gcma.c
index 3ee0e1340db3..8e7d7a829b49 100644
--- a/mm/gcma.c
+++ b/mm/gcma.c
@@ -119,7 +119,7 @@ int gcma_register_area(const char *name,
 		folio_set_count(folio, 0);
 		list_add(&folio->lru, &folios);
 	}
-
+	folio_zone(pfn_folio(start_pfn))->cma_pages += count;
 	cleancache_backend_put_folios(pool_id, &folios);
 
 	spin_lock(&gcma_area_lock);
-- 
2.51.0.740.g6adb054d12-goog
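For reference, a hedged sketch of how a guaranteed region might be set up
(illustrative, not part of the patch). In devicetree the change is
minimal: an existing "shared-dma-pool" reserved-memory node simply gains a
bare `guarantee;` property, which rmem_cma_setup() above translates into
the new gcma flag. A platform doing the setup from C instead could pass
the flag directly through __cma_init_reserved_mem(); the base, size and
"example_gcma" name below are placeholders, and the range is assumed to
have been reserved (e.g. via memblock) beforehand.

/*
 * Hypothetical early-boot creation of a GCMA-backed ("guaranteed") CMA
 * region. base/size must describe memory that is already reserved.
 */
#include <linux/cma.h>

static int __init example_guaranteed_region(phys_addr_t base, phys_addr_t size)
{
	struct cma *cma;

	/* Last argument: true selects the GCMA backing added by this series. */
	return __cma_init_reserved_mem(base, size, 0, "example_gcma", &cma, true);
}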