From nobody Fri Dec 19 08:59:21 2025 Received: from out30-132.freemail.mail.aliyun.com (out30-132.freemail.mail.aliyun.com [115.124.30.132]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5705816CD24 for ; Wed, 28 Aug 2024 11:20:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=115.124.30.132 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724844012; cv=none; b=iCC+LUjK/cghpz/AclqkMDWvhRRPX3SUidGZDOkBU2nXRgtxGAYqkFhXWECEwI6K3K55UftHycs8KsCVi1p6Y4dfgS22fXrjTkZgMvlGaWmi3TQt+WAwQXKqxWdBP9eBfYnuqC2L19pksxXGWvWxGj0T1unNl67odWs8ud1S6s8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724844012; c=relaxed/simple; bh=UlfRxRywAS2lOoTJWHbkwthZ3E4AlBxQfGVIssaQN1A=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=bAKylAo1zPTtuDAjBr7lGbJgJn8zF9oLFKUcUIMab17mVycL81A6575pbibxtVUZNXJBBch7VUX5pdO7ceZLKd7rLyqlr3Vo0MOjBEE0IbONFCCNNy/PR4upmyPyMyOrwisJBO0y1mkXdQ5exlw8tThWbJUorowTbvc76A31M64= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com; spf=pass smtp.mailfrom=linux.alibaba.com; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b=hIoHgRg2; arc=none smtp.client-ip=115.124.30.132 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.alibaba.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.alibaba.com header.i=@linux.alibaba.com header.b="hIoHgRg2" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.alibaba.com; s=default; t=1724844003; h=From:To:Subject:Date:Message-ID:MIME-Version; bh=+MAPxIrkS42YkSLKKULbrzqnaSd+CBL8Uk5HdFnoGPE=; 
	b=hIoHgRg25nnANFB5ZQLtXQHGj4dx9MbBfkMXTJehqeXVSSd9X0uuFCFwwnsP95P79nr+3N/NWUexIsoe+DDqrLAIoJ7B22B9jsfaQwr9/TnPGMioZpyTlILex7rm4lYHkKYntanm/52Wk3KwcIToAU7BwcnPgapESQaw+kqqKac=
Received: from localhost(mailfrom:hongzhen@linux.alibaba.com fp:SMTPD_---0WDpbUOx_1724844002)
	by smtp.aliyun-inc.com; Wed, 28 Aug 2024 19:20:03 +0800
From: Hongzhen Luo
To: linux-erofs@lists.ozlabs.org, lihongbo22@huawei.com
Cc: linux-kernel@vger.kernel.org, Hongzhen Luo
Subject: [PATCH RFC v3 1/3] erofs: move `struct erofs_anon_fs_type` to super.c
Date: Wed, 28 Aug 2024 19:19:57 +0800
Message-ID: <20240828111959.3677011-2-hongzhen@linux.alibaba.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20240828111959.3677011-1-hongzhen@linux.alibaba.com>
References: <20240828111959.3677011-1-hongzhen@linux.alibaba.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Move `struct erofs_anon_fs_type` to super.c and expose it in
preparation for the upcoming page cache share feature.

Signed-off-by: Hongzhen Luo
---
v3: Changes since v1:
    - Utilize the `erofs_anon_sops` as the interface for the VFS to
      operate on anonymous inodes. The subsequent patch will implement
      .free_inode to facilitate more granular control over anonymous
      inodes.
v2: The patch set v2 does not move the `struct erofs_anon_fs_type` to super.c.
v1: https://lore.kernel.org/all/20240722065355.1396365-2-hongzhen@linux.alibaba.com/
---
 fs/erofs/fscache.c  | 13 -------------
 fs/erofs/internal.h |  2 ++
 fs/erofs/super.c    | 21 +++++++++++++++++++++
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index fda16eedafb5..826b2893acb2 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -3,7 +3,6 @@
  * Copyright (C) 2022, Alibaba Cloud
  * Copyright (C) 2022, Bytedance Inc. All rights reserved.
  */
-#include
 #include
 #include "internal.h"
 
@@ -13,18 +12,6 @@ static LIST_HEAD(erofs_domain_list);
 static LIST_HEAD(erofs_domain_cookies_list);
 static struct vfsmount *erofs_pseudo_mnt;
 
-static int erofs_anon_init_fs_context(struct fs_context *fc)
-{
-	return init_pseudo(fc, EROFS_SUPER_MAGIC) ? 0 : -ENOMEM;
-}
-
-static struct file_system_type erofs_anon_fs_type = {
-	.owner = THIS_MODULE,
-	.name = "pseudo_erofs",
-	.init_fs_context = erofs_anon_init_fs_context,
-	.kill_sb = kill_anon_super,
-};
-
 struct erofs_fscache_io {
 	struct netfs_cache_resources cres;
 	struct iov_iter iter;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 736607675396..3f1984664dac 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -387,6 +387,8 @@ extern const struct file_operations erofs_dir_fops;
 
 extern const struct iomap_ops z_erofs_iomap_report_ops;
 
+extern struct file_system_type erofs_anon_fs_type;
+
 /* flags for erofs_fscache_register_cookie() */
 #define EROFS_REG_COOKIE_SHARE 0x0001
 #define EROFS_REG_COOKIE_NEED_NOEXIST 0x0002
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 32ce5b35e1df..36291feaa5f6 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include "xattr.h"
 
 #define CREATE_TRACE_POINTS
@@ -848,6 +849,26 @@ static struct file_system_type erofs_fs_type = {
 };
 MODULE_ALIAS_FS("erofs");
 
+static const struct super_operations erofs_anon_sops = {
+	.statfs = simple_statfs,
+};
+
+static int erofs_anon_init_fs_context(struct fs_context *fc)
+{
+	struct pseudo_fs_context *ctx = init_pseudo(fc, EROFS_SUPER_MAGIC);
+
+	if (ctx)
+		ctx->ops = &erofs_anon_sops;
+	return ctx ? 0 : -ENOMEM;
+}
+
+struct file_system_type erofs_anon_fs_type = {
+	.owner = THIS_MODULE,
+	.name = "pseudo_erofs",
+	.init_fs_context = erofs_anon_init_fs_context,
+	.kill_sb = kill_anon_super,
+};
+
 static int __init erofs_module_init(void)
 {
 	int err;
-- 
2.43.5

From nobody Fri Dec 19 08:59:21 2025
From: Hongzhen Luo
To: linux-erofs@lists.ozlabs.org, lihongbo22@huawei.com
Cc: linux-kernel@vger.kernel.org, Hongzhen Luo
Subject: [PATCH RFC v3 2/3] erofs: introduce page cache share feature
Date: Wed, 28 Aug 2024 19:19:58 +0800
Message-ID: <20240828111959.3677011-3-hongzhen@linux.alibaba.com>
In-Reply-To: <20240828111959.3677011-1-hongzhen@linux.alibaba.com>
References: <20240828111959.3677011-1-hongzhen@linux.alibaba.com>

Currently, reading files with different paths (or names) but the same
content will consume multiple copies of the page cache, even if the
content of these page caches is the same. For example, reading
identical files (e.g., *.so files) from two different minor versions of
container images will cost multiple copies of the same page cache,
since different containers have different mount points. Therefore,
sharing the page cache for files with the same content can save memory.

This introduces the page cache share feature in erofs. During the mkfs
phase, the file content is hashed and the hash value is stored in the
`trusted.erofs.fingerprint` extended attribute.
Inodes of files with the same `trusted.erofs.fingerprint` are mapped
to the same anonymous inode (indicated by the `ano_inode` field). When
a read request occurs, the anonymous inode serves as a "container"
whose page cache is shared. The actual operations involving the iomap
are carried out by the original inode which is mapped to the anonymous
inode.

Below is the memory usage for reading all files in two different minor
versions of container images:

+-------------------+------------------+-------------+---------------+
|       Image       | Page Cache Share | Memory (MB) |    Memory     |
|                   |                  |             | Reduction (%) |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     241     |       -       |
|       redis       +------------------+-------------+---------------+
|   7.2.4 & 7.2.5   |       Yes        |     163     |      33%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     872     |       -       |
|     postgres      +------------------+-------------+---------------+
|    16.1 & 16.2    |       Yes        |     630     |      28%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |    2771     |       -       |
|    tensorflow     +------------------+-------------+---------------+
|  1.11.0 & 2.11.1  |       Yes        |    2340     |      16%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     926     |       -       |
|       mysql       +------------------+-------------+---------------+
|  8.0.11 & 8.0.12  |       Yes        |     735     |      21%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     390     |       -       |
|       nginx       +------------------+-------------+---------------+
|   7.2.4 & 7.2.5   |       Yes        |     219     |      44%      |
+-------------------+------------------+-------------+---------------+
|      tomcat       |        No        |     924     |       -       |
| 10.1.25 & 10.1.26 +------------------+-------------+---------------+
|                   |       Yes        |     474     |      49%      |
+-------------------+------------------+-------------+---------------+

Additionally, the table below shows the runtime memory usage of the
container:

+-------------------+------------------+-------------+---------------+
|       Image       | Page Cache Share | Memory (MB) |    Memory     |
|                   |                  |             | Reduction (%) |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     35      |       -       |
|       redis       +------------------+-------------+---------------+
|   7.2.4 & 7.2.5   |       Yes        |     28      |      20%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     149     |       -       |
|     postgres      +------------------+-------------+---------------+
|    16.1 & 16.2    |       Yes        |     95      |      37%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |    1028     |       -       |
|    tensorflow     +------------------+-------------+---------------+
|  1.11.0 & 2.11.1  |       Yes        |     930     |      10%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     155     |       -       |
|       mysql       +------------------+-------------+---------------+
|  8.0.11 & 8.0.12  |       Yes        |     132     |      15%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     25      |       -       |
|       nginx       +------------------+-------------+---------------+
|   7.2.4 & 7.2.5   |       Yes        |     20      |      20%      |
+-------------------+------------------+-------------+---------------+
|      tomcat       |        No        |     186     |       -       |
| 10.1.25 & 10.1.26 +------------------+-------------+---------------+
|                   |       Yes        |     98      |      48%      |
+-------------------+------------------+-------------+---------------+

Signed-off-by: Hongzhen Luo
---
v3:
The previous implementation maintained a list for inodes with identical content,
and utilized one of these inodes from the list to handle read request operations.
However, this could lead to a situation where a device cannot be unmounted because
its inode is being used for read-related operations.

This implementation has been redesigned to avoid the aforementioned unmounting
issues. Additionally, this implementation adds support for compressed files.
v2: https://lore.kernel.org/all/20240731080704.678259-2-hongzhen@linux.alibaba.com/
v1: https://lore.kernel.org/all/20240722065355.1396365-4-hongzhen@linux.alibaba.com/
---
 fs/erofs/Kconfig           |  10 +++
 fs/erofs/Makefile          |   1 +
 fs/erofs/internal.h        |   4 +
 fs/erofs/pagecache_share.c | 171 +++++++++++++++++++++++++++++++++++++
 fs/erofs/pagecache_share.h |  20 +++++
 5 files changed, 206 insertions(+)
 create mode 100644 fs/erofs/pagecache_share.c
 create mode 100644 fs/erofs/pagecache_share.h

diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index 7dcdce660cac..756a74de623c 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -158,3 +158,13 @@ config EROFS_FS_PCPU_KTHREAD_HIPRI
 	  at higher priority.
 
 	  If unsure, say N.
+
+config EROFS_FS_PAGE_CACHE_SHARE
+	bool "EROFS page cache share support"
+	depends on EROFS_FS
+	default n
+	help
+	  This permits EROFS to share page cache for files with the same
+	  fingerprint.
+
+	  If unsure, say N.
diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 097d672e6b14..f14a2ac0e561 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -8,3 +8,4 @@ erofs-$(CONFIG_EROFS_FS_ZIP_LZMA) += decompressor_lzma.o
 erofs-$(CONFIG_EROFS_FS_ZIP_DEFLATE) += decompressor_deflate.o
 erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o
 erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
+erofs-$(CONFIG_EROFS_FS_PAGE_CACHE_SHARE) += pagecache_share.o
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 3f1984664dac..a1517b2e6973 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -288,6 +288,9 @@ struct erofs_inode {
 		};
 #endif	/* CONFIG_EROFS_FS_ZIP */
 	};
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+	struct inode *ano_inode;
+#endif
 	/* the corresponding vfs inode */
 	struct inode vfs_inode;
 };
@@ -384,6 +387,7 @@ extern const struct inode_operations erofs_dir_iops;
 
 extern const struct file_operations erofs_file_fops;
 extern const struct file_operations erofs_dir_fops;
+extern const struct file_operations erofs_pcs_file_fops;
 
 extern const struct iomap_ops z_erofs_iomap_report_ops;
 
diff --git a/fs/erofs/pagecache_share.c b/fs/erofs/pagecache_share.c
new file mode 100644
index 000000000000..2d2a74547b67
--- /dev/null
+++ b/fs/erofs/pagecache_share.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2024, Alibaba Cloud
+ */
+#include
+#include
+#include "pagecache_share.h"
+#include "internal.h"
+#include "xattr.h"
+
+#define PCS_FPRT_IDX 4
+#define PCS_FPRT_NAME "erofs.fingerprint"
+#define PCS_FPRT_MAXLEN (sizeof(size_t) + 1024)
+
+static DEFINE_MUTEX(pseudo_mnt_lock);
+static refcount_t pseudo_mnt_count;
+static struct vfsmount *erofs_pcs_mnt;
+
+int erofs_pcs_init_mnt(void)
+{
+	mutex_lock(&pseudo_mnt_lock);
+	if (!erofs_pcs_mnt) {
+		struct vfsmount *tmp = kern_mount(&erofs_anon_fs_type);
+		if (IS_ERR(tmp))
+			return PTR_ERR(tmp);
+		erofs_pcs_mnt = tmp;
+		refcount_set(&pseudo_mnt_count, 1);
+	} else
+		refcount_add(1, &pseudo_mnt_count);
+	mutex_unlock(&pseudo_mnt_lock);
+	return 0;
+}
+
+void erofs_pcs_free_mnt(void)
+{
+	mutex_lock(&pseudo_mnt_lock);
+	if (refcount_dec_and_test(&pseudo_mnt_count)) {
+		kern_unmount(erofs_pcs_mnt);
+		erofs_pcs_mnt = NULL;
+	}
+	mutex_unlock(&pseudo_mnt_lock);
+}
+
+static int erofs_pcs_eq(struct inode *inode, void *data)
+{
+	return inode->i_private && memcmp(inode->i_private, data,
+			sizeof(size_t) + *(size_t *)data) == 0 ? 1 : 0;
+}
+
+static int erofs_pcs_set_fprt(struct inode *inode, void *data)
+{
+	/* fprt length and content */
+	inode->i_private = kmalloc(*(size_t *)data + sizeof(size_t),
+				   GFP_KERNEL);
+	memcpy(inode->i_private, data, sizeof(size_t) + *(size_t *)data);
+	return 0;
+}
+
+void erofs_pcs_fill_inode(struct inode *inode)
+{
+	struct erofs_inode *vi = EROFS_I(inode);
+	char fprt[PCS_FPRT_MAXLEN];
+	struct inode *ano_inode;
+	unsigned long fprt_hash;
+	size_t fprt_len;
+
+	vi->ano_inode = NULL;
+	fprt_len = erofs_getxattr(inode, PCS_FPRT_IDX, PCS_FPRT_NAME,
+				  fprt + sizeof(size_t), PCS_FPRT_MAXLEN);
+	if (fprt_len > 0 && fprt_len <= PCS_FPRT_MAXLEN) {
+		*(size_t *)fprt = fprt_len;
+		fprt_hash = xxh32(fprt + sizeof(size_t), fprt_len, 0);
+		ano_inode = iget5_locked(erofs_pcs_mnt->mnt_sb, fprt_hash,
+					 erofs_pcs_eq, erofs_pcs_set_fprt, fprt);
+		vi->ano_inode = ano_inode;
+		if (ano_inode->i_state & I_NEW) {
+			if (erofs_inode_is_data_compressed(vi->datalayout))
+				ano_inode->i_mapping->a_ops = &z_erofs_aops;
+			else
+				ano_inode->i_mapping->a_ops =
+					&erofs_raw_access_aops;
+			ano_inode->i_size = inode->i_size;
+			unlock_new_inode(ano_inode);
+		}
+	}
+}
+
+static struct file *erofs_pcs_alloc_file(struct file *file,
+					 struct inode *ano_inode)
+{
+	struct file *ano_file;
+
+	ano_file = alloc_file_pseudo(ano_inode, erofs_pcs_mnt, "[erofs_pcs_f]",
+				     O_RDONLY, &erofs_file_fops);
+	file_ra_state_init(&ano_file->f_ra, file->f_mapping);
+	ano_file->private_data = EROFS_I(file_inode(file));
+	return ano_file;
+}
+
+static int erofs_pcs_file_open(struct inode *inode, struct file *file)
+{
+	struct file *ano_file;
+	struct inode *ano_inode;
+	struct erofs_inode *vi = EROFS_I(inode);
+
+	ano_inode = vi->ano_inode;
+	if (!ano_inode)
+		return -EINVAL;
+	ano_file = erofs_pcs_alloc_file(file, ano_inode);
+	ihold(ano_inode);
+	file->private_data = (void *)ano_file;
+	return 0;
+}
+
+static int erofs_pcs_file_release(struct inode *inode, struct file *file)
+{
+	if (!file->private_data)
+		return -EINVAL;
+	fput((struct file *)file->private_data);
+	file->private_data = NULL;
+	return 0;
+}
+
+static ssize_t erofs_pcs_file_read_iter(struct kiocb *iocb,
+					struct iov_iter *to)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+	struct file *file, *ano_file;
+	struct kiocb ano_iocb;
+	ssize_t res;
+
+	if (!iov_iter_count(to))
+		return 0;
+#ifdef CONFIG_FS_DAX
+	if (IS_DAX(inode))
+		return erofs_file_fops.read_iter(iocb, to);
+#endif
+	if (iocb->ki_flags & IOCB_DIRECT)
+		return erofs_file_fops.read_iter(iocb, to);
+
+	memcpy(&ano_iocb, iocb, sizeof(struct kiocb));
+	file = iocb->ki_filp;
+	ano_file = file->private_data;
+	if (!ano_file)
+		return -EINVAL;
+	ano_iocb.ki_filp = ano_file;
+	res = filemap_read(&ano_iocb, to, 0);
+	memcpy(iocb, &ano_iocb, sizeof(struct kiocb));
+	iocb->ki_filp = file;
+	file_accessed(file);
+	return res;
+}
+
+static int erofs_pcs_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct file *ano_file = file->private_data;
+
+	vma_set_file(vma, ano_file);
+	vma->vm_ops = &generic_file_vm_ops;
+	return 0;
+}
+
+const struct file_operations erofs_pcs_file_fops = {
+	.open = erofs_pcs_file_open,
+	.llseek = generic_file_llseek,
+	.read_iter = erofs_pcs_file_read_iter,
+	.mmap = erofs_pcs_mmap,
+	.release = erofs_pcs_file_release,
+	.get_unmapped_area = thp_get_unmapped_area,
+	.splice_read = filemap_splice_read,
+};
diff --git a/fs/erofs/pagecache_share.h b/fs/erofs/pagecache_share.h
new file mode 100644
index 000000000000..b8111291cf79
--- /dev/null
+++ b/fs/erofs/pagecache_share.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2024, Alibaba Cloud
+ */
+#ifndef __EROFS_PAGECACHE_SHARE_H
+#define __EROFS_PAGECACHE_SHARE_H
+
+#include
+#include
+#include
+#include
+#include "internal.h"
+
+int erofs_pcs_init_mnt(void);
+void erofs_pcs_free_mnt(void);
+void erofs_pcs_fill_inode(struct inode *inode);
+
+extern const struct vm_operations_struct generic_file_vm_ops;
+
+#endif
-- 
2.43.5

From nobody Fri Dec 19 08:59:21 2025
From: Hongzhen Luo
To: linux-erofs@lists.ozlabs.org, lihongbo22@huawei.com
Cc: linux-kernel@vger.kernel.org, Hongzhen Luo
Subject: [PATCH RFC v3 3/3] erofs: apply the page cache share feature
Date: Wed, 28 Aug 2024 19:19:59 +0800
Message-ID: <20240828111959.3677011-4-hongzhen@linux.alibaba.com>
In-Reply-To: <20240828111959.3677011-1-hongzhen@linux.alibaba.com>
References: <20240828111959.3677011-1-hongzhen@linux.alibaba.com>

This modifies relevant functions to apply the page cache share feature.

Signed-off-by: Hongzhen Luo
---
v3: Reimplement and add support for compressed files.
v2: https://lore.kernel.org/all/20240731080704.678259-3-hongzhen@linux.alibaba.com/
v1: https://lore.kernel.org/all/20240722065355.1396365-5-hongzhen@linux.alibaba.com/
---
 fs/erofs/data.c  | 34 +++++++++++++++++++++++++++++++++-
 fs/erofs/inode.c | 12 ++++++++++++
 fs/erofs/super.c | 29 +++++++++++++++++++++++++++++
 fs/erofs/zdata.c | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 1b7eba38ba1e..ef27b934115f 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -347,12 +347,44 @@ int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
  */
 static int erofs_read_folio(struct file *file, struct folio *folio)
 {
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+	struct erofs_inode *vi = NULL;
+	int ret;
+
+	if (file && file->private_data) {
+		vi = file->private_data;
+		if (vi->ano_inode == file_inode(file))
+			folio->mapping->host = &vi->vfs_inode;
+		else
+			vi = NULL;
+	}
+	ret = iomap_read_folio(folio, &erofs_iomap_ops);
+	if (vi)
+		folio->mapping->host = file_inode(file);
+	return ret;
+#else
 	return iomap_read_folio(folio, &erofs_iomap_ops);
+#endif
 }
-
 static void erofs_readahead(struct readahead_control *rac)
 {
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+	struct erofs_inode *vi = NULL;
+	struct file *file = rac->file;
+
+	if (file && file->private_data) {
+		vi = file->private_data;
+		if (vi->ano_inode == file_inode(file))
+			rac->mapping->host = &vi->vfs_inode;
+		else
+			vi = NULL;
+	}
+	iomap_readahead(rac, &erofs_iomap_ops);
+	if (vi)
+		rac->mapping->host = file_inode(file);
+#else
 	return iomap_readahead(rac, &erofs_iomap_ops);
+#endif
 }
 
 static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index 43c09aae2afc..1811f73478b4 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2021, Alibaba Cloud
  */
 #include "xattr.h"
+#include "pagecache_share.h"
 
 #include
 
@@ -229,10 +230,21 @@ static int erofs_fill_inode(struct inode *inode)
 	switch (inode->i_mode & S_IFMT) {
 	case S_IFREG:
 		inode->i_op = &erofs_generic_iops;
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+		erofs_pcs_fill_inode(inode);
+		if (vi->ano_inode)
+			inode->i_fop = &erofs_pcs_file_fops;
+		else if (erofs_inode_is_data_compressed(vi->datalayout))
+			inode->i_fop = &generic_ro_fops;
+		else
+			inode->i_fop = &erofs_file_fops;
+#else
 		if (erofs_inode_is_data_compressed(vi->datalayout))
 			inode->i_fop = &generic_ro_fops;
 		else
 			inode->i_fop = &erofs_file_fops;
+#endif
+
 		break;
 	case S_IFDIR:
 		inode->i_op = &erofs_dir_iops;
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 36291feaa5f6..c61779fe2e3a 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include "xattr.h"
+#include "pagecache_share.h"
 
 #define CREATE_TRACE_POINTS
 #include
@@ -103,6 +104,12 @@ static void erofs_free_inode(struct inode *inode)
 {
 	struct erofs_inode *vi = EROFS_I(inode);
 
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+	if (S_ISREG(inode->i_mode) && vi->ano_inode) {
+		iput(vi->ano_inode);
+		vi->ano_inode = NULL;
+	}
+#endif
 	if (inode->i_op == &erofs_fast_symlink_iops)
 		kfree(inode->i_link);
 	kfree(vi->xattr_shared_xattrs);
@@ -701,6 +708,12 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 	if (err)
 		return err;
 
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+	err = erofs_pcs_init_mnt();
+	if (err)
+		return err;
+#endif
+
 	erofs_info(sb, "mounted with root inode @ nid %llu.", sbi->root_nid);
 	return 0;
 }
@@ -811,6 +824,9 @@ static void erofs_kill_sb(struct super_block *sb)
 	else
 		kill_block_super(sb);
 
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+	erofs_pcs_free_mnt();
+#endif
 	erofs_free_dev_context(sbi->devs);
 	fs_put_dax(sbi->dax_dev, NULL);
 	erofs_fscache_unregister_fs(sb);
@@ -849,8 +865,21 @@ static struct file_system_type erofs_fs_type = {
 };
 MODULE_ALIAS_FS("erofs");
 
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+static void erofs_free_anon_inode(struct inode *inode)
+{
+	if (inode->i_private) {
+		kfree(inode->i_private);
+		inode->i_private = NULL;
+	}
+}
+#endif
+
 static const struct super_operations erofs_anon_sops = {
 	.statfs = simple_statfs,
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+	.free_inode = erofs_free_anon_inode,
+#endif
 };
 
 static int erofs_anon_init_fs_context(struct fs_context *fc)
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 424f656cd765..cd3cabfef462 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1802,6 +1802,17 @@ static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
 
 static int z_erofs_read_folio(struct file *file, struct folio *folio)
 {
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+	struct erofs_inode *vi = NULL;
+
+	if (file && file->private_data) {
+		vi = file->private_data;
+		if (vi->ano_inode == file_inode(file))
+			folio->mapping->host = &vi->vfs_inode;
+		else
+			vi = NULL;
+	}
+#endif
 	struct inode *const inode = folio->mapping->host;
 	struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
 	struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
@@ -1824,11 +1835,27 @@ static int z_erofs_read_folio(struct file *file, struct folio *folio)
 
 	erofs_put_metabuf(&f.map.buf);
 	erofs_release_pages(&f.pagepool);
+
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+	if (vi)
+		folio->mapping->host = file_inode(file);
+#endif
 	return err;
 }
 
 static void z_erofs_readahead(struct readahead_control *rac)
 {
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+	struct erofs_inode *vi = NULL;
+
+	if (rac->file && rac->file->private_data) {
+		vi = rac->file->private_data;
+		if (vi->ano_inode == file_inode(rac->file))
+			rac->mapping->host = &vi->vfs_inode;
+		else
+			vi = NULL;
+	}
+#endif
 	struct inode *const inode = rac->mapping->host;
 	struct erofs_sb_info *const sbi = EROFS_I_SB(inode);
 	struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
@@ -1863,6 +1890,11 @@ static void z_erofs_readahead(struct readahead_control *rac)
 	z_erofs_runqueue(&f, z_erofs_is_sync_decompress(sbi, nr_folios), true);
 	erofs_put_metabuf(&f.map.buf);
 	erofs_release_pages(&f.pagepool);
+
+#ifdef CONFIG_EROFS_FS_PAGE_CACHE_SHARE
+	if (vi)
+		rac->mapping->host = file_inode(rac->file);
+#endif
 }
 
 const struct address_space_operations z_erofs_aops = {
-- 
2.43.5
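
As a reading aid for patch 2/3, the lookup key used for the anonymous
inode can be modelled in userspace. The sketch below is not kernel code
and is not taken from the series: `struct fprt_rec`, `fprt_rec_new()`
and `fprt_rec_eq()` are hypothetical names that only mirror the
length-plus-content record which `erofs_pcs_set_fprt()` stores in
`i_private` and which `erofs_pcs_eq()` compares.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace model of the fingerprint record: a size_t
 * length followed by that many fingerprint bytes.
 */
struct fprt_rec {
	size_t len;
	unsigned char data[];
};

/* Copy a fingerprint into a new record; returns NULL if malloc fails. */
static struct fprt_rec *fprt_rec_new(const void *fprt, size_t len)
{
	struct fprt_rec *rec = malloc(sizeof(*rec) + len);

	if (!rec)
		return NULL;
	rec->len = len;
	memcpy(rec->data, fprt, len);
	return rec;
}

/* Records match only when both the length and the content are equal. */
static int fprt_rec_eq(const struct fprt_rec *a, const struct fprt_rec *b)
{
	return a && b && a->len == b->len &&
	       memcmp(a->data, b->data, a->len) == 0;
}
```

Checking the length before calling memcmp() keeps the comparison inside
both buffers. That ordering is a choice made for this sketch, not a
description of the posted code.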