From: Hongzhen Luo
To: linux-erofs@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org, Hongzhen Luo
Subject: [RFC PATCH v5 3/4] erofs: apply the page cache share feature
Date: Sun, 5 Jan 2025 23:12:07 +0800
Message-ID: <20250105151208.3797385-4-hongzhen@linux.alibaba.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20250105151208.3797385-1-hongzhen@linux.alibaba.com>
References: <20250105151208.3797385-1-hongzhen@linux.alibaba.com>

This applies the page cache share feature to the relevant functions:
the read_folio and readahead paths for both uncompressed and compressed
data, regular-file inode setup and teardown, and mount and unmount.

Below is the memory usage for reading all files in two different minor
versions of container images:

+-------------------+------------------+-------------+---------------+
|       Image       | Page Cache Share | Memory (MB) |    Memory     |
|                   |                  |             | Reduction (%) |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     241     |       -       |
|       redis       +------------------+-------------+---------------+
|   7.2.4 & 7.2.5   |       Yes        |     163     |      33%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     872     |       -       |
|     postgres      +------------------+-------------+---------------+
|    16.1 & 16.2    |       Yes        |     630     |      28%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |    2771     |       -       |
|    tensorflow     +------------------+-------------+---------------+
|  1.11.0 & 2.11.1  |       Yes        |    2340     |      16%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     926     |       -       |
|       mysql       +------------------+-------------+---------------+
|  8.0.11 & 8.0.12  |       Yes        |     735     |      21%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     390     |       -       |
|       nginx       +------------------+-------------+---------------+
|   7.2.4 & 7.2.5   |       Yes        |     219     |      44%      |
+-------------------+------------------+-------------+---------------+
|      tomcat       |        No        |     924     |       -       |
| 10.1.25 & 10.1.26 +------------------+-------------+---------------+
|                   |       Yes        |     474     |      49%      |
+-------------------+------------------+-------------+---------------+

Additionally, the table below shows the runtime memory usage of the
container:

+-------------------+------------------+-------------+---------------+
|       Image       | Page Cache Share | Memory (MB) |    Memory     |
|                   |                  |             | Reduction (%) |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     35      |       -       |
|       redis       +------------------+-------------+---------------+
|   7.2.4 & 7.2.5   |       Yes        |     28      |      20%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     149     |       -       |
|     postgres      +------------------+-------------+---------------+
|    16.1 & 16.2    |       Yes        |     95      |      37%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |    1028     |       -       |
|    tensorflow     +------------------+-------------+---------------+
|  1.11.0 & 2.11.1  |       Yes        |     930     |      10%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     155     |       -       |
|       mysql       +------------------+-------------+---------------+
|  8.0.11 & 8.0.12  |       Yes        |     132     |      15%      |
+-------------------+------------------+-------------+---------------+
|                   |        No        |     25      |       -       |
|       nginx       +------------------+-------------+---------------+
|   7.2.4 & 7.2.5   |       Yes        |     20      |      20%      |
+-------------------+------------------+-------------+---------------+
|      tomcat       |        No        |     186     |       -       |
| 10.1.25 & 10.1.26 +------------------+-------------+---------------+
|                   |       Yes        |     98      |      48%      |
+-------------------+------------------+-------------+---------------+

Signed-off-by: Hongzhen Luo
---
 fs/erofs/data.c            | 14 +++++++--
 fs/erofs/inode.c           |  5 ++-
 fs/erofs/pagecache_share.c | 63 ++++++++++++++++++++++++++++++++++++++
 fs/erofs/pagecache_share.h | 11 +++++++
 fs/erofs/super.c           |  7 +++++
 fs/erofs/zdata.c           |  9 ++++--
 6 files changed, 104 insertions(+), 5 deletions(-)

diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 0cd6b5c4df98..fb08acbeaab6 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2021, Alibaba Cloud
  */
 #include "internal.h"
+#include "pagecache_share.h"
 #include
 #include
 
@@ -370,12 +371,21 @@ int erofs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
  */
 static int erofs_read_folio(struct file *file, struct folio *folio)
 {
-	return iomap_read_folio(folio, &erofs_iomap_ops);
+	int ret, pcshr;
+
+	pcshr = erofs_pcshr_read_begin(file, folio);
+	ret = iomap_read_folio(folio, &erofs_iomap_ops);
+	erofs_pcshr_read_end(file, folio, pcshr);
+	return ret;
 }
 
 static void erofs_readahead(struct readahead_control *rac)
 {
-	return iomap_readahead(rac, &erofs_iomap_ops);
+	int pcshr;
+
+	pcshr = erofs_pcshr_readahead_begin(rac);
+	iomap_readahead(rac, &erofs_iomap_ops);
+	erofs_pcshr_readahead_end(rac, pcshr);
 }
 
 static sector_t erofs_bmap(struct address_space *mapping, sector_t block)
diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index d4b89407822a..0b070f4b46b8 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2021, Alibaba Cloud
  */
 #include "xattr.h"
+#include "pagecache_share.h"
 #include
 
 static int erofs_fill_symlink(struct inode *inode, void *kaddr,
@@ -212,7 +213,9 @@ static int erofs_fill_inode(struct inode *inode)
 	switch (inode->i_mode & S_IFMT) {
 	case S_IFREG:
 		inode->i_op = &erofs_generic_iops;
-		if (erofs_inode_is_data_compressed(vi->datalayout))
+		if (erofs_pcshr_fill_inode(inode) == 0)
+			inode->i_fop = &erofs_pcshr_fops;
+		else if (erofs_inode_is_data_compressed(vi->datalayout))
 			inode->i_fop = &generic_ro_fops;
 		else
 			inode->i_fop = &erofs_file_fops;
diff --git a/fs/erofs/pagecache_share.c b/fs/erofs/pagecache_share.c
index 703fd17c002c..22172b5e21c7 100644
--- a/fs/erofs/pagecache_share.c
+++ b/fs/erofs/pagecache_share.c
@@ -22,6 +22,7 @@ struct erofs_pcshr_counter {
 
 struct erofs_pcshr_private {
 	char fprt[PCSHR_FPRT_MAXLEN];
+	struct mutex mutex;
 };
 
 static struct erofs_pcshr_counter mnt_counter = {
@@ -84,6 +85,7 @@ static int erofs_fprt_set(struct inode *inode, void *data)
 	if (!ano_private)
 		return -ENOMEM;
 	memcpy(ano_private, data, sizeof(size_t) + *(size_t *)data);
+	mutex_init(&ano_private->mutex);
 	inode->i_private = ano_private;
 	return 0;
 }
@@ -226,3 +228,64 @@ const struct file_operations erofs_pcshr_fops = {
 	.get_unmapped_area = thp_get_unmapped_area,
 	.splice_read = filemap_splice_read,
 };
+
+int erofs_pcshr_read_begin(struct file *file, struct folio *folio)
+{
+	struct erofs_inode *vi;
+	struct erofs_pcshr_private *ano_private;
+
+	if (!(file && file->private_data))
+		return 0;
+
+	vi = file->private_data;
+	if (vi->ano_inode != file_inode(file))
+		return 0;
+
+	ano_private = vi->ano_inode->i_private;
+	mutex_lock(&ano_private->mutex);
+	folio->mapping->host = &vi->vfs_inode;
+	return 1;
+}
+
+void erofs_pcshr_read_end(struct file *file, struct folio *folio, int pcshr)
+{
+	struct erofs_pcshr_private *ano_private;
+
+	if (pcshr == 0)
+		return;
+
+	ano_private = file_inode(file)->i_private;
+	folio->mapping->host = file_inode(file);
+	mutex_unlock(&ano_private->mutex);
+}
+
+int erofs_pcshr_readahead_begin(struct readahead_control *rac)
+{
+	struct erofs_inode *vi;
+	struct file *file = rac->file;
+	struct erofs_pcshr_private *ano_private;
+
+	if (!(file && file->private_data))
+		return 0;
+
+	vi = file->private_data;
+	if (vi->ano_inode != file_inode(file))
+		return 0;
+
+	ano_private = file_inode(file)->i_private;
+	mutex_lock(&ano_private->mutex);
+	rac->mapping->host = &vi->vfs_inode;
+	return 1;
+}
+
+void erofs_pcshr_readahead_end(struct readahead_control *rac, int pcshr)
+{
+	struct erofs_pcshr_private *ano_private;
+
+	if (pcshr == 0)
+		return;
+
+	ano_private = file_inode(rac->file)->i_private;
+	rac->mapping->host = file_inode(rac->file);
+	mutex_unlock(&ano_private->mutex);
+}
diff --git a/fs/erofs/pagecache_share.h b/fs/erofs/pagecache_share.h
index f3889d6889e5..abda2a60278b 100644
--- a/fs/erofs/pagecache_share.h
+++ b/fs/erofs/pagecache_share.h
@@ -14,6 +14,12 @@ void erofs_pcshr_free_mnt(void);
 int erofs_pcshr_fill_inode(struct inode *inode);
 void erofs_pcshr_free_inode(struct inode *inode);
 
+/* switch between the anonymous inode and the real inode */
+int erofs_pcshr_read_begin(struct file *file, struct folio *folio);
+void erofs_pcshr_read_end(struct file *file, struct folio *folio, int pcshr);
+int erofs_pcshr_readahead_begin(struct readahead_control *rac);
+void erofs_pcshr_readahead_end(struct readahead_control *rac, int pcshr);
+
 #else
 
 static inline int erofs_pcshr_init_mnt(void) { return 0; }
@@ -21,6 +27,11 @@ static inline void erofs_pcshr_free_mnt(void) {}
 static inline int erofs_pcshr_fill_inode(struct inode *inode) { return -1; }
 static inline void erofs_pcshr_free_inode(struct inode *inode) {}
 
+static inline int erofs_pcshr_read_begin(struct file *file, struct folio *folio) { return 0; }
+static inline void erofs_pcshr_read_end(struct file *file, struct folio *folio, int pcshr) {}
+static inline int erofs_pcshr_readahead_begin(struct readahead_control *rac) { return 0; }
+static inline void erofs_pcshr_readahead_end(struct readahead_control *rac, int pcshr) {}
+
 #endif // CONFIG_EROFS_FS_PAGE_CACHE_SHARE
 
 #endif
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index b4ce07dc931c..1b690eb6c1f1 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include "xattr.h"
+#include "pagecache_share.h"
 
 #define CREATE_TRACE_POINTS
 #include
@@ -81,6 +82,7 @@ static void erofs_free_inode(struct inode *inode)
 {
 	struct erofs_inode *vi = EROFS_I(inode);
 
+	erofs_pcshr_free_inode(inode);
 	if (inode->i_op == &erofs_fast_symlink_iops)
 		kfree(inode->i_link);
 	kfree(vi->xattr_shared_xattrs);
@@ -683,6 +685,10 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 	if (err)
 		return err;
 
+	err = erofs_pcshr_init_mnt();
+	if (err)
+		return err;
+
 	erofs_info(sb, "mounted with root inode @ nid %llu.", sbi->root_nid);
 	return 0;
 }
@@ -818,6 +824,7 @@ static void erofs_kill_sb(struct super_block *sb)
 		kill_anon_super(sb);
 	else
 		kill_block_super(sb);
+	erofs_pcshr_free_mnt();
 	fs_put_dax(sbi->dif0.dax_dev, NULL);
 	erofs_fscache_unregister_fs(sb);
 	erofs_sb_free(sbi);
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 19ef4ff2a134..fc2ed01eaabe 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2022 Alibaba Cloud
  */
 #include "compress.h"
+#include "pagecache_share.h"
 #include
 #include
 #include
@@ -1891,9 +1892,10 @@ static int z_erofs_read_folio(struct file *file, struct folio *folio)
 {
 	struct inode *const inode = folio->mapping->host;
 	struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
-	int err;
+	int err, pcshr;
 
 	trace_erofs_read_folio(folio, false);
+	pcshr = erofs_pcshr_read_begin(file, folio);
 	f.headoffset = (erofs_off_t)folio->index << PAGE_SHIFT;
 
 	z_erofs_pcluster_readmore(&f, NULL, true);
@@ -1909,6 +1911,7 @@ static int z_erofs_read_folio(struct file *file, struct folio *folio)
 
 	erofs_put_metabuf(&f.map.buf);
 	erofs_release_pages(&f.pagepool);
+	erofs_pcshr_read_end(file, folio, pcshr);
 	return err;
 }
 
@@ -1918,8 +1921,9 @@ static void z_erofs_readahead(struct readahead_control *rac)
 	struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode);
 	struct folio *head = NULL, *folio;
 	unsigned int nr_folios;
-	int err;
+	int err, pcshr;
 
+	pcshr = erofs_pcshr_readahead_begin(rac);
 	f.headoffset = readahead_pos(rac);
 
 	z_erofs_pcluster_readmore(&f, rac, true);
@@ -1947,6 +1951,7 @@ static void z_erofs_readahead(struct readahead_control *rac)
 	(void)z_erofs_runqueue(&f, nr_folios);
 	erofs_put_metabuf(&f.map.buf);
 	erofs_release_pages(&f.pagepool);
+	erofs_pcshr_readahead_end(rac, pcshr);
 }
 
 const struct address_space_operations z_erofs_aops = {
-- 
2.43.5
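
For readers following the begin/end pairing used in the read paths, here
is a minimal userspace sketch of the same bracket pattern: take a mutex
on the shared (anonymous) inode, point the mapping's host at the real
inode for the duration of the read, then restore it and unlock. It is
not kernel code and not part of the patch. inode_sketch, mapping_sketch,
read_begin() and read_end() are hypothetical stand-ins, and a pthread
mutex is assumed in place of struct mutex.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins for the kernel objects in the patch. */
struct inode_sketch {
	const char *name;
	pthread_mutex_t lock;	/* plays the role of erofs_pcshr_private.mutex */
};

struct mapping_sketch {
	struct inode_sketch *host;	/* plays the role of address_space.host */
};

/*
 * If the mapping belongs to the shared inode, lock it and temporarily
 * point the mapping at the real inode. Returns 1 when a switch happened,
 * 0 when there was nothing to do.
 */
static int read_begin(struct mapping_sketch *m, struct inode_sketch *shared,
		      struct inode_sketch *real)
{
	if (m->host != shared)
		return 0;
	pthread_mutex_lock(&shared->lock);
	m->host = real;
	return 1;
}

/* Undo read_begin(): restore the shared inode as host, then unlock. */
static void read_end(struct mapping_sketch *m, struct inode_sketch *shared,
		     int switched)
{
	if (!switched)
		return;
	m->host = shared;
	pthread_mutex_unlock(&shared->lock);
}
```

A caller would wrap the actual read between read_begin() and read_end(),
passing the return value of the former to the latter, in the same way
erofs_read_folio() passes pcshr above. Because the host pointer is
restored before the unlock, a second reader never observes the real
inode as host while it holds the lock itself.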