From nobody Fri Sep 19 03:48:22 2025
From: Jingbo Xu <jefflexu@linux.alibaba.com>
To: xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v3 1/2] erofs: support large folios for fscache mode
Date: Tue, 29 Nov 2022 19:58:32 +0800
Message-Id: <20221129115833.41062-2-jefflexu@linux.alibaba.com>
In-Reply-To: <20221129115833.41062-1-jefflexu@linux.alibaba.com>
References: <20221129115833.41062-1-jefflexu@linux.alibaba.com>

When large folios are supported, one folio can be split into several
slices, each of which may be mapped as META/UNMAPPED/MAPPED, and the
folio can be unlocked as a whole only when all slices have completed.
Thus always allocate an erofs_fscache_request for each .read_folio()
or .readahead() call; the request is marked as completed, and the folio
or folio range unlocked, only when all slices of the folio or folio
range have completed.

Besides, one folio (from .read_folio()) or one folio range (from
.readahead()) can be mapped into several slices, with each slice mapped
to a different cookie, so that each slice needs its own
netfs_cache_resources. Introduce chained (parent/child) requests to
support this, where each .read_folio() or .readahead() call can
correspond to a chain of requests, with each request reading from its
corresponding cookie.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/fscache.c | 166 ++++++++++++++++++++++++---------------------
 1 file changed, 88 insertions(+), 78 deletions(-)

diff --git a/fs/erofs/fscache.c b/fs/erofs/fscache.c
index 3e794891cd91..86d5cd5f909f 100644
--- a/fs/erofs/fscache.c
+++ b/fs/erofs/fscache.c
@@ -13,6 +13,7 @@ static struct vfsmount *erofs_pseudo_mnt;
 
 struct erofs_fscache_request {
 	struct netfs_cache_resources cache_resources;
+	struct erofs_fscache_request *parent;
 	struct address_space *mapping;	/* The mapping being accessed */
 	loff_t start;			/* Start position */
 	size_t len;			/* Length of the request */
@@ -22,7 +23,7 @@ struct erofs_fscache_request {
 };
 
 static struct erofs_fscache_request *erofs_fscache_req_alloc(struct address_space *mapping,
-					     loff_t start, size_t len)
+					     loff_t start, size_t len, struct erofs_fscache_request *parent)
 {
 	struct erofs_fscache_request *req;
 
@@ -34,6 +35,10 @@ static struct erofs_fscache_request *erofs_fscache_req_alloc(struct address_spac
 	req->start = start;
 	req->len = len;
 	refcount_set(&req->ref, 1);
+	if (parent) {
+		req->parent = parent;
+		refcount_inc(&parent->ref);
+	}
 
 	return req;
 }
@@ -56,17 +61,22 @@ static void erofs_fscache_req_complete(struct erofs_fscache_request *req)
 		folio_unlock(folio);
 	}
 	rcu_read_unlock();
-
-	if (req->cache_resources.ops)
-		req->cache_resources.ops->end_operation(&req->cache_resources);
-
-	kfree(req);
 }
 
 static void erofs_fscache_req_put(struct erofs_fscache_request *req)
 {
-	if (refcount_dec_and_test(&req->ref))
-		erofs_fscache_req_complete(req);
+	struct erofs_fscache_request *parent = req->parent;
+
+	if (refcount_dec_and_test(&req->ref)) {
+		if (!parent)
+			erofs_fscache_req_complete(req);
+		if (req->cache_resources.ops)
+			req->cache_resources.ops->end_operation(&req->cache_resources);
+		kfree(req);
+
+		if (parent)
+			erofs_fscache_req_put(parent);
+	}
 }
 
 static void erofs_fscache_subreq_complete(void *priv,
@@ -74,8 +84,12 @@ static void erofs_fscache_subreq_complete(void *priv,
 {
 	struct erofs_fscache_request *req = priv;
 
-	if (IS_ERR_VALUE(transferred_or_error))
-		req->error = transferred_or_error;
+	if (IS_ERR_VALUE(transferred_or_error)) {
+		if (req->parent)
+			req->parent->error = transferred_or_error;
+		else
+			req->error = transferred_or_error;
+	}
 	erofs_fscache_req_put(req);
 }
 
@@ -152,7 +166,7 @@ static int erofs_fscache_meta_read_folio(struct file *data, struct folio *folio)
 	}
 
 	req = erofs_fscache_req_alloc(folio_mapping(folio),
-			folio_pos(folio), folio_size(folio));
+			folio_pos(folio), folio_size(folio), NULL);
 	if (IS_ERR(req)) {
 		folio_unlock(folio);
 		return PTR_ERR(req);
@@ -167,32 +181,20 @@ static int erofs_fscache_meta_read_folio(struct file *data, struct folio *folio)
 	return ret;
 }
 
-/*
- * Read into page cache in the range described by (@pos, @len).
- *
- * On return, if the output @unlock is true, the caller is responsible for page
- * unlocking; otherwise the callee will take this responsibility through request
- * completion.
- *
- * The return value is the number of bytes successfully handled, or negative
- * error code on failure. The only exception is that, the length of the range
- * instead of the error code is returned on failure after request is allocated,
- * so that .readahead() could advance rac accordingly.
- */
-static int erofs_fscache_data_read(struct address_space *mapping,
-				   loff_t pos, size_t len, bool *unlock)
+static int erofs_fscache_data_read_slice(struct erofs_fscache_request *req)
 {
+	struct address_space *mapping = req->mapping;
 	struct inode *inode = mapping->host;
 	struct super_block *sb = inode->i_sb;
-	struct erofs_fscache_request *req;
+	loff_t pos = req->start + req->submitted;
+	struct erofs_fscache_request *new;
 	struct erofs_map_blocks map;
 	struct erofs_map_dev mdev;
 	struct iov_iter iter;
+	loff_t pstart;
 	size_t count;
 	int ret;
 
-	*unlock = true;
-
 	map.m_la = pos;
 	ret = erofs_map_blocks(inode, &map, EROFS_GET_BLOCKS_RAW);
 	if (ret)
@@ -201,36 +203,37 @@ static int erofs_fscache_data_read(struct address_space *mapping,
 	if (map.m_flags & EROFS_MAP_META) {
 		struct erofs_buf buf = __EROFS_BUF_INITIALIZER;
 		erofs_blk_t blknr;
-		size_t offset, size;
+		size_t offset;
 		void *src;
 
 		/* For tail packing layout, the offset may be non-zero. */
 		offset = erofs_blkoff(map.m_pa);
 		blknr = erofs_blknr(map.m_pa);
-		size = map.m_llen;
+		count = map.m_llen;
 
 		src = erofs_read_metabuf(&buf, sb, blknr, EROFS_KMAP);
 		if (IS_ERR(src))
 			return PTR_ERR(src);
 
-		iov_iter_xarray(&iter, READ, &mapping->i_pages, pos, PAGE_SIZE);
-		if (copy_to_iter(src + offset, size, &iter) != size) {
+		iov_iter_xarray(&iter, READ, &mapping->i_pages, pos, count);
+		if (copy_to_iter(src + offset, count, &iter) != count) {
 			erofs_put_metabuf(&buf);
 			return -EFAULT;
 		}
-		iov_iter_zero(PAGE_SIZE - size, &iter);
 		erofs_put_metabuf(&buf);
-		return PAGE_SIZE;
+		req->submitted += count;
+		return 0;
 	}
 
+	count = req->len - req->submitted;
 	if (!(map.m_flags & EROFS_MAP_MAPPED)) {
-		count = len;
 		iov_iter_xarray(&iter, READ, &mapping->i_pages, pos, count);
 		iov_iter_zero(count, &iter);
-		return count;
+		req->submitted += count;
+		return 0;
 	}
 
-	count = min_t(size_t, map.m_llen - (pos - map.m_la), len);
+	count = min_t(size_t, map.m_llen - (pos - map.m_la), count);
 	DBG_BUGON(!count || count % PAGE_SIZE);
 
 	mdev = (struct erofs_map_dev) {
@@ -241,68 +244,75 @@ static int erofs_fscache_data_read(struct address_space *mapping,
 	if (ret)
 		return ret;
 
-	req = erofs_fscache_req_alloc(mapping, pos, count);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
+	pstart = mdev.m_pa + (pos - map.m_la);
+	if (!req->submitted)
+		return erofs_fscache_read_folios_async(mdev.m_fscache->cookie,
+				req, pstart, count);
+
+	/* allocate a child request if current request ever been submitted */
+	new = erofs_fscache_req_alloc(mapping,
+			req->start + req->submitted, count, req);
+	if (IS_ERR(new))
+		return PTR_ERR(new);
 
-	*unlock = false;
-
 	ret = erofs_fscache_read_folios_async(mdev.m_fscache->cookie,
-			req, mdev.m_pa + (pos - map.m_la), count);
-	if (ret)
-		req->error = ret;
+			new, pstart, count);
+	req->submitted += count;
+	erofs_fscache_req_put(new);
+	return ret;
+}
 
-	erofs_fscache_req_put(req);
-	return count;
+/*
+ * Read into page cache in the range described by (req->start, req->len).
+ */
+static int erofs_fscache_data_read(struct erofs_fscache_request *req)
+{
+	int ret;
+
+	do {
+		ret = erofs_fscache_data_read_slice(req);
+		if (ret)
+			req->error = ret;
+	} while (!ret && req->submitted < req->len);
+
+	return ret;
 }
 
 static int erofs_fscache_read_folio(struct file *file, struct folio *folio)
 {
-	bool unlock;
+	struct erofs_fscache_request *req;
 	int ret;
 
-	DBG_BUGON(folio_size(folio) != EROFS_BLKSIZ);
-
-	ret = erofs_fscache_data_read(folio_mapping(folio), folio_pos(folio),
-				      folio_size(folio), &unlock);
-	if (unlock) {
-		if (ret > 0)
-			folio_mark_uptodate(folio);
+	req = erofs_fscache_req_alloc(folio_mapping(folio),
+			folio_pos(folio), folio_size(folio), NULL);
+	if (IS_ERR(req)) {
 		folio_unlock(folio);
+		return PTR_ERR(req);
 	}
-	return ret < 0 ? ret : 0;
+
+	ret = erofs_fscache_data_read(req);
+	erofs_fscache_req_put(req);
+	return ret;
 }
 
 static void erofs_fscache_readahead(struct readahead_control *rac)
 {
-	struct folio *folio;
-	size_t len, done = 0;
-	loff_t start, pos;
-	bool unlock;
-	int ret, size;
+	struct erofs_fscache_request *req;
 
 	if (!readahead_count(rac))
 		return;
 
-	start = readahead_pos(rac);
-	len = readahead_length(rac);
+	req = erofs_fscache_req_alloc(rac->mapping,
+			readahead_pos(rac), readahead_length(rac), NULL);
+	if (IS_ERR(req))
+		return;
 
-	do {
-		pos = start + done;
-		ret = erofs_fscache_data_read(rac->mapping, pos,
-					      len - done, &unlock);
-		if (ret <= 0)
-			return;
+	/* The request completion will drop refs on the folios. */
+	while (readahead_folio(rac))
+		;
 
-		size = ret;
-		while (size) {
-			folio = readahead_folio(rac);
-			size -= folio_size(folio);
-			if (unlock) {
-				folio_mark_uptodate(folio);
-				folio_unlock(folio);
-			}
-		}
-	} while ((done += ret) < len);
+	erofs_fscache_data_read(req);
+	erofs_fscache_req_put(req);
 }
 
 static const struct address_space_operations erofs_fscache_meta_aops = {
-- 
2.19.1.6.gb485710b

From nobody Fri Sep 19 03:48:22 2025
From: Jingbo Xu <jefflexu@linux.alibaba.com>
To: xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v3 2/2] erofs: enable large folios for fscache mode
Date: Tue, 29 Nov 2022 19:58:33 +0800
Message-Id: <20221129115833.41062-3-jefflexu@linux.alibaba.com>
In-Reply-To: <20221129115833.41062-1-jefflexu@linux.alibaba.com>
References: <20221129115833.41062-1-jefflexu@linux.alibaba.com>

Enable large folios for fscache mode. For now, enable the feature for
the non-compressed format only, until the compression part supports
large folios later.

One thing worth noting is that the feature is not enabled for the
metadata routine, since meta inodes don't need large folios for now,
nor do they support readahead yet.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Jia Zhu
Link: https://lore.kernel.org/r/20221128025011.36352-3-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang
---
 fs/erofs/inode.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index e457b8a59ee7..85932086d23f 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -295,8 +295,7 @@ static int erofs_fill_inode(struct inode *inode)
 		goto out_unlock;
 	}
 	inode->i_mapping->a_ops = &erofs_raw_access_aops;
-	if (!erofs_is_fscache_mode(inode->i_sb))
-		mapping_set_large_folios(inode->i_mapping);
+	mapping_set_large_folios(inode->i_mapping);
#ifdef CONFIG_EROFS_FS_ONDEMAND
 	if (erofs_is_fscache_mode(inode->i_sb))
 		inode->i_mapping->a_ops = &erofs_fscache_access_aops;
-- 
2.19.1.6.gb485710b
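[Editor's note] The parent/child request lifetime introduced by patch 1/2 can be modeled as a minimal userspace sketch. All names here (req_t, req_alloc, req_put, demo) are illustrative only, not the kernel API; refcount_t is replaced by a plain int since the demo is single-threaded. The key invariant it shows: each child request pins its parent, so the top-level request completes (i.e. the folio range is unlocked) only after the submitter's own reference and every child's pin have been dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model of the chained-request refcounting; not kernel code. */
typedef struct req {
	struct req *parent;
	int ref;
	int completed;	/* models "folio range unlocked" */
} req_t;

static req_t *req_alloc(req_t *parent)
{
	req_t *r = calloc(1, sizeof(*r));

	if (!r)
		return NULL;
	r->ref = 1;
	r->parent = parent;
	if (parent)
		parent->ref++;	/* each child pins its parent */
	return r;
}

static void req_put(req_t *r)
{
	req_t *parent = r->parent;

	if (--r->ref)
		return;
	/* per-request cache resources would be released here */
	if (!parent) {
		r->completed = 1;	/* only the top-level request "unlocks" */
		return;			/* top-level req is owned by the caller */
	}
	free(r);
	req_put(parent);	/* dropping a child unpins the parent */
}

/* Returns 1 iff the parent completes only after both children are put. */
int demo(void)
{
	req_t parent = { .parent = NULL, .ref = 1, .completed = 0 };
	req_t *c1 = req_alloc(&parent);	/* slice read from cookie A */
	req_t *c2 = req_alloc(&parent);	/* slice read from cookie B */

	req_put(c1);		/* first slice I/O finished */
	if (parent.completed)
		return 0;	/* must not complete yet: c2 still pins it */
	req_put(c2);		/* second slice I/O finished */
	if (parent.completed)
		return 0;	/* submitter's own reference still held */
	req_put(&parent);	/* submitter drops its reference */
	return parent.completed;
}
```

In the actual patch the same ordering is enforced by refcount_inc() in erofs_fscache_req_alloc() and the recursive erofs_fscache_req_put(parent) on child teardown.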