From: Gao Xiang
To: linux-erofs@lists.ozlabs.org
Cc: LKML, dhavale@google.com, Gao Xiang
Subject: [PATCH 1/5] erofs: support I/O submission for sub-page compressed blocks
Date: Wed, 6 Dec 2023 17:10:53 +0800
Message-Id: <20231206091057.87027-2-hsiangkao@linux.alibaba.com>
In-Reply-To: <20231206091057.87027-1-hsiangkao@linux.alibaba.com>
References: <20231206091057.87027-1-hsiangkao@linux.alibaba.com>

Add a basic I/O submission path first to support sub-page blocks:
 - Temporary short-lived pages will be used entirely;
 -
In-place I/O pages can be used partially, but compressed pages need
   to be mapped in contiguous virtual memory.

As a start, cache decompression is explicitly disabled for sub-page
blocks for now; it will be supported in the future.

Signed-off-by: Gao Xiang
Reviewed-by: Chao Yu
Reviewed-by: Yue Hu
---
 fs/erofs/zdata.c | 156 ++++++++++++++++++++++-------------------------
 1 file changed, 74 insertions(+), 82 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index a33cd6757f98..421c0a88a0ca 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1435,86 +1435,85 @@ static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io,
 		z_erofs_decompressqueue_work(&io->u.work);
 }
 
-static struct page *pickup_page_for_submission(struct z_erofs_pcluster *pcl,
-					       unsigned int nr,
-					       struct page **pagepool,
-					       struct address_space *mc)
+static void z_erofs_fill_bio_vec(struct bio_vec *bvec,
+				 struct z_erofs_decompress_frontend *f,
+				 struct z_erofs_pcluster *pcl,
+				 unsigned int nr,
+				 struct address_space *mc)
 {
-	const pgoff_t index = pcl->obj.index;
 	gfp_t gfp = mapping_gfp_mask(mc);
 	bool tocache = false;
-
+	struct z_erofs_bvec *zbv = pcl->compressed_bvecs + nr;
 	struct address_space *mapping;
-	struct page *oldpage, *page;
-	int justfound;
+	struct page *page, *oldpage;
+	int justfound, bs = i_blocksize(f->inode);
 
+	/* Except for inplace pages, the entire page can be used for I/Os */
+	bvec->bv_offset = 0;
+	bvec->bv_len = PAGE_SIZE;
 repeat:
-	page = READ_ONCE(pcl->compressed_bvecs[nr].page);
-	oldpage = page;
-
-	if (!page)
+	oldpage = READ_ONCE(zbv->page);
+	if (!oldpage)
 		goto out_allocpage;
 
-	justfound = (unsigned long)page & 1UL;
-	page = (struct page *)((unsigned long)page & ~1UL);
+	justfound = (unsigned long)oldpage & 1UL;
+	page = (struct page *)((unsigned long)oldpage & ~1UL);
+	bvec->bv_page = page;
 
+	DBG_BUGON(z_erofs_is_shortlived_page(page));
 	/*
-	 * preallocated cached pages, which is used to avoid direct reclaim
-	 * otherwise, it will go inplace I/O path instead.
+	 * Handle preallocated cached pages.  We tried to allocate such pages
+	 * without triggering direct reclaim.  If allocation failed, inplace
+	 * file-backed pages will be used instead.
 	 */
 	if (page->private == Z_EROFS_PREALLOCATED_PAGE) {
-		WRITE_ONCE(pcl->compressed_bvecs[nr].page, page);
 		set_page_private(page, 0);
+		WRITE_ONCE(zbv->page, page);
 		tocache = true;
 		goto out_tocache;
 	}
-	mapping = READ_ONCE(page->mapping);
 
+	mapping = READ_ONCE(page->mapping);
 	/*
-	 * file-backed online pages in plcuster are all locked steady,
-	 * therefore it is impossible for `mapping' to be NULL.
+	 * File-backed pages for inplace I/Os are all locked steady,
+	 * therefore it is impossible for `mapping` to be NULL.
 	 */
-	if (mapping && mapping != mc)
-		/* ought to be unmanaged pages */
-		goto out;
-
-	/* directly return for shortlived page as well */
-	if (z_erofs_is_shortlived_page(page))
-		goto out;
+	if (mapping && mapping != mc) {
+		if (zbv->offset < 0)
+			bvec->bv_offset = round_up(-zbv->offset, bs);
+		bvec->bv_len = round_up(zbv->end, bs) - bvec->bv_offset;
+		return;
+	}
 
 	lock_page(page);
-	/* only true if page reclaim goes wrong, should never happen */
 	DBG_BUGON(justfound && PagePrivate(page));
 
-	/* the page is still in manage cache */
+	/* the cached page is still in managed cache */
 	if (page->mapping == mc) {
-		WRITE_ONCE(pcl->compressed_bvecs[nr].page, page);
-
+		WRITE_ONCE(zbv->page, page);
+		/*
+		 * The cached page is still available but without a valid
+		 * `->private` pcluster hint.  Let's reconnect them.
+		 */
 		if (!PagePrivate(page)) {
-			/*
-			 * impossible to be !PagePrivate(page) for
-			 * the current restriction as well if
-			 * the page is already in compressed_bvecs[].
-			 */
 			DBG_BUGON(!justfound);
-
-			justfound = 0;
-			set_page_private(page, (unsigned long)pcl);
-			SetPagePrivate(page);
+			/* compressed_bvecs[] already takes a ref */
+			attach_page_private(page, pcl);
+			put_page(page);
 		}
 
-		/* no need to submit io if it is already up-to-date */
+		/* no need to submit if it is already up-to-date */
 		if (PageUptodate(page)) {
 			unlock_page(page);
-			page = NULL;
+			bvec->bv_page = NULL;
 		}
-		goto out;
+		return;
 	}
 
 	/*
-	 * the managed page has been truncated, it's unsafe to
-	 * reuse this one, let's allocate a new cache-managed page.
+	 * It has been truncated, so it's unsafe to reuse this one.  Let's
+	 * allocate a new page for compressed data.
 	 */
 	DBG_BUGON(page->mapping);
 	DBG_BUGON(!justfound);
@@ -1523,25 +1522,23 @@ static struct page *pickup_page_for_submission(struct z_erofs_pcluster *pcl,
 	unlock_page(page);
 	put_page(page);
 out_allocpage:
-	page = erofs_allocpage(pagepool, gfp | __GFP_NOFAIL);
-	if (oldpage != cmpxchg(&pcl->compressed_bvecs[nr].page,
-			       oldpage, page)) {
-		erofs_pagepool_add(pagepool, page);
+	page = erofs_allocpage(&f->pagepool, gfp | __GFP_NOFAIL);
+	if (oldpage != cmpxchg(&zbv->page, oldpage, page)) {
+		erofs_pagepool_add(&f->pagepool, page);
 		cond_resched();
 		goto repeat;
 	}
+	bvec->bv_page = page;
out_tocache:
-	if (!tocache || add_to_page_cache_lru(page, mc, index + nr, gfp)) {
-		/* turn into temporary page if fails (1 ref) */
+	if (!tocache || bs != PAGE_SIZE ||
+	    add_to_page_cache_lru(page, mc, pcl->obj.index + nr, gfp)) {
+		/* turn into a temporary shortlived page (1 ref) */
 		set_page_private(page, Z_EROFS_SHORTLIVED_PAGE);
-		goto out;
+		return;
 	}
 	attach_page_private(page, pcl);
-	/* drop a refcount added by allocpage (then we have 2 refs here) */
+	/* drop a refcount added by allocpage (then 2 refs in total here) */
 	put_page(page);
-
-out:	/* the only exit (for tracing and debugging) */
-	return page;
 }
 
 static struct z_erofs_decompressqueue *jobqueue_init(struct super_block *sb,
@@ -1596,7 +1593,7 @@ static void move_to_bypass_jobqueue(struct z_erofs_pcluster *pcl,
 	qtail[JQ_BYPASS] = &pcl->next;
 }
 
-static void z_erofs_decompressqueue_endio(struct bio *bio)
+static void z_erofs_submissionqueue_endio(struct bio *bio)
 {
 	struct z_erofs_decompressqueue *q = bio->bi_private;
 	blk_status_t err = bio->bi_status;
@@ -1608,7 +1605,6 @@ static void z_erofs_decompressqueue_endio(struct bio *bio)
 
 		DBG_BUGON(PageUptodate(page));
 		DBG_BUGON(z_erofs_page_is_invalidated(page));
-
 		if (erofs_page_is_managed(EROFS_SB(q->sb), page)) {
 			if (!err)
 				SetPageUptodate(page);
@@ -1631,17 +1627,14 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 	struct z_erofs_decompressqueue *q[NR_JOBQUEUES];
 	z_erofs_next_pcluster_t owned_head = f->owned_head;
 	/* bio is NULL initially, so no need to initialize last_{index,bdev} */
-	pgoff_t last_index;
+	erofs_off_t last_pa;
 	struct block_device *last_bdev;
 	unsigned int nr_bios = 0;
 	struct bio *bio = NULL;
 	unsigned long pflags;
 	int memstall = 0;
 
-	/*
-	 * if managed cache is enabled, bypass jobqueue is needed,
-	 * no need to read from device for all pclusters in this queue.
-	 */
+	/* No need to read from device for pclusters in the bypass queue. */
 	q[JQ_BYPASS] = jobqueue_init(sb, fgq + JQ_BYPASS, NULL);
 	q[JQ_SUBMIT] = jobqueue_init(sb, fgq + JQ_SUBMIT, force_fg);
 
@@ -1654,7 +1647,8 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 	do {
 		struct erofs_map_dev mdev;
 		struct z_erofs_pcluster *pcl;
-		pgoff_t cur, end;
+		erofs_off_t cur, end;
+		struct bio_vec bvec;
 		unsigned int i = 0;
 		bool bypass = true;
 
@@ -1673,18 +1667,14 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 		};
 		(void)erofs_map_dev(sb, &mdev);
 
-		cur = erofs_blknr(sb, mdev.m_pa);
-		end = cur + pcl->pclusterpages;
-
+		cur = mdev.m_pa;
+		end = cur + (pcl->pclusterpages << PAGE_SHIFT);
 		do {
-			struct page *page;
-
-			page = pickup_page_for_submission(pcl, i++,
-							  &f->pagepool, mc);
-			if (!page)
+			z_erofs_fill_bio_vec(&bvec, f, pcl, i++, mc);
+			if (!bvec.bv_page)
 				continue;
 
-			if (bio && (cur != last_index + 1 ||
+			if (bio && (cur != last_pa ||
 				    last_bdev != mdev.m_bdev)) {
submit_bio_retry:
				submit_bio(bio);
@@ -1695,7 +1685,8 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 				bio = NULL;
 			}
 
-			if (unlikely(PageWorkingset(page)) && !memstall) {
+			if (unlikely(PageWorkingset(bvec.bv_page)) &&
+			    !memstall) {
 				psi_memstall_enter(&pflags);
 				memstall = 1;
 			}
@@ -1703,23 +1694,24 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 			if (!bio) {
 				bio = bio_alloc(mdev.m_bdev, BIO_MAX_VECS,
 						REQ_OP_READ, GFP_NOIO);
-				bio->bi_end_io = z_erofs_decompressqueue_endio;
-
-				last_bdev = mdev.m_bdev;
-				bio->bi_iter.bi_sector = (sector_t)cur <<
-					(sb->s_blocksize_bits - 9);
+				bio->bi_end_io = z_erofs_submissionqueue_endio;
+				bio->bi_iter.bi_sector = cur >> 9;
 				bio->bi_private = q[JQ_SUBMIT];
 				if (readahead)
 					bio->bi_opf |= REQ_RAHEAD;
 				++nr_bios;
+				last_bdev = mdev.m_bdev;
 			}
 
-			if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE)
+			if (cur + bvec.bv_len > end)
+				bvec.bv_len = end - cur;
+			if (!bio_add_page(bio, bvec.bv_page, bvec.bv_len,
+					  bvec.bv_offset))
 				goto submit_bio_retry;
 
-			last_index = cur;
+			last_pa = cur + bvec.bv_len;
 			bypass = false;
-		} while (++cur < end);
+		} while ((cur += bvec.bv_len) < end);
 
 		if (!bypass)
 			qtail[JQ_SUBMIT] = &pcl->next;
-- 
2.39.3

From: Gao Xiang
To: linux-erofs@lists.ozlabs.org
Cc: LKML, dhavale@google.com, Gao Xiang
Subject: [PATCH 2/5] erofs: record `pclustersize` in bytes instead of pages
Date: Wed, 6 Dec 2023 17:10:54 +0800
Message-Id: <20231206091057.87027-3-hsiangkao@linux.alibaba.com>
In-Reply-To: <20231206091057.87027-1-hsiangkao@linux.alibaba.com>
References: <20231206091057.87027-1-hsiangkao@linux.alibaba.com>
Currently, compressed sizes are recorded in pages using `pclusterpages`;
however, for tailpacking pclusters, `tailpacking_size` is used instead.
This approach doesn't work when dealing with sub-page blocks.  To address
this, let's switch them to the unified `pclustersize` in bytes.

Signed-off-by: Gao Xiang
Reviewed-by: Chao Yu
Reviewed-by: Yue Hu
---
 fs/erofs/zdata.c | 64 ++++++++++++++++++++----------------------------
 1 file changed, 26 insertions(+), 38 deletions(-)

diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 421c0a88a0ca..d02989466711 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -56,6 +56,9 @@ struct z_erofs_pcluster {
 	/* L: total number of bvecs */
 	unsigned int vcnt;
 
+	/* I: pcluster size (compressed size) in bytes */
+	unsigned int pclustersize;
+
 	/* I: page offset of start position of decompression */
 	unsigned short pageofs_out;
 
@@ -70,14 +73,6 @@ struct z_erofs_pcluster {
 		struct rcu_head rcu;
 	};
 
-	union {
-		/* I: physical cluster size in pages */
-		unsigned short pclusterpages;
-
-		/* I: tailpacking inline compressed size */
-		unsigned short tailpacking_size;
-	};
-
 	/* I: compression algorithm format */
 	unsigned char algorithmformat;
 
@@ -115,9 +110,7 @@ static inline bool z_erofs_is_inline_pcluster(struct z_erofs_pcluster *pcl)
 
 static inline unsigned int z_erofs_pclusterpages(struct z_erofs_pcluster *pcl)
 {
-	if (z_erofs_is_inline_pcluster(pcl))
-		return 1;
-	return pcl->pclusterpages;
+	return PAGE_ALIGN(pcl->pclustersize) >> PAGE_SHIFT;
 }
 
 /*
@@ -298,12 +291,12 @@ static int z_erofs_create_pcluster_pool(void)
 	return 0;
 }
 
-static struct z_erofs_pcluster *z_erofs_alloc_pcluster(unsigned int nrpages)
+static struct z_erofs_pcluster *z_erofs_alloc_pcluster(unsigned int size)
 {
-	int i;
+	unsigned int nrpages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	struct z_erofs_pcluster_slab *pcs = pcluster_pool;
 
-	for (i = 0; i < ARRAY_SIZE(pcluster_pool); ++i) {
-		struct z_erofs_pcluster_slab *pcs = pcluster_pool + i;
+	for (; pcs < pcluster_pool + ARRAY_SIZE(pcluster_pool); ++pcs) {
 		struct z_erofs_pcluster *pcl;
 
 		if (nrpages > pcs->maxpages)
@@ -312,7 +305,7 @@ static struct z_erofs_pcluster *z_erofs_alloc_pcluster(unsigned int nrpages)
 		pcl = kmem_cache_zalloc(pcs->slab, GFP_NOFS);
 		if (!pcl)
 			return ERR_PTR(-ENOMEM);
-		pcl->pclusterpages = nrpages;
+		pcl->pclustersize = size;
 		return pcl;
 	}
 	return ERR_PTR(-EINVAL);
@@ -559,6 +552,7 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe)
 {
 	struct address_space *mc = MNGD_MAPPING(EROFS_I_SB(fe->inode));
 	struct z_erofs_pcluster *pcl = fe->pcl;
+	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
 	bool shouldalloc = z_erofs_should_alloc_cache(fe);
 	bool standalone = true;
 	/*
@@ -572,10 +566,9 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe)
 	if (fe->mode < Z_EROFS_PCLUSTER_FOLLOWED)
 		return;
 
-	for (i = 0; i < pcl->pclusterpages; ++i) {
-		struct page *page;
+	for (i = 0; i < pclusterpages; ++i) {
+		struct page *page, *newpage;
 		void *t;	/* mark pages just found for debugging */
-		struct page *newpage = NULL;
 
 		/* the compressed page was loaded before */
 		if (READ_ONCE(pcl->compressed_bvecs[i].page))
@@ -585,6 +578,7 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe)
 
 		if (page) {
 			t = (void *)((unsigned long)page | 1);
+			newpage = NULL;
 		} else {
 			/* I/O is needed, no possible to decompress directly */
 			standalone = false;
@@ -592,9 +586,8 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe)
 				continue;
 
 			/*
-			 * try to use cached I/O if page allocation
-			 * succeeds or fallback to in-place I/O instead
-			 * to avoid any direct reclaim.
+			 * Try cached I/O if allocation succeeds or fallback to
+			 * in-place I/O instead to avoid any direct reclaim.
 			 */
 			newpage = erofs_allocpage(&fe->pagepool, gfp);
 			if (!newpage)
@@ -626,6 +619,7 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
 {
 	struct z_erofs_pcluster *const pcl =
 		container_of(grp, struct z_erofs_pcluster, obj);
+	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
 	int i;
 
 	DBG_BUGON(z_erofs_is_inline_pcluster(pcl));
@@ -633,7 +627,7 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
 	 * refcount of workgroup is now freezed as 0,
 	 * therefore no need to worry about available decompression users.
 	 */
-	for (i = 0; i < pcl->pclusterpages; ++i) {
+	for (i = 0; i < pclusterpages; ++i) {
 		struct page *page = pcl->compressed_bvecs[i].page;
 
 		if (!page)
@@ -657,6 +651,7 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi,
 static bool z_erofs_cache_release_folio(struct folio *folio, gfp_t gfp)
 {
 	struct z_erofs_pcluster *pcl = folio_get_private(folio);
+	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
 	bool ret;
 	int i;
 
@@ -669,7 +664,7 @@ static bool z_erofs_cache_release_folio(struct folio *folio, gfp_t gfp)
 		goto out;
 
 	DBG_BUGON(z_erofs_is_inline_pcluster(pcl));
-	for (i = 0; i < pcl->pclusterpages; ++i) {
+	for (i = 0; i < pclusterpages; ++i) {
 		if (pcl->compressed_bvecs[i].page == &folio->page) {
 			WRITE_ONCE(pcl->compressed_bvecs[i].page, NULL);
 			ret = true;
@@ -778,20 +773,20 @@ static void z_erofs_try_to_claim_pcluster(struct z_erofs_decompress_frontend *f)
 static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
 {
 	struct erofs_map_blocks *map = &fe->map;
+	struct super_block *sb = fe->inode->i_sb;
 	bool ztailpacking = map->m_flags & EROFS_MAP_META;
 	struct z_erofs_pcluster *pcl;
 	struct erofs_workgroup *grp;
 	int err;
 
 	if (!(map->m_flags & EROFS_MAP_ENCODED) ||
-	    (!ztailpacking && !(map->m_pa >> PAGE_SHIFT))) {
+	    (!ztailpacking && !erofs_blknr(sb, map->m_pa))) {
 		DBG_BUGON(1);
 		return -EFSCORRUPTED;
 	}
 
 	/* no available pcluster, let's allocate one */
-	pcl = z_erofs_alloc_pcluster(ztailpacking ? 1 :
-				     map->m_plen >> PAGE_SHIFT);
+	pcl = z_erofs_alloc_pcluster(map->m_plen);
 	if (IS_ERR(pcl))
 		return PTR_ERR(pcl);
 
@@ -816,9 +811,8 @@ static int z_erofs_register_pcluster(struct z_erofs_decompress_frontend *fe)
 	if (ztailpacking) {
 		pcl->obj.index = 0;	/* which indicates ztailpacking */
 		pcl->pageofs_in = erofs_blkoff(fe->inode->i_sb, map->m_pa);
-		pcl->tailpacking_size = map->m_plen;
 	} else {
-		pcl->obj.index = map->m_pa >> PAGE_SHIFT;
+		pcl->obj.index = erofs_blknr(sb, map->m_pa);
 
 		grp = erofs_insert_workgroup(fe->inode->i_sb, &pcl->obj);
 		if (IS_ERR(grp)) {
@@ -1244,8 +1238,7 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 	unsigned int pclusterpages = z_erofs_pclusterpages(pcl);
 	const struct z_erofs_decompressor *decompressor =
 			&erofs_decompressors[pcl->algorithmformat];
-	unsigned int i, inputsize;
-	int err2;
+	int i, err2;
 	struct page *page;
 	bool overlapped;
 
@@ -1282,18 +1275,13 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
 	if (err)
 		goto out;
 
-	if (z_erofs_is_inline_pcluster(pcl))
-		inputsize = pcl->tailpacking_size;
-	else
-		inputsize = pclusterpages * PAGE_SIZE;
-
 	err = decompressor->decompress(&(struct z_erofs_decompress_req) {
 					.sb = be->sb,
 					.in = be->compressed_pages,
 					.out = be->decompressed_pages,
 					.pageofs_in = pcl->pageofs_in,
 					.pageofs_out = pcl->pageofs_out,
-					.inputsize = inputsize,
+					.inputsize = pcl->pclustersize,
 					.outputsize = pcl->length,
 					.alg = pcl->algorithmformat,
 					.inplace_io = overlapped,
@@ -1668,7 +1656,7 @@ static void z_erofs_submit_queue(struct z_erofs_decompress_frontend *f,
 		(void)erofs_map_dev(sb, &mdev);
 
 		cur = mdev.m_pa;
-		end = cur + (pcl->pclusterpages << PAGE_SHIFT);
+		end = cur + pcl->pclustersize;
 		do {
 			z_erofs_fill_bio_vec(&bvec, f, pcl, i++, mc);
 			if (!bvec.bv_page)
-- 
2.39.3
From: Gao Xiang
To: linux-erofs@lists.ozlabs.org
Cc: LKML, dhavale@google.com, Gao Xiang
Subject: [PATCH 3/5] erofs: fix up compacted indexes for block size < 4096
Date: Wed, 6 Dec 2023 17:10:55 +0800
Message-Id: <20231206091057.87027-4-hsiangkao@linux.alibaba.com>
In-Reply-To: <20231206091057.87027-1-hsiangkao@linux.alibaba.com>
References: <20231206091057.87027-1-hsiangkao@linux.alibaba.com>

Previously, the block size always equaled PAGE_SIZE, so `lclusterbits`
couldn't be less than 12.

Since sub-page compressed blocks are now considered, `lobits` for an
lcluster in each pack cannot always be `lclusterbits` as before;
otherwise, there is not enough room for the special value
`Z_EROFS_LI_D0_CBLKCNT`.

To support smaller block sizes, `lobits` for each compacted lcluster is
now calculated as:

   lobits = max(lclusterbits, ilog2(Z_EROFS_LI_D0_CBLKCNT) + 1)

Signed-off-by: Gao Xiang
Reviewed-by: Chao Yu
Reviewed-by: Yue Hu
---
 fs/erofs/zmap.c | 32 ++++++++++++++------------------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/fs/erofs/zmap.c b/fs/erofs/zmap.c
index 7b55111fd533..9753875e41cb 100644
--- a/fs/erofs/zmap.c
+++ b/fs/erofs/zmap.c
@@ -82,29 +82,26 @@ static int z_erofs_load_full_lcluster(struct z_erofs_maprecorder *m,
 }
 
 static unsigned int decode_compactedbits(unsigned int lobits,
-					 unsigned int lomask,
 					 u8 *in, unsigned int pos, u8 *type)
 {
 	const unsigned int v = get_unaligned_le32(in + pos / 8) >> (pos & 7);
-	const unsigned int lo = v & lomask;
+	const unsigned int lo = v & ((1 << lobits) - 1);
 
 	*type = (v >> lobits) & 3;
 	return lo;
 }
 
-static int get_compacted_la_distance(unsigned int lclusterbits,
+static int get_compacted_la_distance(unsigned int lobits,
 				     unsigned int encodebits,
 				     unsigned int vcnt, u8 *in, int i)
 {
-	const unsigned int lomask = (1 << lclusterbits) - 1;
 	unsigned int lo, d1 = 0;
 	u8 type;
 
 	DBG_BUGON(i >= vcnt);
 
 	do {
-		lo = decode_compactedbits(lclusterbits, lomask,
-					  in, encodebits * i, &type);
+		lo = decode_compactedbits(lobits, in, encodebits * i, &type);
 
 		if (type != Z_EROFS_LCLUSTER_TYPE_NONHEAD)
 			return d1;
@@ -123,15 +120,14 @@ static int unpack_compacted_index(struct z_erofs_maprecorder *m,
 {
 	struct erofs_inode *const vi = EROFS_I(m->inode);
 	const unsigned int lclusterbits = vi->z_logical_clusterbits;
-	const unsigned int lomask = (1 << lclusterbits) - 1;
-	unsigned int vcnt, base, lo, encodebits, nblk, eofs;
+	unsigned int vcnt, base, lo, lobits, encodebits, nblk, eofs;
 	int i;
 	u8 *in, type;
 	bool big_pcluster;
 
 	if (1 << amortizedshift == 4 && lclusterbits <= 14)
 		vcnt = 2;
-	else if (1 << amortizedshift == 2 && lclusterbits == 12)
+	else if (1 << amortizedshift == 2 && lclusterbits <= 12)
 		vcnt = 16;
 	else
 		return -EOPNOTSUPP;
@@ -140,6 +136,7 @@ static int unpack_compacted_index(struct z_erofs_maprecorder *m,
 	m->nextpackoff = round_down(pos, vcnt << amortizedshift) +
 			 (vcnt << amortizedshift);
 	big_pcluster = vi->z_advise & Z_EROFS_ADVISE_BIG_PCLUSTER_1;
+	lobits = max(lclusterbits, ilog2(Z_EROFS_LI_D0_CBLKCNT) + 1U);
 	encodebits = ((vcnt << amortizedshift) - sizeof(__le32)) * 8 / vcnt;
 	eofs = erofs_blkoff(m->inode->i_sb, pos);
 	base = round_down(eofs, vcnt << amortizedshift);
@@ -147,15 +144,14 @@ static int unpack_compacted_index(struct z_erofs_maprecorder *m,
 
 	i = (eofs - base) >> amortizedshift;
 
-	lo = decode_compactedbits(lclusterbits, lomask,
-				  in, encodebits * i, &type);
+	lo = decode_compactedbits(lobits, in, encodebits * i, &type);
 	m->type = type;
 	if (type == Z_EROFS_LCLUSTER_TYPE_NONHEAD) {
 		m->clusterofs = 1 << lclusterbits;
 
 		/* figure out lookahead_distance: delta[1] if needed */
 		if (lookahead)
-			m->delta[1] = get_compacted_la_distance(lclusterbits,
+			m->delta[1] = get_compacted_la_distance(lobits,
 						encodebits, vcnt, in, i);
 		if (lo & Z_EROFS_LI_D0_CBLKCNT) {
 			if (!big_pcluster) {
@@ -174,8 +170,8 @@ static int unpack_compacted_index(struct z_erofs_maprecorder *m,
 		 * of which lo saves delta[1] rather than delta[0].
		 * Hence, get delta[0] by the previous lcluster indirectly.
 		 */
-		lo = decode_compactedbits(lclusterbits, lomask,
-					  in, encodebits * (i - 1), &type);
+		lo = decode_compactedbits(lobits, in,
+					  encodebits * (i - 1), &type);
 		if (type != Z_EROFS_LCLUSTER_TYPE_NONHEAD)
 			lo = 0;
 		else if (lo & Z_EROFS_LI_D0_CBLKCNT)
@@ -190,8 +186,8 @@ static int unpack_compacted_index(struct z_erofs_maprecorder *m,
 		nblk = 1;
 		while (i > 0) {
 			--i;
-			lo = decode_compactedbits(lclusterbits, lomask,
-						  in, encodebits * i, &type);
+			lo = decode_compactedbits(lobits, in,
+						  encodebits * i, &type);
 			if (type == Z_EROFS_LCLUSTER_TYPE_NONHEAD)
 				i -= lo;
 
@@ -202,8 +198,8 @@ static int unpack_compacted_index(struct z_erofs_maprecorder *m,
 		nblk = 0;
 		while (i > 0) {
 			--i;
-			lo = decode_compactedbits(lclusterbits, lomask,
-						  in, encodebits * i, &type);
+			lo = decode_compactedbits(lobits, in,
+						  encodebits * i, &type);
 			if (type == Z_EROFS_LCLUSTER_TYPE_NONHEAD) {
 				if (lo & Z_EROFS_LI_D0_CBLKCNT) {
 					--i;
-- 
2.39.3

From: Gao Xiang
To: linux-erofs@lists.ozlabs.org
Cc: LKML, dhavale@google.com, Gao Xiang
Subject: [PATCH 4/5] erofs: refine z_erofs_transform_plain() for sub-page block support
Date: Wed, 6 Dec 2023 17:10:56 +0800
Message-Id: <20231206091057.87027-5-hsiangkao@linux.alibaba.com>
In-Reply-To: <20231206091057.87027-1-hsiangkao@linux.alibaba.com>
References: <20231206091057.87027-1-hsiangkao@linux.alibaba.com>

Sub-page block support is still unusable even with the previous commits
if interlaced PLAIN pclusters exist.  Such pclusters can be found if the
fragment feature is enabled.

This commit tries to handle "the head part" of interlaced PLAIN
pclusters first: it was once explained in commit fdffc091e6f9 ("erofs:
support interlaced uncompressed data for compressed files").

It uses a unified way for both shifted and interlaced PLAIN pclusters.
As an added bonus, PLAIN pclusters larger than the block size are also
supported now for the upcoming large lclusters.
Signed-off-by: Gao Xiang
Reviewed-by: Chao Yu
Reviewed-by: Yue Hu
---
 fs/erofs/decompressor.c | 81 ++++++++++++++++++++++++-----------------
 1 file changed, 48 insertions(+), 33 deletions(-)

diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index 021be5feb1bc..5ec11f5024b7 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -319,43 +319,58 @@ static int z_erofs_lz4_decompress(struct z_erofs_decompress_req *rq,
 static int z_erofs_transform_plain(struct z_erofs_decompress_req *rq,
 				   struct page **pagepool)
 {
-	const unsigned int inpages = PAGE_ALIGN(rq->inputsize) >> PAGE_SHIFT;
-	const unsigned int outpages =
+	const unsigned int nrpages_in =
+		PAGE_ALIGN(rq->pageofs_in + rq->inputsize) >> PAGE_SHIFT;
+	const unsigned int nrpages_out =
 		PAGE_ALIGN(rq->pageofs_out + rq->outputsize) >> PAGE_SHIFT;
-	const unsigned int righthalf = min_t(unsigned int, rq->outputsize,
-					     PAGE_SIZE - rq->pageofs_out);
-	const unsigned int lefthalf = rq->outputsize - righthalf;
-	const unsigned int interlaced_offset =
-		rq->alg == Z_EROFS_COMPRESSION_SHIFTED ? 0 : rq->pageofs_out;
-	u8 *src;
-
-	if (outpages > 2 && rq->alg == Z_EROFS_COMPRESSION_SHIFTED) {
-		DBG_BUGON(1);
-		return -EFSCORRUPTED;
-	}
-
-	if (rq->out[0] == *rq->in) {
-		DBG_BUGON(rq->pageofs_out);
-		return 0;
+	const unsigned int bs = rq->sb->s_blocksize;
+	unsigned int cur = 0, ni = 0, no, pi, po, insz, cnt;
+	u8 *kin;
+
+	DBG_BUGON(rq->outputsize > rq->inputsize);
+	if (rq->alg == Z_EROFS_COMPRESSION_INTERLACED) {
+		cur = bs - (rq->pageofs_out & (bs - 1));
+		pi = (rq->pageofs_in + rq->inputsize - cur) & ~PAGE_MASK;
+		cur = min(cur, rq->outputsize);
+		if (cur && rq->out[0]) {
+			kin = kmap_local_page(rq->in[nrpages_in - 1]);
+			if (rq->out[0] == rq->in[nrpages_in - 1]) {
+				memmove(kin + rq->pageofs_out, kin + pi, cur);
+				flush_dcache_page(rq->out[0]);
+			} else {
+				memcpy_to_page(rq->out[0], rq->pageofs_out,
+					       kin + pi, cur);
+			}
+			kunmap_local(kin);
+		}
+		rq->outputsize -= cur;
 	}
 
-	src = kmap_local_page(rq->in[inpages - 1]) + rq->pageofs_in;
-	if (rq->out[0])
-		memcpy_to_page(rq->out[0], rq->pageofs_out,
-			       src + interlaced_offset, righthalf);
-
-	if (outpages > inpages) {
-		DBG_BUGON(!rq->out[outpages - 1]);
-		if (rq->out[outpages - 1] != rq->in[inpages - 1]) {
-			memcpy_to_page(rq->out[outpages - 1], 0, src +
-					(interlaced_offset ? 0 : righthalf),
-				       lefthalf);
-		} else if (!interlaced_offset) {
-			memmove(src, src + righthalf, lefthalf);
-			flush_dcache_page(rq->in[inpages - 1]);
-		}
+	for (; rq->outputsize; rq->pageofs_in = 0, cur += PAGE_SIZE, ni++) {
+		insz = min(PAGE_SIZE - rq->pageofs_in, rq->outputsize);
+		rq->outputsize -= insz;
+		if (!rq->in[ni])
+			continue;
+		kin = kmap_local_page(rq->in[ni]);
+		pi = 0;
+		do {
+			no = (rq->pageofs_out + cur + pi) >> PAGE_SHIFT;
+			po = (rq->pageofs_out + cur + pi) & ~PAGE_MASK;
+			DBG_BUGON(no >= nrpages_out);
+			cnt = min(insz - pi, PAGE_SIZE - po);
+			if (rq->out[no] == rq->in[ni]) {
+				memmove(kin + po,
+					kin + rq->pageofs_in + pi, cnt);
+				flush_dcache_page(rq->out[no]);
+			} else if (rq->out[no]) {
+				memcpy_to_page(rq->out[no], po,
+					       kin + rq->pageofs_in + pi, cnt);
+			}
+			pi += cnt;
+		} while (pi < insz);
+		kunmap_local(kin);
 	}
-	kunmap_local(src);
+	DBG_BUGON(ni > nrpages_in);
 	return 0;
 }
 
-- 
2.39.3
AC=PASS;BC=-1|-1;BR=01201311R291e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=ay29a033018045176;MF=hsiangkao@linux.alibaba.com;NM=1;PH=DS;RN=4;SR=0;TI=SMTPD_---0VxxSRQh_1701853876; Received: from e69b19392.et15sqa.tbsite.net(mailfrom:hsiangkao@linux.alibaba.com fp:SMTPD_---0VxxSRQh_1701853876) by smtp.aliyun-inc.com; Wed, 06 Dec 2023 17:11:16 +0800 From: Gao Xiang To: linux-erofs@lists.ozlabs.org Cc: LKML , dhavale@google.com, Gao Xiang Subject: [PATCH 5/5] erofs: enable sub-page compressed block support Date: Wed, 6 Dec 2023 17:10:57 +0800 Message-Id: <20231206091057.87027-6-hsiangkao@linux.alibaba.com> X-Mailer: git-send-email 2.39.3 In-Reply-To: <20231206091057.87027-1-hsiangkao@linux.alibaba.com> References: <20231206091057.87027-1-hsiangkao@linux.alibaba.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Let's just disable cached decompression and inplace I/Os for partial pages as a first step in order to enable sub-page block initial support. In other words, currently it works primarily based on temporary short-lived pages. Don't expect too much in terms of performance. Signed-off-by: Gao Xiang Reviewed-by: Chao Yu Reviewed-by: Yue Hu --- fs/erofs/inode.c | 6 ++++-- fs/erofs/zdata.c | 6 ++++-- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c index 14a79d3226ab..3d616dea55dc 100644 --- a/fs/erofs/inode.c +++ b/fs/erofs/inode.c @@ -259,8 +259,10 @@ static int erofs_fill_inode(struct inode *inode) =20 if (erofs_inode_is_data_compressed(vi->datalayout)) { #ifdef CONFIG_EROFS_FS_ZIP - if (!erofs_is_fscache_mode(inode->i_sb) && - inode->i_sb->s_blocksize_bits =3D=3D PAGE_SHIFT) { + if (!erofs_is_fscache_mode(inode->i_sb)) { + DO_ONCE_LITE_IF(inode->i_sb->s_blocksize !=3D PAGE_SIZE, + erofs_info, inode->i_sb, + "EXPERIMENTAL EROFS subpage compressed block support in use. 
Use at = your own risk!"); inode->i_mapping->a_ops =3D &z_erofs_aops; err =3D 0; goto out_unlock; diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index d02989466711..a2c3e87d2f81 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -563,6 +563,8 @@ static void z_erofs_bind_cache(struct z_erofs_decompres= s_frontend *fe) __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN; unsigned int i; =20 + if (i_blocksize(fe->inode) !=3D PAGE_SIZE) + return; if (fe->mode < Z_EROFS_PCLUSTER_FOLLOWED) return; =20 @@ -967,12 +969,12 @@ static int z_erofs_do_read_page(struct z_erofs_decomp= ress_frontend *fe, struct inode *const inode =3D fe->inode; struct erofs_map_blocks *const map =3D &fe->map; const loff_t offset =3D page_offset(page); + const unsigned int bs =3D i_blocksize(inode); bool tight =3D true, exclusive; unsigned int cur, end, len, split; int err =3D 0; =20 z_erofs_onlinepage_init(page); - split =3D 0; end =3D PAGE_SIZE; repeat: @@ -1021,7 +1023,7 @@ static int z_erofs_do_read_page(struct z_erofs_decomp= ress_frontend *fe, * for inplace I/O or bvpage (should be processed in a strict order.) */ tight &=3D (fe->mode > Z_EROFS_PCLUSTER_FOLLOWED_NOINPLACE); - exclusive =3D (!cur && ((split <=3D 1) || tight)); + exclusive =3D (!cur && ((split <=3D 1) || (tight && bs =3D=3D PAGE_SIZE))= ); if (cur) tight &=3D (fe->mode >=3D Z_EROFS_PCLUSTER_FOLLOWED); =20 --=20 2.39.3