From: Gao Xiang
To: linux-erofs@lists.ozlabs.org
Cc: LKML, Chao Yu, Gao Xiang
Subject: [PATCH] erofs: Zstandard compression support
Date: Wed, 8 May 2024 17:03:46 +0800
Message-Id: <20240508090346.2992116-1-hsiangkao@linux.alibaba.com>

Add Zstandard compression as the 4th supported algorithm, since it has
become more popular and some end users have been asking for it for
quite a while [1][2].

Each EROFS physical cluster contains only one valid standard Zstandard
frame as described in [3], so that decompression can be performed
independently on a per-pcluster basis.

Currently, it just leverages the multi-call stream decompression APIs
with internal sliding window buffers.  One-shot or bufferless
decompression could be implemented later for even better performance
if needed.

[1] https://github.com/erofs/erofs-utils/issues/6
[2] https://lore.kernel.org/r/Y08h+z6CZdnS1XBm@B-P7TQMD6M-0146.lan
[3] https://www.rfc-editor.org/rfc/rfc8478.txt

Signed-off-by: Gao Xiang
Acked-by: Chao Yu
---
 fs/erofs/Kconfig             |  15 ++
 fs/erofs/Makefile            |   1 +
 fs/erofs/compress.h          |   4 +
 fs/erofs/decompressor.c      |   7 +
 fs/erofs/decompressor_zstd.c | 279 +++++++++++++++++++++++++++++++++++
 fs/erofs/erofs_fs.h          |  10 ++
 fs/erofs/internal.h          |   8 +
 fs/erofs/super.c             |   7 +
 8 files changed, 331 insertions(+)
 create mode 100644 fs/erofs/decompressor_zstd.c

diff --git a/fs/erofs/Kconfig b/fs/erofs/Kconfig
index fffd3919343e..7dcdce660cac 100644
--- a/fs/erofs/Kconfig
+++ b/fs/erofs/Kconfig
@@ -112,6 +112,21 @@ config EROFS_FS_ZIP_DEFLATE
 
 	  If unsure, say N.
 
+config EROFS_FS_ZIP_ZSTD
+	bool "EROFS Zstandard compressed data support"
+	depends on EROFS_FS_ZIP
+	select ZSTD_DECOMPRESS
+	help
+	  Saying Y here includes support for reading EROFS file systems
+	  containing Zstandard compressed data.  It gives better compression
+	  ratios than the default LZ4 format, while it costs more CPU
+	  overhead.
+
+	  Zstandard support is an experimental feature for now and so most
+	  file systems will be readable without selecting this option.
+
+	  If unsure, say N.
+
 config EROFS_FS_ONDEMAND
 	bool "EROFS fscache-based on-demand read support"
 	depends on EROFS_FS
diff --git a/fs/erofs/Makefile b/fs/erofs/Makefile
index 20d1ec422443..097d672e6b14 100644
--- a/fs/erofs/Makefile
+++ b/fs/erofs/Makefile
@@ -6,4 +6,5 @@ erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
 erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o zutil.o
 erofs-$(CONFIG_EROFS_FS_ZIP_LZMA) += decompressor_lzma.o
 erofs-$(CONFIG_EROFS_FS_ZIP_DEFLATE) += decompressor_deflate.o
+erofs-$(CONFIG_EROFS_FS_ZIP_ZSTD) += decompressor_zstd.o
 erofs-$(CONFIG_EROFS_FS_ONDEMAND) += fscache.o
diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
index 333587ba6183..19d53c30c8af 100644
--- a/fs/erofs/compress.h
+++ b/fs/erofs/compress.h
@@ -90,8 +90,12 @@ int z_erofs_load_lzma_config(struct super_block *sb,
 			struct erofs_super_block *dsb, void *data, int size);
 int z_erofs_load_deflate_config(struct super_block *sb,
 			struct erofs_super_block *dsb, void *data, int size);
+int z_erofs_load_zstd_config(struct super_block *sb,
+			struct erofs_super_block *dsb, void *data, int size);
 int z_erofs_lzma_decompress(struct z_erofs_decompress_req *rq,
 			    struct page **pagepool);
 int z_erofs_deflate_decompress(struct z_erofs_decompress_req *rq,
 			       struct page **pagepool);
+int z_erofs_zstd_decompress(struct z_erofs_decompress_req *rq,
+			    struct page **pgpl);
 #endif
diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index d2fe8130819e..9d85b6c11c6b 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -399,6 +399,13 @@ const struct z_erofs_decompressor erofs_decompressors[] = {
 		.name = "deflate"
 	},
 #endif
+#ifdef CONFIG_EROFS_FS_ZIP_ZSTD
+	[Z_EROFS_COMPRESSION_ZSTD] = {
+		.config = z_erofs_load_zstd_config,
+		.decompress = z_erofs_zstd_decompress,
+		.name = "zstd"
+	},
+#endif
 };
 
 int z_erofs_parse_cfgs(struct super_block *sb, struct erofs_super_block *dsb)
diff --git a/fs/erofs/decompressor_zstd.c b/fs/erofs/decompressor_zstd.c
new file mode 100644
index 000000000000..24279511db3b
--- /dev/null
+++ b/fs/erofs/decompressor_zstd.c
@@ -0,0 +1,279 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include <linux/zstd.h>
+#include "compress.h"
+
+struct z_erofs_zstd {
+	struct z_erofs_zstd *next;
+	u8 bounce[PAGE_SIZE];
+	void *wksp;
+	unsigned int wkspsz;
+};
+
+static DEFINE_SPINLOCK(z_erofs_zstd_lock);
+static unsigned int z_erofs_zstd_max_dictsize;
+static unsigned int z_erofs_zstd_nstrms, z_erofs_zstd_avail_strms;
+static struct z_erofs_zstd *z_erofs_zstd_head;
+static DECLARE_WAIT_QUEUE_HEAD(z_erofs_zstd_wq);
+
+module_param_named(zstd_streams, z_erofs_zstd_nstrms, uint, 0444);
+
+static struct z_erofs_zstd *z_erofs_isolate_strms(bool all)
+{
+	struct z_erofs_zstd *strm;
+
+again:
+	spin_lock(&z_erofs_zstd_lock);
+	strm = z_erofs_zstd_head;
+	if (!strm) {
+		spin_unlock(&z_erofs_zstd_lock);
+		wait_event(z_erofs_zstd_wq, READ_ONCE(z_erofs_zstd_head));
+		goto again;
+	}
+	z_erofs_zstd_head = all ? NULL : strm->next;
+	spin_unlock(&z_erofs_zstd_lock);
+	return strm;
+}
+
+void z_erofs_zstd_exit(void)
+{
+	while (z_erofs_zstd_avail_strms) {
+		struct z_erofs_zstd *strm, *n;
+
+		for (strm = z_erofs_isolate_strms(true); strm; strm = n) {
+			n = strm->next;
+
+			kvfree(strm->wksp);
+			kfree(strm);
+			--z_erofs_zstd_avail_strms;
+		}
+	}
+}
+
+int __init z_erofs_zstd_init(void)
+{
+	/* by default, use # of possible CPUs instead */
+	if (!z_erofs_zstd_nstrms)
+		z_erofs_zstd_nstrms = num_possible_cpus();
+
+	for (; z_erofs_zstd_avail_strms < z_erofs_zstd_nstrms;
+	     ++z_erofs_zstd_avail_strms) {
+		struct z_erofs_zstd *strm;
+
+		strm = kzalloc(sizeof(*strm), GFP_KERNEL);
+		if (!strm) {
+			z_erofs_zstd_exit();
+			return -ENOMEM;
+		}
+		spin_lock(&z_erofs_zstd_lock);
+		strm->next = z_erofs_zstd_head;
+		z_erofs_zstd_head = strm;
+		spin_unlock(&z_erofs_zstd_lock);
+	}
+	return 0;
+}
+
+int z_erofs_load_zstd_config(struct super_block *sb,
+			struct erofs_super_block *dsb, void *data, int size)
+{
+	static DEFINE_MUTEX(zstd_resize_mutex);
+	struct z_erofs_zstd_cfgs *zstd = data;
+	unsigned int dict_size, wkspsz;
+	struct z_erofs_zstd *strm, *head = NULL;
+	void *wksp;
+
+	if (!zstd || size < sizeof(struct z_erofs_zstd_cfgs) || zstd->format) {
+		erofs_err(sb, "unsupported zstd format, size=%u", size);
+		return -EINVAL;
+	}
+
+	if (zstd->windowlog > ilog2(Z_EROFS_ZSTD_MAX_DICT_SIZE) - 10) {
+		erofs_err(sb, "unsupported zstd window log %u", zstd->windowlog);
+		return -EINVAL;
+	}
+	dict_size = 1U << (zstd->windowlog + 10);
+
+	/* in case 2 z_erofs_load_zstd_config() race to avoid deadlock */
+	mutex_lock(&zstd_resize_mutex);
+	if (z_erofs_zstd_max_dictsize >= dict_size) {
+		mutex_unlock(&zstd_resize_mutex);
+		return 0;
+	}
+
+	/* 1. collect/isolate all streams for the following check */
+	while (z_erofs_zstd_avail_strms) {
+		struct z_erofs_zstd *n;
+
+		for (strm = z_erofs_isolate_strms(true); strm; strm = n) {
+			n = strm->next;
+			strm->next = head;
+			head = strm;
+			--z_erofs_zstd_avail_strms;
+		}
+	}
+
+	/* 2. walk each isolated stream and grow max dict_size if needed */
+	wkspsz = zstd_dstream_workspace_bound(dict_size);
+	for (strm = head; strm; strm = strm->next) {
+		wksp = kvmalloc(wkspsz, GFP_KERNEL);
+		if (!wksp)
+			break;
+		kvfree(strm->wksp);
+		strm->wksp = wksp;
+		strm->wkspsz = wkspsz;
+	}
+
+	/* 3. push back all to the global list and update max dict_size */
+	spin_lock(&z_erofs_zstd_lock);
+	DBG_BUGON(z_erofs_zstd_head);
+	z_erofs_zstd_head = head;
+	spin_unlock(&z_erofs_zstd_lock);
+	z_erofs_zstd_avail_strms = z_erofs_zstd_nstrms;
+	wake_up_all(&z_erofs_zstd_wq);
+	if (!strm)
+		z_erofs_zstd_max_dictsize = dict_size;
+	mutex_unlock(&zstd_resize_mutex);
+	return strm ? -ENOMEM : 0;
+}
+
+int z_erofs_zstd_decompress(struct z_erofs_decompress_req *rq,
+			    struct page **pgpl)
+{
+	const unsigned int nrpages_out =
+		PAGE_ALIGN(rq->pageofs_out + rq->outputsize) >> PAGE_SHIFT;
+	const unsigned int nrpages_in =
+		PAGE_ALIGN(rq->inputsize) >> PAGE_SHIFT;
+	zstd_dstream *stream;
+	struct super_block *sb = rq->sb;
+	unsigned int insz, outsz, pofs;
+	struct z_erofs_zstd *strm;
+	zstd_in_buffer in_buf = { NULL, 0, 0 };
+	zstd_out_buffer out_buf = { NULL, 0, 0 };
+	u8 *kin, *kout = NULL;
+	bool bounced = false;
+	int no = -1, ni = 0, j = 0, zerr, err;
+
+	/* 1. get the exact compressed size */
+	kin = kmap_local_page(*rq->in);
+	err = z_erofs_fixup_insize(rq, kin + rq->pageofs_in,
+			min_t(unsigned int, rq->inputsize,
+			      sb->s_blocksize - rq->pageofs_in));
+	if (err) {
+		kunmap_local(kin);
+		return err;
+	}
+
+	/* 2. get an available ZSTD context */
+	strm = z_erofs_isolate_strms(false);
+
+	/* 3. multi-call decompress */
+	insz = rq->inputsize;
+	outsz = rq->outputsize;
+	stream = zstd_init_dstream(z_erofs_zstd_max_dictsize, strm->wksp, strm->wkspsz);
+	if (!stream) {
+		err = -EIO;
+		goto failed_zinit;
+	}
+
+	pofs = rq->pageofs_out;
+	in_buf.size = min_t(u32, insz, PAGE_SIZE - rq->pageofs_in);
+	insz -= in_buf.size;
+	in_buf.src = kin + rq->pageofs_in;
+	do {
+		if (out_buf.size == out_buf.pos) {
+			if (++no >= nrpages_out || !outsz) {
+				erofs_err(sb, "insufficient space for decompressed data");
+				err = -EFSCORRUPTED;
+				break;
+			}
+
+			if (kout)
+				kunmap_local(kout);
+			out_buf.size = min_t(u32, outsz, PAGE_SIZE - pofs);
+			outsz -= out_buf.size;
+			if (!rq->out[no]) {
+				rq->out[no] = erofs_allocpage(pgpl, rq->gfp);
+				if (!rq->out[no]) {
+					kout = NULL;
+					err = -ENOMEM;
+					break;
+				}
+				set_page_private(rq->out[no],
+						 Z_EROFS_SHORTLIVED_PAGE);
+			}
+			kout = kmap_local_page(rq->out[no]);
+			out_buf.dst = kout + pofs;
+			out_buf.pos = 0;
+			pofs = 0;
+		}
+
+		if (in_buf.size == in_buf.pos && insz) {
+			if (++ni >= nrpages_in) {
+				erofs_err(sb, "invalid compressed data");
+				err = -EFSCORRUPTED;
+				break;
+			}
+
+			if (kout) /* unlike kmap(), take care of the orders */
+				kunmap_local(kout);
+			kunmap_local(kin);
+			in_buf.size = min_t(u32, insz, PAGE_SIZE);
+			insz -= in_buf.size;
+			kin = kmap_local_page(rq->in[ni]);
+			in_buf.src = kin;
+			in_buf.pos = 0;
+			bounced = false;
+			if (kout) {
+				j = (u8 *)out_buf.dst - kout;
+				kout = kmap_local_page(rq->out[no]);
+				out_buf.dst = kout + j;
+			}
+		}
+
+		/*
+		 * Handle overlapping: Use bounced buffer if the compressed
+		 * data is under processing; Or use short-lived pages from the
+		 * on-stack pagepool where pages share among the same request
+		 * and not _all_ inplace I/O pages are needed to be doubled.
+		 */
+		if (!bounced && rq->out[no] == rq->in[ni]) {
+			memcpy(strm->bounce, in_buf.src, in_buf.size);
+			in_buf.src = strm->bounce;
+			bounced = true;
+		}
+
+		for (j = ni + 1; j < nrpages_in; ++j) {
+			struct page *tmppage;
+
+			if (rq->out[no] != rq->in[j])
+				continue;
+			tmppage = erofs_allocpage(pgpl, rq->gfp);
+			if (!tmppage) {
+				err = -ENOMEM;
+				goto failed;
+			}
+			set_page_private(tmppage, Z_EROFS_SHORTLIVED_PAGE);
+			copy_highpage(tmppage, rq->in[j]);
+			rq->in[j] = tmppage;
+		}
+		zerr = zstd_decompress_stream(stream, &out_buf, &in_buf);
+		if (zstd_is_error(zerr) || (!zerr && outsz)) {
+			erofs_err(sb, "failed to decompress in[%u] out[%u]: %s",
+				  rq->inputsize, rq->outputsize,
+				  zerr ? zstd_get_error_name(zerr) : "unexpected end of stream");
+			err = -EFSCORRUPTED;
+			break;
+		}
+	} while (outsz || out_buf.pos < out_buf.size);
+failed:
+	if (kout)
+		kunmap_local(kout);
+failed_zinit:
+	kunmap_local(kin);
+	/* 4. push back ZSTD stream context to the global list */
+	spin_lock(&z_erofs_zstd_lock);
+	strm->next = z_erofs_zstd_head;
+	z_erofs_zstd_head = strm;
+	spin_unlock(&z_erofs_zstd_lock);
+	wake_up(&z_erofs_zstd_wq);
+	return err;
+}
diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h
index a03ec70ba6f2..4bc11602aac8 100644
--- a/fs/erofs/erofs_fs.h
+++ b/fs/erofs/erofs_fs.h
@@ -296,6 +296,7 @@ enum {
 	Z_EROFS_COMPRESSION_LZ4		= 0,
 	Z_EROFS_COMPRESSION_LZMA	= 1,
 	Z_EROFS_COMPRESSION_DEFLATE	= 2,
+	Z_EROFS_COMPRESSION_ZSTD	= 3,
 	Z_EROFS_COMPRESSION_MAX
 };
 #define Z_EROFS_ALL_COMPR_ALGS		((1 << Z_EROFS_COMPRESSION_MAX) - 1)
@@ -322,6 +323,15 @@ struct z_erofs_deflate_cfgs {
 	u8 reserved[5];
 } __packed;
 
+/* 6 bytes (+ length field = 8 bytes) */
+struct z_erofs_zstd_cfgs {
+	u8 format;
+	u8 windowlog;	/* windowLog - ZSTD_WINDOWLOG_ABSOLUTEMIN(10) */
+	u8 reserved[4];
+} __packed;
+
+#define Z_EROFS_ZSTD_MAX_DICT_SIZE	Z_EROFS_PCLUSTER_MAX_SIZE
+
 /*
  * bit 0 : COMPACTED_2B indexes (0 - off; 1 - on)
  *  e.g. for 4k logical cluster size, 4B if compacted 2B is off;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 63891d90e4b1..14988da62856 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -502,6 +502,14 @@ static inline int z_erofs_deflate_init(void) { return 0; }
 static inline int z_erofs_deflate_exit(void) { return 0; }
 #endif	/* !CONFIG_EROFS_FS_ZIP_DEFLATE */
 
+#ifdef CONFIG_EROFS_FS_ZIP_ZSTD
+int __init z_erofs_zstd_init(void);
+void z_erofs_zstd_exit(void);
+#else
+static inline int z_erofs_zstd_init(void) { return 0; }
+static inline int z_erofs_zstd_exit(void) { return 0; }
+#endif	/* !CONFIG_EROFS_FS_ZIP_ZSTD */
+
 #ifdef CONFIG_EROFS_FS_ONDEMAND
 int erofs_fscache_register_fs(struct super_block *sb);
 void erofs_fscache_unregister_fs(struct super_block *sb);
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index e9ab554c8471..e9bd1eee7b02 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -857,6 +857,10 @@ static int __init erofs_module_init(void)
 	if (err)
 		goto deflate_err;
 
+	err = z_erofs_zstd_init();
+	if (err)
+		goto zstd_err;
+
 	err = z_erofs_gbuf_init();
 	if (err)
 		goto gbuf_err;
@@ -882,6 +886,8 @@ static int __init erofs_module_init(void)
 zip_err:
 	z_erofs_gbuf_exit();
 gbuf_err:
+	z_erofs_zstd_exit();
+zstd_err:
 	z_erofs_deflate_exit();
 deflate_err:
 	z_erofs_lzma_exit();
@@ -901,6 +907,7 @@ static void __exit erofs_module_exit(void)
 
 	erofs_exit_sysfs();
 	z_erofs_exit_zip_subsystem();
+	z_erofs_zstd_exit();
 	z_erofs_deflate_exit();
 	z_erofs_lzma_exit();
 	erofs_exit_shrinker();
-- 
2.39.3
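
For reference, the windowlog handling in z_erofs_load_zstd_config() can be
sketched in userspace roughly as below.  This is only an illustration and
not part of the patch: all sketch_* names are hypothetical, and the 1 MiB
limit assumes Z_EROFS_PCLUSTER_MAX_SIZE is 1024 * 1024.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for Z_EROFS_ZSTD_MAX_DICT_SIZE (taken as 1 MiB). */
#define SKETCH_ZSTD_MAX_DICT_SIZE	(1024u * 1024u)
/* Bias of the on-disk field, i.e. ZSTD_WINDOWLOG_ABSOLUTEMIN (10). */
#define SKETCH_WINDOWLOG_MIN		10u

/* Integer log2 of a non-zero value; a stand-in for the kernel's ilog2(). */
static unsigned int sketch_ilog2(uint32_t v)
{
	unsigned int l = 0;

	assert(v != 0);
	while (v >>= 1)
		++l;
	return l;
}

/*
 * Map the on-disk windowlog field to a dictionary (window) size in bytes.
 * Returns 0 if the window would exceed the supported maximum.
 */
static uint32_t sketch_zstd_dict_size(uint8_t windowlog)
{
	if (windowlog > sketch_ilog2(SKETCH_ZSTD_MAX_DICT_SIZE) -
			SKETCH_WINDOWLOG_MIN)
		return 0;
	return UINT32_C(1) << (windowlog + SKETCH_WINDOWLOG_MIN);
}
```

Under that assumption, a windowlog of 0 maps to a 1 KiB window, 10 maps to
the 1 MiB maximum, and anything larger is rejected.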