Hi all,
This RFC series introduces a new interface to allow zram compression
backends to manage their own streams, in addition to the existing
per-CPU stream model.
Currently, zram manages compression contexts via preallocated per-CPU
streams, which strictly limits concurrency to the number of online CPUs.
In contrast, hardware accelerators specialized for page compression
generally process PAGE_SIZE payloads (e.g. 4K) using standard
algorithms. These devices expose the limitations of the current model
for the following reasons:
- These devices utilize a hardware queue to batch requests. A typical
queue depth (e.g., 256) far exceeds the number of available CPUs.
- These devices are asymmetric. Submission is generally fast and
asynchronous, but completion incurs latency.
- Some devices only support compression requests, leaving decompression
to be handled by software.
The current "one-size-fits-all" design lacks the flexibility to support
these devices, preventing effective offloading of compression work.
This series proposes a hybrid approach. While maintaining full backward
compatibility with existing backends, this series introduces a new set
of operations, op->{get, put}_stream(), for backends that wish to manage
their own streams. This allows the backend to handle contention
internally and dynamically select an execution path for the acquired
streams. A new flag is also introduced to indicate this capability at
runtime. zram_write_page() now prefers streams managed by the backend if
a bio is considered asynchronous.
Some design decisions are as follows.
1. The proposed get_stream() does not take gfp_t flags to keep the
interface minimal. By design, backends are fully responsible for
allocation safety.
2. The default per-CPU streams now also serve as the synchronous path
for backends.
3. The recompression path currently relies on the default per-cpu
streams. This is a trade-off, since recompression is primarily for
memory saving, and hardware accelerators typically prioritize
throughput over compression ratio.
4. Backends must implement internal locking if required.
This RFC series focuses on the stream management interface required for
accelerator backends, laying the groundwork for batched asynchronous
operations in zram. Since I cannot verify this on specific accelerators
at this moment, a PoC patch that simulates this behavior in software is
included to verify new stream operations without requiring specific
accelerators. The next step would be to add a non-blocking interface to
fully utilize their concurrency, and allow backends to be built as
separate modules. Any feedback would be greatly appreciated.
Signed-off-by: Jihan LIN <linjh22s@gmail.com>
---
Changes in v2:
- Decouple locking from per-CPU streams by introducing struct
percpu_zstrm (PATCH 2/5)
- Refactor zcomp-managed streams to use struct managed_zstrm (PATCH 3/5)
- Add PoC zcomp-managed streams for lz4 backend (PATCH 5/5, only for
demonstration)
- Rebase to v7.0-rc2
- Link to v1: https://lore.kernel.org/r/20260204-b4_zcomp_stream-v1-0-35c06ce1d332@gmail.com
---
Jihan LIN (5):
zram: Rename zcomp_strm_{init, free}()
zram: Separate the lock from zcomp_strm
zram: Introduce zcomp-managed streams
zram: Use zcomp-managed streams for async write requests
zram: Add lz4 PoC for zcomp-managed streams
drivers/block/zram/backend_lz4.c | 464 +++++++++++++++++++++++++++++++++++++--
drivers/block/zram/zcomp.c | 85 +++++--
drivers/block/zram/zcomp.h | 35 ++-
drivers/block/zram/zram_drv.c | 29 ++-
4 files changed, 562 insertions(+), 51 deletions(-)
---
base-commit: 11439c4635edd669ae435eec308f4ab8a0804808
change-id: 20260202-b4_zcomp_stream-7e9f7884e128
Best regards,
--
Jihan LIN <linjh22s@gmail.com>
A quick question:

On (26/03/09 12:23), Jihan LIN via B4 Relay wrote:
> This RFC series focuses on the stream management interface required for
> accelerator backends, laying the groundwork for batched asynchronous
> operations in zram. Since I cannot verify this on specific accelerators
> at this moment, a PoC patch that simulates this behavior in software is
> included to verify new stream operations without requiring specific
> accelerators. The next step would be to add a non-blocking interface to
> fully utilize their concurrency, and allow backends to be built as
> separate modules. Any feedback would be greatly appreciated.

So does such a hardware exist? This series is a little too complex, so
it better solve some real problem, so to speak, before we start looking
into it.
Hi Sergey,

On Wed, Mar 11, 2026 at 4:52 PM Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> So does such a hardware exist?

Yes, there are a few examples, as far as I know. LZ4 is relevant here
because it is widely used, and decompression is already very fast on the
CPU, so a compression-only accelerator makes sense.

HiSilicon has hisi_zip in its server SoCs. For LZ4, hisi_zip offloads
compression only, with decompression handled in software[1]. I also
found some out-of-tree examples, such as qpace_drv for SM8845 in
OnePlus's tree[2] and mtk_hwz for some MediaTek SoCs from Samsung[3].

These examples suggest a similar setup: queue-based hardware, while
software (or synchronous paths) is still used for some decompression
paths. These are the kinds of devices I had in mind, and deeper hardware
queues do not fit well into the current model. I don't have any of these
on hand yet, but this is the kind of use case behind this series.

[1]: https://lore.kernel.org/all/20260117023435.1616703-1-huangchenghai2@huawei.com/
[2]: https://github.com/OnePlusOSS/android_kernel_oneplus_sm8845/tree/ecfc67b9e933937140df7a1cf39060de8dbd11be/drivers/block/zram
[3]: https://github.com/samsung-mediatek/android_kernel_device_modules-6.12/tree/4749bfe7783c045f53c50160e05b67a9a2acc3f4/drivers/misc/mediatek/mtk_zram