RFC

This is a different approach compared to [1]. Instead of
using blk plug API to batch writeback bios, we just keep
submitting them and track the availability of done/idle requests
(we still use a pool of requests, to put a constraint on
memory usage). The intuition is that blk plug API is good
for sequential IO patterns, but zram writeback is more
likely to use random IO patterns.

I only did minimal testing so far (in a VM). More testing
(on real H/W) is needed, any help is highly appreciated.

[1] https://lore.kernel.org/linux-kernel/20251118073000.1928107-1-senozhatsky@chromium.org

v3 -> v4:
- do not use blk plug API

Sergey Senozhatsky (6):
  zram: introduce writeback bio batching
  zram: add writeback batch size device attr
  zram: take write lock in wb limit store handlers
  zram: drop wb_limit_lock
  zram: rework bdev block allocation
  zram: read slot block idx under slot lock

 drivers/block/zram/zram_drv.c | 470 ++++++++++++++++++++++++++--------
 drivers/block/zram/zram_drv.h |   2 +-
 2 files changed, 364 insertions(+), 108 deletions(-)

-- 
2.52.0.rc1.455.g30608eb744-goog
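[ For reference, a rough sketch of the batching scheme described above.
  The field names (num_inflight, done_reqs, done_wait) match the hunks
  quoted later in this thread, but the struct names and exact layout here
  are guesses, not the actual patch. ]

---
struct zram_wb_req {
	struct page		*page;		/* decompressed data to write back */
	struct list_head	entry;		/* on ->idle_reqs or ->done_reqs */
	struct bio		bio;		/* pre-allocated writeback bio */
};

struct zram_wb_ctl {
	struct list_head	idle_reqs;	/* free requests, ready to submit */
	struct list_head	done_reqs;	/* completed, awaiting post-processing */
	spinlock_t		done_lock;
	atomic_t		num_inflight;	/* submitted, not yet completed */
	wait_queue_head_t	done_wait;
};

/*
 * bio end_io callback: park the finished request on ->done_reqs and wake
 * the writeback loop, which recycles it (and drops ->num_inflight) in
 * zram_complete_done_reqs().
 */
static void zram_wb_end_io(struct bio *bio)
{
	struct zram_wb_req *req = container_of(bio, struct zram_wb_req, bio);
	struct zram_wb_ctl *wb_ctl = bio->bi_private;
	unsigned long flags;

	spin_lock_irqsave(&wb_ctl->done_lock, flags);
	list_add_tail(&req->entry, &wb_ctl->done_reqs);
	spin_unlock_irqrestore(&wb_ctl->done_lock, flags);

	wake_up(&wb_ctl->done_wait);
}
---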
On Fri, 21 Nov 2025 00:21:20 +0900, Sergey Senozhatsky wrote:
> This is a different approach compared to [1]. Instead of
> using blk plug API to batch writeback bios, we just keep
> submitting them and track the availability of done/idle requests
> (we still use a pool of requests, to put a constraint on
> memory usage). The intuition is that blk plug API is good
> for sequential IO patterns, but zram writeback is more
> likely to use random IO patterns.
>
> I only did minimal testing so far (in a VM). More testing
> (on real H/W) is needed, any help is highly appreciated.

I conducted a test on an NVMe host. When all requests were random,
this fix was indeed a bit faster than the previous one.

before:
real    0m0.261s
user    0m0.000s
sys     0m0.243s

real    0m0.260s
user    0m0.000s
sys     0m0.244s

real    0m0.259s
user    0m0.000s
sys     0m0.243s

after:
real    0m0.322s
user    0m0.000s
sys     0m0.214s

real    0m0.326s
user    0m0.000s
sys     0m0.206s

real    0m0.325s
user    0m0.000s
sys     0m0.215s

This result is something to be happy about. However, I'm also quite
curious about the test results on devices like UFS, which have
relatively little internal memory.
On (25/11/21 15:14), Yuwen Chen wrote:
> On Fri, 21 Nov 2025 00:21:20 +0900, Sergey Senozhatsky wrote:
> > This is a different approach compared to [1]. Instead of
> > using blk plug API to batch writeback bios, we just keep
> > submitting them and track the availability of done/idle requests
> > (we still use a pool of requests, to put a constraint on
> > memory usage). The intuition is that blk plug API is good
> > for sequential IO patterns, but zram writeback is more
> > likely to use random IO patterns.
> >
> > I only did minimal testing so far (in a VM). More testing
> > (on real H/W) is needed, any help is highly appreciated.
>
> I conducted a test on an NVMe host. When all requests were random,
> this fix was indeed a bit faster than the previous one.

Is "before" the blk-plug based approach and "after" this new approach?

> before:
> real    0m0.261s
> user    0m0.000s
> sys     0m0.243s
>
> real    0m0.260s
> user    0m0.000s
> sys     0m0.244s
>
> real    0m0.259s
> user    0m0.000s
> sys     0m0.243s
>
> after:
> real    0m0.322s
> user    0m0.000s
> sys     0m0.214s
>
> real    0m0.326s
> user    0m0.000s
> sys     0m0.206s
>
> real    0m0.325s
> user    0m0.000s
> sys     0m0.215s

Hmm, that's less than was anticipated.
On Fri, 21 Nov 2025 16:32:27 +0900, Sergey Senozhatsky wrote:
> Is "before" the blk-plug based approach and "after" this new approach?

Sorry, I got the before and after mixed up.

In addition, I also have some related questions:

1. Will page fault exceptions be delayed during the writeback processing?

2. Since the loop device uses a work queue to handle requests, when
the system load is relatively high, will it have a relatively large
impact on the latency of page fault exceptions? Is there any way to solve
this problem?
On (25/11/21 15:44), Yuwen Chen wrote:
> On Fri, 21 Nov 2025 16:32:27 +0900, Sergey Senozhatsky wrote:
> > Is "before" the blk-plug based approach and "after" this new approach?
>
> Sorry, I got the before and after mixed up.

No problem. I wonder if the effect is more visible on larger data sets.
0.3 second sounds like a very short write. In my VM tests I couldn't get
more than 2 inflight requests at a time, I guess because decompression
was much slower than IO. I wonder how many inflight requests you had in
your tests.

> In addition, I also have some related questions:
>
> 1. Will page fault exceptions be delayed during the writeback processing?

I don't think our reads are blocked by writes.

> 2. Since the loop device uses a work queue to handle requests, when
> the system load is relatively high, will it have a relatively large
> impact on the latency of page fault exceptions? Is there any way to solve
> this problem?

I think page-fault latency of a written-back page is expected to be
higher, that's a trade-off that we agree on. Off the top of my head,
I don't think we can do anything about it.

Is loop device always used for writeback targets?
On Fri, 21 Nov 2025 16:58:41 +0900, Sergey Senozhatsky wrote:
> No problem. I wonder if the effect is more visible on larger data sets.
> 0.3 second sounds like a very short write. In my VM tests I couldn't get
> more than 2 inflight requests at a time, I guess because decompression
> was much slower than IO. I wonder how many inflight requests you had in
> your tests.
I used the following code for testing here, and the result was 32.
code:
@@ -983,6 +983,7 @@ static int zram_writeback_slots(struct zram *zram,
struct zram_pp_slot *pps;
int ret = 0, err = 0;
u32 index = 0;
+ int inflight = 0;
while ((pps = select_pp_slot(ctl))) {
spin_lock(&zram->wb_limit_lock);
@@ -993,6 +994,9 @@ static int zram_writeback_slots(struct zram *zram,
}
spin_unlock(&zram->wb_limit_lock);
+ if (inflight < atomic_read(&wb_ctl->num_inflight))
+ inflight = atomic_read(&wb_ctl->num_inflight);
+
while (!req) {
req = zram_select_idle_req(wb_ctl);
if (req)
@@ -1074,6 +1078,7 @@ next:
ret = err;
}
+ pr_err("%s: inflight max: %d\n", __func__, inflight);
return ret;
}
log:
[3741949.842927] zram: zram_writeback_slots: inflight max: 32
Changing ZRAM_WB_REQ_CNT to 64 didn't shorten the overall time.
> I think page-fault latency of a written-back page is expected to be
> higher, that's a trade-off that we agree on. Off the top of my head,
> I don't think we can do anything about it.
>
> Is loop device always used for writeback targets?
On the Android platform, currently only the loop device is supported as
the backend for writeback, possibly for security reasons. I noticed that
EROFS has implemented a CONFIG_EROFS_FS_BACKED_BY_FILE to reduce this
latency. I think ZRAM might also be able to do this.
On (25/11/21 16:23), Yuwen Chen wrote:
> I used the following code for testing here, and the result was 32.
>
> code:
> @@ -983,6 +983,7 @@ static int zram_writeback_slots(struct zram *zram,
> struct zram_pp_slot *pps;
> int ret = 0, err = 0;
> u32 index = 0;
> + int inflight = 0;
>
> while ((pps = select_pp_slot(ctl))) {
> spin_lock(&zram->wb_limit_lock);
> @@ -993,6 +994,9 @@ static int zram_writeback_slots(struct zram *zram,
> }
> spin_unlock(&zram->wb_limit_lock);
>
> + if (inflight < atomic_read(&wb_ctl->num_inflight))
> + inflight = atomic_read(&wb_ctl->num_inflight);
> +
> while (!req) {
> req = zram_select_idle_req(wb_ctl);
> if (req)
> @@ -1074,6 +1078,7 @@ next:
> ret = err;
> }
>
> + pr_err("%s: inflight max: %d\n", __func__, inflight);
> return ret;
> }
I think this will always give you 32 (or your current batch size limit),
just because of the way it works - we first deplete all ->idle (reaching
max ->inflight) and only then complete finished requests (dropping
->inflight).
I had a version of the patch that had a different main loop. It would
always first complete finished requests. I think this one will give an
accurate ->inflight number.
---
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index ab0785878069..398609e9d061 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -999,13 +999,6 @@ static int zram_writeback_slots(struct zram *zram,
}
while (!req) {
- req = zram_select_idle_req(wb_ctl);
- if (req)
- break;
-
- wait_event(wb_ctl->done_wait,
- !list_empty(&wb_ctl->done_reqs));
-
err = zram_complete_done_reqs(zram, wb_ctl);
/*
* BIO errors are not fatal, we continue and simply
@@ -1017,6 +1010,13 @@ static int zram_writeback_slots(struct zram *zram,
*/
if (err)
ret = err;
+
+ req = zram_select_idle_req(wb_ctl);
+ if (req)
+ break;
+
+ wait_event(wb_ctl->done_wait,
+ !list_empty(&wb_ctl->done_reqs));
}
if (blk_idx == INVALID_BDEV_BLOCK) {
---
> > I think page-fault latency of a written-back page is expected to be
> > higher, that's a trade-off that we agree on. Off the top of my head,
> > I don't think we can do anything about it.
> >
> > Is loop device always used for writeback targets?
>
> On the Android platform, currently only the loop device is supported as
> the backend for writeback, possibly for security reasons. I noticed that
> EROFS has implemented a CONFIG_EROFS_FS_BACKED_BY_FILE to reduce this
> latency. I think ZRAM might also be able to do this.
I see. Do you use S/W or H/W compression?
On 2025/11/21 17:12, Sergey Senozhatsky wrote:
> On (25/11/21 16:23), Yuwen Chen wrote:

..

>>> I think page-fault latency of a written-back page is expected to be
>>> higher, that's a trade-off that we agree on. Off the top of my head,
>>> I don't think we can do anything about it.
>>>
>>> Is loop device always used for writeback targets?
>>
>> On the Android platform, currently only the loop device is supported as
>> the backend for writeback, possibly for security reasons. I noticed that
>> EROFS has implemented a CONFIG_EROFS_FS_BACKED_BY_FILE to reduce this
>> latency. I think ZRAM might also be able to do this.
>
> I see. Do you use S/W or H/W compression?

No, I'm pretty sure it's impossible for zram to access
file I/Os without another thread context (e.g. workqueue),
especially for write I/Os, which is unlike erofs:

EROFS can do this because EROFS is a specific filesystem, you
could see it's a separate fs, and it can only read (no
write context) backing files in erofs and/or other fses,
which is much like vfs/overlayfs read_iter() directly
going into the backing fses without nested contexts.
(Even if loop is used, it will create its own thread
contexts with workqueues, which is safe.)

On the other hand, zram/loop can act as a virtual block
device which is rather different, which means you could
format an ext4 filesystem and backing another ext4/btrfs,
like this:

zram(ext4) -> backing ext4/btrfs

It's unsafe (in addition to GFP_NOIO allocation
restriction) since zram cannot manage those ext4/btrfs
existing contexts:

- Take one detailed example, if the upper zram ext4
assigns current->journal_info = xxx, and submit_bio() to
zram, which will confuse the backing ext4 since it should
assume current->journal_info == NULL, so the virtual block
devices need another thread context to isolate those two
different uncontrolled contexts.

So I don't think it's feasible for block drivers to act
like this, especially mixing with writing to backing fses
operations.

Thanks,
Gao Xiang
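[ A minimal sketch of the context-isolation point above; wb_file_work and
  wb_file_work_fn are hypothetical names, not zram or loop code. Deferring
  the backing-file write to a workqueue makes it run in the kworker's own
  task context, where current->journal_info is NULL, instead of inheriting
  whatever the upper filesystem stored in the submitting task. ]

---
struct wb_file_work {
	struct work_struct	work;
	struct file		*backing_file;	/* hypothetical backing file */
	struct page		*page;
	loff_t			pos;
};

static void wb_file_work_fn(struct work_struct *work)
{
	struct wb_file_work *fw = container_of(work, struct wb_file_work, work);
	loff_t pos = fw->pos;

	/*
	 * Runs on a kworker: current->journal_info is NULL here, so the
	 * backing filesystem's journalling code sees a clean context.
	 */
	kernel_write(fw->backing_file, page_address(fw->page), PAGE_SIZE, &pos);
	kfree(fw);
}
---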
On (25/11/21 20:21), Gao Xiang wrote:
> > > > I think page-fault latency of a written-back page is expected to be
> > > > higher, that's a trade-off that we agree on. Off the top of my head,
> > > > I don't think we can do anything about it.
> > > >
> > > > Is loop device always used for writeback targets?
> > >
> > > On the Android platform, currently only the loop device is supported as
> > > the backend for writeback, possibly for security reasons. I noticed that
> > > EROFS has implemented a CONFIG_EROFS_FS_BACKED_BY_FILE to reduce this
> > > latency. I think ZRAM might also be able to do this.
> >
> > I see. Do you use S/W or H/W compression?
>
> No, I'm pretty sure it's impossible for zram to access
> file I/Os without another thread context (e.g. workqueue),
> especially for write I/Os, which is unlike erofs:
>
> EROFS can do this because EROFS is a specific filesystem, you
> could see it's a separate fs, and it can only read (no
> write context) backing files in erofs and/or other fses,
> which is much like vfs/overlayfs read_iter() directly
> going into the backing fses without nested contexts.
> (Even if loop is used, it will create its own thread
> contexts with workqueues, which is safe.)
>
> On the other hand, zram/loop can act as a virtual block
> device which is rather different, which means you could
> format an ext4 filesystem and backing another ext4/btrfs,
> like this:
>
> zram(ext4) -> backing ext4/btrfs
>
> It's unsafe (in addition to GFP_NOIO allocation
> restriction) since zram cannot manage those ext4/btrfs
> existing contexts:
>
> - Take one detailed example, if the upper zram ext4
> assigns current->journal_info = xxx, and submit_bio() to
> zram, which will confuse the backing ext4 since it should
> assume current->journal_info == NULL, so the virtual block
> devices need another thread context to isolate those two
> different uncontrolled contexts.
>
> So I don't think it's feasible for block drivers to act
> like this, especially mixing with writing to backing fses
> operations.

Sorry, I don't completely understand your point, but backing
device is never expected to have any fs on it. So from your
email:

> zram(ext4) -> backing ext4/btrfs

This is not a valid configuration, as far as I'm concerned.
Unless I'm missing your point.
On 2025/11/22 18:07, Sergey Senozhatsky wrote:
> On (25/11/21 20:21), Gao Xiang wrote:
>>>>> I think page-fault latency of a written-back page is expected to be
>>>>> higher, that's a trade-off that we agree on. Off the top of my head,
>>>>> I don't think we can do anything about it.
>>>>>
>>>>> Is loop device always used for writeback targets?
>>>>
>>>> On the Android platform, currently only the loop device is supported as
>>>> the backend for writeback, possibly for security reasons. I noticed that
>>>> EROFS has implemented a CONFIG_EROFS_FS_BACKED_BY_FILE to reduce this
>>>> latency. I think ZRAM might also be able to do this.
>>>
>>> I see. Do you use S/W or H/W compression?
>>
>> No, I'm pretty sure it's impossible for zram to access
>> file I/Os without another thread context (e.g. workqueue),
>> especially for write I/Os, which is unlike erofs:
>>
>> EROFS can do this because EROFS is a specific filesystem, you
>> could see it's a separate fs, and it can only read (no
>> write context) backing files in erofs and/or other fses,
>> which is much like vfs/overlayfs read_iter() directly
>> going into the backing fses without nested contexts.
>> (Even if loop is used, it will create its own thread
>> contexts with workqueues, which is safe.)
>>
>> On the other hand, zram/loop can act as a virtual block
>> device which is rather different, which means you could
>> format an ext4 filesystem and backing another ext4/btrfs,
>> like this:
>>
>> zram(ext4) -> backing ext4/btrfs
>>
>> It's unsafe (in addition to GFP_NOIO allocation
>> restriction) since zram cannot manage those ext4/btrfs
>> existing contexts:
>>
>> - Take one detailed example, if the upper zram ext4
>> assigns current->journal_info = xxx, and submit_bio() to
>> zram, which will confuse the backing ext4 since it should
>> assume current->journal_info == NULL, so the virtual block
>> devices need another thread context to isolate those two
>> different uncontrolled contexts.
>>
>> So I don't think it's feasible for block drivers to act
>> like this, especially mixing with writing to backing fses
>> operations.
>
> Sorry, I don't completely understand your point, but backing
> device is never expected to have any fs on it. So from your
> email:

zram(ext4) means the zram device itself is formatted as ext4.

>
>> zram(ext4) -> backing ext4/btrfs
>
> This is not a valid configuration, as far as I'm concerned.
> Unless I'm missing your point.

Why is it not valid? zram can be used as a regular virtual
block device, formatted with any fs, and then mounted.

Thanks,
Gao Xiang
On (25/11/22 20:24), Gao Xiang wrote:
> >
> > > zram(ext4) -> backing ext4/btrfs
> >
> > This is not a valid configuration, as far as I'm concerned.
> > Unless I'm missing your point.
>
> Why is it not valid? zram can be used as a regular virtual
> block device, formatted with any fs, and then mounted.

If you want to move data between two filesystems, then just
mount both devices and cp/mv data between them. zram is not
going to do that for you, zram writeback is for a different
purpose.
On 2025/11/23 08:22, Sergey Senozhatsky wrote:
> On (25/11/22 20:24), Gao Xiang wrote:
>>>
>>>> zram(ext4) -> backing ext4/btrfs
>>>
>>> This is not a valid configuration, as far as I'm concerned.
>>> Unless I'm missing your point.
>>
>> Why is it not valid? zram can be used as a regular virtual
>> block device, formatted with any fs, and then mounted.
>
> If you want to move data between two filesystems, then just
> mount both devices and cp/mv data between them. zram is not
> going to do that for you, zram writeback is for a different
> purpose.

No, I know what zram writeback is, and I was definitely not
suggesting using a zram writeback device to mount something
(if you are interested, just check out my first reply, it's
already clear. It also explains why loop devices have needed
a workqueue or a kthread since pre-v2.6 in the first place,
for the same reason).

I want to stop here because it's none of my business.

Thanks,
Gao Xiang
On (25/11/22 20:24), Gao Xiang wrote:
> zram(ext4) means the zram device itself is formatted as ext4.
>
> >
> > > zram(ext4) -> backing ext4/btrfs
> >
> > This is not a valid configuration, as far as I'm concerned.
> > Unless I'm missing your point.
>
> Why is it not valid? zram can be used as a regular virtual
> block device, formatted with any fs, and then mounted.

I thought you were talking about the backing device being
ext4/btrfs. Sorry, I don't have enough context/knowledge
to understand what you're getting at. zram has been doing
writeback for ages, I really don't know what you mean by
"to act like this".
On 2025/11/22 21:43, Sergey Senozhatsky wrote:
> On (25/11/22 20:24), Gao Xiang wrote:
>> zram(ext4) means the zram device itself is formatted as ext4.
>>
>>>
>>>> zram(ext4) -> backing ext4/btrfs
>>>
>>> This is not a valid configuration, as far as I'm concerned.
>>> Unless I'm missing your point.
>>
>> Why is it not valid? zram can be used as a regular virtual
>> block device, formatted with any fs, and then mounted.
>
> I thought you were talking about the backing device being
> ext4/btrfs. Sorry, I don't have enough context/knowledge
> to understand what you're getting at. zram has been doing
> writeback for ages, I really don't know what you mean by
> "to act like this".

I mean, if zram is formatted as ext4 and then mounted, and
there is a backing file which is also on another ext4, you'd
need a workqueue to do writeback I/Os (or a loop device to
transit them); was that the original question raised by
Yuwen? If it's backed by a physical device rather than a
file in a filesystem, such a potential problem doesn't exist.

Thanks,
Gao Xiang
On (25/11/22 22:09), Gao Xiang wrote:
> > I thought you were talking about the backing device being
> > ext4/btrfs. Sorry, I don't have enough context/knowledge
> > to understand what you're getting at. zram has been doing
> > writeback for ages, I really don't know what you mean by
> > "to act like this".
>
> I mean, if zram is formatted as ext4 and then mounted, and
> there is a backing file which is also on another ext4, you'd
> need a workqueue to do writeback I/Os (or a loop device to
> transit them); was that the original question raised by
> Yuwen?

We take pages of data from zram0 and write them straight to
the backing device. Those writes don't go through vfs/fs so
fs on the backing device will simply be corrupted, as far as
I can tell. This is not the intended use case for zram
writeback.
On 2025/11/23 08:08, Sergey Senozhatsky wrote:
> On (25/11/22 22:09), Gao Xiang wrote:
>>> I thought you were talking about the backing device being
>>> ext4/btrfs. Sorry, I don't have enough context/knowledge
>>> to understand what you're getting at. zram has been doing
>>> writeback for ages, I really don't know what you mean by
>>> "to act like this".
>>
>> I mean, if zram is formatted as ext4 and then mounted, and
>> there is a backing file which is also on another ext4, you'd
>> need a workqueue to do writeback I/Os (or a loop device to
>> transit them); was that the original question raised by
>> Yuwen?
>
> We take pages of data from zram0 and write them straight to
> the backing device. Those writes don't go through vfs/fs so
> fs on the backing device will simply be corrupted, as far as
> I can tell. This is not the intended use case for zram
> writeback.

I'm pretty sure you don't understand what I meant. I won't
reply to this anymore, good luck.

Thanks,
Gao Xiang
On 2025/11/21 20:21, Gao Xiang wrote:
>
>
> On 2025/11/21 17:12, Sergey Senozhatsky wrote:
>> On (25/11/21 16:23), Yuwen Chen wrote:
>
> ..
>
>
>>>> I think page-fault latency of a written-back page is expected to be
>>>> higher, that's a trade-off that we agree on. Off the top of my head,
>>>> I don't think we can do anything about it.
>>>>
>>>> Is loop device always used for writeback targets?
>>>
>>> On the Android platform, currently only the loop device is supported as
>>> the backend for writeback, possibly for security reasons. I noticed that
>>> EROFS has implemented a CONFIG_EROFS_FS_BACKED_BY_FILE to reduce this
>>> latency. I think ZRAM might also be able to do this.
>>
>> I see. Do you use S/W or H/W compression?
>
> No, I'm pretty sure it's impossible for zram to access
> file I/Os without another thread context (e.g. workqueue),
> especially for write I/Os, which is unlike erofs:
>
> EROFS can do this because EROFS is a specific filesystem, you
> could see it's a separate fs, and it can only read (no
> write context) backing files in erofs and/or other fses,
> which is much like vfs/overlayfs read_iter() directly
> going into the backing fses without nested contexts.
> (Even if loop is used, it will create its own thread
> contexts with workqueues, which is safe.)
>
> On the other hand, zram/loop can act as a virtual block
> device which is rather different, which means you could
> format an ext4 filesystem and backing another ext4/btrfs,
> like this:
>
> zram(ext4) -> backing ext4/btrfs
>
> It's unsafe (in addition to GFP_NOIO allocation
> restriction) since zram cannot manage those ext4/btrfs
> existing contexts:
>
> - Take one detailed example, if the upper zram ext4
> assigns current->journal_info = xxx, and submit_bio() to
> zram, which will confuse the backing ext4 since it should
> assume current->journal_info == NULL, so the virtual block
> devices need another thread context to isolate those two
> different uncontrolled contexts.
>
> So I don't think it's feasible for block drivers to act
> like this, especially mixing with writing to backing fses
> operations.
In other words, a fs can claim it does file-backed mounts
without a new context only if:

- Its own implementation can be safely applied to any
other kernel filesystem (e.g. it shouldn't change
current->journal_info or do context save/restore before
handing over); and its own implementation can safely
mount itself with file-backed mounts.

So it's up to filesystem-specific internals to make sure
it can work like this (for example, for ext4 on erofs,
ext4 still uses loop to mount). The virtual block device
layer knows nothing about what the upper filesystem did
before execution passes through it, so it's unsafe to
work like this.
Thanks,
Gao Xiang