[PATCH RFC v4 2/3] iomap: use BIO_COMPLETE_IN_TASK for dropbehind writeback

Tal Zussman posted 3 patches 1 week, 1 day ago
Posted by Tal Zussman 1 week, 1 day ago
Set BIO_COMPLETE_IN_TASK on iomap writeback bios when
IOMAP_IOEND_DONTCACHE is set. This ensures that bi_end_io runs in task
context, where folio_end_dropbehind() can safely invalidate folios.
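As a rough illustration of the intended semantics, the userspace C model below runs a bio's completion handler inline unless the bio carries a complete-in-task flag. A flagged bio is instead queued for a worker that later runs the handler in task context. Every name here (MODEL_BIO_COMPLETE_IN_TASK, struct model_bio, model_bio_endio(), model_run_deferred()) is hypothetical. This is not the block layer implementation from this series, and it ignores locking and real interrupt context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only. The flag name mirrors the patch; the bit value,
 * struct layout and helpers are hypothetical and do not match the kernel.
 */
#define MODEL_BIO_COMPLETE_IN_TASK (1u << 0)

struct model_bio;
typedef void (*model_end_io_t)(struct model_bio *bio, bool in_task);

struct model_bio {
	unsigned int flags;
	model_end_io_t end_io;
	struct model_bio *next;      /* link on the deferred list */
	bool completed;
	bool completed_in_task;      /* recorded by the end_io handler */
};

/* Bios waiting for the task-context worker, kept in FIFO order. */
static struct model_bio *deferred_head, *deferred_tail;

/* Called from "interrupt context": run the handler inline or defer it. */
static void model_bio_endio(struct model_bio *bio)
{
	if (bio->flags & MODEL_BIO_COMPLETE_IN_TASK) {
		bio->next = NULL;
		if (deferred_tail)
			deferred_tail->next = bio;
		else
			deferred_head = bio;
		deferred_tail = bio;
		return;
	}
	bio->end_io(bio, false);
}

/* The worker: drains deferred bios and runs their handlers in task context. */
static size_t model_run_deferred(void)
{
	size_t n = 0;

	while (deferred_head) {
		struct model_bio *bio = deferred_head;

		deferred_head = bio->next;
		if (!deferred_head)
			deferred_tail = NULL;
		bio->end_io(bio, true);
		n++;
	}
	return n;
}

/* Stand-in for a dropbehind writeback completion that must not run in irq. */
static void model_writeback_end_io(struct model_bio *bio, bool in_task)
{
	bio->completed = true;
	bio->completed_in_task = in_task;
}
```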

With the bio layer now handling task-context deferral generically, XFS
no longer needs to route DONTCACHE ioends through its completion
workqueue for page cache invalidation. Remove the DONTCACHE check from
xfs_ioend_needs_wq_completion().
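The model below captures the predicate change. It covers only the flag tests visible in the xfs_aops.c hunk; any other checks the real xfs_ioend_needs_wq_completion() performs are left out. The MODEL_* names and bit values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; only the flag names come from the patch. */
#define MODEL_IOEND_UNWRITTEN  (1u << 0)
#define MODEL_IOEND_SHARED     (1u << 1)
#define MODEL_IOEND_DONTCACHE  (1u << 2)

/* Before: DONTCACHE also forced completion onto the XFS workqueue. */
static bool needs_wq_completion_before(unsigned int io_flags)
{
	if (io_flags & (MODEL_IOEND_UNWRITTEN | MODEL_IOEND_SHARED))
		return true;
	if (io_flags & MODEL_IOEND_DONTCACHE)
		return true;
	return false;
}

/*
 * After: only the extent-manipulating cases need the workqueue; DONTCACHE
 * bios reach task context through the bio-level flag instead.
 */
static bool needs_wq_completion_after(unsigned int io_flags)
{
	return io_flags & (MODEL_IOEND_UNWRITTEN | MODEL_IOEND_SHARED);
}
```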

Signed-off-by: Tal Zussman <tz2294@columbia.edu>
---
 fs/iomap/ioend.c  | 2 ++
 fs/xfs/xfs_aops.c | 4 ----
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
index e4d57cb969f1..6b8375d11cc0 100644
--- a/fs/iomap/ioend.c
+++ b/fs/iomap/ioend.c
@@ -113,6 +113,8 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
 			       GFP_NOFS, &iomap_ioend_bioset);
 	bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
 	bio->bi_write_hint = wpc->inode->i_write_hint;
+	if (ioend_flags & IOMAP_IOEND_DONTCACHE)
+		bio_set_flag(bio, BIO_COMPLETE_IN_TASK);
 	wbc_init_bio(wpc->wbc, bio);
 	wpc->nr_folios = 0;
 	return iomap_init_ioend(wpc->inode, bio, pos, ioend_flags);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 76678814f46f..0d469b91377d 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -510,10 +510,6 @@ xfs_ioend_needs_wq_completion(
 	if (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED))
 		return true;
 
-	/* Page cache invalidation cannot be done in irq context. */
-	if (ioend->io_flags & IOMAP_IOEND_DONTCACHE)
-		return true;
-
 	return false;
 }
 

-- 
2.39.5
Re: [PATCH RFC v4 2/3] iomap: use BIO_COMPLETE_IN_TASK for dropbehind writeback
Posted by Dave Chinner 1 week, 1 day ago
On Wed, Mar 25, 2026 at 02:43:01PM -0400, Tal Zussman wrote:
> Set BIO_COMPLETE_IN_TASK on iomap writeback bios when
> IOMAP_IOEND_DONTCACHE is set. This ensures that bi_end_io runs in task
> context, where folio_end_dropbehind() can safely invalidate folios.
> 
> With the bio layer now handling task-context deferral generically, XFS
> no longer needs to route DONTCACHE ioends through its completion
> workqueue for page cache invalidation. Remove the DONTCACHE check from
> xfs_ioend_needs_wq_completion().
> 
> Signed-off-by: Tal Zussman <tz2294@columbia.edu>
> ---
>  fs/iomap/ioend.c  | 2 ++
>  fs/xfs/xfs_aops.c | 4 ----
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
> index e4d57cb969f1..6b8375d11cc0 100644
> --- a/fs/iomap/ioend.c
> +++ b/fs/iomap/ioend.c
> @@ -113,6 +113,8 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
>  			       GFP_NOFS, &iomap_ioend_bioset);
>  	bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
>  	bio->bi_write_hint = wpc->inode->i_write_hint;
> +	if (ioend_flags & IOMAP_IOEND_DONTCACHE)
> +		bio_set_flag(bio, BIO_COMPLETE_IN_TASK);
>  	wbc_init_bio(wpc->wbc, bio);
>  	wpc->nr_folios = 0;
>  	return iomap_init_ioend(wpc->inode, bio, pos, ioend_flags);
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 76678814f46f..0d469b91377d 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -510,10 +510,6 @@ xfs_ioend_needs_wq_completion(
>  	if (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED))
>  		return true;
>  
> -	/* Page cache invalidation cannot be done in irq context. */
> -	if (ioend->io_flags & IOMAP_IOEND_DONTCACHE)
> -		return true;
> -
>  	return false;
>  }

Ok, so higher layers can set it.

At this point, I'd suggest that we should not be making random
one-off changes to the iomap and filesystem layers like this just
for one operation that needs deferred IO completion work. This needs
to be considered from the overall perspective of how we defer
completion work - there are lots of different paths through
filesystems and/or iomap that require/use task deferral for IO
completion. We want them all to use the same mechanism - splitting
deferral between multiple layers depending on IO type is not a
particularly nice thing to be doing...

-Dave.
-- 
Dave Chinner
dgc@kernel.org
Re: [PATCH RFC v4 2/3] iomap: use BIO_COMPLETE_IN_TASK for dropbehind writeback
Posted by Christoph Hellwig 6 days, 17 hours ago
On Thu, Mar 26, 2026 at 07:34:45AM +1100, Dave Chinner wrote:
> At this point, I'd suggest that we should not be making random
> one-off changes to the iomap and filesystem layers like this just
> for one operation that needs deferred IO completion work. This needs
> to be considered from the overall perspective of how we defer
> completion work - there are lots of different paths through
> filesystems and/or iomap that require/use task deferral for IO
> completion. We want them all to use the same mechanism - splitting
> deferral between multiple layers depending on IO type is not a
> particularly nice thing to be doing...

Yes and no.  The XFS/iomap write completions need special handling
for merging operations, using different workqueues, and also the
serialization provided by the per-inode list.
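The hypothetical sketch below makes the "merging" and "per-inode list" points concrete. Completed ioends are collected on a per-inode list, and the worker sorts them by file offset and merges contiguous ones with identical flags, so that fewer completion operations (for example, unwritten extent conversions) are needed. The struct and function names are invented for illustration. This is not the XFS/iomap code, and locking and the actual transaction work are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a completed ioend queued on a per-inode list. */
struct model_ioend {
	long long offset;   /* file offset in bytes */
	long long size;     /* length in bytes */
	unsigned int flags; /* ioends only merge if their flags match */
};

static int cmp_offset(const void *a, const void *b)
{
	const struct model_ioend *x = a, *y = b;

	return (x->offset > y->offset) - (x->offset < y->offset);
}

/*
 * Worker-side processing of a drained per-inode list: sort by offset and
 * merge neighbours that are contiguous and have identical flags.
 * Merging happens in place; returns the number of ioends left.
 */
static size_t model_merge_ioends(struct model_ioend *v, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;
	qsort(v, n, sizeof(*v), cmp_offset);
	for (size_t i = 1; i < n; i++) {
		struct model_ioend *last = &v[out];

		if (last->flags == v[i].flags &&
		    last->offset + last->size == v[i].offset)
			last->size += v[i].size;
		else
			v[++out] = v[i];
	}
	return out + 1;
}
```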

Everything that just needs a dumb user context should be the same,
though.  And this mechanism should work just fine for the T10 PI
checksums.  It does not currently work for the deferral to user context
on error used by the fserror reporting, but it should be adaptable to
that by also allowing an I/O completion to be deferred from an already
running end_io handler, although that might get ugly.
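Deferring from an already running end_io handler might look roughly like the hypothetical model below. The handler finishes the common success case inline, and only on error re-queues the bio so that the handler runs again in task context, where sleeping work such as error reporting would be allowed. model_bio_defer_to_task() and the other names are invented for this sketch; no such block layer helper is implied to exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_bio;
typedef void (*model_end_io_t)(struct model_bio *bio, bool in_task);

struct model_bio {
	int status;              /* 0 on success, negative errno-style on error */
	model_end_io_t end_io;
	struct model_bio *next;
	bool error_reported;     /* set once the error is handled in task context */
};

static struct model_bio *task_queue;   /* LIFO is fine for this sketch */

/* Re-queue a bio whose handler is already running, to be re-run in a task. */
static void model_bio_defer_to_task(struct model_bio *bio)
{
	bio->next = task_queue;
	task_queue = bio;
}

static void model_run_task_queue(void)
{
	while (task_queue) {
		struct model_bio *bio = task_queue;

		task_queue = bio->next;
		bio->end_io(bio, true);
	}
}

/*
 * Success is handled inline in "interrupt context"; only the error path
 * bounces to task context, where reporting would be allowed to sleep.
 */
static void model_end_io(struct model_bio *bio, bool in_task)
{
	if (bio->status == 0)
		return;                 /* fast path: nothing more to do */
	if (!in_task) {
		model_bio_defer_to_task(bio);
		return;
	}
	bio->error_reported = true;     /* stands in for sleeping error reporting */
}
```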

It should work really well for other places that defer bio completions
like the erofs decompression handler that recently came up, and it will
be very useful to implement actually working REQ_NOWAIT support for
file system writes.  So yes, I think we need to look more at the whole
picture, and I think this is a good building block considering the
whole picture.  I don't think we can converge on just a single mechanism,
but having only a few, generic ones is good.
Re: [PATCH RFC v4 2/3] iomap: use BIO_COMPLETE_IN_TASK for dropbehind writeback
Posted by Gao Xiang 6 days, 17 hours ago
Hi Christoph,

On 2026/3/27 14:08, Christoph Hellwig wrote:
> On Thu, Mar 26, 2026 at 07:34:45AM +1100, Dave Chinner wrote:
>> At this point, I'd suggest that we should not be making random
>> one-off changes to the iomap and filesystem layers like this just
>> for one operation that needs deferred IO completion work. This needs
>> to be considered from the overall perspective of how we defer
>> completion work - there are lots of different paths through
>> filesystems and/or iomap that require/use task deferral for IO
>> completion. We want them all to use the same mechanism - splitting
>> deferral between multiple layers depending on IO type is not a
>> particularly nice thing to be doing...
> 
> Yes and no.  The XFS/iomap write completions need special handling
> for merging operations, using different workqueues, and also the
> serialization provided by the per-inode list.
> 
> Everything that just needs a dumb user context should be the same,
> though.  And this mechanism should work just fine for the T10 PI
> checksums.  It does not currently work for the deferral to user context
> on error used by the fserror reporting, but it should be adaptable to
> that by also allowing an I/O completion to be deferred from an already
> running end_io handler, although that might get ugly.
> 
> It should work really well for other places that defer bio completions
> like the erofs decompression handler that recently came up, and it will

I noticed this work, but the current EROFS decompression
typically has two latency-sensitive cases:

  - dm-verity calls the EROFS completion: in that case this
    work fits well, since dm-verity already incurs Merkle tree
    verification latency and we just don't want to add more
    scheduling latency with another workqueue;

  - EROFS is used directly: in that case we still need process
    context to decompress, but due to Android latency
    requirements we really need per-CPU RT threads instead,
    otherwise it causes serious regressions too; but I'm not
    sure that case can be covered by this work, since workqueues
    don't support RT threads and I guess the generic block layer
    won't want to deal with that either.

Thanks,
Gao Xiang

> be very useful to implement actually working REQ_NOWAIT support for
> file system writes.  So yes, I think we need to look more at the whole
> picture, and I think this is a good building block considering the
> whole picture.  I don't think we can converge on just a single mechanism,
> but having only a few, generic ones is good.
Re: [PATCH RFC v4 2/3] iomap: use BIO_COMPLETE_IN_TASK for dropbehind writeback
Posted by Christoph Hellwig 6 days, 17 hours ago
On Fri, Mar 27, 2026 at 02:24:02PM +0800, Gao Xiang wrote:
>  - EROFS is used directly: in that case we still need process
>    context to decompress, but due to Android latency
>    requirements we really need per-CPU RT threads instead,
>    otherwise it causes serious regressions too; but I'm not
>    sure that case can be covered by this work, since workqueues
>    don't support RT threads and I guess the generic block layer
>    won't want to deal with that either.

All of the I/O completions should be latency sensitive.  So I think it
would be great if you could help out here with the requirements and
implementation.
Re: [PATCH RFC v4 2/3] iomap: use BIO_COMPLETE_IN_TASK for dropbehind writeback
Posted by Gao Xiang 6 days, 17 hours ago

On 2026/3/27 14:27, Christoph Hellwig wrote:
> On Fri, Mar 27, 2026 at 02:24:02PM +0800, Gao Xiang wrote:
>>   - EROFS is used directly: in that case we still need process
>>     context to decompress, but due to Android latency
>>     requirements we really need per-CPU RT threads instead,
>>     otherwise it causes serious regressions too; but I'm not
>>     sure that case can be covered by this work, since workqueues
>>     don't support RT threads and I guess the generic block layer
>>     won't want to deal with that either.
> 
> All of the I/O completions should be latency sensitive.  So I think it
> would be great if you could help out here with the requirements and
> implementation.

Yes, especially for sync read completion. Our requirements can
be outlined as:

   - a mark that makes the whole bio completion run in task
     context, so that we are guaranteed task context on
     completion and don't need to worry about that;

   - another per-CPU RT thread flag (or similar) attached to
     a bio or something else, so that bio completion can be
     handled by per-CPU RT threads instead of workqueues.

If both are met, I think that would be very helpful for
cleaning up our internal codebase at least.
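One way to read these two requirements is as two bio-level markers that together select the completion context. The sketch below is purely hypothetical: neither MODEL_BIO_COMPLETE_RT nor the dispatch function exists in the block layer. It encodes the choice directly. With no marker the bio completes inline, the task marker uses a normal workqueue, and the RT marker selects a per-CPU RT thread.

```c
#include <assert.h>

/* Hypothetical flag bits modeling the two requested markers. */
#define MODEL_BIO_COMPLETE_IN_TASK  (1u << 0) /* must complete in task context */
#define MODEL_BIO_COMPLETE_RT       (1u << 1) /* complete on a per-CPU RT thread */

enum model_ctx {
	MODEL_CTX_IRQ,        /* run the handler inline where the device completed */
	MODEL_CTX_WORKQUEUE,  /* generic task context at normal priority */
	MODEL_CTX_RT_THREAD,  /* per-CPU real-time kthread */
};

/*
 * An RT kthread is itself task context, so the RT marker implies the
 * task marker and takes precedence over the plain workqueue.
 */
static enum model_ctx model_pick_completion_ctx(unsigned int flags)
{
	if (flags & MODEL_BIO_COMPLETE_RT)
		return MODEL_CTX_RT_THREAD;
	if (flags & MODEL_BIO_COMPLETE_IN_TASK)
		return MODEL_CTX_WORKQUEUE;
	return MODEL_CTX_IRQ;
}
```

Treating the RT marker as a refinement that only applies together with the task marker would be an equally reasonable design. The version above simply avoids a combination that has no meaning.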

Thanks,
Gao Xiang
Re: [PATCH RFC v4 2/3] iomap: use BIO_COMPLETE_IN_TASK for dropbehind writeback
Posted by Matthew Wilcox 1 week, 1 day ago
On Wed, Mar 25, 2026 at 02:43:01PM -0400, Tal Zussman wrote:
> Set BIO_COMPLETE_IN_TASK on iomap writeback bios when
> IOMAP_IOEND_DONTCACHE is set. This ensures that bi_end_io runs in task
> context, where folio_end_dropbehind() can safely invalidate folios.
> 
> With the bio layer now handling task-context deferral generically, XFS
> no longer needs to route DONTCACHE ioends through its completion
> workqueue for page cache invalidation. Remove the DONTCACHE check from
> xfs_ioend_needs_wq_completion().
> 
> Signed-off-by: Tal Zussman <tz2294@columbia.edu>
> ---
>  fs/iomap/ioend.c  | 2 ++
>  fs/xfs/xfs_aops.c | 4 ----
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/iomap/ioend.c b/fs/iomap/ioend.c
> index e4d57cb969f1..6b8375d11cc0 100644
> --- a/fs/iomap/ioend.c
> +++ b/fs/iomap/ioend.c
> @@ -113,6 +113,8 @@ static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
>  			       GFP_NOFS, &iomap_ioend_bioset);
>  	bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
>  	bio->bi_write_hint = wpc->inode->i_write_hint;
> +	if (ioend_flags & IOMAP_IOEND_DONTCACHE)
> +		bio_set_flag(bio, BIO_COMPLETE_IN_TASK);
>  	wbc_init_bio(wpc->wbc, bio);
>  	wpc->nr_folios = 0;
>  	return iomap_init_ioend(wpc->inode, bio, pos, ioend_flags);

Can't we delete IOMAP_IOEND_DONTCACHE, and just do:

	if (folio_test_dropbehind(folio))
		bio_set_flag(&ioend->io_bio, BIO_COMPLETE_IN_TASK);

It'd need to move down a few lines in iomap_add_to_ioend() to after
bio_add_folio() succeeds.
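The userspace model below sketches the suggested per-folio check, using hypothetical stand-ins for folio_test_dropbehind() and bio_add_folio(). The flag is set only after the folio has actually been added. The likely reason is that a folio which does not fit ends up in a freshly allocated bio, which should then carry the mark instead of the old one. This illustrates the ordering argument only; it is not the iomap_add_to_ioend() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; only the shape of the check follows the thread. */
#define MODEL_BIO_COMPLETE_IN_TASK (1u << 0)
#define MODEL_BIO_MAX_FOLIOS 4

struct model_folio {
	bool dropbehind;   /* models folio_test_dropbehind() */
};

struct model_bio {
	unsigned int flags;
	unsigned int nr_folios;
};

/* Models bio_add_folio(): fails once the bio is full. */
static bool model_bio_add_folio(struct model_bio *bio, struct model_folio *folio)
{
	(void)folio;
	if (bio->nr_folios == MODEL_BIO_MAX_FOLIOS)
		return false;
	bio->nr_folios++;
	return true;
}

/*
 * Per-folio decision: mark the bio only after the folio was added, so a
 * folio that spills into a new bio marks that new bio instead.
 */
static bool model_add_to_ioend(struct model_bio *bio, struct model_folio *folio)
{
	if (!model_bio_add_folio(bio, folio))
		return false;   /* caller would allocate a new ioend/bio and retry */
	if (folio->dropbehind)
		bio->flags |= MODEL_BIO_COMPLETE_IN_TASK;
	return true;
}
```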
Re: [PATCH RFC v4 2/3] iomap: use BIO_COMPLETE_IN_TASK for dropbehind writeback
Posted by Christoph Hellwig 6 days, 17 hours ago
On Wed, Mar 25, 2026 at 08:21:28PM +0000, Matthew Wilcox wrote:
> > +	if (ioend_flags & IOMAP_IOEND_DONTCACHE)
> > +		bio_set_flag(bio, BIO_COMPLETE_IN_TASK);
> >  	wbc_init_bio(wpc->wbc, bio);
> >  	wpc->nr_folios = 0;
> >  	return iomap_init_ioend(wpc->inode, bio, pos, ioend_flags);
> 
> Can't we delete IOMAP_IOEND_DONTCACHE, and just do:
> 
> 	if (folio_test_dropbehind(folio))
> 		bio_set_flag(&ioend->io_bio, BIO_COMPLETE_IN_TASK);
> 
> It'd need to move down a few lines in iomap_add_to_ioend() to after
> bio_add_folio() succeeds.

Yes, that sounds sensible.