[v2] iomap: allow partial folio write with iomap_folio_state

[PATCH v2 4/4] iomap: don't abandon the whole thing with iomap_folio_state

Posted by alexjlzheng@gmail.com 1 month, 3 weeks ago

From: Jinliang Zheng <alexjlzheng@tencent.com>

With iomap_folio_state, uptodate state is tracked at block granularity,
so read_folio can correctly handle a partially uptodate folio.

Therefore, when a short write occurs, accept the block-aligned part of
it instead of rejecting the entire write.

Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
---
 fs/iomap/buffered-io.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index f80386a57d37..19bf879f3333 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -873,6 +873,25 @@ static int iomap_write_begin(struct iomap_iter *iter,
 	return status;
 }
 
+static int iomap_trim_tail_partial(struct inode *inode, loff_t pos,
+		size_t copied, struct folio *folio)
+{
+	struct iomap_folio_state *ifs = folio->private;
+	unsigned block_size, last_blk, last_blk_bytes;
+
+	if (!ifs || !copied)
+		return 0;
+
+	block_size = 1 << inode->i_blkbits;
+	last_blk = offset_in_folio(folio, pos + copied - 1) >> inode->i_blkbits;
+	last_blk_bytes = (pos + copied) & (block_size - 1);
+
+	if (!ifs_block_is_uptodate(ifs, last_blk))
+		copied -= min(copied, last_blk_bytes);
+
+	return copied;
+}
+
 static int __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 		size_t copied, struct folio *folio)
 {
@@ -886,12 +905,15 @@ static int __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
 	 * read_folio might come in and destroy our partial write.
 	 *
 	 * Do the simplest thing and just treat any short write to a
-	 * non-uptodate page as a zero-length write, and force the caller to
-	 * redo the whole thing.
+	 * non-uptodate block as a zero-length write, and force the caller to
+	 * redo the write starting from that block.
 	 */
-	if (unlikely(copied < len && !folio_test_uptodate(folio)))
-		return 0;
-	iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
+	if (unlikely(copied < len && !folio_test_uptodate(folio))) {
+		copied = iomap_trim_tail_partial(inode, pos, copied, folio);
+		if (!copied)
+			return 0;
+	}
+	iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), copied);
 	iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied);
 	filemap_dirty_folio(inode->i_mapping, folio);
 	return copied;
-- 
2.49.0

Re: [PATCH v2 4/4] iomap: don't abandon the whole thing with iomap_folio_state

Posted by Christoph Hellwig 1 month, 3 weeks ago

Where "the whole thing" is the current iteration in the write loop.
Can you spell this out a bit better?

Also please include the rationale why you are changing the logic
here in the commit log.

Re: [PATCH v2 4/4] iomap: don't abandon the whole thing with iomap_folio_state

Posted by Jinliang Zheng 1 month, 3 weeks ago

On Mon, 11 Aug 2025 03:41:39 -0700, Christoph Hellwig wrote:
> Where "the whole thing" is the current iteration in the write loop.
> Can you spell this out a bit better?

Hahaha, I was also confused about "the whole thing". I guess it refers to the
partial write to a folio. The phrase comes from the comment in
__iomap_write_end():

static bool __iomap_write_end(struct inode *inode, loff_t pos, size_t len,
		size_t copied, struct folio *folio)
{
	flush_dcache_folio(folio);

	/*
	 * The blocks that were entirely written will now be uptodate, so we
	 * don't have to worry about a read_folio reading them and overwriting a
	 * partial write.  However, if we've encountered a short write and only
	 * partially written into a block, it will not be marked uptodate, so a
	 * read_folio might come in and destroy our partial write.
	 *
	 * Do the simplest thing and just treat any short write to a
	 * non-uptodate page as a zero-length write, and force the caller to
	 * redo the whole thing.
                ^^^^^^^^^^^^^^^ <------------------ look look look, it's here :)
	 */
	if (unlikely(copied < len && !folio_test_uptodate(folio)))
		return false;
	iomap_set_range_uptodate(folio, offset_in_folio(folio, pos), len);
	iomap_set_range_dirty(folio, offset_in_folio(folio, pos), copied);
	filemap_dirty_folio(inode->i_mapping, folio);
	return true;
}

> 
> Also please include the rationale why you are changing the logic
> here in the commit log.

Hahaha, what I want to express is that we no longer need to define a partial
write at folio granularity; block granularity is more appropriate.

Please forgive my poor English. :-<

thanks,
Jinliang Zheng :)
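
To make the block-granularity rule concrete, here is a minimal userspace
sketch of the tail-trim arithmetic from the patch. It is not kernel code:
struct demo_folio_state, demo_trim_tail_partial() and the folio_size/blkbits
parameters are hypothetical stand-ins for iomap_folio_state,
ifs_block_is_uptodate(), folio_size() and inode->i_blkbits, and it assumes a
power-of-two folio size so the offset within the folio can be taken with a
mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for iomap_folio_state: one uptodate flag per block. */
struct demo_folio_state {
	const bool *uptodate;	/* uptodate[i] is true if block i is uptodate */
};

/*
 * Userspace model of the tail trim. pos/copied describe the bytes that were
 * actually copied; folio_size/blkbits describe the geometry (assumptions,
 * passed explicitly here instead of being read from a folio and an inode).
 * Returns the number of bytes that can be accepted.
 */
static size_t demo_trim_tail_partial(const struct demo_folio_state *fs,
		uint64_t pos, size_t copied, size_t folio_size,
		unsigned int blkbits)
{
	size_t block_size = (size_t)1 << blkbits;
	size_t last_blk, last_blk_bytes;

	/* The mask-based offset below only works for power-of-two sizes. */
	assert(folio_size && (folio_size & (folio_size - 1)) == 0);

	/* No per-block state or nothing copied: accept nothing. */
	if (!fs || !copied)
		return 0;

	/* Index of the block that holds the last copied byte. */
	last_blk = (size_t)((pos + copied - 1) & (folio_size - 1)) >> blkbits;
	/* Bytes of the copy that spill past the last block boundary. */
	last_blk_bytes = (size_t)((pos + copied) & (block_size - 1));

	/* Drop the tail only if it landed in a block that is not uptodate. */
	if (!fs->uptodate[last_blk])
		copied -= (copied < last_blk_bytes) ? copied : last_blk_bytes;

	return copied;
}
```

With a 4096-byte folio, 1024-byte blocks and no uptodate blocks, a short
write of 2500 bytes at pos 0 is trimmed to 2048 bytes, so the caller only has
to redo the write from the start of the third block. A 300-byte short write
at pos 512 ends inside the same non-uptodate block it started in and is still
treated as zero-length.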

[PATCH v2 1/4] iomap: make sure iomap_adjust_read_range() are aligned with block_size
[PATCH v2 2/4] iomap: move iter revert case out of the unwritten branch
[PATCH v2 3/4] iomap: make iomap_write_end() return the number of written length again
[PATCH v2 4/4] iomap: don't abandon the whole thing with iomap_folio_state