[PATCH v2] ext4: don't order data when zeroing unwritten or delayed block

Zhang Yi posted 1 patch 1 month, 2 weeks ago
fs/ext4/inode.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
Posted by Zhang Yi 1 month, 2 weeks ago
From: Zhang Yi <yi.zhang@huawei.com>

When zeroing out a written partial block, it is necessary to order the
data to prevent exposing stale data on disk. However, if the buffer is
unwritten or delayed, it is not allocated as written, so ordering the
data is not required. This can prevent strange and unnecessary ordered
writes when appending data across a region within a block.
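
The rule described above can be modelled in userspace as a single
predicate. This is only a sketch: struct bh_model and
zero_range_needs_ordering() are hypothetical names, not kernel
interfaces. The two flags stand in for buffer_unwritten(bh) and
buffer_delay(bh), and order_data_mode stands in for the result of
ext4_should_order_data(inode).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the buffer_head state bits the patch tests. */
struct bh_model {
	bool unwritten;	/* models buffer_unwritten(bh) */
	bool delay;	/* models buffer_delay(bh) */
};

/*
 * Return true when a zeroed partial block should be added to the
 * ordered data list. Only a block that is already mapped as written
 * can expose stale on-disk data, so unwritten and delayed buffers
 * are skipped.
 */
static bool zero_range_needs_ordering(const struct bh_model *bh,
				      bool order_data_mode)
{
	return !bh->unwritten && !bh->delay && order_data_mode;
}
```

With this model, a written buffer in ordered mode returns true, while an
unwritten or delayed buffer returns false regardless of the mode.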

Assume we have a 2K unwritten file on a filesystem with 4K blocksize,
and a buffered write from 3K to 4K. Before this patch,
__ext4_block_zero_page_range() would add the range [2K,3K) to the
ordered range, and then the JBD2 commit process would write back this
block. However, that writeback does nothing since the block is not
mapped as written; the folio is redirtied and written back again
through the normal writeback process.
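
The byte range in the example can be worked out with a little
arithmetic. The sketch below is a userspace illustration of that
calculation only, not the ext4 code path: struct byte_range and
tail_zero_range() are hypothetical names, and the model assumes the gap
to zero runs from the old end of file up to the start of the new write,
clamped to the end of the block that contains the old end of file.

```c
#include <stdint.h>

/* Hypothetical half-open byte range [from, to). */
struct byte_range {
	uint64_t from;
	uint64_t to;
};

/*
 * Compute the part of the old tail block that lies between the old
 * file size and the start of an appending write. An empty range
 * (from == to) means there is nothing to zero.
 */
static struct byte_range tail_zero_range(uint64_t old_size,
					 uint64_t write_pos,
					 uint64_t blocksize)
{
	struct byte_range r = { old_size, old_size };
	uint64_t block_end;

	/* No partial tail block, or the write does not leave a gap. */
	if (old_size % blocksize == 0 || write_pos <= old_size)
		return r;

	/* End of the block that contains the old end of file. */
	block_end = (old_size / blocksize + 1) * blocksize;
	r.to = write_pos < block_end ? write_pos : block_end;
	return r;
}
```

For the case above, tail_zero_range(2048, 3072, 4096) yields
[2048, 3072), matching the [2K,3K) range in the description.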

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2e79b09fe2f0..f2d70c9af446 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4109,9 +4109,13 @@ static int __ext4_block_zero_page_range(handle_t *handle,
 	if (ext4_should_journal_data(inode)) {
 		err = ext4_dirty_journalled_data(handle, bh);
 	} else {
-		err = 0;
 		mark_buffer_dirty(bh);
-		if (ext4_should_order_data(inode))
+		/*
+		 * Only the written block requires ordered data to prevent
+		 * exposing stale data.
+		 */
+		if (!buffer_unwritten(bh) && !buffer_delay(bh) &&
+		    ext4_should_order_data(inode))
 			err = ext4_jbd2_inode_add_write(handle, inode, from,
 					length);
 	}
-- 
2.52.0
Re: [PATCH v2] ext4: don't order data when zeroing unwritten or delayed block
Posted by Theodore Ts'o 1 week, 3 days ago
On Tue, 23 Dec 2025 09:19:27 +0800, Zhang Yi wrote:
> When zeroing out a written partial block, it is necessary to order the
> data to prevent exposing stale data on disk. However, if the buffer is
> unwritten or delayed, it is not allocated as written, so ordering the
> data is not required. This can prevent strange and unnecessary ordered
> writes when appending data across a region within a block.
> 
> Assume we have a 2K unwritten file on a filesystem with 4K blocksize,
> and a buffered write from 3K to 4K. Before this patch,
> __ext4_block_zero_page_range() would add the range [2K,3K) to the
> ordered range, and then the JBD2 commit process would write back this
> block. However, that writeback does nothing since the block is not
> mapped as written; the folio is redirtied and written back again
> through the normal writeback process.
> 
> [...]

Applied, thanks!

[1/1] ext4: don't order data when zeroing unwritten or delayed block
      commit: 154922b34da9770223d9883ac6976635a786b5ba

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>
Re: [PATCH v2] ext4: don't order data when zeroing unwritten or delayed block
Posted by Baokun Li 1 month, 1 week ago
On 2025-12-23 09:19, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> When zeroing out a written partial block, it is necessary to order the
> data to prevent exposing stale data on disk. However, if the buffer is
> unwritten or delayed, it is not allocated as written, so ordering the
> data is not required. This can prevent strange and unnecessary ordered
> writes when appending data across a region within a block.
>
> Assume we have a 2K unwritten file on a filesystem with 4K blocksize,
> and a buffered write from 3K to 4K. Before this patch,
> __ext4_block_zero_page_range() would add the range [2K,3K) to the
> ordered range, and then the JBD2 commit process would write back this
> block. However, that writeback does nothing since the block is not
> mapped as written; the folio is redirtied and written back again
> through the normal writeback process.
>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

Makes sense. Feel free to add:

Reviewed-by: Baokun Li <libaokun1@huawei.com>

> ---
>  fs/ext4/inode.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2e79b09fe2f0..f2d70c9af446 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4109,9 +4109,13 @@ static int __ext4_block_zero_page_range(handle_t *handle,
>  	if (ext4_should_journal_data(inode)) {
>  		err = ext4_dirty_journalled_data(handle, bh);
>  	} else {
> -		err = 0;
>  		mark_buffer_dirty(bh);
> -		if (ext4_should_order_data(inode))
> +		/*
> +		 * Only the written block requires ordered data to prevent
> +		 * exposing stale data.
> +		 */
> +		if (!buffer_unwritten(bh) && !buffer_delay(bh) &&
> +		    ext4_should_order_data(inode))
>  			err = ext4_jbd2_inode_add_write(handle, inode, from,
>  					length);
>  	}