[v1] ext4: don't order data when zeroing unwritten or delayed block

[PATCH] ext4: don't order data when zeroing unwritten or delayed block

Posted by Zhang Yi 1 month, 2 weeks ago

From: Zhang Yi <yi.zhang@huawei.com>

When zeroing out a written partial block, it is necessary to order the
data to prevent exposing stale data on disk. However, if the buffer is
unwritten or delayed, it is not allocated as written, so ordering the
data is not required. This can prevent strange and unnecessary ordered
writes when appending data across a region within a block.

Assume we have a 2K unwritten file on a filesystem with 4K blocksize,
and buffered write from 3K to 4K. Before this patch,
__ext4_block_zero_page_range() would add the range [2k,3k) to the
ordered range, and then the JBD2 commit process would write back this
block. However, it does nothing since the block is not mapped, this
folio will be redirtied and written back again through the normal write
back process.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fa579e857baf..fc16a89903b9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4104,9 +4104,13 @@ static int __ext4_block_zero_page_range(handle_t *handle,
 	if (ext4_should_journal_data(inode)) {
 		err = ext4_dirty_journalled_data(handle, bh);
 	} else {
-		err = 0;
 		mark_buffer_dirty(bh);
-		if (ext4_should_order_data(inode))
+		/*
+		 * Only the written block requires ordered data to prevent
+		 * exposing stale data.
+		 */
+		if (!buffer_unwritten(bh) && !buffer_delay(bh) &&
+		    ext4_should_order_data(inode))
 			err = ext4_jbd2_inode_add_write(handle, inode, from,
 					length);
 	}
-- 
2.52.0
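
For readers following along outside the kernel tree, the condition the
patch introduces can be modelled in a few lines of userspace C. This is
only a sketch: the enum names and model_needs_ordered_write() are
hypothetical stand-ins for buffer_unwritten(), buffer_delay() and
ext4_should_order_data(), not kernel definitions. It captures just the
decision of whether the zeroed range is added to the ordered list.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for buffer_head state bits (not the kernel's). */
enum model_bh_state {
	MODEL_BH_MAPPED    = 1u << 0,
	MODEL_BH_UNWRITTEN = 1u << 1,	/* allocated, but not marked written */
	MODEL_BH_DELAY     = 1u << 2,	/* delayed allocation, no block yet */
};

/* Hypothetical stand-in for the inode's data journalling mode. */
enum model_data_mode {
	MODEL_DATA_JOURNAL,
	MODEL_DATA_ORDERED,
	MODEL_DATA_WRITEBACK,
};

/*
 * Models the check after the patch: the zeroed range is ordered only
 * when the inode uses ordered mode and the buffer is neither unwritten
 * nor delayed. In data=journal mode the patched function takes a
 * separate branch, so this model simply returns false there.
 */
static bool model_needs_ordered_write(enum model_data_mode mode,
				      unsigned int bh_state)
{
	if (mode != MODEL_DATA_ORDERED)
		return false;
	return !(bh_state & (MODEL_BH_UNWRITTEN | MODEL_BH_DELAY));
}
```

Under this model, the scenario from the commit message (an unwritten
block in ordered mode) returns false, while a written block in ordered
mode still returns true.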

Re: [PATCH] ext4: don't order data when zeroing unwritten or delayed block

Posted by Jan Kara 1 month, 2 weeks ago

On Mon 22-12-25 09:31:36, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> When zeroing out a written partial block, it is necessary to order the
> data to prevent exposing stale data on disk. However, if the buffer is
> unwritten or delayed, it is not allocated as written, so ordering the
> data is not required. This can prevent strange and unnecessary ordered
> writes when appending data across a region within a block.
> 
> Assume we have a 2K unwritten file on a filesystem with 4K blocksize,
> and buffered write from 3K to 4K. Before this patch,
> __ext4_block_zero_page_range() would add the range [2k,3k) to the
> ordered range, and then the JBD2 commit process would write back this
> block. However, it does nothing since the block is not mapped, this
							^^^ by this you
mean that the block is unwritten, don't you?

> folio will be redirtied and written back again through the normal write
> back process.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

The patch looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inode.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index fa579e857baf..fc16a89903b9 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -4104,9 +4104,13 @@ static int __ext4_block_zero_page_range(handle_t *handle,
>  	if (ext4_should_journal_data(inode)) {
>  		err = ext4_dirty_journalled_data(handle, bh);
>  	} else {
> -		err = 0;
>  		mark_buffer_dirty(bh);
> -		if (ext4_should_order_data(inode))
> +		/*
> +		 * Only the written block requires ordered data to prevent
> +		 * exposing stale data.
> +		 */
> +		if (!buffer_unwritten(bh) && !buffer_delay(bh) &&
> +		    ext4_should_order_data(inode))
>  			err = ext4_jbd2_inode_add_write(handle, inode, from,
>  					length);
>  	}
> -- 
> 2.52.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Re: [PATCH] ext4: don't order data when zeroing unwritten or delayed block

Posted by Zhang Yi 1 month, 2 weeks ago

On 12/22/2025 6:48 PM, Jan Kara wrote:
> On Mon 22-12-25 09:31:36, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> When zeroing out a written partial block, it is necessary to order the
>> data to prevent exposing stale data on disk. However, if the buffer is
>> unwritten or delayed, it is not allocated as written, so ordering the
>> data is not required. This can prevent strange and unnecessary ordered
>> writes when appending data across a region within a block.
>>
>> Assume we have a 2K unwritten file on a filesystem with 4K blocksize,
>> and buffered write from 3K to 4K. Before this patch,
>> __ext4_block_zero_page_range() would add the range [2k,3k) to the
>> ordered range, and then the JBD2 commit process would write back this
>> block. However, it does nothing since the block is not mapped, this
> 							^^^ by this you
> mean that the block is unwritten, don't you?
> 

Yes, that is exactly what I wanted to express. The term "not mapped" might
indeed be unclear and prone to misunderstanding. I will revise it to "the
block is not mapped as written" in v2.

Thanks,
Yi.

>> folio will be redirtied and written back again through the normal write
>> back process.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> The patch looks good. Feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> 
> 								Honza
> 
>> ---
>>  fs/ext4/inode.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index fa579e857baf..fc16a89903b9 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -4104,9 +4104,13 @@ static int __ext4_block_zero_page_range(handle_t *handle,
>>  	if (ext4_should_journal_data(inode)) {
>>  		err = ext4_dirty_journalled_data(handle, bh);
>>  	} else {
>> -		err = 0;
>>  		mark_buffer_dirty(bh);
>> -		if (ext4_should_order_data(inode))
>> +		/*
>> +		 * Only the written block requires ordered data to prevent
>> +		 * exposing stale data.
>> +		 */
>> +		if (!buffer_unwritten(bh) && !buffer_delay(bh) &&
>> +		    ext4_should_order_data(inode))
>>  			err = ext4_jbd2_inode_add_write(handle, inode, from,
>>  					length);
>>  	}
>> -- 
>> 2.52.0
>>