[PATCH v2 06/13] ext4: don't cache extent during splitting extent

Zhang Yi posted 13 patches 1 week, 3 days ago
There is a newer version of this series
[PATCH v2 06/13] ext4: don't cache extent during splitting extent
Posted by Zhang Yi 1 week, 3 days ago
From: Zhang Yi <yi.zhang@huawei.com>

Caching extents during the splitting process is risky, as it may result
in stale extents remaining in the status tree. Moreover, in most cases,
the corresponding extent block entries are likely already cached before
the split happens, making caching here not particularly useful.

Assume we have an unwritten extent, and then DIO writes the first half.

  [UUUUUUUUUUUUUUUU] on-disk extent        U: unwritten extent
  [UUUUUUUUUUUUUUUU] extent status tree
  |<-   ->| ----> dio write this range

First, when ext4_split_extent_at() splits this extent, it truncates the
existing extent and then inserts a new one. During this process, this
extent status entry may be shrunk, and calls to ext4_find_extent() and
ext4_cache_extents() may occur, which could potentially insert the
truncated range as a hole into the extent status tree. After the split
is completed, this hole is not replaced with the correct status.

  [UUUUUUU|UUUUUUUU] on-disk extent        U: unwritten extent
  [UUUUUUU|HHHHHHHH] extent status tree    H: hole

Then, the outer calling functions will not correct this remaining hole
extent either. Finally, if we perform a delayed buffer write on this
latter part, it will re-insert the delayed extent and cause an error in
space accounting.

In adition, if the unwritten extent cache is not shrunk during the
splitting, ext4_cache_extents() also conflicts with existing extents
when caching extents. In the future, we will add checks when caching
extents, which will trigger a warning. Therefore, Do not cache extents
that are being split.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/extents.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 19338f488550..2b5aec3f8882 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3199,6 +3199,9 @@ static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
 	BUG_ON((split_flag & EXT4_EXT_DATA_VALID1) &&
 	       (split_flag & EXT4_EXT_DATA_VALID2));
 
+	/* Do not cache extents that are in the process of being modified. */
+	flags |= EXT4_EX_NOCACHE;
+
 	ext_debug(inode, "logical block %llu\n", (unsigned long long)split);
 
 	ext4_ext_show_leaf(inode, path);
@@ -3364,6 +3367,9 @@ static struct ext4_ext_path *ext4_split_extent(handle_t *handle,
 	ee_len = ext4_ext_get_actual_len(ex);
 	unwritten = ext4_ext_is_unwritten(ex);
 
+	/* Do not cache extents that are in the process of being modified. */
+	flags |= EXT4_EX_NOCACHE;
+
 	if (map->m_lblk + map->m_len < ee_block + ee_len) {
 		split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;
 		flags1 = flags | EXT4_GET_BLOCKS_PRE_IO;
-- 
2.46.1
Re: [PATCH v2 06/13] ext4: don't cache extent during splitting extent
Posted by Ojaswin Mujoo 5 days, 13 hours ago
On Fri, Nov 21, 2025 at 02:08:04PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Caching extents during the splitting process is risky, as it may result
> in stale extents remaining in the status tree. Moreover, in most cases,
> the corresponding extent block entries are likely already cached before
> the split happens, making caching here not particularly useful.
> 
> Assume we have an unwritten extent, and then DIO writes the first half.
> 
>   [UUUUUUUUUUUUUUUU] on-disk extent        U: unwritten extent
>   [UUUUUUUUUUUUUUUU] extent status tree
>   |<-   ->| ----> dio write this range
> 
> First, when ext4_split_extent_at() splits this extent, it truncates the
> existing extent and then inserts a new one. During this process, this
> extent status entry may be shrunk, and calls to ext4_find_extent() and
> ext4_cache_extents() may occur, which could potentially insert the
> truncated range as a hole into the extent status tree. After the split
> is completed, this hole is not replaced with the correct status.
> 
>   [UUUUUUU|UUUUUUUU] on-disk extent        U: unwritten extent
>   [UUUUUUU|HHHHHHHH] extent status tree    H: hole
> 
> Then, the outer calling functions will not correct this remaining hole
> extent either. Finally, if we perform a delayed buffer write on this
> latter part, it will re-insert the delayed extent and cause an error in
> space accounting.

Okay, makes sense. So one basic question, I see that in
ext4_ext_insert_extent() doesnt really care about updating es unless as a
side effect of ext4_find_extent().  For example, if we end up at goto
has_space; we don't add the new extent and niether do we update that the
exsisting extent has shrunk. 

Similarly, the splitting code also doesn't seem to care about the es
cache other than zeroout in the error handling.

Is there a reason for this? Do we expect the upper layers to maintain
the es cache?
> 
> In adition, if the unwritten extent cache is not shrunk during the
> splitting, ext4_cache_extents() also conflicts with existing extents
> when caching extents. In the future, we will add checks when caching
> extents, which will trigger a warning. Therefore, Do not cache extents
> that are being split.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
>  fs/ext4/extents.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 19338f488550..2b5aec3f8882 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -3199,6 +3199,9 @@ static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
>  	BUG_ON((split_flag & EXT4_EXT_DATA_VALID1) &&
>  	       (split_flag & EXT4_EXT_DATA_VALID2));
>  
> +	/* Do not cache extents that are in the process of being modified. */
> +	flags |= EXT4_EX_NOCACHE;
> +
>  	ext_debug(inode, "logical block %llu\n", (unsigned long long)split);
>  
>  	ext4_ext_show_leaf(inode, path);
> @@ -3364,6 +3367,9 @@ static struct ext4_ext_path *ext4_split_extent(handle_t *handle,
>  	ee_len = ext4_ext_get_actual_len(ex);
>  	unwritten = ext4_ext_is_unwritten(ex);
>  
> +	/* Do not cache extents that are in the process of being modified. */
> +	flags |= EXT4_EX_NOCACHE;
> +
>  	if (map->m_lblk + map->m_len < ee_block + ee_len) {
>  		split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;
>  		flags1 = flags | EXT4_GET_BLOCKS_PRE_IO;
> -- 
> 2.46.1
>
Re: [PATCH v2 06/13] ext4: don't cache extent during splitting extent
Posted by Zhang Yi 4 days, 18 hours ago
On 11/26/2025 8:04 PM, Ojaswin Mujoo wrote:
> On Fri, Nov 21, 2025 at 02:08:04PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Caching extents during the splitting process is risky, as it may result
>> in stale extents remaining in the status tree. Moreover, in most cases,
>> the corresponding extent block entries are likely already cached before
>> the split happens, making caching here not particularly useful.
>>
>> Assume we have an unwritten extent, and then DIO writes the first half.
>>
>>   [UUUUUUUUUUUUUUUU] on-disk extent        U: unwritten extent
>>   [UUUUUUUUUUUUUUUU] extent status tree
>>   |<-   ->| ----> dio write this range
>>
>> First, when ext4_split_extent_at() splits this extent, it truncates the
>> existing extent and then inserts a new one. During this process, this
>> extent status entry may be shrunk, and calls to ext4_find_extent() and
>> ext4_cache_extents() may occur, which could potentially insert the
>> truncated range as a hole into the extent status tree. After the split
>> is completed, this hole is not replaced with the correct status.
>>
>>   [UUUUUUU|UUUUUUUU] on-disk extent        U: unwritten extent
>>   [UUUUUUU|HHHHHHHH] extent status tree    H: hole
>>
>> Then, the outer calling functions will not correct this remaining hole
>> extent either. Finally, if we perform a delayed buffer write on this
>> latter part, it will re-insert the delayed extent and cause an error in
>> space accounting.
> 
> Okay, makes sense. So one basic question, I see that in
> ext4_ext_insert_extent() doesnt really care about updating es unless as a
> side effect of ext4_find_extent().  For example, if we end up at goto
> has_space; we don't add the new extent and niether do we update that the
> exsisting extent has shrunk. 
> 
> Similarly, the splitting code also doesn't seem to care about the es
> cache other than zeroout in the error handling.
> 
> Is there a reason for this? Do we expect the upper layers to maintain
> the es cache?

Yeah, if we don't consider the zeroout case caused by a failed split,
under typical circumstances, the ext4_es_insert_extent() in
ext4_map_create_blocks() is called to insert or update the cached
extent entries.

However, ext4_map_create_blocks() only insert or update
the range that the caller want to map, it can't know the actual
initialized range if this extent has been zeroed out, so we have to
update the extent cache in ext4_split_extent_at() for this special case.
Please see commit adb2355104b2 ("ext4: update extent status tree after
an extent is zeroed out") for details.

Unfortunately, the legacy scenario described in this patch remains
unhandled.

Cheers,
Yi.

>>
>> In adition, if the unwritten extent cache is not shrunk during the
>> splitting, ext4_cache_extents() also conflicts with existing extents
>> when caching extents. In the future, we will add checks when caching
>> extents, which will trigger a warning. Therefore, Do not cache extents
>> that are being split.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>> ---
>>  fs/ext4/extents.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index 19338f488550..2b5aec3f8882 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -3199,6 +3199,9 @@ static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
>>  	BUG_ON((split_flag & EXT4_EXT_DATA_VALID1) &&
>>  	       (split_flag & EXT4_EXT_DATA_VALID2));
>>  
>> +	/* Do not cache extents that are in the process of being modified. */
>> +	flags |= EXT4_EX_NOCACHE;
>> +
>>  	ext_debug(inode, "logical block %llu\n", (unsigned long long)split);
>>  
>>  	ext4_ext_show_leaf(inode, path);
>> @@ -3364,6 +3367,9 @@ static struct ext4_ext_path *ext4_split_extent(handle_t *handle,
>>  	ee_len = ext4_ext_get_actual_len(ex);
>>  	unwritten = ext4_ext_is_unwritten(ex);
>>  
>> +	/* Do not cache extents that are in the process of being modified. */
>> +	flags |= EXT4_EX_NOCACHE;
>> +
>>  	if (map->m_lblk + map->m_len < ee_block + ee_len) {
>>  		split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;
>>  		flags1 = flags | EXT4_GET_BLOCKS_PRE_IO;
>> -- 
>> 2.46.1
>>
Re: [PATCH v2 06/13] ext4: don't cache extent during splitting extent
Posted by Ojaswin Mujoo 3 days, 17 hours ago
On Thu, Nov 27, 2025 at 03:01:27PM +0800, Zhang Yi wrote:
> On 11/26/2025 8:04 PM, Ojaswin Mujoo wrote:
> > On Fri, Nov 21, 2025 at 02:08:04PM +0800, Zhang Yi wrote:
> >> From: Zhang Yi <yi.zhang@huawei.com>
> >>
> >> Caching extents during the splitting process is risky, as it may result
> >> in stale extents remaining in the status tree. Moreover, in most cases,
> >> the corresponding extent block entries are likely already cached before
> >> the split happens, making caching here not particularly useful.
> >>
> >> Assume we have an unwritten extent, and then DIO writes the first half.
> >>
> >>   [UUUUUUUUUUUUUUUU] on-disk extent        U: unwritten extent
> >>   [UUUUUUUUUUUUUUUU] extent status tree
> >>   |<-   ->| ----> dio write this range
> >>
> >> First, when ext4_split_extent_at() splits this extent, it truncates the
> >> existing extent and then inserts a new one. During this process, this
> >> extent status entry may be shrunk, and calls to ext4_find_extent() and
> >> ext4_cache_extents() may occur, which could potentially insert the
> >> truncated range as a hole into the extent status tree. After the split
> >> is completed, this hole is not replaced with the correct status.
> >>
> >>   [UUUUUUU|UUUUUUUU] on-disk extent        U: unwritten extent
> >>   [UUUUUUU|HHHHHHHH] extent status tree    H: hole
> >>
> >> Then, the outer calling functions will not correct this remaining hole
> >> extent either. Finally, if we perform a delayed buffer write on this
> >> latter part, it will re-insert the delayed extent and cause an error in
> >> space accounting.
> > 
> > Okay, makes sense. So one basic question, I see that in
> > ext4_ext_insert_extent() doesnt really care about updating es unless as a
> > side effect of ext4_find_extent().  For example, if we end up at goto
> > has_space; we don't add the new extent and niether do we update that the
> > exsisting extent has shrunk. 
> > 
> > Similarly, the splitting code also doesn't seem to care about the es
> > cache other than zeroout in the error handling.
> > 
> > Is there a reason for this? Do we expect the upper layers to maintain
> > the es cache?
> 
> Yeah, if we don't consider the zeroout case caused by a failed split,
> under typical circumstances, the ext4_es_insert_extent() in
> ext4_map_create_blocks() is called to insert or update the cached
> extent entries.
> 
> However, ext4_map_create_blocks() only insert or update
> the range that the caller want to map, it can't know the actual
> initialized range if this extent has been zeroed out, so we have to
> update the extent cache in ext4_split_extent_at() for this special case.
> Please see commit adb2355104b2 ("ext4: update extent status tree after
> an extent is zeroed out") for details.
> 
> Unfortunately, the legacy scenario described in this patch remains
> unhandled.

Got it thanks for the details.

Feel free to add:
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>

Regards,
ojaswin

> 
> Cheers,
> Yi.
> 
> >>
> >> In adition, if the unwritten extent cache is not shrunk during the
> >> splitting, ext4_cache_extents() also conflicts with existing extents
> >> when caching extents. In the future, we will add checks when caching
> >> extents, which will trigger a warning. Therefore, Do not cache extents
> >> that are being split.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> >> ---
> >>  fs/ext4/extents.c | 6 ++++++
> >>  1 file changed, 6 insertions(+)
> >>
> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> >> index 19338f488550..2b5aec3f8882 100644
> >> --- a/fs/ext4/extents.c
> >> +++ b/fs/ext4/extents.c
> >> @@ -3199,6 +3199,9 @@ static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
> >>  	BUG_ON((split_flag & EXT4_EXT_DATA_VALID1) &&
> >>  	       (split_flag & EXT4_EXT_DATA_VALID2));
> >>  
> >> +	/* Do not cache extents that are in the process of being modified. */
> >> +	flags |= EXT4_EX_NOCACHE;
> >> +
> >>  	ext_debug(inode, "logical block %llu\n", (unsigned long long)split);
> >>  
> >>  	ext4_ext_show_leaf(inode, path);
> >> @@ -3364,6 +3367,9 @@ static struct ext4_ext_path *ext4_split_extent(handle_t *handle,
> >>  	ee_len = ext4_ext_get_actual_len(ex);
> >>  	unwritten = ext4_ext_is_unwritten(ex);
> >>  
> >> +	/* Do not cache extents that are in the process of being modified. */
> >> +	flags |= EXT4_EX_NOCACHE;
> >> +
> >>  	if (map->m_lblk + map->m_len < ee_block + ee_len) {
> >>  		split_flag1 = split_flag & EXT4_EXT_MAY_ZEROOUT;
> >>  		flags1 = flags | EXT4_GET_BLOCKS_PRE_IO;
> >> -- 
> >> 2.46.1
> >>
>