From: Zhang Yi <yi.zhang@huawei.com>
Currently, ext4_es_cache_extent() is used to load extents into the
extent status tree when reading on-disk extent blocks. Since it may be
called while moving or modifying the extent tree, so it does not
overwrite existing extents in the extent status tree and is only used
for the initial loading.
There are many other places in ext4 where on-disk extents are inserted
into the extent status tree, such as in ext4_map_query_blocks().
Currently, they call ext4_es_insert_extent() to perform the insertion,
but they don't modify the extents, so ext4_es_cache_extent() would be a
more appropriate choice. However, when ext4_map_query_blocks() inserts
an extent, it may overwrite a short existing extent of the same type.
Therefore, to prepare for the replacements, we need to extend
ext4_es_cache_extent() to allow it to overwrite existing extents with
the same type.
In addition, since cached extents can be more lenient than the extents
they modify and do not involve modifying reserved blocks, it is not
necessary to ensure that the insertion operation succeeds as strictly as
in the ext4_es_insert_extent() function.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/ext4/extents.c | 4 ++--
fs/ext4/extents_status.c | 28 +++++++++++++++++++++-------
fs/ext4/extents_status.h | 2 +-
3 files changed, 24 insertions(+), 10 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ca5499e9412b..c42ceb5aae37 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -537,12 +537,12 @@ static void ext4_cache_extents(struct inode *inode,
if (prev && (prev != lblk))
ext4_es_cache_extent(inode, prev, lblk - prev, ~0,
- EXTENT_STATUS_HOLE);
+ EXTENT_STATUS_HOLE, false);
if (ext4_ext_is_unwritten(ex))
status = EXTENT_STATUS_UNWRITTEN;
ext4_es_cache_extent(inode, lblk, len,
- ext4_ext_pblock(ex), status);
+ ext4_ext_pblock(ex), status, false);
prev = lblk + len;
}
}
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 31dc0496f8d0..f9546ecf7340 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -986,13 +986,19 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
}
/*
- * ext4_es_cache_extent() inserts information into the extent status
- * tree if and only if there isn't information about the range in
- * question already.
+ * ext4_es_cache_extent() inserts extent information into the extent status
+ * tree. If 'overwrite' is not set, it inserts extent only if there isn't
+ * information about the specified range. Otherwise, it overwrites the
+ * current information.
+ *
+ * Note that this interface is only used for caching on-disk extent
+ * information and cannot be used to convert existing extents in the extent
+ * status tree. To convert existing extents, use ext4_es_insert_extent()
+ * instead.
*/
void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len, ext4_fsblk_t pblk,
- unsigned int status)
+ unsigned int status, bool overwrite)
{
struct extent_status *es;
struct extent_status newes;
@@ -1012,10 +1018,18 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
BUG_ON(end < lblk);
write_lock(&EXT4_I(inode)->i_es_lock);
-
es = __es_tree_search(&EXT4_I(inode)->i_es_tree.root, lblk);
- if (!es || es->es_lblk > end)
- __es_insert_extent(inode, &newes, NULL);
+ if (es && es->es_lblk <= end) {
+ if (!overwrite)
+ goto unlock;
+
+ /* Only extents of the same type can be overwritten. */
+ WARN_ON_ONCE(ext4_es_type(es) != status);
+ if (__es_remove_extent(inode, lblk, end, NULL, NULL))
+ goto unlock;
+ }
+ __es_insert_extent(inode, &newes, NULL);
+unlock:
write_unlock(&EXT4_I(inode)->i_es_lock);
}
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index 8f9c008d11e8..415f7c223a46 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -139,7 +139,7 @@ extern void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
bool delalloc_reserve_used);
extern void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len, ext4_fsblk_t pblk,
- unsigned int status);
+ unsigned int status, bool overwrite);
extern void ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
ext4_lblk_t len);
extern void ext4_es_find_extent_range(struct inode *inode,
--
2.46.1
On Fri 31-10-25 14:29:02, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
>
> Currently, ext4_es_cache_extent() is used to load extents into the
> extent status tree when reading on-disk extent blocks. Since it may be
> called while moving or modifying the extent tree, so it does not
> overwrite existing extents in the extent status tree and is only used
> for the initial loading.
>
> There are many other places in ext4 where on-disk extents are inserted
> into the extent status tree, such as in ext4_map_query_blocks().
> Currently, they call ext4_es_insert_extent() to perform the insertion,
> but they don't modify the extents, so ext4_es_cache_extent() would be a
> more appropriate choice. However, when ext4_map_query_blocks() inserts
> an extent, it may overwrite a short existing extent of the same type.
> Therefore, to prepare for the replacements, we need to extend
> ext4_es_cache_extent() to allow it to overwrite existing extents with
> the same type.
>
> In addition, since cached extents can be more lenient than the extents
> they modify and do not involve modifying reserved blocks, it is not
> necessary to ensure that the insertion operation succeeds as strictly as
> in the ext4_es_insert_extent() function.
>
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Thanks for writing this series! I think we can actually simplify things
event further. Extent status tree operations can be divided into three
groups:
1) Lookups in es tree - protected only by i_es_lock.
2) Caching of on-disk state into es tree - protected by i_es_lock and
i_data_sem (at least in read mode).
3) Modification of existing state - protected by i_es_lock and i_data_sem
in write mode.
Now because 2) has exclusion vs 3) due to i_data_sem, the observation is
that 2) should never see a real conflict - i.e., all intersecting entries
in es tree have the same status, otherwise this is a bug. So I think that
ext4_es_cache_extent() should always walk the whole inserted range, verify
the statuses match and merge all these entries into a single one. This
isn't going to be slower than what we have today because your
__es_remove_extent(), __es_insert_extent() pair is effectively doing the
same thing, just without checking the statuses. That way we always get the
checking and also ext4_es_cache_extent() doesn't have to have the
overwriting and non-overwriting variant. What do you think?
Honza
> ---
> fs/ext4/extents.c | 4 ++--
> fs/ext4/extents_status.c | 28 +++++++++++++++++++++-------
> fs/ext4/extents_status.h | 2 +-
> 3 files changed, 24 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index ca5499e9412b..c42ceb5aae37 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -537,12 +537,12 @@ static void ext4_cache_extents(struct inode *inode,
>
> if (prev && (prev != lblk))
> ext4_es_cache_extent(inode, prev, lblk - prev, ~0,
> - EXTENT_STATUS_HOLE);
> + EXTENT_STATUS_HOLE, false);
>
> if (ext4_ext_is_unwritten(ex))
> status = EXTENT_STATUS_UNWRITTEN;
> ext4_es_cache_extent(inode, lblk, len,
> - ext4_ext_pblock(ex), status);
> + ext4_ext_pblock(ex), status, false);
> prev = lblk + len;
> }
> }
> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
> index 31dc0496f8d0..f9546ecf7340 100644
> --- a/fs/ext4/extents_status.c
> +++ b/fs/ext4/extents_status.c
> @@ -986,13 +986,19 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
> }
>
> /*
> - * ext4_es_cache_extent() inserts information into the extent status
> - * tree if and only if there isn't information about the range in
> - * question already.
> + * ext4_es_cache_extent() inserts extent information into the extent status
> + * tree. If 'overwrite' is not set, it inserts extent only if there isn't
> + * information about the specified range. Otherwise, it overwrites the
> + * current information.
> + *
> + * Note that this interface is only used for caching on-disk extent
> + * information and cannot be used to convert existing extents in the extent
> + * status tree. To convert existing extents, use ext4_es_insert_extent()
> + * instead.
> */
> void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
> ext4_lblk_t len, ext4_fsblk_t pblk,
> - unsigned int status)
> + unsigned int status, bool overwrite)
> {
> struct extent_status *es;
> struct extent_status newes;
> @@ -1012,10 +1018,18 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
> BUG_ON(end < lblk);
>
> write_lock(&EXT4_I(inode)->i_es_lock);
> -
> es = __es_tree_search(&EXT4_I(inode)->i_es_tree.root, lblk);
> - if (!es || es->es_lblk > end)
> - __es_insert_extent(inode, &newes, NULL);
> + if (es && es->es_lblk <= end) {
> + if (!overwrite)
> + goto unlock;
> +
> + /* Only extents of the same type can be overwritten. */
> + WARN_ON_ONCE(ext4_es_type(es) != status);
> + if (__es_remove_extent(inode, lblk, end, NULL, NULL))
> + goto unlock;
> + }
> + __es_insert_extent(inode, &newes, NULL);
> +unlock:
> write_unlock(&EXT4_I(inode)->i_es_lock);
> }
>
> diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
> index 8f9c008d11e8..415f7c223a46 100644
> --- a/fs/ext4/extents_status.h
> +++ b/fs/ext4/extents_status.h
> @@ -139,7 +139,7 @@ extern void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
> bool delalloc_reserve_used);
> extern void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
> ext4_lblk_t len, ext4_fsblk_t pblk,
> - unsigned int status);
> + unsigned int status, bool overwrite);
> extern void ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
> ext4_lblk_t len);
> extern void ext4_es_find_extent_range(struct inode *inode,
> --
> 2.46.1
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
Hi! Thank you for the review and suggestions!
On 11/6/2025 5:15 PM, Jan Kara wrote:
> On Fri 31-10-25 14:29:02, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Currently, ext4_es_cache_extent() is used to load extents into the
>> extent status tree when reading on-disk extent blocks. Since it may be
>> called while moving or modifying the extent tree, so it does not
>> overwrite existing extents in the extent status tree and is only used
>> for the initial loading.
>>
>> There are many other places in ext4 where on-disk extents are inserted
>> into the extent status tree, such as in ext4_map_query_blocks().
>> Currently, they call ext4_es_insert_extent() to perform the insertion,
>> but they don't modify the extents, so ext4_es_cache_extent() would be a
>> more appropriate choice. However, when ext4_map_query_blocks() inserts
>> an extent, it may overwrite a short existing extent of the same type.
>> Therefore, to prepare for the replacements, we need to extend
>> ext4_es_cache_extent() to allow it to overwrite existing extents with
>> the same type.
>>
>> In addition, since cached extents can be more lenient than the extents
>> they modify and do not involve modifying reserved blocks, it is not
>> necessary to ensure that the insertion operation succeeds as strictly as
>> in the ext4_es_insert_extent() function.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>
> Thanks for writing this series! I think we can actually simplify things
> event further. Extent status tree operations can be divided into three
> groups:
> 1) Lookups in es tree - protected only by i_es_lock.
> 2) Caching of on-disk state into es tree - protected by i_es_lock and
> i_data_sem (at least in read mode).
> 3) Modification of existing state - protected by i_es_lock and i_data_sem
> in write mode.
Yeah.
>
> Now because 2) has exclusion vs 3) due to i_data_sem, the observation is
> that 2) should never see a real conflict - i.e., all intersecting entries
> in es tree have the same status, otherwise this is a bug.
While I was debugging, I observed two exceptions here.
A. The first exceptions is about the delay extent. Since there is no actual
extent present in the extent tree on the disk, if a delayed extent
already exists in the extent status tree and someone calls
ext4_find_extent()->ext4_cache_extents() to cache an extent at the same
location, then a status mismatch will occur (attempting to replace
the delayed extent with a hole). This is not a bug.
B. I also observed that ext4_find_extent()->ext4_cache_extents() is called
during splitting and conversion between unwritten and written states (in
most scenarios, EXT4_EX_NOCACHE is not added). However, because the
process is in an intermediate state of handling extents, there can be
cases where the status do not match. I did not analyze this scenario in
detail, but since ext4_es_insert_extent() is called at the end of the
processing to ensure the final state is correct, I don't think this is a
practical issue either.
Therefore, I believe that retaining non-overwriting is necessary for this
scenario involving ext4_cache_extents() because it will be called during
case 3). Except for ext4_cache_extents(), other scenarios theoretically
should not be involved.
> So I think that
> ext4_es_cache_extent() should always walk the whole inserted range, verify
> the statuses match and merge all these entries into a single one. This
> isn't going to be slower than what we have today because your
> __es_remove_extent(), __es_insert_extent() pair is effectively doing the
> same thing, just without checking the statuses.
Yes, I agree that we can delegate the verification work in
ext4_es_cache_extent() to __es_remove_extent(). During the process of
overwriting extents, the first step is to remove the existing extents. If
the extent status does not match, an alarm will be triggered.
> That way we always get the
> checking and also ext4_es_cache_extent() doesn't have to have the
> overwriting and non-overwriting variant. What do you think?
>
> Honza
For case A, we can add an exception during verification and skip the
warnings. For case B, We need to ensure that ext4_cache_extents() is not
allowed to be called during the intermediate processing of the extent
tree. This seems feasible in theory, but I guess it is somewhat fragile.
So, keep the non-overwriting mode?
Best Regards,
Yi.
>
>> ---
>> fs/ext4/extents.c | 4 ++--
>> fs/ext4/extents_status.c | 28 +++++++++++++++++++++-------
>> fs/ext4/extents_status.h | 2 +-
>> 3 files changed, 24 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
>> index ca5499e9412b..c42ceb5aae37 100644
>> --- a/fs/ext4/extents.c
>> +++ b/fs/ext4/extents.c
>> @@ -537,12 +537,12 @@ static void ext4_cache_extents(struct inode *inode,
>>
>> if (prev && (prev != lblk))
>> ext4_es_cache_extent(inode, prev, lblk - prev, ~0,
>> - EXTENT_STATUS_HOLE);
>> + EXTENT_STATUS_HOLE, false);
>>
>> if (ext4_ext_is_unwritten(ex))
>> status = EXTENT_STATUS_UNWRITTEN;
>> ext4_es_cache_extent(inode, lblk, len,
>> - ext4_ext_pblock(ex), status);
>> + ext4_ext_pblock(ex), status, false);
>> prev = lblk + len;
>> }
>> }
>> diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
>> index 31dc0496f8d0..f9546ecf7340 100644
>> --- a/fs/ext4/extents_status.c
>> +++ b/fs/ext4/extents_status.c
>> @@ -986,13 +986,19 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>> }
>>
>> /*
>> - * ext4_es_cache_extent() inserts information into the extent status
>> - * tree if and only if there isn't information about the range in
>> - * question already.
>> + * ext4_es_cache_extent() inserts extent information into the extent status
>> + * tree. If 'overwrite' is not set, it inserts extent only if there isn't
>> + * information about the specified range. Otherwise, it overwrites the
>> + * current information.
>> + *
>> + * Note that this interface is only used for caching on-disk extent
>> + * information and cannot be used to convert existing extents in the extent
>> + * status tree. To convert existing extents, use ext4_es_insert_extent()
>> + * instead.
>> */
>> void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
>> ext4_lblk_t len, ext4_fsblk_t pblk,
>> - unsigned int status)
>> + unsigned int status, bool overwrite)
>> {
>> struct extent_status *es;
>> struct extent_status newes;
>> @@ -1012,10 +1018,18 @@ void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
>> BUG_ON(end < lblk);
>>
>> write_lock(&EXT4_I(inode)->i_es_lock);
>> -
>> es = __es_tree_search(&EXT4_I(inode)->i_es_tree.root, lblk);
>> - if (!es || es->es_lblk > end)
>> - __es_insert_extent(inode, &newes, NULL);
>> + if (es && es->es_lblk <= end) {
>> + if (!overwrite)
>> + goto unlock;
>> +
>> + /* Only extents of the same type can be overwritten. */
>> + WARN_ON_ONCE(ext4_es_type(es) != status);
>> + if (__es_remove_extent(inode, lblk, end, NULL, NULL))
>> + goto unlock;
>> + }
>> + __es_insert_extent(inode, &newes, NULL);
>> +unlock:
>> write_unlock(&EXT4_I(inode)->i_es_lock);
>> }
>>
>> diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
>> index 8f9c008d11e8..415f7c223a46 100644
>> --- a/fs/ext4/extents_status.h
>> +++ b/fs/ext4/extents_status.h
>> @@ -139,7 +139,7 @@ extern void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
>> bool delalloc_reserve_used);
>> extern void ext4_es_cache_extent(struct inode *inode, ext4_lblk_t lblk,
>> ext4_lblk_t len, ext4_fsblk_t pblk,
>> - unsigned int status);
>> + unsigned int status, bool overwrite);
>> extern void ext4_es_remove_extent(struct inode *inode, ext4_lblk_t lblk,
>> ext4_lblk_t len);
>> extern void ext4_es_find_extent_range(struct inode *inode,
>> --
>> 2.46.1
>>
Hi! On Thu 06-11-25 21:02:35, Zhang Yi wrote: > On 11/6/2025 5:15 PM, Jan Kara wrote: > > On Fri 31-10-25 14:29:02, Zhang Yi wrote: > >> From: Zhang Yi <yi.zhang@huawei.com> > >> > >> Currently, ext4_es_cache_extent() is used to load extents into the > >> extent status tree when reading on-disk extent blocks. Since it may be > >> called while moving or modifying the extent tree, so it does not > >> overwrite existing extents in the extent status tree and is only used > >> for the initial loading. > >> > >> There are many other places in ext4 where on-disk extents are inserted > >> into the extent status tree, such as in ext4_map_query_blocks(). > >> Currently, they call ext4_es_insert_extent() to perform the insertion, > >> but they don't modify the extents, so ext4_es_cache_extent() would be a > >> more appropriate choice. However, when ext4_map_query_blocks() inserts > >> an extent, it may overwrite a short existing extent of the same type. > >> Therefore, to prepare for the replacements, we need to extend > >> ext4_es_cache_extent() to allow it to overwrite existing extents with > >> the same type. > >> > >> In addition, since cached extents can be more lenient than the extents > >> they modify and do not involve modifying reserved blocks, it is not > >> necessary to ensure that the insertion operation succeeds as strictly as > >> in the ext4_es_insert_extent() function. > >> > >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> > > > > Thanks for writing this series! I think we can actually simplify things > > event further. Extent status tree operations can be divided into three > > groups: > > 1) Lookups in es tree - protected only by i_es_lock. > > 2) Caching of on-disk state into es tree - protected by i_es_lock and > > i_data_sem (at least in read mode). > > 3) Modification of existing state - protected by i_es_lock and i_data_sem > > in write mode. > > Yeah. > > > > > Now because 2) has exclusion vs 3) due to i_data_sem, the observation is > > that 2) should never see a real conflict - i.e., all intersecting entries > > in es tree have the same status, otherwise this is a bug. > > While I was debugging, I observed two exceptions here. > > A. The first exceptions is about the delay extent. Since there is no actual > extent present in the extent tree on the disk, if a delayed extent > already exists in the extent status tree and someone calls > ext4_find_extent()->ext4_cache_extents() to cache an extent at the same > location, then a status mismatch will occur (attempting to replace > the delayed extent with a hole). This is not a bug. > B. I also observed that ext4_find_extent()->ext4_cache_extents() is called > during splitting and conversion between unwritten and written states (in > most scenarios, EXT4_EX_NOCACHE is not added). However, because the > process is in an intermediate state of handling extents, there can be > cases where the status do not match. I did not analyze this scenario in > detail, but since ext4_es_insert_extent() is called at the end of the > processing to ensure the final state is correct, I don't think this is a > practical issue either. Thanks for bringing this up. I didn't think about these two cases. As for case A that is easy to deal with as you write below. A hole insertion can be deemed compatible with existing delalloc extent. Case B is more difficult and I think I need to better understand the details there to decide what to do. Only extent splitting (as it happens e.g. with EXT4_GET_BLOCKS_PRE_IO) should keep extents in the extent tree and extent status tree compatible. So it has to be something like EXT4_GET_BLOCKS_CONVERT case. There indeed after we call ext4_ext_mark_initialized() we have initialized extent on disk but in extent status tree it is still as unwritten. But I just didn't find a place in the extent conversion path that would modify extent state on disk and then call ext4_find_extent(). Can you perhaps share a stacktrace where the extent incompatibility was hit from ext4_cache_extents()? Thanks! Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR
On 11/11/2025 6:33 PM, Jan Kara wrote: > Hi! > > On Thu 06-11-25 21:02:35, Zhang Yi wrote: >> On 11/6/2025 5:15 PM, Jan Kara wrote: >>> On Fri 31-10-25 14:29:02, Zhang Yi wrote: >>>> From: Zhang Yi <yi.zhang@huawei.com> >>>> >>>> Currently, ext4_es_cache_extent() is used to load extents into the >>>> extent status tree when reading on-disk extent blocks. Since it may be >>>> called while moving or modifying the extent tree, so it does not >>>> overwrite existing extents in the extent status tree and is only used >>>> for the initial loading. >>>> >>>> There are many other places in ext4 where on-disk extents are inserted >>>> into the extent status tree, such as in ext4_map_query_blocks(). >>>> Currently, they call ext4_es_insert_extent() to perform the insertion, >>>> but they don't modify the extents, so ext4_es_cache_extent() would be a >>>> more appropriate choice. However, when ext4_map_query_blocks() inserts >>>> an extent, it may overwrite a short existing extent of the same type. >>>> Therefore, to prepare for the replacements, we need to extend >>>> ext4_es_cache_extent() to allow it to overwrite existing extents with >>>> the same type. >>>> >>>> In addition, since cached extents can be more lenient than the extents >>>> they modify and do not involve modifying reserved blocks, it is not >>>> necessary to ensure that the insertion operation succeeds as strictly as >>>> in the ext4_es_insert_extent() function. >>>> >>>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> >>> >>> Thanks for writing this series! I think we can actually simplify things >>> event further. Extent status tree operations can be divided into three >>> groups: >>> 1) Lookups in es tree - protected only by i_es_lock. >>> 2) Caching of on-disk state into es tree - protected by i_es_lock and >>> i_data_sem (at least in read mode). >>> 3) Modification of existing state - protected by i_es_lock and i_data_sem >>> in write mode. >> >> Yeah. >> >>> >>> Now because 2) has exclusion vs 3) due to i_data_sem, the observation is >>> that 2) should never see a real conflict - i.e., all intersecting entries >>> in es tree have the same status, otherwise this is a bug. >> >> While I was debugging, I observed two exceptions here. >> >> A. The first exceptions is about the delay extent. Since there is no actual >> extent present in the extent tree on the disk, if a delayed extent >> already exists in the extent status tree and someone calls >> ext4_find_extent()->ext4_cache_extents() to cache an extent at the same >> location, then a status mismatch will occur (attempting to replace >> the delayed extent with a hole). This is not a bug. >> B. I also observed that ext4_find_extent()->ext4_cache_extents() is called >> during splitting and conversion between unwritten and written states (in >> most scenarios, EXT4_EX_NOCACHE is not added). However, because the >> process is in an intermediate state of handling extents, there can be >> cases where the status do not match. I did not analyze this scenario in >> detail, but since ext4_es_insert_extent() is called at the end of the >> processing to ensure the final state is correct, I don't think this is a >> practical issue either. > > Thanks for bringing this up. I didn't think about these two cases. As for > case A that is easy to deal with as you write below. A hole insertion can > be deemed compatible with existing delalloc extent. > Yeah. > Case B is more difficult and I think I need to better understand the > details there to decide what to do. Only extent splitting (as it happens > e.g. with EXT4_GET_BLOCKS_PRE_IO) should keep extents in the extent tree and > extent status tree compatible. So it has to be something like > EXT4_GET_BLOCKS_CONVERT case. There indeed after we call > ext4_ext_mark_initialized() we have initialized extent on disk but in > extent status tree it is still as unwritten. But I just didn't find a place > in the extent conversion path that would modify extent state on disk and > then call ext4_find_extent(). Can you perhaps share a stacktrace where the > extent incompatibility was hit from ext4_cache_extents()? Thanks! > > Honza > Sorry for the late. I have found several real issues during debugging this case, the situation is a bit complicated and will take some time, I will address these in the next iteration. Cheers, Yi.
© 2016 - 2026 Red Hat, Inc.