[PATCH v3] ext4: avoid infinite loops caused by residual data

Edward Adam Davis posted 1 patch 1 month ago
There is a newer version of this series
fs/ext4/extents.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
[PATCH v3] ext4: avoid infinite loops caused by residual data
Posted by Edward Adam Davis 1 month ago
On the mkdir/mknod path, when mapping logical blocks to physical blocks,
if inserting a new extent into the extent tree fails (in this example,
because the file system disabled the huge file feature when marking the
inode as dirty), ext4_ext_map_blocks() only calls ext4_free_blocks() to
reclaim the physical block without deleting the corresponding data in
the extent tree. This causes subsequent mkdir operations to reference
the previously reclaimed physical block number again, even though this
physical block is already being used by the xattr block. Therefore, a
situation arises where both the directory and xattr are using the same
buffer head block in memory simultaneously.

The above causes ext4_xattr_block_set() to enter an infinite loop about
"inserted" and cannot release the inode lock, ultimately leading to the
143s blocking problem mentioned in [1].

By using ext4_ext_remove_space() to delete the inserted logical block
and reclaim the physical block when inserting a new extent fails during
extent block mapping, residual extent data can be prevented from affecting
subsequent logical block physical mappings. 

[1]
INFO: task syz.0.17:5995 blocked for more than 143 seconds.
Call Trace:
 inode_lock_nested include/linux/fs.h:1073 [inline]
 __start_dirop fs/namei.c:2923 [inline]
 start_dirop fs/namei.c:2934 [inline]

Reported-by: syzbot+512459401510e2a9a39f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1659aaaaa8d9d11265d7
Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
Reported-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=512459401510e2a9a39f
Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
v1 -> v2: fix ci reported issues
v2 -> v3: new fix for removing residual data and update subject and coments

 fs/ext4/extents.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ae3804f36535..0bed3379f2d2 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4458,19 +4458,13 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	if (IS_ERR(path)) {
 		err = PTR_ERR(path);
 		if (allocated_clusters) {
-			int fb_flags = 0;
-
 			/*
 			 * free data blocks we just allocated.
 			 * not a good idea to call discard here directly,
 			 * but otherwise we'd need to call it every free().
 			 */
 			ext4_discard_preallocations(inode);
-			if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
-				fb_flags = EXT4_FREE_BLOCKS_NO_QUOT_UPDATE;
-			ext4_free_blocks(handle, inode, NULL, newblock,
-					 EXT4_C2B(sbi, allocated_clusters),
-					 fb_flags);
+			ext4_ext_remove_space(inode, newex.ee_block, newex.ee_block);
 		}
 		goto out;
 	}
-- 
2.43.0
Re: [PATCH v3] ext4: avoid infinite loops caused by residual data
Posted by Jan Kara 1 month ago
On Thu 05-03-26 15:25:46, Edward Adam Davis wrote:
> On the mkdir/mknod path, when mapping logical blocks to physical blocks,
> if inserting a new extent into the extent tree fails (in this example,
> because the file system disabled the huge file feature when marking the
> inode as dirty),

I don't quite understand what you mean here but I think you say that
ext4_ext_dirty() -> ext4_mark_inode_dirty() returns error due to whatever
corruption it has hit.

> ext4_ext_map_blocks() only calls ext4_free_blocks() to
> reclaim the physical block without deleting the corresponding data in
> the extent tree. This causes subsequent mkdir operations to reference
> the previously reclaimed physical block number again, even though this
> physical block is already being used by the xattr block. Therefore, a
> situation arises where both the directory and xattr are using the same
> buffer head block in memory simultaneously.

OK, this indeed looks like "not so great" error handling. Thanks for
digging into this.

> The above causes ext4_xattr_block_set() to enter an infinite loop about
> "inserted" and cannot release the inode lock, ultimately leading to the
> 143s blocking problem mentioned in [1].
> 
> By using ext4_ext_remove_space() to delete the inserted logical block
> and reclaim the physical block when inserting a new extent fails during
> extent block mapping, residual extent data can be prevented from affecting
> subsequent logical block physical mappings. 
> 
> [1]
> INFO: task syz.0.17:5995 blocked for more than 143 seconds.
> Call Trace:
>  inode_lock_nested include/linux/fs.h:1073 [inline]
>  __start_dirop fs/namei.c:2923 [inline]
>  start_dirop fs/namei.c:2934 [inline]
> 
> Reported-by: syzbot+512459401510e2a9a39f@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=1659aaaaa8d9d11265d7
> Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
> Reported-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=512459401510e2a9a39f
> Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> ---
...
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index ae3804f36535..0bed3379f2d2 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4458,19 +4458,13 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
>  	if (IS_ERR(path)) {
>  		err = PTR_ERR(path);
>  		if (allocated_clusters) {
> -			int fb_flags = 0;
> -
>  			/*
>  			 * free data blocks we just allocated.
>  			 * not a good idea to call discard here directly,
>  			 * but otherwise we'd need to call it every free().
>  			 */
>  			ext4_discard_preallocations(inode);
> -			if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
> -				fb_flags = EXT4_FREE_BLOCKS_NO_QUOT_UPDATE;
> -			ext4_free_blocks(handle, inode, NULL, newblock,
> -					 EXT4_C2B(sbi, allocated_clusters),
> -					 fb_flags);
> +			ext4_ext_remove_space(inode, newex.ee_block, newex.ee_block);

So I'm concerned that if the metadata is corrupted, then trying to remove
some extent space can do even more harm. Also in case
EXT4_GET_BLOCKS_DELALLOC_RESERVE was passed, we now wrongly update quota
information. So this definitely isn't a correct fix. What I'd do instead
would be distinguishing two cases:

1) The error is ENOSPC or EDQUOT - in this case the filesystem is fully
consistent and we must maintain its consistency including all the
accounting. However these errors can happen only early before we've
inserted the extent into the extent tree. So current code works correctly
for this case.

2) Some other error - this means metadata is corrupted. We should strive to
do as few modifications as possible to limit damage. So I'd just skip
freeing of allocated blocks.

Long story short I think we should just modify the above condition:

	if (allocated_clusters)

to

	/*
	 * Gracefully handle out of space conditions. If the filesystem is
	 * inconsistent, we'll just leak allocated blocks to avoid causing
	 * even more damage.
	 */
	if (allocated_clusters && (err == -EDQUOT || err == -ENOSPC))

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
Re: [PATCH v3] ext4: avoid infinite loops caused by residual data
Posted by Edward Adam Davis 1 month ago
On Thu, 5 Mar 2026 11:38:00 +0100 Jan Kara wrote:
> > On the mkdir/mknod path, when mapping logical blocks to physical blocks,
> > if inserting a new extent into the extent tree fails (in this example,
> > because the file system disabled the huge file feature when marking the
> > inode as dirty),
> 
> I don't quite understand what you mean here but I think you say that
> ext4_ext_dirty() -> ext4_mark_inode_dirty() returns error due to whatever
> corruption it has hit.
It returns -EFSCORRUPTED in following calltrace: 
ext4_ext_dirty()->
  ext4_mark_inode_dirty()->
    __ext4_mark_inode_dirty()->
      ext4_mark_iloc_dirty()->
        ext4_do_update_inode()->
	  ext4_fill_raw_inode()->
	    ext4_inode_blocks_set()->
	      if (!ext4_has_feature_huge_file(sb))
	          return -EFSCORRUPTED;
> 
> > ext4_ext_map_blocks() only calls ext4_free_blocks() to
> > reclaim the physical block without deleting the corresponding data in
> > the extent tree. This causes subsequent mkdir operations to reference
> > the previously reclaimed physical block number again, even though this
> > physical block is already being used by the xattr block. Therefore, a
> > situation arises where both the directory and xattr are using the same
> > buffer head block in memory simultaneously.
> 
> OK, this indeed looks like "not so great" error handling. Thanks for
> digging into this.
> 
> > The above causes ext4_xattr_block_set() to enter an infinite loop about
> > "inserted" and cannot release the inode lock, ultimately leading to the
> > 143s blocking problem mentioned in [1].
> >
> > By using ext4_ext_remove_space() to delete the inserted logical block
> > and reclaim the physical block when inserting a new extent fails during
> > extent block mapping, residual extent data can be prevented from affecting
> > subsequent logical block physical mappings.
> >
> > [1]
> > INFO: task syz.0.17:5995 blocked for more than 143 seconds.
> > Call Trace:
> >  inode_lock_nested include/linux/fs.h:1073 [inline]
> >  __start_dirop fs/namei.c:2923 [inline]
> >  start_dirop fs/namei.c:2934 [inline]
> >
> > Reported-by: syzbot+512459401510e2a9a39f@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=1659aaaaa8d9d11265d7
> > Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
> > Reported-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=512459401510e2a9a39f
> > Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
> > Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> > ---
> ...
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index ae3804f36535..0bed3379f2d2 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -4458,19 +4458,13 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
> >  	if (IS_ERR(path)) {
> >  		err = PTR_ERR(path);
> >  		if (allocated_clusters) {
> > -			int fb_flags = 0;
> > -
> >  			/*
> >  			 * free data blocks we just allocated.
> >  			 * not a good idea to call discard here directly,
> >  			 * but otherwise we'd need to call it every free().
> >  			 */
> >  			ext4_discard_preallocations(inode);
> > -			if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
> > -				fb_flags = EXT4_FREE_BLOCKS_NO_QUOT_UPDATE;
> > -			ext4_free_blocks(handle, inode, NULL, newblock,
> > -					 EXT4_C2B(sbi, allocated_clusters),
> > -					 fb_flags);
> > +			ext4_ext_remove_space(inode, newex.ee_block, newex.ee_block);
> 
> So I'm concerned that if the metadata is corrupted, then trying to remove
> some extent space can do even more harm. Also in case
> EXT4_GET_BLOCKS_DELALLOC_RESERVE was passed, we now wrongly update quota
> information. So this definitely isn't a correct fix. What I'd do instead
> would be distinguishing two cases:
> 
> 1) The error is ENOSPC or EDQUOT - in this case the filesystem is fully
> consistent and we must maintain its consistency including all the
> accounting. However these errors can happen only early before we've
> inserted the extent into the extent tree. So current code works correctly
> for this case.
> 
> 2) Some other error - this means metadata is corrupted. We should strive to
> do as few modifications as possible to limit damage. So I'd just skip
> freeing of allocated blocks.
> 
> Long story short I think we should just modify the above condition:
> 
> 	if (allocated_clusters)
> 
> to
> 
> 	/*
> 	 * Gracefully handle out of space conditions. If the filesystem is
> 	 * inconsistent, we'll just leak allocated blocks to avoid causing
> 	 * even more damage.
> 	 */
> 	if (allocated_clusters && (err == -EDQUOT || err == -ENOSPC))
I have tested the modified code, and now it no longer deletes any data
for the -EFSCORRUPTED error (before applying my patch, it only deleted
the physical block and not the corresponding data in the extent tree).
This also prevents conflicts caused by data being reused by dir and
xattr.

I will release a new patch later to add the filtering you requested.

BR,
Edward
[PATCH v4] ext4: avoid infinite loops caused by residual data
Posted by Edward Adam Davis 1 month ago
On the mkdir/mknod path, when mapping logical blocks to physical blocks,
if inserting a new extent into the extent tree fails (in this example,
because the file system disabled the huge file feature when marking the
inode as dirty), ext4_ext_map_blocks() only calls ext4_free_blocks() to
reclaim the physical block without deleting the corresponding data in
the extent tree. This causes subsequent mkdir operations to reference
the previously reclaimed physical block number again, even though this
physical block is already being used by the xattr block. Therefore, a
situation arises where both the directory and xattr are using the same
buffer head block in memory simultaneously.

The above causes ext4_xattr_block_set() to enter an infinite loop about
"inserted" and cannot release the inode lock, ultimately leading to the
143s blocking problem mentioned in [1].

By using ext4_ext_remove_space() to delete the inserted logical block
and reclaim the physical block when inserting a new extent fails during
extent block mapping, either delete both as described above, or retain
the data completely (i.e., neither delete the physical block nor the
data in the extent tree), residual extent data can be prevented from
affecting subsequent logical block physical mappings. 

Besides the errors ENOSPC or EDQUOT, this means metadata is corrupted.
We should strive to do as few modifications as possible to limit damage.
So just skip freeing of allocated blocks.

[1]
INFO: task syz.0.17:5995 blocked for more than 143 seconds.
Call Trace:
 inode_lock_nested include/linux/fs.h:1073 [inline]
 __start_dirop fs/namei.c:2923 [inline]
 start_dirop fs/namei.c:2934 [inline]

Reported-by: syzbot+512459401510e2a9a39f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1659aaaaa8d9d11265d7
Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
Reported-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=512459401510e2a9a39f
Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
v1 -> v2: fix ci reported issues
v2 -> v3: new fix for removing residual data and update subject and coments
v3 -> v4: filtering already allocated blocks and update comments

 fs/ext4/extents.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ae3804f36535..56b7398a0b40 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4457,20 +4457,19 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	path = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
 	if (IS_ERR(path)) {
 		err = PTR_ERR(path);
-		if (allocated_clusters) {
-			int fb_flags = 0;
-
+		/*
+		 * Gracefully handle out of space conditions. If the filesystem
+		 * is inconsistent, we'll just leak allocated blocks to avoid
+		 * causing even more damage.
+		 */
+		if (allocated_clusters && (err == -EDQUOT || err == -ENOSPC)) {
 			/*
 			 * free data blocks we just allocated.
 			 * not a good idea to call discard here directly,
 			 * but otherwise we'd need to call it every free().
 			 */
 			ext4_discard_preallocations(inode);
-			if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
-				fb_flags = EXT4_FREE_BLOCKS_NO_QUOT_UPDATE;
-			ext4_free_blocks(handle, inode, NULL, newblock,
-					 EXT4_C2B(sbi, allocated_clusters),
-					 fb_flags);
+			ext4_ext_remove_space(inode, newex.ee_block, newex.ee_block);
 		}
 		goto out;
 	}
-- 
2.43.0
Re: [PATCH v4] ext4: avoid infinite loops caused by residual data
Posted by Jan Kara 1 month ago
On Thu 05-03-26 22:12:03, Edward Adam Davis wrote:
> On the mkdir/mknod path, when mapping logical blocks to physical blocks,
> if inserting a new extent into the extent tree fails (in this example,
> because the file system disabled the huge file feature when marking the
> inode as dirty), ext4_ext_map_blocks() only calls ext4_free_blocks() to
> reclaim the physical block without deleting the corresponding data in
> the extent tree. This causes subsequent mkdir operations to reference
> the previously reclaimed physical block number again, even though this
> physical block is already being used by the xattr block. Therefore, a
> situation arises where both the directory and xattr are using the same
> buffer head block in memory simultaneously.
> 
> The above causes ext4_xattr_block_set() to enter an infinite loop about
> "inserted" and cannot release the inode lock, ultimately leading to the
> 143s blocking problem mentioned in [1].
> 
> By using ext4_ext_remove_space() to delete the inserted logical block
> and reclaim the physical block when inserting a new extent fails during
> extent block mapping, either delete both as described above, or retain
> the data completely (i.e., neither delete the physical block nor the
> data in the extent tree), residual extent data can be prevented from
> affecting subsequent logical block physical mappings. 
> 
> Besides the errors ENOSPC or EDQUOT, this means metadata is corrupted.
> We should strive to do as few modifications as possible to limit damage.
> So just skip freeing of allocated blocks.
> 
> [1]
> INFO: task syz.0.17:5995 blocked for more than 143 seconds.
> Call Trace:
>  inode_lock_nested include/linux/fs.h:1073 [inline]
>  __start_dirop fs/namei.c:2923 [inline]
>  start_dirop fs/namei.c:2934 [inline]
> 
> Reported-by: syzbot+512459401510e2a9a39f@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=1659aaaaa8d9d11265d7
> Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
> Reported-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=512459401510e2a9a39f
> Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>

...

> @@ -4457,20 +4457,19 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
>  	path = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
>  	if (IS_ERR(path)) {
>  		err = PTR_ERR(path);
> -		if (allocated_clusters) {
> -			int fb_flags = 0;
> -
> +		/*
> +		 * Gracefully handle out of space conditions. If the filesystem
> +		 * is inconsistent, we'll just leak allocated blocks to avoid
> +		 * causing even more damage.
> +		 */
> +		if (allocated_clusters && (err == -EDQUOT || err == -ENOSPC)) {

This is fine now...

>  			/*
>  			 * free data blocks we just allocated.
>  			 * not a good idea to call discard here directly,
>  			 * but otherwise we'd need to call it every free().
>  			 */
>  			ext4_discard_preallocations(inode);
> -			if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
> -				fb_flags = EXT4_FREE_BLOCKS_NO_QUOT_UPDATE;
> -			ext4_free_blocks(handle, inode, NULL, newblock,
> -					 EXT4_C2B(sbi, allocated_clusters),
> -					 fb_flags);
> +			ext4_ext_remove_space(inode, newex.ee_block, newex.ee_block);

But this won't work because in case of errors like ENOSPC there's no extent
inserted into the tree so ext4_ext_remove_space() won't do anything. Just
don't touch the freeing code and things will be fine...

								Honza

>  		}
>  		goto out;
>  	}
> -- 
> 2.43.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
[PATCH v5] ext4: avoid infinite loops caused by residual data
Posted by Edward Adam Davis 1 month ago
On the mkdir/mknod path, when mapping logical blocks to physical blocks,
if inserting a new extent into the extent tree fails (in this example,
because the file system disabled the huge file feature when marking the
inode as dirty), ext4_ext_map_blocks() only calls ext4_free_blocks() to
reclaim the physical block without deleting the corresponding data in
the extent tree. This causes subsequent mkdir operations to reference
the previously reclaimed physical block number again, even though this
physical block is already being used by the xattr block. Therefore, a
situation arises where both the directory and xattr are using the same
buffer head block in memory simultaneously.

The above causes ext4_xattr_block_set() to enter an infinite loop about
"inserted" and cannot release the inode lock, ultimately leading to the
143s blocking problem mentioned in [1].

If the metadata is corrupted, then trying to remove some extent space
can do even more harm. Also in case EXT4_GET_BLOCKS_DELALLOC_RESERVE
was passed, remove space wrongly update quota information.
Jan Kara suggests distinguishing between two cases:

1) The error is ENOSPC or EDQUOT - in this case the filesystem is fully
consistent and we must maintain its consistency including all the
accounting. However these errors can happen only early before we've
inserted the extent into the extent tree. So current code works correctly
for this case.

2) Some other error - this means metadata is corrupted. We should strive to
do as few modifications as possible to limit damage. So I'd just skip
freeing of allocated blocks.

[1]
INFO: task syz.0.17:5995 blocked for more than 143 seconds.
Call Trace:
 inode_lock_nested include/linux/fs.h:1073 [inline]
 __start_dirop fs/namei.c:2923 [inline]
 start_dirop fs/namei.c:2934 [inline]

Reported-by: syzbot+512459401510e2a9a39f@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1659aaaaa8d9d11265d7
Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
Reported-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=512459401510e2a9a39f
Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
v1 -> v2: fix ci reported issues
v2 -> v3: new fix for removing residual data and update subject and coments
v3 -> v4: filtering already allocated blocks and update comments
v4 -> v5: don't touch corrupted data and update comments

 fs/ext4/extents.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ae3804f36535..4779da94f816 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4457,9 +4457,13 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	path = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
 	if (IS_ERR(path)) {
 		err = PTR_ERR(path);
-		if (allocated_clusters) {
+		/*
+		 * Gracefully handle out of space conditions. If the filesystem
+		 * is inconsistent, we'll just leak allocated blocks to avoid
+		 * causing even more damage.
+		 */
+		if (allocated_clusters && (err == -EDQUOT || err == -ENOSPC)) {
 			int fb_flags = 0;
-
 			/*
 			 * free data blocks we just allocated.
 			 * not a good idea to call discard here directly,
-- 
2.43.0
Re: [PATCH v5] ext4: avoid infinite loops caused by residual data
Posted by Theodore Ts'o 1 week, 6 days ago
On Fri, 06 Mar 2026 09:31:58 +0800, Edward Adam Davis wrote:
> On the mkdir/mknod path, when mapping logical blocks to physical blocks,
> if inserting a new extent into the extent tree fails (in this example,
> because the file system disabled the huge file feature when marking the
> inode as dirty), ext4_ext_map_blocks() only calls ext4_free_blocks() to
> reclaim the physical block without deleting the corresponding data in
> the extent tree. This causes subsequent mkdir operations to reference
> the previously reclaimed physical block number again, even though this
> physical block is already being used by the xattr block. Therefore, a
> situation arises where both the directory and xattr are using the same
> buffer head block in memory simultaneously.
> 
> [...]

Applied, thanks!

[1/1] ext4: avoid infinite loops caused by residual data
      commit: 86709d389530941e5816505e3c12c757ceca374d

Best regards,
-- 
Theodore Ts'o <tytso@mit.edu>
Re: [PATCH v5] ext4: avoid infinite loops caused by residual data
Posted by Jan Kara 1 month ago
On Fri 06-03-26 09:31:58, Edward Adam Davis wrote:
> On the mkdir/mknod path, when mapping logical blocks to physical blocks,
> if inserting a new extent into the extent tree fails (in this example,
> because the file system disabled the huge file feature when marking the
> inode as dirty), ext4_ext_map_blocks() only calls ext4_free_blocks() to
> reclaim the physical block without deleting the corresponding data in
> the extent tree. This causes subsequent mkdir operations to reference
> the previously reclaimed physical block number again, even though this
> physical block is already being used by the xattr block. Therefore, a
> situation arises where both the directory and xattr are using the same
> buffer head block in memory simultaneously.
> 
> The above causes ext4_xattr_block_set() to enter an infinite loop about
> "inserted" and cannot release the inode lock, ultimately leading to the
> 143s blocking problem mentioned in [1].
> 
> If the metadata is corrupted, then trying to remove some extent space
> can do even more harm. Also in case EXT4_GET_BLOCKS_DELALLOC_RESERVE
> was passed, remove space wrongly update quota information.
> Jan Kara suggests distinguishing between two cases:
> 
> 1) The error is ENOSPC or EDQUOT - in this case the filesystem is fully
> consistent and we must maintain its consistency including all the
> accounting. However these errors can happen only early before we've
> inserted the extent into the extent tree. So current code works correctly
> for this case.
> 
> 2) Some other error - this means metadata is corrupted. We should strive to
> do as few modifications as possible to limit damage. So I'd just skip
> freeing of allocated blocks.
> 
> [1]
> INFO: task syz.0.17:5995 blocked for more than 143 seconds.
> Call Trace:
>  inode_lock_nested include/linux/fs.h:1073 [inline]
>  __start_dirop fs/namei.c:2923 [inline]
>  start_dirop fs/namei.c:2934 [inline]
> 
> Reported-by: syzbot+512459401510e2a9a39f@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=1659aaaaa8d9d11265d7
> Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
> Reported-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=512459401510e2a9a39f
> Tested-by: syzbot+1659aaaaa8d9d11265d7@syzkaller.appspotmail.com
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>

Looks good to me! Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> ---
> v1 -> v2: fix ci reported issues
> v2 -> v3: new fix for removing residual data and update subject and coments
> v3 -> v4: filtering already allocated blocks and update comments
> v4 -> v5: don't touch corrupted data and update comments
> 
>  fs/ext4/extents.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index ae3804f36535..4779da94f816 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -4457,9 +4457,13 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
>  	path = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
>  	if (IS_ERR(path)) {
>  		err = PTR_ERR(path);
> -		if (allocated_clusters) {
> +		/*
> +		 * Gracefully handle out of space conditions. If the filesystem
> +		 * is inconsistent, we'll just leak allocated blocks to avoid
> +		 * causing even more damage.
> +		 */
> +		if (allocated_clusters && (err == -EDQUOT || err == -ENOSPC)) {
>  			int fb_flags = 0;
> -
>  			/*
>  			 * free data blocks we just allocated.
>  			 * not a good idea to call discard here directly,
> -- 
> 2.43.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR