ext4: fix out-of-bounds issue in ext4_xattr_set_entry

[PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

Posted by Qianqiang Liu 1 year, 4 months ago

syzbot has found an out-of-bounds issue in ext4_xattr_set_entry:

==================================================================
BUG: KASAN: out-of-bounds in ext4_xattr_set_entry+0x8ce/0x1f60 fs/ext4/xattr.c:1781
Read of size 18446744073709551572 at addr ffff888036426850 by task syz-executor264/5095

CPU: 0 UID: 0 PID: 5095 Comm: syz-executor264 Not tainted 6.11.0-syzkaller-03917-ga940d9a43e62 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:93 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
 __asan_memmove+0x29/0x70 mm/kasan/shadow.c:94
 ext4_xattr_set_entry+0x8ce/0x1f60 fs/ext4/xattr.c:1781
[...]
==================================================================

This issue is caused by a negative size in memmove.
We need to check for this.

Fixes: dec214d00e0d ("ext4: xattr inode deduplication")
Reported-by: syzbot+f792df426ff0f5ceb8d1@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f792df426ff0f5ceb8d1
Tested-by: syzbot+f792df426ff0f5ceb8d1@syzkaller.appspotmail.com
Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
---
 fs/ext4/xattr.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 46ce2f21fef9..336badb46246 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1776,7 +1776,14 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
 	} else if (s->not_found) {
 		/* Insert new name. */
 		size_t size = EXT4_XATTR_LEN(name_len);
-		size_t rest = (void *)last - (void *)here + sizeof(__u32);
+		size_t rest;
+
+		if (last < here) {
+			ret = -ENOSPC;
+			goto out;
+		} else {
+			rest = (void *)last - (void *)here + sizeof(__u32);
+		}
 
 		memmove((void *)here + size, here, rest);
 		memset(here, 0, size);
-- 
2.34.1

-- 
Best,
Qianqiang Liu

Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

Posted by Baokun Li 1 year, 4 months ago

On 2024/9/22 14:42, Qianqiang Liu wrote:
> syzbot has found an out-of-bounds issue in ext4_xattr_set_entry:
>
> ==================================================================
> BUG: KASAN: out-of-bounds in ext4_xattr_set_entry+0x8ce/0x1f60 fs/ext4/xattr.c:1781
> Read of size 18446744073709551572 at addr ffff888036426850 by task syz-executor264/5095
>
> CPU: 0 UID: 0 PID: 5095 Comm: syz-executor264 Not tainted 6.11.0-syzkaller-03917-ga940d9a43e62 #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> Call Trace:
>   <TASK>
>   __dump_stack lib/dump_stack.c:93 [inline]
>   dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
>   print_address_description mm/kasan/report.c:377 [inline]
>   print_report+0x169/0x550 mm/kasan/report.c:488
>   kasan_report+0x143/0x180 mm/kasan/report.c:601
>   kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
>   __asan_memmove+0x29/0x70 mm/kasan/shadow.c:94
>   ext4_xattr_set_entry+0x8ce/0x1f60 fs/ext4/xattr.c:1781
> [...]
> ==================================================================
>
> This issue is caused by a negative size in memmove.
> We need to check for this.
>
> Fixes: dec214d00e0d ("ext4: xattr inode deduplication")
> Reported-by: syzbot+f792df426ff0f5ceb8d1@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=f792df426ff0f5ceb8d1
> Tested-by: syzbot+f792df426ff0f5ceb8d1@syzkaller.appspotmail.com
> Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
> ---
>   fs/ext4/xattr.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 46ce2f21fef9..336badb46246 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -1776,7 +1776,14 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
>   	} else if (s->not_found) {
>   		/* Insert new name. */
>   		size_t size = EXT4_XATTR_LEN(name_len);
> -		size_t rest = (void *)last - (void *)here + sizeof(__u32);
> +		size_t rest;
> +
> +		if (last < here) {
> +			ret = -ENOSPC;
> +			goto out;
> +		} else {
> +			rest = (void *)last - (void *)here + sizeof(__u32);
> +		}
>   
>   		memmove((void *)here + size, here, rest);
>   		memset(here, 0, size);
This change just passes syzbot's test cases without fixing the real
problem.

The root cause of the problem is that the inode's xattr block is marked as
free in the block bitmap, so that block is allocated to the ea inode
resulting in the data in the xattr block being overwritten, and the last of
the second lookups changing resulting in out-of-bounds access.

The stack that triggers the problem is as follows:

// An inode with an xattr block of 33.
__ext4_mark_inode_dirty
  __ext4_expand_extra_isize
   ext4_expand_extra_isize_ea
    ext4_xattr_make_inode_space
     // Move xattr from inode to block
     ext4_xattr_move_to_block
      // Find out if the xattr exists in the block
      ext4_xattr_block_find
       // If xattr does not exist, here == last
       xattr_find_entry
      // Add a new xattr to the block
      ext4_xattr_block_set
       |// xattr is too long, needs an ea inode
       |ext4_xattr_inode_lookup_create
       | ext4_xattr_inode_create
       | ext4_xattr_inode_write
       |  ext4_map_blocks
       |   // xattr block 33 is assigned to the new ea inode
       |  memcpy(bh->b_data, buf, csize)
       |   // The value of xattr overwrites the data in the xattr block.
       |ext4_xattr_set_entry
        // Since the contents of the xattr block have changed,
        // now here == last does not hold, so it is possible to
        // have last < here and trigger an out-of-bounds access.

So I think we should probably add a helper function ext4_mb_block_inuse()
that checks if xattr block is free with the block bitmap in check_xattrs().

Or go one step further and add a mechanism like xfs Reverse-Mapping, which
makes sure that allocated blocks do point to the target inode, which could
replace the current block_validity, and could also be used in future online
fscks.

Ted, Honza, do you have any thoughts?

Regards,
Baokun

Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

Posted by Jan Kara 1 year, 4 months ago

On Tue 08-10-24 15:40:39, Baokun Li wrote:
> On 2024/9/22 14:42, Qianqiang Liu wrote:
> > syzbot has found an out-of-bounds issue in ext4_xattr_set_entry:
> > 
> > ==================================================================
> > BUG: KASAN: out-of-bounds in ext4_xattr_set_entry+0x8ce/0x1f60 fs/ext4/xattr.c:1781
> > Read of size 18446744073709551572 at addr ffff888036426850 by task syz-executor264/5095
> > 
> > CPU: 0 UID: 0 PID: 5095 Comm: syz-executor264 Not tainted 6.11.0-syzkaller-03917-ga940d9a43e62 #0
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> > Call Trace:
> >   <TASK>
> >   __dump_stack lib/dump_stack.c:93 [inline]
> >   dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
> >   print_address_description mm/kasan/report.c:377 [inline]
> >   print_report+0x169/0x550 mm/kasan/report.c:488
> >   kasan_report+0x143/0x180 mm/kasan/report.c:601
> >   kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
> >   __asan_memmove+0x29/0x70 mm/kasan/shadow.c:94
> >   ext4_xattr_set_entry+0x8ce/0x1f60 fs/ext4/xattr.c:1781
> > [...]
> > ==================================================================
> > 
> > This issue is caused by a negative size in memmove.
> > We need to check for this.
> > 
> > Fixes: dec214d00e0d ("ext4: xattr inode deduplication")
> > Reported-by: syzbot+f792df426ff0f5ceb8d1@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=f792df426ff0f5ceb8d1
> > Tested-by: syzbot+f792df426ff0f5ceb8d1@syzkaller.appspotmail.com
> > Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
> > ---
> >   fs/ext4/xattr.c | 9 ++++++++-
> >   1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> > index 46ce2f21fef9..336badb46246 100644
> > --- a/fs/ext4/xattr.c
> > +++ b/fs/ext4/xattr.c
> > @@ -1776,7 +1776,14 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
> >   	} else if (s->not_found) {
> >   		/* Insert new name. */
> >   		size_t size = EXT4_XATTR_LEN(name_len);
> > -		size_t rest = (void *)last - (void *)here + sizeof(__u32);
> > +		size_t rest;
> > +
> > +		if (last < here) {
> > +			ret = -ENOSPC;
> > +			goto out;
> > +		} else {
> > +			rest = (void *)last - (void *)here + sizeof(__u32);
> > +		}
> >   		memmove((void *)here + size, here, rest);
> >   		memset(here, 0, size);
> This change just passes syzbot's test cases without fixing the real
> problem.
> 
> The root cause of the problem is that the inode's xattr block is marked as
> free in the block bitmap, so that block is allocated to the ea inode
> resulting in the data in the xattr block being overwritten, and the last of
> the second lookups changing resulting in out-of-bounds access.
> 
> The stack that triggers the problem is as follows:
> 
> // An inode with an xattr block of 33.
> __ext4_mark_inode_dirty
>  __ext4_expand_extra_isize
>   ext4_expand_extra_isize_ea
>    ext4_xattr_make_inode_space
>     // Move xattr from inode to block
>     ext4_xattr_move_to_block
>      // Find out if the xattr exists in the block
>      ext4_xattr_block_find
>       // If xattr does not exist, here == last
>       xattr_find_entry
>      // Add a new xattr to the block
>      ext4_xattr_block_set
>       |// xattr is too long, needs an ea inode
>       |ext4_xattr_inode_lookup_create
>       | ext4_xattr_inode_create
>       | ext4_xattr_inode_write
>       |  ext4_map_blocks
>       |   // xattr block 33 is assigned to the new ea inode
>       |  memcpy(bh->b_data, buf, csize)
>       |   // The value of xattr overwrites the data in the xattr block.
>       |ext4_xattr_set_entry
>        // Since the contents of the xattr block have changed,
>        // now here == last does not hold, so it is possible to
>        // have last < here and trigger an out-of-bounds access.
> 
> So I think we should probably add a helper function ext4_mb_block_inuse()
> that checks if xattr block is free with the block bitmap in check_xattrs().

Well, even that would be a relatively narrow fix. You could have e.g.
file reference the xattr block as one of its data blocks and then corrupt
xattr contents at unfortunate moment. That will not get fixed by checking
whether the block is allocated. These multiply claimed blocks (as e2fsck
calls it) are very hard to detect inside the kernel.

> Or go one step further and add a mechanism like xfs Reverse-Mapping, which
> makes sure that allocated blocks do point to the target inode, which could
> replace the current block_validity, and could also be used in future online
> fscks.

Well, that is a rather big change. It requires significant on-disk format
change and also performance cost when to maintain. Furthermore for xattr
blocks which can be shared by many inodes it is not even clear how to
implement this... So I'm not sure we really want to do this either.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

Posted by Baokun Li 1 year, 4 months ago

On 2024/10/9 23:50, Jan Kara wrote:
> On Tue 08-10-24 15:40:39, Baokun Li wrote:
>> On 2024/9/22 14:42, Qianqiang Liu wrote:
>>> syzbot has found an out-of-bounds issue in ext4_xattr_set_entry:
>>>
>>> ==================================================================
>>> BUG: KASAN: out-of-bounds in ext4_xattr_set_entry+0x8ce/0x1f60 fs/ext4/xattr.c:1781
>>> Read of size 18446744073709551572 at addr ffff888036426850 by task syz-executor264/5095
>>>
>>> CPU: 0 UID: 0 PID: 5095 Comm: syz-executor264 Not tainted 6.11.0-syzkaller-03917-ga940d9a43e62 #0
>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
>>> Call Trace:
>>>    <TASK>
>>>    __dump_stack lib/dump_stack.c:93 [inline]
>>>    dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
>>>    print_address_description mm/kasan/report.c:377 [inline]
>>>    print_report+0x169/0x550 mm/kasan/report.c:488
>>>    kasan_report+0x143/0x180 mm/kasan/report.c:601
>>>    kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
>>>    __asan_memmove+0x29/0x70 mm/kasan/shadow.c:94
>>>    ext4_xattr_set_entry+0x8ce/0x1f60 fs/ext4/xattr.c:1781
>>> [...]
>>> ==================================================================
>>>
>>> This issue is caused by a negative size in memmove.
>>> We need to check for this.
>>>
>>> Fixes: dec214d00e0d ("ext4: xattr inode deduplication")
>>> Reported-by: syzbot+f792df426ff0f5ceb8d1@syzkaller.appspotmail.com
>>> Closes: https://syzkaller.appspot.com/bug?extid=f792df426ff0f5ceb8d1
>>> Tested-by: syzbot+f792df426ff0f5ceb8d1@syzkaller.appspotmail.com
>>> Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
>>> ---
>>>    fs/ext4/xattr.c | 9 ++++++++-
>>>    1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
>>> index 46ce2f21fef9..336badb46246 100644
>>> --- a/fs/ext4/xattr.c
>>> +++ b/fs/ext4/xattr.c
>>> @@ -1776,7 +1776,14 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
>>>    	} else if (s->not_found) {
>>>    		/* Insert new name. */
>>>    		size_t size = EXT4_XATTR_LEN(name_len);
>>> -		size_t rest = (void *)last - (void *)here + sizeof(__u32);
>>> +		size_t rest;
>>> +
>>> +		if (last < here) {
>>> +			ret = -ENOSPC;
>>> +			goto out;
>>> +		} else {
>>> +			rest = (void *)last - (void *)here + sizeof(__u32);
>>> +		}
>>>    		memmove((void *)here + size, here, rest);
>>>    		memset(here, 0, size);
>> This change just passes syzbot's test cases without fixing the real
>> problem.
>>
>> The root cause of the problem is that the inode's xattr block is marked as
>> free in the block bitmap, so that block is allocated to the ea inode
>> resulting in the data in the xattr block being overwritten, and the last of
>> the second lookups changing resulting in out-of-bounds access.
>>
>> The stack that triggers the problem is as follows:
>>
>> // An inode with an xattr block of 33.
>> __ext4_mark_inode_dirty
>>   __ext4_expand_extra_isize
>>    ext4_expand_extra_isize_ea
>>     ext4_xattr_make_inode_space
>>      // Move xattr from inode to block
>>      ext4_xattr_move_to_block
>>       // Find out if the xattr exists in the block
>>       ext4_xattr_block_find
>>        // If xattr does not exist, here == last
>>        xattr_find_entry
>>       // Add a new xattr to the block
>>       ext4_xattr_block_set
>>        |// xattr is too long, needs an ea inode
>>        |ext4_xattr_inode_lookup_create
>>        | ext4_xattr_inode_create
>>        | ext4_xattr_inode_write
>>        |  ext4_map_blocks
>>        |   // xattr block 33 is assigned to the new ea inode
>>        |  memcpy(bh->b_data, buf, csize)
>>        |   // The value of xattr overwrites the data in the xattr block.
>>        |ext4_xattr_set_entry
>>         // Since the contents of the xattr block have changed,
>>         // now here == last does not hold, so it is possible to
>>         // have last < here and trigger an out-of-bounds access.
>>
>> So I think we should probably add a helper function ext4_mb_block_inuse()
>> that checks if xattr block is free with the block bitmap in check_xattrs().
Hi Honza,

Thanks so much for your thoughts and feedback!
> Well, even that would be a relatively narrow fix. You could have e.g.
> file reference the xattr block as one of its data blocks and then corrupt
> xattr contents at unfortunate moment. That will not get fixed by checking
> whether the block is allocated. These multiply claimed blocks (as e2fsck
> calls it) are very hard to detect inside the kernel.
Yes, after locating the issue, the first thought was to just get the buffer
lock and check xattr magic and xattr block checksum. However, if the block
is allocated as an xattr block to another file, the issue may still occur.

Therefore we have to make sure that the block has been allocated to the
current file. With the block bitmap we can verify that the current block
is allocated, but as you pointed out we cannot verify that it is only
allocated to the current file.

That means we need some means to find the owner of the block by block,
and then I came up with xfs Reverse-Mapping.
>> Or go one step further and add a mechanism like xfs Reverse-Mapping, which
>> makes sure that allocated blocks do point to the target inode, which could
>> replace the current block_validity, and could also be used in future online
>> fscks.
> Well, that is a rather big change. It requires significant on-disk format
> change and also performance cost when to maintain. Furthermore for xattr
> blocks which can be shared by many inodes it is not even clear how to
> implement this... So I'm not sure we really want to do this either.
>
> 								Honza
Yes, there can be a lot of work involved.

  * Perhaps we could create an rmap file to store the rmap tree to avoid
    on-disk format changes.
  * The performance impact of maintaining rmap really needs to be evaluated,
    perhaps by writing a simple DEMO to test it.
  * XFS supports shared blocks(A.K.A. reflink.), so even if the physical
    blocks are the same, but the inodes are different or the logical blocks
    are different, they will be recorded multiple times in the tree. So the
    shared xattr block can be handled similarly.

We have plans to support online fsck in the future, and implementing rmap
is one of the steps. Perhaps one can wait until rmap is implemented to
assess whether it is worth a strict check here.

Implementing rmap may take some time, until then we can avoid the problem
as much as possible by checking the magic and xattr block csum.
Maybe something like this?

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 7647e9f6e190..cd3ae1e3371c 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1676,6 +1676,13 @@ static int ext4_xattr_set_entry(struct 
ext4_xattr_info *i,
                 }
         }

+       if (WARN_ON_ONCE(last < here)) {
+               EXT4_ERROR_INODE(inode, "corrupted xattr entries in %s",
+                                       is_block ? "block" : "ibody");
+               ret = -EFSCORRUPTED;
+               goto out;
+       }
+
         /* Check whether we have enough space. */
         if (i->value) {
                 size_t free;
@@ -1923,6 +1930,7 @@ ext4_xattr_block_set(handle_t *handle, struct 
inode *inode,
         }

         if (s->base) {
+               struct ext4_xattr_header *hdr;
                 int offset = (char *)s->here - bs->bh->b_data;

                 BUFFER_TRACE(bs->bh, "get_write_access");
@@ -1932,6 +1940,16 @@ ext4_xattr_block_set(handle_t *handle, struct 
inode *inode,
                         goto cleanup;

                 lock_buffer(bs->bh);
+               hdr = header(s->base);
+
+               if (hdr->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) ||
+                   (ext4_has_metadata_csum(inode->i_sb) &&
+                    (ext4_xattr_block_csum(inode, bs->bh->b_blocknr, 
hdr) !=
+                     hdr->h_checksum))) {
+                       unlock_buffer(bs->bh);
+                       error = -EFSCORRUPTED;
+                       goto bad_block;
+               }

                 if (header(s->base)->h_refcount == cpu_to_le32(1)) {
                         __u32 hash = le32_to_cpu(BHDR(bs->bh)->h_hash);

--
Thanks,
Baokun

Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

Posted by Jan Kara 1 year, 3 months ago

On Fri 11-10-24 10:18:04, Baokun Li wrote:
> On 2024/10/9 23:50, Jan Kara wrote:
> > > Or go one step further and add a mechanism like xfs Reverse-Mapping, which
> > > makes sure that allocated blocks do point to the target inode, which could
> > > replace the current block_validity, and could also be used in future online
> > > fscks.
> > Well, that is a rather big change. It requires significant on-disk format
> > change and also performance cost when to maintain. Furthermore for xattr
> > blocks which can be shared by many inodes it is not even clear how to
> > implement this... So I'm not sure we really want to do this either.
> 
> Yes, there can be a lot of work involved.
> 
>  * Perhaps we could create an rmap file to store the rmap tree to avoid
>    on-disk format changes.
>  * The performance impact of maintaining rmap really needs to be evaluated,
>    perhaps by writing a simple DEMO to test it.
>  * XFS supports shared blocks(A.K.A. reflink.), so even if the physical
>    blocks are the same, but the inodes are different or the logical blocks
>    are different, they will be recorded multiple times in the tree. So the
>    shared xattr block can be handled similarly.
> 
> We have plans to support online fsck in the future, and implementing rmap
> is one of the steps. Perhaps one can wait until rmap is implemented to
> assess whether it is worth a strict check here.

Yes, we could implement something like this be as you wrote, it's going to
be a lot of work. We've briefly discussed this with Ted on ext4 call and we
came to a conclusion that this is a type of corruption ext4 may never
protect agaist. You simply should not mount arbitrarily corrupted
filesystems... But if you want to try, sure go ahead :)

One relatively easy solution to similar class of problems would be to store
the type of metadata buffer inside the buffer_head when we are verifying
checksum, clear the info when freeing the block in __ext4_forget(), and
fail with EFSCORRUPTED error when one type -> another type transition would
happen.

> Implementing rmap may take some time, until then we can avoid the problem
> as much as possible by checking the magic and xattr block csum.
> Maybe something like this?
> 
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 7647e9f6e190..cd3ae1e3371c 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -1676,6 +1676,13 @@ static int ext4_xattr_set_entry(struct
> ext4_xattr_info *i,
>                 }
>         }
> 
> +       if (WARN_ON_ONCE(last < here)) {
> +               EXT4_ERROR_INODE(inode, "corrupted xattr entries in %s",
> +                                       is_block ? "block" : "ibody");
> +               ret = -EFSCORRUPTED;
> +               goto out;
> +       }
> +
>         /* Check whether we have enough space. */
>         if (i->value) {
>                 size_t free;
> @@ -1923,6 +1930,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode
> *inode,
>         }
> 
>         if (s->base) {
> +               struct ext4_xattr_header *hdr;
>                 int offset = (char *)s->here - bs->bh->b_data;
> 
>                 BUFFER_TRACE(bs->bh, "get_write_access");
> @@ -1932,6 +1940,16 @@ ext4_xattr_block_set(handle_t *handle, struct inode
> *inode,
>                         goto cleanup;
> 
>                 lock_buffer(bs->bh);
> +               hdr = header(s->base);
> +
> +               if (hdr->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) ||
> +                   (ext4_has_metadata_csum(inode->i_sb) &&
> +                    (ext4_xattr_block_csum(inode, bs->bh->b_blocknr, hdr)
> !=
> +                     hdr->h_checksum))) {
> +                       unlock_buffer(bs->bh);
> +                       error = -EFSCORRUPTED;
> +                       goto bad_block;
> +               }
> 
>                 if (header(s->base)->h_refcount == cpu_to_le32(1)) {
>                         __u32 hash = le32_to_cpu(BHDR(bs->bh)->h_hash);

Hum, there are more places in xattr code that access a buffer that could
have been modified. So why do you add check into this place? Is it somehow
special?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

Posted by Baokun Li 1 year, 3 months ago

On 2024/10/15 0:31, Jan Kara wrote:
> On Fri 11-10-24 10:18:04, Baokun Li wrote:
>> On 2024/10/9 23:50, Jan Kara wrote:
>>>> Or go one step further and add a mechanism like xfs Reverse-Mapping, which
>>>> makes sure that allocated blocks do point to the target inode, which could
>>>> replace the current block_validity, and could also be used in future online
>>>> fscks.
>>> Well, that is a rather big change. It requires significant on-disk format
>>> change and also performance cost when to maintain. Furthermore for xattr
>>> blocks which can be shared by many inodes it is not even clear how to
>>> implement this... So I'm not sure we really want to do this either.
>> Yes, there can be a lot of work involved.
>>
>>   * Perhaps we could create an rmap file to store the rmap tree to avoid
>>     on-disk format changes.
>>   * The performance impact of maintaining rmap really needs to be evaluated,
>>     perhaps by writing a simple DEMO to test it.
>>   * XFS supports shared blocks(A.K.A. reflink.), so even if the physical
>>     blocks are the same, but the inodes are different or the logical blocks
>>     are different, they will be recorded multiple times in the tree. So the
>>     shared xattr block can be handled similarly.
>>
>> We have plans to support online fsck in the future, and implementing rmap
>> is one of the steps. Perhaps one can wait until rmap is implemented to
>> assess whether it is worth a strict check here.
> Yes, we could implement something like this be as you wrote, it's going to
> be a lot of work. We've briefly discussed this with Ted on ext4 call and we
> came to a conclusion that this is a type of corruption ext4 may never
> protect agaist. You simply should not mount arbitrarily corrupted
> filesystems...

Thank you for discussing this with Ted.

This kind of problem is really tricky.

>   But if you want to try, sure go ahead :)
As server clusters get larger and larger, server maintenance becomes very
difficult. Therefore, timely detection of problems (i.e. online scanning,
similar to e2fsck -fn) and timely and non-stop fixing of problems (i.e.
online fsck, similar to e2fsck -a) have always been the requirements of
our customers. Thus online fsck has been on our TODO list, and it's really
time to start doing it. 😀
> One relatively easy solution to similar class of problems would be to store
> the type of metadata buffer inside the buffer_head when we are verifying
> checksum, clear the info when freeing the block in __ext4_forget(), and
> fail with EFSCORRUPTED error when one type -> another type transition would
> happen.
This solution looks great. If we save checksum further, we can sense not
only the change of block type, but also the change of block data.

But now journal_head takes up bh->b_private, so to record information,
we need to add new fields to buffer_head (e.g. b_info), or modify the logic
of journal_head (e.g. make bh->b_private hold ext4_bufdata, and include the
journal_head in ext4_bufdata. ocfs2 needs a similar adaptation).
>> Implementing rmap may take some time, until then we can avoid the problem
>> as much as possible by checking the magic and xattr block csum.
>> Maybe something like this?
>>
>> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
>> index 7647e9f6e190..cd3ae1e3371c 100644
>> --- a/fs/ext4/xattr.c
>> +++ b/fs/ext4/xattr.c
>> @@ -1676,6 +1676,13 @@ static int ext4_xattr_set_entry(struct
>> ext4_xattr_info *i,
>>                  }
>>          }
>>
>> +       if (WARN_ON_ONCE(last < here)) {
>> +               EXT4_ERROR_INODE(inode, "corrupted xattr entries in %s",
>> +                                       is_block ? "block" : "ibody");
>> +               ret = -EFSCORRUPTED;
>> +               goto out;
>> +       }
>> +
>>          /* Check whether we have enough space. */
>>          if (i->value) {
>>                  size_t free;
>> @@ -1923,6 +1930,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode
>> *inode,
>>          }
>>
>>          if (s->base) {
>> +               struct ext4_xattr_header *hdr;
>>                  int offset = (char *)s->here - bs->bh->b_data;
>>
>>                  BUFFER_TRACE(bs->bh, "get_write_access");
>> @@ -1932,6 +1940,16 @@ ext4_xattr_block_set(handle_t *handle, struct inode
>> *inode,
>>                          goto cleanup;
>>
>>                  lock_buffer(bs->bh);
>> +               hdr = header(s->base);
>> +
>> +               if (hdr->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) ||
>> +                   (ext4_has_metadata_csum(inode->i_sb) &&
>> +                    (ext4_xattr_block_csum(inode, bs->bh->b_blocknr, hdr)
>> !=
>> +                     hdr->h_checksum))) {
>> +                       unlock_buffer(bs->bh);
>> +                       error = -EFSCORRUPTED;
>> +                       goto bad_block;
>> +               }
>>
>>                  if (header(s->base)->h_refcount == cpu_to_le32(1)) {
>>                          __u32 hash = le32_to_cpu(BHDR(bs->bh)->h_hash);
> Hum, there are more places in xattr code that access a buffer that could
> have been modified. So why do you add check into this place? Is it somehow
> special?
>
> 								Honza

The out-of-bounds access occurs here because the last dentry obtained
in ext4_xattr_set_entry() is not the same as the last dentry obtained
in ext4_xattr_block_find().

When we modify an xattr, we always hold the inode lock, so the ibody
case is not considered. Check_xattrs() is called to do a full check
when looking up an xattr, so it's not necessary to consider that either.

When inserting an xattr into an xattr block, there are three cases:

  * xattr block is newly allocated, and the block is allocated only
    after the entry is set in memory.
  * xattr block is unshared, insert the new xattr directly.
  * xattr block is shared, copy the xattr block and insert the new xattr.

Only in the last two cases can a multiply claimed xattr block result in
out-of-bounds access, so the buffer lock is held to verify that the data
is correct.
(It looks like kmemdup should be moved to the buffer lock here as well.🤔)

Thanks again!

Regards,
Baokun

Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

Posted by Theodore Ts'o 1 year, 3 months ago

On Wed, Oct 16, 2024 at 04:02:40PM +0800, Baokun Li wrote:
> As server clusters get larger and larger, server maintenance becomes very
> difficult. Therefore, timely detection of problems (i.e. online scanning,
> similar to e2fsck -fn) and timely and non-stop fixing of problems (i.e.
> online fsck, similar to e2fsck -a) have always been the requirements of
> our customers. Thus online fsck has been on our TODO list, and it's really
> time to start doing it. 😀

As far as online scaning is concerned, if you are using LVM, we can
use a combination of dm-snapshot and e2fsck -fn --- that is what the
e2scrub command automates.

Online fsck is much harder, since it would require back pointers to do
this efficienctly.  To do this, a general way of solving this would
involve a generalized some kind of b-tree or b+tree where changes are
managed via jbd2.  This could be used so that (for example) if we had
a tree which maps block ranges to an inode number, then given a block
number, we can figure out which inode "owns" that block.  The harder
part is those objects that have multiple forward pointers --- for
example an inode might have multiple hard links to multiple
directories, so we need to handle this somehow.

If we had the jbd2-aware b+tree, we could also use this add support
for reflink/clone, which would also be cool.

If this is something that your team really weants to work on, what I'd
suggest is to create a rough design of what the journaled b+tree would
look like, and then implement it first, since this is the prerequisite
for a huge number of advanced file system features.  Implementation
should be done in a way that makes it easy for the code to be usable
both in the kernel and in e2fsprogs, since life will be much easier if
we have e2fsck and debugfs support for the new file system data
structures from the very beginning of the development.

If your company is willing to invest in the engineering effort to do
this, great!  But I have to point out that an alternative approach
that you should consider is whether XFS might be a closer match for
some of your customers' needs.  The advantage of ext4 is that it is
much simpler and easier to understand that XFS.  But as we add these
new features, ext4 will get more complex.  And so one of the design
considerations we should keep in mind is to keep ext4 as simple and
miantainable as possible, even as we add new functionality.

Cheers,

						- Ted

Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

Posted by Baokun Li 1 year, 3 months ago

On 2024/10/17 4:47, Theodore Ts'o wrote:
> On Wed, Oct 16, 2024 at 04:02:40PM +0800, Baokun Li wrote:
>> As server clusters get larger and larger, server maintenance becomes very
>> difficult. Therefore, timely detection of problems (i.e. online scanning,
>> similar to e2fsck -fn) and timely and non-stop fixing of problems (i.e.
>> online fsck, similar to e2fsck -a) have always been the requirements of
>> our customers. Thus online fsck has been on our TODO list, and it's really
>> time to start doing it. 😀
> As far as online scaning is concerned, if you are using LVM, we can
> use a combination of dm-snapshot and e2fsck -fn --- that is what the
> e2scrub command automates.
Yes, e2scrub is very nice, but it has too many limitations. We have some
customers who don't use lvm.
> Online fsck is much harder, since it would require back pointers to do
> this efficienctly.
Indeed, our rough plan is to first implement isolation of abnormal file
system resources, so that the system can continue to run normally even
when there is an error; then implement online scanning, so that the
maintainer can see the health report at any time; and finally implement
the most difficult online repair.
>   To do this, a general way of solving this would
> involve a generalized some kind of b-tree or b+tree where changes are
> managed via jbd2.  This could be used so that (for example) if we had
> a tree which maps block ranges to an inode number, then given a block
> number, we can figure out which inode "owns" that block.  The harder
> part is those objects that have multiple forward pointers --- for
> example an inode might have multiple hard links to multiple
> directories, so we need to handle this somehow.
We do need to establish the mapping of physical blocks to inodes and inodes
to parent dir. By tree managed by jbd2 do you mean updating the tree when
committing to journal? Or are updates to the tree logged to journal?
>
> If we had the jbd2-aware b+tree, we could also use this add support
> for reflink/clone, which would also be cool.
Yeah, reflink is pretty cool, we can try it out when the others are done.
>
> If this is something that your team really weants to work on, what I'd
> suggest is to create a rough design of what the journaled b+tree would
> look like, and then implement it first, since this is the prerequisite
> for a huge number of advanced file system features.  Implementation
> should be done in a way that makes it easy for the code to be usable
> both in the kernel and in e2fsprogs, since life will be much easier if
> we have e2fsck and debugfs support for the new file system data
> structures from the very beginning of the development.
Thank you for your suggestion! This is really key to the development. We'll
discuss the overall design internally before consulting the community.
> If your company is willing to invest in the engineering effort to do
> this, great!  But I have to point out that an alternative approach
> that you should consider is whether XFS might be a closer match for
> some of your customers' needs.  The advantage of ext4 is that it is
> much simpler and easier to understand that XFS.
The XFS maintainability enhancement is something my colleague is working
on. But we have a fair number of downstream customers who prefer ext4, so
it's worth investing the time to do that.
> But as we add these
> new features, ext4 will get more complex.  And so one of the design
> considerations we should keep in mind is to keep ext4 as simple and
> miantainable as possible, even as we add new functionality.
>
> Cheers,
>
> 						- Ted
Of course! we will keep the code as simple and maintainable as possible.

Thanks again for your input! 😉

Cheers,
Baokun

Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

Posted by Theodore Ts'o 1 year, 3 months ago

On Thu, Oct 17, 2024 at 08:42:59PM +0800, Baokun Li wrote:
> Indeed, our rough plan is to first implement isolation of abnormal file
> system resources, so that the system can continue to run normally even
> when there is an error; then implement online scanning, so that the
> maintainer can see the health report at any time; and finally implement
> the most difficult online repair.

We have some of this already; if a block group has obvious
inconsistencies --- for example, if there is an attempt to mark a
block or inode as free, but it's already marked as free as the
allocation bitmap, we can mark the block group as inconsistent, and
then avoid allocating from the block group.  That's easy because it's
the kind of inconsistency which can be detected locally.

The problem comes with those inconsistencies which require a global
examination of the file system data structures.  For example, is the
refcount of an inode correct?  Or is a block claimed by more than one
inode?  The e2scrub approach requires creating a read-only snapshot
(which is why we need LVM) and then running e2fsck in userspace,
because it does a global examination of all file system data
structures.

> We do need to establish the mapping of physical blocks to inodes and
> inodes to parent dir. By tree managed by jbd2 do you mean updating
> the tree when committing to journal? Or are updates to the tree
> logged to journal?

When we allocate a block, we need to journal the changes to the
allocation bitmap.  If we are going to also update the reverse mapping
data structure, that needs to be journalled also, so that after a
crash, the data structures are consistent.

						- Ted

Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

Posted by Baokun Li 1 year, 3 months ago

On 2024/10/17 22:47, Theodore Ts'o wrote:
> On Thu, Oct 17, 2024 at 08:42:59PM +0800, Baokun Li wrote:
>> Indeed, our rough plan is to first implement isolation of abnormal file
>> system resources, so that the system can continue to run normally even
>> when there is an error; then implement online scanning, so that the
>> maintainer can see the health report at any time; and finally implement
>> the most difficult online repair.
> We have some of this already; if a block group has obvious
> inconsistencies --- for example, if there is an attempt to mark a
> block or inode as free, but it's already marked as free as the
> allocation bitmap, we can mark the block group as inconsistent, and
> then avoid allocating from the block group.  That's easy because it's
> the kind of inconsistency which can be detected locally.
Yes, there is now block group level isolation. Our goal is to further
reduce the scope of isolation to minimise the impact of isolation.

The rough idea is to isolate resources where errors are reported,
and throw errors when isolation is not possible. This may be a bit
crude, but after implementing inline scanning we can achieve precise
fine-grained isolation.
> The problem comes with those inconsistencies which require a global
> examination of the file system data structures.  For example, is the
> refcount of an inode correct?  Or is a block claimed by more than one
> inode?  The e2scrub approach requires creating a read-only snapshot
> (which is why we need LVM) and then running e2fsck in userspace,
> because it does a global examination of all file system data
> structures.
Indeed, consistency is a tricky issue, and we'll focus on that piece of
logic.
>> We do need to establish the mapping of physical blocks to inodes and
>> inodes to parent dir. By tree managed by jbd2 do you mean updating
>> the tree when committing to journal? Or are updates to the tree
>> logged to journal?
> When we allocate a block, we need to journal the changes to the
> allocation bitmap.  If we are going to also update the reverse mapping
> data structure, that needs to be journalled also, so that after a
> crash, the data structures are consistent.
>
> 						- Ted
>
Of course, we have to make sure that the metadata modification and the tree
update are in the same transaction, otherwise there is no guarantee that
the metadata is consistent.

Thank you for your input!

Regards,
Baokun

Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

Posted by Ojaswin Mujoo 1 year, 4 months ago

On Sun, Sep 22, 2024 at 02:42:49PM +0800, Qianqiang Liu wrote:
> syzbot has found an out-of-bounds issue in ext4_xattr_set_entry:
> 
> ==================================================================
> BUG: KASAN: out-of-bounds in ext4_xattr_set_entry+0x8ce/0x1f60 fs/ext4/xattr.c:1781
> Read of size 18446744073709551572 at addr ffff888036426850 by task syz-executor264/5095
> 
> CPU: 0 UID: 0 PID: 5095 Comm: syz-executor264 Not tainted 6.11.0-syzkaller-03917-ga940d9a43e62 #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:93 [inline]
>  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
>  print_address_description mm/kasan/report.c:377 [inline]
>  print_report+0x169/0x550 mm/kasan/report.c:488
>  kasan_report+0x143/0x180 mm/kasan/report.c:601
>  kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
>  __asan_memmove+0x29/0x70 mm/kasan/shadow.c:94
>  ext4_xattr_set_entry+0x8ce/0x1f60 fs/ext4/xattr.c:1781
> [...]
> ==================================================================
> 
> This issue is caused by a negative size in memmove.
> We need to check for this.
> 
> Fixes: dec214d00e0d ("ext4: xattr inode deduplication")
> Reported-by: syzbot+f792df426ff0f5ceb8d1@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=f792df426ff0f5ceb8d1
> Tested-by: syzbot+f792df426ff0f5ceb8d1@syzkaller.appspotmail.com
> Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
> ---
>  fs/ext4/xattr.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
> index 46ce2f21fef9..336badb46246 100644
> --- a/fs/ext4/xattr.c
> +++ b/fs/ext4/xattr.c
> @@ -1776,7 +1776,14 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
>  	} else if (s->not_found) {
>  		/* Insert new name. */
>  		size_t size = EXT4_XATTR_LEN(name_len);
> -		size_t rest = (void *)last - (void *)here + sizeof(__u32);
> +		size_t rest;
> +
> +		if (last < here) {
> +			ret = -ENOSPC;
> +			goto out;
> +		} else {
> +			rest = (void *)last - (void *)here + sizeof(__u32);
> +		}

Hey Qianqiang,

Thanks for the patch. I'm still reviewing this codepath but I do have
some questions around the patch. So I understand that xattrs are
arranged in the following format:

 *   +------------------+
 *   | header           |
 *   | entry 1          | 
 *   | entry 2          | 
 *   | entry 3          | 
 *   | four null bytes  | <--- last
 *   | . . .            | 
 *   | . . .            | <--- here
 *   | . . .            | 
 *   | value 1          | 
 *   | value 3          | 
 *   | value 2          | 
 *   +------------------+

Now, in this error, my understanding is that we are actually ending up
in a case where "here" ie the place where the new xattr entry will go is
beyond the "last" ie the last slot for xattr entry and that is causing
an underflow, something like the above diagram.

My only concern is that why were we not able to detect this in the logic
near the start of the function where we explcity check if we have enough
space? 

Perhaps we should be fixing the logic in that if {..} instead
since the comment a few lines above your fix:

	/* No failures allowed past this point. */

does suggest that we can't error out below that point, so ideally all
the checks would have been done before that.

I'm still going through the issue, will update here if needed.

Regards,
ojaswin

>  
>  		memmove((void *)here + size, here, rest);
>  		memset(here, 0, size);
> -- 
> 2.34.1
> 
> -- 
> Best,
> Qianqiang Liu
>

Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

Posted by Qianqiang Liu 1 year, 4 months ago

Hi Ojaswin,

On Tue, Oct 01, 2024 at 03:11:44PM +0530, Ojaswin Mujoo wrote:
> 
> Hey Qianqiang,
> 
> Thanks for the patch. I'm still reviewing this codepath but I do have
> some questions around the patch. So I understand that xattrs are
> arranged in the following format:
> 
>  *   +------------------+
>  *   | header           |
>  *   | entry 1          | 
>  *   | entry 2          | 
>  *   | entry 3          | 
>  *   | four null bytes  | <--- last
>  *   | . . .            | 
>  *   | . . .            | <--- here
>  *   | . . .            | 
>  *   | value 1          | 
>  *   | value 3          | 
>  *   | value 2          | 
>  *   +------------------+
> 
> Now, in this error, my understanding is that we are actually ending up
> in a case where "here" ie the place where the new xattr entry will go is
> beyond the "last" ie the last slot for xattr entry and that is causing
> an underflow, something like the above diagram.
> 
> My only concern is that why were we not able to detect this in the logic
> near the start of the function where we explcity check if we have enough
> space? 
> 
> Perhaps we should be fixing the logic in that if {..} instead
> since the comment a few lines above your fix:
> 
> 	/* No failures allowed past this point. */
> 
> does suggest that we can't error out below that point, so ideally all
> the checks would have been done before that.
> 
> I'm still going through the issue, will update here if needed.
> 
> Regards,
> ojaswin
> 

I reviewed the codepath, and here is the backtrace when the error occurs:

=> vfs_unlink
=> ext4_unlink
=> __ext4_unlink
=> __ext4_mark_inode_dirty
=> ext4_try_to_expand_extra_isize
=> __ext4_expand_extra_isize
=> ext4_expand_extra_isize_ea
=> ext4_xattr_make_inode_space
=> ext4_xattr_move_to_block -> ext4_xattr_block_find -> xattr_find_entry
=> ext4_xattr_block_set
=> ext4_xattr_set_entry
=> memmove((void *)here + size, here, rest);

The xattr_find_entry function return -ENODATA, but beacuase of the
following code, the error does not be returned to caller:

static int
ext4_xattr_block_find(struct inode *inode, struct ext4_xattr_info *i,
		      struct ext4_xattr_block_find *bs)
{
	...
	error = xattr_find_entry(inode, &bs->s.here, bs->s.end,
				 i->name_index, i->name, 1);
	if (error && error != -ENODATA)
		return error;
	...
}

So, perhaps we could modify the code as follows:

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index e0e1956dcdd3..649b278d4c1f 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1884,7 +1884,7 @@ ext4_xattr_block_find(struct inode *inode, struct ext4_xattr_info *i,
 		bs->s.here = bs->s.first;
 		error = xattr_find_entry(inode, &bs->s.here, bs->s.end,
 					 i->name_index, i->name, 1);
-		if (error && error != -ENODATA)
+		if (error)
 			return error;
 		bs->s.not_found = error;
 	}

Or, we could check if s->not_found is -ENODATA in the ext4_xattr_set_entry function.

Any suggestions?

-- 
Best,
Qianqiang Liu

Re: [PATCH] ext4: fix out-of-bounds issue in ext4_xattr_set_entry

Posted by Qianqiang Liu 1 year, 4 months ago

Hi Ojaswin,

On Tue, Oct 01, 2024 at 03:11:44PM +0530, Ojaswin Mujoo wrote:
> 
> Hey Qianqiang,
> 
> Thanks for the patch. I'm still reviewing this codepath but I do have
> some questions around the patch. So I understand that xattrs are
> arranged in the following format:
> 
>  *   +------------------+
>  *   | header           |
>  *   | entry 1          | 
>  *   | entry 2          | 
>  *   | entry 3          | 
>  *   | four null bytes  | <--- last
>  *   | . . .            | 
>  *   | . . .            | <--- here
>  *   | . . .            | 
>  *   | value 1          | 
>  *   | value 3          | 
>  *   | value 2          | 
>  *   +------------------+
> 
> Now, in this error, my understanding is that we are actually ending up
> in a case where "here" ie the place where the new xattr entry will go is
> beyond the "last" ie the last slot for xattr entry and that is causing
> an underflow, something like the above diagram.
> 
> My only concern is that why were we not able to detect this in the logic
> near the start of the function where we explcity check if we have enough
> space? 
> 
> Perhaps we should be fixing the logic in that if {..} instead
> since the comment a few lines above your fix:
> 
> 	/* No failures allowed past this point. */
> 
> does suggest that we can't error out below that point, so ideally all
> the checks would have been done before that.
> 
> I'm still going through the issue, will update here if needed.
> 
> Regards,
> ojaswin
> 

Thank you for your suggestions.
I will investigate this issue further. If there are any updates, I will
let you know.

-- 
Best,
Qianqiang Liu