[BUG]
KASAN reports show out-of-bounds and use-after-free memory accesses when
ext4_xattr_set_entry() processes corrupted on-disk xattr entries:
BUG: KASAN: slab-out-of-bounds in ext4_xattr_set_entry+0xfc2/0x1f40 fs/ext4/xattr.c:1735
Write of size 12 at addr ffff88801a249af4 [slab OOB]
Call Trace:
...
ext4_xattr_set_entry+0xfc2/0x1f40 fs/ext4/xattr.c:1735
ext4_xattr_ibody_set+0x396/0x5a0 fs/ext4/xattr.c:2268
ext4_destroy_inline_data_nolock+0x25e/0x560 fs/ext4/inline.c:463
ext4_convert_inline_data_nolock+0x186/0xa80 fs/ext4/inline.c:1105
ext4_try_add_inline_entry+0x58e/0x960 fs/ext4/inline.c:1224
ext4_add_entry+0x6d2/0xce0 fs/ext4/namei.c:2389
ext4_rename+0x133c/0x2490 fs/ext4/namei.c:3929
ext4_rename2+0x1de/0x2c0 fs/ext4/namei.c:4208
vfs_rename+0xd42/0x1d50 fs/namei.c:5216
do_renameat2+0x715/0xb60 fs/namei.c:5364
...
BUG: KASAN: use-after-free in ext4_xattr_set_entry+0xfd3/0x1f40 fs/ext4/xattr.c:1736
Write of size 65796 at addr ffff88802feb6ee8 [UAF across page boundary]
[CAUSE]
During inode load, xattr_check_inode() validates the ibody xattr entries
found in the inode at that time, and ext4_read_inode_extra() sets
EXT4_STATE_XATTR after the validation succeeds.
Later, when updating an ibody xattr, ext4_xattr_ibody_find() does not rely
on those already validated contents. It calls ext4_get_inode_loc() and
reads the inode table block again, so the entry eventually passed to
ext4_xattr_set_entry() comes from a new on-disk read. xattr_find_entry()
may return that entry based on its name, but does not revalidate its
e_value_offs and e_value_size before they are dereferenced.
Therefore, if the inode table block is modified between inode load and the
later xattr update, the code ends up validating one version of the xattr
data and using another. ext4_xattr_set_entry() may then consume corrupted
e_value_offs/e_value_size fields from the newly read entry, which can cause
out-of-bounds accesses, size_t underflow, and use-after-free.
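The validate-once, use-a-later-read problem described above can be illustrated with a small userspace model. This is only a sketch: `struct toy_entry`, `toy_entry_valid()`, `toy_read()`, `toy_stale_validation()` and the 64-byte region are hypothetical stand-ins, not ext4 code, and a single struct plays the role of the inode table block on the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for an ibody xattr entry. */
struct toy_entry {
	uint16_t value_offs;	/* models e_value_offs */
	uint32_t value_size;	/* models e_value_size */
};

/* Hypothetical size of the region the value offset is relative to. */
#define TOY_REGION_SIZE 64u

/* Range check in the spirit of check_xattrs(): the value must fit the region. */
static bool toy_entry_valid(const struct toy_entry *e)
{
	return e->value_offs <= TOY_REGION_SIZE &&
	       e->value_size <= TOY_REGION_SIZE - e->value_offs;
}

/*
 * Models a fresh read of the block (the role ext4_get_inode_loc() plays
 * in ext4_xattr_ibody_find()): it returns whatever the device holds now.
 */
static struct toy_entry toy_read(const struct toy_entry *device)
{
	return *device;
}

/*
 * Validate one read, let the device change, then read again. Returns true
 * when the load-time result no longer describes the entry that would be used.
 */
static bool toy_stale_validation(struct toy_entry *device, uint32_t new_size)
{
	struct toy_entry at_load = toy_read(device);
	bool validated = toy_entry_valid(&at_load);	/* like EXT4_STATE_XATTR */

	assert(validated);		/* the model starts from a valid entry */
	device->value_size = new_size;	/* block modified after inode load */

	struct toy_entry at_update = toy_read(device);
	return validated && !toy_entry_valid(&at_update);
}
```

The model only shows that a flag derived from one read says nothing about a later read of the same block; it does not model ext4's buffer cache.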
[FIX]
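Fix this by validating the target entry's value offset and size in
ext4_xattr_set_entry() before using them. The scan that computes
min_offs and last is moved ahead of the same-padded-size fast path,
because the new check needs last to bound the entry table. Reject
invalid entries with -EFSCORRUPTED, consistent with the checks already
enforced by check_xattrs() for ibody xattrs.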
Fixes: dec214d00e0d7 ("ext4: xattr inode deduplication")
Cc: stable@vger.kernel.org
Signed-off-by: ZhengYuan Huang <gality369@gmail.com>
---
fs/ext4/xattr.c | 58 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 42 insertions(+), 16 deletions(-)
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index ce7253b3f549..3ebfe2dfcae9 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1638,6 +1638,48 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
 			EXT4_XATTR_SIZE(le32_to_cpu(here->e_value_size)) : 0;
 	new_size = (i->value && !in_inode) ? EXT4_XATTR_SIZE(i->value_len) : 0;
 
+	/* Compute min_offs and last. */
+	last = s->first;
+	for (; !IS_LAST_ENTRY(last); last = next) {
+		next = EXT4_XATTR_NEXT(last);
+		if ((void *)next >= s->end) {
+			EXT4_ERROR_INODE(inode, "corrupted xattr entries");
+			ret = -EFSCORRUPTED;
+			goto out;
+		}
+		if (!last->e_value_inum && last->e_value_size) {
+			size_t offs = le16_to_cpu(last->e_value_offs);
+
+			if (offs < min_offs)
+				min_offs = offs;
+		}
+	}
+
+	/*
+	 * Validate the value range before dereferencing e_value_offs / e_value_size.
+	 * This mirrors check_xattrs() for the entry we are about to touch.
+	 */
+	if (!s->not_found && !here->e_value_inum && here->e_value_size) {
+		u16 offs = le16_to_cpu(here->e_value_offs);
+		size_t size = le32_to_cpu(here->e_value_size);
+		void *value;
+
+		if (offs > s->end - s->base) {
+			EXT4_ERROR_INODE(inode, "corrupted xattr entry: invalid value offset");
+			ret = -EFSCORRUPTED;
+			goto out;
+		}
+
+		value = s->base + offs;
+		if (value < (void *)last + sizeof(__u32) ||
+		    size > s->end - value ||
+		    EXT4_XATTR_SIZE(size) > s->end - value) {
+			EXT4_ERROR_INODE(inode, "corrupted xattr entry: invalid value range");
+			ret = -EFSCORRUPTED;
+			goto out;
+		}
+	}
+
 	/*
 	 * Optimization for the simple case when old and new values have the
 	 * same padded sizes. Not applicable if external inodes are involved.
@@ -1657,22 +1699,6 @@ static int ext4_xattr_set_entry(struct ext4_xattr_info *i,
 		goto update_hash;
 	}
 
-	/* Compute min_offs and last. */
-	last = s->first;
-	for (; !IS_LAST_ENTRY(last); last = next) {
-		next = EXT4_XATTR_NEXT(last);
-		if ((void *)next >= s->end) {
-			EXT4_ERROR_INODE(inode, "corrupted xattr entries");
-			ret = -EFSCORRUPTED;
-			goto out;
-		}
-		if (!last->e_value_inum && last->e_value_size) {
-			size_t offs = le16_to_cpu(last->e_value_offs);
-			if (offs < min_offs)
-				min_offs = offs;
-		}
-	}
-
 	/* Check whether we have enough space. */
 	if (i->value) {
 		size_t free;
--
2.43.0
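The value-range check in the first hunk above can be exercised outside the kernel with a userspace model. This is a sketch under stated assumptions: `xattr_value_range_ok()` and `TOY_XATTR_SIZE()` are hypothetical names, offsets and sizes are taken as already converted to CPU byte order, and `last_end` stands for `(void *)last + sizeof(__u32)`, the first byte after the entry table's terminator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round up to the 4-byte xattr value padding, like EXT4_XATTR_SIZE(). */
#define TOY_XATTR_SIZE(size) (((size) + 3u) & ~(size_t)3u)

/*
 * Hypothetical userspace model of the check the patch adds.
 *   base      start of the region e_value_offs is relative to (s->base)
 *   end       one past the end of that region (s->end)
 *   last_end  first byte after the entry table's 4-byte terminator
 *   offs/size e_value_offs / e_value_size in CPU byte order
 */
static bool xattr_value_range_ok(const char *base, const char *end,
				 const char *last_end, uint16_t offs,
				 uint32_t size)
{
	assert(base <= last_end && last_end <= end);

	/* Offset must not point past the end of the region. */
	if (offs > (size_t)(end - base))
		return false;

	const char *value = base + offs;
	size_t room = (size_t)(end - value);

	/* Value must not overlap the entry table. */
	if (value < last_end)
		return false;

	/* Value and its padding must fit in the remaining room. */
	if (size > room || TOY_XATTR_SIZE((size_t)size) > room)
		return false;

	return true;
}
```

Comparing `size` against the remaining room (`end - value`) rather than forming `value + size` avoids computing an out-of-range pointer, and the padded-size test rejects a value whose 4-byte padding would spill past the end.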
On Wed, Mar 18, 2026 at 03:58:42PM +0800, ZhengYuan Huang wrote:
> [BUG]
> KASAN reports show out-of-bounds and use-after-free memory accesses when
> ext4_xattr_set_entry() processes corrupted on-disk xattr entries:
Can you send us a pointer to the reproducer? And does the reproducer
involve actively modifying the mounted file system image, either via
the block device or the underlying file (if a loop device is being used)?
- Ted
On Wed, Mar 18, 2026 at 10:46 PM Theodore Tso <tytso@mit.edu> wrote:
> Can you send us a pointer to the reproducer? And does the reproducer
> involve actively modifying the mounted file system image, either via
> the block device or the underlying file (if a loop device is being used)?

Thanks for your reply. I'm happy to provide a reproducer.

The following PoC reproduces the bug deterministically. The PoC is too
large to inline in email, so I uploaded it here:
https://drive.google.com/drive/folders/1OzH1XvAOAb9ulpOKfL70U1LvXhhlHAyz

Steps to reproduce:

1. Download the PoC from the provided link and extract it.

2. Build the ublk helper program from the ublk codebase, which is used
   to provide the runtime corruption capability:

   g++ -std=c++20 -fcoroutines -O2 -o standalone_replay \
       standalone_replay_ext4.cpp targets/ublksrv_tgt.cpp \
       -I. -Iinclude -Itargets/include \
       -L./lib/.libs -lublksrv -luring -lpthread

3. Attach the image through ublk:

   ./standalone_replay add -t loop -f /path/to/image

4. Run the reproducer:

   ./syz-execprog -executor=./syz-executor -repeat=0 -procs=1 -threaded=0 -sandbox=none -method=dynamic -fstype=ext4 ./corpus0

I can reproduce the issue reliably on Ubuntu 24.04.

For completeness: the syz-execprog and syz-executor binaries here are
based on syzkaller, with only small local changes to add the
environment setup required by this reproducer. I can also provide the
modified sources if that would be helpful.

Apologies for the complexity of the reproducer. This issue was found
by our fuzzing tool, and I am still working on minimizing it, which
might take some time. I will send an updated, minimized version as
soon as possible.

And yes, the reproducer does involve actively modifying the mounted
filesystem image. We use ublk to enable this behavior.

thanks,
ZhengYuan Huang
On Thu, Mar 19, 2026 at 07:13:45PM +0800, ZhengYuan Huang wrote:
>
> And yes, the reproducer does involve actively modifying the mounted
> filesystem image. We use ublk to enable this behavior.
We don't consider bugs which involve modifying the mounted filesystem
as valid from a security perspective. In particular, I don't want to
add checks to hotpaths to try to protect against these sorts of
failures, because they simply shouldn't be allowed --- and/or if the
attacker has write access to the block device while the file system is
mounted, you've basically lost already.
We have added the ioctl EXT4_IOC_SET_TUNE_SB_PARAM to more recent
kernels, and will be teaching tune2fs to use that instead of modifying
the mounted file system directly. Once that happens, we will add a
kernel configuration option which works like
CONFIG_BLK_DEV_WRITE_MOUNTED, but which is ext4 specific; so that for
distributions that are shipping a sufficiently new version of
e2fsprogs, they can block write access to mounted ext4 file systems.
Quoting from Kconfig documentation for CONFIG_BLK_DEV_WRITE_MOUNTED:
When a block device is mounted, writing to its buffer cache is very
likely going to cause filesystem corruption. It is also rather easy to
crash the kernel in this way since the filesystem has no practical way
of detecting these writes to buffer cache and verifying its metadata
integrity. However there are some setups that need this capability
like running fsck on read-only mounted root device, modifying some
features on mounted ext4 filesystem, and similar. If you say N, the
kernel will prevent processes from writing to block devices that are
mounted by filesystems which provides some more protection from runaway
privileged processes and generally makes it much harder to crash
filesystem drivers.
The official syzkaller instance sets CONFIG_BLK_DEV_WRITE_MOUNTED to
"no" and we've asked them to block fuzzers which try to modify the
file used by loopback mounts, because these sorts of syzkaller reports
are pure noise as far as we are concerned.
If system administrators are stupid enough to make the block device
world writeable, they deserve everything they get. Similarly, if
system administrators don't run fsck on random USB thumb drive dropped
in the parking lot by the MSS or the KGB before mounting it, again,
the bug is between the chair and keyboard. (For that matter,
inserting a random USB device for which you aren't sure whether it
came from a trusted source can make you vulnerable to hardware-level
attacks. Just don't do it.)
That being said, we are more likely to accept patches to address
static file system corruption, but the checks need to be done when the
metadata in question is first loaded, and outside of a hot path. But
trying to defend against dynamic modifications of the block device is
really a fool's errand, without completely trashing the performance of
the file system.
Cheers,
- Ted
On Thu, Mar 19, 2026 at 9:59 PM Theodore Tso <tytso@mit.edu> wrote:
> We don't consider bugs which involve modifying the mounted filesystem
> as valid from a security perspective. In particular, I don't want to
> add checks to hotpaths to try to protect against these sorts of
> failures, because they simply shouldn't be allowed --- and/or if the
> attacker has write access to the block device while the file system is
> mounted, you've basically lost already.

Thank you for the detailed explanation. I understand that runtime
modifications to a mounted block device are considered out of scope,
and adding checks for such cases in hot paths would be too costly.

Our original understanding was that a filesystem should handle on-disk
inconsistencies gracefully, so we used this approach to simulate
silent disk corruption or I/O errors at runtime and test filesystem
robustness. From your reply above, it seems that this understanding
may not be correct.

> That being said, we are more likely to accept patches to address
> static file system corruption, but the checks need to be done when the
> metadata in question is first loaded, and outside of a hot path. But
> trying to defend against dynamic modifications of the block device is
> really a fool's errand, without completely trashing the performance of
> the file system.

There seem to be three layers of defense: fsck, mount-time checks, and
runtime checks. Would it be more accurate to understand the boundary
this way: once the filesystem metadata has passed mount-time
validation (even if it would not necessarily pass fsck), the
filesystem is still expected to handle later errors gracefully rather
than crash?

More specifically, for inconsistencies that arise at runtime, is the
general expectation that they are outside the filesystem's
responsibility and should instead be handled by other layers (for
example, lower-level storage redundancy / RAID)? Or is there still
room for defensive checks in the filesystem, as long as they are done
outside hot paths?

Thanks again for your time and clarification.

Best regards,
ZhengYuan Huang
On Fri, Mar 20, 2026 at 03:43:21PM +0800, ZhengYuan Huang wrote:
>
> There seem to be three layers of defense: fsck, mount-time checks, and
> runtime checks.

Within runtime checks, there are those checks that are done the first
time metadata is loaded from disk --- for example, see the checks in
__ext4_iget() and the functions it calls, such as check_igot_inode().
And then there are checks that are done in hotpaths, since at least in
theory, a stupid system administrator might make a block device
world-writeable, so that a malicious or accidental actor could modify
the copy of the metadata in the buffer cache. Those are the sorts of
runtime checks we should try to avoid.

Mount-time checks tend to be those that validate superblock and block
group descriptor contents. They can't validate all of the inodes
because that would take a lot longer.

> Would it be more accurate to understand the boundary
> this way: once the filesystem metadata has passed mount-time
> validation (even if it would not necessarily pass fsck), the
> filesystem is still expected to handle later errors gracefully rather
> than crash?

It is nice to have a file system that handles errors gracefully rather
than crashing. However, if the inconsistency would have been caught
and corrected by fsck, I don't consider it a CVE-worthy security bug,
but rather a quality-of-implementation bug.

This is important, because there are risks associated with rolling
out a new kernel to hundreds of thousands of machines, or using live
patching to fix high severity security bugs. If the issue could have
been caught by fsck, and a competently administered system *does* run
fsck at boot time (such as at $WORK), the cost benefit ratio of
treating such bugs as security bugs doesn't make sense.
> More specifically, for inconsistencies that arise at runtime, is the
> general expectation that they are outside the filesystem's
> responsibility and should instead be handled by other layers (for
> example, lower-level storage redundancy / RAID)? Or is there still
> room for defensive checks in the filesystem, as long as they are done
> outside hot paths?

This would be on a case by case basis. If the check is *super* cheap,
and it's done outside of a hotpath --- say, when a file is first
opened. And if it doesn't cause long-term maintenance issues, it is
something that could be considered. But in terms of the priority of
dealing with such patches, it is not something that would be
considered high priority. Perhaps just a step above spelling or
grammar fixes in comments. :-)

Consider that for enterprise hard drives, the bit error rate is 1 in
10**15. And the chance that such a bit error would cause a metadata
inconsistency that would lead to a crash has to be factored in. If we
had infinite resources, it might be something that would be
considered higher priority, but in the real world, given the
opportunity cost of having software engineers working on other
improvements, it's not necessarily going to be a compelling business
case when I go to my management asking for more headcount. And if you
are an academic, perhaps the impact of such work might also be called
into question.

Cheers,
- Ted
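To put the 1 in 10**15 figure in perspective, the expected number of bit errors is the number of bits read divided by that rate. The helper below is a hypothetical illustration of that arithmetic only; it says nothing about how often such an error lands in metadata, which, as noted above, has to be factored in separately.

```c
#include <assert.h>

/*
 * Hypothetical helper: expected number of bit errors when reading
 * 'bytes' bytes from a device whose bit error rate is one error per
 * 'bits_per_error' bits.
 */
static double expected_bit_errors(double bytes, double bits_per_error)
{
	assert(bits_per_error > 0.0);
	return bytes * 8.0 / bits_per_error;
}
```

For example, reading 10^12 bytes (1 TB, an assumed workload) at 1 in 10^15 gives 8 * 10^12 / 10^15 = 0.008 expected bit errors.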
On Wed, Mar 18, 2026 at 03:58:42PM +0800, ZhengYuan Huang wrote:
> [BUG]
> KASAN reports show out-of-bounds and use-after-free memory accesses when
> ext4_xattr_set_entry() processes corrupted on-disk xattr entries:

Does running fsck on the disk image before mounting it catch this error?

thanks,

greg k-h
© 2016 - 2026 Red Hat, Inc.