[PATCH] ext4: fix OOB read when checking dotdot dir

Jakub Acs posted 1 patch 9 months ago
There is a newer version of this series
fs/ext4/dir.c | 3 +++
1 file changed, 3 insertions(+)
[PATCH] ext4: fix OOB read when checking dotdot dir
Posted by Jakub Acs 9 months ago
Mounting a corrupted filesystem with directory which contains '.' dir
entry with rec_len == block size results in out-of-bounds read (later
on, when the corrupted directory is removed).

ext4_empty_dir() assumes every ext4 directory contains at least '.'
and '..' as directory entries in the first data block. It first loads
the '.' dir entry, performs sanity checks by calling ext4_check_dir_entry()
and then uses its rec_len member to compute the location of '..' dir
entry (in ext4_next_entry). It assumes the '..' dir entry fits into the
same data block.

If the rec_len of '.' is precisely one block (4KB), it slips through the
sanity checks (it is considered the last directory entry in the data
block) and leaves "struct ext4_dir_entry_2 *de" point exactly past the
memory slot allocated to the data block. The following call to
ext4_check_dir_entry() on new value of de then dereferences this pointer
which results in out-of-bounds mem access.

Fix this by extending __ext4_check_dir_entry() to check for '.' dir
entries that reach the end of data block. Make sure to ignore the phony
dir entries for checksum (by checking name_len for non-zero).

Note: This is reported by KASAN as use-after-free in case another
structure was recently freed from the slot past the bound, but it is
really an OOB read.

This issue was found by syzkaller tool.

Call Trace:
[   38.594108] BUG: KASAN: slab-use-after-free in __ext4_check_dir_entry+0x67e/0x710
[   38.594649] Read of size 2 at addr ffff88802b41a004 by task syz-executor/5375
[   38.595158]
[   38.595288] CPU: 0 UID: 0 PID: 5375 Comm: syz-executor Not tainted 6.14.0-rc7 #1
[   38.595298] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   38.595304] Call Trace:
[   38.595308]  <TASK>
[   38.595311]  dump_stack_lvl+0xa7/0xd0
[   38.595325]  print_address_description.constprop.0+0x2c/0x3f0
[   38.595339]  ? __ext4_check_dir_entry+0x67e/0x710
[   38.595349]  print_report+0xaa/0x250
[   38.595359]  ? __ext4_check_dir_entry+0x67e/0x710
[   38.595368]  ? kasan_addr_to_slab+0x9/0x90
[   38.595378]  kasan_report+0xab/0xe0
[   38.595389]  ? __ext4_check_dir_entry+0x67e/0x710
[   38.595400]  __ext4_check_dir_entry+0x67e/0x710
[   38.595410]  ext4_empty_dir+0x465/0x990
[   38.595421]  ? __pfx_ext4_empty_dir+0x10/0x10
[   38.595432]  ext4_rmdir.part.0+0x29a/0xd10
[   38.595441]  ? __dquot_initialize+0x2a7/0xbf0
[   38.595455]  ? __pfx_ext4_rmdir.part.0+0x10/0x10
[   38.595464]  ? __pfx___dquot_initialize+0x10/0x10
[   38.595478]  ? down_write+0xdb/0x140
[   38.595487]  ? __pfx_down_write+0x10/0x10
[   38.595497]  ext4_rmdir+0xee/0x140
[   38.595506]  vfs_rmdir+0x209/0x670
[   38.595517]  ? lookup_one_qstr_excl+0x3b/0x190
[   38.595529]  do_rmdir+0x363/0x3c0
[   38.595537]  ? __pfx_do_rmdir+0x10/0x10
[   38.595544]  ? strncpy_from_user+0x1ff/0x2e0
[   38.595561]  __x64_sys_unlinkat+0xf0/0x130
[   38.595570]  do_syscall_64+0x5b/0x180
[   38.595583]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: ac27a0ec112a0 ("[PATCH] ext4: initial copy of files from ext3")
Signed-off-by: Jakub Acs <acsjakub@amazon.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Mahmoud Adam <mngyadam@amazon.com>
Cc: stable@vger.kernel.org
Cc: security@kernel.org
---
If not fixed, this potentially leaks information from kernel data
structures allocated after the data block.

I based the check on the assumption that every ext4 directory has '.'
followed by at least one entry ('..') in the first data block.
(the code in ext4_empty_dir seems to operate on this assumption)

..and it is also supported by claim in
https://www.kernel.org/doc/html/latest/filesystems/ext4/directory.html:
"By ext2 custom, the '.' and '..' entries must appear at the beginning of
this first block"

Please confirm that this is correct and there are no valid ext4
directories that have '.' as the last directory entry. If this
assumption is wrong, I would fix the caller rather than the callee.

Testing:
I booted the kernel in AL2023 EC2 Instance and ran ext4 tests from
git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git:

<setup with loop devices>
./check ext4/*
[..]
ext4/002       [not run] This test requires a valid $SCRATCH_LOGDEV
ext4/004       [not run] dump utility required, skipped this test
ext4/006       [not run] Couldn't find e2fuzz
ext4/029       [not run] This test requires a valid $SCRATCH_LOGDEV
ext4/030       [not run] mount /dev/loop1 with dax failed
ext4/031       [not run] mount /dev/loop1 with dax failed
ext4/047       [not run] mount /dev/loop1 with dax=always failed
ext4/055       [not run] fsgqa user not defined.
ext4/057       [not run] UUID ioctls are not supported by kernel.
Not run: ext4/002 ext4/004 ext4/006 ext4/029 ext4/030 ext4/031 ext4/047
ext4/055 ext4/057
Passed all 69 tests
(please let me know if any of the skipped tests need to be run)

Thanks,
Jakub
---
 fs/ext4/dir.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 02d47a64e8d1..d157a6c0eff6 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -104,6 +104,9 @@ int __ext4_check_dir_entry(const char *function, unsigned int line,
 	else if (unlikely(le32_to_cpu(de->inode) >
 			le32_to_cpu(EXT4_SB(dir->i_sb)->s_es->s_inodes_count)))
 		error_msg = "inode out of bounds";
+	else if (unlikely(de->name_len > 0 && strcmp(".", de->name) == 0 &&
+			  next_offset == size))
+		error_msg = "'.' directory cannot be the last in data block";
 	else
 		return 0;
 
-- 
2.47.1
Re: [PATCH] ext4: fix OOB read when checking dotdot dir
Posted by Theodore Ts'o 9 months ago
On Wed, Mar 19, 2025 at 11:01:34AM +0000, Jakub Acs wrote:
> If the rec_len of '.' is precisely one block (4KB), it slips through the
> sanity checks (it is considered the last directory entry in the data
> block) and leaves "struct ext4_dir_entry_2 *de" point exactly past the
> memory slot allocated to the data block. The following call to
> ext4_check_dir_entry() on new value of de then dereferences this pointer
> which results in out-of-bounds mem access.
> 
> Fix this by extending __ext4_check_dir_entry() to check for '.' dir
> entries that reach the end of data block. Make sure to ignore the phony
> dir entries for checksum (by checking name_len for non-zero).
> 
> Note: This is reported by KASAN as use-after-free in case another
> structure was recently freed from the slot past the bound, but it is
> really an OOB read.

Well, a (non-inline) directory where '.' is a single directory entry
that fills the file system block should (probably) never occur in
nature.  So from that perspective, the check that you propose is
probably fine.

HOWEVER.  The check that e2fsck does is slightly different, which is
stronger in some ways, and weaker than others relative to what you
propose.  What e2fsck checks for non-inline directory is that '.' must
be first entry, and '..' must be the second directory entry.

So if somehow, a file system gets created where '.' is the first entry
that fills the first directory block, and '..' is the second entry
which is at the beginning of the second directory block, we would be
in the interesting situation where it's possible that the kernel would
report the file system to be corrupted --- not in the case of
check_empty_dir() used in the rmdir_path, but there are other calls to
check_dir_entry that will result in the file system to be declare
corrupted --- then when the user runs e2fsck on a file system declared
corrupted by the kernel, e2fsck would return a Big Thumbs Up; but then
when the kernel trips over that directory again, it would get declared
corrupted again.

So this ultimately turns on the definition of "valid".  If the
definition is "could this happen naturally, at least using all Linux
implementations that I'm aware of", then no, it's not valid.  However,
if there is some other implementation of ext2/ext3/ext4 (say, BSD,
Hurd, etc.), or a potentially malicious user (or a fuzzer) carefully
crafts a file system, and e2fsck reports that the file system doesn't
have any problems, then yes there _could_ be valid file systems where
'.' is both the first and last directory entry.

Is it likely that there are other legitimate implementations that
would create such a file system?  No, it's highly unlikely.  So one
approach might be to make this change in what is officially considered
valid as defined by e2fsck.  The downside of this would be that there
could be a version skew, where the kernel change gets backported into
an LTS kernel, but the user doesn't upgrade to a newer version of
e2fsprogs, the user might get confused.

In this particular case, I think it's worthwhile to make the change,
since if we don't make the change in __ext4_check_dir_entry(), it's
not enough to make a change in ext4_empty_dir().  We'd have to audit
*all* of check_dir_entry's, and a quick check indicates we'd also need
to add a check to ext4_get_first_dir_block().

> 
> diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> index 02d47a64e8d1..d157a6c0eff6 100644
> --- a/fs/ext4/dir.c
> +++ b/fs/ext4/dir.c
> @@ -104,6 +104,9 @@ int __ext4_check_dir_entry(const char *function, unsigned int line,
>  	else if (unlikely(le32_to_cpu(de->inode) >
>  			le32_to_cpu(EXT4_SB(dir->i_sb)->s_es->s_inodes_count)))
>  		error_msg = "inode out of bounds";
> +	else if (unlikely(de->name_len > 0 && strcmp(".", de->name) == 0 &&
> +			  next_offset == size))
> +		error_msg = "'.' directory cannot be the last in data block";
>  	else
>  		return 0;

I'd change the check to:

	else if (unlikely(next_offset == size && de->name_len == 1 &&
			  strcmp(".", de->name) == 0))

which is a bit more optimized.

So if you resend this commit with this change, and remove the question
to the ext4 maintainers (for future reference, it's best to put things
that don't need to be in the commit if the patch gets accepted after
the '---' line and before the diffstat, since that way if the
maintainer is ready to accept the patch, they won't have the edit the
commit description), I'd be happy to accept the patch.

Also, the best way to test a commit is not just to run all of the
ext4/* xfstests patches, but to run a smoke test.  If you use
kvm-xfstests[1], you can do this via "kvm-xfstests smoke", which is
syntactic sugar for "SOAK_DURATION=3m check -g smoketest" and will
take 15-20 minutes.  If you want to spend a bit more time, you can run
the quick group ("check -g quick" or "kvm-xfstests -c ext4/4k -g
quick").

[1] https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md

The reason why I mention using kvm-xfstests is because in the future,
if you try your hand at fixing a syzbot issue that uses a slightly
more exotic feature, such as inline_data, you'd want to do something
like "kvm-xfstests -c ext4/inline <test_specifiers>" so that the file
system is set up correctly for testing those code paths.  And of
course, for more testing fun, please see gce-xfstests[2].

[2] https://thunk.org/gce-xfstests

Many thanks,

					- Ted

P.S.  If anyone is interested in adding support for other cloud
platforms such as Amazon and Azure as an additional back ends to
xfstests-bld -- we have support for MacOS's hvf, Docker and Android as
back ends, and more would be great; please contact me.  Patches
Gratefully Accepted.  :-)
Re: [PATCH] ext4: fix OOB read when checking dotdot dir
Posted by Linus Torvalds 9 months ago
On Wed, 19 Mar 2025 at 06:06, Theodore Ts'o <tytso@mit.edu> wrote:
>
> I'd change the check to:
>
>         else if (unlikely(next_offset == size && de->name_len == 1 &&
>                           strcmp(".", de->name) == 0))
>
> which is a bit more optimized.

Why would you use 'strcmp()' when you just checked that the length is one?

IOW, if you are talking about "a bit more optimized", please just check

        de->name[0] == '.'

after you've checked that the length is 1.

No?

             Linus
Re: [PATCH] ext4: fix OOB read when checking dotdot dir
Posted by Theodore Ts'o 9 months ago
On Wed, Mar 19, 2025 at 09:28:56AM -0700, Linus Torvalds wrote:
> Why would you use 'strcmp()' when you just checked that the length is one?
> 
> IOW, if you are talking about "a bit more optimized", please just check
> 
>         de->name[0] == '.'
> 
> after you've checked that the length is 1.

Yes, good point.  My suggestion was to optimize htings by avoid
calling strcmp() in most cases, but you're absolutely correct that we
could do even better by *never* calling strcmp().

						- Ted