In ext4, even if an allocated range is physically and logically
contiguous, it can still be split into 2 extents. This is because ext4
does not merge extents across leaf nodes. This is an issue for atomic
writes since even for a contiguous range the block mapping could (in
rare cases) return a shorter map, hence tearing the write. This test
creates such a file and ensures that the atomic write handles this case
correctly.
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
---
tests/ext4/063 | 125 +++++++++++++++++++++++++++++++++++++++++++++
tests/ext4/063.out | 2 +
2 files changed, 127 insertions(+)
create mode 100755 tests/ext4/063
create mode 100644 tests/ext4/063.out
diff --git a/tests/ext4/063 b/tests/ext4/063
new file mode 100755
index 00000000..25b5693d
--- /dev/null
+++ b/tests/ext4/063
@@ -0,0 +1,125 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2025 IBM Corporation. All Rights Reserved.
+#
+# In ext4, even if an allocated range is physically and logically contiguous,
+# it can still be split into 2 extents. This is because ext4 does not merge
+# extents across leaf nodes. This is an issue for atomic writes since even for
+# a contiguous range the block mapping could (in rare cases) return a shorter
+# map, hence tearing the write. This test creates such a file and ensures that
+# the atomic write handles this case correctly.
+#
+. ./common/preamble
+. ./common/atomicwrites
+_begin_fstest auto atomicwrites
+
+_require_scratch_write_atomic_multi_fsblock
+_require_atomic_write_test_commands
+_require_command "$DEBUGFS_PROG" debugfs
+
+prep() {
+ local bs=`_get_block_size $SCRATCH_MNT`
+ local ex_hdr_bytes=12
+ local ex_entry_bytes=12
+ local entries_per_blk=$(( (bs - ex_hdr_bytes) / ex_entry_bytes ))
+
+ # Fill the extent tree leaf with bs-length extents at alternate offsets. For
+ # example, for 4k bs the tree should look as follows:
+ #
+ #               +---------+---------+
+ #               | index 1 | index 2 |
+ #               +-----+---+-----+---+
+ #            +--------+         +-------+
+ #            |                          |
+ # +----------+--------------+     +-----+-----+
+ # | ex 1 | ex 2 |... | ex n |     | ex n + 1  |
+ # +-------------------------+     +-----------+
+ # 0      2           680          682
+ for i in $(seq 0 $entries_per_blk)
+ do
+ $XFS_IO_PROG -fc "pwrite -b $bs $((i * 2 * bs)) $bs" $testfile > /dev/null
+ done
+ sync $testfile
+
+ echo >> $seqres.full
+ echo "Create file with extents spanning 2 leaves. Extents:">> $seqres.full
+ echo "...">> $seqres.full
+ $DEBUGFS_PROG -R "ex `basename $testfile`" $SCRATCH_DEV |& tail >> $seqres.full
+
+ # Now try to insert a new extent ex(new) between ex(n) and ex(n+1). Since
+ # this is a new FS the allocator would find contiguous blocks such that
+ # ex(n) ex(new) ex(n+1) are physically (and logically) contiguous. However,
+ # since we don't merge extents across leaves we will end up with a tree as:
+ #
+ #               +---------+---------+
+ #               | index 1 | index 2 |
+ #               +-----+---+-----+---+
+ #            +--------+         +-------+
+ #            |                          |
+ # +----------+--------------+     +-----+-----+
+ # | ex 1 | ex 2 |... | ex n |     | ex merged |
+ # +-------------------------+     +-----------+
+ # 0      2           680          681  682  684
+ #
+ echo >> $seqres.full
+ torn_ex_offset=$((((entries_per_blk * 2) - 1) * bs))
+ $XFS_IO_PROG -c "pwrite $torn_ex_offset $bs" $testfile >> /dev/null
+ sync $testfile
+
+ echo >> $seqres.full
+ echo "Perform 1 block write at $torn_ex_offset to create torn extent. Extents:">> $seqres.full
+ echo "...">> $seqres.full
+ $DEBUGFS_PROG -R "ex `basename $testfile`" $SCRATCH_DEV |& tail >> $seqres.full
+
+ _scratch_cycle_mount
+}
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount >> $seqres.full
+
+testfile=$SCRATCH_MNT/testfile
+touch $testfile
+awu_max=$(_get_atomic_write_unit_max $testfile)
+
+echo >> $seqres.full
+echo "# Prepping the file" >> $seqres.full
+prep
+
+torn_aw_offset=$((torn_ex_offset - (torn_ex_offset % awu_max)))
+
+echo >> $seqres.full
+echo "# Performing atomic IO on the torn extent range. Command: " >> $seqres.full
+echo $XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $awu_max $torn_aw_offset $awu_max" >> $seqres.full
+$XFS_IO_PROG -c "open -fsd $testfile" -c "pwrite -S 0x61 -DA -V1 -b $awu_max $torn_aw_offset $awu_max" >> $seqres.full
+
+echo >> $seqres.full
+echo "Extent state after atomic write:">> $seqres.full
+echo "...">> $seqres.full
+$DEBUGFS_PROG -R "ex `basename $testfile`" $SCRATCH_DEV |& tail >> $seqres.full
+
+echo >> $seqres.full
+echo "# Checking data integrity" >> $seqres.full
+
+# create a dummy file with expected data
+$XFS_IO_PROG -fc "pwrite -S 0x61 -b $awu_max 0 $awu_max" $testfile.exp >> /dev/null
+expected_data=$(od -An -t x1 -j 0 -N $awu_max $testfile.exp)
+
+# Ensure that the data after the atomic write matches the expected data
+actual_data=$(od -An -t x1 -j $torn_aw_offset -N $awu_max $testfile)
+if [[ "$actual_data" != "$expected_data" ]]
+then
+ echo "Data mismatch at off: $torn_aw_offset size: $awu_max"
+ echo
+ echo "Expected: "
+ echo "$expected_data"
+ echo
+ echo "Actual contents: "
+ echo "$actual_data"
+
+ _fail
+fi
+
+echo -n "Data verification at offset $torn_aw_offset succeeded!" >> $seqres.full
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/ext4/063.out b/tests/ext4/063.out
new file mode 100644
index 00000000..de35fc52
--- /dev/null
+++ b/tests/ext4/063.out
@@ -0,0 +1,2 @@
+QA output created by 063
+Silence is golden
--
2.49.0
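
A note on the offset arithmetic in the test above: torn_aw_offset is
torn_ex_offset rounded down to a multiple of awu_max, so that a single
awu_max-sized atomic write covers the newly inserted block. The sketch
below only replays that arithmetic and touches no filesystem. The helper
names align_down and aw_block_range are made up for illustration, and the
awu_max values passed in are assumed, since the real value comes from
_get_atomic_write_unit_max on the scratch device.

```shell
#!/bin/bash
# Round a byte offset down to a multiple of the atomic write unit, the same
# way the test derives torn_aw_offset from torn_ex_offset.
align_down() {
	local off=$1 unit=$2
	echo $(( off - (off % unit) ))
}

# Print the first and last fs block covered by one awu_max-sized write that
# starts at the aligned offset. The 12-byte header and 12-byte entry sizes
# are the ones used in prep(); awu_max is an assumed input here.
aw_block_range() {
	local bs=$1 awu_max=$2
	local entries_per_blk=$(( (bs - 12) / 12 ))
	local torn_ex_offset=$(( (entries_per_blk * 2 - 1) * bs ))
	local start
	start=$(align_down "$torn_ex_offset" "$awu_max")
	echo "$(( start / bs )) $(( (start + awu_max - 1) / bs ))"
}

aw_block_range 4096 16384	# prints "676 679"
aw_block_range 4096 65536	# prints "672 687"
```

With a 4k block size the inserted block is logical block 679, and it falls
inside the written range for both assumed awu_max values.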
On Sat, Jul 12, 2025 at 07:42:54PM +0530, Ojaswin Mujoo wrote:

[...]

> + #               +---------+---------+
> + #               | index 1 | index 2 |
> + #               +-----+---+-----+---+
> + #            +--------+         +-------+
> + #            |                          |
> + # +----------+--------------+     +-----+-----+
> + # | ex 1 | ex 2 |... | ex n |     | ex merged |
> + # +-------------------------+     +-----------+
> + # 0      2           680          681  682  684

Where did 684 come from?  It's not in the 'before' diagram.  Did
"ex n + 1" previously map 682-684, and now it maps 681-684?

The rest looks ok though.

--D
On Tue, Jul 29, 2025 at 12:41:54PM -0700, Darrick J. Wong wrote:
> On Sat, Jul 12, 2025 at 07:42:54PM +0530, Ojaswin Mujoo wrote:

[...]

> > + # | ex 1 | ex 2 |... | ex n |     | ex merged |
> > + # +-------------------------+     +-----------+
> > + # 0      2           680          681  682  684
>
> Where did 684 come from?  It's not in the 'before' diagram.  Did
> "ex n + 1" previously map 682-684, and now it maps 681-684?

Okay, so the 684 is a bit misleading, as in there is nothing there. The
extent at 682 is len=1 and spans [682-683).

Now that you pointed it out, I think the 0..2...680 logical offsets are
confusing, since they are actually ext4_extent.ee_block values but the
diagram makes it seem like they are indexes into the array of extents.
Let me see if I can make it better.

Thanks for the review!
ojaswin

> The rest looks ok though.
>
> --D
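
To illustrate the index vs. ee_block distinction raised above, here is a
minimal sketch that maps the i-th write of the fill loop to its
ext4_extent.ee_block value. The helper name describe_extent is
hypothetical. The leaf assignment is an assumption taken from the test's
own reasoning (the first leaf holds exactly entries_per_blk extents and
the next extent starts a second leaf); it is not read back from a real
extent tree.

```shell
#!/bin/bash
# Map the i-th write of the fill loop (0-based) to its ee_block and to the
# leaf it is ASSUMED to land in: leaf 1 for the first entries_per_blk
# extents, leaf 2 after that.
describe_extent() {
	local bs=$1 i=$2
	local entries_per_blk=$(( (bs - 12) / 12 ))	# 12-byte header, 12-byte entries
	local ee_block=$(( i * 2 ))			# loop writes at i * 2 * bs
	local leaf=$(( i < entries_per_blk ? 1 : 2 ))
	echo "extent#$(( i + 1 )) ee_block=$ee_block len=1 leaf=$leaf"
}

bs=4096
n=$(( (bs - 12) / 12 ))
describe_extent $bs 0
describe_extent $bs $(( n - 1 ))
describe_extent $bs $n
echo "gap block written afterwards: $(( n * 2 - 1 ))"
```

Under these assumptions a 4k block size gives ee_block 678 for the last
extent of the first leaf, 680 for the first extent of the second leaf, and
679 for the block written into the gap, so the offsets drawn in the
diagram are better read as illustrative than exact.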