From: Zhang Yi <yi.zhang@huawei.com>
Hello!
This series proposes deferring the splitting of unwritten extents from
the point of I/O submission until I/O completion when partially writing
to a preallocated file.
The main concern with this change is whether it makes extent conversion
more likely to fail. When space is insufficient, an extent may not be
splittable, which could result in I/O write failures and data loss.
Analysis confirms that two existing mechanisms prevent such I/O
failures.
The first is the EXT4_GET_BLOCKS_METADATA_NOFAIL flag. It is a
best-effort measure that allows extent splitting to use the file
system's reserved space (2% of the file system or 4,096 blocks,
whichever is smaller), and it covers most scenarios where extent
splitting might otherwise fail. The second is the EXT4_EXT_MAY_ZEROOUT
flag, which is also set during extent splitting. If the reserved space
is insufficient and splitting fails, ext4 does not retry the allocation.
Instead, it zeroes out the remaining part of the extent, which avoids
the split and converts the entire extent to the written type.
These two mechanisms behave the same whether the split happens before
I/O submission or after I/O completion. Therefore, although deferring
extent splitting adds pressure on the reserved space after I/O
completion, it does not increase the risk of I/O failure or data loss.
On the contrary, if some I/Os can be merged at completion time during
writeback, unnecessary splitting operations are avoided, which relieves
pressure on the reserved space.
In addition, deferring extent splitting until I/O completion can also
simplify the I/O submission process and avoid initiating unnecessary
journal handles when writing unwritten extents.
Patches 01-03: defer splitting extents until I/O completion.
Patches 04-07: clean up the DIO path and remove
EXT4_GET_BLOCKS_IO_CREATE_EXT.
Tests:
- Run xfstests with the -g enospc option approximately 50 times. Before
applying this series, the reserved blocks were used more than 6000 and
7000 times on a 1 GB filesystem with a 4 KB and 1 KB block size,
respectively. After applying this series, the counts remain nearly the
same. In both cases, there were no splitting failures.
- Run xfstests with the -g enospc option for about one day; no
regressions occurred.
- Intentionally create a scenario in which the reserved blocks are
exhausted. Before applying the series, the extent is zeroed out before
I/O submission; after applying it, the extent is zeroed out after I/O
completion. There are no other differences.
- xfstests-bld shows no regression.
Performance:
This can improve the write performance of concurrent DIO to multiple
files. The fio tests below show a ~25% performance improvement when
writing to unwritten files on my VM with a 100G memory-backed disk.
[unwritten]
direct=1
ioengine=psync
numjobs=16
rw=write # write/randwrite
bs=4K
iodepth=1
directory=/mnt
size=5G
runtime=30s
overwrite=0
norandommap=1
fallocate=native
ramp_time=5s
group_reporting=1
[w/o this series]
write:     IOPS=62.5k, BW=244MiB/s
randwrite: IOPS=56.7k, BW=221MiB/s
[w/ this series]
write:     IOPS=79.6k, BW=311MiB/s
randwrite: IOPS=70.2k, BW=274MiB/s
TODO:
Next, we can investigate whether writing an unwritten extent during
buffered I/O writeback can also avoid starting a journal handle.
Thanks,
Yi.
Zhang Yi (7):
ext4: use reserved metadata blocks when splitting extent on endio
ext4: don't split extent before submitting I/O
ext4: avoid starting handle when dio writing an unwritten extent
ext4: remove useless ext4_iomap_overwrite_ops
ext4: remove unused unwritten parameter in ext4_dio_write_iter()
ext4: simply the mapping query logic in ext4_iomap_begin()
ext4: remove EXT4_GET_BLOCKS_IO_CREATE_EXT
fs/ext4/ext4.h | 10 ---------
fs/ext4/extents.c | 46 ++++-----------------------------------
fs/ext4/file.c | 23 ++++++++------------
fs/ext4/inode.c | 55 ++++++++++-------------------------------------
4 files changed, 24 insertions(+), 110 deletions(-)
--
2.46.1