ntfs: error/durability fixes for mft writepage paths

[PATCH 0/3] ntfs: error/durability fixes for mft writepage paths

Posted by DaeMyung Kang 1 month, 2 weeks ago

This series fixes three independent issues in the ntfs mft writepage
paths in linux-next.

Patch 1 restores the documented synchronous semantics of the @sync
path of write_mft_record_nolock() and of ntfs_sync_mft_mirror().
Both functions claim to wait for I/O completion, but in the converted
bio code they only call submit_bio() and return; bi_status is never
inspected.  As a result write_inode() can report success while dirty
bytes are still in flight and bio errors are silently dropped.
Introduced by commit 115380f9a2f9 ("ntfs: update mft operations").
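
For readers less familiar with this failure mode, the difference between
"submit and return" and "submit, wait, then report the status" can be
modelled in userspace as below.  This is only a sketch, not the patch:
struct fake_bio, fake_submit(), fake_complete() and the two
write_record_*() helpers are hypothetical names standing in for struct
bio, submit_bio() and the mft write paths.  In the kernel,
submit_bio_wait() is one existing helper with the "wait and return the
error" shape; whether patch 1 uses it is not stated in this letter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a block I/O request; not the kernel's struct bio. */
struct fake_bio {
	bool done;	/* set once the simulated device has finished */
	int status;	/* 0 or negative errno, meaningful only once done is set */
	int will_fail;	/* test knob: status the device will report, e.g. -EIO */
};

/* Queue the request.  Like an asynchronous submit, this returns before completion. */
void fake_submit(struct fake_bio *bio)
{
	bio->done = false;
	bio->status = 0;
}

/* Simulated device finishing the request and recording its result. */
void fake_complete(struct fake_bio *bio)
{
	bio->status = bio->will_fail;
	bio->done = true;
}

/* Buggy shape: submit and report success without ever reading the status. */
int write_record_nowait(struct fake_bio *bio)
{
	fake_submit(bio);
	return 0;
}

/* Fixed shape: do not return until the request is done, then return its status. */
int write_record_sync(struct fake_bio *bio)
{
	fake_submit(bio);
	fake_complete(bio);	/* stands in for blocking until the device finishes */
	assert(bio->done);
	return bio->status;
}
```

With will_fail set to -EIO, write_record_nowait() still returns 0 while
write_record_sync() returns -EIO, which mirrors the write_inode()
behaviour described above.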

Patch 2 fixes ntfs_write_mft_block().  When the per-call allocations
for @locked_nis or @ref_inos fail, the function returns -ENOMEM with
the folio still locked, which stalls any later task that needs the
folio's lock and drops the dirty state from the writeback iterator's
point of view.  Use folio_redirty_for_writepage() and folio_unlock()
before returning.  Introduced when those buffers were moved off the
stack by commit f462fdf3d6a4 ("ntfs: reduce stack usage in
ntfs_write_mft_block()").
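
The error-path shape described above can be sketched in userspace as
follows.  Everything here is a model under stated assumptions: struct
fake_folio, write_block(), NR_RECORDS and the inject_enomem knob are
hypothetical, and the two flag updates merely stand in for
folio_redirty_for_writepage() and folio_unlock().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_RECORDS 8	/* hypothetical number of per-call scratch slots */

/* Hypothetical stand-in for the two folio state bits that matter here. */
struct fake_folio {
	bool locked;
	bool dirty;
};

/*
 * The caller hands over the folio locked and with its dirty bit already
 * cleared for writeback.  On every return path the folio must be
 * unlocked, and on failure it must be dirty again so that a later
 * writeback pass retries it.
 */
int write_block(struct fake_folio *folio, bool inject_enomem)
{
	void **scratch;
	int err = 0;

	assert(folio->locked);

	scratch = inject_enomem ? NULL : calloc(NR_RECORDS, sizeof(*scratch));
	if (!scratch) {
		err = -ENOMEM;
		folio->dirty = true;	/* models folio_redirty_for_writepage() */
		goto unlock;
	}

	/* The per-record writes would go here. */

	free(scratch);
unlock:
	folio->locked = false;		/* models folio_unlock() */
	return err;
}
```

Routing the allocation failure through the same unlock label as the
success path is one way to make it hard to return with the folio still
locked; the actual patch may be structured differently.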

Patch 3 captures the return value of ntfs_sync_mft_mirror() inside
ntfs_write_mft_block() so a $MFTMirr write failure during writepages
is propagated and surfaces via NVolErrors.  Patch 3 is what makes
ntfs_sync_mft_mirror()'s newly-meaningful return value (from patch 1)
visible on this code path.  Also introduced by commit
115380f9a2f9 ("ntfs: update mft operations").
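
As an illustration of "capture and propagate", here is a small userspace
model.  It is a sketch only: struct fake_volume, sync_mirror() and
write_records() are hypothetical, the errors flag merely models the
NVolErrors state, and keeping the first error seen is an assumption of
this sketch rather than something stated about the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the volume; errors models the NVolErrors state. */
struct fake_volume {
	bool errors;
};

/* Hypothetical mirror sync: returns the injected result, 0 or negative errno. */
int sync_mirror(int injected_result)
{
	return injected_result;
}

/*
 * Sync the mirror copy for each of n records.  The return value of every
 * call is captured; the first failure is remembered and returned, and
 * the volume is flagged so the failure is visible after this call too.
 */
int write_records(struct fake_volume *vol, const int *mirror_results, int n)
{
	int err = 0;

	for (int i = 0; i < n; i++) {
		int ret = sync_mirror(mirror_results[i]);

		if (ret && !err)
			err = ret;	/* assumption: keep the first error */
	}
	if (err)
		vol->errors = true;
	return err;
}
```

If the result of sync_mirror() were discarded instead, write_records()
would return 0 for any input, which is the behaviour patch 3 removes.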

Note: this series does not yet wait for the main $MFT bios submitted
inside ntfs_write_mft_block() to complete, nor does it propagate
their bi_status, nor does it redirty folios when records are skipped
by ntfs_may_write_mft_record().  Those issues require a per-folio
writeback completion context and are the subject of a follow-up
patch I am preparing separately.

The patches apply on top of linkinjeon/ntfs-next.  They were build-tested
on x86_64 with CONFIG_NTFS_FS=m and pass scripts/checkpatch.pl with no
warnings.

Runtime testing was done in QEMU with a small initramfs, ntfs.ko and
dm-flakey.  The test kernel used CONFIG_NTFS_FS=m, CONFIG_FAILSLAB=y,
CONFIG_FAULT_INJECTION=y, and
CONFIG_FAULT_INJECTION_STACKTRACE_FILTER=y.  KVM was not available in
the test environment, so these runs used TCG acceleration.

On the unpatched baseline (the commit immediately before this series,
9e9354075d5a), the QEMU tests reproduced the failures this series fixes:

  - With dm-flakey erroring writes during a metadata fsync,
    utime_fsync returned success:

      QEMU-NTFS-ORIG: TEST patch1-original: BUG reproduced
      fsync returned success under write I/O failure

  - With failslab injected from the ntfs_mft_writepages() stack,
    sync stayed blocked after the injected allocation failure and the
    guest later emitted hung-task diagnostics:

      QEMU-NTFS-ORIG: TEST patch2-original: BUG reproduced
      sync still blocked after injected allocation failure

With this series applied, the same QEMU tests pass:

  - The metadata fsync path returned EIO under dm-flakey write errors:

      QEMU-NTFS-BATCH1: TEST patch1: PASS fsync failed as expected rc=1

  - failslab injection from ntfs_mft_writepages() was observed and sync
    completed without hanging:

      QEMU-NTFS-BATCH1: TEST patch2: observed injected
      ntfs_mft_writepages allocation failure
      QEMU-NTFS-BATCH1: TEST patch2: PASS no hang after
      injected allocation failure rc=0

  - The writepages path observed and propagated an $MFTMirr write
    failure:

      QEMU-NTFS-BATCH1: TEST patch3: observed MFT mirror sync failure
      QEMU-NTFS-BATCH1: TEST patch3: PASS syncfs failed as expected rc=1

  - Final result:

      QEMU-NTFS-BATCH1: PASS all qemu ntfs batch1 checks completed

The whole-device dm-flakey run is not a clean negative control for
patch 3 alone because syncfs can also see main $MFT writeback errors in
the unpatched tree.  It nevertheless covers the patched $MFTMirr error
path and verifies that the error is propagated.
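
For reference, a metadata-only fsync check of the kind used for patch 1
can be written in a few lines of userspace C.  The helper below is a
minimal sketch and not the actual test program: only the name
utime_fsync comes from the runs above, while its body and its
negative-errno return convention are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Update the timestamps of an existing file and fsync it.  Changing only
 * the timestamps dirties the inode without touching file data, so the
 * fsync has to push the inode metadata to disk.
 *
 * Returns 0 if every step succeeded, or a negative errno from the first
 * step that failed.
 */
int utime_fsync(const char *path)
{
	int fd, err = 0;

	assert(path);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	if (futimens(fd, NULL) < 0)	/* NULL sets atime and mtime to now */
		err = -errno;
	else if (fsync(fd) < 0)
		err = -errno;

	close(fd);
	return err;
}
```

On a healthy filesystem this returns 0.  With write errors injected
underneath the filesystem, a correct fsync implementation is expected to
make it return a negative errno such as -EIO.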

DaeMyung Kang (3):
  ntfs: wait for sync mft writes to complete
  ntfs: redirty folio when ntfs_write_mft_block() runs out of memory
  ntfs: capture mft mirror sync errors in ntfs_write_mft_block()

 fs/ntfs/mft.c | 76 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 51 insertions(+), 25 deletions(-)

-- 
2.43.0

Re: [PATCH 0/3] ntfs: error/durability fixes for mft writepage paths

Posted by Namjae Jeon 1 month, 2 weeks ago

On Fri, May 1, 2026 at 2:21 AM DaeMyung Kang <charsyam@gmail.com> wrote:
> DaeMyung Kang (3):
>   ntfs: wait for sync mft writes to complete
>   ntfs: redirty folio when ntfs_write_mft_block() runs out of memory
>   ntfs: capture mft mirror sync errors in ntfs_write_mft_block()
Applied to #ntfs-next.
Thanks!