[PATCH 0/5] ceph: CephFS writeback correctness and performance fixes

Sam Edwards posted 5 patches 1 month, 1 week ago
There is a newer version of this series
fs/ceph/addr.c | 35 +++++++++++++++++++----------------
1 file changed, 19 insertions(+), 16 deletions(-)
Hello list,

This series addresses several interrelated CephFS writeback issues,
particularly for fscrypted files. My work began with a performance problem:
encrypted files caused a write storm during writeback because the writeback
code was inadvertently using the crypto block size, rather than the stripe
unit size, as the maximum write size.
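
To give a rough sense of scale, here is a minimal userspace sketch of the
arithmetic. The helper name writes_needed() is hypothetical and does not
appear in fs/ceph/addr.c; the 4 KiB crypto block and 4 MiB stripe unit used in
the usage note are assumed values chosen for illustration, not figures taken
from this series.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): number of write requests
 * needed to flush `dirty_bytes` of contiguous dirty data when each request
 * may carry at most `max_write` bytes. Rounds up; `max_write` must be
 * nonzero.
 */
static uint64_t writes_needed(uint64_t dirty_bytes, uint64_t max_write)
{
	return (dirty_bytes + max_write - 1) / max_write;
}
```

Under those assumed sizes, flushing 64 MiB of dirty data takes
writes_needed(64 << 20, 4096) = 16384 requests when capped at the crypto
block, versus writes_needed(64 << 20, 4 << 20) = 16 when capped at the stripe
unit.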

While testing that fix, I encountered a correctness bug: failures to allocate
bounce pages during writeback were incorrectly propagated as batch errors,
which triggered kernel oopses/panics because the writeback loop did not free
the page array on that path. While investigating that, I discovered that a
failure in ceph_submit_write() could trigger the same oopses.

The patches in this series:

1. Prevent bounce page allocation failures from aborting the writeback batch
   and causing a kernel oops/panic due to the page array not being freed.
2. Remove the now-redundant error return from ceph_process_folio_batch().
3. Free page arrays during failure in ceph_submit_write(), preventing another
   path to the same kernel oops/panic. This was not an issue I encountered in
   testing, and it is tricky to trigger organically. I used the fault injection
   framework to confirm it and verify the fix.
4. Assert writeback loop invariants explicitly to help prevent regressions and
   aid debugging should the problem reappear.
5. Fix the write storm on fscrypted files by using the correct stripe unit.
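
The error-handling pattern behind patches 1, 3, and 4 can be sketched in
userspace C as follows. This is only an illustration under stated assumptions:
struct wb_ctl, submit_write(), release_batch(), and writeback_iteration() are
hypothetical stand-ins rather than the actual code in fs/ceph/addr.c, and the
invariant shown (no page array outstanding at the top of each iteration) is my
reading of the description above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-batch writeback state. */
struct wb_ctl {
	void **pages;        /* page array for the current batch; NULL when idle */
	size_t locked_pages; /* number of pages collected into the batch */
};

/* Drop the batch's page array so the next iteration starts clean. */
static void release_batch(struct wb_ctl *ctl)
{
	free(ctl->pages);
	ctl->pages = NULL;
	ctl->locked_pages = 0;
}

/*
 * Hypothetical submit step. On success the array is consumed (modelled here
 * by releasing it); on failure it is left in place for the caller to clean up.
 */
static int submit_write(struct wb_ctl *ctl, int fail)
{
	if (fail)
		return -1;
	release_batch(ctl);
	return 0;
}

/* One iteration of the loop: returns 0 on success, -1 on failure. */
static int writeback_iteration(struct wb_ctl *ctl, size_t n, int fail)
{
	int rc;

	/* Assumed invariants: nothing is left over from the previous pass. */
	assert(ctl->pages == NULL);
	assert(ctl->locked_pages == 0);

	ctl->pages = calloc(n, sizeof(*ctl->pages));
	if (!ctl->pages)
		return -1;
	ctl->locked_pages = n;

	rc = submit_write(ctl, fail);
	if (rc)
		release_batch(ctl); /* free on failure so the invariant holds */
	return rc;
}
```

In this model, omitting the release_batch() call on the failure path leaves
ctl->pages set, and the assertion at the top of the next iteration fires
instead of the stale array being reused silently.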

Note that this series follows a "fix-then-refactor" cadence: patches 1, 3, and
5 fix bugs and are intended for stable, while patches 2 and 4 represent code
cleanup and are intended only for next.

Wishing you all a prosperous 2026 ahead,
Sam

Sam Edwards (5):
  ceph: Do not propagate page array emplacement errors as batch errors
  ceph: Remove error return from ceph_process_folio_batch()
  ceph: Free page array when ceph_submit_write fails
  ceph: Assert writeback loop invariants
  ceph: Fix write storm on fscrypted files

 fs/ceph/addr.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

-- 
2.51.2