Changes from V1:
- Avoid setting MD_BROKEN in the first place, rather than clearing it afterwards
- Add pr_crit() when setting MD_BROKEN
- Fix the message that may be shown after all rdevs have failed:
  "Operation continuing on 0 devices"
A failfast bio, for example on nvme-tcp, fails immediately if the
connection to the target is lost and the device enters a reconnecting
state, even though the I/O would succeed if retried a few seconds later.
This behavior is exactly what the failfast design intends.
However, md treats super_write operations that fail with failfast as fatal.
For example, if an initiator - that is, a machine loading the md module -
loses all connections for a few seconds, the array becomes broken and
subsequent writes are no longer possible.
This is the issue I am currently facing and that this series aims to fix.
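
For context, here is a rough sketch (illustrative only, not verbatim
mainline code) of the raid1_error()-style handling in drivers/md/raid1.c
that leads to this state: once the last In_sync member reports an error -
including one forwarded from a failed failfast super_write via md_error() -
the array is marked MD_BROKEN.

/*
 * Simplified sketch of the current error handler; locking, degraded
 * accounting and other details are omitted.
 */
static void raid1_error_sketch(struct mddev *mddev, struct md_rdev *rdev)
{
	struct r1conf *conf = mddev->private;

	if (test_bit(In_sync, &rdev->flags) &&
	    (conf->raid_disks - mddev->degraded) == 1) {
		/*
		 * Last working member: a failfast super_write failure
		 * arrives here through md_error() just like a real medium
		 * error, so the whole array is declared broken.
		 */
		set_bit(MD_BROKEN, &mddev->flags);
		if (!mddev->fail_last_dev)
			return;
	}
	set_bit(Faulty, &rdev->flags);
	set_mask_bits(&mddev->sb_flags, 0,
		      BIT(MD_SB_CHANGE_DEVS) | BIT(MD_SB_CHANGE_PENDING));
}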
The 1st patch changes how super_write MD_FAILFAST I/O failures are handled.
The 2nd and 3rd patches improve the pr_crit() output.
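
To illustrate the direction of the 1st patch, here is a minimal sketch,
assuming only the flags and helpers already used by super_written() in
drivers/md/md.c; the actual patches may differ in where and how they hook
in. The idea: a super_write completion that failed only because the bio
carried MD_FAILFAST does not escalate to md_error(), but requests a
non-failfast rewrite of the superblock instead.

/*
 * Sketch of a failfast-aware super_write completion (illustrative only,
 * not the actual patch).  Refcounting and wakeups done by the real
 * super_written() are omitted.
 */
static void super_written_sketch(struct bio *bio)
{
	struct md_rdev *rdev = bio->bi_private;
	struct mddev *mddev = rdev->mddev;

	if (bio->bi_status) {
		pr_err("md: superblock write error=%d\n",
		       blk_status_to_errno(bio->bi_status));

		if (!test_bit(Faulty, &rdev->flags) &&
		    (bio->bi_opf & MD_FAILFAST)) {
			/*
			 * Failfast failures may be transient (e.g. nvme-tcp
			 * reconnecting), so do not fail the device here;
			 * just schedule a non-failfast rewrite.
			 */
			set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags);
			set_bit(LastDev, &rdev->flags);
		} else {
			md_error(mddev, rdev);
		}
	} else {
		clear_bit(LastDev, &rdev->flags);
	}
}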
Kenta Akagi (3):
md/raid1,raid10: don't break array on failfast metadata write failure
md/raid1,raid10: Add error message when setting MD_BROKEN
md/raid1,raid10: Fix: Operation continuing on 0 devices.
drivers/md/md.c | 9 ++++++---
drivers/md/md.h | 7 ++++---
drivers/md/raid1.c | 26 ++++++++++++++++++++------
drivers/md/raid10.c | 26 ++++++++++++++++++++------
4 files changed, 50 insertions(+), 18 deletions(-)
--
2.50.1