[PATCH] nilfs2: fix state management in error path of log writing function

Ryusuke Konishi posted 1 patch 1 year, 6 months ago
[PATCH] nilfs2: fix state management in error path of log writing function
Posted by Ryusuke Konishi 1 year, 6 months ago
After commit a694291a6211 ("nilfs2: separate wait function from
nilfs_segctor_write") was applied, the log writing function
nilfs_segctor_do_construct() was able to issue I/O requests
continuously even if user data blocks were split into multiple logs
across segments, but two potential flaws were introduced in its error
handling.

First, if nilfs_segctor_begin_construction() fails while creating the
second or subsequent logs, the log writing function returns without
calling nilfs_segctor_abort_construction(), so the writeback flag set
on pages/folios will remain uncleared.  This causes page cache
operations to hang waiting for the writeback flag.  For example,
truncate_inode_pages_final(), which is called via nilfs_evict_inode()
when an inode is evicted from memory, will hang.
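The first flaw comes down to which label the error branch jumps to. The following userspace model is only a sketch. Every name in it (model_sc, model_begin(), model_abort(), model_do_construct(), model_pages_stuck()) is hypothetical, one boolean per log stands in for the folio writeback flag, and the success path simply clears the flags to stand in for the wait/completion step. It contrasts the pre-patch `goto out` route, which skips the abort, with the patched `goto failed` route.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model, not nilfs2 code: each log owns one page
 * whose writeback[] entry stands in for the folio writeback flag.
 */
#define MODEL_MAX_LOGS 8
#define MODEL_EIO (-5) /* models -EIO */

struct model_sc {
	bool writeback[MODEL_MAX_LOGS]; /* writeback flag per submitted log */
	size_t nlogs;                   /* logs whose I/O has been submitted */
	size_t fail_at;                 /* index of the log whose setup fails */
};

/* Stand-in for nilfs_segctor_begin_construction(): fails on demand. */
static int model_begin(const struct model_sc *sc, size_t i)
{
	return i == sc->fail_at ? MODEL_EIO : 0;
}

/* Stand-in for nilfs_segctor_abort_construction(): ends writeback on
 * every page submitted so far. */
static void model_abort(struct model_sc *sc)
{
	for (size_t i = 0; i < sc->nlogs; i++)
		sc->writeback[i] = false;
}

/*
 * Builds up to @want logs.  With @fixed false, a setup failure takes the
 * pre-patch "goto out" route and skips the abort; with @fixed true it
 * takes the patched "goto failed" route.
 */
static int model_do_construct(struct model_sc *sc, size_t want, bool fixed)
{
	int err = 0;

	assert(want <= MODEL_MAX_LOGS);
	for (size_t i = 0; i < want; i++) {
		err = model_begin(sc, i);
		if (err) {
			if (fixed)
				goto failed;
			goto out;
		}
		sc->writeback[i] = true; /* I/O issued; flag stays set for now */
		sc->nlogs++;
	}
	/* Success: model the wait/completion step ending writeback. */
	for (size_t i = 0; i < sc->nlogs; i++)
		sc->writeback[i] = false;
out:
	return err;

failed:
	model_abort(sc);
	return err;
}

/* Number of pages still under writeback; nonzero models the hang. */
static size_t model_pages_stuck(const struct model_sc *sc)
{
	size_t n = 0;

	for (size_t i = 0; i < sc->nlogs; i++)
		if (sc->writeback[i])
			n++;
	return n;
}
```

With fail_at = 1 and three logs requested, the pre-patch route returns with the first log's page still flagged, while the patched route leaves no page flagged. In the kernel, such a leftover flag is what truncate_inode_pages_final() would end up waiting on.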

Second, the NILFS_I_COLLECTED flag set on normal inodes remains
uncleared.  As a result, if the next log write involves checkpoint
creation, no harm is done, but if it is a partial log write that does
not create a checkpoint, inodes with NILFS_I_COLLECTED set are
erroneously removed from the "sc_dirty_files" list, and their data and
b-tree blocks may not be written to the device, corrupting the block
mapping.
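The second flaw can be illustrated with a similarly hypothetical model; all model_* names are invented for this sketch. It covers only the partial-log-write case and hard-codes the consequence described above as an assumption: an inode whose collected mark was never cleared leaves the dirty list without its blocks being written. It shows why the error path has to do the equivalent of nilfs_redirty_inodes() before the next write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not nilfs2 code.  "collected" stands in for
 * NILFS_I_COLLECTED and "on_list" for membership of sc_dirty_files.
 */
struct model_inode {
	bool dirty;     /* has blocks not yet written to the device */
	bool on_list;   /* still tracked on the dirty-files list */
	bool collected; /* models NILFS_I_COLLECTED */
};

/* Collection step of a log write: mark every tracked inode. */
static void model_collect(struct model_inode *v, size_t n)
{
	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (v[i].on_list)
			v[i].collected = true;
}

/* Models nilfs_redirty_inodes(): drop the collected marks again. */
static void model_redirty(struct model_inode *v, size_t n)
{
	for (size_t i = 0; i < n; i++)
		v[i].collected = false;
}

/*
 * Partial log write.  Assumption of this model: an inode that still
 * carries a stale mark is treated as already handled, so it leaves the
 * list without its blocks being written.
 */
static void model_partial_write(struct model_inode *v, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!v[i].on_list)
			continue;
		if (!v[i].collected)
			v[i].dirty = false; /* blocks reach the device */
		v[i].collected = false;
		v[i].on_list = false;
	}
}

/* Inodes that are still dirty but no longer tracked: lost updates. */
static size_t model_lost(const struct model_inode *v, size_t n)
{
	size_t lost = 0;

	for (size_t i = 0; i < n; i++)
		if (v[i].dirty && !v[i].on_list)
			lost++;
	return lost;
}
```

Calling model_collect() and then model_partial_write() on two dirty inodes leaves both untracked while still dirty; inserting model_redirty() between the two calls lets the partial write flush them, so nothing is lost.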

Fix these issues by correcting the jump destination of the error
branch in nilfs_segctor_do_construct() and the condition for calling
nilfs_redirty_inodes(), which clears the NILFS_I_COLLECTED flag.
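The redirty condition that the patch moves below the `failed:` label can be isolated as a predicate. This is a sketch: the enum values are assumed to mirror the NILFS_ST_* stages and the SC_LSEG_SR/SC_LSEG_DSYNC/SC_FLUSH_FILE/SC_FLUSH_DAT modes, but the shortened names and the exact ordering are assumptions, and model_should_redirty_files() is a hypothetical helper, not a kernel function. Since `failed_to_write:` falls through to `failed:`, both error routes now evaluate this same check.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed mirrors of the stage and mode constants used in the diff
 * (NILFS_ST_* and SC_*); names are shortened and the ordering is an
 * assumption of this sketch.
 */
enum model_stage {
	ST_INIT, ST_GC, ST_FILE, ST_IFILE, ST_CPFILE,
	ST_SUFILE, ST_DAT, ST_SR, ST_DSYNC, ST_DONE,
};

enum model_mode { LSEG_SR = 1, LSEG_DSYNC, FLUSH_FILE, FLUSH_DAT };

/* In this model a data-sync stage sorts after the ifile stage, so a bare
 * ">=" test would also match it; the mode check is what excludes it. */
static_assert(ST_DSYNC > ST_IFILE, "stage ordering assumed by this model");

/*
 * Patched condition in the shared error tail: redirty sc_dirty_files
 * only for a checkpointing (super root) write that had progressed to
 * the ifile stage or beyond.
 */
static bool model_should_redirty_files(enum model_mode mode,
				       enum model_stage cstage)
{
	return mode == LSEG_SR && cstage >= ST_IFILE;
}
```

Checking the mode as well as the stage matters in this model: a data-sync write can sit at a stage numbered above the ifile stage, yet it must not trigger the redirty.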

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Fixes: a694291a6211 ("nilfs2: separate wait function from nilfs_segctor_write")
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: stable@vger.kernel.org
---
Hi Andrew, please apply this as a bug fix.

This fixes error path flaws in the log writing function, discovered
during error injection testing, which could lead to a hang due to the
writeback flag not being cleared on folios, and to potential
filesystem corruption due to missing blocks in the log after an error.

Thanks,
Ryusuke Konishi

 fs/nilfs2/segment.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 0ca3110d6386..8b3225bd08ed 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2056,7 +2056,7 @@ static int nilfs_segctor_do_construct(struct nilfs_sc_info *sci, int mode)
 
 		err = nilfs_segctor_begin_construction(sci, nilfs);
 		if (unlikely(err))
-			goto out;
+			goto failed;
 
 		/* Update time stamp */
 		sci->sc_seg_ctime = ktime_get_real_seconds();
@@ -2120,10 +2120,9 @@ static int nilfs_segctor_do_construct(struct nilfs_sc_info *sci, int mode)
 	return err;
 
  failed_to_write:
-	if (sci->sc_stage.flags & NILFS_CF_IFILE_STARTED)
-		nilfs_redirty_inodes(&sci->sc_dirty_files);
-
  failed:
+	if (mode == SC_LSEG_SR && nilfs_sc_cstage_get(sci) >= NILFS_ST_IFILE)
+		nilfs_redirty_inodes(&sci->sc_dirty_files);
 	if (nilfs_doing_gc())
 		nilfs_redirty_inodes(&sci->sc_gc_inodes);
 	nilfs_segctor_abort_construction(sci, nilfs, err);
-- 
2.34.1
Re: [PATCH] nilfs2: fix state management in error path of log writing function
Posted by Ryusuke Konishi 1 year, 5 months ago
On Thu, Aug 8, 2024 at 8:07 AM Ryusuke Konishi wrote:
> [...]

Andrew, please stop sending this patch upstream.

Another error injection test revealed a problem with the error path
change in this patch, so I would like to create a revised version.

The other two bug fix patches I have sent will not be affected.

Thanks,
Ryusuke Konishi

[PATCH v2] nilfs2: fix state management in error path of log writing function
Posted by Ryusuke Konishi 1 year, 5 months ago
After commit a694291a6211 ("nilfs2: separate wait function from
nilfs_segctor_write") was applied, the log writing function
nilfs_segctor_do_construct() was able to issue I/O requests
continuously even if user data blocks were split into multiple logs
across segments, but two potential flaws were introduced in its error
handling.

First, if nilfs_segctor_begin_construction() fails while creating the
second or subsequent logs, the log writing function returns without
calling nilfs_segctor_abort_construction(), so the writeback flag set
on pages/folios will remain uncleared.  This causes page cache
operations to hang waiting for the writeback flag.  For example,
truncate_inode_pages_final(), which is called via nilfs_evict_inode()
when an inode is evicted from memory, will hang.

Second, the NILFS_I_COLLECTED flag set on normal inodes remains
uncleared.  As a result, if the next log write involves checkpoint
creation, no harm is done, but if it is a partial log write that does
not create a checkpoint, inodes with NILFS_I_COLLECTED set are
erroneously removed from the "sc_dirty_files" list, and their data and
b-tree blocks may not be written to the device, corrupting the block
mapping.

Fix these issues by uniformly calling
nilfs_segctor_abort_construction() on failure of each step in the loop
in nilfs_segctor_do_construct(), having it clean up logs and segment
usages according to progress, and correcting the conditions for
calling nilfs_redirty_inodes() to ensure that the NILFS_I_COLLECTED
flag is cleared.
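v2 also makes nilfs_segctor_abort_construction() return early when, after the submitted logs and the prepared segment buffers have been spliced into one list, that list is empty (the case where the first segment buffer preparation failed). The sketch below models that ordering with counters instead of lists. model_sc, model_cancel_usage(), and model_abort_construction() are hypothetical, and the assert in model_cancel_usage() encodes this model's own assumption that the usage cleanup must not run without at least one log.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the cleanup order in
 * nilfs_segctor_abort_construction(); counters replace the kernel's
 * lists (sc_write_logs, sc_segbufs).
 */
struct model_sc {
	size_t nwrite_logs; /* logs already submitted for I/O */
	size_t nsegbufs;    /* segment buffers prepared but not submitted */
	int aborted_io;     /* submitted logs aborted (writeback ended) */
	int usage_cleanups; /* runs of the segment usage / log cleanup */
};

/* In this model the usage cleanup starts from the first log, so it
 * must never be handed an empty list. */
static void model_cancel_usage(struct model_sc *sc, size_t nlogs)
{
	assert(nlogs > 0);
	sc->usage_cleanups++;
}

static void model_abort_construction(struct model_sc *sc)
{
	size_t nlogs = sc->nwrite_logs; /* first splice: submitted logs */

	sc->aborted_io += (int)sc->nwrite_logs; /* wait on and abort them */
	sc->nwrite_logs = 0;

	nlogs += sc->nsegbufs; /* second splice: prepared buffers */
	sc->nsegbufs = 0;
	if (nlogs == 0)
		return; /* v2 guard: first segment buffer preparation failed */

	model_cancel_usage(sc, nlogs);
}
```

An abort with nothing prepared now returns before the usage cleanup, which corresponds to the case in the v1->v2 note; an abort after one submitted log and one prepared buffer still ends the pending I/O and runs the cleanup once.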

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Fixes: a694291a6211 ("nilfs2: separate wait function from nilfs_segctor_write")
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: stable@vger.kernel.org
---
Andrew, please apply this as a bug fix instead of the one dropped
recently.

This fixes error path flaws in the log writing function, which could
lead to a hang due to the writeback flag not being cleared on folios,
and to potential filesystem corruption due to missing blocks in the
log after an error.

v1->v2: fixed a regression that caused unexpected cleanup when
        handling an error at a stage where no logs were ready.

Thanks,
Ryusuke Konishi

 fs/nilfs2/segment.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 0ca3110d6386..871ec35ea8e8 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1812,6 +1812,9 @@ static void nilfs_segctor_abort_construction(struct nilfs_sc_info *sci,
 	nilfs_abort_logs(&logs, ret ? : err);
 
 	list_splice_tail_init(&sci->sc_segbufs, &logs);
+	if (list_empty(&logs))
+		return; /* if the first segment buffer preparation failed */
+
 	nilfs_cancel_segusage(&logs, nilfs->ns_sufile);
 	nilfs_free_incomplete_logs(&logs, nilfs);
 
@@ -2056,7 +2059,7 @@ static int nilfs_segctor_do_construct(struct nilfs_sc_info *sci, int mode)
 
 		err = nilfs_segctor_begin_construction(sci, nilfs);
 		if (unlikely(err))
-			goto out;
+			goto failed;
 
 		/* Update time stamp */
 		sci->sc_seg_ctime = ktime_get_real_seconds();
@@ -2120,10 +2123,9 @@ static int nilfs_segctor_do_construct(struct nilfs_sc_info *sci, int mode)
 	return err;
 
  failed_to_write:
-	if (sci->sc_stage.flags & NILFS_CF_IFILE_STARTED)
-		nilfs_redirty_inodes(&sci->sc_dirty_files);
-
  failed:
+	if (mode == SC_LSEG_SR && nilfs_sc_cstage_get(sci) >= NILFS_ST_IFILE)
+		nilfs_redirty_inodes(&sci->sc_dirty_files);
 	if (nilfs_doing_gc())
 		nilfs_redirty_inodes(&sci->sc_gc_inodes);
 	nilfs_segctor_abort_construction(sci, nilfs, err);
-- 
2.34.1