gfs2: fix hung task in gfs2_jhead_process_page

[PATCH] gfs2: fix hung task in gfs2_jhead_process_page

Posted by Deepanshu Kartikey 1 week, 3 days ago

filemap_get_folio() returns an ERR_PTR if the folio is not present
in the page cache. gfs2_jhead_process_page() does not check the
return value and passes it directly to folio_wait_locked(), causing
the kernel task to get stuck in uninterruptible sleep (state D)
forever, triggering the hung task watchdog.

This can be triggered by mounting a crafted or corrupted GFS2
filesystem image.

Fix this by checking the return value of filemap_get_folio() and
returning early if the folio is not found.

Fixes: 240159077d00 ("gfs2: Convert gfs2_jhead_process_page() to use a folio")
Reported-by: syzbot+9013411dc43f3582823a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=9013411dc43f3582823a
Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>
---
 fs/gfs2/lops.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 797931eb5845..005584311eff 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -467,6 +467,9 @@ static void gfs2_jhead_process_page(struct gfs2_jdesc *jd, unsigned long index,
 
 	folio = filemap_get_folio(jd->jd_inode->i_mapping, index);
 
+	if (IS_ERR(folio))
+		return;
+
 	folio_wait_locked(folio);
 	if (!folio_test_uptodate(folio))
 		*done = true;
-- 
2.43.0

Re: [PATCH] gfs2: fix hung task in gfs2_jhead_process_page

Posted by Matthew Wilcox 1 week, 2 days ago

On Tue, Mar 24, 2026 at 09:09:59AM +0530, Deepanshu Kartikey wrote:
> filemap_get_folio() returns an ERR_PTR if the folio is not present
> in the page cache. gfs2_jhead_process_page() does not check the
> return value and passes it directly to folio_wait_locked(), causing
> the kernel task to get stuck in uninterruptible sleep (state D)
> forever, triggering the hung task watchdog.
> 
> This can be triggered by mounting a crafted or corrupted GFS2
> filesystem image.
> 
> Fix this by checking the return value of filemap_get_folio() and
> returning early if the folio is not found.
> 
> Fixes: 240159077d00 ("gfs2: Convert gfs2_jhead_process_page() to use a folio")

No.  That commit only changed the code, it didn't introduce the bug.
f4686c26ecc3 may have introduced it, but I wouldn't swear to it.

I have my doubts that this is the right fix.  If you look at the entire
function, it assumes that the folio was already created and added to
the page cache.  The error should surely be detected earlier, not by
this function.

> Reported-by: syzbot+9013411dc43f3582823a@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=9013411dc43f3582823a
> Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com>
> ---
>  fs/gfs2/lops.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
> index 797931eb5845..005584311eff 100644
> --- a/fs/gfs2/lops.c
> +++ b/fs/gfs2/lops.c
> @@ -467,6 +467,9 @@ static void gfs2_jhead_process_page(struct gfs2_jdesc *jd, unsigned long index,
>  
>  	folio = filemap_get_folio(jd->jd_inode->i_mapping, index);
>  
> +	if (IS_ERR(folio))
> +		return;
> +
>  	folio_wait_locked(folio);
>  	if (!folio_test_uptodate(folio))
>  		*done = true;
> -- 
> 2.43.0
>

Re: [PATCH] gfs2: fix hung task in gfs2_jhead_process_page

Posted by Deepanshu Kartikey 1 week, 2 days ago

On Tue, Mar 24, 2026 at 8:16 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> I have my doubts that this is the right fix.  If you look at the entire
> function, it assumes that the folio was already created and added to
> the page cache.  The error should surely be detected earlier, not by
> this function.
>

Hi Mathew,

Thank you for the review. After further analysis, we found that the
bug is triggered by a malicious/corrupted GFS2 filesystem image
whose journal extent list contains extents with gaps between them:

  extent 1: lblock=0,   dblock=1000, blocks=2
  extent 2: lblock=100, dblock=2000, blocks=2

gfs2_find_jhead() only grabs pages for blocks it visits:

  extent 1: filemap_grab_folio(page 0) -> success
  extent 2: filemap_grab_folio(page 12) -> success

Pages 1-11 are never grabbed. But the cleanup loop at out: advances
blocks_read sequentially through ALL page indices including the gaps:

  while (blocks_read < block=101):
    blocks_read=0  -> process page 0  ✓ grabbed
    blocks_read=8  -> process page 1  ✗ NEVER grabbed
                      filemap_get_folio() -> ERR_PTR(-ENOENT)
                      folio_wait_locked(ERR_PTR) -> HANG FOREVER

We considered fixing this by tracking blocks_grabbed in
gfs2_find_jhead() and limiting the cleanup loop to only process
pages that were actually grabbed:

  while (blocks_read < blocks_grabbed) { ... }

However this does not work because blocks_read advances
page-by-page sequentially. Even if we track the last successfully
grabbed block, the cleanup loop will still walk through page indices
that fall in the gaps between extents and were never grabbed.

I believe the correct fix is to handle ERR_PTR in
gfs2_jhead_process_page(), since that is the only place that can
distinguish between a page that was grabbed but not yet read versus
a page that was never grabbed at all due to gaps between extents.
But I may be missing something — could you suggest a better
approach?

Thank you for your guidance.

Deepanshu

Re: [PATCH] gfs2: fix hung task in gfs2_jhead_process_page

Posted by Andreas Gruenbacher 1 week, 2 days ago

On Wed, Mar 25, 2026 at 1:07 AM Deepanshu Kartikey
<kartikey406@gmail.com> wrote:
> On Tue, Mar 24, 2026 at 8:16 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > I have my doubts that this is the right fix.  If you look at the entire
> > function, it assumes that the folio was already created and added to
> > the page cache.  The error should surely be detected earlier, not by
> > this function.
> >
>
> Hi Mathew,
>
> Thank you for the review. After further analysis, we found that the
> bug is triggered by a malicious/corrupted GFS2 filesystem image
> whose journal extent list contains extents with gaps between them:
>
>   extent 1: lblock=0,   dblock=1000, blocks=2
>   extent 2: lblock=100, dblock=2000, blocks=2

How about something like this?

--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -2210,7 +2210,7 @@ void gfs2_free_journal_extents(struct gfs2_jdesc *jd)
  * @dblock: The physical block at start of new extent
  * @blocks: Size of extent in fs blocks
  *
- * Returns: 0 on success or -ENOMEM
+ * Returns: 0 on success, or an error code
  */

 static int gfs2_add_jextent(struct gfs2_jdesc *jd, u64 lblock, u64
dblock, u64 blocks)
@@ -2219,6 +2219,8 @@ static int gfs2_add_jextent(struct gfs2_jdesc
*jd, u64 lblock, u64 dblock, u64 b

     if (!list_empty(&jd->extent_list)) {
         jext = list_last_entry(&jd->extent_list, struct
gfs2_journal_extent, list);
+        if (jext->lblock + jext->blocks != lblock)
+            return -EINVAL;
         if ((jext->dblock + jext->blocks) == dblock) {
             jext->blocks += blocks;
             return 0;

Thanks,
Andreas

Re: [PATCH] gfs2: fix hung task in gfs2_jhead_process_page

Posted by Deepanshu Kartikey 1 week, 1 day ago

On Wed, Mar 25, 2026 at 7:12 AM Andreas Gruenbacher <agruenba@redhat.com> wrote:
>
>
> How about something like this?
>
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -2210,7 +2210,7 @@ void gfs2_free_journal_extents(struct gfs2_jdesc *jd)
>   * @dblock: The physical block at start of new extent
>   * @blocks: Size of extent in fs blocks
>   *
> - * Returns: 0 on success or -ENOMEM
> + * Returns: 0 on success, or an error code
>   */
>
>  static int gfs2_add_jextent(struct gfs2_jdesc *jd, u64 lblock, u64
> dblock, u64 blocks)
> @@ -2219,6 +2219,8 @@ static int gfs2_add_jextent(struct gfs2_jdesc
> *jd, u64 lblock, u64 dblock, u64 b
>
>      if (!list_empty(&jd->extent_list)) {
>          jext = list_last_entry(&jd->extent_list, struct
> gfs2_journal_extent, list);
> +        if (jext->lblock + jext->blocks != lblock)
> +            return -EINVAL;
>          if ((jext->dblock + jext->blocks) == dblock) {
>              jext->blocks += blocks;
>              return 0;
>
> Thanks,
> Andreas
>

Thanks for the clarification. It looks good.

I have sent the patch v2

Deepanshu