[PATCH] mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim()

Seiji Nishikawa posted 1 patch 1 year, 2 months ago
There is a newer version of this series
Posted by Seiji Nishikawa 1 year, 2 months ago
The kernel hangs because a task gets stuck in throttle_direct_reclaim():
the node is incorrectly deemed balanced even though some of its zones,
such as ZONE_NORMAL, are under pressure. The root cause is that
zone_reclaimable_pages() returns 0 for zones without reclaimable
file-backed or anonymous pages, so zones like ZONE_DMA32 are skipped
even when they have sufficient free pages.
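A minimal userspace sketch of the pre-patch counting logic, for illustration only. struct zone_model and reclaimable_pages_old() are hypothetical names, the zone_page_state_snapshot() reads are replaced by plain fields, and can_reclaim_anon_pages() is reduced to a boolean flag (roughly "swap is available").

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the struct zone counters that
 * zone_reclaimable_pages() reads via zone_page_state_snapshot(). */
struct zone_model {
	unsigned long inactive_file, active_file;  /* NR_ZONE_*_FILE */
	unsigned long inactive_anon, active_anon;  /* NR_ZONE_*_ANON */
	unsigned long free_pages;                  /* NR_FREE_PAGES */
};

/* Pre-patch logic: file LRU pages always count; anon LRU pages count
 * only when can_reclaim_anon_pages() would hold (modelled as a flag).
 * NR_FREE_PAGES is never consulted. */
static unsigned long reclaimable_pages_old(const struct zone_model *z,
					   bool can_reclaim_anon)
{
	unsigned long nr;

	assert(z);
	nr = z->inactive_file + z->active_file;
	if (can_reclaim_anon)
		nr += z->inactive_anon + z->active_anon;
	return nr;
}
```

Under these assumptions, a DMA32-like zone with 50000 free pages, 100 anonymous pages, no file pages and no swap yields 0, so callers treat it as having nothing to offer even though it is far from exhausted.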

Without swap or reclaimable file pages, ZONE_DMA32 is ignored during
reclaim, which masks the pressure in other zones. Consequently,
pgdat->kswapd_failures remains 0 in balance_pgdat(), so the fallback in
allow_direct_reclaim() that relies on it is never triggered, and the
task is stuck in an infinite loop in throttle_direct_reclaim().
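The interaction above can be modelled roughly as follows. This is a simplified userspace sketch, assuming two zones (DMA32 and NORMAL), ignoring allocation order and lowmem_reserve, and assuming that pgdat_balanced() treats a node as balanced once any eligible zone is above its high watermark and that MAX_RECLAIM_RETRIES is 16. All struct and function names are hypothetical stand-ins for pgdat_balanced(), balance_pgdat(), allow_direct_reclaim() and the wait in throttle_direct_reclaim().

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ZONES_MODEL 2        /* hypothetical: [0] = DMA32, [1] = NORMAL */
#define MAX_RECLAIM_RETRIES 16  /* assumed to match mm/internal.h */

/* Hypothetical per-zone state; names are illustrative only. */
struct zone_model {
	unsigned long free_pages;   /* NR_FREE_PAGES */
	unsigned long min_wmark;    /* min_wmark_pages() */
	unsigned long high_wmark;   /* high_wmark_pages() */
	unsigned long reclaimable;  /* result of zone_reclaimable_pages() */
};

struct node_model {
	struct zone_model zones[NR_ZONES_MODEL];
	int kswapd_failures;        /* pgdat->kswapd_failures */
};

/* Like pgdat_balanced(), simplified: the node counts as balanced as
 * soon as ANY zone is above its high watermark. */
static bool node_balanced(const struct node_model *n)
{
	for (int i = 0; i < NR_ZONES_MODEL; i++)
		if (n->zones[i].free_pages > n->zones[i].high_wmark)
			return true;
	return false;
}

/* Like balance_pgdat(), simplified: a balanced node returns early, so
 * a failure is only recorded for an unbalanced pass that reclaimed
 * nothing. */
static void kswapd_pass(struct node_model *n, unsigned long nr_reclaimed)
{
	if (node_balanced(n))
		return;
	if (nr_reclaimed == 0)
		n->kswapd_failures++;
}

/* Like allow_direct_reclaim(), simplified: zones reporting no
 * reclaimable pages contribute neither free pages nor reserves. */
static bool allow_direct_reclaim_model(const struct node_model *n)
{
	unsigned long reserve = 0, free_pages = 0;

	if (n->kswapd_failures >= MAX_RECLAIM_RETRIES)
		return true;
	for (int i = 0; i < NR_ZONES_MODEL; i++) {
		if (n->zones[i].reclaimable == 0)
			continue;
		reserve += n->zones[i].min_wmark;
		free_pages += n->zones[i].free_pages;
	}
	if (reserve == 0)
		return true;
	return free_pages > reserve / 2;
}

/* Bounded stand-in for the wait in throttle_direct_reclaim(): returns
 * the retry on which the task is released, or -1 if it is still
 * throttled after max_retries kswapd passes that reclaim nothing. */
static int throttle_model(struct node_model *n, int max_retries)
{
	assert(max_retries > 0);
	for (int r = 0; r < max_retries; r++) {
		if (allow_direct_reclaim_model(n))
			return r;
		kswapd_pass(n, 0);
	}
	return -1;
}
```

With DMA32 well above its high watermark but reporting 0 reclaimable pages, node_balanced() is true, kswapd_failures never increases, and throttle_model() returns -1 for any cap. If DMA32 instead drops below its high watermark, every kswapd pass fails and the task is released on retry 16, once kswapd_failures reaches MAX_RECLAIM_RETRIES.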

This patch modifies zone_reclaimable_pages() to account for free pages
(NR_FREE_PAGES) when no other reclaimable pages exist. This ensures
zones with sufficient free pages are not skipped, enabling proper
balancing and reclaim behavior.
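The patched behavior can be sketched in the same userspace style. reclaimable_pages_patched() mirrors the new fallback to NR_FREE_PAGES, and wmark_ok_model() is a hypothetical, simplified version of the free-pages-versus-reserve test in allow_direct_reclaim() (the kswapd_failures check and the kswapd wakeup are omitted). As before, the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-zone counters; names are illustrative only. */
struct zone_model {
	unsigned long file_pages;  /* NR_ZONE_INACTIVE_FILE + NR_ZONE_ACTIVE_FILE */
	unsigned long anon_pages;  /* NR_ZONE_INACTIVE_ANON + NR_ZONE_ACTIVE_ANON */
	unsigned long free_pages;  /* NR_FREE_PAGES */
	unsigned long min_wmark;   /* min_wmark_pages() */
};

/* Mirrors the patched zone_reclaimable_pages(): if neither file nor
 * (reclaimable) anon pages are present, report free pages instead. */
static unsigned long reclaimable_pages_patched(const struct zone_model *z,
					       bool can_reclaim_anon)
{
	unsigned long nr = z->file_pages;

	if (can_reclaim_anon)
		nr += z->anon_pages;
	if (nr == 0)
		nr = z->free_pages;
	return nr;
}

/* Simplified free-pages-versus-reserve test from allow_direct_reclaim():
 * only zones with a non-zero count from the function above contribute. */
static bool wmark_ok_model(const struct zone_model *zones, int nr_zones,
			   bool can_reclaim_anon)
{
	unsigned long reserve = 0, free_pages = 0;

	assert(zones && nr_zones > 0);
	for (int i = 0; i < nr_zones; i++) {
		if (reclaimable_pages_patched(&zones[i], can_reclaim_anon) == 0)
			continue;
		reserve += zones[i].min_wmark;
		free_pages += zones[i].free_pages;
	}
	if (reserve == 0)
		return true;   /* no reserves counted: do not throttle */
	return free_pages > reserve / 2;
}
```

For a DMA32-like zone with 50000 free pages, 100 anonymous pages and no swap, reclaimable_pages_patched() now reports 50000 instead of 0, so that zone's free pages and min watermark enter the sum and the check can pass. One trade-off to keep in mind: the return value is no longer strictly a count of reclaimable pages, and zone_reclaimable_pages() has callers besides allow_direct_reclaim() (should_reclaim_retry() in mm/page_alloc.c, as we understand the code), so they see the fallback too.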

Signed-off-by: Seiji Nishikawa <snishika@redhat.com>
---
 mm/vmscan.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 76378bc257e3..fb6b4056dcce 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -374,7 +374,14 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
 	if (can_reclaim_anon_pages(NULL, zone_to_nid(zone), NULL))
 		nr += zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_ANON) +
 			zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON);
-
+	/*
+	 * If there are no reclaimable file-backed or anonymous pages,
+	 * ensure zones with sufficient free pages are not skipped.
+	 * This prevents zones like DMA32 from being ignored in reclaim
+	 * scenarios where they can still help alleviate memory pressure.
+	 */
+	if (nr == 0)
+		nr = zone_page_state_snapshot(zone, NR_FREE_PAGES);
 	return nr;
 }
 
-- 
2.47.0
Re: [PATCH] mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim()
Posted by Andrew Morton 1 year, 2 months ago
On Sun,  1 Dec 2024 01:12:34 +0900 Seiji Nishikawa <snishika@redhat.com> wrote:

> The kernel hangs due to a task stuck in throttle_direct_reclaim(),
> caused by a node being incorrectly deemed balanced despite pressure in
> certain zones, such as ZONE_NORMAL. This issue arises from
> zone_reclaimable_pages() returning 0 for zones without reclaimable file-
> backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient
> free pages to be skipped.
> 
> The lack of swap or reclaimable pages results in ZONE_DMA32 being
> ignored during reclaim, masking pressure in other zones. Consequently,
> pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback
> mechanisms in allow_direct_reclaim() from being triggered, leading to an
> infinite loop in throttle_direct_reclaim().
> 
> This patch modifies zone_reclaimable_pages() to account for free pages
> (NR_FREE_PAGES) when no other reclaimable pages exist. This ensures
> zones with sufficient free pages are not skipped, enabling proper
> balancing and reclaim behavior.

We'll want to backport a fix for this into -stable kernels.  For that
it's best to be able to identify a suitable Fixes: target, to tell
others whether their kernel needs the fix.  Are you able to help
identify that commit?

Thanks.

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -374,7 +374,14 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
>  	if (can_reclaim_anon_pages(NULL, zone_to_nid(zone), NULL))
>  		nr += zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_ANON) +
>  			zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON);
> -
> +	/*
> +	 * If there are no reclaimable file-backed or anonymous pages, 
> +	 * ensure zones with sufficient free pages are not skipped. 
> +	 * This prevents zones like DMA32 from being ignored in reclaim 
> +	 * scenarios where they can still help alleviate memory pressure.
> +	 */
> +	if (nr == 0)
> +	    nr = zone_page_state_snapshot(zone, NR_FREE_PAGES);
>  	return nr;
>  }
>  
> -- 
> 2.47.0
Re: [PATCH] mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim()
Posted by Seiji Nishikawa 1 year, 2 months ago
On Sun, Dec 1, 2024 at 11:40 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Sun,  1 Dec 2024 01:12:34 +0900 Seiji Nishikawa <snishika@redhat.com> wrote:
>
> > The kernel hangs due to a task stuck in throttle_direct_reclaim(),
> > caused by a node being incorrectly deemed balanced despite pressure in
> > certain zones, such as ZONE_NORMAL. This issue arises from
> > zone_reclaimable_pages() returning 0 for zones without reclaimable file-
> > backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient
> > free pages to be skipped.
> >
> > The lack of swap or reclaimable pages results in ZONE_DMA32 being
> > ignored during reclaim, masking pressure in other zones. Consequently,
> > pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback
> > mechanisms in allow_direct_reclaim() from being triggered, leading to an
> > infinite loop in throttle_direct_reclaim().
> >
> > This patch modifies zone_reclaimable_pages() to account for free pages
> > (NR_FREE_PAGES) when no other reclaimable pages exist. This ensures
> > zones with sufficient free pages are not skipped, enabling proper
> > balancing and reclaim behavior.
>
> We'll want to backport a fix for this into -stable kernels.  For that
> it's best to be able to identify a suitable Fixes: target, to tell
> others whether their kernel needs the fix.  Are you able to help
> identify that commit?

Based on my analysis, the issue appears to be rooted in the original
design of zone_reclaimable_pages(). The later change in a2a36488a61c
("mm/vmscan: Consider anonymous pages without swap") does not
fundamentally alter this behavior; it only refines the handling of
anonymous pages and still does not account for zones that have
sufficient free pages but no reclaimable file-backed or anonymous
pages. The relevant commit that introduced this function is:

Fixes: 5a1c84b404a7 ("mm: remove reclaim and compaction retry approximations")

This commit seems to be the most appropriate target for the Fixes: tag,
as it introduced the logic that my patch modifies to address the 
observed kernel hang.

>
> Thanks.
>
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -374,7 +374,14 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
> >       if (can_reclaim_anon_pages(NULL, zone_to_nid(zone), NULL))
> >               nr += zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_ANON) +
> >                       zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON);
> > -
> > +     /*
> > +      * If there are no reclaimable file-backed or anonymous pages,
> > +      * ensure zones with sufficient free pages are not skipped.
> > +      * This prevents zones like DMA32 from being ignored in reclaim
> > +      * scenarios where they can still help alleviate memory pressure.
> > +      */
> > +     if (nr == 0)
> > +         nr = zone_page_state_snapshot(zone, NR_FREE_PAGES);
> >       return nr;
> >  }
> >
> > --
> > 2.47.0
>