[PATCH] mm: page_alloc: unreserve highatomic page blocks before oom
Posted by Charan Teja Kalla 2 years, 2 months ago
__alloc_pages_direct_reclaim() is called from the allocation slowpath,
where the high atomic reserves can be unreserved after reclaim makes
progress and yet no suitable page is found. Later, should_reclaim_retry()
is called from the slowpath to decide whether reclaim needs to be
retried before the OOM kill path is taken.

should_reclaim_retry() checks the available memory (reclaimable + free
pages) against the min wmark levels of a zone and returns:
a) true, if it is above the min wmark, so that the slowpath allocation
   will retry the reclaim.
b) false, so that the slowpath allocation takes the oom kill path.
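
For reference, that watermark decision is roughly the following sketch
(simplified from should_reclaim_retry(); the per-zone iteration and the
stall handling are elided, so treat it as illustrative rather than the
exact kernel code):

	unsigned long available, min_wmark = min_wmark_pages(zone);

	available = zone_reclaimable_pages(zone) +
		    zone_page_state_snapshot(zone, NR_FREE_PAGES);

	/* above the min wmark: worth retrying the reclaim */
	if (__zone_watermark_ok(zone, order, min_wmark,
				ac->highest_zoneidx, alloc_flags, available))
		return true;
	/* otherwise the caller proceeds towards the oom kill path */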

should_reclaim_retry() can also unreserve the high atomic reserves,
**but only after all the reclaim retries are exhausted.**
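
Today the only point where should_reclaim_retry() does that unreserve
is once the retry counter is exhausted (quoting the current code):

	if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
		/* Before OOM, exhaust highatomic_reserve */
		return unreserve_highatomic_pageblock(ac, true);
	}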

In a state where there is almost no reclaimable memory and the free
pages consist mostly of the high atomic reserves, but the allocation
context can't use these reserves, the available memory falls below the
min wmark levels. should_reclaim_retry() then returns false and the
allocation request takes the OOM kill path. This is an early oom kill,
because the high atomic reserves are holding a lot of free memory and
unreserving them is never attempted.

(early) OOM is encountered on a machine in the below state (excerpt
from the oom kill logs):
[  295.998653] Normal free:7728kB boost:0kB min:804kB low:1004kB
high:1204kB reserved_highatomic:8192KB active_anon:4kB inactive_anon:0kB
active_file:24kB inactive_file:24kB unevictable:1220kB writepending:0kB
present:70732kB managed:49224kB mlocked:0kB bounce:0kB free_pcp:688kB
local_pcp:492kB free_cma:0kB
[  295.998656] lowmem_reserve[]: 0 32
[  295.998659] Normal: 508*4kB (UMEH) 241*8kB (UMEH) 143*16kB (UMEH)
33*32kB (UH) 7*64kB (UH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB
0*4096kB = 7752kB

Per the above log, ~7MB of free memory exists in the high atomic
reserves and is not freed up before falling back to the oom kill path.
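
To make the arithmetic explicit with the numbers from that log (the
first line is an approximation of how the watermark check discounts
the reserves for an allocation that can't use them):

	usable free ~= free - reserved_highatomic
	             = 7728kB - 8192kB (effectively zero)
	min wmark    = 804kB

So the min wmark check cannot pass even though ~7MB sits free in the
high atomic reserves, and with almost nothing reclaimable on top the
allocation falls through to the oom kill path.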

This fix unreserves these atomic reserves in the OOM path before going
for a kill. The side effect of unreserving in the oom kill path is that
these free pages are checked against the high wmark; if unreserved from
should_reclaim_retry()/__alloc_pages_direct_reclaim(), they are checked
against the min wmark levels.

Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
---
 mm/page_alloc.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 95546f3..2a2536d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3281,6 +3281,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 		.order = order,
 	};
 	struct page *page;
+	struct zone *zone;
+	struct zoneref *z;
 
 	*did_some_progress = 0;
 
@@ -3295,6 +3297,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	}
 
 	/*
+	 * If should_reclaim_retry() encounters a state where:
+	 * reclaimable + free doesn't satisfy the wmark levels,
+	 * it can directly jump to OOM without even unreserving
+	 * the highatomic page blocks. Try them for once here
+	 * before jumping to OOM.
+	 */
+retry:
+	unreserve_highatomic_pageblock(ac, true);
+
+	/*
 	 * Go through the zonelist yet one more time, keep very high watermark
 	 * here, this is only to catch a parallel oom killing, we must fail if
 	 * we're still under heavy pressure. But make sure that this reclaim
@@ -3307,6 +3319,12 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 	if (page)
 		goto out;
 
+	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->highest_zoneidx,
+								ac->nodemask) {
+		if (zone->nr_reserved_highatomic > 0)
+			goto retry;
+	}
+
 	/* Coredumps can quickly deplete all memory reserves */
 	if (current->flags & PF_DUMPCORE)
 		goto out;
-- 
2.7.4
Re: [PATCH] mm: page_alloc: unreserve highatomic page blocks before oom
Posted by Michal Hocko 2 years, 2 months ago
On Mon 30-10-23 18:09:50, Charan Teja Kalla wrote:
> __alloc_pages_direct_reclaim() is called from the allocation slowpath,
> where the high atomic reserves can be unreserved after reclaim makes
> progress and yet no suitable page is found. Later, should_reclaim_retry()
> is called from the slowpath to decide whether reclaim needs to be
> retried before the OOM kill path is taken.
> 
> should_reclaim_retry() checks the available memory (reclaimable + free
> pages) against the min wmark levels of a zone and returns:
> a) true, if it is above the min wmark, so that the slowpath allocation
>    will retry the reclaim.
> b) false, so that the slowpath allocation takes the oom kill path.
> 
> should_reclaim_retry() can also unreserve the high atomic reserves,
> **but only after all the reclaim retries are exhausted.**
> 
> In a state where there is almost no reclaimable memory and the free
> pages consist mostly of the high atomic reserves, but the allocation
> context can't use these reserves, the available memory falls below the
> min wmark levels. should_reclaim_retry() then returns false and the
> allocation request takes the OOM kill path. This is an early oom kill,
> because the high atomic reserves are holding a lot of free memory and
> unreserving them is never attempted.

OK, I see. So we do not release those reserved pages because OOM hits
too early. 

> (early) OOM is encountered on a machine in the below state (excerpt
> from the oom kill logs):
> [  295.998653] Normal free:7728kB boost:0kB min:804kB low:1004kB
> high:1204kB reserved_highatomic:8192KB active_anon:4kB inactive_anon:0kB
> active_file:24kB inactive_file:24kB unevictable:1220kB writepending:0kB
> present:70732kB managed:49224kB mlocked:0kB bounce:0kB free_pcp:688kB
> local_pcp:492kB free_cma:0kB
> [  295.998656] lowmem_reserve[]: 0 32
> [  295.998659] Normal: 508*4kB (UMEH) 241*8kB (UMEH) 143*16kB (UMEH)
> 33*32kB (UH) 7*64kB (UH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB
> 0*4096kB = 7752kB

OK, this is quite interesting as well. The system is really tiny and 8MB
of reserved memory is indeed really high. How come those reservations
have grown that high?
> 
> Per the above log, ~7MB of free memory exists in the high atomic
> reserves and is not freed up before falling back to the oom kill path.
> 
> This fix unreserves these atomic reserves in the OOM path before going
> for a kill. The side effect of unreserving in the oom kill path is that
> these free pages are checked against the high wmark; if unreserved from
> should_reclaim_retry()/__alloc_pages_direct_reclaim(), they are checked
> against the min wmark levels.

I do not like the fix much TBH. I think the logic should live in
should_reclaim_retry. One way to approach it is to unreserve at the end
of the function, something like this:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 95546f376302..d04e14adf2c5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3813,10 +3813,8 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 	 * Make sure we converge to OOM if we cannot make any progress
 	 * several times in the row.
 	 */
-	if (*no_progress_loops > MAX_RECLAIM_RETRIES) {
-		/* Before OOM, exhaust highatomic_reserve */
-		return unreserve_highatomic_pageblock(ac, true);
-	}
+	if (*no_progress_loops > MAX_RECLAIM_RETRIES)
+		goto out;
 
 	/*
 	 * Keep reclaiming pages while there is a chance this will lead
@@ -3859,6 +3857,12 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
 		schedule_timeout_uninterruptible(1);
 	else
 		cond_resched();
+
+out:
+	/* Before OOM, exhaust highatomic_reserve */
+	if (!ret)
+		return unreserve_highatomic_pageblock(ac, true);
+
 	return ret;
 }
 
-- 
Michal Hocko
SUSE Labs
Re: [PATCH] mm: page_alloc: unreserve highatomic page blocks before oom
Posted by Pavan Kondeti 2 years, 2 months ago
On Mon, Oct 30, 2023 at 06:09:50PM +0530, Charan Teja Kalla wrote:
> __alloc_pages_direct_reclaim() is called from the allocation slowpath,
> where the high atomic reserves can be unreserved after reclaim makes
> progress and yet no suitable page is found. Later, should_reclaim_retry()
> is called from the slowpath to decide whether reclaim needs to be
> retried before the OOM kill path is taken.
> 
> should_reclaim_retry() checks the available memory (reclaimable + free
> pages) against the min wmark levels of a zone and returns:
> a) true, if it is above the min wmark, so that the slowpath allocation
>    will retry the reclaim.
> b) false, so that the slowpath allocation takes the oom kill path.
> 
> should_reclaim_retry() can also unreserve the high atomic reserves,
> **but only after all the reclaim retries are exhausted.**
> 
> In a state where there is almost no reclaimable memory and the free
> pages consist mostly of the high atomic reserves, but the allocation
> context can't use these reserves, the available memory falls below the
> min wmark levels. should_reclaim_retry() then returns false and the
> allocation request takes the OOM kill path. This is an early oom kill,
> because the high atomic reserves are holding a lot of free memory and
> unreserving them is never attempted.
> 
> (early) OOM is encountered on a machine in the below state (excerpt
> from the oom kill logs):
> [  295.998653] Normal free:7728kB boost:0kB min:804kB low:1004kB
> high:1204kB reserved_highatomic:8192KB active_anon:4kB inactive_anon:0kB
> active_file:24kB inactive_file:24kB unevictable:1220kB writepending:0kB
> present:70732kB managed:49224kB mlocked:0kB bounce:0kB free_pcp:688kB
> local_pcp:492kB free_cma:0kB
> [  295.998656] lowmem_reserve[]: 0 32
> [  295.998659] Normal: 508*4kB (UMEH) 241*8kB (UMEH) 143*16kB (UMEH)
> 33*32kB (UH) 7*64kB (UH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB
> 0*4096kB = 7752kB
> 
> Per the above log, ~7MB of free memory exists in the high atomic
> reserves and is not freed up before falling back to the oom kill path.
> 
> This fix unreserves these atomic reserves in the OOM path before going
> for a kill. The side effect of unreserving in the oom kill path is that
> these free pages are checked against the high wmark; if unreserved from
> should_reclaim_retry()/__alloc_pages_direct_reclaim(), they are checked
> against the min wmark levels.
> 
> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>

Thanks for the detailed commit description. Really helpful in
understanding the problem you are fixing.

> ---
>  mm/page_alloc.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 95546f3..2a2536d 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3281,6 +3281,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  		.order = order,
>  	};
>  	struct page *page;
> +	struct zone *zone;
> +	struct zoneref *z;
>  
>  	*did_some_progress = 0;
>  
> @@ -3295,6 +3297,16 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  	}
>  
>  	/*
> +	 * If should_reclaim_retry() encounters a state where:
> +	 * reclaimable + free doesn't satisfy the wmark levels,
> +	 * it can directly jump to OOM without even unreserving
> +	 * the highatomic page blocks. Try them for once here
> +	 * before jumping to OOM.
> +	 */
> +retry:
> +	unreserve_highatomic_pageblock(ac, true);
> +

Not possible to fix this in should_reclaim_retry()? 

> +	/*
>  	 * Go through the zonelist yet one more time, keep very high watermark
>  	 * here, this is only to catch a parallel oom killing, we must fail if
>  	 * we're still under heavy pressure. But make sure that this reclaim
> @@ -3307,6 +3319,12 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  	if (page)
>  		goto out;
>  
> +	for_each_zone_zonelist_nodemask(zone, z, ac->zonelist, ac->highest_zoneidx,
> +								ac->nodemask) {
> +		if (zone->nr_reserved_highatomic > 0)
> +			goto retry;
> +	}
> +
>  	/* Coredumps can quickly deplete all memory reserves */
>  	if (current->flags & PF_DUMPCORE)
>  		goto out;