mm/memcg: reorder retry checks for clarity in try_charge_memcg

[PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg

Posted by Dipendra Khadka 1 month, 3 weeks ago

In try_charge_memcg(), reorder the retry logic checks to follow the
early-exit pattern by testing for dying task before decrementing the
retry counter:

Before:
    if (nr_retries--)
        goto retry;
    
    if (passed_oom && task_is_dying())
        goto nomem;

After:
    if (passed_oom && task_is_dying())
        goto nomem;
    
    if (nr_retries--)
        goto retry;

This makes the control flow more obvious: check exit conditions first,
then decide whether to retry. When current task is dying (e.g., has
received SIGKILL or is exiting), we should exit immediately rather than
consuming a retry count first.

No functional change for the common case where task is not dying.

Signed-off-by: Dipendra Khadka <kdipendra88@gmail.com>
---
 mm/memcontrol.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index be810c1fbfc3..7c9145538683 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2412,16 +2412,16 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	if (nr_reclaimed && nr_pages <= (1 << PAGE_ALLOC_COSTLY_ORDER))
 		goto retry;

+	/* Avoid endless loop for tasks bypassed by the oom killer */
+	if (passed_oom && task_is_dying())
+		goto nomem;
+
 	if (nr_retries--)
 		goto retry;
 
 	if (gfp_mask & __GFP_RETRY_MAYFAIL)
 		goto nomem;
 
-	/* Avoid endless loop for tasks bypassed by the oom killer */
-	if (passed_oom && task_is_dying())
-		goto nomem;
-
 	/*
 	 * keep retrying as long as the memcg oom killer is able to make
 	 * a forward progress or bypass the charge if the oom killer
-- 
2.43.0

Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg

Posted by Johannes Weiner 1 month, 3 weeks ago

On Mon, Dec 15, 2025 at 02:54:19PM +0000, Dipendra Khadka wrote:
> In try_charge_memcg(), reorder the retry logic checks to follow the
> early-exit pattern by testing for dying task before decrementing the
> retry counter:
> 
> Before:
>     if (nr_retries--)
>         goto retry;
>     
>     if (passed_oom && task_is_dying())
>         goto nomem;
> 
> After:
>     if (passed_oom && task_is_dying())
>         goto nomem;
>     
>     if (nr_retries--)
>         goto retry;
> 
> This makes the control flow more obvious: check exit conditions first,
> then decide whether to retry. When current task is dying (e.g., has
> received SIGKILL or is exiting), we should exit immediately rather than
> consuming a retry count first.
> 
> No functional change for the common case where task is not dying.

It's definitely a functional change, not just code clarification.

The oom kill resets nr_retries. This means that currently, even an OOM
victim is going to retry a full set of reclaims, even if they are
hopeless. After your patch, it'll retry for other reasons but can bail
much earlier as well. Check the other conditions.

The dying task / OOM victim allocation path is tricky and it tends to
fail us in the rarest and most difficult to debug scenarios. There
should be a good reason to change it.

Re: [PATCH] mm/memcg: reorder retry checks for clarity in try_charge_memcg

Posted by Dipendra Khadka 1 month, 3 weeks ago

Hi Johannes,

Thank you for the feedback. Let me clarify the scenario this patch addresses.

The task_is_dying() check in try_charge_memcg() identifies when the
CURRENT task (the caller) is the OOM victim - not when some other 
process was killed.

Two scenarios:

1. Normal allocator triggers OOM:
  - Process A allocates → triggers OOM
  - Process B is killed (victim)
  - Process A continues with reset retries - task_is_dying() = false for A
  → Unchanged by my patch

2. Victim tries to allocate:
 - Process B (victim, TIF_MEMDIE set) tries to allocate
  - task_is_dying() = true
  - Current code: wastes retries on hopeless reclaims
  - My patch: exits immediately
  → Optimization for this case

The victim has three safety mechanisms that make the retries unnecessary:
1. oom_reaper proactively frees its memory
2. __alloc_pages_slowpath() grants reserves via oom_reserves_allowed()
3. Critical allocations with __GFP_NOFAIL still reach force: label

The retry loop for a dying victim is futile because:
- Reclaim won't help (victim is being killed to free memory!)
- Victim will exit regardless
- Just wastes CPU cycles

Would you like me to provide evidence showing the unnecessary retries,
or run specific tests to verify the safety mechanisms are sufficient?

Best Regards,
Dipendra