Use the hierarchical tree counter approximation to implement OOM killer
task selection with a 2-pass algorithm. The first pass selects the
process with the highest minimum possible badness, derived from its
approximated badness points and their accuracy range. The second pass
compares each process against the badness points chosen so far.
The second pass uses an approximate comparison to eliminate all processes
whose maximum possible badness is below the currently chosen badness
points.
Summing the per-CPU counters to calculate the precise badness of tasks
is only required for tasks with an approximate badness within the
accuracy range of the current max points value.
Limit the number of precise badness sums performed during an OOM killer
task selection to 16, after which the approximate comparison is used
instead. This bounds execution time in scenarios where many tasks have a
badness within the accuracy range of the maximum badness approximation.
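The two passes described above can be modeled in userspace as a minimal sketch. Everything in it is hypothetical and for illustration only: struct model_task, model_evaluate(), model_select() and MODEL_MAX_PRECISE_SUMS are not kernel names, the "precise" field stands in for the per-CPU counter sum, and it assumes the precise badness always lies in [approx - under, approx + over]. The comparisons follow oom_evaluate_task() in the diff, leaving out the unkillable-task and oom_task_origin() paths.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical model of one task; approx == LONG_MIN means "unkillable". */
struct model_task {
	long approx;	/* cheap approximate badness */
	long under;	/* assumed: precise >= approx - under */
	long over;	/* assumed: precise <= approx + over */
	long precise;	/* what an expensive per-CPU sum would return */
};

/* Mirrors max_precise_badness_sums in the patch. */
#define MODEL_MAX_PRECISE_SUMS 16

struct model_oc {
	const struct model_task *chosen;
	long chosen_points;	/* lower bound (or precise value) of chosen */
	int nr_precise;		/* precise sums performed so far */
};

static void model_evaluate(struct model_oc *oc, const struct model_task *t,
			   int approximate)
{
	long points = t->approx;
	long points_min, points_max;

	if (points == LONG_MIN)
		return;
	points_min = points - t->under;
	points_max = points + t->over;

	if (approximate) {
		/* Pass 1: keep the task with the highest minimum possible badness. */
		if (points_min < oc->chosen_points)
			return;
	} else {
		/* Pass 2: drop tasks whose maximum is below the chosen value. */
		if (points_max < oc->chosen_points)
			return;
		/* Precise sums only for tasks still in range, at most 16 of them. */
		if (oc->nr_precise < MODEL_MAX_PRECISE_SUMS) {
			oc->nr_precise++;
			points_min = t->precise;
			if (points_min < oc->chosen_points)
				return;
		}
	}
	oc->chosen = t;
	oc->chosen_points = points_min;
}

/* Run both passes over the same task array; returns the chosen task. */
static const struct model_task *model_select(const struct model_task *tasks,
					     size_t n, int *nr_precise)
{
	struct model_oc oc = { NULL, LONG_MIN, 0 };
	size_t i;

	for (i = 0; i < n; i++)		/* approximate scan */
		model_evaluate(&oc, &tasks[i], 1);
	for (i = 0; i < n; i++)		/* precise scan */
		model_evaluate(&oc, &tasks[i], 0);
	if (nr_precise)
		*nr_precise = oc.nr_precise;
	return oc.chosen;
}
```

With two tasks whose approximate ranges overlap, the first pass keeps the one with the higher lower bound, and the second pass can still switch to the other one once their precise sums are compared. After 16 precise sums, any remaining task whose range reaches the chosen value is accepted on its approximate range alone, as in the patch.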
Tested with the following script:
#!/bin/sh
for a in $(seq 1 10); do (tail /dev/zero &); done
sleep 5
for a in $(seq 1 10); do (tail /dev/zero &); done
sleep 2
for a in $(seq 1 10); do (tail /dev/zero &); done
echo "Waiting for tasks to finish"
wait
Results: OOM kill order on a 128GB memory system
================================================
* systemd and sd-pam are killed first due to their oom_score_adj of 100:
Out of memory: Killed process 3502 (systemd) total-vm:20096kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:72kB oom_score_adj:100
Out of memory: Killed process 3503 ((sd-pam)) total-vm:21432kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:76kB oom_score_adj:100
* The first batch of 10 processes is killed gradually, each time picking
the one that uses the most memory. Because killing the previous
processes frees memory, the remaining processes of that group grow
larger before they are killed in turn. Processes from the second and
third batches of 10 processes have time to start before we complete
killing the first 10 processes:
Out of memory: Killed process 3703 (tail) total-vm:6591280kB, anon-rss:6578176kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:12936kB oom_score_adj:0
Out of memory: Killed process 3705 (tail) total-vm:6731716kB, anon-rss:6709248kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:13212kB oom_score_adj:0
Out of memory: Killed process 3707 (tail) total-vm:6977216kB, anon-rss:6946816kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:13692kB oom_score_adj:0
Out of memory: Killed process 3699 (tail) total-vm:7205640kB, anon-rss:7184384kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:14136kB oom_score_adj:0
Out of memory: Killed process 3713 (tail) total-vm:7463204kB, anon-rss:7438336kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:14644kB oom_score_adj:0
Out of memory: Killed process 3701 (tail) total-vm:7739204kB, anon-rss:7716864kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:15180kB oom_score_adj:0
Out of memory: Killed process 3709 (tail) total-vm:8050176kB, anon-rss:8028160kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:15792kB oom_score_adj:0
Out of memory: Killed process 3711 (tail) total-vm:8362236kB, anon-rss:8339456kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:16404kB oom_score_adj:0
Out of memory: Killed process 3715 (tail) total-vm:8649360kB, anon-rss:8634368kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:16972kB oom_score_adj:0
Out of memory: Killed process 3697 (tail) total-vm:8951788kB, anon-rss:8929280kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:17560kB oom_score_adj:0
* Even though there is a 2-second delay between the 2nd and 3rd batches,
those appear to execute in mixed order. Therefore, let's consider them
as a single batch of 20 processes. We are hitting OOM at a lower
memory threshold because at this point the 20 remaining processes are
running rather than the previous 10. The process with the highest memory
usage is selected for OOM, which makes room for the remaining processes
so they can use more memory before they fill the available memory. This
explains why the memory use of the selected processes gradually
increases, until all system memory is used by the last one:
Out of memory: Killed process 3731 (tail) total-vm:7089868kB, anon-rss:7077888kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:13912kB oom_score_adj:0
Out of memory: Killed process 3721 (tail) total-vm:7417248kB, anon-rss:7405568kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:14556kB oom_score_adj:0
Out of memory: Killed process 3729 (tail) total-vm:7795864kB, anon-rss:7766016kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:15300kB oom_score_adj:0
Out of memory: Killed process 3723 (tail) total-vm:8259620kB, anon-rss:8224768kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:16208kB oom_score_adj:0
Out of memory: Killed process 3737 (tail) total-vm:8695984kB, anon-rss:8667136kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:17060kB oom_score_adj:0
Out of memory: Killed process 3735 (tail) total-vm:9295980kB, anon-rss:9265152kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:18240kB oom_score_adj:0
Out of memory: Killed process 3727 (tail) total-vm:9907900kB, anon-rss:9895936kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:19428kB oom_score_adj:0
Out of memory: Killed process 3719 (tail) total-vm:10631248kB, anon-rss:10600448kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:20844kB oom_score_adj:0
Out of memory: Killed process 3733 (tail) total-vm:11341720kB, anon-rss:11321344kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:22232kB oom_score_adj:0
Out of memory: Killed process 3725 (tail) total-vm:12348124kB, anon-rss:12320768kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:24204kB oom_score_adj:0
Out of memory: Killed process 3759 (tail) total-vm:12978888kB, anon-rss:12967936kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:25440kB oom_score_adj:0
Out of memory: Killed process 3751 (tail) total-vm:14386412kB, anon-rss:14352384kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:28196kB oom_score_adj:0
Out of memory: Killed process 3741 (tail) total-vm:16153168kB, anon-rss:16130048kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:31652kB oom_score_adj:0
Out of memory: Killed process 3753 (tail) total-vm:18414856kB, anon-rss:18391040kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:36076kB oom_score_adj:0
Out of memory: Killed process 3745 (tail) total-vm:21389456kB, anon-rss:21356544kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:41904kB oom_score_adj:0
Out of memory: Killed process 3747 (tail) total-vm:25659348kB, anon-rss:25632768kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:50260kB oom_score_adj:0
Out of memory: Killed process 3755 (tail) total-vm:32030820kB, anon-rss:32006144kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:62720kB oom_score_adj:0
Out of memory: Killed process 3743 (tail) total-vm:42648456kB, anon-rss:42614784kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:83504kB oom_score_adj:0
Out of memory: Killed process 3757 (tail) total-vm:63971028kB, anon-rss:63938560kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:125228kB oom_score_adj:0
Out of memory: Killed process 3749 (tail) total-vm:127799660kB, anon-rss:127778816kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:250140kB oom_score_adj:0
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Martin Liu <liumartin@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: christian.koenig@amd.com
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: SeongJae Park <sj@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R . Howlett" <liam.howlett@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-mm@kvack.org
Cc: linux-trace-kernel@vger.kernel.org
Cc: Yu Zhao <yuzhao@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
Changes since v11:
- get_mm_counter_sum() returns a precise sum.
- Use unsigned long type rather than unsigned int for accuracy.
- Use precise sum min/max calculation to compare the chosen vs
current points.
- The first pass finds the task with the highest minimum possible
points. The second pass eliminates all tasks whose maximum possible
points are below the currently chosen minimum points, and uses a
precise sum to validate the candidates which may be in range.
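How the per-counter accuracy ranges combine into a badness range can be shown with a small userspace sketch. It assumes each approximate counter guarantees precise >= approx - under and precise <= approx + over; struct approx_counter, sum_approx() and badness_range() are hypothetical names, not the percpu_counter_tree API, and the sketch ignores any clamping that the *_sum_positive() helpers may apply. It mirrors how __get_mm_counter() adds each counter's under/over bounds into the caller's accumulators.

```c
#include <assert.h>

/* Hypothetical model of one approximate per-CPU counter. */
struct approx_counter {
	unsigned long approx;	/* approximate sum */
	unsigned long under;	/* assumed: precise >= approx - under */
	unsigned long over;	/* assumed: precise <= approx + over */
};

/*
 * Sum counters the way __get_mm_rss() adds up __get_mm_counter() results:
 * the values add up, and each counter's accuracy bounds are accumulated
 * into *under and *over (which the caller initializes to 0).
 */
static unsigned long sum_approx(const struct approx_counter *c, int n,
				unsigned long *under, unsigned long *over)
{
	unsigned long sum = 0;
	int i;

	for (i = 0; i < n; i++) {
		sum += c[i].approx;
		*under += c[i].under;
		*over += c[i].over;
	}
	return sum;
}

/* Turn an approximate badness and its accumulated accuracy into a range. */
static void badness_range(long points, unsigned long under, unsigned long over,
			  long *min, long *max)
{
	*min = points - (long)under;
	*max = points + (long)over;
}
```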
---
fs/proc/base.c | 2 +-
include/linux/mm.h | 34 ++++++++++++++++---
include/linux/oom.h | 11 +++++-
kernel/fork.c | 2 +-
mm/oom_kill.c | 82 +++++++++++++++++++++++++++++++++++++--------
5 files changed, 109 insertions(+), 22 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 4eec684baca9..d75d0ce97032 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -589,7 +589,7 @@ static int proc_oom_score(struct seq_file *m, struct pid_namespace *ns,
unsigned long points = 0;
long badness;
- badness = oom_badness(task, totalpages);
+ badness = oom_badness(task, totalpages, false, NULL, NULL);
/*
* Special case OOM_SCORE_ADJ_MIN for all others scale the
* badness value into [0, 2000] range which we have been
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6d938b3e3709..680f2811702e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2855,14 +2855,32 @@ static inline struct percpu_counter_tree_level_item *get_rss_stat_items(struct m
/*
* per-process(per-mm_struct) statistics.
*/
+static inline unsigned long __get_mm_counter(struct mm_struct *mm, int member, bool approximate,
+ unsigned long *accuracy_under, unsigned long *accuracy_over)
+{
+ if (approximate) {
+ if (accuracy_under && accuracy_over) {
+ unsigned long under, over;
+
+ percpu_counter_tree_approximate_accuracy_range(&mm->rss_stat[member], &under, &over);
+ *accuracy_under += under;
+ *accuracy_over += over;
+ }
+ return percpu_counter_tree_approximate_sum_positive(&mm->rss_stat[member]);
+ } else {
+ return percpu_counter_tree_precise_sum_positive(&mm->rss_stat[member]);
+ }
+}
+
static inline unsigned long get_mm_counter(struct mm_struct *mm, int member)
{
- return percpu_counter_tree_approximate_sum_positive(&mm->rss_stat[member]);
+ return __get_mm_counter(mm, member, true, NULL, NULL);
}
+
static inline unsigned long get_mm_counter_sum(struct mm_struct *mm, int member)
{
- return percpu_counter_tree_precise_sum_positive(&mm->rss_stat[member]);
+ return __get_mm_counter(mm, member, false, NULL, NULL);
}
void mm_trace_rss_stat(struct mm_struct *mm, int member);
@@ -2903,11 +2921,17 @@ static inline int mm_counter(struct folio *folio)
return mm_counter_file(folio);
}
+static inline unsigned long __get_mm_rss(struct mm_struct *mm, bool approximate,
+ unsigned long *accuracy_under, unsigned long *accuracy_over)
+{
+ return __get_mm_counter(mm, MM_FILEPAGES, approximate, accuracy_under, accuracy_over) +
+ __get_mm_counter(mm, MM_ANONPAGES, approximate, accuracy_under, accuracy_over) +
+ __get_mm_counter(mm, MM_SHMEMPAGES, approximate, accuracy_under, accuracy_over);
+}
+
static inline unsigned long get_mm_rss(struct mm_struct *mm)
{
- return get_mm_counter(mm, MM_FILEPAGES) +
- get_mm_counter(mm, MM_ANONPAGES) +
- get_mm_counter(mm, MM_SHMEMPAGES);
+ return __get_mm_rss(mm, true, NULL, NULL);
}
static inline unsigned long get_mm_hiwater_rss(struct mm_struct *mm)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 7b02bc1d0a7e..f8e5bfaf7b39 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -48,6 +48,12 @@ struct oom_control {
unsigned long totalpages;
struct task_struct *chosen;
long chosen_points;
+ bool approximate;
+ /*
+ * Number of precise badness points sums performed by this task
+ * selection.
+ */
+ int nr_precise;
/* Used to print the constraint info. */
enum oom_constraint constraint;
@@ -97,7 +103,10 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm)
}
long oom_badness(struct task_struct *p,
- unsigned long totalpages);
+ unsigned long totalpages,
+ bool approximate,
+ unsigned long *accuracy_under,
+ unsigned long *accuracy_over);
extern bool out_of_memory(struct oom_control *oc);
diff --git a/kernel/fork.c b/kernel/fork.c
index 949ac019a7b1..8b56d81af734 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -632,7 +632,7 @@ static void check_mm(struct mm_struct *mm)
for (i = 0; i < NR_MM_COUNTERS; i++) {
if (unlikely(percpu_counter_tree_precise_compare_value(&mm->rss_stat[i], 0) != 0))
- pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%d Comm:%s Pid:%d\n",
+ pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld Comm:%s Pid:%d\n",
mm, resident_page_types[i],
percpu_counter_tree_precise_sum(&mm->rss_stat[i]),
current->comm,
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5eb11fbba704..740891be3267 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -53,6 +53,14 @@
#define CREATE_TRACE_POINTS
#include <trace/events/oom.h>
+/*
+ * Maximum number of badness sums allowed before using an approximated
+ * comparison. This ensures bounded execution time for scenarios where
+ * many tasks have badness within the accuracy of the maximum badness
+ * approximation.
+ */
+static int max_precise_badness_sums = 16;
+
static int sysctl_panic_on_oom;
static int sysctl_oom_kill_allocating_task;
static int sysctl_oom_dump_tasks = 1;
@@ -194,12 +202,16 @@ static bool should_dump_unreclaim_slab(void)
* oom_badness - heuristic function to determine which candidate task to kill
* @p: task struct of which task we should calculate
* @totalpages: total present RAM allowed for page allocation
+ * @approximate: whether the value can be approximated
+ * @accuracy_under: accuracy of the badness value approximation (under value)
+ * @accuracy_over: accuracy of the badness value approximation (over value)
*
* The heuristic for determining which task to kill is made to be as simple and
* predictable as possible. The goal is to return the highest value for the
* task consuming the most memory to avoid subsequent oom failures.
*/
-long oom_badness(struct task_struct *p, unsigned long totalpages)
+long oom_badness(struct task_struct *p, unsigned long totalpages, bool approximate,
+ unsigned long *accuracy_under, unsigned long *accuracy_over)
{
long points;
long adj;
@@ -228,7 +240,8 @@ long oom_badness(struct task_struct *p, unsigned long totalpages)
* The baseline for the badness score is the proportion of RAM that each
* task's rss, pagetable and swap space use.
*/
- points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
+ points = __get_mm_rss(p->mm, approximate, accuracy_under, accuracy_over) +
+ __get_mm_counter(p->mm, MM_SWAPENTS, approximate, accuracy_under, accuracy_over) +
mm_pgtables_bytes(p->mm) / PAGE_SIZE;
task_unlock(p);
@@ -309,7 +322,8 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc)
static int oom_evaluate_task(struct task_struct *task, void *arg)
{
struct oom_control *oc = arg;
- long points;
+ unsigned long accuracy_under = 0, accuracy_over = 0;
+ long points, points_min, points_max;
if (oom_unkillable_task(task))
goto next;
@@ -339,16 +353,43 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
goto select;
}
- points = oom_badness(task, oc->totalpages);
- if (points == LONG_MIN || points < oc->chosen_points)
- goto next;
+ points = oom_badness(task, oc->totalpages, true, &accuracy_under, &accuracy_over);
+ if (points != LONG_MIN) {
+ percpu_counter_tree_approximate_min_max_range(points,
+ accuracy_under, accuracy_over,
+ &points_min, &points_max);
+ }
+ if (oc->approximate) {
+ /*
+ * Keep the process which has the highest minimum
+ * possible points value based on approximation.
+ */
+ if (points == LONG_MIN || points_min < oc->chosen_points)
+ goto next;
+ } else {
+ /*
+ * Eliminate processes which are certainly below the
+ * chosen points minimum possible value with an
+ * approximation.
+ */
+ if (points == LONG_MIN || (long)(points_max - oc->chosen_points) < 0)
+ goto next;
+
+ if (oc->nr_precise < max_precise_badness_sums) {
+ oc->nr_precise++;
+ /* Precise evaluation. */
+ points_min = points_max = points = oom_badness(task, oc->totalpages, false, NULL, NULL);
+ if (points == LONG_MIN || (long)(points - oc->chosen_points) < 0)
+ goto next;
+ }
+ }
select:
if (oc->chosen)
put_task_struct(oc->chosen);
get_task_struct(task);
oc->chosen = task;
- oc->chosen_points = points;
+ oc->chosen_points = points_min;
next:
return 0;
abort:
@@ -358,14 +399,8 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
return 1;
}
-/*
- * Simple selection loop. We choose the process with the highest number of
- * 'points'. In case scan was aborted, oc->chosen is set to -1.
- */
-static void select_bad_process(struct oom_control *oc)
+static void select_bad_process_iter(struct oom_control *oc)
{
- oc->chosen_points = LONG_MIN;
-
if (is_memcg_oom(oc))
mem_cgroup_scan_tasks(oc->memcg, oom_evaluate_task, oc);
else {
@@ -379,6 +414,25 @@ static void select_bad_process(struct oom_control *oc)
}
}
+/*
+ * Simple selection loop. We choose the process with the highest number of
+ * 'points'. In case scan was aborted, oc->chosen is set to -1.
+ */
+static void select_bad_process(struct oom_control *oc)
+{
+ oc->chosen_points = LONG_MIN;
+ oc->nr_precise = 0;
+
+ /* Approximate scan. */
+ oc->approximate = true;
+ select_bad_process_iter(oc);
+ if (oc->chosen == (void *)-1UL)
+ return;
+ /* Precise scan. */
+ oc->approximate = false;
+ select_bad_process_iter(oc);
+}
+
static int dump_task(struct task_struct *p, void *arg)
{
struct oom_control *oc = arg;
--
2.39.5
Hi Mathieu,
kernel test robot noticed the following build warnings:
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Mathieu-Desnoyers/lib-Introduce-hierarchical-per-cpu-counters/20260111-231206
base: next-20260109
patch link: https://lore.kernel.org/r/20260111150249.1222944-4-mathieu.desnoyers%40efficios.com
patch subject: [PATCH v12 3/3] mm: Implement precise OOM killer task selection
config: s390-randconfig-r071-20260112 (https://download.01.org/0day-ci/archive/20260112/202601120452.VufCnz2j-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 14.3.0
smatch version: v0.5.0-8985-g2614ff1a
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202601120452.VufCnz2j-lkp@intel.com/
smatch warnings:
mm/oom_kill.c:392 oom_evaluate_task() error: uninitialized symbol 'points_min'.
vim +/points_min +392 mm/oom_kill.c
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 322 static int oom_evaluate_task(struct task_struct *task, void *arg)
462607ecc519b19 David Rientjes 2012-07-31 323 {
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 324 struct oom_control *oc = arg;
72456781289a6ed Mathieu Desnoyers 2026-01-11 325 unsigned long accuracy_under = 0, accuracy_over = 0;
72456781289a6ed Mathieu Desnoyers 2026-01-11 326 long points, points_min, points_max;
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 327
ac311a14c682dcd Shakeel Butt 2019-07-11 328 if (oom_unkillable_task(task))
ac311a14c682dcd Shakeel Butt 2019-07-11 329 goto next;
ac311a14c682dcd Shakeel Butt 2019-07-11 330
ac311a14c682dcd Shakeel Butt 2019-07-11 331 /* p may not have freeable memory in nodemask */
ac311a14c682dcd Shakeel Butt 2019-07-11 332 if (!is_memcg_oom(oc) && !oom_cpuset_eligible(task, oc))
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 333 goto next;
462607ecc519b19 David Rientjes 2012-07-31 334
462607ecc519b19 David Rientjes 2012-07-31 335 /*
462607ecc519b19 David Rientjes 2012-07-31 336 * This task already has access to memory reserves and is being killed.
a373966d1f64c04 Michal Hocko 2016-07-28 337 * Don't allow any other task to have access to the reserves unless
862e3073b3eed13 Michal Hocko 2016-10-07 338 * the task has MMF_OOM_SKIP because chances that it would release
a373966d1f64c04 Michal Hocko 2016-07-28 339 * any memory is quite low.
462607ecc519b19 David Rientjes 2012-07-31 340 */
862e3073b3eed13 Michal Hocko 2016-10-07 341 if (!is_sysrq_oom(oc) && tsk_is_oom_victim(task)) {
12e423ba4eaed7b Lorenzo Stoakes 2025-08-12 342 if (mm_flags_test(MMF_OOM_SKIP, task->signal->oom_mm))
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 343 goto next;
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 344 goto abort;
a373966d1f64c04 Michal Hocko 2016-07-28 345 }
462607ecc519b19 David Rientjes 2012-07-31 346
e1e12d2f3104be8 David Rientjes 2012-12-11 347 /*
e1e12d2f3104be8 David Rientjes 2012-12-11 348 * If task is allocating a lot of memory and has been marked to be
e1e12d2f3104be8 David Rientjes 2012-12-11 349 * killed first if it triggers an oom, then select it.
e1e12d2f3104be8 David Rientjes 2012-12-11 350 */
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 351 if (oom_task_origin(task)) {
9066e5cfb73cdbc Yafang Shao 2020-08-11 352 points = LONG_MAX;
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 353 goto select;
points_min is uninitialized.
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 354 }
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 355
72456781289a6ed Mathieu Desnoyers 2026-01-11 356 points = oom_badness(task, oc->totalpages, true, &accuracy_under, &accuracy_over);
72456781289a6ed Mathieu Desnoyers 2026-01-11 357 if (points != LONG_MIN) {
72456781289a6ed Mathieu Desnoyers 2026-01-11 358 percpu_counter_tree_approximate_min_max_range(points,
72456781289a6ed Mathieu Desnoyers 2026-01-11 359 accuracy_under, accuracy_over,
72456781289a6ed Mathieu Desnoyers 2026-01-11 360 &points_min, &points_max);
72456781289a6ed Mathieu Desnoyers 2026-01-11 361 }
72456781289a6ed Mathieu Desnoyers 2026-01-11 362 if (oc->approximate) {
72456781289a6ed Mathieu Desnoyers 2026-01-11 363 /*
72456781289a6ed Mathieu Desnoyers 2026-01-11 364 * Keep the process which has the highest minimum
72456781289a6ed Mathieu Desnoyers 2026-01-11 365 * possible points value based on approximation.
72456781289a6ed Mathieu Desnoyers 2026-01-11 366 */
72456781289a6ed Mathieu Desnoyers 2026-01-11 367 if (points == LONG_MIN || points_min < oc->chosen_points)
72456781289a6ed Mathieu Desnoyers 2026-01-11 368 goto next;
72456781289a6ed Mathieu Desnoyers 2026-01-11 369 } else {
72456781289a6ed Mathieu Desnoyers 2026-01-11 370 /*
72456781289a6ed Mathieu Desnoyers 2026-01-11 371 * Eliminate processes which are certainly below the
72456781289a6ed Mathieu Desnoyers 2026-01-11 372 * chosen points minimum possible value with an
72456781289a6ed Mathieu Desnoyers 2026-01-11 373 * approximation.
72456781289a6ed Mathieu Desnoyers 2026-01-11 374 */
72456781289a6ed Mathieu Desnoyers 2026-01-11 375 if (points == LONG_MIN || (long)(points_max - oc->chosen_points) < 0)
72456781289a6ed Mathieu Desnoyers 2026-01-11 376 goto next;
72456781289a6ed Mathieu Desnoyers 2026-01-11 377
72456781289a6ed Mathieu Desnoyers 2026-01-11 378 if (oc->nr_precise < max_precise_badness_sums) {
72456781289a6ed Mathieu Desnoyers 2026-01-11 379 oc->nr_precise++;
72456781289a6ed Mathieu Desnoyers 2026-01-11 380 /* Precise evaluation. */
72456781289a6ed Mathieu Desnoyers 2026-01-11 381 points_min = points_max = points = oom_badness(task, oc->totalpages, false, NULL, NULL);
72456781289a6ed Mathieu Desnoyers 2026-01-11 382 if (points == LONG_MIN || (long)(points - oc->chosen_points) < 0)
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 383 goto next;
72456781289a6ed Mathieu Desnoyers 2026-01-11 384 }
72456781289a6ed Mathieu Desnoyers 2026-01-11 385 }
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 386
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 387 select:
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 388 if (oc->chosen)
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 389 put_task_struct(oc->chosen);
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 390 get_task_struct(task);
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 391 oc->chosen = task;
72456781289a6ed Mathieu Desnoyers 2026-01-11 @392 oc->chosen_points = points_min;
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 393 next:
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 394 return 0;
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 395 abort:
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 396 if (oc->chosen)
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 397 put_task_struct(oc->chosen);
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 398 oc->chosen = (void *)-1UL;
7c5f64f84483bd1 Vladimir Davydov 2016-10-07 399 return 1;
462607ecc519b19 David Rientjes 2012-07-31 400 }
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Mathieu,
kernel test robot noticed the following build warnings:
[auto build test WARNING on next-20260109]
[cannot apply to akpm-mm/mm-everything kees/for-next/execve tip/sched/core linus/master v6.19-rc4 v6.19-rc3 v6.19-rc2 v6.19-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Mathieu-Desnoyers/lib-Introduce-hierarchical-per-cpu-counters/20260111-231206
base: next-20260109
patch link: https://lore.kernel.org/r/20260111150249.1222944-4-mathieu.desnoyers%40efficios.com
patch subject: [PATCH v12 3/3] mm: Implement precise OOM killer task selection
config: arm-randconfig-001-20260112 (https://download.01.org/0day-ci/archive/20260112/202601120124.RK3AWOwu-lkp@intel.com/config)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project 9b8addffa70cee5b2acc5454712d9cf78ce45710)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260112/202601120124.RK3AWOwu-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601120124.RK3AWOwu-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> kernel/fork.c:637:6: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
635 | pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld Comm:%s Pid:%d\n",
| ~~~
| %d
636 | mm, resident_page_types[i],
637 | percpu_counter_tree_precise_sum(&mm->rss_stat[i]),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/printk.h:534:35: note: expanded from macro 'pr_alert'
534 | printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
| ~~~ ^~~~~~~~~~~
include/linux/printk.h:511:60: note: expanded from macro 'printk'
511 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
| ~~~ ^~~~~~~~~~~
include/linux/printk.h:483:19: note: expanded from macro 'printk_index_wrap'
483 | _p_func(_fmt, ##__VA_ARGS__); \
| ~~~~ ^~~~~~~~~~~
1 warning generated.
--
>> mm/oom_kill.c:351:6: warning: variable 'points_min' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
351 | if (oom_task_origin(task)) {
| ^~~~~~~~~~~~~~~~~~~~~
mm/oom_kill.c:392:22: note: uninitialized use occurs here
392 | oc->chosen_points = points_min;
| ^~~~~~~~~~
mm/oom_kill.c:351:2: note: remove the 'if' if its condition is always false
351 | if (oom_task_origin(task)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
352 | points = LONG_MAX;
| ~~~~~~~~~~~~~~~~~~
353 | goto select;
| ~~~~~~~~~~~~
354 | }
| ~
mm/oom_kill.c:326:25: note: initialize the variable 'points_min' to silence this warning
326 | long points, points_min, points_max;
| ^
| = 0
1 warning generated.
vim +637 kernel/fork.c
6af8cb80d3a9a6 David Hildenbrand 2025-03-03 625
d70f2a14b72a4b Andrew Morton 2018-01-31 626 static void check_mm(struct mm_struct *mm)
d70f2a14b72a4b Andrew Morton 2018-01-31 627 {
d70f2a14b72a4b Andrew Morton 2018-01-31 628 int i;
d70f2a14b72a4b Andrew Morton 2018-01-31 629
8495f7e6732ed2 Sai Praneeth Prakhya 2019-09-25 630 BUILD_BUG_ON_MSG(ARRAY_SIZE(resident_page_types) != NR_MM_COUNTERS,
8495f7e6732ed2 Sai Praneeth Prakhya 2019-09-25 631 "Please make sure 'struct resident_page_types[]' is updated as well");
8495f7e6732ed2 Sai Praneeth Prakhya 2019-09-25 632
d70f2a14b72a4b Andrew Morton 2018-01-31 633 for (i = 0; i < NR_MM_COUNTERS; i++) {
25d942f31cc499 Mathieu Desnoyers 2026-01-11 634 if (unlikely(percpu_counter_tree_precise_compare_value(&mm->rss_stat[i], 0) != 0))
72456781289a6e Mathieu Desnoyers 2026-01-11 635 pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld Comm:%s Pid:%d\n",
25d942f31cc499 Mathieu Desnoyers 2026-01-11 636 mm, resident_page_types[i],
25d942f31cc499 Mathieu Desnoyers 2026-01-11 @637 percpu_counter_tree_precise_sum(&mm->rss_stat[i]),
881388f3433819 Xuanye Liu 2025-07-23 638 current->comm,
881388f3433819 Xuanye Liu 2025-07-23 639 task_pid_nr(current));
881388f3433819 Xuanye Liu 2025-07-23 640 }
d70f2a14b72a4b Andrew Morton 2018-01-31 641
d70f2a14b72a4b Andrew Morton 2018-01-31 642 if (mm_pgtables_bytes(mm))
d70f2a14b72a4b Andrew Morton 2018-01-31 643 pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
d70f2a14b72a4b Andrew Morton 2018-01-31 644 mm_pgtables_bytes(mm));
d70f2a14b72a4b Andrew Morton 2018-01-31 645
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On 2026-01-11 13:03, kernel test robot wrote:
> Hi Mathieu,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on next-20260109]
> [cannot apply to akpm-mm/mm-everything kees/for-next/execve tip/sched/core linus/master v6.19-rc4 v6.19-rc3 v6.19-rc2 v6.19-rc4]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Mathieu-Desnoyers/lib-Introduce-hierarchical-per-cpu-counters/20260111-231206
> base: next-20260109
> patch link: https://lore.kernel.org/r/20260111150249.1222944-4-mathieu.desnoyers%40efficios.com
> patch subject: [PATCH v12 3/3] mm: Implement precise OOM killer task selection
> config: arm-randconfig-001-20260112 (https://download.01.org/0day-ci/archive/20260112/202601120124.RK3AWOwu-lkp@intel.com/config)
> compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project 9b8addffa70cee5b2acc5454712d9cf78ce45710)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260112/202601120124.RK3AWOwu-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202601120124.RK3AWOwu-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
>>> kernel/fork.c:637:6: warning: format specifies type 'long' but the argument has type 'int' [-Wformat]
> 635 | pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld Comm:%s Pid:%d\n",
> | ~~~
> | %d
> 636 | mm, resident_page_types[i],
> 637 | percpu_counter_tree_precise_sum(&mm->rss_stat[i]),
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> include/linux/printk.h:534:35: note: expanded from macro 'pr_alert'
> 534 | printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
> | ~~~ ^~~~~~~~~~~
> include/linux/printk.h:511:60: note: expanded from macro 'printk'
> 511 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
> | ~~~ ^~~~~~~~~~~
> include/linux/printk.h:483:19: note: expanded from macro 'printk_index_wrap'
> 483 | _p_func(_fmt, ##__VA_ARGS__); \
> | ~~~~ ^~~~~~~~~~~
> 1 warning generated.
That's percpu_counter_tree_precise_sum(), which needs to return a "long".
Will fix.
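For illustration only, here is a minimal userspace sketch of that type fix. It assumes a hypothetical flat counter (an approximate value plus per-CPU deltas) in place of the real hierarchical tree. The `demo_*` names are made up and are not the percpu_counter_tree API. The point is simply that a sum accumulated and returned as `long` matches the `%ld` conversion in the check_mm() message.

```c
#include <stdio.h>

#define DEMO_NR_CPUS 4	/* hypothetical CPU count for the sketch */

/*
 * Hypothetical stand-in for a per-CPU counter tree: an approximate
 * global value plus per-CPU deltas not yet folded into it.
 */
struct demo_counter_tree {
	long approx;
	long percpu[DEMO_NR_CPUS];
};

/*
 * Return long, not int: the result then matches the "%ld" conversion
 * used by callers such as the check_mm() rss-counter message.
 */
static long demo_counter_tree_precise_sum(const struct demo_counter_tree *c)
{
	long sum = c->approx;

	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		sum += c->percpu[cpu];
	return sum;
}

/* Print a check_mm()-style report and return the value that was printed. */
static long demo_report_bad_rss(const char *type,
				const struct demo_counter_tree *c)
{
	long val = demo_counter_tree_precise_sum(c);

	printf("BUG: Bad rss-counter state type:%s val:%ld\n", type, val);
	return val;
}
```

With an `int` return type, the same `printf()` call would trigger the -Wformat warning reported above.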
> --
>>> mm/oom_kill.c:351:6: warning: variable 'points_min' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
> 351 | if (oom_task_origin(task)) {
> | ^~~~~~~~~~~~~~~~~~~~~
> mm/oom_kill.c:392:22: note: uninitialized use occurs here
> 392 | oc->chosen_points = points_min;
> | ^~~~~~~~~~
> mm/oom_kill.c:351:2: note: remove the 'if' if its condition is always false
> 351 | if (oom_task_origin(task)) {
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 352 | points = LONG_MAX;
I need to change this to "points_min = LONG_MAX". Will fix.
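The warning arises because the oom_task_origin() branch assigns `points` and jumps to the selection code, which then reads `points_min`. The following is a heavily simplified, hypothetical sketch of the control flow with that fix applied. The `demo_*` structures stand in for the kernel's task and oom_control, and `origin` stands in for oom_task_origin(task). It is not the actual mm/oom_kill.c code, only an illustration that `points_min` is now assigned on every path that reaches `select`.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct demo_task {
	const char *comm;
	bool origin;		/* stands in for oom_task_origin(task) */
	long points_min;	/* lower bound of the badness approximation */
};

struct demo_oom_control {
	const struct demo_task *chosen;
	long chosen_points;
};

/* Evaluate one candidate; returns true if it became the chosen task. */
static bool demo_evaluate_task(struct demo_oom_control *oc,
			       const struct demo_task *task)
{
	long points_min;

	if (task->origin) {
		/*
		 * The fix: assign points_min (rather than "points") so
		 * the variable is initialized on this path as well.
		 */
		points_min = LONG_MAX;
		goto select;
	}

	points_min = task->points_min;
	if (oc->chosen && points_min <= oc->chosen_points)
		return false;
select:
	oc->chosen = task;
	oc->chosen_points = points_min;	/* initialized on every path */
	return true;
}
```

In this sketch, once an origin task has been selected with LONG_MAX, no ordinary candidate can displace it, which mirrors the intent of the `points = LONG_MAX` line quoted above.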
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Hi Mathieu,
kernel test robot noticed the following build warnings:
[auto build test WARNING on next-20260109]
[cannot apply to akpm-mm/mm-everything kees/for-next/execve tip/sched/core linus/master v6.19-rc4 v6.19-rc3 v6.19-rc2 v6.19-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Mathieu-Desnoyers/lib-Introduce-hierarchical-per-cpu-counters/20260111-231206
base: next-20260109
patch link: https://lore.kernel.org/r/20260111150249.1222944-4-mathieu.desnoyers%40efficios.com
patch subject: [PATCH v12 3/3] mm: Implement precise OOM killer task selection
config: arc-randconfig-002-20260112 (https://download.01.org/0day-ci/archive/20260112/202601120122.xSlEJ1AJ-lkp@intel.com/config)
compiler: arc-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260112/202601120122.xSlEJ1AJ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601120122.xSlEJ1AJ-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from include/asm-generic/bug.h:31,
from arch/arc/include/asm/bug.h:30,
from include/linux/bug.h:5,
from include/linux/slab.h:15,
from kernel/fork.c:16:
kernel/fork.c: In function 'check_mm':
>> include/linux/kern_levels.h:5:18: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'int' [-Wformat=]
#define KERN_SOH "\001" /* ASCII Start Of Header */
^~~~~~
include/linux/printk.h:483:11: note: in definition of macro 'printk_index_wrap'
_p_func(_fmt, ##__VA_ARGS__); \
^~~~
include/linux/printk.h:534:2: note: in expansion of macro 'printk'
printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~
include/linux/kern_levels.h:9:20: note: in expansion of macro 'KERN_SOH'
#define KERN_ALERT KERN_SOH "1" /* action must be taken immediately */
^~~~~~~~
include/linux/printk.h:534:9: note: in expansion of macro 'KERN_ALERT'
printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~
kernel/fork.c:635:4: note: in expansion of macro 'pr_alert'
pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld Comm:%s Pid:%d\n",
^~~~~~~~
kernel/fork.c:635:61: note: format string is defined here
pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld Comm:%s Pid:%d\n",
~~^
%d
vim +5 include/linux/kern_levels.h
314ba3520e513a Joe Perches 2012-07-30 4
04d2c8c83d0e3a Joe Perches 2012-07-30 @5 #define KERN_SOH "\001" /* ASCII Start Of Header */
04d2c8c83d0e3a Joe Perches 2012-07-30 6 #define KERN_SOH_ASCII '\001'
04d2c8c83d0e3a Joe Perches 2012-07-30 7
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On 2026-01-11 12:50, kernel test robot wrote:
> Hi Mathieu,
>
> kernel test robot noticed the following build warnings:
>
[...]
> All warnings (new ones prefixed by >>):
>
> In file included from include/asm-generic/bug.h:31,
> from arch/arc/include/asm/bug.h:30,
> from include/linux/bug.h:5,
> from include/linux/slab.h:15,
> from kernel/fork.c:16:
> kernel/fork.c: In function 'check_mm':
>>> include/linux/kern_levels.h:5:18: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'int' [-Wformat=]
> #define KERN_SOH "\001" /* ASCII Start Of Header */
> ^~~~~~
> include/linux/printk.h:483:11: note: in definition of macro 'printk_index_wrap'
> _p_func(_fmt, ##__VA_ARGS__); \
> ^~~~
> include/linux/printk.h:534:2: note: in expansion of macro 'printk'
> printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
> ^~~~~~
> include/linux/kern_levels.h:9:20: note: in expansion of macro 'KERN_SOH'
> #define KERN_ALERT KERN_SOH "1" /* action must be taken immediately */
> ^~~~~~~~
> include/linux/printk.h:534:9: note: in expansion of macro 'KERN_ALERT'
> printk(KERN_ALERT pr_fmt(fmt), ##__VA_ARGS__)
> ^~~~~~~~~~
> kernel/fork.c:635:4: note: in expansion of macro 'pr_alert'
> pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld Comm:%s Pid:%d\n",
> ^~~~~~~~
> kernel/fork.c:635:61: note: format string is defined here
> pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld Comm:%s Pid:%d\n",
> ~~^
> %d
percpu_counter_tree_precise_sum() needs to return a long, not int.
Will fix for v13.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com