From: huyd12@chinatelecom.cn
To: "'Christian Brauner'", "'Michal Hocko'", "'Andrew Morton'"
References: <20230208094905.373-1-liuq131@chinatelecom.cn>
In-Reply-To: <20230208094905.373-1-liuq131@chinatelecom.cn>
Subject: Re: [PATCH] pid: add handling of too many zombie processes
Date: Thu, 9 Feb 2023 15:14:57 +0800
Message-ID: <000e01d93c56$3a4bcb00$aee36100$@chinatelecom.cn>
List-ID: <linux-kernel.vger.kernel.org>
Any comments will be appreciated.

-----Original Message-----
From: liuq131@chinatelecom.cn
Sent: February 8, 2023 17:49
To: akpm@linux-foundation.org
Cc: agruenba@redhat.com; linux-mm@kvack.org; linux-kernel@vger.kernel.org; huyd12@chinatelecom.cn; liuq
Subject: [PATCH] pid: add handling of too many zombie processes

A common situation is that a parent process forks many child processes
to execute tasks but never calls wait()/waitpid() when the children
exit, so a large number of children become zombie processes. If the
number of processes in the system then reaches kernel.pid_max, every
new fork() syscall fails and the system cannot execute any command
until an old process exits, e.g.:

[root@lq-workstation ~]# ls
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: Resource temporarily unavailable
[root@lq-workstation ~]# reboot
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: Resource temporarily unavailable

This patch handles the situation in alloc_pid(): it finds the process
with the most zombie children, and if that process has more than 10 of
them (or some other reasonable threshold?), it is killed to release
its pid resources.

Signed-off-by: liuq
---
 include/linux/mm.h |  2 ++
 kernel/pid.c       |  6 +++-
 mm/oom_kill.c      | 70 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8f857163ac89..afcff08a3878 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1940,6 +1940,8 @@ static inline void clear_page_pfmemalloc(struct page *page)
  * Can be called by the pagefault handler when it gets a VM_FAULT_OOM.
  */
 extern void pagefault_out_of_memory(void);
+extern void pid_max_oom_check(struct pid_namespace *ns);
+
 
 #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)
 #define offset_in_thp(page, p)	((unsigned long)(p) & (thp_size(page) - 1))
diff --git a/kernel/pid.c b/kernel/pid.c
index 3fbc5e46b721..1a9a60e19ab6 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -237,7 +237,11 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 	idr_preload_end();
 
 	if (nr < 0) {
-		retval = (nr == -ENOSPC) ? -EAGAIN : nr;
+		retval = nr;
+		if (nr == -ENOSPC) {
+			retval = -EAGAIN;
+			pid_max_oom_check(tmp);
+		}
 		goto out_free;
 	}
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1276e49b31b0..18d05d706f48 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1260,3 +1260,73 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 	return -ENOSYS;
 #endif /* CONFIG_MMU */
 }
+
+static void oom_pid_evaluate_task(struct task_struct *p,
+		struct task_struct **max_zombie_task, int *max_zombie_num)
+{
+	struct task_struct *child;
+	int zombie_num = 0;
+
+	list_for_each_entry(child, &p->children, sibling) {
+		if (child->exit_state == EXIT_ZOMBIE)
+			zombie_num++;
+	}
+	if (zombie_num > *max_zombie_num) {
+		*max_zombie_num = zombie_num;
+		*max_zombie_task = p;
+	}
+}
+
+#define MAX_ZOMBIE_NUM 10
+
+struct task_struct *pid_max_bad_process(struct pid_namespace *ns)
+{
+	int max_zombie_num = 0;
+	struct task_struct *max_zombie_task = &init_task;
+	struct task_struct *p;
+
+	rcu_read_lock();
+	for_each_process(p)
+		oom_pid_evaluate_task(p, &max_zombie_task, &max_zombie_num);
+	rcu_read_unlock();
+
+	if (max_zombie_num > MAX_ZOMBIE_NUM) {
+		pr_info("process %d has %d zombie children\n",
+			task_pid_nr_ns(max_zombie_task, ns), max_zombie_num);
+		return max_zombie_task;
+	}
+
+	return NULL;
+}
+
+void pid_max_oom_kill_process(struct task_struct *task)
+{
+	struct oom_control oc = {
+		.zonelist = NULL,
+		.nodemask = NULL,
+		.memcg = NULL,
+		.gfp_mask = 0,
+		.order = 0,
+	};
+
+	get_task_struct(task);
+	oc.chosen = task;
+
+	if (mem_cgroup_oom_synchronize(true))
+		return;
+
+	if (!mutex_trylock(&oom_lock))
+		return;
+
+	oom_kill_process(&oc, "Out of pid max (oom_kill_allocating_task)");
+	mutex_unlock(&oom_lock);
+}
+
+void pid_max_oom_check(struct pid_namespace *ns)
+{
+	struct task_struct *p;
+
+	p = pid_max_bad_process(ns);
+	if (p) {
+		pr_info("oom_kill process %d\n", task_pid_nr_ns(p, ns));
+		pid_max_oom_kill_process(p);
+	}
+}
-- 
2.27.0