From nobody Wed Dec 17 23:22:42 2025
Date: Wed, 26 Jun 2024 17:29:24 +0200
From: Oleg Nesterov
To: Andrew Morton, Michal Hocko
Cc: Christian Brauner, "Eric W. Biederman", Jens Axboe, Jinliang Zheng,
	Mateusz Guzik, Matthew Wilcox, Tycho Andersen,
	linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] memcg: mm_update_next_owner: kill the "retry" logic
Message-ID: <20240626152924.GA17933@redhat.com>
In-Reply-To: <20240626152835.GA17910@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline

Add the new helper, try_to_set_owner(), which tries to update mm->owner
once we see c->mm == mm. This way mm_update_next_owner() doesn't need to
restart the list_for_each_entry/for_each_process loops from the very
beginning if it races with exit/exec, it can just continue.

Unlike the current code, try_to_set_owner() re-checks tsk->mm == mm before
it drops tasklist_lock, so it doesn't need get/put_task_struct().

Signed-off-by: Oleg Nesterov
Acked-by: Michal Hocko
---
 kernel/exit.c | 57 ++++++++++++++++++++++++---------------------------
 1 file changed, 27 insertions(+), 30 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 45210443e68d..a1ef5f23d5be 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -439,6 +439,23 @@ static void coredump_task_exit(struct task_struct *tsk)
 }
 
 #ifdef CONFIG_MEMCG
+/* drops tasklist_lock if succeeds */
+static bool try_to_set_owner(struct task_struct *tsk, struct mm_struct *mm)
+{
+	bool ret = false;
+
+	task_lock(tsk);
+	if (likely(tsk->mm == mm)) {
+		/* tsk can't pass exit_mm/exec_mmap and exit */
+		read_unlock(&tasklist_lock);
+		WRITE_ONCE(mm->owner, tsk);
+		lru_gen_migrate_mm(mm);
+		ret = true;
+	}
+	task_unlock(tsk);
+	return ret;
+}
+
 /*
  * A task is exiting. If it owned this mm, find a new owner for the mm.
  */
@@ -446,7 +463,6 @@ void mm_update_next_owner(struct mm_struct *mm)
 {
 	struct task_struct *c, *g, *p = current;
 
-retry:
 	/*
 	 * If the exiting or execing task is not the owner, it's
 	 * someone else's problem.
@@ -468,16 +484,16 @@ void mm_update_next_owner(struct mm_struct *mm)
 	 * Search in the children
 	 */
 	list_for_each_entry(c, &p->children, sibling) {
-		if (c->mm == mm)
-			goto assign_new_owner;
+		if (c->mm == mm && try_to_set_owner(c, mm))
+			goto ret;
 	}
 
 	/*
 	 * Search in the siblings
 	 */
 	list_for_each_entry(c, &p->real_parent->children, sibling) {
-		if (c->mm == mm)
-			goto assign_new_owner;
+		if (c->mm == mm && try_to_set_owner(c, mm))
+			goto ret;
 	}
 
 	/*
@@ -489,9 +505,11 @@ void mm_update_next_owner(struct mm_struct *mm)
 		if (g->flags & PF_KTHREAD)
 			continue;
 		for_each_thread(g, c) {
-			if (c->mm == mm)
-				goto assign_new_owner;
-			if (c->mm)
+			struct mm_struct *c_mm = READ_ONCE(c->mm);
+			if (c_mm == mm) {
+				if (try_to_set_owner(c, mm))
+					goto ret;
+			} else if (c_mm)
 				break;
 		}
 	}
@@ -502,30 +520,9 @@ void mm_update_next_owner(struct mm_struct *mm)
 	 * ptrace or page migration (get_task_mm()). Mark owner as NULL.
 	 */
 	WRITE_ONCE(mm->owner, NULL);
+ ret:
 	return;
 
-assign_new_owner:
-	BUG_ON(c == p);
-	get_task_struct(c);
-	/*
-	 * The task_lock protects c->mm from changing.
-	 * We always want mm->owner->mm == mm
-	 */
-	task_lock(c);
-	/*
-	 * Delay read_unlock() till we have the task_lock()
-	 * to ensure that c does not slip away underneath us
-	 */
-	read_unlock(&tasklist_lock);
-	if (c->mm != mm) {
-		task_unlock(c);
-		put_task_struct(c);
-		goto retry;
-	}
-	WRITE_ONCE(mm->owner, c);
-	lru_gen_migrate_mm(mm);
-	task_unlock(c);
-	put_task_struct(c);
 }
 #endif /* CONFIG_MEMCG */
 
-- 
2.25.1.362.g51ebf55

From nobody Wed Dec 17 23:22:42 2025
Date: Wed, 26 Jun 2024 17:29:30 +0200
From: Oleg Nesterov
To: Andrew Morton, Michal Hocko
Cc: Christian Brauner, "Eric W. Biederman", Jens Axboe, Jinliang Zheng,
	Mateusz Guzik, Matthew Wilcox, Tycho Andersen,
	linux-kernel@vger.kernel.org
Subject: [PATCH 2/2] memcg: mm_update_next_owner: move for_each_thread() into try_to_set_owner()
Message-ID: <20240626152930.GA17936@redhat.com>
In-Reply-To: <20240626152835.GA17910@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline

mm_update_next_owner() checks the children / real_parent->children to
avoid the "everything else" loop in the likely case, but this won't work
if a child/sibling has a zombie leader with ->mm == NULL.

Move the for_each_thread() logic into try_to_set_owner(), if nothing else
this makes the children/siblings/everything searches more consistent.

Signed-off-by: Oleg Nesterov
Acked-by: Michal Hocko
---
 kernel/exit.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index a1ef5f23d5be..cc56edc1103e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -440,7 +440,7 @@ static void coredump_task_exit(struct task_struct *tsk)
 
 #ifdef CONFIG_MEMCG
 /* drops tasklist_lock if succeeds */
-static bool try_to_set_owner(struct task_struct *tsk, struct mm_struct *mm)
+static bool __try_to_set_owner(struct task_struct *tsk, struct mm_struct *mm)
 {
 	bool ret = false;
 
@@ -456,12 +456,28 @@ static bool try_to_set_owner(struct task_struct *tsk, struct mm_struct *mm)
 	return ret;
 }
 
+static bool try_to_set_owner(struct task_struct *g, struct mm_struct *mm)
+{
+	struct task_struct *t;
+
+	for_each_thread(g, t) {
+		struct mm_struct *t_mm = READ_ONCE(t->mm);
+		if (t_mm == mm) {
+			if (__try_to_set_owner(t, mm))
+				return true;
+		} else if (t_mm)
+			break;
+	}
+
+	return false;
+}
+
 /*
  * A task is exiting. If it owned this mm, find a new owner for the mm.
  */
 void mm_update_next_owner(struct mm_struct *mm)
 {
-	struct task_struct *c, *g, *p = current;
+	struct task_struct *g, *p = current;
 
 	/*
 	 * If the exiting or execing task is not the owner, it's
@@ -483,19 +499,17 @@ void mm_update_next_owner(struct mm_struct *mm)
 	/*
 	 * Search in the children
 	 */
-	list_for_each_entry(c, &p->children, sibling) {
-		if (c->mm == mm && try_to_set_owner(c, mm))
+	list_for_each_entry(g, &p->children, sibling) {
+		if (try_to_set_owner(g, mm))
 			goto ret;
 	}
-
 	/*
 	 * Search in the siblings
 	 */
-	list_for_each_entry(c, &p->real_parent->children, sibling) {
-		if (c->mm == mm && try_to_set_owner(c, mm))
+	list_for_each_entry(g, &p->real_parent->children, sibling) {
+		if (try_to_set_owner(g, mm))
 			goto ret;
 	}
-
 	/*
 	 * Search through everything else, we should not get here often.
 	 */
@@ -504,14 +518,8 @@ void mm_update_next_owner(struct mm_struct *mm)
 			break;
 		if (g->flags & PF_KTHREAD)
 			continue;
-		for_each_thread(g, c) {
-			struct mm_struct *c_mm = READ_ONCE(c->mm);
-			if (c_mm == mm) {
-				if (try_to_set_owner(c, mm))
-					goto ret;
-			} else if (c_mm)
-				break;
-		}
+		if (try_to_set_owner(g, mm))
+			goto ret;
 	}
 	read_unlock(&tasklist_lock);
 	/*
-- 
2.25.1.362.g51ebf55