From: Waiman Long
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Phil Auld, kernel test robot,
    aubrey.li@linux.intel.com, yu.c.chen@intel.com, Waiman Long
Subject: [PATCH] sched: Don't call any kfree*() API in do_set_cpus_allowed()
Date: Mon, 30 Oct 2023 20:14:18 -0400
Message-Id: <20231031001418.274187-1-longman@redhat.com>

Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()") added a kfree() call to free any user provided
affinity mask, if present. It was changed later to use kfree_rcu() in
commit 9a5418bc48ba ("sched/core: Use kfree_rcu() in
do_set_cpus_allowed()") to avoid a circular locking dependency problem.

It turns out that even kfree_rcu() does not avoid the circular locking
problem. As reported by the kernel test robot, the following circular
locking dependency still exists:

  &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock

So no kfree*() API can be used in do_set_cpus_allowed(). To prevent a
memory leak, the unused user provided affinity mask is now saved in a
lockless list to be reused later by subsequent sched_setaffinity() calls.

Without kfree_rcu(), the internal cpumask_rcuhead union can be removed
too, as a lockless list entry only holds a single pointer.

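For illustration only, the reuse scheme described above can be sketched
in user space. This is a minimal sketch under stated assumptions, not
the kernel code: struct node, struct free_list, mask_pool, mask_park(),
mask_take_parked(), mask_alloc() and MASK_BYTES are hypothetical
stand-ins for struct llist_node, struct llist_head, cpumask_free_lhead,
llist_add(), llist_del_first(), alloc_user_cpus_ptr() and
cpumask_size(), and C11 atomics plus malloc() replace the kernel
primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct llist_node: a single pointer. */
struct node {
	struct node *next;
};

/* Hypothetical stand-in for struct llist_head. */
struct free_list {
	_Atomic(struct node *) first;
};

/* Assumed buffer size for this sketch (stand-in for cpumask_size()). */
#define MASK_BYTES 128

/* The list link is overlaid on the buffer, so the buffer must fit it. */
_Static_assert(MASK_BYTES >= sizeof(struct node),
	       "mask buffer too small to hold the list link");

static struct free_list mask_pool = { NULL };

/* Park an unused buffer on the list: no free(), no lock taken. */
static void mask_park(void *buf)
{
	struct node *n = buf;
	struct node *old = atomic_load(&mask_pool.first);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&mask_pool.first, &old, n));
}

/* Pop one parked buffer, or return NULL if the list is empty. */
static void *mask_take_parked(void)
{
	struct node *old = atomic_load(&mask_pool.first);

	while (old &&
	       !atomic_compare_exchange_weak(&mask_pool.first, &old, old->next))
		;
	return old;
}

/* Prefer a parked buffer; fall back to a fresh allocation. */
static void *mask_alloc(void)
{
	void *buf = mask_take_parked();

	if (!buf)
		buf = malloc(MASK_BYTES);
	return buf;
}
```

A caller that parks a buffer with mask_park() gets the same buffer back
from the next mask_alloc(). The pop side of this sketch assumes one
consumer at a time; a compare-and-swap pop with concurrent consumers is
exposed to the ABA problem and would need external serialization.
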
Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-lkp/202310302207.a25f1a30-oliver.sang@intel.com
Signed-off-by: Waiman Long
---
 kernel/sched/core.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 802551e0009b..f536d11a284e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2789,6 +2789,11 @@ __do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx)
 		set_next_task(rq, p);
 }
 
+/*
+ * A lockless list of free cpumasks to be used for user cpumasks.
+ */
+static LLIST_HEAD(cpumask_free_lhead);
+
 /*
  * Used for kthread_bind() and select_fallback_rq(), in both cases the user
  * affinity (if any) should be destroyed too.
@@ -2800,29 +2805,29 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 		.user_mask = NULL,
 		.flags = SCA_USER,	/* clear the user requested mask */
 	};
-	union cpumask_rcuhead {
-		cpumask_t cpumask;
-		struct rcu_head rcu;
-	};
 
 	__do_set_cpus_allowed(p, &ac);
 
 	/*
-	 * Because this is called with p->pi_lock held, it is not possible
-	 * to use kfree() here (when PREEMPT_RT=y), therefore punt to using
-	 * kfree_rcu().
+	 * We can't call any kfree*() API here as p->pi_lock and/or rq lock
+	 * may be held. So we save it in a llist to be reused in the next
+	 * sched_setaffinity() call.
 	 */
-	kfree_rcu((union cpumask_rcuhead *)ac.user_mask, rcu);
+	if (ac.user_mask)
+		llist_add((struct llist_node *)ac.user_mask, &cpumask_free_lhead);
 }
 
 static cpumask_t *alloc_user_cpus_ptr(int node)
 {
-	/*
-	 * See do_set_cpus_allowed() above for the rcu_head usage.
-	 */
-	int size = max_t(int, cpumask_size(), sizeof(struct rcu_head));
+	struct cpumask *pmask = NULL;
+
+	if (!llist_empty(&cpumask_free_lhead))
+		pmask = (struct cpumask *)llist_del_first(&cpumask_free_lhead);
+
+	if (!pmask)
+		pmask = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
 
-	return kmalloc_node(size, GFP_KERNEL, node);
+	return pmask;
 }
 
 int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
-- 
2.39.3