From nobody Mon Jun 8 15:36:59 2026
From: Kaitao Cheng
To: dennis@kernel.org, tj@kernel.org, cl@gentwo.org, akpm@linux-foundation.org
Cc: mhocko@suse.com, vbabka@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, Kaitao Cheng
Subject: [PATCH 1/2] mm/percpu: Preserve NOFS/NOIO scope during chunk create and populate
Date: Thu, 28 May 2026 21:29:16 +0800
Message-ID: <20260528132917.81123-2-kaitao.cheng@linux.dev>
In-Reply-To: <20260528132917.81123-1-kaitao.cheng@linux.dev>
References: <20260528132917.81123-1-kaitao.cheng@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Kaitao Cheng

pcpu_alloc_noprof() derives pcpu_gfp from the caller supplied GFP mask
and passes it to the backing percpu allocators. This preserves GFP_NOFS
and GFP_NOIO for pcpu_alloc_pages() and for the initial pcpu_chunk
allocation.

However, the chunk creation and population slow paths also call helpers
which do not take a GFP mask and perform internal allocations with
GFP_KERNEL. For example, pcpu_create_chunk() calls pcpu_get_vm_areas(),
and population can allocate temporary metadata or page tables while
mapping backing pages. As a result, a caller which explicitly uses
GFP_NOFS or GFP_NOIO can still enter FS or IO reclaim while creating or
populating a percpu chunk.

This is problematic for callers which use GFP_NOFS or GFP_NOIO because
they are already holding filesystem or IO-path locks. If free chunks are
exhausted, the percpu allocation can take pcpu_alloc_mutex and then
enter unconstrained reclaim from these internal allocations, defeating
the caller's allocation context and potentially recreating reclaim lock
dependencies.

Wrap chunk creation and population in a scoped NOIO or NOFS context when
pcpu_gfp has the corresponding constraints. Leave ordinary GFP_KERNEL
allocations unchanged so they retain full reclaim capability.

Fixes: 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations atomic")
Signed-off-by: Kaitao Cheng
---
 mm/percpu.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/percpu.c b/mm/percpu.c
index 71a85d7245c7..1bb38467390b 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1778,6 +1778,23 @@ static void pcpu_alloc_tag_free_hook(struct pcpu_chunk *chunk, int off, size_t s
 }
 #endif
 
+static unsigned int pcpu_memalloc_scope_save(gfp_t gfp)
+{
+	if (!(gfp & __GFP_IO))
+		return memalloc_noio_save();
+	if (!(gfp & __GFP_FS))
+		return memalloc_nofs_save();
+	return 0;
+}
+
+static void pcpu_memalloc_scope_restore(gfp_t gfp, unsigned int flags)
+{
+	if (!(gfp & __GFP_IO))
+		memalloc_noio_restore(flags);
+	else if (!(gfp & __GFP_FS))
+		memalloc_nofs_restore(flags);
+}
+
 /**
  * pcpu_alloc - the percpu allocator
  * @size: size of area to allocate in bytes
@@ -1901,7 +1918,12 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
 
 	/* No space left. Create a new chunk. */
 	if (list_empty(&pcpu_chunk_lists[pcpu_free_slot])) {
+		unsigned int pcpu_scope;
+
+		pcpu_scope = pcpu_memalloc_scope_save(pcpu_gfp);
 		chunk = pcpu_create_chunk(pcpu_gfp);
+		pcpu_memalloc_scope_restore(pcpu_gfp, pcpu_scope);
+
 		if (!chunk) {
 			err = "failed to allocate new chunk";
 			goto fail;
@@ -1931,9 +1953,13 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
 		page_end = PFN_UP(off + size);
 
 		for_each_clear_bitrange_from(rs, re, chunk->populated, page_end) {
+			unsigned int pcpu_scope;
+
 			WARN_ON(chunk->immutable);
 
+			pcpu_scope = pcpu_memalloc_scope_save(pcpu_gfp);
 			ret = pcpu_populate_chunk(chunk, rs, re, pcpu_gfp);
+			pcpu_memalloc_scope_restore(pcpu_gfp, pcpu_scope);
 
 			spin_lock_irqsave(&pcpu_lock, flags);
 			if (ret) {
-- 
2.50.1 (Apple Git-155)

From nobody Mon Jun 8 15:36:59 2026
From: Kaitao Cheng
To: dennis@kernel.org, tj@kernel.org, cl@gentwo.org, akpm@linux-foundation.org
Cc: mhocko@suse.com, vbabka@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, Kaitao Cheng
Subject: [PATCH 2/2] mm/percpu: Avoid pcpu_alloc_mutex recursion from reclaim
Date: Thu, 28 May 2026 21:29:17 +0800
Message-ID: <20260528132917.81123-3-kaitao.cheng@linux.dev>
In-Reply-To: <20260528132917.81123-1-kaitao.cheng@linux.dev>
References: <20260528132917.81123-1-kaitao.cheng@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Kaitao Cheng

pcpu_alloc_noprof() takes pcpu_alloc_mutex for sleepable allocations so
that it can create chunks and populate
backing pages. If reclaim is entered while that mutex is already held,
and reclaim reaches a path which allocates percpu memory, the nested
allocation can try to take pcpu_alloc_mutex again. That creates a
reclaim recursion dependency:

  pcpu_alloc_noprof(GFP_KERNEL)
    mutex_lock(&pcpu_alloc_mutex)
      reclaim
        pcpu_alloc_noprof(GFP_NOIO/GFP_NOFS)
          mutex_lock(&pcpu_alloc_mutex)

Avoid this by treating percpu allocations from reclaim context as
atomic. Such allocations may still be served from already available and
populated areas, but they must not enter the mutex-protected slow path
or create new chunks. If no space is available, fail the allocation and
let the normal balance work handle replenishment outside reclaim.

Update the function comment to describe that reclaim context allocations
are atomic regardless of whether the supplied GFP mask would otherwise
allow blocking.

This patch is a preventive fix. There may not currently be any path that
calls pcpu_alloc_noprof(GFP_NOIO/GFP_NOFS) from direct reclaim context.

Fixes: 9a5b183941b5 ("mm, percpu: do not consider sleepable allocations atomic")
Signed-off-by: Kaitao Cheng
---
 mm/percpu.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 1bb38467390b..9c30e5897813 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1803,9 +1803,9 @@ static void pcpu_memalloc_scope_restore(gfp_t gfp, unsigned int flags)
  * @gfp: allocation flags
  *
  * Allocate percpu area of @size bytes aligned at @align. If @gfp doesn't
- * contain %GFP_KERNEL, the allocation is atomic. If @gfp has __GFP_NOWARN
- * then no warning will be triggered on invalid or failed allocation
- * requests.
+ * allow blocking, or if allocation is requested from reclaim context, the
+ * allocation is atomic. If @gfp has __GFP_NOWARN then no warning will be
+ * triggered on invalid or failed allocation requests.
  *
  * RETURNS:
  * Percpu pointer to the allocated area on success, NULL on failure.
@@ -1828,7 +1828,12 @@ void __percpu *pcpu_alloc_noprof(size_t size, size_t align, bool reserved,
 	gfp = current_gfp_context(gfp);
 	/* whitelisted flags that can be passed to the backing allocators */
 	pcpu_gfp = gfp & (GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
-	is_atomic = !gfpflags_allow_blocking(gfp);
+	/*
+	 * Reclaim can be entered while pcpu_alloc_mutex is already held by
+	 * another percpu allocation. Avoid recursing back into the mutex from
+	 * reclaim; best-effort allocations from already populated areas are OK.
+	 */
+	is_atomic = !gfpflags_allow_blocking(gfp) || current->reclaim_state;
 	do_warn = !(gfp & __GFP_NOWARN);
 
 	/*
-- 
2.50.1 (Apple Git-155)
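
For readers who want to experiment with the semantics outside the kernel, the following is a minimal userspace sketch of the two ideas in this series: the scoped NOIO/NOFS save and restore around chunk creation and population, and the "atomic when called from reclaim" decision. It is not kernel code. Every name prefixed with model_ or MODEL_ is hypothetical, the flag values are made up, and the task structure only stands in for the parts of current that matter here. It assumes the usual save/restore convention, where save returns the previous state of the task bit and restore puts that state back, so that nested scopes are not cleared early.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for __GFP_IO, __GFP_FS and "may block". */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_BLOCK  0x4u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS | MODEL_GFP_BLOCK)
#define MODEL_GFP_NOFS   (MODEL_GFP_IO | MODEL_GFP_BLOCK)
#define MODEL_GFP_NOIO   (MODEL_GFP_BLOCK)

/* Hypothetical stand-ins for the per-task NOIO/NOFS scope bits. */
#define MODEL_PF_NOIO 0x1u
#define MODEL_PF_NOFS 0x2u

/* Models only the fields of the current task that this series touches. */
struct model_task {
	unsigned int flags;
	bool in_reclaim;
};

struct model_task current_task;

/* Pick the scope bit implied by the mask: NOIO wins over NOFS. */
static unsigned int model_scope_bit(unsigned int gfp)
{
	if (!(gfp & MODEL_GFP_IO))
		return MODEL_PF_NOIO;
	if (!(gfp & MODEL_GFP_FS))
		return MODEL_PF_NOFS;
	return 0;
}

/* Enter the scope and return the previous state of the chosen bit. */
unsigned int model_scope_save(unsigned int gfp)
{
	unsigned int bit = model_scope_bit(gfp);
	unsigned int old = current_task.flags & bit;

	current_task.flags |= bit;
	return old;
}

/* Leave the scope, restoring whatever state the bit had before. */
void model_scope_restore(unsigned int gfp, unsigned int old)
{
	unsigned int bit = model_scope_bit(gfp);

	current_task.flags = (current_task.flags & ~bit) | old;
}

/* What an internal allocation sees once the task scope is applied. */
unsigned int model_effective_gfp(unsigned int gfp)
{
	if (current_task.flags & MODEL_PF_NOIO)
		return gfp & ~(MODEL_GFP_IO | MODEL_GFP_FS);
	if (current_task.flags & MODEL_PF_NOFS)
		return gfp & ~MODEL_GFP_FS;
	return gfp;
}

/* Non-blocking masks and any call made from reclaim count as atomic. */
bool model_is_atomic(unsigned int gfp)
{
	return !(gfp & MODEL_GFP_BLOCK) || current_task.in_reclaim;
}
```

In this model, a helper that internally asks for MODEL_GFP_KERNEL while a MODEL_GFP_NOFS scope is active ends up with the FS bit masked out, which is the effect the first patch relies on. Saving inside an already active scope returns the bit as set, so the matching restore leaves the outer scope intact.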