From: Mateusz Guzik
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linus.walleij@linaro.org, pasha.tatashin@soleen.com, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, Mateusz Guzik
Subject: [PATCH v2] fork: stop ignoring NUMA while handling cached thread stacks
Date: Thu, 20 Nov 2025 06:40:15 +0100
Message-ID: <20251120054015.3019419-1-mjguzik@gmail.com>

Two problems with the cached thread stacks:

1. The NUMA node parameter of alloc_thread_stack_node() was ignored outright.
2. Nothing checked whether a stack being cached, or handed out from the cache, is backed by pages on the local node.

On free, the node id is still not checked when the local node is memoryless.

Note that the current caching is already bad, as the cache keeps overflowing; a different solution is needed in the long run, to be worked out(tm).
Stats collected during a kernel build with the patch applied, on the following topology:

NUMA node(s):        2
NUMA node0 CPU(s):   0-11
NUMA node1 CPU(s):   12-23

caller's node vs stack backing pages on free:
matching:   50083 (70%)
mismatched: 21492 (30%)

caching efficiency:
cached:     32651 (65.2%)
dropped:    17432 (34.8%)

Signed-off-by: Mateusz Guzik
Reviewed-by: Linus Walleij
---
v2:
- add commentary to try_release_thread_stack_to_cache

 kernel/fork.c | 63 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 53 insertions(+), 10 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index f1857672426e..b52b4ac8fe10 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -208,15 +208,62 @@ struct vm_stack {
 	struct vm_struct *stack_vm_area;
 };
 
+static struct vm_struct *alloc_thread_stack_node_from_cache(struct task_struct *tsk, int node)
+{
+	struct vm_struct *vm_area;
+	unsigned int i;
+
+	/*
+	 * If the node has memory, we are guaranteed the stacks are backed by local pages.
+	 * Otherwise the pages are arbitrary.
+	 *
+	 * Note that depending on cpuset it is possible we will get migrated to a different
+	 * node immediately after allocating here, so this does *not* guarantee locality for
+	 * arbitrary callers.
+	 */
+	scoped_guard(preempt) {
+		if (node != NUMA_NO_NODE && numa_node_id() != node)
+			return NULL;
+
+		for (i = 0; i < NR_CACHED_STACKS; i++) {
+			vm_area = this_cpu_xchg(cached_stacks[i], NULL);
+			if (vm_area)
+				return vm_area;
+		}
+	}
+
+	return NULL;
+}
+
 static bool try_release_thread_stack_to_cache(struct vm_struct *vm_area)
 {
 	unsigned int i;
+	int nid;
 
-	for (i = 0; i < NR_CACHED_STACKS; i++) {
-		struct vm_struct *tmp = NULL;
+	/*
+	 * Don't cache stacks if any of the pages don't match the local domain, unless
+	 * there is no local memory to begin with.
+	 *
+	 * Note that lack of local memory does not automatically mean it makes no difference
+	 * performance-wise which other domain backs the stack. In this case we are merely
+	 * trying to avoid constantly going to vmalloc.
+	 */
+	scoped_guard(preempt) {
+		nid = numa_node_id();
+		if (node_state(nid, N_MEMORY)) {
+			for (i = 0; i < vm_area->nr_pages; i++) {
+				struct page *page = vm_area->pages[i];
+				if (page_to_nid(page) != nid)
+					return false;
+			}
+		}
+
+		for (i = 0; i < NR_CACHED_STACKS; i++) {
+			struct vm_struct *tmp = NULL;
 
-		if (this_cpu_try_cmpxchg(cached_stacks[i], &tmp, vm_area))
-			return true;
+			if (this_cpu_try_cmpxchg(cached_stacks[i], &tmp, vm_area))
+				return true;
+		}
 	}
 	return false;
 }
@@ -283,13 +330,9 @@ static int alloc_thread_stack_node(struct task_struct *tsk, int node)
 {
 	struct vm_struct *vm_area;
 	void *stack;
-	int i;
-
-	for (i = 0; i < NR_CACHED_STACKS; i++) {
-		vm_area = this_cpu_xchg(cached_stacks[i], NULL);
-		if (!vm_area)
-			continue;
 
+	vm_area = alloc_thread_stack_node_from_cache(tsk, node);
+	if (vm_area) {
 		if (memcg_charge_kernel_stack(vm_area)) {
 			vfree(vm_area->addr);
 			return -ENOMEM;
-- 
2.48.1