From nobody Tue Dec 2 02:32:07 2025
From: "Yury Norov (NVIDIA)"
To: Andrew Morton, Thomas Gleixner
Cc: "Yury Norov (NVIDIA)", Rasmus Villemoes, linux-kernel@vger.kernel.org
Subject: [PATCH 1/3] bitmap: cpumask: introduce and_andnot search helper and iterator
Date: Tue, 18 Nov 2025 22:13:03 -0500
Message-ID: <20251119031306.644129-2-yury.norov@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251119031306.644129-1-yury.norov@gmail.com>
References: <20251119031306.644129-1-yury.norov@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Like other similar iterators, *_and_andnot helps to get rid of temporary
on-stack bitmaps and associated housekeeping code.

Signed-off-by: Yury Norov (NVIDIA)
---
 include/linux/cpumask.h | 22 ++++++++++++++++++++++
 include/linux/find.h    | 38 ++++++++++++++++++++++++++++++++++++++
 lib/find_bit.c          |  9 +++++++++
 3 files changed, 69 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index ff8f41ab7ce6..6de16a0e6e7b 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -406,6 +406,28 @@ unsigned int cpumask_random(const struct cpumask *src)
 #define for_each_cpu_and(cpu, mask1, mask2)				\
 	for_each_and_bit(cpu, cpumask_bits(mask1), cpumask_bits(mask2), small_cpumask_bits)
 
+/**
+ * for_each_cpu_and_andnot_from - iterate over cpus in mask1 & mask2 & ~mask3
+ * @cpu: the (optionally unsigned) integer iterator, set to the starting cpu
+ * @mask1: the first cpumask pointer
+ * @mask2: the second cpumask pointer
+ * @mask3: the third cpumask pointer; cpus set here are skipped
+ *
+ * This saves a temporary CPU mask in many places.  It is equivalent to:
+ *	struct cpumask tmp;
+ *	cpumask_and(&tmp, &mask1, &mask2);
+ *	cpumask_andnot(&tmp, &tmp, &mask3);
+ *	for_each_cpu_from(cpu, &tmp)
+ *		...
+ *
+ * After the loop, cpu is >= nr_cpu_ids.
+ */
+#define for_each_cpu_and_andnot_from(cpu, mask1, mask2, mask3)		\
+	for_each_and_andnot_bit_from(cpu, cpumask_bits(mask1),		\
+					  cpumask_bits(mask2),		\
+					  cpumask_bits(mask3),		\
+					  small_cpumask_bits)
+
 /**
  * for_each_cpu_andnot - iterate over every cpu present in one mask, excluding
  *			 those present in another.
diff --git a/include/linux/find.h b/include/linux/find.h
index 9d720ad92bc1..daf72078c25e 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -14,6 +14,9 @@ unsigned long _find_next_and_bit(const unsigned long *addr1, const unsigned long
 					unsigned long nbits, unsigned long start);
 unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start);
+unsigned long _find_next_and_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
+					const unsigned long *addr3, unsigned long nbits,
+					unsigned long start);
 unsigned long _find_next_or_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start);
 unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits,
@@ -135,6 +138,36 @@ unsigned long find_next_andnot_bit(const unsigned long *addr1,
 }
 #endif
 
+/**
+ * find_next_and_andnot_bit - find the next set bit in *addr1 and *addr2
+ *			       excluding all the bits in *addr3
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @addr3: The third address; bits set here are excluded from the search
+ * @size: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns the bit number for the next set bit
+ * If no bits found, returns >= @size.
+ */
+static __always_inline
+unsigned long find_next_and_andnot_bit(const unsigned long *addr1,
+		const unsigned long *addr2, const unsigned long *addr3,
+		unsigned long size, unsigned long offset)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = *addr1 & *addr2 & ~*addr3 & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_next_and_andnot_bit(addr1, addr2, addr3, size, offset);
+}
+
 #ifndef find_next_or_bit
 /**
  * find_next_or_bit - find the next set bit in either memory regions
@@ -595,6 +628,11 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 	     (bit) = find_next_andnot_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
 	     (bit)++)
 
+#define for_each_and_andnot_bit_from(bit, addr1, addr2, addr3, size) \
+	for (; (bit) = find_next_and_andnot_bit((addr1), (addr2), (addr3), \
+						(size), (bit)), (bit) < (size); \
+	     (bit)++)
+
 #define for_each_or_bit(bit, addr1, addr2, size) \
 	for ((bit) = 0;									\
 	     (bit) = find_next_or_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
diff --git a/lib/find_bit.c b/lib/find_bit.c
index d4b5a29e3e72..aec79207c566 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -206,6 +206,15 @@ unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned l
 EXPORT_SYMBOL(_find_next_andnot_bit);
 #endif
 
+unsigned long _find_next_and_andnot_bit(const unsigned long *addr1,
+					const unsigned long *addr2,
+					const unsigned long *addr3,
+					unsigned long nbits, unsigned long start)
+{
+	return FIND_NEXT_BIT(addr1[idx] & addr2[idx] & ~addr3[idx], /* nop */, nbits, start);
+}
+EXPORT_SYMBOL(_find_next_and_andnot_bit);
+
 #ifndef find_next_or_bit
 unsigned long _find_next_or_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start)
-- 
2.43.0

From nobody Tue Dec 2 02:32:07 2025
From: "Yury Norov (NVIDIA)"
To: Andrew Morton, Thomas Gleixner
Cc: "Yury Norov (NVIDIA)", Rasmus Villemoes, linux-kernel@vger.kernel.org
Subject: [PATCH 2/3] group_cpus: don't call cpumask_weight() prematurely
Date: Tue, 18 Nov 2025 22:13:04 -0500
Message-ID: <20251119031306.644129-3-yury.norov@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251119031306.644129-1-yury.norov@gmail.com>
References: <20251119031306.644129-1-yury.norov@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

alloc_nodes_groups() and __group_cpus_evenly() call cpumask_weight()
unconditionally in the for_each() loops. cpumask_weight() is O(N), so the
complexity of the functions becomes O(MAX_NUMNODES * nr_cpu_ids). The call
can be skipped when nmsk is empty, which cpumask_and() already reports
through its return value.

Signed-off-by: Yury Norov (NVIDIA)
---
 lib/group_cpus.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index 6d08ac05f371..6aae1560b796 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -142,15 +142,11 @@ static void alloc_nodes_groups(unsigned int numgrps,
 	}
 
 	for_each_node_mask(n, nodemsk) {
-		unsigned ncpus;
-
-		cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
-		ncpus = cpumask_weight(nmsk);
-
-		if (!ncpus)
+		if (!cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]))
 			continue;
-		remaining_ncpus += ncpus;
-		node_groups[n].ncpus = ncpus;
+
+		node_groups[n].ncpus = cpumask_weight(nmsk);
+		remaining_ncpus += node_groups[n].ncpus;
 	}
 
 	numgrps = min_t(unsigned, remaining_ncpus, numgrps);
@@ -294,11 +290,10 @@ static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps,
 			continue;
 
 		/* Get the cpus on this node which are in the mask */
-		cpumask_and(nmsk, cpu_mask, node_to_cpumask[nv->id]);
-		ncpus = cpumask_weight(nmsk);
-		if (!ncpus)
+		if (!cpumask_and(nmsk, cpu_mask, node_to_cpumask[nv->id]))
 			continue;
 
+		ncpus = cpumask_weight(nmsk);
 		WARN_ON_ONCE(nv->ngroups > ncpus);
 
 		/* Account for rounding errors */
-- 
2.43.0

From nobody Tue Dec 2 02:32:07 2025
From: "Yury Norov (NVIDIA)"
To: Andrew Morton, Thomas Gleixner
Cc: "Yury Norov (NVIDIA)", Rasmus Villemoes, linux-kernel@vger.kernel.org
Subject: [PATCH 3/3] group_cpus: simplify inner loop in grp_spread_init_one()
Date: Tue, 18 Nov 2025 22:13:05 -0500
Message-ID: <20251119031306.644129-4-yury.norov@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251119031306.644129-1-yury.norov@gmail.com>
References: <20251119031306.644129-1-yury.norov@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Three optimizations for grp_spread_init_one().

1. Drop most of the housekeeping code in grp_spread_init_one() by using
   for_each_cpu_and_andnot_from().

2. Fix the Shlemiel the Painter's algorithm by adding the 'sibl = cpu'
   line. This improves the grp_spread_init_one() complexity from quadratic
   to linear.

3. Don't clear the nmsk because it's rewritten in the caller code anyway,
   and switch to the non-atomic bit setter for irqmsk, as the mask is local
   and implies no concurrency.

Signed-off-by: Yury Norov (NVIDIA)
---
 lib/group_cpus.c | 25 ++++++-------------------
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/lib/group_cpus.c b/lib/group_cpus.c
index 6aae1560b796..35aba99d8cd0 100644
--- a/lib/group_cpus.c
+++ b/lib/group_cpus.c
@@ -17,27 +17,14 @@ static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk,
 	const struct cpumask *siblmsk;
 	int cpu, sibl;
 
-	for ( ; cpus_per_grp > 0; ) {
-		cpu = cpumask_first(nmsk);
-
-		/* Should not happen, but I'm too lazy to think about it */
-		if (cpu >= nr_cpu_ids)
-			return;
-
-		cpumask_clear_cpu(cpu, nmsk);
-		cpumask_set_cpu(cpu, irqmsk);
-		cpus_per_grp--;
-
+	for_each_cpu(cpu, nmsk) {
 		/* If the cpu has siblings, use them first */
 		siblmsk = topology_sibling_cpumask(cpu);
-		for (sibl = -1; cpus_per_grp > 0; ) {
-			sibl = cpumask_next(sibl, siblmsk);
-			if (sibl >= nr_cpu_ids)
-				break;
-			if (!cpumask_test_and_clear_cpu(sibl, nmsk))
-				continue;
-			cpumask_set_cpu(sibl, irqmsk);
-			cpus_per_grp--;
+		sibl = cpu;
+		for_each_cpu_and_andnot_from(sibl, nmsk, siblmsk, irqmsk) {
+			__cpumask_set_cpu(sibl, irqmsk);
+			if (--cpus_per_grp == 0)
+				return;
 		}
 	}
 }
-- 
2.43.0
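
For anyone who wants to check the and_andnot search semantics outside the
kernel tree, the single-word case can be modelled in userspace. The sketch
below is only an approximation under stated assumptions, not the kernel
code: model_find_next_and_andnot_bit(), model_for_each_and_andnot_bit_from()
and model_count_from() are hypothetical names, each bitmap is limited to one
unsigned long, and __builtin_ctzl() (a GCC/Clang builtin) stands in for
__ffs().

```c
#include <assert.h>
#include <limits.h>

#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Userspace model (hypothetical name) of the single-word search:
 * return the lowest bit >= offset that is set in a1 and a2 and
 * clear in a3, or size if there is no such bit.
 */
static unsigned long model_find_next_and_andnot_bit(unsigned long a1,
						    unsigned long a2,
						    unsigned long a3,
						    unsigned long size,
						    unsigned long offset)
{
	unsigned long val;

	/* The model only handles bitmaps that fit in one word. */
	assert(size > 0 && size <= MODEL_BITS_PER_LONG);

	if (offset >= size)
		return size;

	val = a1 & a2 & ~a3;
	/* Drop bits below offset and bits at or above size. */
	val &= ~0UL << offset;
	val &= ~0UL >> (MODEL_BITS_PER_LONG - size);

	return val ? (unsigned long)__builtin_ctzl(val) : size;
}

/* Continue from the current value of bit, as the *_from iterators do. */
#define model_for_each_and_andnot_bit_from(bit, a1, a2, a3, size)	\
	for (; (bit) = model_find_next_and_andnot_bit((a1), (a2), (a3),	\
						      (size), (bit)),	\
	       (bit) < (size);						\
	     (bit)++)

/* Count the matching bits at or after start, using the iterator. */
static unsigned int model_count_from(unsigned long a1, unsigned long a2,
				     unsigned long a3, unsigned long size,
				     unsigned long start)
{
	unsigned long bit = start;
	unsigned int n = 0;

	model_for_each_and_andnot_bit_from(bit, a1, a2, a3, size)
		n++;

	return n;
}
```

With a1 = 0xFF, a2 = 0x3C and a3 = 0x08 over 8 bits, the surviving bits are
2, 4 and 5, so a search from offset 3 lands on bit 4 and a search from
offset 6 returns the size. This is the same result as building a temporary
mask with AND and ANDNOT and then scanning it, without the temporary.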