From nobody Sun Feb 8 06:21:45 2026 Received: from smtpout.efficios.com (smtpout.efficios.com [167.114.26.122]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 455F018BBA2 for ; Fri, 23 Aug 2024 19:00:29 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=167.114.26.122 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724439651; cv=none; b=TPMN/iMgRXOztjGozaPvvZaQFciNPGMImx7vf/yDkufDwUX/+/756Dtyy/ADxlLBHKteUGlmzeeQ7fXU8D9rOfsVAqX1Gl+fyxMcf5qlKfeyvv232NjN7A3QDdAZw3lkWwVH4NbtC77L2t8iJIW4IqEmtFHM1hNST80+fhSvXEc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1724439651; c=relaxed/simple; bh=iKticH+hVtk0Jsn+Z4/9YfNt/Gnnukcx5t9hJyZ9KVg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=RWCwE7gDP91vKdBX5CdIyitopt4MY/7MEyGkTUkJ2Ce1T4WmWcRDbSyqVtW88wStQ8Mzn0e8BlFHioiYE6TGEeLUQdtRWdKKhYmi+qGegicDwmzE6IavL2aDnL5znjrbn0CGQVH2FIfSJFXoePqXqfmzdbOVb98IIsjbuo431do= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com; spf=pass smtp.mailfrom=efficios.com; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b=CyZAz0WI; arc=none smtp.client-ip=167.114.26.122 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=efficios.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b="CyZAz0WI" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1724439621; bh=iKticH+hVtk0Jsn+Z4/9YfNt/Gnnukcx5t9hJyZ9KVg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=CyZAz0WIV5x2VKCuTn/N3QmZJ+nn4Ykk3Ks6/eOlBC1I91BwfI2OJmZKcCsQUtaEx 
UWUiEUeyvmVYeYBRLEitBQP65nX8wxKmHlqYZdWEwLOFqNMV8wnjRiaq8FglxO994B +rhIUpNbVX6WHnf90xMdvRN8Q22eHDv8goQy7bSyDM+hxVel3oWQscMj/YrQtCQsDE OGuLldOmyuJTWECy6OfDbRRUpKL1hVFSp364rICykk0YTDqloUAaDszcELbJ3M4/JF 0nBUIxgpHJyfy5qQiqkXqQ/w1QpvgG4Hi02IfjqjTYQtgJ1t1Cg7Xl+loRpO7WHb6M GP3Cu2qGeZ2+g== Received: from thinkos.internal.efficios.com (unknown [IPv6:2606:6d00:100:4000:b243:804e:3bbd:91c9]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4Wr8XP5g0xz1J2p; Fri, 23 Aug 2024 15:00:21 -0400 (EDT) From: Mathieu Desnoyers To: Peter Zijlstra , Ingo Molnar Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers , Valentin Schneider , Mel Gorman , Steven Rostedt , Vincent Guittot , Dietmar Eggemann , Ben Segall , Yury Norov , Rasmus Villemoes , Shuah Khan Subject: [RFC PATCH v1 1/6] lib: Clarify comment on top of find_next_andnot_bit Date: Fri, 23 Aug 2024 14:59:41 -0400 Message-Id: <20240823185946.418340-2-mathieu.desnoyers@efficios.com> X-Mailer: git-send-email 2.39.2 In-Reply-To: <20240823185946.418340-1-mathieu.desnoyers@efficios.com> References: <20240823185946.418340-1-mathieu.desnoyers@efficios.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Make the comment on top of find_next_andnot_bit clearer by discussing in terms of "cleared bits" rather than "excluding bits". Signed-off-by: Mathieu Desnoyers Cc: Yury Norov Cc: Rasmus Villemoes Acked-by: Yury Norov Reviewed-by tag from Shuah Khan for selftests changes. 
Rebased on
---
 include/linux/find.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/linux/find.h b/include/linux/find.h
index 5dfca4225fef..8a170aa55634 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -102,15 +102,14 @@ unsigned long find_next_and_bit(const unsigned long *addr1,
 
 #ifndef find_next_andnot_bit
 /**
- * find_next_andnot_bit - find the next set bit in *addr1 excluding all the bits
- * in *addr2
+ * find_next_andnot_bit - find the next set bit in *addr1, cleared in *addr2
  * @addr1: The first address to base the search on
  * @addr2: The second address to base the search on
  * @size: The bitmap size in bits
  * @offset: The bitnumber to start searching at
  *
- * Returns the bit number for the next set bit
- * If no bits are set, returns @size.
+ * Returns the bit number for the next bit set in *addr1, cleared in *addr2.
+ * If no such bits are found, returns @size.
  */
 static inline
 unsigned long find_next_andnot_bit(const unsigned long *addr1,
-- 
2.39.2

From nobody Sun Feb 8 06:21:45 2026
From: Mathieu Desnoyers
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Valentin Schneider, Mel Gorman, Steven Rostedt, Vincent Guittot, Dietmar Eggemann, Ben Segall, Yury Norov, Rasmus Villemoes, Shuah Khan
Subject: [RFC PATCH v1 2/6] lib: Implement find_{first,next,nth}_nor_bit, find_first_andnot_bit
Date: Fri, 23 Aug 2024 14:59:42 -0400
Message-Id: <20240823185946.418340-3-mathieu.desnoyers@efficios.com>
In-Reply-To: <20240823185946.418340-1-mathieu.desnoyers@efficios.com>
References: <20240823185946.418340-1-mathieu.desnoyers@efficios.com>

Allow finding the first, next, or nth bit which is cleared in both of
two input bitmasks.

Allow finding the first bit which is set in the first of two input
bitmasks and cleared in the second.

find_next_andnot_bit and find_nth_andnot_bit already exist, so only
find_first_andnot_bit appears to be missing.

Signed-off-by: Mathieu Desnoyers
Cc: Yury Norov
Cc: Rasmus Villemoes
Acked-by: Yury Norov

Reviewed-by tag from Shuah Khan applies to the selftests changes.

Rebased on
---
Changes since v0:
- Rename "notandnot" to "nor", document equivalence.
- Move comment cleanups to a separate patch.
- Use __always_inline.
---
 include/linux/find.h | 112 +++++++++++++++++++++++++++++++++++++++++++
 lib/find_bit.c       |  36 ++++++++++++++
 2 files changed, 148 insertions(+)

diff --git a/include/linux/find.h b/include/linux/find.h
index 8a170aa55634..b1394ba92654 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -14,6 +14,8 @@ unsigned long _find_next_and_bit(const unsigned long *addr1, const unsigned long
 					unsigned long nbits, unsigned long start);
 unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start);
+unsigned long _find_next_nor_bit(const unsigned long *addr1, const unsigned long *addr2,
+					unsigned long nbits, unsigned long start);
 unsigned long _find_next_or_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start);
 unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits,
@@ -24,11 +26,17 @@ unsigned long __find_nth_and_bit(const unsigned long *addr1, const unsigned long
 					unsigned long size, unsigned long n);
 unsigned long __find_nth_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long size, unsigned long n);
+unsigned long __find_nth_nor_bit(const unsigned long *addr1, const unsigned long *addr2,
+					unsigned long size, unsigned long n);
 unsigned long __find_nth_and_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
 					const unsigned long *addr3, unsigned long size,
 					unsigned long n);
 extern unsigned long _find_first_and_bit(const unsigned long *addr1,
 					 const unsigned long *addr2, unsigned long size);
+extern unsigned long _find_first_andnot_bit(const unsigned long *addr1,
+					 const unsigned long *addr2, unsigned long size);
+extern unsigned long _find_first_nor_bit(const unsigned long *addr1,
+					 const unsigned long *addr2, unsigned long size);
 unsigned long _find_first_and_and_bit(const unsigned long *addr1, const unsigned long *addr2,
 					const unsigned long *addr3, unsigned long size);
 extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
@@ -130,6 +138,35 @@ unsigned long find_next_andnot_bit(const unsigned long *addr1,
 }
 #endif
 
+/**
+ * find_next_nor_bit - find the next bit cleared in both *addr1 and *addr2
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @size: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns the bit number for the next bit cleared in both *addr1 and *addr2.
+ * If no such bits are found, returns @size.
+ * The bitwise operation nor ~(A | B) is equivalent to (~A & ~B).
+ */
+static __always_inline
+unsigned long find_next_nor_bit(const unsigned long *addr1,
+		const unsigned long *addr2, unsigned long size,
+		unsigned long offset)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = ~(*addr1 | *addr2) & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_next_nor_bit(addr1, addr2, size, offset);
+}
+
 #ifndef find_next_or_bit
 /**
  * find_next_or_bit - find the next set bit in either memory regions
@@ -291,6 +328,33 @@ unsigned long find_nth_andnot_bit(const unsigned long *addr1, const unsigned long
 	return __find_nth_andnot_bit(addr1, addr2, size, n);
 }
 
+/**
+ * find_nth_nor_bit - find N'th cleared bit in 2 memory regions.
+ * @addr1: The 1st address to start the search at
+ * @addr2: The 2nd address to start the search at
+ * @size: The maximum number of bits to search
+ * @n: The number of the cleared bit whose position is needed, counting from 0
+ *
+ * Returns the bit number of the N'th bit cleared in the two regions.
+ * If no such bit exists, returns @size.
+ * The bitwise operation nor ~(A | B) is equivalent to (~A & ~B).
+ */
+static __always_inline
+unsigned long find_nth_nor_bit(const unsigned long *addr1, const unsigned long *addr2,
+				unsigned long size, unsigned long n)
+{
+	if (n >= size)
+		return size;
+
+	if (small_const_nbits(size)) {
+		unsigned long val = ~(*addr1 | *addr2) & GENMASK(size - 1, 0);
+
+		return val ? fns(val, n) : size;
+	}
+
+	return __find_nth_nor_bit(addr1, addr2, size, n);
+}
+
 /**
  * find_nth_and_andnot_bit - find N'th set bit in 2 memory regions,
  *			     excluding those set in 3rd region
@@ -346,6 +410,54 @@ unsigned long find_first_and_bit(const unsigned long *addr1,
 }
 #endif
 
+/**
+ * find_first_andnot_bit - find the first bit set in *addr1, cleared in *addr2
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the first bit set in *addr1, cleared in *addr2.
+ * If no such bits are found, returns @size.
+ */
+static __always_inline
+unsigned long find_first_andnot_bit(const unsigned long *addr1,
+				const unsigned long *addr2,
+				unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr1 & (~*addr2) & GENMASK(size - 1, 0);
+
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_first_andnot_bit(addr1, addr2, size);
+}
+
+/**
+ * find_first_nor_bit - find the first cleared bit in 2 memory regions
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the first bit cleared in both *addr1 and *addr2.
+ * If no such bits are found, returns @size.
+ * The bitwise operation nor ~(A | B) is equivalent to (~A & ~B).
+ */
+static __always_inline
+unsigned long find_first_nor_bit(const unsigned long *addr1,
+				const unsigned long *addr2,
+				unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = ~(*addr1 | *addr2) & GENMASK(size - 1, 0);
+
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_first_nor_bit(addr1, addr2, size);
+}
+
 /**
  * find_first_and_and_bit - find the first set bit in 3 memory regions
  * @addr1: The first address to base the search on
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 0836bb3d76c5..8050bc7c7ede 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -116,6 +116,28 @@ unsigned long _find_first_and_bit(const unsigned long *addr1,
 EXPORT_SYMBOL(_find_first_and_bit);
 #endif
 
+/*
+ * Find the first bit set in the first region and cleared in the second region.
+ */
+unsigned long _find_first_andnot_bit(const unsigned long *addr1,
+				const unsigned long *addr2,
+				unsigned long size)
+{
+	return FIND_FIRST_BIT(addr1[idx] & ~addr2[idx], /* nop */, size);
+}
+EXPORT_SYMBOL(_find_first_andnot_bit);
+
+/*
+ * Find the first cleared bit in two memory regions.
+ */
+unsigned long _find_first_nor_bit(const unsigned long *addr1,
+				const unsigned long *addr2,
+				unsigned long size)
+{
+	return FIND_FIRST_BIT(~(addr1[idx] | addr2[idx]), /* nop */, size);
+}
+EXPORT_SYMBOL(_find_first_nor_bit);
+
 /*
  * Find the first set bit in three memory regions.
  */
@@ -167,6 +189,13 @@ unsigned long __find_nth_andnot_bit(const unsigned long *addr1, const unsigned long
 }
 EXPORT_SYMBOL(__find_nth_andnot_bit);
 
+unsigned long __find_nth_nor_bit(const unsigned long *addr1, const unsigned long *addr2,
+				unsigned long size, unsigned long n)
+{
+	return FIND_NTH_BIT(~(addr1[idx] | addr2[idx]), size, n);
+}
+EXPORT_SYMBOL(__find_nth_nor_bit);
+
 unsigned long __find_nth_and_andnot_bit(const unsigned long *addr1,
 					const unsigned long *addr2,
 					const unsigned long *addr3,
@@ -194,6 +223,13 @@ unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned long
 EXPORT_SYMBOL(_find_next_andnot_bit);
 #endif
 
+unsigned long _find_next_nor_bit(const unsigned long *addr1, const unsigned long *addr2,
+					unsigned long nbits, unsigned long start)
+{
+	return FIND_NEXT_BIT(~(addr1[idx] | addr2[idx]), /* nop */, nbits, start);
+}
+EXPORT_SYMBOL(_find_next_nor_bit);
+
 #ifndef find_next_or_bit
 unsigned long _find_next_or_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start)
-- 
2.39.2

From nobody Sun Feb 8 06:21:45 2026
From: Mathieu Desnoyers
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Valentin Schneider, Mel Gorman, Steven Rostedt, Vincent Guittot, Dietmar Eggemann, Ben Segall, Yury Norov, Rasmus Villemoes, Shuah Khan
Subject: [RFC PATCH v1 3/6] cpumask: Implement cpumask_{first,next}_{nor,andnot}
Date: Fri, 23 Aug 2024 14:59:43 -0400
Message-Id: <20240823185946.418340-4-mathieu.desnoyers@efficios.com>
In-Reply-To: <20240823185946.418340-1-mathieu.desnoyers@efficios.com>
References: <20240823185946.418340-1-mathieu.desnoyers@efficios.com>

Allow finding the first or next CPU which is either:

- cleared in both input cpumasks (nor),
- set in the first input cpumask and cleared in the second (andnot).

Signed-off-by: Mathieu Desnoyers
Cc: Yury Norov
Cc: Rasmus Villemoes

Reviewed-by tag from Shuah Khan applies to the selftests changes.

Rebased on
---
Changes since v0:
- Rename "notandnot" to "nor".
- Use __always_inline.
---
 include/linux/cpumask.h | 60 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 23686bed441d..5573e75c13ec 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -204,6 +204,32 @@ unsigned int cpumask_first_and_and(const struct cpumask *srcp1,
 		cpumask_bits(srcp3), small_cpumask_bits);
 }
 
+/**
+ * cpumask_first_andnot - return the first cpu from *srcp1 & ~*srcp2
+ * @srcp1: the first input
+ * @srcp2: the second input
+ *
+ * Returns >= nr_cpu_ids if no such cpu is found.
+ */
+static __always_inline
+unsigned int cpumask_first_andnot(const struct cpumask *srcp1, const struct cpumask *srcp2)
+{
+	return find_first_andnot_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), nr_cpumask_bits);
+}
+
+/**
+ * cpumask_first_nor - return the first cpu from ~(*srcp1 | *srcp2)
+ * @srcp1: the first input
+ * @srcp2: the second input
+ *
+ * Returns >= nr_cpu_ids if no such cpu is found.
+ */
+static __always_inline
+unsigned int cpumask_first_nor(const struct cpumask *srcp1, const struct cpumask *srcp2)
+{
+	return find_first_nor_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), nr_cpumask_bits);
+}
+
 /**
  * cpumask_last - get the last CPU in a cpumask
  * @srcp:	- the cpumask pointer
@@ -246,6 +272,40 @@ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp)
 	return find_next_zero_bit(cpumask_bits(srcp), small_cpumask_bits, n+1);
 }
 
+/**
+ * cpumask_next_andnot - return the next cpu from *srcp1 & ~*srcp2
+ * @n: the cpu prior to the place to search (i.e. return will be > @n)
+ * @srcp1: the first input
+ * @srcp2: the second input
+ *
+ * Returns >= nr_cpu_ids if no such cpu is found.
+ */
+static __always_inline
+unsigned int cpumask_next_andnot(int n, const struct cpumask *srcp1, const struct cpumask *srcp2)
+{
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpumask_check(n);
+	return find_next_andnot_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), nr_cpumask_bits, n+1);
+}
+
+/**
+ * cpumask_next_nor - return the next cpu from ~(*srcp1 | *srcp2)
+ * @n: the cpu prior to the place to search (i.e. return will be > @n)
+ * @srcp1: the first input
+ * @srcp2: the second input
+ *
+ * Returns >= nr_cpu_ids if no such cpu is found.
+ */
+static __always_inline
+unsigned int cpumask_next_nor(int n, const struct cpumask *srcp1, const struct cpumask *srcp2)
+{
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpumask_check(n);
+	return find_next_nor_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), nr_cpumask_bits, n+1);
+}
+
 #if NR_CPUS == 1
 /* Uniprocessor: there is only one valid CPU */
 static inline unsigned int cpumask_local_spread(unsigned int i, int node)
-- 
2.39.2

From nobody Sun Feb 8 06:21:45 2026
From: Mathieu Desnoyers
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Valentin Schneider, Mel Gorman, Steven Rostedt, Vincent Guittot, Dietmar Eggemann, Ben Segall, Yury Norov, Rasmus Villemoes, Shuah Khan
Subject: [RFC PATCH v1 4/6] sched: NUMA-aware per-memory-map concurrency IDs
Date: Fri, 23 Aug 2024 14:59:44 -0400
Message-Id: <20240823185946.418340-5-mathieu.desnoyers@efficios.com>
In-Reply-To: <20240823185946.418340-1-mathieu.desnoyers@efficios.com>
References: <20240823185946.418340-1-mathieu.desnoyers@efficios.com>

The issue addressed by this change is the non-locality of NUMA accesses
to data structures indexed by concurrency IDs: for example, in a
scenario where a process has two threads, and they periodically run one
after the other on different NUMA nodes, each will be assigned mm_cid=0.
As a consequence, they will end up accessing the same pages, and thus
at least one of the threads will need to perform remote NUMA accesses,
which is inefficient.

That being said, a similar issue theoretically exists due to false
sharing of cache lines by threads running one after another on
different cores/CPUs within a single NUMA node, but the extent of the
performance impact is smaller than that of remote NUMA accesses.

Solve this by making the rseq concurrency ID (mm_cid) NUMA-aware. On
NUMA systems, when a NUMA-aware concurrency ID is observed by
user-space to be associated with a NUMA node, guarantee that it never
changes NUMA node unless either a kernel-level NUMA configuration
change happens, or scheduler migrations end up migrating tasks across
NUMA nodes.

There is a tradeoff between NUMA locality and compactness of the
concurrency ID allocation. Favor compactness over NUMA locality when
the scheduler migrates tasks across NUMA nodes, as this does not cause
the frequent-remote-NUMA-accesses behavior. This is done by limiting
the concurrency ID range to the minimum between the number of threads
belonging to the process and the number of allowed CPUs.

Signed-off-by: Mathieu Desnoyers
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Valentin Schneider
Cc: Mel Gorman
Cc: Steven Rostedt
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Ben Segall

Reviewed-by tag from Shuah Khan applies to the selftests changes.

Rebased on
---
Changes since v0:
- Rename "notandnot" to "nor".
--- include/linux/mm_types.h | 57 +++++++++++++++- kernel/sched/core.c | 10 ++- kernel/sched/sched.h | 139 +++++++++++++++++++++++++++++++++++---- 3 files changed, 190 insertions(+), 16 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index af3a0256fa93..4307352c8900 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -19,6 +19,7 @@ #include #include #include +#include =20 #include =20 @@ -1154,6 +1155,59 @@ static inline cpumask_t *mm_cidmask(struct mm_struct= *mm) return (struct cpumask *)cid_bitmap; } =20 +#ifdef CONFIG_NUMA +/* + * Layout of NUMA cidmasks: + * - node_alloc cidmask: cpumask tracking which cids were + * allocated (across nodes) in this + * memory map. + * - node cidmask[nr_node_ids]: per-node cpumask tracking which cid + * were allocated in this memory map. + */ +static inline cpumask_t *mm_node_alloc_cidmask(struct mm_struct *mm) +{ + unsigned long cid_bitmap =3D (unsigned long)mm_cidmask(mm); + + /* Skip mm_cidmask */ + cid_bitmap +=3D cpumask_size(); + return (struct cpumask *)cid_bitmap; +} + +static inline cpumask_t *mm_node_cidmask(struct mm_struct *mm, unsigned in= t node) +{ + unsigned long cid_bitmap =3D (unsigned long)mm_node_alloc_cidmask(mm); + + /* Skip node alloc cidmask */ + cid_bitmap +=3D cpumask_size(); + cid_bitmap +=3D node * cpumask_size(); + return (struct cpumask *)cid_bitmap; +} + +static inline void mm_init_node_cidmask(struct mm_struct *mm) +{ + unsigned int node; + + if (num_possible_nodes() =3D=3D 1) + return; + cpumask_clear(mm_node_alloc_cidmask(mm)); + for (node =3D 0; node < nr_node_ids; node++) + cpumask_clear(mm_node_cidmask(mm, node)); +} + +static inline unsigned int mm_node_cidmask_size(void) +{ + if (num_possible_nodes() =3D=3D 1) + return 0; + return (nr_node_ids + 1) * cpumask_size(); +} +#else /* CONFIG_NUMA */ +static inline void mm_init_node_cidmask(struct mm_struct *mm) { } +static inline unsigned int mm_node_cidmask_size(void) +{ + return 0; +} +#endif 
/* CONFIG_NUMA */ + static inline void mm_init_cid(struct mm_struct *mm) { int i; @@ -1165,6 +1219,7 @@ static inline void mm_init_cid(struct mm_struct *mm) pcpu_cid->time =3D 0; } cpumask_clear(mm_cidmask(mm)); + mm_init_node_cidmask(mm); } =20 static inline int mm_alloc_cid_noprof(struct mm_struct *mm) @@ -1185,7 +1240,7 @@ static inline void mm_destroy_cid(struct mm_struct *m= m) =20 static inline unsigned int mm_cid_size(void) { - return cpumask_size(); + return cpumask_size() + mm_node_cidmask_size(); } #else /* CONFIG_SCHED_MM_CID */ static inline void mm_init_cid(struct mm_struct *mm) { } diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 3e84a3b7b7bb..26f53bd66de6 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -11818,9 +11818,13 @@ void sched_mm_cid_migrate_to(struct rq *dst_rq, st= ruct task_struct *t) * scenarios. * * It is not useful to clear the src cid when the number of threads is - * greater or equal to the number of allowed cpus, because user-space + * greater or equal to the number of CPUs allowed, because user-space * can expect that the number of allowed cids can reach the number of - * allowed cpus. + * CPUs allowed. + * + * This also prevents moving cid across NUMA nodes when the + * number of threads is greater or equal to the number of + * CPUs allowed. */ dst_pcpu_cid =3D per_cpu_ptr(mm->pcpu_cid, cpu_of(dst_rq)); dst_cid =3D READ_ONCE(dst_pcpu_cid->cid); @@ -12079,7 +12083,7 @@ void sched_mm_cid_after_execve(struct task_struct *= t) * Matches barrier in sched_mm_cid_remote_clear_old(). 
*/ smp_mb(); - t->last_mm_cid =3D t->mm_cid =3D mm_cid_get(rq, mm); + t->last_mm_cid =3D t->mm_cid =3D mm_cid_get(rq, t, mm); } rseq_set_notify_resume(t); } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 38aeedd8a6cc..5422e0df155f 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -68,6 +68,7 @@ #include #include #include +#include =20 #include #include @@ -3311,12 +3312,10 @@ static inline void mm_cid_put(struct mm_struct *mm) __mm_cid_put(mm, mm_cid_clear_lazy_put(cid)); } =20 -static inline int __mm_cid_try_get(struct mm_struct *mm) +static inline int __mm_cid_test_and_set_first(struct cpumask *cpumask) { - struct cpumask *cpumask; int cid; =20 - cpumask =3D mm_cidmask(mm); /* * Retry finding first zero bit if the mask is temporarily * filled. This only happens during concurrent remote-clear @@ -3333,9 +3332,123 @@ static inline int __mm_cid_try_get(struct mm_struct= *mm) return cid; } =20 +#ifdef CONFIG_NUMA +/* + * NUMA locality is preserved as long as the mm_cid range is restricted + * to the minimum between the number of CPUs allowed and the number of + * threads with references to the mm_struct. + */ +static inline int __mm_cid_try_get(struct task_struct *t, struct mm_struct= *mm) +{ + struct cpumask *cpumask =3D mm_cidmask(mm), + *node_cpumask =3D mm_node_cidmask(mm, numa_node_id()), + *node_alloc_cpumask =3D mm_node_alloc_cidmask(mm); + unsigned int node; + int cid; + + if (num_possible_nodes() =3D=3D 1) + return __mm_cid_test_and_set_first(cpumask); + + /* + * Try to reserve lowest available cid number within those + * already reserved for this NUMA node. + */ + cid =3D cpumask_first_andnot(node_cpumask, cpumask); + if (cid >=3D t->nr_cpus_allowed || cid >=3D atomic_read(&mm->mm_users)) + goto alloc_numa; + if (cpumask_test_and_set_cpu(cid, cpumask)) + return -1; + goto end; + +alloc_numa: + /* + * Try to reserve lowest available cid number within those not + * already allocated for NUMA nodes. 
+	 */
+	cid = cpumask_first_nor(node_alloc_cpumask, cpumask);
+	if (cid >= t->nr_cpus_allowed)
+		goto steal_overprovisioned_cid;
+	if (cid >= atomic_read(&mm->mm_users))
+		goto steal_first_available_cid;
+	if (cpumask_test_and_set_cpu(cid, cpumask))
+		return -1;
+	__cpumask_set_cpu(cid, node_cpumask);
+	__cpumask_set_cpu(cid, node_alloc_cpumask);
+	goto end;
+
+steal_overprovisioned_cid:
+	/*
+	 * Either the NUMA node id configuration changed for at least
+	 * one CPU in the system, or the scheduler migrated threads
+	 * across NUMA nodes, or the CPUs allowed mask changed. We need
+	 * to steal a currently unused cid. Userspace must handle the
+	 * fact that the node id associated with this cid may change.
+	 *
+	 * Try to steal an available cid number from an overprovisioned
+	 * NUMA node. A NUMA node is overprovisioned when more cids are
+	 * associated to it than the number of cores associated with
+	 * this NUMA node in the CPUs allowed mask. Stealing from
+	 * overprovisioned NUMA nodes ensures cid movement across NUMA
+	 * nodes stabilises after a configuration or CPUs allowed mask
+	 * change.
+	 */
+	for (node = 0; node < nr_node_ids; node++) {
+		struct cpumask *iter_cpumask;
+		int nr_allowed_cores;
+
+		if (node == numa_node_id())
+			continue;
+		iter_cpumask = mm_node_cidmask(mm, node);
+		nr_allowed_cores = cpumask_weight_and(cpumask_of_node(node), t->cpus_ptr);
+		if (cpumask_weight(iter_cpumask) <= nr_allowed_cores)
+			continue;
+		/* Try to steal from an overprovisioned NUMA node. */
+		cid = cpumask_first_andnot(iter_cpumask, cpumask);
+		if (cid >= t->nr_cpus_allowed || cid >= atomic_read(&mm->mm_users))
+			goto steal_first_available_cid;
+		if (cpumask_test_and_set_cpu(cid, cpumask))
+			return -1;
+		__cpumask_clear_cpu(cid, iter_cpumask);
+		__cpumask_set_cpu(cid, node_cpumask);
+		goto end;
+	}
+
+steal_first_available_cid:
+	/*
+	 * Steal the first available cid, without caring about NUMA
+	 * locality.
+	 * This is needed when the scheduler migrates threads
+	 * across NUMA nodes, when those threads belong to processes
+	 * which have fewer threads than the number of CPUs allowed.
+	 */
+	cid = __mm_cid_test_and_set_first(cpumask);
+	if (cid < 0)
+		return -1;
+	/* Steal cid from its NUMA node mask. */
+	for (node = 0; node < nr_node_ids; node++) {
+		struct cpumask *iter_cpumask;
+
+		if (node == numa_node_id())
+			continue;
+		iter_cpumask = mm_node_cidmask(mm, node);
+		if (cpumask_test_cpu(cid, iter_cpumask)) {
+			__cpumask_clear_cpu(cid, iter_cpumask);
+			break;
+		}
+	}
+	__cpumask_set_cpu(cid, node_cpumask);
+end:
+	return cid;
+}
+#else
+static inline int __mm_cid_try_get(struct task_struct *t, struct mm_struct *mm)
+{
+	return __mm_cid_test_and_set_first(mm_cidmask(mm));
+}
+#endif
+
 /*
- * Save a snapshot of the current runqueue time of this cpu
- * with the per-cpu cid value, allowing to estimate how recently it was used.
+ * Save a snapshot of the current runqueue time of this CPU
+ * with the per-CPU cid value, allowing to estimate how recently it was used.
  */
 static inline void mm_cid_snapshot_time(struct rq *rq, struct mm_struct *mm)
 {
@@ -3345,7 +3458,8 @@ static inline void mm_cid_snapshot_time(struct rq *rq, struct mm_struct *mm)
 	WRITE_ONCE(pcpu_cid->time, rq->clock);
 }
 
-static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
+static inline int __mm_cid_get(struct rq *rq, struct task_struct *t,
+			       struct mm_struct *mm)
 {
 	int cid;
 
@@ -3355,13 +3469,13 @@ static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
 	 * guarantee forward progress.
	 */
 	if (!READ_ONCE(use_cid_lock)) {
-		cid = __mm_cid_try_get(mm);
+		cid = __mm_cid_try_get(t, mm);
 		if (cid >= 0)
 			goto end;
 		raw_spin_lock(&cid_lock);
 	} else {
 		raw_spin_lock(&cid_lock);
-		cid = __mm_cid_try_get(mm);
+		cid = __mm_cid_try_get(t, mm);
 		if (cid >= 0)
 			goto unlock;
 	}
@@ -3381,7 +3495,7 @@ static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
 	 * all newcoming allocations observe the use_cid_lock flag set.
 	 */
 	do {
-		cid = __mm_cid_try_get(mm);
+		cid = __mm_cid_try_get(t, mm);
 		cpu_relax();
 	} while (cid < 0);
 	/*
@@ -3397,7 +3511,8 @@ static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
 	return cid;
 }
 
-static inline int mm_cid_get(struct rq *rq, struct mm_struct *mm)
+static inline int mm_cid_get(struct rq *rq, struct task_struct *t,
+			     struct mm_struct *mm)
 {
 	struct mm_cid __percpu *pcpu_cid = mm->pcpu_cid;
 	struct cpumask *cpumask;
@@ -3414,7 +3529,7 @@ static inline int mm_cid_get(struct rq *rq, struct mm_struct *mm)
 	if (try_cmpxchg(&this_cpu_ptr(pcpu_cid)->cid, &cid, MM_CID_UNSET))
 		__mm_cid_put(mm, mm_cid_clear_lazy_put(cid));
 	}
-	cid = __mm_cid_get(rq, mm);
+	cid = __mm_cid_get(rq, t, mm);
 	__this_cpu_write(pcpu_cid->cid, cid);
 	return cid;
 }
@@ -3467,7 +3582,7 @@ static inline void switch_mm_cid(struct rq *rq,
 		prev->mm_cid = -1;
 	}
 	if (next->mm_cid_active)
-		next->last_mm_cid = next->mm_cid = mm_cid_get(rq, next->mm);
+		next->last_mm_cid = next->mm_cid = mm_cid_get(rq, next, next->mm);
 }
 
 #else
-- 
2.39.2

From nobody Sun Feb 8 06:21:45 2026
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Valentin Schneider,
    Mel Gorman, Steven Rostedt, Vincent Guittot, Dietmar Eggemann,
    Ben Segall, Yury Norov, Rasmus Villemoes, Shuah Khan,
    linux-kselftest@vger.kernel.org
Subject: [RFC PATCH v1 5/6] selftests/rseq: x86: Implement rseq_load_u32_u32
Date: Fri, 23 Aug 2024 14:59:45 -0400
Message-Id: <20240823185946.418340-6-mathieu.desnoyers@efficios.com>
In-Reply-To: <20240823185946.418340-1-mathieu.desnoyers@efficios.com>
References: <20240823185946.418340-1-mathieu.desnoyers@efficios.com>

Allow loading a pair of u32 values within an rseq critical section. This
can be used in situations where both rseq_abi()->mm_cid and
rseq_abi()->node_id need to be sampled atomically with respect to
preemption, signal delivery, and migration.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Shuah Khan
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Shuah Khan
Cc: linux-kselftest@vger.kernel.org
---
Reviewed-by tag from Shuah Khan applies to the selftests changes.
Rebased on
---
 tools/testing/selftests/rseq/rseq-x86-bits.h | 43 ++++++++++++++++++++
 tools/testing/selftests/rseq/rseq.h          | 14 +++++++
 2 files changed, 57 insertions(+)

diff --git a/tools/testing/selftests/rseq/rseq-x86-bits.h b/tools/testing/selftests/rseq/rseq-x86-bits.h
index 8a9431eec467..fdf5ef398393 100644
--- a/tools/testing/selftests/rseq/rseq-x86-bits.h
+++ b/tools/testing/selftests/rseq/rseq-x86-bits.h
@@ -990,4 +990,47 @@ int RSEQ_TEMPLATE_IDENTIFIER(rseq_cmpeqv_trymemcpy_storev)(intptr_t *v, intptr_t
 
 #endif
 
+#if defined(RSEQ_TEMPLATE_CPU_ID_NONE) && defined(RSEQ_TEMPLATE_MO_RELAXED)
+
+#define RSEQ_ARCH_HAS_LOAD_U32_U32
+
+static inline __attribute__((always_inline))
+int RSEQ_TEMPLATE_IDENTIFIER(rseq_load_u32_u32)(uint32_t *dst1, uint32_t *src1,
+						uint32_t *dst2, uint32_t *src2)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */
+		/* Start rseq by storing table entry pointer into rseq_cs. */
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, RSEQ_ASM_TP_SEGMENT:RSEQ_CS_OFFSET(%[rseq_offset]))
+		RSEQ_INJECT_ASM(3)
+		"movl %[src1], %%eax\n\t"
+		"movl %%eax, %[dst1]\n\t"
+		"movl %[src2], %%eax\n\t"
+		"movl %%eax, %[dst2]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [rseq_offset]	"r" (rseq_offset),
+		  /* final store input */
+		  [dst1]	"m" (*dst1),
+		  [src1]	"m" (*src1),
+		  [dst2]	"m" (*dst2),
+		  [src2]	"m" (*src2)
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	rseq_after_asm_goto();
+	return 0;
+abort:
+	rseq_after_asm_goto();
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+#endif /* defined(RSEQ_TEMPLATE_CPU_ID_NONE) && defined(RSEQ_TEMPLATE_MO_RELAXED) */
+
 #include "rseq-bits-reset.h"
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
index d7364ea4d201..b6095c2a5da6 100644
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -381,4 +381,18 @@ int rseq_cmpeqv_trymemcpy_storev(enum rseq_mo rseq_mo, enum rseq_percpu_mode per
 	}
 }
 
+#ifdef RSEQ_ARCH_HAS_LOAD_U32_U32
+
+static inline __attribute__((always_inline))
+int rseq_load_u32_u32(enum rseq_mo rseq_mo,
+		      uint32_t *dst1, uint32_t *src1,
+		      uint32_t *dst2, uint32_t *src2)
+{
+	if (rseq_mo != RSEQ_MO_RELAXED)
+		return -1;
+	return rseq_load_u32_u32_relaxed(dst1, src1, dst2, src2);
+}
+
+#endif
+
 #endif /* RSEQ_H_ */
-- 
2.39.2

From nobody Sun Feb 8 06:21:45 2026
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Valentin Schneider,
    Mel Gorman, Steven Rostedt, Vincent Guittot, Dietmar Eggemann,
    Ben Segall, Yury Norov, Rasmus Villemoes, Shuah Khan,
    linux-kselftest@vger.kernel.org
Subject: [RFC PATCH v1 6/6] selftests/rseq: Implement NUMA node id vs mm_cid invariant test
Date: Fri, 23 Aug 2024 14:59:46 -0400
Message-Id: <20240823185946.418340-7-mathieu.desnoyers@efficios.com>
In-Reply-To: <20240823185946.418340-1-mathieu.desnoyers@efficios.com>
References: <20240823185946.418340-1-mathieu.desnoyers@efficios.com>

This test validates that the mapping between a mm_cid and a NUMA node
id remains invariant for the process lifetime, for a process with a
number of threads >= number of allowed CPUs.

In other words, it validates that if any thread within the process
running on behalf of a mm_cid N observes a NUMA node id M, all threads
within this process will always observe the same NUMA node id value
when running on behalf of that same mm_cid.

This characteristic is important for NUMA locality.

On all architectures except Power, the NUMA topology is never
reconfigured after a CPU has been associated with a NUMA node in the
system lifetime. Even on Power, NUMA topology reconfiguration can be
assumed to happen rarely, so it is not expected to occur while the
NUMA test is running.

As a result, the NUMA node id associated with a mm_cid should be
invariant as long as:

- the process has a number of threads >= number of allowed CPUs,
- the allowed CPUs mask is unchanged, and
- the NUMA configuration is unchanged.

This test is skipped on architectures that do not implement
rseq_load_u32_u32.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Shuah Khan
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Shuah Khan
Cc: linux-kselftest@vger.kernel.org
---
Reviewed-by tag from Shuah Khan applies to the selftests changes.
Rebased on
---
 tools/testing/selftests/rseq/.gitignore       |   1 +
 tools/testing/selftests/rseq/Makefile         |   2 +-
 .../testing/selftests/rseq/basic_numa_test.c  | 144 ++++++++++++++++++
 3 files changed, 146 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/rseq/basic_numa_test.c

diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 16496de5f6ce..8a8d163cbb9f 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
+basic_numa_test
 basic_percpu_ops_test
 basic_percpu_ops_mm_cid_test
 basic_test
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb58..9ef1c949114a 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -14,7 +14,7 @@ LDLIBS += -lpthread -ldl
 # still track changes to header files and depend on shared object.
 OVERRIDE_TARGETS = 1
 
-TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
+TEST_GEN_PROGS = basic_test basic_numa_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
 		param_test_benchmark param_test_compare_twice param_test_mm_cid \
 		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
 
diff --git a/tools/testing/selftests/rseq/basic_numa_test.c b/tools/testing/selftests/rseq/basic_numa_test.c
new file mode 100644
index 000000000000..8e51c662057d
--- /dev/null
+++ b/tools/testing/selftests/rseq/basic_numa_test.c
@@ -0,0 +1,144 @@
+// SPDX-License-Identifier: LGPL-2.1
+/*
+ * Basic rseq NUMA test. Validate that (mm_cid, numa_node_id) pairs are
+ * invariant when the number of threads >= number of allowed CPUs, as
+ * long as those preconditions are respected:
+ *
+ * - A process has a number of threads >= number of allowed CPUs,
+ * - The allowed CPUs mask is unchanged, and
+ * - The NUMA configuration is unchanged.
+ */
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "rseq.h"
+
+#define NR_LOOPS 100
+
+static int nr_threads, nr_active_threads, test_go, test_stop;
+
+#ifdef RSEQ_ARCH_HAS_LOAD_U32_U32
+
+static int cpu_numa_id[CPU_SETSIZE];
+
+static int get_affinity_weight(void)
+{
+	cpu_set_t allowed_cpus;
+
+	if (sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus)) {
+		perror("sched_getaffinity");
+		abort();
+	}
+	return CPU_COUNT(&allowed_cpus);
+}
+
+static void numa_id_init(void)
+{
+	int i;
+
+	for (i = 0; i < CPU_SETSIZE; i++)
+		cpu_numa_id[i] = -1;
+}
+
+static void *test_thread(void *arg)
+{
+	int i;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+	/*
+	 * Rendez-vous across all threads to make sure the number of
+	 * threads >= number of possible CPUs for the entire test duration.
+	 */
+	if (__atomic_add_fetch(&nr_active_threads, 1, __ATOMIC_RELAXED) == nr_threads)
+		__atomic_store_n(&test_go, 1, __ATOMIC_RELAXED);
+	while (!__atomic_load_n(&test_go, __ATOMIC_RELAXED))
+		rseq_barrier();
+
+	for (i = 0; i < NR_LOOPS; i++) {
+		uint32_t mm_cid, node;
+		int cached_node_id;
+
+		while (rseq_load_u32_u32(RSEQ_MO_RELAXED, &mm_cid,
+					 &rseq_get_abi()->mm_cid,
+					 &node, &rseq_get_abi()->node_id) != 0) {
+			/* Retry. */
+		}
+		cached_node_id = RSEQ_READ_ONCE(cpu_numa_id[mm_cid]);
+		if (cached_node_id == -1) {
+			RSEQ_WRITE_ONCE(cpu_numa_id[mm_cid], node);
+		} else {
+			if (node != cached_node_id) {
+				fprintf(stderr, "Error: NUMA node id discrepancy: mm_cid %u cached node id %d node id %u.\n",
					mm_cid, cached_node_id, node);
+				fprintf(stderr, "This is likely a kernel bug, or caused by a concurrent NUMA topology reconfiguration.\n");
+				abort();
+			}
+		}
+		(void) poll(NULL, 0, 10);	/* wait 10ms */
+	}
+	/*
+	 * Rendez-vous before exiting all threads to make sure the
+	 * number of threads >= number of possible CPUs for the entire
+	 * test duration.
+	 */
+	if (__atomic_sub_fetch(&nr_active_threads, 1, __ATOMIC_RELAXED) == 0)
+		__atomic_store_n(&test_stop, 1, __ATOMIC_RELAXED);
+	while (!__atomic_load_n(&test_stop, __ATOMIC_RELAXED))
+		rseq_barrier();
+
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+	return NULL;
+}
+
+static int test_numa(void)
+{
+	pthread_t tid[nr_threads];
+	int err, i;
+	void *tret;
+
+	numa_id_init();
+
+	printf("testing rseq (mm_cid, numa_node_id) invariant, multi-threaded (%d threads)\n",
+	       nr_threads);
+
+	for (i = 0; i < nr_threads; i++) {
+		err = pthread_create(&tid[i], NULL, test_thread, NULL);
+		if (err != 0)
+			abort();
+	}
+
+	for (i = 0; i < nr_threads; i++) {
+		err = pthread_join(tid[i], &tret);
+		if (err != 0)
+			abort();
+	}
+
+	return 0;
+}
+#else
+static int test_numa(void)
+{
+	fprintf(stderr, "rseq_load_u32_u32 is not implemented on this architecture. Skipping numa test.\n");
+	return 0;
+}
+#endif
+
+int main(int argc, char **argv)
+{
+	nr_threads = get_affinity_weight();
+	return test_numa();
+}
-- 
2.39.2