From nobody Sun Feb 8 04:13:19 2026
From: Mathieu Desnoyers
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Valentin Schneider,
 Mel Gorman, Steven Rostedt, Vincent Guittot, Dietmar Eggemann,
 Ben Segall, Yury Norov, Rasmus Villemoes, Shuah Khan
Subject: [RFC PATCH 1/5] lib: Implement find_{first,next,nth}_notandnot_bit, find_first_andnot_bit
Date: Mon, 19 Aug 2024 16:24:02 +0200
Message-Id: <20240819142406.339084-2-mathieu.desnoyers@efficios.com>
In-Reply-To: <20240819142406.339084-1-mathieu.desnoyers@efficios.com>
References: <20240819142406.339084-1-mathieu.desnoyers@efficios.com>

Allow finding the first, next, or nth bit within two input bitmasks
which is zero in both masks. Allow finding the first bit within two
input bitmasks which is set in the first mask and cleared in the
second mask.

find_next_andnot_bit and find_nth_andnot_bit already exist, so only
find_first_andnot_bit appears to be missing.
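For illustration, here is a minimal userspace sketch of the documented
semantics on a single word (the helper names and the standalone setting
are hypothetical stand-ins; this is not the kernel implementation,
which also handles multi-word bitmaps):

	#include <stdio.h>

	/* First bit index clear in both a and b, or size if none. */
	static unsigned long first_notandnot(unsigned long a, unsigned long b,
					     unsigned long size)
	{
		unsigned long mask = (size < 64) ? ((1UL << size) - 1) : ~0UL;
		unsigned long val = ~a & ~b & mask;

		return val ? (unsigned long)__builtin_ctzl(val) : size;
	}

	/* First bit index set in a and clear in b, or size if none. */
	static unsigned long first_andnot(unsigned long a, unsigned long b,
					  unsigned long size)
	{
		unsigned long mask = (size < 64) ? ((1UL << size) - 1) : ~0UL;
		unsigned long val = a & ~b & mask;

		return val ? (unsigned long)__builtin_ctzl(val) : size;
	}

	int main(void)
	{
		/* a sets bits 0-2, b sets bits 1-3. */
		printf("%lu\n", first_notandnot(0x7, 0xe, 16));	/* prints 4 */
		printf("%lu\n", first_andnot(0x7, 0xe, 16));	/* prints 0 */
		return 0;
	}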
Signed-off-by: Mathieu Desnoyers
Cc: Yury Norov
Cc: Rasmus Villemoes
Reviewed-by: Shuah Khan
---
 include/linux/find.h | 122 +++++++++++++++++++++++++++++++++++++++++--
 lib/find_bit.c       |  42 +++++++++++++++
 2 files changed, 160 insertions(+), 4 deletions(-)

diff --git a/include/linux/find.h b/include/linux/find.h
index 5dfca4225fef..6b2377006b22 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -14,6 +14,8 @@ unsigned long _find_next_and_bit(const unsigned long *addr1, const unsigned long
 			unsigned long nbits, unsigned long start);
 unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start);
+unsigned long _find_next_notandnot_bit(const unsigned long *addr1, const unsigned long *addr2,
+					unsigned long nbits, unsigned long start);
 unsigned long _find_next_or_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start);
 unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits,
@@ -24,11 +26,17 @@ unsigned long __find_nth_and_bit(const unsigned long *addr1, const unsigned long
 			unsigned long size, unsigned long n);
 unsigned long __find_nth_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long size, unsigned long n);
+unsigned long __find_nth_notandnot_bit(const unsigned long *addr1, const unsigned long *addr2,
+					unsigned long size, unsigned long n);
 unsigned long __find_nth_and_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
 					const unsigned long *addr3, unsigned long size,
 					unsigned long n);
 extern unsigned long _find_first_and_bit(const unsigned long *addr1,
 					 const unsigned long *addr2, unsigned long size);
+extern unsigned long _find_first_andnot_bit(const unsigned long *addr1,
+					    const unsigned long *addr2, unsigned long size);
+extern unsigned long _find_first_notandnot_bit(const unsigned long *addr1,
+					       const unsigned long *addr2, unsigned long size);
 unsigned long _find_first_and_and_bit(const unsigned long *addr1, const unsigned long *addr2,
 				      const unsigned long *addr3, unsigned long size);
 extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size);
@@ -102,15 +110,14 @@ unsigned long find_next_and_bit(const unsigned long *addr1,
 
 #ifndef find_next_andnot_bit
 /**
- * find_next_andnot_bit - find the next set bit in *addr1 excluding all the bits
- *			  in *addr2
+ * find_next_andnot_bit - find the next set bit in *addr1, cleared in *addr2
  * @addr1: The first address to base the search on
  * @addr2: The second address to base the search on
  * @size: The bitmap size in bits
  * @offset: The bitnumber to start searching at
  *
- * Returns the bit number for the next set bit
- * If no bits are set, returns @size.
+ * Returns the bit number for the next bit set in *addr1, cleared in *addr2.
+ * If no such bits are found, returns @size.
  */
 static inline
 unsigned long find_next_andnot_bit(const unsigned long *addr1,
@@ -131,6 +138,36 @@ unsigned long find_next_andnot_bit(const unsigned long *addr1,
 }
 #endif
 
+#ifndef find_next_notandnot_bit
+/**
+ * find_next_notandnot_bit - find the next bit cleared in both *addr1 and *addr2
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @size: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns the bit number for the next bit cleared in both *addr1 and *addr2.
+ * If no such bits are found, returns @size.
+ */
+static inline
+unsigned long find_next_notandnot_bit(const unsigned long *addr1,
+		const unsigned long *addr2, unsigned long size,
+		unsigned long offset)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = (~*addr1) & (~*addr2) & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_next_notandnot_bit(addr1, addr2, size, offset);
+}
+#endif
+
 #ifndef find_next_or_bit
 /**
  * find_next_or_bit - find the next set bit in either memory regions
@@ -292,6 +329,32 @@ unsigned long find_nth_andnot_bit(const unsigned long *addr1, const unsigned lon
 	return __find_nth_andnot_bit(addr1, addr2, size, n);
 }
 
+/**
+ * find_nth_notandnot_bit - find N'th cleared bit in 2 memory regions.
+ * @addr1: The 1st address to start the search at
+ * @addr2: The 2nd address to start the search at
+ * @size: The maximum number of bits to search
+ * @n: The index of the cleared bit to find, counting from 0
+ *
+ * Returns the bit number of the N'th bit cleared in the two regions.
+ * If no such bit is found, returns @size.
+ */
+static inline
+unsigned long find_nth_notandnot_bit(const unsigned long *addr1, const unsigned long *addr2,
+				     unsigned long size, unsigned long n)
+{
+	if (n >= size)
+		return size;
+
+	if (small_const_nbits(size)) {
+		unsigned long val = (~*addr1) & (~*addr2) & GENMASK(size - 1, 0);
+
+		return val ? fns(val, n) : size;
+	}
+
+	return __find_nth_notandnot_bit(addr1, addr2, size, n);
+}
+
 /**
  * find_nth_and_andnot_bit - find N'th set bit in 2 memory regions,
  *			     excluding those set in 3rd region
@@ -347,6 +410,57 @@ unsigned long find_first_and_bit(const unsigned long *addr1,
 }
 #endif
 
+#ifndef find_first_andnot_bit
+/**
+ * find_first_andnot_bit - find the first set bit in 2 memory regions,
+ *			   flipping bits in 2nd region.
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the first bit set in *addr1 and cleared in *addr2.
+ * If no such bit is found, returns @size.
+ */
+static inline
+unsigned long find_first_andnot_bit(const unsigned long *addr1,
+				    const unsigned long *addr2,
+				    unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = *addr1 & (~*addr2) & GENMASK(size - 1, 0);
+
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_first_andnot_bit(addr1, addr2, size);
+}
+#endif
+
+#ifndef find_first_notandnot_bit
+/**
+ * find_first_notandnot_bit - find the first cleared bit in 2 memory regions
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @size: The bitmap size in bits
+ *
+ * Returns the bit number for the first bit cleared in both *addr1 and *addr2.
+ * If no such bit is found, returns @size.
+ */
+static inline
+unsigned long find_first_notandnot_bit(const unsigned long *addr1,
+				       const unsigned long *addr2,
+				       unsigned long size)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val = (~*addr1) & (~*addr2) & GENMASK(size - 1, 0);
+
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_first_notandnot_bit(addr1, addr2, size);
+}
+#endif
+
 /**
  * find_first_and_and_bit - find the first set bit in 3 memory regions
  * @addr1: The first address to base the search on

diff --git a/lib/find_bit.c b/lib/find_bit.c
index 0836bb3d76c5..b4a3dd62a255 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -116,6 +116,32 @@ unsigned long _find_first_and_bit(const unsigned long *addr1,
 EXPORT_SYMBOL(_find_first_and_bit);
 #endif
 
+#ifndef find_first_andnot_bit
+/*
+ * Find the first set bit in two memory regions, flipping bits in 2nd region.
+ */
+unsigned long _find_first_andnot_bit(const unsigned long *addr1,
+				     const unsigned long *addr2,
+				     unsigned long size)
+{
+	return FIND_FIRST_BIT(addr1[idx] & ~addr2[idx], /* nop */, size);
+}
+EXPORT_SYMBOL(_find_first_andnot_bit);
+#endif
+
+#ifndef find_first_notandnot_bit
+/*
+ * Find the first cleared bit in two memory regions.
+ */
+unsigned long _find_first_notandnot_bit(const unsigned long *addr1,
+					const unsigned long *addr2,
+					unsigned long size)
+{
+	return FIND_FIRST_BIT(~addr1[idx] & ~addr2[idx], /* nop */, size);
+}
+EXPORT_SYMBOL(_find_first_notandnot_bit);
+#endif
+
 /*
  * Find the first set bit in three memory regions.
  */
@@ -167,6 +193,13 @@ unsigned long __find_nth_andnot_bit(const unsigned long *addr1, const unsigned l
 }
 EXPORT_SYMBOL(__find_nth_andnot_bit);
 
+unsigned long __find_nth_notandnot_bit(const unsigned long *addr1, const unsigned long *addr2,
+					unsigned long size, unsigned long n)
+{
+	return FIND_NTH_BIT(~addr1[idx] & ~addr2[idx], size, n);
+}
+EXPORT_SYMBOL(__find_nth_notandnot_bit);
+
 unsigned long __find_nth_and_andnot_bit(const unsigned long *addr1,
 					const unsigned long *addr2,
 					const unsigned long *addr3,
@@ -194,6 +227,15 @@ unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned l
 EXPORT_SYMBOL(_find_next_andnot_bit);
 #endif
 
+#ifndef find_next_notandnot_bit
+unsigned long _find_next_notandnot_bit(const unsigned long *addr1, const unsigned long *addr2,
+					unsigned long nbits, unsigned long start)
+{
+	return FIND_NEXT_BIT(~addr1[idx] & ~addr2[idx], /* nop */, nbits, start);
+}
+EXPORT_SYMBOL(_find_next_notandnot_bit);
+#endif
+
 #ifndef find_next_or_bit
 unsigned long _find_next_or_bit(const unsigned long *addr1, const unsigned long *addr2,
 				unsigned long nbits, unsigned long start)
-- 
2.39.2

From nobody Sun Feb 8 04:13:19 2026
From: Mathieu Desnoyers
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Valentin Schneider,
 Mel Gorman, Steven Rostedt, Vincent Guittot, Dietmar Eggemann,
 Ben Segall, Yury Norov, Rasmus Villemoes, Shuah Khan
Subject: [RFC PATCH 2/5] cpumask: Implement cpumask_{first,next}_{not,}andnot
Date: Mon, 19 Aug 2024 16:24:03 +0200
Message-Id: <20240819142406.339084-3-mathieu.desnoyers@efficios.com>
In-Reply-To: <20240819142406.339084-1-mathieu.desnoyers@efficios.com>
References: <20240819142406.339084-1-mathieu.desnoyers@efficios.com>

Allow finding the first or next bit within two input cpumasks which is
either:

- cleared in both masks, or
- set in the first mask and cleared in the second mask.
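As a rough userspace model of the iteration semantics on a single-word
mask (the function below is an illustrative stand-in, not the kernel
implementation, which handles multi-word cpumasks and cpumask_check()):

	#include <stdio.h>

	/* Next bit > n which is set in a and cleared in b, or nbits. */
	static unsigned int next_andnot(unsigned long a, unsigned long b,
					unsigned int nbits, int n)
	{
		unsigned long val;

		if (n + 1 >= (int)nbits)
			return nbits;
		val = a & ~b & (~0UL << (n + 1));
		return val ? (unsigned int)__builtin_ctzl(val) : nbits;
	}

	int main(void)
	{
		unsigned long online = 0x3f, isolated = 0x2a; /* cpus 1, 3, 5 */
		unsigned int cpu;

		/* Visits cpus 0, 2 and 4: set in online, clear in isolated. */
		for (cpu = next_andnot(online, isolated, 6, -1); cpu < 6;
		     cpu = next_andnot(online, isolated, 6, (int)cpu))
			printf("cpu %u\n", cpu);
		return 0;
	}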
Signed-off-by: Mathieu Desnoyers
Cc: Yury Norov
Cc: Rasmus Villemoes
Reviewed-by: Shuah Khan
---
 include/linux/cpumask.h | 60 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 23686bed441d..57b7d99d6da1 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -204,6 +204,32 @@ unsigned int cpumask_first_and_and(const struct cpumask *srcp1,
 		cpumask_bits(srcp3), small_cpumask_bits);
 }
 
+/**
+ * cpumask_first_andnot - return the first cpu from *srcp1 & ~*srcp2
+ * @srcp1: the first input
+ * @srcp2: the second input
+ *
+ * Returns >= nr_cpu_ids if no such cpu is found.
+ */
+static inline
+unsigned int cpumask_first_andnot(const struct cpumask *srcp1, const struct cpumask *srcp2)
+{
+	return find_first_andnot_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), nr_cpumask_bits);
+}
+
+/**
+ * cpumask_first_notandnot - return the first cpu from ~*srcp1 & ~*srcp2
+ * @srcp1: the first input
+ * @srcp2: the second input
+ *
+ * Returns >= nr_cpu_ids if no such cpu is found.
+ */
+static inline
+unsigned int cpumask_first_notandnot(const struct cpumask *srcp1, const struct cpumask *srcp2)
+{
+	return find_first_notandnot_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), nr_cpumask_bits);
+}
+
 /**
  * cpumask_last - get the last CPU in a cpumask
  * @srcp: - the cpumask pointer
@@ -246,6 +272,40 @@ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp)
 	return find_next_zero_bit(cpumask_bits(srcp), small_cpumask_bits, n+1);
 }
 
+/**
+ * cpumask_next_andnot - return the next cpu from *srcp1 & ~*srcp2
+ * @n: the cpu prior to the place to search (i.e. return will be > @n)
+ * @srcp1: the first input
+ * @srcp2: the second input
+ *
+ * Returns >= nr_cpu_ids if no such cpu is found.
+ */
+static inline
+unsigned int cpumask_next_andnot(int n, const struct cpumask *srcp1, const struct cpumask *srcp2)
+{
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpumask_check(n);
+	return find_next_andnot_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), nr_cpumask_bits, n+1);
+}
+
+/**
+ * cpumask_next_notandnot - return the next cpu from ~*srcp1 & ~*srcp2
+ * @n: the cpu prior to the place to search (i.e. return will be > @n)
+ * @srcp1: the first input
+ * @srcp2: the second input
+ *
+ * Returns >= nr_cpu_ids if no such cpu is found.
+ */
+static inline
+unsigned int cpumask_next_notandnot(int n, const struct cpumask *srcp1, const struct cpumask *srcp2)
+{
+	/* -1 is a legal arg here. */
+	if (n != -1)
+		cpumask_check(n);
+	return find_next_notandnot_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), nr_cpumask_bits, n+1);
+}
+
 #if NR_CPUS == 1
 /* Uniprocessor: there is only one valid CPU */
 static inline unsigned int cpumask_local_spread(unsigned int i, int node)
-- 
2.39.2

From nobody Sun Feb 8 04:13:19 2026
From: Mathieu Desnoyers
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Valentin Schneider,
 Mel Gorman, Steven Rostedt, Vincent Guittot, Dietmar Eggemann,
 Ben Segall, Yury Norov, Rasmus Villemoes, Shuah Khan
Subject: [RFC PATCH 3/5] sched: NUMA-aware per-memory-map concurrency IDs
Date: Mon, 19 Aug 2024 16:24:04 +0200
Message-Id: <20240819142406.339084-4-mathieu.desnoyers@efficios.com>
In-Reply-To: <20240819142406.339084-1-mathieu.desnoyers@efficios.com>
References: <20240819142406.339084-1-mathieu.desnoyers@efficios.com>
charset="utf-8" The issue addressed by this change is the non-locality of NUMA accesses to data structures indexed by concurrency IDs: for example, in a scenario where a process has two threads, and they periodically run one after the other on different NUMA nodes, each will be assigned mm_cid=3D0. As a consequence, they will end up accessing the same pages, and thus at least one of the threads will need to perform remote NUMA accesses, which is inefficient. That being said, the same issue theoretically exists due to false sharing of cache lines by threads running on after another on different cores/CPUs within a single NUMA node, but the extent of the performance impact is lesser than remote NUMA accesses. Solve this by making the rseq concurrency ID (mm_cid) NUMA-aware. On NUMA systems, when a NUMA-aware concurrency ID is observed by user-space to be associated with a NUMA node, guarantee that it never changes NUMA node unless either a kernel-level NUMA configuration change happens, or scheduler migrations end up migrating tasks across NUMA nodes. There is a tradeoff between NUMA locality and compactness of the concurrency ID allocation. Favor compactness over NUMA locality when the scheduler migrates tasks across NUMA nodes, as this does not cause the frequent remote NUMA accesses behavior. This is done by limiting the concurrency ID range to minimum between the number of threads belonging to the process and the number of allowed CPUs. Signed-off-by: Mathieu Desnoyers Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Valentin Schneider Cc: Mel Gorman Cc: Steven Rostedt Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Ben Segall Reviewed-by: Shuah Khan --- include/linux/mm_types.h | 57 +++++++++++++++- kernel/sched/core.c | 10 ++- kernel/sched/sched.h | 139 +++++++++++++++++++++++++++++++++++---- 3 files changed, 190 insertions(+), 16 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index af3a0256fa93..4307352c8900 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -19,6 +19,7 @@ #include #include #include +#include =20 #include =20 @@ -1154,6 +1155,59 @@ static inline cpumask_t *mm_cidmask(struct mm_struct= *mm) return (struct cpumask *)cid_bitmap; } =20 +#ifdef CONFIG_NUMA +/* + * Layout of NUMA cidmasks: + * - node_alloc cidmask: cpumask tracking which cids were + * allocated (across nodes) in this + * memory map. + * - node cidmask[nr_node_ids]: per-node cpumask tracking which cid + * were allocated in this memory map. 
Signed-off-by: Mathieu Desnoyers
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Valentin Schneider
Cc: Mel Gorman
Cc: Steven Rostedt
Cc: Vincent Guittot
Cc: Dietmar Eggemann
Cc: Ben Segall
Reviewed-by: Shuah Khan
---
 include/linux/mm_types.h |  57 +++++++++++++++-
 kernel/sched/core.c      |  10 ++-
 kernel/sched/sched.h     | 139 +++++++++++++++++++++++++++++++++++----
 3 files changed, 190 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index af3a0256fa93..4307352c8900 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -1154,6 +1155,59 @@ static inline cpumask_t *mm_cidmask(struct mm_struct *mm)
 	return (struct cpumask *)cid_bitmap;
 }
 
+#ifdef CONFIG_NUMA
+/*
+ * Layout of NUMA cidmasks:
+ * - node_alloc cidmask:	cpumask tracking which cids were
+ *				allocated (across nodes) in this
+ *				memory map.
+ * - node cidmask[nr_node_ids]:	per-node cpumask tracking which cids
+ *				were allocated in this memory map.
+ */
+static inline cpumask_t *mm_node_alloc_cidmask(struct mm_struct *mm)
+{
+	unsigned long cid_bitmap = (unsigned long)mm_cidmask(mm);
+
+	/* Skip mm_cidmask */
+	cid_bitmap += cpumask_size();
+	return (struct cpumask *)cid_bitmap;
+}
+
+static inline cpumask_t *mm_node_cidmask(struct mm_struct *mm, unsigned int node)
+{
+	unsigned long cid_bitmap = (unsigned long)mm_node_alloc_cidmask(mm);
+
+	/* Skip node alloc cidmask */
+	cid_bitmap += cpumask_size();
+	cid_bitmap += node * cpumask_size();
+	return (struct cpumask *)cid_bitmap;
+}
+
+static inline void mm_init_node_cidmask(struct mm_struct *mm)
+{
+	unsigned int node;
+
+	if (num_possible_nodes() == 1)
+		return;
+	cpumask_clear(mm_node_alloc_cidmask(mm));
+	for (node = 0; node < nr_node_ids; node++)
+		cpumask_clear(mm_node_cidmask(mm, node));
+}
+
+static inline unsigned int mm_node_cidmask_size(void)
+{
+	if (num_possible_nodes() == 1)
+		return 0;
+	return (nr_node_ids + 1) * cpumask_size();
+}
+#else /* CONFIG_NUMA */
+static inline void mm_init_node_cidmask(struct mm_struct *mm) { }
+static inline unsigned int mm_node_cidmask_size(void)
+{
+	return 0;
+}
+#endif /* CONFIG_NUMA */
+
 static inline void mm_init_cid(struct mm_struct *mm)
 {
 	int i;
@@ -1165,6 +1219,7 @@ static inline void mm_init_cid(struct mm_struct *mm)
 		pcpu_cid->time = 0;
 	}
 	cpumask_clear(mm_cidmask(mm));
+	mm_init_node_cidmask(mm);
 }
 
 static inline int mm_alloc_cid_noprof(struct mm_struct *mm)
@@ -1185,7 +1240,7 @@ static inline void mm_destroy_cid(struct mm_struct *mm)
 
 static inline unsigned int mm_cid_size(void)
 {
-	return cpumask_size();
+	return cpumask_size() + mm_node_cidmask_size();
 }
 #else /* CONFIG_SCHED_MM_CID */
 static inline void mm_init_cid(struct mm_struct *mm) { }

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ebf21373f663..74b0e76bf036 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11792,9 +11792,13 @@ void sched_mm_cid_migrate_to(struct rq *dst_rq, struct task_struct *t)
 	 * scenarios.
 	 *
 	 * It is not useful to clear the src cid when the number of threads is
-	 * greater or equal to the number of allowed cpus, because user-space
+	 * greater or equal to the number of CPUs allowed, because user-space
 	 * can expect that the number of allowed cids can reach the number of
-	 * allowed cpus.
+	 * CPUs allowed.
+	 *
+	 * This also prevents moving cid across NUMA nodes when the
+	 * number of threads is greater or equal to the number of
+	 * CPUs allowed.
 	 */
 	dst_pcpu_cid = per_cpu_ptr(mm->pcpu_cid, cpu_of(dst_rq));
 	dst_cid = READ_ONCE(dst_pcpu_cid->cid);
@@ -12053,7 +12057,7 @@ void sched_mm_cid_after_execve(struct task_struct *t)
 		 * Matches barrier in sched_mm_cid_remote_clear_old().
 		 */
 		smp_mb();
-		t->last_mm_cid = t->mm_cid = mm_cid_get(rq, mm);
+		t->last_mm_cid = t->mm_cid = mm_cid_get(rq, t, mm);
 	}
 	rseq_set_notify_resume(t);
 }

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 38aeedd8a6cc..723f3fb727b4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -68,6 +68,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -3311,12 +3312,10 @@ static inline void mm_cid_put(struct mm_struct *mm)
 	__mm_cid_put(mm, mm_cid_clear_lazy_put(cid));
 }
 
-static inline int __mm_cid_try_get(struct mm_struct *mm)
+static inline int __mm_cid_test_and_set_first(struct cpumask *cpumask)
 {
-	struct cpumask *cpumask;
 	int cid;
 
-	cpumask = mm_cidmask(mm);
 	/*
 	 * Retry finding first zero bit if the mask is temporarily
 	 * filled. This only happens during concurrent remote-clear
@@ -3333,9 +3332,123 @@ static inline int __mm_cid_try_get(struct mm_struct *mm)
 	return cid;
 }
 
+#ifdef CONFIG_NUMA
+/*
+ * NUMA locality is preserved as long as the mm_cid range is restricted
+ * to the minimum between the number of CPUs allowed and the number of
+ * threads with references to the mm_struct.
+ */
+static inline int __mm_cid_try_get(struct task_struct *t, struct mm_struct *mm)
+{
+	struct cpumask *cpumask = mm_cidmask(mm),
+		       *node_cpumask = mm_node_cidmask(mm, numa_node_id()),
+		       *node_alloc_cpumask = mm_node_alloc_cidmask(mm);
+	unsigned int node;
+	int cid;
+
+	if (num_possible_nodes() == 1)
+		return __mm_cid_test_and_set_first(cpumask);
+
+	/*
+	 * Try to reserve lowest available cid number within those
+	 * already reserved for this NUMA node.
+	 */
+	cid = cpumask_first_andnot(node_cpumask, cpumask);
+	if (cid >= t->nr_cpus_allowed || cid >= atomic_read(&mm->mm_users))
+		goto alloc_numa;
+	if (cpumask_test_and_set_cpu(cid, cpumask))
+		return -1;
+	goto end;
+
+alloc_numa:
+	/*
+	 * Try to reserve lowest available cid number within those not
+	 * already allocated for NUMA nodes.
+	 */
+	cid = cpumask_first_notandnot(node_alloc_cpumask, cpumask);
+	if (cid >= t->nr_cpus_allowed)
+		goto steal_overprovisioned_cid;
+	if (cid >= atomic_read(&mm->mm_users))
+		goto steal_first_available_cid;
+	if (cpumask_test_and_set_cpu(cid, cpumask))
+		return -1;
+	__cpumask_set_cpu(cid, node_cpumask);
+	__cpumask_set_cpu(cid, node_alloc_cpumask);
+	goto end;
+
+steal_overprovisioned_cid:
+	/*
+	 * Either the NUMA node id configuration changed for at least
+	 * one CPU in the system, or the scheduler migrated threads
+	 * across NUMA nodes, or the CPUs allowed mask changed. We need
+	 * to steal a currently unused cid. Userspace must handle the
+	 * fact that the node id associated with this cid may change.
+	 *
+	 * Try to steal an available cid number from an overprovisioned
+	 * NUMA node. A NUMA node is overprovisioned when more cids are
+	 * associated to it than the number of cores associated with
+	 * this NUMA node in the CPUs allowed mask. Stealing from
+	 * overprovisioned NUMA nodes ensures cid movement across NUMA
+	 * nodes stabilises after a configuration or CPUs allowed mask
+	 * change.
+	 */
+	for (node = 0; node < nr_node_ids; node++) {
+		struct cpumask *iter_cpumask;
+		int nr_allowed_cores;
+
+		if (node == numa_node_id())
+			continue;
+		iter_cpumask = mm_node_cidmask(mm, node);
+		nr_allowed_cores = cpumask_weight_and(cpumask_of_node(node), t->cpus_ptr);
+		if (cpumask_weight(iter_cpumask) <= nr_allowed_cores)
+			continue;
+		/* Try to steal from an overprovisioned NUMA node. */
+		cid = cpumask_first_andnot(iter_cpumask, cpumask);
+		if (cid >= t->nr_cpus_allowed || cid >= atomic_read(&mm->mm_users))
+			goto steal_first_available_cid;
+		if (cpumask_test_and_set_cpu(cid, cpumask))
+			return -1;
+		__cpumask_clear_cpu(cid, iter_cpumask);
+		__cpumask_set_cpu(cid, node_cpumask);
+		goto end;
+	}
+
+steal_first_available_cid:
+	/*
+	 * Steal the first available cid, without caring about NUMA
+	 * locality. This is needed when the scheduler migrates threads
+	 * across NUMA nodes, when those threads belong to processes
+	 * which have fewer threads than the number of CPUs allowed.
+	 */
+	cid = __mm_cid_test_and_set_first(cpumask);
+	if (cid < 0)
+		return -1;
+	/* Steal cid from its NUMA node mask. */
+	for (node = 0; node < nr_node_ids; node++) {
+		struct cpumask *iter_cpumask;
+
+		if (node == numa_node_id())
+			continue;
+		iter_cpumask = mm_node_cidmask(mm, node);
+		if (cpumask_test_cpu(cid, iter_cpumask)) {
+			__cpumask_clear_cpu(cid, iter_cpumask);
+			break;
+		}
+	}
+	__cpumask_set_cpu(cid, node_cpumask);
+end:
+	return cid;
+}
+#else
+static inline int __mm_cid_try_get(struct task_struct *t, struct mm_struct *mm)
+{
+	return __mm_cid_test_and_set_first(mm_cidmask(mm));
+}
+#endif
+
 /*
- * Save a snapshot of the current runqueue time of this cpu
- * with the per-cpu cid value, allowing to estimate how recently it was used.
+ * Save a snapshot of the current runqueue time of this CPU
+ * with the per-CPU cid value, allowing to estimate how recently it was used.
  */
 static inline void mm_cid_snapshot_time(struct rq *rq, struct mm_struct *mm)
 {
@@ -3345,7 +3458,8 @@ static inline void mm_cid_snapshot_time(struct rq *rq, struct mm_struct *mm)
 	WRITE_ONCE(pcpu_cid->time, rq->clock);
 }
 
-static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
+static inline int __mm_cid_get(struct rq *rq, struct task_struct *t,
+			       struct mm_struct *mm)
 {
 	int cid;
 
@@ -3355,13 +3469,13 @@ static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
 	 * guarantee forward progress.
 	 */
 	if (!READ_ONCE(use_cid_lock)) {
-		cid = __mm_cid_try_get(mm);
+		cid = __mm_cid_try_get(t, mm);
 		if (cid >= 0)
 			goto end;
 		raw_spin_lock(&cid_lock);
 	} else {
 		raw_spin_lock(&cid_lock);
-		cid = __mm_cid_try_get(mm);
+		cid = __mm_cid_try_get(t, mm);
 		if (cid >= 0)
 			goto unlock;
 	}
@@ -3381,7 +3495,7 @@ static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
 	 * all newcoming allocations observe the use_cid_lock flag set.
 	 */
 	do {
-		cid = __mm_cid_try_get(mm);
+		cid = __mm_cid_try_get(t, mm);
 		cpu_relax();
 	} while (cid < 0);
 	/*
@@ -3397,7 +3511,8 @@ static inline int __mm_cid_get(struct rq *rq, struct mm_struct *mm)
 	return cid;
 }
 
-static inline int mm_cid_get(struct rq *rq, struct mm_struct *mm)
+static inline int mm_cid_get(struct rq *rq, struct task_struct *t,
+			     struct mm_struct *mm)
 {
 	struct mm_cid __percpu *pcpu_cid = mm->pcpu_cid;
 	struct cpumask *cpumask;
@@ -3414,7 +3529,7 @@ static inline int mm_cid_get(struct rq *rq, struct mm_struct *mm)
 		if (try_cmpxchg(&this_cpu_ptr(pcpu_cid)->cid, &cid, MM_CID_UNSET))
 			__mm_cid_put(mm, mm_cid_clear_lazy_put(cid));
 	}
-	cid = __mm_cid_get(rq, mm);
+	cid = __mm_cid_get(rq, t, mm);
 	__this_cpu_write(pcpu_cid->cid, cid);
 	return cid;
 }
@@ -3467,7 +3582,7 @@ static inline void switch_mm_cid(struct rq *rq,
 		prev->mm_cid = -1;
 	}
 	if (next->mm_cid_active)
-		next->last_mm_cid = next->mm_cid = mm_cid_get(rq, next->mm);
+		next->last_mm_cid = next->mm_cid = mm_cid_get(rq, next, next->mm);
 }
 
 #else
-- 
2.39.2

From nobody Sun Feb 8 04:13:19 2026
From: Mathieu Desnoyers
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Valentin Schneider,
 Mel Gorman, Steven Rostedt, Vincent Guittot, Dietmar Eggemann,
 Ben Segall, Yury Norov, Rasmus Villemoes, Shuah Khan,
 linux-kselftest@vger.kernel.org
Subject: [RFC PATCH 4/5] selftests/rseq: x86: Implement rseq_load_u32_u32
Date: Mon, 19 Aug 2024 16:24:05 +0200
Message-Id: <20240819142406.339084-5-mathieu.desnoyers@efficios.com>
In-Reply-To: <20240819142406.339084-1-mathieu.desnoyers@efficios.com>
References: <20240819142406.339084-1-mathieu.desnoyers@efficios.com>

Allow loading a pair of u32 values within a single rseq critical
section. It can be used in situations where both rseq_abi()->mm_cid
and rseq_abi()->node_id need to be sampled atomically with respect to
preemption, signal delivery and migration.
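For context, the intended usage pattern (exercised by the selftest
added in patch 5 of this series, and assuming the selftest's rseq.h
helpers) looks roughly like this:

	uint32_t mm_cid, node;

	/*
	 * Sample (mm_cid, node_id) atomically with respect to
	 * preemption, signal delivery and migration; a non-zero
	 * return means the critical section was aborted, so retry.
	 */
	while (rseq_load_u32_u32(RSEQ_MO_RELAXED,
				 &mm_cid, &rseq_get_abi()->mm_cid,
				 &node, &rseq_get_abi()->node_id) != 0) {
		/* Retry. */
	}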
Signed-off-by: Mathieu Desnoyers
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Shuah Khan
Cc: linux-kselftest@vger.kernel.org
Reviewed-by: Shuah Khan
---
 tools/testing/selftests/rseq/rseq-x86-bits.h | 43 ++++++++++++++++++++
 tools/testing/selftests/rseq/rseq.h          | 14 +++++++
 2 files changed, 57 insertions(+)

diff --git a/tools/testing/selftests/rseq/rseq-x86-bits.h b/tools/testing/selftests/rseq/rseq-x86-bits.h
index 8a9431eec467..fdf5ef398393 100644
--- a/tools/testing/selftests/rseq/rseq-x86-bits.h
+++ b/tools/testing/selftests/rseq/rseq-x86-bits.h
@@ -990,4 +990,47 @@ int RSEQ_TEMPLATE_IDENTIFIER(rseq_cmpeqv_trymemcpy_storev)(intptr_t *v, intptr_t
 
 #endif
 
+#if defined(RSEQ_TEMPLATE_CPU_ID_NONE) && defined(RSEQ_TEMPLATE_MO_RELAXED)
+
+#define RSEQ_ARCH_HAS_LOAD_U32_U32
+
+static inline __attribute__((always_inline))
+int RSEQ_TEMPLATE_IDENTIFIER(rseq_load_u32_u32)(uint32_t *dst1, uint32_t *src1,
+						uint32_t *dst2, uint32_t *src2)
+{
+	RSEQ_INJECT_C(9)
+
+	__asm__ __volatile__ goto (
+		RSEQ_ASM_DEFINE_TABLE(3, 1f, 2f, 4f) /* start, commit, abort */
+		/* Start rseq by storing table entry pointer into rseq_cs. */
+		RSEQ_ASM_STORE_RSEQ_CS(1, 3b, RSEQ_ASM_TP_SEGMENT:RSEQ_CS_OFFSET(%[rseq_offset]))
+		RSEQ_INJECT_ASM(3)
+		"movl %[src1], %%eax\n\t"
+		"movl %%eax, %[dst1]\n\t"
+		"movl %[src2], %%eax\n\t"
+		"movl %%eax, %[dst2]\n\t"
+		"2:\n\t"
+		RSEQ_INJECT_ASM(4)
+		RSEQ_ASM_DEFINE_ABORT(4, "", abort)
+		: /* gcc asm goto does not allow outputs */
+		: [rseq_offset]		"r" (rseq_offset),
+		  /* final store input */
+		  [dst1]		"m" (*dst1),
+		  [src1]		"m" (*src1),
+		  [dst2]		"m" (*dst2),
+		  [src2]		"m" (*src2)
+		: "memory", "cc", "rax"
+		  RSEQ_INJECT_CLOBBER
+		: abort
+	);
+	rseq_after_asm_goto();
+	return 0;
+abort:
+	rseq_after_asm_goto();
+	RSEQ_INJECT_FAILED
+	return -1;
+}
+
+#endif /* defined(RSEQ_TEMPLATE_CPU_ID_NONE) && defined(RSEQ_TEMPLATE_MO_RELAXED) */
+
 #include "rseq-bits-reset.h"

diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
index d7364ea4d201..b6095c2a5da6 100644
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -381,4 +381,18 @@ int rseq_cmpeqv_trymemcpy_storev(enum rseq_mo rseq_mo, enum rseq_percpu_mode per
 	}
 }
 
+#ifdef RSEQ_ARCH_HAS_LOAD_U32_U32
+
+static inline __attribute__((always_inline))
+int rseq_load_u32_u32(enum rseq_mo rseq_mo,
+		      uint32_t *dst1, uint32_t *src1,
+		      uint32_t *dst2, uint32_t *src2)
+{
+	if (rseq_mo != RSEQ_MO_RELAXED)
+		return -1;
+	return rseq_load_u32_u32_relaxed(dst1, src1, dst2, src2);
+}
+
+#endif
+
 #endif /* RSEQ_H_ */
-- 
2.39.2

From nobody Sun Feb 8 04:13:19 2026
From: Mathieu Desnoyers
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Valentin Schneider,
 Mel Gorman, Steven Rostedt, Vincent Guittot, Dietmar Eggemann,
 Ben Segall, Yury Norov, Rasmus Villemoes, Shuah Khan,
 linux-kselftest@vger.kernel.org
Subject: [RFC PATCH 5/5] selftests/rseq: Implement NUMA node id vs mm_cid invariant test
Date: Mon, 19 Aug 2024 16:24:06 +0200
Message-Id: <20240819142406.339084-6-mathieu.desnoyers@efficios.com>
In-Reply-To: <20240819142406.339084-1-mathieu.desnoyers@efficios.com>
References: <20240819142406.339084-1-mathieu.desnoyers@efficios.com>

This test validates that the mapping between a mm_cid and a NUMA node
id remains invariant for the process lifetime for a process with a
number of threads >= the number of allowed CPUs. In other words, it
validates that if any thread within the process running on behalf of a
mm_cid N observes a NUMA node id M, all threads within this process
will always observe the same NUMA node id value when running on behalf
of that same mm_cid.

This characteristic is important for NUMA locality.

On all architectures except Power, the NUMA topology is never
reconfigured after a CPU has been associated with a NUMA node in the
system lifetime. Even on Power, we can assume that NUMA topology
reconfigurations happen rarely, and therefore we do not expect one to
happen while the NUMA test is running.

As a result, the NUMA node id associated with a mm_cid should be
invariant as long as:

- A process has a number of threads >= the number of allowed CPUs,
- The allowed CPUs mask is unchanged, and
- The NUMA configuration is unchanged.

This test is skipped on architectures that do not implement
rseq_load_u32_u32.
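Assuming the usual kselftest build flow, the test can then be built and
run roughly as follows (the reported thread count matches the
allowed-CPU count of the invoking shell, so the "8 threads" below is
only an example):

	$ cd tools/testing/selftests/rseq
	$ make
	$ ./basic_numa_test
	testing rseq (mm_cid, numa_node_id) invariant, multi-threaded (8 threads)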
Signed-off-by: Mathieu Desnoyers
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Shuah Khan
Cc: linux-kselftest@vger.kernel.org
Reviewed-by: Shuah Khan
---
 tools/testing/selftests/rseq/.gitignore      |   1 +
 tools/testing/selftests/rseq/Makefile        |   2 +-
 .../testing/selftests/rseq/basic_numa_test.c | 144 ++++++++++++++++++
 3 files changed, 146 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/rseq/basic_numa_test.c

diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 16496de5f6ce..8a8d163cbb9f 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
+basic_numa_test
 basic_percpu_ops_test
 basic_percpu_ops_mm_cid_test
 basic_test

diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb58..9ef1c949114a 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -14,7 +14,7 @@ LDLIBS += -lpthread -ldl
 # still track changes to header files and depend on shared object.
 OVERRIDE_TARGETS = 1
 
-TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
+TEST_GEN_PROGS = basic_test basic_numa_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
 		param_test_benchmark param_test_compare_twice param_test_mm_cid \
 		param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
 

diff --git a/tools/testing/selftests/rseq/basic_numa_test.c b/tools/testing/selftests/rseq/basic_numa_test.c
new file mode 100644
index 000000000000..8e51c662057d
--- /dev/null
+++ b/tools/testing/selftests/rseq/basic_numa_test.c
@@ -0,0 +1,144 @@
+// SPDX-License-Identifier: LGPL-2.1
+/*
+ * Basic rseq NUMA test. Validate that (mm_cid, numa_node_id) pairs are
+ * invariant when the number of threads >= number of allowed CPUs, as
+ * long as those preconditions are respected:
+ *
+ * - A process has a number of threads >= number of allowed CPUs,
+ * - The allowed CPUs mask is unchanged, and
+ * - The NUMA configuration is unchanged.
+ */
+#define _GNU_SOURCE
+#include <sched.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <poll.h>
+#include <pthread.h>
+
+#include "rseq.h"
+
+#define NR_LOOPS 100
+
+static int nr_threads, nr_active_threads, test_go, test_stop;
+
+#ifdef RSEQ_ARCH_HAS_LOAD_U32_U32
+
+static int cpu_numa_id[CPU_SETSIZE];
+
+static int get_affinity_weight(void)
+{
+	cpu_set_t allowed_cpus;
+
+	if (sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus)) {
+		perror("sched_getaffinity");
+		abort();
+	}
+	return CPU_COUNT(&allowed_cpus);
+}
+
+static void numa_id_init(void)
+{
+	int i;
+
+	for (i = 0; i < CPU_SETSIZE; i++)
+		cpu_numa_id[i] = -1;
+}
+
+static void *test_thread(void *arg)
+{
+	int i;
+
+	if (rseq_register_current_thread()) {
+		fprintf(stderr, "Error: rseq_register_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+	/*
+	 * Rendez-vous across all threads to make sure the number of
+	 * threads >= number of allowed CPUs for the entire test duration.
+	 */
+	if (__atomic_add_fetch(&nr_active_threads, 1, __ATOMIC_RELAXED) == nr_threads)
+		__atomic_store_n(&test_go, 1, __ATOMIC_RELAXED);
+	while (!__atomic_load_n(&test_go, __ATOMIC_RELAXED))
+		rseq_barrier();
+
+	for (i = 0; i < NR_LOOPS; i++) {
+		uint32_t mm_cid, node;
+		int cached_node_id;
+
+		while (rseq_load_u32_u32(RSEQ_MO_RELAXED, &mm_cid,
+					 &rseq_get_abi()->mm_cid,
+					 &node, &rseq_get_abi()->node_id) != 0) {
+			/* Retry. */
+		}
+		cached_node_id = RSEQ_READ_ONCE(cpu_numa_id[mm_cid]);
+		if (cached_node_id == -1) {
+			RSEQ_WRITE_ONCE(cpu_numa_id[mm_cid], node);
+		} else {
+			if (node != cached_node_id) {
+				fprintf(stderr, "Error: NUMA node id discrepancy: mm_cid %u cached node id %d node id %u.\n",
+					mm_cid, cached_node_id, node);
+				fprintf(stderr, "This is likely a kernel bug, or caused by a concurrent NUMA topology reconfiguration.\n");
+				abort();
+			}
+		}
+		(void) poll(NULL, 0, 10);	/* wait 10ms */
+	}
+	/*
+	 * Rendez-vous before exiting all threads to make sure the
+	 * number of threads >= number of allowed CPUs for the entire
+	 * test duration.
+	 */
+	if (__atomic_sub_fetch(&nr_active_threads, 1, __ATOMIC_RELAXED) == 0)
+		__atomic_store_n(&test_stop, 1, __ATOMIC_RELAXED);
+	while (!__atomic_load_n(&test_stop, __ATOMIC_RELAXED))
+		rseq_barrier();
+
+	if (rseq_unregister_current_thread()) {
+		fprintf(stderr, "Error: rseq_unregister_current_thread(...) failed(%d): %s\n",
+			errno, strerror(errno));
+		abort();
+	}
+	return NULL;
+}
+
+static int test_numa(void)
+{
+	pthread_t tid[nr_threads];
+	int err, i;
+	void *tret;
+
+	numa_id_init();
+
+	printf("testing rseq (mm_cid, numa_node_id) invariant, multi-threaded (%d threads)\n",
+	       nr_threads);
+
+	for (i = 0; i < nr_threads; i++) {
+		err = pthread_create(&tid[i], NULL, test_thread, NULL);
+		if (err != 0)
+			abort();
+	}
+
+	for (i = 0; i < nr_threads; i++) {
+		err = pthread_join(tid[i], &tret);
+		if (err != 0)
+			abort();
+	}
+
+	return 0;
+}
+#else
+static int test_numa(void)
+{
+	fprintf(stderr, "rseq_load_u32_u32 is not implemented on this architecture. Skipping numa test.\n");
+	return 0;
+}
+#endif
+
+int main(int argc, char **argv)
+{
+	nr_threads = get_affinity_weight();
+	return test_numa();
+}
-- 
2.39.2