[PATCH v3 00/16] Introduce and use generic parity16/32/64 helper

Kuan-Wei Chiu posted 16 patches 11 months, 1 week ago
drivers/fsi/fsi-master-i2cr.c                 | 18 ++-----
.../drm/bridge/synopsys/dw-hdmi-ahb-audio.c   |  8 +--
drivers/input/joystick/grip_mp.c              | 17 +-----
drivers/input/joystick/sidewinder.c           | 24 ++-------
drivers/media/i2c/saa7115.c                   | 12 +----
drivers/media/pci/cx18/cx18-av-vbi.c          | 12 +----
.../media/test-drivers/vivid/vivid-vbi-gen.c  |  8 +--
drivers/mtd/ssfdc.c                           | 20 ++-----
drivers/net/ethernet/netronome/nfp/nfp_asm.c  |  7 +--
drivers/net/ethernet/oa_tc6.c                 | 19 ++-----
.../broadcom/brcm80211/brcmsmac/dma.c         | 16 +-----
drivers/tty/serial/max3100.c                  |  3 +-
include/linux/bitops.h                        | 52 +++++++++++++++++--
lib/bch.c                                     | 14 +----
14 files changed, 77 insertions(+), 153 deletions(-)
[PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Kuan-Wei Chiu 11 months, 1 week ago
Several parts of the kernel contain redundant implementations of parity
calculations for 16/32/64-bit values. Introduces generic
parity16/32/64() helpers in bitops.h, providing a standardized
and optimized implementation. 

Subsequent patches refactor various kernel components to replace
open-coded parity calculations with the new helpers, reducing code
duplication and improving maintainability.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
In v3, I use parityXX() instead of the parity() macro since the
parity() macro may generate suboptimal code and requires special hacks
to make GCC happy. If anyone still prefers a single parity() macro,
please let me know.

Additionally, I changed parityXX() << y users to !!parityXX() << y
because, unlike C++, C does not guarantee that true casts to int as 1.

Changes in v3:
- Avoid using __builtin_parity.
- Change return type to bool.
- Drop parity() macro.
- Change parityXX() << y to !!parityXX() << y.


Changes in v2:
- Provide fallback functions for __builtin_parity() when the compiler
  decides not to inline it
- Use __builtin_parity() when no architecture-specific implementation
  is available
- Optimize for constant folding when val is a compile-time constant
- Add a generic parity() macro
- Drop the x86 bootflag conversion patch since it has been merged into
  the tip tree

v1: https://lore.kernel.org/lkml/20250223164217.2139331-1-visitorckw@gmail.com/
v2: https://lore.kernel.org/lkml/20250301142409.2513835-1-visitorckw@gmail.com/

Kuan-Wei Chiu (16):
  bitops: Change parity8() return type to bool
  bitops: Add parity16(), parity32(), and parity64() helpers
  media: media/test_drivers: Replace open-coded parity calculation with
    parity8()
  media: pci: cx18-av-vbi: Replace open-coded parity calculation with
    parity8()
  media: saa7115: Replace open-coded parity calculation with parity8()
  serial: max3100: Replace open-coded parity calculation with parity8()
  lib/bch: Replace open-coded parity calculation with parity32()
  Input: joystick - Replace open-coded parity calculation with
    parity32()
  net: ethernet: oa_tc6: Replace open-coded parity calculation with
    parity32()
  wifi: brcm80211: Replace open-coded parity calculation with parity32()
  drm/bridge: dw-hdmi: Replace open-coded parity calculation with
    parity32()
  mtd: ssfdc: Replace open-coded parity calculation with parity32()
  fsi: i2cr: Replace open-coded parity calculation with parity32()
  fsi: i2cr: Replace open-coded parity calculation with parity64()
  Input: joystick - Replace open-coded parity calculation with
    parity64()
  nfp: bpf: Replace open-coded parity calculation with parity64()

 drivers/fsi/fsi-master-i2cr.c                 | 18 ++-----
 .../drm/bridge/synopsys/dw-hdmi-ahb-audio.c   |  8 +--
 drivers/input/joystick/grip_mp.c              | 17 +-----
 drivers/input/joystick/sidewinder.c           | 24 ++-------
 drivers/media/i2c/saa7115.c                   | 12 +----
 drivers/media/pci/cx18/cx18-av-vbi.c          | 12 +----
 .../media/test-drivers/vivid/vivid-vbi-gen.c  |  8 +--
 drivers/mtd/ssfdc.c                           | 20 ++-----
 drivers/net/ethernet/netronome/nfp/nfp_asm.c  |  7 +--
 drivers/net/ethernet/oa_tc6.c                 | 19 ++-----
 .../broadcom/brcm80211/brcmsmac/dma.c         | 16 +-----
 drivers/tty/serial/max3100.c                  |  3 +-
 include/linux/bitops.h                        | 52 +++++++++++++++++--
 lib/bch.c                                     | 14 +----
 14 files changed, 77 insertions(+), 153 deletions(-)

-- 
2.34.1
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by H. Peter Anvin 11 months, 1 week ago
On March 6, 2025 8:25:25 AM PST, Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
>Several parts of the kernel contain redundant implementations of parity
>calculations for 16/32/64-bit values. Introduces generic
>parity16/32/64() helpers in bitops.h, providing a standardized
>and optimized implementation. 
>
>Subsequent patches refactor various kernel components to replace
>open-coded parity calculations with the new helpers, reducing code
>duplication and improving maintainability.
>
>Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
>Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
>Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
>---
>In v3, I use parityXX() instead of the parity() macro since the
>parity() macro may generate suboptimal code and requires special hacks
>to make GCC happy. If anyone still prefers a single parity() macro,
>please let me know.
>
>Additionally, I changed parityXX() << y users to !!parityXX() << y
>because, unlike C++, C does not guarantee that true casts to int as 1.
>
>Changes in v3:
>- Avoid using __builtin_parity.
>- Change return type to bool.
>- Drop parity() macro.
>- Change parityXX() << y to !!parityXX() << y.
>
>
>Changes in v2:
>- Provide fallback functions for __builtin_parity() when the compiler
>  decides not to inline it
>- Use __builtin_parity() when no architecture-specific implementation
>  is available
>- Optimize for constant folding when val is a compile-time constant
>- Add a generic parity() macro
>- Drop the x86 bootflag conversion patch since it has been merged into
>  the tip tree
>
>v1: https://lore.kernel.org/lkml/20250223164217.2139331-1-visitorckw@gmail.com/
>v2: https://lore.kernel.org/lkml/20250301142409.2513835-1-visitorckw@gmail.com/
>
>Kuan-Wei Chiu (16):
>  bitops: Change parity8() return type to bool
>  bitops: Add parity16(), parity32(), and parity64() helpers
>  media: media/test_drivers: Replace open-coded parity calculation with
>    parity8()
>  media: pci: cx18-av-vbi: Replace open-coded parity calculation with
>    parity8()
>  media: saa7115: Replace open-coded parity calculation with parity8()
>  serial: max3100: Replace open-coded parity calculation with parity8()
>  lib/bch: Replace open-coded parity calculation with parity32()
>  Input: joystick - Replace open-coded parity calculation with
>    parity32()
>  net: ethernet: oa_tc6: Replace open-coded parity calculation with
>    parity32()
>  wifi: brcm80211: Replace open-coded parity calculation with parity32()
>  drm/bridge: dw-hdmi: Replace open-coded parity calculation with
>    parity32()
>  mtd: ssfdc: Replace open-coded parity calculation with parity32()
>  fsi: i2cr: Replace open-coded parity calculation with parity32()
>  fsi: i2cr: Replace open-coded parity calculation with parity64()
>  Input: joystick - Replace open-coded parity calculation with
>    parity64()
>  nfp: bpf: Replace open-coded parity calculation with parity64()
>
> drivers/fsi/fsi-master-i2cr.c                 | 18 ++-----
> .../drm/bridge/synopsys/dw-hdmi-ahb-audio.c   |  8 +--
> drivers/input/joystick/grip_mp.c              | 17 +-----
> drivers/input/joystick/sidewinder.c           | 24 ++-------
> drivers/media/i2c/saa7115.c                   | 12 +----
> drivers/media/pci/cx18/cx18-av-vbi.c          | 12 +----
> .../media/test-drivers/vivid/vivid-vbi-gen.c  |  8 +--
> drivers/mtd/ssfdc.c                           | 20 ++-----
> drivers/net/ethernet/netronome/nfp/nfp_asm.c  |  7 +--
> drivers/net/ethernet/oa_tc6.c                 | 19 ++-----
> .../broadcom/brcm80211/brcmsmac/dma.c         | 16 +-----
> drivers/tty/serial/max3100.c                  |  3 +-
> include/linux/bitops.h                        | 52 +++++++++++++++++--
> lib/bch.c                                     | 14 +----
> 14 files changed, 77 insertions(+), 153 deletions(-)
>

!!x is used with a value that is not necessary booleanized already, and is exactly equivalent to (x ? true : false). It is totally redundant on a value known to be bool.

If (int)true wasn't inherently 1, then !! wouldn't work either. 

There was a time when some code would use as a temporary hack: 

typedef enum { false, true } bool;

... when compiling on pre-C99 compilers; in that case a (bool) case wouldn't necessarily work as expected, whereas !! would. Furthermore, unlike (bool), !! works in the preprocessor.
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Kuan-Wei Chiu 11 months ago
+Cc Waiman Long for bool cast to int discussion

Hi Peter,

On Thu, Mar 06, 2025 at 07:14:13PM -0800, H. Peter Anvin wrote:
> On March 6, 2025 8:25:25 AM PST, Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> >Several parts of the kernel contain redundant implementations of parity
> >calculations for 16/32/64-bit values. Introduces generic
> >parity16/32/64() helpers in bitops.h, providing a standardized
> >and optimized implementation. 
> >
> >Subsequent patches refactor various kernel components to replace
> >open-coded parity calculations with the new helpers, reducing code
> >duplication and improving maintainability.
> >
> >Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> >Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> >Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> >---
> >In v3, I use parityXX() instead of the parity() macro since the
> >parity() macro may generate suboptimal code and requires special hacks
> >to make GCC happy. If anyone still prefers a single parity() macro,
> >please let me know.
> >
> >Additionally, I changed parityXX() << y users to !!parityXX() << y
> >because, unlike C++, C does not guarantee that true casts to int as 1.
> >
> >Changes in v3:
> >- Avoid using __builtin_parity.
> >- Change return type to bool.
> >- Drop parity() macro.
> >- Change parityXX() << y to !!parityXX() << y.
> >
> >
> >Changes in v2:
> >- Provide fallback functions for __builtin_parity() when the compiler
> >  decides not to inline it
> >- Use __builtin_parity() when no architecture-specific implementation
> >  is available
> >- Optimize for constant folding when val is a compile-time constant
> >- Add a generic parity() macro
> >- Drop the x86 bootflag conversion patch since it has been merged into
> >  the tip tree
> >
> >v1: https://lore.kernel.org/lkml/20250223164217.2139331-1-visitorckw@gmail.com/
> >v2: https://lore.kernel.org/lkml/20250301142409.2513835-1-visitorckw@gmail.com/
> >
> >Kuan-Wei Chiu (16):
> >  bitops: Change parity8() return type to bool
> >  bitops: Add parity16(), parity32(), and parity64() helpers
> >  media: media/test_drivers: Replace open-coded parity calculation with
> >    parity8()
> >  media: pci: cx18-av-vbi: Replace open-coded parity calculation with
> >    parity8()
> >  media: saa7115: Replace open-coded parity calculation with parity8()
> >  serial: max3100: Replace open-coded parity calculation with parity8()
> >  lib/bch: Replace open-coded parity calculation with parity32()
> >  Input: joystick - Replace open-coded parity calculation with
> >    parity32()
> >  net: ethernet: oa_tc6: Replace open-coded parity calculation with
> >    parity32()
> >  wifi: brcm80211: Replace open-coded parity calculation with parity32()
> >  drm/bridge: dw-hdmi: Replace open-coded parity calculation with
> >    parity32()
> >  mtd: ssfdc: Replace open-coded parity calculation with parity32()
> >  fsi: i2cr: Replace open-coded parity calculation with parity32()
> >  fsi: i2cr: Replace open-coded parity calculation with parity64()
> >  Input: joystick - Replace open-coded parity calculation with
> >    parity64()
> >  nfp: bpf: Replace open-coded parity calculation with parity64()
> >
> > drivers/fsi/fsi-master-i2cr.c                 | 18 ++-----
> > .../drm/bridge/synopsys/dw-hdmi-ahb-audio.c   |  8 +--
> > drivers/input/joystick/grip_mp.c              | 17 +-----
> > drivers/input/joystick/sidewinder.c           | 24 ++-------
> > drivers/media/i2c/saa7115.c                   | 12 +----
> > drivers/media/pci/cx18/cx18-av-vbi.c          | 12 +----
> > .../media/test-drivers/vivid/vivid-vbi-gen.c  |  8 +--
> > drivers/mtd/ssfdc.c                           | 20 ++-----
> > drivers/net/ethernet/netronome/nfp/nfp_asm.c  |  7 +--
> > drivers/net/ethernet/oa_tc6.c                 | 19 ++-----
> > .../broadcom/brcm80211/brcmsmac/dma.c         | 16 +-----
> > drivers/tty/serial/max3100.c                  |  3 +-
> > include/linux/bitops.h                        | 52 +++++++++++++++++--
> > lib/bch.c                                     | 14 +----
> > 14 files changed, 77 insertions(+), 153 deletions(-)
> >
> 
> !!x is used with a value that is not necessary booleanized already, and is exactly equivalent to (x ? true : false). It is totally redundant on a value known to be bool.
> 
I used to believe that casting a boolean variable to int would always
result in 0 or 1 until a few months ago when Waiman Long explicitly
pointed out during a review that C does not guarantee this.

So I revisited the C11 standard, which states that casting to _Bool
always results in 0 or 1 [1]. Another section specifies that bool,
true, and false are macros defined in <stdbool.h>, with true expanding
to 1 and false to 0. However, these macros can be #undef and redefined
to other values [2]. I'm not sure if this is sufficient to conclude
that casting bool to int will always result in 0 or 1, but if the
consensus is that it does, I'll remove the !! hack in the next version.

> If (int)true wasn't inherently 1, then !! wouldn't work either. 
> 
The C standard guarantees that the ! operator returns an int, either 0
or 1. So regardless of how true casts, using !! should work. Right?

> There was a time when some code would use as a temporary hack: 
> 
> typedef enum { false, true } bool;
> 
> ... when compiling on pre-C99 compilers; in that case a (bool) case wouldn't necessarily work as expected, whereas !! would. Furthermore, unlike (bool), !! works in the preprocessor.

I'm not entirely sure how !! works in the preprocessor. I always
thought it was handled by the compiler. Could you elaborate on this?

Regards,
Kuan-Wei

[1]: 6.3.1.2 Boolean type
1 When any scalar value is converted to _Bool, the result is 0 if the value compares equal
to 0; otherwise, the result is 1.59)

[2]: 7.18 Boolean type and values <stdbool.h>
1 The header <stdbool.h> defines four macros.
2 The macro
bool
expands to _Bool.
3 The remaining three macros are suitable for use in #if preprocessing directives. They
are
true
which expands to the integer constant 1,
false
which expands to the integer constant 0, and
_ _bool_true_false_are_defined
which expands to the integer constant 1.
4 Notwithstanding the provisions of 7.1.3, a program may undefine and perhaps then
redefine the macros bool, true, and false.
259)
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Jiri Slaby 11 months ago
On 07. 03. 25, 10:19, Kuan-Wei Chiu wrote:
> I used to believe that casting a boolean variable to int would always
> result in 0 or 1 until a few months ago when Waiman Long explicitly
> pointed out during a review that C does not guarantee this.
> 
> So I revisited the C11 standard, which states that casting to _Bool
> always results in 0 or 1 [1]. Another section specifies that bool,
> true, and false are macros defined in <stdbool.h>, with true expanding
> to 1 and false to 0. However, these macros can be #undef and redefined
> to other values [2].

Note that we do not have/use user's stdbool.h in kernel at all. Instead, 
in linux/stddef.h, we define:
enum {
         false   = 0,
         true    = 1
};

So all is blue.

thanks,
-- 
js
suse labs
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by H. Peter Anvin 11 months, 1 week ago
On March 6, 2025 8:25:25 AM PST, Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
>Several parts of the kernel contain redundant implementations of parity
>calculations for 16/32/64-bit values. Introduces generic
>parity16/32/64() helpers in bitops.h, providing a standardized
>and optimized implementation. 
>
>Subsequent patches refactor various kernel components to replace
>open-coded parity calculations with the new helpers, reducing code
>duplication and improving maintainability.
>
>Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
>Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
>Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
>---
>In v3, I use parityXX() instead of the parity() macro since the
>parity() macro may generate suboptimal code and requires special hacks
>to make GCC happy. If anyone still prefers a single parity() macro,
>please let me know.
>
>Additionally, I changed parityXX() << y users to !!parityXX() << y
>because, unlike C++, C does not guarantee that true casts to int as 1.
>
>Changes in v3:
>- Avoid using __builtin_parity.
>- Change return type to bool.
>- Drop parity() macro.
>- Change parityXX() << y to !!parityXX() << y.
>
>
>Changes in v2:
>- Provide fallback functions for __builtin_parity() when the compiler
>  decides not to inline it
>- Use __builtin_parity() when no architecture-specific implementation
>  is available
>- Optimize for constant folding when val is a compile-time constant
>- Add a generic parity() macro
>- Drop the x86 bootflag conversion patch since it has been merged into
>  the tip tree
>
>v1: https://lore.kernel.org/lkml/20250223164217.2139331-1-visitorckw@gmail.com/
>v2: https://lore.kernel.org/lkml/20250301142409.2513835-1-visitorckw@gmail.com/
>
>Kuan-Wei Chiu (16):
>  bitops: Change parity8() return type to bool
>  bitops: Add parity16(), parity32(), and parity64() helpers
>  media: media/test_drivers: Replace open-coded parity calculation with
>    parity8()
>  media: pci: cx18-av-vbi: Replace open-coded parity calculation with
>    parity8()
>  media: saa7115: Replace open-coded parity calculation with parity8()
>  serial: max3100: Replace open-coded parity calculation with parity8()
>  lib/bch: Replace open-coded parity calculation with parity32()
>  Input: joystick - Replace open-coded parity calculation with
>    parity32()
>  net: ethernet: oa_tc6: Replace open-coded parity calculation with
>    parity32()
>  wifi: brcm80211: Replace open-coded parity calculation with parity32()
>  drm/bridge: dw-hdmi: Replace open-coded parity calculation with
>    parity32()
>  mtd: ssfdc: Replace open-coded parity calculation with parity32()
>  fsi: i2cr: Replace open-coded parity calculation with parity32()
>  fsi: i2cr: Replace open-coded parity calculation with parity64()
>  Input: joystick - Replace open-coded parity calculation with
>    parity64()
>  nfp: bpf: Replace open-coded parity calculation with parity64()
>
> drivers/fsi/fsi-master-i2cr.c                 | 18 ++-----
> .../drm/bridge/synopsys/dw-hdmi-ahb-audio.c   |  8 +--
> drivers/input/joystick/grip_mp.c              | 17 +-----
> drivers/input/joystick/sidewinder.c           | 24 ++-------
> drivers/media/i2c/saa7115.c                   | 12 +----
> drivers/media/pci/cx18/cx18-av-vbi.c          | 12 +----
> .../media/test-drivers/vivid/vivid-vbi-gen.c  |  8 +--
> drivers/mtd/ssfdc.c                           | 20 ++-----
> drivers/net/ethernet/netronome/nfp/nfp_asm.c  |  7 +--
> drivers/net/ethernet/oa_tc6.c                 | 19 ++-----
> .../broadcom/brcm80211/brcmsmac/dma.c         | 16 +-----
> drivers/tty/serial/max3100.c                  |  3 +-
> include/linux/bitops.h                        | 52 +++++++++++++++++--
> lib/bch.c                                     | 14 +----
> 14 files changed, 77 insertions(+), 153 deletions(-)
>

(int)true most definitely is guaranteed to be 1.
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Andrew Cooper 11 months ago
> (int)true most definitely is guaranteed to be 1.

That's not technically correct any more.

GCC has introduced hardened bools that intentionally have bit patterns
other than 0 and 1.

https://gcc.gnu.org/gcc-14/changes.html

~Andrew
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by H. Peter Anvin 11 months ago
On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> (int)true most definitely is guaranteed to be 1.
>
>That's not technically correct any more.
>
>GCC has introduced hardened bools that intentionally have bit patterns
>other than 0 and 1.
>
>https://gcc.gnu.org/gcc-14/changes.html
>
>~Andrew

Bit patterns in memory maybe (not that I can see the Linux kernel using them) but for compiler-generated conversations that's still a given, or the manager isn't C or anything even remotely like it.
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by David Laight 11 months ago
On Fri, 07 Mar 2025 11:30:35 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >> (int)true most definitely is guaranteed to be 1.  
> >
> >That's not technically correct any more.
> >
> >GCC has introduced hardened bools that intentionally have bit patterns
> >other than 0 and 1.
> >
> >https://gcc.gnu.org/gcc-14/changes.html
> >
> >~Andrew  
> 
> Bit patterns in memory maybe (not that I can see the Linux kernel using them) but
> for compiler-generated conversations that's still a given, or the manager isn't C
> or anything even remotely like it.
> 

The whole idea of 'bool' is pretty much broken by design.
The underlying problem is that values other than 'true' and 'false' can
always get into 'bool' variables.

Once that has happened it is all fubar.

Trying to sanitise a value with (say):
int f(bool v)
{
	return (int)v & 1;
}    
just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)

I really don't see how using (say) 0xaa and 0x55 helps.
What happens if the value is wrong? a trap or exception?, good luck recovering
from that.

	David
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by H. Peter Anvin 11 months ago
On March 7, 2025 11:53:10 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
>On Fri, 07 Mar 2025 11:30:35 -0800
>"H. Peter Anvin" <hpa@zytor.com> wrote:
>
>> On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> >> (int)true most definitely is guaranteed to be 1.  
>> >
>> >That's not technically correct any more.
>> >
>> >GCC has introduced hardened bools that intentionally have bit patterns
>> >other than 0 and 1.
>> >
>> >https://gcc.gnu.org/gcc-14/changes.html
>> >
>> >~Andrew  
>> 
>> Bit patterns in memory maybe (not that I can see the Linux kernel using them) but
>> for compiler-generated conversations that's still a given, or the manager isn't C
>> or anything even remotely like it.
>> 
>
>The whole idea of 'bool' is pretty much broken by design.
>The underlying problem is that values other than 'true' and 'false' can
>always get into 'bool' variables.
>
>Once that has happened it is all fubar.
>
>Trying to sanitise a value with (say):
>int f(bool v)
>{
>	return (int)v & 1;
>}    
>just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
>
>I really don't see how using (say) 0xaa and 0x55 helps.
>What happens if the value is wrong? a trap or exception?, good luck recovering
>from that.
>
>	David

Did you just discover GIGO?
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Kuan-Wei Chiu 11 months ago
On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
> On March 7, 2025 11:53:10 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
> >On Fri, 07 Mar 2025 11:30:35 -0800
> >"H. Peter Anvin" <hpa@zytor.com> wrote:
> >
> >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >> >> (int)true most definitely is guaranteed to be 1.  
> >> >
> >> >That's not technically correct any more.
> >> >
> >> >GCC has introduced hardened bools that intentionally have bit patterns
> >> >other than 0 and 1.
> >> >
> >> >https://gcc.gnu.org/gcc-14/changes.html
> >> >
> >> >~Andrew  
> >> 
> >> Bit patterns in memory maybe (not that I can see the Linux kernel using them) but
> >> for compiler-generated conversations that's still a given, or the manager isn't C
> >> or anything even remotely like it.
> >> 
> >
> >The whole idea of 'bool' is pretty much broken by design.
> >The underlying problem is that values other than 'true' and 'false' can
> >always get into 'bool' variables.
> >
> >Once that has happened it is all fubar.
> >
> >Trying to sanitise a value with (say):
> >int f(bool v)
> >{
> >	return (int)v & 1;
> >}    
> >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
> >
> >I really don't see how using (say) 0xaa and 0x55 helps.
> >What happens if the value is wrong? a trap or exception?, good luck recovering
> >from that.
> >
> >	David
> 
> Did you just discover GIGO?

Thanks for all the suggestions.

I don't have a strong opinion on the naming or return type. I'm still a
bit confused about whether I can assume that casting bool to int always
results in 0 or 1.

If that's the case, since most people prefer bool over int as the
return type and some are against introducing u1, my current plan is to
use the following in the next version:

bool parity_odd(u64 val);

This keeps the bool return type, renames the function for better
clarity, and avoids extra maintenance burden by having just one
function.

If I can't assume that casting bool to int always results in 0 or 1,
would it be acceptable to keep the return type as int?

Would this work for everyone?

Regards,
Kuan-Wei
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Yury Norov 11 months ago
On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote:
> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
> > On March 7, 2025 11:53:10 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
> > >On Fri, 07 Mar 2025 11:30:35 -0800
> > >"H. Peter Anvin" <hpa@zytor.com> wrote:
> > >
> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > >> >> (int)true most definitely is guaranteed to be 1.  
> > >> >
> > >> >That's not technically correct any more.
> > >> >
> > >> >GCC has introduced hardened bools that intentionally have bit patterns
> > >> >other than 0 and 1.
> > >> >
> > >> >https://gcc.gnu.org/gcc-14/changes.html
> > >> >
> > >> >~Andrew  
> > >> 
> > >> Bit patterns in memory maybe (not that I can see the Linux kernel using them) but
> > >> for compiler-generated conversations that's still a given, or the manager isn't C
> > >> or anything even remotely like it.
> > >> 
> > >
> > >The whole idea of 'bool' is pretty much broken by design.
> > >The underlying problem is that values other than 'true' and 'false' can
> > >always get into 'bool' variables.
> > >
> > >Once that has happened it is all fubar.
> > >
> > >Trying to sanitise a value with (say):
> > >int f(bool v)
> > >{
> > >	return (int)v & 1;
> > >}    
> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
> > >
> > >I really don't see how using (say) 0xaa and 0x55 helps.
> > >What happens if the value is wrong? a trap or exception?, good luck recovering
> > >from that.
> > >
> > >	David
> > 
> > Did you just discover GIGO?
> 
> Thanks for all the suggestions.
> 
> I don't have a strong opinion on the naming or return type. I'm still a
> bit confused about whether I can assume that casting bool to int always
> results in 0 or 1.
> 
> If that's the case, since most people prefer bool over int as the
> return type and some are against introducing u1, my current plan is to
> use the following in the next version:
> 
> bool parity_odd(u64 val);
> 
> This keeps the bool return type, renames the function for better
> clarity, and avoids extra maintenance burden by having just one
> function.
> 
> If I can't assume that casting bool to int always results in 0 or 1,
> would it be acceptable to keep the return type as int?
> 
> Would this work for everyone?

Alright, it's clearly a split opinion. So what I would do myself in
such case is to look at existing code and see what people who really
need parity invent in their drivers:

                                     bool      parity_odd
static inline int parity8(u8 val)       -               -
static u8 calc_parity(u8 val)           -               -
static int odd_parity(u8 c)             -               +
static int saa711x_odd_parity           -               +
static int max3100_do_parity            -               -
static inline int parity(unsigned x)    -               -
static int bit_parity(u32 pkt)          -               -
static int oa_tc6_get_parity(u32 p)     -               -
static u32 parity32(__le32 data)        -               -
static u32 parity(u32 sample)           -               -
static int get_parity(int number,       -               -
                      int size)
static bool i2cr_check_parity32(u32 v,  +               -
                        bool parity)
static bool i2cr_check_parity64(u64 v)  +               -
static int sw_parity(__u64 t)           -               -
static bool parity(u64 value)           +               -

Now you can refer to that table say that int parity(uXX) is what
people want to see in their drivers.

Whichever interface you choose, please discuss it's pros and cons.
What bloat-o-meter says for each option? What's maintenance burden?
Perf test? Look at generated code?

I personally for a macro returning boolean, something like I
proposed at the very beginning.

Thanks,
Yury
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by H. Peter Anvin 11 months ago
On March 11, 2025 3:01:30 PM PDT, Yury Norov <yury.norov@gmail.com> wrote:
>On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote:
>> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
>> > On March 7, 2025 11:53:10 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
>> > >On Fri, 07 Mar 2025 11:30:35 -0800
>> > >"H. Peter Anvin" <hpa@zytor.com> wrote:
>> > >
>> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> > >> >> (int)true most definitely is guaranteed to be 1.  
>> > >> >
>> > >> >That's not technically correct any more.
>> > >> >
>> > >> >GCC has introduced hardened bools that intentionally have bit patterns
>> > >> >other than 0 and 1.
>> > >> >
>> > >> >https://gcc.gnu.org/gcc-14/changes.html
>> > >> >
>> > >> >~Andrew  
>> > >> 
>> > >> Bit patterns in memory maybe (not that I can see the Linux kernel using them) but
>> > >> for compiler-generated conversations that's still a given, or the manager isn't C
>> > >> or anything even remotely like it.
>> > >> 
>> > >
>> > >The whole idea of 'bool' is pretty much broken by design.
>> > >The underlying problem is that values other than 'true' and 'false' can
>> > >always get into 'bool' variables.
>> > >
>> > >Once that has happened it is all fubar.
>> > >
>> > >Trying to sanitise a value with (say):
>> > >int f(bool v)
>> > >{
>> > >	return (int)v & 1;
>> > >}    
>> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
>> > >
>> > >I really don't see how using (say) 0xaa and 0x55 helps.
>> > >What happens if the value is wrong? a trap or exception?, good luck recovering
>> > >from that.
>> > >
>> > >	David
>> > 
>> > Did you just discover GIGO?
>> 
>> Thanks for all the suggestions.
>> 
>> I don't have a strong opinion on the naming or return type. I'm still a
>> bit confused about whether I can assume that casting bool to int always
>> results in 0 or 1.
>> 
>> If that's the case, since most people prefer bool over int as the
>> return type and some are against introducing u1, my current plan is to
>> use the following in the next version:
>> 
>> bool parity_odd(u64 val);
>> 
>> This keeps the bool return type, renames the function for better
>> clarity, and avoids extra maintenance burden by having just one
>> function.
>> 
>> If I can't assume that casting bool to int always results in 0 or 1,
>> would it be acceptable to keep the return type as int?
>> 
>> Would this work for everyone?
>
>Alright, it's clearly a split opinion. So what I would do myself in
>such case is to look at existing code and see what people who really
>need parity invent in their drivers:
>
>                                     bool      parity_odd
>static inline int parity8(u8 val)       -               -
>static u8 calc_parity(u8 val)           -               -
>static int odd_parity(u8 c)             -               +
>static int saa711x_odd_parity           -               +
>static int max3100_do_parity            -               -
>static inline int parity(unsigned x)    -               -
>static int bit_parity(u32 pkt)          -               -
>static int oa_tc6_get_parity(u32 p)     -               -
>static u32 parity32(__le32 data)        -               -
>static u32 parity(u32 sample)           -               -
>static int get_parity(int number,       -               -
>                      int size)
>static bool i2cr_check_parity32(u32 v,  +               -
>                        bool parity)
>static bool i2cr_check_parity64(u64 v)  +               -
>static int sw_parity(__u64 t)           -               -
>static bool parity(u64 value)           +               -
>
>Now you can refer to that table say that int parity(uXX) is what
>people want to see in their drivers.
>
>Whichever interface you choose, please discuss it's pros and cons.
>What bloat-o-meter says for each option? What's maintenance burden?
>Perf test? Look at generated code?
>
>I personally for a macro returning boolean, something like I
>proposed at the very beginning.
>
>Thanks,
>Yury

Also, please at least provide a way for an arch to opt in to using the builtins, which seem to produce as good results or better at least on some architectures like x86 and probably with CPU options that imply fast popcnt is available.
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Yury Norov 11 months ago
On Tue, Mar 11, 2025 at 03:24:14PM -0700, H. Peter Anvin wrote:
> On March 11, 2025 3:01:30 PM PDT, Yury Norov <yury.norov@gmail.com> wrote:
> >On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote:
> >> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
> >> > On March 7, 2025 11:53:10 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
> >> > >On Fri, 07 Mar 2025 11:30:35 -0800
> >> > >"H. Peter Anvin" <hpa@zytor.com> wrote:
> >> > >
> >> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >> > >> >> (int)true most definitely is guaranteed to be 1.  
> >> > >> >
> >> > >> >That's not technically correct any more.
> >> > >> >
> >> > >> >GCC has introduced hardened bools that intentionally have bit patterns
> >> > >> >other than 0 and 1.
> >> > >> >
> >> > >> >https://gcc.gnu.org/gcc-14/changes.html
> >> > >> >
> >> > >> >~Andrew  
> >> > >> 
> >> > >> Bit patterns in memory maybe (not that I can see the Linux kernel using them) but
> >> > >> for compiler-generated conversations that's still a given, or the manager isn't C
> >> > >> or anything even remotely like it.
> >> > >> 
> >> > >
> >> > >The whole idea of 'bool' is pretty much broken by design.
> >> > >The underlying problem is that values other than 'true' and 'false' can
> >> > >always get into 'bool' variables.
> >> > >
> >> > >Once that has happened it is all fubar.
> >> > >
> >> > >Trying to sanitise a value with (say):
> >> > >int f(bool v)
> >> > >{
> >> > >	return (int)v & 1;
> >> > >}    
> >> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
> >> > >
> >> > >I really don't see how using (say) 0xaa and 0x55 helps.
> >> > >What happens if the value is wrong? a trap or exception?, good luck recovering
> >> > >from that.
> >> > >
> >> > >	David
> >> > 
> >> > Did you just discover GIGO?
> >> 
> >> Thanks for all the suggestions.
> >> 
> >> I don't have a strong opinion on the naming or return type. I'm still a
> >> bit confused about whether I can assume that casting bool to int always
> >> results in 0 or 1.
> >> 
> >> If that's the case, since most people prefer bool over int as the
> >> return type and some are against introducing u1, my current plan is to
> >> use the following in the next version:
> >> 
> >> bool parity_odd(u64 val);
> >> 
> >> This keeps the bool return type, renames the function for better
> >> clarity, and avoids extra maintenance burden by having just one
> >> function.
> >> 
> >> If I can't assume that casting bool to int always results in 0 or 1,
> >> would it be acceptable to keep the return type as int?
> >> 
> >> Would this work for everyone?
> >
> >Alright, it's clearly a split opinion. So what I would do myself in
> >such case is to look at existing code and see what people who really
> >need parity invent in their drivers:
> >
> >                                     bool      parity_odd
> >static inline int parity8(u8 val)       -               -
> >static u8 calc_parity(u8 val)           -               -
> >static int odd_parity(u8 c)             -               +
> >static int saa711x_odd_parity           -               +
> >static int max3100_do_parity            -               -
> >static inline int parity(unsigned x)    -               -
> >static int bit_parity(u32 pkt)          -               -
> >static int oa_tc6_get_parity(u32 p)     -               -
> >static u32 parity32(__le32 data)        -               -
> >static u32 parity(u32 sample)           -               -
> >static int get_parity(int number,       -               -
> >                      int size)
> >static bool i2cr_check_parity32(u32 v,  +               -
> >                        bool parity)
> >static bool i2cr_check_parity64(u64 v)  +               -
> >static int sw_parity(__u64 t)           -               -
> >static bool parity(u64 value)           +               -
> >
> >Now you can refer to that table say that int parity(uXX) is what
> >people want to see in their drivers.
> >
> >Whichever interface you choose, please discuss it's pros and cons.
> >What bloat-o-meter says for each option? What's maintenance burden?
> >Perf test? Look at generated code?
> >
> >I personally for a macro returning boolean, something like I
> >proposed at the very beginning.
> >
> >Thanks,
> >Yury
> 
> Also, please at least provide a way for an arch to opt in to using the builtins, which seem to produce as good results or better at least on some architectures like x86 and probably with CPU options that imply fast popcnt is available.

Yeah. And because linux/bitops.h already includes asm/bitops.h
the simplest way would be wrapping generic implementation with
the #ifndef parity, similarly to how we handle find_next_bit case.

So:
1. Kuan-Wei, please don't invent something like ARCH_HAS_PARITY;
2. This may, and probably should, be a separate follow-up series,
   likely created by corresponding arch experts.

Thanks,
Yury
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Kuan-Wei Chiu 11 months ago
On Wed, Mar 12, 2025 at 11:51:12AM -0400, Yury Norov wrote:
> On Tue, Mar 11, 2025 at 03:24:14PM -0700, H. Peter Anvin wrote:
> > On March 11, 2025 3:01:30 PM PDT, Yury Norov <yury.norov@gmail.com> wrote:
> > >On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote:
> > >> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
> > >> > On March 7, 2025 11:53:10 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
> > >> > >On Fri, 07 Mar 2025 11:30:35 -0800
> > >> > >"H. Peter Anvin" <hpa@zytor.com> wrote:
> > >> > >
> > >> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > >> > >> >> (int)true most definitely is guaranteed to be 1.  
> > >> > >> >
> > >> > >> >That's not technically correct any more.
> > >> > >> >
> > >> > >> >GCC has introduced hardened bools that intentionally have bit patterns
> > >> > >> >other than 0 and 1.
> > >> > >> >
> > >> > >> >https://gcc.gnu.org/gcc-14/changes.html
> > >> > >> >
> > >> > >> >~Andrew  
> > >> > >> 
> > >> > >> Bit patterns in memory maybe (not that I can see the Linux kernel using them) but
> > >> > >> for compiler-generated conversations that's still a given, or the manager isn't C
> > >> > >> or anything even remotely like it.
> > >> > >> 
> > >> > >
> > >> > >The whole idea of 'bool' is pretty much broken by design.
> > >> > >The underlying problem is that values other than 'true' and 'false' can
> > >> > >always get into 'bool' variables.
> > >> > >
> > >> > >Once that has happened it is all fubar.
> > >> > >
> > >> > >Trying to sanitise a value with (say):
> > >> > >int f(bool v)
> > >> > >{
> > >> > >	return (int)v & 1;
> > >> > >}    
> > >> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
> > >> > >
> > >> > >I really don't see how using (say) 0xaa and 0x55 helps.
> > >> > >What happens if the value is wrong? a trap or exception?, good luck recovering
> > >> > >from that.
> > >> > >
> > >> > >	David
> > >> > 
> > >> > Did you just discover GIGO?
> > >> 
> > >> Thanks for all the suggestions.
> > >> 
> > >> I don't have a strong opinion on the naming or return type. I'm still a
> > >> bit confused about whether I can assume that casting bool to int always
> > >> results in 0 or 1.
> > >> 
> > >> If that's the case, since most people prefer bool over int as the
> > >> return type and some are against introducing u1, my current plan is to
> > >> use the following in the next version:
> > >> 
> > >> bool parity_odd(u64 val);
> > >> 
> > >> This keeps the bool return type, renames the function for better
> > >> clarity, and avoids extra maintenance burden by having just one
> > >> function.
> > >> 
> > >> If I can't assume that casting bool to int always results in 0 or 1,
> > >> would it be acceptable to keep the return type as int?
> > >> 
> > >> Would this work for everyone?
> > >
> > >Alright, it's clearly a split opinion. So what I would do myself in
> > >such case is to look at existing code and see what people who really
> > >need parity invent in their drivers:
> > >
> > >                                     bool      parity_odd
> > >static inline int parity8(u8 val)       -               -
> > >static u8 calc_parity(u8 val)           -               -
> > >static int odd_parity(u8 c)             -               +
> > >static int saa711x_odd_parity           -               +
> > >static int max3100_do_parity            -               -
> > >static inline int parity(unsigned x)    -               -
> > >static int bit_parity(u32 pkt)          -               -
> > >static int oa_tc6_get_parity(u32 p)     -               -
> > >static u32 parity32(__le32 data)        -               -
> > >static u32 parity(u32 sample)           -               -
> > >static int get_parity(int number,       -               -
> > >                      int size)
> > >static bool i2cr_check_parity32(u32 v,  +               -
> > >                        bool parity)
> > >static bool i2cr_check_parity64(u64 v)  +               -
> > >static int sw_parity(__u64 t)           -               -
> > >static bool parity(u64 value)           +               -
> > >
> > >Now you can refer to that table say that int parity(uXX) is what
> > >people want to see in their drivers.
> > >
> > >Whichever interface you choose, please discuss it's pros and cons.
> > >What bloat-o-meter says for each option? What's maintenance burden?
> > >Perf test? Look at generated code?
> > >
> > >I personally for a macro returning boolean, something like I
> > >proposed at the very beginning.
> > >
> > >Thanks,
> > >Yury
> > 
> > Also, please at least provide a way for an arch to opt in to using the builtins, which seem to produce as good results or better at least on some architectures like x86 and probably with CPU options that imply fast popcnt is available.
> 
> Yeah. And because linux/bitops.h already includes asm/bitops.h
> the simplest way would be wrapping generic implementation with
> the #ifndef parity, similarly to how we handle find_next_bit case.
> 
> So:
> 1. Kuan-Wei, please don't invent something like ARCH_HAS_PARITY;
> 2. This may, and probably should, be a separate follow-up series,
>    likely created by corresponding arch experts.
> 
I saw discussions in the previous email thread about both
__builtin_parity and x86-specific implementations. However, from the
discussion, I learned that before considering any optimization, we
should first ask: which driver or subsystem actually cares about parity
efficiency? If someone does, I can help with a micro-benchmark to
provide performance numbers, but I don't have enough domain knowledge
to identify hot paths where parity efficiency matters.

Regards,
Kuan-Wei
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Kuan-Wei Chiu 11 months ago
On Thu, Mar 13, 2025 at 12:29:13AM +0800, Kuan-Wei Chiu wrote:
> On Wed, Mar 12, 2025 at 11:51:12AM -0400, Yury Norov wrote:
> > On Tue, Mar 11, 2025 at 03:24:14PM -0700, H. Peter Anvin wrote:
> > > On March 11, 2025 3:01:30 PM PDT, Yury Norov <yury.norov@gmail.com> wrote:
> > > >On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote:
> > > >> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
> > > >> > On March 7, 2025 11:53:10 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
> > > >> > >On Fri, 07 Mar 2025 11:30:35 -0800
> > > >> > >"H. Peter Anvin" <hpa@zytor.com> wrote:
> > > >> > >
> > > >> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > > >> > >> >> (int)true most definitely is guaranteed to be 1.  
> > > >> > >> >
> > > >> > >> >That's not technically correct any more.
> > > >> > >> >
> > > >> > >> >GCC has introduced hardened bools that intentionally have bit patterns
> > > >> > >> >other than 0 and 1.
> > > >> > >> >
> > > >> > >> >https://gcc.gnu.org/gcc-14/changes.html
> > > >> > >> >
> > > >> > >> >~Andrew  
> > > >> > >> 
> > > >> > >> Bit patterns in memory maybe (not that I can see the Linux kernel using them) but
> > > >> > >> for compiler-generated conversations that's still a given, or the manager isn't C
> > > >> > >> or anything even remotely like it.
> > > >> > >> 
> > > >> > >
> > > >> > >The whole idea of 'bool' is pretty much broken by design.
> > > >> > >The underlying problem is that values other than 'true' and 'false' can
> > > >> > >always get into 'bool' variables.
> > > >> > >
> > > >> > >Once that has happened it is all fubar.
> > > >> > >
> > > >> > >Trying to sanitise a value with (say):
> > > >> > >int f(bool v)
> > > >> > >{
> > > >> > >	return (int)v & 1;
> > > >> > >}    
> > > >> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
> > > >> > >
> > > >> > >I really don't see how using (say) 0xaa and 0x55 helps.
> > > >> > >What happens if the value is wrong? a trap or exception?, good luck recovering
> > > >> > >from that.
> > > >> > >
> > > >> > >	David
> > > >> > 
> > > >> > Did you just discover GIGO?
> > > >> 
> > > >> Thanks for all the suggestions.
> > > >> 
> > > >> I don't have a strong opinion on the naming or return type. I'm still a
> > > >> bit confused about whether I can assume that casting bool to int always
> > > >> results in 0 or 1.
> > > >> 
> > > >> If that's the case, since most people prefer bool over int as the
> > > >> return type and some are against introducing u1, my current plan is to
> > > >> use the following in the next version:
> > > >> 
> > > >> bool parity_odd(u64 val);
> > > >> 
> > > >> This keeps the bool return type, renames the function for better
> > > >> clarity, and avoids extra maintenance burden by having just one
> > > >> function.
> > > >> 
> > > >> If I can't assume that casting bool to int always results in 0 or 1,
> > > >> would it be acceptable to keep the return type as int?
> > > >> 
> > > >> Would this work for everyone?
> > > >
> > > >Alright, it's clearly a split opinion. So what I would do myself in
> > > >such case is to look at existing code and see what people who really
> > > >need parity invent in their drivers:
> > > >
> > > >                                     bool      parity_odd
> > > >static inline int parity8(u8 val)       -               -
> > > >static u8 calc_parity(u8 val)           -               -
> > > >static int odd_parity(u8 c)             -               +
> > > >static int saa711x_odd_parity           -               +
> > > >static int max3100_do_parity            -               -
> > > >static inline int parity(unsigned x)    -               -
> > > >static int bit_parity(u32 pkt)          -               -
> > > >static int oa_tc6_get_parity(u32 p)     -               -
> > > >static u32 parity32(__le32 data)        -               -
> > > >static u32 parity(u32 sample)           -               -
> > > >static int get_parity(int number,       -               -
> > > >                      int size)
> > > >static bool i2cr_check_parity32(u32 v,  +               -
> > > >                        bool parity)
> > > >static bool i2cr_check_parity64(u64 v)  +               -
> > > >static int sw_parity(__u64 t)           -               -
> > > >static bool parity(u64 value)           +               -
> > > >
> > > >Now you can refer to that table say that int parity(uXX) is what
> > > >people want to see in their drivers.
> > > >
> > > >Whichever interface you choose, please discuss it's pros and cons.
> > > >What bloat-o-meter says for each option? What's maintenance burden?
> > > >Perf test? Look at generated code?
> > > >
> > > >I personally for a macro returning boolean, something like I
> > > >proposed at the very beginning.
> > > >
> > > >Thanks,
> > > >Yury
> > > 
> > > Also, please at least provide a way for an arch to opt in to using the builtins, which seem to produce as good results or better at least on some architectures like x86 and probably with CPU options that imply fast popcnt is available.
> > 
> > Yeah. And because linux/bitops.h already includes asm/bitops.h
> > the simplest way would be wrapping generic implementation with
> > the #ifndef parity, similarly to how we handle find_next_bit case.
> > 
> > So:
> > 1. Kuan-Wei, please don't invent something like ARCH_HAS_PARITY;
> > 2. This may, and probably should, be a separate follow-up series,
> >    likely created by corresponding arch experts.
> > 
> I saw discussions in the previous email thread about both
> __builtin_parity and x86-specific implementations. However, from the
> discussion, I learned that before considering any optimization, we
> should first ask: which driver or subsystem actually cares about parity
> efficiency? If someone does, I can help with a micro-benchmark to
> provide performance numbers, but I don't have enough domain knowledge
> to identify hot paths where parity efficiency matters.
> 
IMHO,

If parity is never used in any hot path and we don't care about parity:

Then benchmarking its performance seems meaningless. In this case, a
function with a u64 argument would suffice, and we might not even need
a macro to optimize for different types—especially since the macro
requires special hacks to avoid compiler warnings. Also, I don't think
code size matters here. If it does, we should first consider making
parity a non-inline function in a .c file rather than an inline
function/macro in a header.

If parity is used in a hot path:

We need different handling for different type sizes. As previously
discussed, x86 assembly might use different instructions for u8 and
u16. This may sound stubborn, but I want to ask again: should we
consider using parity8/16/32/64 interfaces? Like in the i3c driver
example, if we only have a single parity macro that selects an
implementation based on type size, users must explicitly cast types.
If future users also need parity in a hot path, they might not be aware
of this requirement and end up generating suboptimal code. Since we
care about efficiency and generated code, why not follow hweight() and
provide separate implementations for different sizes?

Regards,
Kuan-Wei
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Kuan-Wei Chiu 10 months, 2 weeks ago
On Thu, Mar 13, 2025 at 03:41:49PM +0800, Kuan-Wei Chiu wrote:
> On Thu, Mar 13, 2025 at 12:29:13AM +0800, Kuan-Wei Chiu wrote:
> > On Wed, Mar 12, 2025 at 11:51:12AM -0400, Yury Norov wrote:
> > > On Tue, Mar 11, 2025 at 03:24:14PM -0700, H. Peter Anvin wrote:
> > > > On March 11, 2025 3:01:30 PM PDT, Yury Norov <yury.norov@gmail.com> wrote:
> > > > >On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote:
> > > > >> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
> > > > >> > On March 7, 2025 11:53:10 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
> > > > >> > >On Fri, 07 Mar 2025 11:30:35 -0800
> > > > >> > >"H. Peter Anvin" <hpa@zytor.com> wrote:
> > > > >> > >
> > > > >> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > > > >> > >> >> (int)true most definitely is guaranteed to be 1.  
> > > > >> > >> >
> > > > >> > >> >That's not technically correct any more.
> > > > >> > >> >
> > > > >> > >> >GCC has introduced hardened bools that intentionally have bit patterns
> > > > >> > >> >other than 0 and 1.
> > > > >> > >> >
> > > > >> > >> >https://gcc.gnu.org/gcc-14/changes.html
> > > > >> > >> >
> > > > >> > >> >~Andrew  
> > > > >> > >> 
> > > > >> > >> Bit patterns in memory maybe (not that I can see the Linux kernel using them) but
> > > > >> > >> for compiler-generated conversations that's still a given, or the manager isn't C
> > > > >> > >> or anything even remotely like it.
> > > > >> > >> 
> > > > >> > >
> > > > >> > >The whole idea of 'bool' is pretty much broken by design.
> > > > >> > >The underlying problem is that values other than 'true' and 'false' can
> > > > >> > >always get into 'bool' variables.
> > > > >> > >
> > > > >> > >Once that has happened it is all fubar.
> > > > >> > >
> > > > >> > >Trying to sanitise a value with (say):
> > > > >> > >int f(bool v)
> > > > >> > >{
> > > > >> > >	return (int)v & 1;
> > > > >> > >}    
> > > > >> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
> > > > >> > >
> > > > >> > >I really don't see how using (say) 0xaa and 0x55 helps.
> > > > >> > >What happens if the value is wrong? a trap or exception?, good luck recovering
> > > > >> > >from that.
> > > > >> > >
> > > > >> > >	David
> > > > >> > 
> > > > >> > Did you just discover GIGO?
> > > > >> 
> > > > >> Thanks for all the suggestions.
> > > > >> 
> > > > >> I don't have a strong opinion on the naming or return type. I'm still a
> > > > >> bit confused about whether I can assume that casting bool to int always
> > > > >> results in 0 or 1.
> > > > >> 
> > > > >> If that's the case, since most people prefer bool over int as the
> > > > >> return type and some are against introducing u1, my current plan is to
> > > > >> use the following in the next version:
> > > > >> 
> > > > >> bool parity_odd(u64 val);
> > > > >> 
> > > > >> This keeps the bool return type, renames the function for better
> > > > >> clarity, and avoids extra maintenance burden by having just one
> > > > >> function.
> > > > >> 
> > > > >> If I can't assume that casting bool to int always results in 0 or 1,
> > > > >> would it be acceptable to keep the return type as int?
> > > > >> 
> > > > >> Would this work for everyone?
> > > > >
> > > > >Alright, it's clearly a split opinion. So what I would do myself in
> > > > >such case is to look at existing code and see what people who really
> > > > >need parity invent in their drivers:
> > > > >
> > > > >                                     bool      parity_odd
> > > > >static inline int parity8(u8 val)       -               -
> > > > >static u8 calc_parity(u8 val)           -               -
> > > > >static int odd_parity(u8 c)             -               +
> > > > >static int saa711x_odd_parity           -               +
> > > > >static int max3100_do_parity            -               -
> > > > >static inline int parity(unsigned x)    -               -
> > > > >static int bit_parity(u32 pkt)          -               -
> > > > >static int oa_tc6_get_parity(u32 p)     -               -
> > > > >static u32 parity32(__le32 data)        -               -
> > > > >static u32 parity(u32 sample)           -               -
> > > > >static int get_parity(int number,       -               -
> > > > >                      int size)
> > > > >static bool i2cr_check_parity32(u32 v,  +               -
> > > > >                        bool parity)
> > > > >static bool i2cr_check_parity64(u64 v)  +               -
> > > > >static int sw_parity(__u64 t)           -               -
> > > > >static bool parity(u64 value)           +               -
> > > > >
> > > > >Now you can refer to that table say that int parity(uXX) is what
> > > > >people want to see in their drivers.
> > > > >
> > > > >Whichever interface you choose, please discuss it's pros and cons.
> > > > >What bloat-o-meter says for each option? What's maintenance burden?
> > > > >Perf test? Look at generated code?
> > > > >
> > > > >I personally for a macro returning boolean, something like I
> > > > >proposed at the very beginning.
> > > > >
> > > > >Thanks,
> > > > >Yury
> > > > 
> > > > Also, please at least provide a way for an arch to opt in to using the builtins, which seem to produce as good results or better at least on some architectures like x86 and probably with CPU options that imply fast popcnt is available.
> > > 
> > > Yeah. And because linux/bitops.h already includes asm/bitops.h
> > > the simplest way would be wrapping generic implementation with
> > > the #ifndef parity, similarly to how we handle find_next_bit case.
> > > 
> > > So:
> > > 1. Kuan-Wei, please don't invent something like ARCH_HAS_PARITY;
> > > 2. This may, and probably should, be a separate follow-up series,
> > >    likely created by corresponding arch experts.
> > > 
> > I saw discussions in the previous email thread about both
> > __builtin_parity and x86-specific implementations. However, from the
> > discussion, I learned that before considering any optimization, we
> > should first ask: which driver or subsystem actually cares about parity
> > efficiency? If someone does, I can help with a micro-benchmark to
> > provide performance numbers, but I don't have enough domain knowledge
> > to identify hot paths where parity efficiency matters.
> > 
> IMHO,
> 
> If parity is never used in any hot path and we don't care about parity:
> 
> Then benchmarking its performance seems meaningless. In this case, a
> function with a u64 argument would suffice, and we might not even need
> a macro to optimize for different types—especially since the macro
> requires special hacks to avoid compiler warnings. Also, I don't think
> code size matters here. If it does, we should first consider making
> parity a non-inline function in a .c file rather than an inline
> function/macro in a header.
> 
> If parity is used in a hot path:
> 
> We need different handling for different type sizes. As previously
> discussed, x86 assembly might use different instructions for u8 and
> u16. This may sound stubborn, but I want to ask again: should we
> consider using parity8/16/32/64 interfaces? Like in the i3c driver
> example, if we only have a single parity macro that selects an
> implementation based on type size, users must explicitly cast types.
> If future users also need parity in a hot path, they might not be aware
> of this requirement and end up generating suboptimal code. Since we
> care about efficiency and generated code, why not follow hweight() and
> provide separate implementations for different sizes?
> 
It seems no one will reply to my two emails. So, I have summarized
different interface approaches. If there is a next version, I will send
it after the merge window closes.

Interface 1: Single Function
Description: bool parity_odd(u64)
Pros: Minimal maintenance cost
Cons: Difficult to integrate with architecture-specific implementations
      due to the inability to optimize for different argument sizes
Opinions: Jiri supports this approach

Interface 2: Single Macro
Description: parity_odd() macro
Pros: Allows type-specific implementation
Cons: Requires hacks to avoid warnings; users may need explicit
      casting; potential sub-optimal code on 32-bit x86
Opinions: Yury supports this approach

Interface 3: Multiple Functions
Description: bool parity_odd8/16/32/64()
Pros: No need for explicit casting; easy to integrate
      architecture-specific optimizations; except for parity8(), all
      functions are one-liners with no significant code duplication
Cons: More functions may increase maintenance burden
Opinions: Only I support this approach

Regards,
Kuan-Wei
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by H. Peter Anvin 10 months, 2 weeks ago
On 3/23/25 08:16, Kuan-Wei Chiu wrote:
> 
> Interface 3: Multiple Functions
> Description: bool parity_odd8/16/32/64()
> Pros: No need for explicit casting; easy to integrate
>        architecture-specific optimizations; except for parity8(), all
>        functions are one-liners with no significant code duplication
> Cons: More functions may increase maintenance burden
> Opinions: Only I support this approach
> 

OK, so I responded to this but I can't find my reply or any of the 
followups, so let me go again:

I prefer this option, because:

a. Virtually all uses of parity is done in contexts where the sizes of 
the items for which parity is to be taken are well-defined, but it is 
*really* easy for integer promotion to cause a value to be extended to 
32 bits unnecessarily (sign or zero extend, although for parity it 
doesn't make any difference -- if the compiler realizes it.)

b. It makes it easier to add arch-specific implementations, notably 
using __builtin_parity on architectures where that is known to generate 
good code.

c. For architectures where only *some* parity implementations are 
fast/practical, the generic fallbacks will either naturally synthesize 
them from components via shift-xor, or they can be defined to use a 
larger version; the function prototype acts like a cast.

d. If there is a reason in the future to add a generic version, it is 
really easy to do using the size-specific functions as components; this 
is something we do literally all over the place, using a pattern so 
common that it, itself, probably should be macroized:

#define parity(x) 				\
({						\
	typeof(x) __x = (x);			\
	bool __y;				\
	switch (sizeof(__x)) {			\
		case 1:				\
			__y = parity8(__x);	\
			break;			\
		case 2:				\
			__y = parity16(__x);	\
			break;			\
		case 4:				\
			__y = parity32(__x);	\
			break;			\
		case 8:				\
			__y = parity64(__x);	\
			break;			\
		default:			\
			BUILD_BUG();		\
			break;			\
	}					\
	__y;					\
})
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Kuan-Wei Chiu 10 months, 1 week ago
On Tue, Mar 25, 2025 at 12:43:25PM -0700, H. Peter Anvin wrote:
> On 3/23/25 08:16, Kuan-Wei Chiu wrote:
> > 
> > Interface 3: Multiple Functions
> > Description: bool parity_odd8/16/32/64()
> > Pros: No need for explicit casting; easy to integrate
> >        architecture-specific optimizations; except for parity8(), all
> >        functions are one-liners with no significant code duplication
> > Cons: More functions may increase maintenance burden
> > Opinions: Only I support this approach
> > 
> 
> OK, so I responded to this but I can't find my reply or any of the
> followups, so let me go again:
> 
> I prefer this option, because:
> 
> a. Virtually all uses of parity is done in contexts where the sizes of the
> items for which parity is to be taken are well-defined, but it is *really*
> easy for integer promotion to cause a value to be extended to 32 bits
> unnecessarily (sign or zero extend, although for parity it doesn't make any
> difference -- if the compiler realizes it.)
> 
> b. It makes it easier to add arch-specific implementations, notably using
> __builtin_parity on architectures where that is known to generate good code.
> 
> c. For architectures where only *some* parity implementations are
> fast/practical, the generic fallbacks will either naturally synthesize them
> from components via shift-xor, or they can be defined to use a larger
> version; the function prototype acts like a cast.
> 
> d. If there is a reason in the future to add a generic version, it is really
> easy to do using the size-specific functions as components; this is
> something we do literally all over the place, using a pattern so common that
> it, itself, probably should be macroized:
> 
> #define parity(x) 				\
> ({						\
> 	typeof(x) __x = (x);			\
> 	bool __y;				\
> 	switch (sizeof(__x)) {			\
> 		case 1:				\
> 			__y = parity8(__x);	\
> 			break;			\
> 		case 2:				\
> 			__y = parity16(__x);	\
> 			break;			\
> 		case 4:				\
> 			__y = parity32(__x);	\
> 			break;			\
> 		case 8:				\
> 			__y = parity64(__x);	\
> 			break;			\
> 		default:			\
> 			BUILD_BUG();		\
> 			break;			\
> 	}					\
> 	__y;					\
> })
>
Thank you for your detailed response and for explaining the rationale
behind your preference. The points you outlined in (a)–(d) all seem
quite reasonable to me.

Yury,
do you have any feedback on this?
Thank you.

Regards,
Kuan-Wei

Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Yury Norov 10 months, 1 week ago
On Thu, Apr 03, 2025 at 10:39:03PM +0800, Kuan-Wei Chiu wrote:
> On Tue, Mar 25, 2025 at 12:43:25PM -0700, H. Peter Anvin wrote:
> > On 3/23/25 08:16, Kuan-Wei Chiu wrote:
> > > 
> > > Interface 3: Multiple Functions
> > > Description: bool parity_odd8/16/32/64()
> > > Pros: No need for explicit casting; easy to integrate
> > >        architecture-specific optimizations; except for parity8(), all
> > >        functions are one-liners with no significant code duplication
> > > Cons: More functions may increase maintenance burden
> > > Opinions: Only I support this approach
> > > 
> > 
> > OK, so I responded to this but I can't find my reply or any of the
> > followups, so let me go again:
> > 
> > I prefer this option, because:
> > 
> > a. Virtually all uses of parity is done in contexts where the sizes of the
> > items for which parity is to be taken are well-defined, but it is *really*
> > easy for integer promotion to cause a value to be extended to 32 bits
> > unnecessarily (sign or zero extend, although for parity it doesn't make any
> > difference -- if the compiler realizes it.)
> > 
> > b. It makes it easier to add arch-specific implementations, notably using
> > __builtin_parity on architectures where that is known to generate good code.
> > 
> > c. For architectures where only *some* parity implementations are
> > fast/practical, the generic fallbacks will either naturally synthesize them
> > from components via shift-xor, or they can be defined to use a larger
> > version; the function prototype acts like a cast.
> > 
> > d. If there is a reason in the future to add a generic version, it is really
> > easy to do using the size-specific functions as components; this is
> > something we do literally all over the place, using a pattern so common that
> > it, itself, probably should be macroized:
> > 
> > #define parity(x) 				\
> > ({						\
> > 	typeof(x) __x = (x);			\
> > 	bool __y;				\
> > 	switch (sizeof(__x)) {			\
> > 		case 1:				\
> > 			__y = parity8(__x);	\
> > 			break;			\
> > 		case 2:				\
> > 			__y = parity16(__x);	\
> > 			break;			\
> > 		case 4:				\
> > 			__y = parity32(__x);	\
> > 			break;			\
> > 		case 8:				\
> > 			__y = parity64(__x);	\
> > 			break;			\
> > 		default:			\
> > 			BUILD_BUG();		\
> > 			break;			\
> > 	}					\
> > 	__y;					\
> > })
> >
> Thank you for your detailed response and for explaining the rationale
> behind your preference. The points you outlined in (a)–(d) all seem
> quite reasonable to me.
> 
> Yury,
> do you have any feedback on this?
> Thank you.

My feedback to you:

I asked you to share any numbers about each approach. Asm listings,
performance tests, bloat-o-meter. But you did nothing or very little
in that department. You move this series, and it means you should be
very well aware of alternative solutions, their pros and cons.

Instead, you started a poll to pick the best solution. This is not
what I expected, and this is not how the best solution can be found.

To H. Peter and everyone:

Thank you for sharing your opinion on this fixed parity(). Your
arguments may or may not be important, depending on what existing
users actually need. Unfortunately, Kuan-Wei didn't collect
performance numbers and opinions from those proposed users.

I already told that, and I will say again: with the lack of any
evidence that performance and/or code generation is important here,
the best solution is one that minimizes maintainers' (my!) burden.

In other words, bool parity(unsigned long long). I'm OK to maintain
a macro, as well. I understand that more complicated solutions may be
more effective. I will take them only if they will be well advocated.

I hope this will help us to stop moving this discussion back and forth
and save our time, guys.

Thanks,
Yury
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Kuan-Wei Chiu 10 months, 1 week ago
On Thu, Apr 03, 2025 at 12:14:04PM -0400, Yury Norov wrote:
> On Thu, Apr 03, 2025 at 10:39:03PM +0800, Kuan-Wei Chiu wrote:
> > On Tue, Mar 25, 2025 at 12:43:25PM -0700, H. Peter Anvin wrote:
> > > On 3/23/25 08:16, Kuan-Wei Chiu wrote:
> > > > 
> > > > Interface 3: Multiple Functions
> > > > Description: bool parity_odd8/16/32/64()
> > > > Pros: No need for explicit casting; easy to integrate
> > > >        architecture-specific optimizations; except for parity8(), all
> > > >        functions are one-liners with no significant code duplication
> > > > Cons: More functions may increase maintenance burden
> > > > Opinions: Only I support this approach
> > > > 
> > > 
> > > OK, so I responded to this but I can't find my reply or any of the
> > > followups, so let me go again:
> > > 
> > > I prefer this option, because:
> > > 
> > > a. Virtually all uses of parity is done in contexts where the sizes of the
> > > items for which parity is to be taken are well-defined, but it is *really*
> > > easy for integer promotion to cause a value to be extended to 32 bits
> > > unnecessarily (sign or zero extend, although for parity it doesn't make any
> > > difference -- if the compiler realizes it.)
> > > 
> > > b. It makes it easier to add arch-specific implementations, notably using
> > > __builtin_parity on architectures where that is known to generate good code.
> > > 
> > > c. For architectures where only *some* parity implementations are
> > > fast/practical, the generic fallbacks will either naturally synthesize them
> > > from components via shift-xor, or they can be defined to use a larger
> > > version; the function prototype acts like a cast.
> > > 
> > > d. If there is a reason in the future to add a generic version, it is really
> > > easy to do using the size-specific functions as components; this is
> > > something we do literally all over the place, using a pattern so common that
> > > it, itself, probably should be macroized:
> > > 
> > > #define parity(x) 				\
> > > ({						\
> > > 	typeof(x) __x = (x);			\
> > > 	bool __y;				\
> > > 	switch (sizeof(__x)) {			\
> > > 		case 1:				\
> > > 			__y = parity8(__x);	\
> > > 			break;			\
> > > 		case 2:				\
> > > 			__y = parity16(__x);	\
> > > 			break;			\
> > > 		case 4:				\
> > > 			__y = parity32(__x);	\
> > > 			break;			\
> > > 		case 8:				\
> > > 			__y = parity64(__x);	\
> > > 			break;			\
> > > 		default:			\
> > > 			BUILD_BUG();		\
> > > 			break;			\
> > > 	}					\
> > > 	__y;					\
> > > })
> > >
> > Thank you for your detailed response and for explaining the rationale
> > behind your preference. The points you outlined in (a)–(d) all seem
> > quite reasonable to me.
> > 
> > Yury,
> > do you have any feedback on this?
> > Thank you.
> 
> My feedback to you:
> 
> I asked you to share any numbers about each approach. Asm listings,
> performance tests, bloat-o-meter. But you did nothing or very little
> in that department. You move this series, and it means you should be
> very well aware of alternative solutions, their pros and cons.
> 
It seems the concern is that I didn't provide assembly results and
performance numbers. While I believe that listing these numbers alone
cannot prove which users really care about parity efficiency, I have
included the assembly results and my initial observations below. Some
differences, like mov vs movzh, are likely difficult to measure.

Compilation on x86-64 using GCC 14.2 with O2 Optimization:

Link to Godbolt: https://godbolt.org/z/EsqPMz8cq

For u8 Input:
- #2 and #3 generate exactly the same assembly code, while #1 replaces
  one `mov` instruction with `movzh`, which may slightly slow down the
  performance due to zero extension.
- Efficiency: #2 = #3 > #1

For u16 Input:
- As with u8 input, #1 performs an unnecessary zero extension, while #3
  replaces one of the `shr` instructions in #2 with a `mov`, making it
  slightly faster.
- Efficiency: #3 > #2 > #1

For u32 Input:
- #1 has an additional `mov` instruction compared to #2, and #2 has an
  extra `shr` instruction compared to #3.
- Efficiency: #3 > #2 > #1

For u64 Input:
- #1 and #2 generate the same code, but #3 has one less `shr`
  instruction compared to the others.
- Efficiency: #3 > #1 = #2

---

Adding -m32 Flag to View Assembly for 32-bit Machine:

Link to Godbolt: https://godbolt.org/z/GrPa86Eq5

For u8 Input:
- #2 and #3 generate identical assembly code, whereas #1 has additional
  `mov`, `shr`, and `push/pop` instructions.
- Efficiency: #2 = #3 > #1

For u16 Input:
- #1 uses a lot of `xmm` register operations, making it slower than #2
  and #3. Additionally, #2 has an extra `shr` instruction compared to #3.
- Efficiency: #3 > #2 > #1

For u32 Input:
- #1 again uses a lot of `xmm` register operations, so it is slower
  than #2 and #3, and #2 has an additional `shr` instruction compared to #3.
- Efficiency: #3 > #2 > #1

For u64 Input:
- Both #1 and #2 use `xmm` register operations, but #1 has a few extra
  `movdqa` instructions. #3 is more concise, using a few `shr`, `xor`,
  and `mov` instructions to complete the operation.
- Efficiency: #3 > #2 > #1

Regards,
Kuan-Wei
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Jeremy Kerr 10 months, 1 week ago
Hi Yuri & Kuan-Wei:

> Thank you for sharing your opinion on this fixed parity(). Your
> arguments may or may not be important, depending on what existing
> users actually need. Unfortunately, Kuan-Wei didn't collect
> performance numbers and opinions from those proposed users.

For the fsi-i2c side: this isn't a performance-critical path, and any
reasonable common approach would likely perform better that the current
per-bit implementation.

Our common targets for this driver would be arm and powerpc64le. In case
it's useful as a reference, using the kernel compilers I have to hand, a
__builtin_parity() is a library call on the former, and a two-instruction
sequence for the latter.

Cheers,


Jeremy
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Kuan-Wei Chiu 10 months, 1 week ago
Hi Jeremy,

On Fri, Apr 04, 2025 at 10:51:55AM +0800, Jeremy Kerr wrote:
> Hi Yuri & Kuan-Wei:
> 
> > Thank you for sharing your opinion on this fixed parity(). Your
> > arguments may or may not be important, depending on what existing
> > users actually need. Unfortunately, Kuan-Wei didn't collect
> > performance numbers and opinions from those proposed users.
> 
> For the fsi-i2c side: this isn't a performance-critical path, and any
> reasonable common approach would likely perform better that the current
> per-bit implementation.
> 
> Our common targets for this driver would be arm and powerpc64le. In case
> it's useful as a reference, using the kernel compilers I have to hand, a
> __builtin_parity() is a library call on the former, and a two-instruction
> sequence for the latter.
> 
Thanks for your feedback.

IIUC, from the fsi-i2c perspective, parity efficiency isn't a major
concern, but you still prefer optimizing with methods like
__builtin_parity(). I'm just unsure if this aligns with Yury's point
about needing "evidence that performance and/or code generation is
important here."

Regards,
Kuna-Wei
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Jeremy Kerr 10 months, 1 week ago
Hi Kuan-Wei,

> Thanks for your feedback.

No problem!

> IIUC, from the fsi-i2c perspective, parity efficiency isn't a major
> concern,

Yes

> but you still prefer optimizing with methods like __builtin_parity().

No, it's not really about optimisation. In the case of this driver, my
preference would be more directed to using common code (either in the
form of these changes, or __builtin_parity) rather than any performance
considerations.

The implementation details I gave was more a note about the platforms
that are applicable for the driver.

Cheers,


Jeremy
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Kuan-Wei Chiu 10 months, 1 week ago
On Thu, Apr 03, 2025 at 12:14:04PM -0400, Yury Norov wrote:
> On Thu, Apr 03, 2025 at 10:39:03PM +0800, Kuan-Wei Chiu wrote:
> > On Tue, Mar 25, 2025 at 12:43:25PM -0700, H. Peter Anvin wrote:
> > > On 3/23/25 08:16, Kuan-Wei Chiu wrote:
> > > > 
> > > > Interface 3: Multiple Functions
> > > > Description: bool parity_odd8/16/32/64()
> > > > Pros: No need for explicit casting; easy to integrate
> > > >        architecture-specific optimizations; except for parity8(), all
> > > >        functions are one-liners with no significant code duplication
> > > > Cons: More functions may increase maintenance burden
> > > > Opinions: Only I support this approach
> > > > 
> > > 
> > > OK, so I responded to this but I can't find my reply or any of the
> > > followups, so let me go again:
> > > 
> > > I prefer this option, because:
> > > 
> > > a. Virtually all uses of parity is done in contexts where the sizes of the
> > > items for which parity is to be taken are well-defined, but it is *really*
> > > easy for integer promotion to cause a value to be extended to 32 bits
> > > unnecessarily (sign or zero extend, although for parity it doesn't make any
> > > difference -- if the compiler realizes it.)
> > > 
> > > b. It makes it easier to add arch-specific implementations, notably using
> > > __builtin_parity on architectures where that is known to generate good code.
> > > 
> > > c. For architectures where only *some* parity implementations are
> > > fast/practical, the generic fallbacks will either naturally synthesize them
> > > from components via shift-xor, or they can be defined to use a larger
> > > version; the function prototype acts like a cast.
> > > 
> > > d. If there is a reason in the future to add a generic version, it is really
> > > easy to do using the size-specific functions as components; this is
> > > something we do literally all over the place, using a pattern so common that
> > > it, itself, probably should be macroized:
> > > 
> > > #define parity(x) 				\
> > > ({						\
> > > 	typeof(x) __x = (x);			\
> > > 	bool __y;				\
> > > 	switch (sizeof(__x)) {			\
> > > 		case 1:				\
> > > 			__y = parity8(__x);	\
> > > 			break;			\
> > > 		case 2:				\
> > > 			__y = parity16(__x);	\
> > > 			break;			\
> > > 		case 4:				\
> > > 			__y = parity32(__x);	\
> > > 			break;			\
> > > 		case 8:				\
> > > 			__y = parity64(__x);	\
> > > 			break;			\
> > > 		default:			\
> > > 			BUILD_BUG();		\
> > > 			break;			\
> > > 	}					\
> > > 	__y;					\
> > > })
> > >
> > Thank you for your detailed response and for explaining the rationale
> > behind your preference. The points you outlined in (a)–(d) all seem
> > quite reasonable to me.
> > 
> > Yury,
> > do you have any feedback on this?
> > Thank you.
> 
> My feedback to you:
> 
> I asked you to share any numbers about each approach. Asm listings,
> performance tests, bloat-o-meter. But you did nothing or very little
> in that department. You move this series, and it means you should be
> very well aware of alternative solutions, their pros and cons.
>
I'm willing to run micro-benchmarks, but even with performance data, I
lack the domain knowledge to determine which users care about parity
efficiency. No one in Cc has clarified this either.

> Instead, you started a poll to pick the best solution. This is not
> what I expected, and this is not how the best solution can be found.
> 
> To H. Peter and everyone:
> 
> Thank you for sharing your opinion on this fixed parity(). Your
> arguments may or may not be important, depending on what existing
> users actually need. Unfortunately, Kuan-Wei didn't collect
> performance numbers and opinions from those proposed users.
> 
> I already told that, and I will say again: with the lack of any
> evidence that performance and/or code generation is important here,
> the best solution is one that minimizes maintainers' (my!) burden.
> 
> In other words, bool parity(unsigned long long). I'm OK to maintain
> a macro, as well. I understand that more complicated solutions may be
> more effective. I will take them only if they will be well advocated.
> 
Before Peter suggested an arch-specific implementation, I planned to go
with approach #1, as it minimizes maintenance overhead in the absence
of clear user requirements.

Peter,
Have you identified any users who care about parity efficiency?
If not, do we still need to introduce an arch-specific implementation?

Regards,
Kuan-Wei

> I hope this will help us to stop moving this discussion back and forth
> and save our time, guys.
> 
> Thanks,
> Yury
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by H. Peter Anvin 10 months, 2 weeks ago
On March 23, 2025 8:16:24 AM PDT, Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
>On Thu, Mar 13, 2025 at 03:41:49PM +0800, Kuan-Wei Chiu wrote:
>> On Thu, Mar 13, 2025 at 12:29:13AM +0800, Kuan-Wei Chiu wrote:
>> > On Wed, Mar 12, 2025 at 11:51:12AM -0400, Yury Norov wrote:
>> > > On Tue, Mar 11, 2025 at 03:24:14PM -0700, H. Peter Anvin wrote:
>> > > > On March 11, 2025 3:01:30 PM PDT, Yury Norov <yury.norov@gmail.com> wrote:
>> > > > >On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote:
>> > > > >> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
>> > > > >> > On March 7, 2025 11:53:10 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
>> > > > >> > >On Fri, 07 Mar 2025 11:30:35 -0800
>> > > > >> > >"H. Peter Anvin" <hpa@zytor.com> wrote:
>> > > > >> > >
>> > > > >> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> > > > >> > >> >> (int)true most definitely is guaranteed to be 1.  
>> > > > >> > >> >
>> > > > >> > >> >That's not technically correct any more.
>> > > > >> > >> >
>> > > > >> > >> >GCC has introduced hardened bools that intentionally have bit patterns
>> > > > >> > >> >other than 0 and 1.
>> > > > >> > >> >
>> > > > >> > >> >https://gcc.gnu.org/gcc-14/changes.html
>> > > > >> > >> >
>> > > > >> > >> >~Andrew  
>> > > > >> > >> 
>> > > > >> > >> Bit patterns in memory maybe (not that I can see the Linux kernel using them) but
>> > > > >> > >> for compiler-generated conversations that's still a given, or the manager isn't C
>> > > > >> > >> or anything even remotely like it.
>> > > > >> > >> 
>> > > > >> > >
>> > > > >> > >The whole idea of 'bool' is pretty much broken by design.
>> > > > >> > >The underlying problem is that values other than 'true' and 'false' can
>> > > > >> > >always get into 'bool' variables.
>> > > > >> > >
>> > > > >> > >Once that has happened it is all fubar.
>> > > > >> > >
>> > > > >> > >Trying to sanitise a value with (say):
>> > > > >> > >int f(bool v)
>> > > > >> > >{
>> > > > >> > >	return (int)v & 1;
>> > > > >> > >}    
>> > > > >> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
>> > > > >> > >
>> > > > >> > >I really don't see how using (say) 0xaa and 0x55 helps.
>> > > > >> > >What happens if the value is wrong? a trap or exception?, good luck recovering
>> > > > >> > >from that.
>> > > > >> > >
>> > > > >> > >	David
>> > > > >> > 
>> > > > >> > Did you just discover GIGO?
>> > > > >> 
>> > > > >> Thanks for all the suggestions.
>> > > > >> 
>> > > > >> I don't have a strong opinion on the naming or return type. I'm still a
>> > > > >> bit confused about whether I can assume that casting bool to int always
>> > > > >> results in 0 or 1.
>> > > > >> 
>> > > > >> If that's the case, since most people prefer bool over int as the
>> > > > >> return type and some are against introducing u1, my current plan is to
>> > > > >> use the following in the next version:
>> > > > >> 
>> > > > >> bool parity_odd(u64 val);
>> > > > >> 
>> > > > >> This keeps the bool return type, renames the function for better
>> > > > >> clarity, and avoids extra maintenance burden by having just one
>> > > > >> function.
>> > > > >> 
>> > > > >> If I can't assume that casting bool to int always results in 0 or 1,
>> > > > >> would it be acceptable to keep the return type as int?
>> > > > >> 
>> > > > >> Would this work for everyone?
>> > > > >
>> > > > >Alright, it's clearly a split opinion. So what I would do myself in
>> > > > >such case is to look at existing code and see what people who really
>> > > > >need parity invent in their drivers:
>> > > > >
>> > > > >                                     bool      parity_odd
>> > > > >static inline int parity8(u8 val)       -               -
>> > > > >static u8 calc_parity(u8 val)           -               -
>> > > > >static int odd_parity(u8 c)             -               +
>> > > > >static int saa711x_odd_parity           -               +
>> > > > >static int max3100_do_parity            -               -
>> > > > >static inline int parity(unsigned x)    -               -
>> > > > >static int bit_parity(u32 pkt)          -               -
>> > > > >static int oa_tc6_get_parity(u32 p)     -               -
>> > > > >static u32 parity32(__le32 data)        -               -
>> > > > >static u32 parity(u32 sample)           -               -
>> > > > >static int get_parity(int number,       -               -
>> > > > >                      int size)
>> > > > >static bool i2cr_check_parity32(u32 v,  +               -
>> > > > >                        bool parity)
>> > > > >static bool i2cr_check_parity64(u64 v)  +               -
>> > > > >static int sw_parity(__u64 t)           -               -
>> > > > >static bool parity(u64 value)           +               -
>> > > > >
>> > > > >Now you can refer to that table say that int parity(uXX) is what
>> > > > >people want to see in their drivers.
>> > > > >
>> > > > >Whichever interface you choose, please discuss it's pros and cons.
>> > > > >What bloat-o-meter says for each option? What's maintenance burden?
>> > > > >Perf test? Look at generated code?
>> > > > >
>> > > > >I personally for a macro returning boolean, something like I
>> > > > >proposed at the very beginning.
>> > > > >
>> > > > >Thanks,
>> > > > >Yury
>> > > > 
>> > > > Also, please at least provide a way for an arch to opt in to using the builtins, which seem to produce as good results or better at least on some architectures like x86 and probably with CPU options that imply fast popcnt is available.
>> > > 
>> > > Yeah. And because linux/bitops.h already includes asm/bitops.h
>> > > the simplest way would be wrapping generic implementation with
>> > > the #ifndef parity, similarly to how we handle find_next_bit case.
>> > > 
>> > > So:
>> > > 1. Kuan-Wei, please don't invent something like ARCH_HAS_PARITY;
>> > > 2. This may, and probably should, be a separate follow-up series,
>> > >    likely created by corresponding arch experts.
>> > > 
>> > I saw discussions in the previous email thread about both
>> > __builtin_parity and x86-specific implementations. However, from the
>> > discussion, I learned that before considering any optimization, we
>> > should first ask: which driver or subsystem actually cares about parity
>> > efficiency? If someone does, I can help with a micro-benchmark to
>> > provide performance numbers, but I don't have enough domain knowledge
>> > to identify hot paths where parity efficiency matters.
>> > 
>> IMHO,
>> 
>> If parity is never used in any hot path and we don't care about parity:
>> 
>> Then benchmarking its performance seems meaningless. In this case, a
>> function with a u64 argument would suffice, and we might not even need
>> a macro to optimize for different types—especially since the macro
>> requires special hacks to avoid compiler warnings. Also, I don't think
>> code size matters here. If it does, we should first consider making
>> parity a non-inline function in a .c file rather than an inline
>> function/macro in a header.
>> 
>> If parity is used in a hot path:
>> 
>> We need different handling for different type sizes. As previously
>> discussed, x86 assembly might use different instructions for u8 and
>> u16. This may sound stubborn, but I want to ask again: should we
>> consider using parity8/16/32/64 interfaces? Like in the i3c driver
>> example, if we only have a single parity macro that selects an
>> implementation based on type size, users must explicitly cast types.
>> If future users also need parity in a hot path, they might not be aware
>> of this requirement and end up generating suboptimal code. Since we
>> care about efficiency and generated code, why not follow hweight() and
>> provide separate implementations for different sizes?
>> 
>It seems no one will reply to my two emails. So, I have summarized
>different interface approaches. If there is a next version, I will send
>it after the merge window closes.
>
>Interface 1: Single Function
>Description: bool parity_odd(u64)
>Pros: Minimal maintenance cost
>Cons: Difficult to integrate with architecture-specific implementations
>      due to the inability to optimize for different argument sizes
>Opinions: Jiri supports this approach
>
>Interface 2: Single Macro
>Description: parity_odd() macro
>Pros: Allows type-specific implementation
>Cons: Requires hacks to avoid warnings; users may need explicit
>      casting; potential sub-optimal code on 32-bit x86
>Opinions: Yury supports this approach
>
>Interface 3: Multiple Functions
>Description: bool parity_odd8/16/32/64()
>Pros: No need for explicit casting; easy to integrate
>      architecture-specific optimizations; except for parity8(), all
>      functions are one-liners with no significant code duplication
>Cons: More functions may increase maintenance burden
>Opinions: Only I support this approach
>
>Regards,
>Kuan-Wei

You can add me to the final option. I think it makes most sense
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Yury Norov 10 months, 2 weeks ago
On Sun, Mar 23, 2025 at 03:40:20PM -0700, H. Peter Anvin wrote:
> On March 23, 2025 8:16:24 AM PDT, Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> >On Thu, Mar 13, 2025 at 03:41:49PM +0800, Kuan-Wei Chiu wrote:
> >> On Thu, Mar 13, 2025 at 12:29:13AM +0800, Kuan-Wei Chiu wrote:
> >> > On Wed, Mar 12, 2025 at 11:51:12AM -0400, Yury Norov wrote:
> >> > > On Tue, Mar 11, 2025 at 03:24:14PM -0700, H. Peter Anvin wrote:
> >> > > > On March 11, 2025 3:01:30 PM PDT, Yury Norov <yury.norov@gmail.com> wrote:
> >> > > > >On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote:
> >> > > > >> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
> >> > > > >> > On March 7, 2025 11:53:10 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
> >> > > > >> > >On Fri, 07 Mar 2025 11:30:35 -0800
> >> > > > >> > >"H. Peter Anvin" <hpa@zytor.com> wrote:
> >> > > > >> > >
> >> > > > >> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >> > > > >> > >> >> (int)true most definitely is guaranteed to be 1.  
> >> > > > >> > >> >
> >> > > > >> > >> >That's not technically correct any more.
> >> > > > >> > >> >
> >> > > > >> > >> >GCC has introduced hardened bools that intentionally have bit patterns
> >> > > > >> > >> >other than 0 and 1.
> >> > > > >> > >> >
> >> > > > >> > >> >https://gcc.gnu.org/gcc-14/changes.html
> >> > > > >> > >> >
> >> > > > >> > >> >~Andrew  
> >> > > > >> > >> 
> >> > > > >> > >> Bit patterns in memory maybe (not that I can see the Linux kernel using them) but
> >> > > > >> > >> for compiler-generated conversations that's still a given, or the manager isn't C
> >> > > > >> > >> or anything even remotely like it.
> >> > > > >> > >> 
> >> > > > >> > >
> >> > > > >> > >The whole idea of 'bool' is pretty much broken by design.
> >> > > > >> > >The underlying problem is that values other than 'true' and 'false' can
> >> > > > >> > >always get into 'bool' variables.
> >> > > > >> > >
> >> > > > >> > >Once that has happened it is all fubar.
> >> > > > >> > >
> >> > > > >> > >Trying to sanitise a value with (say):
> >> > > > >> > >int f(bool v)
> >> > > > >> > >{
> >> > > > >> > >	return (int)v & 1;
> >> > > > >> > >}    
> >> > > > >> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
> >> > > > >> > >
> >> > > > >> > >I really don't see how using (say) 0xaa and 0x55 helps.
> >> > > > >> > >What happens if the value is wrong? a trap or exception?, good luck recovering
> >> > > > >> > >from that.
> >> > > > >> > >
> >> > > > >> > >	David
> >> > > > >> > 
> >> > > > >> > Did you just discover GIGO?
> >> > > > >> 
> >> > > > >> Thanks for all the suggestions.
> >> > > > >> 
> >> > > > >> I don't have a strong opinion on the naming or return type. I'm still a
> >> > > > >> bit confused about whether I can assume that casting bool to int always
> >> > > > >> results in 0 or 1.
> >> > > > >> 
> >> > > > >> If that's the case, since most people prefer bool over int as the
> >> > > > >> return type and some are against introducing u1, my current plan is to
> >> > > > >> use the following in the next version:
> >> > > > >> 
> >> > > > >> bool parity_odd(u64 val);
> >> > > > >> 
> >> > > > >> This keeps the bool return type, renames the function for better
> >> > > > >> clarity, and avoids extra maintenance burden by having just one
> >> > > > >> function.
> >> > > > >> 
> >> > > > >> If I can't assume that casting bool to int always results in 0 or 1,
> >> > > > >> would it be acceptable to keep the return type as int?
> >> > > > >> 
> >> > > > >> Would this work for everyone?
> >> > > > >
> >> > > > >Alright, it's clearly a split opinion. So what I would do myself in
> >> > > > >such case is to look at existing code and see what people who really
> >> > > > >need parity invent in their drivers:
> >> > > > >
> >> > > > >                                     bool      parity_odd
> >> > > > >static inline int parity8(u8 val)       -               -
> >> > > > >static u8 calc_parity(u8 val)           -               -
> >> > > > >static int odd_parity(u8 c)             -               +
> >> > > > >static int saa711x_odd_parity           -               +
> >> > > > >static int max3100_do_parity            -               -
> >> > > > >static inline int parity(unsigned x)    -               -
> >> > > > >static int bit_parity(u32 pkt)          -               -
> >> > > > >static int oa_tc6_get_parity(u32 p)     -               -
> >> > > > >static u32 parity32(__le32 data)        -               -
> >> > > > >static u32 parity(u32 sample)           -               -
> >> > > > >static int get_parity(int number,       -               -
> >> > > > >                      int size)
> >> > > > >static bool i2cr_check_parity32(u32 v,  +               -
> >> > > > >                        bool parity)
> >> > > > >static bool i2cr_check_parity64(u64 v)  +               -
> >> > > > >static int sw_parity(__u64 t)           -               -
> >> > > > >static bool parity(u64 value)           +               -
> >> > > > >
> >> > > > >Now you can refer to that table say that int parity(uXX) is what
> >> > > > >people want to see in their drivers.
> >> > > > >
> >> > > > >Whichever interface you choose, please discuss it's pros and cons.
> >> > > > >What bloat-o-meter says for each option? What's maintenance burden?
> >> > > > >Perf test? Look at generated code?
> >> > > > >
> >> > > > >I personally for a macro returning boolean, something like I
> >> > > > >proposed at the very beginning.
> >> > > > >
> >> > > > >Thanks,
> >> > > > >Yury
> >> > > > 
> >> > > > Also, please at least provide a way for an arch to opt in to using the builtins, which seem to produce as good results or better at least on some architectures like x86 and probably with CPU options that imply fast popcnt is available.
> >> > > 
> >> > > Yeah. And because linux/bitops.h already includes asm/bitops.h
> >> > > the simplest way would be wrapping generic implementation with
> >> > > the #ifndef parity, similarly to how we handle find_next_bit case.
> >> > > 
> >> > > So:
> >> > > 1. Kuan-Wei, please don't invent something like ARCH_HAS_PARITY;
> >> > > 2. This may, and probably should, be a separate follow-up series,
> >> > >    likely created by corresponding arch experts.
> >> > > 
> >> > I saw discussions in the previous email thread about both
> >> > __builtin_parity and x86-specific implementations. However, from the
> >> > discussion, I learned that before considering any optimization, we
> >> > should first ask: which driver or subsystem actually cares about parity
> >> > efficiency? If someone does, I can help with a micro-benchmark to
> >> > provide performance numbers, but I don't have enough domain knowledge
> >> > to identify hot paths where parity efficiency matters.
> >> > 
> >> IMHO,
> >> 
> >> If parity is never used in any hot path and we don't care about parity:
> >> 
> >> Then benchmarking its performance seems meaningless. In this case, a
> >> function with a u64 argument would suffice, and we might not even need
> >> a macro to optimize for different types—especially since the macro
> >> requires special hacks to avoid compiler warnings. Also, I don't think
> >> code size matters here. If it does, we should first consider making
> >> parity a non-inline function in a .c file rather than an inline
> >> function/macro in a header.
> >> 
> >> If parity is used in a hot path:
> >> 
> >> We need different handling for different type sizes. As previously
> >> discussed, x86 assembly might use different instructions for u8 and
> >> u16. This may sound stubborn, but I want to ask again: should we
> >> consider using parity8/16/32/64 interfaces? Like in the i3c driver
> >> example, if we only have a single parity macro that selects an
> >> implementation based on type size, users must explicitly cast types.
> >> If future users also need parity in a hot path, they might not be aware
> >> of this requirement and end up generating suboptimal code. Since we
> >> care about efficiency and generated code, why not follow hweight() and
> >> provide separate implementations for different sizes?
> >> 
> >It seems no one will reply to my two emails. So, I have summarized
> >different interface approaches. If there is a next version, I will send
> >it after the merge window closes.
> >
> >Interface 1: Single Function
> >Description: bool parity_odd(u64)
> >Pros: Minimal maintenance cost
> >Cons: Difficult to integrate with architecture-specific implementations
> >      due to the inability to optimize for different argument sizes

How difficult? It's just as simple as find_next_bit(). I already
pointed that.

> >Opinions: Jiri supports this approach
> >
> >Interface 2: Single Macro
> >Description: parity_odd() macro
> >Pros: Allows type-specific implementation
> >Cons: Requires hacks to avoid warnings; users may need explicit

So if the hack is documented, it's OK. I don't know the other way to
motivate compilers to get better other than pointing them to their
bugs.

> >      casting; potential sub-optimal code on 32-bit x86

Any asm listings? Any real impact?

> >Opinions: Yury supports this approach
> >
> >Interface 3: Multiple Functions
> >Description: bool parity_odd8/16/32/64()
> >Pros: No need for explicit casting; easy to integrate

Explicit castings are sometimes better than implicit ones.

> >      architecture-specific optimizations; except for parity8(), all
> >      functions are one-liners with no significant code duplication
> >Cons: More functions may increase maintenance burden

s/may/will/

> >Opinions: Only I support this approach
> >
> >Regards,
> >Kuan-Wei
> 
> You can add me to the final option. I think it makes most sense

This is not a democracy, and we are not voting here. We are engineers.
We share our expert opinions and choose the best one. I'll be happy to
go with any solution, if we all make sure it's the best.

I'm for #2 because it 
 - generates better code than #1;
 - easier to use than #3; and
 - has less maintenance burden than #3.

Why exactly #3 makes the most sense to you? Most variables are ints
and longs. How are you going to handle those with fixed-types parity()s?

Thanks,
Yury
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Kuan-Wei Chiu 10 months, 2 weeks ago
On Mon, Mar 24, 2025 at 11:53:35AM -0400, Yury Norov wrote:
> On Sun, Mar 23, 2025 at 03:40:20PM -0700, H. Peter Anvin wrote:
> > On March 23, 2025 8:16:24 AM PDT, Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> > >On Thu, Mar 13, 2025 at 03:41:49PM +0800, Kuan-Wei Chiu wrote:
> > >> On Thu, Mar 13, 2025 at 12:29:13AM +0800, Kuan-Wei Chiu wrote:
> > >> > On Wed, Mar 12, 2025 at 11:51:12AM -0400, Yury Norov wrote:
> > >> > > On Tue, Mar 11, 2025 at 03:24:14PM -0700, H. Peter Anvin wrote:
> > >> > > > On March 11, 2025 3:01:30 PM PDT, Yury Norov <yury.norov@gmail.com> wrote:
> > >> > > > >On Sun, Mar 09, 2025 at 11:48:26PM +0800, Kuan-Wei Chiu wrote:
> > >> > > > >> On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
> > >> > > > >> > On March 7, 2025 11:53:10 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
> > >> > > > >> > >On Fri, 07 Mar 2025 11:30:35 -0800
> > >> > > > >> > >"H. Peter Anvin" <hpa@zytor.com> wrote:
> > >> > > > >> > >
> > >> > > > >> > >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > >> > > > >> > >> >> (int)true most definitely is guaranteed to be 1.  
> > >> > > > >> > >> >
> > >> > > > >> > >> >That's not technically correct any more.
> > >> > > > >> > >> >
> > >> > > > >> > >> >GCC has introduced hardened bools that intentionally have bit patterns
> > >> > > > >> > >> >other than 0 and 1.
> > >> > > > >> > >> >
> > >> > > > >> > >> >https://gcc.gnu.org/gcc-14/changes.html
> > >> > > > >> > >> >
> > >> > > > >> > >> >~Andrew  
> > >> > > > >> > >> 
> > >> > > > >> > >> Bit patterns in memory maybe (not that I can see the Linux kernel using them) but
> > >> > > > >> > >> for compiler-generated conversations that's still a given, or the manager isn't C
> > >> > > > >> > >> or anything even remotely like it.
> > >> > > > >> > >> 
> > >> > > > >> > >
> > >> > > > >> > >The whole idea of 'bool' is pretty much broken by design.
> > >> > > > >> > >The underlying problem is that values other than 'true' and 'false' can
> > >> > > > >> > >always get into 'bool' variables.
> > >> > > > >> > >
> > >> > > > >> > >Once that has happened it is all fubar.
> > >> > > > >> > >
> > >> > > > >> > >Trying to sanitise a value with (say):
> > >> > > > >> > >int f(bool v)
> > >> > > > >> > >{
> > >> > > > >> > >	return (int)v & 1;
> > >> > > > >> > >}    
> > >> > > > >> > >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
> > >> > > > >> > >
> > >> > > > >> > >I really don't see how using (say) 0xaa and 0x55 helps.
> > >> > > > >> > >What happens if the value is wrong? a trap or exception?, good luck recovering
> > >> > > > >> > >from that.
> > >> > > > >> > >
> > >> > > > >> > >	David
> > >> > > > >> > 
> > >> > > > >> > Did you just discover GIGO?
> > >> > > > >> 
> > >> > > > >> Thanks for all the suggestions.
> > >> > > > >> 
> > >> > > > >> I don't have a strong opinion on the naming or return type. I'm still a
> > >> > > > >> bit confused about whether I can assume that casting bool to int always
> > >> > > > >> results in 0 or 1.
> > >> > > > >> 
> > >> > > > >> If that's the case, since most people prefer bool over int as the
> > >> > > > >> return type and some are against introducing u1, my current plan is to
> > >> > > > >> use the following in the next version:
> > >> > > > >> 
> > >> > > > >> bool parity_odd(u64 val);
> > >> > > > >> 
> > >> > > > >> This keeps the bool return type, renames the function for better
> > >> > > > >> clarity, and avoids extra maintenance burden by having just one
> > >> > > > >> function.
> > >> > > > >> 
> > >> > > > >> If I can't assume that casting bool to int always results in 0 or 1,
> > >> > > > >> would it be acceptable to keep the return type as int?
> > >> > > > >> 
> > >> > > > >> Would this work for everyone?
> > >> > > > >
> > >> > > > >Alright, it's clearly a split opinion. So what I would do myself in
> > >> > > > >such case is to look at existing code and see what people who really
> > >> > > > >need parity invent in their drivers:
> > >> > > > >
> > >> > > > >                                     bool      parity_odd
> > >> > > > >static inline int parity8(u8 val)       -               -
> > >> > > > >static u8 calc_parity(u8 val)           -               -
> > >> > > > >static int odd_parity(u8 c)             -               +
> > >> > > > >static int saa711x_odd_parity           -               +
> > >> > > > >static int max3100_do_parity            -               -
> > >> > > > >static inline int parity(unsigned x)    -               -
> > >> > > > >static int bit_parity(u32 pkt)          -               -
> > >> > > > >static int oa_tc6_get_parity(u32 p)     -               -
> > >> > > > >static u32 parity32(__le32 data)        -               -
> > >> > > > >static u32 parity(u32 sample)           -               -
> > >> > > > >static int get_parity(int number,       -               -
> > >> > > > >                      int size)
> > >> > > > >static bool i2cr_check_parity32(u32 v,  +               -
> > >> > > > >                        bool parity)
> > >> > > > >static bool i2cr_check_parity64(u64 v)  +               -
> > >> > > > >static int sw_parity(__u64 t)           -               -
> > >> > > > >static bool parity(u64 value)           +               -
> > >> > > > >
> > >> > > > >Now you can refer to that table say that int parity(uXX) is what
> > >> > > > >people want to see in their drivers.
> > >> > > > >
> > >> > > > >Whichever interface you choose, please discuss it's pros and cons.
> > >> > > > >What bloat-o-meter says for each option? What's maintenance burden?
> > >> > > > >Perf test? Look at generated code?
> > >> > > > >
> > >> > > > >I personally for a macro returning boolean, something like I
> > >> > > > >proposed at the very beginning.
> > >> > > > >
> > >> > > > >Thanks,
> > >> > > > >Yury
> > >> > > > 
> > >> > > > Also, please at least provide a way for an arch to opt in to using the builtins, which seem to produce as good results or better at least on some architectures like x86 and probably with CPU options that imply fast popcnt is available.
> > >> > > 
> > >> > > Yeah. And because linux/bitops.h already includes asm/bitops.h
> > >> > > the simplest way would be wrapping generic implementation with
> > >> > > the #ifndef parity, similarly to how we handle find_next_bit case.
> > >> > > 
> > >> > > So:
> > >> > > 1. Kuan-Wei, please don't invent something like ARCH_HAS_PARITY;
> > >> > > 2. This may, and probably should, be a separate follow-up series,
> > >> > >    likely created by corresponding arch experts.
> > >> > > 
> > >> > I saw discussions in the previous email thread about both
> > >> > __builtin_parity and x86-specific implementations. However, from the
> > >> > discussion, I learned that before considering any optimization, we
> > >> > should first ask: which driver or subsystem actually cares about parity
> > >> > efficiency? If someone does, I can help with a micro-benchmark to
> > >> > provide performance numbers, but I don't have enough domain knowledge
> > >> > to identify hot paths where parity efficiency matters.
> > >> > 
> > >> IMHO,
> > >> 
> > >> If parity is never used in any hot path and we don't care about parity:
> > >> 
> > >> Then benchmarking its performance seems meaningless. In this case, a
> > >> function with a u64 argument would suffice, and we might not even need
> > >> a macro to optimize for different types—especially since the macro
> > >> requires special hacks to avoid compiler warnings. Also, I don't think
> > >> code size matters here. If it does, we should first consider making
> > >> parity a non-inline function in a .c file rather than an inline
> > >> function/macro in a header.
> > >> 
> > >> If parity is used in a hot path:
> > >> 
> > >> We need different handling for different type sizes. As previously
> > >> discussed, x86 assembly might use different instructions for u8 and
> > >> u16. This may sound stubborn, but I want to ask again: should we
> > >> consider using parity8/16/32/64 interfaces? Like in the i3c driver
> > >> example, if we only have a single parity macro that selects an
> > >> implementation based on type size, users must explicitly cast types.
> > >> If future users also need parity in a hot path, they might not be aware
> > >> of this requirement and end up generating suboptimal code. Since we
> > >> care about efficiency and generated code, why not follow hweight() and
> > >> provide separate implementations for different sizes?
> > >> 
> > >It seems no one will reply to my two emails. So, I have summarized
> > >different interface approaches. If there is a next version, I will send
> > >it after the merge window closes.
> > >
> > >Interface 1: Single Function
> > >Description: bool parity_odd(u64)
> > >Pros: Minimal maintenance cost
> > >Cons: Difficult to integrate with architecture-specific implementations
> > >      due to the inability to optimize for different argument sizes
> 
> How difficult? It's just as simple as find_next_bit(). I already
> pointed that.
> 

The architecture-specific implementation may use different approaches
for different bit widths. As previously discussed by Peter and David,
the x86 implementation of parity uses different instructions for u8 and
u16 arguments. Having an interface that only takes u64 makes it
difficult to accommodate these differences.

Are you referring to the #ifndef parity approach when mentioning
find_next_bit()? If so, I'm not sure how that would solve the issue
while keeping different implementations for different bit widths under
an interface that only takes u64.

> > >Opinions: Jiri supports this approach
> > >
> > >Interface 2: Single Macro
> > >Description: parity_odd() macro
> > >Pros: Allows type-specific implementation
> > >Cons: Requires hacks to avoid warnings; users may need explicit
> 
> So if the hack is documented, it's OK. I don't know the other way to
> motivate compilers to get better other than pointing them to their
> bugs.
> 

Perhaps this comes down to taste. IMHO, a solution relying on #if gcc
#else with extensive comments to explain workarounds has a higher
maintenance cost and is uglier compared to simply having three
additional one-liner functions.

That said, you are the maintainer, and you likely have a better
perspective on the maintenance burden than I do.

> > >      casting; potential sub-optimal code on 32-bit x86
> 
> Any asm listings? Any real impact?
> 
I was just referring to the comment previously mentioned by David.
Below is a simple assembly comparison between #2 and #3:

generic_macro:
        movq    xmm0, QWORD PTR [esp+4]
        mov     edx, 27030
        movdqa  xmm2, xmm0
        psrlq   xmm2, 32
        movdqa  xmm1, xmm2
        pxor    xmm1, xmm0
        movdqa  xmm3, xmm1
        psrlq   xmm3, 16
        movdqa  xmm0, xmm3
        pxor    xmm0, xmm1
        movdqa  xmm4, xmm0
        psrlq   xmm4, 8
        movdqa  xmm1, xmm4
        pxor    xmm1, xmm0
        movdqa  xmm5, xmm1
        psrlq   xmm5, 4
        movdqa  xmm0, xmm5
        pxor    xmm0, xmm1
        movd    eax, xmm0
        and     eax, 15
        bt      edx, eax
        setc    al
        ret
fixed_type_parity_function:
        mov     edx, DWORD PTR [esp+8]
        xor     edx, DWORD PTR [esp+4]
        mov     eax, edx
        shr     eax, 16
        xor     eax, edx
        mov     edx, eax
        xor     dl, ah
        mov     eax, edx
        shr     al, 4
        xor     eax, edx
        mov     edx, 27030
        and     eax, 15
        bt      edx, eax
        setc    al
        ret

https://godbolt.org/z/cPjbEYKPP

As I mentioned earlier, I can't point to any specific driver or
subsystem where parity efficiency is critical. However, if performance
is not a concern, then perhaps #1 is the simplest option, and we might
not even need architecture-specific implementations.

> > >Opinions: Yury supports this approach
> > >
> > >Interface 3: Multiple Functions
> > >Description: bool parity_odd8/16/32/64()
> > >Pros: No need for explicit casting; easy to integrate
> 
> Explicit castings are sometimes better than implicit ones.
> 
In #2, users might easily overlook the need for explicit casting. As
Peter pointed out, integer promotion could unintentionally lead to the
32-bit version being used when it wasn't intended.

By making the function names and parameter types explicitly indicate
8/16/32/64 bits, such issues become less likely to occur.

> > >      architecture-specific optimizations; except for parity8(), all
> > >      functions are one-liners with no significant code duplication
> > >Cons: More functions may increase maintenance burden
> 
> s/may/will/
> 
> > >Opinions: Only I support this approach
> > >
> > >Regards,
> > >Kuan-Wei
> > 
> > You can add me to the final option. I think it makes most sense
> 
> This is not a democracy, and we are not voting here. We are engineers.
> We share our expert opinions and choose the best one. I'll be happy to
> go with any solution, if we all make sure it's the best.
> 
I wasn't trying to initiate a vote. I was simply summarizing the
possible approaches discussed so far and the feedback I've received.

> I'm for #2 because it 
>  - generates better code than #1;

Which driver or subsystem would particularly care about parity
efficiency?

>  - easier to use than #3; and

I agree that #2 is easier to use than #3 if users don't need to perform
explicit casting. However, if they do need to cast explicitly to get
correct code generation, then I think #3 is clearer and simpler for
users.

>  - has less maintenance burden than #3.
> 
> Why exactly #3 makes the most sense to you? Most variables are ints
> and longs. How are you going to handle those with fixed-types parity()s?
> 
If explicit casting is required for correct code generation, then both
#2 and #3 face the same issue.

To address this, my first thought was to add a parity_long() function,
similar to hweight_long(). However, I can imagine that you'd find this
even harder to accept due to increased maintenance burden.

This also brings me to another question that isn't directly related to
parity but has me curious. Since you maintain hweight as well, what
makes hweight and parity different enough that hweight follows the
hweight8/16/32/64 and hweight_long() interface, while parity is leaning
toward a macro-based approach? Their functionality seems quite similar
to me.

Regards,
Kuan-Wei

Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Jiri Slaby 11 months ago
On 09. 03. 25, 16:48, Kuan-Wei Chiu wrote:
> Would this work for everyone?

+1 for /me.

-- 
js
suse labs
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by H. Peter Anvin 11 months ago
On March 9, 2025 8:48:26 AM PDT, Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
>On Fri, Mar 07, 2025 at 12:07:02PM -0800, H. Peter Anvin wrote:
>> On March 7, 2025 11:53:10 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
>> >On Fri, 07 Mar 2025 11:30:35 -0800
>> >"H. Peter Anvin" <hpa@zytor.com> wrote:
>> >
>> >> On March 7, 2025 10:49:56 AM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> >> >> (int)true most definitely is guaranteed to be 1.  
>> >> >
>> >> >That's not technically correct any more.
>> >> >
>> >> >GCC has introduced hardened bools that intentionally have bit patterns
>> >> >other than 0 and 1.
>> >> >
>> >> >https://gcc.gnu.org/gcc-14/changes.html
>> >> >
>> >> >~Andrew  
>> >> 
>> >> Bit patterns in memory maybe (not that I can see the Linux kernel using them) but
>> >> for compiler-generated conversations that's still a given, or the manager isn't C
>> >> or anything even remotely like it.
>> >> 
>> >
>> >The whole idea of 'bool' is pretty much broken by design.
>> >The underlying problem is that values other than 'true' and 'false' can
>> >always get into 'bool' variables.
>> >
>> >Once that has happened it is all fubar.
>> >
>> >Trying to sanitise a value with (say):
>> >int f(bool v)
>> >{
>> >	return (int)v & 1;
>> >}    
>> >just doesn't work (see https://www.godbolt.org/z/MEndP3q9j)
>> >
>> >I really don't see how using (say) 0xaa and 0x55 helps.
>> >What happens if the value is wrong? a trap or exception?, good luck recovering
>> >from that.
>> >
>> >	David
>> 
>> Did you just discover GIGO?
>
>Thanks for all the suggestions.
>
>I don't have a strong opinion on the naming or return type. I'm still a
>bit confused about whether I can assume that casting bool to int always
>results in 0 or 1.
>
>If that's the case, since most people prefer bool over int as the
>return type and some are against introducing u1, my current plan is to
>use the following in the next version:
>
>bool parity_odd(u64 val);
>
>This keeps the bool return type, renames the function for better
>clarity, and avoids extra maintenance burden by having just one
>function.
>
>If I can't assume that casting bool to int always results in 0 or 1,
>would it be acceptable to keep the return type as int?
>
>Would this work for everyone?
>
>Regards,
>Kuan-Wei

You *CAN* safely assume that bool is an integer type which always has the value 0 or 1.
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Jiri Slaby 11 months, 1 week ago
On 06. 03. 25, 17:25, Kuan-Wei Chiu wrote:
> Several parts of the kernel contain redundant implementations of parity
> calculations for 16/32/64-bit values. Introduces generic
> parity16/32/64() helpers in bitops.h, providing a standardized
> and optimized implementation.
> 
> Subsequent patches refactor various kernel components to replace
> open-coded parity calculations with the new helpers, reducing code
> duplication and improving maintainability.
> 
> Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
> In v3, I use parityXX() instead of the parity() macro since the
> parity() macro may generate suboptimal code and requires special hacks
> to make GCC happy. If anyone still prefers a single parity() macro,
> please let me know.

What is suboptimal and where exactly it matters? Have you actually 
measured it?

> Additionally, I changed parityXX() << y users to !!parityXX() << y
> because, unlike C++, C does not guarantee that true casts to int as 1.

How comes? ANSI C99 exactly states:
===
true
which expands to the integer constant 1,
===

thanks,
-- 
js
suse labs
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Yury Norov 11 months ago
On Fri, Mar 07, 2025 at 07:57:48AM +0100, Jiri Slaby wrote:
> On 06. 03. 25, 17:25, Kuan-Wei Chiu wrote:
> > Several parts of the kernel contain redundant implementations of parity
> > calculations for 16/32/64-bit values. Introduces generic
> > parity16/32/64() helpers in bitops.h, providing a standardized
> > and optimized implementation.
> > 
> > Subsequent patches refactor various kernel components to replace
> > open-coded parity calculations with the new helpers, reducing code
> > duplication and improving maintainability.
> > 
> > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > ---
> > In v3, I use parityXX() instead of the parity() macro since the
> > parity() macro may generate suboptimal code and requires special hacks
> > to make GCC happy. If anyone still prefers a single parity() macro,
> > please let me know.
> 
> What is suboptimal and where exactly it matters? Have you actually measured
> it?

I asked exactly this question at least 3 times, and have never
received perf tests or asm listings - nothing. I've never received
any comments from driver maintainers about how performance of the
parity() is important for them, as well.

With the absence of _any_ feedback, I'm not going to take this series,
of course, for the reason: overengineering.

With that said, the simplest way would be replacing parity8(u8) with
parity(u64) 'one size fits all' thing. I even made a one extra step,
suggesting a macro that would generate a better code for smaller types
with almost no extra maintenance burden. This is another acceptable
option to me.

Thanks,
Yury
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Kuan-Wei Chiu 11 months ago
Hi Yury,

On Fri, Mar 07, 2025 at 10:55:13AM -0500, Yury Norov wrote:
> On Fri, Mar 07, 2025 at 07:57:48AM +0100, Jiri Slaby wrote:
> > On 06. 03. 25, 17:25, Kuan-Wei Chiu wrote:
> > > Several parts of the kernel contain redundant implementations of parity
> > > calculations for 16/32/64-bit values. Introduces generic
> > > parity16/32/64() helpers in bitops.h, providing a standardized
> > > and optimized implementation.
> > > 
> > > Subsequent patches refactor various kernel components to replace
> > > open-coded parity calculations with the new helpers, reducing code
> > > duplication and improving maintainability.
> > > 
> > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > > ---
> > > In v3, I use parityXX() instead of the parity() macro since the
> > > parity() macro may generate suboptimal code and requires special hacks
> > > to make GCC happy. If anyone still prefers a single parity() macro,
> > > please let me know.
> > 
> > What is suboptimal and where exactly it matters? Have you actually measured
> > it?
> 
> I asked exactly this question at least 3 times, and have never
> received perf tests or asm listings - nothing. I've never received
> any comments from driver maintainers about how performance of the
> parity() is important for them, as well.
> 
To be clear, I use parityXX() was mainly because you dislike the >>
16 >> 16 hack, and I dislike the #if gcc #else hack—not due to
performance or generated code considerations.

> With the absence of _any_ feedback, I'm not going to take this series,
> of course, for the reason: overengineering.
> 
I'm quite surprised that three separate one-line functions are
considered overengineering compared to a multi-line approach that
requires special handling to satisfy gcc.

> With that said, the simplest way would be replacing parity8(u8) with
> parity(u64) 'one size fits all' thing. I even made a one extra step,
> suggesting a macro that would generate a better code for smaller types
> with almost no extra maintenance burden. This is another acceptable
> option to me.
> 
I'm fine with unifying everything to a single parity(u64) function.
Before I submit the next version, please let me know if anyone has
objections to this approach.

Regards,
Kuan-Wei
Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Posted by Kuan-Wei Chiu 11 months ago
Hi Jiri,

On Fri, Mar 07, 2025 at 07:57:48AM +0100, Jiri Slaby wrote:
> On 06. 03. 25, 17:25, Kuan-Wei Chiu wrote:
> > Several parts of the kernel contain redundant implementations of parity
> > calculations for 16/32/64-bit values. Introduces generic
> > parity16/32/64() helpers in bitops.h, providing a standardized
> > and optimized implementation.
> > 
> > Subsequent patches refactor various kernel components to replace
> > open-coded parity calculations with the new helpers, reducing code
> > duplication and improving maintainability.
> > 
> > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > ---
> > In v3, I use parityXX() instead of the parity() macro since the
> > parity() macro may generate suboptimal code and requires special hacks
> > to make GCC happy. If anyone still prefers a single parity() macro,
> > please let me know.
> 
> What is suboptimal and where exactly it matters? Have you actually measured
> it?
> 
In the previous thread, David and Yury had different opinions regarding
the implementation details of the parity() macro. I am trying to find a
solution that satisfies most people while keeping it as simple as
possible.

I cannot point to any specific users who are particularly concerned
about efficiency, so personally, I am not really concerned about the
generated code either. However, I am not a fan of the #if gcc #else
approach, and Yury also mentioned that he does not like the >> 16 >> 16
hack. At the same time, David pointed out that GCC might generate
double-register math. Given these concerns, I leaned toward reverting
to the parityXX() approach.

If you still prefer using the parity() macro, we can revisit and
discuss its implementation details further.

> > Additionally, I changed parityXX() << y users to !!parityXX() << y
> > because, unlike C++, C does not guarantee that true casts to int as 1.
> 
> How comes? ANSI C99 exactly states:
> ===
> true
> which expands to the integer constant 1,
> ===
> 
I gave a more detailed response in my reply to Peter. If we can confirm
that casting bool to int will only result in 1 or 0, I will remove the
!! hack in the next version.

Regards,
Kuan-Wei