[v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

[PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by David Desobry 2 weeks, 3 days ago

The current implementation of num_digits() uses a loop with repeated
multiplication, which is inefficient. Furthermore, the negation of
the input value "val = -val" causes undefined behavior when val is
INT_MIN, as its absolute value cannot be represented as a 32-bit integer.

Replace the loop with a switch statement using GCC case ranges. This
allows the compiler to generate a jump table or a series of optimized
comparisons, providing O(1) performance. By using an unsigned int to
handle the magnitude, we safely handle the INT_MIN case without
relying on 64-bit types or undefined signed overflow.

Removed the outdated comment.

Signed-off-by: David Desobry <david.desobry@formalgen.com>
---
v2:
 - Replaced loop with switch statement and GCC case ranges.
 - Fixed INT_MIN overflow using unsigned int cast instead of s64/long long.
 - Removed outdated comment regarding mobile submission.
 arch/x86/lib/misc.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/arch/x86/lib/misc.c b/arch/x86/lib/misc.c
index 40b81c338ae5..03ba028d5326 100644
--- a/arch/x86/lib/misc.c
+++ b/arch/x86/lib/misc.c
@@ -3,22 +3,37 @@
 
 /*
  * Count the digits of @val including a possible sign.
- *
- * (Typed on and submitted from hpa's mobile phone.)
  */
 int num_digits(int val)
 {
-	long long m = 10;
-	int d = 1;
+	unsigned int v = val;
+	int d = 0;
 
 	if (val < 0) {
-		d++;
-		val = -val;
+		d = 1;
+		v = -v;
 	}
 
-	while (val >= m) {
-		m *= 10;
-		d++;
+	switch (v) {
+	case 0 ... 9:
+		return d + 1;
+	case 10 ... 99:
+		return d + 2;
+	case 100 ... 999:
+		return d + 3;
+	case 1000 ... 9999:
+		return d + 4;
+	case 10000 ... 99999:
+		return d + 5;
+	case 100000 ... 999999:
+		return d + 6;
+	case 1000000 ... 9999999:
+		return d + 7;
+	case 10000000 ... 99999999:
+		return d + 8;
+	case 100000000 ... 999999999:
+		return d + 9;
+	default:
+		return d + 10;
 	}
-	return d;
 }
-- 
2.43.0

Re: [PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by David Laight 2 weeks, 3 days ago

On Tue, 20 Jan 2026 18:47:48 +0100
David Desobry <david.desobry@formalgen.com> wrote:

> The current implementation of num_digits() uses a loop with repeated
> multiplication, which is inefficient. Furthermore, the negation of
> the input value "val = -val" causes undefined behavior when val is
> INT_MIN, as its absolute value cannot be represented as a 32-bit integer.
> 
> Replace the loop with a switch statement using GCC case ranges. This
> allows the compiler to generate a jump table or a series of optimized
> comparisons, providing O(1) performance. By using an unsigned int to
> handle the magnitude, we safely handle the INT_MIN case without
> relying on 64-bit types or undefined signed overflow.
> 
> Removed the outdated comment.
> 
> Signed-off-by: David Desobry <david.desobry@formalgen.com>
> ---
> v2:
>  - Replaced loop with switch statement and GCC case ranges.
>  - Fixed INT_MIN overflow using unsigned int cast instead of s64/long long.
>  - Removed outdated comment regarding mobile submission.
>  arch/x86/lib/misc.c | 35 +++++++++++++++++++++++++----------
>  1 file changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/lib/misc.c b/arch/x86/lib/misc.c
> index 40b81c338ae5..03ba028d5326 100644
> --- a/arch/x86/lib/misc.c
> +++ b/arch/x86/lib/misc.c
> @@ -3,22 +3,37 @@
>  
>  /*
>   * Count the digits of @val including a possible sign.
> - *
> - * (Typed on and submitted from hpa's mobile phone.)
>   */
>  int num_digits(int val)
>  {
> -	long long m = 10;
> -	int d = 1;
> +	unsigned int v = val;
> +	int d = 0;
>  
>  	if (val < 0) {
> -		d++;
> -		val = -val;
> +		d = 1;
> +		v = -v;
>  	}

Maybe better to write as:
	if (val < 0) {
		d = 1;
		v = -val;
	} else {
		d = 0;
		v = val;
	}

The compiler will only generate one jump.

>  
> -	while (val >= m) {
> -		m *= 10;
> -		d++;
> +	switch (v) {
> +	case 0 ... 9:
> +		return d + 1;
> +	case 10 ... 99:
> +		return d + 2;
> +	case 100 ... 999:
> +		return d + 3;
> +	case 1000 ... 9999:
> +		return d + 4;
> +	case 10000 ... 99999:
> +		return d + 5;
> +	case 100000 ... 999999:
> +		return d + 6;
> +	case 1000000 ... 9999999:
> +		return d + 7;
> +	case 10000000 ... 99999999:
> +		return d + 8;
> +	case 100000000 ... 999999999:
> +		return d + 9;
> +	default:
> +		return d + 10;

clang generates something really horrid for that.

Either:
    if (v <= 9) return d + 1;
    if (v <= 99) return d + 2;
    if (v <= 999) return d + 3;
    if (v <= 9999) return d + 4;
    if (v <= 99999) return d + 5;
    if (v <= 999999) return d + 6;
    if (v <= 9999999) return d + 7;
    if (v <= 99999999) return d + 8;
    if (v <= 999999999) return d + 9;
    return d + 10;
or:
    if ((++d && v > 9) &&
        (++d && v > 99) &&
        (++d && v > 999) &&
        (++d && v > 9999) &&
        (++d && v > 99999) &&
        (++d && v > 999999) &&
        (++d && v > 9999999) &&
        (++d && v > 99999999) &&
        (++d && v > 999999999))
        d++;
    return d;
generate better code.
In particular it is almost certainly best to only have one taken branch.
Dumping in a load of unlikely() might help that as well.
(Neither compiler does the ++d inline, the add is done before the return.)

	David

>  	}
> -	return d;
>  }

Re: [PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by David Desobry 2 weeks, 3 days ago

On 1/20/26 22:46, David Laight wrote:

> 
> Maybe better to write as:
> 	if (val < 0) {
> 		d = 1;
> 		v = -val;
> 	} else {
> 		d = 0;
> 		v = val;
> 	}
> The compiler will only generate one jump.

Agreed, I've adopted this in v3.

> 
>>   
>> -	while (val >= m) {
>> -		m *= 10;
>> -		d++;
>> +	switch (v) {
>> +	case 0 ... 9:
>> +		return d + 1;
>> +	case 10 ... 99:
>> +		return d + 2;
>> +	case 100 ... 999:
>> +		return d + 3;
>> +	case 1000 ... 9999:
>> +		return d + 4;
>> +	case 10000 ... 99999:
>> +		return d + 5;
>> +	case 100000 ... 999999:
>> +		return d + 6;
>> +	case 1000000 ... 9999999:
>> +		return d + 7;
>> +	case 10000000 ... 99999999:
>> +		return d + 8;
>> +	case 100000000 ... 999999999:
>> +		return d + 9;
>> +	default:
>> +		return d + 10;
> 
> clang generates something really horrid for that.
> 
> Either:
>      if (v <= 9) return d + 1;
>      if (v <= 99) return d + 2;
>      if (v <= 999) return d + 3;
>      if (v <= 9999) return d + 4;
>      if (v <= 99999) return d + 5;
>      if (v <= 999999) return d + 6;
>      if (v <= 9999999) return d + 7;
>      if (v <= 99999999) return d + 8;
>      if (v <= 999999999) return d + 9;
>      return d + 10;
> or:
>      if ((++d && v > 9) &&
>          (++d && v > 99) &&
>          (++d && v > 999) &&
>          (++d && v > 9999) &&
>          (++d && v > 99999) &&
>          (++d && v > 999999) &&
>          (++d && v > 9999999) &&
>          (++d && v > 99999999) &&
>          (++d && v > 999999999))
>          d++;
>      return d;
> generate better code.
> In particular it is almost certainly best to only have one taken branch.
> Dumping in a load of unlikely() might help that as well.
> (Neither compiler does the ++d inline, the add is done before the return.)

Good catch. I have replaced the switch statement with a linear if-chain 
in v3 to ensure better code generation for both GCC and Clang.

David

Re: [PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by H. Peter Anvin 2 weeks, 3 days ago

On 2026-01-20 15:32, David Desobry wrote:
> 
> Good catch. I have replaced the switch statement with a linear if-chain in v3
> to ensure better code generation for both GCC and Clang.
> 

I think a bigger deal is just to change it to unsigned.

Now, for really silly optimization:

int num_digits(unsigned int x)
{
    int n = 0;
    asm("cmp %2,%1; sbb $-2,%0" : "+r" (n) : "r" (x), "g" (10));
    asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100));
    asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000));
    asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000));
    asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000));
    asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000));
    asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000000));
    asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000000));
    asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000000));

    return n;
}

No branches at all!

	-hpa

Re: [PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by Maciej W. Rozycki 2 weeks, 3 days ago

On Tue, 20 Jan 2026, H. Peter Anvin wrote:

> Now, for really silly optimization:
> 
> int num_digits(unsigned int x)
> {
>     int n = 0;
>     asm("cmp %2,%1; sbb $-2,%0" : "+r" (n) : "r" (x), "g" (10));
>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100));
>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000));
>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000));
>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000));
>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000));
>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000000));
>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000000));
>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000000));
> 
>     return n;
> }
> 
> No branches at all!

 I guess you chose to use SBB rather than somewhat less mind-twisting ADC 
for the entertainment of the reader?

 Anyway branchless code can be produced from C code as well, e.g.:

int num_digits(unsigned int x)
{
	return (1 + (x > 9) + (x > 99) + (x > 999) + (x > 9999) +
		(x > 99999) + (x > 999999) + (x > 9999999) +
		(x > 99999999) + (x > 999999999));
}

although GCC at least as at version 11 I have here uses SETA rather than 
ADC/SBB (it doesn't care if you write (x > 9) or (x >= 10), etc.) emitting 
a longer and likely slower sequence even at -Os.  And likewise the POWER 
backend doesn't take advantage of the carry flag and prefers calculations 
involving shifting the sign bit into bit 0.  Obviously no one must have 
thought of adding the right transformation to the optimiser, which might 
be an interesting challenge to someone.

  Maciej

Re: [PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by H. Peter Anvin 2 weeks, 1 day ago

On January 21, 2026 3:51:29 AM PST, "Maciej W. Rozycki" <macro@orcam.me.uk> wrote:
>On Tue, 20 Jan 2026, H. Peter Anvin wrote:
>
>> Now, for really silly optimization:
>> 
>> int num_digits(unsigned int x)
>> {
>>     int n = 0;
>>     asm("cmp %2,%1; sbb $-2,%0" : "+r" (n) : "r" (x), "g" (10));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000000));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000000));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000000));
>> 
>>     return n;
>> }
>> 
>> No branches at all!
>
> I guess you chose to use SBB rather than somewhat less mind-twisting ADC 
>for the entertainment of the reader?
>
> Anyway branchless code can be produced from C code as well, e.g.:
>
>int num_digits(unsigned int x)
>{
>	return (1 + (x > 9) + (x > 99) + (x > 999) + (x > 9999) +
>		(x > 99999) + (x > 999999) + (x > 9999999) +
>		(x > 99999999) + (x > 999999999));
>}
>
>although GCC at least as at version 11 I have here uses SETA rather than 
>ADC/SBB (it doesn't care if you write (x > 9) or (x >= 10), etc.) emitting 
>a longer and likely slower sequence even at -Os.  And likewise the POWER 
>backend doesn't take advantage of the carry flag and prefers calculations 
>involving shifting the sign bit into bit 0.  Obviously no one must have 
>thought of adding the right transformation to the optimiser, which might 
>be an interesting challenge to someone.
>
>  Maciej

No, I use it because SBB subtracts CF, whereas ADC adds CF.

Re: [PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by Maciej W. Rozycki 2 weeks, 1 day ago

On Thu, 22 Jan 2026, H. Peter Anvin wrote:

> > I guess you chose to use SBB rather than somewhat less mind-twisting ADC 
> >for the entertainment of the reader?
[...]
> 
> No, I use it because SBB subtracts CF, whereas ADC adds CF.

 Ah, right, on x86 an immediate needs to be the subtrahend, so you don't 
have the freedom to swap the operands of CMP, which would let you use CF 
either way (as say on VAX), which is what I had in mind.  I stand 
corrected then, it's the only way.

  Maciej

Re: [PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by David Desobry 2 weeks, 3 days ago

On 1/21/26 00:49, H. Peter Anvin wrote:
> On 2026-01-20 15:32, David Desobry wrote:
>>
>> Good catch. I have replaced the switch statement with a linear if-chain in v3
>> to ensure better code generation for both GCC and Clang.
>>
> 
> I think a bigger deal is just to change it to unsigned.
> 
> Now, for really silly optimization:
> 
> int num_digits(unsigned int x)
> {
>      int n = 0;
>      asm("cmp %2,%1; sbb $-2,%0" : "+r" (n) : "r" (x), "g" (10));
>      asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100));
>      asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000));
>      asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000));
>      asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000));
>      asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000));
>      asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000000));
>      asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000000));
>      asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000000));
> 
>      return n;
> }
> 
> No branches at all!
> 
> 	-hpa
> 

Actually, the V3 change: 	
if (val < 0) {
-		d++;
-		val = -val;
+		d = 1;
+		v = -val;
+	} else {
+		d = 0;
+		v = val;
  	}
reintroduced the undefined behavior for val == INT_MIN.
So this V3 version is incorrect.
I'm not familiar enough with the rest of the codebase to know if 
changing the function signature to unsigned int is correct here.

Re: [PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by David Laight 2 weeks, 3 days ago

On Wed, 21 Jan 2026 01:04:39 +0100
David Desobry <david.desobry@formalgen.com> wrote:

> On 1/21/26 00:49, H. Peter Anvin wrote:
..
> Actually, the V3 change: 	
> if (val < 0) {
> -		d++;
> -		val = -val;
> +		d = 1;
> +		v = -val;
> +	} else {
> +		d = 0;
> +		v = val;
>   	}
> reintroduced the undefined behavior for val == INT_MIN.
> So this V3 version is incorrect.

Change to:
	v = -(val + 1); v++;
The compiler notices the two '+ 1' cancel each other out.

	David

> I'm not familiar enough with the rest of the codebase to know if 
> changing the function signature to unsigned int is correct here.

In theory you'd need to change the name and all the callers.

	David

Re: [PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by David Desobry 2 weeks, 3 days ago

On 1/21/26 10:54, David Laight wrote:

>> I'm not familiar enough with the rest of the codebase to know if
>> changing the function signature to unsigned int is correct here.
> 
> In theory you'd need to change the name and all the callers.
> 
> 	David
> 
> 

Agreed. It turns out there were very few callers.

I have just submitted V5, which renames the function to 
num_digits_u32(), switches the interface to unsigned int, and updates 
all callers accordingly.

David

[PATCH v5] x86/lib: Rename num_digits() to num_digits_u32() and optimize

Posted by David Desobry 2 weeks, 3 days ago

The current implementation of num_digits() uses a loop with repeated
multiplication, which is inefficient. Furthermore, it takes a signed
integer which leads to undefined behavior when negating INT_MIN.

Since this function is only used for unsigned magnitudes, change it
to num_digits_u32() and convert the interface to take an unsigned int.
This naturally resolves the INT_MIN overflow issue and clarifies the
function's intended use.

Replace the loop with a branchless sequence using inline assembly. By
using the 'sbb' instruction against the carry flag after comparisons,
we eliminate all conditional branches. This provides constant-time
performance and avoids CPU branch misprediction penalties.

Update all existing callers and the function comment to reflect the
new name and the fact that signs are no longer handled.

Signed-off-by: David Desobry <david.desobry@formalgen.com>
---
v5:
 - Changed name to num_digits_u32() and updated all callers to avoid 
   semantic confusion
v4:
 - Switched to branchless inline assembly.
 - Changed function signature to unsigned int.
 - Updated prototype in arch/x86/include/asm/misc.h.

 arch/x86/include/asm/misc.h |  2 +-
 arch/x86/kernel/smpboot.c   |  8 ++++----
 arch/x86/lib/misc.c         | 28 +++++++++++++---------------
 3 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/misc.h b/arch/x86/include/asm/misc.h
index bb049cca3729..78d96b81d985 100644
--- a/arch/x86/include/asm/misc.h
+++ b/arch/x86/include/asm/misc.h
@@ -2,6 +2,6 @@
 #ifndef _ASM_X86_MISC_H
 #define _ASM_X86_MISC_H
 
-int num_digits(int val);
+int num_digits_u32(unsigned int x);
 
 #endif /* _ASM_X86_MISC_H */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5cd6950ab672..fc92f04c7f04 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -849,10 +849,10 @@ static void announce_cpu(int cpu, int apicid)
 	int node = early_cpu_to_node(cpu);
 
 	if (!width)
-		width = num_digits(num_possible_cpus()) + 1; /* + '#' sign */
+		width = num_digits_u32(num_possible_cpus()) + 1; /* + '#' sign */
 
 	if (!node_width)
-		node_width = num_digits(num_possible_nodes()) + 1; /* + '#' */
+		node_width = num_digits_u32(num_possible_nodes()) + 1; /* + '#' */
 
 	if (system_state < SYSTEM_RUNNING) {
 		if (first)
@@ -864,7 +864,7 @@ static void announce_cpu(int cpu, int apicid)
 			current_node = node;
 
 			printk(KERN_INFO ".... node %*s#%d, CPUs:  ",
-			       node_width - num_digits(node), " ", node);
+			       node_width - num_digits_u32(node), " ", node);
 		}
 
 		/* Add padding for the BSP */
@@ -872,7 +872,7 @@ static void announce_cpu(int cpu, int apicid)
 			pr_cont("%*s", width + 1, " ");
 		first = 0;
 
-		pr_cont("%*s#%d", width - num_digits(cpu), " ", cpu);
+		pr_cont("%*s#%d", width - num_digits_u32(cpu), " ", cpu);
 	} else
 		pr_info("Booting Node %d Processor %d APIC 0x%x\n",
 			node, cpu, apicid);
diff --git a/arch/x86/lib/misc.c b/arch/x86/lib/misc.c
index 40b81c338ae5..b21ca6f0dbf8 100644
--- a/arch/x86/lib/misc.c
+++ b/arch/x86/lib/misc.c
@@ -2,23 +2,21 @@
 #include <asm/misc.h>
 
 /*
- * Count the digits of @val including a possible sign.
- *
- * (Typed on and submitted from hpa's mobile phone.)
+ * Count the decimal digits of an unsigned integer.
  */
-int num_digits(int val)
+int num_digits_u32(unsigned int x)
 {
-	long long m = 10;
-	int d = 1;
+	int n = 0;
 
-	if (val < 0) {
-		d++;
-		val = -val;
-	}
+	asm("cmp %2,%1; sbb $-2,%0" : "+r" (n) : "r" (x), "g" (10));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000000));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000000));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000000));
 
-	while (val >= m) {
-		m *= 10;
-		d++;
-	}
-	return d;
+	return n;
 }
-- 
2.43.0

Re: [PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by H. Peter Anvin 2 weeks, 3 days ago

On 2026-01-20 16:04, David Desobry wrote:
> 
> Actually, the V3 change:    
> if (val < 0) {
> -        d++;
> -        val = -val;
> +        d = 1;
> +        v = -val;
> +    } else {
> +        d = 0;
> +        v = val;
>      }
> reintroduced the undefined behavior for val == INT_MIN.
> So this V3 version is incorrect.
> I'm not familiar enough with the rest of the codebase to know if changing the
> function signature to unsigned int is correct here.

It is only ever called with a cpu number, a node number, or a CPU count. Never
negative.

	-hpa

[PATCH v4] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by David Desobry 2 weeks, 3 days ago

The current implementation of num_digits() uses a loop with repeated
multiplication, which is inefficient. Furthermore, it takes a signed
integer which leads to undefined behavior when negating INT_MIN.

As this function is only used for unsigned magnitudes in the kernel code,
convert the interface to take an unsigned int. This naturally resolves
the INT_MIN overflow issue.

Replace the loop with a branchless sequence using inline assembly. By
using the 'sbb' instruction against the carry flag after comparisons,
we eliminate all conditional branches. This provides constant-time
performance and avoids CPU branch misprediction penalties.

Update the function comment to reflect that signs are no longer handled.

Signed-off-by: David Desobry <david.desobry@formalgen.com>
---
 v4:
 - Switched to branchless inline assembly. 
 - Changed function signature to unsigned int.
 - Updated prototype in arch/x86/include/asm/misc.h. 

 arch/x86/include/asm/misc.h |  2 +-
 arch/x86/lib/misc.c         | 28 +++++++++++++---------------
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/misc.h b/arch/x86/include/asm/misc.h
index bb049cca3729..48b6bd7c08b9 100644
--- a/arch/x86/include/asm/misc.h
+++ b/arch/x86/include/asm/misc.h
@@ -2,6 +2,6 @@
 #ifndef _ASM_X86_MISC_H
 #define _ASM_X86_MISC_H
 
-int num_digits(int val);
+int num_digits(unsigned int val);
 
 #endif /* _ASM_X86_MISC_H */
diff --git a/arch/x86/lib/misc.c b/arch/x86/lib/misc.c
index 40b81c338ae5..9623795b059f 100644
--- a/arch/x86/lib/misc.c
+++ b/arch/x86/lib/misc.c
@@ -2,23 +2,21 @@
 #include <asm/misc.h>
 
 /*
- * Count the digits of @val including a possible sign.
- *
- * (Typed on and submitted from hpa's mobile phone.)
+ * Count the decimal digits of an unsigned integer.
  */
-int num_digits(int val)
+int num_digits(unsigned int x)
 {
-	long long m = 10;
-	int d = 1;
+	int n = 0;
 
-	if (val < 0) {
-		d++;
-		val = -val;
-	}
+	asm("cmp %2,%1; sbb $-2,%0" : "+r" (n) : "r" (x), "g" (10));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000000));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000000));
+	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000000));
 
-	while (val >= m) {
-		m *= 10;
-		d++;
-	}
-	return d;
+	return n;
 }
-- 
2.43.0

Re: [PATCH v4] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by David Laight 2 weeks, 3 days ago

On Wed, 21 Jan 2026 10:39:11 +0100
David Desobry <david.desobry@formalgen.com> wrote:

> The current implementation of num_digits() uses a loop with repeated
> multiplication, which is inefficient. Furthermore, it takes a signed
> integer which leads to undefined behavior when negating INT_MIN.
> 
> As this function is only used for unsigned magnitudes in the kernel code,
> convert the interface to take an unsigned int. This naturally resolves
> the INT_MIN overflow issue.
> 
> Replace the loop with a branchless sequence using inline assembly. By
> using the 'sbb' instruction against the carry flag after comparisons,
> we eliminate all conditional branches. This provides constant-time
> performance and avoids CPU branch misprediction penalties.

Don't do this version, it isn't worth the effort.

	David

> 
> Update the function comment to reflect that signs are no longer handled.
> 
> Signed-off-by: David Desobry <david.desobry@formalgen.com>
> ---
>  v4:
>  - Switched to branchless inline assembly. 
>  - Changed function signature to unsigned int.
>  - Updated prototype in arch/x86/include/asm/misc.h. 
> 
>  arch/x86/include/asm/misc.h |  2 +-
>  arch/x86/lib/misc.c         | 28 +++++++++++++---------------
>  2 files changed, 14 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/include/asm/misc.h b/arch/x86/include/asm/misc.h
> index bb049cca3729..48b6bd7c08b9 100644
> --- a/arch/x86/include/asm/misc.h
> +++ b/arch/x86/include/asm/misc.h
> @@ -2,6 +2,6 @@
>  #ifndef _ASM_X86_MISC_H
>  #define _ASM_X86_MISC_H
>  
> -int num_digits(int val);
> +int num_digits(unsigned int val);
>  
>  #endif /* _ASM_X86_MISC_H */
> diff --git a/arch/x86/lib/misc.c b/arch/x86/lib/misc.c
> index 40b81c338ae5..9623795b059f 100644
> --- a/arch/x86/lib/misc.c
> +++ b/arch/x86/lib/misc.c
> @@ -2,23 +2,21 @@
>  #include <asm/misc.h>
>  
>  /*
> - * Count the digits of @val including a possible sign.
> - *
> - * (Typed on and submitted from hpa's mobile phone.)
> + * Count the decimal digits of an unsigned integer.
>   */
> -int num_digits(int val)
> +int num_digits(unsigned int x)
>  {
> -	long long m = 10;
> -	int d = 1;
> +	int n = 0;
>  
> -	if (val < 0) {
> -		d++;
> -		val = -val;
> -	}
> +	asm("cmp %2,%1; sbb $-2,%0" : "+r" (n) : "r" (x), "g" (10));
> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100));
> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000));
> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000));
> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000));
> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000));
> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000000));
> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000000));
> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000000));
>  
> -	while (val >= m) {
> -		m *= 10;
> -		d++;
> -	}
> -	return d;
> +	return n;
>  }

Re: [PATCH v4] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by H. Peter Anvin 2 weeks, 3 days ago

On January 21, 2026 3:36:43 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
>On Wed, 21 Jan 2026 10:39:11 +0100
>David Desobry <david.desobry@formalgen.com> wrote:
>
>> The current implementation of num_digits() uses a loop with repeated
>> multiplication, which is inefficient. Furthermore, it takes a signed
>> integer which leads to undefined behavior when negating INT_MIN.
>> 
>> As this function is only used for unsigned magnitudes in the kernel code,
>> convert the interface to take an unsigned int. This naturally resolves
>> the INT_MIN overflow issue.
>> 
>> Replace the loop with a branchless sequence using inline assembly. By
>> using the 'sbb' instruction against the carry flag after comparisons,
>> we eliminate all conditional branches. This provides constant-time
>> performance and avoids CPU branch misprediction penalties.
>
>Don't do this version, it isn't worth the effort.
>
>	David
>
>> 
>> Update the function comment to reflect that signs are no longer handled.
>> 
>> Signed-off-by: David Desobry <david.desobry@formalgen.com>
>> ---
>>  v4:
>>  - Switched to branchless inline assembly. 
>>  - Changed function signature to unsigned int.
>>  - Updated prototype in arch/x86/include/asm/misc.h. 
>> 
>>  arch/x86/include/asm/misc.h |  2 +-
>>  arch/x86/lib/misc.c         | 28 +++++++++++++---------------
>>  2 files changed, 14 insertions(+), 16 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/misc.h b/arch/x86/include/asm/misc.h
>> index bb049cca3729..48b6bd7c08b9 100644
>> --- a/arch/x86/include/asm/misc.h
>> +++ b/arch/x86/include/asm/misc.h
>> @@ -2,6 +2,6 @@
>>  #ifndef _ASM_X86_MISC_H
>>  #define _ASM_X86_MISC_H
>>  
>> -int num_digits(int val);
>> +int num_digits(unsigned int val);
>>  
>>  #endif /* _ASM_X86_MISC_H */
>> diff --git a/arch/x86/lib/misc.c b/arch/x86/lib/misc.c
>> index 40b81c338ae5..9623795b059f 100644
>> --- a/arch/x86/lib/misc.c
>> +++ b/arch/x86/lib/misc.c
>> @@ -2,23 +2,21 @@
>>  #include <asm/misc.h>
>>  
>>  /*
>> - * Count the digits of @val including a possible sign.
>> - *
>> - * (Typed on and submitted from hpa's mobile phone.)
>> + * Count the decimal digits of an unsigned integer.
>>   */
>> -int num_digits(int val)
>> +int num_digits(unsigned int x)
>>  {
>> -	long long m = 10;
>> -	int d = 1;
>> +	int n = 0;
>>  
>> -	if (val < 0) {
>> -		d++;
>> -		val = -val;
>> -	}
>> +	asm("cmp %2,%1; sbb $-2,%0" : "+r" (n) : "r" (x), "g" (10));
>> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100));
>> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000));
>> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000));
>> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000));
>> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000));
>> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000000));
>> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000000));
>> +	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000000));
>>  
>> -	while (val >= m) {
>> -		m *= 10;
>> -		d++;
>> -	}
>> -	return d;
>> +	return n;
>>  }
>

No, please, it was a joke.

[PATCH v3] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by David Desobry 2 weeks, 3 days ago

The current implementation of num_digits() uses a loop with repeated
multiplication, which is inefficient. Furthermore, the negation of
the input value "val = -val" causes undefined behavior when val is
INT_MIN, as its absolute value cannot be represented as a 32-bit integer.

Replace the loop with a linear chain of comparisons. This allows the
compiler to generate optimized branch sequences, providing better
performance than the original loop and more efficient assembly than a
switch statement on compilers like Clang. By using an internal unsigned
int to handle the magnitude, we safely handle the INT_MIN case without
relying on 64-bit types or triggering undefined signed overflow.

Removed the outdated comment.

Signed-off-by: David Desobry <david.desobry@formalgen.com>
---
v3:
 - Replaced switch statement with a linear if-chain for better compiler 
   optimization.
 - Simplified sign handling logic to a single if/else block.
 - (v2) Fixed INT_MIN overflow using unsigned magnitude and removed 
   outdated comment.

 arch/x86/lib/misc.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/arch/x86/lib/misc.c b/arch/x86/lib/misc.c
index 40b81c338ae5..3d56583411cf 100644
--- a/arch/x86/lib/misc.c
+++ b/arch/x86/lib/misc.c
@@ -3,22 +3,38 @@
 
 /*
  * Count the digits of @val including a possible sign.
- *
- * (Typed on and submitted from hpa's mobile phone.)
  */
 int num_digits(int val)
 {
-	long long m = 10;
-	int d = 1;
+	unsigned int v;
+	int d;
 
 	if (val < 0) {
-		d++;
-		val = -val;
+		d = 1;
+		v = -val;
+	} else {
+		d = 0;
+		v = val;
 	}
 
-	while (val >= m) {
-		m *= 10;
-		d++;
-	}
-	return d;
+	if (v <= 9)
+		return d + 1;
+	if (v <= 99)
+		return d + 2;
+	if (v <= 999)
+		return d + 3;
+	if (v <= 9999)
+		return d + 4;
+	if (v <= 99999)
+		return d + 5;
+	if (v <= 999999)
+		return d + 6;
+	if (v <= 9999999)
+		return d + 7;
+	if (v <= 99999999)
+		return d + 8;
+	if (v <= 999999999)
+		return d + 9;
+
+	return d + 10;
 }
-- 
2.43.0

Re: [PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

Posted by H. Peter Anvin 2 weeks, 3 days ago

On January 20, 2026 9:47:48 AM PST, David Desobry <david.desobry@formalgen.com> wrote:
>The current implementation of num_digits() uses a loop with repeated
>multiplication, which is inefficient. Furthermore, the negation of
>the input value "val = -val" causes undefined behavior when val is
>INT_MIN, as its absolute value cannot be represented as a 32-bit integer.
>
>Replace the loop with a switch statement using GCC case ranges. This
>allows the compiler to generate a jump table or a series of optimized
>comparisons, providing O(1) performance. By using an unsigned int to
>handle the magnitude, we safely handle the INT_MIN case without
>relying on 64-bit types or undefined signed overflow.
>
>Removed the outdated comment.
>
>Signed-off-by: David Desobry <david.desobry@formalgen.com>
>---
>v2:
> - Replaced loop with switch statement and GCC case ranges.
> - Fixed INT_MIN overflow using unsigned int cast instead of s64/long long.
> - Removed outdated comment regarding mobile submission.
> arch/x86/lib/misc.c | 35 +++++++++++++++++++++++++----------
> 1 file changed, 25 insertions(+), 10 deletions(-)
>
>diff --git a/arch/x86/lib/misc.c b/arch/x86/lib/misc.c
>index 40b81c338ae5..03ba028d5326 100644
>--- a/arch/x86/lib/misc.c
>+++ b/arch/x86/lib/misc.c
>@@ -3,22 +3,37 @@
> 
> /*
>  * Count the digits of @val including a possible sign.
>- *
>- * (Typed on and submitted from hpa's mobile phone.)
>  */
> int num_digits(int val)
> {
>-	long long m = 10;
>-	int d = 1;
>+	unsigned int v = val;
>+	int d = 0;
> 
> 	if (val < 0) {
>-		d++;
>-		val = -val;
>+		d = 1;
>+		v = -v;
> 	}
> 
>-	while (val >= m) {
>-		m *= 10;
>-		d++;
>+	switch (v) {
>+	case 0 ... 9:
>+		return d + 1;
>+	case 10 ... 99:
>+		return d + 2;
>+	case 100 ... 999:
>+		return d + 3;
>+	case 1000 ... 9999:
>+		return d + 4;
>+	case 10000 ... 99999:
>+		return d + 5;
>+	case 100000 ... 999999:
>+		return d + 6;
>+	case 1000000 ... 9999999:
>+		return d + 7;
>+	case 10000000 ... 99999999:
>+		return d + 8;
>+	case 100000000 ... 999999999:
>+		return d + 9;
>+	default:
>+		return d + 10;
> 	}
>-	return d;
> }

Looks good to me 

Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com>