If the product is only 64bits, div64_u64() can be used for the divide.
Replace the pre-multiply check (ilog2(a) + ilog2(b) <= 62) with a
simple post-multiply check that the high 64bits are zero.

This has the advantage of being simpler, more accurate and less code.
It will always be faster when the product is larger than 64bits.

Most 64bit cpus have a native 64x64=128 bit multiply; this is needed
(for the low 64bits) even when div64_u64() is called - so the early
check gains nothing and is just extra code.

32bit cpus will need a compare (etc) to generate the 64bit ilog2()
from two 32bit bit scans - so that is non-trivial.
(Never mind the mess of x86's 'bsr' and any oddball cpu without
fast bit-scan instructions.)
Whereas the additional instructions for the 128bit multiply result
are pretty much one multiply and two adds (typically the 'adc $0,%reg'
can be run in parallel with the instruction that follows).

The only outliers are 64bit systems without a 128bit multiply and
simple in-order 32bit ones with fast bit scan but needing extra
instructions to get the high bits of the multiply result.
I doubt it makes much difference to either; the latter is definitely
not mainstream.

If anyone is worried about the analysis they can look at the
generated code for x86 (especially when cmov isn't used).
Signed-off-by: David Laight <david.laight.linux@gmail.com>
---
Split from patch 3 for v2, unchanged since.
lib/math/div64.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/math/div64.c b/lib/math/div64.c
index 1092f41e878e..7158d141b6e9 100644
--- a/lib/math/div64.c
+++ b/lib/math/div64.c
@@ -186,9 +186,6 @@ EXPORT_SYMBOL(iter_div_u64_rem);
#ifndef mul_u64_u64_div_u64
u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
{
- if (ilog2(a) + ilog2(b) <= 62)
- return div64_u64(a * b, d);
-
#if defined(__SIZEOF_INT128__)
/* native 64x64=128 bits multiplication */
@@ -224,6 +221,9 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
return ~0ULL;
}
+ if (!n_hi)
+ return div64_u64(n_lo, d);
+
int shift = __builtin_ctzll(d);
/* try reducing the fraction in case the dividend becomes <= 64 bits */
--
2.39.5
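
For illustration, here is a standalone sketch of the resulting control flow
when __SIZEOF_INT128__ is available (names simplified, slow path elided, and
the n_hi >= d form of the earlier overflow test assumed from the discussion
below - this is not the actual lib/math/div64.c code):

#include <stdint.h>

typedef uint64_t u64;
typedef unsigned __int128 u128;

/* Stand-in for the kernel's div64_u64(): a plain 64-by-64 divide. */
static u64 div64_u64(u64 n, u64 d)
{
        return n / d;
}

static u64 mul_u64_u64_div_u64_sketch(u64 a, u64 b, u64 d)
{
        /* native 64x64=128 bits multiplication */
        u128 prod = (u128)a * b;
        u64 n_lo = (u64)prod;
        u64 n_hi = (u64)(prod >> 64);

        /* quotient won't fit in 64 bits (always true when d == 0) */
        if (n_hi >= d)
                return ~0ULL;

        /* product fits in 64 bits: the new post-multiply fast path */
        if (!n_hi)
                return div64_u64(n_lo, d);

        /*
         * 128-by-64 slow path (long division in the real code); the
         * compiler's 128-bit divide stands in for it here.
         */
        return (u64)(prod / d);
}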
On Wed, 29 Oct 2025, David Laight wrote:
Comment below.
> @@ -224,6 +221,9 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
> return ~0ULL;
> }
>
> + if (!n_hi)
> + return div64_u64(n_lo, d);
I'd move this before the overflow test. If this is to be taken then
you'll save one test. Same cost otherwise.
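
(Illustrative fragment, reusing the n_hi/n_lo/d names from the sketch above
and the same assumed n_hi >= d overflow test; not the literal patch text.)

        /* As posted: overflow (and zero divisor) test first */
        if (n_hi >= d)
                return ~0ULL;
        if (!n_hi)
                return div64_u64(n_lo, d);

        /*
         * As suggested: fast path first - one test fewer when it is taken,
         * but a zero divisor then reaches div64_u64() instead of returning
         * ~0ULL, which is the consistency point discussed below.
         */
        if (!n_hi)
                return div64_u64(n_lo, d);
        if (n_hi >= d)
                return ~0ULL;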
On Wed, 29 Oct 2025 14:11:08 -0400 (EDT)
Nicolas Pitre <npitre@baylibre.com> wrote:
> On Wed, 29 Oct 2025, David Laight wrote:
>
> > If the product is only 64bits div64_u64() can be used for the divide.
> > Replace the pre-multiply check (ilog2(a) + ilog2(b) <= 62) with a
> > simple post-multiply check that the high 64bits are zero.
> >
> > This has the advantage of being simpler, more accurate and less code.
> > It will always be faster when the product is larger than 64bits.
> >
> > Most 64bit cpu have a native 64x64=128 bit multiply, this is needed
> > (for the low 64bits) even when div64_u64() is called - so the early
> > check gains nothing and is just extra code.
> >
> > 32bit cpu will need a compare (etc) to generate the 64bit ilog2()
> > from two 32bit bit scans - so that is non-trivial.
> > (Never mind the mess of x86's 'bsr' and any oddball cpu without
> > fast bit-scan instructions.)
> > Whereas the additional instructions for the 128bit multiply result
> > are pretty much one multiply and two adds (typically the 'adc $0,%reg'
> > can be run in parallel with the instruction that follows).
> >
> > The only outliers are 64bit systems without 128bit mutiply and
> > simple in order 32bit ones with fast bit scan but needing extra
> > instructions to get the high bits of the multiply result.
> > I doubt it makes much difference to either, the latter is definitely
> > not mainstream.
> >
> > If anyone is worried about the analysis they can look at the
> > generated code for x86 (especially when cmov isn't used).
> >
> > Signed-off-by: David Laight <david.laight.linux@gmail.com>
>
> Comment below.
>
>
> > ---
> >
> > Split from patch 3 for v2, unchanged since.
> >
> > lib/math/div64.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/math/div64.c b/lib/math/div64.c
> > index 1092f41e878e..7158d141b6e9 100644
> > --- a/lib/math/div64.c
> > +++ b/lib/math/div64.c
> > @@ -186,9 +186,6 @@ EXPORT_SYMBOL(iter_div_u64_rem);
> > #ifndef mul_u64_u64_div_u64
> > u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
> > {
> > - if (ilog2(a) + ilog2(b) <= 62)
> > - return div64_u64(a * b, d);
> > -
> > #if defined(__SIZEOF_INT128__)
> >
> > /* native 64x64=128 bits multiplication */
> > @@ -224,6 +221,9 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
> > return ~0ULL;
> > }
> >
> > + if (!n_hi)
> > + return div64_u64(n_lo, d);
>
> I'd move this before the overflow test. If this is to be taken then
> you'll save one test. Same cost otherwise.
>
I wanted the 'divide by zero' result to be consistent.

Additionally the change to stop the x86-64 version panicking on
overflow also makes it return ~0 for divide by zero.
If that is done then this version needs to be consistent and
return ~0 for divide by zero - which div64_u64() won't do.

It is worth remembering that the chance of (a * b + c)/d being ~0
is pretty small (for non-test inputs), and any code that might expect
such a value is likely to have to handle overflow as well.
(Not to mention avoiding overflow of 'a' and 'b'.)
So using ~0 for overflow isn't really a problem.
David
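
(A hypothetical caller following the ~0-means-error convention described
above - illustration only, not code from the tree; mul_u64_u64_div_u64() is
declared in linux/math64.h.)

#include <linux/math64.h>
#include <linux/errno.h>

/* Hypothetical caller: scale 'val' by 'mult/div', treating ~0 as failure. */
static int scale_checked(u64 val, u64 mult, u64 div, u64 *res)
{
        u64 q = mul_u64_u64_div_u64(val, mult, div);

        if (q == ~0ULL)         /* overflow, or div == 0, under this convention */
                return -ERANGE;

        *res = q;
        return 0;
}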
On Fri, 31 Oct 2025, David Laight wrote:
> I wanted the 'divide by zero' result to be consistent.
It is. div64_u64(x, 0) will produce the same result/behavior.
> Additionally the change to stop the x86-64 version panicking on
> overflow also makes it return ~0 for divide by zero.
> If that is done then this version needs to be consistent and
> return ~0 for divide by zero - which div64_u64() won't do.
Well here I disagree. If that is some x86 peculiarity then x86 should
deal with it and not impose it on everybody. At least having most other
architectures raising SIGFPE when encountering a divide by 0 should
provide enough coverage to have such obviously buggy code fixed.
> It is worth remembering that the chance of (a * b + c)/d being ~0
> is pretty small (for non-test inputs), and any code that might expect
> such a value is likely to have to handle overflow as well.
> (Not to mention avoiding overflow of 'a' and 'b'.)
> So using ~0 for overflow isn't really a problem.
It is not.
To be clear I'm not talking about overflow nor divide by zero here. I'm
suggesting that the case where div64_u64() can be used should be tested
first as this is a far more prevalent valid occurrence than a zero
divisor which is not.
Nicolas
On Fri, 31 Oct 2025 13:26:41 -0400 (EDT)
Nicolas Pitre <npitre@baylibre.com> wrote:
> On Fri, 31 Oct 2025, David Laight wrote:
>
> >
> > I wanted the 'divide by zero' result to be consistent.
>
> It is. div64_u64(x, 0) will produce the same result/behavior.
Are you sure, for all architectures?
>
> > Additionally the change to stop the x86-64 version panicking on
> > overflow also makes it return ~0 for divide by zero.
> > If that is done then this version needs to be consistent and
> > return ~0 for divide by zero - which div64_u64() won't do.
>
> Well here I disagree. If that is some x86 peculiarity then x86 should
> deal with it and not impose it on everybody. At least having most other
> architectures raising SIGFPE when encountering a divide by 0 should
> provide enough coverage to have such obviously buggy code fixed.
The issue here is that crashing the kernel isn't really acceptable.

An extra parameter could be added to return the 'status',
but that makes the calling interface horrid.

So returning ~0 on overflow and divide-by-zero makes it possible
for the caller to check for errors.

Ok, you lose ~0 as a valid result - but that is very unlikely to
need to be treated differently to 'overflow'.
>
> > It is worth remembering that the chance of (a * b + c)/d being ~0
> > is pretty small (for non-test inputs), and any code that might expect
> > such a value is likely to have to handle overflow as well.
> > (Not to mention avoiding overflow of 'a' and 'b'.)
> > So using ~0 for overflow isn't really a problem.
>
> It is not.
>
> To be clear I'm not talking about overflow nor divide by zero here. I'm
> suggesting that the case where div64_u64() can be used should be tested
> first as this is a far more prevalent valid occurrence than a zero
> divisor which is not.
and I'd rather use the same error path for 'divide by zero'.
David
>
>
> Nicolas
On Fri, 31 Oct 2025, David Laight wrote:
> On Fri, 31 Oct 2025 13:26:41 -0400 (EDT)
> Nicolas Pitre <npitre@baylibre.com> wrote:
>
> > On Fri, 31 Oct 2025, David Laight wrote:
> >
> > > I wanted the 'divide by zero' result to be consistent.
> >
> > It is. div64_u64(x, 0) will produce the same result/behavior.
>
> Are you sure, for all architectures?
At least all the ones I'm familiar with.
> >
> > > Additionally the change to stop the x86-64 version panicking on
> > > overflow also makes it return ~0 for divide by zero.
> > > If that is done then this version needs to be consistent and
> > > return ~0 for divide by zero - which div64_u64() won't do.
> >
> > Well here I disagree. If that is some x86 peculiarity then x86 should
> > deal with it and not impose it on everybody. At least having most other
> > architectures raising SIGFPE when encountering a divide by 0 should
> > provide enough coverage to have such obviously buggy code fixed.
>
> The issue here is that crashing the kernel isn't really acceptable.
Encountering a div-by-0 _will_ crash the kernel (or at least kill the
current task) with most CPUs. They do raise an exception already with
the other division types. This is no different.
> An extra parameter could be added to return the 'status',
> but that makes the calling interface horrid.
No please.
> So returning ~0 on overflow and divide-by-zero makes it possible
> for the caller to check for errors.
The caller should check for a possible zero divisor _before_ performing
a division, not after. Relying on the div-by-0 CPU behavior is a bug.
> Ok, you lose ~0 as a valid result - but that is very unlikely to
> need to be treated differently to 'overflow'.
I disagree. You need to check for a zero divisor up front and not rely
on the division to tell you about it. This is true whether you do
a = b/c; a = div64_u64(b, c); or a = mul_u64_u64_div_u64(a, b, c);.
Most architectures will simply raise an exception if you attempt a div
by 0, some will return a plain 0. You can't rely on that.
But you need to perform the mul+div before you know there is an
overflow. Maybe the handling of those cases is the same for the caller
but this is certainly not universal.
Nicolas
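
(A hypothetical caller doing the up-front divisor check argued for above -
illustration only, not code from the tree.)

#include <linux/math64.h>
#include <linux/errno.h>

/* Hypothetical caller: reject a zero divisor before doing the mul+div. */
static int scale_strict(u64 val, u64 mult, u64 div, u64 *res)
{
        if (!div)               /* never hand a zero divisor to the helper */
                return -EINVAL;

        *res = mul_u64_u64_div_u64(val, mult, div);
        return 0;
}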
On Fri, 31 Oct 2025 14:45:49 -0400 (EDT)
Nicolas Pitre <npitre@baylibre.com> wrote:
> I disagree. You need to check for a zero divisor up front and not rely
> on the division to tell you about it. This is true whether you do
> a = b/c; a = div64_u64(b, c); or a = mul_u64_u64_div_u64(a, b, c);.
> Most architectures will simply raise an exception if you attempt a div
> by 0, some will return a plain 0. You can't rely on that.
>
> But you need to perform the mul+div before you know there is an
> overflow. Maybe the handling of those cases is the same for the caller
> but this is certainly not universal.
Anyway this is all pretty much irrelevant for this patch.
David