The asm in arch_ffs() is safe but inefficient.
CMOV would be an improvement over a conditional branch, but for 64bit CPUs
both Intel and AMD have provided enough details about the behaviour for a zero
input. It is safe to pre-load the destination register with -1 and drop the
conditional logic.
However, it is common to find ffs() in a context where the optimiser knows
that x in nonzero even if it the value isn't known precisely, and in that case
it's safe to drop the preload of -1 too.
There are only a handful of uses of ffs() in the x86 build, and all of them
improve as a result of this:
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-31 (-31)
Function old new delta
mask_write 114 107 -7
xmem_pool_alloc 1063 1039 -24
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Bertrand Marquis <bertrand.marquis@arm.com>
CC: Michal Orzel <michal.orzel@amd.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
CC: Shawn Anastasio <sanastasio@raptorengineering.com>
CC: consulting@bugseng.com <consulting@bugseng.com>
CC: Simone Ballarin <simone.ballarin@bugseng.com>
CC: Federico Serafini <federico.serafini@bugseng.com>
CC: Nicola Vetrini <nicola.vetrini@bugseng.com>
v2:
* New.
* Use __builtin_constant_p(x > 0) to optimise better.
---
xen/arch/x86/include/asm/bitops.h | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/xen/arch/x86/include/asm/bitops.h b/xen/arch/x86/include/asm/bitops.h
index 122767fc0d10..1d7aea6065ef 100644
--- a/xen/arch/x86/include/asm/bitops.h
+++ b/xen/arch/x86/include/asm/bitops.h
@@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x)
static always_inline unsigned int arch_ffs(unsigned int x)
{
- int r;
+ unsigned int r;
+
+ if ( __builtin_constant_p(x > 0) && x > 0 )
+ {
+ /* Safe, when the compiler knows that x is nonzero. */
+ asm ( "bsf %[val], %[res]"
+ : [res] "=r" (r)
+ : [val] "rm" (x) );
+ }
+ else
+ {
+ /*
+ * The AMD manual states that BSF won't modify the destination
+ * register if x=0. The Intel manual states that the result is
+ * undefined, but the architects have said that the register is
+ * written back with it's old value (zero extended as normal).
+ */
+ asm ( "bsf %[val], %[res]"
+ : [res] "=r" (r)
+ : [val] "rm" (x), "[res]" (-1) );
+ }
- asm ( "bsf %1,%0\n\t"
- "jnz 1f\n\t"
- "mov $-1,%0\n"
- "1:" : "=r" (r) : "rm" (x));
return r + 1;
}
#define arch_ffs arch_ffs
--
2.30.2
On 24.05.2024 22:03, Andrew Cooper wrote: > --- a/xen/arch/x86/include/asm/bitops.h > +++ b/xen/arch/x86/include/asm/bitops.h > @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x) > > static always_inline unsigned int arch_ffs(unsigned int x) > { > - int r; > + unsigned int r; > + > + if ( __builtin_constant_p(x > 0) && x > 0 ) > + { > + /* Safe, when the compiler knows that x is nonzero. */ > + asm ( "bsf %[val], %[res]" > + : [res] "=r" (r) > + : [val] "rm" (x) ); > + } In patch 11 relevant things are all in a single patch, making it easier to spot that this is dead code: The sole caller already has a __builtin_constant_p(), hence I don't see how the one here could ever return true. With that the respective part of the description is then questionable, too, I'm afraid: Where did you observe any actual effect from this? Or if you did - what am I missing? Jan
On 27.05.2024 15:27, Jan Beulich wrote: > On 24.05.2024 22:03, Andrew Cooper wrote: >> --- a/xen/arch/x86/include/asm/bitops.h >> +++ b/xen/arch/x86/include/asm/bitops.h >> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x) >> >> static always_inline unsigned int arch_ffs(unsigned int x) >> { >> - int r; >> + unsigned int r; >> + >> + if ( __builtin_constant_p(x > 0) && x > 0 ) >> + { >> + /* Safe, when the compiler knows that x is nonzero. */ >> + asm ( "bsf %[val], %[res]" >> + : [res] "=r" (r) >> + : [val] "rm" (x) ); >> + } > > In patch 11 relevant things are all in a single patch, making it easier > to spot that this is dead code: The sole caller already has a > __builtin_constant_p(), hence I don't see how the one here could ever > return true. With that the respective part of the description is then > questionable, too, I'm afraid: Where did you observe any actual effect > from this? Or if you did - what am I missing? Hmm, thinking about it: I suppose that's why you have __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit I'm (positively) surprised that the former may return true when the latter doesn't. Nevertheless I'm inclined to think this deserves a brief comment. As an aside, to better match the comment inside the if()'s body, how about if ( __builtin_constant_p(!!x) && x ) ? That also may make a little more clear that this isn't just a style choice, but actually needed for the intended purpose. Jan
On 27/05/2024 2:37 pm, Jan Beulich wrote: > On 27.05.2024 15:27, Jan Beulich wrote: >> On 24.05.2024 22:03, Andrew Cooper wrote: >>> --- a/xen/arch/x86/include/asm/bitops.h >>> +++ b/xen/arch/x86/include/asm/bitops.h >>> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x) >>> >>> static always_inline unsigned int arch_ffs(unsigned int x) >>> { >>> - int r; >>> + unsigned int r; >>> + >>> + if ( __builtin_constant_p(x > 0) && x > 0 ) >>> + { >>> + /* Safe, when the compiler knows that x is nonzero. */ >>> + asm ( "bsf %[val], %[res]" >>> + : [res] "=r" (r) >>> + : [val] "rm" (x) ); >>> + } >> In patch 11 relevant things are all in a single patch, making it easier >> to spot that this is dead code: The sole caller already has a >> __builtin_constant_p(), hence I don't see how the one here could ever >> return true. With that the respective part of the description is then >> questionable, too, I'm afraid: Where did you observe any actual effect >> from this? Or if you did - what am I missing? > Hmm, thinking about it: I suppose that's why you have > __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit > I'm (positively) surprised that the former may return true when the latter > doesn't. So was I, but this recommendation came straight from the GCC mailing list. And it really does work, even back in obsolete versions of GCC. __builtin_constant_p() operates on an expression not a value, and is documented as such. > Nevertheless I'm inclined to think this deserves a brief comment. There is a comment, and it's even visible in the snippet. > As an aside, to better match the comment inside the if()'s body, how about > > if ( __builtin_constant_p(!!x) && x ) > > ? That also may make a little more clear that this isn't just a style > choice, but actually needed for the intended purpose. I am not changing the logic. Apart from anything else, your suggestion is trivially buggy. I care about whether the RHS collapses to a constant, and the only way of doing that correctly is asking the compiler about the *exact* expression. Asking about some other expression which you hope - but do not know - that the compiler will treat equivalently is bogus. It would be strictly better to only take the else clause, than to have both halves emitted. This is the form I've tested extensively. It's also the clearest form IMO. You can experiment with alternative forms when we're not staring down code freeze of 4.19. ~Andrew
On 28.05.2024 14:30, Andrew Cooper wrote: > On 27/05/2024 2:37 pm, Jan Beulich wrote: >> On 27.05.2024 15:27, Jan Beulich wrote: >>> On 24.05.2024 22:03, Andrew Cooper wrote: >>>> --- a/xen/arch/x86/include/asm/bitops.h >>>> +++ b/xen/arch/x86/include/asm/bitops.h >>>> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x) >>>> >>>> static always_inline unsigned int arch_ffs(unsigned int x) >>>> { >>>> - int r; >>>> + unsigned int r; >>>> + >>>> + if ( __builtin_constant_p(x > 0) && x > 0 ) >>>> + { >>>> + /* Safe, when the compiler knows that x is nonzero. */ >>>> + asm ( "bsf %[val], %[res]" >>>> + : [res] "=r" (r) >>>> + : [val] "rm" (x) ); >>>> + } >>> In patch 11 relevant things are all in a single patch, making it easier >>> to spot that this is dead code: The sole caller already has a >>> __builtin_constant_p(), hence I don't see how the one here could ever >>> return true. With that the respective part of the description is then >>> questionable, too, I'm afraid: Where did you observe any actual effect >>> from this? Or if you did - what am I missing? >> Hmm, thinking about it: I suppose that's why you have >> __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit >> I'm (positively) surprised that the former may return true when the latter >> doesn't. > > So was I, but this recommendation came straight from the GCC mailing > list. And it really does work, even back in obsolete versions of GCC. > > __builtin_constant_p() operates on an expression not a value, and is > documented as such. Of course. >> Nevertheless I'm inclined to think this deserves a brief comment. > > There is a comment, and it's even visible in the snippet. The comment is about the asm(); it is neither placed to clearly relate to __builtin_constant_p(), nor is it saying anything about this specific property of it. You said you were equally surprised; don't you think that when both of us are surprised, a specific (even if brief) comment is warranted? >> As an aside, to better match the comment inside the if()'s body, how about >> >> if ( __builtin_constant_p(!!x) && x ) >> >> ? That also may make a little more clear that this isn't just a style >> choice, but actually needed for the intended purpose. > > I am not changing the logic. > > Apart from anything else, your suggestion is trivially buggy. I care > about whether the RHS collapses to a constant, and the only way of doing > that correctly is asking the compiler about the *exact* expression. > Asking about some other expression which you hope - but do not know - > that the compiler will treat equivalently is bogus. It would be > strictly better to only take the else clause, than to have both halves > emitted. > > This is the form I've tested extensively. It's also the clearest form > IMO. You can experiment with alternative forms when we're not staring > down code freeze of 4.19. "Clearest form" is almost always a matter of taste. To me, comparing unsigned values with > or < against 0 is generally at least suspicious. Using != is typically better (again: imo), and simply omitting the != 0 then is shorter with no difference in effect. Except in peculiar cases like this one, where indeed it took me some time to figure why the comparison operator may not be omitted. All that said: I'm not going to insist on any change; the R-b previously offered still stands. I would highly appreciate though if the (further) comment asked for could be added. What I definitely dislike here is you - not for the first time - turning down remarks because a change of yours is late. This feels even more so bad when considering that I'm typically replying to your patches with pretty little turnaround. Whereas various of mine, pending in part for years, do not seem to deserve any review comments at all. Unlike before, where it was "only" improvements or feature additions, meanwhile even bug fixes are left sit like that. If I may be blunt: This may not work this way for much longer. At some point I will need to artificially delay reviews, making them dependent on my own work also being allowed to make progress. I question though whether that would be in everyone's interest. Jan
On 28/05/2024 2:12 pm, Jan Beulich wrote: > On 28.05.2024 14:30, Andrew Cooper wrote: >> On 27/05/2024 2:37 pm, Jan Beulich wrote: >>> On 27.05.2024 15:27, Jan Beulich wrote: >>>> On 24.05.2024 22:03, Andrew Cooper wrote: >>>>> --- a/xen/arch/x86/include/asm/bitops.h >>>>> +++ b/xen/arch/x86/include/asm/bitops.h >>>>> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x) >>>>> >>>>> static always_inline unsigned int arch_ffs(unsigned int x) >>>>> { >>>>> - int r; >>>>> + unsigned int r; >>>>> + >>>>> + if ( __builtin_constant_p(x > 0) && x > 0 ) >>>>> + { >>>>> + /* Safe, when the compiler knows that x is nonzero. */ >>>>> + asm ( "bsf %[val], %[res]" >>>>> + : [res] "=r" (r) >>>>> + : [val] "rm" (x) ); >>>>> + } >>>> In patch 11 relevant things are all in a single patch, making it easier >>>> to spot that this is dead code: The sole caller already has a >>>> __builtin_constant_p(), hence I don't see how the one here could ever >>>> return true. With that the respective part of the description is then >>>> questionable, too, I'm afraid: Where did you observe any actual effect >>>> from this? Or if you did - what am I missing? >>> Hmm, thinking about it: I suppose that's why you have >>> __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit >>> I'm (positively) surprised that the former may return true when the latter >>> doesn't. >> So was I, but this recommendation came straight from the GCC mailing >> list. And it really does work, even back in obsolete versions of GCC. >> >> __builtin_constant_p() operates on an expression not a value, and is >> documented as such. > Of course. > >>> Nevertheless I'm inclined to think this deserves a brief comment. >> There is a comment, and it's even visible in the snippet. > The comment is about the asm(); it is neither placed to clearly relate > to __builtin_constant_p(), nor is it saying anything about this specific > property of it. You said you were equally surprised; don't you think > that when both of us are surprised, a specific (even if brief) comment > is warranted? Spell it out for me like I'm an idiot. Because I'm looking at the patch I submitted, and at your request for "a brief comment", and I still have no idea what you think is wrong at the moment. I'm also not included to write a comment saying "go and read the GCC manual more carefully". > >>> As an aside, to better match the comment inside the if()'s body, how about >>> >>> if ( __builtin_constant_p(!!x) && x ) >>> >>> ? That also may make a little more clear that this isn't just a style >>> choice, but actually needed for the intended purpose. >> I am not changing the logic. >> >> Apart from anything else, your suggestion is trivially buggy. I care >> about whether the RHS collapses to a constant, and the only way of doing >> that correctly is asking the compiler about the *exact* expression. >> Asking about some other expression which you hope - but do not know - >> that the compiler will treat equivalently is bogus. It would be >> strictly better to only take the else clause, than to have both halves >> emitted. >> >> This is the form I've tested extensively. It's also the clearest form >> IMO. You can experiment with alternative forms when we're not staring >> down code freeze of 4.19. > "Clearest form" is almost always a matter of taste. To me, comparing > unsigned values with > or < against 0 is generally at least suspicious. > Using != is typically better (again: imo), and simply omitting the != 0 > then is shorter with no difference in effect. Except in peculiar cases > like this one, where indeed it took me some time to figure why the > comparison operator may not be omitted. > > All that said: I'm not going to insist on any change; the R-b previously > offered still stands. I would highly appreciate though if the (further) > comment asked for could be added. > > What I definitely dislike here is you - not for the first time - turning > down remarks because a change of yours is late. Actually it's not to do with the release. I'd reject it at any point because it's an unreasonable request to make; to me, or to anyone else. It would be a matter of taste (which again you have a singular view on), if it wasn't for the fact that what you actually said was: "I don't like it, and you should discard all the careful analysis you did because here's a form I prefer, that I haven't tested concerning a behaviour I didn't even realise until this email." and even if it wasn't a buggy suggestion to begin with, it's still toxic maintainer feedback. Frankly, I'd have more time to review other peoples patches if I wasn't wasting all of my time on premium grade manure like this, while trying to help Oleksii who's had it far worse this release trying to clean up droppings of maintainers-past. ~Andrew
On 01.06.2024 03:47, Andrew Cooper wrote: > On 28/05/2024 2:12 pm, Jan Beulich wrote: >> On 28.05.2024 14:30, Andrew Cooper wrote: >>> On 27/05/2024 2:37 pm, Jan Beulich wrote: >>>> On 27.05.2024 15:27, Jan Beulich wrote: >>>>> On 24.05.2024 22:03, Andrew Cooper wrote: >>>>>> --- a/xen/arch/x86/include/asm/bitops.h >>>>>> +++ b/xen/arch/x86/include/asm/bitops.h >>>>>> @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x) >>>>>> >>>>>> static always_inline unsigned int arch_ffs(unsigned int x) >>>>>> { >>>>>> - int r; >>>>>> + unsigned int r; >>>>>> + >>>>>> + if ( __builtin_constant_p(x > 0) && x > 0 ) >>>>>> + { >>>>>> + /* Safe, when the compiler knows that x is nonzero. */ >>>>>> + asm ( "bsf %[val], %[res]" >>>>>> + : [res] "=r" (r) >>>>>> + : [val] "rm" (x) ); >>>>>> + } >>>>> In patch 11 relevant things are all in a single patch, making it easier >>>>> to spot that this is dead code: The sole caller already has a >>>>> __builtin_constant_p(), hence I don't see how the one here could ever >>>>> return true. With that the respective part of the description is then >>>>> questionable, too, I'm afraid: Where did you observe any actual effect >>>>> from this? Or if you did - what am I missing? >>>> Hmm, thinking about it: I suppose that's why you have >>>> __builtin_constant_p(x > 0), not __builtin_constant_p(x). I have to admit >>>> I'm (positively) surprised that the former may return true when the latter >>>> doesn't. >>> So was I, but this recommendation came straight from the GCC mailing >>> list. And it really does work, even back in obsolete versions of GCC. >>> >>> __builtin_constant_p() operates on an expression not a value, and is >>> documented as such. >> Of course. >> >>>> Nevertheless I'm inclined to think this deserves a brief comment. >>> There is a comment, and it's even visible in the snippet. >> The comment is about the asm(); it is neither placed to clearly relate >> to __builtin_constant_p(), nor is it saying anything about this specific >> property of it. You said you were equally surprised; don't you think >> that when both of us are surprised, a specific (even if brief) comment >> is warranted? > > Spell it out for me like I'm an idiot. > > Because I'm looking at the patch I submitted, and at your request for "a > brief comment", and I still have no idea what you think is wrong at the > moment. > > I'm also not included to write a comment saying "go and read the GCC > manual more carefully". > >> >>>> As an aside, to better match the comment inside the if()'s body, how about >>>> >>>> if ( __builtin_constant_p(!!x) && x ) >>>> >>>> ? That also may make a little more clear that this isn't just a style >>>> choice, but actually needed for the intended purpose. >>> I am not changing the logic. >>> >>> Apart from anything else, your suggestion is trivially buggy. I care >>> about whether the RHS collapses to a constant, and the only way of doing >>> that correctly is asking the compiler about the *exact* expression. >>> Asking about some other expression which you hope - but do not know - >>> that the compiler will treat equivalently is bogus. It would be >>> strictly better to only take the else clause, than to have both halves >>> emitted. >>> >>> This is the form I've tested extensively. It's also the clearest form >>> IMO. You can experiment with alternative forms when we're not staring >>> down code freeze of 4.19. >> "Clearest form" is almost always a matter of taste. To me, comparing >> unsigned values with > or < against 0 is generally at least suspicious. >> Using != is typically better (again: imo), and simply omitting the != 0 >> then is shorter with no difference in effect. Except in peculiar cases >> like this one, where indeed it took me some time to figure why the >> comparison operator may not be omitted. >> >> All that said: I'm not going to insist on any change; the R-b previously >> offered still stands. I would highly appreciate though if the (further) >> comment asked for could be added. >> >> What I definitely dislike here is you - not for the first time - turning >> down remarks because a change of yours is late. > > Actually it's not to do with the release. I'd reject it at any point > because it's an unreasonable request to make; to me, or to anyone else. > > It would be a matter of taste (which again you have a singular view on), > if it wasn't for the fact that what you actually said was: > > "I don't like it, and you should discard all the careful analysis you > did because here's a form I prefer, that I haven't tested concerning a > behaviour I didn't even realise until this email." Just to clarify: Long before this reply of yours I understood and admitted my mistake. A more clear / well placed comment (see further up) might have avoided that. Still - thanks for extending the comment in what you have committed. > and even if it wasn't a buggy suggestion to begin with, it's still toxic > maintainer feedback. What's toxic about making a mistake? What's toxic about disliking "x > 0" for unsigned quantities? As you say, it's a matter of taste to a fair degree. Yet there are ample cases where taste as in "make it as clear as possible to every reader" is used to ask me or others to change style. I don't see why I shouldn't be permitted to at least make a similar remark, even if then it's turned down (for good or bad reasons). Jan
On 24.05.2024 22:03, Andrew Cooper wrote: > The asm in arch_ffs() is safe but inefficient. > > CMOV would be an improvement over a conditional branch, but for 64bit CPUs > both Intel and AMD have provided enough details about the behaviour for a zero > input. It is safe to pre-load the destination register with -1 and drop the > conditional logic. > > However, it is common to find ffs() in a context where the optimiser knows > that x in nonzero even if it the value isn't known precisely, and in that case > it's safe to drop the preload of -1 too. > > There are only a handful of uses of ffs() in the x86 build, and all of them > improve as a result of this: > > add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-31 (-31) > Function old new delta > mask_write 114 107 -7 > xmem_pool_alloc 1063 1039 -24 > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> with one suggestion: > --- a/xen/arch/x86/include/asm/bitops.h > +++ b/xen/arch/x86/include/asm/bitops.h > @@ -432,12 +432,28 @@ static inline int ffsl(unsigned long x) > > static always_inline unsigned int arch_ffs(unsigned int x) > { > - int r; > + unsigned int r; > + > + if ( __builtin_constant_p(x > 0) && x > 0 ) __builtin_constant_p(x) surely will do. In fact even the other "> 0" could in principle be left out here. Jan
© 2016 - 2024 Red Hat, Inc.