[PATCH] x86/irq: Improve local_irq_restore() code generation and performance

Andrew Cooper posted 1 patch 2 years, 4 months ago
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20211206133828.8811-1-andrew.cooper3@citrix.com
xen/include/asm-x86/system.h | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
[PATCH] x86/irq: Improve local_irq_restore() code generation and performance
Posted by Andrew Cooper 2 years, 4 months ago
popf is a horribly expensive instruction, while sti is an optimised fastpath.
Switching popf for a conditional branch and sti caused an 8% perf improvement
in various Linux measurements.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/include/asm-x86/system.h | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 65e63de69a67..4be235472ecd 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -267,13 +267,8 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
 })
 #define local_irq_restore(x)                                     \
 ({                                                               \
-    BUILD_BUG_ON(sizeof(x) != sizeof(long));                     \
-    asm volatile ( "pushfq\n\t"                                  \
-                   "andq %0, (%%rsp)\n\t"                        \
-                   "orq  %1, (%%rsp)\n\t"                        \
-                   "popfq"                                       \
-                   : : "i?r" ( ~X86_EFLAGS_IF ),                 \
-                       "ri" ( (x) & X86_EFLAGS_IF ) );           \
+    if ( (x) & X86_EFLAGS_IF )                                   \
+        local_irq_enable();                                      \
 })
 
 static inline int local_irq_is_enabled(void)
-- 
2.11.0
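
For illustration, the behavioural difference between the two forms of the macro can be modelled in plain C. This is only a user-space sketch, not Xen code: `sim_eflags`, `sim_restore_popf()`, `sim_restore_sti()` and `sim_models_agree()` are hypothetical names, and a plain variable stands in for the real EFLAGS register, since `sti` and `popf` cannot change IF from unprivileged code.

```c
#include <assert.h>

#define X86_EFLAGS_IF 0x200UL   /* bit 9 of EFLAGS: interrupt enable flag */

/* Hypothetical stand-in for the CPU flags register. */
static unsigned long sim_eflags;

/* Model of the old macro (pushf; and; or; popf): forces IF to the saved value. */
static void sim_restore_popf(unsigned long x)
{
    sim_eflags = (sim_eflags & ~X86_EFLAGS_IF) | (x & X86_EFLAGS_IF);
}

/* Model of the new macro (conditional branch around sti): can only set IF. */
static void sim_restore_sti(unsigned long x)
{
    if ( x & X86_EFLAGS_IF )
        sim_eflags |= X86_EFLAGS_IF;
}

/* Returns 1 if both models leave IF in the same state, 0 otherwise. */
static int sim_models_agree(unsigned long start, unsigned long saved)
{
    unsigned long a, b;

    sim_eflags = start;
    sim_restore_popf(saved);
    a = sim_eflags & X86_EFLAGS_IF;

    sim_eflags = start;
    sim_restore_sti(saved);
    b = sim_eflags & X86_EFLAGS_IF;

    return a == b;
}
```

In this model the two forms agree whenever interrupts are already disabled at the point of the restore, which is the case for a properly paired save/restore. They differ only when interrupts are enabled and the saved value has IF clear, for example `sim_models_agree(X86_EFLAGS_IF, 0)` returns 0.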


Re: [PATCH] x86/irq: Improve local_irq_restore() code generation and performance
Posted by Andrew Cooper 2 years, 4 months ago
On 06/12/2021 13:38, Andrew Cooper wrote:
> popf is a horribly expensive instruction, while sti is an optimised fastpath.
> Switching popf for a conditional branch and sti caused an 8% perf improvement
> in various linux measurements.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> ---
>  xen/include/asm-x86/system.h | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
> index 65e63de69a67..4be235472ecd 100644
> --- a/xen/include/asm-x86/system.h
> +++ b/xen/include/asm-x86/system.h
> @@ -267,13 +267,8 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
>  })
>  #define local_irq_restore(x)                                     \
>  ({                                                               \
> -    BUILD_BUG_ON(sizeof(x) != sizeof(long));                     \
> -    asm volatile ( "pushfq\n\t"                                  \
> -                   "andq %0, (%%rsp)\n\t"                        \
> -                   "orq  %1, (%%rsp)\n\t"                        \
> -                   "popfq"                                       \
> -                   : : "i?r" ( ~X86_EFLAGS_IF ),                 \
> -                       "ri" ( (x) & X86_EFLAGS_IF ) );           \
> +    if ( (x) & X86_EFLAGS_IF )                                   \
> +        local_irq_enable();                                      \
>  })

Bah.  There's still the one total abuse of local_irq_restore() to
disable interrupts.

I'll do a pre-requisite patch.

~Andrew

Re: [PATCH] x86/irq: Improve local_irq_restore() code generation and performance
Posted by Jan Beulich 2 years, 4 months ago
On 06.12.2021 14:55, Andrew Cooper wrote:
> On 06/12/2021 13:38, Andrew Cooper wrote:
>> popf is a horribly expensive instruction, while sti is an optimised fastpath.
>> Switching popf for a conditional branch and sti caused an 8% perf improvement
>> in various linux measurements.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Wei Liu <wl@xen.org>
>> ---
>>  xen/include/asm-x86/system.h | 9 ++-------
>>  1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
>> index 65e63de69a67..4be235472ecd 100644
>> --- a/xen/include/asm-x86/system.h
>> +++ b/xen/include/asm-x86/system.h
>> @@ -267,13 +267,8 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
>>  })
>>  #define local_irq_restore(x)                                     \
>>  ({                                                               \
>> -    BUILD_BUG_ON(sizeof(x) != sizeof(long));                     \
>> -    asm volatile ( "pushfq\n\t"                                  \
>> -                   "andq %0, (%%rsp)\n\t"                        \
>> -                   "orq  %1, (%%rsp)\n\t"                        \
>> -                   "popfq"                                       \
>> -                   : : "i?r" ( ~X86_EFLAGS_IF ),                 \
>> -                       "ri" ( (x) & X86_EFLAGS_IF ) );           \
>> +    if ( (x) & X86_EFLAGS_IF )                                   \
>> +        local_irq_enable();                                      \
>>  })
> 
> Bah.  There's still the one total abuse of local_irq_restore() to
> disable interrupts.

Question is whether that's really to be considered an abuse: To me
"restore" doesn't mean only "maybe re-enable", but also "maybe
re-disable". And a conditional STI-or-CLI is likely still to be better
than POPF.

Jan
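
A conditional STI-or-CLI variant, as suggested above, might look roughly like the following. This is a user-space model under the assumption that a plain variable can stand in for EFLAGS; `sim_eflags`, `sim_sti()`, `sim_cli()`, `sim_restore_sti_or_cli()` and `sim_if_after()` are hypothetical names and not part of Xen.

```c
#include <assert.h>

#define X86_EFLAGS_IF 0x200UL   /* bit 9 of EFLAGS: interrupt enable flag */

/* Hypothetical stand-in for the CPU flags register. */
static unsigned long sim_eflags;

static void sim_sti(void) { sim_eflags |= X86_EFLAGS_IF; }   /* models sti */
static void sim_cli(void) { sim_eflags &= ~X86_EFLAGS_IF; }  /* models cli */

/* Restore in both directions: "maybe re-enable" and "maybe re-disable". */
static void sim_restore_sti_or_cli(unsigned long x)
{
    if ( x & X86_EFLAGS_IF )
        sim_sti();
    else
        sim_cli();
}

/* Returns the IF bit after restoring 'saved' on top of the 'start' state. */
static unsigned long sim_if_after(unsigned long start, unsigned long saved)
{
    sim_eflags = start;
    sim_restore_sti_or_cli(saved);
    return sim_eflags & X86_EFLAGS_IF;
}
```

Unlike the STI-only form, this variant always ends with IF equal to the saved value, at the cost of a second instruction on the not-taken side of the branch.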


Re: [PATCH] x86/irq: Improve local_irq_restore() code generation and performance
Posted by Andrew Cooper 2 years, 4 months ago
On 06/12/2021 14:07, Jan Beulich wrote:
> On 06.12.2021 14:55, Andrew Cooper wrote:
>> On 06/12/2021 13:38, Andrew Cooper wrote:
>>> popf is a horribly expensive instruction, while sti is an optimised fastpath.
>>> Switching popf for a conditional branch and sti caused an 8% perf improvement
>>> in various linux measurements.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>> CC: Jan Beulich <JBeulich@suse.com>
>>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>> CC: Wei Liu <wl@xen.org>
>>> ---
>>>  xen/include/asm-x86/system.h | 9 ++-------
>>>  1 file changed, 2 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
>>> index 65e63de69a67..4be235472ecd 100644
>>> --- a/xen/include/asm-x86/system.h
>>> +++ b/xen/include/asm-x86/system.h
>>> @@ -267,13 +267,8 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
>>>  })
>>>  #define local_irq_restore(x)                                     \
>>>  ({                                                               \
>>> -    BUILD_BUG_ON(sizeof(x) != sizeof(long));                     \
>>> -    asm volatile ( "pushfq\n\t"                                  \
>>> -                   "andq %0, (%%rsp)\n\t"                        \
>>> -                   "orq  %1, (%%rsp)\n\t"                        \
>>> -                   "popfq"                                       \
>>> -                   : : "i?r" ( ~X86_EFLAGS_IF ),                 \
>>> -                       "ri" ( (x) & X86_EFLAGS_IF ) );           \
>>> +    if ( (x) & X86_EFLAGS_IF )                                   \
>>> +        local_irq_enable();                                      \
>>>  })
>> Bah.  There's still the one total abuse of local_irq_restore() to
>> disable interrupts.
> Question is whether that's really to be considered an abuse:

These are Linux's APIs, not ours, and they've spoken on the matter. 
Furthermore, I agree with this being an abuse of the mechanism.

>  To me
> "restore" doesn't mean only "maybe re-enable", but also "maybe
> re-disable".

nor does "save" mean "save and disable", but that's what it does.

The naming may not be completely ideal, but the expected usage is very
much one way.

>  And a conditional STI-or-CLI is likely still to be better
> than POPF.

It likely is better than popf, but for one single abuse which can be
written in a better way anyway, it's really not worth it.

~Andrew
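
The "one way" usage described above can be sketched as a paired save/restore, where the save always disables interrupts and the restore therefore only ever needs to re-enable them. This is a simplified user-space model: the function names are hypothetical, and `sim_irq_save()` returns the flags rather than writing them through a macro argument as the real `local_irq_save()` does.

```c
#include <assert.h>

#define X86_EFLAGS_IF 0x200UL   /* bit 9 of EFLAGS: interrupt enable flag */

/* Hypothetical stand-in for the CPU flags register; interrupts start enabled. */
static unsigned long sim_eflags = X86_EFLAGS_IF;

/* Model of "save": record the current flags, then disable interrupts. */
static unsigned long sim_irq_save(void)
{
    unsigned long flags = sim_eflags;

    sim_eflags &= ~X86_EFLAGS_IF;
    return flags;
}

/* Model of the STI-only "restore": re-enable if the saved flags had IF set. */
static void sim_irq_restore(unsigned long flags)
{
    if ( flags & X86_EFLAGS_IF )
        sim_eflags |= X86_EFLAGS_IF;
}

static int sim_irqs_enabled(void)
{
    return !!(sim_eflags & X86_EFLAGS_IF);
}

/* Two nested critical sections; returns the IF state after the inner restore. */
static int sim_nested_sections(void)
{
    unsigned long outer, inner;
    int enabled_after_inner;

    outer = sim_irq_save();      /* saves "enabled", disables */
    inner = sim_irq_save();      /* saves "disabled", stays disabled */
    sim_irq_restore(inner);      /* IF clear in saved flags: no-op */
    enabled_after_inner = sim_irqs_enabled();
    sim_irq_restore(outer);      /* IF set in saved flags: re-enables */

    return enabled_after_inner;
}
```

With this pairing, interrupts are always off when the restore runs, so the STI-only form gives the same result as writing the saved flags back wholesale.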

Re: [PATCH] x86/irq: Improve local_irq_restore() code generation and performance
Posted by Jan Beulich 2 years, 4 months ago
On 06.12.2021 16:10, Andrew Cooper wrote:
> On 06/12/2021 14:07, Jan Beulich wrote:
>> On 06.12.2021 14:55, Andrew Cooper wrote:
>>> On 06/12/2021 13:38, Andrew Cooper wrote:
>>>> popf is a horribly expensive instruction, while sti is an optimised fastpath.
>>>> Switching popf for a conditional branch and sti caused an 8% perf improvement
>>>> in various linux measurements.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> ---
>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>>> CC: Wei Liu <wl@xen.org>
>>>> ---
>>>>  xen/include/asm-x86/system.h | 9 ++-------
>>>>  1 file changed, 2 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
>>>> index 65e63de69a67..4be235472ecd 100644
>>>> --- a/xen/include/asm-x86/system.h
>>>> +++ b/xen/include/asm-x86/system.h
>>>> @@ -267,13 +267,8 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
>>>>  })
>>>>  #define local_irq_restore(x)                                     \
>>>>  ({                                                               \
>>>> -    BUILD_BUG_ON(sizeof(x) != sizeof(long));                     \
>>>> -    asm volatile ( "pushfq\n\t"                                  \
>>>> -                   "andq %0, (%%rsp)\n\t"                        \
>>>> -                   "orq  %1, (%%rsp)\n\t"                        \
>>>> -                   "popfq"                                       \
>>>> -                   : : "i?r" ( ~X86_EFLAGS_IF ),                 \
>>>> -                       "ri" ( (x) & X86_EFLAGS_IF ) );           \
>>>> +    if ( (x) & X86_EFLAGS_IF )                                   \
>>>> +        local_irq_enable();                                      \
>>>>  })
>>> Bah.  There's still the one total abuse of local_irq_restore() to
>>> disable interrupts.
>> Question is whether that's really to be considered an abuse:
> 
> These are Linux's APIs, not ours, and they've spoken on the matter. 
> Furthermore, I agree with this being an abuse of the mechanism.
> 
>>  To me
>> "restore" doesn't mean only "maybe re-enable", but also "maybe
>> re-disable".
> 
> nor does "save" mean "save and disable", but that's what it does.
> 
> The naming may not be completely ideal, but the expected usage is very
> much one way.
> 
>>  And a conditional STI-or-CLI is likely still to be better
>> than POPF.
> 
> It likely is better than popf, but for one single abuse which can be
> written in a better way anyway, it's really not worth it.

Fine with me as long as we can be very certain that it's really only
one such case.

Jan