[PATCH] xen/arm: p2m_set_entry duplicate calculation.

Paran Lee posted 1 patch 2 years ago
Test gitlab-ci passed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20220421151755.GA4718@DESKTOP-NK4TH6S.localdomain
xen/arch/arm/p2m.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
[PATCH] xen/arm: p2m_set_entry duplicate calculation.
Posted by Paran Lee 2 years ago
It doesn't seem necessary to do that calculation of order shift again.

Signed-off-by: Paran Lee <p4ranlee@gmail.com>
---
 xen/arch/arm/p2m.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 1d1059f7d2..533afc830a 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m,
     while ( nr )
     {
         unsigned long mask;
-        unsigned long order;
+        unsigned long order, pages;
 
         /*
          * Don't take into account the MFN when removing mapping (i.e
@@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m,
         if ( rc )
             break;
 
-        sgfn = gfn_add(sgfn, (1 << order));
+        pages = 1 << order;
+        sgfn = gfn_add(sgfn, pages);
         if ( !mfn_eq(smfn, INVALID_MFN) )
-           smfn = mfn_add(smfn, (1 << order));
+           smfn = mfn_add(smfn, pages);
 
-        nr -= (1 << order);
+        nr -= pages;
     }
 
     return rc;
-- 
2.25.1
Re: [PATCH] xen/arm: p2m_set_entry duplicate calculation.
Posted by Julien Grall 2 years ago
Hi,

On 21/04/2022 16:17, Paran Lee wrote:
> It doesn't seem necessary to do that calculation of order shift again.

I think we need to weight that against increasing the number of local 
variables that do pretty much the same.

This is pretty much done to a matter of taste here. IMHO, the original 
version is better but I see Stefano reviewed it so I will not argue 
against it.

That said, given you already sent a few patches, can you explain why you 
are doing this? Is this optimization purpose? Is it clean-up?

> 
> Signed-off-by: Paran Lee <p4ranlee@gmail.com>
> ---
>   xen/arch/arm/p2m.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 1d1059f7d2..533afc830a 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m,
>       while ( nr )
>       {
>           unsigned long mask;
> -        unsigned long order;
> +        unsigned long order, pages;
>   
>           /*
>            * Don't take into account the MFN when removing mapping (i.e
> @@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m,
>           if ( rc )
>               break;
>   
> -        sgfn = gfn_add(sgfn, (1 << order));
> +        pages = 1 << order;

Please take the opportunity to switch to 1UL.

> +        sgfn = gfn_add(sgfn, pages);
>           if ( !mfn_eq(smfn, INVALID_MFN) )
> -           smfn = mfn_add(smfn, (1 << order));
> +           smfn = mfn_add(smfn, pages);
>   
> -        nr -= (1 << order);
> +        nr -= pages;
>       }
>   
>       return rc;

Cheers,

-- 
Julien Grall
Re: [PATCH] xen/arm: p2m_set_entry duplicate calculation.
Posted by Paran Lee 2 years ago
Hello, Julien Grall.

Thanks you, I agreed! It made me think once more about what my patch
could improve.
patches I sent have been reviewed in various ways. It was a good
opportunity to analyze my patch from various perspectives. :)

I checked objdump in -O2 optimization(default) of Xen Makefile to make
sure CSE (Common subexpression elimination) works well on the latest
arm64 cross compiler on x86_64 from  Arm GNU Toolchain.

$
~/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
-v
...
A-profile Architecture 10.3-2021.07 (arm-10.29)'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.3.1 20210621 (GNU Toolchain for the A-profile
Architecture 10.3-2021.07 (arm-10.29)

I compared the before and after my patches. This time, without adding a
"pages" variable, I proceeded to use the local variable mask with order
operation.

I was able to confirm that it does one less operation.

(1) before clean up

0000000000001bb4 <p2m_set_entry>:
    while ( nr )
    1bb4:       b40005e2        cbz     x2, 1c70 <p2m_set_entry+0xbc>
{
    ...
        if ( rc )
    1c1c:       350002e0        cbnz    w0, 1c78 <p2m_set_entry+0xc4>
        sgfn = gfn_add(sgfn, (1 << order));
    1c20:       1ad32373        lsl     w19, w27, w19   // <<< CES works
    1c24:       93407e73        sxtw    x19, w19        // <<< well!
    return _gfn(gfn_x(gfn) + i);
    1c28:       8b1302d6        add     x22, x22, x19
    return _mfn(mfn_x(mfn) + i);
    1c2c:       8b130281        add     x1, x20, x19
    1c30:       b100069f        cmn     x20, #0x1
    1c34:       9a941034        csel    x20, x1, x20, ne  // ne = any
    while ( nr )
    1c38:       eb1302b5        subs    x21, x21, x19
    1c3c:       540001e0        b.eq    1c78 <p2m_set_entry+0xc4>  // b.none

(2) Using again mask variable. mask = 1UL << order
code show me   sxtw    x19, w19    operation disappeared.

0000000000001bb4 <p2m_set_entry>:
    while ( nr )
    1bb4:       b40005c2        cbz     x2, 1c6c <p2m_set_entry+0xb8>
{
    ...
        if ( rc )
    1c1c:       350002c0        cbnz    w0, 1c74 <p2m_set_entry+0xc0>
        mask = 1UL << order;
    1c20:       9ad32373        lsl     x19, x27, x19   // <<< only lsl
    return _gfn(gfn_x(gfn) + i);
    1c24:       8b1302d6        add     x22, x22, x19
    return _mfn(mfn_x(mfn) + i);
    1c28:       8b130281        add     x1, x20, x19
    1c2c:       b100069f        cmn     x20, #0x1
    1c30:       9a941034        csel    x20, x1, x20, ne  // ne = any
    while ( nr )
    1c34:       eb1302b5        subs    x21, x21, x19
    1c38:       540001e0        b.eq    1c74 <p2m_set_entry+0xc0>  // b.none

I'll send you a follow-up patch I've been working on.

BR
Paran Lee

2022-04-25 오전 1:17에 Julien Grall 이(가) 쓴 글:
> Hi,
> 
> On 21/04/2022 16:17, Paran Lee wrote:
>> It doesn't seem necessary to do that calculation of order shift again.
> 
> I think we need to weight that against increasing the number of local
> variables that do pretty much the same.
> 
> This is pretty much done to a matter of taste here. IMHO, the original
> version is better but I see Stefano reviewed it so I will not argue
> against it.
> 
> That said, given you already sent a few patches, can you explain why you
> are doing this? Is this optimization purpose? Is it clean-up?
> 
>>
>> Signed-off-by: Paran Lee <p4ranlee@gmail.com>
>> ---
>>   xen/arch/arm/p2m.c | 9 +++++----
>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index 1d1059f7d2..533afc830a 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m,
>>       while ( nr )
>>       {
>>           unsigned long mask;
>> -        unsigned long order;
>> +        unsigned long order, pages;
>>             /*
>>            * Don't take into account the MFN when removing mapping (i.e
>> @@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m,
>>           if ( rc )
>>               break;
>>   -        sgfn = gfn_add(sgfn, (1 << order));
>> +        pages = 1 << order;
> 
> Please take the opportunity to switch to 1UL.
> 
>> +        sgfn = gfn_add(sgfn, pages);
>>           if ( !mfn_eq(smfn, INVALID_MFN) )
>> -           smfn = mfn_add(smfn, (1 << order));
>> +           smfn = mfn_add(smfn, pages);
>>   -        nr -= (1 << order);
>> +        nr -= pages;
>>       }
>>         return rc;
> 
> Cheers,
> 

Re: [PATCH] xen/arm: p2m_set_entry duplicate calculation.
Posted by Julien Grall 2 years ago
Hi,

On 26/04/2022 16:37, Paran Lee wrote:
> Thanks you, I agreed! It made me think once more about what my patch
> could improve.
> patches I sent have been reviewed in various ways. It was a good
> opportunity to analyze my patch from various perspectives. :)
> 
> I checked objdump in -O2 optimization(default) of Xen Makefile to make
> sure CSE (Common subexpression elimination) works well on the latest
> arm64 cross compiler on x86_64 from  Arm GNU Toolchain.
> 
> $
> ~/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
> -v
> ...
> A-profile Architecture 10.3-2021.07 (arm-10.29)'
> Thread model: posix
> Supported LTO compression algorithms: zlib
> gcc version 10.3.1 20210621 (GNU Toolchain for the A-profile
> Architecture 10.3-2021.07 (arm-10.29)
> 
> I compared the before and after my patches. This time, without adding a
> "pages" variable, I proceeded to use the local variable mask with order
> operation.
> 
> I was able to confirm that it does one less operation.

Well... I don't think the one less operation is because of introduction 
of the local variable (see more below).

> 
> (1) before clean up
> 
> 0000000000001bb4 <p2m_set_entry>:
>      while ( nr )
>      1bb4:       b40005e2        cbz     x2, 1c70 <p2m_set_entry+0xbc>
> {
>      ...
>          if ( rc )
>      1c1c:       350002e0        cbnz    w0, 1c78 <p2m_set_entry+0xc4>
>          sgfn = gfn_add(sgfn, (1 << order));

1 << order is a 32-bit value but the second parameter is a 64-bit value 
(assuming arm64). So...

>      1c20:       1ad32373        lsl     w19, w27, w19   // <<< CES works
>      1c24:       93407e73        sxtw    x19, w19        // <<< well!

... this instruction is extending the 32-bit value to 64-bit value.

>      return _gfn(gfn_x(gfn) + i);
>      1c28:       8b1302d6        add     x22, x22, x19
>      return _mfn(mfn_x(mfn) + i);
>      1c2c:       8b130281        add     x1, x20, x19
>      1c30:       b100069f        cmn     x20, #0x1
>      1c34:       9a941034        csel    x20, x1, x20, ne  // ne = any
>      while ( nr )
>      1c38:       eb1302b5        subs    x21, x21, x19
>      1c3c:       540001e0        b.eq    1c78 <p2m_set_entry+0xc4>  // b.none
> 
> (2) Using again mask variable. mask = 1UL << order
> code show me   sxtw    x19, w19    operation disappeared.
This code is not only using a local variable but also using "1UL". So, I 
suspect that if you were using 1 << order, the instruction would re-appear.

Cheers,

-- 
Julien Grall
Re: [PATCH] xen/arm: p2m_set_entry duplicate calculation.
Posted by Stefano Stabellini 2 years ago
On Fri, 22 Apr 2022, Paran Lee wrote:
> It doesn't seem necessary to do that calculation of order shift again.
> 
> Signed-off-by: Paran Lee <p4ranlee@gmail.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
>  xen/arch/arm/p2m.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index 1d1059f7d2..533afc830a 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m,
>      while ( nr )
>      {
>          unsigned long mask;
> -        unsigned long order;
> +        unsigned long order, pages;
>  
>          /*
>           * Don't take into account the MFN when removing mapping (i.e
> @@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m,
>          if ( rc )
>              break;
>  
> -        sgfn = gfn_add(sgfn, (1 << order));
> +        pages = 1 << order;
> +        sgfn = gfn_add(sgfn, pages);
>          if ( !mfn_eq(smfn, INVALID_MFN) )
> -           smfn = mfn_add(smfn, (1 << order));
> +           smfn = mfn_add(smfn, pages);
>  
> -        nr -= (1 << order);
> +        nr -= pages;
>      }
>  
>      return rc;
> -- 
> 2.25.1
>