[PATCH v3 0/4] riscv: uaccess: optimizations

Cyril Bur posted 4 patches 9 months, 4 weeks ago
There is a newer version of this series
arch/riscv/include/asm/uaccess.h | 205 +++++++++++++++++++++++--------
1 file changed, 152 insertions(+), 53 deletions(-)
[PATCH v3 0/4] riscv: uaccess: optimizations
Posted by Cyril Bur 9 months, 4 weeks ago
This series tries to optimize riscv uaccess by allowing the use of
user_access_begin() and user_access_end() which permits grouping user accesses
and avoiding the CSR write penalty for each access.

The error path can also be optimised using asm goto which patches 3 and 4
achieve. This will speed up jumping to labels by avoiding the need of an
intermediary error type variable within the uaccess macros

I did read the discussion this series generated. It isn't clear to me
which direction to take the patches, if any.

V2:
I've taken on this series as there isn't any response from Jisheng. No
significant changes other than build fixes.
- Fixes build breakage in patch 3 to do with not having used 'goto' keyword.
- Fixes build breakage in patch 4 on 32bit not having delcared __ptr in the
  macro.

V3:
Significant commit message rewrites.
 - Corrected the justification for patch 2
 - Better explained/justified patches 3 and 4
Minor code changes for legibility and more comments.

Jisheng Zhang (4):
  riscv: implement user_access_begin() and families
  riscv: uaccess: use input constraints for ptr of __put_user()
  riscv: uaccess: use 'asm goto' for put_user()
  riscv: uaccess: use 'asm_goto_output' for get_user()

 arch/riscv/include/asm/uaccess.h | 205 +++++++++++++++++++++++--------
 1 file changed, 152 insertions(+), 53 deletions(-)

-- 
2.34.1
Re: [PATCH v3 0/4] riscv: uaccess: optimizations
Posted by Alexandre Ghiti 9 months ago
Hi Cyril,

On 21/02/2025 01:09, Cyril Bur wrote:
> This series tries to optimize riscv uaccess by allowing the use of
> user_access_begin() and user_access_end() which permits grouping user accesses
> and avoiding the CSR write penalty for each access.
>
> The error path can also be optimised using asm goto which patches 3 and 4
> achieve. This will speed up jumping to labels by avoiding the need of an
> intermediary error type variable within the uaccess macros
>
> I did read the discussion this series generated. It isn't clear to me
> which direction to take the patches, if any.
>
> V2:
> I've taken on this series as there isn't any response from Jisheng. No
> significant changes other than build fixes.
> - Fixes build breakage in patch 3 to do with not having used 'goto' keyword.
> - Fixes build breakage in patch 4 on 32bit not having delcared __ptr in the
>    macro.
>
> V3:
> Significant commit message rewrites.
>   - Corrected the justification for patch 2
>   - Better explained/justified patches 3 and 4
> Minor code changes for legibility and more comments.
>
> Jisheng Zhang (4):
>    riscv: implement user_access_begin() and families
>    riscv: uaccess: use input constraints for ptr of __put_user()
>    riscv: uaccess: use 'asm goto' for put_user()
>    riscv: uaccess: use 'asm_goto_output' for get_user()
>
>   arch/riscv/include/asm/uaccess.h | 205 +++++++++++++++++++++++--------
>   1 file changed, 152 insertions(+), 53 deletions(-)
>
Following up on Ben's comment here 
https://lore.kernel.org/linux-riscv/b45aab1e-6d37-4027-9a15-4fa917d806b9@codethink.co.uk/

The problem that Ben mentions is caused by the use of *macros* which 
used to make the evaluation of the parameter inside the user-accessible 
section, and since this parameter could be a sleeping function, we could 
schedule another process with the SUM bit set, which could be cleared by 
this process, which would make the first process fault when trying to 
access user memory. I did not find any macro using unsafe_XXX() 
functions which could cause a problem right now, but I may have missed 
one and new could come up later, so we have multiple solutions here:

- suppose that a macro using unsafe_get/put_user() and passing a 
sleeping function as argument won't happen and then do nothing
- or save/restore CSR sstatus when switching processes
- or simply check that SUM is not set when switching processes

Let me know what you think.

Thanks,

Alex
Re: [PATCH v3 0/4] riscv: uaccess: optimizations
Posted by Ben Dooks 9 months ago
On 14/03/2025 13:28, Alexandre Ghiti wrote:
> Hi Cyril,
> 
> On 21/02/2025 01:09, Cyril Bur wrote:
>> This series tries to optimize riscv uaccess by allowing the use of
>> user_access_begin() and user_access_end() which permits grouping user 
>> accesses
>> and avoiding the CSR write penalty for each access.
>>
>> The error path can also be optimised using asm goto which patches 3 and 4
>> achieve. This will speed up jumping to labels by avoiding the need of an
>> intermediary error type variable within the uaccess macros
>>
>> I did read the discussion this series generated. It isn't clear to me
>> which direction to take the patches, if any.
>>
>> V2:
>> I've taken on this series as there isn't any response from Jisheng. No
>> significant changes other than build fixes.
>> - Fixes build breakage in patch 3 to do with not having used 'goto' 
>> keyword.
>> - Fixes build breakage in patch 4 on 32bit not having delcared __ptr 
>> in the
>>    macro.
>>
>> V3:
>> Significant commit message rewrites.
>>   - Corrected the justification for patch 2
>>   - Better explained/justified patches 3 and 4
>> Minor code changes for legibility and more comments.
>>
>> Jisheng Zhang (4):
>>    riscv: implement user_access_begin() and families
>>    riscv: uaccess: use input constraints for ptr of __put_user()
>>    riscv: uaccess: use 'asm goto' for put_user()
>>    riscv: uaccess: use 'asm_goto_output' for get_user()
>>
>>   arch/riscv/include/asm/uaccess.h | 205 +++++++++++++++++++++++--------
>>   1 file changed, 152 insertions(+), 53 deletions(-)
>>
> Following up on Ben's comment here https://lore.kernel.org/linux-riscv/ 
> b45aab1e-6d37-4027-9a15-4fa917d806b9@codethink.co.uk/
> 
> The problem that Ben mentions is caused by the use of *macros* which 
> used to make the evaluation of the parameter inside the user-accessible 
> section, and since this parameter could be a sleeping function, we could 
> schedule another process with the SUM bit set, which could be cleared by 
> this process, which would make the first process fault when trying to 
> access user memory. I did not find any macro using unsafe_XXX() 
> functions which could cause a problem right now, but I may have missed 
> one and new could come up later, so we have multiple solutions here:
> 
> - suppose that a macro using unsafe_get/put_user() and passing a 
> sleeping function as argument won't happen and then do nothing
> - or save/restore CSR sstatus when switching processes
> - or simply check that SUM is not set when switching processes
> 
> Let me know what you think.

I'm on the save the flag side, for these reasons:

#1 sleeping functions can happen more often when various checks
    get enabled in the kernel (this was why the original fault
    was found).  Adding larger sections is just going to make
    the fault pop up again at some point in the future.

#2 the save/restore is a small addition to the swap registers

#3 saving SUM over a regs swap is always going to make sure we
    never see this gremlin turn up again

FYI, I think I may have posted our original test thread at some
point, but I could do so again.

> 
> Thanks,
> 
> Alex
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
> 


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html
Re: [EXT] Re: [PATCH v3 0/4] riscv: uaccess: optimizations
Posted by Cyril Bur 9 months ago

On 15/3/2025 12:49 am, Ben Dooks wrote:
> On 14/03/2025 13:28, Alexandre Ghiti wrote:
>> Hi Cyril,
>>
>> On 21/02/2025 01:09, Cyril Bur wrote:
>>> This series tries to optimize riscv uaccess by allowing the use of
>>> user_access_begin() and user_access_end() which permits grouping user 
>>> accesses
>>> and avoiding the CSR write penalty for each access.
>>>
>>> The error path can also be optimised using asm goto which patches 3 
>>> and 4
>>> achieve. This will speed up jumping to labels by avoiding the need of an
>>> intermediary error type variable within the uaccess macros
>>>
>>> I did read the discussion this series generated. It isn't clear to me
>>> which direction to take the patches, if any.
>>>
>>> V2:
>>> I've taken on this series as there isn't any response from Jisheng. No
>>> significant changes other than build fixes.
>>> - Fixes build breakage in patch 3 to do with not having used 'goto' 
>>> keyword.
>>> - Fixes build breakage in patch 4 on 32bit not having delcared __ptr 
>>> in the
>>>    macro.
>>>
>>> V3:
>>> Significant commit message rewrites.
>>>   - Corrected the justification for patch 2
>>>   - Better explained/justified patches 3 and 4
>>> Minor code changes for legibility and more comments.
>>>
>>> Jisheng Zhang (4):
>>>    riscv: implement user_access_begin() and families
>>>    riscv: uaccess: use input constraints for ptr of __put_user()
>>>    riscv: uaccess: use 'asm goto' for put_user()
>>>    riscv: uaccess: use 'asm_goto_output' for get_user()
>>>
>>>   arch/riscv/include/asm/uaccess.h | 205 +++++++++++++++++++++++--------
>>>   1 file changed, 152 insertions(+), 53 deletions(-)
>>>
>> Following up on Ben's comment here https://lore.kernel.org/linux- 
>> riscv/ b45aab1e-6d37-4027-9a15-4fa917d806b9@codethink.co.uk/
>>
>> The problem that Ben mentions is caused by the use of *macros* which 
>> used to make the evaluation of the parameter inside the user- 
>> accessible section, and since this parameter could be a sleeping 
>> function, we could schedule another process with the SUM bit set, 
>> which could be cleared by this process, which would make the first 
>> process fault when trying to access user memory. I did not find any 
>> macro using unsafe_XXX() functions which could cause a problem right 
>> now, but I may have missed one and new could come up later, so we have 
>> multiple solutions here:
>>
>> - suppose that a macro using unsafe_get/put_user() and passing a 
>> sleeping function as argument won't happen and then do nothing
>> - or save/restore CSR sstatus when switching processes
>> - or simply check that SUM is not set when switching processes
>>
>> Let me know what you think.
> 
> I'm on the save the flag side, for these reasons:
> 
> #1 sleeping functions can happen more often when various checks
>     get enabled in the kernel (this was why the original fault
>     was found).  Adding larger sections is just going to make
>     the fault pop up again at some point in the future.
> 
> #2 the save/restore is a small addition to the swap registers
> 
> #3 saving SUM over a regs swap is always going to make sure we
>     never see this gremlin turn up again
> 
> FYI, I think I may have posted our original test thread at some
> point, but I could do so again.

Yes, after Ben pointed out the issue I came to the conclusion we 
probably want Bens patch which saves the bit. Apologies if I didn't 
express this thought in email.

I'm happy to take the patch and put it on the front of this series, 
although perhaps it makes more sense you to revive the patch since 
you're still around Ben?

> 
>>
>> Thanks,
>>
>> Alex
>>
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>
> 
> 

Re: [EXT] Re: [PATCH v3 0/4] riscv: uaccess: optimizations
Posted by Ben Dooks 9 months ago
On 17/03/2025 23:52, Cyril Bur wrote:
> 
> 
> On 15/3/2025 12:49 am, Ben Dooks wrote:
>> On 14/03/2025 13:28, Alexandre Ghiti wrote:
>>> Hi Cyril,
>>>
>>> On 21/02/2025 01:09, Cyril Bur wrote:
>>>> This series tries to optimize riscv uaccess by allowing the use of
>>>> user_access_begin() and user_access_end() which permits grouping 
>>>> user accesses
>>>> and avoiding the CSR write penalty for each access.
>>>>
>>>> The error path can also be optimised using asm goto which patches 3 
>>>> and 4
>>>> achieve. This will speed up jumping to labels by avoiding the need 
>>>> of an
>>>> intermediary error type variable within the uaccess macros
>>>>
>>>> I did read the discussion this series generated. It isn't clear to me
>>>> which direction to take the patches, if any.
>>>>
>>>> V2:
>>>> I've taken on this series as there isn't any response from Jisheng. No
>>>> significant changes other than build fixes.
>>>> - Fixes build breakage in patch 3 to do with not having used 'goto' 
>>>> keyword.
>>>> - Fixes build breakage in patch 4 on 32bit not having delcared __ptr 
>>>> in the
>>>>    macro.
>>>>
>>>> V3:
>>>> Significant commit message rewrites.
>>>>   - Corrected the justification for patch 2
>>>>   - Better explained/justified patches 3 and 4
>>>> Minor code changes for legibility and more comments.
>>>>
>>>> Jisheng Zhang (4):
>>>>    riscv: implement user_access_begin() and families
>>>>    riscv: uaccess: use input constraints for ptr of __put_user()
>>>>    riscv: uaccess: use 'asm goto' for put_user()
>>>>    riscv: uaccess: use 'asm_goto_output' for get_user()
>>>>
>>>>   arch/riscv/include/asm/uaccess.h | 205 ++++++++++++++++++++++ 
>>>> +--------
>>>>   1 file changed, 152 insertions(+), 53 deletions(-)
>>>>
>>> Following up on Ben's comment here https://lore.kernel.org/linux- 
>>> riscv/ b45aab1e-6d37-4027-9a15-4fa917d806b9@codethink.co.uk/
>>>
>>> The problem that Ben mentions is caused by the use of *macros* which 
>>> used to make the evaluation of the parameter inside the user- 
>>> accessible section, and since this parameter could be a sleeping 
>>> function, we could schedule another process with the SUM bit set, 
>>> which could be cleared by this process, which would make the first 
>>> process fault when trying to access user memory. I did not find any 
>>> macro using unsafe_XXX() functions which could cause a problem right 
>>> now, but I may have missed one and new could come up later, so we 
>>> have multiple solutions here:
>>>
>>> - suppose that a macro using unsafe_get/put_user() and passing a 
>>> sleeping function as argument won't happen and then do nothing
>>> - or save/restore CSR sstatus when switching processes
>>> - or simply check that SUM is not set when switching processes
>>>
>>> Let me know what you think.
>>
>> I'm on the save the flag side, for these reasons:
>>
>> #1 sleeping functions can happen more often when various checks
>>     get enabled in the kernel (this was why the original fault
>>     was found).  Adding larger sections is just going to make
>>     the fault pop up again at some point in the future.
>>
>> #2 the save/restore is a small addition to the swap registers
>>
>> #3 saving SUM over a regs swap is always going to make sure we
>>     never see this gremlin turn up again
>>
>> FYI, I think I may have posted our original test thread at some
>> point, but I could do so again.
> 
> Yes, after Ben pointed out the issue I came to the conclusion we 
> probably want Bens patch which saves the bit. Apologies if I didn't 
> express this thought in email.
> 
> I'm happy to take the patch and put it on the front of this series, 
> although perhaps it makes more sense you to revive the patch since 
> you're still around Ben?

Yes, I'm currently very busy so happy for someone else to get this
merged and tested.

We could do with some better testing on whether we leak flags on
task switches (possibly), but that's a side quest and not for now.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html
Re: [PATCH v3 0/4] riscv: uaccess: optimizations
Posted by Anton Blanchard 9 months, 3 weeks ago
Hi Cyril,

On Fri, Feb 21, 2025 at 11:09 AM Cyril Bur <cyrilbur@tenstorrent.com> wrote:
> This series tries to optimize riscv uaccess by allowing the use of
> user_access_begin() and user_access_end() which permits grouping user accesses
> and avoiding the CSR write penalty for each access.

I tested this on the upcoming Tenstorrent Ascalon CPU using a simple
microbenchmark (getdents64() on a directory with 10 files in it) and it was
significantly faster (over 40%). This came from both a reduction in
instruction count as well as a reduction in sstatus CSR writes.

It would be great if we could get this merged considering x86, arm64 and
POWER are exploiting this optimisation already.

Thanks,
Anton

> The error path can also be optimised using asm goto which patches 3 and 4
> achieve. This will speed up jumping to labels by avoiding the need of an
> intermediary error type variable within the uaccess macros
>
> I did read the discussion this series generated. It isn't clear to me
> which direction to take the patches, if any.
>
> V2:
> I've taken on this series as there isn't any response from Jisheng. No
> significant changes other than build fixes.
> - Fixes build breakage in patch 3 to do with not having used 'goto' keyword.
> - Fixes build breakage in patch 4 on 32bit not having delcared __ptr in the
>   macro.
>
> V3:
> Significant commit message rewrites.
>  - Corrected the justification for patch 2
>  - Better explained/justified patches 3 and 4
> Minor code changes for legibility and more comments.
>
> Jisheng Zhang (4):
>   riscv: implement user_access_begin() and families
>   riscv: uaccess: use input constraints for ptr of __put_user()
>   riscv: uaccess: use 'asm goto' for put_user()
>   riscv: uaccess: use 'asm_goto_output' for get_user()
>
>  arch/riscv/include/asm/uaccess.h | 205 +++++++++++++++++++++++--------
>  1 file changed, 152 insertions(+), 53 deletions(-)
>
> --
> 2.34.1
>
>