[RFC PATCH v1 0/5] nolibc x86-64 string functions

Ammar Faizi posted 5 patches 2 years, 3 months ago
There is a newer version of this series
tools/include/nolibc/arch-x86_64.h | 60 ++++++++++++++++++++++++++++++
tools/include/nolibc/string.h      | 38 ++++++++-----------
2 files changed, 75 insertions(+), 23 deletions(-)
[RFC PATCH v1 0/5] nolibc x86-64 string functions
Posted by Ammar Faizi 2 years, 3 months ago
Hi Willy,

This is an RFC patchset for nolibc x86-64 string functions. There are 5
patches in this series.

## Patch 1-3: Use `rep movsb`, `rep stosb`, and `rep cmpsb` for:
    - memcpy() and memmove()
    - memset()
    - memcmp()
respectively. They can simplify the generated ASM code.

Patch 4 and 5 are not related, just a small cleanup.

## Patch 4: Remove the `_nolibc_memcpy_down()` function
This nolibc internal function is not used. Delete it. It was probably
supposed to handle memmove(), but today the memmove() has its own
implementation.

## Patch 5: Remove the `_nolibc_memcpy_up()` function
This function is only called by memcpy(), there is no real reason to
have this wrapper. Delete this function and move the code to memcpy()
directly.

Before this series:
```
  0000000000001479 <memmove>:
    1479: f3 0f 1e fa           endbr64 
    147d: 48 39 f7              cmp    %rsi,%rdi
    1480: 48 c7 c1 ff ff ff ff  mov    $0xffffffffffffffff,%rcx
    1487: 48 89 f8              mov    %rdi,%rax
    148a: 48 0f 43 ca           cmovae %rdx,%rcx
    148e: 48 19 ff              sbb    %rdi,%rdi
    1491: 83 e7 02              and    $0x2,%edi
    1494: 48 ff cf              dec    %rdi
    1497: 48 85 d2              test   %rdx,%rdx
    149a: 74 10                 je     14ac <memmove+0x33>
    149c: 48 01 f9              add    %rdi,%rcx
    149f: 48 ff ca              dec    %rdx
    14a2: 44 8a 04 0e           mov    (%rsi,%rcx,1),%r8b
    14a6: 44 88 04 08           mov    %r8b,(%rax,%rcx,1)
    14aa: eb eb                 jmp    1497 <memmove+0x1e>
    14ac: c3                    ret

  00000000000014ad <memcpy>:
    14ad: f3 0f 1e fa           endbr64 
    14b1: 48 89 f8              mov    %rdi,%rax
    14b4: 31 c9                 xor    %ecx,%ecx
    14b6: 48 39 ca              cmp    %rcx,%rdx
    14b9: 74 0d                 je     14c8 <memcpy+0x1b>
    14bb: 40 8a 3c 0e           mov    (%rsi,%rcx,1),%dil
    14bf: 40 88 3c 08           mov    %dil,(%rax,%rcx,1)
    14c3: 48 ff c1              inc    %rcx
    14c6: eb ee                 jmp    14b6 <memcpy+0x9>
    14c8: c3                    ret

  00000000000014c9 <memset>:
    14c9: f3 0f 1e fa           endbr64 
    14cd: 48 89 f8              mov    %rdi,%rax
    14d0: 31 c9                 xor    %ecx,%ecx
    14d2: 48 39 ca              cmp    %rcx,%rdx
    14d5: 74 09                 je     14e0 <memset+0x17>
    14d7: 40 88 34 08           mov    %sil,(%rax,%rcx,1)
    14db: 48 ff c1              inc    %rcx
    14de: eb f2                 jmp    14d2 <memset+0x9>
    14e0: c3                    ret
```

After this series:
```
  000000000000140a <memmove>:
    140a: 48 89 f8              mov    %rdi,%rax
    140d: 48 89 d1              mov    %rdx,%rcx
    1410: 48 8d 7c 0f ff        lea    -0x1(%rdi,%rcx,1),%rdi
    1415: 48 8d 74 0e ff        lea    -0x1(%rsi,%rcx,1),%rsi
    141a: fd                    std    
    141b: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    141d: fc                    cld
    141e: c3                    ret

  000000000000141f <memcpy>:
    141f: 48 89 f8              mov    %rdi,%rax
    1422: 48 89 d1              mov    %rdx,%rcx
    1425: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    1427: c3                    ret

  0000000000001428 <memset>:
    1428: 48 89 f0              mov    %rsi,%rax
    142b: 48 89 d1              mov    %rdx,%rcx
    142e: 48 89 fa              mov    %rdi,%rdx
    1431: f3 aa                 rep stos %al,%es:(%rdi)
    1433: 48 89 d0              mov    %rdx,%rax
    1436: c3                    ret
```

Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
---

Ammar Faizi (5):
  tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()`
  tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
  tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function

 tools/include/nolibc/arch-x86_64.h | 60 ++++++++++++++++++++++++++++++
 tools/include/nolibc/string.h      | 38 ++++++++-----------
 2 files changed, 75 insertions(+), 23 deletions(-)


base-commit: 3c9b7c4a228bf8cca2f92abb65575cdd54065302
-- 
Ammar Faizi
Re: [RFC PATCH v1 0/5] nolibc x86-64 string functions
Posted by Willy Tarreau 2 years, 3 months ago
Hi Ammar,

On Wed, Aug 30, 2023 at 08:57:21PM +0700, Ammar Faizi wrote:
> Hi Willy,
> 
> This is an RFC patchset for nolibc x86-64 string functions. There are 5
> patches in this series.
> 
> ## Patch 1-3: Use `rep movsb`, `rep stosb`, and `rep cmpsb` for:
>     - memcpy() and memmove()
>     - memset()
>     - memcmp()
> respectively. They can simplify the generated ASM code.
> 
> Patch 4 and 5 are not related, just a small cleanup.

So overall I'm fine with this, I think it's reasonable. As you said
we're not trying to chase the very last byte, but for such functions
it's also nice if they can remain small. Some of them might even
benefit from being inlined by the way (in this case they'd rather
move to C functions with an asm() statement), because the call
instruction and the register moves or spilling code will generally
be larger than the functions themselves. That might be worth checking.
Ah no, we cannot because some of them are called from libgcc and friends.
Or we may need to mark them inline and weak without static, I'm not sure
how well that works.

Please just let me know if you intend to change a few things based on
previous comments, and also this memcmp() stuff that's both C and asm.

Thanks!
Willy
RE: [RFC PATCH v1 0/5] nolibc x86-64 string functions
Posted by David Laight 2 years, 3 months ago
From: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> Sent: 30 August 2023 14:57
> 
> This is an RFC patchset for nolibc x86-64 string functions. There are 5
> patches in this series.
> 
> ## Patch 1-3: Use `rep movsb`, `rep stosb`, and `rep cmpsb` for:
>     - memcpy() and memmove()
>     - memset()
>     - memcmp()
> respectively. They can simplify the generated ASM code.
> 
...
> After this series:
> ```
>   000000000000140a <memmove>:
>     140a: 48 89 f8              mov    %rdi,%rax
>     140d: 48 89 d1              mov    %rdx,%rcx
>     1410: 48 8d 7c 0f ff        lea    -0x1(%rdi,%rcx,1),%rdi
>     1415: 48 8d 74 0e ff        lea    -0x1(%rsi,%rcx,1),%rsi
>     141a: fd                    std
>     141b: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
>     141d: fc                    cld
>     141e: c3                    ret

Isn't that completely broken?

You need to select between forwards and backwards moves.
Since forwards moves are preferred it is best to do
	if (dst - src < len)
		backards_copy()
	else
		formwards_copy()

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Re: [RFC PATCH v1 0/5] nolibc x86-64 string functions
Posted by Willy Tarreau 2 years, 3 months ago
On Fri, Sep 01, 2023 at 11:34:18AM +0000, David Laight wrote:
> From: Ammar Faizi <ammarfaizi2@gnuweeb.org>
> > Sent: 30 August 2023 14:57
> > 
> > This is an RFC patchset for nolibc x86-64 string functions. There are 5
> > patches in this series.
> > 
> > ## Patch 1-3: Use `rep movsb`, `rep stosb`, and `rep cmpsb` for:
> >     - memcpy() and memmove()
> >     - memset()
> >     - memcmp()
> > respectively. They can simplify the generated ASM code.
> > 
> ...
> > After this series:
> > ```
> >   000000000000140a <memmove>:
> >     140a: 48 89 f8              mov    %rdi,%rax
> >     140d: 48 89 d1              mov    %rdx,%rcx
> >     1410: 48 8d 7c 0f ff        lea    -0x1(%rdi,%rcx,1),%rdi
> >     1415: 48 8d 74 0e ff        lea    -0x1(%rsi,%rcx,1),%rsi
> >     141a: fd                    std
> >     141b: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
> >     141d: fc                    cld
> >     141e: c3                    ret
> 
> Isn't that completely broken?
> 
> You need to select between forwards and backwards moves.
> Since forwards moves are preferred it is best to do
> 	if (dst - src < len)
> 		backards_copy()
> 	else
> 		formwards_copy()
> 
> 	David

You're completely right indeed, reminds me about the copy_up/copy_down
that were not used anymore :-)

Willy
Re: [RFC PATCH v1 0/5] nolibc x86-64 string functions
Posted by Ammar Faizi 2 years, 3 months ago
On Fri, Sep 01, 2023 at 01:46:44PM +0200, Willy Tarreau wrote:
> On Fri, Sep 01, 2023 at 11:34:18AM +0000, David Laight wrote:
> > Isn't that completely broken?
> > 
> > You need to select between forwards and backwards moves.
> > Since forwards moves are preferred it is best to do
> > 	if (dst - src < len)
> > 		backards_copy()
> > 	else
> > 		formwards_copy()
> > 
> > 	David
> 
> You're completely right indeed, reminds me about the copy_up/copy_down
> that were not used anymore :-)

I'm an idiot, will fix that. Another attempt as suggested below:

__asm__ (
".section .text.nolibc_memmove\n"
".weak memmove\n"
"memmove:\n"
"    movq    %rdx, %rcx\n"
"    movq    %rdi, %rdx\n"
"    movq    %rdi, %rax\n"
"    subq    %rsi, %rdx\n"
"    cmpq    %rcx, %rdx\n"
"    jnb     .Lforward_copy\n"
"    leaq    -1(%rdi, %rcx, 1), %rdi\n"
"    leaq    -1(%rsi, %rcx, 1), %rsi\n"
"    std\n"
".Lforward_copy:\n"
"    rep movsb\n"
"    cld\n"
"    ret\n"
);

-- 
Ammar Faizi
RE: [RFC PATCH v1 0/5] nolibc x86-64 string functions
Posted by David Laight 2 years, 3 months ago
From: Ammar Faizi
> Sent: 01 September 2023 14:06
...
> > You're completely right indeed, reminds me about the copy_up/copy_down
> > that were not used anymore :-)
> 
> I'm an idiot, will fix that. Another attempt as suggested below:
> 
> __asm__ (
> ".section .text.nolibc_memmove\n"
> ".weak memmove\n"
> "memmove:\n"
> "    movq    %rdx, %rcx\n"
> "    movq    %rdi, %rdx\n"
> "    movq    %rdi, %rax\n"

You seem to have confused yourself about whether you are using %eax or %edx.

> "    subq    %rsi, %rdx\n"
> "    cmpq    %rcx, %rdx\n"
> "    jnb     .Lforward_copy\n"

I think I'd fall through to the forwards copy
and not worry about replicating the 'reps movsb' and 'ret'.
IIRC 'cld' can be slow as well.

> "    leaq    -1(%rdi, %rcx, 1), %rdi\n"
> "    leaq    -1(%rsi, %rcx, 1), %rsi\n"
> "    std\n"
> ".Lforward_copy:\n"
> "    rep movsb\n"
> "    cld\n"
> "    ret\n"
> );
> 
> --
> Ammar Faizi

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Re: [RFC PATCH v1 0/5] nolibc x86-64 string functions
Posted by Ammar Faizi 2 years, 3 months ago
On Fri, Sep 01, 2023 at 02:23:28PM +0000, David Laight wrote:
> From: Ammar Faizi
> > Sent: 01 September 2023 14:06
...
> > __asm__ (
> > ".section .text.nolibc_memmove\n"
> > ".weak memmove\n"
> > "memmove:\n"
> > "    movq    %rdx, %rcx\n"
> > "    movq    %rdi, %rdx\n"
> > "    movq    %rdi, %rax\n"
> 
> You seem to have confused yourself about whether you are using %eax or %edx.

What do you mean? They're all 64-bit pointers.

What I know is that the %rdx will be clobbered by "subq %rsi, %rdx"
below and the %rax should be return value. That's why I copy the %rdi
twice. memmove() returns the dst pointer. Did I miss something?

> > "    subq    %rsi, %rdx\n"
> > "    cmpq    %rcx, %rdx\n"
> > "    jnb     .Lforward_copy\n"
> 
> I think I'd fall through to the forwards copy
> and not worry about replicating the 'reps movsb' and 'ret'.
> IIRC 'cld' can be slow as well.

Alright, I will avoid cld for the forward copy.

> > "    leaq    -1(%rdi, %rcx, 1), %rdi\n"
> > "    leaq    -1(%rsi, %rcx, 1), %rsi\n"
> > "    std\n"
> > ".Lforward_copy:\n"
> > "    rep movsb\n"
> > "    cld\n"
> > "    ret\n"
> > );

-- 
Ammar Faizi
RE: [RFC PATCH v1 0/5] nolibc x86-64 string functions
Posted by David Laight 2 years, 3 months ago
From: Ammar Faizi
> Sent: 01 September 2023 15:42
...
> > > "    movq    %rdx, %rcx\n"
> > > "    movq    %rdi, %rdx\n"
> > > "    movq    %rdi, %rax\n"
> >
> > You seem to have confused yourself about whether you are using %eax or %edx.
> 
> What do you mean? They're all 64-bit pointers.

%ax, %eax, %rax - what is the difference :-)

> What I know is that the %rdx will be clobbered by "subq %rsi, %rdx"
> below and the %rax should be return value. That's why I copy the %rdi
> twice. memmove() returns the dst pointer. Did I miss something?

I'd forgotten about the (stupid) return value.

I'm pretty sure it is an accident from the original pdp-11
implementation from the days before C had an explicit 'return'
statement.
(The pdp-11 I used ran RSX/11M - so had a Fortran compiler
not a C one.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Re: [RFC PATCH v1 0/5] nolibc x86-64 string functions
Posted by Ammar Faizi 2 years, 3 months ago
On Fri, Sep 01, 2023 at 02:54:56PM +0000, David Laight wrote:
> I'd forgotten about the (stupid) return value.
> 
> I'm pretty sure it is an accident from the original pdp-11
> implementation from the days before C had an explicit 'return'
> statement.
> (The pdp-11 I used ran RSX/11M - so had a Fortran compiler
> not a C one.)

You're old. I did not exist in that era. And my parents were still young :-)

-- 
Ammar Faizi