[v2] riscv: optimize string functions and add kunit tests

[PATCH v2 00/14] riscv: optimize string functions and add kunit tests

Posted by Feng Jiang 3 weeks, 5 days ago

This series introduces optimized assembly implementations for strnlen,
strchr, and strrchr on the RISC-V architecture. To support a rigorous
verification process, the series also significantly expands the
string_kunit test suite with both functional correctness tests and
performance benchmarks.

The patchset is organized as follows:
- Refactoring (Patches 1-4): Extract generic C implementations for
  strlen, strnlen, strchr, and strrchr into exported __generic_* functions.
- Correctness Testing (Patches 5-7): Extend string_kunit with detailed
  functional tests for the target functions.
- Performance Benchmarking (Patches 8-11): Add a benchmarking framework
  to string_kunit to measure execution time across various string lengths.
- RISC-V Optimizations (Patches 12-14): Provide the optimized assembly
  implementations for the RISC-V architecture.
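
The refactoring step can be pictured with a minimal userspace sketch. It is not taken from the patches: the `sketch_*` names and the `SKETCH_HAVE_ARCH_STRLEN` guard are hypothetical stand-ins for the exported `__generic_*` helpers and for whatever guard the series uses. The point is that the plain C routine stays callable under its own name, so a test can compare it against an architecture-specific version.

```c
#include <stddef.h>

/* Hypothetical stand-in for an exported generic helper. */
size_t sketch_generic_strlen(const char *s)
{
	const char *sc;

	/* Plain byte-at-a-time scan up to the terminating NUL. */
	for (sc = s; *sc != '\0'; ++sc)
		;
	return (size_t)(sc - s);
}

/*
 * Hypothetical guard: an architecture with its own optimized routine
 * would define this and supply sketch_strlen() itself. Otherwise the
 * public name simply forwards to the generic helper.
 */
#ifndef SKETCH_HAVE_ARCH_STRLEN
size_t sketch_strlen(const char *s)
{
	return sketch_generic_strlen(s);
}
#endif
```

With this split, a benchmark can time `sketch_strlen()` and `sketch_generic_strlen()` on the same buffer, even when the former is replaced by an assembly routine.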

Testing:
All patches have been verified using the KUnit framework on a QEMU
virt machine (riscv64). All string-related tests passed.

    $ ./tools/testing/kunit/kunit.py run --arch=riscv \
        --cross_compile=riscv64-linux-gnu- \
        --kunitconfig=my_string.kunitconfig \
        --raw_output
    [15:26:26] Configuring KUnit Kernel ...
    ...
        ok 1 string_test_memset16
        ok 2 string_test_memset32
        ok 3 string_test_memset64
        ok 4 string_test_strlen
        # string_test_strlen_bench: strlen performance (short, len: 8, iters: 100000):
        # string_test_strlen_bench:   arch-optimized: 148900 ns
        # string_test_strlen_bench:   generic C:      5551900 ns
        # string_test_strlen_bench:   speedup:        37.28x
        # string_test_strlen_bench: strlen performance (medium, len: 64, iters: 100000):
        # string_test_strlen_bench:   arch-optimized: 166000 ns
        # string_test_strlen_bench:   generic C:      16250200 ns
        # string_test_strlen_bench:   speedup:        97.89x
        # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
        # string_test_strlen_bench:   arch-optimized: 14100 ns
        # string_test_strlen_bench:   generic C:      35605600 ns
        # string_test_strlen_bench:   speedup:        2525.21x
        ok 5 string_test_strlen_bench
        ok 6 string_test_strnlen
        # string_test_strnlen_bench: strnlen performance (short, len: 8, iters: 100000):
        # string_test_strnlen_bench:   arch-optimized: 147500 ns
        # string_test_strnlen_bench:   generic C:      6429800 ns
        # string_test_strnlen_bench:   speedup:        43.59x
        # string_test_strnlen_bench: strnlen performance (medium, len: 64, iters: 100000):
        # string_test_strnlen_bench:   arch-optimized: 197900 ns
        # string_test_strnlen_bench:   generic C:      22322500 ns
        # string_test_strnlen_bench:   speedup:        112.79x
        # string_test_strnlen_bench: strnlen performance (long, len: 2048, iters: 10000):
        # string_test_strnlen_bench:   arch-optimized: 14100 ns
        # string_test_strnlen_bench:   generic C:      56162600 ns
        # string_test_strnlen_bench:   speedup:        3983.16x
        ok 7 string_test_strnlen_bench
        ok 8 string_test_strchr
        # string_test_strchr_bench: strchr performance (short, len: 8, iters: 100000):
        # string_test_strchr_bench:   arch-optimized: 166800 ns
        # string_test_strchr_bench:   generic C:      6079400 ns
        # string_test_strchr_bench:   speedup:        36.44x
        # string_test_strchr_bench: strchr performance (medium, len: 64, iters: 100000):
        # string_test_strchr_bench:   arch-optimized: 151500 ns
        # string_test_strchr_bench:   generic C:      21130400 ns
        # string_test_strchr_bench:   speedup:        139.47x
        # string_test_strchr_bench: strchr performance (long, len: 2048, iters: 10000):
        # string_test_strchr_bench:   arch-optimized: 32800 ns
        # string_test_strchr_bench:   generic C:      50630400 ns
        # string_test_strchr_bench:   speedup:        1543.60x
        ok 9 string_test_strchr_bench
        ok 10 string_test_strnchr
        ok 11 string_test_strrchr
        # string_test_strrchr_bench: strrchr performance (short, len: 8, iters: 100000):
        # string_test_strrchr_bench:   arch-optimized: 166300 ns
        # string_test_strrchr_bench:   generic C:      6201400 ns
        # string_test_strrchr_bench:   speedup:        37.29x
        # string_test_strrchr_bench: strrchr performance (medium, len: 64, iters: 100000):
        # string_test_strrchr_bench:   arch-optimized: 207200 ns
        # string_test_strrchr_bench:   generic C:      23062700 ns
        # string_test_strrchr_bench:   speedup:        111.30x
        # string_test_strrchr_bench: strrchr performance (long, len: 2048, iters: 10000):
        # string_test_strrchr_bench:   arch-optimized: 14000 ns
        # string_test_strrchr_bench:   generic C:      51192900 ns
        # string_test_strrchr_bench:   speedup:        3656.63x
        ok 12 string_test_strrchr_bench
        ok 13 string_test_strspn
    ...
    # string: pass:28 fail:0 skip:0 total:28
    # Totals: pass:28 fail:0 skip:0 total:28
    ok 1 string
    reboot: Restarting system
    [15:28:10] Elapsed time: 103.449s total, 0.001s configuring, 101.878s building, 1.569s running

Changes:
v1: Initial submission.

v2:
- Refactored lib/string.c to export __generic_* functions and added
  corresponding functional/performance tests for strnlen, strchr,
  and strrchr (Andy Shevchenko).
- Replaced magic numbers with STRING_TEST_MAX_LEN etc. (Andy Shevchenko).

---

Feng Jiang (14):
  lib/string: extract generic strlen() into __generic_strlen()
  lib/string: extract generic strnlen() into __generic_strnlen()
  lib/string: extract generic strchr() into __generic_strchr()
  lib/string: extract generic strrchr() into __generic_strrchr()
  lib/string_kunit: add correctness test for strlen
  lib/string_kunit: add correctness test for strnlen
  lib/string_kunit: add correctness test for strrchr()
  lib/string_kunit: add performance benchmark for strlen()
  lib/string_kunit: add performance benchmark for strnlen()
  lib/string_kunit: add performance benchmark for strchr()
  lib/string_kunit: add performance benchmark for strrchr()
  riscv: lib: add strnlen implementation
  riscv: lib: add strchr implementation
  riscv: lib: add strrchr implementation

 arch/riscv/include/asm/string.h |   9 +
 arch/riscv/lib/Makefile         |   3 +
 arch/riscv/lib/strchr.S         |  35 ++++
 arch/riscv/lib/strnlen.S        | 164 +++++++++++++++
 arch/riscv/lib/strrchr.S        |  37 ++++
 arch/riscv/purgatory/Makefile   |  11 +-
 include/linux/string.h          |   4 +
 lib/string.c                    |  53 +++--
 lib/tests/string_kunit.c        | 344 ++++++++++++++++++++++++++++++++
 9 files changed, 645 insertions(+), 15 deletions(-)
 create mode 100644 arch/riscv/lib/strchr.S
 create mode 100644 arch/riscv/lib/strnlen.S
 create mode 100644 arch/riscv/lib/strrchr.S

-- 
2.25.1

Re: [PATCH v2 00/14] riscv: optimize string functions and add kunit tests

Posted by Joel Stanley 3 weeks, 3 days ago

On Tue, 13 Jan 2026 at 18:58, Feng Jiang <jiangfeng@kylinos.cn> wrote:
>
> This series introduces optimized assembly implementations for strnlen,
> strchr, and strrchr on the RISC-V architecture. To support a rigorous
> verification process, the series also significantly expands the
> string_kunit test suite with both functional correctness tests and
> performance benchmarks.

I ran the kunit tests on Ascalon, an RVA23 CPU, in emulation. The arch
optimised version showed significant improvements over the plain
version.

I didn't have time to investigate if the numbers made sense. As Andy
noted, the 'long' benchmark had a much higher ratio improvement than
the short and medium.

Tested-by: Joel Stanley <joel@jms.id.au>

Cheers,

Joel

Re: [PATCH v2 00/14] riscv: optimize string functions and add kunit tests

Posted by Feng Jiang 2 weeks, 5 days ago

On 2026/1/15 12:43, Joel Stanley wrote:
> On Tue, 13 Jan 2026 at 18:58, Feng Jiang <jiangfeng@kylinos.cn> wrote:
>>
>> This series introduces optimized assembly implementations for strnlen,
>> strchr, and strrchr on the RISC-V architecture. To support a rigorous
>> verification process, the series also significantly expands the
>> string_kunit test suite with both functional correctness tests and
>> performance benchmarks.
> 
> I ran the kunit tests on Ascalon, an RVA23 CPU, in emulation. The arch
> optimised version showed significant improvements over the plain
> version.
> 
> I didn't have time to investigate if the numbers made sense. As Andy
> noted, the 'long' benchmark had a much higher ratio improvement than
> the short and medium.
> 
> Tested-by: Joel Stanley <joel@jms.id.au>
> 

Thank you very much for your time and the test results.

You were absolutely right to question the numbers. I've realized there were
some flaws in the previous benchmark logic that led to those inconsistent
ratios. I am sincerely sorry for the confusion this may have caused.

I am currently refining the implementation for v3 to ensure much more
accurate and reliable measurements. I'll send out the updated series
once it's ready.
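
The thread does not say what the flaw in the benchmark logic was. One common pitfall in this kind of loop is that the compiler treats the string function as side-effect free and hoists or drops repeated calls on an unchanging buffer. The userspace sketch below shows one way to guard against that, assuming GCC or Clang. The names `bench_sink`, `hide_ptr` and `bench_strlen_ns` are hypothetical, and `clock()` stands in for a kernel time source such as `ktime_get()`.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical sink: the volatile store keeps every result observable. */
static volatile size_t bench_sink;

/*
 * Hypothetical helper: an empty asm statement (GCC/Clang extended asm)
 * that makes the pointer opaque to the optimizer, so the call cannot be
 * treated as loop-invariant.
 */
static inline const char *hide_ptr(const char *p)
{
	__asm__ volatile("" : "+r"(p) : : "memory");
	return p;
}

/* Time `iters` calls of strlen() on `s`; returns elapsed CPU time in ns. */
long long bench_strlen_ns(const char *s, unsigned int iters)
{
	clock_t c0, c1;
	unsigned int i;

	c0 = clock();
	for (i = 0; i < iters; i++)
		bench_sink = strlen(hide_ptr(s));
	c1 = clock();

	return (long long)((double)(c1 - c0) * 1e9 / CLOCKS_PER_SEC);
}
```

In kernel code, `OPTIMIZER_HIDE_VAR()` serves a similar purpose to `hide_ptr()` here. Whether v3 takes this approach is not stated in the thread.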

Thanks again for helping me catch this!

-- 
With Best Regards,
Feng Jiang

Re: [PATCH v2 00/14] riscv: optimize string functions and add kunit tests

Posted by Andy Shevchenko 3 weeks, 5 days ago

On Tue, Jan 13, 2026 at 04:27:34PM +0800, Feng Jiang wrote:
> This series introduces optimized assembly implementations for strnlen,
> strchr, and strrchr on the RISC-V architecture. To support a rigorous
> verification process, the series also significantly expands the
> string_kunit test suite with both functional correctness tests and
> performance benchmarks.
> 
> The patchset is organized as follows:
> - Refactoring (Patches 1-4): Extract generic C implementations for
>   strlen, strnlen, strchr, and strrchr into exported __generic_* functions.
> - Correctness Testing (Patches 5-7): Extend string_kunit with detailed
>   functional tests for the target functions.
> - Performance Benchmarking (Patches 8-11): Add a benchmarking framework
>   to string_kunit to measure execution time across various string lengths.
> - RISC-V Optimizations (Patches 12-14): Provide the optimized assembly
>   implementations for the RISC-V architecture.

...

>         # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
>         # string_test_strlen_bench:   arch-optimized: 14100 ns
>         # string_test_strlen_bench:   generic C:      35605600 ns
>         # string_test_strlen_bench:   speedup:        2525.21x

Doesn't sound right. I think you measured cache performance and not your algo.
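
A quick sanity check on the quoted figures supports the doubt. The sketch below only does arithmetic on the numbers printed above. The helper names are hypothetical, and it assumes each call scans all 2048 bytes.

```c
#include <stdio.h>

/* Average nanoseconds per call, from a benchmark total. */
double ns_per_call(double total_ns, double iters)
{
	return total_ns / iters;
}

/* Bytes scanned per nanosecond, assuming each call walks `len` bytes. */
double bytes_per_ns(double len, double total_ns, double iters)
{
	return len / ns_per_call(total_ns, iters);
}

/* Print the per-call view of the quoted 'long' strlen results. */
void print_long_strlen_check(void)
{
	/* Figures from the quoted log: len 2048, 10000 iterations. */
	printf("arch-optimized: %.2f ns/call, %.1f bytes/ns\n",
	       ns_per_call(14100.0, 10000.0),
	       bytes_per_ns(2048.0, 14100.0, 10000.0));
	printf("generic C:      %.2f ns/call, %.3f bytes/ns\n",
	       ns_per_call(35605600.0, 10000.0),
	       bytes_per_ns(2048.0, 35605600.0, 10000.0));
}
```

The arch-optimized total works out to about 1.41 ns per call, or roughly 1450 bytes per nanosecond if all 2048 bytes were really scanned each time. That is hard to reconcile with any per-byte or per-word loop, which suggests the timed region was not doing the full work on every iteration.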

-- 
With Best Regards,
Andy Shevchenko