[v4] riscv: optimize string functions and add kunit tests

[PATCH v4 4/8] lib/string_kunit: add performance benchmark for strlen()

Posted by Feng Jiang 2 weeks, 3 days ago

Introduce a benchmarking framework to the string_kunit test suite to
measure the execution efficiency of string functions.

The implementation is inspired by crc_benchmark(), measuring throughput
(MB/s) and latency (ns/call) across a range of string lengths. It
includes a warm-up phase, disables preemption during measurement, and
uses a fixed seed for reproducible results.

This framework allows for comparing different implementations (e.g.,
generic C vs. architecture-optimized assembly) within the KUnit
environment.

Initially, provide a benchmark for strlen().

Suggested-by: Andy Shevchenko <andy@kernel.org>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 lib/Kconfig.debug        |  11 +++
 lib/tests/string_kunit.c | 158 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 169 insertions(+)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ba36939fda79..21b058ae815f 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2475,6 +2475,17 @@ config STRING_HELPERS_KUNIT_TEST
 	depends on KUNIT
 	default KUNIT_ALL_TESTS
 
+config STRING_KUNIT_BENCH
+       bool "Benchmark string functions at runtime"
+       depends on STRING_KUNIT_TEST
+       help
+         Enable performance measurement for string functions.
+
+         This measures the execution efficiency of string functions
+         during the KUnit test run.
+
+         If unsure, say N.
+
 config FFS_KUNIT_TEST
 	tristate "KUnit test ffs-family functions at runtime" if !KUNIT_ALL_TESTS
 	depends on KUNIT
diff --git a/lib/tests/string_kunit.c b/lib/tests/string_kunit.c
index 7f1e2bf6a352..52798f426aa9 100644
--- a/lib/tests/string_kunit.c
+++ b/lib/tests/string_kunit.c
@@ -6,10 +6,13 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <kunit/test.h>
+#include <linux/math64.h>
 #include <linux/module.h>
+#include <linux/prandom.h>
 #include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/units.h>
 
 #define STRCMP_LARGE_BUF_LEN 2048
 #define STRCMP_CHANGE_POINT 1337
@@ -20,6 +23,9 @@
 #define STRING_TEST_MAX_LEN	128
 #define STRING_TEST_MAX_OFFSET	16
 
+#define STRING_BENCH_SEED	888
+#define STRING_BENCH_WORKLOAD	(1 * MEGA)
+
 static void string_test_memset16(struct kunit *test)
 {
 	unsigned i, j, k;
@@ -700,6 +706,157 @@ static void string_test_strends(struct kunit *test)
 	KUNIT_EXPECT_TRUE(test, strends("", ""));
 }
 
+/* Target string lengths for benchmarking */
+static const size_t bench_lens[] = {
+	0, 1, 7, 8, 16, 31, 64, 127, 512, 1024, 3173, 4096,
+};
+
+/**
+ * alloc_max_bench_buffer() - Allocate buffer for the max test case.
+ * @test: KUnit context for managed allocation.
+ * @lens: Array of lengths used in the benchmark cases.
+ * @count: Number of elements in the @lens array.
+ * @buf_len: [out] Pointer to store the actually allocated buffer
+ * size (including NUL character).
+ *
+ * Return: Pointer to the allocated memory, or NULL on failure.
+ */
+static void *alloc_max_bench_buffer(struct kunit *test,
+		const size_t *lens, size_t count, size_t *buf_len)
+{
+	size_t i, max_len = 0;
+	void *buf;
+
+	for (i = 0; i < count; i++) {
+		if (max_len < lens[i])
+			max_len = lens[i];
+	}
+
+	/* Add space for NUL character */
+	max_len += 1;
+
+	buf = kunit_kzalloc(test, max_len, GFP_KERNEL);
+	if (!buf)
+		return NULL;
+
+	if (buf_len)
+		*buf_len = max_len;
+
+	return buf;
+}
+
+/**
+ * fill_random_string() - Populate a buffer with a random NUL-terminated string.
+ * @buf: Buffer to fill.
+ * @len: Length of the buffer in bytes.
+ *
+ * Fills the buffer with random non-NUL bytes and ensures the string is
+ * properly NUL-terminated.
+ */
+static void fill_random_string(char *buf, size_t len)
+{
+	struct rnd_state state;
+	size_t i;
+
+	if (!buf || !len)
+		return;
+
+	/* Use a fixed seed to ensure deterministic benchmark results */
+	prandom_seed_state(&state, STRING_BENCH_SEED);
+	prandom_bytes_state(&state, buf, len);
+
+	/* Replace NUL characters to avoid early string termination */
+	for (i = 0; i < len; i++) {
+		if (buf[i] == '\0')
+			buf[i] = 0x01;
+	}
+
+	buf[len - 1] = '\0';
+}
+
+/**
+ * STRING_BENCH() - Benchmark string functions.
+ * @iters: Number of iterations to run.
+ * @func: Function to benchmark.
+ * @...: Variable arguments passed to @func.
+ *
+ * Runs a short untimed warm-up phase, then disables preemption and measures
+ * the total time in nanoseconds to execute @func(@__VA_ARGS__) @iters times.
+ *
+ * Context: Disables preemption during measurement.
+ * Return: Total execution time in nanoseconds (u64).
+ */
+#define STRING_BENCH(iters, func, ...)					\
+({									\
+	/* Volatile function pointer prevents dead code elimination */	\
+	typeof(func) (* volatile __func) = (func);			\
+	size_t __bn_iters = (iters);					\
+	size_t __bn_warm_iters;						\
+	size_t __bn_i;							\
+	u64 __bn_t;							\
+									\
+	__bn_warm_iters = max(__bn_iters / 10, 50U);			\
+									\
+	for (__bn_i = 0; __bn_i < __bn_warm_iters; __bn_i++)		\
+		(void)__func(__VA_ARGS__);				\
+									\
+	preempt_disable();						\
+	__bn_t = ktime_get_ns();					\
+	for (__bn_i = 0; __bn_i < __bn_iters; __bn_i++)			\
+		(void)__func(__VA_ARGS__);				\
+	__bn_t = ktime_get_ns() - __bn_t;				\
+	preempt_enable();						\
+	__bn_t;								\
+})
+
+/**
+ * STRING_BENCH_BUF() - Benchmark harness for single-buffer functions.
+ * @test: KUnit context.
+ * @buf_name: Local char * variable name to be defined.
+ * @buf_size: Local size_t variable name to be defined.
+ * @func: Function to benchmark.
+ * @...: Extra arguments for @func.
+ *
+ * Prepares a randomized, NUL-terminated buffer and iterates through lengths
+ * in bench_lens, defining @buf_name and @buf_size in each loop.
+ */
+#define STRING_BENCH_BUF(test, buf_name, buf_size, func, ...)		\
+do {									\
+	size_t buf_size, _bn_i, _bn_iters, _bn_size = 0;		\
+	u64 _bn_t, _bn_mbps = 0, _bn_lat = 0;				\
+	char *buf_name, *_bn_buf;					\
+									\
+	if (!IS_ENABLED(CONFIG_STRING_KUNIT_BENCH))			\
+		kunit_skip(test, "not enabled");			\
+									\
+	_bn_buf = alloc_max_bench_buffer(test, bench_lens,		\
+			ARRAY_SIZE(bench_lens), &_bn_size);		\
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, _bn_buf);			\
+									\
+	fill_random_string(_bn_buf, _bn_size);				\
+									\
+	for (_bn_i = 0; _bn_i < ARRAY_SIZE(bench_lens); _bn_i++) {	\
+		buf_size = bench_lens[_bn_i];				\
+		buf_name = _bn_buf + _bn_size - buf_size - 1;		\
+		_bn_iters = STRING_BENCH_WORKLOAD / max(buf_size, 1U);	\
+									\
+		_bn_t = STRING_BENCH(_bn_iters, func, ##__VA_ARGS__);	\
+									\
+		if (_bn_t > 0) {					\
+			_bn_mbps = (u64)(buf_size) * _bn_iters * 1000;	\
+			_bn_mbps = div64_u64(_bn_mbps, _bn_t);		\
+			_bn_lat = div64_u64(_bn_t, _bn_iters);		\
+		}							\
+		kunit_info(test, "len=%zu: %llu MB/s (%llu ns/call)\n",	\
+				buf_size, _bn_mbps, _bn_lat);		\
+	}								\
+} while (0)
+
+static void string_bench_strlen(struct kunit *test)
+{
+	STRING_BENCH_BUF(test, buf, len, strlen, buf);
+}
+
 static struct kunit_case string_test_cases[] = {
 	KUNIT_CASE(string_test_memset16),
 	KUNIT_CASE(string_test_memset32),
@@ -725,6 +882,7 @@ static struct kunit_case string_test_cases[] = {
 	KUNIT_CASE(string_test_strtomem),
 	KUNIT_CASE(string_test_memtostr),
 	KUNIT_CASE(string_test_strends),
+	KUNIT_CASE(string_bench_strlen),
 	{}
 };
 
-- 
2.25.1
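
The measurement approach in STRING_BENCH() and STRING_BENCH_BUF() above can be approximated in user space. The following is only a minimal sketch under stated assumptions, not the patch's implementation: now_ns(), fill_string(), tail_for_len() and bench_strlen_ns() are hypothetical names, clock_gettime() stands in for ktime_get_ns(), a small LCG stands in for prandom_bytes_state(), and preemption is not disabled because user space cannot do that.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical user-space stand-in for ktime_get_ns(). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Fill buf with deterministic non-NUL bytes and NUL-terminate it. */
static void fill_string(char *buf, size_t size, uint32_t seed)
{
	assert(buf && size > 0);

	for (size_t i = 0; i + 1 < size; i++) {
		/* Simple LCG (assumption); any fixed-seed generator works. */
		seed = seed * 1664525u + 1013904223u;
		buf[i] = (char)((seed >> 24) | 0x01);	/* never '\0' */
	}
	buf[size - 1] = '\0';
}

/* Return the suffix of buf (total size `size`) whose strlen() is len. */
static char *tail_for_len(char *buf, size_t size, size_t len)
{
	assert(len < size);
	return buf + size - len - 1;
}

/* Time `iters` calls of strlen(s) after an untimed warm-up. */
static uint64_t bench_strlen_ns(const char *s, size_t iters)
{
	/* Volatile pointer keeps the compiler from folding the calls away. */
	size_t (*volatile fn)(const char *) = strlen;
	size_t warm = iters / 10 > 50 ? iters / 10 : 50;
	uint64_t t;

	for (size_t i = 0; i < warm; i++)
		(void)fn(s);

	t = now_ns();
	for (size_t i = 0; i < iters; i++)
		(void)fn(s);
	return now_ns() - t;
}
```

Because every test string is a suffix of the same buffer, a single allocation and a single fill cover all lengths in bench_lens, and each length shares the same terminating NUL at the end of the buffer.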

Re: [PATCH v4 4/8] lib/string_kunit: add performance benchmark for strlen()

Posted by Andy Shevchenko 2 weeks, 2 days ago

On Fri, Jan 23, 2026 at 04:58:37PM +0800, Feng Jiang wrote:
> Introduce a benchmarking framework to the string_kunit test suite to
> measure the execution efficiency of string functions.
> 
> The implementation is inspired by crc_benchmark(), measuring throughput
> (MB/s) and latency (ns/call) across a range of string lengths. It
> includes a warm-up phase, disables preemption during measurement, and
> uses a fixed seed for reproducible results.
> 
> This framework allows for comparing different implementations (e.g.,
> generic C vs. architecture-optimized assembly) within the KUnit
> environment.
> 
> Initially, provide a benchmark for strlen().

...

> +static void *alloc_max_bench_buffer(struct kunit *test,
> +		const size_t *lens, size_t count, size_t *buf_len)
> +{
> +	size_t i, max_len = 0;
> +	void *buf;

> +	for (i = 0; i < count; i++) {
> +		if (max_len < lens[i])
> +			max_len = lens[i];
> +	}

	size_t max_len = 0;
	void *buf;

	for (size_t i = 0; i < count; i++)
		max_len = max(lens[i], max_len);

> +	/* Add space for NUL character */
> +	max_len += 1;
> +
> +	buf = kunit_kzalloc(test, max_len, GFP_KERNEL);
> +	if (!buf)
> +		return NULL;
> +
> +	if (buf_len)
> +		*buf_len = max_len;
> +
> +	return buf;
> +}

...

> +#define STRING_BENCH(iters, func, ...)					\
> +({									\
> +	/* Volatile function pointer prevents dead code elimination */	\
> +	typeof(func) (* volatile __func) = (func);			\
> +	size_t __bn_iters = (iters);					\
> +	size_t __bn_warm_iters;						\

> +	size_t __bn_i;							\

Define it inside the for-loops.

> +	u64 __bn_t;							\
> +									\
> +	__bn_warm_iters = max(__bn_iters / 10, 50U);			\
> +									\
> +	for (__bn_i = 0; __bn_i < __bn_warm_iters; __bn_i++)		\
> +		(void)__func(__VA_ARGS__);				\
> +									\
> +	preempt_disable();						\
> +	__bn_t = ktime_get_ns();					\
> +	for (__bn_i = 0; __bn_i < __bn_iters; __bn_i++)			\
> +		(void)__func(__VA_ARGS__);				\
> +	__bn_t = ktime_get_ns() - __bn_t;				\
> +	preempt_enable();						\
> +	__bn_t;								\
> +})

...

> +#define STRING_BENCH_BUF(test, buf_name, buf_size, func, ...)		\
> +do {									\
> +	size_t buf_size, _bn_i, _bn_iters, _bn_size = 0;		\
> +	u64 _bn_t, _bn_mbps = 0, _bn_lat = 0;				\
> +	char *buf_name, *_bn_buf;					\

> +	if (!IS_ENABLED(CONFIG_STRING_KUNIT_BENCH))			\
> +		kunit_skip(test, "not enabled");			\

Hmm... Since it's a macro anyway, I think the old style is okay:


#if IS_ENABLED(CONFIG_STRING_KUNIT_BENCH)
#define STRING_BENCH_BUF(test, buf_name, buf_size, func, ...)		\
	...
#else
#define STRING_BENCH_BUF(test, buf_name, buf_size, func, ...)		\
	kunit_skip(test, "not enabled");
#endif

But check that it doesn't produce warnings in the `make W=1` case.

> +	_bn_buf = alloc_max_bench_buffer(test, bench_lens,		\
> +			ARRAY_SIZE(bench_lens), &_bn_size);		\
> +	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, _bn_buf);			\
> +									\
> +	fill_random_string(_bn_buf, _bn_size);				\
> +									\
> +	for (_bn_i = 0; _bn_i < ARRAY_SIZE(bench_lens); _bn_i++) {	\
> +		buf_size = bench_lens[_bn_i];				\
> +		buf_name = _bn_buf + _bn_size - buf_size - 1;		\
> +		_bn_iters = STRING_BENCH_WORKLOAD / max(buf_size, 1U);	\
> +									\
> +		_bn_t = STRING_BENCH(_bn_iters, func, ##__VA_ARGS__);	\
> +									\
> +		if (_bn_t > 0) {					\
> +			_bn_mbps = (u64)(buf_size) * _bn_iters * 1000;	\

"KILO"? Or "(MEGA/KILO)"? I'm puzzled with this 1000 multiplier.

> +			_bn_mbps = div64_u64(_bn_mbps, _bn_t);		\
> +			_bn_lat = div64_u64(_bn_t, _bn_iters);		\
> +		}							\
> +		kunit_info(test, "len=%zu: %llu MB/s (%llu ns/call)\n",	\
> +				buf_size, _bn_mbps, _bn_lat);		\
> +	}								\
> +} while (0)

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v4 4/8] lib/string_kunit: add performance benchmark for strlen()

Posted by Feng Jiang 2 weeks ago

On 2026/1/23 19:02, Andy Shevchenko wrote:
> On Fri, Jan 23, 2026 at 04:58:37PM +0800, Feng Jiang wrote:
>> Introduce a benchmarking framework to the string_kunit test suite to
>> measure the execution efficiency of string functions.
>>
>> The implementation is inspired by crc_benchmark(), measuring throughput
>> (MB/s) and latency (ns/call) across a range of string lengths. It
>> includes a warm-up phase, disables preemption during measurement, and
>> uses a fixed seed for reproducible results.
>>
>> This framework allows for comparing different implementations (e.g.,
>> generic C vs. architecture-optimized assembly) within the KUnit
>> environment.
>>
>> Initially, provide a benchmark for strlen().
> 
> ...
> 
>> +static void *alloc_max_bench_buffer(struct kunit *test,
>> +		const size_t *lens, size_t count, size_t *buf_len)
>> +{
>> +	size_t i, max_len = 0;
>> +	void *buf;
> 
>> +	for (i = 0; i < count; i++) {
>> +		if (max_len < lens[i])
>> +			max_len = lens[i];
>> +	}
> 
> 	size_t max_len = 0;
> 	void *buf;
> 
> 	for (size_t i = 0; i < count; i++)
> 		max_len = max(lens[i], max_len);
> 

Agreed. I will simplify the loop and use max() as suggested.
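
For illustration, the helper might look roughly like this once the suggestion is applied. This is a user-space sketch only, not the v5 code: alloc_max_buffer() and MAX_SZ() are hypothetical names, MAX_SZ() stands in for the kernel's max(), and calloc() stands in for kunit_kzalloc(), so the caller has to free the buffer itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's type-checked max(). */
#define MAX_SZ(a, b) ((a) > (b) ? (a) : (b))

static void *alloc_max_buffer(const size_t *lens, size_t count,
			      size_t *buf_len)
{
	size_t max_len = 0;
	void *buf;

	assert(lens || count == 0);

	/* Loop-scoped index and max(), as proposed in the review. */
	for (size_t i = 0; i < count; i++)
		max_len = MAX_SZ(lens[i], max_len);

	/* Add space for the NUL character. */
	max_len += 1;

	/* calloc() zeroes the buffer, like kunit_kzalloc() in the patch. */
	buf = calloc(1, max_len);
	if (!buf)
		return NULL;

	if (buf_len)
		*buf_len = max_len;

	return buf;
}
```

With the lengths from bench_lens, the largest entry is 4096, so the reported buffer size would be 4097 bytes.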

>> +	/* Add space for NUL character */
>> +	max_len += 1;
>> +
>> +	buf = kunit_kzalloc(test, max_len, GFP_KERNEL);
>> +	if (!buf)
>> +		return NULL;
>> +
>> +	if (buf_len)
>> +		*buf_len = max_len;
>> +
>> +	return buf;
>> +}
> 
> ...
> 
>> +#define STRING_BENCH(iters, func, ...)					\
>> +({									\
>> +	/* Volatile function pointer prevents dead code elimination */	\
>> +	typeof(func) (* volatile __func) = (func);			\
>> +	size_t __bn_iters = (iters);					\
>> +	size_t __bn_warm_iters;						\
> 
>> +	size_t __bn_i;							\
> 
> Define it inside the for-loops.
> 

Will do.

>> +	u64 __bn_t;							\
>> +									\
>> +	__bn_warm_iters = max(__bn_iters / 10, 50U);			\
>> +									\
>> +	for (__bn_i = 0; __bn_i < __bn_warm_iters; __bn_i++)		\
>> +		(void)__func(__VA_ARGS__);				\
>> +									\
>> +	preempt_disable();						\
>> +	__bn_t = ktime_get_ns();					\
>> +	for (__bn_i = 0; __bn_i < __bn_iters; __bn_i++)			\
>> +		(void)__func(__VA_ARGS__);				\
>> +	__bn_t = ktime_get_ns() - __bn_t;				\
>> +	preempt_enable();						\
>> +	__bn_t;								\
>> +})
> 
> ...
> 
>> +#define STRING_BENCH_BUF(test, buf_name, buf_size, func, ...)		\
>> +do {									\
>> +	size_t buf_size, _bn_i, _bn_iters, _bn_size = 0;		\
>> +	u64 _bn_t, _bn_mbps = 0, _bn_lat = 0;				\
>> +	char *buf_name, *_bn_buf;					\
> 
>> +	if (!IS_ENABLED(CONFIG_STRING_KUNIT_BENCH))			\
>> +		kunit_skip(test, "not enabled");			\
> 
> Hmm... Since it's a macro anyway, I think the old style is okay:
> 
> #if IS_ENABLED(CONFIG_STRING_KUNIT_BENCH)
> #define STRING_BENCH_BUF(test, buf_name, buf_size, func, ...)		\
> 	...
> #else
> #define STRING_BENCH_BUF(test, buf_name, buf_size, func, ...)		\
> 	kunit_skip(test, "not enabled");
> #endif
> 
> But check that it doesn't produce warnings in the `make W=1` case.
> 

Thanks. Using #if IS_ENABLED(...) to define the macro differently is cleaner.
I will implement it this way and ensure it passes make W=1 without warnings.
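
A rough user-space sketch of this compile-time switch is shown below, under stated assumptions: BENCH_ENABLED is a hypothetical stand-in for CONFIG_STRING_KUNIT_BENCH, and RUN_BENCH() and sample_bench() are made-up names, not part of the patch. Two details matter: the last line of each macro body carries no trailing backslash, so the following preprocessor directive is not swallowed into the macro, and both variants are wrapped in do { } while (0) so they behave as a single statement.

```c
#include <assert.h>
#include <stdio.h>

/* BENCH_ENABLED is a hypothetical stand-in for CONFIG_STRING_KUNIT_BENCH. */
#ifdef BENCH_ENABLED
#define RUN_BENCH(result, func, ...)		\
do {						\
	(result) = func(__VA_ARGS__);		\
} while (0)
#else
#define RUN_BENCH(result, func, ...)		\
do {						\
	/* Reference func without calling it */	\
	(void)sizeof(func(__VA_ARGS__));	\
	(result) = -1;				\
} while (0)
#endif

/* Made-up workload used only to exercise the macro. */
static int sample_bench(int x)
{
	assert(x >= 0);
	return x * 2;
}
```

In the disabled variant, sizeof keeps sample_bench() referenced without evaluating the call. Helpers that end up unreferenced when the benchmark is compiled out are one plausible source of the unused-function warnings that a `make W=1` build would show.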

>> +	_bn_buf = alloc_max_bench_buffer(test, bench_lens,		\
>> +			ARRAY_SIZE(bench_lens), &_bn_size);		\
>> +	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, _bn_buf);			\
>> +									\
>> +	fill_random_string(_bn_buf, _bn_size);				\
>> +									\
>> +	for (_bn_i = 0; _bn_i < ARRAY_SIZE(bench_lens); _bn_i++) {	\
>> +		buf_size = bench_lens[_bn_i];				\
>> +		buf_name = _bn_buf + _bn_size - buf_size - 1;		\
>> +		_bn_iters = STRING_BENCH_WORKLOAD / max(buf_size, 1U);	\
>> +									\
>> +		_bn_t = STRING_BENCH(_bn_iters, func, ##__VA_ARGS__);	\
>> +									\
>> +		if (_bn_t > 0) {					\
>> +			_bn_mbps = (u64)(buf_size) * _bn_iters * 1000;	\
> 
> "KILO"? Or "(MEGA/KILO)"? I'm puzzled with this 1000 multiplier.
> 

The 1000 factor converts bytes/ns to MB/s:
  (bytes/ns) * (10^9 ns/s) / (10^6 bytes/MB)
In v5, I will replace it with (NSEC_PER_SEC / MEGA) to make the unit
conversion explicit and avoid confusion.
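
As a worked check of that conversion: 1,000,000 bytes processed in 1,000,000 ns is 1 byte/ns, which is 1000 MB/s. A minimal user-space sketch follows; throughput_mbps() and latency_ns() are hypothetical helper names, and NSEC_PER_SEC and MEGA are redefined locally with the values the kernel headers give them.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ull
#define MEGA		1000000ull

/* Throughput in decimal MB/s for total_bytes processed in ns nanoseconds. */
static uint64_t throughput_mbps(uint64_t total_bytes, uint64_t ns)
{
	if (ns == 0)
		return 0;
	/* (bytes / ns) * (ns / s) / (bytes / MB), multiplying first */
	return total_bytes * (NSEC_PER_SEC / MEGA) / ns;
}

/* Average latency per call in nanoseconds (integer division). */
static uint64_t latency_ns(uint64_t ns, uint64_t iters)
{
	assert(iters > 0);
	return ns / iters;
}
```

Multiplying before dividing keeps precision in integer arithmetic, matching the order used in the patch. Note that for len=0 the byte count is zero, so the reported throughput is 0 MB/s and only the ns/call figure is meaningful.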

>> +			_bn_mbps = div64_u64(_bn_mbps, _bn_t);		\
>> +			_bn_lat = div64_u64(_bn_t, _bn_iters);		\
>> +		}							\
>> +		kunit_info(test, "len=%zu: %llu MB/s (%llu ns/call)\n",	\
>> +				buf_size, _bn_mbps, _bn_lat);		\
>> +	}								\
>> +} while (0)
> 

Thanks again for your time and for the detailed review!

-- 
With Best Regards,
Feng Jiang

Re: [PATCH v4 4/8] lib/string_kunit: add performance benchmark for strlen()

Posted by Andy Shevchenko 2 weeks ago

On Mon, Jan 26, 2026 at 8:14 AM Feng Jiang <jiangfeng@kylinos.cn> wrote:
> On 2026/1/23 19:02, Andy Shevchenko wrote:
> > On Fri, Jan 23, 2026 at 04:58:37PM +0800, Feng Jiang wrote:

...

> >> +                    _bn_mbps = (u64)(buf_size) * _bn_iters * 1000;  \
> >
> > "KILO"? Or "(MEGA/KILO)"? I'm puzzled with this 1000 multiplier.
>
> The 1000 factor converts bytes/ns to MB/s:
>   (bytes/ns) * (10^9 ns/s) / (10^6 bytes/MB)
> In v5, I will replace it with (NSEC_PER_SEC / MEGA) to make the unit
> conversion explicit and avoid confusion.

Or NSEC_PER_USEC. Whatever, choose the one you think fits best.

-- 
With Best Regards,
Andy Shevchenko

[PATCH v4 1/8] lib/string_kunit: add correctness test for strlen()
[PATCH v4 2/8] lib/string_kunit: add correctness test for strnlen()
[PATCH v4 3/8] lib/string_kunit: add correctness test for strrchr()
[PATCH v4 4/8] lib/string_kunit: add performance benchmark for strlen()
[PATCH v4 5/8] lib/string_kunit: extend benchmarks to strnlen() and chr searches
[PATCH v4 6/8] riscv: lib: add strnlen() implementation
[PATCH v4 7/8] riscv: lib: add strchr() implementation
[PATCH v4 8/8] riscv: lib: add strrchr() implementation