To support performance benchmarking in KUnit tests, extract the
generic C implementation of strlen() into a standalone function
__generic_strlen(). This allows tests to compare architecture-optimized
versions against the generic baseline without duplicating code.
Suggested-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
include/linux/string.h | 1 +
lib/string.c | 10 ++++++++--
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/include/linux/string.h b/include/linux/string.h
index 1b564c36d721..961645633b4d 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -197,6 +197,7 @@ extern char * strstr(const char *, const char *);
#ifndef __HAVE_ARCH_STRNSTR
extern char * strnstr(const char *, const char *, size_t);
#endif
+extern __kernel_size_t __generic_strlen(const char *);
#ifndef __HAVE_ARCH_STRLEN
extern __kernel_size_t strlen(const char *);
#endif
diff --git a/lib/string.c b/lib/string.c
index b632c71df1a5..047ecb38e09b 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -412,8 +412,7 @@ char *strnchr(const char *s, size_t count, int c)
EXPORT_SYMBOL(strnchr);
#endif
-#ifndef __HAVE_ARCH_STRLEN
-size_t strlen(const char *s)
+size_t __generic_strlen(const char *s)
{
const char *sc;
@@ -421,6 +420,13 @@ size_t strlen(const char *s)
/* nothing */;
return sc - s;
}
+EXPORT_SYMBOL(__generic_strlen);
+
+#ifndef __HAVE_ARCH_STRLEN
+size_t strlen(const char *s)
+{
+ return __generic_strlen(s);
+}
EXPORT_SYMBOL(strlen);
#endif
--
2.25.1
On Tue, Jan 13, 2026 at 04:27:35PM +0800, Feng Jiang wrote:
> To support performance benchmarking in KUnit tests, extract the
> generic C implementation of strlen() into a standalone function
> __generic_strlen(). This allows tests to compare architecture-optimized
> versions against the generic baseline without duplicating code.
>
> Suggested-by: Andy Shevchenko <andy@kernel.org>
> Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
> ---
> include/linux/string.h | 1 +
> lib/string.c | 10 ++++++++--
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/string.h b/include/linux/string.h
> index 1b564c36d721..961645633b4d 100644
> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -197,6 +197,7 @@ extern char * strstr(const char *, const char *);
> #ifndef __HAVE_ARCH_STRNSTR
> extern char * strnstr(const char *, const char *, size_t);
> #endif
> +extern __kernel_size_t __generic_strlen(const char *);
> #ifndef __HAVE_ARCH_STRLEN
> extern __kernel_size_t strlen(const char *);
> #endif
> diff --git a/lib/string.c b/lib/string.c
> index b632c71df1a5..047ecb38e09b 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -412,8 +412,7 @@ char *strnchr(const char *s, size_t count, int c)
> EXPORT_SYMBOL(strnchr);
> #endif
>
> -#ifndef __HAVE_ARCH_STRLEN
> -size_t strlen(const char *s)
> +size_t __generic_strlen(const char *s)
> {
> const char *sc;
>
> @@ -421,6 +420,13 @@ size_t strlen(const char *s)
> /* nothing */;
> return sc - s;
> }
> +EXPORT_SYMBOL(__generic_strlen);
> +
> +#ifndef __HAVE_ARCH_STRLEN
> +size_t strlen(const char *s)
> +{
> + return __generic_strlen(s);
> +}
> EXPORT_SYMBOL(strlen);
A similar problem exists with the architecture-optimized CRC and crypto
functions. Historically, these subsystems exported both generic and
architecture-optimized functions.
We've actually been moving away from that design to simplify things.
For example, for CRC-32C there's now just the crc32c() function which
delegates to the "best" CRC-32C implementation, with no direct access to
the generic implementation of CRC-32C.
crc_kunit then just tests and benchmarks crc32c(). To check how the
performance of crc32c() changes when its implementation changes (whether
the change is the addition of an arch-optimized implementation or a
change in an existing arch-optimized implementation), the developer just
needs to run crc_kunit with two kernels, before and after.
I suggest just doing that. In that case there would be no need to
export the generic implementations of these functions.
(Also note that *if* the generic functions are exported, they probably
should be exported only when the KUnit test is enabled. There's no need
to include them in the kernel image when the test isn't enabled.)
- Eric
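A minimal userspace sketch of the parenthetical above, i.e. exposing the generic helper only when the test is enabled. Everything named `demo_*` or `DEMO_*` is hypothetical and merely stands in for a Kconfig symbol and the functions from the patch. In-tree code would more likely use the VISIBLE_IF_KUNIT and EXPORT_SYMBOL_IF_KUNIT() helpers from <kunit/visibility.h>, which key off CONFIG_KUNIT rather than one specific test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Kconfig option; build with -DDEMO_STRING_TEST=0 to hide the helper. */
#ifndef DEMO_STRING_TEST
#define DEMO_STRING_TEST 1
#endif

#if DEMO_STRING_TEST
#define DEMO_VISIBLE_IF_TEST		/* external linkage: a test may call it */
#else
#define DEMO_VISIBLE_IF_TEST static	/* internal linkage: nothing extra is exposed */
#endif

/* Generic byte-at-a-time length, mirroring the loop in lib/string.c. */
DEMO_VISIBLE_IF_TEST size_t demo_generic_strlen(const char *s)
{
	const char *sc;

	for (sc = s; *sc != '\0'; ++sc)
		/* nothing */;
	return (size_t)(sc - s);
}

/* Public entry point; always available regardless of the test setting. */
size_t demo_strlen(const char *s)
{
	return demo_generic_strlen(s);
}

/* Tiny self-check that goes through the public entry point only. */
void demo_selfcheck(void)
{
	assert(demo_strlen("") == 0);
	assert(demo_strlen("abc") == 3);
}
```

With DEMO_STRING_TEST set to 0 the helper becomes file-local, so only demo_strlen() remains visible to other translation units.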
On Tue, 13 Jan 2026 16:01:51 -0800
Eric Biggers <ebiggers@kernel.org> wrote:

..
> A similar problem exists with the architecture-optimized CRC and crypto
> functions. Historically, these subsystems exported both generic and
> architecture-optimized functions.
>
> We've actually been moving away from that design to simplify things.
> For example, for CRC-32C there's now just the crc32c() function which
> delegates to the "best" CRC-32C implementation, with no direct access to
> the generic implementation of CRC-32C.
>
> crc_kunit then just tests and benchmarks crc32c(). To check how the
> performance of crc32c() changes when its implementation changes (whether
> the change is the addition of an arch-optimized implementation or a
> change in an existing arch-optimized implementation), the developer just
> needs to run crc_kunit with two kernels, before and after.

For the mul_div tests I arranged that the test code could #include the
source for the generic implementation so it could run that as well as
the version compiled into the main kernel.

This involved wrapping the function in:
#if !defined(function) || defined(test_function)
type function(args)
...
}
#if !defined(function)
EXPORT_SYMBOL(function)
#endif
#endif

So the test code can use:
#define function generic_function
#define test_function
#include "function.c"

to get a private copy of the generic code.

David
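The wrapping David describes can be tried out in userspace roughly as follows. This is only a sketch under stated assumptions: the contents of the would-be "function.c" are pasted inline so the example fits in one file, `demo_strlen` and `generic_demo_strlen` are hypothetical names, and libc strlen() plays the part of the version compiled into the main kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Test side: rename the function and opt in before pulling in the generic source. */
#define demo_strlen generic_demo_strlen
#define test_demo_strlen

/* ---- begin: what a hypothetical "demo_strlen.c" would contain ---- */
#if !defined(demo_strlen) || defined(test_demo_strlen)
size_t demo_strlen(const char *s)	/* expands to generic_demo_strlen here */
{
	const char *sc;

	for (sc = s; *sc != '\0'; ++sc)
		/* nothing */;
	return (size_t)(sc - s);
}
#if !defined(demo_strlen)
/* Kernel code would place EXPORT_SYMBOL(demo_strlen) here. */
#endif
#endif
/* ---- end of "demo_strlen.c" ---- */

#undef demo_strlen

/* Stand-in for the implementation built into the kernel image (assumption: libc). */
size_t demo_strlen(const char *s)
{
	return strlen(s);
}

/* Side-by-side check: the private generic copy must agree with the built-in one. */
void demo_check_agreement(const char *s)
{
	assert(generic_demo_strlen(s) == demo_strlen(s));
}
```

Because the rename happens through the preprocessor, the test gets its own copy of the generic code and no extra symbol is exported from the library itself.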
On 2026/1/14 18:10, David Laight wrote:
> On Tue, 13 Jan 2026 16:01:51 -0800
> Eric Biggers <ebiggers@kernel.org> wrote:
>
> ..
>> A similar problem exists with the architecture-optimized CRC and crypto
>> functions. Historically, these subsystems exported both generic and
>> architecture-optimized functions.
>>
>> We've actually been moving away from that design to simplify things.
>> For example, for CRC-32C there's now just the crc32c() function which
>> delegates to the "best" CRC-32C implementation, with no direct access to
>> the generic implementation of CRC-32C.
>>
>> crc_kunit then just tests and benchmarks crc32c(). To check how the
>> performance of crc32c() changes when its implementation changes (whether
>> the change is the addition of an arch-optimized implementation or a
>> change in an existing arch-optimized implementation), the developer just
>> needs to run crc_kunit with two kernels, before and after.
>
> For the mul_div tests I arranged that the test code could #include the
> source for the generic implementation so it could run that as well as
> the version compiled into the main kernel.
>
> This involved wrapping the function in:
> #if !defined(function) || defined(test_function)
> type function(args)
> ...
> }
> #if !defined(function)
> EXPORT_SYMBOL(function)
> #endif
> #endif
>
> So the test code can use:
> #define function generic_function
> #define test_function
> #include "function.c"
>
> to get a private copy of the generic code.

Thank you for the suggestion! That technique is very clever and interesting -
I've definitely learned something new and will keep it in mind for the future.

However, since lib/string.c is such a foundational and low-level library, I'm
hesitant to add macro wrappers or conditional blocks for KUnit. Given its
importance, I feel that increasing its complexity for side-by-side testing
isn't quite worth it. I'd prefer to keep the core code clean and follow Eric's
minimalist approach of benchmarking across different kernel configurations.

I really appreciate the guidance!

--
With Best Regards,
Feng Jiang
On Thu, Jan 15, 2026 at 8:51 AM Feng Jiang <jiangfeng@kylinos.cn> wrote:
> On 2026/1/14 18:10, David Laight wrote:
> > On Tue, 13 Jan 2026 16:01:51 -0800
> > Eric Biggers <ebiggers@kernel.org> wrote:

...

> >> A similar problem exists with the architecture-optimized CRC and crypto
> >> functions. Historically, these subsystems exported both generic and
> >> architecture-optimized functions.
> >>
> >> We've actually been moving away from that design to simplify things.
> >> For example, for CRC-32C there's now just the crc32c() function which
> >> delegates to the "best" CRC-32C implementation, with no direct access to
> >> the generic implementation of CRC-32C.
> >>
> >> crc_kunit then just tests and benchmarks crc32c(). To check how the
> >> performance of crc32c() changes when its implementation changes (whether
> >> the change is the addition of an arch-optimized implementation or a
> >> change in an existing arch-optimized implementation), the developer just
> >> needs to run crc_kunit with two kernels, before and after.
> >
> > For the mul_div tests I arranged that the test code could #include the
> > source for the generic implementation so it could run that as well as
> > the version compiled into the main kernel.
> >
> > This involved wrapping the function in:
> > #if !defined(function) || defined(test_function)
> > type function(args)
> > ...
> > }
> > #if !defined(function)
> > EXPORT_SYMBOL(function)
> > #endif
> > #endif
> >
> > So the test code can use:
> > #define function generic_function
> > #define test_function
> > #include "function.c"
> >
> > to get a private copy of the generic code.
>
> Thank you for the suggestion! That technique is very clever and interesting -
> I've definitely learned something new and will keep it in mind for the future.
>
> However, since lib/string.c is such a foundational and low-level library, I'm
> hesitant to add macro wrappers or conditional blocks for KUnit. Given its
> importance, I feel that increasing its complexity for side-by-side testing
> isn't quite worth it. I'd prefer to keep the core code clean and follow Eric's
> minimalist approach of benchmarking across different kernel configurations.
>
> I really appreciate the guidance!

I second that. For now we can stick with what Eric proposed and consider
other approaches in the future where appropriate.

--
With Best Regards,
Andy Shevchenko
On Tue, Jan 13, 2026 at 04:01:51PM -0800, Eric Biggers wrote:
> On Tue, Jan 13, 2026 at 04:27:35PM +0800, Feng Jiang wrote:
> > To support performance benchmarking in KUnit tests, extract the
> > generic C implementation of strlen() into a standalone function
> > __generic_strlen(). This allows tests to compare architecture-optimized
> > versions against the generic baseline without duplicating code.

...

> A similar problem exists with the architecture-optimized CRC and crypto
> functions. Historically, these subsystems exported both generic and
> architecture-optimized functions.
>
> We've actually been moving away from that design to simplify things.
> For example, for CRC-32C there's now just the crc32c() function which
> delegates to the "best" CRC-32C implementation, with no direct access to
> the generic implementation of CRC-32C.
>
> crc_kunit then just tests and benchmarks crc32c(). To check how the
> performance of crc32c() changes when its implementation changes (whether
> the change is the addition of an arch-optimized implementation or a
> change in an existing arch-optimized implementation), the developer just
> needs to run crc_kunit with two kernels, before and after.
>
> I suggest just doing that. In that case there would be no need to
> export the generic implementations of these functions.

This would also work for me! Go with whatever you folks find best from a
readability and maintenance point of view.

> (Also note that *if* the generic functions are exported, they probably
> should be exported only when the KUnit test is enabled. There's no need
> to include them in the kernel image when the test isn't enabled.)

True.

--
With Best Regards,
Andy Shevchenko
On 2026/1/14 08:01, Eric Biggers wrote:
> On Tue, Jan 13, 2026 at 04:27:35PM +0800, Feng Jiang wrote:
>> To support performance benchmarking in KUnit tests, extract the
>> generic C implementation of strlen() into a standalone function
>> __generic_strlen(). This allows tests to compare architecture-optimized
>> versions against the generic baseline without duplicating code.
>>
>> Suggested-by: Andy Shevchenko <andy@kernel.org>
>> Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
>> ---
>> include/linux/string.h | 1 +
>> lib/string.c | 10 ++++++++--
>> 2 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/string.h b/include/linux/string.h
>> index 1b564c36d721..961645633b4d 100644
>> --- a/include/linux/string.h
>> +++ b/include/linux/string.h
>> @@ -197,6 +197,7 @@ extern char * strstr(const char *, const char *);
>> #ifndef __HAVE_ARCH_STRNSTR
>> extern char * strnstr(const char *, const char *, size_t);
>> #endif
>> +extern __kernel_size_t __generic_strlen(const char *);
>> #ifndef __HAVE_ARCH_STRLEN
>> extern __kernel_size_t strlen(const char *);
>> #endif
>> diff --git a/lib/string.c b/lib/string.c
>> index b632c71df1a5..047ecb38e09b 100644
>> --- a/lib/string.c
>> +++ b/lib/string.c
>> @@ -412,8 +412,7 @@ char *strnchr(const char *s, size_t count, int c)
>> EXPORT_SYMBOL(strnchr);
>> #endif
>>
>> -#ifndef __HAVE_ARCH_STRLEN
>> -size_t strlen(const char *s)
>> +size_t __generic_strlen(const char *s)
>> {
>> const char *sc;
>>
>> @@ -421,6 +420,13 @@ size_t strlen(const char *s)
>> /* nothing */;
>> return sc - s;
>> }
>> +EXPORT_SYMBOL(__generic_strlen);
>> +
>> +#ifndef __HAVE_ARCH_STRLEN
>> +size_t strlen(const char *s)
>> +{
>> + return __generic_strlen(s);
>> +}
>> EXPORT_SYMBOL(strlen);
>
> A similar problem exists with the architecture-optimized CRC and crypto
> functions. Historically, these subsystems exported both generic and
> architecture-optimized functions.
>
> We've actually been moving away from that design to simplify things.
> For example, for CRC-32C there's now just the crc32c() function which
> delegates to the "best" CRC-32C implementation, with no direct access to
> the generic implementation of CRC-32C.
>
> crc_kunit then just tests and benchmarks crc32c(). To check how the
> performance of crc32c() changes when its implementation changes (whether
> the change is the addition of an arch-optimized implementation or a
> change in an existing arch-optimized implementation), the developer just
> needs to run crc_kunit with two kernels, before and after.
>
> I suggest just doing that. In that case there would be no need to
> export the generic implementations of these functions.
>
> (Also note that *if* the generic functions are exported, they probably
> should be exported only when the KUnit test is enabled. There's no need
> to include them in the kernel image when the test isn't enabled.)
>
> - Eric
Hi Eric, Andy,
Thanks for the insights. I agree with Eric's point on keeping the internal
implementations encapsulated. It's a cleaner design for the long term.
In v3, I will drop the __generic_* exports and simplify the patchset
to benchmark only the standard functions.
To address Andy's concern regarding performance, I will provide a "Before
vs. After" comparison in the v3 cover letter. This should demonstrate
the speedup while keeping the core kernel code tidy. I'll also refine the
benchmark logic to ensure more realistic results as discussed.
This seems to be the most robust way to validate the optimizations without
adding unnecessary exports.
--
With Best Regards,
Feng Jiang
On Tue, Jan 13, 2026 at 04:27:35PM +0800, Feng Jiang wrote:
> To support performance benchmarking in KUnit tests, extract the
> generic C implementation of strlen() into a standalone function
> __generic_strlen(). This allows tests to compare architecture-optimized
> versions against the generic baseline without duplicating code.
...
> +size_t strlen(const char *s)
> +{
> + return __generic_strlen(s);
> +}
> EXPORT_SYMBOL(strlen);
There is no point anymore to have this as an exported function, right? So it can
be moved to string.h as static inline.
--
With Best Regards,
Andy Shevchenko
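A userspace sketch of that suggestion, assuming the generic helper stays out of line while the wrapper moves into the header as a static inline. `demo_generic_strlen`, `demo_strlen` and `DEMO_HAVE_ARCH_STRLEN` are hypothetical stand-ins for `__generic_strlen`, `strlen` and `__HAVE_ARCH_STRLEN`; the header and source parts share one file here only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* ---- header part (would live in a string.h-like header) ---- */
size_t demo_generic_strlen(const char *s);

#ifndef DEMO_HAVE_ARCH_STRLEN
/* No arch override: the wrapper is inlined at each call site, so it needs no export. */
static inline size_t demo_strlen(const char *s)
{
	return demo_generic_strlen(s);
}
#endif

/* ---- source part (would live in a lib/string.c-like file) ---- */
size_t demo_generic_strlen(const char *s)
{
	const char *sc;

	for (sc = s; *sc != '\0'; ++sc)
		/* nothing */;
	return (size_t)(sc - s);
}

/* Small check that the inline wrapper and the helper agree. */
void demo_selfcheck(void)
{
	assert(demo_strlen("") == 0);
	assert(demo_strlen("string.h") == demo_generic_strlen("string.h"));
}
```

Only the out-of-line helper would still need an export in this arrangement, which is the part the rest of the thread proposes to drop.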