[v2] bitops: Optimize fns() for improved performance

[PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns()

Posted by Kuan-Wei Chiu 1 year, 9 months ago

Introduce a benchmark test for the fns(). It measures the total time
taken by fns() to process 1,000,000 test data generated using
get_random_long() for each n in the range [0, BITS_PER_LONG].

Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 lib/find_bit_benchmark.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c
index d3fb09e6eff1..8712eacf3bbd 100644
--- a/lib/find_bit_benchmark.c
+++ b/lib/find_bit_benchmark.c
@@ -146,6 +146,28 @@ static int __init test_find_next_and_bit(const void *bitmap,
 	return 0;
 }
 
+static int __init test_fns(void)
+{
+	const unsigned long round = 1000000;
+	s64 time[BITS_PER_LONG + 1];
+	unsigned int i, n;
+	volatile unsigned long x, y;
+
+	for (n = 0; n <= BITS_PER_LONG; n++) {
+		time[n] = ktime_get();
+		for (i = 0; i < round; i++) {
+			x = get_random_long();
+			y = fns(x, n);
+		}
+		time[n] = ktime_get() - time[n];
+	}
+
+	for (n = 0; n <= BITS_PER_LONG; n++)
+		pr_err("fns: n = %2u: %12lld ns\n", n, time[n]);
+
+	return 0;
+}
+
 static int __init find_bit_test(void)
 {
 	unsigned long nbits = BITMAP_LEN / SPARSE;
@@ -186,6 +208,9 @@ static int __init find_bit_test(void)
 	test_find_first_and_bit(bitmap, bitmap2, BITMAP_LEN);
 	test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN);
 
+	pr_err("\nStart testing for fns()\n");
+	test_fns();
+
 	/*
 	 * Everything is OK. Return error just to let user run benchmark
 	 * again without annoying rmmod.
-- 
2.34.1

Re: [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns()

Posted by kernel test robot 1 year, 9 months ago

Hi Kuan-Wei,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.9-rc6 next-20240430]
[cannot apply to akpm-mm/mm-everything akpm-mm/mm-nonmm-unstable]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Kuan-Wei-Chiu/lib-find_bit_benchmark-Add-benchmark-test-for-fns/20240430-144241
base:   linus/master
patch link:    https://lore.kernel.org/r/20240430054912.124237-2-visitorckw%40gmail.com
patch subject: [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns()
config: i386-buildonly-randconfig-004-20240501 (https://download.01.org/0day-ci/archive/20240501/202405010642.tHmTpd1i-lkp@intel.com/config)
compiler: gcc-11 (Ubuntu 11.4.0-4ubuntu1) 11.4.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240501/202405010642.tHmTpd1i-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405010642.tHmTpd1i-lkp@intel.com/

All warnings (new ones prefixed by >>):

   lib/find_bit_benchmark.c: In function 'test_fns':
>> lib/find_bit_benchmark.c:154:35: warning: variable 'y' set but not used [-Wunused-but-set-variable]
     154 |         volatile unsigned long x, y;
         |                                   ^


vim +/y +154 lib/find_bit_benchmark.c

   148	
   149	static int __init test_fns(void)
   150	{
   151		const unsigned long round = 1000000;
   152		s64 time[BITS_PER_LONG + 1];
   153		unsigned int i, n;
 > 154		volatile unsigned long x, y;
   155	
   156		for (n = 0; n <= BITS_PER_LONG; n++) {
   157			time[n] = ktime_get();
   158			for (i = 0; i < round; i++) {
   159				x = get_random_long();
   160				y = fns(x, n);
   161			}
   162			time[n] = ktime_get() - time[n];
   163		}
   164	
   165		for (n = 0; n <= BITS_PER_LONG; n++)
   166			pr_err("fns: n = %2u: %12lld ns\n", n, time[n]);
   167	
   168		return 0;
   169	}
   170	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Re: [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns()

Posted by Yury Norov 1 year, 9 months ago

On Tue, Apr 30, 2024 at 01:49:11PM +0800, Kuan-Wei Chiu wrote:
> Introduce a benchmark test for the fns(). It measures the total time
> taken by fns() to process 1,000,000 test data generated using
> get_random_long() for each n in the range [0, BITS_PER_LONG].

Can you also print an example of test output?
 
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
>  lib/find_bit_benchmark.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c
> index d3fb09e6eff1..8712eacf3bbd 100644
> --- a/lib/find_bit_benchmark.c
> +++ b/lib/find_bit_benchmark.c
> @@ -146,6 +146,28 @@ static int __init test_find_next_and_bit(const void *bitmap,
>  	return 0;
>  }
>  
> +static int __init test_fns(void)
> +{
> +	const unsigned long round = 1000000;
> +	s64 time[BITS_PER_LONG + 1];
> +	unsigned int i, n;
> +	volatile unsigned long x, y;
> +
> +	for (n = 0; n <= BITS_PER_LONG; n++) {

n == BITS_PER_LONG is an error case. Testing an error case together with
the normal cases is an even worse error, because it misleads readers.

> +		time[n] = ktime_get();
> +		for (i = 0; i < round; i++) {
> +			x = get_random_long();
> +			y = fns(x, n);
> +		}

Here you count fns() + get_random_long() time. For your microbenchmark
purposes it would be better to exclude the random number generation
overhead.

> +		time[n] = ktime_get() - time[n];
> +	}
> +
> +	for (n = 0; n <= BITS_PER_LONG; n++)
> +		pr_err("fns: n = %2u: %12lld ns\n", n, time[n]);

Nah, not like that. Each test in there prints one line in the
report. Let's keep it that way for test_fns() too. Unless we have
strong evidence that fns() for a particular input is worth
tracking separately, let's just print the grand total?

> +
> +	return 0;
> +}

I'd suggest to modify it like:

        static unsigned long buf[1000000];

        static int __init test_fns(void)
        {
                get_random_bytes(buf, ARRAY_SIZE(buf));
                time = ktime_get();

                for (n = 0; n < BITS_PER_LONG; n++)
                        for (i = 0; i < 1000000; i++)
                                fns(buf[i], n);

                time = ktime_get() - time;
                pr_err(...);
        }
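
For readers who want to try the shape suggested above outside the kernel,
here is a minimal userspace sketch. It is an approximation under stated
assumptions, not the kernel code: fns_model() is a simple stand-in for
fns(), an xorshift generator replaces get_random_bytes(), clock() replaces
ktime_get(), and NR_SAMPLES, samples, sink and bench_fns() are hypothetical
names. The input buffer is filled before the timed region, error-case
n == BITS_PER_LONG is not timed, and a single total is returned.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <time.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define NR_SAMPLES 10000u	/* assumption: smaller than the 1000000 in the thread */

static unsigned long samples[NR_SAMPLES];
static volatile unsigned long sink;	/* stores keep the calls from being dropped */

/* Userspace model of fns(): index of the n-th set bit, or BITS_PER_LONG. */
static unsigned long fns_model(unsigned long word, unsigned int n)
{
	unsigned int bit;

	for (bit = 0; bit < BITS_PER_LONG; bit++) {
		if (!((word >> bit) & 1UL))
			continue;
		if (n-- == 0)
			return bit;
	}
	return BITS_PER_LONG;
}

/* Deterministic xorshift64 fill; stands in for get_random_bytes(). */
static void fill_samples(void)
{
	uint64_t s = 0x9e3779b97f4a7c15ULL;
	unsigned int i;

	for (i = 0; i < NR_SAMPLES; i++) {
		s ^= s << 13;
		s ^= s >> 7;
		s ^= s << 17;
		samples[i] = (unsigned long)s;
	}
}

/* Returns the total CPU time, in nanoseconds, over all n in [0, BITS_PER_LONG). */
static double bench_fns(void)
{
	unsigned int i, n;
	clock_t start;

	assert(fns_model(1UL, 0) == 0);	/* sanity-check the model before timing it */
	fill_samples();			/* input generation stays outside the timed region */
	start = clock();

	for (n = 0; n < BITS_PER_LONG; n++)
		for (i = 0; i < NR_SAMPLES; i++)
			sink = fns_model(samples[i], n);

	return (double)(clock() - start) * 1e9 / CLOCKS_PER_SEC;
}
```

A caller would print the result of bench_fns() as one report line. Writing
each result to a file-scope volatile is one way to keep the compiler from
discarding the calls; whether that also avoids the W=1 warning in a kernel
build is not verified here.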

>  static int __init find_bit_test(void)
>  {
>  	unsigned long nbits = BITMAP_LEN / SPARSE;
> @@ -186,6 +208,9 @@ static int __init find_bit_test(void)
>  	test_find_first_and_bit(bitmap, bitmap2, BITMAP_LEN);
>  	test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN);
>  
> +	pr_err("\nStart testing for fns()\n");
> +	test_fns();

There are 2 sections in the test - one for regular, and another for
sparse data. Adding a new section for just one function doesn't look
like a good idea.

Moreover, fns() is already tested here. Maybe test_bitops is a
better place for this test?

> +
>  	/*
>  	 * Everything is OK. Return error just to let user run benchmark
>  	 * again without annoying rmmod.
> -- 
> 2.34.1

Re: [PATCH v2 1/2] lib/find_bit_benchmark: Add benchmark test for fns()

Posted by Kuan-Wei Chiu 1 year, 9 months ago

On Tue, Apr 30, 2024 at 10:24:03AM -0700, Yury Norov wrote:
> On Tue, Apr 30, 2024 at 01:49:11PM +0800, Kuan-Wei Chiu wrote:
> > Introduce a benchmark test for the fns(). It measures the total time
> > taken by fns() to process 1,000,000 test data generated using
> > get_random_long() for each n in the range [0, BITS_PER_LONG].
> 
> Can you also print an example of test output?
>  
> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > ---
> >  lib/find_bit_benchmark.c | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c
> > index d3fb09e6eff1..8712eacf3bbd 100644
> > --- a/lib/find_bit_benchmark.c
> > +++ b/lib/find_bit_benchmark.c
> > @@ -146,6 +146,28 @@ static int __init test_find_next_and_bit(const void *bitmap,
> >  	return 0;
> >  }
> >  
> > +static int __init test_fns(void)
> > +{
> > +	const unsigned long round = 1000000;
> > +	s64 time[BITS_PER_LONG + 1];
> > +	unsigned int i, n;
> > +	volatile unsigned long x, y;
> > +
> > +	for (n = 0; n <= BITS_PER_LONG; n++) {
> 
> n == BITS_PER_LONG is an error case. Testing an error case together with
> the normal cases is an even worse error, because it misleads readers.
>
My initial intention was to also cover the case where fns() always
returns BITS_PER_LONG. However, I agree that this is not a good idea
and may confuse readers.
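
To make the n == BITS_PER_LONG case concrete, here is a small userspace
model of the semantics being discussed. It is a sketch, not the kernel's
implementation: fns_sketch() and fns_sketch_examples() are hypothetical
names, and the model only assumes that fns(word, n) yields the position of
the n-th set bit (counting from 0 at the least significant bit) and yields
BITS_PER_LONG when the word has no such bit.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Model of fns(): position of the n-th set bit, or BITS_PER_LONG if absent. */
static unsigned long fns_sketch(unsigned long word, unsigned int n)
{
	unsigned int bit;

	for (bit = 0; bit < BITS_PER_LONG; bit++) {
		if (!(word & (1UL << bit)))
			continue;	/* skip clear bits */
		if (n-- == 0)
			return bit;	/* this is the n-th set bit */
	}
	return BITS_PER_LONG;		/* fewer than n + 1 bits were set */
}

static void fns_sketch_examples(void)
{
	/* 0x16 is binary 10110: set bits at positions 1, 2 and 4. */
	assert(fns_sketch(0x16UL, 0) == 1);
	assert(fns_sketch(0x16UL, 1) == 2);
	assert(fns_sketch(0x16UL, 2) == 4);
	assert(fns_sketch(0x16UL, 3) == BITS_PER_LONG);

	/* A word has at most BITS_PER_LONG set bits, indexed 0..BITS_PER_LONG-1,
	 * so n == BITS_PER_LONG can never succeed, even for an all-ones word. */
	assert(fns_sketch(~0UL, BITS_PER_LONG - 1) == BITS_PER_LONG - 1);
	assert(fns_sketch(~0UL, BITS_PER_LONG) == BITS_PER_LONG);
}
```

Under this model, timing n == BITS_PER_LONG measures only the "not found"
path, which is why mixing it with the normal cases skews the report.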

> > +		time[n] = ktime_get();
> > +		for (i = 0; i < round; i++) {
> > +			x = get_random_long();
> > +			y = fns(x, n);
> > +		}
> 
> Here you count fns() + get_random_long() time. For your microbenchmark
> purposes it would be better to exclude the random number generation
> overhead.
> 
> > +		time[n] = ktime_get() - time[n];
> > +	}
> > +
> > +	for (n = 0; n <= BITS_PER_LONG; n++)
> > +		pr_err("fns: n = %2u: %12lld ns\n", n, time[n]);
> 
> Nah, not like that. Each test in there prints one line in the
> report. Let's keep it that way for test_fns() too. Unless we have
> strong evidence that fns() for a particular input is worth
> tracking separately, let's just print the grand total?
> 
> > +
> > +	return 0;
> > +}
> 
> I'd suggest to modify it like:
> 
>         static unsigned long buf[1000000];
> 
>         static int __init test_fns(void)
>         {
>                 get_random_bytes(buf, ARRAY_SIZE(buf));

This should be sizeof(buf) rather than ARRAY_SIZE(buf), since
get_random_bytes() takes a length in bytes.

>                 time = ktime_get();
> 
>                 for (n = 0; n < BITS_PER_LONG; n++)
>                         for (i = 0; i < 1000000; i++)
>                                 fns(buf[i], n);
> 
>                 time = ktime_get() - time;
>                 pr_err(...);
>         }
>

That does seem like a better approach. I'll move it to lib/test_bitops
and send a v3 patch series.

Regards,
Kuan-Wei
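
On the ARRAY_SIZE(buf) versus sizeof(buf) point above, a short userspace
illustration may help. It is a sketch under assumptions: fill_bytes() is a
hypothetical stand-in for a byte-count API such as get_random_bytes(), the
ARRAY_SIZE() macro here is a simplified version of the kernel's, and
demo_buf and elements_filled() are made-up names. It counts how many array
elements get fully written for a given length argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified userspace version of the kernel macro: number of elements. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static unsigned long demo_buf[1000];

/* Hypothetical stand-in for get_random_bytes(): writes exactly len bytes. */
static void fill_bytes(void *p, size_t len)
{
	memset(p, 0xff, len);
}

/* Returns how many elements of demo_buf are fully written for a given len. */
static size_t elements_filled(size_t len)
{
	size_t i, filled = 0;

	memset(demo_buf, 0, sizeof(demo_buf));
	fill_bytes(demo_buf, len);

	for (i = 0; i < ARRAY_SIZE(demo_buf); i++)
		if (demo_buf[i] == ~0UL)
			filled++;

	return filled;
}
```

With this model, elements_filled(sizeof(demo_buf)) covers all 1000 elements,
while elements_filled(ARRAY_SIZE(demo_buf)) covers only
1000 / sizeof(unsigned long) of them, leaving the rest of the buffer zeroed.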

> >  static int __init find_bit_test(void)
> >  {
> >  	unsigned long nbits = BITMAP_LEN / SPARSE;
> > @@ -186,6 +208,9 @@ static int __init find_bit_test(void)
> >  	test_find_first_and_bit(bitmap, bitmap2, BITMAP_LEN);
> >  	test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN);
> >  
> > +	pr_err("\nStart testing for fns()\n");
> > +	test_fns();
> 
> There are 2 sections in the test - one for regular, and another for
> sparse data. Adding a new section for just one function doesn't look
> like a good idea.
> 
> Moreover, fns() is already tested here. Maybe test_bitops is a
> better place for this test?
> 
> > +
> >  	/*
> >  	 * Everything is OK. Return error just to let user run benchmark
> >  	 * again without annoying rmmod.
> > -- 
> > 2.34.1