fs/resctrl: Slightly optimize cbm_validate()

[PATCH] fs/resctrl: Slightly optimize cbm_validate()

Posted by Christophe JAILLET 3 months, 2 weeks ago

'first_bit' is known to be 1, so it can be skipped when searching for the
next 0 bit. Doing so mimics bitmap_next_set_region() and can save a few
cycles.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
Compile tested only.

For the records, on x86, the diff of the asm code is:
--- fs/resctrl/ctrlmondata.s.old        2025-10-26 08:21:46.928920563 +0100
+++ fs/resctrl/ctrlmondata.s    2025-10-26 08:21:40.864024143 +0100
@@ -1603,11 +1603,12 @@
        call    _find_first_bit
 # ./include/linux/find.h:192:  return _find_next_zero_bit(addr, size, offset);
        movq    %r12, %rsi
-       leaq    48(%rsp), %rdi
-       movq    %rax, %rdx
+# fs/resctrl/ctrlmondata.c:133:        zero_bit = find_next_zero_bit(&val, cbm_len, first_bit + 1);
+       leaq    1(%rax), %rdx
 # ./include/linux/find.h:214:  return _find_first_bit(addr, size);
        movq    %rax, 8(%rsp)
 # ./include/linux/find.h:192:  return _find_next_zero_bit(addr, size, offset);
+       leaq    48(%rsp), %rdi
        call    _find_next_zero_bit
 # fs/resctrl/ctrlmondata.c:136:        if (!r->cache.arch_has_sparse_bitmasks &&
        leaq    28(%rbx), %rdi
---
 fs/resctrl/ctrlmondata.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 0d0ef54fc4de..1ff479a2dbbc 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -130,7 +130,7 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
 	}
 
 	first_bit = find_first_bit(&val, cbm_len);
-	zero_bit = find_next_zero_bit(&val, cbm_len, first_bit);
+	zero_bit = find_next_zero_bit(&val, cbm_len, first_bit + 1);
 
 	/* Are non-contiguous bitmasks allowed? */
 	if (!r->cache.arch_has_sparse_bitmasks &&
-- 
2.51.0

Re: [PATCH] fs/resctrl: Slightly optimize cbm_validate()

Posted by Reinette Chatre 3 months, 1 week ago

Hi Christophe,

On 10/26/25 12:39 AM, Christophe JAILLET wrote:
> 'first_bit' is known to be 1, so it can be skipped when searching for the
> next 0 bit. Doing so mimics bitmap_next_set_region() and can save a few
> cycles.

This is not part of a flow where cycles matter and may thus
not be considered unless it forms part of a larger series. We could
work more on getting this ready for inclusion but please be aware that it
may not be considered. This is up to the x86 maintainers so please also
include them in your next submission (x86@kernel.org).

> 
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> ---

...

> ---
>  fs/resctrl/ctrlmondata.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index 0d0ef54fc4de..1ff479a2dbbc 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -130,7 +130,7 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
>  	}
>  
>  	first_bit = find_first_bit(&val, cbm_len);
> -	zero_bit = find_next_zero_bit(&val, cbm_len, first_bit);
> +	zero_bit = find_next_zero_bit(&val, cbm_len, first_bit + 1);
>  

It looks like cbm_ensure_valid() has the same pattern so a similar change there would
help to make the code consistent.
sidenote: Not related to this patch but looks like cbm_ensure_valid() is missing taking
arch support for sparse masks into account.

>  	/* Are non-contiguous bitmasks allowed? */
>  	if (!r->cache.arch_has_sparse_bitmasks &&

Reinette

Re: [PATCH] fs/resctrl: Slightly optimize cbm_validate()

Posted by Dave Martin 3 months, 2 weeks ago

Hi,

[Tony, I have a side question on min_cbm_bits -- see below.]

On Sun, Oct 26, 2025 at 08:39:52AM +0100, Christophe JAILLET wrote:
> 'first_bit' is known to be 1, so it can be skipped when searching for the
> next 0 bit. Doing so mimics bitmap_next_set_region() and can save a few
> cycles.

This seems reasonable, although:

Nit: missing statement of what the patch does.  (Your paragraph
describes only something that _could_ be done and gives rationale for
it.)

> 
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> ---
> Compile tested only.
> 
> For the records, on x86, the diff of the asm code is:
> --- fs/resctrl/ctrlmondata.s.old        2025-10-26 08:21:46.928920563 +0100
> +++ fs/resctrl/ctrlmondata.s    2025-10-26 08:21:40.864024143 +0100
> @@ -1603,11 +1603,12 @@
>         call    _find_first_bit
>  # ./include/linux/find.h:192:  return _find_next_zero_bit(addr, size, offset);
>         movq    %r12, %rsi
> -       leaq    48(%rsp), %rdi
> -       movq    %rax, %rdx
> +# fs/resctrl/ctrlmondata.c:133:        zero_bit = find_next_zero_bit(&val, cbm_len, first_bit + 1);
> +       leaq    1(%rax), %rdx
>  # ./include/linux/find.h:214:  return _find_first_bit(addr, size);
>         movq    %rax, 8(%rsp)
>  # ./include/linux/find.h:192:  return _find_next_zero_bit(addr, size, offset);
> +       leaq    48(%rsp), %rdi

(This is really only showing that the compiler works.  The real
question is whether the logic is still sound after this change to the
arguments of _find_first_bit()...)

>         call    _find_next_zero_bit
>  # fs/resctrl/ctrlmondata.c:136:        if (!r->cache.arch_has_sparse_bitmasks &&
>         leaq    28(%rbx), %rdi
> ---
>  fs/resctrl/ctrlmondata.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index 0d0ef54fc4de..1ff479a2dbbc 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -130,7 +130,7 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
>  	}
>  
>  	first_bit = find_first_bit(&val, cbm_len);
> -	zero_bit = find_next_zero_bit(&val, cbm_len, first_bit);
> +	zero_bit = find_next_zero_bit(&val, cbm_len, first_bit + 1);

Does this definitely do the right thing if val was zero?

>  
>  	/* Are non-contiguous bitmasks allowed? */
>  	if (!r->cache.arch_has_sparse_bitmasks &&

Also, what about the find_first_bit() below?


[...]

<aside>

Also, not directly related to this patch, but, looking at the final if
statement:

	if ((zero_bit - first_bit) < r->cache.min_cbm_bits) {
	        rdt_last_cmd_printf("Need at least %d bits in the mask\n",
	                            r->cache.min_cbm_bits);
	        return false;
	}

If min_cbm_bits is two or greater, this can fail if the bitmap has
enough contiguous set bits but not in the first block of set bits,
and it can succeed if there are blocks of set bits beyond the first
block, that have fewer than min_cbm_bits.

Is that intended?  Do we ever expect arch_has_sparse_bitmasks alongside
min_cbm_bits > 1, or should these be mutually exclusive?

</aside>

Cheers
---Dave

Re: [PATCH] fs/resctrl: Slightly optimize cbm_validate()

Posted by Luck, Tony 3 months, 1 week ago

On Mon, Oct 27, 2025 at 11:43:49AM +0000, Dave Martin wrote:
> Hi,
> 
> [Tony, I have a side question on min_cbm_bits -- see below.]
> [...]
> 
> <aside>
> 
> Also, not directly related to this patch, but, looking at the final if
> statement:
> 
> 	if ((zero_bit - first_bit) < r->cache.min_cbm_bits) {
> 	        rdt_last_cmd_printf("Need at least %d bits in the mask\n",
> 	                            r->cache.min_cbm_bits);
> 	        return false;
> 	}
> 
> If min_cbm_bits is two or greater, this can fail if the bitmap has
> enough contiguous set bits but not in the first block of set bits,
> and it can succeed if there are blocks of set bits beyond the first
> block, that have fewer than min_cbm_bits.
> 
> Is that intended?  Do we ever expect arch_has_sparse_bitmasks alongside
> min_cbm_bits > 1, or should these be mutually exclusive?
> 
> </aside>

There's no enumeration for the minimium number of bits in a CBM mask.
Haswell (first to implemenent L3 cache allocation) got a quirk to
to set it to "2". I don't expect that we'd do that again.

So safe to assume that resctrl doesn't have to handle the combination
of min_cbm_bits > 1 with arch_has_sparse_bitmasks.

-Tony

Re: [PATCH] fs/resctrl: Slightly optimize cbm_validate()

Posted by Dave Martin 3 months ago

Hi,

On Mon, Nov 03, 2025 at 10:17:16AM -0800, Luck, Tony wrote:
> On Mon, Oct 27, 2025 at 11:43:49AM +0000, Dave Martin wrote:
> > Hi,
> > 
> > [Tony, I have a side question on min_cbm_bits -- see below.]
> > [...]
> > 
> > <aside>
> > 
> > Also, not directly related to this patch, but, looking at the final if
> > statement:
> > 
> > 	if ((zero_bit - first_bit) < r->cache.min_cbm_bits) {
> > 	        rdt_last_cmd_printf("Need at least %d bits in the mask\n",
> > 	                            r->cache.min_cbm_bits);
> > 	        return false;
> > 	}
> > 
> > If min_cbm_bits is two or greater, this can fail if the bitmap has
> > enough contiguous set bits but not in the first block of set bits,
> > and it can succeed if there are blocks of set bits beyond the first
> > block, that have fewer than min_cbm_bits.
> > 
> > Is that intended?  Do we ever expect arch_has_sparse_bitmasks alongside
> > min_cbm_bits > 1, or should these be mutually exclusive?
> > 
> > </aside>
> 
> There's no enumeration for the minimium number of bits in a CBM mask.
> Haswell (first to implemenent L3 cache allocation) got a quirk to
> to set it to "2". I don't expect that we'd do that again.
> 
> So safe to assume that resctrl doesn't have to handle the combination
> of min_cbm_bits > 1 with arch_has_sparse_bitmasks.
> 
> -Tony

OK.  A min_cbm_bits value > 1 seems unlikely with sparse bitmasks
anyway.  If the hardware has independent storage for each bit, there
would be no need for such a constraint...  so I would be surprised to
see this in practice.

Just wanted to check that I wasn't missing something!

In MPAM, bitmap controls always allow each bit to be controlled
independently, according to the architecture.

Cheers
---Dave

Re: [PATCH] fs/resctrl: Slightly optimize cbm_validate()

Posted by Christophe JAILLET 3 months, 1 week ago

Le 27/10/2025 à 12:43, Dave Martin a écrit :
> Hi,
> 
> [Tony, I have a side question on min_cbm_bits -- see below.]
> 
> On Sun, Oct 26, 2025 at 08:39:52AM +0100, Christophe JAILLET wrote:
>> 'first_bit' is known to be 1, so it can be skipped when searching for the
>> next 0 bit. Doing so mimics bitmap_next_set_region() and can save a few
>> cycles.
> 
> This seems reasonable, although:
> 
> Nit: missing statement of what the patch does.  (Your paragraph
> describes only something that _could_ be done and gives rationale for
> it.)

Will add it in v2.

> 
>>
>> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
>> ---
>> Compile tested only.
>>
>> For the records, on x86, the diff of the asm code is:
>> --- fs/resctrl/ctrlmondata.s.old        2025-10-26 08:21:46.928920563 +0100
>> +++ fs/resctrl/ctrlmondata.s    2025-10-26 08:21:40.864024143 +0100
>> @@ -1603,11 +1603,12 @@
>>          call    _find_first_bit
>>   # ./include/linux/find.h:192:  return _find_next_zero_bit(addr, size, offset);
>>          movq    %r12, %rsi
>> -       leaq    48(%rsp), %rdi
>> -       movq    %rax, %rdx
>> +# fs/resctrl/ctrlmondata.c:133:        zero_bit = find_next_zero_bit(&val, cbm_len, first_bit + 1);
>> +       leaq    1(%rax), %rdx
>>   # ./include/linux/find.h:214:  return _find_first_bit(addr, size);
>>          movq    %rax, 8(%rsp)
>>   # ./include/linux/find.h:192:  return _find_next_zero_bit(addr, size, offset);
>> +       leaq    48(%rsp), %rdi
> 
> (This is really only showing that the compiler works.  The real
> question is whether the logic is still sound after this change to the
> arguments of _find_first_bit()...)

Will remove in v2, if not useful.

> 
>>          call    _find_next_zero_bit
>>   # fs/resctrl/ctrlmondata.c:136:        if (!r->cache.arch_has_sparse_bitmasks &&
>>          leaq    28(%rbx), %rdi
>> ---
>>   fs/resctrl/ctrlmondata.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
>> index 0d0ef54fc4de..1ff479a2dbbc 100644
>> --- a/fs/resctrl/ctrlmondata.c
>> +++ b/fs/resctrl/ctrlmondata.c
>> @@ -130,7 +130,7 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
>>   	}
>>   
>>   	first_bit = find_first_bit(&val, cbm_len);
>> -	zero_bit = find_next_zero_bit(&val, cbm_len, first_bit);
>> +	zero_bit = find_next_zero_bit(&val, cbm_len, first_bit + 1);
> 
> Does this definitely do the right thing if val was zero?

Yes, IMHO, it does.

If val is zero, first_bit will be assigned to cbm_len (see [1]).
Then, find_next_zero_bit() will do the same because 'first_bit + 1' will 
overflow the size of the bitmap. (see [2] and [3])

The only case were we could have trouble would be to have 'first_bit + 
1' overflow and be equal to 0. I don't think that such a case is possible.

> 
>>   
>>   	/* Are non-contiguous bitmasks allowed? */
>>   	if (!r->cache.arch_has_sparse_bitmasks &&
> 
> Also, what about the find_first_bit() below?

Should be updated as well.
Will send a v2.

> 
> 
> [...]
> 
> <aside>
> 
> Also, not directly related to this patch, but, looking at the final if
> statement:
> 
> 	if ((zero_bit - first_bit) < r->cache.min_cbm_bits) {
> 	        rdt_last_cmd_printf("Need at least %d bits in the mask\n",
> 	                            r->cache.min_cbm_bits);
> 	        return false;
> 	}
> 
> If min_cbm_bits is two or greater, this can fail if the bitmap has
> enough contiguous set bits but not in the first block of set bits,
> and it can succeed if there are blocks of set bits beyond the first
> block, that have fewer than min_cbm_bits.
> 
> Is that intended?  Do we ever expect arch_has_sparse_bitmasks alongside
> min_cbm_bits > 1, or should these be mutually exclusive?
> 
> </aside>
> 
> Cheers
> ---Dave
> 
> 
CJ

[1]: 
https://elixir.bootlin.com/linux/v6.18-rc3/source/include/linux/find.h#L203
[2]: 
https://elixir.bootlin.com/linux/v6.18-rc3/source/include/linux/find.h#L185
[3]: https://elixir.bootlin.com/linux/v6.18-rc3/source/lib/find_bit.c#L55

Re: [PATCH] fs/resctrl: Slightly optimize cbm_validate()

Posted by Dave Martin 3 months, 1 week ago

Hi,

On Sat, Nov 01, 2025 at 02:40:58PM +0100, Christophe JAILLET wrote:
> Le 27/10/2025 à 12:43, Dave Martin a écrit :
> > Hi,
> > 
> > [Tony, I have a side question on min_cbm_bits -- see below.]
> > 
> > On Sun, Oct 26, 2025 at 08:39:52AM +0100, Christophe JAILLET wrote:
> > > 'first_bit' is known to be 1, so it can be skipped when searching for the
> > > next 0 bit. Doing so mimics bitmap_next_set_region() and can save a few
> > > cycles.
> > 
> > This seems reasonable, although:
> > 
> > Nit: missing statement of what the patch does.  (Your paragraph
> > describes only something that _could_ be done and gives rationale for
> > it.)
> 
> Will add it in v2.

Thanks

[...]

> > > For the records, on x86, the diff of the asm code is:
> > > --- fs/resctrl/ctrlmondata.s.old        2025-10-26 08:21:46.928920563 +0100
> > > +++ fs/resctrl/ctrlmondata.s    2025-10-26 08:21:40.864024143 +0100
> > > @@ -1603,11 +1603,12 @@
> > >          call    _find_first_bit
> > >   # ./include/linux/find.h:192:  return _find_next_zero_bit(addr, size, offset);
> > >          movq    %r12, %rsi
> > > -       leaq    48(%rsp), %rdi
> > > -       movq    %rax, %rdx
> > > +# fs/resctrl/ctrlmondata.c:133:        zero_bit = find_next_zero_bit(&val, cbm_len, first_bit + 1);
> > > +       leaq    1(%rax), %rdx
> > >   # ./include/linux/find.h:214:  return _find_first_bit(addr, size);
> > >          movq    %rax, 8(%rsp)
> > >   # ./include/linux/find.h:192:  return _find_next_zero_bit(addr, size, offset);
> > > +       leaq    48(%rsp), %rdi
> > 
> > (This is really only showing that the compiler works.  The real
> > question is whether the logic is still sound after this change to the
> > arguments of _find_first_bit()...)
> 
> Will remove in v2, if not useful.

It's harmless, but a bit of a distraction...

> > >          call    _find_next_zero_bit
> > >   # fs/resctrl/ctrlmondata.c:136:        if (!r->cache.arch_has_sparse_bitmasks &&
> > >          leaq    28(%rbx), %rdi
> > > ---
> > >   fs/resctrl/ctrlmondata.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> > > index 0d0ef54fc4de..1ff479a2dbbc 100644
> > > --- a/fs/resctrl/ctrlmondata.c
> > > +++ b/fs/resctrl/ctrlmondata.c
> > > @@ -130,7 +130,7 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
> > >   	}
> > >   	first_bit = find_first_bit(&val, cbm_len);
> > > -	zero_bit = find_next_zero_bit(&val, cbm_len, first_bit);
> > > +	zero_bit = find_next_zero_bit(&val, cbm_len, first_bit + 1);
> > 
> > Does this definitely do the right thing if val was zero?
> 
> Yes, IMHO, it does.
> 
> If val is zero, first_bit will be assigned to cbm_len (see [1]).
> Then, find_next_zero_bit() will do the same because 'first_bit + 1' will
> overflow the size of the bitmap. (see [2] and [3])

Right, I think that works.

> The only case were we could have trouble would be to have 'first_bit + 1'
> overflow and be equal to 0. I don't think that such a case is possible.

I looks impossible to me: first_bit comes from
find_first_bit(..., cbm_len), so I don't think it can be greater than
cbm_len.

> > >   	/* Are non-contiguous bitmasks allowed? */
> > >   	if (!r->cache.arch_has_sparse_bitmasks &&
> > 
> > Also, what about the find_first_bit() below?
> 
> Should be updated as well.
> Will send a v2.

OK, sounds fair.

Cheers
---Dave

[...]

> [1]:
> https://elixir.bootlin.com/linux/v6.18-rc3/source/include/linux/find.h#L203
> [2]:
> https://elixir.bootlin.com/linux/v6.18-rc3/source/include/linux/find.h#L185
> [3]: https://elixir.bootlin.com/linux/v6.18-rc3/source/lib/find_bit.c#L55