lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

[PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

Posted by Florian Westphal 1 year, 4 months ago

Ben Greear reports following splat:
 ------------[ cut here ]------------
 net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
 WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
 Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
...
 Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
 RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
  codetag_unload_module+0x19b/0x2a0
  ? codetag_load_module+0x80/0x80

nf_nat module exit calls kfree_rcu on those addresses, but the free
operation is likely still pending by the time alloc_tag checks for leaks.

Wait for outstanding kfree_rcu operations to complete before checking
resolves this warning.

Reproducer:
unshare -n iptables-nft -t nat -A PREROUTING -p tcp
grep nf_nat /proc/allocinfo # will list 4 allocations
rmmod nft_chain_nat
rmmod nf_nat                # will WARN.

Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: Ben Greear <greearb@candelatech.com>
Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candelatech.com/
Fixes: a473573964e5 ("lib: code tagging module support")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 lib/codetag.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/codetag.c b/lib/codetag.c
index afa8a2d4f317..a3472d428821 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
 	if (!mod)
 		return true;
 
+	kvfree_rcu_barrier();
+
 	mutex_lock(&codetag_lock);
 	list_for_each_entry(cttype, &codetag_types, link) {
 		struct codetag_module *found = NULL;
-- 
2.45.2

Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

Posted by Andrew Morton 1 year, 4 months ago

On Mon,  7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote:

> Ben Greear reports following splat:
>  ------------[ cut here ]------------
>  net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
>  WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
>  Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> ...
>  Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
>  RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
>   codetag_unload_module+0x19b/0x2a0
>   ? codetag_load_module+0x80/0x80
> 
> nf_nat module exit calls kfree_rcu on those addresses, but the free
> operation is likely still pending by the time alloc_tag checks for leaks.
> 
> Wait for outstanding kfree_rcu operations to complete before checking
> resolves this warning.
> 
> Reproducer:
> unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> grep nf_nat /proc/allocinfo # will list 4 allocations
> rmmod nft_chain_nat
> rmmod nf_nat                # will WARN.
> 
> ...
>
> --- a/lib/codetag.c
> +++ b/lib/codetag.c
> @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
>  	if (!mod)
>  		return true;
>  
> +	kvfree_rcu_barrier();
> +
>  	mutex_lock(&codetag_lock);
>  	list_for_each_entry(cttype, &codetag_types, link) {
>  		struct codetag_module *found = NULL;

It's always hard to determine why a thing like this is present, so a
comment is helpful:

--- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
+++ a/lib/codetag.c
@@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
 	if (!mod)
 		return true;
 
+	/* await any module's kfree_rcu() operations to complete */
 	kvfree_rcu_barrier();
 
 	mutex_lock(&codetag_lock);
_

But I do wonder whether this is in the correct place.

Waiting for a module's ->exit() function's kfree_rcu()s to complete
should properly be done by the core module handling code?

free_module() does a full-on synchronize_rcu() prior to freeing the
module memory itself and I think codetag_unload_module() could be
called after that?

Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

Posted by Suren Baghdasaryan 1 year, 4 months ago

On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon,  7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote:
>
> > Ben Greear reports following splat:
> >  ------------[ cut here ]------------
> >  net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> >  WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> >  Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> > ...
> >  Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> >  RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> >   codetag_unload_module+0x19b/0x2a0
> >   ? codetag_load_module+0x80/0x80
> >
> > nf_nat module exit calls kfree_rcu on those addresses, but the free
> > operation is likely still pending by the time alloc_tag checks for leaks.
> >
> > Wait for outstanding kfree_rcu operations to complete before checking
> > resolves this warning.
> >
> > Reproducer:
> > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> > grep nf_nat /proc/allocinfo # will list 4 allocations
> > rmmod nft_chain_nat
> > rmmod nf_nat                # will WARN.
> >
> > ...
> >
> > --- a/lib/codetag.c
> > +++ b/lib/codetag.c
> > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
> >       if (!mod)
> >               return true;
> >
> > +     kvfree_rcu_barrier();
> > +
> >       mutex_lock(&codetag_lock);
> >       list_for_each_entry(cttype, &codetag_types, link) {
> >               struct codetag_module *found = NULL;
>
> It's always hard to determine why a thing like this is present, so a
> comment is helpful:
>
> --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
> +++ a/lib/codetag.c
> @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
>         if (!mod)
>                 return true;
>
> +       /* await any module's kfree_rcu() operations to complete */
>         kvfree_rcu_barrier();
>
>         mutex_lock(&codetag_lock);
> _
>
> But I do wonder whether this is in the correct place.
>
> Waiting for a module's ->exit() function's kfree_rcu()s to complete
> should properly be done by the core module handling code?

I don't think core module code cares about kfree_rcu()s being complete
before the module is unloaded.
Allocation tagging OTOH cares because it is about to destroy tags
which will be accessed when kfree() actually happens, therefore a
strict ordering is important.

>
> free_module() does a full-on synchronize_rcu() prior to freeing the
> module memory itself and I think codetag_unload_module() could be
> called after that?

I think we could move codetag_unload_module() after synchronize_rcu()
inside free_module() but according to the reply in
https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/
synchronize_rcu() does not help. I'm not quite sure why...
Note that once I'm done upstreaming
https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/,
this change will not be needed and I'm planning to remove this call,
however this change is useful for backporting. It should be sent to
stable@vger.kernel.org # v6.10+

Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

Posted by Vlastimil Babka 1 year, 4 months ago

On 10/8/24 03:49, Suren Baghdasaryan wrote:
> On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>> On Mon,  7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote:
>>
>> > Ben Greear reports following splat:
>> >  ------------[ cut here ]------------
>> >  net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
>> >  WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
>> >  Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
>> > ...
>> >  Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
>> >  RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
>> >   codetag_unload_module+0x19b/0x2a0
>> >   ? codetag_load_module+0x80/0x80
>> >
>> > nf_nat module exit calls kfree_rcu on those addresses, but the free
>> > operation is likely still pending by the time alloc_tag checks for leaks.
>> >
>> > Wait for outstanding kfree_rcu operations to complete before checking
>> > resolves this warning.
>> >
>> > Reproducer:
>> > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
>> > grep nf_nat /proc/allocinfo # will list 4 allocations
>> > rmmod nft_chain_nat
>> > rmmod nf_nat                # will WARN.
>> >
>> > ...
>> >
>> > --- a/lib/codetag.c
>> > +++ b/lib/codetag.c
>> > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
>> >       if (!mod)
>> >               return true;
>> >
>> > +     kvfree_rcu_barrier();
>> > +
>> >       mutex_lock(&codetag_lock);
>> >       list_for_each_entry(cttype, &codetag_types, link) {
>> >               struct codetag_module *found = NULL;
>>
>> It's always hard to determine why a thing like this is present, so a
>> comment is helpful:
>>
>> --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
>> +++ a/lib/codetag.c
>> @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
>>         if (!mod)
>>                 return true;
>>
>> +       /* await any module's kfree_rcu() operations to complete */
>>         kvfree_rcu_barrier();
>>
>>         mutex_lock(&codetag_lock);
>> _
>>
>> But I do wonder whether this is in the correct place.
>>
>> Waiting for a module's ->exit() function's kfree_rcu()s to complete
>> should properly be done by the core module handling code?
> 
> I don't think core module code cares about kfree_rcu()s being complete
> before the module is unloaded.

Right, module unload should care about pending call_rcu() involving module
code and/or module-created kmem_caches that are to be destroyed, but I think
it's up to individual modules anyway to do a rcu_barrier() in those cases?

> Allocation tagging OTOH cares because it is about to destroy tags
> which will be accessed when kfree() actually happens, therefore a
> strict ordering is important.
> 
>>
>> free_module() does a full-on synchronize_rcu() prior to freeing the
>> module memory itself and I think codetag_unload_module() could be
>> called after that?
> I think we could move codetag_unload_module() after synchronize_rcu()
> inside free_module() but according to the reply in
> https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/
> synchronize_rcu() does not help. I'm not quite sure why...

synchronize_rcu() is only waiting for potential rcu read sections? You might
have expected rcu_barrier() to help as that's waiting for pending
call_rcu(). But as Ulad said the kfree_rcu() implementation does extra
batching so there's now a new barrier for that, originally intended for
kmem_cache_destroy(), but indeed useful here as well.

> Note that once I'm done upstreaming
> https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/,
> this change will not be needed and I'm planning to remove this call,
> however this change is useful for backporting. It should be sent to
> stable@vger.kernel.org # v6.10+

Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

Posted by Uladzislau Rezki 1 year, 4 months ago

On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote:
> On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Mon,  7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote:
> >
> > > Ben Greear reports following splat:
> > >  ------------[ cut here ]------------
> > >  net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> > >  WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> > >  Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> > > ...
> > >  Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > >  RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> > >   codetag_unload_module+0x19b/0x2a0
> > >   ? codetag_load_module+0x80/0x80
> > >
> > > nf_nat module exit calls kfree_rcu on those addresses, but the free
> > > operation is likely still pending by the time alloc_tag checks for leaks.
> > >
> > > Wait for outstanding kfree_rcu operations to complete before checking
> > > resolves this warning.
> > >
> > > Reproducer:
> > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> > > grep nf_nat /proc/allocinfo # will list 4 allocations
> > > rmmod nft_chain_nat
> > > rmmod nf_nat                # will WARN.
> > >
> > > ...
> > >
> > > --- a/lib/codetag.c
> > > +++ b/lib/codetag.c
> > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
> > >       if (!mod)
> > >               return true;
> > >
> > > +     kvfree_rcu_barrier();
> > > +
> > >       mutex_lock(&codetag_lock);
> > >       list_for_each_entry(cttype, &codetag_types, link) {
> > >               struct codetag_module *found = NULL;
> >
> > It's always hard to determine why a thing like this is present, so a
> > comment is helpful:
> >
> > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
> > +++ a/lib/codetag.c
> > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
> >         if (!mod)
> >                 return true;
> >
> > +       /* await any module's kfree_rcu() operations to complete */
> >         kvfree_rcu_barrier();
> >
> >         mutex_lock(&codetag_lock);
> > _
> >
> > But I do wonder whether this is in the correct place.
> >
> > Waiting for a module's ->exit() function's kfree_rcu()s to complete
> > should properly be done by the core module handling code?
> 
> I don't think core module code cares about kfree_rcu()s being complete
> before the module is unloaded.
> Allocation tagging OTOH cares because it is about to destroy tags
> which will be accessed when kfree() actually happens, therefore a
> strict ordering is important.
> 
> >
> > free_module() does a full-on synchronize_rcu() prior to freeing the
> > module memory itself and I think codetag_unload_module() could be
> > called after that?
> 
> I think we could move codetag_unload_module() after synchronize_rcu()
> inside free_module() but according to the reply in
> https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/
> synchronize_rcu() does not help. I'm not quite sure why...
>
It is because, synchronize_rcu() is used for a bit different things,
i.e. it is about a GP completion. Offloading objects can span several
GPs.

> Note that once I'm done upstreaming
> https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/,
> this change will not be needed and I'm planning to remove this call,
> however this change is useful for backporting. It should be sent to
> stable@vger.kernel.org # v6.10+
>
The kvfree_rcu_barrier() has been added into v6.12:

<snip>
urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951
next-20240912
next-20240919
next-20240920
next-20241002
v6.12-rc1
urezki@pc638:~/data/raid0/coding/linux.git$
<snip>

For 6.10+, it implies that the mentioned commit should be backported also.

--
Uladzislau Rezki

Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

Posted by Suren Baghdasaryan 1 year, 4 months ago

On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote:
>
> On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote:
> > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Mon,  7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote:
> > >
> > > > Ben Greear reports following splat:
> > > >  ------------[ cut here ]------------
> > > >  net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> > > >  WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> > > >  Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> > > > ...
> > > >  Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > > >  RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> > > >   codetag_unload_module+0x19b/0x2a0
> > > >   ? codetag_load_module+0x80/0x80
> > > >
> > > > nf_nat module exit calls kfree_rcu on those addresses, but the free
> > > > operation is likely still pending by the time alloc_tag checks for leaks.
> > > >
> > > > Wait for outstanding kfree_rcu operations to complete before checking
> > > > resolves this warning.
> > > >
> > > > Reproducer:
> > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> > > > grep nf_nat /proc/allocinfo # will list 4 allocations
> > > > rmmod nft_chain_nat
> > > > rmmod nf_nat                # will WARN.
> > > >
> > > > ...
> > > >
> > > > --- a/lib/codetag.c
> > > > +++ b/lib/codetag.c
> > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
> > > >       if (!mod)
> > > >               return true;
> > > >
> > > > +     kvfree_rcu_barrier();
> > > > +
> > > >       mutex_lock(&codetag_lock);
> > > >       list_for_each_entry(cttype, &codetag_types, link) {
> > > >               struct codetag_module *found = NULL;
> > >
> > > It's always hard to determine why a thing like this is present, so a
> > > comment is helpful:
> > >
> > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
> > > +++ a/lib/codetag.c
> > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
> > >         if (!mod)
> > >                 return true;
> > >
> > > +       /* await any module's kfree_rcu() operations to complete */
> > >         kvfree_rcu_barrier();
> > >
> > >         mutex_lock(&codetag_lock);
> > > _
> > >
> > > But I do wonder whether this is in the correct place.
> > >
> > > Waiting for a module's ->exit() function's kfree_rcu()s to complete
> > > should properly be done by the core module handling code?
> >
> > I don't think core module code cares about kfree_rcu()s being complete
> > before the module is unloaded.
> > Allocation tagging OTOH cares because it is about to destroy tags
> > which will be accessed when kfree() actually happens, therefore a
> > strict ordering is important.
> >
> > >
> > > free_module() does a full-on synchronize_rcu() prior to freeing the
> > > module memory itself and I think codetag_unload_module() could be
> > > called after that?
> >
> > I think we could move codetag_unload_module() after synchronize_rcu()
> > inside free_module() but according to the reply in
> > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/
> > synchronize_rcu() does not help. I'm not quite sure why...
> >
> It is because, synchronize_rcu() is used for a bit different things,
> i.e. it is about a GP completion. Offloading objects can span several
> GPs.

Ah, thanks! Now that I looked into the patch that recently added
kvfree_rcu_barrier() I understand that a bit better.

>
> > Note that once I'm done upstreaming
> > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/,
> > this change will not be needed and I'm planning to remove this call,
> > however this change is useful for backporting. It should be sent to
> > stable@vger.kernel.org # v6.10+
> >
> The kvfree_rcu_barrier() has been added into v6.12:
>
> <snip>
> urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951
> next-20240912
> next-20240919
> next-20240920
> next-20241002
> v6.12-rc1
> urezki@pc638:~/data/raid0/coding/linux.git$
> <snip>
>
> For 6.10+, it implies that the mentioned commit should be backported also.

I see. I guess for pre-6.12 we would use rcu_barrier() instead of
kvfree_rcu_barrier()?

>
> --
> Uladzislau Rezki

Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

Posted by Uladzislau Rezki 1 year, 4 months ago

On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote:
> On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> >
> > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote:
> > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > >
> > > > On Mon,  7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote:
> > > >
> > > > > Ben Greear reports following splat:
> > > > >  ------------[ cut here ]------------
> > > > >  net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> > > > >  WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> > > > >  Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> > > > > ...
> > > > >  Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > > > >  RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> > > > >   codetag_unload_module+0x19b/0x2a0
> > > > >   ? codetag_load_module+0x80/0x80
> > > > >
> > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free
> > > > > operation is likely still pending by the time alloc_tag checks for leaks.
> > > > >
> > > > > Wait for outstanding kfree_rcu operations to complete before checking
> > > > > resolves this warning.
> > > > >
> > > > > Reproducer:
> > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> > > > > grep nf_nat /proc/allocinfo # will list 4 allocations
> > > > > rmmod nft_chain_nat
> > > > > rmmod nf_nat                # will WARN.
> > > > >
> > > > > ...
> > > > >
> > > > > --- a/lib/codetag.c
> > > > > +++ b/lib/codetag.c
> > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
> > > > >       if (!mod)
> > > > >               return true;
> > > > >
> > > > > +     kvfree_rcu_barrier();
> > > > > +
> > > > >       mutex_lock(&codetag_lock);
> > > > >       list_for_each_entry(cttype, &codetag_types, link) {
> > > > >               struct codetag_module *found = NULL;
> > > >
> > > > It's always hard to determine why a thing like this is present, so a
> > > > comment is helpful:
> > > >
> > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
> > > > +++ a/lib/codetag.c
> > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
> > > >         if (!mod)
> > > >                 return true;
> > > >
> > > > +       /* await any module's kfree_rcu() operations to complete */
> > > >         kvfree_rcu_barrier();
> > > >
> > > >         mutex_lock(&codetag_lock);
> > > > _
> > > >
> > > > But I do wonder whether this is in the correct place.
> > > >
> > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete
> > > > should properly be done by the core module handling code?
> > >
> > > I don't think core module code cares about kfree_rcu()s being complete
> > > before the module is unloaded.
> > > Allocation tagging OTOH cares because it is about to destroy tags
> > > which will be accessed when kfree() actually happens, therefore a
> > > strict ordering is important.
> > >
> > > >
> > > > free_module() does a full-on synchronize_rcu() prior to freeing the
> > > > module memory itself and I think codetag_unload_module() could be
> > > > called after that?
> > >
> > > I think we could move codetag_unload_module() after synchronize_rcu()
> > > inside free_module() but according to the reply in
> > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/
> > > synchronize_rcu() does not help. I'm not quite sure why...
> > >
> > It is because, synchronize_rcu() is used for a bit different things,
> > i.e. it is about a GP completion. Offloading objects can span several
> > GPs.
> 
> Ah, thanks! Now that I looked into the patch that recently added
> kvfree_rcu_barrier() I understand that a bit better.
> 
> >
> > > Note that once I'm done upstreaming
> > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/,
> > > this change will not be needed and I'm planning to remove this call,
> > > however this change is useful for backporting. It should be sent to
> > > stable@vger.kernel.org # v6.10+
> > >
> > The kvfree_rcu_barrier() has been added into v6.12:
> >
> > <snip>
> > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951
> > next-20240912
> > next-20240919
> > next-20240920
> > next-20241002
> > v6.12-rc1
> > urezki@pc638:~/data/raid0/coding/linux.git$
> > <snip>
> >
> > For 6.10+, it implies that the mentioned commit should be backported also.
> 
> I see. I guess for pre-6.12 we would use rcu_barrier() instead of
> kvfree_rcu_barrier()?
> 
For kernels < 6.12, unfortunately not :) If you have a possibility to
switch to reclaim over call_rcu() API, then you can go with rcu_barrier()
which waits for all callbacks to be completed.

Or backport kvfree_rcu_barrier(). It __should__ be easy since there
were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX.

--
Uladzislau Rezki

Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

Posted by Suren Baghdasaryan 1 year, 4 months ago

On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote:
>
> On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote:
> > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > >
> > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote:
> > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > >
> > > > > On Mon,  7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote:
> > > > >
> > > > > > Ben Greear reports following splat:
> > > > > >  ------------[ cut here ]------------
> > > > > >  net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> > > > > >  WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> > > > > >  Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> > > > > > ...
> > > > > >  Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > > > > >  RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> > > > > >   codetag_unload_module+0x19b/0x2a0
> > > > > >   ? codetag_load_module+0x80/0x80
> > > > > >
> > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free
> > > > > > operation is likely still pending by the time alloc_tag checks for leaks.
> > > > > >
> > > > > > Wait for outstanding kfree_rcu operations to complete before checking
> > > > > > resolves this warning.
> > > > > >
> > > > > > Reproducer:
> > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations
> > > > > > rmmod nft_chain_nat
> > > > > > rmmod nf_nat                # will WARN.
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > --- a/lib/codetag.c
> > > > > > +++ b/lib/codetag.c
> > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
> > > > > >       if (!mod)
> > > > > >               return true;
> > > > > >
> > > > > > +     kvfree_rcu_barrier();
> > > > > > +
> > > > > >       mutex_lock(&codetag_lock);
> > > > > >       list_for_each_entry(cttype, &codetag_types, link) {
> > > > > >               struct codetag_module *found = NULL;
> > > > >
> > > > > It's always hard to determine why a thing like this is present, so a
> > > > > comment is helpful:
> > > > >
> > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
> > > > > +++ a/lib/codetag.c
> > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
> > > > >         if (!mod)
> > > > >                 return true;
> > > > >
> > > > > +       /* await any module's kfree_rcu() operations to complete */
> > > > >         kvfree_rcu_barrier();
> > > > >
> > > > >         mutex_lock(&codetag_lock);
> > > > > _
> > > > >
> > > > > But I do wonder whether this is in the correct place.
> > > > >
> > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete
> > > > > should properly be done by the core module handling code?
> > > >
> > > > I don't think core module code cares about kfree_rcu()s being complete
> > > > before the module is unloaded.
> > > > Allocation tagging OTOH cares because it is about to destroy tags
> > > > which will be accessed when kfree() actually happens, therefore a
> > > > strict ordering is important.
> > > >
> > > > >
> > > > > free_module() does a full-on synchronize_rcu() prior to freeing the
> > > > > module memory itself and I think codetag_unload_module() could be
> > > > > called after that?
> > > >
> > > > I think we could move codetag_unload_module() after synchronize_rcu()
> > > > inside free_module() but according to the reply in
> > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/
> > > > synchronize_rcu() does not help. I'm not quite sure why...
> > > >
> > > It is because, synchronize_rcu() is used for a bit different things,
> > > i.e. it is about a GP completion. Offloading objects can span several
> > > GPs.
> >
> > Ah, thanks! Now that I looked into the patch that recently added
> > kvfree_rcu_barrier() I understand that a bit better.
> >
> > >
> > > > Note that once I'm done upstreaming
> > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/,
> > > > this change will not be needed and I'm planning to remove this call,
> > > > however this change is useful for backporting. It should be sent to
> > > > stable@vger.kernel.org # v6.10+
> > > >
> > > The kvfree_rcu_barrier() has been added into v6.12:
> > >
> > > <snip>
> > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951
> > > next-20240912
> > > next-20240919
> > > next-20240920
> > > next-20241002
> > > v6.12-rc1
> > > urezki@pc638:~/data/raid0/coding/linux.git$
> > > <snip>
> > >
> > > For 6.10+, it implies that the mentioned commit should be backported also.
> >
> > I see. I guess for pre-6.12 we would use rcu_barrier() instead of
> > kvfree_rcu_barrier()?
> >
> For kernels < 6.12, unfortunately not :) If you have a possibility to
> switch to reclaim over call_rcu() API, then you can go with rcu_barrier()
> which waits for all callbacks to be completed.

We do not control the call-site inside the module, so this would not
be possible.

>
> Or backport kvfree_rcu_barrier(). It __should__ be easy since there
> were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX.

That sounds better. I'll take a stab at it. Thanks!

>
> --
> Uladzislau Rezki

Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

Posted by Uladzislau Rezki 1 year, 4 months ago

On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote:
> On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> >
> > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote:
> > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > >
> > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote:
> > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > > >
> > > > > > On Mon,  7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote:
> > > > > >
> > > > > > > Ben Greear reports following splat:
> > > > > > >  ------------[ cut here ]------------
> > > > > > >  net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> > > > > > >  WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> > > > > > >  Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> > > > > > > ...
> > > > > > >  Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > > > > > >  RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> > > > > > >   codetag_unload_module+0x19b/0x2a0
> > > > > > >   ? codetag_load_module+0x80/0x80
> > > > > > >
> > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free
> > > > > > > operation is likely still pending by the time alloc_tag checks for leaks.
> > > > > > >
> > > > > > > Wait for outstanding kfree_rcu operations to complete before checking
> > > > > > > resolves this warning.
> > > > > > >
> > > > > > > Reproducer:
> > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations
> > > > > > > rmmod nft_chain_nat
> > > > > > > rmmod nf_nat                # will WARN.
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > --- a/lib/codetag.c
> > > > > > > +++ b/lib/codetag.c
> > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
> > > > > > >       if (!mod)
> > > > > > >               return true;
> > > > > > >
> > > > > > > +     kvfree_rcu_barrier();
> > > > > > > +
> > > > > > >       mutex_lock(&codetag_lock);
> > > > > > >       list_for_each_entry(cttype, &codetag_types, link) {
> > > > > > >               struct codetag_module *found = NULL;
> > > > > >
> > > > > > It's always hard to determine why a thing like this is present, so a
> > > > > > comment is helpful:
> > > > > >
> > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
> > > > > > +++ a/lib/codetag.c
> > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
> > > > > >         if (!mod)
> > > > > >                 return true;
> > > > > >
> > > > > > +       /* await any module's kfree_rcu() operations to complete */
> > > > > >         kvfree_rcu_barrier();
> > > > > >
> > > > > >         mutex_lock(&codetag_lock);
> > > > > > _
> > > > > >
> > > > > > But I do wonder whether this is in the correct place.
> > > > > >
> > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete
> > > > > > should properly be done by the core module handling code?
> > > > >
> > > > > I don't think core module code cares about kfree_rcu()s being complete
> > > > > before the module is unloaded.
> > > > > Allocation tagging OTOH cares because it is about to destroy tags
> > > > > which will be accessed when kfree() actually happens, therefore a
> > > > > strict ordering is important.
> > > > >
> > > > > >
> > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the
> > > > > > module memory itself and I think codetag_unload_module() could be
> > > > > > called after that?
> > > > >
> > > > > I think we could move codetag_unload_module() after synchronize_rcu()
> > > > > inside free_module() but according to the reply in
> > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/
> > > > > synchronize_rcu() does not help. I'm not quite sure why...
> > > > >
> > > > It is because, synchronize_rcu() is used for a bit different things,
> > > > i.e. it is about a GP completion. Offloading objects can span several
> > > > GPs.
> > >
> > > Ah, thanks! Now that I looked into the patch that recently added
> > > kvfree_rcu_barrier() I understand that a bit better.
> > >
> > > >
> > > > > Note that once I'm done upstreaming
> > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/,
> > > > > this change will not be needed and I'm planning to remove this call,
> > > > > however this change is useful for backporting. It should be sent to
> > > > > stable@vger.kernel.org # v6.10+
> > > > >
> > > > The kvfree_rcu_barrier() has been added into v6.12:
> > > >
> > > > <snip>
> > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951
> > > > next-20240912
> > > > next-20240919
> > > > next-20240920
> > > > next-20241002
> > > > v6.12-rc1
> > > > urezki@pc638:~/data/raid0/coding/linux.git$
> > > > <snip>
> > > >
> > > > For 6.10+, it implies that the mentioned commit should be backported also.
> > >
> > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of
> > > kvfree_rcu_barrier()?
> > >
> > For kernels < 6.12, unfortunately not :) If you have a possibility to
> > switch to reclaim over call_rcu() API, then you can go with rcu_barrier()
> > which waits for all callbacks to be completed.
> 
> We do not control the call-site inside the module, so this would not
> be possible.
> 
> >
> > Or backport kvfree_rcu_barrier(). It __should__ be easy since there
> > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX.
> 
> That sounds better. I'll take a stab at it. Thanks!
> 
You are welcome!

--
Uladzislau Rezki

Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

Posted by Suren Baghdasaryan 1 year, 3 months ago

On Wed, Oct 9, 2024 at 1:36 AM Uladzislau Rezki <urezki@gmail.com> wrote:
>
> On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote:
> > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > >
> > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote:
> > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > > >
> > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote:
> > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > > > >
> > > > > > > On Mon,  7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote:
> > > > > > >
> > > > > > > > Ben Greear reports following splat:
> > > > > > > >  ------------[ cut here ]------------
> > > > > > > >  net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> > > > > > > >  WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> > > > > > > >  Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> > > > > > > > ...
> > > > > > > >  Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > > > > > > >  RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> > > > > > > >   codetag_unload_module+0x19b/0x2a0
> > > > > > > >   ? codetag_load_module+0x80/0x80
> > > > > > > >
> > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free
> > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks.
> > > > > > > >
> > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking
> > > > > > > > resolves this warning.
> > > > > > > >
> > > > > > > > Reproducer:
> > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations
> > > > > > > > rmmod nft_chain_nat
> > > > > > > > rmmod nf_nat                # will WARN.
> > > > > > > >
> > > > > > > > ...
> > > > > > > >
> > > > > > > > --- a/lib/codetag.c
> > > > > > > > +++ b/lib/codetag.c
> > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
> > > > > > > >       if (!mod)
> > > > > > > >               return true;
> > > > > > > >
> > > > > > > > +     kvfree_rcu_barrier();
> > > > > > > > +
> > > > > > > >       mutex_lock(&codetag_lock);
> > > > > > > >       list_for_each_entry(cttype, &codetag_types, link) {
> > > > > > > >               struct codetag_module *found = NULL;
> > > > > > >
> > > > > > > It's always hard to determine why a thing like this is present, so a
> > > > > > > comment is helpful:
> > > > > > >
> > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
> > > > > > > +++ a/lib/codetag.c
> > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
> > > > > > >         if (!mod)
> > > > > > >                 return true;
> > > > > > >
> > > > > > > +       /* await any module's kfree_rcu() operations to complete */
> > > > > > >         kvfree_rcu_barrier();
> > > > > > >
> > > > > > >         mutex_lock(&codetag_lock);
> > > > > > > _
> > > > > > >
> > > > > > > But I do wonder whether this is in the correct place.
> > > > > > >
> > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete
> > > > > > > should properly be done by the core module handling code?
> > > > > >
> > > > > > I don't think core module code cares about kfree_rcu()s being complete
> > > > > > before the module is unloaded.
> > > > > > Allocation tagging OTOH cares because it is about to destroy tags
> > > > > > which will be accessed when kfree() actually happens, therefore a
> > > > > > strict ordering is important.
> > > > > >
> > > > > > >
> > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the
> > > > > > > module memory itself and I think codetag_unload_module() could be
> > > > > > > called after that?
> > > > > >
> > > > > > I think we could move codetag_unload_module() after synchronize_rcu()
> > > > > > inside free_module() but according to the reply in
> > > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/
> > > > > > synchronize_rcu() does not help. I'm not quite sure why...
> > > > > >
> > > > > It is because, synchronize_rcu() is used for a bit different things,
> > > > > i.e. it is about a GP completion. Offloading objects can span several
> > > > > GPs.
> > > >
> > > > Ah, thanks! Now that I looked into the patch that recently added
> > > > kvfree_rcu_barrier() I understand that a bit better.
> > > >
> > > > >
> > > > > > Note that once I'm done upstreaming
> > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/,
> > > > > > this change will not be needed and I'm planning to remove this call,
> > > > > > however this change is useful for backporting. It should be sent to
> > > > > > stable@vger.kernel.org # v6.10+
> > > > > >
> > > > > The kvfree_rcu_barrier() has been added into v6.12:
> > > > >
> > > > > <snip>
> > > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951
> > > > > next-20240912
> > > > > next-20240919
> > > > > next-20240920
> > > > > next-20241002
> > > > > v6.12-rc1
> > > > > urezki@pc638:~/data/raid0/coding/linux.git$
> > > > > <snip>
> > > > >
> > > > > For 6.10+, it implies that the mentioned commit should be backported also.
> > > >
> > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of
> > > > kvfree_rcu_barrier()?
> > > >
> > > For kernels < 6.12, unfortunately not :) If you have a possibility to
> > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier()
> > > which waits for all callbacks to be completed.
> >
> > We do not control the call-site inside the module, so this would not
> > be possible.
> >
> > >
> > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there
> > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX.
> >
> > That sounds better. I'll take a stab at it. Thanks!

I prepared backports for stable 6.10.y and 6.11.y branches which
contain this patch along with the prerequisite "rcu/kvfree: Add
kvfree_rcu_barrier() API" patch. I think that's the only prerequisite
required. However, since this patch is not yet in Linus' tree, I'll
wait for it to get there before sending backports to stable (per Greg
KH's recommendation). Please let me know if you think it's more
critical and should be posted earlier.
Thanks,
Suren.

> >
> You are welcome!
>
> --
> Uladzislau Rezki
>

Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

Posted by Vlastimil Babka 1 year, 3 months ago

On 10/15/24 18:22, Suren Baghdasaryan wrote:
> On Wed, Oct 9, 2024 at 1:36 AM Uladzislau Rezki <urezki@gmail.com> wrote:
>>
>> On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote:
>> > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote:
>> > >
>> > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote:
>> > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote:
>> > > > >
>> > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote:
>> > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>> > > > > > >
>> > > > > > > On Mon,  7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote:
>> > > > > > >
>> > > > > > > > Ben Greear reports following splat:
>> > > > > > > >  ------------[ cut here ]------------
>> > > > > > > >  net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
>> > > > > > > >  WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
>> > > > > > > >  Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
>> > > > > > > > ...
>> > > > > > > >  Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
>> > > > > > > >  RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
>> > > > > > > >   codetag_unload_module+0x19b/0x2a0
>> > > > > > > >   ? codetag_load_module+0x80/0x80
>> > > > > > > >
>> > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free
>> > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks.
>> > > > > > > >
>> > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking
>> > > > > > > > resolves this warning.
>> > > > > > > >
>> > > > > > > > Reproducer:
>> > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
>> > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations
>> > > > > > > > rmmod nft_chain_nat
>> > > > > > > > rmmod nf_nat                # will WARN.
>> > > > > > > >
>> > > > > > > > ...
>> > > > > > > >
>> > > > > > > > --- a/lib/codetag.c
>> > > > > > > > +++ b/lib/codetag.c
>> > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
>> > > > > > > >       if (!mod)
>> > > > > > > >               return true;
>> > > > > > > >
>> > > > > > > > +     kvfree_rcu_barrier();
>> > > > > > > > +
>> > > > > > > >       mutex_lock(&codetag_lock);
>> > > > > > > >       list_for_each_entry(cttype, &codetag_types, link) {
>> > > > > > > >               struct codetag_module *found = NULL;
>> > > > > > >
>> > > > > > > It's always hard to determine why a thing like this is present, so a
>> > > > > > > comment is helpful:
>> > > > > > >
>> > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
>> > > > > > > +++ a/lib/codetag.c
>> > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
>> > > > > > >         if (!mod)
>> > > > > > >                 return true;
>> > > > > > >
>> > > > > > > +       /* await any module's kfree_rcu() operations to complete */
>> > > > > > >         kvfree_rcu_barrier();
>> > > > > > >
>> > > > > > >         mutex_lock(&codetag_lock);
>> > > > > > > _
>> > > > > > >
>> > > > > > > But I do wonder whether this is in the correct place.
>> > > > > > >
>> > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete
>> > > > > > > should properly be done by the core module handling code?
>> > > > > >
>> > > > > > I don't think core module code cares about kfree_rcu()s being complete
>> > > > > > before the module is unloaded.
>> > > > > > Allocation tagging OTOH cares because it is about to destroy tags
>> > > > > > which will be accessed when kfree() actually happens, therefore a
>> > > > > > strict ordering is important.
>> > > > > >
>> > > > > > >
>> > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the
>> > > > > > > module memory itself and I think codetag_unload_module() could be
>> > > > > > > called after that?
>> > > > > >
>> > > > > > I think we could move codetag_unload_module() after synchronize_rcu()
>> > > > > > inside free_module() but according to the reply in
>> > > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/
>> > > > > > synchronize_rcu() does not help. I'm not quite sure why...
>> > > > > >
>> > > > > It is because, synchronize_rcu() is used for a bit different things,
>> > > > > i.e. it is about a GP completion. Offloading objects can span several
>> > > > > GPs.
>> > > >
>> > > > Ah, thanks! Now that I looked into the patch that recently added
>> > > > kvfree_rcu_barrier() I understand that a bit better.
>> > > >
>> > > > >
>> > > > > > Note that once I'm done upstreaming
>> > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/,
>> > > > > > this change will not be needed and I'm planning to remove this call,
>> > > > > > however this change is useful for backporting. It should be sent to
>> > > > > > stable@vger.kernel.org # v6.10+
>> > > > > >
>> > > > > The kvfree_rcu_barrier() has been added into v6.12:
>> > > > >
>> > > > > <snip>
>> > > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951
>> > > > > next-20240912
>> > > > > next-20240919
>> > > > > next-20240920
>> > > > > next-20241002
>> > > > > v6.12-rc1
>> > > > > urezki@pc638:~/data/raid0/coding/linux.git$
>> > > > > <snip>
>> > > > >
>> > > > > For 6.10+, it implies that the mentioned commit should be backported also.
>> > > >
>> > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of
>> > > > kvfree_rcu_barrier()?
>> > > >
>> > > For kernels < 6.12, unfortunately not :) If you have a possibility to
>> > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier()
>> > > which waits for all callbacks to be completed.
>> >
>> > We do not control the call-site inside the module, so this would not
>> > be possible.
>> >
>> > >
>> > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there
>> > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX.
>> >
>> > That sounds better. I'll take a stab at it. Thanks!
> 
> I prepared backports for stable 6.10.y and 6.11.y branches which

6.10 is already EOL thus won't be necessary.

> contain this patch along with the prerequisite "rcu/kvfree: Add
> kvfree_rcu_barrier() API" patch. I think that's the only prerequisite
> required. However, since this patch is not yet in Linus' tree, I'll
> wait for it to get there before sending backports to stable (per Greg
> KH's recommendation). Please let me know if you think it's more
> critical and should be posted earlier.
> Thanks,
> Suren.
> 
>> >
>> You are welcome!
>>
>> --
>> Uladzislau Rezki
>>

Re: [PATCH lib] lib: alloc_tag_module_unload must wait for pending kfree_rcu calls

Posted by Uladzislau Rezki 1 year, 3 months ago

On Tue, Oct 15, 2024 at 09:22:55AM -0700, Suren Baghdasaryan wrote:
> On Wed, Oct 9, 2024 at 1:36 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> >
> > On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote:
> > > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > >
> > > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote:
> > > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote:
> > > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > > > > >
> > > > > > > > On Mon,  7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote:
> > > > > > > >
> > > > > > > > > Ben Greear reports following splat:
> > > > > > > > >  ------------[ cut here ]------------
> > > > > > > > >  net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
> > > > > > > > >  WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
> > > > > > > > >  Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
> > > > > > > > > ...
> > > > > > > > >  Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > > > > > > > >  RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
> > > > > > > > >   codetag_unload_module+0x19b/0x2a0
> > > > > > > > >   ? codetag_load_module+0x80/0x80
> > > > > > > > >
> > > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free
> > > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks.
> > > > > > > > >
> > > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking
> > > > > > > > > resolves this warning.
> > > > > > > > >
> > > > > > > > > Reproducer:
> > > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp
> > > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations
> > > > > > > > > rmmod nft_chain_nat
> > > > > > > > > rmmod nf_nat                # will WARN.
> > > > > > > > >
> > > > > > > > > ...
> > > > > > > > >
> > > > > > > > > --- a/lib/codetag.c
> > > > > > > > > +++ b/lib/codetag.c
> > > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
> > > > > > > > >       if (!mod)
> > > > > > > > >               return true;
> > > > > > > > >
> > > > > > > > > +     kvfree_rcu_barrier();
> > > > > > > > > +
> > > > > > > > >       mutex_lock(&codetag_lock);
> > > > > > > > >       list_for_each_entry(cttype, &codetag_types, link) {
> > > > > > > > >               struct codetag_module *found = NULL;
> > > > > > > >
> > > > > > > > It's always hard to determine why a thing like this is present, so a
> > > > > > > > comment is helpful:
> > > > > > > >
> > > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix
> > > > > > > > +++ a/lib/codetag.c
> > > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module
> > > > > > > >         if (!mod)
> > > > > > > >                 return true;
> > > > > > > >
> > > > > > > > +       /* await any module's kfree_rcu() operations to complete */
> > > > > > > >         kvfree_rcu_barrier();
> > > > > > > >
> > > > > > > >         mutex_lock(&codetag_lock);
> > > > > > > > _
> > > > > > > >
> > > > > > > > But I do wonder whether this is in the correct place.
> > > > > > > >
> > > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete
> > > > > > > > should properly be done by the core module handling code?
> > > > > > >
> > > > > > > I don't think core module code cares about kfree_rcu()s being complete
> > > > > > > before the module is unloaded.
> > > > > > > Allocation tagging OTOH cares because it is about to destroy tags
> > > > > > > which will be accessed when kfree() actually happens, therefore a
> > > > > > > strict ordering is important.
> > > > > > >
> > > > > > > >
> > > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the
> > > > > > > > module memory itself and I think codetag_unload_module() could be
> > > > > > > > called after that?
> > > > > > >
> > > > > > > I think we could move codetag_unload_module() after synchronize_rcu()
> > > > > > > inside free_module() but according to the reply in
> > > > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/
> > > > > > > synchronize_rcu() does not help. I'm not quite sure why...
> > > > > > >
> > > > > > It is because, synchronize_rcu() is used for a bit different things,
> > > > > > i.e. it is about a GP completion. Offloading objects can span several
> > > > > > GPs.
> > > > >
> > > > > Ah, thanks! Now that I looked into the patch that recently added
> > > > > kvfree_rcu_barrier() I understand that a bit better.
> > > > >
> > > > > >
> > > > > > > Note that once I'm done upstreaming
> > > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/,
> > > > > > > this change will not be needed and I'm planning to remove this call,
> > > > > > > however this change is useful for backporting. It should be sent to
> > > > > > > stable@vger.kernel.org # v6.10+
> > > > > > >
> > > > > > The kvfree_rcu_barrier() has been added into v6.12:
> > > > > >
> > > > > > <snip>
> > > > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951
> > > > > > next-20240912
> > > > > > next-20240919
> > > > > > next-20240920
> > > > > > next-20241002
> > > > > > v6.12-rc1
> > > > > > urezki@pc638:~/data/raid0/coding/linux.git$
> > > > > > <snip>
> > > > > >
> > > > > > For 6.10+, it implies that the mentioned commit should be backported also.
> > > > >
> > > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of
> > > > > kvfree_rcu_barrier()?
> > > > >
> > > > For kernels < 6.12, unfortunately not :) If you have a possibility to
> > > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier()
> > > > which waits for all callbacks to be completed.
> > >
> > > We do not control the call-site inside the module, so this would not
> > > be possible.
> > >
> > > >
> > > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there
> > > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX.
> > >
> > > That sounds better. I'll take a stab at it. Thanks!
> 
> I prepared backports for stable 6.10.y and 6.11.y branches which
> contain this patch along with the prerequisite "rcu/kvfree: Add
> kvfree_rcu_barrier() API" patch. I think that's the only prerequisite
> required. However, since this patch is not yet in Linus' tree, I'll
> wait for it to get there before sending backports to stable (per Greg
> KH's recommendation). Please let me know if you think it's more
> critical and should be posted earlier.
>
To me it sounds reasonable.

--
Uladzislau Rezki