lib/codetag.c | 2 ++ 1 file changed, 2 insertions(+)
Ben Greear reports following splat:
------------[ cut here ]------------
net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
...
Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
codetag_unload_module+0x19b/0x2a0
? codetag_load_module+0x80/0x80
nf_nat module exit calls kfree_rcu on those addresses, but the free
operation is likely still pending by the time alloc_tag checks for leaks.
Wait for outstanding kfree_rcu operations to complete before checking
resolves this warning.
Reproducer:
unshare -n iptables-nft -t nat -A PREROUTING -p tcp
grep nf_nat /proc/allocinfo # will list 4 allocations
rmmod nft_chain_nat
rmmod nf_nat # will WARN.
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: Ben Greear <greearb@candelatech.com>
Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candelatech.com/
Fixes: a473573964e5 ("lib: code tagging module support")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
lib/codetag.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/codetag.c b/lib/codetag.c
index afa8a2d4f317..a3472d428821 100644
--- a/lib/codetag.c
+++ b/lib/codetag.c
@@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod)
if (!mod)
return true;
+ kvfree_rcu_barrier();
+
mutex_lock(&codetag_lock);
list_for_each_entry(cttype, &codetag_types, link) {
struct codetag_module *found = NULL;
--
2.45.2
On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > Ben Greear reports following splat: > ------------[ cut here ]------------ > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > ... > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > codetag_unload_module+0x19b/0x2a0 > ? codetag_load_module+0x80/0x80 > > nf_nat module exit calls kfree_rcu on those addresses, but the free > operation is likely still pending by the time alloc_tag checks for leaks. > > Wait for outstanding kfree_rcu operations to complete before checking > resolves this warning. > > Reproducer: > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > grep nf_nat /proc/allocinfo # will list 4 allocations > rmmod nft_chain_nat > rmmod nf_nat # will WARN. > > ... > > --- a/lib/codetag.c > +++ b/lib/codetag.c > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > if (!mod) > return true; > > + kvfree_rcu_barrier(); > + > mutex_lock(&codetag_lock); > list_for_each_entry(cttype, &codetag_types, link) { > struct codetag_module *found = NULL; It's always hard to determine why a thing like this is present, so a comment is helpful: --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix +++ a/lib/codetag.c @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module if (!mod) return true; + /* await any module's kfree_rcu() operations to complete */ kvfree_rcu_barrier(); mutex_lock(&codetag_lock); _ But I do wonder whether this is in the correct place. Waiting for a module's ->exit() function's kfree_rcu()s to complete should properly be done by the core module handling code? free_module() does a full-on synchronize_rcu() prior to freeing the module memory itself and I think codetag_unload_module() could be called after that?
On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > Ben Greear reports following splat: > > ------------[ cut here ]------------ > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > ... > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > codetag_unload_module+0x19b/0x2a0 > > ? codetag_load_module+0x80/0x80 > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > Wait for outstanding kfree_rcu operations to complete before checking > > resolves this warning. > > > > Reproducer: > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > grep nf_nat /proc/allocinfo # will list 4 allocations > > rmmod nft_chain_nat > > rmmod nf_nat # will WARN. > > > > ... > > > > --- a/lib/codetag.c > > +++ b/lib/codetag.c > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > if (!mod) > > return true; > > > > + kvfree_rcu_barrier(); > > + > > mutex_lock(&codetag_lock); > > list_for_each_entry(cttype, &codetag_types, link) { > > struct codetag_module *found = NULL; > > It's always hard to determine why a thing like this is present, so a > comment is helpful: > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > +++ a/lib/codetag.c > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > if (!mod) > return true; > > + /* await any module's kfree_rcu() operations to complete */ > kvfree_rcu_barrier(); > > mutex_lock(&codetag_lock); > _ > > But I do wonder whether this is in the correct place. > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > should properly be done by the core module handling code? I don't think core module code cares about kfree_rcu()s being complete before the module is unloaded. Allocation tagging OTOH cares because it is about to destroy tags which will be accessed when kfree() actually happens, therefore a strict ordering is important. > > free_module() does a full-on synchronize_rcu() prior to freeing the > module memory itself and I think codetag_unload_module() could be > called after that? I think we could move codetag_unload_module() after synchronize_rcu() inside free_module() but according to the reply in https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ synchronize_rcu() does not help. I'm not quite sure why... Note that once I'm done upstreaming https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, this change will not be needed and I'm planning to remove this call, however this change is useful for backporting. It should be sent to stable@vger.kernel.org # v6.10+
On 10/8/24 03:49, Suren Baghdasaryan wrote: > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: >> >> On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: >> >> > Ben Greear reports following splat: >> > ------------[ cut here ]------------ >> > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload >> > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 >> > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat >> > ... >> > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 >> > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 >> > codetag_unload_module+0x19b/0x2a0 >> > ? codetag_load_module+0x80/0x80 >> > >> > nf_nat module exit calls kfree_rcu on those addresses, but the free >> > operation is likely still pending by the time alloc_tag checks for leaks. >> > >> > Wait for outstanding kfree_rcu operations to complete before checking >> > resolves this warning. >> > >> > Reproducer: >> > unshare -n iptables-nft -t nat -A PREROUTING -p tcp >> > grep nf_nat /proc/allocinfo # will list 4 allocations >> > rmmod nft_chain_nat >> > rmmod nf_nat # will WARN. >> > >> > ... >> > >> > --- a/lib/codetag.c >> > +++ b/lib/codetag.c >> > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) >> > if (!mod) >> > return true; >> > >> > + kvfree_rcu_barrier(); >> > + >> > mutex_lock(&codetag_lock); >> > list_for_each_entry(cttype, &codetag_types, link) { >> > struct codetag_module *found = NULL; >> >> It's always hard to determine why a thing like this is present, so a >> comment is helpful: >> >> --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix >> +++ a/lib/codetag.c >> @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module >> if (!mod) >> return true; >> >> + /* await any module's kfree_rcu() operations to complete */ >> kvfree_rcu_barrier(); >> >> mutex_lock(&codetag_lock); >> _ >> >> But I do wonder whether this is in the correct place. >> >> Waiting for a module's ->exit() function's kfree_rcu()s to complete >> should properly be done by the core module handling code? > > I don't think core module code cares about kfree_rcu()s being complete > before the module is unloaded. Right, module unload should care about pending call_rcu() involving module code and/or module-created kmem_caches that are to be destroyed, but I think it's up to individual modules anyway to do a rcu_barrier() in those cases? > Allocation tagging OTOH cares because it is about to destroy tags > which will be accessed when kfree() actually happens, therefore a > strict ordering is important. > >> >> free_module() does a full-on synchronize_rcu() prior to freeing the >> module memory itself and I think codetag_unload_module() could be >> called after that? > I think we could move codetag_unload_module() after synchronize_rcu() > inside free_module() but according to the reply in > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > synchronize_rcu() does not help. I'm not quite sure why... synchronize_rcu() is only waiting for potential rcu read sections? You might have expected rcu_barrier() to help as that's waiting for pending call_rcu(). But as Ulad said the kfree_rcu() implementation does extra batching so there's now a new barrier for that, originally intended for kmem_cache_destroy(), but indeed useful here as well. > Note that once I'm done upstreaming > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > this change will not be needed and I'm planning to remove this call, > however this change is useful for backporting. It should be sent to > stable@vger.kernel.org # v6.10+
On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > Ben Greear reports following splat: > > > ------------[ cut here ]------------ > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > ... > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > codetag_unload_module+0x19b/0x2a0 > > > ? codetag_load_module+0x80/0x80 > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > resolves this warning. > > > > > > Reproducer: > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > rmmod nft_chain_nat > > > rmmod nf_nat # will WARN. > > > > > > ... > > > > > > --- a/lib/codetag.c > > > +++ b/lib/codetag.c > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > if (!mod) > > > return true; > > > > > > + kvfree_rcu_barrier(); > > > + > > > mutex_lock(&codetag_lock); > > > list_for_each_entry(cttype, &codetag_types, link) { > > > struct codetag_module *found = NULL; > > > > It's always hard to determine why a thing like this is present, so a > > comment is helpful: > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > +++ a/lib/codetag.c > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > if (!mod) > > return true; > > > > + /* await any module's kfree_rcu() operations to complete */ > > kvfree_rcu_barrier(); > > > > mutex_lock(&codetag_lock); > > _ > > > > But I do wonder whether this is in the correct place. > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > should properly be done by the core module handling code? > > I don't think core module code cares about kfree_rcu()s being complete > before the module is unloaded. > Allocation tagging OTOH cares because it is about to destroy tags > which will be accessed when kfree() actually happens, therefore a > strict ordering is important. > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > module memory itself and I think codetag_unload_module() could be > > called after that? > > I think we could move codetag_unload_module() after synchronize_rcu() > inside free_module() but according to the reply in > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > synchronize_rcu() does not help. I'm not quite sure why... > It is because, synchronize_rcu() is used for a bit different things, i.e. it is about a GP completion. Offloading objects can span several GPs. > Note that once I'm done upstreaming > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > this change will not be needed and I'm planning to remove this call, > however this change is useful for backporting. It should be sent to > stable@vger.kernel.org # v6.10+ > The kvfree_rcu_barrier() has been added into v6.12: <snip> urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 next-20240912 next-20240919 next-20240920 next-20241002 v6.12-rc1 urezki@pc638:~/data/raid0/coding/linux.git$ <snip> For 6.10+, it implies that the mentioned commit should be backported also. -- Uladzislau Rezki
On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > Ben Greear reports following splat: > > > > ------------[ cut here ]------------ > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > > ... > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > > codetag_unload_module+0x19b/0x2a0 > > > > ? codetag_load_module+0x80/0x80 > > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > > resolves this warning. > > > > > > > > Reproducer: > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > > rmmod nft_chain_nat > > > > rmmod nf_nat # will WARN. > > > > > > > > ... > > > > > > > > --- a/lib/codetag.c > > > > +++ b/lib/codetag.c > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > > if (!mod) > > > > return true; > > > > > > > > + kvfree_rcu_barrier(); > > > > + > > > > mutex_lock(&codetag_lock); > > > > list_for_each_entry(cttype, &codetag_types, link) { > > > > struct codetag_module *found = NULL; > > > > > > It's always hard to determine why a thing like this is present, so a > > > comment is helpful: > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > > +++ a/lib/codetag.c > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > > if (!mod) > > > return true; > > > > > > + /* await any module's kfree_rcu() operations to complete */ > > > kvfree_rcu_barrier(); > > > > > > mutex_lock(&codetag_lock); > > > _ > > > > > > But I do wonder whether this is in the correct place. > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > > should properly be done by the core module handling code? > > > > I don't think core module code cares about kfree_rcu()s being complete > > before the module is unloaded. > > Allocation tagging OTOH cares because it is about to destroy tags > > which will be accessed when kfree() actually happens, therefore a > > strict ordering is important. > > > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > > module memory itself and I think codetag_unload_module() could be > > > called after that? > > > > I think we could move codetag_unload_module() after synchronize_rcu() > > inside free_module() but according to the reply in > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > > synchronize_rcu() does not help. I'm not quite sure why... > > > It is because, synchronize_rcu() is used for a bit different things, > i.e. it is about a GP completion. Offloading objects can span several > GPs. Ah, thanks! Now that I looked into the patch that recently added kvfree_rcu_barrier() I understand that a bit better. > > > Note that once I'm done upstreaming > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > > this change will not be needed and I'm planning to remove this call, > > however this change is useful for backporting. It should be sent to > > stable@vger.kernel.org # v6.10+ > > > The kvfree_rcu_barrier() has been added into v6.12: > > <snip> > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 > next-20240912 > next-20240919 > next-20240920 > next-20241002 > v6.12-rc1 > urezki@pc638:~/data/raid0/coding/linux.git$ > <snip> > > For 6.10+, it implies that the mentioned commit should be backported also. I see. I guess for pre-6.12 we would use rcu_barrier() instead of kvfree_rcu_barrier()? > > -- > Uladzislau Rezki
On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote: > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > > > Ben Greear reports following splat: > > > > > ------------[ cut here ]------------ > > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > > > ... > > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > > > codetag_unload_module+0x19b/0x2a0 > > > > > ? codetag_load_module+0x80/0x80 > > > > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > > > resolves this warning. > > > > > > > > > > Reproducer: > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > > > rmmod nft_chain_nat > > > > > rmmod nf_nat # will WARN. > > > > > > > > > > ... > > > > > > > > > > --- a/lib/codetag.c > > > > > +++ b/lib/codetag.c > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > > > if (!mod) > > > > > return true; > > > > > > > > > > + kvfree_rcu_barrier(); > > > > > + > > > > > mutex_lock(&codetag_lock); > > > > > list_for_each_entry(cttype, &codetag_types, link) { > > > > > struct codetag_module *found = NULL; > > > > > > > > It's always hard to determine why a thing like this is present, so a > > > > comment is helpful: > > > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > > > +++ a/lib/codetag.c > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > > > if (!mod) > > > > return true; > > > > > > > > + /* await any module's kfree_rcu() operations to complete */ > > > > kvfree_rcu_barrier(); > > > > > > > > mutex_lock(&codetag_lock); > > > > _ > > > > > > > > But I do wonder whether this is in the correct place. > > > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > > > should properly be done by the core module handling code? > > > > > > I don't think core module code cares about kfree_rcu()s being complete > > > before the module is unloaded. > > > Allocation tagging OTOH cares because it is about to destroy tags > > > which will be accessed when kfree() actually happens, therefore a > > > strict ordering is important. > > > > > > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > > > module memory itself and I think codetag_unload_module() could be > > > > called after that? > > > > > > I think we could move codetag_unload_module() after synchronize_rcu() > > > inside free_module() but according to the reply in > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > > > synchronize_rcu() does not help. I'm not quite sure why... > > > > > It is because, synchronize_rcu() is used for a bit different things, > > i.e. it is about a GP completion. Offloading objects can span several > > GPs. > > Ah, thanks! Now that I looked into the patch that recently added > kvfree_rcu_barrier() I understand that a bit better. > > > > > > Note that once I'm done upstreaming > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > > > this change will not be needed and I'm planning to remove this call, > > > however this change is useful for backporting. It should be sent to > > > stable@vger.kernel.org # v6.10+ > > > > > The kvfree_rcu_barrier() has been added into v6.12: > > > > <snip> > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 > > next-20240912 > > next-20240919 > > next-20240920 > > next-20241002 > > v6.12-rc1 > > urezki@pc638:~/data/raid0/coding/linux.git$ > > <snip> > > > > For 6.10+, it implies that the mentioned commit should be backported also. > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of > kvfree_rcu_barrier()? > For kernels < 6.12, unfortunately not :) If you have a possibility to switch to reclaim over call_rcu() API, then you can go with rcu_barrier() which waits for all callbacks to be completed. Or backport kvfree_rcu_barrier(). It __should__ be easy since there were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX. -- Uladzislau Rezki
On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote: > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > > > > > Ben Greear reports following splat: > > > > > > ------------[ cut here ]------------ > > > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > > > > ... > > > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > > > > codetag_unload_module+0x19b/0x2a0 > > > > > > ? codetag_load_module+0x80/0x80 > > > > > > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > > > > resolves this warning. > > > > > > > > > > > > Reproducer: > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > > > > rmmod nft_chain_nat > > > > > > rmmod nf_nat # will WARN. > > > > > > > > > > > > ... > > > > > > > > > > > > --- a/lib/codetag.c > > > > > > +++ b/lib/codetag.c > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > > > > if (!mod) > > > > > > return true; > > > > > > > > > > > > + kvfree_rcu_barrier(); > > > > > > + > > > > > > mutex_lock(&codetag_lock); > > > > > > list_for_each_entry(cttype, &codetag_types, link) { > > > > > > struct codetag_module *found = NULL; > > > > > > > > > > It's always hard to determine why a thing like this is present, so a > > > > > comment is helpful: > > > > > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > > > > +++ a/lib/codetag.c > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > > > > if (!mod) > > > > > return true; > > > > > > > > > > + /* await any module's kfree_rcu() operations to complete */ > > > > > kvfree_rcu_barrier(); > > > > > > > > > > mutex_lock(&codetag_lock); > > > > > _ > > > > > > > > > > But I do wonder whether this is in the correct place. > > > > > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > > > > should properly be done by the core module handling code? > > > > > > > > I don't think core module code cares about kfree_rcu()s being complete > > > > before the module is unloaded. > > > > Allocation tagging OTOH cares because it is about to destroy tags > > > > which will be accessed when kfree() actually happens, therefore a > > > > strict ordering is important. > > > > > > > > > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > > > > module memory itself and I think codetag_unload_module() could be > > > > > called after that? > > > > > > > > I think we could move codetag_unload_module() after synchronize_rcu() > > > > inside free_module() but according to the reply in > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > > > > synchronize_rcu() does not help. I'm not quite sure why... > > > > > > > It is because, synchronize_rcu() is used for a bit different things, > > > i.e. it is about a GP completion. Offloading objects can span several > > > GPs. > > > > Ah, thanks! Now that I looked into the patch that recently added > > kvfree_rcu_barrier() I understand that a bit better. > > > > > > > > > Note that once I'm done upstreaming > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > > > > this change will not be needed and I'm planning to remove this call, > > > > however this change is useful for backporting. It should be sent to > > > > stable@vger.kernel.org # v6.10+ > > > > > > > The kvfree_rcu_barrier() has been added into v6.12: > > > > > > <snip> > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 > > > next-20240912 > > > next-20240919 > > > next-20240920 > > > next-20241002 > > > v6.12-rc1 > > > urezki@pc638:~/data/raid0/coding/linux.git$ > > > <snip> > > > > > > For 6.10+, it implies that the mentioned commit should be backported also. > > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of > > kvfree_rcu_barrier()? > > > For kernels < 6.12, unfortunately not :) If you have a possibility to > switch to reclaim over call_rcu() API, then you can go with rcu_barrier() > which waits for all callbacks to be completed. We do not control the call-site inside the module, so this would not be possible. > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX. That sounds better. I'll take a stab at it. Thanks! > > -- > Uladzislau Rezki
On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote: > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote: > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > > > > > > > Ben Greear reports following splat: > > > > > > > ------------[ cut here ]------------ > > > > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > > > > > ... > > > > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > > > > > codetag_unload_module+0x19b/0x2a0 > > > > > > > ? codetag_load_module+0x80/0x80 > > > > > > > > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > > > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > > > > > resolves this warning. > > > > > > > > > > > > > > Reproducer: > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > > > > > rmmod nft_chain_nat > > > > > > > rmmod nf_nat # will WARN. > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > --- a/lib/codetag.c > > > > > > > +++ b/lib/codetag.c > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > > > > > if (!mod) > > > > > > > return true; > > > > > > > > > > > > > > + kvfree_rcu_barrier(); > > > > > > > + > > > > > > > mutex_lock(&codetag_lock); > > > > > > > list_for_each_entry(cttype, &codetag_types, link) { > > > > > > > struct codetag_module *found = NULL; > > > > > > > > > > > > It's always hard to determine why a thing like this is present, so a > > > > > > comment is helpful: > > > > > > > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > > > > > +++ a/lib/codetag.c > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > > > > > if (!mod) > > > > > > return true; > > > > > > > > > > > > + /* await any module's kfree_rcu() operations to complete */ > > > > > > kvfree_rcu_barrier(); > > > > > > > > > > > > mutex_lock(&codetag_lock); > > > > > > _ > > > > > > > > > > > > But I do wonder whether this is in the correct place. > > > > > > > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > > > > > should properly be done by the core module handling code? > > > > > > > > > > I don't think core module code cares about kfree_rcu()s being complete > > > > > before the module is unloaded. > > > > > Allocation tagging OTOH cares because it is about to destroy tags > > > > > which will be accessed when kfree() actually happens, therefore a > > > > > strict ordering is important. > > > > > > > > > > > > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > > > > > module memory itself and I think codetag_unload_module() could be > > > > > > called after that? > > > > > > > > > > I think we could move codetag_unload_module() after synchronize_rcu() > > > > > inside free_module() but according to the reply in > > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > > > > > synchronize_rcu() does not help. I'm not quite sure why... > > > > > > > > > It is because, synchronize_rcu() is used for a bit different things, > > > > i.e. it is about a GP completion. Offloading objects can span several > > > > GPs. > > > > > > Ah, thanks! Now that I looked into the patch that recently added > > > kvfree_rcu_barrier() I understand that a bit better. > > > > > > > > > > > > Note that once I'm done upstreaming > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > > > > > this change will not be needed and I'm planning to remove this call, > > > > > however this change is useful for backporting. It should be sent to > > > > > stable@vger.kernel.org # v6.10+ > > > > > > > > > The kvfree_rcu_barrier() has been added into v6.12: > > > > > > > > <snip> > > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 > > > > next-20240912 > > > > next-20240919 > > > > next-20240920 > > > > next-20241002 > > > > v6.12-rc1 > > > > urezki@pc638:~/data/raid0/coding/linux.git$ > > > > <snip> > > > > > > > > For 6.10+, it implies that the mentioned commit should be backported also. > > > > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of > > > kvfree_rcu_barrier()? > > > > > For kernels < 6.12, unfortunately not :) If you have a possibility to > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier() > > which waits for all callbacks to be completed. > > We do not control the call-site inside the module, so this would not > be possible. > > > > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX. > > That sounds better. I'll take a stab at it. Thanks! > You are welcome! -- Uladzislau Rezki
On Wed, Oct 9, 2024 at 1:36 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote: > > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote: > > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > > > > > > > > > Ben Greear reports following splat: > > > > > > > > ------------[ cut here ]------------ > > > > > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > > > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > > > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > > > > > > ... > > > > > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > > > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > > > > > > codetag_unload_module+0x19b/0x2a0 > > > > > > > > ? codetag_load_module+0x80/0x80 > > > > > > > > > > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > > > > > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > > > > > > resolves this warning. > > > > > > > > > > > > > > > > Reproducer: > > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > > > > > > rmmod nft_chain_nat > > > > > > > > rmmod nf_nat # will WARN. > > > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > > > --- a/lib/codetag.c > > > > > > > > +++ b/lib/codetag.c > > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > > > > > > if (!mod) > > > > > > > > return true; > > > > > > > > > > > > > > > > + kvfree_rcu_barrier(); > > > > > > > > + > > > > > > > > mutex_lock(&codetag_lock); > > > > > > > > list_for_each_entry(cttype, &codetag_types, link) { > > > > > > > > struct codetag_module *found = NULL; > > > > > > > > > > > > > > It's always hard to determine why a thing like this is present, so a > > > > > > > comment is helpful: > > > > > > > > > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > > > > > > +++ a/lib/codetag.c > > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > > > > > > if (!mod) > > > > > > > return true; > > > > > > > > > > > > > > + /* await any module's kfree_rcu() operations to complete */ > > > > > > > kvfree_rcu_barrier(); > > > > > > > > > > > > > > mutex_lock(&codetag_lock); > > > > > > > _ > > > > > > > > > > > > > > But I do wonder whether this is in the correct place. > > > > > > > > > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > > > > > > should properly be done by the core module handling code? > > > > > > > > > > > > I don't think core module code cares about kfree_rcu()s being complete > > > > > > before the module is unloaded. > > > > > > Allocation tagging OTOH cares because it is about to destroy tags > > > > > > which will be accessed when kfree() actually happens, therefore a > > > > > > strict ordering is important. > > > > > > > > > > > > > > > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > > > > > > module memory itself and I think codetag_unload_module() could be > > > > > > > called after that? > > > > > > > > > > > > I think we could move codetag_unload_module() after synchronize_rcu() > > > > > > inside free_module() but according to the reply in > > > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > > > > > > synchronize_rcu() does not help. I'm not quite sure why... > > > > > > > > > > > It is because, synchronize_rcu() is used for a bit different things, > > > > > i.e. it is about a GP completion. Offloading objects can span several > > > > > GPs. > > > > > > > > Ah, thanks! Now that I looked into the patch that recently added > > > > kvfree_rcu_barrier() I understand that a bit better. > > > > > > > > > > > > > > > Note that once I'm done upstreaming > > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > > > > > > this change will not be needed and I'm planning to remove this call, > > > > > > however this change is useful for backporting. It should be sent to > > > > > > stable@vger.kernel.org # v6.10+ > > > > > > > > > > > The kvfree_rcu_barrier() has been added into v6.12: > > > > > > > > > > <snip> > > > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 > > > > > next-20240912 > > > > > next-20240919 > > > > > next-20240920 > > > > > next-20241002 > > > > > v6.12-rc1 > > > > > urezki@pc638:~/data/raid0/coding/linux.git$ > > > > > <snip> > > > > > > > > > > For 6.10+, it implies that the mentioned commit should be backported also. > > > > > > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of > > > > kvfree_rcu_barrier()? > > > > > > > For kernels < 6.12, unfortunately not :) If you have a possibility to > > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier() > > > which waits for all callbacks to be completed. > > > > We do not control the call-site inside the module, so this would not > > be possible. > > > > > > > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there > > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX. > > > > That sounds better. I'll take a stab at it. Thanks! I prepared backports for stable 6.10.y and 6.11.y branches which contain this patch along with the prerequisite "rcu/kvfree: Add kvfree_rcu_barrier() API" patch. I think that's the only prerequisite required. However, since this patch is not yet in Linus' tree, I'll wait for it to get there before sending backports to stable (per Greg KH's recommendation). Please let me know if you think it's more critical and should be posted earlier. Thanks, Suren. > > > You are welcome! > > -- > Uladzislau Rezki >
On 10/15/24 18:22, Suren Baghdasaryan wrote: > On Wed, Oct 9, 2024 at 1:36 AM Uladzislau Rezki <urezki@gmail.com> wrote: >> >> On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote: >> > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote: >> > > >> > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote: >> > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: >> > > > > >> > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: >> > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: >> > > > > > > >> > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: >> > > > > > > >> > > > > > > > Ben Greear reports following splat: >> > > > > > > > ------------[ cut here ]------------ >> > > > > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload >> > > > > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 >> > > > > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat >> > > > > > > > ... >> > > > > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 >> > > > > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 >> > > > > > > > codetag_unload_module+0x19b/0x2a0 >> > > > > > > > ? codetag_load_module+0x80/0x80 >> > > > > > > > >> > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free >> > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks. >> > > > > > > > >> > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking >> > > > > > > > resolves this warning. >> > > > > > > > >> > > > > > > > Reproducer: >> > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp >> > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations >> > > > > > > > rmmod nft_chain_nat >> > > > > > > > rmmod nf_nat # will WARN. >> > > > > > > > >> > > > > > > > ... >> > > > > > > > >> > > > > > > > --- a/lib/codetag.c >> > > > > > > > +++ b/lib/codetag.c >> > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) >> > > > > > > > if (!mod) >> > > > > > > > return true; >> > > > > > > > >> > > > > > > > + kvfree_rcu_barrier(); >> > > > > > > > + >> > > > > > > > mutex_lock(&codetag_lock); >> > > > > > > > list_for_each_entry(cttype, &codetag_types, link) { >> > > > > > > > struct codetag_module *found = NULL; >> > > > > > > >> > > > > > > It's always hard to determine why a thing like this is present, so a >> > > > > > > comment is helpful: >> > > > > > > >> > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix >> > > > > > > +++ a/lib/codetag.c >> > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module >> > > > > > > if (!mod) >> > > > > > > return true; >> > > > > > > >> > > > > > > + /* await any module's kfree_rcu() operations to complete */ >> > > > > > > kvfree_rcu_barrier(); >> > > > > > > >> > > > > > > mutex_lock(&codetag_lock); >> > > > > > > _ >> > > > > > > >> > > > > > > But I do wonder whether this is in the correct place. >> > > > > > > >> > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete >> > > > > > > should properly be done by the core module handling code? >> > > > > > >> > > > > > I don't think core module code cares about kfree_rcu()s being complete >> > > > > > before the module is unloaded. >> > > > > > Allocation tagging OTOH cares because it is about to destroy tags >> > > > > > which will be accessed when kfree() actually happens, therefore a >> > > > > > strict ordering is important. >> > > > > > >> > > > > > > >> > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the >> > > > > > > module memory itself and I think codetag_unload_module() could be >> > > > > > > called after that? >> > > > > > >> > > > > > I think we could move codetag_unload_module() after synchronize_rcu() >> > > > > > inside free_module() but according to the reply in >> > > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ >> > > > > > synchronize_rcu() does not help. I'm not quite sure why... >> > > > > > >> > > > > It is because, synchronize_rcu() is used for a bit different things, >> > > > > i.e. it is about a GP completion. Offloading objects can span several >> > > > > GPs. >> > > > >> > > > Ah, thanks! Now that I looked into the patch that recently added >> > > > kvfree_rcu_barrier() I understand that a bit better. >> > > > >> > > > > >> > > > > > Note that once I'm done upstreaming >> > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, >> > > > > > this change will not be needed and I'm planning to remove this call, >> > > > > > however this change is useful for backporting. It should be sent to >> > > > > > stable@vger.kernel.org # v6.10+ >> > > > > > >> > > > > The kvfree_rcu_barrier() has been added into v6.12: >> > > > > >> > > > > <snip> >> > > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 >> > > > > next-20240912 >> > > > > next-20240919 >> > > > > next-20240920 >> > > > > next-20241002 >> > > > > v6.12-rc1 >> > > > > urezki@pc638:~/data/raid0/coding/linux.git$ >> > > > > <snip> >> > > > > >> > > > > For 6.10+, it implies that the mentioned commit should be backported also. >> > > > >> > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of >> > > > kvfree_rcu_barrier()? >> > > > >> > > For kernels < 6.12, unfortunately not :) If you have a possibility to >> > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier() >> > > which waits for all callbacks to be completed. >> > >> > We do not control the call-site inside the module, so this would not >> > be possible. >> > >> > > >> > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there >> > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX. >> > >> > That sounds better. I'll take a stab at it. Thanks! > > I prepared backports for stable 6.10.y and 6.11.y branches which 6.10 is already EOL thus won't be necessary. > contain this patch along with the prerequisite "rcu/kvfree: Add > kvfree_rcu_barrier() API" patch. I think that's the only prerequisite > required. However, since this patch is not yet in Linus' tree, I'll > wait for it to get there before sending backports to stable (per Greg > KH's recommendation). Please let me know if you think it's more > critical and should be posted earlier. > Thanks, > Suren. > >> > >> You are welcome! >> >> -- >> Uladzislau Rezki >>
On Tue, Oct 15, 2024 at 09:22:55AM -0700, Suren Baghdasaryan wrote: > On Wed, Oct 9, 2024 at 1:36 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > On Tue, Oct 08, 2024 at 11:34:10AM -0700, Suren Baghdasaryan wrote: > > > On Tue, Oct 8, 2024 at 6:07 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > > > On Tue, Oct 08, 2024 at 04:16:39AM -0700, Suren Baghdasaryan wrote: > > > > > On Tue, Oct 8, 2024 at 12:15 AM Uladzislau Rezki <urezki@gmail.com> wrote: > > > > > > > > > > > > On Mon, Oct 07, 2024 at 06:49:32PM -0700, Suren Baghdasaryan wrote: > > > > > > > On Mon, Oct 7, 2024 at 6:15 PM Andrew Morton <akpm@linux-foundation.org> wrote: > > > > > > > > > > > > > > > > On Mon, 7 Oct 2024 22:52:24 +0200 Florian Westphal <fw@strlen.de> wrote: > > > > > > > > > > > > > > > > > Ben Greear reports following splat: > > > > > > > > > ------------[ cut here ]------------ > > > > > > > > > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > > > > > > > > > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > > > > > > > > > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > > > > > > > > > ... > > > > > > > > > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > > > > > > > > > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > > > > > > > > > codetag_unload_module+0x19b/0x2a0 > > > > > > > > > ? codetag_load_module+0x80/0x80 > > > > > > > > > > > > > > > > > > nf_nat module exit calls kfree_rcu on those addresses, but the free > > > > > > > > > operation is likely still pending by the time alloc_tag checks for leaks. > > > > > > > > > > > > > > > > > > Wait for outstanding kfree_rcu operations to complete before checking > > > > > > > > > resolves this warning. > > > > > > > > > > > > > > > > > > Reproducer: > > > > > > > > > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > > > > > > > > > grep nf_nat /proc/allocinfo # will list 4 allocations > > > > > > > > > rmmod nft_chain_nat > > > > > > > > > rmmod nf_nat # will WARN. > > > > > > > > > > > > > > > > > > ... > > > > > > > > > > > > > > > > > > --- a/lib/codetag.c > > > > > > > > > +++ b/lib/codetag.c > > > > > > > > > @@ -228,6 +228,8 @@ bool codetag_unload_module(struct module *mod) > > > > > > > > > if (!mod) > > > > > > > > > return true; > > > > > > > > > > > > > > > > > > + kvfree_rcu_barrier(); > > > > > > > > > + > > > > > > > > > mutex_lock(&codetag_lock); > > > > > > > > > list_for_each_entry(cttype, &codetag_types, link) { > > > > > > > > > struct codetag_module *found = NULL; > > > > > > > > > > > > > > > > It's always hard to determine why a thing like this is present, so a > > > > > > > > comment is helpful: > > > > > > > > > > > > > > > > --- a/lib/codetag.c~lib-alloc_tag_module_unload-must-wait-for-pending-kfree_rcu-calls-fix > > > > > > > > +++ a/lib/codetag.c > > > > > > > > @@ -228,6 +228,7 @@ bool codetag_unload_module(struct module > > > > > > > > if (!mod) > > > > > > > > return true; > > > > > > > > > > > > > > > > + /* await any module's kfree_rcu() operations to complete */ > > > > > > > > kvfree_rcu_barrier(); > > > > > > > > > > > > > > > > mutex_lock(&codetag_lock); > > > > > > > > _ > > > > > > > > > > > > > > > > But I do wonder whether this is in the correct place. > > > > > > > > > > > > > > > > Waiting for a module's ->exit() function's kfree_rcu()s to complete > > > > > > > > should properly be done by the core module handling code? > > > > > > > > > > > > > > I don't think core module code cares about kfree_rcu()s being complete > > > > > > > before the module is unloaded. > > > > > > > Allocation tagging OTOH cares because it is about to destroy tags > > > > > > > which will be accessed when kfree() actually happens, therefore a > > > > > > > strict ordering is important. > > > > > > > > > > > > > > > > > > > > > > > free_module() does a full-on synchronize_rcu() prior to freeing the > > > > > > > > module memory itself and I think codetag_unload_module() could be > > > > > > > > called after that? > > > > > > > > > > > > > > I think we could move codetag_unload_module() after synchronize_rcu() > > > > > > > inside free_module() but according to the reply in > > > > > > > https://lore.kernel.org/all/20241007112904.GA27104@breakpoint.cc/ > > > > > > > synchronize_rcu() does not help. I'm not quite sure why... > > > > > > > > > > > > > It is because, synchronize_rcu() is used for a bit different things, > > > > > > i.e. it is about a GP completion. Offloading objects can span several > > > > > > GPs. > > > > > > > > > > Ah, thanks! Now that I looked into the patch that recently added > > > > > kvfree_rcu_barrier() I understand that a bit better. > > > > > > > > > > > > > > > > > > Note that once I'm done upstreaming > > > > > > > https://lore.kernel.org/all/20240902044128.664075-3-surenb@google.com/, > > > > > > > this change will not be needed and I'm planning to remove this call, > > > > > > > however this change is useful for backporting. It should be sent to > > > > > > > stable@vger.kernel.org # v6.10+ > > > > > > > > > > > > > The kvfree_rcu_barrier() has been added into v6.12: > > > > > > > > > > > > <snip> > > > > > > urezki@pc638:~/data/raid0/coding/linux.git$ git tag --contains 2b55d6a42d14c8675e38d6d9adca3014fdf01951 > > > > > > next-20240912 > > > > > > next-20240919 > > > > > > next-20240920 > > > > > > next-20241002 > > > > > > v6.12-rc1 > > > > > > urezki@pc638:~/data/raid0/coding/linux.git$ > > > > > > <snip> > > > > > > > > > > > > For 6.10+, it implies that the mentioned commit should be backported also. > > > > > > > > > > I see. I guess for pre-6.12 we would use rcu_barrier() instead of > > > > > kvfree_rcu_barrier()? > > > > > > > > > For kernels < 6.12, unfortunately not :) If you have a possibility to > > > > switch to reclaim over call_rcu() API, then you can go with rcu_barrier() > > > > which waits for all callbacks to be completed. > > > > > > We do not control the call-site inside the module, so this would not > > > be possible. > > > > > > > > > > > Or backport kvfree_rcu_barrier(). It __should__ be easy since there > > > > were not many changes into kvfree_rcu() between 6.10 and 6.12-rcX. > > > > > > That sounds better. I'll take a stab at it. Thanks! > > I prepared backports for stable 6.10.y and 6.11.y branches which > contain this patch along with the prerequisite "rcu/kvfree: Add > kvfree_rcu_barrier() API" patch. I think that's the only prerequisite > required. However, since this patch is not yet in Linus' tree, I'll > wait for it to get there before sending backports to stable (per Greg > KH's recommendation). Please let me know if you think it's more > critical and should be posted earlier. > To me it sounds reasonable. -- Uladzislau Rezki
© 2016 - 2024 Red Hat, Inc.