[PATCH v2 3/3] memcg: no irq disable for memcg stock lock

[PATCH v2 3/3] memcg: no irq disable for memcg stock lock
Posted by Shakeel Butt 9 months, 1 week ago
There is no need to disable irqs to use the memcg per-cpu stock, so
let's just not do that. One consequence of this change is that if a
task holds the memcg stock lock and its CPU gets interrupted, memcg
charges made from irq context on that CPU will take the slow path of
memcg charging. However, that should be super rare and is fine in
general.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/memcontrol.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index cd81c70d144b..f8b9c7aa6771 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1858,7 +1858,6 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
 {
 	struct memcg_stock_pcp *stock;
 	uint8_t stock_pages;
-	unsigned long flags;
 	bool ret = false;
 	int i;
 
@@ -1866,8 +1865,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
 		return ret;
 
 	if (gfpflags_allow_spinning(gfp_mask))
-		local_lock_irqsave(&memcg_stock.lock, flags);
-	else if (!local_trylock_irqsave(&memcg_stock.lock, flags))
+		local_lock(&memcg_stock.lock);
+	else if (!local_trylock(&memcg_stock.lock))
 		return ret;
 
 	stock = this_cpu_ptr(&memcg_stock);
@@ -1884,7 +1883,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
 		break;
 	}
 
-	local_unlock_irqrestore(&memcg_stock.lock, flags);
+	local_unlock(&memcg_stock.lock);
 
 	return ret;
 }
@@ -1928,18 +1927,17 @@ static void drain_stock_fully(struct memcg_stock_pcp *stock)
 static void drain_local_memcg_stock(struct work_struct *dummy)
 {
 	struct memcg_stock_pcp *stock;
-	unsigned long flags;
 
 	if (WARN_ONCE(!in_task(), "drain in non-task context"))
 		return;
 
-	local_lock_irqsave(&memcg_stock.lock, flags);
+	local_lock(&memcg_stock.lock);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	drain_stock_fully(stock);
 	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
 
-	local_unlock_irqrestore(&memcg_stock.lock, flags);
+	local_unlock(&memcg_stock.lock);
 }
 
 static void drain_local_obj_stock(struct work_struct *dummy)
@@ -1964,7 +1962,6 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	struct memcg_stock_pcp *stock;
 	struct mem_cgroup *cached;
 	uint8_t stock_pages;
-	unsigned long flags;
 	bool success = false;
 	int empty_slot = -1;
 	int i;
@@ -1979,7 +1976,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	VM_WARN_ON_ONCE(mem_cgroup_is_root(memcg));
 
 	if (nr_pages > MEMCG_CHARGE_BATCH ||
-	    !local_trylock_irqsave(&memcg_stock.lock, flags)) {
+	    !local_trylock(&memcg_stock.lock)) {
 		/*
 		 * In case of larger than batch refill or unlikely failure to
 		 * lock the percpu memcg_stock.lock, uncharge memcg directly.
@@ -2014,7 +2011,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 		WRITE_ONCE(stock->nr_pages[i], nr_pages);
 	}
 
-	local_unlock_irqrestore(&memcg_stock.lock, flags);
+	local_unlock(&memcg_stock.lock);
 }
 
 static bool is_memcg_drain_needed(struct memcg_stock_pcp *stock,
-- 
2.47.1
Re: [PATCH v2 3/3] memcg: no irq disable for memcg stock lock
Posted by Alexei Starovoitov 9 months, 1 week ago
On Thu, May 1, 2025 at 5:18 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> There is no need to disable irqs to use memcg per-cpu stock, so let's
> just not do that. One consequence of this change is if the kernel while
> in task context has the memcg stock lock and that cpu got interrupted.
> The memcg charges on that cpu in the irq context will take the slow path
> of memcg charging. However that should be super rare and should be fine
> in general.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/memcontrol.c | 17 +++++++----------
>  1 file changed, 7 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index cd81c70d144b..f8b9c7aa6771 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1858,7 +1858,6 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
>  {
>         struct memcg_stock_pcp *stock;
>         uint8_t stock_pages;
> -       unsigned long flags;
>         bool ret = false;
>         int i;
>
> @@ -1866,8 +1865,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
>                 return ret;
>
>         if (gfpflags_allow_spinning(gfp_mask))
> -               local_lock_irqsave(&memcg_stock.lock, flags);
> -       else if (!local_trylock_irqsave(&memcg_stock.lock, flags))
> +               local_lock(&memcg_stock.lock);
> +       else if (!local_trylock(&memcg_stock.lock))
>                 return ret;

I don't think it works.
When there is a normal irq and something doing regular GFP_NOWAIT
allocation gfpflags_allow_spinning() will be true and
local_lock() will reenter and complain that lock->acquired is
already set... but only with lockdep on.
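
Roughly, the sequence being described (a sketch against the consume_stock()
from this patch; the call sites and gfp flags are just examples):

  task context (e.g. a GFP_KERNEL charge):
    consume_stock(memcg, nr_pages, GFP_KERNEL)
      gfpflags_allow_spinning() -> true
      local_lock(&memcg_stock.lock)     /* lock->acquired gets set */
      ... hard irq arrives on this cpu ...

  irq context (e.g. a GFP_NOWAIT charge):
    consume_stock(memcg, nr_pages, GFP_NOWAIT)
      gfpflags_allow_spinning() -> also true
      local_lock(&memcg_stock.lock)     /* reenters on the same cpu;
                                           lockdep: acquired already set */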
Re: [PATCH v2 3/3] memcg: no irq disable for memcg stock lock
Posted by Shakeel Butt 9 months, 1 week ago
On Fri, May 2, 2025 at 11:29 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, May 1, 2025 at 5:18 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > There is no need to disable irqs to use memcg per-cpu stock, so let's
> > just not do that. One consequence of this change is if the kernel while
> > in task context has the memcg stock lock and that cpu got interrupted.
> > The memcg charges on that cpu in the irq context will take the slow path
> > of memcg charging. However that should be super rare and should be fine
> > in general.
> >
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > ---
> >  mm/memcontrol.c | 17 +++++++----------
> >  1 file changed, 7 insertions(+), 10 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index cd81c70d144b..f8b9c7aa6771 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1858,7 +1858,6 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> >  {
> >         struct memcg_stock_pcp *stock;
> >         uint8_t stock_pages;
> > -       unsigned long flags;
> >         bool ret = false;
> >         int i;
> >
> > @@ -1866,8 +1865,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> >                 return ret;
> >
> >         if (gfpflags_allow_spinning(gfp_mask))
> > -               local_lock_irqsave(&memcg_stock.lock, flags);
> > -       else if (!local_trylock_irqsave(&memcg_stock.lock, flags))
> > +               local_lock(&memcg_stock.lock);
> > +       else if (!local_trylock(&memcg_stock.lock))
> >                 return ret;
>
> I don't think it works.
> When there is a normal irq and something doing regular GFP_NOWAIT
> allocation gfpflags_allow_spinning() will be true and
> local_lock() will reenter and complain that lock->acquired is
> already set... but only with lockdep on.

Yes indeed. I dropped the first patch and didn't fix this one
accordingly. I think the fix can be as simple as checking for
in_task() here instead of gfp_mask. That should work for both RT and
non-RT kernels.
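
Roughly (a sketch of the idea against the consume_stock() above, untested):

        /*
         * Only task context may spin on the per-cpu lock; irq and
         * softirq context take the trylock path, so the reentrancy
         * lockdep complains about above cannot happen.
         */
        if (in_task())
                local_lock(&memcg_stock.lock);
        else if (!local_trylock(&memcg_stock.lock))
                return ret;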
Re: [PATCH v2 3/3] memcg: no irq disable for memcg stock lock
Posted by Vlastimil Babka 9 months, 1 week ago
On 5/3/25 01:03, Shakeel Butt wrote:
>> > index cd81c70d144b..f8b9c7aa6771 100644
>> > --- a/mm/memcontrol.c
>> > +++ b/mm/memcontrol.c
>> > @@ -1858,7 +1858,6 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
>> >  {
>> >         struct memcg_stock_pcp *stock;
>> >         uint8_t stock_pages;
>> > -       unsigned long flags;
>> >         bool ret = false;
>> >         int i;
>> >
>> > @@ -1866,8 +1865,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
>> >                 return ret;
>> >
>> >         if (gfpflags_allow_spinning(gfp_mask))
>> > -               local_lock_irqsave(&memcg_stock.lock, flags);
>> > -       else if (!local_trylock_irqsave(&memcg_stock.lock, flags))
>> > +               local_lock(&memcg_stock.lock);
>> > +       else if (!local_trylock(&memcg_stock.lock))
>> >                 return ret;
>>
>> I don't think it works.
>> When there is a normal irq and something doing regular GFP_NOWAIT
>> allocation gfpflags_allow_spinning() will be true and
>> local_lock() will reenter and complain that lock->acquired is
>> already set... but only with lockdep on.
> 
> Yes indeed. I dropped the first patch and didn't fix this one
> accordingly. I think the fix can be as simple as checking for
> in_task() here instead of gfp_mask. That should work for both RT and
> non-RT kernels.

These in_task() checks seem hacky to me. I think patch 1 in v1 was the
correct way to use local_trylock() to avoid these.

As for the RT concerns, AFAIK RT isn't about being fast, but about being
preemptible, and the v1 approach didn't violate that - taking the slowpaths
more often shouldn't be an issue.

Let me quote Shakeel's scenario from the v1 thread:

> I didn't really think too much about PREEMPT_RT kernels as I assume
> performance is not top priority but I think I get your point. Let me

Agreed.

> explain and correct me if I am wrong. On PREEMPT_RT kernel, the local
> lock is a spin lock which is actually a mutex but with priority
> inheritance. A task having the local lock can still get context switched

Let's say (seems implied already) this is a low prio task.

> (but will remain on same CPU run queue) and the newer task can try to

And this is a high prio task.

> acquire the memcg stock local lock. If we just do trylock, it will
> always go to the slow path but if we do local_lock() then it will sleeps
> and possibly gives its priority to the task owning the lock and possibly
> make that task to get the CPU. Later the task slept on memcg stock lock
> will wake up and go through fast path.

I think from an RT latency perspective it could very much be better for the
high prio task to just skip the fast path and go for the slowpath, instead of
going to sleep while boosting the low prio task to let the high prio task
use the fast path later. It's not really a fast path anymore, I'd say.
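
Spelled out (a sketch of the two options on PREEMPT_RT, where the local_lock
is a per-cpu rt_spin_lock and both tasks run on the same cpu):

  with local_lock() in the charge path:
    low prio task:   local_lock(&memcg_stock.lock)
                     ... preempted before unlocking ...
    high prio task:  local_lock(&memcg_stock.lock)
                     -> sleeps, boosts the low prio owner (PI),
                        waits for it to run and unlock, then takes
                        the "fast" path

  with local_trylock() in the charge path:
    high prio task:  local_trylock(&memcg_stock.lock) fails
                     -> charges via the slow path right away,
                        no sleeping, no boosting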
Re: [PATCH v2 3/3] memcg: no irq disable for memcg stock lock
Posted by Shakeel Butt 9 months, 1 week ago
Ccing networking folks.

Background: https://lore.kernel.org/dvyyqubghf67b3qsuoreegqk4qnuuqfkk7plpfhhrck5yeeuic@xbn4c6c7yc42/

On Mon, May 05, 2025 at 12:28:43PM +0200, Vlastimil Babka wrote:
> On 5/3/25 01:03, Shakeel Butt wrote:
> >> > index cd81c70d144b..f8b9c7aa6771 100644
> >> > --- a/mm/memcontrol.c
> >> > +++ b/mm/memcontrol.c
> >> > @@ -1858,7 +1858,6 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> >> >  {
> >> >         struct memcg_stock_pcp *stock;
> >> >         uint8_t stock_pages;
> >> > -       unsigned long flags;
> >> >         bool ret = false;
> >> >         int i;
> >> >
> >> > @@ -1866,8 +1865,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> >> >                 return ret;
> >> >
> >> >         if (gfpflags_allow_spinning(gfp_mask))
> >> > -               local_lock_irqsave(&memcg_stock.lock, flags);
> >> > -       else if (!local_trylock_irqsave(&memcg_stock.lock, flags))
> >> > +               local_lock(&memcg_stock.lock);
> >> > +       else if (!local_trylock(&memcg_stock.lock))
> >> >                 return ret;
> >>
> >> I don't think it works.
> >> When there is a normal irq and something doing regular GFP_NOWAIT
> >> allocation gfpflags_allow_spinning() will be true and
> >> local_lock() will reenter and complain that lock->acquired is
> >> already set... but only with lockdep on.
> > 
> > Yes indeed. I dropped the first patch and didn't fix this one
> > accordingly. I think the fix can be as simple as checking for
> > in_task() here instead of gfp_mask. That should work for both RT and
> > non-RT kernels.
> 
> These in_task() checks seem hacky to me. I think the patch 1 in v1 was the
> correct way how to use the local_trylock() to avoid these.
> 
> As for the RT concerns, AFAIK RT isn't about being fast, but about being
> preemptible, and the v1 approach didn't violate that - taking the slowpaths
> more often shouldn't be an issue.
> 
> Let me quote Shakeel's scenario from the v1 thread:
> 
> > I didn't really think too much about PREEMPT_RT kernels as I assume
> > performance is not top priority but I think I get your point. Let me
> 
> Agreed.
> 
> > explain and correct me if I am wrong. On PREEMPT_RT kernel, the local
> > lock is a spin lock which is actually a mutex but with priority
> > inheritance. A task having the local lock can still get context switched
> 
> Let's say (seems implied already) this is a low prio task.
> 
> > (but will remain on same CPU run queue) and the newer task can try to
> 
> And this is a high prio task.
> 
> > acquire the memcg stock local lock. If we just do trylock, it will
> > always go to the slow path but if we do local_lock() then it will sleeps
> > and possibly gives its priority to the task owning the lock and possibly
> > make that task to get the CPU. Later the task slept on memcg stock lock
> > will wake up and go through fast path.
> 
> I think from RT latency perspective it could very much be better for the
> high prio task just skip the fast path and go for the slowpath, instead of
> going to sleep while boosting the low prio task to let the high prio task
> use the fast path later. It's not really a fast path anymore I'd say.

Thanks Vlastimil, this is actually a very good point. The slow path of memcg
charging is a couple of atomic operations, while the alternative here is at
least two context switches (and possibly scheduler delay). So, it does
not seem like a fast path anymore.

I have cc'ed networking folks to get their take as well. Orthogonally I
will do some netperf benchmarking on v1 with RT kernel.
Re: [PATCH v2 3/3] memcg: no irq disable for memcg stock lock
Posted by Shakeel Butt 9 months, 1 week ago
On Mon, May 05, 2025 at 10:13:37AM -0700, Shakeel Butt wrote:
> Ccing networking folks.
> 
> Background: https://lore.kernel.org/dvyyqubghf67b3qsuoreegqk4qnuuqfkk7plpfhhrck5yeeuic@xbn4c6c7yc42/
> 
> On Mon, May 05, 2025 at 12:28:43PM +0200, Vlastimil Babka wrote:
> > On 5/3/25 01:03, Shakeel Butt wrote:
> > >> > index cd81c70d144b..f8b9c7aa6771 100644
> > >> > --- a/mm/memcontrol.c
> > >> > +++ b/mm/memcontrol.c
> > >> > @@ -1858,7 +1858,6 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> > >> >  {
> > >> >         struct memcg_stock_pcp *stock;
> > >> >         uint8_t stock_pages;
> > >> > -       unsigned long flags;
> > >> >         bool ret = false;
> > >> >         int i;
> > >> >
> > >> > @@ -1866,8 +1865,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> > >> >                 return ret;
> > >> >
> > >> >         if (gfpflags_allow_spinning(gfp_mask))
> > >> > -               local_lock_irqsave(&memcg_stock.lock, flags);
> > >> > -       else if (!local_trylock_irqsave(&memcg_stock.lock, flags))
> > >> > +               local_lock(&memcg_stock.lock);
> > >> > +       else if (!local_trylock(&memcg_stock.lock))
> > >> >                 return ret;
> > >>
> > >> I don't think it works.
> > >> When there is a normal irq and something doing regular GFP_NOWAIT
> > >> allocation gfpflags_allow_spinning() will be true and
> > >> local_lock() will reenter and complain that lock->acquired is
> > >> already set... but only with lockdep on.
> > > 
> > > Yes indeed. I dropped the first patch and didn't fix this one
> > > accordingly. I think the fix can be as simple as checking for
> > > in_task() here instead of gfp_mask. That should work for both RT and
> > > non-RT kernels.
> > 
> > These in_task() checks seem hacky to me. I think the patch 1 in v1 was the
> > correct way how to use the local_trylock() to avoid these.
> > 
> > As for the RT concerns, AFAIK RT isn't about being fast, but about being
> > preemptible, and the v1 approach didn't violate that - taking the slowpaths
> > more often shouldn't be an issue.
> > 
> > Let me quote Shakeel's scenario from the v1 thread:
> > 
> > > I didn't really think too much about PREEMPT_RT kernels as I assume
> > > performance is not top priority but I think I get your point. Let me
> > 
> > Agreed.
> > 
> > > explain and correct me if I am wrong. On PREEMPT_RT kernel, the local
> > > lock is a spin lock which is actually a mutex but with priority
> > > inheritance. A task having the local lock can still get context switched
> > 
> > Let's say (seems implied already) this is a low prio task.
> > 
> > > (but will remain on same CPU run queue) and the newer task can try to
> > 
> > And this is a high prio task.
> > 
> > > acquire the memcg stock local lock. If we just do trylock, it will
> > > always go to the slow path but if we do local_lock() then it will sleeps
> > > and possibly gives its priority to the task owning the lock and possibly
> > > make that task to get the CPU. Later the task slept on memcg stock lock
> > > will wake up and go through fast path.
> > 
> > I think from RT latency perspective it could very much be better for the
> > high prio task just skip the fast path and go for the slowpath, instead of
> > going to sleep while boosting the low prio task to let the high prio task
> > use the fast path later. It's not really a fast path anymore I'd say.
> 
> Thanks Vlastimil, this is actually a very good point. Slow path of memcg
> charging is couple of atomic operations while the alternative here is at
> least two context switches (and possibly scheduler delay). So, it does
> not seem like a fast path anymore.
> 
> I have cc'ed networking folks to get their take as well. Orthogonally I
> will do some netperf benchmarking on v1 with RT kernel.

Let me share the results on next-20250505 with the PREEMPT_RT config, with
and without v1 of this series.

I ran a varying number of netperf clients in different cgroups on a 72 CPU
machine.

 $ netserver -6
 $ netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K

number of clients | Without series | With series
  6               | 38559.1 Mbps   | 38652.6 Mbps
  12              | 37388.8 Mbps   | 37560.1 Mbps
  18              | 30707.5 Mbps   | 31378.3 Mbps
  24              | 25908.4 Mbps   | 26423.9 Mbps
  30              | 22347.7 Mbps   | 22326.5 Mbps
  36              | 20235.1 Mbps   | 20165.0 Mbps

I don't see any significant performance difference for the network
intensive workload with this series.

I am going to send out v3, which will be a rebased version of v1 with all
these details, unless someone has concerns about this.
Re: [PATCH v2 3/3] memcg: no irq disable for memcg stock lock
Posted by Alexei Starovoitov 9 months, 1 week ago
On Fri, May 2, 2025 at 4:03 PM Shakeel Butt <shakeel.butt@gmail.com> wrote:
>
> On Fri, May 2, 2025 at 11:29 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, May 1, 2025 at 5:18 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > >
> > > There is no need to disable irqs to use memcg per-cpu stock, so let's
> > > just not do that. One consequence of this change is if the kernel while
> > > in task context has the memcg stock lock and that cpu got interrupted.
> > > The memcg charges on that cpu in the irq context will take the slow path
> > > of memcg charging. However that should be super rare and should be fine
> > > in general.
> > >
> > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > > ---
> > >  mm/memcontrol.c | 17 +++++++----------
> > >  1 file changed, 7 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index cd81c70d144b..f8b9c7aa6771 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -1858,7 +1858,6 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> > >  {
> > >         struct memcg_stock_pcp *stock;
> > >         uint8_t stock_pages;
> > > -       unsigned long flags;
> > >         bool ret = false;
> > >         int i;
> > >
> > > @@ -1866,8 +1865,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> > >                 return ret;
> > >
> > >         if (gfpflags_allow_spinning(gfp_mask))
> > > -               local_lock_irqsave(&memcg_stock.lock, flags);
> > > -       else if (!local_trylock_irqsave(&memcg_stock.lock, flags))
> > > +               local_lock(&memcg_stock.lock);
> > > +       else if (!local_trylock(&memcg_stock.lock))
> > >                 return ret;
> >
> > I don't think it works.
> > When there is a normal irq and something doing regular GFP_NOWAIT
> > allocation gfpflags_allow_spinning() will be true and
> > local_lock() will reenter and complain that lock->acquired is
> > already set... but only with lockdep on.
>
> Yes indeed. I dropped the first patch and didn't fix this one
> accordingly. I think the fix can be as simple as checking for
> in_task() here instead of gfp_mask. That should work for both RT and
> non-RT kernels.

Like:
if (in_task())
  local_lock(...);
else if (!local_trylock(...))

Most of the networking runs in bh, so it will be using the
local_trylock() path, which is probably ok in !PREEMPT_RT,
but will cause random performance issues in PREEMPT_RT,
since rt_spin_trylock() will be randomly failing and taking
the slow path of charging. It's not going to cause a permanent
nginx 3x regression :), but unlucky slowdowns will be seen.
A task can grab that per-cpu rt_spin_lock and be preempted
by network processing.
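
In other words, on PREEMPT_RT (sketch, everything on the same cpu):

  task context:  local_lock(&memcg_stock.lock)   /* per-cpu rt_spin_lock */
                 ... preempted by network BH processing ...
  BH context:    charge path, !in_task() -> local_trylock()
                 -> fails while the preempted task still owns the lock
                 -> falls back to the charging slow path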
Re: [PATCH v2 3/3] memcg: no irq disable for memcg stock lock
Posted by Shakeel Butt 9 months, 1 week ago
On Fri, May 2, 2025 at 4:28 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
[...]
> > >
> > > I don't think it works.
> > > When there is a normal irq and something doing regular GFP_NOWAIT
> > > allocation gfpflags_allow_spinning() will be true and
> > > local_lock() will reenter and complain that lock->acquired is
> > > already set... but only with lockdep on.
> >
> > Yes indeed. I dropped the first patch and didn't fix this one
> > accordingly. I think the fix can be as simple as checking for
> > in_task() here instead of gfp_mask. That should work for both RT and
> > non-RT kernels.
>
> Like:
> if (in_task())
>   local_lock(...);
> else if (!local_trylock(...))
>
> Most of the networking runs in bh, so it will be using
> local_trylock() path which is probably ok in !PREEMPT_RT,
> but will cause random performance issues in PREEMP_RT,
> since rt_spin_trylock() will be randomly failing and taking
> slow path of charging. It's not going to cause permanent
> nginx 3x regression :), but unlucky slowdowns will be seen.
> A task can grab that per-cpu rt_spin_lock and preempted
> by network processing.

Does networking run in bh for PREEMPT_RT as well?

I think I should get the networking & RT folks' opinion on this one. I will
decouple this irq patch from the lock decoupling patches and start a
separate discussion thread.
Re: [PATCH v2 3/3] memcg: no irq disable for memcg stock lock
Posted by Sebastian Andrzej Siewior 9 months, 1 week ago
On 2025-05-02 16:40:53 [-0700], Shakeel Butt wrote:
> On Fri, May 2, 2025 at 4:28 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> [...]
> > > >
> > > > I don't think it works.
> > > > When there is a normal irq and something doing regular GFP_NOWAIT
> > > > allocation gfpflags_allow_spinning() will be true and
> > > > local_lock() will reenter and complain that lock->acquired is
> > > > already set... but only with lockdep on.
> > >
> > > Yes indeed. I dropped the first patch and didn't fix this one
> > > accordingly. I think the fix can be as simple as checking for
> > > in_task() here instead of gfp_mask. That should work for both RT and
> > > non-RT kernels.
> >
> > Like:
> > if (in_task())
> >   local_lock(...);
> > else if (!local_trylock(...))
> >
> > Most of the networking runs in bh, so it will be using
> > local_trylock() path which is probably ok in !PREEMPT_RT,
> > but will cause random performance issues in PREEMP_RT,
> > since rt_spin_trylock() will be randomly failing and taking
> > slow path of charging. It's not going to cause permanent
> > nginx 3x regression :), but unlucky slowdowns will be seen.
> > A task can grab that per-cpu rt_spin_lock and preempted
> > by network processing.
> 
> Does networking run in bh for PREEMPT_RT as well?

It does but BH is preemptible.

> I think I should get networking & RT folks opinion on this one. I will
> decouple this irq patch from the decoupling lock patches and start a
> separate discussion thread.

Sebastian