[PATCH 1/4] memcg: simplify consume_stock

Posted by Shakeel Butt 9 months, 1 week ago
consume_stock() does not need to check gfp_mask to decide whether it may
spin; it can simply trylock the local lock to decide whether to proceed
or fail. There is no need to spin on the local lock at all.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 650fe4314c39..40d0838d88bc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1804,16 +1804,14 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
  * consume_stock: Try to consume stocked charge on this cpu.
  * @memcg: memcg to consume from.
  * @nr_pages: how many pages to charge.
- * @gfp_mask: allocation mask.
  *
- * The charges will only happen if @memcg matches the current cpu's memcg
- * stock, and at least @nr_pages are available in that stock.  Failure to
- * service an allocation will refill the stock.
+ * Consume the cached charge if enough nr_pages are present otherwise return
+ * failure. Also return failure for charge request larger than
+ * MEMCG_CHARGE_BATCH or if the local lock is already taken.
  *
  * returns true if successful, false otherwise.
  */
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
-			  gfp_t gfp_mask)
+static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	struct memcg_stock_pcp *stock;
 	uint8_t stock_pages;
@@ -1821,12 +1819,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
 	bool ret = false;
 	int i;
 
-	if (nr_pages > MEMCG_CHARGE_BATCH)
-		return ret;
-
-	if (gfpflags_allow_spinning(gfp_mask))
-		local_lock_irqsave(&memcg_stock.stock_lock, flags);
-	else if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags))
+	if (nr_pages > MEMCG_CHARGE_BATCH ||
+	    !local_trylock_irqsave(&memcg_stock.stock_lock, flags))
 		return ret;
 
 	stock = this_cpu_ptr(&memcg_stock);
@@ -2329,7 +2323,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	unsigned long pflags;
 
 retry:
-	if (consume_stock(memcg, nr_pages, gfp_mask))
+	if (consume_stock(memcg, nr_pages))
 		return 0;
 
 	if (!gfpflags_allow_spinning(gfp_mask))
-- 
2.47.1
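For readers skimming the thread, here is a condensed sketch of the fast path
as it looks after this patch, reassembled from the hunks above with the
per-CPU stock bookkeeping elided (a sketch, not a verbatim copy of
mm/memcontrol.c):

static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
{
	unsigned long flags;
	bool ret = false;

	/*
	 * Oversized requests, or *any* contention on the per-CPU lock (not
	 * just contention seen by callers that may not spin), now fall
	 * through to the slow path in try_charge_memcg().
	 */
	if (nr_pages > MEMCG_CHARGE_BATCH ||
	    !local_trylock_irqsave(&memcg_stock.stock_lock, flags))
		return ret;

	/* ... consume from this_cpu_ptr(&memcg_stock) if it caches @memcg ... */

	local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
	return ret;
}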
Re: [PATCH 1/4] memcg: simplify consume_stock
Posted by Alexei Starovoitov 9 months, 1 week ago
On Tue, Apr 29, 2025 at 04:04:25PM -0700, Shakeel Butt wrote:
> The consume_stock() does not need to check gfp_mask for spinning and can
> simply trylock the local lock to decide to proceed or fail. No need to
> spin at all for local lock.
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
>  mm/memcontrol.c | 20 +++++++-------------
>  1 file changed, 7 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 650fe4314c39..40d0838d88bc 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1804,16 +1804,14 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
>   * consume_stock: Try to consume stocked charge on this cpu.
>   * @memcg: memcg to consume from.
>   * @nr_pages: how many pages to charge.
> - * @gfp_mask: allocation mask.
>   *
> - * The charges will only happen if @memcg matches the current cpu's memcg
> - * stock, and at least @nr_pages are available in that stock.  Failure to
> - * service an allocation will refill the stock.
> + * Consume the cached charge if enough nr_pages are present otherwise return
> + * failure. Also return failure for charge request larger than
> + * MEMCG_CHARGE_BATCH or if the local lock is already taken.
>   *
>   * returns true if successful, false otherwise.
>   */
> -static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> -			  gfp_t gfp_mask)
> +static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>  {
>  	struct memcg_stock_pcp *stock;
>  	uint8_t stock_pages;
> @@ -1821,12 +1819,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
>  	bool ret = false;
>  	int i;
>  
> -	if (nr_pages > MEMCG_CHARGE_BATCH)
> -		return ret;
> -
> -	if (gfpflags_allow_spinning(gfp_mask))
> -		local_lock_irqsave(&memcg_stock.stock_lock, flags);
> -	else if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags))
> +	if (nr_pages > MEMCG_CHARGE_BATCH ||
> +	    !local_trylock_irqsave(&memcg_stock.stock_lock, flags))

I don't think it's a good idea.
spin_trylock() will fail often enough on PREEMPT_RT.
Even during a normal boot I see preemption between tasks, and they
contend on the same CPU for the same local_lock == spin_lock.
Making them take the slow path is a significant behavior change
that needs to be carefully considered.

Also please cc bpf@vger in the future for these kind of changes.
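For context, the pre-patch entry check that this objection defends is exactly
the code removed above; it is reassembled here with behaviour notes added for
this write-up (the comments are not from the kernel source):

	if (nr_pages > MEMCG_CHARGE_BATCH)
		return ret;

	if (gfpflags_allow_spinning(gfp_mask))
		/*
		 * Callers that are allowed to spin keep the fast path even if
		 * a preempted task on this CPU currently owns the lock; on
		 * PREEMPT_RT this may sleep and priority-boost that owner.
		 */
		local_lock_irqsave(&memcg_stock.stock_lock, flags);
	else if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags))
		/* Non-spinning callers punt to the slow path on contention. */
		return ret;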
Re: [PATCH 1/4] memcg: simplify consume_stock
Posted by Shakeel Butt 9 months, 1 week ago
On Tue, Apr 29, 2025 at 04:51:03PM -0700, Alexei Starovoitov wrote:
> On Tue, Apr 29, 2025 at 04:04:25PM -0700, Shakeel Butt wrote:
> > The consume_stock() does not need to check gfp_mask for spinning and can
> > simply trylock the local lock to decide to proceed or fail. No need to
> > spin at all for local lock.
> > 
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > ---
> >  mm/memcontrol.c | 20 +++++++-------------
> >  1 file changed, 7 insertions(+), 13 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 650fe4314c39..40d0838d88bc 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1804,16 +1804,14 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
> >   * consume_stock: Try to consume stocked charge on this cpu.
> >   * @memcg: memcg to consume from.
> >   * @nr_pages: how many pages to charge.
> > - * @gfp_mask: allocation mask.
> >   *
> > - * The charges will only happen if @memcg matches the current cpu's memcg
> > - * stock, and at least @nr_pages are available in that stock.  Failure to
> > - * service an allocation will refill the stock.
> > + * Consume the cached charge if enough nr_pages are present otherwise return
> > + * failure. Also return failure for charge request larger than
> > + * MEMCG_CHARGE_BATCH or if the local lock is already taken.
> >   *
> >   * returns true if successful, false otherwise.
> >   */
> > -static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> > -			  gfp_t gfp_mask)
> > +static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> >  {
> >  	struct memcg_stock_pcp *stock;
> >  	uint8_t stock_pages;
> > @@ -1821,12 +1819,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
> >  	bool ret = false;
> >  	int i;
> >  
> > -	if (nr_pages > MEMCG_CHARGE_BATCH)
> > -		return ret;
> > -
> > -	if (gfpflags_allow_spinning(gfp_mask))
> > -		local_lock_irqsave(&memcg_stock.stock_lock, flags);
> > -	else if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags))
> > +	if (nr_pages > MEMCG_CHARGE_BATCH ||
> > +	    !local_trylock_irqsave(&memcg_stock.stock_lock, flags))
> 
> I don't think it's a good idea.
> spin_trylock() will fail often enough in PREEMPT_RT.
> Even during normal boot I see preemption between tasks and they
> contend on the same cpu for the same local_lock==spin_lock.
> Making them take slow path is a significant behavior change
> that needs to be carefully considered.

I didn't really think too much about PREEMPT_RT kernels, as I assume
performance is not the top priority there, but I think I get your point.
Let me explain, and correct me if I am wrong. On a PREEMPT_RT kernel, the
local lock is a spin lock which is actually a mutex with priority
inheritance. A task holding the local lock can still get context switched
(but will remain on the same CPU run queue), and the newer task can try to
acquire the memcg stock local lock. If we just do trylock, it will always
go to the slow path, but if we do local_lock() then it will sleep,
possibly lend its priority to the task owning the lock, and possibly make
that task get the CPU. Later, the task that slept on the memcg stock lock
will wake up and go through the fast path.
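(A compressed timeline of that scenario; the task names and priorities are
purely illustrative and not from any kernel code:)

/*
 * PREEMPT_RT, both tasks runnable on CPU0:
 *
 *   taskA: local_lock_irqsave(&memcg_stock.stock_lock, flags);
 *          ... preempted by taskB while still holding the lock ...
 *
 *   taskB, with this patch:
 *          local_trylock_irqsave() -> fails -> always the slow path
 *
 *   taskB, without this patch (gfp_mask allows spinning):
 *          local_lock_irqsave()    -> blocks and lends its priority to
 *                                     taskA; taskA runs and unlocks;
 *                                     taskB then charges from the
 *                                     per-CPU stock (fast path).
 */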


Ok, I will drop the first patch. Please let me know your comments on the
remaining series.

> 
> Also please cc bpf@vger in the future for these kind of changes.

Sounds good.
Re: [PATCH 1/4] memcg: simplify consume_stock
Posted by Sebastian Andrzej Siewior 9 months, 1 week ago
On 2025-04-29 21:37:26 [-0700], Shakeel Butt wrote:
> > > -	if (gfpflags_allow_spinning(gfp_mask))
> > > -		local_lock_irqsave(&memcg_stock.stock_lock, flags);
> > > -	else if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags))
> > > +	if (nr_pages > MEMCG_CHARGE_BATCH ||
> > > +	    !local_trylock_irqsave(&memcg_stock.stock_lock, flags))
> > 
> > I don't think it's a good idea.
> > spin_trylock() will fail often enough in PREEMPT_RT.
> > Even during normal boot I see preemption between tasks and they
> > contend on the same cpu for the same local_lock==spin_lock.
> > Making them take slow path is a significant behavior change
> > that needs to be carefully considered.
> 
> I didn't really think too much about PREEMPT_RT kernels as I assume
> performance is not top priority but I think I get your point. Let me

Not sure if this is about performance or simply about failing to allocate memory.

> explain and correct me if I am wrong. On PREEMPT_RT kernel, the local
> lock is a spin lock which is actually a mutex but with priority
> inheritance. A task having the local lock can still get context switched
> (but will remain on same CPU run queue) and the newer task can try to
> acquire the memcg stock local lock. If we just do trylock, it will
> always go to the slow path but if we do local_lock() then it will sleeps
> and possibly gives its priority to the task owning the lock and possibly
> make that task to get the CPU. Later the task slept on memcg stock lock
> will wake up and go through fast path.

So far correct. On PREEMPT_RT a task holding a spinlock_t or local_lock_t can
get preempted while owning the lock. The local_lock_t is a per-CPU lock.

Sebastian
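
As a footnote to Sebastian's point, a minimal usage sketch of local_lock_t
outside of memcg (struct my_pcp and my_pcp_add() are hypothetical names
invented for this example): on !PREEMPT_RT the _irqsave variant disables
interrupts, so the owner cannot be preempted on its CPU; on PREEMPT_RT it
takes a per-CPU sleeping spinlock_t whose owner can be preempted while
holding it, which is the situation discussed in this thread.

#include <linux/local_lock.h>
#include <linux/percpu.h>

struct my_pcp {
	local_lock_t	lock;
	unsigned int	count;
};

static DEFINE_PER_CPU(struct my_pcp, my_pcp) = {
	.lock	= INIT_LOCAL_LOCK(lock),
};

/* Serialize all same-CPU users of the per-CPU data before touching it. */
static void my_pcp_add(unsigned int n)
{
	unsigned long flags;

	local_lock_irqsave(&my_pcp.lock, flags);
	this_cpu_add(my_pcp.count, n);
	local_unlock_irqrestore(&my_pcp.lock, flags);
}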