[PATCH 3/6] irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack

Dawei Li posted 6 patches 1 year, 10 months ago
There is a newer version of this series
[PATCH 3/6] irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack
Posted by Dawei Li 1 year, 10 months ago
In general it's preferable to avoid placing cpumasks on the stack, as
for large values of NR_CPUS these can consume significant amounts of
stack space and make stack overflows more likely.

Remove cpumask var on stack and use proper cpumask API to address it.

Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
---
 drivers/irqchip/irq-gic-v3-its.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index fca888b36680..a821396c4261 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3826,7 +3826,7 @@ static int its_vpe_set_affinity(struct irq_data *d,
 				bool force)
 {
 	struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
-	struct cpumask common, *table_mask;
+	struct cpumask *table_mask;
 	unsigned long flags;
 	int from, cpu;
 
@@ -3850,8 +3850,11 @@ static int its_vpe_set_affinity(struct irq_data *d,
 	 * If we are offered another CPU in the same GICv4.1 ITS
 	 * affinity, pick this one. Otherwise, any CPU will do.
 	 */
-	if (table_mask && cpumask_and(&common, mask_val, table_mask))
-		cpu = cpumask_test_cpu(from, &common) ? from : cpumask_first(&common);
+	if (table_mask && cpumask_intersects(mask_val, table_mask)) {
+		cpu = cpumask_test_cpu(from, mask_val) &&
+		      cpumask_test_cpu(from, table_mask) ?
+		      from : cpumask_first_and(mask_val, table_mask);
+	}
 	else
 		cpu = cpumask_first(mask_val);
 
-- 
2.27.0
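
The stack-usage concern in the commit message can be put in rough numbers. A struct cpumask is a bitmap with one bit per possible CPU, rounded up to whole unsigned longs. The sketch below is a userspace approximation of that sizing rule, not kernel code; cpumask_bytes() is a hypothetical helper name introduced only for illustration.

```c
#include <limits.h>
#include <stddef.h>

/* Bits in one unsigned long on the build machine. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: bytes occupied by a bitmap with one bit per CPU,
 * rounded up to a whole number of unsigned longs. This approximates the
 * footprint of one on-stack struct cpumask for a given NR_CPUS.
 */
static size_t cpumask_bytes(size_t nr_cpus)
{
	size_t longs = (nr_cpus + BITS_PER_LONG - 1) / BITS_PER_LONG;

	return longs * sizeof(unsigned long);
}
```

Under this model, cpumask_bytes(8192) is 1024, so a single temporary mask on a kernel built with NR_CPUS=8192 would take 1 KiB of a stack that is usually only a few pages in size.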
Re: [PATCH 3/6] irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack
Posted by Marc Zyngier 1 year, 10 months ago
On Fri, 12 Apr 2024 11:58:36 +0100,
Dawei Li <dawei.li@shingroup.cn> wrote:
> 
> In general it's preferable to avoid placing cpumasks on the stack, as
> for large values of NR_CPUS these can consume significant amounts of
> stack space and make stack overflows more likely.
>
> Remove cpumask var on stack and use proper cpumask API to address it.

Define proper. Or better, define what is "improper" about the current
usage.

>
> Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index fca888b36680..a821396c4261 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3826,7 +3826,7 @@ static int its_vpe_set_affinity(struct irq_data *d,
>  				bool force)
>  {
>  	struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
> -	struct cpumask common, *table_mask;
> +	struct cpumask *table_mask;
>  	unsigned long flags;
>  	int from, cpu;
>  
> @@ -3850,8 +3850,11 @@ static int its_vpe_set_affinity(struct irq_data *d,
>  	 * If we are offered another CPU in the same GICv4.1 ITS
>  	 * affinity, pick this one. Otherwise, any CPU will do.
>  	 */
> -	if (table_mask && cpumask_and(&common, mask_val, table_mask))
> -		cpu = cpumask_test_cpu(from, &common) ? from : cpumask_first(&common);
> +	if (table_mask && cpumask_intersects(mask_val, table_mask)) {
> +		cpu = cpumask_test_cpu(from, mask_val) &&
> +		      cpumask_test_cpu(from, table_mask) ?
> +		      from : cpumask_first_and(mask_val, table_mask);

So we may end up computing the AND of the two bitmaps twice (once for
cpumask_intersects(), once for cpumask_first_and()), instead of only
doing it once.

I don't expect that to be horrible, but I also note that you don't
even talk about the trade-offs you are choosing to make.
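
The cost being discussed here can be illustrated with a simplified userspace model. This is a sketch under assumptions, not the kernel's implementation: model_intersects() and model_first_and() are hypothetical stand-ins for cpumask_intersects() and cpumask_first_and(), and words_visited is instrumentation added only to count how many bitmap words each call ANDs.

```c
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Instrumentation only: number of word-wise ANDs performed so far. */
static size_t words_visited;

/* Model of an "intersects" test: stop at the first word with a common bit. */
static int model_intersects(const unsigned long *a, const unsigned long *b,
			    size_t nwords)
{
	for (size_t i = 0; i < nwords; i++) {
		words_visited++;
		if (a[i] & b[i])
			return 1;
	}
	return 0;
}

/*
 * Model of a "first common bit" search: returns the bit index, or
 * nwords * WORD_BITS when the two bitmaps share no bit.
 */
static size_t model_first_and(const unsigned long *a, const unsigned long *b,
			      size_t nwords)
{
	for (size_t i = 0; i < nwords; i++) {
		unsigned long w = a[i] & b[i];
		size_t bit = 0;

		words_visited++;
		if (!w)
			continue;
		while (!(w & 1UL)) {	/* find the lowest set bit */
			w >>= 1;
			bit++;
		}
		return i * WORD_BITS + bit;
	}
	return nwords * WORD_BITS;
}
```

In this model, calling model_intersects() and then model_first_and() walks the same leading words twice, while a single model_first_and() call yields both answers: an out-of-range result means the masks do not intersect.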

> +	}
>  	else
>  		cpu = cpumask_first(mask_val);

Please fix the coding style (if () { ... } else { ... }).

	M.

-- 
Without deviation from the norm, progress is not possible.
Re: [PATCH 3/6] irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack
Posted by Dawei Li 1 year, 10 months ago
Hi Marc,

Thanks for the review.

On Fri, Apr 12, 2024 at 02:53:32PM +0100, Marc Zyngier wrote:
> On Fri, 12 Apr 2024 11:58:36 +0100,
> Dawei Li <dawei.li@shingroup.cn> wrote:
> > 
> > In general it's preferable to avoid placing cpumasks on the stack, as
> > for large values of NR_CPUS these can consume significant amounts of
> > stack space and make stack overflows more likely.
> >
> > Remove cpumask var on stack and use proper cpumask API to address it.
> 
> Define proper. Or better, define what is "improper" about the current
> usage.

Sorry for the confusion.

I didn't mean that the current implementation is 'improper'; both
implementations use the cpumask API in equivalent ways. I will remove
this misleading expression from the commit message.

> 
> >
> > Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
> > ---
> >  drivers/irqchip/irq-gic-v3-its.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> > index fca888b36680..a821396c4261 100644
> > --- a/drivers/irqchip/irq-gic-v3-its.c
> > +++ b/drivers/irqchip/irq-gic-v3-its.c
> > @@ -3826,7 +3826,7 @@ static int its_vpe_set_affinity(struct irq_data *d,
> >  				bool force)
> >  {
> >  	struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
> > -	struct cpumask common, *table_mask;
> > +	struct cpumask *table_mask;
> >  	unsigned long flags;
> >  	int from, cpu;
> >  
> > @@ -3850,8 +3850,11 @@ static int its_vpe_set_affinity(struct irq_data *d,
> >  	 * If we are offered another CPU in the same GICv4.1 ITS
> >  	 * affinity, pick this one. Otherwise, any CPU will do.
> >  	 */
> > -	if (table_mask && cpumask_and(&common, mask_val, table_mask))
> > -		cpu = cpumask_test_cpu(from, &common) ? from : cpumask_first(&common);
> > +	if (table_mask && cpumask_intersects(mask_val, table_mask)) {
> > +		cpu = cpumask_test_cpu(from, mask_val) &&
> > +		      cpumask_test_cpu(from, table_mask) ?
> > +		      from : cpumask_first_and(mask_val, table_mask);
> 
> So we may end up computing the AND of the two bitmaps twice (once for
> cpumask_intersects(), once for cpumask_first_and()), instead of only
> doing it once.

Actually maybe it's possible to merge these 2 bitmap ops into one:

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index fca888b36680..7a267777bd0b 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3826,7 +3826,8 @@ static int its_vpe_set_affinity(struct irq_data *d,
                                bool force)
 {
        struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
-       struct cpumask common, *table_mask;
+       struct cpumask *table_mask;
+       unsigned int common;
        unsigned long flags;
        int from, cpu;

@@ -3850,10 +3851,13 @@ static int its_vpe_set_affinity(struct irq_data *d,
         * If we are offered another CPU in the same GICv4.1 ITS
         * affinity, pick this one. Otherwise, any CPU will do.
         */
-       if (table_mask && cpumask_and(&common, mask_val, table_mask))
-               cpu = cpumask_test_cpu(from, &common) ? from : cpumask_first(&common);
-       else
+       if (table_mask && (common = cpumask_first_and(mask_val, table_mask)) < nr_cpu_ids) {
+               cpu = cpumask_test_cpu(from, mask_val) &&
+                     cpumask_test_cpu(from, table_mask) ?
+                     from : common;
+       } else {
                cpu = cpumask_first(mask_val);
+       }

> 
> I don't expect that to be horrible, but I also note that you don't
> even talk about the trade-offs you are choosing to make.

With change above, I assume that the tradeoff is minor and can be ignored?

And I apologize if I am missing something.

> 
> > +	}
> >  	else
> >  		cpu = cpumask_first(mask_val);
> 
> Please fix the coding style (if () { ... } else { ... }).

Ack.


Thanks,

    Dawei

> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
>
Re: [PATCH 3/6] irqchip/gic-v3-its: Avoid explicit cpumask allocation on stack
Posted by Marc Zyngier 1 year, 10 months ago
On Sat, 13 Apr 2024 11:29:20 +0100,
Dawei Li <dawei.li@shingroup.cn> wrote:
> 
> Hi Marc,
> 
> Thanks for the review.
> 
> On Fri, Apr 12, 2024 at 02:53:32PM +0100, Marc Zyngier wrote:
> > On Fri, 12 Apr 2024 11:58:36 +0100,
> > Dawei Li <dawei.li@shingroup.cn> wrote:
> > > 
> > > In general it's preferable to avoid placing cpumasks on the stack, as
> > > for large values of NR_CPUS these can consume significant amounts of
> > > stack space and make stack overflows more likely.
> > >
> > > Remove cpumask var on stack and use proper cpumask API to address it.
> > 
> > Define proper. Or better, define what is "improper" about the current
> > usage.
> 
> Sorry for the confusion.
> 
> I didn't mean that the current implementation is 'improper'; both
> implementations use the cpumask API in equivalent ways. I will remove
> this misleading expression from the commit message.
> 
> > 
> > >
> > > Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
> > > ---
> > >  drivers/irqchip/irq-gic-v3-its.c | 9 ++++++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> > > index fca888b36680..a821396c4261 100644
> > > --- a/drivers/irqchip/irq-gic-v3-its.c
> > > +++ b/drivers/irqchip/irq-gic-v3-its.c
> > > @@ -3826,7 +3826,7 @@ static int its_vpe_set_affinity(struct irq_data *d,
> > >  				bool force)
> > >  {
> > >  	struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
> > > -	struct cpumask common, *table_mask;
> > > +	struct cpumask *table_mask;
> > >  	unsigned long flags;
> > >  	int from, cpu;
> > >  
> > > @@ -3850,8 +3850,11 @@ static int its_vpe_set_affinity(struct irq_data *d,
> > >  	 * If we are offered another CPU in the same GICv4.1 ITS
> > >  	 * affinity, pick this one. Otherwise, any CPU will do.
> > >  	 */
> > > -	if (table_mask && cpumask_and(&common, mask_val, table_mask))
> > > -		cpu = cpumask_test_cpu(from, &common) ? from : cpumask_first(&common);
> > > +	if (table_mask && cpumask_intersects(mask_val, table_mask)) {
> > > +		cpu = cpumask_test_cpu(from, mask_val) &&
> > > +		      cpumask_test_cpu(from, table_mask) ?
> > > +		      from : cpumask_first_and(mask_val, table_mask);
> > 
> > > So we may end up computing the AND of the two bitmaps twice (once for
> > cpumask_intersects(), once for cpumask_first_and()), instead of only
> > doing it once.
> 
> Actually maybe it's possible to merge these 2 bitmap ops into one:
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index fca888b36680..7a267777bd0b 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3826,7 +3826,8 @@ static int its_vpe_set_affinity(struct irq_data *d,
>                                 bool force)
>  {
>         struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
> -       struct cpumask common, *table_mask;
> +       struct cpumask *table_mask;
> +       unsigned int common;
>         unsigned long flags;
>         int from, cpu;
> 
> @@ -3850,10 +3851,13 @@ static int its_vpe_set_affinity(struct irq_data *d,
>          * If we are offered another CPU in the same GICv4.1 ITS
>          * affinity, pick this one. Otherwise, any CPU will do.
>          */
> -       if (table_mask && cpumask_and(&common, mask_val, table_mask))
> -               cpu = cpumask_test_cpu(from, &common) ? from : cpumask_first(&common);
> -       else
> +       if (table_mask && (common = cpumask_first_and(mask_val, table_mask)) < nr_cpu_ids) {
> +               cpu = cpumask_test_cpu(from, mask_val) &&
> +                     cpumask_test_cpu(from, table_mask) ?
> +                     from : common;
> +       } else {
>                 cpu = cpumask_first(mask_val);
> +       }
> 
> > 
> > I don't expect that to be horrible, but I also note that you don't
> > even talk about the trade-offs you are choosing to make.
> 
> With change above, I assume that the tradeoff is minor and can be ignored?

Yup, this works. My preference would be something which I find
slightly more readable though (avoiding assignment in the
conditional):

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index fca888b36680..299dafc7c0ea 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3826,9 +3826,9 @@ static int its_vpe_set_affinity(struct irq_data *d,
 				bool force)
 {
 	struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
-	struct cpumask common, *table_mask;
+	struct cpumask *table_mask;
 	unsigned long flags;
-	int from, cpu;
+	int from, cpu = nr_cpu_ids;
 
 	/*
 	 * Changing affinity is mega expensive, so let's be as lazy as
@@ -3850,10 +3850,15 @@ static int its_vpe_set_affinity(struct irq_data *d,
 	 * If we are offered another CPU in the same GICv4.1 ITS
 	 * affinity, pick this one. Otherwise, any CPU will do.
 	 */
-	if (table_mask && cpumask_and(&common, mask_val, table_mask))
-		cpu = cpumask_test_cpu(from, &common) ? from : cpumask_first(&common);
-	else
+	if (table_mask)
+		cpu = cpumask_any_and(mask_val, table_mask);
+	if (cpu < nr_cpu_ids) {
+		if (cpumask_test_cpu(from, mask_val) &&
+		    cpumask_test_cpu(from, table_mask))
+			cpu = from;
+	} else {
 		cpu = cpumask_first(mask_val);
+	}
 
 	if (from == cpu)
 		goto out;
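
As a sanity check on the rework above, the selection logic before and after can be compared in a small userspace model. This is only a sketch under assumptions: masks are single unsigned ints covering a hypothetical 8-CPU system, all model_* and pick_* names are invented for illustration, and cpumask_any_and() is modelled as "lowest common bit", which is one valid choice of "any".

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 8u	/* hypothetical tiny system, one bit per CPU */

/* Lowest set bit, or MODEL_NR_CPUS if the mask is empty. */
static unsigned int model_first(unsigned int mask)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return MODEL_NR_CPUS;
}

static bool model_test_cpu(unsigned int cpu, unsigned int mask)
{
	return mask & (1u << cpu);
}

/* Logic before the patch: the intersection lives in a temporary mask. */
static unsigned int pick_with_temp(unsigned int mask_val,
				   const unsigned int *table_mask,
				   unsigned int from)
{
	unsigned int common = table_mask ? (mask_val & *table_mask) : 0;

	if (common)
		return model_test_cpu(from, common) ? from : model_first(common);
	return model_first(mask_val);
}

/* Logic after the rework: only a CPU number is kept, no temporary mask. */
static unsigned int pick_without_temp(unsigned int mask_val,
				      const unsigned int *table_mask,
				      unsigned int from)
{
	unsigned int cpu = MODEL_NR_CPUS;

	if (table_mask)
		cpu = model_first(mask_val & *table_mask);
	if (cpu < MODEL_NR_CPUS) {
		if (model_test_cpu(from, mask_val) &&
		    model_test_cpu(from, *table_mask))
			cpu = from;
	} else {
		cpu = model_first(mask_val);
	}
	return cpu;
}
```

Looping over every 8-bit mask_val, every 8-bit table mask (or a NULL one) and every value of from, the two functions can be asserted to return the same CPU, which matches the expectation that the rework changes stack usage but not the chosen target.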

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.