[PATCH v2 0/3] genirq: Managed affinity fixes

Marc Zyngier posted 3 patches 4 years, 3 months ago
There is a newer version of this series
drivers/irqchip/irq-gic-v3-its.c |  2 +-
kernel/irq/manage.c              | 25 +++++++++++++++++--------
kernel/irq/msi.c                 | 15 +++++++++++++++
3 files changed, 33 insertions(+), 9 deletions(-)
[PATCH v2 0/3] genirq: Managed affinity fixes
Posted by Marc Zyngier 4 years, 3 months ago
John (and later on David) reported[1] a while ago that when booting with
maxcpus=1, managed affinity devices would fail to get the interrupts
that were associated with offlined CPUs.

Similarly, Xiongfeng reported[2] that the GICv3 ITS would sometimes use
non-housekeeping CPUs instead of the affinity that was passed down as
a parameter.

[1] can be fixed by not trying to activate these interrupts if no CPU
that can satisfy the affinity is present (a patch addressing this was
already posted[3]).

[2] is a consequence of affinities containing non-online CPUs being
passed down to the interrupt controller driver and the ITS driver
trying to paper over that by ignoring the affinity parameter and doing
its own (stupid) thing. It would be better to (a) get the core code to
remove the offline CPUs from the affinity mask at all times, and (b)
fix the drivers so that they can trust the core code not to trip them.

This small series, based on 5.17, addresses the above.
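
For illustration only, the two ideas above (clamp the affinity to online
CPUs, and do not activate a managed interrupt that has no online target)
can be modelled in userspace C. This is a sketch and not the code from
the patches: the 64-bit mask type and the names cpu_bit(),
clamp_to_online() and should_activate() are hypothetical, and the real
kernel works on struct cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU mask (assumption): bit n set means CPU n is in the mask. */
typedef uint64_t cpumask_model_t;

/* Build a mask containing a single CPU; the model supports 64 CPUs. */
static cpumask_model_t cpu_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return (cpumask_model_t)1 << cpu;
}

/* (a) Drop offline CPUs before the mask reaches the irqchip driver. */
static cpumask_model_t clamp_to_online(cpumask_model_t affinity,
				       cpumask_model_t online)
{
	return affinity & online;
}

/*
 * A managed interrupt whose affinity contains no online CPU is left
 * shut down instead of being activated on an arbitrary CPU.
 */
static bool should_activate(cpumask_model_t affinity,
			    cpumask_model_t online)
{
	return clamp_to_online(affinity, online) != 0;
}
```

With only CPU 0 online (the maxcpus=1 case), an interrupt whose managed
affinity is CPUs 0-1 is activated with the mask reduced to CPU 0, while
one whose affinity is CPUs 2-3 stays shut down until one of those CPUs
comes online.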

Thanks,

	M.

[1] https://lore.kernel.org/r/78615d08-1764-c895-f3b7-bfddfbcbdfb9@huawei.com
[2] https://lore.kernel.org/r/20220124073440.88598-1-wangxiongfeng2@huawei.com
[3] https://lore.kernel.org/r/20220307190625.254426-1-maz@kernel.org

Marc Zyngier (3):
  genirq/msi: Shutdown managed interrupts with unsatifiable affinities
  genirq: Always limit the affinity to online CPUs
  irqchip/gic-v3: Always trust the managed affinity provided by the core
    code

 drivers/irqchip/irq-gic-v3-its.c |  2 +-
 kernel/irq/manage.c              | 25 +++++++++++++++++--------
 kernel/irq/msi.c                 | 15 +++++++++++++++
 3 files changed, 33 insertions(+), 9 deletions(-)

-- 
2.34.1
Re: [PATCH v2 0/3] genirq: Managed affinity fixes
Posted by Xiongfeng Wang 4 years, 3 months ago
Hi, Marc

On 2022/3/22 3:36, Marc Zyngier wrote:
> John (and later on David) reported[1] a while ago that when booting with
> maxcpus=1, managed affinity devices would fail to get the interrupts
> that were associated with offlined CPUs.
> 
> Similarly, Xiongfeng reported[2] that the GICv3 ITS would sometimes use
> non-housekeeping CPUs instead of the affinity that was passed down as
> a parameter.
> 
> [1] can be fixed by not trying to activate these interrupts if no CPU
> that can satisfy the affinity is present (a patch addressing this was
> already posted[3]).
> 
> [2] is a consequence of affinities containing non-online CPUs being
> passed down to the interrupt controller driver and the ITS driver
> trying to paper over that by ignoring the affinity parameter and doing
> its own (stupid) thing. It would be better to (a) get the core code to
> remove the offline CPUs from the affinity mask at all times, and (b)
> fix the drivers so that they can trust the core code not to trip them.
> 
> This small series, based on 5.17, addresses the above.

I have tested this patchset on D06. It works well with the kernel parameters
'maxcpus=1' or 'nohz_full=1-127 isolcpus=nohz,domain,managed_irq,1-127'.
The 'effective_affinity' is also correct. Thanks!
By the way, I had to merge the second patch manually because of conflicts.
Maybe I am missing some patches that are in your local repo.
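
As a rough userspace model of what the 'managed_irq' test above checks
(not the kernel's code), the target selection can be sketched in C as:
prefer the housekeeping CPUs in the affinity, fall back to the whole
affinity if none of them is online, and only ever hand online CPUs to
the irqchip. The mask type and the names cpu_range() and pick_target()
are hypothetical, and the model is limited to 64 CPUs rather than the
128 used in the test.

```c
#include <assert.h>
#include <stdint.h>

/* Toy CPU mask (assumption): bit n set means CPU n is in the mask. */
typedef uint64_t cpumask_model_t;

/* Build a mask covering CPUs first..last inclusive (model: 64 CPUs). */
static cpumask_model_t cpu_range(unsigned int first, unsigned int last)
{
	cpumask_model_t mask = 0;

	assert(first <= last && last < 64);
	for (unsigned int cpu = first; cpu <= last; cpu++)
		mask |= (cpumask_model_t)1 << cpu;
	return mask;
}

/*
 * Pick the CPUs a managed interrupt may target. Returns 0 when the
 * affinity contains no online CPU at all.
 */
static cpumask_model_t pick_target(cpumask_model_t affinity,
				   cpumask_model_t housekeeping,
				   cpumask_model_t online)
{
	cpumask_model_t preferred = affinity & housekeeping;

	/* No online housekeeping CPU in the affinity: best-effort fallback. */
	if ((preferred & online) == 0)
		preferred = affinity;

	return preferred & online;
}
```

In a scaled-down setup with CPUs 0-7 online and CPUs 1-7 isolated, an
interrupt with affinity CPUs 0-1 ends up on CPU 0 only, whereas one
with affinity CPUs 2-3 keeps CPUs 2-3 because its mask contains no
housekeeping CPU.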

Thanks,
Xiongfeng

Re: [PATCH v2 0/3] genirq: Managed affinity fixes
Posted by Marc Zyngier 4 years, 3 months ago
Hi Xiongfeng,

On Wed, 23 Mar 2022 03:52:46 +0000,
Xiongfeng Wang <wangxiongfeng2@huawei.com> wrote:
> 
> Hi, Marc
> 
> I have tested this patchset on D06. It works well with the kernel parameters
> 'maxcpus=1' or 'nohz_full=1-127 isolcpus=nohz,domain,managed_irq,1-127'.
> The 'effective_affinity' is also correct. Thanks!

Thanks for having given it a go.

> By the way, I had to merge the second patch manually because of conflicts.
> Maybe I am missing some patches that are in your local repo.

That's odd, as the patches are sitting directly on top of 5.17 in my
tree (see [1]). Do you have any out-of-tree patches around? Please
make sure you test this without any extra changes.

Thanks,

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/managed-affinity-fixes

-- 
Without deviation from the norm, progress is not possible.
Re: [PATCH v2 0/3] genirq: Managed affinity fixes
Posted by Xiongfeng Wang 4 years, 3 months ago

On 2022/3/23 16:56, Marc Zyngier wrote:
> Hi Xiongfeng,
> 
> On Wed, 23 Mar 2022 03:52:46 +0000,
> Xiongfeng Wang <wangxiongfeng2@huawei.com> wrote:
>>
>> Hi, Marc
>>
>> I have tested this patchset on D06. It works well with the kernel parameters
>> 'maxcpus=1' or 'nohz_full=1-127 isolcpus=nohz,domain,managed_irq,1-127'.
>> The 'effective_affinity' is also correct. Thanks!
> 
> Thanks for having given it a go.
> 
>> By the way, I had to merge the second patch manually because of conflicts.
>> Maybe I am missing some patches that are in your local repo.
> 
> That's odd, as the patches are sitting directly on top of 5.17 in my
> tree (see [1]). Do you have any out-of-tree patches around? Please
> make sure you test this without any extra changes.

I applied the patchset on top of the latest mainline kernel. The latest commit is
  commit 3bf03b9a0839c9fb06927ae53ebd0f960b19d408
  Merge branch 'akpm' (patches from Andrew)
I didn't change what the second patch modifies. I only resolved the
context conflicts, which are caused by the following commit:
  commit 04d4e665a60902cf36e7ad39af1179cb5df542ad
  sched/isolation: Use single feature type while referring to housekeeping cpumask
It changed 'HK_FLAG_MANAGED_IRQ' to 'HK_TYPE_MANAGED_IRQ'.

Thanks,
Xiongfeng

Re: [PATCH v2 0/3] genirq: Managed affinity fixes
Posted by Marc Zyngier 4 years, 3 months ago
On Wed, 23 Mar 2022 10:58:33 +0000,
Xiongfeng Wang <wangxiongfeng2@huawei.com> wrote:
> 
> 
> 
> On 2022/3/23 16:56, Marc Zyngier wrote:
> > Hi Xiongfeng,
> > 
> > On Wed, 23 Mar 2022 03:52:46 +0000,
> > Xiongfeng Wang <wangxiongfeng2@huawei.com> wrote:
> >>
> >> Hi, Marc
> >>
> >> I have tested this patchset on D06. It works well with the kernel parameters
> >> 'maxcpus=1' or 'nohz_full=1-127 isolcpus=nohz,domain,managed_irq,1-127'.
> >> The 'effective_affinity' is also correct. Thanks!
> > 
> > Thanks for having given it a go.
> > 
> >> By the way, I had to merge the second patch manually because of conflicts.
> >> Maybe I am missing some patches that are in your local repo.
> > 
> > That's odd, as the patches are sitting directly on top of 5.17 in my
> > tree (see [1]). Do you have any out-of-tree patches around? Please
> > make sure you test this without any extra changes.
> 
> I applied the patchset on top of the latest mainline kernel. The latest commit is
>   commit 3bf03b9a0839c9fb06927ae53ebd0f960b19d408
>   Merge branch 'akpm' (patches from Andrew)
> I didn't change what the second patch modifies. I only resolved the
> context conflicts, which are caused by the following commit:
>   commit 04d4e665a60902cf36e7ad39af1179cb5df542ad
>   sched/isolation: Use single feature type while referring to housekeeping cpumask
> It changed 'HK_FLAG_MANAGED_IRQ' to 'HK_TYPE_MANAGED_IRQ'.

Ah, that's on top of linux/master then. Yeah, I expect some small
conflicts (this is a popular spot). I'll rebase things at some point
once (and if) we agree that patch #2 is the right thing to do.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.