John (and later on David) reported[1] a while ago that, when booting with
maxcpus=1, devices with managed affinity would fail to get the interrupts
that were associated with offlined CPUs.
Similarly, Xiongfeng reported[2] that the GICv3 ITS would sometimes use
non-housekeeping CPUs instead of the affinity that was passed down as
a parameter.
[1] can be fixed by not trying to activate these interrupts if no CPU
that can satisfy the affinity is present (a patch addressing this was
already posted[3]).
[2] is a consequence of affinities containing non-online CPUs being
passed down to the interrupt controller driver and the ITS driver
trying to paper over that by ignoring the affinity parameter and doing
its own (stupid) thing. It would be better to (a) get the core code to
remove the offline CPUs from the affinity mask at all times, and (b)
fix the drivers so that they can trust the core code not to trip them.
This small series, based on 5.17, addresses the above.
Thanks,
M.
[1] https://lore.kernel.org/r/78615d08-1764-c895-f3b7-bfddfbcbdfb9@huawei.com
[2] https://lore.kernel.org/r/20220124073440.88598-1-wangxiongfeng2@huawei.com
[3] https://lore.kernel.org/r/20220307190625.254426-1-maz@kernel.org
Marc Zyngier (3):
genirq/msi: Shutdown managed interrupts with unsatifiable affinities
genirq: Always limit the affinity to online CPUs
irqchip/gic-v3: Always trust the managed affinity provided by the core code
drivers/irqchip/irq-gic-v3-its.c | 2 +-
kernel/irq/manage.c | 25 +++++++++++++++++--------
kernel/irq/msi.c | 15 +++++++++++++++
3 files changed, 33 insertions(+), 9 deletions(-)
--
2.34.1
Hi, Marc

On 2022/3/22 3:36, Marc Zyngier wrote:
> [...]
> This small series, based on 5.17, addresses the above.

I have tested this patchset on D06. It works well with the kernel
parameters 'maxcpus=1' or
'nohz_full=1-127 isolcpus=nohz,domain,managed_irq,1-127'.
Also, the 'effective_affinity' is correct. Thanks!

By the way, I merged the second patch manually because of conflicts.
Maybe I lack some patches that are in your local repo.

Thanks,
Xiongfeng
Hi Xiongfeng,

On Wed, 23 Mar 2022 03:52:46 +0000,
Xiongfeng Wang <wangxiongfeng2@huawei.com> wrote:
>
> Hi, Marc
>
> On 2022/3/22 3:36, Marc Zyngier wrote:
> > [...]
> > This small series, based on 5.17, addresses the above.
>
> I have tested this patchset on D06. It works well with the kernel
> parameters 'maxcpus=1' or
> 'nohz_full=1-127 isolcpus=nohz,domain,managed_irq,1-127'.
> Also, the 'effective_affinity' is correct. Thanks!

Thanks for having given it a go.

> By the way, I merged the second patch manually because of conflicts.
> Maybe I lack some patches that are in your local repo.

That's odd, as the patches are directly sitting on top of 5.17 in my
tree (see [1]). Do you have any out of tree patches around? Please
make sure you test this without any extra change.

Thanks,

M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/managed-affinity-fixes

-- 
Without deviation from the norm, progress is not possible.
On 2022/3/23 16:56, Marc Zyngier wrote:
> Hi Xiongfeng,
>
> On Wed, 23 Mar 2022 03:52:46 +0000,
> Xiongfeng Wang <wangxiongfeng2@huawei.com> wrote:
>> [...]
>> By the way, I merged the second patch manually because of conflicts.
>> Maybe I lack some patches that are in your local repo.
>
> That's odd, as the patches are directly sitting on top of 5.17 in my
> tree (see [1]). Do you have any out of tree patches around? Please
> make sure you test this without any extra change.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/managed-affinity-fixes

I applied the patchset on top of the latest mainline kernel. The latest
commit is:

commit 3bf03b9a0839c9fb06927ae53ebd0f960b19d408
Merge branch 'akpm' (patches from Andrew)

I didn't change the second patch's modifications; I only resolved the
context conflicts, which are caused by the following commit:

commit 04d4e665a60902cf36e7ad39af1179cb5df542ad
sched/isolation: Use single feature type while referring to housekeeping cpumask

It changed 'HK_FLAG_MANAGED_IRQ' to 'HK_TYPE_MANAGED_IRQ'.

Thanks,
Xiongfeng
On Wed, 23 Mar 2022 10:58:33 +0000,
Xiongfeng Wang <wangxiongfeng2@huawei.com> wrote:
>
> On 2022/3/23 16:56, Marc Zyngier wrote:
> > [...]
>
> I applied the patchset on top of the latest mainline kernel. The latest
> commit is:
>
> commit 3bf03b9a0839c9fb06927ae53ebd0f960b19d408
> Merge branch 'akpm' (patches from Andrew)
>
> I didn't change the second patch's modifications; I only resolved the
> context conflicts, which are caused by the following commit:
>
> commit 04d4e665a60902cf36e7ad39af1179cb5df542ad
> sched/isolation: Use single feature type while referring to housekeeping cpumask
>
> It changed 'HK_FLAG_MANAGED_IRQ' to 'HK_TYPE_MANAGED_IRQ'.

Ah, that's on top of linux/master then. Yeah, I expect some small
conflicts (this is a popular spot). I'll rebase things at some point
once (and if) we agree that patch #2 is the right thing to do.

Thanks,

M.