Keeping channels enabled when they're unused is only causing problems:
Extra interrupts harm performance, and extra nested interrupts could even
have caused worse problems.
Note that no explicit "enable" is necessary - that's implicitly done by
set_channel_irq_affinity() once the channel goes into use again.
Along with disabling the counter, also "clear" the channel's "next event",
for it to be properly written by whatever the next user is going to want
(possibly avoiding too early an IRQ).
Further, along the same lines, don't enable channels early when starting
up an IRQ. This similarly should happen no earlier than from
set_channel_irq_affinity() (here: once a channel goes into use the very
first time). This eliminates a single instance of
(XEN) [VT-D]INTR-REMAP: Request device [0000:00:1f.0] fault index 0
(XEN) [VT-D]INTR-REMAP: reason 25 - Blocked a compatibility format interrupt request
during boot. (Why exactly there's only one instance, when we use multiple
counters and hence multiple IRQs, I can't tell. My understanding would be
that this was due to __hpet_setup_msi_irq() being called only after
request_irq() [and hence the .startup handler], yet that should have
affected all channels.)
Fixes: 3ba523ff957c ("CPUIDLE: enable MSI capable HPET for timer broadcast")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
A window still remains for IRQs to be caused by stale comparator values:
hpet_attach_channel() is called ahead of reprogram_hpet_evt_channel().
Should we also write the comparator to "far into the future"?
Furthermore this prolongues the window until "old" vectors may be released
again, as this way we potentially (and intentionally) delay the ocurrence
of the next IRQ for the channel in question. (This issue will disappear
once we switch to a fixed, global vector.)
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -262,10 +262,9 @@ static void cf_check hpet_msi_unmask(str
ch->msi.msi_attrib.host_masked = 0;
}
-static void cf_check hpet_msi_mask(struct irq_desc *desc)
+static void hpet_disable_channel(struct hpet_event_channel *ch)
{
u32 cfg;
- struct hpet_event_channel *ch = desc->action->dev_id;
cfg = hpet_read32(HPET_Tn_CFG(ch->idx));
cfg &= ~HPET_TN_ENABLE;
@@ -273,6 +272,11 @@ static void cf_check hpet_msi_mask(struc
ch->msi.msi_attrib.host_masked = 1;
}
+static void cf_check hpet_msi_mask(struct irq_desc *desc)
+{
+ hpet_disable_channel(desc->action->dev_id);
+}
+
static int hpet_msi_write(struct hpet_event_channel *ch, struct msi_msg *msg)
{
ch->msi.msg = *msg;
@@ -295,12 +299,6 @@ static int hpet_msi_write(struct hpet_ev
return 0;
}
-static unsigned int cf_check hpet_msi_startup(struct irq_desc *desc)
-{
- hpet_msi_unmask(desc);
- return 0;
-}
-
#define hpet_msi_shutdown hpet_msi_mask
static void cf_check hpet_msi_set_affinity(
@@ -326,7 +324,7 @@ static void cf_check hpet_msi_set_affini
*/
static hw_irq_controller hpet_msi_type = {
.typename = "HPET-MSI",
- .startup = hpet_msi_startup,
+ .startup = irq_startup_none,
.shutdown = hpet_msi_shutdown,
.enable = hpet_msi_unmask,
.disable = hpet_msi_mask,
@@ -542,6 +540,8 @@ static void hpet_detach_channel(unsigned
spin_unlock_irq(&ch->lock);
else if ( (next = cpumask_first(ch->cpumask)) >= nr_cpu_ids )
{
+ hpet_disable_channel(ch);
+ ch->next_event = STIME_MAX;
ch->cpu = -1;
clear_bit(HPET_EVT_USED_BIT, &ch->flags);
spin_unlock_irq(&ch->lock);
On Thu, Oct 16, 2025 at 09:31:42AM +0200, Jan Beulich wrote:
> Keeping channels enabled when they're unused is only causing problems:
> Extra interrupts harm performance, and extra nested interrupts could even
> have caused worse problems.
>
> Note that no explicit "enable" is necessary - that's implicitly done by
> set_channel_irq_affinity() once the channel goes into use again.
>
> Along with disabling the counter, also "clear" the channel's "next event",
> for it to be properly written by whatever the next user is going to want
> (possibly avoiding too early an IRQ).
>
> Further, along the same lines, don't enable channels early when starting
> up an IRQ. This similarly should happen no earlier than from
> set_channel_irq_affinity() (here: once a channel goes into use the very
> first time). This eliminates a single instance of
>
> (XEN) [VT-D]INTR-REMAP: Request device [0000:00:1f.0] fault index 0
> (XEN) [VT-D]INTR-REMAP: reason 25 - Blocked a compatibility format interrupt request
>
> during boot. (Why exactly there's only one instance, when we use multiple
> counters and hence multiple IRQs, I can't tell. My understanding would be
> that this was due to __hpet_setup_msi_irq() being called only after
> request_irq() [and hence the .startup handler], yet that should have
> affected all channels.)
>
> Fixes: 3ba523ff957c ("CPUIDLE: enable MSI capable HPET for timer broadcast")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> A window still remains for IRQs to be caused by stale comparator values:
> hpet_attach_channel() is called ahead of reprogram_hpet_evt_channel().
> Should we also write the comparator to "far into the future"?
I think we can possibly live with this to avoid doing an extra MMIO
access?
Thanks, Roger.
On 16.10.2025 18:31, Roger Pau Monné wrote:
> On Thu, Oct 16, 2025 at 09:31:42AM +0200, Jan Beulich wrote:
>> Keeping channels enabled when they're unused is only causing problems:
>> Extra interrupts harm performance, and extra nested interrupts could even
>> have caused worse problems.
>>
>> Note that no explicit "enable" is necessary - that's implicitly done by
>> set_channel_irq_affinity() once the channel goes into use again.
>>
>> Along with disabling the counter, also "clear" the channel's "next event",
>> for it to be properly written by whatever the next user is going to want
>> (possibly avoiding too early an IRQ).
>>
>> Further, along the same lines, don't enable channels early when starting
>> up an IRQ. This similarly should happen no earlier than from
>> set_channel_irq_affinity() (here: once a channel goes into use the very
>> first time). This eliminates a single instance of
>>
>> (XEN) [VT-D]INTR-REMAP: Request device [0000:00:1f.0] fault index 0
>> (XEN) [VT-D]INTR-REMAP: reason 25 - Blocked a compatibility format interrupt request
>>
>> during boot. (Why exactly there's only one instance, when we use multiple
>> counters and hence multiple IRQs, I can't tell. My understanding would be
>> that this was due to __hpet_setup_msi_irq() being called only after
>> request_irq() [and hence the .startup handler], yet that should have
>> affected all channels.)
>>
>> Fixes: 3ba523ff957c ("CPUIDLE: enable MSI capable HPET for timer broadcast")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Thanks, but I realized I want to make one further change here: I want to clear
the pointer when handing off the channel to another CPU while detaching. That
is the one point where we clearly know the affinity moves off of the CPU that
is recording the channel as least recently used one. Are you happy for me to
keep the R-b with that extra change?
>> ---
>> A window still remains for IRQs to be caused by stale comparator values:
>> hpet_attach_channel() is called ahead of reprogram_hpet_evt_channel().
>> Should we also write the comparator to "far into the future"?
>
> I think we can possibly live with this to avoid doing an extra MMIO
> access?
I think we can; which one's more beneficial I simply don't know.
Jan
On 17.10.2025 08:08, Jan Beulich wrote:
> On 16.10.2025 18:31, Roger Pau Monné wrote:
>> On Thu, Oct 16, 2025 at 09:31:42AM +0200, Jan Beulich wrote:
>>> Keeping channels enabled when they're unused is only causing problems:
>>> Extra interrupts harm performance, and extra nested interrupts could even
>>> have caused worse problems.
>>>
>>> Note that no explicit "enable" is necessary - that's implicitly done by
>>> set_channel_irq_affinity() once the channel goes into use again.
>>>
>>> Along with disabling the counter, also "clear" the channel's "next event",
>>> for it to be properly written by whatever the next user is going to want
>>> (possibly avoiding too early an IRQ).
>>>
>>> Further, along the same lines, don't enable channels early when starting
>>> up an IRQ. This similarly should happen no earlier than from
>>> set_channel_irq_affinity() (here: once a channel goes into use the very
>>> first time). This eliminates a single instance of
>>>
>>> (XEN) [VT-D]INTR-REMAP: Request device [0000:00:1f.0] fault index 0
>>> (XEN) [VT-D]INTR-REMAP: reason 25 - Blocked a compatibility format interrupt request
>>>
>>> during boot. (Why exactly there's only one instance, when we use multiple
>>> counters and hence multiple IRQs, I can't tell. My understanding would be
>>> that this was due to __hpet_setup_msi_irq() being called only after
>>> request_irq() [and hence the .startup handler], yet that should have
>>> affected all channels.)
>>>
>>> Fixes: 3ba523ff957c ("CPUIDLE: enable MSI capable HPET for timer broadcast")
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>
> Thanks, but I realized I want to make one further change here: I want to clear
> the pointer when handing off the channel to another CPU while detaching. That
> is the one point where we clearly know the affinity moves off of the CPU that
> is recording the channel as least recently used one. Are you happy for me to
> keep the R-b with that extra change?
Oh, that was wrong here. That's a change I'm meaning to make to patch 1.
Jan
On Thu, Oct 16, 2025 at 09:31:42AM +0200, Jan Beulich wrote:
> Keeping channels enabled when they're unused is only causing problems:
> Extra interrupts harm performance, and extra nested interrupts could even
> have caused worse problems.
>
> Note that no explicit "enable" is necessary - that's implicitly done by
> set_channel_irq_affinity() once the channel goes into use again.
>
> Along with disabling the counter, also "clear" the channel's "next event",
> for it to be properly written by whatever the next user is going to want
> (possibly avoiding too early an IRQ).
>
> Further, along the same lines, don't enable channels early when starting
> up an IRQ. This similarly should happen no earlier than from
> set_channel_irq_affinity() (here: once a channel goes into use the very
> first time). This eliminates a single instance of
>
> (XEN) [VT-D]INTR-REMAP: Request device [0000:00:1f.0] fault index 0
> (XEN) [VT-D]INTR-REMAP: reason 25 - Blocked a compatibility format interrupt request
>
> during boot. (Why exactly there's only one instance, when we use multiple
> counters and hence multiple IRQs, I can't tell. My understanding would be
> that this was due to __hpet_setup_msi_irq() being called only after
> request_irq() [and hence the .startup handler], yet that should have
> affected all channels.)
>
> Fixes: 3ba523ff957c ("CPUIDLE: enable MSI capable HPET for timer broadcast")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> A window still remains for IRQs to be caused by stale comparator values:
> hpet_attach_channel() is called ahead of reprogram_hpet_evt_channel().
> Should we also write the comparator to "far into the future"?
It might be helpful to reprogram the comparator as far ahead as
possible in hpet_attach_channel() ahead of enabling it, or
alternatively in hpet_detach_channel().
> Furthermore this prolongues the window until "old" vectors may be released
> again, as this way we potentially (and intentionally) delay the ocurrence
> of the next IRQ for the channel in question. (This issue will disappear
> once we switch to a fixed, global vector.)
>
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c
> @@ -262,10 +262,9 @@ static void cf_check hpet_msi_unmask(str
> ch->msi.msi_attrib.host_masked = 0;
> }
>
> -static void cf_check hpet_msi_mask(struct irq_desc *desc)
> +static void hpet_disable_channel(struct hpet_event_channel *ch)
> {
> u32 cfg;
> - struct hpet_event_channel *ch = desc->action->dev_id;
>
> cfg = hpet_read32(HPET_Tn_CFG(ch->idx));
> cfg &= ~HPET_TN_ENABLE;
> @@ -273,6 +272,11 @@ static void cf_check hpet_msi_mask(struc
> ch->msi.msi_attrib.host_masked = 1;
> }
>
> +static void cf_check hpet_msi_mask(struct irq_desc *desc)
> +{
> + hpet_disable_channel(desc->action->dev_id);
> +}
> +
> static int hpet_msi_write(struct hpet_event_channel *ch, struct msi_msg *msg)
> {
> ch->msi.msg = *msg;
> @@ -295,12 +299,6 @@ static int hpet_msi_write(struct hpet_ev
> return 0;
> }
>
> -static unsigned int cf_check hpet_msi_startup(struct irq_desc *desc)
> -{
> - hpet_msi_unmask(desc);
> - return 0;
> -}
> -
> #define hpet_msi_shutdown hpet_msi_mask
>
> static void cf_check hpet_msi_set_affinity(
> @@ -326,7 +324,7 @@ static void cf_check hpet_msi_set_affini
> */
> static hw_irq_controller hpet_msi_type = {
> .typename = "HPET-MSI",
> - .startup = hpet_msi_startup,
> + .startup = irq_startup_none,
> .shutdown = hpet_msi_shutdown,
> .enable = hpet_msi_unmask,
> .disable = hpet_msi_mask,
> @@ -542,6 +540,8 @@ static void hpet_detach_channel(unsigned
> spin_unlock_irq(&ch->lock);
> else if ( (next = cpumask_first(ch->cpumask)) >= nr_cpu_ids )
> {
> + hpet_disable_channel(ch);
> + ch->next_event = STIME_MAX;
> ch->cpu = -1;
> clear_bit(HPET_EVT_USED_BIT, &ch->flags);
> spin_unlock_irq(&ch->lock);
I'm a bit confused with what the HPET code does here (don't know
enough about it, and there are no comments). Why is the timer rotated
to a CPU in ch->cpumask once disabled, instead of just being plain
disabled?
Thanks, Roger.
On 16.10.2025 13:42, Roger Pau Monné wrote:
> On Thu, Oct 16, 2025 at 09:31:42AM +0200, Jan Beulich wrote:
>> Keeping channels enabled when they're unused is only causing problems:
>> Extra interrupts harm performance, and extra nested interrupts could even
>> have caused worse problems.
>>
>> Note that no explicit "enable" is necessary - that's implicitly done by
>> set_channel_irq_affinity() once the channel goes into use again.
>>
>> Along with disabling the counter, also "clear" the channel's "next event",
>> for it to be properly written by whatever the next user is going to want
>> (possibly avoiding too early an IRQ).
>>
>> Further, along the same lines, don't enable channels early when starting
>> up an IRQ. This similarly should happen no earlier than from
>> set_channel_irq_affinity() (here: once a channel goes into use the very
>> first time). This eliminates a single instance of
>>
>> (XEN) [VT-D]INTR-REMAP: Request device [0000:00:1f.0] fault index 0
>> (XEN) [VT-D]INTR-REMAP: reason 25 - Blocked a compatibility format interrupt request
>>
>> during boot. (Why exactly there's only one instance, when we use multiple
>> counters and hence multiple IRQs, I can't tell. My understanding would be
>> that this was due to __hpet_setup_msi_irq() being called only after
>> request_irq() [and hence the .startup handler], yet that should have
>> affected all channels.)
>>
>> Fixes: 3ba523ff957c ("CPUIDLE: enable MSI capable HPET for timer broadcast")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> A window still remains for IRQs to be caused by stale comparator values:
>> hpet_attach_channel() is called ahead of reprogram_hpet_evt_channel().
>> Should we also write the comparator to "far into the future"?
>
> It might be helpful to reprogram the comparator as far ahead as
> possible in hpet_attach_channel() ahead of enabling it, or
> alternatively in hpet_detach_channel().
The downside is yet another (slow) MMIO access. Hence why I didn't make
such a change right away. Plus I wasn't quite sure about the locking there:
Imo if we did so, it would be better if the lock wasn't dropped
intermediately.
>> @@ -542,6 +540,8 @@ static void hpet_detach_channel(unsigned
>> spin_unlock_irq(&ch->lock);
>> else if ( (next = cpumask_first(ch->cpumask)) >= nr_cpu_ids )
>> {
>> + hpet_disable_channel(ch);
>> + ch->next_event = STIME_MAX;
>> ch->cpu = -1;
>> clear_bit(HPET_EVT_USED_BIT, &ch->flags);
>> spin_unlock_irq(&ch->lock);
>
> I'm a bit confused with what the HPET code does here (don't know
> enough about it, and there are no comments). Why is the timer rotated
> to a CPU in ch->cpumask once disabled, instead of just being plain
> disabled?
Because it will still be needed by the other CPUs that the channel is
shared with.
Jan
On Thu, Oct 16, 2025 at 01:57:41PM +0200, Jan Beulich wrote:
> On 16.10.2025 13:42, Roger Pau Monné wrote:
> > On Thu, Oct 16, 2025 at 09:31:42AM +0200, Jan Beulich wrote:
> >> Keeping channels enabled when they're unused is only causing problems:
> >> Extra interrupts harm performance, and extra nested interrupts could even
> >> have caused worse problems.
> >>
> >> Note that no explicit "enable" is necessary - that's implicitly done by
> >> set_channel_irq_affinity() once the channel goes into use again.
> >>
> >> Along with disabling the counter, also "clear" the channel's "next event",
> >> for it to be properly written by whatever the next user is going to want
> >> (possibly avoiding too early an IRQ).
> >>
> >> Further, along the same lines, don't enable channels early when starting
> >> up an IRQ. This similarly should happen no earlier than from
> >> set_channel_irq_affinity() (here: once a channel goes into use the very
> >> first time). This eliminates a single instance of
> >>
> >> (XEN) [VT-D]INTR-REMAP: Request device [0000:00:1f.0] fault index 0
> >> (XEN) [VT-D]INTR-REMAP: reason 25 - Blocked a compatibility format interrupt request
> >>
> >> during boot. (Why exactly there's only one instance, when we use multiple
> >> counters and hence multiple IRQs, I can't tell. My understanding would be
> >> that this was due to __hpet_setup_msi_irq() being called only after
> >> request_irq() [and hence the .startup handler], yet that should have
> >> affected all channels.)
> >>
> >> Fixes: 3ba523ff957c ("CPUIDLE: enable MSI capable HPET for timer broadcast")
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> A window still remains for IRQs to be caused by stale comparator values:
> >> hpet_attach_channel() is called ahead of reprogram_hpet_evt_channel().
> >> Should we also write the comparator to "far into the future"?
> >
> > It might be helpful to reprogram the comparator as far ahead as
> > possible in hpet_attach_channel() ahead of enabling it, or
> > alternatively in hpet_detach_channel().
>
> The downside is yet another (slow) MMIO access. Hence why I didn't make
> such a change right away. Plus I wasn't quite sure about the locking there:
> Imo if we did so, it would be better if the lock wasn't dropped
> intermediately.
>
> >> @@ -542,6 +540,8 @@ static void hpet_detach_channel(unsigned
> >> spin_unlock_irq(&ch->lock);
> >> else if ( (next = cpumask_first(ch->cpumask)) >= nr_cpu_ids )
> >> {
> >> + hpet_disable_channel(ch);
> >> + ch->next_event = STIME_MAX;
> >> ch->cpu = -1;
> >> clear_bit(HPET_EVT_USED_BIT, &ch->flags);
> >> spin_unlock_irq(&ch->lock);
> >
> > I'm a bit confused with what the HPET code does here (don't know
> > enough about it, and there are no comments). Why is the timer rotated
> > to a CPU in ch->cpumask once disabled, instead of just being plain
> > disabled?
>
> Because it will still be needed by the other CPUs that the channel is
> shared with.
Yeah, missed that part, the channel is migrated to a different CPU. I
wonder however: since an active channel can be migrated around between
CPUs, isn't there a risk of the timer firing just in the middle of
migration (when interrupt generation is disabled), and hence Xen
possibly missing a deadline?
In hpet_broadcast_exit() we need to check whether the timer has
expired after the migration, and manually trigger a broadcast if
needed. This also risks doing to broadcasts also back-to-back, but
it's the only option I see to avoid missing a deadline.
Maybe there's something I'm missing, this is all fairly complex.
Thanks, Roger.
On 16.10.2025 17:34, Roger Pau Monné wrote:
> On Thu, Oct 16, 2025 at 01:57:41PM +0200, Jan Beulich wrote:
>> On 16.10.2025 13:42, Roger Pau Monné wrote:
>>> On Thu, Oct 16, 2025 at 09:31:42AM +0200, Jan Beulich wrote:
>>>> Keeping channels enabled when they're unused is only causing problems:
>>>> Extra interrupts harm performance, and extra nested interrupts could even
>>>> have caused worse problems.
>>>>
>>>> Note that no explicit "enable" is necessary - that's implicitly done by
>>>> set_channel_irq_affinity() once the channel goes into use again.
>>>>
>>>> Along with disabling the counter, also "clear" the channel's "next event",
>>>> for it to be properly written by whatever the next user is going to want
>>>> (possibly avoiding too early an IRQ).
>>>>
>>>> Further, along the same lines, don't enable channels early when starting
>>>> up an IRQ. This similarly should happen no earlier than from
>>>> set_channel_irq_affinity() (here: once a channel goes into use the very
>>>> first time). This eliminates a single instance of
>>>>
>>>> (XEN) [VT-D]INTR-REMAP: Request device [0000:00:1f.0] fault index 0
>>>> (XEN) [VT-D]INTR-REMAP: reason 25 - Blocked a compatibility format interrupt request
>>>>
>>>> during boot. (Why exactly there's only one instance, when we use multiple
>>>> counters and hence multiple IRQs, I can't tell. My understanding would be
>>>> that this was due to __hpet_setup_msi_irq() being called only after
>>>> request_irq() [and hence the .startup handler], yet that should have
>>>> affected all channels.)
>>>>
>>>> Fixes: 3ba523ff957c ("CPUIDLE: enable MSI capable HPET for timer broadcast")
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>> ---
>>>> A window still remains for IRQs to be caused by stale comparator values:
>>>> hpet_attach_channel() is called ahead of reprogram_hpet_evt_channel().
>>>> Should we also write the comparator to "far into the future"?
>>>
>>> It might be helpful to reprogram the comparator as far ahead as
>>> possible in hpet_attach_channel() ahead of enabling it, or
>>> alternatively in hpet_detach_channel().
>>
>> The downside is yet another (slow) MMIO access. Hence why I didn't make
>> such a change right away. Plus I wasn't quite sure about the locking there:
>> Imo if we did so, it would be better if the lock wasn't dropped
>> intermediately.
>>
>>>> @@ -542,6 +540,8 @@ static void hpet_detach_channel(unsigned
>>>> spin_unlock_irq(&ch->lock);
>>>> else if ( (next = cpumask_first(ch->cpumask)) >= nr_cpu_ids )
>>>> {
>>>> + hpet_disable_channel(ch);
>>>> + ch->next_event = STIME_MAX;
>>>> ch->cpu = -1;
>>>> clear_bit(HPET_EVT_USED_BIT, &ch->flags);
>>>> spin_unlock_irq(&ch->lock);
>>>
>>> I'm a bit confused with what the HPET code does here (don't know
>>> enough about it, and there are no comments). Why is the timer rotated
>>> to a CPU in ch->cpumask once disabled, instead of just being plain
>>> disabled?
>>
>> Because it will still be needed by the other CPUs that the channel is
>> shared with.
>
> Yeah, missed that part, the channel is migrated to a different CPU. I
> wonder however: since an active channel can be migrated around between
> CPUs, isn't there a risk of the timer firing just in the middle of
> migration (when interrupt generation is disabled), and hence Xen
> possibly missing a deadline?
>
> In hpet_broadcast_exit() we need to check whether the timer has
> expired after the migration, and manually trigger a broadcast if
> needed. This also risks doing to broadcasts also back-to-back, but
> it's the only option I see to avoid missing a deadline.
>
> Maybe there's something I'm missing, this is all fairly complex.
set_channel_irq_affinity() invokes handle_hpet_broadcast() to cover that
case.
Jan
On Thu, Oct 16, 2025 at 05:55:30PM +0200, Jan Beulich wrote:
> On 16.10.2025 17:34, Roger Pau Monné wrote:
> > On Thu, Oct 16, 2025 at 01:57:41PM +0200, Jan Beulich wrote:
> >> On 16.10.2025 13:42, Roger Pau Monné wrote:
> >>> On Thu, Oct 16, 2025 at 09:31:42AM +0200, Jan Beulich wrote:
> >>>> Keeping channels enabled when they're unused is only causing problems:
> >>>> Extra interrupts harm performance, and extra nested interrupts could even
> >>>> have caused worse problems.
> >>>>
> >>>> Note that no explicit "enable" is necessary - that's implicitly done by
> >>>> set_channel_irq_affinity() once the channel goes into use again.
> >>>>
> >>>> Along with disabling the counter, also "clear" the channel's "next event",
> >>>> for it to be properly written by whatever the next user is going to want
> >>>> (possibly avoiding too early an IRQ).
> >>>>
> >>>> Further, along the same lines, don't enable channels early when starting
> >>>> up an IRQ. This similarly should happen no earlier than from
> >>>> set_channel_irq_affinity() (here: once a channel goes into use the very
> >>>> first time). This eliminates a single instance of
> >>>>
> >>>> (XEN) [VT-D]INTR-REMAP: Request device [0000:00:1f.0] fault index 0
> >>>> (XEN) [VT-D]INTR-REMAP: reason 25 - Blocked a compatibility format interrupt request
> >>>>
> >>>> during boot. (Why exactly there's only one instance, when we use multiple
> >>>> counters and hence multiple IRQs, I can't tell. My understanding would be
> >>>> that this was due to __hpet_setup_msi_irq() being called only after
> >>>> request_irq() [and hence the .startup handler], yet that should have
> >>>> affected all channels.)
> >>>>
> >>>> Fixes: 3ba523ff957c ("CPUIDLE: enable MSI capable HPET for timer broadcast")
> >>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >>>> ---
> >>>> A window still remains for IRQs to be caused by stale comparator values:
> >>>> hpet_attach_channel() is called ahead of reprogram_hpet_evt_channel().
> >>>> Should we also write the comparator to "far into the future"?
> >>>
> >>> It might be helpful to reprogram the comparator as far ahead as
> >>> possible in hpet_attach_channel() ahead of enabling it, or
> >>> alternatively in hpet_detach_channel().
> >>
> >> The downside is yet another (slow) MMIO access. Hence why I didn't make
> >> such a change right away. Plus I wasn't quite sure about the locking there:
> >> Imo if we did so, it would be better if the lock wasn't dropped
> >> intermediately.
> >>
> >>>> @@ -542,6 +540,8 @@ static void hpet_detach_channel(unsigned
> >>>> spin_unlock_irq(&ch->lock);
> >>>> else if ( (next = cpumask_first(ch->cpumask)) >= nr_cpu_ids )
> >>>> {
> >>>> + hpet_disable_channel(ch);
> >>>> + ch->next_event = STIME_MAX;
> >>>> ch->cpu = -1;
> >>>> clear_bit(HPET_EVT_USED_BIT, &ch->flags);
> >>>> spin_unlock_irq(&ch->lock);
> >>>
> >>> I'm a bit confused with what the HPET code does here (don't know
> >>> enough about it, and there are no comments). Why is the timer rotated
> >>> to a CPU in ch->cpumask once disabled, instead of just being plain
> >>> disabled?
> >>
> >> Because it will still be needed by the other CPUs that the channel is
> >> shared with.
> >
> > Yeah, missed that part, the channel is migrated to a different CPU. I
> > wonder however: since an active channel can be migrated around between
> > CPUs, isn't there a risk of the timer firing just in the middle of
> > migration (when interrupt generation is disabled), and hence Xen
> > possibly missing a deadline?
> >
> > In hpet_broadcast_exit() we need to check whether the timer has
> > expired after the migration, and manually trigger a broadcast if
> > needed. This also risks doing to broadcasts also back-to-back, but
> > it's the only option I see to avoid missing a deadline.
> >
> > Maybe there's something I'm missing, this is all fairly complex.
>
> set_channel_irq_affinity() invokes handle_hpet_broadcast() to cover that
> case.
Oh, indeed, sorry for the fuzz then.
Thanks, Roger.
© 2016 - 2026 Red Hat, Inc.