[PATCH v2 for-4.21 1/9] x86/HPET: disable unused channels

Posted by Jan Beulich 1 week, 3 days ago
Keeping channels enabled while they're unused only causes problems: Extra
interrupts harm performance, and extra nested interrupts could cause even
worse issues.

Note that no explicit "enable" is necessary - that's implicitly done by
set_channel_irq_affinity() once the channel goes into use again.
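
(For reference, the implicit re-enable happens via set_channel_irq_affinity()'s
mask / set-affinity / unmask sequence, roughly along the lines of the sketch
below; this is paraphrased from xen/arch/x86/hpet.c and may not match the
exact code:)

    static void set_channel_irq_affinity(struct hpet_event_channel *ch)
    {
        struct irq_desc *desc = irq_to_desc(ch->msi.irq);

        spin_lock(&desc->lock);
        hpet_msi_mask(desc);
        hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
        /* Unmask sets HPET_TN_ENABLE again, i.e. this is the "enable". */
        hpet_msi_unmask(desc);
        spin_unlock(&desc->lock);
    }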

Along with disabling the channel, also "clear" the channel's "next event",
so it gets written properly by whatever the next user is going to want
(possibly avoiding a too-early IRQ).

Further, along the same lines, don't enable channels early when starting
up an IRQ. This similarly should happen no earlier than from
set_channel_irq_affinity() (here: once a channel goes into use the very
first time). This eliminates a single instance of

(XEN) [VT-D]INTR-REMAP: Request device [0000:00:1f.0] fault index 0
(XEN) [VT-D]INTR-REMAP: reason 25 - Blocked a compatibility format interrupt request

during boot. (Why exactly there's only one instance, when we use multiple
channels and hence multiple IRQs, I can't tell. My understanding is that
this was due to __hpet_setup_msi_irq() being called only after
request_irq() [and hence after the .startup handler], yet that should have
affected all channels.)

Fixes: 3ba523ff957c ("CPUIDLE: enable MSI capable HPET for timer broadcast")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
A window still remains for IRQs to be caused by stale comparator values:
hpet_attach_channel() is called ahead of reprogram_hpet_evt_channel().
Should we also write the comparator to "far into the future"?
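
If we did, a minimal sketch (using the existing hpet_read32() / hpet_write32()
helpers and the HPET_Tn_CMP() / HPET_COUNTER constants; whether and where to
do this is exactly the open question) might be:

    /* Park the comparator almost a full 32-bit counter wrap away. */
    hpet_write32(hpet_read32(HPET_COUNTER) - 1, HPET_Tn_CMP(ch->idx));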

Furthermore this prolongs the window until "old" vectors may be released
again, as this way we potentially (and intentionally) delay the occurrence
of the next IRQ for the channel in question. (This issue will disappear
once we switch to a fixed, global vector.)

--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -262,10 +262,9 @@ static void cf_check hpet_msi_unmask(str
     ch->msi.msi_attrib.host_masked = 0;
 }
 
-static void cf_check hpet_msi_mask(struct irq_desc *desc)
+static void hpet_disable_channel(struct hpet_event_channel *ch)
 {
     u32 cfg;
-    struct hpet_event_channel *ch = desc->action->dev_id;
 
     cfg = hpet_read32(HPET_Tn_CFG(ch->idx));
     cfg &= ~HPET_TN_ENABLE;
@@ -273,6 +272,11 @@ static void cf_check hpet_msi_mask(struc
     ch->msi.msi_attrib.host_masked = 1;
 }
 
+static void cf_check hpet_msi_mask(struct irq_desc *desc)
+{
+    hpet_disable_channel(desc->action->dev_id);
+}
+
 static int hpet_msi_write(struct hpet_event_channel *ch, struct msi_msg *msg)
 {
     ch->msi.msg = *msg;
@@ -295,12 +299,6 @@ static int hpet_msi_write(struct hpet_ev
     return 0;
 }
 
-static unsigned int cf_check hpet_msi_startup(struct irq_desc *desc)
-{
-    hpet_msi_unmask(desc);
-    return 0;
-}
-
 #define hpet_msi_shutdown hpet_msi_mask
 
 static void cf_check hpet_msi_set_affinity(
@@ -326,7 +324,7 @@ static void cf_check hpet_msi_set_affini
  */
 static hw_irq_controller hpet_msi_type = {
     .typename   = "HPET-MSI",
-    .startup    = hpet_msi_startup,
+    .startup    = irq_startup_none,
     .shutdown   = hpet_msi_shutdown,
     .enable	    = hpet_msi_unmask,
     .disable    = hpet_msi_mask,
@@ -526,6 +524,8 @@ static void hpet_detach_channel(unsigned
         spin_unlock_irq(&ch->lock);
     else if ( (next = cpumask_first(ch->cpumask)) >= nr_cpu_ids )
     {
+        hpet_disable_channel(ch);
+        ch->next_event = STIME_MAX;
         ch->cpu = -1;
         clear_bit(HPET_EVT_USED_BIT, &ch->flags);
         spin_unlock_irq(&ch->lock);


Re: [PATCH v2 for-4.21 1/9] x86/HPET: disable unused channels
Posted by Jan Beulich 1 week ago
On 20.10.2025 13:18, Jan Beulich wrote:
> @@ -526,6 +524,8 @@ static void hpet_detach_channel(unsigned
>          spin_unlock_irq(&ch->lock);
>      else if ( (next = cpumask_first(ch->cpumask)) >= nr_cpu_ids )
>      {
> +        hpet_disable_channel(ch);
> +        ch->next_event = STIME_MAX;
>          ch->cpu = -1;
>          clear_bit(HPET_EVT_USED_BIT, &ch->flags);
>          spin_unlock_irq(&ch->lock);

Now that I have everything else working, I thought I'd look into where the
excess IRQs come from on Intel hardware. Since my earlier experiment with
making the write in hpet_enable_channel() / hpet_msi_unmask() conditional
had failed, I was misled into assuming some more complex logic in their
HPETs. It now appears to be as simple as I initially suspected: It's 0 -> 1
transitions of the ENABLE bit which cause immediate IRQs. And the reason
why the removal of the mask/unmask pairs in patch 2 wasn't (sufficiently)
helpful is visible above: we'd better not disable the channels, to avoid
triggering said (mis-)feature. Instead I now intend to merely write a long
timeout value here, along the lines of what I did in patch 8 of this
version (the code there is fully dropped in v3, though).
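
For illustration, that alternative would then look roughly like this in
hpet_detach_channel(), in place of the hpet_disable_channel() call added by
this version (a sketch only, not the actual v3 code):

        /* Leave ENABLE alone; just push the comparator ~2^32 ticks out. */
        hpet_write32(hpet_read32(HPET_COUNTER) - 1, HPET_Tn_CMP(ch->idx));
        ch->next_event = STIME_MAX;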

Jan