[PATCH v2] hw/intc/arm_gicv3: Update cached state after LPI state changes

Peter Maydell posted 1 patch 2 years, 4 months ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20211124202005.989935-1-peter.maydell@linaro.org
Maintainers: Peter Maydell <peter.maydell@linaro.org>
hw/intc/gicv3_internal.h   | 17 +++++++++++++++++
hw/intc/arm_gicv3.c        |  6 ++++--
hw/intc/arm_gicv3_redist.c | 14 ++++++++++----
3 files changed, 31 insertions(+), 6 deletions(-)
[PATCH v2] hw/intc/arm_gicv3: Update cached state after LPI state changes
Posted by Peter Maydell 2 years, 4 months ago
The logic of gicv3_redist_update() is as follows:
 * it must be called in any code path that changes the state of
   (only) redistributor interrupts
 * if it finds a redistributor interrupt that is (now) higher
   priority than the previous highest-priority pending interrupt,
   then this must be the new highest-priority pending interrupt
 * if it does *not* find a better redistributor interrupt, then:
    - if the previous state was "no interrupts pending" then
      the new state is still "no interrupts pending"
    - if the previous best interrupt was not a redistributor
      interrupt then that remains the best interrupt
    - if the previous best interrupt *was* a redistributor interrupt,
      then the new best interrupt must be some non-redistributor
      interrupt, but we don't know which so must do a full scan

In commit 17fb5e36aabd4b2c125 we effectively added the LPI interrupts
as a kind of "redistributor interrupt" for this purpose, by adding
cs->hpplpi to the set of things that gicv3_redist_update() considers
before it gives up and decides to do a full scan of distributor
interrupts. However we didn't quite get this right:
 * the condition check for "was the previous best interrupt a
   redistributor interrupt" must be updated to include LPIs
   in what it considers to be redistributor interrupts
 * every code path which updates the LPI state which
   gicv3_redist_update() checks must also call gicv3_redist_update():
   this is cs->hpplpi and the GICR_CTLR ENABLE_LPIS bit

This commit fixes this by:
 * correcting the test on cs->hppi.irq in gicv3_redist_update()
 * making gicv3_redist_update_lpi() always call gicv3_redist_update()
 * introducing a new gicv3_redist_update_lpi_only() for the one
   callsite (the post-load hook) which must not call
   gicv3_redist_update()
 * making gicv3_redist_lpi_pending() always call gicv3_redist_update(),
   either directly or via gicv3_redist_update_lpi()
 * removing a couple of now-unnecessary calls to gicv3_redist_update()
   from some callers of those two functions
 * calling gicv3_redist_update() when the GICR_CTLR ENABLE_LPIS
   bit is cleared

(This means that the not-file-local gicv3_redist_* LPI related
functions now all take care of the updates of internally cached
GICv3 information, in the same way the older functions
gicv3_redist_set_irq() and gicv3_redist_send_sgi() do.)

The visible effect of this bug was that when the guest acknowledged
an LPI by reading ICC_IAR1_EL1, we marked it as not pending in the
LPI data structure but still left it in cs->hppi so we would offer it
to the guest again.  In particular for setups using an emulated GICv3
and ITS and using devices which use LPIs (ie PCI devices) a Linux
guest would complain "irq 54: nobody cared" and then hang.  (The hang
was intermittent, presumably depending on the timing between
different interrupts arriving and being completed.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
I think this is now a proper fix for the problem. Testing
definitely welcomed... The commit message makes it sound like a bit
of a "several things in one patch" change, but it isn't really IMHO:
I just erred on the side of being very verbose in the description...
---
 hw/intc/gicv3_internal.h   | 17 +++++++++++++++++
 hw/intc/arm_gicv3.c        |  6 ++++--
 hw/intc/arm_gicv3_redist.c | 14 ++++++++++----
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index a0369dace7b..70f34ee4955 100644
--- a/hw/intc/gicv3_internal.h
+++ b/hw/intc/gicv3_internal.h
@@ -463,7 +463,24 @@ void gicv3_dist_set_irq(GICv3State *s, int irq, int level);
 void gicv3_redist_set_irq(GICv3CPUState *cs, int irq, int level);
 void gicv3_redist_process_lpi(GICv3CPUState *cs, int irq, int level);
 void gicv3_redist_lpi_pending(GICv3CPUState *cs, int irq, int level);
+/**
+ * gicv3_redist_update_lpi:
+ * @cs: GICv3CPUState
+ *
+ * Scan the LPI pending table and recalculate the highest priority
+ * pending LPI and also the overall highest priority pending interrupt.
+ */
 void gicv3_redist_update_lpi(GICv3CPUState *cs);
+/**
+ * gicv3_redist_update_lpi_only:
+ * @cs: GICv3CPUState
+ *
+ * Scan the LPI pending table and recalculate cs->hpplpi only,
+ * without calling gicv3_redist_update() to recalculate the overall
+ * highest priority pending interrupt. This should be called after
+ * an incoming migration has loaded new state.
+ */
+void gicv3_redist_update_lpi_only(GICv3CPUState *cs);
 void gicv3_redist_send_sgi(GICv3CPUState *cs, int grp, int irq, bool ns);
 void gicv3_init_cpuif(GICv3State *s);
 
diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
index c6282984b1e..9f5f815db9b 100644
--- a/hw/intc/arm_gicv3.c
+++ b/hw/intc/arm_gicv3.c
@@ -186,7 +186,9 @@ static void gicv3_redist_update_noirqset(GICv3CPUState *cs)
      * interrupt has reduced in priority and any other interrupt could
      * now be the new best one).
      */
-    if (!seenbetter && cs->hppi.prio != 0xff && cs->hppi.irq < GIC_INTERNAL) {
+    if (!seenbetter && cs->hppi.prio != 0xff &&
+        (cs->hppi.irq < GIC_INTERNAL ||
+         cs->hppi.irq >= GICV3_LPI_INTID_START)) {
         gicv3_full_update_noirqset(cs->gic);
     }
 }
@@ -354,7 +356,7 @@ static void arm_gicv3_post_load(GICv3State *s)
      * pending interrupt, but don't set IRQ or FIQ lines.
      */
     for (i = 0; i < s->num_cpu; i++) {
-        gicv3_redist_update_lpi(&s->cpu[i]);
+        gicv3_redist_update_lpi_only(&s->cpu[i]);
     }
     gicv3_full_update_noirqset(s);
     /* Repopulate the cache of GICv3CPUState pointers for target CPUs */
diff --git a/hw/intc/arm_gicv3_redist.c b/hw/intc/arm_gicv3_redist.c
index 424e7e28a86..c8ff3eca085 100644
--- a/hw/intc/arm_gicv3_redist.c
+++ b/hw/intc/arm_gicv3_redist.c
@@ -256,9 +256,10 @@ static MemTxResult gicr_writel(GICv3CPUState *cs, hwaddr offset,
                 cs->gicr_ctlr |= GICR_CTLR_ENABLE_LPIS;
                 /* Check for any pending interr in pending table */
                 gicv3_redist_update_lpi(cs);
-                gicv3_redist_update(cs);
             } else {
                 cs->gicr_ctlr &= ~GICR_CTLR_ENABLE_LPIS;
+                /* cs->hppi might have been an LPI; recalculate */
+                gicv3_redist_update(cs);
             }
         }
         return MEMTX_OK;
@@ -571,7 +572,7 @@ static void gicv3_redist_check_lpi_priority(GICv3CPUState *cs, int irq)
     }
 }
 
-void gicv3_redist_update_lpi(GICv3CPUState *cs)
+void gicv3_redist_update_lpi_only(GICv3CPUState *cs)
 {
     /*
      * This function scans the LPI pending table and for each pending
@@ -614,6 +615,12 @@ void gicv3_redist_update_lpi(GICv3CPUState *cs)
     }
 }
 
+void gicv3_redist_update_lpi(GICv3CPUState *cs)
+{
+    gicv3_redist_update_lpi_only(cs);
+    gicv3_redist_update(cs);
+}
+
 void gicv3_redist_lpi_pending(GICv3CPUState *cs, int irq, int level)
 {
     /*
@@ -651,6 +658,7 @@ void gicv3_redist_lpi_pending(GICv3CPUState *cs, int irq, int level)
      */
     if (level) {
         gicv3_redist_check_lpi_priority(cs, irq);
+        gicv3_redist_update(cs);
     } else {
         if (irq == cs->hpplpi.irq) {
             gicv3_redist_update_lpi(cs);
@@ -673,8 +681,6 @@ void gicv3_redist_process_lpi(GICv3CPUState *cs, int irq, int level)
 
     /* set/clear the pending bit for this irq */
     gicv3_redist_lpi_pending(cs, irq, level);
-
-    gicv3_redist_update(cs);
 }
 
 void gicv3_redist_set_irq(GICv3CPUState *cs, int irq, int level)
-- 
2.25.1


Re: [PATCH v2] hw/intc/arm_gicv3: Update cached state after LPI state changes
Posted by Alex Bennée 2 years, 4 months ago
Peter Maydell <peter.maydell@linaro.org> writes:

> The logic of gicv3_redist_update() is as follows:
>  * it must be called in any code path that changes the state of
>    (only) redistributor interrupts
>  * if it finds a redistributor interrupt that is (now) higher
>    priority than the previous highest-priority pending interrupt,
>    then this must be the new highest-priority pending interrupt
>  * if it does *not* find a better redistributor interrupt, then:
>     - if the previous state was "no interrupts pending" then
>       the new state is still "no interrupts pending"
>     - if the previous best interrupt was not a redistributor
>       interrupt then that remains the best interrupt
>     - if the previous best interrupt *was* a redistributor interrupt,
>       then the new best interrupt must be some non-redistributor
>       interrupt, but we don't know which so must do a full scan
>
> In commit 17fb5e36aabd4b2c125 we effectively added the LPI interrupts
> as a kind of "redistributor interrupt" for this purpose, by adding
> cs->hpplpi to the set of things that gicv3_redist_update() considers
> before it gives up and decides to do a full scan of distributor
> interrupts. However we didn't quite get this right:
>  * the condition check for "was the previous best interrupt a
>    redistributor interrupt" must be updated to include LPIs
>    in what it considers to be redistributor interrupts
>  * every code path which updates the LPI state which
>    gicv3_redist_update() checks must also call gicv3_redist_update():
>    this is cs->hpplpi and the GICR_CTLR ENABLE_LPIS bit
>
> This commit fixes this by:
>  * correcting the test on cs->hppi.irq in gicv3_redist_update()
>  * making gicv3_redist_update_lpi() always call gicv3_redist_update()
>  * introducing a new gicv3_redist_update_lpi_only() for the one
>    callsite (the post-load hook) which must not call
>    gicv3_redist_update()
>  * making gicv3_redist_lpi_pending() always call gicv3_redist_update(),
>    either directly or via gicv3_redist_update_lpi()
>  * removing a couple of now-unnecessary calls to gicv3_redist_update()
>    from some callers of those two functions
>  * calling gicv3_redist_update() when the GICR_CTLR ENABLE_LPIS
>    bit is cleared
>
> (This means that the not-file-local gicv3_redist_* LPI related
> functions now all take care of the updates of internally cached
> GICv3 information, in the same way the older functions
> gicv3_redist_set_irq() and gicv3_redist_send_sgi() do.)
>
> The visible effect of this bug was that when the guest acknowledged
> an LPI by reading ICC_IAR1_EL1, we marked it as not pending in the
> LPI data structure but still left it in cs->hppi so we would offer it
> to the guest again.  In particular for setups using an emulated GICv3
> and ITS and using devices which use LPIs (ie PCI devices) a Linux
> guest would complain "irq 54: nobody cared" and then hang.  (The hang
> was intermittent, presumably depending on the timing between
> different interrupts arriving and being completed.)
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Tested-by: Alex Bennée <alex.bennee@linaro.org>

Interestingly this also triggers an extra IRQ in v4 of my kvm-unit-test
ITS patches. However it works with v3 which was more limited in the
excising of the test:

v3:

--8<---------------cut here---------------start------------->8---
modified   arm/gic.c
@@ -732,21 +732,17 @@ static void test_its_trigger(void)
 			"dev2/eventid=20 does not trigger any LPI");
 
 	/*
-	 * re-enable the LPI but willingly do not call invall
-	 * so the change in config is not taken into account.
-	 * The LPI should not hit
+	 * re-enable the LPI. While "A change to the LPI configuration
+	 * is not guaranteed to be visible until an appropriate
+	 * invalidation operation has completed" hardware that doesn't
+	 * implement caches may have delivered the event at any point
+	 * after the enabling. Check the LPI has hit by the time the
+	 * invall is done.
 	 */
 	gicv3_lpi_set_config(8195, LPI_PROP_DEFAULT);
 	stats_reset();
 	cpumask_clear(&mask);
 	its_send_int(dev2, 20);
-	wait_for_interrupts(&mask);
-	report(check_acked(&mask, -1, -1),
-			"dev2/eventid=20 still does not trigger any LPI");
-
-	/* Now call the invall and check the LPI hits */
-	stats_reset();
-	cpumask_clear(&mask);
 	cpumask_set_cpu(3, &mask);
 	its_send_invall(col3);
 	wait_for_interrupts(&mask);
--8<---------------cut here---------------end--------------->8---

v4:

--8<---------------cut here---------------start------------->8---
modified   arm/gic.c
@@ -732,34 +732,22 @@ static void test_its_trigger(void)
 			"dev2/eventid=20 does not trigger any LPI");
 
 	/*
-	 * re-enable the LPI but willingly do not call invall
-	 * so the change in config is not taken into account.
-	 * The LPI should not hit
+	 * re-enable the LPI. While "A change to the LPI configuration
+	 * is not guaranteed to be visible until an appropriate
+	 * invalidation operation has completed" hardware that doesn't
+	 * implement caches may have delivered the event at any point
+	 * after the enabling. Check the LPI has hit by the time the
+	 * invall is done.
 	 */
-	gicv3_lpi_set_config(8195, LPI_PROP_DEFAULT);
-	stats_reset();
-	cpumask_clear(&mask);
-	its_send_int(dev2, 20);
-	wait_for_interrupts(&mask);
-	report(check_acked(&mask, -1, -1),
-			"dev2/eventid=20 still does not trigger any LPI");
-
-	/* Now call the invall and check the LPI hits */
 	stats_reset();
-	cpumask_clear(&mask);
-	cpumask_set_cpu(3, &mask);
+	gicv3_lpi_set_config(8195, LPI_PROP_DEFAULT);
 	its_send_invall(col3);
-	wait_for_interrupts(&mask);
-	report(check_acked(&mask, 0, 8195),
-			"dev2/eventid=20 pending LPI is received");
-
-	stats_reset();
 	cpumask_clear(&mask);
 	cpumask_set_cpu(3, &mask);
 	its_send_int(dev2, 20);
 	wait_for_interrupts(&mask);
 	report(check_acked(&mask, 0, 8195),
-			"dev2/eventid=20 now triggers an LPI");
+			"dev2/eventid=20 triggers an LPI");
 
 	report_prefix_pop();
 
--8<---------------cut here---------------end--------------->8---

I think my v3 was correct and the v4 is too aggressive as I was chasing
a regression in the QEMU code.

> ---
> I think this is now a proper fix for the problem. Testing
> definitely welcomed... The commit message makes it sound like a bit
> of a "several things in one patch" change, but it isn't really IMHO:
> I just erred on the side of being very verbose in the description...
> ---
>  hw/intc/gicv3_internal.h   | 17 +++++++++++++++++
>  hw/intc/arm_gicv3.c        |  6 ++++--
>  hw/intc/arm_gicv3_redist.c | 14 ++++++++++----
>  3 files changed, 31 insertions(+), 6 deletions(-)
>
> diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
> index a0369dace7b..70f34ee4955 100644
> --- a/hw/intc/gicv3_internal.h
> +++ b/hw/intc/gicv3_internal.h
> @@ -463,7 +463,24 @@ void gicv3_dist_set_irq(GICv3State *s, int irq, int level);
>  void gicv3_redist_set_irq(GICv3CPUState *cs, int irq, int level);
>  void gicv3_redist_process_lpi(GICv3CPUState *cs, int irq, int level);
>  void gicv3_redist_lpi_pending(GICv3CPUState *cs, int irq, int level);
> +/**
> + * gicv3_redist_update_lpi:
> + * @cs: GICv3CPUState
> + *
> + * Scan the LPI pending table and recalculate the highest priority
> + * pending LPI and also the overall highest priority pending interrupt.
> + */
>  void gicv3_redist_update_lpi(GICv3CPUState *cs);
> +/**
> + * gicv3_redist_update_lpi_only:
> + * @cs: GICv3CPUState
> + *
> + * Scan the LPI pending table and recalculate cs->hpplpi only,
> + * without calling gicv3_redist_update() to recalculate the overall
> + * highest priority pending interrupt. This should be called after
> + * an incoming migration has loaded new state.
> + */
> +void gicv3_redist_update_lpi_only(GICv3CPUState *cs);

good commenting ;-)

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

-- 
Alex Bennée