[PATCH] x86/dpci: do not leak pending interrupts on CPU offline

Roger Pau Monne posted 1 patch 1 month, 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20241003142036.43287-1-roger.pau@citrix.com
xen/drivers/passthrough/x86/hvm.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
[PATCH] x86/dpci: do not leak pending interrupts on CPU offline
Posted by Roger Pau Monne 1 month, 3 weeks ago
The current dpci logic relies on a softirq being executed as a side effect of
the cpu_notifier_call_chain() call in the code path that offlines the target
CPU.  However the call to cpu_notifier_call_chain() won't trigger any softirq
processing, and even if it did, such processing should be done after all
interrupts have been migrated off the current CPU, otherwise new pending dpci
interrupts could still appear.

Current ASSERT in the cpu callback notifier is fairly easy to trigger by doing
CPU offline from a PVH dom0.

Solve this by instead moving out any dpci interrupts pending processing once
the CPU is dead.  This might introduce more latency than attempting to drain
before the CPU is put offline, but it's less complex, and CPU online/offline is
not a common action.  Any extra introduced latency should be tolerable.

Fixes: f6dd295381f4 ('dpci: replace tasklet with softirq')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/drivers/passthrough/x86/hvm.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index d3627e4af71b..f5faff7a499a 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -1105,23 +1105,27 @@ static int cf_check cpu_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
     unsigned int cpu = (unsigned long)hcpu;
+    unsigned long flags;
 
     switch ( action )
     {
     case CPU_UP_PREPARE:
         INIT_LIST_HEAD(&per_cpu(dpci_list, cpu));
         break;
+
     case CPU_UP_CANCELED:
-    case CPU_DEAD:
-        /*
-         * On CPU_DYING this callback is called (on the CPU that is dying)
-         * with an possible HVM_DPIC_SOFTIRQ pending - at which point we can
-         * clear out any outstanding domains (by the virtue of the idle loop
-         * calling the softirq later). In CPU_DEAD case the CPU is deaf and
-         * there are no pending softirqs for us to handle so we can chill.
-         */
         ASSERT(list_empty(&per_cpu(dpci_list, cpu)));
         break;
+
+    case CPU_DEAD:
+        if ( list_empty(&per_cpu(dpci_list, cpu)) )
+            break;
+        /* Take whatever dpci interrupts are pending on the dead CPU. */
+        local_irq_save(flags);
+        list_splice_init(&per_cpu(dpci_list, cpu), &this_cpu(dpci_list));
+        local_irq_restore(flags);
+        raise_softirq(HVM_DPCI_SOFTIRQ);
+        break;
     }
 
     return NOTIFY_DONE;
-- 
2.46.0


Re: [PATCH] x86/dpci: do not leak pending interrupts on CPU offline
Posted by Andrew Cooper 1 month, 3 weeks ago
On 03/10/2024 3:20 pm, Roger Pau Monne wrote:
> The current dpci logic relies on a softirq being executed as a side effect of
> the cpu_notifier_call_chain() call in the code path that offlines the target
> CPU.  However the call to cpu_notifier_call_chain() won't trigger any softirq
> processing, and even if it did, such processing should be done after all
> interrupts have been migrated off the current CPU, otherwise new pending dpci
> interrupts could still appear.
>
> Current ASSERT in

"Currently the ASSERT() in"

>  the cpu callback notifier is fairly easy to trigger by doing
> CPU offline from a PVH dom0.
>
> Solve this by instead moving out any dpci interrupts pending processing once
> the CPU is dead.  This might introduce more latency than attempting to drain
> before the CPU is put offline, but it's less complex, and CPU online/offline is
> not a common action.  Any extra introduced latency should be tolerable.
>
> Fixes: f6dd295381f4 ('dpci: replace tasklet with softirq')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Yeah, I'm not concerned with minor extra latency in the offline path. 
In production it's used 0% of the time to many many significant figures.

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>