[PATCH v5 2/6] hw/loongarch/virt: Fix error handling in cpu plug

Bibo Mao posted 6 patches 2 weeks ago
There is a newer version of this series
[PATCH v5 2/6] hw/loongarch/virt: Fix error handling in cpu plug
Posted by Bibo Mao 2 weeks ago
In function virt_cpu_plug(), it will send cpu plug message to interrupt
controller extioi and ipi irqchip. If there is problem in this function,
system should continue to run and keep state the same before cpu is
added.

Object cpuslot::cpu is set at last only when there is no any error.
If there is, send cpu unplug message to extioi and ipi irqchip, and then
return immediately.

Fixes: ab9935d2991e (hw/loongarch/virt: Implement cpu plug interface)
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
 hw/loongarch/virt.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index a5840ff968..5118f01e4b 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -981,8 +981,6 @@ static void virt_cpu_plug(HotplugHandler *hotplug_dev,
     LoongArchVirtMachineState *lvms = LOONGARCH_VIRT_MACHINE(hotplug_dev);
     Error *err = NULL;
 
-    cpu_slot = virt_find_cpu_slot(MACHINE(lvms), cpu->phy_id);
-    cpu_slot->cpu = CPU(dev);
     if (lvms->ipi) {
         hotplug_handler_plug(HOTPLUG_HANDLER(lvms->ipi), dev, &err);
         if (err) {
@@ -995,6 +993,10 @@ static void virt_cpu_plug(HotplugHandler *hotplug_dev,
         hotplug_handler_plug(HOTPLUG_HANDLER(lvms->extioi), dev, &err);
         if (err) {
             error_propagate(errp, err);
+            if (lvms->ipi) {
+                /* Send unplug message to restore, discard error here */
+                hotplug_handler_unplug(HOTPLUG_HANDLER(lvms->ipi), dev, NULL);
+            }
             return;
         }
     }
@@ -1003,9 +1005,20 @@ static void virt_cpu_plug(HotplugHandler *hotplug_dev,
         hotplug_handler_plug(HOTPLUG_HANDLER(lvms->acpi_ged), dev, &err);
         if (err) {
             error_propagate(errp, err);
+            if (lvms->ipi) {
+                hotplug_handler_unplug(HOTPLUG_HANDLER(lvms->ipi), dev, NULL);
+            }
+
+            if (lvms->extioi) {
+                hotplug_handler_unplug(HOTPLUG_HANDLER(lvms->extioi),
+                                       dev, NULL);
+            }
+            return;
         }
     }
 
+    cpu_slot = virt_find_cpu_slot(MACHINE(lvms), cpu->phy_id);
+    cpu_slot->cpu = CPU(dev);
     return;
 }
 
-- 
2.39.3
Re: [PATCH v5 2/6] hw/loongarch/virt: Fix error handling in cpu plug
Posted by Markus Armbruster 2 weeks ago
Bibo Mao <maobibo@loongson.cn> writes:

> In function virt_cpu_plug(), it will send cpu plug message to interrupt
> controller extioi and ipi irqchip. If there is problem in this function,
> system should continue to run and keep state the same before cpu is
> added.
>
> Object cpuslot::cpu is set at last only when there is no any error.
> If there is, send cpu unplug message to extioi and ipi irqchip, and then
> return immediately.
>
> Fixes: ab9935d2991e (hw/loongarch/virt: Implement cpu plug interface)
> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> ---
>  hw/loongarch/virt.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
> index a5840ff968..5118f01e4b 100644
> --- a/hw/loongarch/virt.c
> +++ b/hw/loongarch/virt.c
> @@ -981,8 +981,6 @@ static void virt_cpu_plug(HotplugHandler *hotplug_dev,
>      LoongArchVirtMachineState *lvms = LOONGARCH_VIRT_MACHINE(hotplug_dev);
>      Error *err = NULL;
>  
> -    cpu_slot = virt_find_cpu_slot(MACHINE(lvms), cpu->phy_id);
> -    cpu_slot->cpu = CPU(dev);
>      if (lvms->ipi) {
>          hotplug_handler_plug(HOTPLUG_HANDLER(lvms->ipi), dev, &err);
>          if (err) {
> @@ -995,6 +993,10 @@ static void virt_cpu_plug(HotplugHandler *hotplug_dev,
>          hotplug_handler_plug(HOTPLUG_HANDLER(lvms->extioi), dev, &err);
>          if (err) {
>              error_propagate(errp, err);
> +            if (lvms->ipi) {
> +                /* Send unplug message to restore, discard error here */
> +                hotplug_handler_unplug(HOTPLUG_HANDLER(lvms->ipi), dev, NULL);
> +            }
>              return;
>          }
>      }
> @@ -1003,9 +1005,20 @@ static void virt_cpu_plug(HotplugHandler *hotplug_dev,
>          hotplug_handler_plug(HOTPLUG_HANDLER(lvms->acpi_ged), dev, &err);
>          if (err) {
>              error_propagate(errp, err);
> +            if (lvms->ipi) {
> +                hotplug_handler_unplug(HOTPLUG_HANDLER(lvms->ipi), dev, NULL);
> +            }
> +
> +            if (lvms->extioi) {
> +                hotplug_handler_unplug(HOTPLUG_HANDLER(lvms->extioi),
> +                                       dev, NULL);
> +            }
> +            return;
>          }
>      }
>  
> +    cpu_slot = virt_find_cpu_slot(MACHINE(lvms), cpu->phy_id);
> +    cpu_slot->cpu = CPU(dev);
>      return;
>  }

Hmm.

You're right about the problem: virt_cpu_plug() neglects to revert
changes when it fails.

You're probably right to move the assignment to cpu_slot->cpu to the
end.  Anything you can delay until success is assured you don't have to
revert.  I say "probably" because the code that now runs before the
assignment might theoretically "see" the assignment, and I didn't
examine it to exclude that.

Where I have doubts is the code to revert changes.

The hotplug_handler_plug() error checkign suggests it can fail.

Can hotplug_handler_unplug() fail, too?  The error checking in
virt_cpu_unplug() right above suggests it can.

What happens if it fails in virt_cpu_plug()?
Re: [PATCH v5 2/6] hw/loongarch/virt: Fix error handling in cpu plug
Posted by bibo mao 2 weeks ago

On 2025/3/20 下午2:16, Markus Armbruster wrote:
> Bibo Mao <maobibo@loongson.cn> writes:
> 
>> In function virt_cpu_plug(), it will send cpu plug message to interrupt
>> controller extioi and ipi irqchip. If there is problem in this function,
>> system should continue to run and keep state the same before cpu is
>> added.
>>
>> Object cpuslot::cpu is set at last only when there is no any error.
>> If there is, send cpu unplug message to extioi and ipi irqchip, and then
>> return immediately.
>>
>> Fixes: ab9935d2991e (hw/loongarch/virt: Implement cpu plug interface)
>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
>> ---
>>   hw/loongarch/virt.c | 17 +++++++++++++++--
>>   1 file changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
>> index a5840ff968..5118f01e4b 100644
>> --- a/hw/loongarch/virt.c
>> +++ b/hw/loongarch/virt.c
>> @@ -981,8 +981,6 @@ static void virt_cpu_plug(HotplugHandler *hotplug_dev,
>>       LoongArchVirtMachineState *lvms = LOONGARCH_VIRT_MACHINE(hotplug_dev);
>>       Error *err = NULL;
>>   
>> -    cpu_slot = virt_find_cpu_slot(MACHINE(lvms), cpu->phy_id);
>> -    cpu_slot->cpu = CPU(dev);
>>       if (lvms->ipi) {
>>           hotplug_handler_plug(HOTPLUG_HANDLER(lvms->ipi), dev, &err);
>>           if (err) {
>> @@ -995,6 +993,10 @@ static void virt_cpu_plug(HotplugHandler *hotplug_dev,
>>           hotplug_handler_plug(HOTPLUG_HANDLER(lvms->extioi), dev, &err);
>>           if (err) {
>>               error_propagate(errp, err);
>> +            if (lvms->ipi) {
>> +                /* Send unplug message to restore, discard error here */
>> +                hotplug_handler_unplug(HOTPLUG_HANDLER(lvms->ipi), dev, NULL);
>> +            }
>>               return;
>>           }
>>       }
>> @@ -1003,9 +1005,20 @@ static void virt_cpu_plug(HotplugHandler *hotplug_dev,
>>           hotplug_handler_plug(HOTPLUG_HANDLER(lvms->acpi_ged), dev, &err);
>>           if (err) {
>>               error_propagate(errp, err);
>> +            if (lvms->ipi) {
>> +                hotplug_handler_unplug(HOTPLUG_HANDLER(lvms->ipi), dev, NULL);
>> +            }
>> +
>> +            if (lvms->extioi) {
>> +                hotplug_handler_unplug(HOTPLUG_HANDLER(lvms->extioi),
>> +                                       dev, NULL);
>> +            }
>> +            return;
>>           }
>>       }
>>   
>> +    cpu_slot = virt_find_cpu_slot(MACHINE(lvms), cpu->phy_id);
>> +    cpu_slot->cpu = CPU(dev);
>>       return;
>>   }
> 
> Hmm.
> 
> You're right about the problem: virt_cpu_plug() neglects to revert
> changes when it fails.
> 
> You're probably right to move the assignment to cpu_slot->cpu to the
> end.  Anything you can delay until success is assured you don't have to
> revert.  I say "probably" because the code that now runs before the
> assignment might theoretically "see" the assignment, and I didn't
> examine it to exclude that.
> 
> Where I have doubts is the code to revert changes.
> 
> The hotplug_handler_plug() error checkign suggests it can fail.
> 
> Can hotplug_handler_unplug() fail, too?  The error checking in
> virt_cpu_unplug() right above suggests it can.
Basically from existing code about ipi/extioi hotplug handler, it is
impossible to there is error, here is only for future use.

If there is error in function virt_cpu_plug(), undo() such as 
hotplug_handler_unplug() should be called. However if undo() reports 
error, I do not know how to handle it, here just discard error in 
function undo().

Regards
Bibo Mao
> 
> What happens if it fails in virt_cpu_plug()?
> 


Re: [PATCH v5 2/6] hw/loongarch/virt: Fix error handling in cpu plug
Posted by Markus Armbruster 2 weeks ago
bibo mao <maobibo@loongson.cn> writes:

On 2025/3/20 下午2:16, Markus Armbruster wrote:
>> Bibo Mao <maobibo@loongson.cn> writes:
>> 
>>> In function virt_cpu_plug(), it will send cpu plug message to interrupt
>>> controller extioi and ipi irqchip. If there is problem in this function,
>>> system should continue to run and keep state the same before cpu is
>>> added.
>>>
>>> Object cpuslot::cpu is set at last only when there is no any error.
>>> If there is, send cpu unplug message to extioi and ipi irqchip, and then
>>> return immediately.
>>>
>>> Fixes: ab9935d2991e (hw/loongarch/virt: Implement cpu plug interface)
>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>

[...]

>> Hmm.
>> 
>> You're right about the problem: virt_cpu_plug() neglects to revert
>> changes when it fails.
>> 
>> You're probably right to move the assignment to cpu_slot->cpu to the
>> end.  Anything you can delay until success is assured you don't have to
>> revert.  I say "probably" because the code that now runs before the
>> assignment might theoretically "see" the assignment, and I didn't
>> examine it to exclude that.
>> 
>> Where I have doubts is the code to revert changes.
>> 
>> The hotplug_handler_plug() error checkign suggests it can fail.
>> 
>> Can hotplug_handler_unplug() fail, too?  The error checking in
>> virt_cpu_unplug() right above suggests it can.
>
> Basically from existing code about ipi/extioi hotplug handler, it is
> impossible to there is error, here is only for future use.

Aha.  More at the end of my reply.

> If there is error in function virt_cpu_plug(), undo() such as 
> hotplug_handler_unplug() should be called. However if undo() reports 
> error, I do not know how to handle it, here just discard error in 
> function undo().

Steinbach's Guideline for Systems Programming: Never test for an error
condition you don't know how to handle.

This old quip is a funny way to say that errors we don't know how to
handle are *bad*, and should be avoided.

> Regards
> Bibo Mao
>> 
>> What happens if it fails in virt_cpu_plug()?

You assure us this can't happen today.  Because of that, broken error
recovery is not an actual problem.

However, if things change some day so it can happen, broken error
recovery becomes an actual problem.

so, broken error recovery just "for future use" is actually just for
silent future breakage.

But is it broken?  This is what I'm trying to find out with my "what
happens if" question.

If it is broken, then passing &error_abort would likely be less bad:
crash instead of silent breakage.  Also makes it completely obvious in
the code that these errors are not handled, whereas broken error
handling looks like it is until you actually think about it.
Re: [PATCH v5 2/6] hw/loongarch/virt: Fix error handling in cpu plug
Posted by bibo mao 2 weeks ago

On 2025/3/20 下午3:25, Markus Armbruster wrote:
> bibo mao <maobibo@loongson.cn> writes:
> 
> On 2025/3/20 下午2:16, Markus Armbruster wrote:
>>> Bibo Mao <maobibo@loongson.cn> writes:
>>>
>>>> In function virt_cpu_plug(), it will send cpu plug message to interrupt
>>>> controller extioi and ipi irqchip. If there is problem in this function,
>>>> system should continue to run and keep state the same before cpu is
>>>> added.
>>>>
>>>> Object cpuslot::cpu is set at last only when there is no any error.
>>>> If there is, send cpu unplug message to extioi and ipi irqchip, and then
>>>> return immediately.
>>>>
>>>> Fixes: ab9935d2991e (hw/loongarch/virt: Implement cpu plug interface)
>>>> Signed-off-by: Bibo Mao <maobibo@loongson.cn>
> 
> [...]
> 
>>> Hmm.
>>>
>>> You're right about the problem: virt_cpu_plug() neglects to revert
>>> changes when it fails.
>>>
>>> You're probably right to move the assignment to cpu_slot->cpu to the
>>> end.  Anything you can delay until success is assured you don't have to
>>> revert.  I say "probably" because the code that now runs before the
>>> assignment might theoretically "see" the assignment, and I didn't
>>> examine it to exclude that.
>>>
>>> Where I have doubts is the code to revert changes.
>>>
>>> The hotplug_handler_plug() error checkign suggests it can fail.
>>>
>>> Can hotplug_handler_unplug() fail, too?  The error checking in
>>> virt_cpu_unplug() right above suggests it can.
>>
>> Basically from existing code about ipi/extioi hotplug handler, it is
>> impossible to there is error, here is only for future use.
> 
> Aha.  More at the end of my reply.
> 
>> If there is error in function virt_cpu_plug(), undo() such as
>> hotplug_handler_unplug() should be called. However if undo() reports
>> error, I do not know how to handle it, here just discard error in
>> function undo().
> 
> Steinbach's Guideline for Systems Programming: Never test for an error
> condition you don't know how to handle.
> 
> This old quip is a funny way to say that errors we don't know how to
> handle are *bad*, and should be avoided.
> 
>> Regards
>> Bibo Mao
>>>
>>> What happens if it fails in virt_cpu_plug()?
> 
> You assure us this can't happen today.  Because of that, broken error
> recovery is not an actual problem.
> 
> However, if things change some day so it can happen, broken error
> recovery becomes an actual problem.
> 
> so, broken error recovery just "for future use" is actually just for
> silent future breakage.
> 
> But is it broken?  This is what I'm trying to find out with my "what
> happens if" question.
> 
> If it is broken, then passing &error_abort would likely be less bad:
> crash instead of silent breakage.  Also makes it completely obvious in
> the code that these errors are not handled, whereas broken error
> handling looks like it is until you actually think about it.
> 
yes, it seems that &error_abort is better than NULL, it is better than 
slient and do nothing. If error really happens, we had to solve it then.

And I will refresh the patch in next version.

Regards
Bibo Mao