[PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events

Daniel Henrique Barboza posted 4 patches 3 years, 1 month ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20210312200740.815014-1-danielhb413@gmail.com
Maintainers: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, "Daniel P. Berrangé" <berrange@redhat.com>, Greg Kurz <groug@kaod.org>, Eduardo Habkost <ehabkost@redhat.com>, Eric Blake <eblake@redhat.com>, Markus Armbruster <armbru@redhat.com>, David Gibson <david@gibson.dropbear.id.au>
hw/ppc/spapr.c     |  2 +-
hw/ppc/spapr_drc.c |  8 ++++++++
qapi/machine.json  | 23 +++++++++++++++++++++++
qapi/qdev.json     | 28 ++++++++++++++++++++++++++++
4 files changed, 60 insertions(+), 1 deletion(-)
[PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events
Posted by Daniel Henrique Barboza 3 years, 1 month ago
Hi,

This series adds 2 new QAPI events, DEVICE_NOT_DELETED and
DEVICE_UNPLUG_ERROR. They were (and are still being) discussed in [1].

Patches 1 and 3 are independent of the ppc patches and can be applied
separately. Patches 2 and 4 are based on David's ppc-for-6.0 branch and
are dependent on the QAPI patches.


[1] https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01900.html

Daniel Henrique Barboza (4):
  qapi/qdev.json: add DEVICE_NOT_DELETED event
  spapr_drc.c: send DEVICE_NOT_DELETED event on unplug timeout
  qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event
  spapr.c: use DEVICE_UNPLUG_ERROR event in
    spapr_memory_unplug_rollback()

 hw/ppc/spapr.c     |  2 +-
 hw/ppc/spapr_drc.c |  8 ++++++++
 qapi/machine.json  | 23 +++++++++++++++++++++++
 qapi/qdev.json     | 28 ++++++++++++++++++++++++++++
 4 files changed, 60 insertions(+), 1 deletion(-)

-- 
2.29.2


Re: [PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events
Posted by David Gibson 3 years, 1 month ago
On Fri, Mar 12, 2021 at 05:07:36PM -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> This series adds 2 new QAPI events, DEVICE_NOT_DELETED and
> DEVICE_UNPLUG_ERROR. They were (and are still being) discussed in [1].
> 
> Patches 1 and 3 are independent of the ppc patches and can be applied
> separately. Patches 2 and 4 are based on David's ppc-for-6.0 branch and
> are dependent on the QAPI patches.

Implementation looks fine, but I think there's a bit more to discuss
before we can apply.

I think it would make sense to re-order this and put UNPLUG_ERROR
first.  Its semantics are clearer, and I think there's a stronger case
for it.

I'm a bit less sold on DEVICE_NOT_DELETED, after consideration.  Does
it really tell the user/management anything useful beyond what
receiving neither a DEVICE_DELETED nor a DEVICE_UNPLUG_ERROR does?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events
Posted by Daniel Henrique Barboza 3 years, 1 month ago

On 3/22/21 10:12 PM, David Gibson wrote:
> On Fri, Mar 12, 2021 at 05:07:36PM -0300, Daniel Henrique Barboza wrote:
>> Hi,
>>
>> This series adds 2 new QAPI events, DEVICE_NOT_DELETED and
>> DEVICE_UNPLUG_ERROR. They were (and are still being) discussed in [1].
>>
>> Patches 1 and 3 are independent of the ppc patches and can be applied
>> separately. Patches 2 and 4 are based on David's ppc-for-6.0 branch and
>> are dependent on the QAPI patches.
> 
> Implementation looks fine, but I think there's a bit more to discuss
> before we can apply.
> 
> I think it would make sense to re-order this and put UNPLUG_ERROR
> first.  Its semantics are clearer, and I think there's a stronger case
> for it.

Alright

> 
> I'm a bit less sold on DEVICE_NOT_DELETED, after consideration.  Does
> it really tell the user/management anything useful beyond what
> receiving neither a DEVICE_DELETED nor a DEVICE_UNPLUG_ERROR does?


It informs that the hotunplug operation exceed the timeout that QEMU
internals considers adequate, but QEMU can't assert that it was caused
by an error or an unexpected delay. The end result is that the device
is not going to be deleted from QMP, so DEVICE_NOT_DELETED.

Perhaps we should just be straightforward and create a DEVICE_UNPLUG_TIMEOUT
event.



Thanks,


DHB


> 

Re: [PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events
Posted by David Gibson 3 years, 1 month ago
On Tue, Mar 23, 2021 at 02:10:22PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 3/22/21 10:12 PM, David Gibson wrote:
> > On Fri, Mar 12, 2021 at 05:07:36PM -0300, Daniel Henrique Barboza wrote:
> > > Hi,
> > > 
> > > This series adds 2 new QAPI events, DEVICE_NOT_DELETED and
> > > DEVICE_UNPLUG_ERROR. They were (and are still being) discussed in [1].
> > > 
> > > Patches 1 and 3 are independent of the ppc patches and can be applied
> > > separately. Patches 2 and 4 are based on David's ppc-for-6.0 branch and
> > > are dependent on the QAPI patches.
> > 
> > Implementation looks fine, but I think there's a bit more to discuss
> > before we can apply.
> > 
> > I think it would make sense to re-order this and put UNPLUG_ERROR
> > first.  Its semantics are clearer, and I think there's a stronger case
> > for it.
> 
> Alright
> 
> > 
> > I'm a bit less sold on DEVICE_NOT_DELETED, after consideration.  Does
> > it really tell the user/management anything useful beyond what
> > receiving neither a DEVICE_DELETED nor a DEVICE_UNPLUG_ERROR does?
> 
> 
> It informs that the hotunplug operation exceed the timeout that QEMU
> internals considers adequate, but QEMU can't assert that it was caused
> by an error or an unexpected delay. The end result is that the device
> is not going to be deleted from QMP, so DEVICE_NOT_DELETED.

Is it, though?  I mean, it is with this implementation for papr:
because we clear the unplug_requested flag, even if the guest later
tries to complete the unplug, it will fail.

But if I understand what Markus was saying correctly, that might not
be possible for all hotplug systems.  I believe Markus was suggesting
that DEVICE_NOT_DELETED could just mean that we haven't deleted the
device yet, but it could still happen later.

And in that case, I'm not yet sold on the value of a message that
essentially just means "Ayup, still dunno what's happening, sorry".

> Perhaps we should just be straightforward and create a DEVICE_UNPLUG_TIMEOUT
> event.

Hm... what if we added a "reason" field to UNPLUG_ERROR.  That could
be "guest rejected hotplug", or something more specific, in the rare
case that the guest has a way of signalling something more specific,
or "timeout" - but the later *only* to be sent in cases where on the
timeout we're able to block any later completion of the unplug (as we
can on papr).

Thoughs, Markus?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events
Posted by Daniel Henrique Barboza 3 years, 1 month ago

On 3/23/21 10:40 PM, David Gibson wrote:
> On Tue, Mar 23, 2021 at 02:10:22PM -0300, Daniel Henrique Barboza wrote:
>>
>>
>> On 3/22/21 10:12 PM, David Gibson wrote:
>>> On Fri, Mar 12, 2021 at 05:07:36PM -0300, Daniel Henrique Barboza wrote:
>>>> Hi,
>>>>
>>>> This series adds 2 new QAPI events, DEVICE_NOT_DELETED and
>>>> DEVICE_UNPLUG_ERROR. They were (and are still being) discussed in [1].
>>>>
>>>> Patches 1 and 3 are independent of the ppc patches and can be applied
>>>> separately. Patches 2 and 4 are based on David's ppc-for-6.0 branch and
>>>> are dependent on the QAPI patches.
>>>
>>> Implementation looks fine, but I think there's a bit more to discuss
>>> before we can apply.
>>>
>>> I think it would make sense to re-order this and put UNPLUG_ERROR
>>> first.  Its semantics are clearer, and I think there's a stronger case
>>> for it.
>>
>> Alright
>>
>>>
>>> I'm a bit less sold on DEVICE_NOT_DELETED, after consideration.  Does
>>> it really tell the user/management anything useful beyond what
>>> receiving neither a DEVICE_DELETED nor a DEVICE_UNPLUG_ERROR does?
>>
>>
>> It informs that the hotunplug operation exceed the timeout that QEMU
>> internals considers adequate, but QEMU can't assert that it was caused
>> by an error or an unexpected delay. The end result is that the device
>> is not going to be deleted from QMP, so DEVICE_NOT_DELETED.
> 
> Is it, though?  I mean, it is with this implementation for papr:
> because we clear the unplug_requested flag, even if the guest later
> tries to complete the unplug, it will fail.
> 
> But if I understand what Markus was saying correctly, that might not
> be possible for all hotplug systems.  I believe Markus was suggesting
> that DEVICE_NOT_DELETED could just mean that we haven't deleted the
> device yet, but it could still happen later.
> 
> And in that case, I'm not yet sold on the value of a message that
> essentially just means "Ayup, still dunno what's happening, sorry".
> 
>> Perhaps we should just be straightforward and create a DEVICE_UNPLUG_TIMEOUT
>> event.
> 
> Hm... what if we added a "reason" field to UNPLUG_ERROR.  That could
> be "guest rejected hotplug", or something more specific, in the rare
> case that the guest has a way of signalling something more specific,
> or "timeout" - but the later *only* to be sent in cases where on the
> timeout we're able to block any later completion of the unplug (as we
> can on papr).

I believe that's already covered by the existing API:


+# @DEVICE_UNPLUG_ERROR:
+#
+# Emitted when a device hot unplug error occurs.
+#
+# @device: device name
+#
+# @msg: Informative message

The 'informative message' would be the reason the event occurred. In patch
4/4, for the memory hotunplug refused by the guest, it is being set as:

      qapi_error = g_strdup_printf("Memory hotunplug rejected by the guest "
                                   "for device %s", dev->id);
      qapi_event_send_device_unplug_error(dev->id, qapi_error);



We could use the same DEVICE_UNPLUG_ERROR event in the CPU hotunplug timeout
case (currently on patch 2/4) by just changing 'msg', e.g.:


      qapi_error = g_strdup_printf("CPU hotunplug timeout for device %s", dev->id);
      qapi_event_send_device_unplug_error(dev->id, qapi_error);


Thanks,

DHB


> 
> Thoughs, Markus?
> 

Re: [PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events
Posted by David Gibson 3 years, 1 month ago
On Wed, Mar 24, 2021 at 04:09:59PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 3/23/21 10:40 PM, David Gibson wrote:
> > On Tue, Mar 23, 2021 at 02:10:22PM -0300, Daniel Henrique Barboza wrote:
> > > 
> > > 
> > > On 3/22/21 10:12 PM, David Gibson wrote:
> > > > On Fri, Mar 12, 2021 at 05:07:36PM -0300, Daniel Henrique Barboza wrote:
> > > > > Hi,
> > > > > 
> > > > > This series adds 2 new QAPI events, DEVICE_NOT_DELETED and
> > > > > DEVICE_UNPLUG_ERROR. They were (and are still being) discussed in [1].
> > > > > 
> > > > > Patches 1 and 3 are independent of the ppc patches and can be applied
> > > > > separately. Patches 2 and 4 are based on David's ppc-for-6.0 branch and
> > > > > are dependent on the QAPI patches.
> > > > 
> > > > Implementation looks fine, but I think there's a bit more to discuss
> > > > before we can apply.
> > > > 
> > > > I think it would make sense to re-order this and put UNPLUG_ERROR
> > > > first.  Its semantics are clearer, and I think there's a stronger case
> > > > for it.
> > > 
> > > Alright
> > > 
> > > > 
> > > > I'm a bit less sold on DEVICE_NOT_DELETED, after consideration.  Does
> > > > it really tell the user/management anything useful beyond what
> > > > receiving neither a DEVICE_DELETED nor a DEVICE_UNPLUG_ERROR does?
> > > 
> > > 
> > > It informs that the hotunplug operation exceed the timeout that QEMU
> > > internals considers adequate, but QEMU can't assert that it was caused
> > > by an error or an unexpected delay. The end result is that the device
> > > is not going to be deleted from QMP, so DEVICE_NOT_DELETED.
> > 
> > Is it, though?  I mean, it is with this implementation for papr:
> > because we clear the unplug_requested flag, even if the guest later
> > tries to complete the unplug, it will fail.
> > 
> > But if I understand what Markus was saying correctly, that might not
> > be possible for all hotplug systems.  I believe Markus was suggesting
> > that DEVICE_NOT_DELETED could just mean that we haven't deleted the
> > device yet, but it could still happen later.
> > 
> > And in that case, I'm not yet sold on the value of a message that
> > essentially just means "Ayup, still dunno what's happening, sorry".
> > 
> > > Perhaps we should just be straightforward and create a DEVICE_UNPLUG_TIMEOUT
> > > event.
> > 
> > Hm... what if we added a "reason" field to UNPLUG_ERROR.  That could
> > be "guest rejected hotplug", or something more specific, in the rare
> > case that the guest has a way of signalling something more specific,
> > or "timeout" - but the later *only* to be sent in cases where on the
> > timeout we're able to block any later completion of the unplug (as we
> > can on papr).
> 
> I believe that's already covered by the existing API:
> 
> 
> +# @DEVICE_UNPLUG_ERROR:
> +#
> +# Emitted when a device hot unplug error occurs.
> +#
> +# @device: device name
> +#
> +# @msg: Informative message

Oh, sorry, I missed that

> The 'informative message' would be the reason the event occurred. In patch
> 4/4, for the memory hotunplug refused by the guest, it is being set as:
> 
>      qapi_error = g_strdup_printf("Memory hotunplug rejected by the guest "
>                                   "for device %s", dev->id);
>      qapi_event_send_device_unplug_error(dev->id, qapi_error);
> 
> 
> 
> We could use the same DEVICE_UNPLUG_ERROR event in the CPU hotunplug timeout
> case (currently on patch 2/4) by just changing 'msg', e.g.:
> 
> 
>      qapi_error = g_strdup_printf("CPU hotunplug timeout for device %s", dev->id);
>      qapi_event_send_device_unplug_error(dev->id, qapi_error);

I think that makes sense for the cases on papr.  Less sure about the
cases Markus has mentioned.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events
Posted by Igor Mammedov 3 years, 1 month ago
On Wed, 24 Mar 2021 16:09:59 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> On 3/23/21 10:40 PM, David Gibson wrote:
> > On Tue, Mar 23, 2021 at 02:10:22PM -0300, Daniel Henrique Barboza wrote:  
> >>
> >>
> >> On 3/22/21 10:12 PM, David Gibson wrote:  
> >>> On Fri, Mar 12, 2021 at 05:07:36PM -0300, Daniel Henrique Barboza wrote:  
> >>>> Hi,
> >>>>
> >>>> This series adds 2 new QAPI events, DEVICE_NOT_DELETED and
> >>>> DEVICE_UNPLUG_ERROR. They were (and are still being) discussed in [1].
> >>>>
> >>>> Patches 1 and 3 are independent of the ppc patches and can be applied
> >>>> separately. Patches 2 and 4 are based on David's ppc-for-6.0 branch and
> >>>> are dependent on the QAPI patches.  
> >>>
> >>> Implementation looks fine, but I think there's a bit more to discuss
> >>> before we can apply.
> >>>
> >>> I think it would make sense to re-order this and put UNPLUG_ERROR
> >>> first.  Its semantics are clearer, and I think there's a stronger case
> >>> for it.  
> >>
> >> Alright
> >>  
> >>>
> >>> I'm a bit less sold on DEVICE_NOT_DELETED, after consideration.  Does
> >>> it really tell the user/management anything useful beyond what
> >>> receiving neither a DEVICE_DELETED nor a DEVICE_UNPLUG_ERROR does?  
> >>
> >>
> >> It informs that the hotunplug operation exceed the timeout that QEMU
> >> internals considers adequate, but QEMU can't assert that it was caused
> >> by an error or an unexpected delay. The end result is that the device
> >> is not going to be deleted from QMP, so DEVICE_NOT_DELETED.  
> > 
> > Is it, though?  I mean, it is with this implementation for papr:
> > because we clear the unplug_requested flag, even if the guest later
> > tries to complete the unplug, it will fail.
> > 
> > But if I understand what Markus was saying correctly, that might not
> > be possible for all hotplug systems.  I believe Markus was suggesting
> > that DEVICE_NOT_DELETED could just mean that we haven't deleted the
> > device yet, but it could still happen later.
> > 
> > And in that case, I'm not yet sold on the value of a message that
> > essentially just means "Ayup, still dunno what's happening, sorry".
> >   
> >> Perhaps we should just be straightforward and create a DEVICE_UNPLUG_TIMEOUT
> >> event.  
> > 
> > Hm... what if we added a "reason" field to UNPLUG_ERROR.  That could
> > be "guest rejected hotplug", or something more specific, in the rare
> > case that the guest has a way of signalling something more specific,
> > or "timeout" - but the later *only* to be sent in cases where on the
> > timeout we're able to block any later completion of the unplug (as we
> > can on papr).

Is canceling unplug on timeout documented somewhere (like some spec)?

If not it might (theoretically) confuse guest when it tries to unplug
after timeout and leave guest in some unexpected state. 

> 
> I believe that's already covered by the existing API:
> 
> 
> +# @DEVICE_UNPLUG_ERROR:
> +#
> +# Emitted when a device hot unplug error occurs.
> +#
> +# @device: device name
> +#
> +# @msg: Informative message
> 
> The 'informative message' would be the reason the event occurred. In patch
> 4/4, for the memory hotunplug refused by the guest, it is being set as:
> 
>       qapi_error = g_strdup_printf("Memory hotunplug rejected by the guest "
>                                    "for device %s", dev->id);
>       qapi_event_send_device_unplug_error(dev->id, qapi_error);
> 
> 
> 
> We could use the same DEVICE_UNPLUG_ERROR event in the CPU hotunplug timeout
> case (currently on patch 2/4) by just changing 'msg', e.g.:
> 
> 
>       qapi_error = g_strdup_printf("CPU hotunplug timeout for device %s", dev->id);
>       qapi_event_send_device_unplug_error(dev->id, qapi_error);
> 

lets make everything support ACPI (just kidding).

maybe we can reuse already existing ACPI_DEVICE_OST instead of DEVICE_UNPLUG_ERROR
which sort of does the same thing (and more) but instead of strings uses status codes
defined by spec.

Idea similar to DEVICE_UNPLUG_ERROR was considered back then, but instead of QEMU being
a poor translator of status codes to non machine-readable strings we went with
exposing well documented status codes to user. This way user can implement
specific reactions to particular errors just looking at JSON + spec.

> Thanks,
> 
> DHB
> 
> 
> > 
> > Thoughs, Markus?
> >   
> 


Re: [PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events
Posted by Daniel Henrique Barboza 3 years, 1 month ago

On 3/29/21 8:28 PM, Igor Mammedov wrote:
> On Wed, 24 Mar 2021 16:09:59 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> 
>> On 3/23/21 10:40 PM, David Gibson wrote:
>>> On Tue, Mar 23, 2021 at 02:10:22PM -0300, Daniel Henrique Barboza wrote:
>>>>
>>>>
>>>> On 3/22/21 10:12 PM, David Gibson wrote:
>>>>> On Fri, Mar 12, 2021 at 05:07:36PM -0300, Daniel Henrique Barboza wrote:
>>>>>> Hi,
>>>>>>
>>>>>> This series adds 2 new QAPI events, DEVICE_NOT_DELETED and
>>>>>> DEVICE_UNPLUG_ERROR. They were (and are still being) discussed in [1].
>>>>>>
>>>>>> Patches 1 and 3 are independent of the ppc patches and can be applied
>>>>>> separately. Patches 2 and 4 are based on David's ppc-for-6.0 branch and
>>>>>> are dependent on the QAPI patches.
>>>>>
>>>>> Implementation looks fine, but I think there's a bit more to discuss
>>>>> before we can apply.
>>>>>
>>>>> I think it would make sense to re-order this and put UNPLUG_ERROR
>>>>> first.  Its semantics are clearer, and I think there's a stronger case
>>>>> for it.
>>>>
>>>> Alright
>>>>   
>>>>>
>>>>> I'm a bit less sold on DEVICE_NOT_DELETED, after consideration.  Does
>>>>> it really tell the user/management anything useful beyond what
>>>>> receiving neither a DEVICE_DELETED nor a DEVICE_UNPLUG_ERROR does?
>>>>
>>>>
>>>> It informs that the hotunplug operation exceed the timeout that QEMU
>>>> internals considers adequate, but QEMU can't assert that it was caused
>>>> by an error or an unexpected delay. The end result is that the device
>>>> is not going to be deleted from QMP, so DEVICE_NOT_DELETED.
>>>
>>> Is it, though?  I mean, it is with this implementation for papr:
>>> because we clear the unplug_requested flag, even if the guest later
>>> tries to complete the unplug, it will fail.
>>>
>>> But if I understand what Markus was saying correctly, that might not
>>> be possible for all hotplug systems.  I believe Markus was suggesting
>>> that DEVICE_NOT_DELETED could just mean that we haven't deleted the
>>> device yet, but it could still happen later.
>>>
>>> And in that case, I'm not yet sold on the value of a message that
>>> essentially just means "Ayup, still dunno what's happening, sorry".
>>>    
>>>> Perhaps we should just be straightforward and create a DEVICE_UNPLUG_TIMEOUT
>>>> event.
>>>
>>> Hm... what if we added a "reason" field to UNPLUG_ERROR.  That could
>>> be "guest rejected hotplug", or something more specific, in the rare
>>> case that the guest has a way of signalling something more specific,
>>> or "timeout" - but the later *only* to be sent in cases where on the
>>> timeout we're able to block any later completion of the unplug (as we
>>> can on papr).
> 
> Is canceling unplug on timeout documented somewhere (like some spec)?

Nope.

> 
> If not it might (theoretically) confuse guest when it tries to unplug
> after timeout and leave guest in some unexpected state.

I consider this a remote possibility due to the generous timeout we're
using in the machine, but it is a possibility nevertheless.

> 
>>
>> I believe that's already covered by the existing API:
>>
>>
>> +# @DEVICE_UNPLUG_ERROR:
>> +#
>> +# Emitted when a device hot unplug error occurs.
>> +#
>> +# @device: device name
>> +#
>> +# @msg: Informative message
>>
>> The 'informative message' would be the reason the event occurred. In patch
>> 4/4, for the memory hotunplug refused by the guest, it is being set as:
>>
>>        qapi_error = g_strdup_printf("Memory hotunplug rejected by the guest "
>>                                     "for device %s", dev->id);
>>        qapi_event_send_device_unplug_error(dev->id, qapi_error);
>>
>>
>>
>> We could use the same DEVICE_UNPLUG_ERROR event in the CPU hotunplug timeout
>> case (currently on patch 2/4) by just changing 'msg', e.g.:
>>
>>
>>        qapi_error = g_strdup_printf("CPU hotunplug timeout for device %s", dev->id);
>>        qapi_event_send_device_unplug_error(dev->id, qapi_error);
>>
> 
> lets make everything support ACPI (just kidding).

I would love to make PAPR more ACPI and less .... PAPR.

> 
> maybe we can reuse already existing ACPI_DEVICE_OST instead of DEVICE_UNPLUG_ERROR
> which sort of does the same thing (and more) but instead of strings uses status codes
> defined by spec.
> 
> Idea similar to DEVICE_UNPLUG_ERROR was considered back then, but instead of QEMU being
> a poor translator of status codes to non machine-readable strings we went with
> exposing well documented status codes to user. This way user can implement
> specific reactions to particular errors just looking at JSON + spec.


This is not a bad idea. Problem is that we don't have all ACPI_DEVICE_OST
fields to fill in because, well, both the timeout and the cancell mechanism
aren't specified there.


Thanks,


DHB

> 
>> Thanks,
>>
>> DHB
>>
>>
>>>
>>> Thoughs, Markus?
>>>    
>>
> 

Re: [PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events
Posted by Daniel Henrique Barboza 3 years ago
Hello,

Quick update about this work:

- the hotunplug timeout was reverted in the pSeries machine for 6.0.0.
This means that we have no use for a DEVICE_NOT_DELETED event or similar.
I'll drop it for the next version of the series.

- there is a good chance that the pSeries kernel will introduce a way to
report hotunplug errors in v5.13 ([1] for more info). This would make
the DEVICE_UNPLUG_ERROR event relevant again (otherwise it would just be
a rebranding of the existing mem_unplug_error) since we'll be able to
properly report guest side unplug errors for all devices in the pSeries
machine, starting with CPUs.


Thanks,


DHB


[1] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210416210216.380291-3-danielhb413@gmail.com/




On 3/12/21 5:07 PM, Daniel Henrique Barboza wrote:
> Hi,
> 
> This series adds 2 new QAPI events, DEVICE_NOT_DELETED and
> DEVICE_UNPLUG_ERROR. They were (and are still being) discussed in [1].
> 
> Patches 1 and 3 are independent of the ppc patches and can be applied
> separately. Patches 2 and 4 are based on David's ppc-for-6.0 branch and
> are dependent on the QAPI patches.
> 
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01900.html
> 
> Daniel Henrique Barboza (4):
>    qapi/qdev.json: add DEVICE_NOT_DELETED event
>    spapr_drc.c: send DEVICE_NOT_DELETED event on unplug timeout
>    qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event
>    spapr.c: use DEVICE_UNPLUG_ERROR event in
>      spapr_memory_unplug_rollback()
> 
>   hw/ppc/spapr.c     |  2 +-
>   hw/ppc/spapr_drc.c |  8 ++++++++
>   qapi/machine.json  | 23 +++++++++++++++++++++++
>   qapi/qdev.json     | 28 ++++++++++++++++++++++++++++
>   4 files changed, 60 insertions(+), 1 deletion(-)
>