[PATCH v2 3/6] qdev-monitor: add option to report GenericError from find_device_state

Vladimir Sementsov-Ogievskiy posted 6 patches 8 months, 4 weeks ago
Maintainers: Raphael Norwitz <raphael@enfabrica.net>, "Michael S. Tsirkin" <mst@redhat.com>, Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, "Daniel P. Berrangé" <berrange@redhat.com>, Eduardo Habkost <eduardo@habkost.net>, Markus Armbruster <armbru@redhat.com>, "Dr. David Alan Gilbert" <dave@treblig.org>, Eric Blake <eblake@redhat.com>
There is a newer version of this series
[PATCH v2 3/6] qdev-monitor: add option to report GenericError from find_device_state
Posted by Vladimir Sementsov-Ogievskiy 8 months, 4 weeks ago
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 system/qdev-monitor.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 9febb743f1..cf7481e416 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -877,13 +877,20 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
     object_unref(OBJECT(dev));
 }
 
-static DeviceState *find_device_state(const char *id, Error **errp)
+/*
+ * Note that creating new APIs using error classes other than GenericError is
+ * not recommended. Set use_generic_error=true for new interfaces.
+ */
+static DeviceState *find_device_state(const char *id, bool use_generic_error,
+                                      Error **errp)
 {
     Object *obj = object_resolve_path_at(qdev_get_peripheral(), id);
     DeviceState *dev;
 
     if (!obj) {
-        error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+        error_set(errp,
+                  (use_generic_error ?
+                   ERROR_CLASS_GENERIC_ERROR : ERROR_CLASS_DEVICE_NOT_FOUND),
                   "Device '%s' not found", id);
         return NULL;
     }
@@ -947,7 +954,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
 
 void qmp_device_del(const char *id, Error **errp)
 {
-    DeviceState *dev = find_device_state(id, errp);
+    DeviceState *dev = find_device_state(id, false, errp);
     if (dev != NULL) {
         if (dev->pending_deleted_event &&
             (dev->pending_deleted_expires_ms == 0 ||
@@ -1067,7 +1074,7 @@ BlockBackend *blk_by_qdev_id(const char *id, Error **errp)
 
     GLOBAL_STATE_CODE();
 
-    dev = find_device_state(id, errp);
+    dev = find_device_state(id, false, errp);
     if (dev == NULL) {
         return NULL;
     }
-- 
2.34.1
Re: [PATCH v2 3/6] qdev-monitor: add option to report GenericError from find_device_state
Posted by Markus Armbruster 8 months, 3 weeks ago
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:

> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
>  system/qdev-monitor.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
> index 9febb743f1..cf7481e416 100644
> --- a/system/qdev-monitor.c
> +++ b/system/qdev-monitor.c
> @@ -877,13 +877,20 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
>      object_unref(OBJECT(dev));
>  }
>  
> -static DeviceState *find_device_state(const char *id, Error **errp)
> +/*
> + * Note that creating new APIs using error classes other than GenericError is
> + * not recommended. Set use_generic_error=true for new interfaces.
> + */
> +static DeviceState *find_device_state(const char *id, bool use_generic_error,
> +                                      Error **errp)
>  {
>      Object *obj = object_resolve_path_at(qdev_get_peripheral(), id);
>      DeviceState *dev;
>  
>      if (!obj) {
> -        error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
> +        error_set(errp,
> +                  (use_generic_error ?
> +                   ERROR_CLASS_GENERIC_ERROR : ERROR_CLASS_DEVICE_NOT_FOUND),
>                    "Device '%s' not found", id);
>          return NULL;
>      }
> @@ -947,7 +954,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
>  
>  void qmp_device_del(const char *id, Error **errp)
>  {
> -    DeviceState *dev = find_device_state(id, errp);
> +    DeviceState *dev = find_device_state(id, false, errp);
>      if (dev != NULL) {
>          if (dev->pending_deleted_event &&
>              (dev->pending_deleted_expires_ms == 0 ||
> @@ -1067,7 +1074,7 @@ BlockBackend *blk_by_qdev_id(const char *id, Error **errp)
>  
>      GLOBAL_STATE_CODE();
>  
> -    dev = find_device_state(id, errp);
> +    dev = find_device_state(id, false, errp);
>      if (dev == NULL) {
>          return NULL;
>      }

I appreciate the attempt to curb the spread of DeviceNotFound errors.
Two issues:

* Copy-pasting find_device_state() with a false argument is an easy
  error to make.

* Most uses of find_device_state() are via blk_by_qdev_id() and
  qmp_get_blk().  Any new uses of qemu_get_blk() will still produce
  DeviceNotFound.

Hmm.
Re: [PATCH v2 3/6] qdev-monitor: add option to report GenericError from find_device_state
Posted by Vladimir Sementsov-Ogievskiy 8 months, 3 weeks ago
On 07.03.24 12:46, Markus Armbruster wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
> 
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
>>   system/qdev-monitor.c | 15 +++++++++++----
>>   1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
>> index 9febb743f1..cf7481e416 100644
>> --- a/system/qdev-monitor.c
>> +++ b/system/qdev-monitor.c
>> @@ -877,13 +877,20 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
>>       object_unref(OBJECT(dev));
>>   }
>>   
>> -static DeviceState *find_device_state(const char *id, Error **errp)
>> +/*
>> + * Note that creating new APIs using error classes other than GenericError is
>> + * not recommended. Set use_generic_error=true for new interfaces.
>> + */
>> +static DeviceState *find_device_state(const char *id, bool use_generic_error,
>> +                                      Error **errp)
>>   {
>>       Object *obj = object_resolve_path_at(qdev_get_peripheral(), id);
>>       DeviceState *dev;
>>   
>>       if (!obj) {
>> -        error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
>> +        error_set(errp,
>> +                  (use_generic_error ?
>> +                   ERROR_CLASS_GENERIC_ERROR : ERROR_CLASS_DEVICE_NOT_FOUND),
>>                     "Device '%s' not found", id);
>>           return NULL;
>>       }
>> @@ -947,7 +954,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
>>   
>>   void qmp_device_del(const char *id, Error **errp)
>>   {
>> -    DeviceState *dev = find_device_state(id, errp);
>> +    DeviceState *dev = find_device_state(id, false, errp);
>>       if (dev != NULL) {
>>           if (dev->pending_deleted_event &&
>>               (dev->pending_deleted_expires_ms == 0 ||
>> @@ -1067,7 +1074,7 @@ BlockBackend *blk_by_qdev_id(const char *id, Error **errp)
>>   
>>       GLOBAL_STATE_CODE();
>>   
>> -    dev = find_device_state(id, errp);
>> +    dev = find_device_state(id, false, errp);
>>       if (dev == NULL) {
>>           return NULL;
>>       }
> 
> I appreciate the attempt to curb the spread of DeviceNotFound errors.
> Two issues:
> 
> * Copy-pasting find_device_state() with a false argument is an easy
>    error to make.
> 
> * Most uses of find_device_state() are via blk_by_qdev_id() and
>    qmp_get_blk().  Any new uses of qemu_get_blk() will still produce
>    DeviceNotFound.
> 

Hm. But what to do?

- rename all three functions to FOO_base(), and add a boolean parameter
- add FOO() wrappers, passing true (use generic error)
- add FOO_deprecated() wrappers, passing false (use device not found error)
- change existing callers to use FOO_deprecated()

Still, new generic-error-based blk_by_qdev_id() and qmp_get_blk() will be unused..

That seem too ugly for me. And that doesn't give 100% sure that nobody will simply duplicate call to _deprecated() function...

Maybe better don't care for now (and continue use device-not-found for new APIs, that reuse find_device_state()), and just declare the deprecation for ERROR_CLASS_DEVICE_NOT_FOUND? And drop it usage everywhere after 3-releases deprecation cycle.

Or do we want deprecate everything except for generic-error, and deprecate error/class field in qmp return value altogether?

-- 
Best regards,
Vladimir
Re: [PATCH v2 3/6] qdev-monitor: add option to report GenericError from find_device_state
Posted by Vladimir Sementsov-Ogievskiy 8 months, 3 weeks ago
On 07.03.24 12:46, Markus Armbruster wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
> 
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
>>   system/qdev-monitor.c | 15 +++++++++++----
>>   1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
>> index 9febb743f1..cf7481e416 100644
>> --- a/system/qdev-monitor.c
>> +++ b/system/qdev-monitor.c
>> @@ -877,13 +877,20 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
>>       object_unref(OBJECT(dev));
>>   }
>>   
>> -static DeviceState *find_device_state(const char *id, Error **errp)
>> +/*
>> + * Note that creating new APIs using error classes other than GenericError is
>> + * not recommended. Set use_generic_error=true for new interfaces.
>> + */
>> +static DeviceState *find_device_state(const char *id, bool use_generic_error,
>> +                                      Error **errp)
>>   {
>>       Object *obj = object_resolve_path_at(qdev_get_peripheral(), id);
>>       DeviceState *dev;
>>   
>>       if (!obj) {
>> -        error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
>> +        error_set(errp,
>> +                  (use_generic_error ?
>> +                   ERROR_CLASS_GENERIC_ERROR : ERROR_CLASS_DEVICE_NOT_FOUND),
>>                     "Device '%s' not found", id);
>>           return NULL;
>>       }
>> @@ -947,7 +954,7 @@ void qdev_unplug(DeviceState *dev, Error **errp)
>>   
>>   void qmp_device_del(const char *id, Error **errp)
>>   {
>> -    DeviceState *dev = find_device_state(id, errp);
>> +    DeviceState *dev = find_device_state(id, false, errp);
>>       if (dev != NULL) {
>>           if (dev->pending_deleted_event &&
>>               (dev->pending_deleted_expires_ms == 0 ||
>> @@ -1067,7 +1074,7 @@ BlockBackend *blk_by_qdev_id(const char *id, Error **errp)
>>   
>>       GLOBAL_STATE_CODE();
>>   
>> -    dev = find_device_state(id, errp);
>> +    dev = find_device_state(id, false, errp);
>>       if (dev == NULL) {
>>           return NULL;
>>       }
> 
> I appreciate the attempt to curb the spread of DeviceNotFound errors.
> Two issues:
> 
> * Copy-pasting find_device_state() with a false argument is an easy
>    error to make.
> 
> * Most uses of find_device_state() are via blk_by_qdev_id() and
>    qmp_get_blk().  Any new uses of qemu_get_blk() will still produce
>    DeviceNotFound.
> 
> Hmm.

Hmm. Right. Wait a bit, I can make the change stricter.

Could you still explain (or give a link), why and when we decided to use only GenericError? I think, having different "error-codes" is a good thing, why we are trying to get rid of it?

-- 
Best regards,
Vladimir
Re: [PATCH v2 3/6] qdev-monitor: add option to report GenericError from find_device_state
Posted by Markus Armbruster 8 months, 2 weeks ago
Sorry for the late answer.

Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:

> On 07.03.24 12:46, Markus Armbruster wrote:

[...]

>> I appreciate the attempt to curb the spread of DeviceNotFound errors.
>> Two issues:
>>
>> * Copy-pasting find_device_state() with a false argument is an easy
>>   error to make.
>>
>> * Most uses of find_device_state() are via blk_by_qdev_id() and
>>   qmp_get_blk().  Any new uses of qemu_get_blk() will still produce
>>   DeviceNotFound.
>>
>> Hmm.
>
> Hmm. Right. Wait a bit, I can make the change stricter.
>
> Could you still explain (or give a link), why and when we decided to use only GenericError? I think, having different "error-codes" is a good thing, why we are trying to get rid of it?

We actually got rid of most of them years ago :)

But you deserve a more complete answer.

QMP initially specified the following error response[1]:

    2.4.2 error
    -----------
    
    The error response is issued when the command execution could not be
    completed because of an error condition.
    
    The format is:
    
    { "error": { "class": json-string, "data": json-value }, "id": json-value }
    
     Where,
    
    - The "class" member contains the error class name (eg. "ServiceUnavailable")
    - The "data" member contains specific error data and is defined in a
      per-command basis, it will be an empty json-object if the error has no data
    - The "id" member contains the transaction identification associated with
      the command execution (if issued by the Client)

Note the structure of @data depends on @class.  We documented a
command's possible error classes (well, we tried), but never bothered to
document the @data it comes with.

Documentation deficits aside, this is looks quite expressive.  There are
issues, though:

1. Formatting errors in human-readable form is bothersome, and creates a
   tight coupling between QMP server and client.

   Consider:

    {"class": "DeviceNotFound", "data": {"device": "ide1-cd0"}}

   To format this in human-readable form, you need to know the error.

   The server does.  Fine print: it has a table mapping JSON templates
   to human-readable error message templates.

   The client needs to duplicate this somehow.  If it receives an error
   it doesn't know, all it can do is barf (possibly dressed up) JSON at
   the human.  To avoid that, clients need to track the server closely:
   tight coupling.

2. Errors have notational overhead, which leads to bad errors.

   To create a new error, you have to edit two source files (not
   counting clients).  Strong incentive to reuse existing errors.  Even
   when they don't quite fit.  When a name provided by the user couldn't
   be resolved, reusing DeviceNotFound is easier than creating a new
   error that is more precise.

3. The human-readable error message is hidden from the programmer's
   view, which leads to bad error messages.

   At the point in the source where the error is created, we see
   something like QERR_DEVICE_NOT_FOUND, name.  To see the
   human-readable message, we have to look up macro
   QERR_DEVICE_NOT_FOUND's error message template in the table, or
   actually test (*gasp*) the error.  People generally do neither, and
   bad error messages proliferate.

4. Too little gain for the pain

   Clients rarely want to handle different errors differently.  More
   often than not, all they do with @class and @data is log them.  Only
   occasionally do they switch on @class, and basically never on @data.

It me took a considerable fight to get the protocol amended to include a
human-readable message[2].  This addressed issue 1.

Over the next two years or so, issues 2. to 4. slowly sank in.  We
eventually tired of the complexity, ripped out @data, and dumbed down
all error classes to GenericError, except for the few clients actually
cared for[3].  We also mandated that new errors avoid the QERR_ macros.

Eliminating the existing QERR_ macros has been slow.  We're down to 13
in master, with patches deleting 7 on the list.

This has served us reasonably well.

Questions?


[1] Commit f544d174dfc
    QMP: Introduce specification
    Dec 2009

[2] Commit 77e595e7c61q
    QMP: add human-readable description to error response
    Dec 2009

[3] Commit de253f14912
    qmp: switch to the new error format on the wire
    Aug 2012
Re: [PATCH v2 3/6] qdev-monitor: add option to report GenericError from find_device_state
Posted by Vladimir Sementsov-Ogievskiy 8 months, 2 weeks ago
On 15.03.24 15:51, Markus Armbruster wrote:
> Sorry for the late answer.
> 
> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
> 
>> On 07.03.24 12:46, Markus Armbruster wrote:
> 
> [...]
> 
>>> I appreciate the attempt to curb the spread of DeviceNotFound errors.
>>> Two issues:
>>>
>>> * Copy-pasting find_device_state() with a false argument is an easy
>>>    error to make.
>>>
>>> * Most uses of find_device_state() are via blk_by_qdev_id() and
>>>    qmp_get_blk().  Any new uses of qemu_get_blk() will still produce
>>>    DeviceNotFound.
>>>
>>> Hmm.
>>
>> Hmm. Right. Wait a bit, I can make the change stricter.
>>
>> Could you still explain (or give a link), why and when we decided to use only GenericError? I think, having different "error-codes" is a good thing, why we are trying to get rid of it?
> 
> We actually got rid of most of them years ago :)
> 
> But you deserve a more complete answer.
> 
> QMP initially specified the following error response[1]:
> 
>      2.4.2 error
>      -----------
>      
>      The error response is issued when the command execution could not be
>      completed because of an error condition.
>      
>      The format is:
>      
>      { "error": { "class": json-string, "data": json-value }, "id": json-value }
>      
>       Where,
>      
>      - The "class" member contains the error class name (eg. "ServiceUnavailable")
>      - The "data" member contains specific error data and is defined in a
>        per-command basis, it will be an empty json-object if the error has no data
>      - The "id" member contains the transaction identification associated with
>        the command execution (if issued by the Client)
> 
> Note the structure of @data depends on @class.  We documented a
> command's possible error classes (well, we tried), but never bothered to
> document the @data it comes with.
> 
> Documentation deficits aside, this is looks quite expressive.  There are
> issues, though:
> 
> 1. Formatting errors in human-readable form is bothersome, and creates a
>     tight coupling between QMP server and client.
> 
>     Consider:
> 
>      {"class": "DeviceNotFound", "data": {"device": "ide1-cd0"}}
> 
>     To format this in human-readable form, you need to know the error.
> 
>     The server does.  Fine print: it has a table mapping JSON templates
>     to human-readable error message templates.
> 
>     The client needs to duplicate this somehow.  If it receives an error
>     it doesn't know, all it can do is barf (possibly dressed up) JSON at
>     the human.  To avoid that, clients need to track the server closely:
>     tight coupling.
> 
> 2. Errors have notational overhead, which leads to bad errors.
> 
>     To create a new error, you have to edit two source files (not
>     counting clients).  Strong incentive to reuse existing errors.  Even
>     when they don't quite fit.  When a name provided by the user couldn't
>     be resolved, reusing DeviceNotFound is easier than creating a new
>     error that is more precise.
> 
> 3. The human-readable error message is hidden from the programmer's
>     view, which leads to bad error messages.
> 
>     At the point in the source where the error is created, we see
>     something like QERR_DEVICE_NOT_FOUND, name.  To see the
>     human-readable message, we have to look up macro
>     QERR_DEVICE_NOT_FOUND's error message template in the table, or
>     actually test (*gasp*) the error.  People generally do neither, and
>     bad error messages proliferate.
> 
> 4. Too little gain for the pain
> 
>     Clients rarely want to handle different errors differently.  More
>     often than not, all they do with @class and @data is log them.  Only
>     occasionally do they switch on @class, and basically never on @data.
> 
> It me took a considerable fight to get the protocol amended to include a
> human-readable message[2].  This addressed issue 1.
> 
> Over the next two years or so, issues 2. to 4. slowly sank in.  We
> eventually tired of the complexity, ripped out @data, and dumbed down
> all error classes to GenericError, except for the few clients actually
> cared for[3].  We also mandated that new errors avoid the QERR_ macros.
> 
> Eliminating the existing QERR_ macros has been slow.  We're down to 13
> in master, with patches deleting 7 on the list.
> 
> This has served us reasonably well.
> 
> Questions?
> 
> 
> [1] Commit f544d174dfc
>      QMP: Introduce specification
>      Dec 2009
> 
> [2] Commit 77e595e7c61q
>      QMP: add human-readable description to error response
>      Dec 2009
> 
> [3] Commit de253f14912
>      qmp: switch to the new error format on the wire
>      Aug 2012
> 

Thanks for full-picture story! Now I'm all for it.

-- 
Best regards,
Vladimir