[PATCH v8 23/27] migration: Propagate last encountered error in vmstate_save_state_v() function

Arun Menon posted 27 patches 3 months, 2 weeks ago
Maintainers: Stefan Berger <stefanb@linux.vnet.ibm.com>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, "Michael S. Tsirkin" <mst@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>, Dmitry Osipenko <dmitry.osipenko@collabora.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Cornelia Huck <cohuck@redhat.com>, Halil Pasic <pasic@linux.ibm.com>, Eric Farman <farman@linux.ibm.com>, Thomas Huth <thuth@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, David Hildenbrand <david@redhat.com>, Ilya Leoshkevich <iii@linux.ibm.com>, Christian Borntraeger <borntraeger@linux.ibm.com>, Matthew Rosato <mjrosato@linux.ibm.com>, Paolo Bonzini <pbonzini@redhat.com>, Fam Zheng <fam@euphon.net>, Nicholas Piggin <npiggin@gmail.com>, Harsh Prateek Bora <harshpb@linux.ibm.com>, Alex Williamson <alex.williamson@redhat.com>, "Cédric Le Goater" <clg@redhat.com>, Hailiang Zhang <zhanghailiang@xfusion.com>, Steve Sistare <steven.sistare@oracle.com>, "Marc-André Lureau" <marcandre.lureau@redhat.com>
There is a newer version of this series
[PATCH v8 23/27] migration: Propagate last encountered error in vmstate_save_state_v() function
Posted by Arun Menon 3 months, 2 weeks ago
Currently post_save hook is called without checking its return. If post_save
fails, we need to set an error and propagate it to the caller.

Since post_save hook is called regardless of whether there is a preceeding error,
it is possible that we have 2 distict errors, one from the preceeding function
call, and the other from the post_save call.

Return the latest error to the caller.

Signed-off-by: Arun Menon <armenon@redhat.com>
---
 migration/vmstate.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/migration/vmstate.c b/migration/vmstate.c
index b725202bfcf69c3c81338f1f5479aa2ddc5db86f..25a819da069b982d4043f287b4562ea402d9eb0e 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -418,6 +418,7 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
                          void *opaque, JSONWriter *vmdesc, int version_id, Error **errp)
 {
     int ret = 0;
+    int ps_ret = 0;
     const VMStateField *field = vmsd->fields;
 
     trace_vmstate_save_state_top(vmsd->name);
@@ -533,7 +534,14 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
                     error_setg(errp, "Save of field %s/%s failed",
                                 vmsd->name, field->name);
                     if (vmsd->post_save) {
-                        vmsd->post_save(opaque);
+                        ps_ret = vmsd->post_save(opaque);
+                        if (ps_ret) {
+                            ret = ps_ret;
+                            error_free_or_abort(errp);
+                            error_setg(errp,
+                                       "post-save for %s failed, ret: '%d'",
+                                       vmsd->name, ret);
+                        }
                     }
                     return ret;
                 }
@@ -561,10 +569,12 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
     ret = vmstate_subsection_save(f, vmsd, opaque, vmdesc, errp);
 
     if (vmsd->post_save) {
-        int ps_ret = vmsd->post_save(opaque);
-        if (!ret && ps_ret) {
+        ps_ret = vmsd->post_save(opaque);
+        if (ps_ret) {
             ret = ps_ret;
-            error_setg(errp, "post-save failed: %s", vmsd->name);
+            error_free_or_abort(errp);
+            error_setg(errp, "post-save for %s failed, ret: '%d'",
+                       vmsd->name, ret);
         }
     }
     return ret;

-- 
2.50.0
Re: [PATCH v8 23/27] migration: Propagate last encountered error in vmstate_save_state_v() function
Posted by Akihiko Odaki 3 months, 2 weeks ago
On 2025/07/31 22:21, Arun Menon wrote:
> Currently post_save hook is called without checking its return. If post_save
> fails, we need to set an error and propagate it to the caller.
> 
> Since post_save hook is called regardless of whether there is a preceeding error,
> it is possible that we have 2 distict errors, one from the preceeding function
> call, and the other from the post_save call.
> 
> Return the latest error to the caller.

This needs to be explained better. This patch makes two changes on the 
behavior when there are two errors:

1) Proceeding errors were propagated before, but they are now
    dismissed.
2) post_error() errors were dismissed before, but they are now
    propagated.

This message doesn't mention 1) at all. It does say 2) is necessary, but 
does not explain why.

> 
> Signed-off-by: Arun Menon <armenon@redhat.com>
> ---
>   migration/vmstate.c | 18 ++++++++++++++----
>   1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/vmstate.c b/migration/vmstate.c
> index b725202bfcf69c3c81338f1f5479aa2ddc5db86f..25a819da069b982d4043f287b4562ea402d9eb0e 100644
> --- a/migration/vmstate.c
> +++ b/migration/vmstate.c
> @@ -418,6 +418,7 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
>                            void *opaque, JSONWriter *vmdesc, int version_id, Error **errp)
>   {
>       int ret = 0;
> +    int ps_ret = 0;
>       const VMStateField *field = vmsd->fields;
>   
>       trace_vmstate_save_state_top(vmsd->name);
> @@ -533,7 +534,14 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
>                       error_setg(errp, "Save of field %s/%s failed",
>                                   vmsd->name, field->name);
>                       if (vmsd->post_save) {
> -                        vmsd->post_save(opaque);
> +                        ps_ret = vmsd->post_save(opaque);
> +                        if (ps_ret) {
> +                            ret = ps_ret;
> +                            error_free_or_abort(errp);
> +                            error_setg(errp,
> +                                       "post-save for %s failed, ret: '%d'",
> +                                       vmsd->name, ret);
> +                        }
>                       }
>                       return ret;
>                   }
> @@ -561,10 +569,12 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
>       ret = vmstate_subsection_save(f, vmsd, opaque, vmdesc, errp);
>   
>       if (vmsd->post_save) {
> -        int ps_ret = vmsd->post_save(opaque);
> -        if (!ret && ps_ret) {
> +        ps_ret = vmsd->post_save(opaque);
> +        if (ps_ret) {
>               ret = ps_ret;
> -            error_setg(errp, "post-save failed: %s", vmsd->name);
> +            error_free_or_abort(errp);
> +            error_setg(errp, "post-save for %s failed, ret: '%d'",
> +                       vmsd->name, ret);
>           }
>       }
>       return ret;
>
Re: [PATCH v8 23/27] migration: Propagate last encountered error in vmstate_save_state_v() function
Posted by Arun Menon 3 months, 2 weeks ago
Hi Akihiko,
Thanks for the review.

On Fri, Aug 01, 2025 at 04:35:34PM +0900, Akihiko Odaki wrote:
> On 2025/07/31 22:21, Arun Menon wrote:
> > Currently post_save hook is called without checking its return. If post_save
> > fails, we need to set an error and propagate it to the caller.
> > 
> > Since post_save hook is called regardless of whether there is a preceeding error,
> > it is possible that we have 2 distict errors, one from the preceeding function
> > call, and the other from the post_save call.
> > 
> > Return the latest error to the caller.
> 
> This needs to be explained better. This patch makes two changes on the
> behavior when there are two errors:
> 
> 1) Proceeding errors were propagated before, but they are now
>    dismissed.
> 2) post_error() errors were dismissed before, but they are now
>    propagated.
> 
> This message doesn't mention 1) at all. It does say 2) is necessary, but
> does not explain why.

Please correct me if I am wrong, does the following look ok?

Currently post_save() hook is called without checking its return.
post_save() hooks, generally does the cleanup, and that is the reason why
we have been dismissing its failure.

We want to consider
  - the saving of the device state (save or subsection save) and 
  - running the cleanup after
as an atomic operation. And that is why, post_save is called regardless
of whether there is a preceeding error.
This means that, it is possible that we have 2 distict errors, one from 
the preceeding function and the other from the post_save() function.

This patch makes two changes on the behavior when there are two errors:

1) Preceeding errors were propagated before, but they are now
   dismissed if there is a new post_save() error.
2) post_save() errors were dismissed before, but they are now
   propagated.

We intend to extend the error propagation feature (saving erros in
Error object and return to the caller) to the post/pre
save/load hooks. Therefore, it is important for the user to know
the exact reason of failure in case any of these hooks fail.

====
I do not know much about the operations we do, or intend to do
using the post_save() call. But my guess is that we are going to
revert things or cleanup stuff, and if reverting/cleanup fails,
then the user should be notified about that.

> 
> > 
> > Signed-off-by: Arun Menon <armenon@redhat.com>
> > ---
> >   migration/vmstate.c | 18 ++++++++++++++----
> >   1 file changed, 14 insertions(+), 4 deletions(-)
> > 
> > diff --git a/migration/vmstate.c b/migration/vmstate.c
> > index b725202bfcf69c3c81338f1f5479aa2ddc5db86f..25a819da069b982d4043f287b4562ea402d9eb0e 100644
> > --- a/migration/vmstate.c
> > +++ b/migration/vmstate.c
> > @@ -418,6 +418,7 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
> >                            void *opaque, JSONWriter *vmdesc, int version_id, Error **errp)
> >   {
> >       int ret = 0;
> > +    int ps_ret = 0;
> >       const VMStateField *field = vmsd->fields;
> >       trace_vmstate_save_state_top(vmsd->name);
> > @@ -533,7 +534,14 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
> >                       error_setg(errp, "Save of field %s/%s failed",
> >                                   vmsd->name, field->name);
> >                       if (vmsd->post_save) {
> > -                        vmsd->post_save(opaque);
> > +                        ps_ret = vmsd->post_save(opaque);
> > +                        if (ps_ret) {
> > +                            ret = ps_ret;
> > +                            error_free_or_abort(errp);
> > +                            error_setg(errp,
> > +                                       "post-save for %s failed, ret: '%d'",
> > +                                       vmsd->name, ret);
> > +                        }
> >                       }
> >                       return ret;
> >                   }
> > @@ -561,10 +569,12 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
> >       ret = vmstate_subsection_save(f, vmsd, opaque, vmdesc, errp);
> >       if (vmsd->post_save) {
> > -        int ps_ret = vmsd->post_save(opaque);
> > -        if (!ret && ps_ret) {
> > +        ps_ret = vmsd->post_save(opaque);
> > +        if (ps_ret) {
> >               ret = ps_ret;
> > -            error_setg(errp, "post-save failed: %s", vmsd->name);
> > +            error_free_or_abort(errp);
> > +            error_setg(errp, "post-save for %s failed, ret: '%d'",
> > +                       vmsd->name, ret);
> >           }
> >       }
> >       return ret;
> > 
> 
Regards,
Arun
Re: [PATCH v8 23/27] migration: Propagate last encountered error in vmstate_save_state_v() function
Posted by Akihiko Odaki 3 months, 2 weeks ago
On 2025/08/02 2:27, Arun Menon wrote:
> Hi Akihiko,
> Thanks for the review.
> 
> On Fri, Aug 01, 2025 at 04:35:34PM +0900, Akihiko Odaki wrote:
>> On 2025/07/31 22:21, Arun Menon wrote:
>>> Currently post_save hook is called without checking its return. If post_save
>>> fails, we need to set an error and propagate it to the caller.
>>>
>>> Since post_save hook is called regardless of whether there is a preceeding error,
>>> it is possible that we have 2 distict errors, one from the preceeding function
>>> call, and the other from the post_save call.
>>>
>>> Return the latest error to the caller.
>>
>> This needs to be explained better. This patch makes two changes on the
>> behavior when there are two errors:
>>
>> 1) Proceeding errors were propagated before, but they are now
>>     dismissed.
>> 2) post_error() errors were dismissed before, but they are now
>>     propagated.
>>
>> This message doesn't mention 1) at all. It does say 2) is necessary, but
>> does not explain why.
> 
> Please correct me if I am wrong, does the following look ok?
> 
> Currently post_save() hook is called without checking its return.
> post_save() hooks, generally does the cleanup, and that is the reason why
> we have been dismissing its failure.
> 
> We want to consider
>    - the saving of the device state (save or subsection save) and
>    - running the cleanup after
> as an atomic operation. And that is why, post_save is called regardless
> of whether there is a preceeding error.
> This means that, it is possible that we have 2 distict errors, one from
> the preceeding function and the other from the post_save() function.
> 
> This patch makes two changes on the behavior when there are two errors:
> 
> 1) Preceeding errors were propagated before, but they are now
>     dismissed if there is a new post_save() error.
> 2) post_save() errors were dismissed before, but they are now
>     propagated.
> 
> We intend to extend the error propagation feature (saving erros in
> Error object and return to the caller) to the post/pre
> save/load hooks. Therefore, it is important for the user to know
> the exact reason of failure in case any of these hooks fail.
> 
> ====
> I do not know much about the operations we do, or intend to do
> using the post_save() call. But my guess is that we are going to
> revert things or cleanup stuff, and if reverting/cleanup fails,
> then the user should be notified about that.

All what you stated sounds correct to me and explains 2). But there is 
still no reasoning for 1).

Previously I listed three possible design options:
 > - propagate whatever error easier to do so
 > - somehow combine errors
 >   (languages that support exceptions often do this)
 > - make sure that no error will happen when there is a proceeding error
 >   by having two callbacks:
 >   - One callback that is called only when there is no proceeding error
 >     and can raise an error
 >   - Another that is always called but cannot raise an error

https://lore.kernel.org/qemu-devel/1ff92294-5c8c-4b33-89c1-91d37d6accb0@rsg.ci.i.u-tokyo.ac.jp/

It will be a good reasoning if we can show propagating post_save() 
errors is easier (or more important) than propagating proceeding errors.

There are alternative options. "Making sure that no error will happen 
when there is a proceeding error by having two callbacks" is one of 
them. You wrote:
 > We want to consider
 >    - the saving of the device state (save or subsection save) and
 >    - running the cleanup after
 > as an atomic operation. And that is why, post_save is called
 > regardless of whether there is a preceeding error.
 > This means that, it is possible that we have 2 distict errors, one
 > from the preceeding function and the other from the post_save()
 > function.

But it may not be mandatory. In fact, I searched "post_save" in the code 
base and found all implementations are for cleanup and always succeeds, 
which makes sense; most (if not all) cleanup operations like g_free() 
always succeeds in general.

So perhaps "making sure that no error will happen when there is a 
proceeding error" may be feasible, and it may not even require two 
callbacks because post_save() always succeeds.

Regards,
Akihiko Odaki
Re: [PATCH v8 23/27] migration: Propagate last encountered error in vmstate_save_state_v() function
Posted by Arun Menon 3 months, 1 week ago
Hi Akihiko,

On Sun, Aug 03, 2025 at 04:15:16PM +0900, Akihiko Odaki wrote:
> On 2025/08/02 2:27, Arun Menon wrote:
> > Hi Akihiko,
> > Thanks for the review.
> > 
> > On Fri, Aug 01, 2025 at 04:35:34PM +0900, Akihiko Odaki wrote:
> > > On 2025/07/31 22:21, Arun Menon wrote:
> > > > Currently post_save hook is called without checking its return. If post_save
> > > > fails, we need to set an error and propagate it to the caller.
> > > > 
> > > > Since post_save hook is called regardless of whether there is a preceeding error,
> > > > it is possible that we have 2 distict errors, one from the preceeding function
> > > > call, and the other from the post_save call.
> > > > 
> > > > Return the latest error to the caller.
> > > 
> > > This needs to be explained better. This patch makes two changes on the
> > > behavior when there are two errors:
> > > 
> > > 1) Proceeding errors were propagated before, but they are now
> > >     dismissed.
> > > 2) post_error() errors were dismissed before, but they are now
> > >     propagated.
> > > 
> > > This message doesn't mention 1) at all. It does say 2) is necessary, but
> > > does not explain why.
> > 
> > Please correct me if I am wrong, does the following look ok?
> > 
> > Currently post_save() hook is called without checking its return.
> > post_save() hooks, generally does the cleanup, and that is the reason why
> > we have been dismissing its failure.
> > 
> > We want to consider
> >    - the saving of the device state (save or subsection save) and
> >    - running the cleanup after
> > as an atomic operation. And that is why, post_save is called regardless
> > of whether there is a preceeding error.
> > This means that, it is possible that we have 2 distict errors, one from
> > the preceeding function and the other from the post_save() function.
> > 
> > This patch makes two changes on the behavior when there are two errors:
> > 
> > 1) Preceeding errors were propagated before, but they are now
> >     dismissed if there is a new post_save() error.
> > 2) post_save() errors were dismissed before, but they are now
> >     propagated.
> > 
> > We intend to extend the error propagation feature (saving erros in
> > Error object and return to the caller) to the post/pre
> > save/load hooks. Therefore, it is important for the user to know
> > the exact reason of failure in case any of these hooks fail.
> > 
> > ====
> > I do not know much about the operations we do, or intend to do
> > using the post_save() call. But my guess is that we are going to
> > revert things or cleanup stuff, and if reverting/cleanup fails,
> > then the user should be notified about that.
> 
> All what you stated sounds correct to me and explains 2). But there is still
> no reasoning for 1).
> 
> Previously I listed three possible design options:
> > - propagate whatever error easier to do so
> > - somehow combine errors
> >   (languages that support exceptions often do this)
> > - make sure that no error will happen when there is a proceeding error
> >   by having two callbacks:
> >   - One callback that is called only when there is no proceeding error
> >     and can raise an error
> >   - Another that is always called but cannot raise an error
> 
> https://lore.kernel.org/qemu-devel/1ff92294-5c8c-4b33-89c1-91d37d6accb0@rsg.ci.i.u-tokyo.ac.jp/
> 
> It will be a good reasoning if we can show propagating post_save() errors is
> easier (or more important) than propagating proceeding errors.
> 
> There are alternative options. "Making sure that no error will happen when
> there is a proceeding error by having two callbacks" is one of them. You
> wrote:
> > We want to consider
> >    - the saving of the device state (save or subsection save) and
> >    - running the cleanup after
> > as an atomic operation. And that is why, post_save is called
> > regardless of whether there is a preceeding error.
> > This means that, it is possible that we have 2 distict errors, one
> > from the preceeding function and the other from the post_save()
> > function.
> 
> But it may not be mandatory. In fact, I searched "post_save" in the code
> base and found all implementations are for cleanup and always succeeds,
> which makes sense; most (if not all) cleanup operations like g_free() always
> succeeds in general.

post_save calls do succeed in general.
Now, we are introducing the post_save_errp() function which is an alternative to the
post_save() call. We are adding the feature to catch errors in Error object,
thereby clearly considering the case where post_save_erp can go wrong.
So we have already deviated from the old design that all post_save operations are
infalliable cleaup routines. The only catch is, these alternate functions are introduced
a couple of commits later.

If that is the case, then we need to handle both the errors, and propagate only
the most significant (which in my opinion is the post_save) one.

In other words, the reasoning for 1, 
	1) Proceeding errors were propagated before, but they are now dismissed.

is that,
a failure during the preceeding operation means that we failed in saving the state
but a failure during the post-save call, means that we failed in cleanup or restoring
back the system to a stable state, which is a more serious one, because it may lead to
the system being in an inconsistent state.
Can I write something on these lines, to justify the change?

> 
> So perhaps "making sure that no error will happen when there is a proceeding
> error" may be feasible, and it may not even require two callbacks because
> post_save() always succeeds.

I am sorry, I could not completely understand the part, where you mentioned that
we need to have 2 callbacks; I am assuming you are mentioning something like try..catch,
finally blocks from the other languages. 
post_save (that can not raise an error) being called in the goto block.

So if the main operation fails, we say goto cleanup, and then execute post_save which always
succeeds. If the main operation succeeds then we call post_save that can raise an error. Please correct
me if I am wrong.


> 
> Regards,
> Akihiko Odaki
> 

Regards,
Arun
Re: [PATCH v8 23/27] migration: Propagate last encountered error in vmstate_save_state_v() function
Posted by Akihiko Odaki 3 months, 1 week ago
On 2025/08/04 16:48, Arun Menon wrote:
> Hi Akihiko,
> 
> On Sun, Aug 03, 2025 at 04:15:16PM +0900, Akihiko Odaki wrote:
>> On 2025/08/02 2:27, Arun Menon wrote:
>>> Hi Akihiko,
>>> Thanks for the review.
>>>
>>> On Fri, Aug 01, 2025 at 04:35:34PM +0900, Akihiko Odaki wrote:
>>>> On 2025/07/31 22:21, Arun Menon wrote:
>>>>> Currently post_save hook is called without checking its return. If post_save
>>>>> fails, we need to set an error and propagate it to the caller.
>>>>>
>>>>> Since post_save hook is called regardless of whether there is a preceeding error,
>>>>> it is possible that we have 2 distict errors, one from the preceeding function
>>>>> call, and the other from the post_save call.
>>>>>
>>>>> Return the latest error to the caller.
>>>>
>>>> This needs to be explained better. This patch makes two changes on the
>>>> behavior when there are two errors:
>>>>
>>>> 1) Proceeding errors were propagated before, but they are now
>>>>      dismissed.
>>>> 2) post_error() errors were dismissed before, but they are now
>>>>      propagated.
>>>>
>>>> This message doesn't mention 1) at all. It does say 2) is necessary, but
>>>> does not explain why.
>>>
>>> Please correct me if I am wrong, does the following look ok?
>>>
>>> Currently post_save() hook is called without checking its return.
>>> post_save() hooks, generally does the cleanup, and that is the reason why
>>> we have been dismissing its failure.
>>>
>>> We want to consider
>>>     - the saving of the device state (save or subsection save) and
>>>     - running the cleanup after
>>> as an atomic operation. And that is why, post_save is called regardless
>>> of whether there is a preceeding error.
>>> This means that, it is possible that we have 2 distict errors, one from
>>> the preceeding function and the other from the post_save() function.
>>>
>>> This patch makes two changes on the behavior when there are two errors:
>>>
>>> 1) Preceeding errors were propagated before, but they are now
>>>      dismissed if there is a new post_save() error.
>>> 2) post_save() errors were dismissed before, but they are now
>>>      propagated.
>>>
>>> We intend to extend the error propagation feature (saving erros in
>>> Error object and return to the caller) to the post/pre
>>> save/load hooks. Therefore, it is important for the user to know
>>> the exact reason of failure in case any of these hooks fail.
>>>
>>> ====
>>> I do not know much about the operations we do, or intend to do
>>> using the post_save() call. But my guess is that we are going to
>>> revert things or cleanup stuff, and if reverting/cleanup fails,
>>> then the user should be notified about that.
>>
>> All what you stated sounds correct to me and explains 2). But there is still
>> no reasoning for 1).
>>
>> Previously I listed three possible design options:
>>> - propagate whatever error easier to do so
>>> - somehow combine errors
>>>    (languages that support exceptions often do this)
>>> - make sure that no error will happen when there is a proceeding error
>>>    by having two callbacks:
>>>    - One callback that is called only when there is no proceeding error
>>>      and can raise an error
>>>    - Another that is always called but cannot raise an error
>>
>> https://lore.kernel.org/qemu-devel/1ff92294-5c8c-4b33-89c1-91d37d6accb0@rsg.ci.i.u-tokyo.ac.jp/
>>
>> It will be a good reasoning if we can show propagating post_save() errors is
>> easier (or more important) than propagating proceeding errors.
>>
>> There are alternative options. "Making sure that no error will happen when
>> there is a proceeding error by having two callbacks" is one of them. You
>> wrote:
>>> We want to consider
>>>     - the saving of the device state (save or subsection save) and
>>>     - running the cleanup after
>>> as an atomic operation. And that is why, post_save is called
>>> regardless of whether there is a preceeding error.
>>> This means that, it is possible that we have 2 distict errors, one
>>> from the preceeding function and the other from the post_save()
>>> function.
>>
>> But it may not be mandatory. In fact, I searched "post_save" in the code
>> base and found all implementations are for cleanup and always succeeds,
>> which makes sense; most (if not all) cleanup operations like g_free() always
>> succeeds in general.
> 
> post_save calls do succeed in general.
> Now, we are introducing the post_save_errp() function which is an alternative to the
> post_save() call. We are adding the feature to catch errors in Error object,
> thereby clearly considering the case where post_save_erp can go wrong.
> So we have already deviated from the old design that all post_save operations are
> infalliable cleaup routines. The only catch is, these alternate functions are introduced
> a couple of commits later.
> 
> If that is the case, then we need to handle both the errors, and propagate only
> the most significant (which in my opinion is the post_save) one.
> 
> In other words, the reasoning for 1,
> 	1) Proceeding errors were propagated before, but they are now dismissed.
> 
> is that,
> a failure during the preceeding operation means that we failed in saving the state
> but a failure during the post-save call, means that we failed in cleanup or restoring
> back the system to a stable state, which is a more serious one, because it may lead to
> the system being in an inconsistent state.
> Can I write something on these lines, to justify the change?

That is better, but also gives another question: when will "cleanup or 
restoring back the system to a stable state" fail? We already know that 
the all post_save() always succeeds, so there should be some anticipated 
use case that requires us to "deviate from the old design that all 
post_save operations are infalliable cleanup routines".

> 
>>
>> So perhaps "making sure that no error will happen when there is a proceeding
>> error" may be feasible, and it may not even require two callbacks because
>> post_save() always succeeds.
> 
> I am sorry, I could not completely understand the part, where you mentioned that
> we need to have 2 callbacks; I am assuming you are mentioning something like try..catch,
> finally blocks from the other languages.
> post_save (that can not raise an error) being called in the goto block.

Such two callbacks would handle two scenarios you previously wrote, 
respectively:
 > We want to consider
 >     - the saving of the device state (save or subsection save) and
 >     - running the cleanup after
 > as an atomic operation.

An analogy with try/catch/finally will indeed clarify the idea, though 
"catch" is not relevant here. It will look like as follows:

try {
   /* A pseudo function that represents the save operation. */
   save();

   /*
    * A new callback that can raise an error.
    * It will not be called if save() fails.
    */
   vmsd->commit_save();
} finally {
   /*
    * The old callback. Returning non-zero value is deprecated.
    * Any implementations that return a non-zero value should be
    * gradually converted to the new commit_save() callback.
    */
   vmsd->post_save();
}

> 
> So if the main operation fails, we say goto cleanup, and then execute post_save which always
> succeeds. If the main operation succeeds then we call post_save that can raise an error. Please correct
> me if I am wrong.

Yes, that's what I meant.

Regards,
Akihiko Odaki