Hi Akihiko,
Thanks for the review.
On Wed, Aug 06, 2025 at 02:24:37PM +0900, Akihiko Odaki wrote:
> On 2025/08/06 3:25, Arun Menon wrote:
> > We consider,
> > - the saving of the device state (save or subsection save) and
> > - running the cleanup after (post_save)
> > as an atomic operation. And that is why, post_save() is called regardless
> > of whether there is a preceeding error. This means that, it is possible
> > that we have 2 distict errors, one from the preceeding function and the
> > other from the post_save() function.
> >
> > This commit changes the error handling behavior when two errors occur during
> > a save operation.
> >
> > 1) Preceding errors were propagated before, but they are now dismissed, if there
> > is a new post_save() error.
> > - A failure during the main save operation, means the system failed to
> > reach a new desired state. A failure during the post-save cleanup,
> > however, is a more critical issue as it can leave the system in an
> > inconsistent or corrupted state. At present, all post_save() calls
> > succeed. However, the introduction of error handling variants of these
> > functions (post_save_errp) in the following commit, imply that we need
> > to handle and propagate these errors. Therefore, we prioritize reporting
> > the more severe error.
>
> This explains why the post_save() error is propagated instead of propagating
> the preceding error. But the preceding errors still do not have to be
> dismissed if we report them with error_report_err() instead.
Yes, both can also be reported.
>
> >
> > 2) post_save() errors were dismissed before, but they are now propagated.
> > - The original design implicitly assumed post-save hooks were infallible
> > cleanup routines.
> > This will not be the case after introducing the post/pre save/load errp
> > variant functions. Dismissing these errors prevents users from
> > identifying the specific operation that failed.
>
> Re-iterating previous comments, if introducing post save errp variant
> functions break the assumption, we just don't have to introduce one. The
> reason to introduce one needs to be explained.
Sure, let's keep the original design and try the approach where we rename
post_save to cleanup_save and return void. This should make things clear.
That way, we can avoid introducing post_save_errp() and also we can discard
patches [PATCH v9 23/27] Refactor vmstate_save_state_v() function and this
one [PATCH v9 24/27] Propagate last encountered error in vmstate_save_state_v()
function.
I will post a new version with that patch.
>
> > The entire save-and-cleanup process is treated as a single
> > logical operation, and all potential failures are communicated.
> >
> > Signed-off-by: Arun Menon <armenon@redhat.com>
> > ---
> > migration/vmstate.c | 14 ++++++++++++--
> > 1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/migration/vmstate.c b/migration/vmstate.c
> > index fbc59caadbbcc75fe6de27b459aa9aa25e76aa0a..ef78a1e62ad92e9608de72d125da80ea496c8dd1 100644
> > --- a/migration/vmstate.c
> > +++ b/migration/vmstate.c
> > @@ -554,6 +554,12 @@ static int vmstate_save_dispatch(QEMUFile *f,
> > error_setg(errp, "Save of field %s/%s failed",
> > vmsd->name, field->name);
> > ps_ret = post_save_dispatch(vmsd, opaque, &local_err);
> > + if (ps_ret) {
> > + ret = ps_ret;
> > + error_free_or_abort(errp);
> > + error_setg(errp, "post-save for %s failed, ret: %d",
> > + vmsd->name, ret);
> > + }
> > return ret;
> > }
> > @@ -603,10 +609,14 @@ int vmstate_save_state_v(QEMUFile *f, const VMStateDescription *vmsd,
> > }
> > ret = vmstate_subsection_save(f, vmsd, opaque, vmdesc, errp);
> > +
> > ps_ret = post_save_dispatch(vmsd, opaque, &local_err);
> > - if (!ret && ps_ret) {
> > + if (ps_ret) {
> > + if (ret) {
> > + error_free_or_abort(errp);
> > + }
> > ret = ps_ret;
> > - error_setg(errp, "post-save failed: %s", vmsd->name);
> > + error_propagate(errp, local_err);
> > }
> > return ret;
> >
>
Regards,
Arun