From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
It's possible for load_cleanup SaveVMHandler to get called without
load_setup handler being called first.
Since we'll be soon running cleanup operations there that access objects
that need earlier initialization in load_setup let's make sure these
cleanups only run when load_setup handler had indeed been called
earlier.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
---
hw/vfio/migration.c | 21 +++++++++++++++++++--
include/hw/vfio/vfio-common.h | 1 +
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 01aa11013e42..9e2657073012 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -688,16 +688,33 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
static int vfio_load_setup(QEMUFile *f, void *opaque, Error **errp)
{
VFIODevice *vbasedev = opaque;
+ VFIOMigration *migration = vbasedev->migration;
+ int ret;
+
+ assert(!migration->load_setup);
+
+ ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
+ migration->device_state, errp);
+ if (ret) {
+ return ret;
+ }
- return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
- vbasedev->migration->device_state, errp);
+ migration->load_setup = true;
+
+ return 0;
}
static int vfio_load_cleanup(void *opaque)
{
VFIODevice *vbasedev = opaque;
+ VFIOMigration *migration = vbasedev->migration;
+
+ if (!migration->load_setup) {
+ return 0;
+ }
vfio_migration_cleanup(vbasedev);
+ migration->load_setup = false;
trace_vfio_load_cleanup(vbasedev->name);
return 0;
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e0ce6ec3a9b3..246250ed8b75 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -66,6 +66,7 @@ typedef struct VFIOMigration {
VMChangeStateEntry *vm_state;
NotifierWithReturn migration_state;
uint32_t device_state;
+ bool load_setup;
int data_fd;
void *data_buffer;
size_t data_buffer_size;
On 11/17/24 20:20, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>
> It's possible for load_cleanup SaveVMHandler to get called without
> load_setup handler being called first.
>
> Since we'll be soon running cleanup operations there that access objects
> that need earlier initialization in load_setup let's make sure these
> cleanups only run when load_setup handler had indeed been called
> earlier.
>
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
tbh, that's a bit ugly. I agree it's similar to those 'bool initialized'
attributes we have in some structs, so nothing new or really wrong.
But it does look like a workaound for a problem or cleanups missing
that would need time to untangle.
I would prefer to avoid this change and address the issue from the
migration subsystem if possible.
Thanks,
C.
> ---
> hw/vfio/migration.c | 21 +++++++++++++++++++--
> include/hw/vfio/vfio-common.h | 1 +
> 2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 01aa11013e42..9e2657073012 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -688,16 +688,33 @@ static void vfio_save_state(QEMUFile *f, void *opaque)
> static int vfio_load_setup(QEMUFile *f, void *opaque, Error **errp)
> {
> VFIODevice *vbasedev = opaque;
> + VFIOMigration *migration = vbasedev->migration;
> + int ret;
> +
> + assert(!migration->load_setup);
> +
> + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
> + migration->device_state, errp);
> + if (ret) {
> + return ret;
> + }
>
> - return vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING,
> - vbasedev->migration->device_state, errp);
> + migration->load_setup = true;
> +
> + return 0;
> }
>
> static int vfio_load_cleanup(void *opaque)
> {
> VFIODevice *vbasedev = opaque;
> + VFIOMigration *migration = vbasedev->migration;
> +
> + if (!migration->load_setup) {
> + return 0;
> + }
>
> vfio_migration_cleanup(vbasedev);
> + migration->load_setup = false;
> trace_vfio_load_cleanup(vbasedev->name);
>
> return 0;
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index e0ce6ec3a9b3..246250ed8b75 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -66,6 +66,7 @@ typedef struct VFIOMigration {
> VMChangeStateEntry *vm_state;
> NotifierWithReturn migration_state;
> uint32_t device_state;
> + bool load_setup;
> int data_fd;
> void *data_buffer;
> size_t data_buffer_size;
>
On 29.11.2024 15:08, Cédric Le Goater wrote:
> On 11/17/24 20:20, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>
>> It's possible for load_cleanup SaveVMHandler to get called without
>> load_setup handler being called first.
>>
>> Since we'll be soon running cleanup operations there that access objects
>> that need earlier initialization in load_setup let's make sure these
>> cleanups only run when load_setup handler had indeed been called
>> earlier.
>>
>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>
> tbh, that's a bit ugly. I agree it's similar to those 'bool initialized'
> attributes we have in some structs, so nothing new or really wrong.
> But it does look like a workaound for a problem or cleanups missing
> that would need time to untangle.
>
> I would prefer to avoid this change and address the issue from the
> migration subsystem if possible.
While it would be pretty simple to only call {load,save}_cleanup
SaveVMHandlers when the relevant {load,save}_setup handler was
successfully called first this would amount to a change of these
handler semantics.
This would risk introducing regressions - for example vfio_save_setup()
doesn't clean up (free) newly allocated migration->data_buffer
if vfio_migration_set_state() were to fail later in this handler
and relies on an unconstitutional call to vfio_save_cleanup() in
order to clean it up.
There might be similar issues in other drivers too.
> Thanks,
>
> C.
Thanks,
Maciej
On 29/11/2024 19:15, Maciej S. Szmigiero wrote:
> External email: Use caution opening links or attachments
>
>
> On 29.11.2024 15:08, Cédric Le Goater wrote:
>> On 11/17/24 20:20, Maciej S. Szmigiero wrote:
>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>
>>> It's possible for load_cleanup SaveVMHandler to get called without
>>> load_setup handler being called first.
>>>
>>> Since we'll be soon running cleanup operations there that access
>>> objects
>>> that need earlier initialization in load_setup let's make sure these
>>> cleanups only run when load_setup handler had indeed been called
>>> earlier.
>>>
>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>>
>> tbh, that's a bit ugly. I agree it's similar to those 'bool initialized'
>> attributes we have in some structs, so nothing new or really wrong.
>> But it does look like a workaound for a problem or cleanups missing
>> that would need time to untangle.
>>
>> I would prefer to avoid this change and address the issue from the
>> migration subsystem if possible.
>
> While it would be pretty simple to only call {load,save}_cleanup
> SaveVMHandlers when the relevant {load,save}_setup handler was
> successfully called first this would amount to a change of these
> handler semantics.
>
> This would risk introducing regressions - for example vfio_save_setup()
> doesn't clean up (free) newly allocated migration->data_buffer
> if vfio_migration_set_state() were to fail later in this handler
> and relies on an unconstitutional call to vfio_save_cleanup() in
> order to clean it up.
>
> There might be similar issues in other drivers too.
We can put all objects related to multifd load in their own struct (as
suggested by Cedric in patch #22) and allocate the struct only if
multifd device state transfer is used.
Then in the cleanup flow we clean the struct only if it was allocated.
This way we don't need to add the load_setup flag and we can keep the
SaveVMHandlers semantics as is.
Do you think this will be OK?
Thanks.
On 3.12.2024 16:09, Avihai Horon wrote:
>
> On 29/11/2024 19:15, Maciej S. Szmigiero wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 29.11.2024 15:08, Cédric Le Goater wrote:
>>> On 11/17/24 20:20, Maciej S. Szmigiero wrote:
>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>>
>>>> It's possible for load_cleanup SaveVMHandler to get called without
>>>> load_setup handler being called first.
>>>>
>>>> Since we'll be soon running cleanup operations there that access objects
>>>> that need earlier initialization in load_setup let's make sure these
>>>> cleanups only run when load_setup handler had indeed been called
>>>> earlier.
>>>>
>>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>>>
>>> tbh, that's a bit ugly. I agree it's similar to those 'bool initialized'
>>> attributes we have in some structs, so nothing new or really wrong.
>>> But it does look like a workaound for a problem or cleanups missing
>>> that would need time to untangle.
>>>
>>> I would prefer to avoid this change and address the issue from the
>>> migration subsystem if possible.
>>
>> While it would be pretty simple to only call {load,save}_cleanup
>> SaveVMHandlers when the relevant {load,save}_setup handler was
>> successfully called first this would amount to a change of these
>> handler semantics.
>>
>> This would risk introducing regressions - for example vfio_save_setup()
>> doesn't clean up (free) newly allocated migration->data_buffer
>> if vfio_migration_set_state() were to fail later in this handler
>> and relies on an unconstitutional call to vfio_save_cleanup() in
>> order to clean it up.
>>
>> There might be similar issues in other drivers too.
>
> We can put all objects related to multifd load in their own struct (as suggested by Cedric in patch #22) and allocate the struct only if multifd device state transfer is used.
> Then in the cleanup flow we clean the struct only if it was allocated.
>
> This way we don't need to add the load_setup flag and we can keep the SaveVMHandlers semantics as is.
>
> Do you think this will be OK?
I think here the discussion is more of whether we refactor the
{load,save}_cleanup handler semantics to "cleaner" design where
these handlers are only called if the relevant {load,save}_setup
handler was successfully called first (but at the same time risk
introducing regressions).
If we keep the existing semantics of these handlers (like this
patch set did) then it is just an implementation detail whether
we keep an explicit flag like "migration->load_setup" or have
a struct pointer that serves as an implicit equivalent flag
(when not NULL) - I don't have a strong opinion on this particular
detail.
> Thanks.
>
Thanks,
Maciej
On 11/12/2024 1:04, Maciej S. Szmigiero wrote:
> External email: Use caution opening links or attachments
>
>
> On 3.12.2024 16:09, Avihai Horon wrote:
>>
>> On 29/11/2024 19:15, Maciej S. Szmigiero wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 29.11.2024 15:08, Cédric Le Goater wrote:
>>>> On 11/17/24 20:20, Maciej S. Szmigiero wrote:
>>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>>>
>>>>> It's possible for load_cleanup SaveVMHandler to get called without
>>>>> load_setup handler being called first.
>>>>>
>>>>> Since we'll be soon running cleanup operations there that access
>>>>> objects
>>>>> that need earlier initialization in load_setup let's make sure these
>>>>> cleanups only run when load_setup handler had indeed been called
>>>>> earlier.
>>>>>
>>>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>>>>
>>>> tbh, that's a bit ugly. I agree it's similar to those 'bool
>>>> initialized'
>>>> attributes we have in some structs, so nothing new or really wrong.
>>>> But it does look like a workaound for a problem or cleanups missing
>>>> that would need time to untangle.
>>>>
>>>> I would prefer to avoid this change and address the issue from the
>>>> migration subsystem if possible.
>>>
>>> While it would be pretty simple to only call {load,save}_cleanup
>>> SaveVMHandlers when the relevant {load,save}_setup handler was
>>> successfully called first this would amount to a change of these
>>> handler semantics.
>>>
>>> This would risk introducing regressions - for example vfio_save_setup()
>>> doesn't clean up (free) newly allocated migration->data_buffer
>>> if vfio_migration_set_state() were to fail later in this handler
>>> and relies on an unconstitutional call to vfio_save_cleanup() in
>>> order to clean it up.
>>>
>>> There might be similar issues in other drivers too.
>>
>> We can put all objects related to multifd load in their own struct
>> (as suggested by Cedric in patch #22) and allocate the struct only if
>> multifd device state transfer is used.
>> Then in the cleanup flow we clean the struct only if it was allocated.
>>
>> This way we don't need to add the load_setup flag and we can keep the
>> SaveVMHandlers semantics as is.
>>
>> Do you think this will be OK?
>
> I think here the discussion is more of whether we refactor the
> {load,save}_cleanup handler semantics to "cleaner" design where
> these handlers are only called if the relevant {load,save}_setup
> handler was successfully called first (but at the same time risk
> introducing regressions).
Yes, and I agree with you that changing the semantics of SaveVMHandlers
can be risky and may deserve a series of its own.
But Cedric didn't like the flag option, so I suggested to do what we
usually do, AFAIU, which is to check if the structs are allocated and
need cleanup.
>
>
> If we keep the existing semantics of these handlers (like this
> patch set did) then it is just an implementation detail whether
> we keep an explicit flag like "migration->load_setup" or have
> a struct pointer that serves as an implicit equivalent flag
> (when not NULL) - I don't have a strong opinion on this particular
> detail.
>
I prefer the struct pointer way, it seems less cumbersome to me.
But it's Cedric's call at the end.
Thanks.
On 12.12.2024 15:30, Avihai Horon wrote:
>
> On 11/12/2024 1:04, Maciej S. Szmigiero wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 3.12.2024 16:09, Avihai Horon wrote:
>>>
>>> On 29/11/2024 19:15, Maciej S. Szmigiero wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> On 29.11.2024 15:08, Cédric Le Goater wrote:
>>>>> On 11/17/24 20:20, Maciej S. Szmigiero wrote:
>>>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>>>>
>>>>>> It's possible for load_cleanup SaveVMHandler to get called without
>>>>>> load_setup handler being called first.
>>>>>>
>>>>>> Since we'll be soon running cleanup operations there that access objects
>>>>>> that need earlier initialization in load_setup let's make sure these
>>>>>> cleanups only run when load_setup handler had indeed been called
>>>>>> earlier.
>>>>>>
>>>>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>>>>>
>>>>> tbh, that's a bit ugly. I agree it's similar to those 'bool initialized'
>>>>> attributes we have in some structs, so nothing new or really wrong.
>>>>> But it does look like a workaound for a problem or cleanups missing
>>>>> that would need time to untangle.
>>>>>
>>>>> I would prefer to avoid this change and address the issue from the
>>>>> migration subsystem if possible.
>>>>
>>>> While it would be pretty simple to only call {load,save}_cleanup
>>>> SaveVMHandlers when the relevant {load,save}_setup handler was
>>>> successfully called first this would amount to a change of these
>>>> handler semantics.
>>>>
>>>> This would risk introducing regressions - for example vfio_save_setup()
>>>> doesn't clean up (free) newly allocated migration->data_buffer
>>>> if vfio_migration_set_state() were to fail later in this handler
>>>> and relies on an unconstitutional call to vfio_save_cleanup() in
>>>> order to clean it up.
>>>>
>>>> There might be similar issues in other drivers too.
>>>
>>> We can put all objects related to multifd load in their own struct (as suggested by Cedric in patch #22) and allocate the struct only if multifd device state transfer is used.
>>> Then in the cleanup flow we clean the struct only if it was allocated.
>>>
>>> This way we don't need to add the load_setup flag and we can keep the SaveVMHandlers semantics as is.
>>>
>>> Do you think this will be OK?
>>
>> I think here the discussion is more of whether we refactor the
>> {load,save}_cleanup handler semantics to "cleaner" design where
>> these handlers are only called if the relevant {load,save}_setup
>> handler was successfully called first (but at the same time risk
>> introducing regressions).
>
> Yes, and I agree with you that changing the semantics of SaveVMHandlers can be risky and may deserve a series of its own.
> But Cedric didn't like the flag option, so I suggested to do what we usually do, AFAIU, which is to check if the structs are allocated and need cleanup.
>
>>
>>
>> If we keep the existing semantics of these handlers (like this
>> patch set did) then it is just an implementation detail whether
>> we keep an explicit flag like "migration->load_setup" or have
>> a struct pointer that serves as an implicit equivalent flag
>> (when not NULL) - I don't have a strong opinion on this particular
>> detail.
>>
> I prefer the struct pointer way, it seems less cumbersome to me.
> But it's Cedric's call at the end.
As I wrote above "I don't have a strong opinion on this particular
detail" - I'm okay with moving these new variables to a dedicated
struct.
I guess this means we settled on *not* changing the semantics of
{load,save}_cleanup handler SaveVMHandlers - that was the important
decision for me.
> Thanks.
>
>
Thanks,
Maciej
On 12/12/24 23:52, Maciej S. Szmigiero wrote:
> On 12.12.2024 15:30, Avihai Horon wrote:
>>
>> On 11/12/2024 1:04, Maciej S. Szmigiero wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 3.12.2024 16:09, Avihai Horon wrote:
>>>>
>>>> On 29/11/2024 19:15, Maciej S. Szmigiero wrote:
>>>>> External email: Use caution opening links or attachments
>>>>>
>>>>>
>>>>> On 29.11.2024 15:08, Cédric Le Goater wrote:
>>>>>> On 11/17/24 20:20, Maciej S. Szmigiero wrote:
>>>>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>>>>>
>>>>>>> It's possible for load_cleanup SaveVMHandler to get called without
>>>>>>> load_setup handler being called first.
>>>>>>>
>>>>>>> Since we'll be soon running cleanup operations there that access objects
>>>>>>> that need earlier initialization in load_setup let's make sure these
>>>>>>> cleanups only run when load_setup handler had indeed been called
>>>>>>> earlier.
>>>>>>>
>>>>>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>>>>>>
>>>>>> tbh, that's a bit ugly. I agree it's similar to those 'bool initialized'
>>>>>> attributes we have in some structs, so nothing new or really wrong.
>>>>>> But it does look like a workaound for a problem or cleanups missing
>>>>>> that would need time to untangle.
>>>>>>
>>>>>> I would prefer to avoid this change and address the issue from the
>>>>>> migration subsystem if possible.
>>>>>
>>>>> While it would be pretty simple to only call {load,save}_cleanup
>>>>> SaveVMHandlers when the relevant {load,save}_setup handler was
>>>>> successfully called first this would amount to a change of these
>>>>> handler semantics.
>>>>>
>>>>> This would risk introducing regressions - for example vfio_save_setup()
>>>>> doesn't clean up (free) newly allocated migration->data_buffer
>>>>> if vfio_migration_set_state() were to fail later in this handler
>>>>> and relies on an unconstitutional call to vfio_save_cleanup() in
>>>>> order to clean it up.
>>>>>
>>>>> There might be similar issues in other drivers too.
>>>>
>>>> We can put all objects related to multifd load in their own struct (as suggested by Cedric in patch #22) and allocate the struct only if multifd device state transfer is used.
>>>> Then in the cleanup flow we clean the struct only if it was allocated.
>>>>
>>>> This way we don't need to add the load_setup flag and we can keep the SaveVMHandlers semantics as is.
>>>>
>>>> Do you think this will be OK?
>>>
>>> I think here the discussion is more of whether we refactor the
>>> {load,save}_cleanup handler semantics to "cleaner" design where
>>> these handlers are only called if the relevant {load,save}_setup
>>> handler was successfully called first (but at the same time risk
>>> introducing regressions).
>>
>> Yes, and I agree with you that changing the semantics of SaveVMHandlers can be risky and may deserve a series of its own.
>> But Cedric didn't like the flag option, so I suggested to do what we usually do, AFAIU, which is to check if the structs are allocated and need cleanup.
>>
>>>
>>>
>>> If we keep the existing semantics of these handlers (like this
>>> patch set did) then it is just an implementation detail whether
>>> we keep an explicit flag like "migration->load_setup" or have
>>> a struct pointer that serves as an implicit equivalent flag
>>> (when not NULL) - I don't have a strong opinion on this particular
>>> detail.
>>>
>> I prefer the struct pointer way, it seems less cumbersome to me.
>> But it's Cedric's call at the end.
>
> As I wrote above "I don't have a strong opinion on this particular
> detail" - I'm okay with moving these new variables to a dedicated
> struct.
I would prefer that, to isolate multifd migation support from the rest.
> I guess this means we settled on *not* changing the semantics of
> {load,save}_cleanup handler SaveVMHandlers - that was the important
> decision for me.
Handling errors locally in SaveVMHandlers and unrolling what was done
previously is better practice than relying on another callback to do
the cleanup.
Let's see when v4 comes out.
Thanks,
C.
© 2016 - 2026 Red Hat, Inc.