After CPR transfer, source QEMU closes kvm fd and sets kvm_state to NULL,
"query-balloon" will check kvm_state->sync_mmu and trigger NULL pointer
reference.
We don't need to NULL kvm_state as all states in kvm_state aren't released
actually. Just closing kvm fd is enough so we could still query states
through "query_*" qmp command.
Opportunistically drop an unnecessary check in kvm_close().
Fixes: 7ed0919119b0 ("migration: close kvm after cpr")
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
accel/kvm/kvm-all.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 23fd491441..b4c717290d 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -639,13 +639,10 @@ void kvm_close(void)
cpu->kvm_vcpu_stats_fd = -1;
}
- if (kvm_state && kvm_state->fd != -1) {
- close(kvm_state->vmfd);
- kvm_state->vmfd = -1;
- close(kvm_state->fd);
- kvm_state->fd = -1;
- }
- kvm_state = NULL;
+ close(kvm_state->vmfd);
+ kvm_state->vmfd = -1;
+ close(kvm_state->fd);
+ kvm_state->fd = -1;
}
/*
--
2.47.1
On 9/28/2025 4:54 AM, Zhenzhong Duan wrote:
> After CPR transfer, source QEMU closes kvm fd and sets kvm_state to NULL,
> "query-balloon" will check kvm_state->sync_mmu and trigger NULL pointer
> reference.
>
> We don't need to NULL kvm_state as all states in kvm_state aren't released
> actually. Just closing kvm fd is enough so we could still query states
> through "query_*" qmp command.
IMO this does not make sense. Much of the state in kvm_state was derived
from ioctl's on the descriptors, and closing them invalidates it. Asking
historical questions about what used to be makes no sense.
Clearing kvm_state and setting kvm_allowed=false would be a safer fix.
- Steve
> Opportunistically drop an unnecessary check in kvm_close().
>
> Fixes: 7ed0919119b0 ("migration: close kvm after cpr")
> Suggested-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> accel/kvm/kvm-all.c | 11 ++++-------
> 1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 23fd491441..b4c717290d 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -639,13 +639,10 @@ void kvm_close(void)
> cpu->kvm_vcpu_stats_fd = -1;
> }
>
> - if (kvm_state && kvm_state->fd != -1) {
> - close(kvm_state->vmfd);
> - kvm_state->vmfd = -1;
> - close(kvm_state->fd);
> - kvm_state->fd = -1;
> - }
> - kvm_state = NULL;
> + close(kvm_state->vmfd);
> + kvm_state->vmfd = -1;
> + close(kvm_state->fd);
> + kvm_state->fd = -1;
> }
>
> /*
>-----Original Message-----
>From: Steven Sistare <steven.sistare@oracle.com>
>Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>"query-balloon" after CPR transfer
>
>On 9/28/2025 4:54 AM, Zhenzhong Duan wrote:
>> After CPR transfer, source QEMU closes kvm fd and sets kvm_state to
>NULL,
>> "query-balloon" will check kvm_state->sync_mmu and trigger NULL pointer
>> reference.
>>
>> We don't need to NULL kvm_state as all states in kvm_state aren't released
>> actually. Just closing kvm fd is enough so we could still query states
>> through "query_*" qmp command.
>
>IMO this does not make sense. Much of the state in kvm_state was derived
>from ioctl's on the descriptors, and closing them invalidates it. Asking
>historical questions about what used to be makes no sense.
You also have your valid point.
>
>Clearing kvm_state and setting kvm_allowed=false would be a safer fix.
But clearing kvm_state will make some qmp commands which dereferencing
kvm_state trigger SIGSEGV. We can expect failure on those qmp, but SIGSEGV
is worse.
E.g., {"execute": "query-cpu-definitions"}, below error print with v2 but SIGSEGV with v1:
KVM get MSR (index=0x10a) feature failed, Bad file descriptor
I also see inconsistence on "query-balloon" result with v1 patch, before cpr-transfer:
{"error": {"class": "DeviceNotActive", "desc": "No balloon device has been activated"}}
after transfer:
{"error": {"class": "KVMMissingCap", "desc": "Using KVM without synchronous MMU, balloon unavailable"}}
It's confusing there is no synchronous MMU support but we do have it.
Thanks
Zhenzhong
On 9/30/2025 2:00 AM, Duan, Zhenzhong wrote:
>> -----Original Message-----
>> From: Steven Sistare <steven.sistare@oracle.com>
>> Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>> "query-balloon" after CPR transfer
>>
>> On 9/28/2025 4:54 AM, Zhenzhong Duan wrote:
>>> After CPR transfer, source QEMU closes kvm fd and sets kvm_state to
>> NULL,
>>> "query-balloon" will check kvm_state->sync_mmu and trigger NULL pointer
>>> reference.
>>>
>>> We don't need to NULL kvm_state as all states in kvm_state aren't released
>>> actually. Just closing kvm fd is enough so we could still query states
>>> through "query_*" qmp command.
>>
>> IMO this does not make sense. Much of the state in kvm_state was derived
>>from ioctl's on the descriptors, and closing them invalidates it. Asking
>> historical questions about what used to be makes no sense.
>
> You also have your valid point.
>
>>
>> Clearing kvm_state and setting kvm_allowed=false would be a safer fix.
Setting kvm_allowed=false causes kvm_enabled() to return false which should
prevent kvm_state from being dereferenced anywhere:
#define kvm_enabled() (kvm_allowed)
Eg for the balloon:
static bool have_balloon(Error **errp)
{
if (kvm_enabled() && !kvm_has_sync_mmu()) {
- Steve
> But clearing kvm_state will make some qmp commands which dereferencing
> kvm_state trigger SIGSEGV. We can expect failure on those qmp, but SIGSEGV
> is worse.
>
> E.g., {"execute": "query-cpu-definitions"}, below error print with v2 but SIGSEGV with v1:
>
> KVM get MSR (index=0x10a) feature failed, Bad file descriptor
>
>
> I also see inconsistence on "query-balloon" result with v1 patch, before cpr-transfer:
>
> {"error": {"class": "DeviceNotActive", "desc": "No balloon device has been activated"}}
>
> after transfer:
>
> {"error": {"class": "KVMMissingCap", "desc": "Using KVM without synchronous MMU, balloon unavailable"}}
>
> It's confusing there is no synchronous MMU support but we do have it.
>
> Thanks
> Zhenzhong
>-----Original Message-----
>From: Steven Sistare <steven.sistare@oracle.com>
>Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>"query-balloon" after CPR transfer
>
>On 9/30/2025 2:00 AM, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Steven Sistare <steven.sistare@oracle.com>
>>> Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>>> "query-balloon" after CPR transfer
>>>
>>> On 9/28/2025 4:54 AM, Zhenzhong Duan wrote:
>>>> After CPR transfer, source QEMU closes kvm fd and sets kvm_state to
>>> NULL,
>>>> "query-balloon" will check kvm_state->sync_mmu and trigger NULL
>pointer
>>>> reference.
>>>>
>>>> We don't need to NULL kvm_state as all states in kvm_state aren't
>released
>>>> actually. Just closing kvm fd is enough so we could still query states
>>>> through "query_*" qmp command.
>>>
>>> IMO this does not make sense. Much of the state in kvm_state was
>derived
>>>from ioctl's on the descriptors, and closing them invalidates it. Asking
>>> historical questions about what used to be makes no sense.
>>
>> You also have your valid point.
>>
>>>
>>> Clearing kvm_state and setting kvm_allowed=false would be a safer fix.
>
>Setting kvm_allowed=false causes kvm_enabled() to return false which should
>prevent kvm_state from being dereferenced anywhere:
>
> #define kvm_enabled() (kvm_allowed)
>
> Eg for the balloon:
>
> static bool have_balloon(Error **errp)
> {
> if (kvm_enabled() && !kvm_has_sync_mmu()) {
OK, will do, clearing kvm_allowed is a good idea.
Currently there are two qmp commands "query-balloon" and "query-cpu-definitions"
causing SIGSEGV after CPR-transfer, clearing kvm_allowed helps for both.
But clearing both kvm_allowed and kvm_state cause SIGSEGV on "query-cpu-definitions".
I'll send a patch to clearing only kvm_allowed if you are fine with it.
Thanks
Zhenzhong
On 10/8/2025 6:22 AM, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Steven Sistare <steven.sistare@oracle.com>
>> Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>> "query-balloon" after CPR transfer
>>
>> On 9/30/2025 2:00 AM, Duan, Zhenzhong wrote:
>>>> -----Original Message-----
>>>> From: Steven Sistare <steven.sistare@oracle.com>
>>>> Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>>>> "query-balloon" after CPR transfer
>>>>
>>>> On 9/28/2025 4:54 AM, Zhenzhong Duan wrote:
>>>>> After CPR transfer, source QEMU closes kvm fd and sets kvm_state to
>>>> NULL,
>>>>> "query-balloon" will check kvm_state->sync_mmu and trigger NULL
>> pointer
>>>>> reference.
>>>>>
>>>>> We don't need to NULL kvm_state as all states in kvm_state aren't
>> released
>>>>> actually. Just closing kvm fd is enough so we could still query states
>>>>> through "query_*" qmp command.
>>>>
>>>> IMO this does not make sense. Much of the state in kvm_state was
>> derived
>>> >from ioctl's on the descriptors, and closing them invalidates it. Asking
>>>> historical questions about what used to be makes no sense.
>>>
>>> You also have your valid point.
>>>
>>>>
>>>> Clearing kvm_state and setting kvm_allowed=false would be a safer fix.
>>
>> Setting kvm_allowed=false causes kvm_enabled() to return false which should
>> prevent kvm_state from being dereferenced anywhere:
>>
>> #define kvm_enabled() (kvm_allowed)
>>
>> Eg for the balloon:
>>
>> static bool have_balloon(Error **errp)
>> {
>> if (kvm_enabled() && !kvm_has_sync_mmu()) {
>
> OK, will do, clearing kvm_allowed is a good idea.
>
> Currently there are two qmp commands "query-balloon" and "query-cpu-definitions"
> causing SIGSEGV after CPR-transfer, clearing kvm_allowed helps for both.
> But clearing both kvm_allowed and kvm_state cause SIGSEGV on "query-cpu-definitions".
>
> I'll send a patch to clearing only kvm_allowed if you are fine with it.
I still don't love the idea. kvm_state is no longer valid.
It sounds like "query-cpu-definitions" is missing a check for kvm_enabled().
- Steve
On Thu, Oct 09, 2025 at 10:11:17AM -0400, Steven Sistare wrote:
> On 10/8/2025 6:22 AM, Duan, Zhenzhong wrote:
> >
> >
> > > -----Original Message-----
> > > From: Steven Sistare <steven.sistare@oracle.com>
> > > Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
> > > "query-balloon" after CPR transfer
> > >
> > > On 9/30/2025 2:00 AM, Duan, Zhenzhong wrote:
> > > > > -----Original Message-----
> > > > > From: Steven Sistare <steven.sistare@oracle.com>
> > > > > Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
> > > > > "query-balloon" after CPR transfer
> > > > >
> > > > > On 9/28/2025 4:54 AM, Zhenzhong Duan wrote:
> > > > > > After CPR transfer, source QEMU closes kvm fd and sets kvm_state to
> > > > > NULL,
> > > > > > "query-balloon" will check kvm_state->sync_mmu and trigger NULL
> > > pointer
> > > > > > reference.
> > > > > >
> > > > > > We don't need to NULL kvm_state as all states in kvm_state aren't
> > > released
> > > > > > actually. Just closing kvm fd is enough so we could still query states
> > > > > > through "query_*" qmp command.
> > > > >
> > > > > IMO this does not make sense. Much of the state in kvm_state was
> > > derived
> > > > >from ioctl's on the descriptors, and closing them invalidates it. Asking
> > > > > historical questions about what used to be makes no sense.
> > > >
> > > > You also have your valid point.
> > > >
> > > > >
> > > > > Clearing kvm_state and setting kvm_allowed=false would be a safer fix.
> > >
> > > Setting kvm_allowed=false causes kvm_enabled() to return false which should
> > > prevent kvm_state from being dereferenced anywhere:
> > >
> > > #define kvm_enabled() (kvm_allowed)
> > >
> > > Eg for the balloon:
> > >
> > > static bool have_balloon(Error **errp)
> > > {
> > > if (kvm_enabled() && !kvm_has_sync_mmu()) {
> >
> > OK, will do, clearing kvm_allowed is a good idea.
> >
> > Currently there are two qmp commands "query-balloon" and "query-cpu-definitions"
> > causing SIGSEGV after CPR-transfer, clearing kvm_allowed helps for both.
> > But clearing both kvm_allowed and kvm_state cause SIGSEGV on "query-cpu-definitions".
> >
> > I'll send a patch to clearing only kvm_allowed if you are fine with it.
>
> I still don't love the idea. kvm_state is no longer valid.
>
> It sounds like "query-cpu-definitions" is missing a check for kvm_enabled().
This patch / bug feels like it is side-stepping a general conceptual
question. After cpr-transfer is complete what QMP commands are still
valid to call in general ? This thread mentions two which have caused
bugs, but beyond that it feels like a large subset of QMP comamnds
are conceptually invalid to use.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 10/9/2025 10:19 AM, Daniel P. Berrangé wrote:
> On Thu, Oct 09, 2025 at 10:11:17AM -0400, Steven Sistare wrote:
>> On 10/8/2025 6:22 AM, Duan, Zhenzhong wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Steven Sistare <steven.sistare@oracle.com>
>>>> Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>>>> "query-balloon" after CPR transfer
>>>>
>>>> On 9/30/2025 2:00 AM, Duan, Zhenzhong wrote:
>>>>>> -----Original Message-----
>>>>>> From: Steven Sistare <steven.sistare@oracle.com>
>>>>>> Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>>>>>> "query-balloon" after CPR transfer
>>>>>>
>>>>>> On 9/28/2025 4:54 AM, Zhenzhong Duan wrote:
>>>>>>> After CPR transfer, source QEMU closes kvm fd and sets kvm_state to
>>>>>> NULL,
>>>>>>> "query-balloon" will check kvm_state->sync_mmu and trigger NULL
>>>> pointer
>>>>>>> reference.
>>>>>>>
>>>>>>> We don't need to NULL kvm_state as all states in kvm_state aren't
>>>> released
>>>>>>> actually. Just closing kvm fd is enough so we could still query states
>>>>>>> through "query_*" qmp command.
>>>>>>
>>>>>> IMO this does not make sense. Much of the state in kvm_state was
>>>> derived
>>>>> >from ioctl's on the descriptors, and closing them invalidates it. Asking
>>>>>> historical questions about what used to be makes no sense.
>>>>>
>>>>> You also have your valid point.
>>>>>
>>>>>>
>>>>>> Clearing kvm_state and setting kvm_allowed=false would be a safer fix.
>>>>
>>>> Setting kvm_allowed=false causes kvm_enabled() to return false which should
>>>> prevent kvm_state from being dereferenced anywhere:
>>>>
>>>> #define kvm_enabled() (kvm_allowed)
>>>>
>>>> Eg for the balloon:
>>>>
>>>> static bool have_balloon(Error **errp)
>>>> {
>>>> if (kvm_enabled() && !kvm_has_sync_mmu()) {
>>>
>>> OK, will do, clearing kvm_allowed is a good idea.
>>>
>>> Currently there are two qmp commands "query-balloon" and "query-cpu-definitions"
>>> causing SIGSEGV after CPR-transfer, clearing kvm_allowed helps for both.
>>> But clearing both kvm_allowed and kvm_state cause SIGSEGV on "query-cpu-definitions".
>>>
>>> I'll send a patch to clearing only kvm_allowed if you are fine with it.
>>
>> I still don't love the idea. kvm_state is no longer valid.
>>
>> It sounds like "query-cpu-definitions" is missing a check for kvm_enabled().
>
> This patch / bug feels like it is side-stepping a general conceptual
> question. After cpr-transfer is complete what QMP commands are still
> valid to call in general ? This thread mentions two which have caused
> bugs, but beyond that it feels like a large subset of QMP comamnds
> are conceptually invalid to use.
Agreed. I don't see why these commands should be issued to the dead instance.
How about migration status, quit, and nothing else?
Half serious.
- Steve
>-----Original Message-----
>From: Steven Sistare <steven.sistare@oracle.com>
>Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>"query-balloon" after CPR transfer
>
>On 10/9/2025 10:19 AM, Daniel P. Berrangé wrote:
>> On Thu, Oct 09, 2025 at 10:11:17AM -0400, Steven Sistare wrote:
>>> On 10/8/2025 6:22 AM, Duan, Zhenzhong wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Steven Sistare <steven.sistare@oracle.com>
>>>>> Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>>>>> "query-balloon" after CPR transfer
>>>>>
>>>>> On 9/30/2025 2:00 AM, Duan, Zhenzhong wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Steven Sistare <steven.sistare@oracle.com>
>>>>>>> Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>>>>>>> "query-balloon" after CPR transfer
>>>>>>>
>>>>>>> On 9/28/2025 4:54 AM, Zhenzhong Duan wrote:
>>>>>>>> After CPR transfer, source QEMU closes kvm fd and sets kvm_state
>to
>>>>>>> NULL,
>>>>>>>> "query-balloon" will check kvm_state->sync_mmu and trigger NULL
>>>>> pointer
>>>>>>>> reference.
>>>>>>>>
>>>>>>>> We don't need to NULL kvm_state as all states in kvm_state aren't
>>>>> released
>>>>>>>> actually. Just closing kvm fd is enough so we could still query states
>>>>>>>> through "query_*" qmp command.
>>>>>>>
>>>>>>> IMO this does not make sense. Much of the state in kvm_state was
>>>>> derived
>>>>>> >from ioctl's on the descriptors, and closing them invalidates it.
>Asking
>>>>>>> historical questions about what used to be makes no sense.
>>>>>>
>>>>>> You also have your valid point.
>>>>>>
>>>>>>>
>>>>>>> Clearing kvm_state and setting kvm_allowed=false would be a safer
>fix.
>>>>>
>>>>> Setting kvm_allowed=false causes kvm_enabled() to return false which
>should
>>>>> prevent kvm_state from being dereferenced anywhere:
>>>>>
>>>>> #define kvm_enabled() (kvm_allowed)
>>>>>
>>>>> Eg for the balloon:
>>>>>
>>>>> static bool have_balloon(Error **errp)
>>>>> {
>>>>> if (kvm_enabled() && !kvm_has_sync_mmu()) {
>>>>
>>>> OK, will do, clearing kvm_allowed is a good idea.
>>>>
>>>> Currently there are two qmp commands "query-balloon" and
>"query-cpu-definitions"
>>>> causing SIGSEGV after CPR-transfer, clearing kvm_allowed helps for both.
>>>> But clearing both kvm_allowed and kvm_state cause SIGSEGV on
>"query-cpu-definitions".
>>>>
>>>> I'll send a patch to clearing only kvm_allowed if you are fine with it.
>>>
>>> I still don't love the idea. kvm_state is no longer valid.
>>>
>>> It sounds like "query-cpu-definitions" is missing a check for kvm_enabled().
>>
>> This patch / bug feels like it is side-stepping a general conceptual
>> question. After cpr-transfer is complete what QMP commands are still
>> valid to call in general ? This thread mentions two which have caused
>> bugs, but beyond that it feels like a large subset of QMP comamnds
>> are conceptually invalid to use.
>
>Agreed. I don't see why these commands should be issued to the dead
>instance.
Uh, yes, that's why I hesitate if it's deserved to fix it.
>How about migration status, quit, and nothing else?
>Half serious.
We only have issues with "query-balloon" and "query-cpu-definitions", all other "query-*" qmp commands work fine.
{"execute": "query-migrate"}
{"timestamp": {"seconds": 1760067694, "microseconds": 815419}, "event": "STOP"}
{"return": {"status": "completed", "setup-time": 3, "downtime": 14, "total-time": 67, "ram": {"total": 0, "postcopy-requests": 0, "dirty-sync-count": 2, "multifd-bytes": 0, "pages-per-second": 0, "downtime-bytes": 0, "page-size": 4096, "remaining": 0, "postcopy-bytes": 0, "mbps": 58.152999999999999, "transferred": 465224, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 0, "duplicate": 0, "dirty-pages-rate": 0, "normal-bytes": 0, "normal": 0}}}
{"execute": "quit"}
{"timestamp": {"seconds": 1760067734, "microseconds": 599830}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
{"return": {}}
Connection closed by foreign host.Connection closed by foreign host.
Thanks
Zhenzhong
On Thu, Oct 09, 2025 at 10:32:34AM -0400, Steven Sistare wrote:
> On 10/9/2025 10:19 AM, Daniel P. Berrangé wrote:
> > On Thu, Oct 09, 2025 at 10:11:17AM -0400, Steven Sistare wrote:
> > > On 10/8/2025 6:22 AM, Duan, Zhenzhong wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Steven Sistare <steven.sistare@oracle.com>
> > > > > Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
> > > > > "query-balloon" after CPR transfer
> > > > >
> > > > > On 9/30/2025 2:00 AM, Duan, Zhenzhong wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: Steven Sistare <steven.sistare@oracle.com>
> > > > > > > Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
> > > > > > > "query-balloon" after CPR transfer
> > > > > > >
> > > > > > > On 9/28/2025 4:54 AM, Zhenzhong Duan wrote:
> > > > > > > > After CPR transfer, source QEMU closes kvm fd and sets kvm_state to
> > > > > > > NULL,
> > > > > > > > "query-balloon" will check kvm_state->sync_mmu and trigger NULL
> > > > > pointer
> > > > > > > > reference.
> > > > > > > >
> > > > > > > > We don't need to NULL kvm_state as all states in kvm_state aren't
> > > > > released
> > > > > > > > actually. Just closing kvm fd is enough so we could still query states
> > > > > > > > through "query_*" qmp command.
> > > > > > >
> > > > > > > IMO this does not make sense. Much of the state in kvm_state was
> > > > > derived
> > > > > > >from ioctl's on the descriptors, and closing them invalidates it. Asking
> > > > > > > historical questions about what used to be makes no sense.
> > > > > >
> > > > > > You also have your valid point.
> > > > > >
> > > > > > >
> > > > > > > Clearing kvm_state and setting kvm_allowed=false would be a safer fix.
> > > > >
> > > > > Setting kvm_allowed=false causes kvm_enabled() to return false which should
> > > > > prevent kvm_state from being dereferenced anywhere:
> > > > >
> > > > > #define kvm_enabled() (kvm_allowed)
> > > > >
> > > > > Eg for the balloon:
> > > > >
> > > > > static bool have_balloon(Error **errp)
> > > > > {
> > > > > if (kvm_enabled() && !kvm_has_sync_mmu()) {
> > > >
> > > > OK, will do, clearing kvm_allowed is a good idea.
> > > >
> > > > Currently there are two qmp commands "query-balloon" and "query-cpu-definitions"
> > > > causing SIGSEGV after CPR-transfer, clearing kvm_allowed helps for both.
> > > > But clearing both kvm_allowed and kvm_state cause SIGSEGV on "query-cpu-definitions".
> > > >
> > > > I'll send a patch to clearing only kvm_allowed if you are fine with it.
> > >
> > > I still don't love the idea. kvm_state is no longer valid.
> > >
> > > It sounds like "query-cpu-definitions" is missing a check for kvm_enabled().
> >
> > This patch / bug feels like it is side-stepping a general conceptual
> > question. After cpr-transfer is complete what QMP commands are still
> > valid to call in general ? This thread mentions two which have caused
> > bugs, but beyond that it feels like a large subset of QMP comamnds
> > are conceptually invalid to use.
>
> Agreed. I don't see why these commands should be issued to the dead instance.
> How about migration status, quit, and nothing else?
> Half serious.
I was hoping you'd suggest such a short list :-)
Essentially a case of calling qmp_for_each_command(), and in the callback
run qmp_disable_command() for everything not in the allow-list.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
>-----Original Message-----
>From: Daniel P. Berrangé <berrange@redhat.com>
>Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>"query-balloon" after CPR transfer
>
>On Thu, Oct 09, 2025 at 10:32:34AM -0400, Steven Sistare wrote:
>> On 10/9/2025 10:19 AM, Daniel P. Berrangé wrote:
>> > On Thu, Oct 09, 2025 at 10:11:17AM -0400, Steven Sistare wrote:
>> > > On 10/8/2025 6:22 AM, Duan, Zhenzhong wrote:
>> > > >
>> > > >
>> > > > > -----Original Message-----
>> > > > > From: Steven Sistare <steven.sistare@oracle.com>
>> > > > > Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>> > > > > "query-balloon" after CPR transfer
>> > > > >
>> > > > > On 9/30/2025 2:00 AM, Duan, Zhenzhong wrote:
>> > > > > > > -----Original Message-----
>> > > > > > > From: Steven Sistare <steven.sistare@oracle.com>
>> > > > > > > Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when
>execute
>> > > > > > > "query-balloon" after CPR transfer
>> > > > > > >
>> > > > > > > On 9/28/2025 4:54 AM, Zhenzhong Duan wrote:
>> > > > > > > > After CPR transfer, source QEMU closes kvm fd and sets
>kvm_state to
>> > > > > > > NULL,
>> > > > > > > > "query-balloon" will check kvm_state->sync_mmu and trigger
>NULL
>> > > > > pointer
>> > > > > > > > reference.
>> > > > > > > >
>> > > > > > > > We don't need to NULL kvm_state as all states in kvm_state
>aren't
>> > > > > released
>> > > > > > > > actually. Just closing kvm fd is enough so we could still query
>states
>> > > > > > > > through "query_*" qmp command.
>> > > > > > >
>> > > > > > > IMO this does not make sense. Much of the state in kvm_state
>was
>> > > > > derived
>> > > > > > >from ioctl's on the descriptors, and closing them invalidates it.
>Asking
>> > > > > > > historical questions about what used to be makes no sense.
>> > > > > >
>> > > > > > You also have your valid point.
>> > > > > >
>> > > > > > >
>> > > > > > > Clearing kvm_state and setting kvm_allowed=false would be a
>safer fix.
>> > > > >
>> > > > > Setting kvm_allowed=false causes kvm_enabled() to return false
>which should
>> > > > > prevent kvm_state from being dereferenced anywhere:
>> > > > >
>> > > > > #define kvm_enabled() (kvm_allowed)
>> > > > >
>> > > > > Eg for the balloon:
>> > > > >
>> > > > > static bool have_balloon(Error **errp)
>> > > > > {
>> > > > > if (kvm_enabled() && !kvm_has_sync_mmu()) {
>> > > >
>> > > > OK, will do, clearing kvm_allowed is a good idea.
>> > > >
>> > > > Currently there are two qmp commands "query-balloon" and
>"query-cpu-definitions"
>> > > > causing SIGSEGV after CPR-transfer, clearing kvm_allowed helps for
>both.
>> > > > But clearing both kvm_allowed and kvm_state cause SIGSEGV on
>"query-cpu-definitions".
>> > > >
>> > > > I'll send a patch to clearing only kvm_allowed if you are fine with it.
>> > >
>> > > I still don't love the idea. kvm_state is no longer valid.
>> > >
>> > > It sounds like "query-cpu-definitions" is missing a check for
>kvm_enabled().
>> >
>> > This patch / bug feels like it is side-stepping a general conceptual
>> > question. After cpr-transfer is complete what QMP commands are still
>> > valid to call in general ? This thread mentions two which have caused
>> > bugs, but beyond that it feels like a large subset of QMP comamnds
>> > are conceptually invalid to use.
>>
>> Agreed. I don't see why these commands should be issued to the dead
>instance.
>> How about migration status, quit, and nothing else?
>> Half serious.
>
>I was hoping you'd suggest such a short list :-)
>
>Essentially a case of calling qmp_for_each_command(), and in the callback
>run qmp_disable_command() for everything not in the allow-list.
Thanks for your suggestion, I agree this is a perfect scheme, but is also heavy on this corner case😊
Just curious if we need to do same for live migration. E.g., send qmp command on source qemu after live migration.
Thanks
Zhenzhong
On Fri, Oct 10, 2025 at 03:54:42AM +0000, Duan, Zhenzhong wrote:
>
>
> >-----Original Message-----
> >From: Daniel P. Berrangé <berrange@redhat.com>
> >Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
> >"query-balloon" after CPR transfer
> >
> >On Thu, Oct 09, 2025 at 10:32:34AM -0400, Steven Sistare wrote:
> >> On 10/9/2025 10:19 AM, Daniel P. Berrangé wrote:
> >> > On Thu, Oct 09, 2025 at 10:11:17AM -0400, Steven Sistare wrote:
> >> > > On 10/8/2025 6:22 AM, Duan, Zhenzhong wrote:
> >> > > >
> >> > > >
> >> > > > > -----Original Message-----
> >> > > > > From: Steven Sistare <steven.sistare@oracle.com>
> >> > > > > Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
> >> > > > > "query-balloon" after CPR transfer
> >> > > > >
> >> > > > > On 9/30/2025 2:00 AM, Duan, Zhenzhong wrote:
> >> > > > > > > -----Original Message-----
> >> > > > > > > From: Steven Sistare <steven.sistare@oracle.com>
> >> > > > > > > Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when
> >execute
> >> > > > > > > "query-balloon" after CPR transfer
> >> > > > > > >
> >> > > > > > > On 9/28/2025 4:54 AM, Zhenzhong Duan wrote:
> >> > > > > > > > After CPR transfer, source QEMU closes kvm fd and sets
> >kvm_state to
> >> > > > > > > NULL,
> >> > > > > > > > "query-balloon" will check kvm_state->sync_mmu and trigger
> >NULL
> >> > > > > pointer
> >> > > > > > > > reference.
> >> > > > > > > >
> >> > > > > > > > We don't need to NULL kvm_state as all states in kvm_state
> >aren't
> >> > > > > released
> >> > > > > > > > actually. Just closing kvm fd is enough so we could still query
> >states
> >> > > > > > > > through "query_*" qmp command.
> >> > > > > > >
> >> > > > > > > IMO this does not make sense. Much of the state in kvm_state
> >was
> >> > > > > derived
> >> > > > > > >from ioctl's on the descriptors, and closing them invalidates it.
> >Asking
> >> > > > > > > historical questions about what used to be makes no sense.
> >> > > > > >
> >> > > > > > You also have your valid point.
> >> > > > > >
> >> > > > > > >
> >> > > > > > > Clearing kvm_state and setting kvm_allowed=false would be a
> >safer fix.
> >> > > > >
> >> > > > > Setting kvm_allowed=false causes kvm_enabled() to return false
> >which should
> >> > > > > prevent kvm_state from being dereferenced anywhere:
> >> > > > >
> >> > > > > #define kvm_enabled() (kvm_allowed)
> >> > > > >
> >> > > > > Eg for the balloon:
> >> > > > >
> >> > > > > static bool have_balloon(Error **errp)
> >> > > > > {
> >> > > > > if (kvm_enabled() && !kvm_has_sync_mmu()) {
> >> > > >
> >> > > > OK, will do, clearing kvm_allowed is a good idea.
> >> > > >
> >> > > > Currently there are two qmp commands "query-balloon" and
> >"query-cpu-definitions"
> >> > > > causing SIGSEGV after CPR-transfer, clearing kvm_allowed helps for
> >both.
> >> > > > But clearing both kvm_allowed and kvm_state cause SIGSEGV on
> >"query-cpu-definitions".
> >> > > >
> >> > > > I'll send a patch to clearing only kvm_allowed if you are fine with it.
> >> > >
> >> > > I still don't love the idea. kvm_state is no longer valid.
> >> > >
> >> > > It sounds like "query-cpu-definitions" is missing a check for
> >kvm_enabled().
> >> >
> >> > This patch / bug feels like it is side-stepping a general conceptual
> >> > question. After cpr-transfer is complete what QMP commands are still
> >> > valid to call in general ? This thread mentions two which have caused
> >> > bugs, but beyond that it feels like a large subset of QMP comamnds
> >> > are conceptually invalid to use.
> >>
> >> Agreed. I don't see why these commands should be issued to the dead
> >instance.
> >> How about migration status, quit, and nothing else?
> >> Half serious.
> >
> >I was hoping you'd suggest such a short list :-)
> >
> >Essentially a case of calling qmp_for_each_command(), and in the callback
> >run qmp_disable_command() for everything not in the allow-list.
>
> Thanks for your suggestion, I agree this is a perfect scheme, but is also heavy on this corner case😊
> Just curious if we need to do same for live migration. E.g., send qmp command on source qemu after live migration.
At the end of a normal migration, there is no functional reason to prevent
use of any QMP command. Indeed we need to have the ability to carry on
using the orignal source QEMU, in case the final steps on migration on
the target fail and we need to restart the source.
The cpr-transfer is unusual in the restrictions it has after completion.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
>-----Original Message-----
>From: Steven Sistare <steven.sistare@oracle.com>
>Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>"query-balloon" after CPR transfer
>
>On 10/8/2025 6:22 AM, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Steven Sistare <steven.sistare@oracle.com>
>>> Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>>> "query-balloon" after CPR transfer
>>>
>>> On 9/30/2025 2:00 AM, Duan, Zhenzhong wrote:
>>>>> -----Original Message-----
>>>>> From: Steven Sistare <steven.sistare@oracle.com>
>>>>> Subject: Re: [PATCH v2 6/6] accel/kvm: Fix SIGSEGV when execute
>>>>> "query-balloon" after CPR transfer
>>>>>
>>>>> On 9/28/2025 4:54 AM, Zhenzhong Duan wrote:
>>>>>> After CPR transfer, source QEMU closes kvm fd and sets kvm_state to
>>>>> NULL,
>>>>>> "query-balloon" will check kvm_state->sync_mmu and trigger NULL
>>> pointer
>>>>>> reference.
>>>>>>
>>>>>> We don't need to NULL kvm_state as all states in kvm_state aren't
>>> released
>>>>>> actually. Just closing kvm fd is enough so we could still query states
>>>>>> through "query_*" qmp command.
>>>>>
>>>>> IMO this does not make sense. Much of the state in kvm_state was
>>> derived
>>>> >from ioctl's on the descriptors, and closing them invalidates it. Asking
>>>>> historical questions about what used to be makes no sense.
>>>>
>>>> You also have your valid point.
>>>>
>>>>>
>>>>> Clearing kvm_state and setting kvm_allowed=false would be a safer fix.
>>>
>>> Setting kvm_allowed=false causes kvm_enabled() to return false which
>should
>>> prevent kvm_state from being dereferenced anywhere:
>>>
>>> #define kvm_enabled() (kvm_allowed)
>>>
>>> Eg for the balloon:
>>>
>>> static bool have_balloon(Error **errp)
>>> {
>>> if (kvm_enabled() && !kvm_has_sync_mmu()) {
>>
>> OK, will do, clearing kvm_allowed is a good idea.
>>
>> Currently there are two qmp commands "query-balloon" and
>"query-cpu-definitions"
>> causing SIGSEGV after CPR-transfer, clearing kvm_allowed helps for both.
>> But clearing both kvm_allowed and kvm_state cause SIGSEGV on
>"query-cpu-definitions".
>>
>> I'll send a patch to clearing only kvm_allowed if you are fine with it.
>
>I still don't love the idea. kvm_state is no longer valid.
OK, what about another idea to not call vfio_cpr_kvm_close_notifier() for cpr-transfer, like below:
--- a/hw/vfio/cpr.c
+++ b/hw/vfio/cpr.c
@@ -197,7 +197,7 @@ void vfio_cpr_add_kvm_notifier(void)
if (!kvm_close_notifier.notify) {
migration_add_notifier_modes(&kvm_close_notifier,
vfio_cpr_kvm_close_notifier,
- MIG_MODE_CPR_TRANSFER, MIG_MODE_CPR_EXEC,
+ MIG_MODE_CPR_EXEC,
-1);
}
}
The close_kvm_after_cpr is only for cpr-exec, with this change, we are in same situation as live migration and can run all qmp commands same as after live migration.
>
>It sounds like "query-cpu-definitions" is missing a check for kvm_enabled().
Yes.
Thanks
Zhenzhong
© 2016 - 2026 Red Hat, Inc.