https://bugzilla.redhat.com/show_bug.cgi?id=1623389
If a device is detached twice from the same domain, the following
race condition may happen:

1) The first DetachDevice() call issues "device_del" on the qemu
monitor, but since the DEVICE_DELETED event does not arrive in
time, the API ends up claiming "Device detach request sent
successfully".

2) The second DetachDevice() therefore still finds the device in
the domain and thus proceeds to detach it again. It calls
EnterMonitor() and qemuMonitorSend() trying to issue the
"device_del" command again. This releases both the domain lock
and the monitor lock.

3) At this point, qemu sends us the DEVICE_DELETED event, which is
handled by the event loop, which ends up calling
qemuDomainSignalDeviceRemoval() to determine who is going to
remove the device from the domain definition: the caller that
marked the device for removal, or the event processing thread.

4) Because the device was marked for removal,
qemuDomainSignalDeviceRemoval() returns true, which means the
event is to be processed by the thread that marked the device
for removal (and is currently still trying to issue the
"device_del" command).

5) The thread finally issues the "device_del" command, which
(obviously) fails, and therefore it calls
qemuDomainResetDeviceRemoval() to reset the device marking and
quits immediately afterwards, NOT removing any device from the
domain definition.

At this point, the device is still present in the domain
definition but doesn't exist in qemu anymore. Worse, there is no
way to remove it from the domain definition.

The solution is to note down that we've seen the event and, if the
second "device_del" fails, not take it as a failure but carry on
with the usual execution.
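
To make the interleaving concrete, here is a minimal standalone C
model of the race and of the flag-based fix. Everything in it is a
hypothetical stand-in, not actual libvirt code:

#include <stdio.h>
#include <stdbool.h>

/* Hypothetical stand-in for libvirt's per-domain unplug bookkeeping. */
struct unplug {
    const char *alias;   /* device marked for removal, or NULL */
    bool event_seen;     /* the flag this patch adds */
};

/* Models qemuDomainSignalDeviceRemoval(): the event thread notes the
 * event and leaves processing to the thread that marked the device. */
static void signal_removal(struct unplug *u)
{
    if (u->alias)
        u->event_seen = true;   /* note down that the event arrived */
}

/* Models the tail of the second detach: "device_del" failed (rc < 0)
 * because qemu had already deleted the device. */
static int second_device_del(struct unplug *u, int rc)
{
    if (rc < 0 && u->event_seen) {
        /* Without the flag we would bail out here and the device
         * would be stuck in the domain definition forever. With it,
         * we claim success and let the usual removal path run. */
        u->alias = NULL;
        u->event_seen = false;
        return 0;
    }
    return rc;
}

int main(void)
{
    struct unplug u = { "net0", false };
    signal_removal(&u);                    /* DEVICE_DELETED races in */
    printf("rc = %d\n", second_device_del(&u, -1));  /* prints rc = 0 */
    return 0;
}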
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
src/qemu/qemu_domain.h | 1 +
src/qemu/qemu_hotplug.c | 52 +++++++++++++++++++++++++++++++++++++++--
2 files changed, 51 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 9f468e5661..fb361515ba 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -218,6 +218,7 @@ typedef qemuDomainUnpluggingDevice *qemuDomainUnpluggingDevicePtr;
struct _qemuDomainUnpluggingDevice {
const char *alias;
qemuDomainUnpluggingDeviceStatus status;
+ bool eventSeen; /* True if DEVICE_DELETED event arrived. */
};
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 59a5d21a5b..6b5b37bda5 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -67,6 +67,9 @@ VIR_LOG_INIT("qemu.qemu_hotplug");
unsigned long long qemuDomainRemoveDeviceWaitTime = 1000ull * 5;
+static void
+qemuDomainResetDeviceRemoval(virDomainObjPtr vm);
+
/**
* qemuDomainDeleteDevice:
* @vm: domain object
@@ -103,8 +106,51 @@ qemuDomainDeleteDevice(virDomainObjPtr vm,
}
if (enterMonitor &&
- qemuDomainObjExitMonitor(driver, vm) < 0)
- rc = -1;
+ qemuDomainObjExitMonitor(driver, vm) < 0) {
+ /* Domain is no longer running. No cleanup needed. */
+ return -1;
+ }
+
+ if (rc < 0) {
+ bool eventSeen;
+
+ /* Deleting the device failed. Let's check if the DEVICE_DELETED
+ * event arrived. If it did, we need to claim success to
+ * make the caller remove the device from the domain XML. */
+ if (enterMonitor) {
+ /* Here @vm is locked again. It's safe to access
+ * private data directly. */
+ } else {
+ /* Here @vm is not locked. Do some locking magic to
+ * be able to access private data. It is safe to lock
+ * and unlock both @mon and @vm here because:
+ * a) qemuDomainObjEnterMonitor() ensures @mon is ref()'d
+ * b) The API that is calling us ensures that @vm is ref()'d
+ */
+ virObjectUnlock(priv->mon);
+ virObjectLock(vm);
+ virObjectLock(priv->mon);
+ }
+
+ eventSeen = priv->unplug.eventSeen;
+ if (eventSeen) {
+ /* The event arrived. Return success. */
+ VIR_DEBUG("Detaching of device %s failed, but event arrived", alias);
+ qemuDomainResetDeviceRemoval(vm);
+ rc = 0;
+ } else if (rc == -2) {
+ /* The device does not exist in qemu, but it still
+ * exists in libvirt. Claim success so that the caller
+ * runs qemuDomainWaitForDeviceRemoval(). Otherwise, if
+ * the domain XML is queried right after the detach API,
+ * the device would still be there. */
+ VIR_DEBUG("Detaching of device %s failed and no event arrived", alias);
+ rc = 0;
+ }
+
+ if (!enterMonitor)
+ virObjectUnlock(vm);
+ }
return rc;
}
@@ -5186,6 +5232,7 @@ qemuDomainResetDeviceRemoval(virDomainObjPtr vm)
{
qemuDomainObjPrivatePtr priv = vm->privateData;
priv->unplug.alias = NULL;
+ priv->unplug.eventSeen = false;
}
/* Returns:
@@ -5245,6 +5292,7 @@ qemuDomainSignalDeviceRemoval(virDomainObjPtr vm,
VIR_DEBUG("Removal of device '%s' continues in waiting thread", devAlias);
qemuDomainResetDeviceRemoval(vm);
priv->unplug.status = status;
+ priv->unplug.eventSeen = true;
virDomainObjBroadcast(vm);
return true;
}
--
2.19.2
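
One detail worth calling out from the hunks above: to read @vm's
private data while only the monitor lock is held, the patch unlocks
@mon, locks @vm, then relocks @mon, preserving the usual vm-before-mon
lock ordering. Below is a minimal standalone sketch of that pattern;
pthread mutexes stand in for virObjectLock() and all names are
hypothetical:

#include <pthread.h>

/* Hypothetical locks standing in for the @vm and @mon object locks. */
static pthread_mutex_t vm_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mon_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called while holding only mon_lock but needing vm_lock too. Taking
 * vm_lock directly would invert the vm-before-mon order and risk
 * deadlock against threads that lock vm first, so drop mon_lock,
 * take vm_lock, then retake mon_lock. */
static void lock_vm_while_in_monitor(void)
{
    pthread_mutex_unlock(&mon_lock);
    pthread_mutex_lock(&vm_lock);
    pthread_mutex_lock(&mon_lock);
    /* ... access vm-private data here, with both locks held ... */
    pthread_mutex_unlock(&vm_lock);  /* caller keeps mon_lock */
}

int main(void)
{
    pthread_mutex_lock(&mon_lock);
    lock_vm_while_in_monitor();
    pthread_mutex_unlock(&mon_lock);
    return 0;
}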
On Thu, Mar 14, 2019 at 13:22:38 +0100, Michal Privoznik wrote:
> https://bugzilla.redhat.com/show_bug.cgi?id=1623389
>
> If a device is detached twice from the same domain, the following
> race condition may happen:
>
> 1) The first DetachDevice() call issues "device_del" on the qemu
> monitor, but since the DEVICE_DELETED event does not arrive in
> time, the API ends up claiming "Device detach request sent
> successfully".
>
> 2) The second DetachDevice() therefore still finds the device in
> the domain and thus proceeds to detach it again. It calls
> EnterMonitor() and qemuMonitorSend() trying to issue the
> "device_del" command again. This releases both the domain lock
> and the monitor lock.
>
> 3) At this point, qemu sends us the DEVICE_DELETED event, which is
> handled by the event loop, which ends up calling
> qemuDomainSignalDeviceRemoval() to determine who is going to
> remove the device from the domain definition: the caller that
> marked the device for removal, or the event processing thread.
>
> 4) Because the device was marked for removal,
> qemuDomainSignalDeviceRemoval() returns true, which means the
> event is to be processed by the thread that marked the device
> for removal (and is currently still trying to issue the
> "device_del" command).
>
> 5) The thread finally issues the "device_del" command, which
> (obviously) fails, and therefore it calls
> qemuDomainResetDeviceRemoval() to reset the device marking and
> quits immediately afterwards, NOT removing any device from the
> domain definition.
>
> At this point, the device is still present in the domain
> definition but doesn't exist in qemu anymore. Worse, there is no
> way to remove it from the domain definition.
>
> The solution is to note down that we've seen the event and, if the
> second "device_del" fails, not take it as a failure but carry on
> with the usual execution.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
> src/qemu/qemu_domain.h | 1 +
> src/qemu/qemu_hotplug.c | 52 +++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 51 insertions(+), 2 deletions(-)
>
> diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
> index 9f468e5661..fb361515ba 100644
> --- a/src/qemu/qemu_domain.h
> +++ b/src/qemu/qemu_domain.h
> @@ -218,6 +218,7 @@ typedef qemuDomainUnpluggingDevice *qemuDomainUnpluggingDevicePtr;
> struct _qemuDomainUnpluggingDevice {
> const char *alias;
> qemuDomainUnpluggingDeviceStatus status;
> + bool eventSeen; /* True if DEVICE_DELETED event arrived. */
> };
>
>
> diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
> index 59a5d21a5b..6b5b37bda5 100644
> --- a/src/qemu/qemu_hotplug.c
> +++ b/src/qemu/qemu_hotplug.c
> @@ -67,6 +67,9 @@ VIR_LOG_INIT("qemu.qemu_hotplug");
> unsigned long long qemuDomainRemoveDeviceWaitTime = 1000ull * 5;
>
>
> +static void
> +qemuDomainResetDeviceRemoval(virDomainObjPtr vm);
> +
> /**
> * qemuDomainDeleteDevice:
> * @vm: domain object
> @@ -103,8 +106,51 @@ qemuDomainDeleteDevice(virDomainObjPtr vm,
> }
>
> if (enterMonitor &&
> - qemuDomainObjExitMonitor(driver, vm) < 0)
> - rc = -1;
> + qemuDomainObjExitMonitor(driver, vm) < 0) {
> + /* Domain is no longer running. No cleanup needed. */
> + return -1;
> + }
> +
> + if (rc < 0) {
> + bool eventSeen;
> +
> + /* Deleting the device failed. Let's check if the DEVICE_DELETED
> + * event arrived. If it did, we need to claim success to
> + * make the caller remove the device from the domain XML. */
> + if (enterMonitor) {
> + /* Here @vm is locked again. It's safe to access
> + * private data directly. */
> + } else {
> + /* Here @vm is not locked. Do some locking magic to
> + * be able to access private data. It is safe to lock
> + * and unlock both @mon and @vm here because:
> + * a) qemuDomainObjEnterMonitor() ensures @mon is ref()'d
> + * b) The API that is calling us ensures that @vm is ref()'d
> + */
As I've said in the review of patch 1, this spiel is not necessary; it
is only relevant in the ZPCI extension device case, which will not be
the device the event is meant for.
> + virObjectUnlock(priv->mon);
> + virObjectLock(vm);
> + virObjectLock(priv->mon);
> + }
> +
> + eventSeen = priv->unplug.eventSeen;
Saving this into a temporary variable used only here does not make much
sense.
> + if (eventSeen) {
> + /* The event arrived. Return success. */
> + VIR_DEBUG("Detaching of device %s failed, but event arrived", alias);
> + qemuDomainResetDeviceRemoval(vm);
> + rc = 0;
> + } else if (rc == -2) {
> + /* The device does not exist in qemu, but it still
> + * exists in libvirt. Claim success so that the caller
> + * runs qemuDomainWaitForDeviceRemoval(). Otherwise, if
> + * the domain XML is queried right after the detach API,
> + * the device would still be there. */
> + VIR_DEBUG("Detaching of device %s failed and no event arrived", alias);
> + rc = 0;
How can this be considered success? Also, this introduces a possible
regression. The DEVICE_DELETED event should be fired only after the
device was entirely unplugged. Claiming success before seeing the event
can lead to another race where qemu has already deleted the device from
its internal list (so 'device_del' does not see it any more) but has
not yet fully finished cleanup.
We need to start the '*Remove' handler only after the DEVICE_DELETED
event was received.
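
To illustrate the invariant being argued for here, a tiny standalone
sketch (hypothetical names, not libvirt code): backend teardown must
be gated on the event, never on 'device_del' failing with
DeviceNotFound:

#include <stdbool.h>
#include <assert.h>

/* Hypothetical unplug phases for one device. */
enum phase {
    DEL_SENT,        /* "device_del" issued */
    DEL_NOT_FOUND,   /* a later "device_del" got DeviceNotFound */
    EVENT_RECEIVED,  /* DEVICE_DELETED arrived */
};

/* The invariant: qemu may have already dropped the device from the
 * list "device_del" consults while backend cleanup is still running,
 * so DeviceNotFound alone must not trigger teardown. */
static bool may_teardown_backend(enum phase p)
{
    return p == EVENT_RECEIVED;
}

int main(void)
{
    assert(!may_teardown_backend(DEL_SENT));
    assert(!may_teardown_backend(DEL_NOT_FOUND));
    assert(may_teardown_backend(EVENT_RECEIVED));
    return 0;
}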
> + }
> +
> + if (!enterMonitor)
> + virObjectUnlock(vm);
> + }
>
> return rc;
> }
> @@ -5186,6 +5232,7 @@ qemuDomainResetDeviceRemoval(virDomainObjPtr vm)
> {
> qemuDomainObjPrivatePtr priv = vm->privateData;
> priv->unplug.alias = NULL;
> + priv->unplug.eventSeen = false;
> }
>
> /* Returns:
> @@ -5245,6 +5292,7 @@ qemuDomainSignalDeviceRemoval(virDomainObjPtr vm,
> VIR_DEBUG("Removal of device '%s' continues in waiting thread", devAlias);
> qemuDomainResetDeviceRemoval(vm);
> priv->unplug.status = status;
> + priv->unplug.eventSeen = true;
> virDomainObjBroadcast(vm);
> return true;
> }
> --
> 2.19.2
>
On 3/14/19 2:18 PM, Peter Krempa wrote:
> On Thu, Mar 14, 2019 at 13:22:38 +0100, Michal Privoznik wrote:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1623389
>>
>> If a device is detached twice from the same domain, the following
>> race condition may happen:
>>
>> 1) The first DetachDevice() call issues "device_del" on the qemu
>> monitor, but since the DEVICE_DELETED event does not arrive in
>> time, the API ends up claiming "Device detach request sent
>> successfully".
>>
>> 2) The second DetachDevice() therefore still finds the device in
>> the domain and thus proceeds to detach it again. It calls
>> EnterMonitor() and qemuMonitorSend() trying to issue the
>> "device_del" command again. This releases both the domain lock
>> and the monitor lock.
>>
>> 3) At this point, qemu sends us the DEVICE_DELETED event, which is
>> handled by the event loop, which ends up calling
>> qemuDomainSignalDeviceRemoval() to determine who is going to
>> remove the device from the domain definition: the caller that
>> marked the device for removal, or the event processing thread.
>>
>> 4) Because the device was marked for removal,
>> qemuDomainSignalDeviceRemoval() returns true, which means the
>> event is to be processed by the thread that marked the device
>> for removal (and is currently still trying to issue the
>> "device_del" command).
>>
>> 5) The thread finally issues the "device_del" command, which
>> (obviously) fails, and therefore it calls
>> qemuDomainResetDeviceRemoval() to reset the device marking and
>> quits immediately afterwards, NOT removing any device from the
>> domain definition.
>>
>> At this point, the device is still present in the domain
>> definition but doesn't exist in qemu anymore. Worse, there is no
>> way to remove it from the domain definition.
>>
>> The solution is to note down that we've seen the event and, if the
>> second "device_del" fails, not take it as a failure but carry on
>> with the usual execution.
>>
>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>> ---
>> src/qemu/qemu_domain.h | 1 +
>> src/qemu/qemu_hotplug.c | 52 +++++++++++++++++++++++++++++++++++++++--
>> 2 files changed, 51 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
>> index 9f468e5661..fb361515ba 100644
>> --- a/src/qemu/qemu_domain.h
>> +++ b/src/qemu/qemu_domain.h
>> @@ -218,6 +218,7 @@ typedef qemuDomainUnpluggingDevice *qemuDomainUnpluggingDevicePtr;
>> struct _qemuDomainUnpluggingDevice {
>> const char *alias;
>> qemuDomainUnpluggingDeviceStatus status;
>> + bool eventSeen; /* True if DEVICE_DELETED event arrived. */
>> };
>>
>>
>> diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
>> index 59a5d21a5b..6b5b37bda5 100644
>> --- a/src/qemu/qemu_hotplug.c
>> +++ b/src/qemu/qemu_hotplug.c
>> @@ -67,6 +67,9 @@ VIR_LOG_INIT("qemu.qemu_hotplug");
>> unsigned long long qemuDomainRemoveDeviceWaitTime = 1000ull * 5;
>>
>>
>> +static void
>> +qemuDomainResetDeviceRemoval(virDomainObjPtr vm);
>> +
>> /**
>> * qemuDomainDeleteDevice:
>> * @vm: domain object
>> @@ -103,8 +106,51 @@ qemuDomainDeleteDevice(virDomainObjPtr vm,
>> }
>>
>> if (enterMonitor &&
>> - qemuDomainObjExitMonitor(driver, vm) < 0)
>> - rc = -1;
>> + qemuDomainObjExitMonitor(driver, vm) < 0) {
>> + /* Domain is no longer running. No cleanup needed. */
>> + return -1;
>> + }
>> +
>> + if (rc < 0) {
>> + bool eventSeen;
>> +
>> + /* Deleting the device failed. Let's check if the DEVICE_DELETED
>> + * event arrived. If it did, we need to claim success to
>> + * make the caller remove the device from the domain XML. */
>> + if (enterMonitor) {
>> + /* Here @vm is locked again. It's safe to access
>> + * private data directly. */
>> + } else {
>> + /* Here @vm is not locked. Do some locking magic to
>> + * be able to access private data. It is safe to lock
>> + * and unlock both @mon and @vm here because:
>> + * a) qemuDomainObjEnterMonitor() ensures @mon is ref()'d
>> + * b) The API that is calling us ensures that @vm is ref()'d
>> + */
>
> As I've said in the review of patch 1, this spiel is not necessary;
> it is only relevant in the ZPCI extension device case, which will not
> be the device the event is meant for.
Okay.
>
>> + virObjectUnlock(priv->mon);
>> + virObjectLock(vm);
>> + virObjectLock(priv->mon);
>> + }
>> +
>> + eventSeen = priv->unplug.eventSeen;
>
> Saving this into a temporary variable used only here does not make much
> sense.
>
>> + if (eventSeen) {
>> + /* The event arrived. Return success. */
>> + VIR_DEBUG("Detaching of device %s failed, but event arrived", alias);
>> + qemuDomainResetDeviceRemoval(vm);
>> + rc = 0;
>> + } else if (rc == -2) {
>> + /* The device does not exist in qemu, but it still
>> + * exists in libvirt. Claim success so that the caller
>> + * runs qemuDomainWaitForDeviceRemoval(). Otherwise, if
>> + * the domain XML is queried right after the detach API,
>> + * the device would still be there. */
>> + VIR_DEBUG("Detaching of device %s failed and no event arrived", alias);
>> + rc = 0;
>
> How can this be considered success? Also, this introduces a possible
> regression. The DEVICE_DELETED event should be fired only after the
> device was entirely unplugged. Claiming success before seeing the event
> can lead to another race where qemu has already deleted the device from
> its internal list (so 'device_del' does not see it any more) but has
> not yet fully finished cleanup.
>
> We need to start the '*Remove' handler only after the DEVICE_DELETED
> event was received.
I beg to differ. If we were to report an error here, users would see the
API failing with the error "Device not found". So they'd run 'virsh
dumpxml' only to find the device there. I don't find such behaviour
sane. If one API tells me a device is not there, then another one shall
not tell otherwise.
Michal
On Thu, Mar 14, 2019 at 14:56:48 +0100, Michal Privoznik wrote:
> On 3/14/19 2:18 PM, Peter Krempa wrote:
> > On Thu, Mar 14, 2019 at 13:22:38 +0100, Michal Privoznik wrote:

[...]

> >
> > How can this be considered success? Also, this introduces a possible
> > regression. The DEVICE_DELETED event should be fired only after the
> > device was entirely unplugged. Claiming success before seeing the event
> > can lead to another race where qemu has already deleted the device from
> > its internal list (so 'device_del' does not see it any more) but has
> > not yet fully finished cleanup.
> >
> > We need to start the '*Remove' handler only after the DEVICE_DELETED
> > event was received.
>
> I beg to differ. If we were to report an error here, users would see the
> API failing with the error "Device not found". So they'd run 'virsh
> dumpxml' only to find the device there. I don't find such behaviour
> sane. If one API tells me a device is not there, then another one shall
> not tell otherwise.

Well. The user semantics can be confusing here. What we can't allow
though is that some of the steps done in qemuDomainRemove*Device
will fail because qemu will still have some internal reference to some
backend object. What I'd find more of a problem is that I'd try to
attach a similar device only to be told that it already exists.
On 3/14/19 3:14 PM, Peter Krempa wrote:
> On Thu, Mar 14, 2019 at 14:56:48 +0100, Michal Privoznik wrote:
> > On 3/14/19 2:18 PM, Peter Krempa wrote:
> > > On Thu, Mar 14, 2019 at 13:22:38 +0100, Michal Privoznik wrote:
>
> [...]
>
> > >
> > > How can this be considered success? Also, this introduces a possible
> > > regression. The DEVICE_DELETED event should be fired only after the
> > > device was entirely unplugged. Claiming success before seeing the event
> > > can lead to another race where qemu has already deleted the device from
> > > its internal list (so 'device_del' does not see it any more) but has
> > > not yet fully finished cleanup.
> > >
> > > We need to start the '*Remove' handler only after the DEVICE_DELETED
> > > event was received.
> >
> > I beg to differ. If we were to report an error here, users would see the
> > API failing with the error "Device not found". So they'd run 'virsh
> > dumpxml' only to find the device there. I don't find such behaviour
> > sane. If one API tells me a device is not there, then another one shall
> > not tell otherwise.
>
> Well. The user semantics can be confusing here. What we can't allow
> though is that some of the steps done in qemuDomainRemove*Device
> will fail because qemu will still have some internal reference to some
> backend object.

I'm not quite sure I follow. qemuDomainRemove*Device will be run exactly
once, not any more times. Running it more times is a problem, but I'm
failing to see how my patch allows that. Can you shed more light on
that, please?

> What I'd find more of a problem is that I'd try to
> attach a similar device only to be told that it already exists.

I don't know what you mean here either. With my patches, not only do we
enter the wait for the event again (thus widening the window in which
the event may arrive), but we are actually compliant with the detach
semantics. Let's think of an extreme case: qemu fails to deliver the
DEVICE_DELETED event. With my patches you'll get:

1) virsh detach-device-alias $dom $alias
Device detach request sent successfully

2) virsh detach-device-alias $dom $alias
Device detach request sent successfully

3) virsh detach-device-alias $dom $alias
Device detach request sent successfully

...

If we were to fail, as you suggest:

1) virsh detach-device-alias $dom $alias
Device detach request sent successfully

2) virsh detach-device-alias $dom $alias
monitor error: DeviceNotFound

3) virsh detach-device-alias $dom $alias
monitor error: DeviceNotFound

Now if you run 'virsh dumpxml $dom' as a 4th step (for both scenarios)
the device is still there. So how can it be in the domain XML and not
found at the same time? And if you try to attach it, everything will
work: libvirt generates a different address to plug the device to,
since it still sees the old one.

Michal
On Thu, Mar 14, 2019 at 15:31:48 +0100, Michal Privoznik wrote:
> On 3/14/19 3:14 PM, Peter Krempa wrote:
> > On Thu, Mar 14, 2019 at 14:56:48 +0100, Michal Privoznik wrote:
> > > On 3/14/19 2:18 PM, Peter Krempa wrote:
> > > > On Thu, Mar 14, 2019 at 13:22:38 +0100, Michal Privoznik wrote:
>
> [...]
>
> > > >
> > > > How can this be considered success? Also, this introduces a possible
> > > > regression. The DEVICE_DELETED event should be fired only after the
> > > > device was entirely unplugged. Claiming success before seeing the event
> > > > can lead to another race where qemu has already deleted the device from
> > > > its internal list (so 'device_del' does not see it any more) but has
> > > > not yet fully finished cleanup.
> > > >
> > > > We need to start the '*Remove' handler only after the DEVICE_DELETED
> > > > event was received.
> > >
> > > I beg to differ. If we were to report an error here, users would see the
> > > API failing with the error "Device not found". So they'd run 'virsh
> > > dumpxml' only to find the device there. I don't find such behaviour
> > > sane. If one API tells me a device is not there, then another one shall
> > > not tell otherwise.
> >
> > Well. The user semantics can be confusing here. What we can't allow
> > though is that some of the steps done in qemuDomainRemove*Device
> > will fail because qemu will still have some internal reference to some
> > backend object.
>
> I'm not quite sure I follow. qemuDomainRemove*Device will be run exactly
> once, not any more times. Running it more times is a problem, but I'm
> failing to see how my patch allows that. Can you shed more light on
> that, please?

What I meant is that qemuDomainRemove*Device shall never be called
earlier than DEVICE_DELETED is received. What I forgot to take into
account is that we will still call qemuDomainWaitForDeviceRemoval (as
your comment mentions). This means that the scenario I was outlining
will not happen, as success from this API does not automatically mean
we'll try to delete the backends of the device (which I didn't notice
at first).

I'd probably just delete the mention of dumping XML. I think the
mention of qemuDomainWaitForDeviceRemoval should be clear enough.
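
To summarize how the subthread resolves, here is a hedged standalone
sketch of the caller-side ordering; the *_stub helpers and the return
convention (1 = event already seen, caller removes the device) are
assumptions modeled on the function names in the thread, not the real
libvirt signatures:

#include <stdio.h>

/* Hypothetical stubs standing in for the libvirt helpers named in the
 * thread; only the call ordering matters here. */
static int DeleteDevice_stub(void)         { return 0; } /* "device_del" */
static int WaitForDeviceRemoval_stub(void) { return 1; } /* 1: event seen */
static int RemoveDevice_stub(void)                       /* backend teardown */
{
    puts("teardown runs only after DEVICE_DELETED");
    return 0;
}

/* Even when DeleteDevice claims success (e.g. because the event raced
 * with a failing second "device_del"), backend teardown is still gated
 * on the wait below, so the regression scenario cannot occur. */
static int detach_flow(void)
{
    if (DeleteDevice_stub() < 0)
        return -1;
    if (WaitForDeviceRemoval_stub() == 1)
        return RemoveDevice_stub();
    return 0; /* event pending: the event thread finishes the removal */
}

int main(void)
{
    return detach_flow();
}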