[PATCH] nodedev: transient mdev update on nodeDeviceCreateXML

Boris Fiuczynski posted 1 patch 10 months, 4 weeks ago
Failed in applying to current master (apply log)
There is a newer version of this series
src/node_device/node_device_driver.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
[PATCH] nodedev: transient mdev update on nodeDeviceCreateXML
Posted by Boris Fiuczynski 10 months, 4 weeks ago
Update the optional mdev attributes on the new created nodedev object as
they otherwise would not get set until the next mdevctl update cycle.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2143158
Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
---
 src/node_device/node_device_driver.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c
index a2d0600560..5134d246f3 100644
--- a/src/node_device/node_device_driver.c
+++ b/src/node_device/node_device_driver.c
@@ -847,6 +847,9 @@ static virNodeDevicePtr
 nodeDeviceCreateXMLMdev(virConnectPtr conn,
                         virNodeDeviceDef *def)
 {
+    virNodeDeviceObj *obj;
+    virNodeDeviceDef *new_def;
+    virNodeDevicePtr device;
     g_autofree char *uuid = NULL;
 
     if (!def->parent) {
@@ -864,8 +867,19 @@ nodeDeviceCreateXMLMdev(virConnectPtr conn,
         def->caps->data.mdev.uuid = g_steal_pointer(&uuid);
     }
 
-    return nodeDeviceFindNewMediatedDevice(conn, def->caps->data.mdev.uuid,
-                                           def->caps->data.mdev.parent_addr);
+    device = nodeDeviceFindNewMediatedDevice(conn, def->caps->data.mdev.uuid,
+                                             def->caps->data.mdev.parent_addr);
+    /* check on def for attributes and try update */
+    if (def->caps->data.mdev.nattributes > 0) {
+        /* ignore failures as mdevctl updates will recover later */
+        if (!(obj = nodeDeviceObjFindByName(device->name)))
+            return device;
+        new_def = virNodeDeviceObjGetDef(obj);
+        nodeDeviceDefCopyFromMdevctl(new_def, def);
+        virNodeDeviceObjEndAPI(&obj);
+    }
+
+    return device;
 }
 
 
-- 
2.41.0
Re: [PATCH] nodedev: transient mdev update on nodeDeviceCreateXML
Posted by Jonathon Jongsma 10 months, 3 weeks ago
On 6/23/23 5:43 AM, Boris Fiuczynski wrote:
> Update the optional mdev attributes on the new created nodedev object as
> they otherwise would not get set until the next mdevctl update cycle.
> 
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2143158
> Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
> ---
>   src/node_device/node_device_driver.c | 18 ++++++++++++++++--
>   1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/src/node_device/node_device_driver.c b/src/node_device/node_device_driver.c
> index a2d0600560..5134d246f3 100644
> --- a/src/node_device/node_device_driver.c
> +++ b/src/node_device/node_device_driver.c
> @@ -847,6 +847,9 @@ static virNodeDevicePtr
>   nodeDeviceCreateXMLMdev(virConnectPtr conn,
>                           virNodeDeviceDef *def)
>   {
> +    virNodeDeviceObj *obj;
> +    virNodeDeviceDef *new_def;
> +    virNodeDevicePtr device;
>       g_autofree char *uuid = NULL;
>   
>       if (!def->parent) {
> @@ -864,8 +867,19 @@ nodeDeviceCreateXMLMdev(virConnectPtr conn,
>           def->caps->data.mdev.uuid = g_steal_pointer(&uuid);
>       }
>   
> -    return nodeDeviceFindNewMediatedDevice(conn, def->caps->data.mdev.uuid,
> -                                           def->caps->data.mdev.parent_addr);
> +    device = nodeDeviceFindNewMediatedDevice(conn, def->caps->data.mdev.uuid,
> +                                             def->caps->data.mdev.parent_addr);

So, this function waits up to 60s for the given mediated device to show 
up in the list of devices known to the driver (which is stored in 
driver->devs and gets populated by handlers that respond to udev or 
mdevctl events)...

> +    /* check on def for attributes and try update */
> +    if (def->caps->data.mdev.nattributes > 0) {
> +        /* ignore failures as mdevctl updates will recover later */
> +        if (!(obj = nodeDeviceObjFindByName(device->name)))
> +            return device;

If the device was found and the device has mdev attributes, you then 
query the exact same list for a device with the name of the device you 
just found...

> +        new_def = virNodeDeviceObjGetDef(obj);
> +        nodeDeviceDefCopyFromMdevctl(new_def, def);

If the device was found in the list (which it should be, because the 
nodeDeviceFindNewMediatedDevice() function had already found it in the 
list of devices), then we simply copy the user-supplied device 
definition into the definition we received from the nodedev driver's 
device list.

This is just papering over the issue. By copying the user-supplied 
definition into the system-queried definition, we're just assuming that 
the device was created properly with the requested attributes rather 
than verifying that it was created correctly.

The root issue seems to be that when a transient mediated device is 
created, our udev event handler runs and adds the device to the device 
list, but the mdevctl handler is never run to query additional 
mdev-related information about the device. mdevctl is only re-queried 
when the config files for defined devices are changed or when a new 
physical mdev parent device is added or removed.

Jonathon
Re: [PATCH] nodedev: transient mdev update on nodeDeviceCreateXML
Posted by Boris Fiuczynski 10 months, 3 weeks ago
On 6/28/23 12:03 AM, Jonathon Jongsma wrote:
> On 6/23/23 5:43 AM, Boris Fiuczynski wrote:
>> Update the optional mdev attributes on the new created nodedev object as
>> they otherwise would not get set until the next mdevctl update cycle.
>>
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2143158
>> Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
>> ---
>>   src/node_device/node_device_driver.c | 18 ++++++++++++++++--
>>   1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/node_device/node_device_driver.c 
>> b/src/node_device/node_device_driver.c
>> index a2d0600560..5134d246f3 100644
>> --- a/src/node_device/node_device_driver.c
>> +++ b/src/node_device/node_device_driver.c
>> @@ -847,6 +847,9 @@ static virNodeDevicePtr
>>   nodeDeviceCreateXMLMdev(virConnectPtr conn,
>>                           virNodeDeviceDef *def)
>>   {
>> +    virNodeDeviceObj *obj;
>> +    virNodeDeviceDef *new_def;
>> +    virNodeDevicePtr device;
>>       g_autofree char *uuid = NULL;
>>       if (!def->parent) {
>> @@ -864,8 +867,19 @@ nodeDeviceCreateXMLMdev(virConnectPtr conn,
>>           def->caps->data.mdev.uuid = g_steal_pointer(&uuid);
>>       }
>> -    return nodeDeviceFindNewMediatedDevice(conn, 
>> def->caps->data.mdev.uuid,
>> -                                           
>> def->caps->data.mdev.parent_addr);
>> +    device = nodeDeviceFindNewMediatedDevice(conn, 
>> def->caps->data.mdev.uuid,
>> +                                             
>> def->caps->data.mdev.parent_addr);
> 
> So, this function waits up to 60s for the given mediated device to show 
> up in the list of devices known to the driver (which is stored in 
> driver->devs and gets populated by handlers that respond to udev or 
> mdevctl events)...
> 
>> +    /* check on def for attributes and try update */
>> +    if (def->caps->data.mdev.nattributes > 0) {
>> +        /* ignore failures as mdevctl updates will recover later */
>> +        if (!(obj = nodeDeviceObjFindByName(device->name)))
>> +            return device;
> 
> If the device was found and the device has mdev attributes, you then 
> query the exact same list for a device with the name of the device you 
> just found...
> 
>> +        new_def = virNodeDeviceObjGetDef(obj);
>> +        nodeDeviceDefCopyFromMdevctl(new_def, def);
> 
> If the device was found in the list (which it should be, because the 
> nodeDeviceFindNewMediatedDevice() function had already found it in the 
> list of devices), then we simply copy the user-supplied device 
> definition into the definition we received from the nodedev driver's 
> device list.

Please let me know which procedure to use if the node device object 
which is missing the attributes is created by the udev ADD event which 
the method nodeDeviceFindNewMediatedDevice is waiting for to happen.

> 
> This is just papering over the issue. By copying the user-supplied 
> definition into the system-queried definition, we're just assuming that 
> the device was created properly with the requested attributes rather 
> than verifying that it was created correctly.

I expect virMdevctlCreate to fail if attributes are not properly set on 
the transient mdev created by mdevctl. If that happens this code is 
never run anyway.
And speaking about verification:
mdevctl only returns attributes if a callout script is present 
supporting the mdev type and has a method for event "get" and action 
"attributes" implemented.
Therefore no "reliable" verification is possible! The only reliable 
source of information is the outcome of virMdevctlCreate.

> 
> The root issue seems to be that when a transient mediated device is 
> created, our udev event handler runs and adds the device to the device 
> list, but the mdevctl handler is never run to query additional 
> mdev-related information about the device. mdevctl is only re-queried 
> when the config files for defined devices are changed or when a new 
> physical mdev parent device is added or removed.

Actually that was the first place I looked at.
There is already an mdevctl update triggered in method udevAddOneDeive 
if the device is an mdev parent device. This happens usually on either a 
vfio* module load or a device being bound to a vfio* device driver.
Introducing an update on every mdev device added would create an high 
impact especially as the udevAddOneDeive is used in multiple contexts,
1) add and change udev event
2) move usev event
3) processing the each entry in a udev process device list, which is 
usually run when the node device daemon initializes. In this case a 
device unrelated/unconditional mdevctl update is triggered once already 
(nodeStateInitializedEnumerate).

In cases 1) and 2) the context of what triggered the add udev event is 
unknown and we are only missing the update if a transient mdev device 
with attributes is being created. Therefore I choose the "shortcut" of 
added these where the context is known. Please also note that the 
callout script restriction (mentioned above) also applies here.

> 
> Jonathon
> 


-- 
Mit freundlichen Grüßen/Kind regards
    Boris Fiuczynski

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
Re: [PATCH] nodedev: transient mdev update on nodeDeviceCreateXML
Posted by Jonathon Jongsma 10 months, 3 weeks ago
On 6/28/23 3:40 AM, Boris Fiuczynski wrote:
> On 6/28/23 12:03 AM, Jonathon Jongsma wrote:
>> On 6/23/23 5:43 AM, Boris Fiuczynski wrote:
>>> Update the optional mdev attributes on the new created nodedev object as
>>> they otherwise would not get set until the next mdevctl update cycle.
>>>
>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2143158
>>> Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
>>> ---
>>>   src/node_device/node_device_driver.c | 18 ++++++++++++++++--
>>>   1 file changed, 16 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/node_device/node_device_driver.c 
>>> b/src/node_device/node_device_driver.c
>>> index a2d0600560..5134d246f3 100644
>>> --- a/src/node_device/node_device_driver.c
>>> +++ b/src/node_device/node_device_driver.c
>>> @@ -847,6 +847,9 @@ static virNodeDevicePtr
>>>   nodeDeviceCreateXMLMdev(virConnectPtr conn,
>>>                           virNodeDeviceDef *def)
>>>   {
>>> +    virNodeDeviceObj *obj;
>>> +    virNodeDeviceDef *new_def;
>>> +    virNodeDevicePtr device;
>>>       g_autofree char *uuid = NULL;
>>>       if (!def->parent) {
>>> @@ -864,8 +867,19 @@ nodeDeviceCreateXMLMdev(virConnectPtr conn,
>>>           def->caps->data.mdev.uuid = g_steal_pointer(&uuid);
>>>       }
>>> -    return nodeDeviceFindNewMediatedDevice(conn, 
>>> def->caps->data.mdev.uuid,
>>> - def->caps->data.mdev.parent_addr);
>>> +    device = nodeDeviceFindNewMediatedDevice(conn, 
>>> def->caps->data.mdev.uuid,
>>> + def->caps->data.mdev.parent_addr);
>>
>> So, this function waits up to 60s for the given mediated device to 
>> show up in the list of devices known to the driver (which is stored in 
>> driver->devs and gets populated by handlers that respond to udev or 
>> mdevctl events)...
>>
>>> +    /* check on def for attributes and try update */
>>> +    if (def->caps->data.mdev.nattributes > 0) {
>>> +        /* ignore failures as mdevctl updates will recover later */
>>> +        if (!(obj = nodeDeviceObjFindByName(device->name)))
>>> +            return device;
>>
>> If the device was found and the device has mdev attributes, you then 
>> query the exact same list for a device with the name of the device you 
>> just found...
>>
>>> +        new_def = virNodeDeviceObjGetDef(obj);
>>> +        nodeDeviceDefCopyFromMdevctl(new_def, def);
>>
>> If the device was found in the list (which it should be, because the 
>> nodeDeviceFindNewMediatedDevice() function had already found it in the 
>> list of devices), then we simply copy the user-supplied device 
>> definition into the definition we received from the nodedev driver's 
>> device list.
> 
> Please let me know which procedure to use if the node device object 
> which is missing the attributes is created by the udev ADD event which 
> the method nodeDeviceFindNewMediatedDevice is waiting for to happen.
> 
>>
>> This is just papering over the issue. By copying the user-supplied 
>> definition into the system-queried definition, we're just assuming 
>> that the device was created properly with the requested attributes 
>> rather than verifying that it was created correctly.
> 
> I expect virMdevctlCreate to fail if attributes are not properly set on 
> the transient mdev created by mdevctl. If that happens this code is 
> never run anyway.

As far as I know, that might be driver-dependent behavior.

But regardless, you could sort of make the same case for all of the 
other non-attribute information as well. Why don't we just copy the 
entire user-supplied definition into our internal device list instead of 
querying udev/mdevctl if the device creation was successful? I would 
argue that we want our internal device list to accurately represent the 
current state of the system, not what we expect the state of the system 
to be.

> And speaking about verification:
> mdevctl only returns attributes if a callout script is present 
> supporting the mdev type and has a method for event "get" and action 
> "attributes" implemented.
> Therefore no "reliable" verification is possible! The only reliable 
> source of information is the outcome of virMdevctlCreate.

That's true, but if you copy the user-supplied attributes directly into 
the internal device list for a transient device that doesn't implement 
the 'get-attributes' callout, you will lose those attributes when the 
libvirt daemon restarts. That's also problematic.


>> The root issue seems to be that when a transient mediated device is 
>> created, our udev event handler runs and adds the device to the device 
>> list, but the mdevctl handler is never run to query additional 
>> mdev-related information about the device. mdevctl is only re-queried 
>> when the config files for defined devices are changed or when a new 
>> physical mdev parent device is added or removed.
> 
> Actually that was the first place I looked at.
> There is already an mdevctl update triggered in method udevAddOneDeive 
> if the device is an mdev parent device. This happens usually on either a 
> vfio* module load or a device being bound to a vfio* device driver.
> Introducing an update on every mdev device added would create an high 
> impact especially as the udevAddOneDeive is used in multiple contexts,
> 1) add and change udev event
> 2) move usev event
> 3) processing the each entry in a udev process device list, which is 
> usually run when the node device daemon initializes. In this case a 
> device unrelated/unconditional mdevctl update is triggered once already 
> (nodeStateInitializedEnumerate).
> 
> In cases 1) and 2) the context of what triggered the add udev event is 
> unknown and we are only missing the update if a transient mdev device 
> with attributes is being created. Therefore I choose the "shortcut" of 
> added these where the context is known. Please also note that the 
> callout script restriction (mentioned above) also applies here.

Can you quantify what you mean by high-impact here? How many mdevs are 
you talking about on a single host? hundreds? thousands? How regularly 
are they being started and stopped?

Jonathon

Re: [PATCH] nodedev: transient mdev update on nodeDeviceCreateXML
Posted by Boris Fiuczynski 10 months, 3 weeks ago
On 6/28/23 7:22 PM, Jonathon Jongsma wrote:
> On 6/28/23 3:40 AM, Boris Fiuczynski wrote:
>> On 6/28/23 12:03 AM, Jonathon Jongsma wrote:
>>> On 6/23/23 5:43 AM, Boris Fiuczynski wrote:
>>>> Update the optional mdev attributes on the new created nodedev 
>>>> object as
>>>> they otherwise would not get set until the next mdevctl update cycle.
>>>>
>>>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2143158
>>>> Signed-off-by: Boris Fiuczynski <fiuczy@linux.ibm.com>
>>>> ---
>>>>   src/node_device/node_device_driver.c | 18 ++++++++++++++++--
>>>>   1 file changed, 16 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/src/node_device/node_device_driver.c 
>>>> b/src/node_device/node_device_driver.c
>>>> index a2d0600560..5134d246f3 100644
>>>> --- a/src/node_device/node_device_driver.c
>>>> +++ b/src/node_device/node_device_driver.c
>>>> @@ -847,6 +847,9 @@ static virNodeDevicePtr
>>>>   nodeDeviceCreateXMLMdev(virConnectPtr conn,
>>>>                           virNodeDeviceDef *def)
>>>>   {
>>>> +    virNodeDeviceObj *obj;
>>>> +    virNodeDeviceDef *new_def;
>>>> +    virNodeDevicePtr device;
>>>>       g_autofree char *uuid = NULL;
>>>>       if (!def->parent) {
>>>> @@ -864,8 +867,19 @@ nodeDeviceCreateXMLMdev(virConnectPtr conn,
>>>>           def->caps->data.mdev.uuid = g_steal_pointer(&uuid);
>>>>       }
>>>> -    return nodeDeviceFindNewMediatedDevice(conn, 
>>>> def->caps->data.mdev.uuid,
>>>> - def->caps->data.mdev.parent_addr);
>>>> +    device = nodeDeviceFindNewMediatedDevice(conn, 
>>>> def->caps->data.mdev.uuid,
>>>> + def->caps->data.mdev.parent_addr);
>>>
>>> So, this function waits up to 60s for the given mediated device to 
>>> show up in the list of devices known to the driver (which is stored 
>>> in driver->devs and gets populated by handlers that respond to udev 
>>> or mdevctl events)...
>>>
>>>> +    /* check on def for attributes and try update */
>>>> +    if (def->caps->data.mdev.nattributes > 0) {
>>>> +        /* ignore failures as mdevctl updates will recover later */
>>>> +        if (!(obj = nodeDeviceObjFindByName(device->name)))
>>>> +            return device;
>>>
>>> If the device was found and the device has mdev attributes, you then 
>>> query the exact same list for a device with the name of the device 
>>> you just found...
>>>
>>>> +        new_def = virNodeDeviceObjGetDef(obj);
>>>> +        nodeDeviceDefCopyFromMdevctl(new_def, def);
>>>
>>> If the device was found in the list (which it should be, because the 
>>> nodeDeviceFindNewMediatedDevice() function had already found it in 
>>> the list of devices), then we simply copy the user-supplied device 
>>> definition into the definition we received from the nodedev driver's 
>>> device list.
>>
>> Please let me know which procedure to use if the node device object 
>> which is missing the attributes is created by the udev ADD event which 
>> the method nodeDeviceFindNewMediatedDevice is waiting for to happen.
>>
>>>
>>> This is just papering over the issue. By copying the user-supplied 
>>> definition into the system-queried definition, we're just assuming 
>>> that the device was created properly with the requested attributes 
>>> rather than verifying that it was created correctly.
>>
>> I expect virMdevctlCreate to fail if attributes are not properly set 
>> on the transient mdev created by mdevctl. If that happens this code is 
>> never run anyway.
> 
> As far as I know, that might be driver-dependent behavior.
> 
> But regardless, you could sort of make the same case for all of the 
> other non-attribute information as well. Why don't we just copy the 
> entire user-supplied definition into our internal device list instead of 
> querying udev/mdevctl if the device creation was successful? I would 
> argue that we want our internal device list to accurately represent the 
> current state of the system, not what we expect the state of the system 
> to be.
> 
>> And speaking about verification:
>> mdevctl only returns attributes if a callout script is present 
>> supporting the mdev type and has a method for event "get" and action 
>> "attributes" implemented.
>> Therefore no "reliable" verification is possible! The only reliable 
>> source of information is the outcome of virMdevctlCreate.
> 
> That's true, but if you copy the user-supplied attributes directly into 
> the internal device list for a transient device that doesn't implement 
> the 'get-attributes' callout, you will lose those attributes when the 
> libvirt daemon restarts. That's also problematic.
> 

Good that we seem to agree on validation is not possible.

Anyway I do not see that more or less problematic as not coping the 
attributes and suddenly they appear when the noddev daemon restarts.
That is how it is today, see 
https://bugzilla.redhat.com/show_bug.cgi?id=2143158

> 
>>> The root issue seems to be that when a transient mediated device is 
>>> created, our udev event handler runs and adds the device to the 
>>> device list, but the mdevctl handler is never run to query additional 
>>> mdev-related information about the device. mdevctl is only re-queried 
>>> when the config files for defined devices are changed or when a new 
>>> physical mdev parent device is added or removed.
>>
>> Actually that was the first place I looked at.
>> There is already an mdevctl update triggered in method udevAddOneDeive 
>> if the device is an mdev parent device. This happens usually on either 
>> a vfio* module load or a device being bound to a vfio* device driver.
>> Introducing an update on every mdev device added would create an high 
>> impact especially as the udevAddOneDeive is used in multiple contexts,
>> 1) add and change udev event
>> 2) move usev event
>> 3) processing the each entry in a udev process device list, which is 
>> usually run when the node device daemon initializes. In this case a 
>> device unrelated/unconditional mdevctl update is triggered once 
>> already (nodeStateInitializedEnumerate).
>>
>> In cases 1) and 2) the context of what triggered the add udev event is 
>> unknown and we are only missing the update if a transient mdev device 
>> with attributes is being created. Therefore I choose the "shortcut" of 
>> added these where the context is known. Please also note that the 
>> callout script restriction (mentioned above) also applies here.
> 
> Can you quantify what you mean by high-impact here? How many mdevs are 
> you talking about on a single host? hundreds? thousands? How regularly 
> are they being started and stopped?

Let's assume you have a S390 host with AP setup for passthrough using 
vfio_ap. Assuming of the 100+ guests 20 get a crypto device assigned 
this results in 20 vfio-ap mdevs present on the host.
We also have tests which define up to 1000 guests and using the same 
ration would result in 200 vfio-ap mdevs.

Assuming we would trigger an mdevctl update in udevAddOneDevice when it 
is called for an mdev:
If virtnodedevd is already initialized and the add udev event is 
received it would result in a single mdevctl update.

The virtnodedevd is stopped after 120 seconds unless used.
Starting virtnodedevd up would cause a mdevctl update for every existing 
mdev. That is the impact which I am worried about.

I will try to find a way to just trigger the mdevctl update on the udev 
event add pattern which is not reused for the daemon initialization.

> 
> Jonathon
> 


-- 
Mit freundlichen Grüßen/Kind regards
    Boris Fiuczynski

IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294