Repost: http://www.redhat.com/archives/libvir-list/2017-February/msg00501.html

to update to top of branch as of commit id '5ad03b9db2'

John Ferlan (3):
  nodedev: Add driver callback mechanism to add/remove devices
  qemu: Use nodedev callback mechanism to get nodedev data
  qemu: Add configuration variable to control nodedev enumeration

 src/check-aclrules.pl              |   3 +-
 src/check-driverimpls.pl           |   1 +
 src/conf/node_device_conf.c        | 112 ++++++++++++++++++++++++++++++++++++-
 src/conf/node_device_conf.h        |  18 ++++++
 src/libvirt_private.syms           |   3 +
 src/node_device/node_device_udev.c |  29 ++++++++++
 src/qemu/libvirtd_qemu.aug         |   1 +
 src/qemu/qemu.conf                 |  19 +++++++
 src/qemu/qemu_conf.c               |  53 ++++++++++++++++++
 src/qemu/qemu_conf.h               |  17 ++++++
 src/qemu/qemu_driver.c             |  46 +++++++++++++++
 src/qemu/test_libvirtd_qemu.aug.in |   1 +
 12 files changed, 301 insertions(+), 2 deletions(-)

--
2.9.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On Tue, Feb 21, 2017 at 11:33:25AM -0500, John Ferlan wrote:
> Repost: http://www.redhat.com/archives/libvir-list/2017-February/msg00501.html
>
> to update to top of branch as of commit id '5ad03b9db2'

BTW, could you include the full cover letter in each new version rather
than making people follow links all the way back to v1 to find info
about the patch series goals.

IIUC, the intention here is that we automatically create NPIV devices
when starting guests and delete them when stopping guests. I can see
some appeal in this, but at the same time I'm not convinced we should
add such a feature.

AFAICT, the node device APIs already allow a management application to
achieve the same end goal without needing this integration. Yes, it
would simplify usage of NPIV on the surface, but the cost of doing this
is that it embeds a specific usage policy for NPIV in the libvirt code
and makes error handling harder. In particular it is possible to get
into a situation where a VM fails to start and we're also unable to
clean up the NPIV device we just auto-created. Now this could be said to
apply to pretty much everything we do during guest startup, but in most
cases the failure is harmless or gets auto-cleaned up by the kernel
(i.e. the tap devices get auto-deleted when the FD is closed, SELinux
labels get reset the next time a VM wants that file, locks are released
when we close the virtlockd file handle, etc.). NPIV is significantly
more complicated and more likely to hit failure scenarios due to the
fact that it involves interactions with off-node hardware resources.

Is there some aspect of NPIV mgmt that can only be achieved if libvirt
is explicitly managing the device lifecycle during VM start/stop, as
opposed to having the mgmt app manage it?

If OpenStack were to provide NPIV support I think it'd probably end
up dealing with device setup explicitly via the node device APIs
rather than relying on libvirt to create/delete them.
That way it can track the lifecycle of NPIV devices explicitly, and if
it is not possible to delete them at the time of QEMU shutdown for some
reason, it can easily arrange to delete them later.

Overall I think one of the more successful aspects of libvirt's design
has been the way we minimise the addition of usage policy decisions, in
favour of providing mechanisms that applications can use to implement
policies. This has had a cost in that applications need to do more work
themselves, but on balance I still think it is a win to avoid adding
policy driven features to libvirt.

A key question is just where "autocreation/deletion of NPIV devices"
falls on the line between mechanism & policy, since the line is not
entirely black & white. I tend towards it being policy though, since it
is just providing a less general purpose way to do something that can be
achieved already via the node device APIs.

Regards,
Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|
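[Editorial note: the "mgmt app creates the vHBA itself" flow described above can be sketched concretely. This is an illustrative sketch, not part of the series under review: the XML layout follows libvirt's documented scsi_host/fc_host node device format accepted by virNodeDeviceCreateXML, while the parent host and WWNN/WWPN values are made-up examples.]

```python
def vhba_xml(parent, wwnn, wwpn):
    """Build the node device XML that virNodeDeviceCreateXML() accepts
    to create an NPIV vHBA on top of an existing FC-capable HBA."""
    return (
        "<device>\n"
        "  <parent>%s</parent>\n"
        "  <capability type='scsi_host'>\n"
        "    <capability type='fc_host'>\n"
        "      <wwnn>%s</wwnn>\n"
        "      <wwpn>%s</wwpn>\n"
        "    </capability>\n"
        "  </capability>\n"
        "</device>\n" % (parent, wwnn, wwpn)
    )

# With the libvirt-python bindings (not imported here), the lifecycle
# Daniel describes would look roughly like:
#   dev = conn.nodeDeviceCreateXML(
#       vhba_xml('scsi_host5', '20000000c9831b4b', '10000000c9831b4b'), 0)
#   ...guest runs...
#   dev.destroy()   # app tears down the vHBA when the guest goes away,
#                   # or retries later if teardown fails at shutdown time
```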
On 02/21/2017 12:03 PM, Daniel P. Berrange wrote:
> On Tue, Feb 21, 2017 at 11:33:25AM -0500, John Ferlan wrote:
>> Repost: http://www.redhat.com/archives/libvir-list/2017-February/msg00501.html
>>
>> to update to top of branch as of commit id '5ad03b9db2'
>
> BTW, could you include the full cover letter in each new version rather
> than making people follow links all the way back to v1 to find info
> about the patch series goals.

OK - I'll try to remember.

> IIUC, the intention here is that we automatically create NPIV devices
> when starting guests and delete them when stopping guests. I can see
> some appeal in this, but at the same time I'm not convinced we should
> add such a feature.

A bit more than that - create the vHBA and assign the LUNs to the guest
as they are discovered, and remove them as they are removed (events from
udev). This is a mechanism/idea from Paolo. The RHV team would be the
primary consumer and IIRC they don't use storage pools.

> AFAICT, the node device APIs already allow a management application to
> achieve the same end goal without needing this integration. [...]
> NPIV is significantly more complicated and more likely to hit failure
> scenarios due to the fact that it involves interactions with off-node
> hardware resources.

I agree with your points.
The "purpose" of libvirt taking care of it would be to let libvirt
handle all those nasty and odd failure or integration issues -
including migration. Of course from a libvirt perspective, I'd rather
take the 'scsi_hostX' vHBA and just pass that through to QEMU directly
to allow it (or the guest) to find the LUNs, but that's pushing the
problem the other way.

I said early on that this is something that could be done by the upper
layers, which would be able to receive the add/remove LUN events
whether they created a storage pool just for that purpose or created
the vHBA themselves. It's probably even in the BZs on this.

> Is there some aspect of NPIV mgmt that can only be achieved if libvirt
> is explicitly managing the device lifecycle during VM start/stop, as
> opposed to having the mgmt app manage it?

Beyond the upper layers not needing to handle anything other than
creating the vHBA for the domain and letting libvirt handle the rest.

> If OpenStack were to provide NPIV support I think it'd probably end
> up dealing with device setup explicitly via the node device APIs
> rather than relying on libvirt to create/delete them. That way it
> can track the lifecycle of NPIV devices explicitly, and if it is not
> possible to delete them at time of QEMU shutdown for some reason, it
> can easily arrange to delete them later.
>
> Overall I think one of the more successful aspects of libvirt's design
> has been the way we minimise the addition of usage policy decisions, in
> favour of providing mechanisms that applications can use to implement
> policies. This has had a cost in that applications need to do more work
> themselves, but on balance I still think it is a win to avoid adding
> policy driven features to libvirt.
>
> A key question is just where "autocreation/delete of NPIV devices" falls
> on the line between mechanism & policy, since the line is not entirely
> black & white.
> I tend towards it being policy though, since it is just
> providing a less general purpose way to do something that can be
> achieved already via the node device APIs.

I understand - to a degree I guess I had assumed some of these types of
discussions had taken place by those that wanted the feature added.

One other good thing that's come out of these changes is a bit more
testing for vHBA creation via nodedev/storage pool, and quite a bit of
code cleanup once/if most of the patches I posted earlier in the week
are accepted.

John

(FWIW: I'll have limited access to email over the next couple of days...)
On Tue, Feb 21, 2017 at 10:45:05PM -0500, John Ferlan wrote:
> On 02/21/2017 12:03 PM, Daniel P. Berrange wrote:
> > IIUC, the intention here is that we automatically create NPIV devices
> > when starting guests and delete them when stopping guests. [...]
>
> A bit more than that - create the vHBA and assign the LUNs to the guest
> as they are discovered, and remove them as they are removed (events from
> udev). This is a mechanism/idea from Paolo. The RHV team would be the
> primary consumer and IIRC they don't use storage pools.
>
> [...]
>
> I agree with your points. The "purpose" of libvirt taking care of it
> would be to let libvirt handle all those nasty and odd failure or
> integration issues - including migration. Of course from a libvirt
> perspective, I'd rather take the 'scsi_hostX' vHBA and just pass that
> through to QEMU directly to allow it (or the guest) to find the LUNs,
> but that's pushing the problem the other way.
>
> I said early on that this is something that could be done by the upper
> layers, which would be able to receive the add/remove LUN events
> whether they created a storage pool just for that purpose or created
> the vHBA themselves. It's probably even in the BZs on this.

Yeah, I'm curious where the desire to assign individual LUNs comes from
- that makes me even less comfortable with this idea. It means we are
taking a single device from libvirt's POV - an HBA - and turning it into
many devices from the guest's POV, which means for a single device we'd
surely need to track multiple SCSI addresses, and we'd not know how many
addresses we need to track up front either. I'd really strongly prefer
we track individual devices given to the guest explicitly.

I'd also suggest just assigning the whole vHBA, but we can still make
it possible for a mgmt app to dynamically add/remove LUNs from a vHBA.
Our node device APIs can report events when devices appear/disappear,
so an app can create a vHBA and monitor for LUN changes & update the
guest. That said, I'm curious how you'd handle removal, because surely
by the time you see a LUN disappear at the vHBA, the guest is already
broken as it's got an orphaned device. This seems like another reason
to just assign the vHBA itself, to let the guest deal with LUN removal
more directly.
Regards,
Daniel
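[Editorial note: the "update the guest" step Daniel mentions would typically mean hot-plugging each discovered LUN as a SCSI hostdev. The sketch below builds the documented `<hostdev mode='subsystem' type='scsi'>` XML that virDomainAttachDevice accepts; the helper name and the adapter/address values are illustrative, not from the patch series.]

```python
def scsi_hostdev_xml(adapter, bus, target, unit):
    """Build <hostdev> XML for passing one host SCSI LUN through to a
    guest; 'adapter' is the vHBA's scsi_host name on the host side."""
    return (
        "<hostdev mode='subsystem' type='scsi'>\n"
        "  <source>\n"
        "    <adapter name='%s'/>\n"
        "    <address bus='%d' target='%d' unit='%d'/>\n"
        "  </source>\n"
        "</hostdev>\n" % (adapter, bus, target, unit)
    )

# With libvirt-python (not imported here), a mgmt app reacting to a
# "LUN appeared under my vHBA" event would do roughly:
#   dom.attachDeviceFlags(scsi_hostdev_xml('scsi_host7', 0, 0, 3),
#                         libvirt.VIR_DOMAIN_AFFECT_LIVE)
# and detachDeviceFlags() with the same XML when the LUN goes away -
# which, as noted above, may already be too late for the guest.
```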
On 22/02/2017 11:05, Daniel P. Berrange wrote:
> I'd also suggest just assigning the whole vHBA

What do you mean by "assigning the whole vHBA"? There is no concept of
a vHBA as e.g. a PCI virtual function.

> but we can still make it possible for a mgmt app to dynamically
> add/remove LUNs from a vHBA. Our node device APIs can report events
> when devices appear/disappear, so an app can create a vHBA and monitor
> for LUN changes & update the guest. That said, I'm curious how you'd
> handle removal, because surely by the time you see a LUN disappear at
> the vHBA, the guest is already broken as it's got an orphaned device.
> This seems like another reason to just assign the vHBA itself, to let
> the guest deal with LUN removal more directly.

This is the same on bare metal: the device is orphaned at the time you
unplug it. If the guest is using it and doesn't have a fallback, bad
things happen. If it is not using it, or if it's using multipath,
things ought to work just fine.

Paolo
On Wed, Feb 22, 2017 at 02:38:32PM +0100, Paolo Bonzini wrote:
> On 22/02/2017 11:05, Daniel P. Berrange wrote:
> > I'd also suggest just assigning the whole vHBA
>
> What do you mean by "assigning the whole vHBA"? There is no concept of
> a vHBA as e.g. a PCI virtual function.

I thought we had a generic SCSI passthrough capability that could
attach to the SCSI vHBA in the host, or am I mis-remembering?

Regards,
Daniel
On 22/02/2017 14:46, Daniel P. Berrange wrote:
> I thought we had a generic SCSI passthrough capability that could
> attach to the SCSI vHBA in the host, or am I mis-remembering?

No, the SCSI passthrough capability attaches to a single device.

With Fibre Channel, zoning allows you to configure the set of visible
disks for each machine (or more accurately for each WWPN and/or WWNN)
at the storage fabric level. So the idea is to just configure the
WWPN/WWNN and let libvirt handle the discovery of disks underneath.
The alternative is to do it in both oVirt and Nova, though, as
mentioned elsewhere in the thread, libvirt could still provide a udev
wrapper.

Paolo
On Wed, Feb 22, 2017 at 03:10:00PM +0100, Paolo Bonzini wrote:
> With Fibre Channel, zoning allows you to configure the set of visible
> disks for each machine (or more accurately for each WWPN and/or WWNN)
> at the storage fabric level. So the idea is to just configure the
> WWPN/WWNN and let libvirt handle the discovery of disks underneath.
> The alternative is to do it in both oVirt and Nova, though, as
> mentioned elsewhere in the thread, libvirt could still provide a udev
> wrapper.

What do you mean by "provide a udev wrapper"?

Regards,
Daniel
On 22/02/2017 15:11, Daniel P. Berrange wrote:
> On Wed, Feb 22, 2017 at 03:10:00PM +0100, Paolo Bonzini wrote:
> > [...] as mentioned elsewhere in the thread, libvirt could still
> > provide a udev wrapper.
>
> What do you mean by "provide a udev wrapper"?

Expose udev events in a way that is friendly to someone who is
consuming the nodedev API.

Thanks,

Paolo
On Wed, Feb 22, 2017 at 03:13:05PM +0100, Paolo Bonzini wrote:
> On 22/02/2017 15:11, Daniel P. Berrange wrote:
> > What do you mean by "provide a udev wrapper"?
>
> Expose udev events in a way that is friendly to someone who is
> consuming the nodedev API.

We've got that support already. Apps can register to receive events
whenever a virNodeDevice is added or removed from a host, as well as
when its config changes.

Regards,
Daniel
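[Editorial note: the event support Daniel refers to is VIR_NODE_DEVICE_EVENT_ID_LIFECYCLE, available since libvirt 2.2.0. The sketch below shows the registration flow with the libvirt-python bindings; the import is guarded so the callback stays a plain function, the helper names are made up, and the CREATED=0/DELETED=1 values are assumed from virNodeDeviceEventLifecycleType.]

```python
try:
    import libvirt  # libvirt-python bindings; optional for this sketch
except ImportError:
    libvirt = None

# Assumed virNodeDeviceEventLifecycleType values: CREATED=0, DELETED=1.
LIFECYCLE = {0: "created", 1: "deleted"}

def lifecycle_cb(conn, dev, event, detail, opaque):
    """Lifecycle callback: record which node device appeared/disappeared.
    A mgmt app would filter here for LUNs under the vHBA it created."""
    msg = "%s: %s" % (dev.name(), LIFECYCLE.get(event, "unknown"))
    opaque.append(msg)
    return msg

def watch_nodedevs(uri="qemu:///system"):
    """Register for lifecycle events on all node devices (hypothetical
    helper; requires a running event loop and libvirtd)."""
    if libvirt is None:
        raise RuntimeError("libvirt-python is not installed")
    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.open(uri)
    seen = []
    # First argument None means "all node devices", rather than one
    # specific virNodeDevice object.
    conn.nodeDeviceEventRegisterAny(
        None, libvirt.VIR_NODE_DEVICE_EVENT_ID_LIFECYCLE,
        lifecycle_cb, seen)
    return conn, seen
```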