src/libvirt-storage.c | 8 +++-- src/libvirt_private.syms | 2 ++ src/util/virfdstream.c | 74 ++++++++++++++++++++++++++++++---------- src/util/virfile.c | 67 ++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 4 +++ src/util/virstring.c | 40 ++++++++++++++++++++++ src/util/virstring.h | 2 ++ tools/virsh-util.c | 52 ++++++++++++++++++++++------ tools/virsh-util.h | 1 + tools/virsh-volume.c | 20 ++++++++++- 10 files changed, 238 insertions(+), 32 deletions(-)
The way our sparse streams are implemented is: 1) user facing APIs (virStreamSparseRecvAll() and virStreamSparseSendAll()) take callbacks as arguments. These callbacks read/write data or determine if there is a hole in the underlying file and big it is. 2) libvirtd has something similar - virFDStream, except here the functions for read/write of data and hole handling are called directly. Sparse streams were originally implemented for regular files only => both ends of stream has to be regular files. This limitation exists because the callbacks from 1) (implemented in virsh for vol-download/vol-upload commands) and also from 2) (which is basically the same code) uses lseek(..., SEEK_DATA) and/or lseek(..., SEEK_HOLE) to get map of allocated file blocks. They also take a shortcut (valid for regular files) - when one side of the stream is asked to create a hole it merely lseek() + ftruncate(). For regular files this creates a hole and later, when somebody reads it all they get is zeroes. Neither of these two approaches work for block devices. Block devices have no notion of data/hole sections [1], nor can they be truncated. What we can do instead is read data from the block device and check if its full of zeroes. And for "creating a hole" we just write zeroes of requested size. There is a follow up patch that I am working on: this implementation I'm posting here has one disadvantage: after some blocks are read from the block device and they are found to contain data, the whole buffer is freed only to be read again. For instance, when uploading volume, virshStreamInData() is called at the beginning to check if the file containing data to upload doesn't start with a hole. If the file is a block device, then virFileInDataDetectZeroes() is called which reads 1MiB from it, finds (say) data and throws the buffer away. Then virshStreamSource() is called, which reads the 1MiB buffer again. The patch is still under development though. 1: Okay, SSDs keep list of free blocks for wear levelling, but the list is kept private by the SSD controller and I am not aware of any way of getting it. Michal Prívozník (9): libvirt-storage: Document volume upload/download stream format virstring: Introduce virStringIsNull() virfile: Introduce virFileInDataDetectZeroes() virsh: Pass virshStreamCallbackDataPtr to virshStreamSink() and virshStreamSkip() virsh: Track if vol-upload or vol-download work over a block device virshStreamSkip: Emulate skip for block devices virfdstream: Allow sparse stream vol-download virshStreamInData: Handle block devices virfdstream: Emulate skip for block devices src/libvirt-storage.c | 8 +++-- src/libvirt_private.syms | 2 ++ src/util/virfdstream.c | 74 ++++++++++++++++++++++++++++++---------- src/util/virfile.c | 67 ++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 4 +++ src/util/virstring.c | 40 ++++++++++++++++++++++ src/util/virstring.h | 2 ++ tools/virsh-util.c | 52 ++++++++++++++++++++++------ tools/virsh-util.h | 1 + tools/virsh-volume.c | 20 ++++++++++- 10 files changed, 238 insertions(+), 32 deletions(-) -- 2.26.2
On Fri, Jul 03, 2020 at 12:28:41PM +0200, Michal Privoznik wrote: > The way our sparse streams are implemented is: > > 1) user facing APIs (virStreamSparseRecvAll() and virStreamSparseSendAll()) take > callbacks as arguments. These callbacks read/write data or determine if there > is a hole in the underlying file and big it is. > > 2) libvirtd has something similar - virFDStream, except here the functions for > read/write of data and hole handling are called directly. > > Sparse streams were originally implemented for regular files only => both ends > of stream has to be regular files. This limitation exists because the callbacks > from 1) (implemented in virsh for vol-download/vol-upload commands) and also > from 2) (which is basically the same code) uses lseek(..., SEEK_DATA) and/or > lseek(..., SEEK_HOLE) to get map of allocated file blocks. They also take a > shortcut (valid for regular files) - when one side of the stream is asked to > create a hole it merely lseek() + ftruncate(). For regular files this creates a > hole and later, when somebody reads it all they get is zeroes. > > Neither of these two approaches work for block devices. Block devices have no > notion of data/hole sections [1], nor can they be truncated. What we can do > instead is read data from the block device and check if its full of zeroes. And > for "creating a hole" we just write zeroes of requested size. > > There is a follow up patch that I am working on: this implementation I'm > posting here has one disadvantage: after some blocks are read from the > block device and they are found to contain data, the whole buffer is > freed only to be read again. For instance, when uploading volume, > virshStreamInData() is called at the beginning to check if the file > containing data to upload doesn't start with a hole. If the file is a > block device, then virFileInDataDetectZeroes() is called which reads > 1MiB from it, finds (say) data and throws the buffer away. Then > virshStreamSource() is called, which reads the 1MiB buffer again. > The patch is still under development though. Was there a particular user/app feature request for this support ? I'm wondering about the likely use case, because if I was starting over from scratch I'd never implement stream support for storage volumes. Instead I would add APIs for starting/stopping a qemu-nbd server attached to the volume. Probably don't even need a start/ stop pair, could just run in single client mode where we pass back an opened client FD, and have qemu-nbd exit when this is closed. Depending on the user requesting sparse support for blockdevs it may still make sense to provide them an NBD solution, especially if they're likely to have followup feature requests already handled by NBD. NB, this is not an objection to your series here. Since you've already done the work and it is really just giving closer parity between block and file volumes. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 7/3/20 1:34 PM, Daniel P. Berrangé wrote: > On Fri, Jul 03, 2020 at 12:28:41PM +0200, Michal Privoznik wrote: >> The way our sparse streams are implemented is: >> >> 1) user facing APIs (virStreamSparseRecvAll() and virStreamSparseSendAll()) take >> callbacks as arguments. These callbacks read/write data or determine if there >> is a hole in the underlying file and big it is. >> >> 2) libvirtd has something similar - virFDStream, except here the functions for >> read/write of data and hole handling are called directly. >> >> Sparse streams were originally implemented for regular files only => both ends >> of stream has to be regular files. This limitation exists because the callbacks >> from 1) (implemented in virsh for vol-download/vol-upload commands) and also >> from 2) (which is basically the same code) uses lseek(..., SEEK_DATA) and/or >> lseek(..., SEEK_HOLE) to get map of allocated file blocks. They also take a >> shortcut (valid for regular files) - when one side of the stream is asked to >> create a hole it merely lseek() + ftruncate(). For regular files this creates a >> hole and later, when somebody reads it all they get is zeroes. >> >> Neither of these two approaches work for block devices. Block devices have no >> notion of data/hole sections [1], nor can they be truncated. What we can do >> instead is read data from the block device and check if its full of zeroes. And >> for "creating a hole" we just write zeroes of requested size. >> >> There is a follow up patch that I am working on: this implementation I'm >> posting here has one disadvantage: after some blocks are read from the >> block device and they are found to contain data, the whole buffer is >> freed only to be read again. For instance, when uploading volume, >> virshStreamInData() is called at the beginning to check if the file >> containing data to upload doesn't start with a hole. If the file is a >> block device, then virFileInDataDetectZeroes() is called which reads >> 1MiB from it, finds (say) data and throws the buffer away. Then >> virshStreamSource() is called, which reads the 1MiB buffer again. >> The patch is still under development though. > > Was there a particular user/app feature request for this support ? Yes, see BZ linked in 9/9. https://bugzilla.redhat.com/show_bug.cgi?id=1852528 Apparently, VDSM uses streams to copy volumes around. > > I'm wondering about the likely use case, because if I was starting > over from scratch I'd never implement stream support for storage > volumes. Instead I would add APIs for starting/stopping a qemu-nbd > server attached to the volume. Probably don't even need a start/ > stop pair, could just run in single client mode where we pass back > an opened client FD, and have qemu-nbd exit when this is closed. > > Depending on the user requesting sparse support for blockdevs it > may still make sense to provide them an NBD solution, especially > if they're likely to have followup feature requests already handled > by NBD. Agreed, streams should have been for console and screenshot only. They are strictly worse for handling large files than scp/nbd/rsync/... because it all has to go through our event loop. And client even loop. On the other hand, they are multiplexed within virCommand which is an advantage that neither of the aforementioned tools have (no need to open a special port then). Michal
On Fri, Jul 03, 2020 at 01:50:22PM +0200, Michal Privoznik wrote: > On 7/3/20 1:34 PM, Daniel P. Berrangé wrote: > > On Fri, Jul 03, 2020 at 12:28:41PM +0200, Michal Privoznik wrote: > > > The way our sparse streams are implemented is: > > > > > > 1) user facing APIs (virStreamSparseRecvAll() and virStreamSparseSendAll()) take > > > callbacks as arguments. These callbacks read/write data or determine if there > > > is a hole in the underlying file and big it is. > > > > > > 2) libvirtd has something similar - virFDStream, except here the functions for > > > read/write of data and hole handling are called directly. > > > > > > Sparse streams were originally implemented for regular files only => both ends > > > of stream has to be regular files. This limitation exists because the callbacks > > > from 1) (implemented in virsh for vol-download/vol-upload commands) and also > > > from 2) (which is basically the same code) uses lseek(..., SEEK_DATA) and/or > > > lseek(..., SEEK_HOLE) to get map of allocated file blocks. They also take a > > > shortcut (valid for regular files) - when one side of the stream is asked to > > > create a hole it merely lseek() + ftruncate(). For regular files this creates a > > > hole and later, when somebody reads it all they get is zeroes. > > > > > > Neither of these two approaches work for block devices. Block devices have no > > > notion of data/hole sections [1], nor can they be truncated. What we can do > > > instead is read data from the block device and check if its full of zeroes. And > > > for "creating a hole" we just write zeroes of requested size. > > > > > > There is a follow up patch that I am working on: this implementation I'm > > > posting here has one disadvantage: after some blocks are read from the > > > block device and they are found to contain data, the whole buffer is > > > freed only to be read again. For instance, when uploading volume, > > > virshStreamInData() is called at the beginning to check if the file > > > containing data to upload doesn't start with a hole. If the file is a > > > block device, then virFileInDataDetectZeroes() is called which reads > > > 1MiB from it, finds (say) data and throws the buffer away. Then > > > virshStreamSource() is called, which reads the 1MiB buffer again. > > > The patch is still under development though. > > > > Was there a particular user/app feature request for this support ? > > Yes, see BZ linked in 9/9. > > https://bugzilla.redhat.com/show_bug.cgi?id=1852528 > > Apparently, VDSM uses streams to copy volumes around. Ah I see. We might want to ask them if they would find it useful if we added an NBD alternative. > > I'm wondering about the likely use case, because if I was starting > > over from scratch I'd never implement stream support for storage > > volumes. Instead I would add APIs for starting/stopping a qemu-nbd > > server attached to the volume. Probably don't even need a start/ > > stop pair, could just run in single client mode where we pass back > > an opened client FD, and have qemu-nbd exit when this is closed. > > > > Depending on the user requesting sparse support for blockdevs it > > may still make sense to provide them an NBD solution, especially > > if they're likely to have followup feature requests already handled > > by NBD. > > Agreed, streams should have been for console and screenshot only. They are > strictly worse for handling large files than scp/nbd/rsync/... because it > all has to go through our event loop. And client even loop. > On the other hand, they are multiplexed within virCommand which is an > advantage that neither of the aforementioned tools have (no need to open a > special port then). If qemu-nbd listen on a private UNIX socket, and we pass back a FD connected to that socket, there's no need to deal with firewalls or permissions. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Fri, 2020-07-03 at 12:28 +0200, Michal Privoznik wrote: > Michal Prívozník (9): > libvirt-storage: Document volume upload/download stream format > virstring: Introduce virStringIsNull() > virfile: Introduce virFileInDataDetectZeroes() > virsh: Pass virshStreamCallbackDataPtr to virshStreamSink() and > virshStreamSkip() > virsh: Track if vol-upload or vol-download work over a block device > virshStreamSkip: Emulate skip for block devices > virfdstream: Allow sparse stream vol-download > virshStreamInData: Handle block devices > virfdstream: Emulate skip for block devices Is this change something that we should mention in the release notes? If so, please post a patch that updates them accordingly. Thanks! -- Andrea Bolognani / Red Hat / Virtualization
On 7/3/20 3:59 PM, Andrea Bolognani wrote: > On Fri, 2020-07-03 at 12:28 +0200, Michal Privoznik wrote: >> Michal Prívozník (9): >> libvirt-storage: Document volume upload/download stream format >> virstring: Introduce virStringIsNull() >> virfile: Introduce virFileInDataDetectZeroes() >> virsh: Pass virshStreamCallbackDataPtr to virshStreamSink() and >> virshStreamSkip() >> virsh: Track if vol-upload or vol-download work over a block device >> virshStreamSkip: Emulate skip for block devices >> virfdstream: Allow sparse stream vol-download >> virshStreamInData: Handle block devices >> virfdstream: Emulate skip for block devices > > Is this change something that we should mention in the release notes? > If so, please post a patch that updates them accordingly. Thanks! > Of course! I was think that I have to write a release note for this one as I was updating NEWS.rst for next release in the morning. But Guess what, of course I forgot :-) Michal
On Fri, 2020-07-03 at 16:06 +0200, Michal Privoznik wrote: > On 7/3/20 3:59 PM, Andrea Bolognani wrote: > > Is this change something that we should mention in the release notes? > > If so, please post a patch that updates them accordingly. Thanks! > > Of course! I was think that I have to write a release note for this one > as I was updating NEWS.rst for next release in the morning. But Guess > what, of course I forgot :-) That's fine, I'm just trying to be a bit more proactive with the release notes, by pestering people into providing updates themselves along with the code changes instead of always scrambling to document a month worth of commits during the freeze period :) -- Andrea Bolognani / Red Hat / Virtualization
© 2016 - 2024 Red Hat, Inc.