This series is an attempt to speed up domain start by removing the slow block script from the picture. The current RFC covers only the simplest possible case - the target being a block device directly. This case does not require locking at all. A later version will also cover setting up a loop device.

Compared to the default block script, this saves about 0.5s of domain start time per disk. A similar speedup can be achieved with a trivial lock-less script too. But for file-based disks it won't be that simple with a script - setting up a loop device without locking is tricky, and the ability to keep an FD open and call different ioctls on it helps greatly. Furthermore, reusing the same loop device for the same file can be done significantly better with a cache (which can be stored in the process hosting libxl - like xl devd, or libvirt).

This surely isn't the only option to improve disk setup time, but it is a very attractive one.

A few questions:
1. Is this an acceptable approach at all?
2. Is an empty 'script' parameter value going to fly? Unfortunately, NULL is already taken as "use default".

Marek Marczykowski-Górecki (2):
  libxl: rename 'error' label to 'out' as it is used for success too
  libxl: allow to skip block script completely

 docs/man/xl-disk-configuration.5.pod.in | 4 ++-
 tools/libs/light/libxl_disk.c | 7 ++-
 tools/libs/light/libxl_linux.c | 68 ++++++++++++++++++++++++--
 3 files changed, 74 insertions(+), 5 deletions(-)

base-commit: bea65a212c0581520203b6ad0d07615693f42f73
--
git-series 0.9.1
When it comes to file-based block devices, the major difficulty is the extremely bad kernel API. The only fully safe way to use loop devices is to use LOOP_CONFIGURE with LO_FLAGS_AUTOCLEAR and hold a file descriptor open to the device until another piece of code (either another userspace program or the kernel) has grabbed a reference to it. Everything else risks either using a freed loop device (that might now be attached to a different file) or leaking loop devices on unclean exit.

The only exception is if one can make certain assumptions, such as no other program freeing loop devices for the file in question. This is a reasonable assumption for Qubes dom0, but not for Qubes domU or for Xen dom0 in general. Nevertheless, this is effectively what the current block script does: if I understand the code correctly, there is a race where badly timed calls to losetup by another process could result in the block script freeing the wrong loop device.

Worse, writes to XenStore only cause Linux to take a reference to the device at some unspecified point in the future, rather than synchronously. That interface takes a major and minor number, which means we need to hold a reference to the relevant loop device ourselves. FreeBSD solves this by having XenStore include a path to the device and/or regular file, but on Linux this leads to awkward issues with namespaces. Instead, I recommend that Linux gain an ioctl-based interface in the future, which takes a file descriptor to the device to use. The kernel would then do the writes itself.

Thankfully, not all hope is lost, even with the current kernel API. We can use sd_pid_notify_with_fds to stash the file descriptors in PID 1, which will never exit. We can give those file descriptors a name, so that we know which is which if we are restarted. And we can close devices that we know are not in use by any VMs.
The cache will allow us to avoid duplicating devices, which is actually quite important: QubesOS doesn't want each qube to have a separate file descriptor for its kernel, for example.

Initially, I recommend focusing on handling the case where the process using libxl is not restarted. That is the simpler case, by far. I suggest starting by just setting up a loop device prior to attaching it, and destroying it when the device is detached. Caching can be added as the next step.

--
Demi Marie Obenour
she/her/hers
QubesOS Developer, Invisible Things Lab
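As a rough illustration of the caching step suggested above, a reference-counted table kept in the libxl hosting process could look something like this. Everything here is an assumption made for the sketch: the names (loop_cache, loop_cache_acquire and so on), the fixed table size, and the choice of keying entries by the backing file's (st_dev, st_ino) pair are hypothetical and do not come from the series. Read-only versus read-write sharing is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define LOOP_CACHE_SLOTS 64

/* One cached loop device, keyed by the identity of its backing file. */
struct loop_cache_entry {
    bool used;
    dev_t dev;       /* st_dev of the backing file */
    ino_t ino;       /* st_ino of the backing file */
    int loop_fd;     /* open fd that keeps the loop device alive */
    unsigned refs;   /* number of disks currently using it */
};

struct loop_cache {
    struct loop_cache_entry slots[LOOP_CACHE_SLOTS];
};

static struct loop_cache_entry *loop_cache_find(struct loop_cache *c,
                                                dev_t dev, ino_t ino)
{
    for (size_t i = 0; i < LOOP_CACHE_SLOTS; i++) {
        struct loop_cache_entry *e = &c->slots[i];
        if (e->used && e->dev == dev && e->ino == ino)
            return e;
    }
    return NULL;
}

/* Take a reference on an existing entry; returns its fd, or -1 on a miss. */
static int loop_cache_acquire(struct loop_cache *c, dev_t dev, ino_t ino)
{
    struct loop_cache_entry *e = loop_cache_find(c, dev, ino);
    if (!e)
        return -1;
    e->refs++;
    return e->loop_fd;
}

/* Record a newly configured loop device with one reference.
 * Returns 0 on success, -1 if already present or the table is full. */
static int loop_cache_insert(struct loop_cache *c, dev_t dev, ino_t ino,
                             int loop_fd)
{
    if (loop_cache_find(c, dev, ino))
        return -1;
    for (size_t i = 0; i < LOOP_CACHE_SLOTS; i++) {
        struct loop_cache_entry *e = &c->slots[i];
        if (!e->used) {
            e->used = true;
            e->dev = dev;
            e->ino = ino;
            e->loop_fd = loop_fd;
            e->refs = 1;
            return 0;
        }
    }
    return -1;
}

/* Drop one reference. When the last one goes away, the entry is removed
 * and its fd is returned so the caller can close it; otherwise -1. */
static int loop_cache_release(struct loop_cache *c, dev_t dev, ino_t ino)
{
    struct loop_cache_entry *e = loop_cache_find(c, dev, ino);
    if (!e)
        return -1;
    if (--e->refs > 0)
        return -1;
    e->used = false;
    return e->loop_fd;
}
```

With a table like this, a second disk backed by the same file would get the already configured device from loop_cache_acquire instead of a new one, and the device would only be torn down once the last user has called loop_cache_release.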
On Wed, Apr 28, 2021 at 3:00 AM Demi Marie Obenour <demi@invisiblethingslab.com> wrote:
>
> When it comes to file-based block devices, the major difficulty is
> the extremely bad kernel API. The only fully safe way to use loop
> devices is to use LOOP_CONFIGURE with LO_FLAGS_AUTOCLEAR and hold a
> file descriptor open to the device until another piece of code (either
> another userspace program or the kernel) has grabbed a reference to it.
> Everything else risks either using a freed loop device (that might now
> be attached to a different file) or risks leaking them on unclean exit.
> The only exception is if one can make certain assumptions, such as no
> other program freeing loop devices for the file in question. This is
> a reasonable assumption for Qubes dom0, but neither for Qubes domU nor
> for Xen dom0 in general. Nevertheless, this is effectively what the
> current block script does: if I understand the code correctly, there
> is a race where badly timed calls to losetup by another process could
> result in the block script freeing the wrong loop device.

I posted this a while ago, but didn't get any response:
https://lore.kernel.org/xen-devel/CAKf6xpv-U91nF2Fik7GRN3SFeOWWcdR5R+ZcK5fgojE+-D43sg@mail.gmail.com/

tl;dr: AFAICT, the block script check_sharing function doesn't work for loop devices.

Regards,
Jason