It looks like Linux block devices, even in O_DIRECT mode, don't have any user-visible
limit on the transfer size / number of segments that the underlying block device can have.
The kernel block layer takes care of enforcing these limits by splitting the bios.

By limiting the transfer sizes, we force qemu to do the splitting itself, which
introduces various overheads. It is especially visible in the NBD server, where the
low max transfer size of the underlying device forces us to advertise this limit over
NBD, thus increasing the traffic overhead in the case of image conversion, which
benefits from large blocks.

More information can be found here:
https://bugzilla.redhat.com/show_bug.cgi?id=1647104

Tested this with qemu-img convert over NBD and natively, and to my surprise, even
native I/O performance improved a bit. (The device on which it was tested is an Intel
Optane DC P4800X, which has a 128k max transfer size.)

The benchmark:

Images were created using:

Sparse image:    qemu-img create -f qcow2 /dev/nvme0n1p3 1G / 10G / 100G
Allocated image: qemu-img create -f qcow2 /dev/nvme0n1p3 -o preallocation=metadata 1G / 10G / 100G

The test was:

echo "convert native:"
rm -rf /dev/shm/disk.img
time qemu-img convert -p -f qcow2 -O raw -T none $FILE /dev/shm/disk.img > /dev/zero

echo "convert via nbd:"
qemu-nbd -k /tmp/nbd.sock -v -f qcow2 $FILE -x export --cache=none --aio=native --fork
rm -rf /dev/shm/disk.img
time qemu-img convert -p -f raw -O raw nbd:unix:/tmp/nbd.sock:exportname=export /dev/shm/disk.img > /dev/zero

The results:

=========================================
1G sparse image:
    native:
        before: 0.027s
        after:  0.027s
    nbd:
        before: 0.287s
        after:  0.035s

=========================================
100G sparse image:
    native:
        before: 0.028s
        after:  0.028s
    nbd:
        before: 23.796s
        after:  0.109s

=========================================
1G preallocated image:
    native:
        before: 0.454s
        after:  0.427s
    nbd:
        before: 0.649s
        after:  0.546s

The block limits of max transfer size / max segment size are retained for the SCSI
passthrough, because in this case the kernel passes the userspace request directly
to the kernel SCSI driver, bypassing the block layer, and thus there is no code to
split such requests.

What do you think?

Fam, since you were the original author of the code that added these limits, could
you share your opinion on that? What was the reason, besides SCSI passthrough?

Best regards,
	Maxim Levitsky

Maxim Levitsky (1):
  raw-posix.c - use max transfer length / max segment count only for
    SCSI passthrough

 block/file-posix.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

-- 
2.17.2
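[Editorial note: to illustrate the overhead described above, here is a small Python
sketch, not QEMU code, of how advertising a low max transfer size multiplies the
number of round trips a client must make. The 128 KiB figure is the P4800X limit
quoted in the cover letter; the 2 MiB request size is an arbitrary example.]

```python
def split_count(request_bytes: int, max_transfer: int) -> int:
    """Number of requests a single transfer is split into when the
    client must honor an advertised max transfer size."""
    # Ceiling division: a partial tail chunk still costs a full round trip.
    return -(-request_bytes // max_transfer)

# With a 128 KiB limit advertised over NBD, a single 2 MiB read from
# qemu-img convert becomes 16 round trips; with no limit (or a limit at
# least as large as the request) it stays one request.
print(split_count(2 * 1024 * 1024, 128 * 1024))       # -> 16
print(split_count(2 * 1024 * 1024, 2 * 1024 * 1024))  # -> 1
```

This is why the sparse-image NBD numbers above improve so dramatically: without the
advertised limit, large reads of unallocated regions no longer have to be chopped
into device-sized pieces on the wire.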
On Sun, Jun 30, 2019 at 06:08:54PM +0300, Maxim Levitsky wrote:
> It looks like Linux block devices, even in O_DIRECT mode don't have any user visible
> limit on transfer size / number of segments, which underlying block device can have.
> The block layer takes care of enforcing these limits by splitting the bios.
>
> By limiting the transfer sizes, we force qemu to do the splitting itself which
> introduces various overheads.
[...]
> The block limits of max transfer size/max segment size are retained
> for the SCSI passthrough because in this case the kernel passes the userspace request
> directly to the kernel scsi driver, bypassing the block layer, and thus there is
> no code to split such requests.
>
> What do you think?
>
> Fam, since you were the original author of the code that added
> these limits, could you share your opinion on that?
> What was the reason besides SCSI passthrough?
>
> Maxim Levitsky (1):
>   raw-posix.c - use max transfer length / max segment count only for
>     SCSI passthrough
>
>  block/file-posix.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)

Adding Eric Blake, who implemented the generic request splitting in the
block layer and may know if there were any other reasons aside from SCSI
passthrough why file-posix.c enforces the host block device's maximum
transfer size.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
On 7/3/19 4:52 AM, Stefan Hajnoczi wrote:
> On Sun, Jun 30, 2019 at 06:08:54PM +0300, Maxim Levitsky wrote:
>> It looks like Linux block devices, even in O_DIRECT mode don't have any user visible
>> limit on transfer size / number of segments, which underlying block device can have.
>> The block layer takes care of enforcing these limits by splitting the bios.

s/The block layer/The kernel block layer/

>> By limiting the transfer sizes, we force qemu to do the splitting itself  which

double space

>> introduces various overheads.
>> It is especially visible in nbd server, where the low max transfer size of the
>> underlying device forces us to advertise this over NBD, thus increasing the traffic overhead in case of

Long line for a commit message.

>> image conversion which benefits from large blocks.
[...]
>> The benchmark:

I'm sorry I didn't see this before softfreeze, but as a performance
improvement, I think it still classes as a bug fix and is safe for
inclusion in rc0.

[...]
>> The block limits of max transfer size/max segment size are retained
>> for the SCSI passthrough because in this case the kernel passes the userspace request
>> directly to the kernel scsi driver, bypassing the block layer, and thus there is
>> no code to split such requests.
>>
>> What do you think?

Seems like a reasonable explanation.

[...]
> Adding Eric Blake, who implemented the generic request splitting in the
> block layer and may know if there were any other reasons aside from SCSI
> passthrough why file-posix.c enforces the host block device's maximum
> transfer size.

No, I don't have any strong reasons for why file I/O must be capped to a
specific limit other than size_t (since the kernel does just fine at
splitting things up).

> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
On Sun, 2019-06-30 at 18:08 +0300, Maxim Levitsky wrote:
> It looks like Linux block devices, even in O_DIRECT mode don't have any user visible
> limit on transfer size / number of segments, which underlying block device can have.
> The block layer takes care of enforcing these limits by splitting the bios.
[...]
>  block/file-posix.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)

Ping

Best regards,
	Maxim Levitsky