[ Patch series also available here, along with this cover letter and the
  script used to generate test results:
  https://gitlab.com/rwmjones/qemu/-/commits/2023-nbd-multi-conn-v1 ]

This patch series adds multi-conn support to the NBD block driver in
qemu.  It is only meant for discussion and testing because it has a
number of obvious shortcomings (see "XXX" in commit messages and
code).  If we decide this is a good idea, we can work on a better
patch.

The Network Block Device (NBD) protocol allows servers to advertise
that they are capable of multi-conn.  This means they obey certain
requirements around how data is cached, allowing multiple connections
to be safely opened to the NBD server.  For example, a flush or FUA
operation on one connection is guaranteed to flush the cache on all
connections.

Clients that use multi-conn can achieve better performance.  This
seems to be down to at least two factors:

 - It avoids "head of line blocking" of large requests.

 - With NBD over Unix domain sockets, more cores can be used.

qemu-nbd, nbdkit and libnbd have all supported multi-conn for ages,
but qemu's built-in NBD client does not, which is what this patch
fixes.

Below I've produced a few benchmarks.  Note these are mostly
concocted to try to test NBD performance and may not make sense in
their own terms (eg. qemu's disk image layer has a curl client so you
wouldn't need to run one separately).  In the real world we use long
pipelines of NBD operations where different tools are mixed together
to achieve efficient downloads, conversions, disk modifications and
sparsification, and they would exercise different aspects of this.
I've also included nbdcopy as a client for comparison in some tests.

All tests were run 4 times, the first result discarded, and the last
3 averaged.  If any of the last 3 were > 10% away from the average
then the test was stopped.

My summary:

 - It works effectively for qemu client & nbdkit server, especially
   in cases where the server does large, heavyweight requests.  This
   is important for us because virt-v2v uses an nbdkit Python plugin
   and various other heavyweight plugins (eg. plugins that access
   remote servers for each request).

 - It seems to make little or no difference with qemu + qemu-nbd
   server.  I speculate that's because qemu-nbd doesn't support
   system threads, so networking is bottlenecked through a single
   core.  Even though there are coroutines handling different
   sockets, they must all wait in turn to issue send(3) or recv(3)
   calls on the same core.

 - qemu-img unfortunately uses a single thread for all coroutines,
   so it suffers from a similar problem to qemu-nbd.  This change
   would be much more effective if we could distribute coroutines
   across threads.

 - For tests which are highly bottlenecked on disk I/O (eg. the
   large local file test and the null test) multi-conn doesn't make
   much difference.

 - Multi-conn, even with only 2 connections, can make up for the
   overhead of range requests, exceeding the performance of wget.

 - In the curlremote test, qemu-nbd is especially slow, for unknown
   reasons.


Integrity test (./multi-conn.pl integrity)
==========================================

  nbdkit-sparse-random-plugin
    |                     ^
    | nbd+unix            | nbd+unix
    v                     |
  qemu-img convert

Reading from and writing the same data back to the nbdkit
sparse-random plugin checks that the data written is the same as the
data read.  This uses two Unix domain sockets, with or without
multi-conn.  This test is mainly here to check we don't crash or
corrupt data with this patch.
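Concretely, the integrity check boils down to something like the
following (an illustrative sketch with an invented socket path; the
exact commands are in the multi-conn.pl script linked above):

    # Serve deterministic, sparse, pseudo-random data.  The plugin
    # verifies that any data written back matches the pattern it
    # generated, failing the write if it does not.
    nbdkit -U /tmp/nbd.sock sparse-random size=1G seed=1

    # Read the image and write it straight back to the same server,
    # over two NBD connections (with or without multi-conn).
    qemu-img convert -n \
        'nbd+unix:///?socket=/tmp/nbd.sock' \
        'nbd+unix:///?socket=/tmp/nbd.sock'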
  server          client        multi-conn
  ---------------------------------------------------------------
  nbdkit          qemu-img      [u/s]         9.07s
  nbdkit          qemu-img      1             9.05s
  nbdkit          qemu-img      2             9.02s
  nbdkit          qemu-img      4             8.98s

  [u/s] = upstream qemu 7.2.0


Curl local server test (./multi-conn.pl curlhttp)
=================================================

  Localhost Apache serving a file over http
                 |
                 | http
                 v
  nbdkit-curl-plugin or qemu-nbd
                 |
                 | nbd+unix
                 v
  qemu-img convert or nbdcopy

We download an image from a local web server through
nbdkit-curl-plugin or qemu-nbd using the curl block driver, over NBD.
The image is copied to /dev/null.

  server          client        multi-conn
  ---------------------------------------------------------------
  qemu-nbd        nbdcopy       1             8.88s
  qemu-nbd        nbdcopy       2             8.64s
  qemu-nbd        nbdcopy       4             8.37s
  qemu-nbd        qemu-img      [u/s]         6.47s
  qemu-nbd        qemu-img      1             6.56s
  qemu-nbd        qemu-img      2             6.63s
  qemu-nbd        qemu-img      4             6.50s
  nbdkit          nbdcopy       1            12.15s
  nbdkit          nbdcopy       2             7.05s  (72.36% better)
  nbdkit          nbdcopy       4             3.54s  (242.90% better)
  nbdkit          qemu-img      [u/s]         6.90s
  nbdkit          qemu-img      1             7.00s
  nbdkit          qemu-img      2             3.85s  (79.15% better)
  nbdkit          qemu-img      4             3.85s  (79.15% better)


Curl local file test (./multi-conn.pl curlfile)
===============================================

  nbdkit-curl-plugin using file:/// URI
                 |
                 | nbd+unix
                 v
  qemu-img convert or nbdcopy

We download from a file:/// URI.  This test is designed to exercise
NBD and some curl internal paths without the overhead from an
external server.  qemu-nbd doesn't support file:/// URIs so we cannot
duplicate the test for qemu as server.

  server          client        multi-conn
  ---------------------------------------------------------------
  nbdkit          nbdcopy       1            31.32s
  nbdkit          nbdcopy       2            20.29s  (54.38% better)
  nbdkit          nbdcopy       4            13.22s  (136.91% better)
  nbdkit          qemu-img      [u/s]        31.55s
  nbdkit          qemu-img      1            31.70s
  nbdkit          qemu-img      2            21.60s  (46.07% better)
  nbdkit          qemu-img      4            13.88s  (127.25% better)


Curl remote server test (./multi-conn.pl curlremote)
====================================================

  nbdkit-curl-plugin using http://remote/*.qcow2 URI
                 |
                 | nbd+unix
                 v
  qemu-img convert

We download from a remote qcow2 file to a local raw file, converting
between formats during copying.

  qemu-nbd using http://remote/*.qcow2 URI
                 |
                 | nbd+unix
                 v
  qemu-img convert

Similarly, replacing nbdkit with qemu-nbd (treating the remote file
as if it is raw, so the conversion is still done by qemu-img).

Additionally we compare downloading the file with wget (note this
doesn't include the time for conversion, but that should only be a
few seconds).

  server          client        multi-conn
  ---------------------------------------------------------------
  -               wget          1            58.19s
  nbdkit          qemu-img      [u/s]        68.29s  (17.36% worse)
  nbdkit          qemu-img      1            67.85s  (16.60% worse)
  nbdkit          qemu-img      2            58.17s
  nbdkit          qemu-img      4            59.80s
  nbdkit          qemu-img      6            59.15s
  nbdkit          qemu-img      8            59.52s
  qemu-nbd        qemu-img      [u/s]       202.55s
  qemu-nbd        qemu-img      1           204.61s
  qemu-nbd        qemu-img      2           196.73s
  qemu-nbd        qemu-img      4           179.53s  (12.83% better)
  qemu-nbd        qemu-img      6           181.70s  (11.48% better)
  qemu-nbd        qemu-img      8           181.05s  (11.88% better)


Local file test (./multi-conn.pl file)
======================================

  qemu-nbd or nbdkit serving a large local file
                 |
                 | nbd+unix
                 v
  qemu-img convert or nbdcopy

We download a local file over NBD.  The image is copied to /dev/null.
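For reference, the two server variants in this test are started along
these lines (an illustrative sketch; the socket path and image name
are invented here, and the real commands are in multi-conn.pl):

    # qemu-nbd exporting a local file over a Unix domain socket:
    qemu-nbd -r -t -k /tmp/nbd.sock large.img

    # The same service provided by nbdkit and its file plugin:
    nbdkit -r -U /tmp/nbd.sock file large.img

Either can then be read with, eg.:

    nbdcopy 'nbd+unix:///?socket=/tmp/nbd.sock' null: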
  server          client        multi-conn
  ---------------------------------------------------------------
  qemu-nbd        nbdcopy       1            15.50s
  qemu-nbd        nbdcopy       2            14.36s
  qemu-nbd        nbdcopy       4            14.32s
  qemu-nbd        qemu-img      [u/s]        10.16s
  qemu-nbd        qemu-img      1            11.17s  (10.01% worse)
  qemu-nbd        qemu-img      2            10.35s
  qemu-nbd        qemu-img      4            10.39s
  nbdkit          nbdcopy       1             9.10s
  nbdkit          nbdcopy       2             8.25s
  nbdkit          nbdcopy       4             8.60s
  nbdkit          qemu-img      [u/s]         8.64s
  nbdkit          qemu-img      1             9.38s
  nbdkit          qemu-img      2             8.69s
  nbdkit          qemu-img      4             8.87s


Null test (./multi-conn.pl null)
================================

  qemu-nbd with null-co driver,
  or nbdkit-null-plugin + noextents filter
                 |
                 | nbd+unix
                 v
  qemu-img convert or nbdcopy

This is like the local file test above, but without needing a file.
Instead all zeroes (fully allocated) are downloaded over NBD.

  server          client        multi-conn
  ---------------------------------------------------------------
  qemu-nbd        nbdcopy       1            14.86s
  qemu-nbd        nbdcopy       2            17.08s  (14.90% worse)
  qemu-nbd        nbdcopy       4            17.89s  (20.37% worse)
  qemu-nbd        qemu-img      [u/s]        13.29s
  qemu-nbd        qemu-img      1            13.31s
  qemu-nbd        qemu-img      2            13.00s
  qemu-nbd        qemu-img      4            12.62s
  nbdkit          nbdcopy       1            15.06s
  nbdkit          nbdcopy       2            12.21s  (23.32% better)
  nbdkit          nbdcopy       4            11.67s  (29.10% better)
  nbdkit          qemu-img      [u/s]        17.13s
  nbdkit          qemu-img      1            17.11s
  nbdkit          qemu-img      2            16.82s
  nbdkit          qemu-img      4            18.81s
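To give a flavour of what the client side has to do, here is a
minimal sketch of request distribution across connections (the struct
layout and field names below are invented for illustration and are
not the code from the patch series, although choose_connection is
discussed later in the thread):

    /* Sketch only: spread requests over N established NBD
     * connections; struct/field names here are invented. */
    #include "qemu/atomic.h"

    typedef struct NBDConnState NBDConnState;

    typedef struct BDRVNBDState {
        NBDConnState **conns;    /* one state per NBD connection */
        unsigned int n_conns;    /* 1 unless multi-conn was negotiated */
        unsigned int next_conn;  /* round-robin counter */
    } BDRVNBDState;

    static NBDConnState *choose_connection(BDRVNBDState *s)
    {
        /* Lock-free round robin: each request atomically bumps the
         * counter and takes the next socket in turn. */
        unsigned int i = qatomic_fetch_inc(&s->next_conn);
        return s->conns[i % s->n_conns];
    }

Because a multi-conn server guarantees cache consistency across
connections, a flush or FUA write can safely be issued on whichever
connection the chooser returns.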
On Thu, Mar 09, 2023 at 11:39:42AM +0000, Richard W.M. Jones wrote:
> [ Patch series also available here, along with this cover letter and the
>   script used to generate test results:
>   https://gitlab.com/rwmjones/qemu/-/commits/2023-nbd-multi-conn-v1 ]
> 
> This patch series adds multi-conn support to the NBD block driver in
> qemu.  It is only meant for discussion and testing because it has a
> number of obvious shortcomings (see "XXX" in commit messages and
> code).  If we decide this is a good idea, we can work on a better
> patch.

Overall, I'm in favor of this.  A longer-term project might be to
have qemu's NBD client code call into libnbd instead of
reimplementing things itself, at which point having libnbd manage
multi-conn under the hood would be awesome; but as that's a much
bigger effort, the shorter-term task of having qemu itself handle
parallel sockets seems worthwhile.

> - It works effectively for qemu client & nbdkit server, especially
>   in cases where the server does large, heavyweight requests.  This
>   is important for us because virt-v2v uses an nbdkit Python plugin
>   and various other heavyweight plugins (eg. plugins that access
>   remote servers for each request).
> 
> - It seems to make little or no difference with qemu + qemu-nbd
>   server.  I speculate that's because qemu-nbd doesn't support
>   system threads, so networking is bottlenecked through a single
>   core.  Even though there are coroutines handling different
>   sockets, they must all wait in turn to issue send(3) or recv(3)
>   calls on the same core.

Is the current work to teach qemu to do multi-queue (that is, spread
the I/O load for a single block device across multiple cores) going
to help here?  I haven't been following the multi-queue efforts
closely enough to know if the approach used in this series will play
nicely, or will need even further overhaul.

> - qemu-img unfortunately uses a single thread for all coroutines,
>   so it suffers from a similar problem to qemu-nbd.  This change
>   would be much more effective if we could distribute coroutines
>   across threads.

qemu-img uses the same client code as qemu-nbd; any multi-queue
improvements that can spread the send()/recv() load of multiple
sockets across multiple cores will benefit both programs
simultaneously.

> - For tests which are highly bottlenecked on disk I/O (eg. the
>   large local file test and the null test) multi-conn doesn't make
>   much difference.

As long as it isn't adding too much penalty, that's okay.  If the
saturation is truly at the point of how fast disk requests can be
served, it doesn't matter if we can queue up more of those requests
in parallel across multiple NBD sockets.

> - Multi-conn, even with only 2 connections, can make up for the
>   overhead of range requests, exceeding the performance of wget.

That alone is a rather cool result, and an argument in favor of
further developing this.

> - In the curlremote test, qemu-nbd is especially slow, for unknown
>   reasons.
> 
> Integrity test (./multi-conn.pl integrity)
> ==========================================
> 
>   nbdkit-sparse-random-plugin
>     |                     ^
>     | nbd+unix            | nbd+unix
>     v                     |
>   qemu-img convert
> 
> Reading from and writing the same data back to the nbdkit
> sparse-random plugin checks that the data written is the same as the
> data read.  This uses two Unix domain sockets, with or without
> multi-conn.  This test is mainly here to check we don't crash or
> corrupt data with this patch.
> 
> server          client        multi-conn
> ---------------------------------------------------------------
> nbdkit          qemu-img      [u/s]         9.07s
> nbdkit          qemu-img      1             9.05s
> nbdkit          qemu-img      2             9.02s
> nbdkit          qemu-img      4             8.98s
> 
> [u/s] = upstream qemu 7.2.0

How many of these timing numbers can be repeated with TLS in the mix?

> Curl local server test (./multi-conn.pl curlhttp)
> =================================================
> 
>   Localhost Apache serving a file over http
>                  |
>                  | http
>                  v
>   nbdkit-curl-plugin or qemu-nbd
>                  |
>                  | nbd+unix
>                  v
>   qemu-img convert or nbdcopy
> 
> We download an image from a local web server through
> nbdkit-curl-plugin or qemu-nbd using the curl block driver, over NBD.
> The image is copied to /dev/null.
> 
> server          client        multi-conn
> ---------------------------------------------------------------
> qemu-nbd        nbdcopy       1             8.88s
> qemu-nbd        nbdcopy       2             8.64s
> qemu-nbd        nbdcopy       4             8.37s
> qemu-nbd        qemu-img      [u/s]         6.47s

Do we have any good feel for why qemu-img is faster than nbdcopy in
the baseline?  But improving that is orthogonal to this series.

> qemu-nbd        qemu-img      1             6.56s
> qemu-nbd        qemu-img      2             6.63s
> qemu-nbd        qemu-img      4             6.50s
> nbdkit          nbdcopy       1            12.15s

I'm assuming this is nbdkit with your recent in-progress patches to
have the curl plugin serve parallel requests.  But this is another
place where we can investigate why nbdkit is not as performant as
qemu-nbd at utilizing curl.

> nbdkit          nbdcopy       2             7.05s  (72.36% better)
> nbdkit          nbdcopy       4             3.54s  (242.90% better)

That one is impressive!

> nbdkit          qemu-img      [u/s]         6.90s
> nbdkit          qemu-img      1             7.00s

Minimal penalty for adding the code but not utilizing it...

> nbdkit          qemu-img      2             3.85s  (79.15% better)
> nbdkit          qemu-img      4             3.85s  (79.15% better)

...and definitely shows its worth.

> Curl local file test (./multi-conn.pl curlfile)
> ===============================================
> 
>   nbdkit-curl-plugin using file:/// URI
>                  |
>                  | nbd+unix
>                  v
>   qemu-img convert or nbdcopy
> 
> We download from a file:/// URI.  This test is designed to exercise
> NBD and some curl internal paths without the overhead from an
> external server.  qemu-nbd doesn't support file:/// URIs so we
> cannot duplicate the test for qemu as server.
> 
> server          client        multi-conn
> ---------------------------------------------------------------
> nbdkit          nbdcopy       1            31.32s
> nbdkit          nbdcopy       2            20.29s  (54.38% better)
> nbdkit          nbdcopy       4            13.22s  (136.91% better)
> nbdkit          qemu-img      [u/s]        31.55s

Here, the baseline is already comparable; both nbdcopy and qemu-img
are parsing the image off nbdkit in about the same amount of time.

> nbdkit          qemu-img      1            31.70s

And again, minimal penalty for having the new code in place but not
exploiting it.

> nbdkit          qemu-img      2            21.60s  (46.07% better)
> nbdkit          qemu-img      4            13.88s  (127.25% better)

Plus an obvious benefit when the parallel sockets matter.

> Curl remote server test (./multi-conn.pl curlremote)
> ====================================================
> 
>   nbdkit-curl-plugin using http://remote/*.qcow2 URI
>                  |
>                  | nbd+unix
>                  v
>   qemu-img convert
> 
> We download from a remote qcow2 file to a local raw file, converting
> between formats during copying.
> 
>   qemu-nbd using http://remote/*.qcow2 URI
>                  |
>                  | nbd+unix
>                  v
>   qemu-img convert
> 
> Similarly, replacing nbdkit with qemu-nbd (treating the remote file
> as if it is raw, so the conversion is still done by qemu-img).
> 
> Additionally we compare downloading the file with wget (note this
> doesn't include the time for conversion, but that should only be a
> few seconds).
> 
> server          client        multi-conn
> ---------------------------------------------------------------
> -               wget          1            58.19s
> nbdkit          qemu-img      [u/s]        68.29s  (17.36% worse)
> nbdkit          qemu-img      1            67.85s  (16.60% worse)
> nbdkit          qemu-img      2            58.17s

Comparable to wget on paper, but a win in practice (since the wget
approach also has to add a post-download qemu-img local conversion
step).

> nbdkit          qemu-img      4            59.80s
> nbdkit          qemu-img      6            59.15s
> nbdkit          qemu-img      8            59.52s
> 
> qemu-nbd        qemu-img      [u/s]       202.55s
> qemu-nbd        qemu-img      1           204.61s
> qemu-nbd        qemu-img      2           196.73s
> qemu-nbd        qemu-img      4           179.53s  (12.83% better)
> qemu-nbd        qemu-img      6           181.70s  (11.48% better)
> qemu-nbd        qemu-img      8           181.05s  (11.88% better)

Less dramatic results here, but still nothing horrible.

> Local file test (./multi-conn.pl file)
> ======================================
> 
>   qemu-nbd or nbdkit serving a large local file
>                  |
>                  | nbd+unix
>                  v
>   qemu-img convert or nbdcopy
> 
> We download a local file over NBD.  The image is copied to
> /dev/null.
> 
> server          client        multi-conn
> ---------------------------------------------------------------
> qemu-nbd        nbdcopy       1            15.50s
> qemu-nbd        nbdcopy       2            14.36s
> qemu-nbd        nbdcopy       4            14.32s
> qemu-nbd        qemu-img      [u/s]        10.16s

Once again, we're seeing the qemu-img baseline faster than nbdcopy
as client.  But throwing more sockets at either client does improve
performance, except for...

> qemu-nbd        qemu-img      1            11.17s  (10.01% worse)

...this one looks bad.  Is it a case of this series adding more
mutex work (qemu-img is making parallel requests; each request then
contends for the mutex only to learn that it will be using the same
NBD connection)?  And your comments about smarter round-robin
schemes mean there may still be room to avoid this much of a
penalty.

> qemu-nbd        qemu-img      2            10.35s
> qemu-nbd        qemu-img      4            10.39s
> nbdkit          nbdcopy       1             9.10s

This one is interesting: nbdkit as server performs better than
qemu-nbd.

> nbdkit          nbdcopy       2             8.25s
> nbdkit          nbdcopy       4             8.60s
> nbdkit          qemu-img      [u/s]         8.64s
> nbdkit          qemu-img      1             9.38s
> nbdkit          qemu-img      2             8.69s
> nbdkit          qemu-img      4             8.87s
> 
> Null test (./multi-conn.pl null)
> ================================
> 
>   qemu-nbd with null-co driver,
>   or nbdkit-null-plugin + noextents filter
>                  |
>                  | nbd+unix
>                  v
>   qemu-img convert or nbdcopy
> 
> This is like the local file test above, but without needing a file.
> Instead all zeroes (fully allocated) are downloaded over NBD.

And I'm sure that if you allowed block status to show the holes, the
performance would be a lot faster, but that would be testing
something completely different ;)

> 
> server          client        multi-conn
> ---------------------------------------------------------------
> qemu-nbd        nbdcopy       1            14.86s
> qemu-nbd        nbdcopy       2            17.08s  (14.90% worse)
> qemu-nbd        nbdcopy       4            17.89s  (20.37% worse)

Oh, that's weird.  I wonder if qemu's null-co driver has some poor
mutex behavior when being hit by parallel I/O.  Seems like
investigating that can be separate from this series, though.

> qemu-nbd        qemu-img      [u/s]        13.29s

And another point where qemu-img is faster than nbdcopy as a
single-client baseline.

> qemu-nbd        qemu-img      1            13.31s
> qemu-nbd        qemu-img      2            13.00s
> qemu-nbd        qemu-img      4            12.62s
> nbdkit          nbdcopy       1            15.06s
> nbdkit          nbdcopy       2            12.21s  (23.32% better)
> nbdkit          nbdcopy       4            11.67s  (29.10% better)
> nbdkit          qemu-img      [u/s]        17.13s
> nbdkit          qemu-img      1            17.11s
> nbdkit          qemu-img      2            16.82s
> nbdkit          qemu-img      4            18.81s

Overall, I'm looking forward to seeing this go in (8.1 material;
we're too close to 8.0).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
+1-919-301-3266
Virtualization:  qemu.org | libvirt.org
On Fri, Mar 10, 2023 at 01:04:12PM -0600, Eric Blake wrote:
> How many of these timing numbers can be repeated with TLS in the
> mix?

While I have been playing with TLS and kTLS recently, it's not
something that is especially important to v2v since all NBD traffic
goes over Unix domain sockets only (ie. it's used as a kind of
interprocess communication).  I could certainly provide benchmarks,
although as I'm going on holiday shortly it may be a little while.

> > Curl local server test (./multi-conn.pl curlhttp)
> > =================================================
> > 
> >   Localhost Apache serving a file over http
> >                  |
> >                  | http
> >                  v
> >   nbdkit-curl-plugin or qemu-nbd
> >                  |
> >                  | nbd+unix
> >                  v
> >   qemu-img convert or nbdcopy
> > 
> > We download an image from a local web server through
> > nbdkit-curl-plugin or qemu-nbd using the curl block driver, over
> > NBD.  The image is copied to /dev/null.
> > 
> > server          client        multi-conn
> > ---------------------------------------------------------------
> > qemu-nbd        nbdcopy       1             8.88s
> > qemu-nbd        nbdcopy       2             8.64s
> > qemu-nbd        nbdcopy       4             8.37s
> > qemu-nbd        qemu-img      [u/s]         6.47s
> 
> Do we have any good feel for why qemu-img is faster than nbdcopy in
> the baseline?  But improving that is orthogonal to this series.

I do not, but we have in the past found that results can be very
sensitive to request size.  By default (and also in all of these
tests) nbdcopy is using a request size of 256K, and qemu-img is
using a request size of 2M.

> > qemu-nbd        qemu-img      1             6.56s
> > qemu-nbd        qemu-img      2             6.63s
> > qemu-nbd        qemu-img      4             6.50s
> > nbdkit          nbdcopy       1            12.15s
> 
> I'm assuming this is nbdkit with your recent in-progress patches to
> have the curl plugin serve parallel requests.  But this is another
> place where we can investigate why nbdkit is not as performant as
> qemu-nbd at utilizing curl.
> 
> > nbdkit          nbdcopy       2             7.05s  (72.36% better)
> > nbdkit          nbdcopy       4             3.54s  (242.90% better)
> 
> That one is impressive!
> 
> > nbdkit          qemu-img      [u/s]         6.90s
> > nbdkit          qemu-img      1             7.00s
> 
> Minimal penalty for adding the code but not utilizing it...

[u/s] and qemu-img with multi-conn:1 ought to be identical actually.
After all, the only difference should be the restructuring of the
code to add the intermediate NBDConnState struct.  In this case it's
probably just measurement error.

> > nbdkit          qemu-img      2             3.85s  (79.15% better)
> > nbdkit          qemu-img      4             3.85s  (79.15% better)
> 
> ...and definitely shows its worth.
> 
> > Curl local file test (./multi-conn.pl curlfile)
> > ===============================================
> > 
> >   nbdkit-curl-plugin using file:/// URI
> >                  |
> >                  | nbd+unix
> >                  v
> >   qemu-img convert or nbdcopy
> > 
> > We download from a file:/// URI.  This test is designed to
> > exercise NBD and some curl internal paths without the overhead
> > from an external server.  qemu-nbd doesn't support file:/// URIs
> > so we cannot duplicate the test for qemu as server.
> > 
> > server          client        multi-conn
> > ---------------------------------------------------------------
> > nbdkit          nbdcopy       1            31.32s
> > nbdkit          nbdcopy       2            20.29s  (54.38% better)
> > nbdkit          nbdcopy       4            13.22s  (136.91% better)
> > nbdkit          qemu-img      [u/s]        31.55s
> 
> Here, the baseline is already comparable; both nbdcopy and qemu-img
> are parsing the image off nbdkit in about the same amount of time.
> 
> > nbdkit          qemu-img      1            31.70s
> 
> And again, minimal penalty for having the new code in place but not
> exploiting it.
> 
> > nbdkit          qemu-img      2            21.60s  (46.07% better)
> > nbdkit          qemu-img      4            13.88s  (127.25% better)
> 
> Plus an obvious benefit when the parallel sockets matter.
> 
> > Curl remote server test (./multi-conn.pl curlremote)
> > ====================================================
> > 
> >   nbdkit-curl-plugin using http://remote/*.qcow2 URI
> >                  |
> >                  | nbd+unix
> >                  v
> >   qemu-img convert
> > 
> > We download from a remote qcow2 file to a local raw file,
> > converting between formats during copying.
> > 
> >   qemu-nbd using http://remote/*.qcow2 URI
> >                  |
> >                  | nbd+unix
> >                  v
> >   qemu-img convert
> > 
> > Similarly, replacing nbdkit with qemu-nbd (treating the remote
> > file as if it is raw, so the conversion is still done by
> > qemu-img).
> > 
> > Additionally we compare downloading the file with wget (note this
> > doesn't include the time for conversion, but that should only be
> > a few seconds).
> > 
> > server          client        multi-conn
> > ---------------------------------------------------------------
> > -               wget          1            58.19s
> > nbdkit          qemu-img      [u/s]        68.29s  (17.36% worse)
> > nbdkit          qemu-img      1            67.85s  (16.60% worse)
> > nbdkit          qemu-img      2            58.17s
> 
> Comparable to wget on paper, but a win in practice (since the wget
> approach also has to add a post-download qemu-img local conversion
> step).

Yes, correct.  Best case that would be another ~2-3 seconds on this
machine.

> > nbdkit          qemu-img      4            59.80s
> > nbdkit          qemu-img      6            59.15s
> > nbdkit          qemu-img      8            59.52s
> > 
> > qemu-nbd        qemu-img      [u/s]       202.55s
> > qemu-nbd        qemu-img      1           204.61s
> > qemu-nbd        qemu-img      2           196.73s
> > qemu-nbd        qemu-img      4           179.53s  (12.83% better)
> > qemu-nbd        qemu-img      6           181.70s  (11.48% better)
> > qemu-nbd        qemu-img      8           181.05s  (11.88% better)
> 
> Less dramatic results here, but still nothing horrible.
> 
> > Local file test (./multi-conn.pl file)
> > ======================================
> > 
> >   qemu-nbd or nbdkit serving a large local file
> >                  |
> >                  | nbd+unix
> >                  v
> >   qemu-img convert or nbdcopy
> > 
> > We download a local file over NBD.  The image is copied to
> > /dev/null.
> > 
> > server          client        multi-conn
> > ---------------------------------------------------------------
> > qemu-nbd        nbdcopy       1            15.50s
> > qemu-nbd        nbdcopy       2            14.36s
> > qemu-nbd        nbdcopy       4            14.32s
> > qemu-nbd        qemu-img      [u/s]        10.16s
> 
> Once again, we're seeing the qemu-img baseline faster than nbdcopy
> as client.  But throwing more sockets at either client does improve
> performance, except for...
> 
> > qemu-nbd        qemu-img      1            11.17s  (10.01% worse)
> 
> ...this one looks bad.  Is it a case of this series adding more
> mutex work (qemu-img is making parallel requests; each request then
> contends for the mutex only to learn that it will be using the same
> NBD connection)?  And your comments about smarter round-robin
> schemes mean there may still be room to avoid this much of a
> penalty.

This was reproducible and I don't have a good explanation for it.
As far as I know, just adding the NBDConnState struct should not add
any overhead.  The only locking is the call to choose_connection,
and that's just the access to an atomic variable, which I can't
imagine could cause such a difference.

> > qemu-nbd        qemu-img      2            10.35s
> > qemu-nbd        qemu-img      4            10.39s
> > nbdkit          nbdcopy       1             9.10s
> 
> This one is interesting: nbdkit as server performs better than
> qemu-nbd.
> 
> > nbdkit          nbdcopy       2             8.25s
> > nbdkit          nbdcopy       4             8.60s
> > nbdkit          qemu-img      [u/s]         8.64s
> > nbdkit          qemu-img      1             9.38s
> > nbdkit          qemu-img      2             8.69s
> > nbdkit          qemu-img      4             8.87s
> > 
> > Null test (./multi-conn.pl null)
> > ================================
> > 
> >   qemu-nbd with null-co driver,
> >   or nbdkit-null-plugin + noextents filter
> >                  |
> >                  | nbd+unix
> >                  v
> >   qemu-img convert or nbdcopy
> > 
> > This is like the local file test above, but without needing a
> > file.  Instead all zeroes (fully allocated) are downloaded over
> > NBD.
> 
> And I'm sure that if you allowed block status to show the holes,
> the performance would be a lot faster, but that would be testing
> something completely different ;)
> 
> > server          client        multi-conn
> > ---------------------------------------------------------------
> > qemu-nbd        nbdcopy       1            14.86s
> > qemu-nbd        nbdcopy       2            17.08s  (14.90% worse)
> > qemu-nbd        nbdcopy       4            17.89s  (20.37% worse)
> 
> Oh, that's weird.  I wonder if qemu's null-co driver has some poor
> mutex behavior when being hit by parallel I/O.  Seems like
> investigating that can be separate from this series, though.

Yes, I noticed in other tests that null-co has some odd behaviour,
but I couldn't understand it from looking at the code, which seems
very simple.  It does a memset, so maybe that is expensive because
it uses newly allocated buffers every time, or something like that?

> > qemu-nbd        qemu-img      [u/s]        13.29s
> 
> And another point where qemu-img is faster than nbdcopy as a
> single-client baseline.
> 
> > qemu-nbd        qemu-img      1            13.31s
> > qemu-nbd        qemu-img      2            13.00s
> > qemu-nbd        qemu-img      4            12.62s
> > nbdkit          nbdcopy       1            15.06s
> > nbdkit          nbdcopy       2            12.21s  (23.32% better)
> > nbdkit          nbdcopy       4            11.67s  (29.10% better)
> > nbdkit          qemu-img      [u/s]        17.13s
> > nbdkit          qemu-img      1            17.11s
> > nbdkit          qemu-img      2            16.82s
> > nbdkit          qemu-img      4            18.81s
> 
> Overall, I'm looking forward to seeing this go in (8.1 material;
> we're too close to 8.0).

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat
http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org
On 09.03.23 14:39, Richard W.M. Jones wrote:
> [ Patch series also available here, along with this cover letter and the
>   script used to generate test results:
>   https://gitlab.com/rwmjones/qemu/-/commits/2023-nbd-multi-conn-v1 ]
> 
> This patch series adds multi-conn support to the NBD block driver in
> qemu.  It is only meant for discussion and testing because it has a
> number of obvious shortcomings (see "XXX" in commit messages and
> code).  If we decide this is a good idea, we can work on a better
> patch.

I looked through the results and the code, and I think that's of
course a good idea!

We still need smarter integration with the reconnect logic.  At the
least, we shouldn't create several open_timer instances.

Currently, on open() we have open-timeout.  That's just a limit for
the whole nbd_open(): we can make several connection attempts during
this time.  It seems we should proceed with success if we succeed
with at least one connection.  Postponing the additional connections
to be established after open() seems good too [*].

Next, we have reconnect-delay.  When a connection is lost the
nbd-client tries to reconnect with no limit on attempts, but after
reconnect-delay seconds of reconnection all in-flight requests that
are waiting for the connection are simply failed.  When we have
several connections and one is broken, I think we shouldn't wait,
but instead retry the requests on the other working connections.
This way we don't need several reconnect_delay_timer objects: we
need only one, for when all connections are lost.

Reestablishing additional connections is better done in the
background, without blocking in-flight requests.  And that's the
same mechanism as postponing additional connections until after
open() ([*] above).

-- 
Best regards,
Vladimir
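For context, the two options Vladimir refers to are existing
per-connection parameters of qemu's NBD block driver; an illustrative
invocation (the node name and socket path are invented here) looks
like:

    -blockdev '{ "node-name": "nbd0", "driver": "nbd",
                 "server": { "type": "unix", "path": "/tmp/nbd.sock" },
                 "reconnect-delay": 10, "open-timeout": 30 }'

The open question above is how these single-socket semantics should
generalise once the driver holds several sockets at once.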