util/thread-pool.c | 34 +++++++++++++--------------------- 1 file changed, 13 insertions(+), 21 deletions(-)
Together, these two patches fix the performance regression induced by QemuSemaphore; individually they don't though. 6.2: iops : min=58051, max=62260, avg=60282.57, stdev=1081.18, samples=30 clat percentiles (usec): 1.00th=[ 490], 99.99th=[ 775] iops : min=59401, max=61290, avg=60651.27, stdev=468.24, samples=30 clat percentiles (usec): 1.00th=[ 490], 99.99th=[ 717] iops : min=59583, max=60816, avg=60353.43, stdev=282.69, samples=30 clat percentiles (usec): 1.00th=[ 490], 99.99th=[ 701] iops : min=58099, max=60713, avg=59739.53, stdev=755.49, samples=30 clat percentiles (usec): 1.00th=[ 494], 99.99th=[ 717] patched: iops : min=60616, max=62522, avg=61654.37, stdev=555.67, samples=30 clat percentiles (usec): 1.00th=[ 474], 99.99th=[ 1303] iops : min=61841, max=63600, avg=62878.47, stdev=442.40, samples=30 clat percentiles (usec): 1.00th=[ 465], 99.99th=[ 685] iops : min=62976, max=63910, avg=63531.60, stdev=261.05, samples=30 clat percentiles (usec): 1.00th=[ 461], 99.99th=[ 693] iops : min=60803, max=63623, avg=62653.37, stdev=808.76, samples=30 clat percentiles (usec): 1.00th=[ 465], 99.99th=[ 685] Paolo Supersedes: <20220505131346.823941-1-pbonzini@redhat.com> Paolo Bonzini (2): thread-pool: optimize scheduling of completion bottom half thread-pool: replace semaphore with condition variable util/thread-pool.c | 34 +++++++++++++--------------------- 1 file changed, 13 insertions(+), 21 deletions(-) -- 2.31.1
Hello Paolo, folks, I gave it a try (on top of the f9fc8932) and it's better than the f9fc8932, better than the previous patch by Stefan, but still I'm not reaching the performance of d7482ffe97 (before the f9fc8932 commit): f9f | 0.0 | -2.8 | 0.6 stefan | -3.1 | -1.2 | -2.2 paolo | 5.3 | 5.4 | 7.1 d74 | 7.2 | 9.1 | 8.2 Anyway it's definitely closer to the previous baseline (~-2%). Note I have not tried other scenarios, just the 4K nbd writes on rotational disk. I'll try running more throughout the night. Regards, Lukáš Dne 06. 05. 22 v 13:47 Paolo Bonzini napsal(a): > Together, these two patches fix the performance regression induced by > QemuSemaphore; individually they don't though. > > 6.2: > iops : min=58051, max=62260, avg=60282.57, stdev=1081.18, samples=30 > clat percentiles (usec): 1.00th=[ 490], 99.99th=[ 775] > iops : min=59401, max=61290, avg=60651.27, stdev=468.24, samples=30 > clat percentiles (usec): 1.00th=[ 490], 99.99th=[ 717] > iops : min=59583, max=60816, avg=60353.43, stdev=282.69, samples=30 > clat percentiles (usec): 1.00th=[ 490], 99.99th=[ 701] > iops : min=58099, max=60713, avg=59739.53, stdev=755.49, samples=30 > clat percentiles (usec): 1.00th=[ 494], 99.99th=[ 717] > > > patched: > iops : min=60616, max=62522, avg=61654.37, stdev=555.67, samples=30 > clat percentiles (usec): 1.00th=[ 474], 99.99th=[ 1303] > iops : min=61841, max=63600, avg=62878.47, stdev=442.40, samples=30 > clat percentiles (usec): 1.00th=[ 465], 99.99th=[ 685] > iops : min=62976, max=63910, avg=63531.60, stdev=261.05, samples=30 > clat percentiles (usec): 1.00th=[ 461], 99.99th=[ 693] > iops : min=60803, max=63623, avg=62653.37, stdev=808.76, samples=30 > clat percentiles (usec): 1.00th=[ 465], 99.99th=[ 685] > > Paolo > > Supersedes: <20220505131346.823941-1-pbonzini@redhat.com> > > Paolo Bonzini (2): > thread-pool: optimize scheduling of completion bottom half > thread-pool: replace semaphore with condition variable > > util/thread-pool.c | 34 +++++++++++++--------------------- > 1 file changed, 13 insertions(+), 21 deletions(-) >
Dne 06. 05. 22 v 20:55 Lukáš Doktor napsal(a): > Hello Paolo, folks, I gave it a try (on top of the f9fc8932) and it's better than the f9fc8932, better than the previous patch by Stefan, but still I'm not reaching the performance of d7482ffe97 (before the f9fc8932 commit): > > f9f | 0.0 | -2.8 | 0.6 > stefan | -3.1 | -1.2 | -2.2 > paolo | 5.3 | 5.4 | 7.1 > d74 | 7.2 | 9.1 | 8.2 > > Anyway it's definitely closer to the previous baseline (~-2%). Note I have not tried other scenarios, just the 4K nbd writes on rotational disk. I'll try running more throughout the night. > I tried a couple of iterations of fio-nbd 4/64/256KB read/writes on a rotational disk and overall the latest fix results in a steady 2.5% throughput regression for the 4KiB writes. The remaining tested variants performed similarly. Please let me know if you want me to test the fio execution inside the guest as well or some other variants. Regards, Lukáš
On 5/9/22 08:42, Lukáš Doktor wrote: > Dne 06. 05. 22 v 20:55 Lukáš Doktor napsal(a): >> Hello Paolo, folks, I gave it a try (on top of the f9fc8932) and >> it's better than the f9fc8932, better than the previous patch by >> Stefan, but still I'm not reaching the performance of d7482ffe97 >> (before the f9fc8932 commit): >> >> f9f | 0.0 | -2.8 | 0.6 stefan | -3.1 | -1.2 | -2.2 paolo | >> 5.3 | 5.4 | 7.1 d74 | 7.2 | 9.1 | 8.2 >> >> Anyway it's definitely closer to the previous baseline (~-2%). Note >> I have not tried other scenarios, just the 4K nbd writes on >> rotational disk. I'll try running more throughout the night. >> > > I tried a couple of iterations of fio-nbd 4/64/256KB read/writes on a > rotational disk and overall the latest fix results in a steady 2.5% > throughput regression for the 4KiB writes. The remaining tested > variants performed similarly. Please let me know if you want me to > test the fio execution inside the guest as well or some other > variants. Considering we have conflicting results (I get a 2-3% improvement over 6.2), and that in general aio=native/aio=io_uring is preferred, I think we can proceed with these patches at least for now. Paolo
© 2016 - 2024 Red Hat, Inc.