[v1] crypto: improve performance of XTS cipher mode

[Qemu-devel] [PATCH 0/6] crypto: improve performance of XTS cipher mode

Posted by Daniel P. Berrangé 7 years, 4 months ago

The XTS cipher mode is significantly slower than CBC mode. This series
approximately doubles XTS performance, which will improve the I/O
rate for LUKS disks.

Daniel P. Berrangé (6):
  crypto: expand algorithm coverage for cipher benchmark
  crypto: remove code duplication in tweak encrypt/decrypt
  crypto: introduce a xts_uint128 data type
  crypto: convert xts_tweak_encdec to use xts_uint128 type
  crypto: convert xts_mult_x to use xts_uint128 type
  crypto: annotate xts_tweak_encdec as inlineable

 crypto/xts.c                    | 147 ++++++++++++++-----------------
 tests/benchmark-crypto-cipher.c | 149 +++++++++++++++++++++++++++-----
 2 files changed, 191 insertions(+), 105 deletions(-)
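For readers less familiar with the XTS internals the shortlog refers to, here is a minimal C sketch of the idea behind the xts_uint128 conversion. It assumes the 16-byte tweak can be handled as two 64-bit words, so the per-block XORs and the multiply-by-x step work on whole words rather than single bytes. All names here (sketch_u128, sketch_mult_x, sketch_tweak_encdec, the callback type and the toy cipher) are hypothetical and do not reproduce crypto/xts.c. The multiply follows the usual XTS (IEEE 1619) convention: a little-endian 128-bit value with the 0x87 reduction constant.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_XTS_BLOCK 16

/* Hypothetical 128-bit tweak type: the same 16 bytes viewed either as
 * bytes or as two 64-bit words (the idea behind xts_uint128). */
typedef union {
    uint8_t b[SKETCH_XTS_BLOCK];
    uint64_t u[2];
} sketch_u128;

/* Hypothetical block-cipher callback; must tolerate dst == src. */
typedef void (*sketch_block_fn)(const void *ctx, uint8_t *dst,
                                const uint8_t *src);

/* Read/write 8 bytes as a little-endian integer, whatever the host order. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Multiply the tweak by x in GF(2^128) using the XTS convention: treat the
 * 16 bytes as a little-endian 128-bit value, shift it left by one bit and,
 * if a bit falls off the top, reduce by XORing 0x87 into the lowest byte. */
static void sketch_mult_x(sketch_u128 *t)
{
    uint64_t lo = load_le64(t->b);
    uint64_t hi = load_le64(t->b + 8);
    uint64_t carry = hi >> 63;

    hi = (hi << 1) | (lo >> 63);
    lo = (lo << 1) ^ (carry ? 0x87 : 0);
    store_le64(t->b, lo);
    store_le64(t->b + 8, hi);
}

/* One XTS block step shared by encryption and decryption:
 * dst = func(src XOR T) XOR T, with the XORs done 64 bits at a time.
 * "inline" is only a hint, mirroring the idea of the last patch. */
static inline void sketch_tweak_encdec(const void *ctx, sketch_block_fn func,
                                       const sketch_u128 *t,
                                       uint8_t *dst, const uint8_t *src)
{
    sketch_u128 tmp;

    assert(func != NULL);
    memcpy(tmp.b, src, SKETCH_XTS_BLOCK);
    tmp.u[0] ^= t->u[0];
    tmp.u[1] ^= t->u[1];
    func(ctx, tmp.b, tmp.b);
    tmp.u[0] ^= t->u[0];
    tmp.u[1] ^= t->u[1];
    memcpy(dst, tmp.b, SKETCH_XTS_BLOCK);
}

/* Toy "cipher" so the sketch runs without AES: XORs each byte with a key
 * byte. Not secure; demonstration only. */
static void sketch_toy_cipher(const void *ctx, uint8_t *dst, const uint8_t *src)
{
    uint8_t k = *(const uint8_t *)ctx;

    for (int i = 0; i < SKETCH_XTS_BLOCK; i++) {
        dst[i] = src[i] ^ k;
    }
}
```

A real caller would pass an AES block function instead of the toy cipher. Because the toy cipher is its own inverse, applying sketch_tweak_encdec twice with the same tweak returns the original block, which makes the sketch easy to check.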

-- 
2.17.1

Re: [Qemu-devel] [PATCH 0/6] crypto: improve performance of XTS cipher mode

Posted by Marc-André Lureau 7 years, 4 months ago

Hi

On Tue, Oct 9, 2018 at 4:57 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> The XTS cipher mode is significantly slower than CBC mode. This series
> approximately doubles the XTS performance which will improve the I/O
> rate for LUKS disks.

Processing a constant amount of data makes it easier to measure
performance with perf stat:

diff --git a/tests/benchmark-crypto-cipher.c b/tests/benchmark-crypto-cipher.c
index a8325a9510..32a19987e6 100644
--- a/tests/benchmark-crypto-cipher.c
+++ b/tests/benchmark-crypto-cipher.c
@@ -65,7 +65,7 @@ static void test_cipher_speed(size_t chunk_size,
                                         chunk_size,
                                         &err) == 0);
         total += chunk_size;
-    } while (g_test_timer_elapsed() < 1.0);
+    } while (total / MiB < 500);

     total /= MiB;
     g_print("Enc chunk %zu bytes ", chunk_size);
@@ -80,7 +80,7 @@ static void test_cipher_speed(size_t chunk_size,
                                         chunk_size,
                                         &err) == 0);
         total += chunk_size;
-    } while (g_test_timer_elapsed() < 1.0);
+    } while (total / MiB < 500);
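The fixed-size idea in this diff can be sketched as a standalone C loop. The sketch assumes a hypothetical chunk_fn callback in place of the real cipher call and uses standard C clock() in place of GLib's g_test_timer_elapsed(). run_fixed_work and invert_chunk are illustrative names, not part of tests/benchmark-crypto-cipher.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define MiB (1024 * 1024)

/* Hypothetical per-chunk workload standing in for the real cipher call;
 * returns 0 on success. */
typedef int (*chunk_fn)(uint8_t *buf, size_t len);

/* Dummy workload: flip every bit of the buffer. */
static int invert_chunk(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (uint8_t)~buf[i];
    }
    return 0;
}

/* Fixed-work loop: keep going until at least total_mib MiB have been
 * processed, then report the CPU time taken (or -1.0 on error). */
static double run_fixed_work(chunk_fn fn, uint8_t *buf, size_t chunk_size,
                             size_t total_mib, size_t *processed)
{
    size_t total = 0;
    clock_t start;

    assert(chunk_size > 0 && processed != NULL);
    start = clock();
    do {
        if (fn(buf, chunk_size) != 0) {
            return -1.0;
        }
        total += chunk_size;
    } while (total / MiB < total_mib);

    *processed = total;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Because the amount of work is fixed, running the same binary under perf stat before and after a change compares like with like. The cost is that fast and slow modes take very different times to complete.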

On my laptop, before your series:
       3701.625051      task-clock:u (msec)       #    0.997 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               438      page-faults:u             #    0.118 K/sec
    10,823,305,761      cycles:u                  #    2.924 GHz
    29,774,419,538      instructions:u            #    2.75  insn per cycle
     4,919,267,782      branches:u                # 1328.948 M/sec
        32,923,105      branch-misses:u           #    0.67% of all branches

       3.712998264 seconds time elapsed

After your series:
       2151.201355      task-clock:u (msec)       #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
               431      page-faults:u             #    0.200 K/sec
     7,073,869,618      cycles:u                  #    3.288 GHz
     8,573,595,534      instructions:u            #    1.21  insn per cycle
     1,576,926,668      branches:u                #  733.045 M/sec
           148,987      branch-misses:u           #    0.01% of all branches

       2.151520872 seconds time elapsed
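Expressed as ratios (computed here from the two perf stat runs, not figures stated in the original mail), the before/after counters give:

$$
\begin{aligned}
\text{elapsed time:}\quad & 3.713\,\text{s} / 2.152\,\text{s} \approx 1.73 \\
\text{cycles:}\quad & 10.82\times10^{9} / 7.07\times10^{9} \approx 1.53 \\
\text{instructions:}\quad & 29.77\times10^{9} / 8.57\times10^{9} \approx 3.47 \\
\text{clock rate:}\quad & 3.288\,\text{GHz} / 2.924\,\text{GHz} \approx 1.124, \qquad 1.53 \times 1.124 \approx 1.72
\end{aligned}
$$

On this run the speed-up is roughly 1.7x. Instructions drop about 3.5x, but IPC also falls (2.75 to 1.21), so cycles fall only about 1.5x. The remaining wall-clock gain lines up with the higher clock rate perf reported for the second run, which suggests part of it may come from frequency scaling rather than from the patches themselves.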


-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH 0/6] crypto: improve performance of XTS cipher mode

Posted by Daniel P. Berrangé 7 years, 4 months ago

On Tue, Oct 09, 2018 at 05:59:46PM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Oct 9, 2018 at 4:57 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > The XTS cipher mode is significantly slower than CBC mode. This series
> > approximately doubles the XTS performance which will improve the I/O
> > rate for LUKS disks.
> 
> By using a constant amount of data to process, it's easier to measure
> performance with perf stat:

The problem is that the different encryption modes have wildly
different performance: e.g. XTS gets 400 MB/s while ECB gets
3000 MB/s. I want each test to run long enough to minimize
noise, and it is hard to pick a data size that is large enough
to measure ECB accurately without being excessively large for
XTS. Thus I prefer to have a fixed execution time for each test.
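As rough arithmetic for this trade-off: at the quoted rates, a fixed 500 MiB (about 524 MB) run would take roughly 1.3 s for XTS but only about 0.17 s for ECB. The fixed-time alternative can be sketched as below. It assumes a hypothetical chunk_fn callback in place of the real cipher call and uses clock() in place of g_test_timer_elapsed(); run_fixed_time and invert_chunk are illustrative names, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define MiB (1024.0 * 1024.0)

/* Hypothetical per-chunk workload; returns 0 on success. */
typedef int (*chunk_fn)(uint8_t *buf, size_t len);

/* Dummy workload: flip every bit of the buffer. */
static int invert_chunk(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (uint8_t)~buf[i];
    }
    return 0;
}

static double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Fixed-time loop: keep processing chunks until at least min_seconds of
 * CPU time have passed, then return throughput in MiB/s (-1.0 on error). */
static double run_fixed_time(chunk_fn fn, uint8_t *buf, size_t chunk_size,
                             double min_seconds)
{
    size_t total = 0;
    double start = cpu_seconds();
    double elapsed;

    assert(chunk_size > 0 && min_seconds > 0.0);
    do {
        if (fn(buf, chunk_size) != 0) {
            return -1.0;
        }
        total += chunk_size;
        /* Checked every chunk, like the original g_test_timer_elapsed() loop. */
        elapsed = cpu_seconds() - start;
    } while (elapsed < min_seconds);

    /* elapsed >= min_seconds > 0 here, so the division is safe. */
    return (total / MiB) / elapsed;
}
```

Normalizing to MiB/s is what keeps results from modes with very different speeds comparable when each runs for the same time budget. The loop still processes a different amount of data per mode, which is the point Marc-André raises in reply.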


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH 0/6] crypto: improve performance of XTS cipher mode

Posted by Marc-André Lureau 7 years, 4 months ago

Hi

On Tue, Oct 9, 2018 at 6:13 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Tue, Oct 09, 2018 at 05:59:46PM +0400, Marc-André Lureau wrote:
> > Hi
> >
> > On Tue, Oct 9, 2018 at 4:57 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > > The XTS cipher mode is significantly slower than CBC mode. This series
> > > approximately doubles the XTS performance which will improve the I/O
> > > rate for LUKS disks.
> >
> > By using a constant amount of data to process, it's easier to measure
> > performance with perf stat:
>
> The problem is that the different encryption modes have wildly
> different performance. e.g. while XTS gets 400 MB/s, ECB gets
> 3000 MB/s. I want the test to run long enough to minimize the
> noise, and picking a data size large enough for best ECB
> perf while not being excessively large for XTS is hard. Thus
> I prefer to have a fixed execution time for each test.

I understand; I was just giving you some nice numbers to back your patches ;)

OTOH, I think a fixed amount of work per benchmark is more reliable,
even if the test then runs quickly. I wouldn't rely on the current
benchmark results; they are quite unpredictable on my system.

-- 
Marc-André Lureau