[v2] Couple of storage driver improvements

[libvirt] [PATCH v2 2/8] storage_backend_iscsi_direct: Simplify vol zeroing

Posted by Michal Privoznik 6 years, 11 months ago

So far we have two branches: either we zero BLOCK_PER_PACKET
(currently 128) block at one, or if we're close to the last block
then we zero out one block at the time. This is very suboptimal.
We know how many block are there left. Might as well just write
them all at once.

Also, SCSI protocol has WRITESAME command which allows us to
write a single buffer multiple times. This means, that we have to
transfer the buffer far less frequently and have the destination
write it over and over again. Ideally, we would allocate a buffer
with size equivalent to block size of the LUN and then write it
as many times as there are blocks on the LUN. Sadly, this works
well in theory but in practise I've found out that it's possible
to write only 4096 blocks at once. Any higher value leads to a
failure. But still better than Twili^Wwhat we have now.
Since 4096 seems like an arbitrary number, I'm also implementing
a mechanism that exponentially decreases the number of blocks
we're trying to write at once. It starts at 4096 blocks and if
that's too much the number is halved, and so on.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/storage/storage_backend_iscsi_direct.c | 55 ++++++++++++++++------
 1 file changed, 41 insertions(+), 14 deletions(-)

diff --git a/src/storage/storage_backend_iscsi_direct.c b/src/storage/storage_backend_iscsi_direct.c
index fc3b6550f7..0af7adf32c 100644
--- a/src/storage/storage_backend_iscsi_direct.c
+++ b/src/storage/storage_backend_iscsi_direct.c
@@ -39,9 +39,11 @@
 
 #define ISCSI_DEFAULT_TARGET_PORT 3260
 #define VIR_ISCSI_TEST_UNIT_TIMEOUT 30 * 1000
-#define BLOCK_PER_PACKET 128
 #define VOL_NAME_PREFIX "unit:0:0:"
 
+/* Empirically tested to be the highest value to work without any error. */
+#define WIPE_BLOCKS_AT_ONCE 4096
+
 VIR_LOG_INIT("storage.storage_backend_iscsi_direct");
 
 static struct iscsi_context *
@@ -624,6 +626,7 @@ virStorageBackendISCSIDirectVolWipeZero(virStorageVolDefPtr vol,
     uint64_t lba = 0;
     uint32_t block_size;
     uint64_t nb_block;
+    uint64_t wipe_at_once = WIPE_BLOCKS_AT_ONCE;
     struct scsi_task *task = NULL;
     int lun = 0;
     int ret = -1;
@@ -635,25 +638,49 @@ virStorageBackendISCSIDirectVolWipeZero(virStorageVolDefPtr vol,
         return ret;
     if (virISCSIDirectGetVolumeCapacity(iscsi, lun, &block_size, &nb_block))
         return ret;
-    if (VIR_ALLOC_N(data, block_size * BLOCK_PER_PACKET))
+    if (VIR_ALLOC_N(data, block_size))
         return ret;
 
+    VIR_DEBUG("Starting zeroing of lun=%d block_size=%ju nb_block=%ju wipe_at_once=%ju",
+              lun, (uintmax_t) block_size, (uintmax_t) nb_block, (uintmax_t) wipe_at_once);
+
     while (lba < nb_block) {
-        if (nb_block - lba > block_size * BLOCK_PER_PACKET) {
+        const uint64_t to_write = MIN(nb_block - lba + 1, wipe_at_once);
+        bool fail = false;
 
-            if (!(task = iscsi_write16_sync(iscsi, lun, lba, data,
-                                            block_size * BLOCK_PER_PACKET,
-                                            block_size, 0, 0, 0, 0, 0)))
-                return -1;
-            scsi_free_scsi_task(task);
-            lba += BLOCK_PER_PACKET;
-        } else {
-            if (!(task = iscsi_write16_sync(iscsi, lun, lba, data, block_size,
-                                        block_size, 0, 0, 0, 0, 0)))
-                return -1;
+        if (!(task = iscsi_writesame16_sync(iscsi, lun, lba,
+                                            data, block_size,
+                                            to_write,
+                                            0, 0, 0, 0)))
+            fail = true;
+
+        if (task &&
+            task->status == SCSI_STATUS_CHECK_CONDITION &&
+            task->sense.key == SCSI_SENSE_ILLEGAL_REQUEST &&
+            task->sense.ascq == SCSI_SENSE_ASCQ_INVALID_FIELD_IN_CDB) {
+            /* This means that we tried to write too much blocks at once.
+             * Halve it (if it's not small enough already) and retry. */
+            if (wipe_at_once > 1) {
+                wipe_at_once /= 2;
+                VIR_DEBUG("Halving wipe_at_once to %ju", (uintmax_t) wipe_at_once);
+                scsi_free_scsi_task(task);
+                continue;
+            }
+
+            fail = true;
+        }
+
+        if (fail) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("failed to run writesame: %s"),
+                           iscsi_get_error(iscsi));
             scsi_free_scsi_task(task);
-            lba++;
+            return -1;
         }
+
+        scsi_free_scsi_task(task);
+
+        lba += to_write;
     }
 
     return 0;
-- 
2.19.2

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 2/8] storage_backend_iscsi_direct: Simplify vol zeroing

Posted by Pavel Hrdina 6 years, 11 months ago

On Wed, Mar 06, 2019 at 03:59:12PM +0100, Michal Privoznik wrote:
> So far we have two branches: either we zero BLOCK_PER_PACKET
> (currently 128) block at one, or if we're close to the last block
> then we zero out one block at the time. This is very suboptimal.
> We know how many block are there left. Might as well just write
> them all at once.
> 
> Also, SCSI protocol has WRITESAME command which allows us to
> write a single buffer multiple times. This means, that we have to
> transfer the buffer far less frequently and have the destination
> write it over and over again. Ideally, we would allocate a buffer
> with size equivalent to block size of the LUN and then write it
> as many times as there are blocks on the LUN. Sadly, this works
> well in theory but in practise I've found out that it's possible
> to write only 4096 blocks at once. Any higher value leads to a
> failure. But still better than Twili^Wwhat we have now.

s/Twili^W//

> Since 4096 seems like an arbitrary number, I'm also implementing
> a mechanism that exponentially decreases the number of blocks
> we're trying to write at once. It starts at 4096 blocks and if
> that's too much the number is halved, and so on.

According to documentation if you set 0 as the number of blocks it
should write the same block starting at the LBA to the end of the
device.  Would be nice to try it out and improve the algorithm to
use only single transfer.

Otherwise the patch looks good.

Pavel

> 
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
>  src/storage/storage_backend_iscsi_direct.c | 55 ++++++++++++++++------
>  1 file changed, 41 insertions(+), 14 deletions(-)
> 
> diff --git a/src/storage/storage_backend_iscsi_direct.c b/src/storage/storage_backend_iscsi_direct.c
> index fc3b6550f7..0af7adf32c 100644
> --- a/src/storage/storage_backend_iscsi_direct.c
> +++ b/src/storage/storage_backend_iscsi_direct.c
> @@ -39,9 +39,11 @@
>  
>  #define ISCSI_DEFAULT_TARGET_PORT 3260
>  #define VIR_ISCSI_TEST_UNIT_TIMEOUT 30 * 1000
> -#define BLOCK_PER_PACKET 128
>  #define VOL_NAME_PREFIX "unit:0:0:"
>  
> +/* Empirically tested to be the highest value to work without any error. */
> +#define WIPE_BLOCKS_AT_ONCE 4096
> +
>  VIR_LOG_INIT("storage.storage_backend_iscsi_direct");
>  
>  static struct iscsi_context *
> @@ -624,6 +626,7 @@ virStorageBackendISCSIDirectVolWipeZero(virStorageVolDefPtr vol,
>      uint64_t lba = 0;
>      uint32_t block_size;
>      uint64_t nb_block;
> +    uint64_t wipe_at_once = WIPE_BLOCKS_AT_ONCE;
>      struct scsi_task *task = NULL;
>      int lun = 0;
>      int ret = -1;
> @@ -635,25 +638,49 @@ virStorageBackendISCSIDirectVolWipeZero(virStorageVolDefPtr vol,
>          return ret;
>      if (virISCSIDirectGetVolumeCapacity(iscsi, lun, &block_size, &nb_block))
>          return ret;
> -    if (VIR_ALLOC_N(data, block_size * BLOCK_PER_PACKET))
> +    if (VIR_ALLOC_N(data, block_size))
>          return ret;
>  
> +    VIR_DEBUG("Starting zeroing of lun=%d block_size=%ju nb_block=%ju wipe_at_once=%ju",
> +              lun, (uintmax_t) block_size, (uintmax_t) nb_block, (uintmax_t) wipe_at_once);
> +
>      while (lba < nb_block) {
> -        if (nb_block - lba > block_size * BLOCK_PER_PACKET) {
> +        const uint64_t to_write = MIN(nb_block - lba + 1, wipe_at_once);
> +        bool fail = false;
>  
> -            if (!(task = iscsi_write16_sync(iscsi, lun, lba, data,
> -                                            block_size * BLOCK_PER_PACKET,
> -                                            block_size, 0, 0, 0, 0, 0)))
> -                return -1;
> -            scsi_free_scsi_task(task);
> -            lba += BLOCK_PER_PACKET;
> -        } else {
> -            if (!(task = iscsi_write16_sync(iscsi, lun, lba, data, block_size,
> -                                        block_size, 0, 0, 0, 0, 0)))
> -                return -1;
> +        if (!(task = iscsi_writesame16_sync(iscsi, lun, lba,
> +                                            data, block_size,
> +                                            to_write,
> +                                            0, 0, 0, 0)))
> +            fail = true;
> +
> +        if (task &&
> +            task->status == SCSI_STATUS_CHECK_CONDITION &&
> +            task->sense.key == SCSI_SENSE_ILLEGAL_REQUEST &&
> +            task->sense.ascq == SCSI_SENSE_ASCQ_INVALID_FIELD_IN_CDB) {
> +            /* This means that we tried to write too much blocks at once.
> +             * Halve it (if it's not small enough already) and retry. */
> +            if (wipe_at_once > 1) {
> +                wipe_at_once /= 2;
> +                VIR_DEBUG("Halving wipe_at_once to %ju", (uintmax_t) wipe_at_once);
> +                scsi_free_scsi_task(task);
> +                continue;
> +            }
> +
> +            fail = true;
> +        }
> +
> +        if (fail) {
> +            virReportError(VIR_ERR_INTERNAL_ERROR,
> +                           _("failed to run writesame: %s"),
> +                           iscsi_get_error(iscsi));
>              scsi_free_scsi_task(task);
> -            lba++;
> +            return -1;
>          }
> +
> +        scsi_free_scsi_task(task);
> +
> +        lba += to_write;
>      }
>  
>      return 0;
> -- 
> 2.19.2
> 
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 2/8] storage_backend_iscsi_direct: Simplify vol zeroing

Posted by Michal Privoznik 6 years, 10 months ago

On 3/15/19 2:53 PM, Pavel Hrdina wrote:
> On Wed, Mar 06, 2019 at 03:59:12PM +0100, Michal Privoznik wrote:
>> So far we have two branches: either we zero BLOCK_PER_PACKET
>> (currently 128) block at one, or if we're close to the last block
>> then we zero out one block at the time. This is very suboptimal.
>> We know how many block are there left. Might as well just write
>> them all at once.
>>
>> Also, SCSI protocol has WRITESAME command which allows us to
>> write a single buffer multiple times. This means, that we have to
>> transfer the buffer far less frequently and have the destination
>> write it over and over again. Ideally, we would allocate a buffer
>> with size equivalent to block size of the LUN and then write it
>> as many times as there are blocks on the LUN. Sadly, this works
>> well in theory but in practise I've found out that it's possible
>> to write only 4096 blocks at once. Any higher value leads to a
>> failure. But still better than Twili^Wwhat we have now.
> 
> s/Twili^W//
> 
>> Since 4096 seems like an arbitrary number, I'm also implementing
>> a mechanism that exponentially decreases the number of blocks
>> we're trying to write at once. It starts at 4096 blocks and if
>> that's too much the number is halved, and so on.
> 
> According to documentation if you set 0 as the number of blocks it
> should write the same block starting at the LBA to the end of the
> device.  Would be nice to try it out and improve the algorithm to
> use only single transfer.

I've tried several values and found there are some differencies between 
docs and actual IMPL. What you suggest would be ideal of course, but the 
Linux Kernel I've tested doesn't allow me to specify anything else but 
[1, 4096]. Maybe it has something to do with the fact that LUN is 
virtualized and in fact a file on a filesystem. At any rate, we should 
be able to zero that too, shouldn't we?

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 2/8] storage_backend_iscsi_direct: Simplify vol zeroing

Posted by Pavel Hrdina 6 years, 10 months ago

On Fri, Mar 15, 2019 at 04:12:17PM +0100, Michal Privoznik wrote:
> On 3/15/19 2:53 PM, Pavel Hrdina wrote:
> > On Wed, Mar 06, 2019 at 03:59:12PM +0100, Michal Privoznik wrote:
> > > So far we have two branches: either we zero BLOCK_PER_PACKET
> > > (currently 128) block at one, or if we're close to the last block
> > > then we zero out one block at the time. This is very suboptimal.
> > > We know how many block are there left. Might as well just write
> > > them all at once.
> > > 
> > > Also, SCSI protocol has WRITESAME command which allows us to
> > > write a single buffer multiple times. This means, that we have to
> > > transfer the buffer far less frequently and have the destination
> > > write it over and over again. Ideally, we would allocate a buffer
> > > with size equivalent to block size of the LUN and then write it
> > > as many times as there are blocks on the LUN. Sadly, this works
> > > well in theory but in practise I've found out that it's possible
> > > to write only 4096 blocks at once. Any higher value leads to a
> > > failure. But still better than Twili^Wwhat we have now.
> > 
> > s/Twili^W//
> > 
> > > Since 4096 seems like an arbitrary number, I'm also implementing
> > > a mechanism that exponentially decreases the number of blocks
> > > we're trying to write at once. It starts at 4096 blocks and if
> > > that's too much the number is halved, and so on.
> > 
> > According to documentation if you set 0 as the number of blocks it
> > should write the same block starting at the LBA to the end of the
> > device.  Would be nice to try it out and improve the algorithm to
> > use only single transfer.
> 
> I've tried several values and found there are some differencies between docs
> and actual IMPL. What you suggest would be ideal of course, but the Linux
> Kernel I've tested doesn't allow me to specify anything else but [1, 4096].
> Maybe it has something to do with the fact that LUN is virtualized and in
> fact a file on a filesystem. At any rate, we should be able to zero that
> too, shouldn't we?

So I was able to find the limit in linux source code and the limit is
4096 for file-based block stores.  It looks like that the WRITESAME
command is not supported by ramdisk backstore so we probably cannot
use it to wipe the iscsi volumes.

Pavel

> 
> Michal
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 2/8] storage_backend_iscsi_direct: Simplify vol zeroing

Posted by Michal Prívozník 6 years, 10 months ago

On 3/15/19 5:18 PM, Pavel Hrdina wrote:
> On Fri, Mar 15, 2019 at 04:12:17PM +0100, Michal Privoznik wrote:
>> On 3/15/19 2:53 PM, Pavel Hrdina wrote:
>>> On Wed, Mar 06, 2019 at 03:59:12PM +0100, Michal Privoznik wrote:
>>>> So far we have two branches: either we zero BLOCK_PER_PACKET
>>>> (currently 128) block at one, or if we're close to the last block
>>>> then we zero out one block at the time. This is very suboptimal.
>>>> We know how many block are there left. Might as well just write
>>>> them all at once.
>>>>
>>>> Also, SCSI protocol has WRITESAME command which allows us to
>>>> write a single buffer multiple times. This means, that we have to
>>>> transfer the buffer far less frequently and have the destination
>>>> write it over and over again. Ideally, we would allocate a buffer
>>>> with size equivalent to block size of the LUN and then write it
>>>> as many times as there are blocks on the LUN. Sadly, this works
>>>> well in theory but in practise I've found out that it's possible
>>>> to write only 4096 blocks at once. Any higher value leads to a
>>>> failure. But still better than Twili^Wwhat we have now.
>>>
>>> s/Twili^W//
>>>
>>>> Since 4096 seems like an arbitrary number, I'm also implementing
>>>> a mechanism that exponentially decreases the number of blocks
>>>> we're trying to write at once. It starts at 4096 blocks and if
>>>> that's too much the number is halved, and so on.
>>>
>>> According to documentation if you set 0 as the number of blocks it
>>> should write the same block starting at the LBA to the end of the
>>> device.  Would be nice to try it out and improve the algorithm to
>>> use only single transfer.
>>
>> I've tried several values and found there are some differencies between docs
>> and actual IMPL. What you suggest would be ideal of course, but the Linux
>> Kernel I've tested doesn't allow me to specify anything else but [1, 4096].
>> Maybe it has something to do with the fact that LUN is virtualized and in
>> fact a file on a filesystem. At any rate, we should be able to zero that
>> too, shouldn't we?
> 
> So I was able to find the limit in linux source code and the limit is
> 4096 for file-based block stores.  It looks like that the WRITESAME
> command is not supported by ramdisk backstore so we probably cannot
> use it to wipe the iscsi volumes.

Ah, thank you for such in depth analysis. Yeah, we can't use it. So let
me respin my original patch then.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list