[Qemu-devel] [PATCH v0] Implement new cache mode "target"

Artemy Kapitula posted 1 patch 4 years, 8 months ago
[Qemu-devel] [PATCH v0] Implement new cache mode "target"
Posted by Artemy Kapitula 4 years, 8 months ago
Databases running in VMs perform too slowly on generic SAN storage.
The main cause is fdatasync(), which flushes the disk on the SCSI
target.

The QEMU blockdev "target" cache mode is intended for use with SAN
storage. It combines "none" (direct I/O) with "unsafe" (device
flushes are omitted).

Such storage has its own data integrity protection and can be
operated with direct I/O without an additional fdatasync().

With generic SCSI targets like LIO or SCST it boosts performance by
up to 100% on some profiles, such as databases with a transaction
journal (postgresql/mssql/oracle etc.) or virtualized SDS (ceph/rook
inside VMs), which flush the block device cache on each journal record.

Signed-off-by: Artemy Kapitula <dalt74@gmail.com>
---
  block.c                | 4 ++++
  qemu-options.hx        | 3 ++-
  tests/qemu-iotests/026 | 2 +-
  tests/qemu-iotests/091 | 2 +-
  4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index cbd8da5f3b..60919d82ff 100644
--- a/block.c
+++ b/block.c
@@ -884,6 +884,10 @@ int bdrv_parse_cache_mode(const char *mode, int *flags, bool *writethrough)
      } else if (!strcmp(mode, "unsafe")) {
          *writethrough = false;
          *flags |= BDRV_O_NO_FLUSH;
+    } else if (!strcmp(mode, "target")) {
+        *writethrough = false;
+        *flags |= BDRV_O_NOCACHE;
+        *flags |= BDRV_O_NO_FLUSH;
      } else if (!strcmp(mode, "writethrough")) {
          *writethrough = true;
      } else {
diff --git a/qemu-options.hx b/qemu-options.hx
index 9621e934c0..01f1f4ad34 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1065,7 +1065,7 @@ This option defines the type of the media: disk or cdrom.
  @var{snapshot} is "on" or "off" and controls snapshot mode for the given drive
  (see @option{-snapshot}).
  @item cache=@var{cache}
-@var{cache} is "none", "writeback", "unsafe", "directsync" or "writethrough"
+@var{cache} is "none", "writeback", "unsafe", "target", "directsync" or "writethrough"
  and controls how the host cache is used to access block data. This is a
  shortcut that sets the @option{cache.direct} and @option{cache.no-flush}
  options (as in @option{-blockdev}), and additionally @option{cache.writeback},
@@ -1084,6 +1084,7 @@ none         │ on                on             off
  writethrough │ off               off            off
  directsync   │ off               on             off
  unsafe       │ on                off            on
+target       │ on                on             on
  @end example
  
  The default mode is @option{cache=writeback}.
diff --git a/tests/qemu-iotests/026 b/tests/qemu-iotests/026
index e30243608b..e7179b0de4 100755
--- a/tests/qemu-iotests/026
+++ b/tests/qemu-iotests/026
@@ -42,7 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
  _supported_fmt qcow2
  _supported_proto file
  _default_cache_mode "writethrough"
-_supported_cache_modes "writethrough" "none"
+_supported_cache_modes "writethrough" "none" "target"
  # The refcount table tests expect a certain minimum width for refcount entries
  # (so that the refcount table actually needs to grow); that minimum is 16 bits,
  # being the default refcount entry width.
diff --git a/tests/qemu-iotests/091 b/tests/qemu-iotests/091
index d62ef18a02..2eaf258c8a 100755
--- a/tests/qemu-iotests/091
+++ b/tests/qemu-iotests/091
@@ -47,7 +47,7 @@ _supported_fmt qcow2
  _supported_proto file
  _supported_os Linux
  _default_cache_mode "none"
-_supported_cache_modes "writethrough" "none" "writeback"
+_supported_cache_modes "writethrough" "none" "writeback" "target"
  
  size=1G
  
-- 
2.21.0
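
For readers who want to see the resulting flag combinations side by side,
here is a minimal standalone sketch of the mode-to-flag mapping described by
the table in qemu-options.hx. It is not QEMU code: sketch_parse_cache_mode()
and the SKETCH_O_* bits are hypothetical stand-ins for
bdrv_parse_cache_mode() and the BDRV_O_* flags, and the branches that the
hunk above does not show ("none", "directsync", "writeback") are assumed
from the documented table, not copied from block.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in flag bits for illustration only. QEMU's real BDRV_O_* values
 * live in its own headers and are not reproduced here. */
enum {
    SKETCH_O_NOCACHE  = 1 << 0,  /* corresponds to cache.direct=on   */
    SKETCH_O_NO_FLUSH = 1 << 1   /* corresponds to cache.no-flush=on */
};

/* Maps a cache mode name to flag bits and a writethrough setting.
 * Returns 0 on success, -1 for an unknown mode. */
static int sketch_parse_cache_mode(const char *mode, int *flags,
                                   bool *writethrough)
{
    assert(mode != NULL && flags != NULL && writethrough != NULL);

    /* Start from a clean state so earlier cache bits do not leak through. */
    *flags &= ~(SKETCH_O_NOCACHE | SKETCH_O_NO_FLUSH);

    if (!strcmp(mode, "none")) {
        *writethrough = false;
        *flags |= SKETCH_O_NOCACHE;
    } else if (!strcmp(mode, "directsync")) {
        *writethrough = true;
        *flags |= SKETCH_O_NOCACHE;
    } else if (!strcmp(mode, "writeback")) {
        *writethrough = false;
    } else if (!strcmp(mode, "unsafe")) {
        *writethrough = false;
        *flags |= SKETCH_O_NO_FLUSH;
    } else if (!strcmp(mode, "target")) {
        /* The proposed mode: direct I/O like "none", no flushes like "unsafe". */
        *writethrough = false;
        *flags |= SKETCH_O_NOCACHE;
        *flags |= SKETCH_O_NO_FLUSH;
    } else if (!strcmp(mode, "writethrough")) {
        *writethrough = true;
    } else {
        return -1;
    }
    return 0;
}
```

In this sketch "target" ends up with both bits set and writethrough
disabled, i.e. the state of "none" plus the no-flush bit of "unsafe".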



Re: [Qemu-devel] [PATCH v0] Implement new cache mode "target"
Posted by Stefan Hajnoczi 4 years, 8 months ago
On Wed, Aug 07, 2019 at 04:09:54PM +0300, Artemy Kapitula wrote:

Hi,
Please use "scripts/get_maintainer.pl -f block.c" to find out which
maintainers to email.  qemu-devel@nongnu.org is a high-traffic list and
patches not CCed to the right maintainer may not get quick review.

> Databases running in VMs perform too slowly on generic SAN storage.
> The main cause is fdatasync(), which flushes the disk on the SCSI
> target.
> 
> The QEMU blockdev "target" cache mode is intended for use with SAN
> storage. It combines "none" (direct I/O) with "unsafe" (device
> flushes are omitted).
> 
> Such storage has its own data integrity protection and can be
> operated with direct I/O without an additional fdatasync().
> 
> With generic SCSI targets like LIO or SCST it boosts performance by
> up to 100% on some profiles, such as databases with a transaction
> journal (postgresql/mssql/oracle etc.) or virtualized SDS (ceph/rook
> inside VMs), which flush the block device cache on each journal record.

If the physical storage controller has a Battery Backed Unit (BBU) or
similar then flush requests are not required with O_DIRECT.  This has
been a common enterprise storage configuration for many years and is
already supported in QEMU today:

Configure the guest with cache=none and disable the emulated storage
controller's write cache (e.g. -device virtio-blk-pci,write-cache=off).
Inside the guest /sys/block/$BLKDEV/queue/write_cache should show "write
through".
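
The guest-side check described above can also be done programmatically. The
following is only a sketch under stated assumptions: the helper names
read_sysfs_line() and guest_disk_is_write_through() are made up for this
example, and the only facts taken from the text are the sysfs path
/sys/block/$BLKDEV/queue/write_cache and the expected value "write through".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads the first line of a text file into buf and strips the newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int read_sysfs_line(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(path != NULL && buf != NULL && len > 0);

    f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Checks the write cache mode the guest kernel reports for a block device
 * such as "vda" (the device name is supplied by the caller).
 * Returns 1 for "write through", 0 for any other value, -1 on error. */
static int guest_disk_is_write_through(const char *blkdev)
{
    char path[256];
    char mode[64];
    int n;

    assert(blkdev != NULL);

    n = snprintf(path, sizeof(path), "/sys/block/%s/queue/write_cache",
                 blkdev);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        return -1;
    }
    if (read_sysfs_line(path, mode, sizeof(mode)) < 0) {
        return -1;
    }
    return strcmp(mode, "write through") == 0;
}
```

A return value of 1 inside the guest would indicate that the guest kernel
treats the disk as write-through and therefore has no reason to send flush
requests for it.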

I think this patch is not necessary since write-cache=off already
exists.  cache=target is also slower since the guest sends unnecessary
flush requests to the emulated storage controller.

Thanks,
Stefan

Re: [Qemu-devel] [PATCH v0] Implement new cache mode "target"
Posted by Kevin Wolf 4 years, 8 months ago
On 15.08.2019 at 15:53, Stefan Hajnoczi wrote:
> On Wed, Aug 07, 2019 at 04:09:54PM +0300, Artemy Kapitula wrote:
> 
> Hi,
> Please use "scripts/get_maintainer.pl -f block.c" to find out which
> maintainers to email.  qemu-devel@nongnu.org is a high-traffic list and
> patches not CCed to the right maintainer may not get quick review.
> 
> > Databases running in VMs perform too slowly on generic SAN storage.
> > The main cause is fdatasync(), which flushes the disk on the SCSI
> > target.
> > 
> > The QEMU blockdev "target" cache mode is intended for use with SAN
> > storage. It combines "none" (direct I/O) with "unsafe" (device
> > flushes are omitted).
> > 
> > Such storage has its own data integrity protection and can be
> > operated with direct I/O without an additional fdatasync().
> > 
> > With generic SCSI targets like LIO or SCST it boosts performance by
> > up to 100% on some profiles, such as databases with a transaction
> > journal (postgresql/mssql/oracle etc.) or virtualized SDS (ceph/rook
> > inside VMs), which flush the block device cache on each journal record.
> 
> If the physical storage controller has a Battery Backed Unit (BBU) or
> similar then flush requests are not required with O_DIRECT.  This has
> been a common enterprise storage configuration for many years and is
> already supported in QEMU today:
> 
> Configure the guest with cache=none and disable the emulated storage
> controller's write cache (e.g. -device virtio-blk-pci,write-cache=off).
> Inside the guest /sys/block/$BLKDEV/queue/write_cache should show "write
> through".
> 
> I think this patch is not necessary since write-cache=off already
> exists.  cache=target is also slower since the guest sends unnecessary
> flush requests to the emulated storage controller.

Two more comments:

1. The proposed cache mode can already be configured as
   cache.direct=on,cache.no-flush=on. I don't think we intend to add new
   aliases for combinations of these options. The existing aliases exist
   for compatibility reasons.

2. If fdatasync() takes noticeable time on such storage, this is a host
   kernel problem. If we know that there is nothing to be synced, the
   kernel should just return immediately without involving any I/O.
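
To see whether fdatasync() really is the slow part on a given host, one
could time it directly. The following is a rough sketch, not a benchmark:
time_fdatasync() and time_write_then_fdatasync() are hypothetical helpers
written for this mail, the 4 KiB write size is arbitrary, and the file is
opened without O_DIRECT to keep the example portable, so it only
approximates the I/O pattern QEMU produces with cache.direct=on.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Returns the wall-clock seconds spent in one fdatasync(fd) call,
 * or -1.0 if the clock or the sync fails. */
static double time_fdatasync(int fd)
{
    struct timespec t0, t1;

    assert(fd >= 0);

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0) {
        return -1.0;
    }
    if (fdatasync(fd) != 0) {
        return -1.0;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0) {
        return -1.0;
    }
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Writes one 4 KiB block to path (creating or truncating the file) and
 * returns the time the following fdatasync() took, or -1.0 on any error. */
static double time_write_then_fdatasync(const char *path)
{
    char block[4096];
    double elapsed = -1.0;
    int fd;

    assert(path != NULL);

    memset(block, 0xab, sizeof(block));
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        return -1.0;
    }
    if (write(fd, block, sizeof(block)) == (ssize_t)sizeof(block)) {
        elapsed = time_fdatasync(fd);
    }
    close(fd);
    return elapsed;
}
```

Calling time_write_then_fdatasync() on a file that lives on the SAN-backed
filesystem and comparing the result with a local disk would give a first
hint whether the flush itself is what costs the time.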

Kevin